Collecting audio, video, and computing recordings of a setting of human interaction provides a rich, revisitable record of group process; but these records are unwieldy: their benefits are often overshadowed by tedious sequential access. Random-access digital video and audio (instantaneous seek times), pen-based computing (informal interaction), and signal analysis (speaker identification, scene change detection) can be combined to provide users with heretofore unavailable capabilities; but functional systems require careful design work. Therein lies our challenge--designing tools that let people work with these rich, time-based media in facile ways--helping rather than hindering their interactions and extracting indices from the structure of their natural activity, rather than imposing regularity upon their process.
Our group was particularly well poised for taking on this challenge, because three of our ongoing projects provided critical capabilities: (1) WhereWereWe [Minneman and Harrison, 1993] offered capture, indexing, and playback of digital audio, video, and computing streams; (2) the Tivoli application [Pedersen et al., 1993] furnished digital whiteboard functionality on a large, pen-based electronic display device called the LiveBoard(1) [Elrod et al., 1992]; and (3) the Inter-Language Unification (ILU) project [Janssen, 1994] added a powerful distributed-object programming facility. These efforts have been converging to provide a confederation of tools for unobtrusively capturing time-based records(2) of group activity, indexing the recordings, and accessing the indexed material for browsing, searching, and reexperiencing the original activity.
Support. Because we want tools to support informal activity, we have limited ourselves thus far to simple, generic tools for notes and shared representations. Our LiveBoard technology provides an informal shared workspace; its whiteboard metaphor is quickly accessible to users. This application provides several advantages over physical whiteboards--editing, printing, saving and retrieving, multiple pages, etc.--that make it useful by itself.(3) Also, we utilize laptop computers for notetaking, which is becoming standard practice for many people. We also provide ways for the different devices to send information to each other. It is possible to provide more elaborate meeting tools, but these tend to impose a constraining structure on user activities. Instead, we have focused on simple tools that provide a basic level of support, but in addition serve as capture devices.
Capture. Multimedia capture of activity first involves initiating and coordinating the recording apparatus for a variety of media (audio, video, text, program logs). Our current audio and video media recording is workstation-based, using Sun Sparcstation built-in audio and prototype video digitizing hardware.(4) We have more platform flexibility in tools for capturing computing records, having developed capture tools on various UNIX systems, PCs, and Macs. In particular, the support tools mentioned above produce time-stamped records of their behaviors. And we are developing an architecture, based on the use of network distributed objects, for the uniform treatment of records of diverse timestream data.
Indexing. Indices are meaningful (or at least heuristically useful) pointers into the captured multimedia records, providing the means for users to randomly access those records. We are exploring a variety of methods for creating indices--let us consider them in four broad classes. First, there are intentional annotations, which are indices that participants create during an activity for the purpose of marking particular time points or segments of activity. A prime example of this is sequential notetaking. A participant, taking the role of "scribe," takes brief notes on the activities as they progress. What is crucial for us here is not just what a note contains, but also when it is created, making the note an index (while a single-person scribe is common practice, there can, of course, be multiple notetakers). Second, there are side-effect indices. These are activities whose primary purpose is not indexing, but which provide indices because they are automatically timestamped and logged. An example is switching pages on the LiveBoard. The purpose of switching to another page is to see other material or to start a new page. This may indicate a topic switch, and thus is a potentially useful index into the overall activity. In fact, in our work every event on the LiveBoard is a potential index. Third, there are derived indices, which are produced by automated analyses of detailed timestream records. For example, signal analysis of audio/video records can produce speaker identification indices and scene change indices. Finally, there are post hoc indices, produced by anyone who later accesses the activity records--an intentional annotation, but after the fact. Indices are often easier to make when reflecting on the activities rather than in the heat of the moment. Although we have explored all of these methods of indexing, we have concentrated on intentional and side-effect indexing in the work reported here.
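The four classes of indices above can be summarized in a small data sketch. The names here are hypothetical, introduced purely for illustration; the actual WhereWereWe API is described later in the paper:

```python
from dataclasses import dataclass
from enum import Enum

class IndexKind(Enum):
    INTENTIONAL = "intentional"   # scribe's notes, made during the activity
    SIDE_EFFECT = "side_effect"   # e.g., a LiveBoard page switch, logged automatically
    DERIVED = "derived"           # e.g., speaker identification from signal analysis
    POST_HOC = "post_hoc"         # added later, while reviewing the records

@dataclass
class Index:
    kind: IndexKind
    timestamp: float              # seconds into the recorded session
    annotation: str = ""          # optional text attached to the index

# A scribe's note and an automatically logged page switch:
notes = [
    Index(IndexKind.INTENTIONAL, 312.5, "Joe: prior art concern"),
    Index(IndexKind.SIDE_EFFECT, 648.0, "page switch: IP #2"),
]
```

Whatever its kind, each index reduces to a timestamped pointer into the recordings; the classes differ only in who or what creates it, and when.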
Access. Tools to support the access of captured and indexed records are a ripe area for research. The tools must support the user in finding the records of the desired session, assembling the indices in a comprehensible format, controlling the playback of the multimedia records, and in creating new multimedia artifacts from the captured materials. There is great potential for new tools in this arena. For example, we have discovered the need for making the computational tools into players, so that the state of those tools can be seen in coordination with the playback of audio and video. Another example is a timeline tool for presenting diverse indices. Further, the access tools should allow the user to add further annotations and indices. The accessing activity should, itself, be revisitable. In the work reported here, we have developed a very simple, yet quite useful, environment for access. We are currently exploring new tools.
In the remainder of this paper, we first describe our particular application domain, demonstrating how these kinds of tools can be effective and useful for users. Then we describe the Coral architecture and two particular applications in more detail. We then consider a broader range of uses for these tools and describe how these are being explored. Finally, we address some of the lessons we've learned over the course of this effort about tools, infrastructure, and uses.
There is a manager of the intellectual property processes; let us call him Ron. One of his duties is to schedule and moderate these meetings. Another duty is to document the technical assessments, the issues raised, and the decisions made at these meetings. This documentation is important--it gives feedback to the inventors; it informs the research management and the patent attorneys; and it becomes part of the corporate legal records.
Traditionally, Ron wrote these documents from his handwritten notes taken during the meetings. His problem in doing this was not only the great number of meetings to document, but the diversity of the technologies under discussion. He is knowledgeable in some arenas (having been a PARC researcher himself) but a complete novice in others. This meant that he often couldn't immediately assimilate comments made during the meeting into his notes and had to subsequently consult with those present at the meeting to help create accurate documentation.
There are two different kinds of settings involved in this process--the capture setting, which is the meeting with its discussions, and the access setting, where the captured meeting materials are "salvaged" to produce the required documentation.
The capture setting is shown in Figure 1. This photo depicts a mock-up of an assessment meeting. The 4 to 10 meeting participants sit around a table facing each other. They bring hardcopies of the IPs, which they have read beforehand and which they use during the meeting. There is a LiveBoard that is prepared with the meeting's agenda. Microphones on the table capture the audio, which is digitized and stored. Ron uses a laptop computer on the table to type notes during the meeting. Thus, tools are in place to capture three streams of activity: audio, LiveBoard interaction, and text.
Figure 1. An activity capture setting. The microphone, camera,
LiveBoard, and laptop capture the audio, video, scribbling, and
textual notetaking activities of the meeting.
A meeting proceeds as follows. Recently submitted IPs form the agenda, and the IPs are dealt with sequentially. This partitions the meeting into natural segments of about 10-30 minutes each. Tivoli has been prepared ahead of time with a page for collecting notes for each IP. Thus the activity of switching Tivoli pages produces indices of these IP segments.
During each IP segment there are two kinds of activity, discussion and conclusion. Discussion activity takes place across the table and involves all the participants, who are interacting with each other. This is the most critical activity, and we want our technology to be non-intrusive. For example, we do not require that the participants focus on the LiveBoard. Ron takes notes during the discussion on the laptop; he acts as a recorder, only occasionally participating in the discussion himself. Ron's notes are not totally private, however. We found it useful to "beam" his notes from the laptop to the LiveBoard as he takes them. Participants tend not to orient towards these beamed notes, but rather monitor them "out of the corner of their eyes" to make sure their contributions are being noted by Ron.
Ron brings the discussion activity to a close and initiates the conclusion activity of the IP segment. This activity involves making a decision on how to handle the IP and noting any associated action items. This activity is different from the open discussion in that Ron stands at the LiveBoard, where he marks the rating and disposition of the IP and handwrites the action items. Although there is discussion during this part, the participants are more focused on the board and on Ron than on each other, because they all want to see and make certain they concur with the conclusions.
Later, Ron documents the discussions and conclusions reached in each meeting (this may be days or weeks afterwards). He does this in the access setting, which is simply an office with a Sun workstation and audio/video playback devices, pictured in Figure 2. We call this particular configuration the "Salvage Station."
Figure 2. An access setting. The Salvage Station consists of a workstation,
monitor, and speaker. The workstation contains the LiveBoard display,
playback controls, and an editor for creating documentation.
Creating the documentation in this setting involves a careful review of the captured materials. Ron summarizes each IP discussion in about a page of text. His typed notes from the meeting are often cryptic, serving more as indices into the recordings than as substantive summaries in themselves.(6) The Salvage Station, shown as a screen shot in Figure 3, provides him with playback controls, a Tivoli application showing the same pages as were created on the LiveBoard in the meeting, and a text editor to create the documentation. Every mark or note that was made on or beamed to the LiveBoard serves as a time index into the recordings. Thus, in order to get access to the recorded materials for a given IP, he buttons the Tivoli application to display the page containing the notes for that IP. Then by simply touching a mark or note on the displayed page he causes the media to play from the time when that mark or note was made in the meeting. This gives him meaningful random access into the recorded material. He listens and relistens to some portions of the recordings until he understands the significance of what was said. Although he may alternate between listening and typing the summary documentation, he often does both simultaneously, listening only for important points that he may have missed the first time around.
Figure 3. Salvage Station screen showing Tivoli, a text editor (buffers
of meeting notes and summary), and a simple timeline controller.
Each of the tools in the Coral confederation is fully functional without one or more of the others (as we know from various weeks where we were still sorting out bugs in the individual pieces), but the combination of the tools makes for a more powerful union. Thus, the "system design" is emergent, based on developing a shared infrastructure and protocols for exporting functionality to neighboring tools. Toward this end, major efforts were focused on the design of a suitable application programmer's interface (API) to the WhereWereWe multimedia system resources and similar APIs to other tools (e.g., Tivoli's beaming functionality), which are exported with a distributed object protocol.
The proxy objects are operated on by "client" programs which can assume these proxies to be performing their respective functions, although in reality the proxy makes a remote procedure call to the real object (the "server" object), which actually carries out the operation. For example, a WhereWereWe client may create an ILU object from the API (e.g., a video Player object), and not concern itself with the details of the fact that in reality this object is communicating with the WhereWereWe server to carry out its functions. Furthermore, the object may be transparently shared--WhereWereWe allows multiple proxies to represent the same server object to facilitate resource sharing. For instance, multiple users may share a single video Recorder object, each potentially having control over the state of the device (e.g., pause and resume, frame rate), thus reducing hardware resource demands for compression and storage. Most of these extensions are transparent to typical ILU API users, and the simpler nomenclature of clients and servers will be retained throughout this paper.
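The proxy pattern just described can be sketched in a few lines of Python. This is not the real ILU machinery; the class names are stand-ins, and the "remote" call is simulated with a direct Python call where an RPC would marshal arguments across a process boundary:

```python
class PlayerServer:
    """The real object, living in the WhereWereWe server process."""
    def __init__(self):
        self.position = 0.0

    def seek(self, seconds):
        self.position = seconds
        return self.position

class PlayerProxy:
    """Local stand-in; every method call is forwarded to the server."""
    def __init__(self, server):
        self._server = server  # in ILU this would be a network connection

    def __getattr__(self, name):
        remote = getattr(self._server, name)
        def call(*args):
            # a real RPC would marshal args, send them over the wire,
            # and unmarshal the result; here we just invoke directly
            return remote(*args)
        return call

server = PlayerServer()
client_view = PlayerProxy(server)
client_view.seek(42.0)   # the client never sees the process boundary
```

Because multiple proxies may front the same server object, several clients can share one Recorder in exactly this way, each issuing control calls against its own local proxy.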
Sessions, Streams, and Events are classes of objects that are used to do naming in WhereWereWe. Sessions are named collections of Streams, which correspond to semantic occasions, such as "Project meeting from October 15". Streams are media data that can be played back, such as audio, video, or a program activity log. Events are "occurrences" that happen at some point or interval in a Stream. This association with a Stream is purely for the convenience of retrieval, but is one natural way of thinking about the relationship between Events and Streams. Also, each of these three classes supports a property list on each instance of the class, so that application programmers may associate arbitrary application-specific data with each object.
Players, Recorders, and Data objects are used to convert Streams into other forms. A Recorder both creates a new Stream object and takes responsibility for storing the data associated with that Stream (often this is simply a disk file which can be replayed later). A Player displays for the user the data of a previously recorded Stream. A Data object converts a Stream's recorded data into a raw form that a processing application can use as input for its algorithms.
A Notifier object is used by client applications that need to stay informed of the status of ongoing playback or recording activities.
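The naming model of Sessions, Streams, and Events, each carrying a property list, can be sketched as follows. This is illustrative Python with assumed names, not the actual WhereWereWe API:

```python
class Named:
    """Base for the sketch: each instance carries a property list
    for arbitrary application-specific data."""
    def __init__(self, name):
        self.name = name
        self.properties = {}

class Session(Named):
    """A named collection of Streams: a semantic occasion,
    e.g. 'Project meeting from October 15'."""
    def __init__(self, name):
        super().__init__(name)
        self.streams = []

class Stream(Named):
    """Playable media data: audio, video, or a program activity log."""
    def __init__(self, name, session):
        super().__init__(name)
        session.streams.append(self)
        self.events = []

class Event(Named):
    """An occurrence at a point or interval in a Stream; the Stream
    association is kept for convenience of retrieval."""
    def __init__(self, name, stream, start, duration=0.0):
        super().__init__(name)
        self.start, self.duration = start, duration
        stream.events.append(self)

meeting = Session("Project meeting from October 15")
audio = Stream("room audio", meeting)
note = Event("scribe note", audio, start=95.0)       # a point event
note.properties["text"] = "Decision: file IP #3"     # application data
```

A zero-duration Event marks a point in time; an interval is just an Event with a nonzero duration.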
It should be noted here that WhereWereWe can be thought of as "glue" that allows index-making and browsing activity for stream data. It does not attempt to provide general media playback services, but rather provides an infrastructure into which such services can be inserted and utilized in a uniform way. WhereWereWe, at present, has built-in drivers for digital audio and video in one format.(8) WhereWereWe also provides a limited mechanism for additional drivers to be installed and used with no changes to the client software and minimal changes to the server software.
These WhereWereWe API elements may be combined in many useful configurations.
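One typical configuration can be sketched with stand-in classes (assumed names, not the real API): a Recorder creates and stores a Stream, an indexing client drops Events into it, and a Player later seeks to an Event's start time:

```python
class Stream:
    def __init__(self):
        self.data = []      # stands in for the stored media
        self.events = []    # (time, annotation) index points

class Recorder:
    """Creates a new Stream and takes responsibility for storing its data."""
    def __init__(self):
        self.stream = Stream()

    def write(self, t, chunk):
        self.stream.data.append((t, chunk))

class Player:
    """Displays the data of a previously recorded Stream."""
    def __init__(self, stream):
        self.stream = stream

    def play_from(self, t):
        # replay everything recorded at or after time t
        return [chunk for (ts, chunk) in self.stream.data if ts >= t]

rec = Recorder()
for t, chunk in [(0, "a"), (10, "b"), (20, "c")]:
    rec.write(t, chunk)
rec.stream.events.append((10, "important point"))   # an index from a client

player = Player(rec.stream)
start, _ = rec.stream.events[0]
replayed = player.play_from(start)
```

The point of the configuration is that the Event, not the raw media, is what users touch: the index resolves to a time, and the time drives the Player.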
Figure 4. Typical Tivoli screen shot, showing an application-specific
form, pen-drawn strokes, keyboard text, and clock objects.
Tivoli has within it a stand-alone history mechanism that allows an "infinite undo" of drawing and editing operations; this history facility also gave us a leg up on allowing the drawing/editing process to be replayed. Late in 1993, Tivoli was extended to use the WhereWereWe API and become a marking and browsing application. Very little needed to be done to extend Tivoli to support indexing, as its history was already retaining timestamps of drawing and editing operations. The application was modified to write that information into the files that it retained about sessions where audio and/or video were recorded. Tivoli was eventually further modified to produce other timing indices, but considerable utility as a side-effect indexer accrued from simply tying into its existing history mechanism.
In addition to the indexing functionality outlined above, the Tivoli application can be used to drive the various WhereWereWe resources for playback. Since each stroke drawn on the LiveBoard is timestamped, it is possible to select a stroke and have WhereWereWe and Tivoli replay all of the recordings made at that time. Thus, the user can utilize a Tivoli page's display as an interface, answering questions such as "what was Joe saying when I jotted this down?" or "what's this all about?"(11) Thus the strokes themselves constitute an important index into the activity.
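The stroke-selection playback just described reduces to a simple lookup. The structures below are assumed for illustration; each stroke carries its creation timestamp, and touching a stroke yields the time at which to start the media players:

```python
strokes = [
    {"id": "s1", "created_at": 120.0},   # seconds into the session
    {"id": "s2", "created_at": 305.5},
]

def playback_time_for(stroke_id, strokes):
    """Return where to seek the media players when a stroke is touched."""
    for stroke in strokes:
        if stroke["id"] == stroke_id:
            return stroke["created_at"]
    raise KeyError(stroke_id)

seek_to = playback_time_for("s2", strokes)
```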
The user is also presented with a simple timeline interface to the playback functions. This timeline panel offers several kinds of control that are not available with the direct selection interface described above: gross controls, such as going to a general portion of the recording (say, "2/3 of the way in") or starting and stopping the session playback, as well as finer-grained controls, such as hopping forward or backward several seconds to catch an unintelligible utterance.
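The timeline controls described above can be sketched as follows (assumed names, not the actual interface code): gross positioning by fraction of the session, plus fine-grained hops.

```python
class Timeline:
    def __init__(self, duration_seconds):
        self.duration = duration_seconds
        self.position = 0.0

    def seek_fraction(self, fraction):
        """Gross control: jump to, say, 2/3 of the way into the session."""
        self.position = max(0.0, min(1.0, fraction)) * self.duration

    def hop(self, seconds):
        """Fine control: jump forward or backward a few seconds,
        clamped to the session boundaries."""
        self.position = max(0.0, min(self.duration, self.position + seconds))

t = Timeline(3600)      # a one-hour session
t.seek_fraction(2 / 3)  # jump two-thirds of the way in
t.hop(-5)               # back up to catch an unintelligible utterance
```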
Oftentimes, much of the graphical activity that constitutes the indices of side-effect markers comes after the event--making notes about a point that was made, or sketching a suggested solution. We added a feature that blurs the boundary between Tivoli as a side-effect indexer and as an intentional indexer. Users can insert a sort of temporal bookmark, a graphical object whose primary purpose is to initiate later replay from its creation time. These graphical objects display as a clock face (showing the time that they were created, often allowing users to see the progression of a discussion) and often further serve as bullets in list items (Fig. 4). These clockmarks have become very popular graphical elements, strewn throughout the pages of Tivoli.
A simple interface, especially designed for portable computers, has been built for notetaking with an Emacs connected to WhereWereWe (this has been dubbed WEmacs). In this interface, the user can "make a note" about something in the current Session(12) by typing a particular keystroke (currently Tab). WEmacs generates an Event whose start time is the current time and whose duration is zero; this Event is now in the WhereWereWe database for later use by browsers. WEmacs currently represents this Event as a distinguished line of characters (a "timestripe") in the buffer. Whenever an additional timestamp is indicated, the editor submits the region between it and the previous one as an annotation (which goes onto the event's property list) on the prior Event. Submitting the Event does not preclude further changes; saving also parses the entire buffer and updates the annotations (as well as saving enough other pertinent information so that the Session can be revisited later). Playback works similarly; WEmacs' algorithm looks back for the previous Event string and begins play from the corresponding Event.
Figure 5. Emacs' WhereWereWe mode, showing the timestamping
character strings and user annotations.
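The buffer convention described above, in which a timestripe line marks an Event's start time and the text between two timestripes becomes the earlier Event's annotation, can be sketched as a parser. The timestripe format here is assumed for illustration, not WEmacs' actual character string:

```python
import re

# hypothetical timestripe: a line of the form "--- <seconds> ---"
TIMESTRIPE = re.compile(r"^--- (\d+(?:\.\d+)?) ---$", re.MULTILINE)

def parse_buffer(text):
    """Return (start_time, annotation) pairs, as a save would submit
    them to the WhereWereWe database."""
    events = []
    matches = list(TIMESTRIPE.finditer(text))
    for i, m in enumerate(matches):
        # the annotation runs from this timestripe to the next one
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        annotation = text[m.end():end].strip()
        events.append((float(m.group(1)), annotation))
    return events

buffer = """--- 95.0 ---
Joe raises prior-art concern
--- 210.0 ---
Decision: file IP #3
"""
events = parse_buffer(buffer)
```

Reparsing the whole buffer on save, as WEmacs does, means later edits to an annotation simply replace the earlier submission.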
WEmacs also interacts directly with Tivoli, using an ILU interface exported for external control of the application. When WEmacs is in automatic update mode, each time the user enters a tab character, WEmacs, in addition to submitting the previous annotation to the WhereWereWe database, "beams" the event to the LiveBoard, where it appears on the current Tivoli page, further denoted by one of the clock objects (set to the event's initiation time from its timestripe) discussed above. If necessary, Tivoli scrolls to assure that the beamed text is visible.
The WEmacs user is also provided with other controls of the Tivoli application. Scrolling and page changing are available to the seated user; this can be very convenient for reflecting (even initiating) agenda progress in meeting settings, or when preparing to go to the board for pen-based drawing activity.
WEmacs also functions in the retrieval setting, both as a text editor for refining meeting notes, and as a method for controlling the session appliances.
In our domain, only one person was taking notes; and thus all the notetaking indices reflected that one person's point of view. In many situations, such as less constrained discussions or brainstorming, it may be important to see the activity from multiple points of view. This suggests a multiplicity of various kinds of devices that give each individual participant the capability to take notes and create indices.
Another limitation that seemed reasonable in our domain was that we focused only on recording audio, under the assumption that most of the content of the activity was carried in the audio. This is not true in many situations, such as engineering design, where physical artifacts play a crucial role in the activities. Even in our domain, there are many indexical actions, such as pointing to a particular point in a document during the discussion, that we do not capture. While the WhereWereWe infrastructure supports video capture and playback, the storage requirements were deemed prohibitive for the many hours of recording we anticipated. We look forward to improved support for video, and foresee interesting challenges in effectively utilizing video in these multi-participant settings.
In our domain, indices are created manually at a fairly coarse grain--notes are taken at one per minute at the fastest--and long periods of time can go by without any indices being created. More complete and finer indices can be produced by analyzing the audio to identify the speakers, which is also important since people often tend to orient to who said what.
It is on the access side that perhaps the broader opportunities lie. In our domain, there is a particularly demanding access task--generate a detailed summary of the content of the discussion. As we have noted, this is a costly process. But users might access captured materials for quite different reasons. One may want to quickly "skim" a session to be reminded of what transpired (e.g., in preparation for another meeting). Another may want to find information germane to a new IP they're considering. These suggest tools geared for skimming (e.g., Arons' Speech Skimmer [1993]) or searching.
One crucial variable that affects the nature of the access task is whether the person was present at the captured activity or not. One who was at the captured session can rely on a myriad of remembered cues (e.g., the interesting part happens right after John left the meeting). The person who was not there is "flying blind" and will place greater reliance on the captured indices. This variety has numerous user interface implications, and needs to be explored.
Thus far, textual documentation has been the dominant product of the captured activity. But there is a wide variety of different kinds of multimedia documentation of collaborative activities. Creating a detailed textual summary is a demanding task, whereas creating a few pointers to the highlights of an activity might be much easier to do, and may be just as useful for many situations. The spectrum of possibilities needs to be explored.
Although these opportunities for further use are attractive, they must be approached with a sensitivity to issues of security, access, and context. The users supported by the current system speak freely in the capture setting, knowing that only Ron has later access to their comments. If a change to this situation is being considered, we must respect the privacy of our users, and make clear the range of reuse that is possible. We also believe that material of this sort could be damaging (and/or useless) if it is decontextualized, so we face further challenges as we attempt to broaden the scope of this work.
The MarkTab application uses a unistroke alphabet recognizer [Goldberg and Richardson, 1993] to allow free text entry on the Tab's touch-sensitive screen, and the Tab's simple 3-button arrangement for control. The user can think of the MarkTab application as a recipe box of 3x5 cards, one card for each event--cards that may additionally be sorted (at creation time) into categories. One button on the Tab cycles through the categories; another signals an event and gives the user a blank card of the category that was selected when the event button was depressed. The user then uses the stylus to jot down, with unistrokes, an arbitrarily detailed textual annotation about the event in question and indicates (with a soft button) that she is done.
Marquee was built to index analog video tape; it was modified to use WhereWereWe as its indexing and streams capture and replay subsystem. When the modified Marquee starts up, it creates or joins the recorders for the streams of a WhereWereWe session, and then begins operation in its notetaking role. When it needs a "timestamp" (where it previously would retrieve a videotape time code), Marquee now asks WhereWereWe what the current time is(14) and records that instead of the SMPTE time code information. When Marquee is put in "review mode" it creates or joins a set of players and instructs them to seek to points in absolute time that it noted previously. For simplicity of the transition from analog to WhereWereWe, Marquee maintains its own ink database; the switch to events would not be difficult.
Coral pushes on numerous aspects of software engineering where the state of the world leaves something to be desired. However, we feel that we are well poised to take advantage of improvements in these related areas--distributed systems, operating systems, object-oriented databases, multimedia compression and network transport, and others.
For example, the infrastructure could benefit greatly from synchronization primitives provided by the underlying operating system. If WhereWereWe had more control over the timing of its media streams, better playback synchronization performance would result, especially for streams that need to stay synchronized for long periods of time and/or cross several pause and resume boundaries.
Another example is in the area of distributed object storage. WhereWereWe currently implements its own object persistence atop a standard relational database. This method is completely ad hoc, and the performance of our initial stab at the problem suffers from our not knowing the characteristics of our eventual use (we do a form of lazy evaluation that often ends up resulting in many more database hits than necessary). While we've now had enough experience that we're poised to do a better job with a second implementation, it is very clear that a system that focused its efforts on providing fast and efficient persistent object storage and retrieval would be of considerable value.
Perhaps the most successful implementation decision in Coral was to develop it such that minimal buy-in was required for an application to begin participating in the Coral framework. A program that wished to become an indexing client simply needed to locate a master WhereWereWe object and submit Events. Simple dedicated indexing clients can be written in less than a page of Python; piggybacking on Tivoli or Emacs requires minimal initial programming investment. This meant that, after minimal modifications, programs could participate in activity capture settings without major interruption to their own research agendas.(15) Coral's basis as a loose confederation has proved to be very powerful, because applications can choose to participate at a variety of levels.
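The shape of such a minimal client can be sketched as follows. The connection and object names here are stand-ins, not the real WhereWereWe/ILU bindings: a real client would locate the master object over the network rather than construct a local fake.

```python
import time

class FakeWhereWereWe:
    """Stands in for the master WhereWereWe object a real client
    would locate via ILU."""
    def __init__(self):
        self.events = []

    def submit_event(self, when, annotation):
        self.events.append((when, annotation))

def indexing_client(www, annotations):
    """The whole client: each call marks 'now' with an annotation."""
    for note in annotations:
        www.submit_event(time.time(), note)

www = FakeWhereWereWe()
indexing_client(www, ["meeting start", "action item"])
```

Everything beyond this, such as playback control or beaming, is optional; an application buys in only as far as it needs to.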
Generalizing and extending the infrastructure turned out to be more difficult than desired. For example, the inclusion of new media types, perhaps higher quality audio or video, required that WhereWereWe be recompiled. While not fundamentally a big deal, this ran counter to the spirit of a collection of loosely connected elements. We have since redesigned and repartitioned the Coral infrastructure to include the notion of independent media servers which implement and serve all media-specific functionality, using something like the MIME types mechanism to determine what media server is needed for a particular stream, and a broker to connect to an existing instance or launch a new one.
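The brokering scheme in the redesign can be sketched as follows. The names, and the media-type strings, are hypothetical: streams carry a MIME-like type, and a broker maps each type to a running media server, launching one on demand.

```python
class MediaServer:
    """Stands in for an independent server implementing all
    media-specific functionality for one type."""
    def __init__(self, media_type):
        self.media_type = media_type

class Broker:
    def __init__(self):
        self._servers = {}   # media type -> running server instance

    def server_for(self, media_type):
        """Connect to an existing instance or launch a new one."""
        if media_type not in self._servers:
            self._servers[media_type] = MediaServer(media_type)  # "launch"
        return self._servers[media_type]

broker = Broker()
a = broker.server_for("audio/basic")
b = broker.server_for("audio/basic")      # same instance is reused
v = broker.server_for("video/x-capture")  # a new type launches a new server
```

Under this scheme, adding a new media type means launching a new server process, not recompiling WhereWereWe.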
Time is a slippery quantity in the WhereWereWe internals, the API, and in many of the application programs. Although the use of absolute time makes many problems simpler, the complexity of time in the tools does not vanish. Indexing applications used during playback to create post hoc indices obviously create marks whose creation time is not coincident with the time they mark--both times need to be retained, but their representation is problematic. Further, it is clear that client applications may need the ability to create Events that refer to the future, as they may need to begin the event generation process before the actual event occurs. While the application programmer can easily reference these quantities using absolute time and the current WhereWereWe API, supporting these capture and access concepts is a significant implementation challenge.
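The two-times distinction can be made concrete with a small sketch (the field names are assumed, not part of the API): a post hoc index carries both the moment it marks in the session and the moment it was created, and the two need not coincide.

```python
from dataclasses import dataclass

@dataclass
class MarkEvent:
    marks_time: float      # the point in the session the index refers to
    created_time: float    # when the index itself was made

# An index made during the meeting: the two times coincide.
live = MarkEvent(marks_time=300.0, created_time=300.0)

# A post hoc index made days later, during playback, pointing back
# into the session (created_time is a much later absolute time).
post_hoc = MarkEvent(marks_time=300.0, created_time=900_000.0)
```

Collapsing the two into a single timestamp, as a naive representation would, loses exactly the information needed to replay an accessing session.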
While Coral's confederation approach has worked well for getting a suite of diverse applications working together, it has resulted in a few problems. Applications have the opportunity to stay blissfully unaware that they are participating in an activity capture setting. We need to provide lightweight ways to keep the tools adequately coordinated. For example, WEmacs beams text up onto the current Tivoli page, submits it to WhereWereWe as an event, and retains a local copy in its buffer. Modifications made in each of those locations are not necessarily reflected in the others. Our current suite of applications has evolved a set of ad hoc interfaces for portions of this functionality (e.g., WEmacs to Tivoli for beaming does not go through the WhereWereWe infrastructure). We are working on an extended notification system--one that includes events--that will help with some of these difficulties, but a general solution to this problem remains a major challenge.
At the level of applications, we are still gaining experience with various types of functionality and their numerous interactions. WEmacs and Tivoli offer a solid start in using multimedia capture in simple capture settings, but are both somewhat lacking in the access setting. If hooking Tivoli to WhereWereWe spotlighted how any program with a time-based history is already 90% of a marking client, then writing access applications is revealing how everything is potentially a stream. If we want Tivoli or WEmacs to look the way it did when a particular utterance was made, then the best way to have that happen is for Tivoli or WEmacs to act as players. Tivoli has already been augmented with some of this functionality, working in both a playback mode, which animates the exact appearance and construction of past states, and a "bouncing-ball" mode, where a cursor points to the area where drawing or editing was happening.
Once more and more of the functionality of the capture and access tools is exported via recorder and player interfaces, we gain a uniformity that can be exploited to solve other interface problems. Currently, using the suite of tools for review is plagued by a variety of applications, each of which may want to control the playback of assorted multimedia streams and of each other. This coordination has been the source of many of the ad hoc inter-process communication paths described above, and of compromises in user interface generality. Once these programs all appear as players, they can more easily be gathered into composite players and uniformly handled.
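The uniformity argument can be made concrete: if every tool presents the same player interface, a composite player is just a member that forwards transport commands to its constituents. The interface and class names below are our own hypothetical sketch, not the Coral API.

```python
from typing import List

class Player:
    """Uniform player interface (hypothetical sketch)."""
    def seek(self, t: float) -> None: ...
    def play(self) -> None: ...
    def stop(self) -> None: ...

class CompositePlayer(Player):
    """Gathers member players and forwards transport commands, so a
    single controller can drive audio, video, and Tivoli alike."""
    def __init__(self, members: List[Player]) -> None:
        self.members = list(members)

    def seek(self, t: float) -> None:
        for m in self.members:
            m.seek(t)

    def play(self) -> None:
        for m in self.members:
            m.play()

    def stop(self) -> None:
        for m in self.members:
            m.stop()

class LogPlayer(Player):
    """Trivial member that records the commands it receives."""
    def __init__(self, name: str, log: list) -> None:
        self.name, self.log = name, log
    def seek(self, t: float) -> None:
        self.log.append((self.name, "seek", t))
    def play(self) -> None:
        self.log.append((self.name, "play"))

log = []
deck = CompositePlayer([LogPlayer("audio", log), LogPlayer("tivoli", log)])
deck.seek(42.0)
deck.play()
```

Because a `CompositePlayer` is itself a `Player`, composites nest, which is what makes the composite-stream recording discussed next straightforward to express.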
The unification of tools into composite streams quickly suggests other potential uses. Recording the activity of a composite stream, i.e., its constitution and the messages it distributed to its member objects, will allow us to play back a playback session. This is potentially a crucial notion when a user wants to review the accessing done by another user (e.g., seeing what a close colleague found interesting in a recorded seminar). These situations quickly bring up the time and past- vs. present-event subtleties discussed above.
We currently have minimal query support; application programmers end up writing code to sift through all the Events for a Session in order to find those that they want to represent. As we shift to a greater focus on accessing, we will need finer grain query support for getting subsets of events and sessions. Furthermore, we will need to devise formalisms for formulating and performing temporal queries.
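The finer-grained query support we have in mind amounts, at minimum, to temporal range queries over a session's events. The sketch below shows the sifting that application programmers currently write by hand, expressed once as a reusable interval-overlap query; the `Event` shape and function name are hypothetical, not the WhereWereWe API.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

@dataclass
class Event:
    """A session event occupying an interval of absolute time
    (hypothetical shape for illustration)."""
    stream: str
    start: float
    end: float
    kind: str

def events_overlapping(events: Iterable[Event], t0: float, t1: float,
                       kind: Optional[str] = None) -> List[Event]:
    """Temporal range query: events whose interval overlaps [t0, t1],
    optionally restricted to one kind. Two intervals overlap exactly
    when each starts before the other ends."""
    return [e for e in events
            if e.start < t1 and e.end > t0
            and (kind is None or e.kind == kind)]

session = [
    Event("audio", 0.0, 300.0, "speech"),
    Event("tivoli", 120.0, 125.0, "stroke"),
    Event("wemacs", 400.0, 405.0, "note"),
]
hits = events_overlapping(session, 100.0, 200.0)
```

Real temporal queries would go further (before/after/during relations between events, queries across sessions), which is the formalism the text says remains to be devised.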
Although video is a supported datatype in the infrastructure and current suite of tools, we have had little chance to adequately explore its utility. The low resolution and framerate of our current video offering leaves much to be desired, particularly in settings where documents or detailed physical objects are of interest. On the other hand, the sheer size of video streams is, in large part, the root of our inexperience with video, so improved quality will need to be balanced against the costs of transmission and storage. We are improving the quality and reliability of the video datatype in this current reworking of our infrastructure, and expect to be using it more prevalently in the near future.
The Coral architecture and the particular tools described here have already proven remarkably flexible, and are proving their utility in regular use. As the examples illustrate, the WhereWereWe API, coupled with ILU, makes it relatively painless to explore multimedia recording, playback, and indexing in a variety of settings. The Coral suite of tools is providing us with a foundation for interesting applications and has supplied invaluable fodder for our current infrastructure revision efforts. We think there are further gains that can be made, particularly in areas involving automatic and semiautomatic indexing, studies of use and refinement of the applications, and in browsing tools for timestream data.
Electronic Meeting Rooms. Early electronic meeting rooms, such as the EDS Capture Lab [Mantei, 1989], attempted to provide computer support for meeting process. The Capture Lab and our project both support and extend some otherwise paper or whiteboard-based activity using computational tools. However, the Capture Lab was focused on decision-making and more formal meeting process, while our project involves making records from informal aspects of meeting room activity.
More recent meeting-room systems that have included multimedia focus on making and accessing recordings of technical presentations; e.g., Bellcore's STREAMS [Cruz and Hill, 1994] is aimed directly at this application. Importantly, these tend to be monolithic systems with a clearly defined model of use which all tools buy into: there is a speaker/audience model of setting, they are integral with a multimedia telecommunications system, and notetaking is a purely private activity outside the scope of the system. Consistent with the focus on presentation, such systems provide individuals with means of locating and displaying meetings in remote or post-facto settings. In contrast, our system does not distinguish or privilege particular users' activity and integrates through a confederation strategy.
Memory Aids. One way in which the recordings and notes are employed is to improve recollections of the meeting; a couple of systems have tackled this problem area directly. The IBM We-Met system [Wolf et al., 1992] started down this path; it was followed by H-P's Filochat [Whittaker et al., 1994], which used a pen-based computer and digital audio recording to provide a single user with a means to take notes in a meeting and, by selecting a handwritten note, replay the recording made when the note was taken. Although discussing many issues common to our effort, the Filochat work, with its emphasis on personal use, excludes many aspects that arise when that same functionality becomes collaborative and is offered as a network service.
Pepys [Newman et al., 1991] kept an automatic diary of offices visited and colleagues encountered using a network of sensors and communicating identification badges; it did not employ recording--in our parlance, it created events but not streams. Although the system demonstrably stimulated recall of some events, remembering is but one step in recovering content from casual activity.
Xcapture [Hindus et al., 1993] is a short-term memory device; by constantly rerecording the last few minutes of audio, it is possible to replay something that was just uttered. This scheme obviates the need for marking but requires immediate action on the part of the user.
Video on demand. Video on demand systems allow a user to select a video clip (perhaps a long clip, like a movie) and have the video, audio, and perhaps supporting documents be instantly available for viewing [Rangan et al., 1992; Rowe and Smith, 1992]. The data usually can be played back at various speeds and with random access. These systems concentrate on allowing synchronous access to data recorded at a previous time.
Teleconferencing. Video conferencing systems tend to be modelled on telephony; they give the user the capability to conduct a face-to-face type interaction with a user at a remote location [Fish et al., 1990; Watabe, 1990; Ahuja and Ensor, 1992]. These systems do not usually allow the user to review the session; the data is not stored in the system. These systems focus on connecting people who are synchronized in time.
Multimedia documents. Multimedia document systems focus on the construction, layout, and retrieval of mixed media documents, especially those containing video or high-resolution images [Buchanan and Zellweger, 1992; Hardman et al., 1993]. These systems focus on the presentation of previously constructed data, allowing asynchronous communication between author and reader. Of particular note in this category is Raison d'Etre [Carroll et al., 1994]. This system did not augment the capture of activity, but rather organized fragments of recorded video using an issue-based hypermedia framework. The source material consisted of video recordings of interviews with members of a design project that were then manually segmented and categorized. Thus, segment retrieval was conceptualized as a pre-structured (rather than emergent) activity, organized around content rather than around indices derived from the activity itself.
The Coral confederation of applications resulted from a mixture of top-down and bottom-up development; this flexible approach permits expedient changes to serve the needs of our users, while supporting a smooth transition from prototype changes to architectural ones. The confederation approach has served us well over the course of the project, but elements of the system are currently being redesigned to better support the uses and demands that have emerged from our experiences with real applications and actual users. In particular, the move to media servers will permit easier exploration of new datatypes, and an improved notification system should ease application coordination.
We are indeed shifting some attention from the capture setting to the range of accessing that might be useful for a population of users. A wide range of scenarios surface here, from looking over a meeting that one missed to searching for a remembered comment to maintaining a group notebook. These applications take further advantage of the network and multi-user aspects of the infrastructure, allowing us to investigate the power of merging information from multiple users' marking activity and derived indices.
Activity capture and access via the recording of time-based data has turned out to be an extremely rich area with diverse research threads--speech signal processing, pen-based user interfaces, distributed object systems, real-time multimedia indexing, and so on. The niche of near-synchronous and pre-narrative multimedia has proven to hold opportunities for both novel applications and truly useful functionality.