Version 0.6 Planning. See also the API under development.



Active Archives Wiki (AAW) is a software tool that enables writing with the web. AAW envisions the Web as a collaborative collage and creative writing space. An active archive contains material pulled from (or pushed) online: videos, images, sound, text, but also the hypertext and metadata that interlink these elements.

AAW is an archive providing the means to:

  • Store
  • Name
  • Access
  • Retrieve
  • Version (manage the history of changes of)
  • Search...

As well as a collage/composition tool providing the means for:

  • Cutting up
  • Transforming
  • Composing...

Material, all while preserving editability and links back to original sources so as to encourage further (de-)composition.

Overview of tools / technologies employed:

  • (cutting up) css selectors & xpath
  • (self-containment) crawlers & page rewriting
  • (transformation & cutting up) linux (command line) tools (ffmpeg, melt), and pipeline mentality!
  • (metadata / linking) rdf, metadata sniffers, microdata/formats?
  • storage & versioning: filesystem, git?
  • indexing: rdf triple stores, couchdb?, solr?
  • web service: django
  • writing: structured text (textile + mediawiki) + custom syntax

Pages (Wiki orientation)

Wiki-style pages that can contain text / links. Pages are also the context in which resources are annotated. See the 0.4 version.

Order of tags

A recurrent issue (in this and other experiences of tagging systems) is the desire to switch from the “bottom-up” activity of applying tags to items to manually asserting some order on the resulting list of tags. Alphabetizing is one solution, but often order is a significant, editorially determined operation. For instance, consider a list of questions as tags. One solution has been to prefix question tags with numbers, but this is clumsy and breaks the self-standing nature of a tag. The numbering truly belongs to a different entity, namely an editorial ordering of questions in a particular context (e.g. LGM: the order of the questions mirrors the order in which they were asked, but the same question might appear in another context in a different order).

LGM’s aawiki suggests part of an answer, as static documents were used in the “spider” process to serve as a starting point for an automatic query. It would be interesting to think about how a document could be partially automatically constructed, then manually edited to change order, exclude some elements, etc., and then used to “publish” a result. A hybrid solution of semi-automatic retrieval, with the ability to manually override via an edit, would be most interesting.

Markup to Add & Embed Resources

{{ http://some.resource }}

Syntax for transcluding resources, including fragment-style indexing (by time, by image portion, and temporally in terms of versions). Handling transclusion is key to avoiding potential “double-indexing” problems.
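As a rough illustration of fragment-style indexing, the sketch below reads a time range and an image region out of a URL fragment. It assumes (this page does not say so) that the fragment syntax resembles W3C Media Fragments (`#t=start,end` and `#xywh=x,y,w,h`); the function name `parse_fragment` and the example.org URL are hypothetical. Only plain seconds and pixel values are handled, and version-based indexing is left out.

```python
from urllib.parse import urldefrag, parse_qs


def parse_fragment(url):
    """Split a URL into its base and any time / region fragment it carries.

    Assumption: fragments look like W3C Media Fragments,
    e.g. #t=10,20 or #xywh=0,0,320,240 (plain seconds / pixels only).
    """
    base, fragment = urldefrag(url)
    params = parse_qs(fragment)
    result = {"url": base}
    if "t" in params:
        start, _, end = params["t"][0].partition(",")
        result["start"] = float(start) if start else 0.0
        result["end"] = float(end) if end else None
    if "xywh" in params:
        result["region"] = tuple(int(v) for v in params["xywh"][0].split(","))
    return result


clip = parse_fragment("http://example.org/video.ogv#t=10,20")
```

Keeping the base URL separate from the fragment is one way to index a resource once while still pointing at parts of it.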


Markup with Embedding & Plugins to extract

{{ http://somehtmlpage | cut css=#foo }}
{{ http://someimage | crop top=10 }}
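The two examples above could be read as a resource followed by a pipeline of filters. The sketch below is one possible parser for that reading, not the AAW implementation: the names `find_embeds` and `parse_embed` are hypothetical, and the grammar (`|` separates filters, spaces separate `key=value` arguments) is an assumption, so argument values containing spaces or `|` are not supported.

```python
import re

# Assumed grammar: {{ resource | filter key=value ... | filter ... }}
EMBED_RE = re.compile(r"\{\{(.*?)\}\}", re.DOTALL)


def parse_embed(body):
    """Turn the inside of one {{ ... }} tag into (resource, filters)."""
    parts = [p.strip() for p in body.split("|")]
    resource = parts[0]
    filters = []
    for part in parts[1:]:
        tokens = part.split()
        if not tokens:
            continue
        name, args = tokens[0], {}
        for token in tokens[1:]:
            key, _, value = token.partition("=")
            args[key] = value
        filters.append((name, args))
    return resource, filters


def find_embeds(text):
    """Return every embed tag found in a piece of wiki text, in order."""
    return [parse_embed(m.group(1)) for m in EMBED_RE.finditer(text)]


embeds = find_embeds("{{ http://somehtmlpage | cut css=#foo }}")
```

Each `(name, args)` pair could then be dispatched to a plugin of that name, which fits the pipeline approach listed in the overview.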

Necessary Markup examples:

  • Reference to a timed position of another resource
  • Image with explicit size


An annotation is any section of a page (typically/always a div?) with an about attribute. Annotations can be included in another document via a css selector filter on the source document. Currently annotations exist primarily inside the database. This has implications in terms of exportability, sharability, distributedness. Making annotations part of HTML pages, and interpreting them purely in terms of their presence in the page, allows annotations of many kinds to exist and makes the resulting pages independent. (Question: Is the structured text stored in the document, or can it be consistently "reversed" from the HTML? pandoc? or something simpler perhaps?) In this way exporting annotations between systems is simply a case of adding / indexing the HTML page.
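To make the "annotation = div with an about attribute" idea concrete, here is a minimal sketch that pulls such divs out of an HTML page using only the Python standard library. The class and function names are hypothetical, the sample URL is invented, and a real implementation would more likely apply a css selector such as `div[about]` through an HTML library.

```python
from html.parser import HTMLParser


class AnnotationCollector(HTMLParser):
    """Collect (about, text) pairs for every <div about="..."> in a page."""

    def __init__(self):
        super().__init__()
        self.annotations = []  # finished (about, text) pairs
        self._open = []        # (about, depth, text chunks) for open annotation divs
        self._depth = 0        # current nesting depth of div elements

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        self._depth += 1
        about = dict(attrs).get("about")
        if about is not None:
            self._open.append((about, self._depth, []))

    def handle_endtag(self, tag):
        if tag != "div":
            return
        # Close the innermost annotation if this end tag matches its depth.
        if self._open and self._open[-1][1] == self._depth:
            about, _, chunks = self._open.pop()
            text = " ".join(" ".join(chunks).split())
            self.annotations.append((about, text))
        self._depth -= 1

    def handle_data(self, data):
        # Text belongs to every annotation div that is currently open.
        for _, _, chunks in self._open:
            chunks.append(data)


def extract_annotations(html):
    collector = AnnotationCollector()
    collector.feed(html)
    collector.close()
    return collector.annotations


page = '<div about="http://example.org/clip.ogv"><h1>Notes</h1><p>Hello world</p></div><div>plain</div>'
found = extract_annotations(page)
```

Because nothing here touches a database, indexing a page from another install would amount to running the same extraction over its HTML.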


Annotation language can be set with some markup (?). This translates into rdfa xml:lang attribute on the annotation div.

Time-based retrieval of annotations

The API method to retrieve a list of annotations allows start and end times.
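A minimal sketch of what such a time filter might do, independent of any web framework. The `Annotation` fields, the function name `annotations_between`, and the boundary rule (an annotation is kept if it ends at or after `start` and begins before `end`) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Annotation:
    about: str            # URL of the annotated resource
    start: float          # seconds from the beginning of the resource
    end: Optional[float]  # None for an annotation marking a single instant
    body: str


def annotations_between(annotations, start=None, end=None):
    """Return annotations overlapping the requested time window, by start time."""
    selected = []
    for a in annotations:
        a_end = a.end if a.end is not None else a.start
        if start is not None and a_end < start:
            continue  # finished before the window opens
        if end is not None and a.start >= end:
            continue  # begins after the window closes
        selected.append(a)
    return sorted(selected, key=lambda a: a.start)


notes = [
    Annotation("http://example.org/clip.ogv", 0.0, 10.0, "intro"),
    Annotation("http://example.org/clip.ogv", 10.0, 20.0, "question"),
    Annotation("http://example.org/clip.ogv", 30.0, None, "marker"),
]
window = annotations_between(notes, start=12, end=30)
```

Leaving both bounds optional lets the same call serve "everything", "from here on", and "up to here" requests.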

In-place editing of Annotation

Or is this solved by a general purpose HTML transclusion (ie that parts of pages can be embedded).

An annotation is a div with an about attribute that names its subject. An annotation exists in a particular HTML page (that has a URL). The URL of an annotation would be something like:


Annotation (previously section) is a first-class object (with a URL), embeddable in any page with markup.

Markup to embed for instance:


“Direct editing” of remote sections via an “automatic playlist” allows a very useful kind of search and replace to rename tags in context.

(How to avoid confusion around duplicate sections of an “automatic playlist”?)

The ability to manually edit an automatic playlist, for instance via search and replace of a tag name, would be very useful.

Simultaneous editing

  • Message stream of activity in the system.
  • Locking and/or warning on editing the same annotation.

Tag inheritance

In a playlist, a tag made in section zero (the first untimed annotation of the playlist) should be carried to subsequent annotations.

In tagging for the QNA projects (LGM, TransLearning, and now Seed Sovereignty), it was very clear that the need to redundantly tag a clip section with the project and speaker is unnatural. A much better solution would be to apply the tags to the “preamble” section with the implication that they apply to the subsequent sections.
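One way the inheritance rule might be resolved at read time, sketched under assumptions: a playlist is a list of sections, each a dict with a `start` (None for the untimed preamble) and a list of `tags`. The function name `effective_tags` and the tag values are hypothetical, and the interaction with headers is not addressed.

```python
def effective_tags(sections):
    """Return, per section, its own tags plus those inherited from section zero.

    Inheritance only applies when the first section is untimed (start is None).
    """
    inherited = []
    if sections and sections[0].get("start") is None:
        inherited = list(sections[0]["tags"])
    resolved = []
    for section in sections:
        tags = list(inherited)
        # Add the section's own tags, skipping any already inherited.
        tags += [t for t in section["tags"] if t not in tags]
        resolved.append(tags)
    return resolved


playlist = [
    {"start": None, "tags": ["LGM", "speaker"]},
    {"start": 0.0, "tags": ["question 1"]},
    {"start": 42.5, "tags": ["LGM", "question 2"]},
]
tags_per_section = effective_tags(playlist)
```

Computing inherited tags on read, rather than copying them into each section, keeps the preamble as the single place to edit them.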

?? How will this work exactly, and how in relation to Headers as Semantic Markup

Headers as semantic markup / Annotation types

Headers should be translated into rdfa so that annotations can be filtered based on headers.

Transcription, Notes, Clips, etc. can be differentiated.

A significant discovery in testing the system was the strength of making the (structured text) markup the means of accomplishing different tasks. For instance, to differentiate the different annotations, the obvious solution was to start the annotation with a meaningful header.
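The header-as-type convention could be sketched as below: the first non-blank line of an annotation's source decides its type. Both header syntaxes (textile `h2. Title` and mediawiki `== Title ==`) are assumed from the "structured text" mentioned in the overview; the function names are hypothetical and the RDFa output step is not shown.

```python
import re

TEXTILE_HEADER = re.compile(r"^h[1-6]\.\s+(.+?)\s*$")
MEDIAWIKI_HEADER = re.compile(r"^(=+)\s*(.+?)\s*\1\s*$")


def annotation_type(source):
    """Return the lowercased leading header of an annotation, or None."""
    for line in source.splitlines():
        if not line.strip():
            continue  # skip leading blank lines
        match = TEXTILE_HEADER.match(line)
        if match:
            return match.group(1).lower()
        match = MEDIAWIKI_HEADER.match(line)
        if match:
            return match.group(2).lower()
        return None  # first real line is not a header
    return None


def filter_by_type(sources, wanted):
    """Keep only annotation sources whose leading header matches `wanted`."""
    return [s for s in sources if annotation_type(s) == wanted.lower()]


sources = ["h2. Transcription\n\nHello", "== Notes ==\nSome notes", "untyped text"]
transcriptions = filter_by_type(sources, "Transcription")
```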


  • Synonyms


  • How might analysis of an audio track result in visualisations, or other kinds of "data tracks"?
  • How might a playlist be used to create a new resource based on a "hard" edit of materials (and how might matching annotations automatically be imported)?


  • How might a splitscreen be implemented?
  • How could fade-in/out effects be implemented to smooth playlist transitions?
  • How can the "driving element" on a page be clearly visualized?
  • How to make the current timecode visible at all times when editing?

(Timecode widget with buttons for the controls, back / forward, time delta adjustment)

  • How can graceful "side stepping" between playlists be implemented?
  • How might a temporal stylesheet be implemented?
  • How to move away from a database-centric to a document-centric design?
  • How to add an entire folder of resources (apache directory listing)?
  • How to deal with “authorization” for multiple users?
  • How to store the history of changes to annotations (like a real wiki)?
  • How to support asynchronous actions (like transcoding, downloading, etc)?

Radical Pipeline Design

Maximize the "plugin"-ability of the system. See API.

Fragment syntax for cutting-up resources[6]

Umbrella URLs

How can links be made between different URLs for the same resource?

Example: I download Sandro Hawke’s “Linked Data Presentation.pdf” and want to refer to page 20. What’s the URL? Being a good W3C guy, the presentation lists a URL plus a date and the location of the (original) talk: June 8 2010, Cambridge Semantic Web Gathering

This URL is actually to a directory listing of files in various formats: ...

Often a URL refers to an abstract event or recording, and may have multiple actual "versions" or kinds of information available. API governed resources, like YouTube clips, are by nature handled this way. How can AA work with other kinds of (API-less) models (example of ...)? Also, what's the MOST SIMPLE way to distribute material (Apache directory listings + named files)? And how to support a sliding scale (from a directory of stuff to gradually more structure) without imposing a strict API?


Document as primary object

Search (list) as markup

A search query, instead of being a “system operation” to be performed in some “outside” context, is a kind of document element. The trick is to allow the results of a search query to be manually edited while still preserving knowledge of the original query. In time, differences between the “live” result of the query and the manually edited list can be consulted and managed editorially. This addresses the “question of order”, as automatic lists, such as tag names, can then be manually ordered, or items even excluded.
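A small sketch of how a semi-automatic list might keep its query next to its manual edits, so that differences from the live result can be reviewed later. The `SearchList` class, its fields, and the query string are hypothetical; running the query itself is out of scope.

```python
from dataclasses import dataclass, field


@dataclass
class SearchList:
    query: str                                    # the original query, kept with the list
    items: list = field(default_factory=list)     # results in their manually chosen order
    excluded: set = field(default_factory=set)    # results the editor removed on purpose

    def pending(self, live_results):
        """Compare the edited list with a fresh run of the query.

        Returns (new, stale): results not yet seen by the editor, and
        listed items that the live query no longer returns.
        """
        known = set(self.items) | self.excluded
        new = [r for r in live_results if r not in known]
        live = set(live_results)
        stale = [i for i in self.items if i not in live]
        return new, stale


questions = SearchList("tag:question", items=["q3", "q1"], excluded={"q2"})
new, stale = questions.pending(["q1", "q2", "q4"])
```

Neither list is changed automatically; `new` and `stale` are only suggestions for the editor to accept or ignore.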

Feeding back on the idea of “document structure as semantic markup” -- the inclusion of an item in a semi-automatic search list.

Examples: Semantic MediaWiki’s semantic results object (an example of a list that can be represented in various forms: ul, li, timeline, etc., though not manually edited).

  • Provision for and proper UI for long asynchronous operations
  • Localization
  • Wiki-style bootstrapping of resources (e.g. help texts)
  • Plugins structure
  • Resource spiders

URLs & Permalinks & Content-based access & ...

In general, design a system that supports subtleties such as URL for “content” versus URL for a file.


Goal is to design a system that: (1) Allows “same as” relationships to be (automatically) established between resources that allows annotations to be properly shared. (2) Allows for a differentiation between “same file” and “same content” to, for instance, allow varying versions of the same recorded material to be properly dealt with (ie some annotations are shared across content, and some only across file -- ie bitrate)

A resource is added to an AA install, say:

In the AA editor, annotations are made, which reference the above URL. (Here already, the question is whether some sort of “permalink” is appropriate to use. This could simply be the addition of a timestamp indicating version of file.)

“cached” and annotated on a local AA install. The annotations should be linkable to the original URL without resorting to current URL -> filesystem mapping, but rather more robustly via a special annotation (listing original URL and timestamp of download -- ie the “download receipt”). Currently -- all annotations are made to the “live URL”, but as soon as this URL can be traced to a content-index (SHA1)
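A “download receipt” of the kind described could be as small as the sketch below. The field names and the functions `download_receipt` and `same_file` are hypothetical. Note that a SHA1 match only establishes “same file”: two encodings of the same recording at different bitrates hash differently, so “same content” would still need an explicit “same as” link.

```python
import hashlib
from datetime import datetime, timezone


def download_receipt(original_url, data, downloaded_at=None):
    """Describe one download: where it came from, when, and its content hash."""
    when = downloaded_at or datetime.now(timezone.utc)
    return {
        "url": original_url,
        "downloaded": when.isoformat(),
        "sha1": hashlib.sha1(data).hexdigest(),
    }


def same_file(receipt_a, receipt_b):
    """True when two receipts describe byte-identical files, whatever their URLs."""
    return receipt_a["sha1"] == receipt_b["sha1"]


stamp = datetime(2010, 6, 8, tzinfo=timezone.utc)
original = download_receipt("http://example.org/talk.ogv", b"video bytes", stamp)
mirror = download_receipt("http://mirror.example.org/talk.ogv", b"video bytes", stamp)
```

Annotations could then point at the receipt's hash rather than at a URL-to-filesystem mapping, so they would survive the file being moved or mirrored.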


Output modules

How can projects like QNA and Oral Site best be added "to the side" of an AA install to extend it?
