All that SPARQLs is not gold
From ActiveArchives
Morning
Introductions
Introduction to the larger context of the Active Archives project
Round of introductions (Relations to Linked Data, acknowledge the different positions vis à vis Linked Data ...)
Screening: Web 3.0 (Kate Ray's documentary on the Semantic Web)
- The relevance of Linked Open Data to Cultural Institutions
- Linked Data: The Good Parts
- Ted Nelson's "A File Structure for the Complex, the Changing and the Indeterminate"
Why we (AA team) think Linked Data is interesting.
- Decentralized data sharing
- Searching your own way
- "Private" writing spaces
Issues:
- Historical link with "hard" AI / logical / decision making utility (just the facts)
- Centralizing (Implication that centralized crawlers / knowledge stores are essential)
- Complete disregard for the work of writing / tagging (cf. Bowker & Star, "Sorting Things Out": the invisible work of the database)
Morning Exercise
Project Gutenberg offers their full catalog as RDF, either as a single text file (over 200MB!), or as one file per book. We can start by downloading a single book's RDF file.
Alice in Wonderland on Project Gutenberg
Books by anonymous authors: http://www.gutenberg.org/ebooks/search.html/?default_prefix=author_id&sort_order=title&query=216
Working in pairs with "rapper" (librdf/Redland/Linux)
- Download the RDF of Book "x"
less pg11.rdf
- Use rapper to convert the xml format into turtle
rapper --output turtle pg11.rdf
and to save the output to a file
rapper --output turtle pg11.rdf > pg11.ttl
Graphviz is a visualization program for graph data. Rapper can output a "dot" file which graphviz can then use to turn the data into a diagram.
- Use rapper to output graphviz dot format, and then use graphviz to draw an SVG file
rapper --output dot pg12345.rdf > pg12345.dot
fdp -Tsvg pg12345.dot > pg12345.svg
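To make the idea of a "dot" file less opaque, here is a minimal Python sketch of how a list of triples can be written out as a Graphviz graph: each subject and object becomes a node, and each predicate becomes a labelled arrow between them. This is an illustration only, not rapper's actual output. The sample triples are hand-written for this example (they are not copied from a Gutenberg RDF file), and the function names are made up.

```python
# Hypothetical, hand-written sample triples (subject, predicate, object).
# They only imitate the kind of data found in a book's RDF file.
TRIPLES = [
    ("http://www.gutenberg.org/ebooks/11",
     "http://purl.org/dc/terms/title",
     "Alice's Adventures in Wonderland"),
    ("http://www.gutenberg.org/ebooks/11",
     "http://purl.org/dc/terms/language",
     "en"),
]


def dot_escape(text):
    """Escape backslashes and double quotes for a quoted DOT string."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def triples_to_dot(triples):
    """Return DOT source with one labelled edge per triple."""
    lines = ["digraph G {"]
    for s, p, o in triples:
        lines.append('  "%s" -> "%s" [label="%s"];'
                     % (dot_escape(s), dot_escape(o), dot_escape(p)))
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Redirect this into a .dot file and hand it to fdp as above.
    print(triples_to_dot(TRIPLES))
```

Because every triple adds an edge, even a single book's record quickly turns into a dense drawing.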
Outcome: The SVG reveals some things, but also obscures others through its density of linking and its completeness.
Next step: Filter & Combine with more context
Afternoon
SPARQL
SPARQL is a query language for Linked Data. It supports several query forms, of which SELECT and CONSTRUCT are the most useful. The key difference is that SELECT returns tabular results, whereas CONSTRUCT returns a new RDF graph. SELECT therefore comes closest to the SQL language of relational databases (which are themselves table-oriented), and SELECT queries are often the most direct means of requesting information from an RDF file or store (database).
How could we filter and give more context to the single RDF source (using Linked Data)?
Save this code as, say, "alice_01.rq":
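The difference between the two forms can be sketched in a few lines of Python. This is a toy matcher over an in-memory set of triples, not a SPARQL engine: it handles a single triple pattern, and names beginning with "?" stand for variables. The sample data, the example.org names and the function names are all invented for the illustration.

```python
BOOK = "http://www.gutenberg.org/ebooks/11"
DCT = "http://purl.org/dc/terms/"

# Hypothetical sample data, hand-written for this sketch.
TRIPLES = {
    (BOOK, DCT + "title", "Alice's Adventures in Wonderland"),
    (BOOK, DCT + "language", "en"),
    ("http://example.org/other-book", DCT + "title", "Another Book"),
}


def is_var(term):
    return term.startswith("?")


def match(pattern, triples):
    """Yield one dict of variable bindings per triple that fits the pattern."""
    for triple in sorted(triples):
        bindings = {}
        for want, have in zip(pattern, triple):
            if is_var(want):
                # A variable used twice must take the same value both times.
                if bindings.setdefault(want, have) != have:
                    break
            elif want != have:
                break
        else:
            yield bindings


def select(variables, pattern, triples):
    """SELECT-like: a table with one row per solution."""
    return [tuple(b[v] for v in variables) for b in match(pattern, triples)]


def construct(template, pattern, triples):
    """CONSTRUCT-like: new triples made by filling in a template."""
    return {tuple(b.get(t, t) for t in template)
            for b in match(pattern, triples)}


if __name__ == "__main__":
    pattern = (BOOK, "?p", "?o")
    for row in select(["?p", "?o"], pattern, TRIPLES):
        print(row)
    print(construct((BOOK, "http://example.org/hasProperty", "?p"),
                    pattern, TRIPLES))
```

The select call gives back rows that could be printed as a table, while the construct call gives back triples, that is, another small graph that could itself be queried or drawn.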
SELECT ?p ?o FROM <pg11.rdf> WHERE { <http://www.gutenberg.org/ebooks/11> ?p ?o . }
To perform the query, use roqet!
roqet alice_01.rq
Roqet supports a number of different result formats ("simple" is the default), but a few are more useful for reading in the terminal: table, tsv and csv.
roqet alice_01.rq -r table
Output of "html" (an HTML table) is also possible, useful for viewing in a browser.
roqet alice_01.rq -r html > alice_01.html
firefox alice_01.html
NB: Because the results of a SELECT query are tabular, using a format like "turtle" doesn't make sense (it actually makes the results harder to read). Instead, a tabular format like tab-separated-values (tsv) is actually the most straightforward.
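One practical advantage of tsv is that the results are easy to pick up again in a script. The Python sketch below reads tab-separated results into a list of rows using only the standard library. The sample text is hypothetical: it follows the general shape of SPARQL TSV results (a header row of variable names, then one line per solution), but roqet's exact output may differ in detail, so treat the header names and quoting as assumptions.

```python
import csv
import io

# Hypothetical sample of TSV results, hand-written for this sketch.
SAMPLE_TSV = (
    "?p\t?o\n"
    "<http://purl.org/dc/terms/title>\t\"Alice's Adventures in Wonderland\"\n"
    "<http://purl.org/dc/terms/language>\t\"en\"\n"
)


def read_tsv(text):
    """Return one dict per result row, keyed by the header names."""
    # QUOTE_NONE keeps the quotes around literals exactly as written.
    reader = csv.reader(io.StringIO(text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    header = next(reader)
    return [dict(zip(header, row)) for row in reader]


if __name__ == "__main__":
    for row in read_tsv(SAMPLE_TSV):
        print(row["?p"], "=>", row["?o"])
```

To try it on real results, save them first with something like "roqet alice_01.rq -r tsv > alice_01.tsv" and pass the contents of that file to read_tsv in place of the sample.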
Exercise: SPARQL Stories
- Mad (G)libs: http://www.madglibs.com/
- Vladimir Propp, the Russian philologist whose seminal "Morphology of the Folktale" was an early example of studying and codifying recurring narrative structures, in this case those of traditional Russian folktales.
- Scott Malec's work on PFTML, a Proppian Fairy Tale Markup Language