Visualizing RDF

From ActiveArchives

RDF is quite hard to visualize. While the abstract nature of the "triples for everything" approach is the key to RDF's flexibility and applicability to different contexts, it also makes it difficult to get a grip on what the resulting structures actually look like. To take an example, consider the data available from Freebase (the "Wikipedia" of linked data) on visual artist Eva Hesse: http://rdf.freebase.com/rdf/en.eva_hesse. When viewed in Firefox, one gets a syntax-colored, outline-style view of the XML with a collapsible hierarchy. Here is a small fragment:

EvaHesseRDFXML.png

As with many "machine-readable" formats, RDF introduces a lot of "packaging" that one needs to unwrap to get to the core data that one is actually interested in seeing or using. Luckily, in addition to its "native" RDF-XML format, there are a number of other textual representations for RDF. Dave Beckett, creator of the standard Redland library for working with RDF, has also produced a number of command-line tools to access the library. The rapper command, for instance, can be used to translate RDF data between a number of different formats. The default input format is RDF-XML and the default output format is called "ntriples".

Typing the command:

rapper http://rdf.freebase.com/rdf/en.eva_hesse

... displays the following output (here just 9 of the 139 total triples are shown):

...
<http://rdf.freebase.com/ns/m.02wnjvc> <http://rdf.freebase.com/ns/education.education.institution> <http://rdf.freebase.com/ns/en.yale_university> .
<http://rdf.freebase.com/ns/m.02h5_2w> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdf.freebase.com/ns/education.education> .
<http://rdf.freebase.com/ns/m.02h5_2w> <http://rdf.freebase.com/ns/education.education.end_date> "1953" .
<http://rdf.freebase.com/ns/m.02h5_2w> <http://rdf.freebase.com/ns/education.education.institution> <http://rdf.freebase.com/ns/en.pratt_institute> .
<http://rdf.freebase.com/ns/m.02h5_2w> <http://rdf.freebase.com/ns/education.education.start_date> "1952" .
<http://rdf.freebase.com/ns/m.02h5_2w> <http://rdf.freebase.com/ns/education.education.student> <http://rdf.freebase.com/ns/en.eva_hesse> .
_:genid1 <http://rdf.freebase.com/ns/type.key.namespace> <http://rdf.freebase.com/ns/wikipedia.es_id> .
_:genid1 <http://rdf.freebase.com/ns/type.value.value> "1576301" .
<http://rdf.freebase.com/ns/en.eva_hesse> <http://rdf.freebase.com/ns/type.object.key> _:genid1 .
...

Ntriples is very useful as a format for understanding the essential structure of RDF. An RDF source is essentially a set of "triples" in the form:

SUBJECT PREDICATE OBJECT .
SUBJECT PREDICATE OBJECT .
SUBJECT PREDICATE OBJECT .
...

The first element in a triple is called the subject (that which is being described), the second the predicate (the type of relationship being described), and the third the object (that which is referred to). The linguistic underpinnings of RDF are echoed in the terminology and in the use of a period to mark the end of each triple. Unlike a natural language text, however, the order of the lines is not considered important (just the order within each "sentence"). In addition, while RDF databases are sometimes described as databases of "facts", the actual semantic meaning or "truth" of what's being described by a triple is entirely up to the author(s).
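
To make the "set of sentences" idea concrete, here is a minimal Python sketch (not part of rapper or Redland) that splits ntriples lines into subject, predicate and object. The pattern is a deliberate simplification and an assumption of this example: it ignores escaped quotes, language tags and datatypes, and does not enforce which kind of term may appear in which position. The function name parse_ntriples is hypothetical.

```python
import re

# Simplified pattern for one ntriples line: three terms followed by a period.
# A term is a <URI>, a _:blank node, or a "literal".
TERM = r'(<[^>]*>|_:\w+|"[^"]*")'
TRIPLE = re.compile(rf'^\s*{TERM}\s+{TERM}\s+{TERM}\s*\.\s*$')


def parse_ntriples(text):
    """Return the set of (subject, predicate, object) tuples found in text."""
    triples = set()
    for line in text.splitlines():
        match = TRIPLE.match(line)
        if match:  # skip blank lines and anything the simple pattern misses
            triples.add(match.groups())
    return triples


sample = '''
<http://rdf.freebase.com/ns/m.02h5_2w> <http://rdf.freebase.com/ns/education.education.start_date> "1952" .
<http://rdf.freebase.com/ns/m.02h5_2w> <http://rdf.freebase.com/ns/education.education.end_date> "1953" .
'''

triples = parse_ntriples(sample)

# Reversing the order of the lines yields the same set of triples.
reversed_sample = "\n".join(reversed(sample.splitlines()))
same = parse_ntriples(reversed_sample) == triples
```

Storing the result in a set rather than a list mirrors the point made above: the order of the lines carries no meaning, only the order of the three terms within each line.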

Triples can also be thought of as qualified links, or "links with flavors". Building on the familiar basis of HTML links (where the source page containing a link would be the subject, and the link's reference (or "href") the object), RDF adds the ability to give each link a kind of tag or category (also in the form of a URL) to further specify the meaning or kind of relationship of the link. (In fact, in the first versions of HTML, link elements already contained a "rel" attribute hinting at how semantic links might be written.) In addition to URLs, textual data (called literals) can be used as objects to refer to information like names, dates or numeric values. In a nutshell, each component of a triple can be one of three possible things:

  • A URI (or URL), shown between less-than/greater-than symbols (e.g. <http://automatist.org/>),
  • A "literal" value, displayed in quotation marks (e.g. "Michael Murtaugh"),
  • A "blank" node (in the form _:blaHdiEBlah), which functions like a temporary URL, effectively allowing for grouping of triples without needing to actually come up with a URL.
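
The three kinds of term listed above can be told apart by their first characters in ntriples notation. The following Python sketch illustrates this; term_kind is a hypothetical helper written for this page, not a Redland function, and it makes no attempt to validate the term beyond its delimiters.

```python
def term_kind(term):
    """Classify a term written in ntriples notation."""
    if term.startswith("<") and term.endswith(">"):
        return "uri"
    if term.startswith("_:"):
        return "blank"
    if term.startswith('"'):
        return "literal"
    raise ValueError(f"unrecognized term: {term!r}")


# One triple from the rapper output shown earlier.
triple = (
    "<http://rdf.freebase.com/ns/en.eva_hesse>",
    "<http://rdf.freebase.com/ns/type.object.key>",
    "_:genid1",
)
kinds = [term_kind(t) for t in triple]
```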

While the flatness of ntriples makes the essential structure of RDF clear, it also obscures the repetitive and potentially hierarchical nature of the data as a whole. In ntriples, many nodes are listed repeatedly for each bit of information (making it hard to see what information belongs to what), and the order of the triples is not necessarily logical from the point of view of reading. While the RDF-XML structure (potentially) addresses a number of these issues, the "turtle" format (also created by Dave Beckett) does the same without (re)introducing the bloat of XML. The command:

rapper --output turtle http://rdf.freebase.com/rdf/en.eva_hesse

produces the output:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix fb: <http://rdf.freebase.com/ns/> .
...
 
<http://rdf.freebase.com/ns/en.eva_hesse>
    fb:people.deceased_person.cause_of_death <http://rdf.freebase.com/ns/en.brain_tumor> ;
    fb:people.deceased_person.date_of_death "1970-05-29" ;
    fb:people.deceased_person.place_of_death <http://rdf.freebase.com/ns/en.new_york> ;
    fb:people.person.date_of_birth "1936-01-11" ;
    fb:people.person.education <http://rdf.freebase.com/ns/m.02h5_2w>, <http://rdf.freebase.com/ns/m.02h5_33>, <http://rdf.freebase.com/ns/m.02h5_3c>, <http://rdf.freebase.com/ns/m.02wnjvc> ;
...
    a <http://rdf.freebase.com/ns/base.jewlib.original_owner>, <http://rdf.freebase.com/ns/base.jewlib.topic>, <http://rdf.freebase.com/ns/base.smarthistory.topic>, <http://rdf.freebase.com/ns/base.smarthistory.visual_artist>, <http://rdf.freebase.com/ns/base.yalebase.person>, <http://rdf.freebase.com/ns/base.yalebase.topic>, <http://rdf.freebase.com/ns/book.author>, <http://rdf.freebase.com/ns/common.topic>, <http://rdf.freebase.com/ns/influence.influence_node>, <http://rdf.freebase.com/ns/m.04l1354>, <http://rdf.freebase.com/ns/people.deceased_person>, <http://rdf.freebase.com/ns/people.person>, <http://rdf.freebase.com/ns/visual_art.visual_artist> ;
...
<http://rdf.freebase.com/ns/m.02h5_3c>
    fb:education.education.end_date "1959" ;
    fb:education.education.institution <http://rdf.freebase.com/ns/en.yale_school_of_art> ;
    fb:education.education.start_date "1957" ;
    fb:education.education.student <http://rdf.freebase.com/ns/en.eva_hesse> ;
    a <http://rdf.freebase.com/ns/education.education> .

Here information is grouped hierarchically so that triples with the same subject appear together as a list, subgrouped by predicate (type). Compound information is still spread out over multiple "nodes", as in the case of the "education" list, where each node clumps together a time period (start_date, end_date) and a particular educational institution.
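
The grouping described above can be sketched in a few lines of Python. This is only an illustration of the idea, assuming triples are held as plain tuples of strings; it is not how rapper's turtle serializer is implemented, and the names group_triples and render are hypothetical. The data is taken from the output shown above.

```python
from collections import defaultdict

FB = "http://rdf.freebase.com/ns/"


def group_triples(triples):
    """Group triples by subject, then by predicate, keeping lists of objects."""
    grouped = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        grouped[subject][predicate].append(obj)
    return grouped


def render(grouped):
    """Write the grouped triples in a turtle-like layout."""
    lines = []
    for subject, predicates in grouped.items():
        lines.append(subject)
        items = list(predicates.items())
        for i, (predicate, objects) in enumerate(items):
            # ";" continues the same subject, "." closes it
            end = " ." if i == len(items) - 1 else " ;"
            lines.append(f"    {predicate} {', '.join(objects)}{end}")
    return "\n".join(lines)


triples = [
    (f"<{FB}en.eva_hesse>", f"<{FB}people.person.education>", f"<{FB}m.02h5_2w>"),
    (f"<{FB}m.02h5_3c>", f"<{FB}education.education.start_date>", '"1957"'),
    (f"<{FB}en.eva_hesse>", f"<{FB}people.person.education>", f"<{FB}m.02h5_3c>"),
    (f"<{FB}m.02h5_3c>", f"<{FB}education.education.end_date>", '"1959"'),
]

grouped = group_triples(triples)
print(render(grouped))
```

Note that the two education nodes end up as a comma-separated list under Eva Hesse, while the dates remain attached to the intermediate node m.02h5_3c: the grouping tidies the listing but does not flatten the graph.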

Rapper also supports output as a "dot" file, which can be rendered as an image using the tools provided by graphviz:

rapper --output dot http://rdf.freebase.com/rdf/en.eva_hesse > evahesse.dot
dot -Tsvg evahesse.dot > evahesse.svg

EvaHesse.dot.crop2.png

EvaHesse.dot.crop.png

View full SVG here

This rather "spacy" visualization of the file is indeed graphical; with some effort one can read the fact that in 1957 Eva Hesse both finished her studies at Cooper Union and began her studies at the Yale School of Art. What is also made visible is that a literal, algorithmic boxes-and-lines rendering of the data is not the most readable visual means of grouping and connecting pieces of information, certainly when many relationships are being depicted and no sensitivity is given (or available, in the case of an algorithmic layout such as the one graphviz performs) to the relative importance or meaning of the relationships.
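
To see why the result is so literal, it helps to look at how little a dot file needs to say. The Python sketch below turns triples into a dot graph with one edge per triple, labelled with the predicate. It is an illustration only: rapper's own dot output uses different node naming and styling, and the function names to_dot and quote, as well as the choice of boxes for literals, are assumptions of this example.

```python
def quote(text):
    """Wrap text in double quotes for dot, escaping embedded quotes."""
    return '"' + text.replace('"', '\\"') + '"'


def to_dot(triples):
    """Return a dot digraph with one labelled edge per triple."""
    lines = ["digraph rdf {", "    rankdir=LR;"]
    for subject, predicate, obj in triples:
        # Draw literals as boxes and everything else as ellipses.
        shape = "box" if obj.startswith('"') else "ellipse"
        lines.append(f"    {quote(obj)} [shape={shape}];")
        lines.append(f"    {quote(subject)} -> {quote(obj)} [label={quote(predicate)}];")
    lines.append("}")
    return "\n".join(lines)


FB = "http://rdf.freebase.com/ns/"
triples = [
    (f"<{FB}m.02h5_3c>", f"<{FB}education.education.start_date>", '"1957"'),
    (f"<{FB}m.02h5_3c>", f"<{FB}education.education.student>", f"<{FB}en.eva_hesse>"),
]

dot_source = to_dot(triples)
print(dot_source)
```

Saving the printed text to a file and passing it to the dot command shown earlier should produce a comparable boxes-and-lines picture. Every triple is treated identically, which is exactly the lack of sensitivity to importance or meaning noted above.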

For comparison, we can go back to the "human-readable" Freebase page on Eva Hesse, where one finds summaries of information in a more familiar tabular "info box" style:

EvaHesseFreebase01.png

EvaHesseFreebase02.png

The challenge is how to produce a means of visualizing and navigating RDF-encoded data in a way that accepts the generality of a graph structure while still allowing for familiar, concise tabular or hierarchical views that provide a legible overview of that which is being described.

Filtering with SPARQL

Visualizing RDF, Part 2

Resources

Rapper, and other RDF tools

By Dave Beckett, author of the "redland" RDF library (librdf), the Turtle format, and various command-line tools for working with RDF.

Installing rapper on debian:

apt-get install redland-utils

Graphviz

http://www.graphviz.org/

Installation:

apt-get install graphviz

Read the man page:

man graphviz

RDFCookbookTutorial
