Faceted Browsing with RDF
From ActiveArchives
This page is currently being worked on.
What is faceted browsing?
- MIT Simile Project, Facet Browser widget
- My own first take at "faceted browsing", Jerome Wiesner "hyper-portrait" made in 1996.
Implementing Faceted Browsing with SPARQL
PREFIX dc:<http://purl.org/dc/elements/1.1/>
SELECT ?doc ?title
WHERE {
    ?doc dc:title ?title .
    {
        {?doc dc:language "Dutch"@nl .}
    }
    {
        {?doc dc:creator <http://www.sarma.be/tags/Jeroen_Peeters> .}
        UNION
        {?doc dc:creator <http://www.sarma.be/tags/Alexander_Baervoets> .}
    }
}
ORDER BY ?title
Implementing Faceted Browsing in Python & SPARQL
This code uses the rdflib library (version 3.1), and a homegrown SparqlQuery convenience class.
The process is basically:
- Get a list of all the relationships for the current set of items.
- For each relationship, produce a list of all possible values and their associated counts.
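The two steps can be illustrated without a SPARQL endpoint. The following is a minimal sketch over an in-memory list of (subject, predicate, object) tuples. The sample data and the function names relations_of and value_counts are made up for illustration and are not part of the code on this page.

```python
from collections import Counter

# Hypothetical sample data: (subject, predicate, object) triples.
TRIPLES = [
    ("doc1", "dc:title", "A"),
    ("doc1", "dc:language", "nl"),
    ("doc1", "dc:creator", "Jeroen_Peeters"),
    ("doc2", "dc:title", "B"),
    ("doc2", "dc:language", "nl"),
    ("doc2", "dc:creator", "Alexander_Baervoets"),
    ("doc3", "dc:title", "C"),
    ("doc3", "dc:language", "en"),
    ("doc3", "dc:creator", "Jeroen_Peeters"),
]

def relations_of(triples, items):
    """Step 1: all distinct predicates used by the current set of items."""
    return sorted(set(p for s, p, o in triples if s in items))

def value_counts(triples, items, relation):
    """Step 2: for one relationship, count the matching triples per value.

    Assumes the triple list holds no duplicates, so each (item, value)
    pair is counted once.
    """
    return Counter(o for s, p, o in triples if s in items and p == relation)

# The current set of items: everything that has a title.
items = set(s for s, p, o in TRIPLES if p == "dc:title")
for rel in relations_of(TRIPLES, items):
    print(rel, dict(value_counts(TRIPLES, items, rel)))
```

Narrowing items (for example to the documents in Dutch) and running the same two steps again gives the relationships and counts for the filtered set.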
In this case the "context" (the current set of items) is defined by two things:
- Look for only "items" that have a title (using the dc:title predicate).
- Apply the current "facet" to further filter the current set, where the "facet" is a dictionary mapping relationship URLs to lists of values (rdflib entities).
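As a rough sketch of how such a facet dictionary maps onto the SPARQL query shown earlier: each relationship becomes one group in the WHERE clause (so relationships are combined with AND), and the values within a relationship are joined with UNION (so they are combined with OR). The function name facet_patterns is hypothetical, and the values here are already-serialized N3 strings standing in for what val.n3() returns on an rdflib entity.

```python
def facet_patterns(facet):
    """Build one SPARQL group per relationship; its values are joined with UNION."""
    groups = []
    for rel in sorted(facet):
        values = facet[rel]
        if not values:
            # A relationship with no selected values adds no constraint.
            continue
        alternatives = ["{ ?doc <%s> %s . }" % (rel, val) for val in values]
        groups.append("{ " + " UNION ".join(alternatives) + " }")
    return "\n".join(groups)

DC = "http://purl.org/dc/elements/1.1/"

# Assumed example facet: Dutch documents by either of two creators.
facet = {
    DC + "language": ['"Dutch"@nl'],
    DC + "creator": [
        "<http://www.sarma.be/tags/Jeroen_Peeters>",
        "<http://www.sarma.be/tags/Alexander_Baervoets>",
    ],
}

print(facet_patterns(facet))
```

The relationships are sorted only to make the output order predictable; the order of the groups does not change which documents match.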
def getRelations (http, baseurl, facet=None, norels=None):
    """
    Returns the list of all (unique) relations (urirefs) to things that have titles
    norels: optional list of relationship URLs to exclude
    """
    q = SparqlQuery()
    q.prefix("dc:<http://purl.org/dc/elements/1.1/>")
    q.select("?rel", distinct=True)
    q.where("?doc dc:title ?title .")
    q.where("?doc ?rel ?obj .")
    if facet:
        for rel, values in facet.items():
            for val in values:
                q.where_clause("?doc <%s> %s ." % (rel, val.n3()))
            q.where_clause_end()
    if norels:
        for relurl in norels:
            q.filter("(?rel != <%s>) ." % relurl)
    q = q.render()
    # print q
    results = sparql_query_list(http, baseurl, q)
    relations = [b['rel'] for b in results]
    return relations
...
def getRelationValueCounts (http, baseurl, relation, facet=None):
    """
    Returns a list of (value, count) tuples, sorted by value, where count is
    the number of documents (things with titles) that have that value.
    """
    q = SparqlQuery()
    q.prefix("dc:<http://purl.org/dc/elements/1.1/>")
    q.select("?obj")
    q.where("?doc dc:title ?title .")
    if facet:
        for rel, values in facet.items():
            # don't filter a category with values in the same category
            # e.g. only filter a count list using selections in the *other* filters
            if rel == relation:
                continue
            for val in values:
                q.where_clause("?doc <%s> %s ." % (rel, val.n3()))
            q.where_clause_end()
    q.where("?doc <%s> ?obj ." % relation)
    q.orderby("?obj")
    results = sparql_query_list(http, baseurl, q.render())
    results = [b['obj'] for b in results]
    # results are sorted by ?obj, so equal values are adjacent: count each run
    ret = []
    curvalue = None
    curcount = 0
    for value in results:
        if curvalue != value:
            if curvalue is not None and curcount:
                ret.append((curvalue, curcount))
            curvalue = value
            curcount = 0
        curcount += 1
    if curvalue is not None and curcount:
        ret.append((curvalue, curcount))
    return ret
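The counting loop at the end of getRelationValueCounts depends on the results being ordered by ?obj, so that equal values sit next to each other. As an illustration only, the same run counting can be sketched with itertools.groupby from the standard library. The name count_runs is hypothetical, and plain strings stand in for the rdflib values.

```python
from itertools import groupby

def count_runs(sorted_values):
    """Collapse a sorted list into (value, count) tuples, one per run of equal values."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(sorted_values)]

print(count_runs(["en", "nl", "nl", "nl"]))  # prints [('en', 1), ('nl', 3)]
```

Like the original loop, this only gives per-value totals if the input is sorted; on unsorted input a value that appears in two separate runs would be listed twice.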