Burningbird: The RDF Query-o-matic

October 01, 2002

The RDF Query-o-matic

I created a small application, the RDF Query-o-Matic, using Java and HP's Jena (a Java RDF API), and hosted it on my Tomcat server. The Query-o-Matic accepts the name of an RDF file (any valid RDF file), and an RDFQL (RDF Query Language) query, and will print out a test value found as a result of that query. I created the tool as a way of testing queries without having to go back into my code as I work.

You don't have to be a techie, or a programmer, or familiar with RDF or even XML to work with RDFQL, as the Query-o-Matic will demonstrate. All you need is a bit of logic, and a familiarity with old nursery rhymes.

Taking it one step at a time...

RDF is a meta-model of information, similar to the relational data model. RDF/XML is a way of serializing the model information, as one would use a relational database to store relational data. Carrying the analogy to its natural conclusion, as SQL is to relational data, RDFQL is to RDF data.

RDFQL is actually not that complex. The key is remembering that every 'statement' in an RDF file is made up of a subject, a predicate (property), and a value (the object). If you view Mark Pilgrim's FOAF file in graphical format, using the RDF Validator, the predicate always appears on an arc -- the subject is to the left of the arc and the value of the predicate, the object, is to the right. Every RDF statement can be broken down into one of these <subject, predicate, object> triples.
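To make the triple idea concrete, here is a minimal sketch in Python rather than the Java the Query-o-Matic itself uses. The people and the example.org URIs are made up for illustration and are not taken from any real FOAF file; only the FOAF namespace comes from the queries in this post.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject, predicate (property), object (value)."""
    subject: str
    predicate: str
    obj: str


FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical statements, invented for this sketch.
statements = [
    Triple("http://example.org/people#alice", FOAF + "name", "Alice Example"),
    Triple("http://example.org/people#alice", FOAF + "knows",
           "http://example.org/people#bob"),
    Triple("http://example.org/people#bob", FOAF + "name", "Bob Example"),
]

# Print each statement the way the graph view draws it:
# subject on the left, predicate on the arc, object on the right.
for s in statements:
    print(s.subject, "--", s.predicate, "->", s.obj)
```

Notice that the object of the second statement is itself the subject of the third. That overlap is what the chained queries later on rely on.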

RDF queries are nothing more than patterns based on this triple. This might sound confusing, but not if you take the queries one step at a time.

For instance, if I want to access and print out all of the NAME elements in Mark Pilgrim's FOAF file, I would use a query like the following:

select ?name where (?subject, <http://xmlns.com/foaf/0.1/name>, ?name)

In this query, the SELECT clause ('select ?name') names the variable I'll read from the results; the rest of the query, the WHERE clause, holds the actual pattern. In this instance, I don't care what the subject is, so I'm using a placeholder, ?subject, that's basically ignored. It's followed by the predicate that forms the query, in this case NAME. Since all elements in RDF belong to a namespace, I'm preceding the element with its namespace, and enclosing the whole within angle brackets.

The angle brackets are used to distinguish an element from a literal value.

Following the predicate is another placeholder, this one for the name element's value (i.e. the actual names).

The whole is entered into the Query-o-matic as follows:

URL: http://www.diveintomark.org/public/foaf.rdf

query: select ?name where (?subject, <http://xmlns.com/foaf/0.1/name>, ?name)

value to print: name

View the result.

Let's say I want to refine the query -- I only want the value of 'name' for the subject f8dy. I would then need to modify the query to add the subject as well as the predicate:

URL: http://www.diveintomark.org/public/foaf.rdf

query: select ?name where (<http://www.diveintomark.org/public/foaf.rdf#f8dy>, <http://xmlns.com/foaf/0.1/name>, ?name)

value to print: name

This time only one value is returned (provided Mark's RDF file doesn't change): Mark Pilgrim.
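The two queries so far differ only in whether the subject is a placeholder or a fixed value. A rough sketch of how a single pattern could be matched, written in Python with made-up data, might look like the following. The `match` and `select` helpers are hypothetical names for this illustration; they are not Jena's API and not how the Query-o-Matic is actually built.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
ALICE = "http://example.org/people#alice"  # hypothetical subject URI

# Made-up statements, not the contents of any real FOAF file.
TRIPLES = [
    (ALICE, FOAF + "name", "Alice Example"),
    (ALICE, FOAF + "knows", "http://example.org/people#bob"),
    ("http://example.org/people#bob", FOAF + "name", "Bob Example"),
]


def match(pattern, triple):
    """Return the variable bindings if the triple fits the pattern, else None."""
    bindings = {}
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A placeholder matches anything, but a repeated placeholder
            # must bind to the same value each time.
            if bindings.setdefault(term, value) != value:
                return None
        elif term != value:
            # A fixed term must match exactly.
            return None
    return bindings


def select(variable, pattern, triples):
    """Collect the value of one variable from every matching statement."""
    results = []
    for triple in triples:
        bindings = match(pattern, triple)
        if bindings is not None:
            results.append(bindings[variable])
    return results


# Placeholder subject: every 'name' in the data.
all_names = select("?name", ("?subject", FOAF + "name", "?name"), TRIPLES)

# Fixed subject: only that subject's 'name'.
one_name = select("?name", (ALICE, FOAF + "name", "?name"), TRIPLES)

print(all_names)
print(one_name)
```

With this sample data the first query yields both names and the second yields only Alice's, which mirrors the narrowing from the first Query-o-Matic example to the second.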

Well, this is great for finding all elements of a certain type, or if you're accessing a specific statement given a subject. But what if you want to find all elements of a certain type that have a specific relationship with another element? After all, the power of RDF is the ability to record statements and relate these same statements to one another.

Piece of cake. All you have to remember is an old, old nursery rhyme:

The itsy bitsy spider
Crawled up the water spout
Down came the rain
And washed the spider out
Out came the sun
And dried up all the rain
And the itsy bitsy spider
Crawled up the spout again

If you sang this as a kid (or sing this song with your own kids), you would play out the motion of the spider climbing by placing your hands together, the small finger of your right hand against the thumb of your left, and the small finger of your left against the right thumb. As you sing the song, you twist your hands, keeping the top two digits in contact, bringing up the bottom in a circular motion, re-joining these digits at the top. You would repeat this action, twisting on the top digits, bringing up the bottom and so on, never breaking the contact between the two hands.

The objective with your hands during this song was to always keep contact between the two and still have motion. That's the basic foundation of more complex queries in RDFQL: mapping one element of one triple to an element of another triple, in a chained path that eventually gets you from point A all the way to point Z.

As an example, within Mark's FOAF file, he has listed a group of people that he 'knows', each of whom has a NAME. To print out just the names of these people, we'll need to adjust the query to find each statement that has 'knows' as its predicate, and then use the object of that statement as the subject of the next triple. This gets us a list of people Mark knows. To get their actual names, the NAME element is then used as the predicate of the second triple, to refine the result.

Well, this one definitely needs an example:

url: http://www.diveintomark.org/public/foaf.rdf

query: select ?name where
(?a, <http://xmlns.com/foaf/0.1/knows>, ?object),
(?object, <http://xmlns.com/foaf/0.1/name>, ?name)

print: name

In this, the first triple returns statements where the predicate is the 'knows' element -- all known people. The results of this triple are then passed to the next. In the second triple, the object of the first triple -- the identifier, as it were, of each individual person -- is the subject of the new triple. On its own, that would return all of the properties for each of the known people. Since we're only interested in the 'name' property, we further refine the query to only return the name values, which are printed out.

Check out the results.

The key to this query working is that not all objects (property values) are literal values -- sometimes they can be subjects, too, as occurs with the 'knows' relationship in FOAF. These objects can then be plugged in as the subject of a new query (note the highlighted ?object), and the results combined to return not only 'names' of people, but names of people that Mark knows.

Just like walking that spider up the wall.
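For the curious, the hand-over-hand chaining can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data is invented, the `extend` and `query` helpers are hypothetical names, and a real engine such as Jena's will evaluate queries quite differently. The point is just that a variable bound by one triple pattern is carried into the next.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# Made-up statements, not the contents of any real FOAF file.
TRIPLES = [
    ("http://example.org/people#alice", FOAF + "name", "Alice Example"),
    ("http://example.org/people#alice", FOAF + "knows",
     "http://example.org/people#bob"),
    ("http://example.org/people#alice", FOAF + "knows",
     "http://example.org/people#carol"),
    ("http://example.org/people#bob", FOAF + "name", "Bob Example"),
    ("http://example.org/people#carol", FOAF + "name", "Carol Example"),
]


def extend(bindings, pattern, triple):
    """Try to extend existing bindings so the pattern fits the triple."""
    extended = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Already-bound variables must agree with the new triple;
            # this is the contact point between the two hands.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def query(patterns, triples):
    """Evaluate patterns left to right, carrying bindings along the chain."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                extended = extend(bindings, pattern, triple)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


patterns = [
    ("?a", FOAF + "knows", "?object"),
    ("?object", FOAF + "name", "?name"),
]
known_names = [solution["?name"] for solution in query(patterns, TRIPLES)]
print(known_names)
```

With this sample data only Bob's and Carol's names come back. Alice has a name too, but nobody in the data 'knows' her, so she never becomes a binding for ?object.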

Of course, not all queries are going to be as straightforward as they are in the FOAF example, and the next installment on RDFQL will take a look at additional and increasingly complex examples. In addition, the Tomcat/Java Query-o-Matic will be joined by its PHP cousin: Query-o-matic Light.

Posted by Bb at October 01, 2002 09:21 PM


Comments

This is right nifty, Bb. Thanks.

Posted by: Dorothea Salo on October 2, 2002 12:23 PM

Ever the glutton for punishment, I'm back with
some more questions :) (Or are you the glutton
for answering them, thus encouraging me?)

"and the results combined to return not only
'names' of people, but names of people that Mark knows. "

How do you know that? How do you *know* that?

You assigned a *meaning* to the results.

You obviously read the FOAF specification.
You read it.
You perceived it.
You assigned it meaning.
And only then were you able to construct the query.
Your Java application knows nothing of the
meaning of the results it returned.

How is this any different than querying the
same file using XML tools? I could build
a simple little app in C# that used
a similar sql-like language along with
XPath to do similar queries. I could
also read the FOAF spec, determine its
meaning, then build a query based on
the knowledge of the file format.
The only difference I see between the
two approaches is that the RDF forces me
to structure my XML in a very torturous way.

Thanks for humoring this
not-quite-an-RDF-curmudgeon-yet.

Posted by: joe on October 2, 2002 09:36 PM

I never mind questions from interested RDF-curmudgeon types.

There is no inherent knowledge or AI involved with any of this. I think the problem with using 'semantic web' is we're implying something we have no capability or interest in supplying.

You can create tools to read a FOAF file, but would they also be able to read an RDF/RSS file? Or my resume.rdf file? And so on? By applying a structure to the XML, any tool that can process one set of RDF files can also process others.

It really is very little different from the benefits the relational data model has brought to storing business data. Sure, you could use other data storage structures (network, hierarchical), and other query languages (such as QUEL) -- but by everyone agreeing to one structure, one model, one language -- we have functional reuse and can focus on the specific business needs rather than on creating the underlying structure to support the data.

Do we _have_ to have relational data model/databases/SQL? Nope. Same with RDF, RDF/XML, and RDFQL. All they do is provide a commonly agreed on meta-model, structure, and language.

One piece of code that can process any type of query of any RDF valid document without a change to the code -- I don't know, but it seems that this is worth a little extra effort in how we form the XML, don't you think?

However, to each their own. Personally, I find it rather fun myself, but then, I have a pretty good feel for RDF, so this gives me an edge.

Posted by: Shelley aka Bb on October 2, 2002 10:36 PM

"One piece of code that can process any type of
query of any RDF valid document without a change
to the code -- I don't know, but it seems that
this is worth a little extra effort in how we
form the XML, don't you think?"

Hmmmm, actually I do think it is too much to
ask. IMO the meta-data doesn't belong mixed
in with the data. It should be something
like CSS, specified externally.

Here is a better example of why I think that
is the case:

"This means that opinions, pointers, indexes, and anything that helps people discover things are going to be commodities of very high value. Nobody thinks that everyone will use the same vocabulary (nor should they), but with RDF we can have a marketplace in vocabularies. Anyone can invent them, advertise them, and sell them. The good (or best-marketed) ones will survive and prosper. Probably most niches of information will come to be dominated by a small number of vocabularies, the way that library catalogs are today."
[Tim Bray, http://www.xml.com/pub/a/2001/01/24/rdf.html?page=3]

What happens when ( not if ) I choose the wrong
vocabulary?

Think of it this way, what if SQL didn't just
specify how to return data from queries but
actually mandated a specific binary
disk image layout for the data in the
database?

Posted by: joe on October 2, 2002 11:13 PM