I love you 25% of the time

Oooo. This is fun. *claps hands*

David Weinberger asks:

Let’s say I want to express in an RDF triple not simply that A relates to B, but the degree of A’s relationship to B. E.g.:

Bill is 85% committed to Mary

The tint of paint called Purple Dawn is 30% red

Frenchie is 75% likely to beat Lefty

Niagara Falls is 80% in Canada

Other than making up a set of 100 different relationships (e.g., “is in 1%,” “is in 2%,” etc.), how can that crucial bit of metadata about the relationship be captured in RDF?

In my opinion, there is no one way to record a percentage in RDF. Claiming otherwise is the same as saying that being faithful to a lover 50% of the time is equivalent to eating only 50% of a banana split.

So let’s take just one of the examples David gives us: Niagara Falls is 80% in Canada. At first glance, if we wanted to limit ourselves to recording this fact using one and only one triple, we could do the following:

Niagara Falls — has an 80% existence — in Canada.

That records the fact. If I were specifically looking this information up, I would have it. The only point is, that’s all I would have. I could continue this, as David says, with an 81% existence, and an 82% existence and so on. How tedious. Humans don’t work this way. We don’t memorize every single number in existence. No, we memorize ten characters, and we devise a numeric system to derive the rest–learning how to use this number system instead of memorizing all possible numbers.

What we need is a way of capturing that ability to derive new concepts from existing facts using a set of triples in the form: subject predicate object.

Rather than dive straight into the triples, let’s look at the question from a perspective of David, being David, and me being me, and this being April, 2006. In other words–let’s look at what David is really saying when he gives the sentence: Niagara Falls is 80% in Canada.

When David said Niagara Falls is 80% in Canada, what he’s saying, in an assumed shorthand way, is the following:

Niagara Falls exists 80% in Canada.

This statement was made in 2006.

Canada is a country.
A country is a political entity, which may, or may not have, a fixed physical location.

Niagara Falls is a physical entity.
Niagara Falls has a physical location.
Niagara Falls has an area, bounded by longitude and latitude.

Niagara Falls’ physical location has a northern terminus latitude of ____.
Niagara Falls’ physical location has a southern terminus latitude of ____.
Niagara Falls’ physical location has a western terminus longitude of _____.
Niagara Falls’ physical location has an eastern terminus longitude of _____.

In 2006 Canada’s southernmost border is at latitude ____.
In 2006 Canada’s westernmost border is at longitude ____.
In 2006 Canada’s northernmost border is at latitude ____.
In 2006 Canada’s easternmost border is at longitude ____.

Why all of the different sentences? Because there’s more to the statement “Niagara Falls is 80% in Canada” than first appears from just the words. We want to capture not only the essence of the words, but also the assumptions and inferences that we, as humans, make based on the words.

Given David’s statement that Niagara Falls is 80% in Canada, what can we infer?

That the statement about Niagara Falls being 80% in Canada was made in 2006.
That Niagara Falls has an area bordered by such and such latitude and such and such longitude. This is a physical, fixed location (though not immutable).
That in 2006, Canada has an area bordered by such and such latitude and such and such longitude. This is a mutable, political border, though rarely changing.

Based on all of these, we can determine that 80% of Niagara Falls is in Canada.
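That last step is plain arithmetic once the boundaries are known. Here is a minimal sketch in Python, assuming both the falls and the country can be treated as simple latitude/longitude rectangles (broadly speaking, since no real border is a rectangle). The Box class, the function names, and every number below are made up for illustration; they are not real coordinates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A rectangle bounded by two latitudes and two longitudes."""
    south: float
    north: float
    west: float
    east: float

    def area(self) -> float:
        # Length of one side times length of the other; an inverted
        # (empty) box counts as zero area.
        return max(0.0, self.north - self.south) * max(0.0, self.east - self.west)


def overlap(a: Box, b: Box) -> Box:
    """The rectangle shared by a and b (may be empty)."""
    return Box(
        south=max(a.south, b.south),
        north=min(a.north, b.north),
        west=max(a.west, b.west),
        east=min(a.east, b.east),
    )


def percent_inside(thing: Box, region: Box) -> float:
    """Percentage of thing's area that falls inside region."""
    if thing.area() == 0:
        raise ValueError("thing has no area")
    return 100.0 * overlap(thing, region).area() / thing.area()


# Hypothetical boundaries in arbitrary units, chosen so the answer is easy to check.
falls = Box(south=0.0, north=10.0, west=0.0, east=10.0)
country_2006 = Box(south=2.0, north=50.0, west=-100.0, east=20.0)

print(percent_inside(falls, country_2006))  # 80.0
```

Nobody typed "80%" in anywhere. Move the hypothetical border and the percentage moves with it, which is the whole reason for recording boundaries rather than the number.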

The semantic web means capturing information so that we can make inferences based on conclusions. Since wetware is still experimental, and we haven’t yet created machines that can build inferences without a little help from us’ons, we provide enough of the other details to reach a point where we can infer all the facts from a given statement.

Therefore, we have the following triples (using English syntax rather than Turtle or some other mechanistic format, since I’m writing for people not machines right at the moment):

A geographical object has a physical existence at a point in time.
A geographical object’s physical existence can be measured in area.
The area of a geographical physical object’s physical existence is found by taking the length of one side and multiplying it by the length of the other (broadly speaking).
The length of one side can be found by finding the difference of its boundaries, as measured by its southern and northern latitudes.
The length of the other side can be found by finding the difference of its boundaries, as measured by its western and eastern longitudes.

A geopolitical object is also a geographical object.
A country is a geopolitical object.
Canada is a country.

Canada’s 2006 border has a northernmost latitude of ____.
Canada’s 2006 border has a southernmost latitude of ____.
Canada’s 2006 border has a westernmost longitude of _____.
Canada’s 2006 border has an easternmost longitude of ______.

Niagara Falls has a northernmost latitude of ______.
Niagara Falls has a southernmost latitude of ______.
Niagara Falls has a westernmost longitude of _____.
Niagara Falls has an easternmost longitude of _____.

Seems like a lot, but this is actually capturing what David is saying; he just doesn’t know he’s saying it. If we just recorded the fact Niagara Falls is 80% in Canada, we would be leaving all the important bits behind.
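The class statements among those triples show the kind of derivation at stake. A minimal sketch, assuming triples are plain Python tuples and using made-up predicate names (type, subClassOf) loosely modeled on RDF Schema's subclass rule; this is an illustration, not the API of any real RDF library:

```python
# Ground facts, written as (subject, predicate, object) tuples.
triples = {
    ("GeopoliticalObject", "subClassOf", "GeographicalObject"),
    ("Country", "subClassOf", "GeopoliticalObject"),
    ("Canada", "type", "Country"),
}


def infer_types(facts):
    """Repeat one rule until nothing new appears:
    if X type C and C subClassOf D, then X type D."""
    inferred = set(facts)
    changed = True
    while changed:
        changed = False
        for (s, p, o) in list(inferred):
            if p != "type":
                continue
            for (s2, p2, o2) in list(inferred):
                if p2 == "subClassOf" and s2 == o:
                    new_fact = (s, "type", o2)
                    if new_fact not in inferred:
                        inferred.add(new_fact)
                        changed = True
    return inferred


derived = infer_types(triples) - triples
for fact in sorted(derived):
    print(fact)
```

Nobody recorded that Canada is a geographical object with an area that can be measured, yet it falls out of the three statements that were recorded.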

There’s better schema folk than I, and they can, most likely, come up with better triples. The point is that RDF doesn’t record facts. We have existing models that do a dandy job of recording facts. Given an infinitely long, one-dimensional flat plane where all facts have a single point of existence, we have systems that can capture snapshots of this plane far more efficiently than RDF.

Consider instead, a model of knowledge that consists of an infinite number of finite planes of information, intersecting infinitely. That’s RDF’s space, recording these points of intersection.


15 Responses to I love you 25% of the time

  1. Danny says:

    :Canada p0wns :Niagra, surely?

    - or maybe -

    being faithful to a lover 50% of the time is equivalent to eating only 50% of a :Viagra

    I felt obliged to comment over at David’s, the only bit worth repeating here being that what he asked is altogether doable (though you’re absolutely right about capturing stuff), as in n-ary relations

  2. tim finin says:

    There are some other approaches to this, including using fuzzy set theory and using probabilities. Yun Peng and his students in the UMBC ebiquity lab have been doing some interesting work on integrating Bayesian reasoning and OWL — the probabilistic approach. Here are some recent papers.

  3. Tim Bray says:

    It’s after dinner and I don’t feel up to any formal syntaxes, so let’s do this in prose. Anything that has a ‘*’ in front means ‘A URI identifying this’.

    So, *NiagaraFalls has a property *PartiallyOwnedBy whose value is *Canada. Reify that assertion. *thatAssertion has a property *PercentageOwnership whose value is a literal, 80.

  4. Shelley says:

    You guys are too literal. Where’s the fun in that?

  5. Su says:

    This is a bit off to the side, but you’ve reminded me of something that’s been bugging me.

    Do you know of a CMS/wiki that deals specifically (or even just additionally) in relationships rather than individual bits of information? Without delving too deeply into it, let’s say I want to build a site about not just books, but the connections between them. Ideally, rather than the current limitation (as I see it) of pointing at item A which happens to link to item B, I’d like to be able to address the actual connection A->B itself as an object/URL. Am I even making sense?

    I’ve only run across one wiki script (in very early development, and which I of course can’t recall right now) that deals in RDF, which gives me a vague sense it might be able to do this, but I don’t know enough about RDF to really say.

  6. Stu Savory says:

    Roger Schank in the 80s at Yale covered this with ‘Frame based reasoning’ surely?

    Stu Savory

    BTW There’s the source code for a freeware Framebased inferencing system in one of the AI books I wrote. The source code is in Prolog, but I don’t remember off hand which of the books it was. I’ll append that later.

  7. Elaine says:

    Su: sounds fascinating…I have this book/movie review app that I use where I’d love to do something like that….

    Shelley: thank you! I had a serious lightbulb moment about RDF. (admittedly, I haven’t really looked into it all that much…but now I want to.)

  8. Seth Russell says:

    I am ashamed and sad to report that i have never discovered a useful fact through an RDF inference. Oh sure, i have seen RDF inferences demonstrated academically … but i am talking about a fact that i actually used for some purpose which was unknown except that it was uncovered by inference from an automated RDF processor. Has anyone? Can they document how the fact was discovered and how it was actually used.

    … if not, is RDF inference not one of the biggest snow jobs ever perpetrated on the web?

  9. Dan Brickley says:

    Seth, while I’d agree that the inference aspect of RDF is often over-played, I have found it useful for identity reasoning: figuring out when 2 descriptions are talking about the same thing. See http://rdfweb.org/mt/foaflog/archives/2003/07/10/12.05.33/ for details, or google around ‘rdf’ and ‘smushing’ for code from the FOAF crowd.

  10. Shelley says:

    Su, I know there’s an RDF-enabled wiki–I can’t remember the context. You might search on wiki and RDF, see what pops up.

    Elaine, I did go off on a tangent, but I am very glad it has been of help.

    Dan, thank you for providing such an excellent response to Seth’s question.

  11. Seth Russell says:

    Actually, Dan and Shelley, i was looking not for an application like foaf where smushing is necessary … but rather a real example of someone actually finding something that they were looking for and where the fact that they found was inferred and not just typed in as a ground fact. Such a real example might look like … on 3/2/2006 i needed to find a friend of Dan Brickley who was attending SXSW and was flying from SEA on flight 239, so i just went to my laptop and typed in this query (provided) and came up with Susie Q who i talked with on the flight. Also, please note that you can’t just make it up … it had to really happen … and sufficient documentation should be provided so that we can fisk it.

  12. Shelley says:

    Seth, I’ll send you my hourly rates.

  13. Seth Russell says:

    Shelley, my point is that if RDF inference maps to the null set of practical real world examples, then why spend so much time on it … rather concentrate on RDF as a medium to record ground facts. I’ve posed this challenge before and nobody has ever met the challenge. If this were a practical thing, these examples would be as common as ass holes. Doesn’t that make you think … it does me … i mean how many years has RDF Inference been around and still nobody has gotten anything real out of it. And no, LOL, I’m not going to pay you for it … rather i would require such an example up front before i decided to invest in this vaporal notion.

  14. Phil says:

    A geographical object has a physical existence at a point in time.
    A geographical object’s physical existence can be measured in area.

    A geopolitical object is also a geographical object.

    Not sure about the last bit – I’d define a class of objects which have extension in space and a sub-class of those which have a fixed location relative to the surface of the earth, and then say that a geopolitical object is one of them. On second thoughts, maybe that’s what you just did.

    (God, I love this stuff. I used to be a data analyst/DB designer, and when I first started using set logic in my current job it felt like coming home.)

    Incidentally, Danny – I accidentally googled your blog today. It’s the top hit (out of not very many) for the phrase

    “good good good good, double double good, double double good”

    Just thought I’d mention it.

  15. Shelley says:

    Phil, one has to ask: how do you know Danny owns “good good good good, double double good, double double good”?

    One is curious.