September 06, 2002
RSS -- proof is in the implementation
Sam Ruby had taken a first shot at RSS 2.0 with an RSS document demonstrating the new, simplified RSS syntax: no RDF, no RSS version, no RDF Seq.
Mark expanded on this with what looks to be the same specification, different examples, and the use of included HTML (parseLiteral in RDF terms). (Correct me if I misread this, Mark.)
Since Sam has published an example of his version, allow me to work with the assumption that whatever works with his proposed RSS 2.0 should work with Mark's, with the addition of HTML literals.
In this weblog page, I have PHP processing for the Book recommendation list. I copied the page and modified it to process Sam's new proposed RSS file. You can see it in action here. The process took me about 10 minutes because the SHIFT key on my laptop doesn't work well, and I am using vi to make the edits.
Now, I want to show you something. Here is my MT generated RDF/RSS file. Taking this and Sam's and Mark's proposed RSS 2.0, I came up with a simplified RDF/RSS syntax, seen in this file and also duplicated here:
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns="http://purl.org/rss/1.0/">
<channel rdf:about="http://weblog.burningbird.net/">
<title>Burningbird</title>
<link>http://weblog.burningbird.net/</link>
<description></description>
<item>
<rdf:Description rdf:about="http://weblog.burningbird.net/archives/000514.php">
<link>http://weblog.burningbird.net/archives/000514.php</link>
<title>Myths about RDF/RSS</title>
<description>Lots of discussion about the direction that RSS is going to take, which I think is good. However, the first thing that
happens any time a conversation about RSS occurs is people start questioning the use of RDF within the...</description>
<dc:subject>Technology</dc:subject>
<dc:creator>shelley</dc:creator>
<dc:date>2002-09-06T00:53:16-06:00</dc:date>
</rdf:Description>
</item>
<item>
<rdf:Description rdf:about="http://weblog.burningbird.net/archives/000515.php">
<link>http://weblog.burningbird.net/archives/000515.php</link>
<title>ThreadNeedle Status</title>
<description>I provided a status on ThreadNeedle at the QuickTopic discussion group. I wish I had toys for you to play with, but no
such luck. To those who were counting on this technology, my apologies for not having it for...</description>
<dc:subject>Technology</dc:subject>
<dc:creator>shelley</dc:creator>
<dc:date>2002-09-06T00:19:28-06:00</dc:date>
</rdf:Description>
</item>
</channel>
</rdf:RDF>
Differences are:
- RDF element rather than RSS
- No versioning - not necessary with the concept of namespaces
- Use of namespaces to differentiate modules
- Surrounding each item's properties with an rdf:Description. An item can contain either literal data or XML elements that should be parsed. By using rdf:Description, I'm giving processors a hint that what follows is XML data to be parsed for new elements, so they should turn off any literal-text processing optimization and use the more memory- and CPU-intensive XML parser, please.
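As a rough illustration of the namespace points in the list above, here is a minimal Python sketch using the standard library's ElementTree (not the PHP code the post refers to). SAMPLE is a trimmed copy of the feed above, and split_tag and properties_by_module are hypothetical helper names. Nothing in it checks a version number: each property is identified by its namespace URI, and the item's identifier is read from the nested rdf:Description.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by the simplified RDF/RSS feed above.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "rss": "http://purl.org/rss/1.0/",
}

# Trimmed copy of the feed: one item, properties from two modules.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns="http://purl.org/rss/1.0/">
 <channel rdf:about="http://weblog.burningbird.net/">
  <item>
   <rdf:Description rdf:about="http://weblog.burningbird.net/archives/000514.php">
    <title>Myths about RDF/RSS</title>
    <dc:creator>shelley</dc:creator>
   </rdf:Description>
  </item>
 </channel>
</rdf:RDF>"""


def split_tag(tag):
    """ElementTree writes qualified names as '{namespace-uri}local'."""
    if tag.startswith("{"):
        uri, _, local = tag[1:].partition("}")
        return uri, local
    return "", tag


def properties_by_module(xml_text):
    """Return (rdf:about, {namespace-uri: {local-name: text}}) for each item."""
    root = ET.fromstring(xml_text)
    result = []
    for desc in root.iterfind("rss:channel/rss:item/rdf:Description", NS):
        about = desc.get("{%s}about" % NS["rdf"])
        props = {}
        for child in desc:
            uri, local = split_tag(child.tag)
            props.setdefault(uri, {})[local] = child.text
        result.append((about, props))
    return result


for about, props in properties_by_module(SAMPLE):
    print(about)
    for uri, fields in props.items():
        print(" ", uri, fields)
```

Adding another module (say, a hypothetical new namespace) would just add another key to the per-item dictionary; no version switch is needed in the consumer.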
Notice that there is no RDF:Seq in this RDF/RSS version. Why? You don't have to use the Seq element for valid RDF. I believe Seq was used with RSS 1.0 because the originators of RSS 1.0 wanted to provide ordering information to the tool builders. However, this really seems to be an absolute sticking point with everyone. Fine. Dump it.
Run my new RDF/RSS through the RDF validator (here), and you'll see it's valid RDF.
Now, I created a third copy of my weblog page with the PHP processing and had it parse and print out this new RSS file. The changes necessary? I changed DC:DATE to DC:CREATOR -- I wanted to print out the latter not the former. Here's the new page.
Next, I copied the PHP page and had the code process my original RDF/RSS 1.0 file, the one that's generated automatically from MovableType. Changes to the code? Nada. Not one single change other than the name of the RDF file. Time to make the change? 4 seconds. See the new page here.
Now, all of these pages (including this one) use PHP-based XML processing (xml_parser) to process the data. No specialized RSS or RDF APIs. Pure XML processing. And it took me about, well, honestly, probably a couple of hours to write the original code for my Books RDF/RSS application. That darn shift key, you know.
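The pages described above use PHP's event-driven xml_parser. As a hedged analogue only, the Python sketch below uses the standard library's xml.parsers.expat, which works in the same callback style (start-element, end-element and character-data handlers), to pull title, link, dc:creator and dc:date out of each item. SIMPLIFIED and RSS10_DOC are trimmed, hypothetical stand-ins for the simplified feed above and an MT-style RSS 1.0 feed, and read_items is a hypothetical name. Because the handlers key on namespace-qualified element names and ignore everything else, the same function reads both layouts unchanged.

```python
from xml.parsers import expat

RSS10 = "http://purl.org/rss/1.0/"
DC = "http://purl.org/dc/elements/1.1/"
# Item fields we care about, as (namespace-uri, local-name) pairs.
WANTED = {(RSS10, "title"), (RSS10, "link"), (DC, "creator"), (DC, "date")}

SIMPLIFIED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns="http://purl.org/rss/1.0/">
 <channel rdf:about="http://weblog.burningbird.net/">
  <title>Burningbird</title>
  <item><rdf:Description rdf:about="http://weblog.burningbird.net/archives/000514.php">
   <link>http://weblog.burningbird.net/archives/000514.php</link>
   <title>Myths about RDF/RSS</title>
   <dc:creator>shelley</dc:creator>
  </rdf:Description></item>
 </channel>
</rdf:RDF>"""

RSS10_DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns="http://purl.org/rss/1.0/">
 <channel rdf:about="http://weblog.burningbird.net/">
  <title>Burningbird</title>
  <items><rdf:Seq>
   <rdf:li rdf:resource="http://weblog.burningbird.net/archives/000514.php"/>
  </rdf:Seq></items>
 </channel>
 <item rdf:about="http://weblog.burningbird.net/archives/000514.php">
  <title>Myths about RDF/RSS</title>
  <link>http://weblog.burningbird.net/archives/000514.php</link>
  <dc:creator>shelley</dc:creator>
 </item>
</rdf:RDF>"""


def read_items(xml_text):
    """Collect item fields with expat callbacks; no RSS- or RDF-specific API."""
    parser = expat.ParserCreate(namespace_separator=" ")
    items, buf = [], []
    current = None  # dict for the item being read, or None outside items
    field = None    # (uri, local) of the wanted field being read, if any

    def start(name, attrs):
        nonlocal current, field
        uri, _, local = name.rpartition(" ")
        if (uri, local) == (RSS10, "item"):
            current = {}
        elif current is not None and (uri, local) in WANTED:
            field = (uri, local)
            buf.clear()

    def end(name):
        nonlocal current, field
        uri, _, local = name.rpartition(" ")
        if (uri, local) == (RSS10, "item"):
            items.append(current)
            current = None
        elif field == (uri, local):
            current[local] = "".join(buf).strip()
            field = None

    def chars(data):
        if field is not None:
            buf.append(data)

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)
    return items


print(read_items(SIMPLIFIED) == read_items(RSS10_DOC))  # prints True
```

The channel's own title is skipped because it is read while no item is open, and the rdf:Description, items, rdf:Seq and rdf:li wrappers simply never match a wanted name.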
I'm not trying to downplay others' concerns or existing work or effort, and I realize that I have a better understanding of RDF than most of you (not bragging, but grant me this as a given for discussion purposes at this moment) and that this gives me an edge when working with RDF.
What I'm trying to show is that keeping RDF in the RSS specification doesn't necessarily mean that simplified processing is impossible, that we can't use 'regular' XML tools, or that there will be a huge burden on tool writers.
We don't have to keep Seq if it really bothers everyone. Let's work this change. Let's. Let us work this change. I like that phrase, don't you?
By keeping RDF in RSS now -- and really, are those changes I made to the proposed RSS 2.0 so hard to swallow? -- we keep the door open for the benefits that will accrue some day when RDF does have broader use.
I guess what I'm trying to show, demonstrate, prove is that RDF doesn't have to make things arbitrarily complicated, or confusing. That we can write documentation that clarifies those few bits of RDF in the specification so that it isn't complicated for folks writing or reading this stuff by hand (or processing it with various languages).
I'm hoping with this demonstration that I'll convince a few of you that we can keep the door open on this discussion rather than arbitrarily throwing RDF out -- a specification, I'd gently remind you all, that some of the best markup minds in the business have been working on for years. And as easy as it is to criticize the RDF working group for taking time, remember that they're trying to create a specification that will stand the test of time, rather than break with every version, as we had with HTML.
Mark provided a summary of the RSS issue, and I know that this discussion has been going on for years. And I know that there are a lot of people who say, let's just fork. But folks, this didn't work for SQL and QUEL (remember QUEL?) years ago when the decision was being made about which query format to use when accessing relational database data. I really do want to see these specs come together, with members and players from all sides.
And I'll also be honest and say that I really don't want to see this owned by any private company or person. Sorry, but I just can't accept this; it goes against everything I believe in. I am not belittling Dave's and Userland's contribution to RSS. I realize that Userland popularized RSS and that a debt is owed.
What I am asking is that Dave become part of a team working on this, a team that's open to people who literally have something to contribute on this issue, each with an equal vote. Yes, people like me, like Mark, like Sam, Jon, Joe, Bill -- all the people who have something to contribute to make this specification rock. And hopefully prevent something like this from happening again in the future.
Am I too late though? Is the decision made? Can't we talk?
Where's the fire?
Posted by Bb at September 06, 2002 12:29 PM
Your re-formulation of RSS is exactly what I was looking for when I asked about a reformulation of RSS 1.0 on the rss-dev mailing list, and what I was trying to accomplish in my post yesterday.
I'm kinda surprised that no one on rss-dev came up with it.
Thanks!
Especially thanks for not throwing up your hands and walking away!
Actually, Bird, the Seq at the top was all about bringing the items into the channel virtually since RSS 0.9 had them outside of the channel element. It could have been a Bag, but Seq brought the additional ordering semantics in.
This problem doesn't exist in RSS 0.9x since the items are actually inside channel. But then 0.9x had dropped RDF, so we had to go back to the earlier version (0.9) to put the toothpaste back into the tube. RDF wanted to be the outside-most element and we couldn't make that happen with 0.9x - not without breaking backward compatibility.
Hope that blathered explanation reads as well as it sounded.
Rael
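Rael's explanation can be sketched in a few lines of Python with ElementTree. In RSS 1.0 the item elements sit beside channel, as in 0.9, and channel carries an items/rdf:Seq whose rdf:li entries point at those items by URI; the Seq supplies both the link back into the channel and the ordering. RSS10_FEED is a trimmed, hypothetical feed whose items are deliberately written in the opposite order from the Seq, and items_in_channel_order is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NS = {"rdf": RDF, "rss": "http://purl.org/rss/1.0/"}

RSS10_FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns="http://purl.org/rss/1.0/">
 <channel rdf:about="http://weblog.burningbird.net/">
  <title>Burningbird</title>
  <items><rdf:Seq>
   <rdf:li rdf:resource="http://weblog.burningbird.net/archives/000515.php"/>
   <rdf:li rdf:resource="http://weblog.burningbird.net/archives/000514.php"/>
  </rdf:Seq></items>
 </channel>
 <item rdf:about="http://weblog.burningbird.net/archives/000514.php">
  <title>Myths about RDF/RSS</title>
 </item>
 <item rdf:about="http://weblog.burningbird.net/archives/000515.php">
  <title>ThreadNeedle Status</title>
 </item>
</rdf:RDF>"""


def items_in_channel_order(xml_text):
    """Resolve channel/items/rdf:Seq references to the top-level item elements."""
    root = ET.fromstring(xml_text)
    # Items live outside channel; index them by their rdf:about URI.
    by_uri = {item.get("{%s}about" % RDF): item
              for item in root.findall("rss:item", NS)}
    titles = []
    for li in root.iterfind("rss:channel/rss:items/rdf:Seq/rdf:li", NS):
        # Accept rdf:resource, or an unprefixed resource attribute to be lenient.
        uri = li.get("{%s}resource" % RDF) or li.get("resource")
        titles.append(by_uri[uri].findtext("rss:title", namespaces=NS))
    return titles


print(items_in_channel_order(RSS10_FEED))
# prints ['ThreadNeedle Status', 'Myths about RDF/RSS']
```

The Seq order wins over document order; with a Bag instead, a consumer would get the same membership but no promise about ordering.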
Me again, Bird.
Now perhaps backward compatibility will be thrown to the wind as things move along (I sincerely hope not), but it was a huge goal for us in RSS-DEV.
The version you have here is, of course, the ideal, and something we wanted very badly. But, sadly, it won't parse as either RSS 0.9 or RSS 0.9x and would need to be considered another flavour entirely. The magic of 1.0 is that it passed as 0.9 with just about every aggregator, tool, and application under the sun. Those who thought it was 0.9 got what they were expecting. Those who thought it was 1.0 got to rake in all the yummy goodness that brought with it -- mostly namespaces, mind.
Rael
Rael, thanks for coming in and adding this explanation about the whys and wherefores of the container.
See, this is the type of thing we all need to calm down and discuss.
There is a cost to every move we make -- enough so that we owe it to all of ourselves to take a deep break -- hold it -- exhale, and then come together as a team and work this through.
Thanks, Joe and Rael. Very much.
Sorry, typo city: deep breath -- hold it -- exhale...
OK, this is an excellent example of how RDF can be simplified. But it still contains what would appear to the uninitiated developer to be redundant data (the URL is duplicated in rdf:Description and link). What does this duplication allow you to do that justifies its existence? What features does this make possible that would otherwise be impossible with a simpler syntax?
The rdf:about in the rdf:Description (and on RSS 1.0's item element) is the "permanent identifier" of the item, whereas the link element can change over time.
Sorry to keep harping back to the relational data model, but that was a similar effort that went through this same pain years ago, and we can see the benefit today.
Think of the rdf:about value as the identifier for a table. As Ken said, this never changes. Now I use the URL of the file as the URI for the rdf:about, but I didn't have to. And if I were to move the file, I would change the link, but I wouldn't change the URI -- that's my contract with systems that consume this RSS that this identifier will always represent this resource regardless of the resource's true location, or status.
Absolutely essential for the use of RDF/RSS if one wants to persist the data to a store (such as ThreadNeedle, if I hadn't hit other problems with its implementation).
This is a long winded answer to what Ken said.
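To make the identifier-versus-location point concrete, here is a small Python sketch of a hypothetical store (a plain dict standing in for something like ThreadNeedle) that keys records on a chosen field. Keyed on rdf:about, a re-published item whose file has moved still updates the same record; keyed on link, the move produces a duplicate. upsert and the moved URL under example.org are made up for illustration.

```python
def upsert(store, items, key):
    """Insert or update records in `store`, identifying each item by item[key]."""
    for item in items:
        store[item[key]] = {"link": item["link"], "title": item["title"]}
    return store


first = {
    "about": "http://weblog.burningbird.net/archives/000514.php",
    "link": "http://weblog.burningbird.net/archives/000514.php",
    "title": "Myths about RDF/RSS",
}
# Later the file moves: the link changes, the rdf:about identifier does not.
moved = dict(first, link="http://example.org/moved/000514.php")

by_about = upsert({}, [first, moved], key="about")
by_link = upsert({}, [first, moved], key="link")
print(len(by_about), len(by_link))  # prints 1 2
```

This is the same contract as a primary key in a relational table: the key never changes, while the other columns (here, the link) are free to.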
One issue: RSS 1.0 defines "item" as a class, but you're using it as a property. That means RDF parsers that understand RSS 1.0 (as opposed to XML-based RSS parsers) will probably be confused.
RDF tools will also not see an explicit ordering for items. I suppose that's okay from an application perspective if items are dated and we want a chronological order.
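If the Seq goes away, a consumer that wants ordering can fall back on dc:date, as suggested above. A minimal Python sketch, assuming every item carries a full dc:date timestamp with a numeric offset in the form used in the feed (for example 2002-09-06T00:53:16-06:00); parse_dc_date and chronological are hypothetical names.

```python
from datetime import datetime


def parse_dc_date(value):
    """Parse an ISO 8601 / W3C-DTF timestamp that includes a time zone.
    A trailing 'Z' is rewritten because fromisoformat only accepts it from Python 3.11."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def chronological(items, newest_first=True):
    """Order items by dc:date instead of by an rdf:Seq."""
    return sorted(items, key=lambda it: parse_dc_date(it["date"]), reverse=newest_first)


items = [
    {"title": "ThreadNeedle Status", "date": "2002-09-06T00:19:28-06:00"},
    {"title": "Myths about RDF/RSS", "date": "2002-09-06T00:53:16-06:00"},
]
print([it["title"] for it in chronological(items)])
```

Date-only values (which W3C-DTF also allows) would parse as naive datetimes and could not be compared with the offset-bearing ones, which is why the sketch assumes full timestamps.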
How about this syntax? (I'm using parentheses for angle brackets, because I don't know how this board handles markup.)
(channel)
...
(items)
(rdf:Seq)
(rdf:li)
(item rdf:about="...")
...
(/item)
(/rdf:li)
(rdf:li)
(item rdf:about="...")
...
(/item)
(/rdf:li)
(/rdf:Seq)
(/items)
...
(/channel)
This is valid RSS 1.0, and all an RSS 0.9x parser would have to do is ignore the items, rdf:Seq, and rdf:li elements.
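A rough Python sketch of that claim, using ElementTree: a forgiving reader that matches elements by local name only, and never looks at the enclosing items, rdf:Seq, or rdf:li wrappers, still finds the items in document order. NESTED is a hypothetical feed in the syntax proposed above (with real angle brackets), and naive_item_titles stands in for a 0.9x-style reader; an actual 0.9x parser may also check the root element or namespaces, which this sketch deliberately ignores.

```python
import xml.etree.ElementTree as ET

NESTED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns="http://purl.org/rss/1.0/">
 <channel rdf:about="http://example.org/">
  <title>Example</title>
  <items><rdf:Seq>
   <rdf:li><item rdf:about="http://example.org/51"><title>First</title></item></rdf:li>
   <rdf:li><item rdf:about="http://example.org/52"><title>Second</title></item></rdf:li>
  </rdf:Seq></items>
 </channel>
</rdf:RDF>"""


def local_name(tag):
    """Drop the '{namespace-uri}' prefix ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def naive_item_titles(xml_text):
    """Find every element named item, wherever it is nested, and read its title."""
    root = ET.fromstring(xml_text)
    titles = []
    for el in root.iter():  # document order, all depths
        if local_name(el.tag) == "item":
            for child in el:
                if local_name(child.tag) == "title":
                    titles.append(child.text)
    return titles


print(naive_item_titles(NESTED))  # prints ['First', 'Second']
```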
Mark: Technically speaking, you don't need the rdf:about attributes in the item element, but including them makes life easier for generic RDF processors.
In RDF, the rdf:about attribute is used to indicate the URI associated with a resource. Resources do not need to be identified by URI, but there is no general way to refer to them without one.
Again, I'm using parentheses instead of angle brackets.
If I write:
(item)
(link)http://example.org/51(/link)
...
(/item)
I am saying "an item with a link of http://example.org/51". Note that the value of link would be interpreted as a string, not a URI.
Whereas, if I write this:
(item rdf:about="http://example.org/51")
...
(/item)
I am saying "the item http://example.org/51". Note the use of the definite article there: "the item", not "an item". A generic RDF processor seeing two items with the same value for link has to assume they're two different items.
There are proposed languages (like DAML+OIL) which would allow us to define the link property in such a way that two resources with the same link property can be assumed to be identical, but it's a lot less effort to just use rdf:about. If I were to get rid of anything, it would be the link property.
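The "an item" versus "the item" distinction can be modelled in a few lines of Python. This is a toy, not an RDF parser: subject_ids (a hypothetical helper) gives each item its rdf:about URI when it has one and a fresh blank-node label otherwise, and an optional flag mimics the DAML+OIL-style alternative of treating link as an identifying property, so that items sharing a link are merged.

```python
import itertools


def subject_ids(items, merge_on_link=False):
    """Return the RDF-style subject each item would get."""
    fresh = itertools.count(1)
    seen_links = {}
    subjects = []
    for item in items:
        if item.get("about"):
            subj = item["about"]  # "the item <uri>"
        elif merge_on_link and item["link"] in seen_links:
            subj = seen_links[item["link"]]  # same link => same resource
        else:
            subj = "_:b%d" % next(fresh)  # "an item": a new blank node
        if merge_on_link:
            seen_links.setdefault(item["link"], subj)
        subjects.append(subj)
    return subjects


link = "http://example.org/51"
anonymous = [{"link": link}, {"link": link}]
identified = [{"about": link, "link": link}, {"about": link, "link": link}]

print(len(set(subject_ids(anonymous))))                      # 2 distinct items
print(len(set(subject_ids(identified))))                     # 1 item
print(len(set(subject_ids(anonymous, merge_on_link=True))))  # 1 item
```

The merge flag is the extra machinery the comment above describes; rdf:about gets the same result with no special knowledge about the link property.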
Thanks for clarifying what I said about the URI, Dave.
As for backwards compatibility with RSS 1.0, I didn't even try. I was trying to demonstrate to Mark and Sam and Joe how one can have the new simple syntax and still keep the product compatible with RDF with the addition of two simple constructs. Keeping the dialog open, as it were.
This is, by far, the best format I've seen to come out of the discussion so far.
One question from an RDF newbie: Why do we need that (rdf:Description) element? Why can't we simply put the @rdf:about attribute on the (item)?
Another RDF newbie question (and perhaps this is not the best place to ask...) is why do we need to be explicit about Seq/Bag issues in the document itself? Why can't a "true" RDF processor understand what elements are which by reading an external schema which the document can link to?
Hi Ziv
I tried to answer this in the comments, but you really are asking the question that goes to the heart and core of RDF. So, I'm pulling the answer into a separate post.
I don't know, Bb. The problem *this* ignoramus sees is that it's just too unnecessarily complicated syntactically. The discussion is at the level of principle at this point, and a bunch of coders are running around... coding. They are creating numerous, overlapping, redundant (counter)examples of the same thing. No added value, no alternative views, no sense of gradually expanding capability and power through modularization, etc.
You argue (I think) to "keep RDF in" because doing so doesn't make any difference in the present implementation space. Others argue to leave it out because it doesn't make any difference in the implementation space but then allow for its inclusion on an arbitrary, application specific basis.
This isn't a well-reasoned discussion that's going on. Rather than thinking like techy coders, why not think like the rest of us? Start from the premise: just because something *can* be done is insufficient reason to actually do it. Why? Because it will make more work and cost more money, without a corresponding sense of the payoff. Think about the dummies like me who are just trying to understand so they can explain (and sell) these ideas to other, even more ignorant, people. What to say about what it is? A way of notifying people about new content of interest somewhere on the internet; come and get it. How to do it? Notify the subscriber about the who, what, where and when of the content, period, finito. Is there added-value opportunity? Yes, you can enrich the message with just about anything you want, assuming you have a business case for doing so that includes the impact of added complexity, design, development, maintenance, processing and bandwidth.
That's the story.
As a note on your post, from a curmudgeonly English grad: you make me work too hard to find out what your point is. I had to read through the whole, to me boring, exposition, all the tags, try to remember about namespaces and how controversial they still are, and place RDF in the confusing welter of other specs generated by the hot-house heads at W3C, before I got to what you "were trying to say." Please recall that, while you may be rolling with the coders, you're still visible to the rest of the world and, like the prevailing attitude toward browsers, lowest common denominator is a good rule to keep in mind. And thanks for your effort and work; it is appreciated.
...edN
Ed, this posting was directed primarily at the techies, not because I want to ignore the greater community of RSS consumers, but because I'm trying to convince other techies to keep the door open a little longer, keep the conversation going a little longer.
It was, in some ways, a desperate act.
I'm a little late getting back to you; my apologies.
I understand desperation. I should probably have put that into the lumpy flow of what I was trying to say; the point being: the level of discourse is, to a large extent, the problem with this problem.
If the discussion were moved to a different plane, e.g. business models, user requirements, goals, plans, whatever, and away from the syntactical components, things might improve. Now it's: "here is this thingy, and *my* version of it is either *right* or demonstrates that *your* version a) has no merit or b) is inadequate."
Isn't this an endemic problem with "open" processes? Even with the blogging process? I like getting my hands dirty with this stuff as much as the next person, but it's important to realize when a proliferation of dirty hands is causing an illness. In such cases, wash the hands and move to a different mode of work, of communication. At least that's what I tend to try to do.
...edN
PS: hope your interviews are fruitful.