BURNINGBIRD
a node at the edge  


September 06, 2002
Technology
Myths about RDF/RSS

Lots of discussion about the direction that RSS is going to take, which I think is good. However, the first thing that happens any time a conversation about RSS occurs is people start questioning the use of RDF within the RSS 1.0 specification, and the necessity of keeping RSS "simple".

Mark Pilgrim writes:

    Many people in the RSS community feel that, while the lack of extensibility in RSS 0.9x is too limiting, the full-blown RDF syntax of RSS 1.0 is overkill for the purposes of syndicating weblogs.

Jon Udell writes:

    RSS is becoming too complex. It needs to remain simple, human-readable and -writable.

Well, this just plain peeves me. Not Mark's or Jon's statements, but the idea that a) RSS must be human readable and writeable and b) RDF makes RSS overly complex.

Specifically, there are three myths I want to address:

Myth 1. RDF adds complexity to RSS because the RDF Seq element is unnatural and adds an extra layer of processing.

Hanging from a tree dressed in orange, purple, and lime green while reciting the Gettysburg Address and drinking a glass of water dyed blue at the same time is unnatural -- the use of RDF containers (which is what the Seq element is) in RSS is to provide some structure to the data. (See my RDF/RSS file, generated by Movable Type, for examples during this discussion.)

The RDF Seq container provides an explicit ordering -- top down -- to all the elements contained within the tag. Without the Seq element, there is only an implicit assumption that all items are processed in a certain order.

I'm not fond of RDF containers myself, principally because there is built-in processing associated with them, though I understand their use in maintaining relationships between elements. However, if I were a tool builder, I would at least understand what Seq means, and that helps eliminate confusion about the specification. If you didn't have the RDF Seq container, there might be an assumption that the item ordering is important, but there's nothing enforcing this assumption.
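To make the difference between implicit and explicit ordering concrete, here is a minimal Python sketch. It assumes an RSS 1.0 feed in which each `rdf:li` carries an `rdf:resource` attribute pointing at an item's `rdf:about` URI; the sample feed, the example.org URLs, and the function names are all made up for illustration, not taken from any particular tool.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

# Hypothetical feed: the items sit oldest-first in the file,
# but the Seq lists the newest one first.
SAMPLE_FEED = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/index.rdf">
    <title>Example weblog</title>
    <link>http://example.org/</link>
    <description>Sample feed</description>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://example.org/2"/>
        <rdf:li rdf:resource="http://example.org/1"/>
      </rdf:Seq>
    </items>
  </channel>
  <item rdf:about="http://example.org/1">
    <title>First post</title>
    <link>http://example.org/1</link>
  </item>
  <item rdf:about="http://example.org/2">
    <title>Second post</title>
    <link>http://example.org/2</link>
  </item>
</rdf:RDF>
"""


def titles_in_document_order(feed_xml):
    """Implicit ordering: whatever order the items happen to appear in."""
    root = ET.fromstring(feed_xml)
    return [item.findtext("rss:title", namespaces=NS)
            for item in root.findall("rss:item", NS)]


def titles_in_seq_order(feed_xml):
    """Explicit ordering: follow the rdf:li entries inside rdf:Seq."""
    root = ET.fromstring(feed_xml)
    items_by_uri = {item.get(RDF_ABOUT): item
                    for item in root.findall("rss:item", NS)}
    ordered = []
    for li in root.findall("rss:channel/rss:items/rdf:Seq/rdf:li", NS):
        item = items_by_uri[li.get(RDF_RESOURCE)]
        ordered.append(item.findtext("rss:title", namespaces=NS))
    return ordered
```

With this sample, `titles_in_document_order(SAMPLE_FEED)` follows the file layout, while `titles_in_seq_order(SAMPLE_FEED)` follows what the feed actually declares. A tool that reads the Seq doesn't have to guess which one the author meant.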

Not using the Seq container is as bad as defining the <em> element in HTML -- exactly what are we, as tool builders, supposed to provide with this element?

Joe: Well, I'm building my browser to use italic font, same weight and line height as the surrounding text. That's emphasis.

Sara: Well, I'm building my browser to use a bolder font, and to increase the size as well. This is emphasis we're trying to define here.

Dubya: Em? Auntie Em?

Myth 2. RSS must be human readable/writeable

Let's get real about markup -- markup is not human readable and writeable. I don't care if you're talking SGML, HTML, or XML, markup is not meant to be created and consumed by humans. Now, we may adapt and learn to work with markup. However, we can also adapt to spending 8 or more hours a day in a small, cramped, walled-in, windowless, artificially lighted and ventilated environment, too, and that's no more human than markup reading and writing. Markup exists to be generated by automated processes and consumed by automated processes.

All you webloggers out there that create your RSS feeds by hand, raise your hands. Now, those with their hands in the air, dump whatever tools you're using to build your weblogs and get Movable Type and let the machines do what we pay them to do.

Myth 3: RDF doesn't add anything to RSS

I remember a debate several years back about how the relational data model was too complex and didn't add any value to a company's business.

RDF is the relational data model of XML. Now, it's true, I'm writing a book on the subject and am biased. However, I'm writing the book because I believe in the concepts of RDF, I don't believe in RDF because I'm writing a book on it.

RDF provides a structured meta-data language that can be used to define any XML vocabulary, providing rules to ensure that all instances of the XML that use the vocabulary are consistent with one another. In addition, with RDF you have a host of pre-built tools and APIs that allow you to access the data from many different business vocabularies with little or no change to the underlying technology. May not seem like much, but believe me, this will get you buy-in on new technology at a company faster than whether there's a version tag in the specification. After all, it worked for Oracle.

I'll have more to say on this debate but it's late, and I'm tired. Another day.



Posted by Bb at September 06, 2002 12:53 AM



Comments

Though I'm not really an inhabitant of geekspace anymore, at least in terms of employment, I've got to chime in with you there, at least in terms of markup 'human-readability'. Used to be one of my pet peeves (one of those things that I'd jump up and down about and hurl my feces at others in meetings (which may be why I'm not employed by a tech company any more (heh))) when people tried to sell to the Suits the wisdom of data expressed with XML by talking about 'human-readability'. Poop and pshaw.

Posted by: stavrosthewonderchicken on September 6, 2002 03:07 AM

Have the machines do what we pay them to do... dreamer!

Posted by: Kafkaesquí on September 6, 2002 07:39 AM

Sorry BB, but I have to disagree with your take on myth #2. Let's take RSS as an example and push the complexity to the limits just for the sake of illustration. For example, let's have all the element names not be sensible things like 'title' and 'description' but MD5 hashes of the sensible names. Shouldn't make a difference, should it? It's just read by a machine. But would you want to implement that? Would you like to debug the handling of a file that looked like line noise? Someone, a human, has to write the code that manipulates that file. Some human has to debug that code. And if you are talking about a widely used format like RSS, then that implementation pain has to be endured multiple times, at least once for Perl, once in C#, once in Python, etc. There is a cost to software. Even if that cost can be amortized by building a library and that library is widely used, there is still a cost to be paid. In this particular case you can see that cost in the very slow uptake of RSS 1.0 even after being out two years.

Posted by: joe on September 6, 2002 07:52 AM
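
Joe's thought experiment can be made concrete with a minimal Python sketch. Nothing here comes from a real RSS tool: the function names are hypothetical, and the leading underscore is an added assumption, needed because an XML element name may not start with a digit and an MD5 hex digest often does.

```python
import hashlib
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def obscured_name(name):
    """Replace a sensible element name with an MD5-based one.

    The underscore prefix keeps the result a legal XML name even when
    the hex digest begins with a digit.
    """
    return "_" + hashlib.md5(name.encode("utf-8")).hexdigest()


def obscured_item(title, description):
    """Build one item whose child element names are hashes."""
    parts = ["<item>"]
    for name, text in (("title", title), ("description", description)):
        tag = obscured_name(name)
        parts.append("<%s>%s</%s>" % (tag, escape(text), tag))
    parts.append("</item>")
    return "".join(parts)


def read_title(item_xml):
    """A parser copes fine; it only needs the same name mapping."""
    return ET.fromstring(item_xml).findtext(obscured_name("title"))
```

The machine round-trips this without complaint, which is exactly the point of the example: the cost shows up for whoever has to read `obscured_item("Hello", "World")` in a debugger.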

Joe, if we had APIs that could process this data, then the people who create the applications wouldn't have to build their own functionality.

There are the APIs, there are the tools, that's the power of having a meta-language because one API can be used for many purposes.

Is RDF/RSS really that difficult? There are a shitload of tools that work with it now. I don't think that's the problem.

The problem is that Userland has fought it, undercut the ground it lives on, and not because of RDF being in the mix.

Posted by: Shelley aka Bb on September 6, 2002 08:32 AM

Re: Myth #1: the structure of an XML document provides an implicit ordering. As far as I can tell, no one has ever had a problem parsing an RSS 0.91 file and wondering what order the items came in. Providing an explicit ordering is redundant for this application.

Re: Myth #2: (raises hand). At least, I've had to build an RSS template by hand while I was creating a tool. And I've had to repeatedly tweak my Movable Type template by hand because Ben and Mena, two very intelligent and very well-meaning programmers, repeatedly got it wrong. And then I changed it during the dog days of DIA to only contain 2 items (since my accessibility posts were so long, and only once a day) and I twiddled the wrong bit and suddenly the rdf:Seq and the items lists were out of sync. *This happened to me*. I'm a smart guy. I actually understand RDF (in its pure theoretical form); I've used real RDF parsers (RDFLib.net) to do things that RDF was meant to do. But the only thing I have at my disposal for this application is a TEXTAREA with a bunch of angle brackets and template tags.

Re: Myth #3: RDF is great, RDF is wonderful. I'm agreeing. It's also overkill for this application, and those tools you speak of are not so widely known, widely understood, or widely used. This point was made back in 2000; I linked to it. In two years, the situation has not improved as much as you might think; if anything, it's gotten worse as RSS has become more popular. There are hundreds of private little RSS parsers that are using nothing more than regexp, coded by people working in VB or Flash, or in hosted environments running Python 1.5.2 without the ability to install any new software.

Dave has my full support for taking back the RSS name, and for removing RDF for RSS once again. Maybe it'll stay out this time.

Posted by: Mark Pilgrim on September 6, 2002 08:49 AM
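
The out-of-sync rdf:Seq that Mark describes is the kind of slip a few lines of code can catch. What follows is a minimal sketch, not anything from Movable Type: it assumes `rdf:li` entries use `rdf:resource`, and the sample feed and the function name `seq_item_mismatches` are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

# Hypothetical feed where a template edit left the two lists out of step:
# the Seq still mentions /a, and the new item /c was never added to it.
BROKEN_FEED = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/index.rdf">
    <title>Example weblog</title>
    <link>http://example.org/</link>
    <description>Sample feed</description>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://example.org/a"/>
        <rdf:li rdf:resource="http://example.org/b"/>
      </rdf:Seq>
    </items>
  </channel>
  <item rdf:about="http://example.org/b">
    <title>Post B</title>
    <link>http://example.org/b</link>
  </item>
  <item rdf:about="http://example.org/c">
    <title>Post C</title>
    <link>http://example.org/c</link>
  </item>
</rdf:RDF>
"""


def seq_item_mismatches(feed_xml):
    """Return (URIs listed in the Seq with no item, item URIs the Seq omits)."""
    root = ET.fromstring(feed_xml)
    listed = [li.get(RDF_RESOURCE)
              for li in root.findall("rss:channel/rss:items/rdf:Seq/rdf:li", NS)]
    present = [item.get(RDF_ABOUT) for item in root.findall("rss:item", NS)]
    missing_items = [uri for uri in listed if uri not in present]
    unlisted_items = [uri for uri in present if uri not in listed]
    return missing_items, unlisted_items
```

Running `seq_item_mismatches(BROKEN_FEED)` reports both kinds of drift. A check like this doesn't settle whether the redundancy is worth having, only that it can be verified by machine rather than by eye.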

Bluntly, the RDF community dropped the ball on this by not getting more involved in this issue, but they've been working on the very simplification that people have been asking for.

Mark, between you and Dave and all the other 'names', I haven't a chance on this issue. I'm not going to be heard, and I'm not going to even try. You've all decided among yourselves that you're going to allow Userland to own this specification rather than try to work with a meta-language that can open doors (albeit in the future, it's true, and weblogging is nothing more than "I want it and I want it now").

And the very fact that a few A-list names will arbitrarily make this decision makes the need for this being a team effort with open access to all even more apparent, but that's beside the point.

You all said, why do we need Seq? I explained why: didn't we learn anything from HTML? That we can't rely on the "assumptions" of tool builders? That we need to start adding structure, so we can start having some order out of the chaos? Sure, it takes work now -- the early days of relational databases were a mess. But, as with RDBMS, it has a huge payoff in the end.

But what's the use? No matter what I say, the decision has been made.

You all won't stop me from talking, but what will be the point? Dave stayed up all night to work it, throw some shit in, case is closed.

Posted by: Shelley aka Bb on September 6, 2002 09:00 AM

I can only comment from personal experience. I have written Aggie the news aggregator and I am working on Pamphlet a blogging tool.

I may have missed it but I have not found any such API/tools for RSS/RDF in .Net.

RSS/RDF is slightly more difficult to work with than RSS 0.9x versions, but I chose to utilize it as my base format for Pamphlet regardless because it offered the use of namespaces; that is, if I got stuck on something I could safely add my own elements to RSS 1.0 in a standard manner and be assured that I wouldn't break any aggregators. Now if I could choose between RSS 1.0 and an RSS 2.0 as outlined by Mark Pilgrim, then I would hands down choose 2.0. Why? Because I get no benefit from RDF. It is extra markup I add to my XML that gives me no benefit at the moment. Now if Google were to start indexing RDF/RSS 1.0 and allow querying on that RDF with all the semantic goodness that RDF promises, then I could see the benefit and would probably pick 1.0. But as it stands today I get no benefit from the RDF.

We're all very familiar with Dave's antagonism toward RSS 1.0, that is very true, but I think that if the RSS 1.0 spec had looked like Mark Pilgrim's 2.0 proposal, no amount of ranting and raving by Dave would have stifled its growth. I am not defending Dave Winer here; I am saying that he is just one person and could not be singly responsible for keeping the RSS 1.0 adoption rate so low.

Posted by: joe on September 6, 2002 09:06 AM
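
The namespace point in Joe's comment (adding your own elements without breaking aggregators) can be sketched in a few lines of Python. This is only an illustration: the `pam` prefix, its example.org namespace URI, and the `mood` element are invented here and are not Pamphlet's actual extensions.

```python
import xml.etree.ElementTree as ET

NS = {"rss": "http://purl.org/rss/1.0/"}

PLAIN_ITEM = """\
<item xmlns="http://purl.org/rss/1.0/"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      rdf:about="http://example.org/1">
  <title>First post</title>
  <link>http://example.org/1</link>
</item>
"""

# Same item plus an element from a made-up extension namespace.
EXTENDED_ITEM = """\
<item xmlns="http://purl.org/rss/1.0/"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:pam="http://example.org/hypothetical-extension/"
      rdf:about="http://example.org/1">
  <title>First post</title>
  <link>http://example.org/1</link>
  <pam:mood>cranky</pam:mood>
</item>
"""


def core_fields(item_xml):
    """Read only the core RSS 1.0 elements, looked up by qualified name."""
    item = ET.fromstring(item_xml)
    return {name: item.findtext("rss:" + name, namespaces=NS)
            for name in ("title", "link", "description")}
```

Because the reader asks for elements by namespace-qualified name, `core_fields(EXTENDED_ITEM)` and `core_fields(PLAIN_ITEM)` return the same result; the extension element is simply never looked at.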

Joe, RSS 1.0 doesn't have that low an adoption rate. It probably has about as much as the 0.9x. However, within the weblogging sphere, Dave wields disproportionate influence, enough to keep this effort from becoming a team effort rather than a Userland effort. Or enough to keep the split going.

However, as I said, what good does it do me to even continue this discussion -- the decision has been made. I'm just wasting everyone's time. No one's really willing to debate the issue, to really look at how much actual complexity is involved with having RDF in the mix, to see what it can buy, to look at the tools, to compare. Maybe, just maybe, think about it a bit before blithely just saying "Well, this is being returned to Dave. Dave owns it. I support Dave."

Posted by: Shelley aka Bb on September 6, 2002 09:11 AM

Bb, I don't think it's quite fair to dismiss Mark and Joe's arguments by calling them "A-listers."

Like Mark, I disagree with your Myth #2. XML *can* be human-readable, if it's designed that way. It doesn't take but fifteen minutes to explain enough XML syntax to let a person read well-designed XML. That is a *blessing*, and should not be lightly thrown away.

RDF is by and large not human-readable -- not by this human, anyway. Part of that is not RDF's fault -- it's the namespaces morass. Part of it *is*, though, and I find myself in considerable sympathy with those who bash RDF on this score.

Is RDF harder to read than it needs to be? I frankly don't know. I can't read it well enough to suggest redesign possibilities. Are they working on it? You say yes -- great! I look forward to the next version. Until then, though, is it worth shoving the non-human-readable version at humans? Maybe RDF needs to wait for RSS 3.

I sympathize with you on this. I really do. The OEBF has been considering similar issues for some time, and the arguments I'm seeing here on both sides are eerily familiar.

Still. In this case, I do want something I can manage with a text editor. RDF gets in the way of that, at least for me, and I'm just plain goofy about the kind of thing I'll manage via text editor. I can only imagine how weird this all seems to someone less enamored of text editors than I.

Posted by: Dorothea Salo on September 6, 2002 09:23 AM

This is where I am getting my facts about the adoption rate of RSS 1.0:
http://www.syndic8.com/stats.php?Section=feeds#RSSVersion

If you have other sources I would like to see them.

I really tried to work on the complexity of RDF in RSS. I read the specs and tutorials, etc. My little example on my web page even validates as RDF. After all that, my little example is still too complex, in my opinion, and personally I think the XML serialization of RDF is broken. I'll wait for your book to prove me wrong ;-)

About Dave, please go back and re-read my messages. I have consistently referred to Mark Pilgrim's proposal. I, too, would like to see the RSS 2.0 spec be an open process and not 'owned' solely by Dave Winer. Please don't just throw up your hands and walk away now.

Posted by: joe on September 6, 2002 09:26 AM

Wow, BB you called me an A-lister?

Posted by: joe on September 6, 2002 09:29 AM

Reducing Joe and Mark to A-Listers is wrong, but so is a statement such as "Dave has my full support for taking back the RSS name, and for removing RDF for RSS once again. Maybe it'll stay out this time."

This just came up this week and we've not even had a chance to discuss this. Past is past and there are new players involved now, new issues, new everything. Now's a time when we should say, well, let's talk about this.

I felt like I just started talking yesterday, and the door was slammed in my face with that one statement. I don't have the influence to try and push this door open.

And I'm not sure I even care anymore.

Posted by: Shelley aka Bb on September 6, 2002 09:29 AM

Believe me, it is with no small amount of trepidation that I chose to support Dave in this matter. I know exactly how politically charged the issues are. Both the technical issues (RDF or not), and the political issues (naming). I've done a lot of catch-up reading in the past few days, and based on what I've read, I have come to the conclusion that the introduction of RSS 1.0 was handled incredibly poorly. I read Aaron explaining the issue to Dave by saying "RSS is RDF now, isn't that great?" and explaining to others who complain about the added complexity that they don't need to worry themselves with all that and that they should just get better tools. I read Ken admitting that it might have been a good idea to have a working group *before* the proposal went public, instead of after. I see lots of people other than Dave wondering publicly why RSS 1.0 couldn't just be called XRSS (a cooler name anyway) or something, and trying to do something about it, and failing.

RDF, in and of itself, is not a virtue. It's what it allows you to do that makes it good or bad. In the two years that RSS 1.0 has been around, I have yet to see the killer app that required it to be RDF. Modules? Yeah, modules rock. I use RSS 1.0 for the modules.

I see an analogy to CSS. CSS, in and of itself, is not a virtue. But I finally convinced people (even Dave) that CSS was a Good Thing by showing how having good semantic markup actually matters for some people (for accessibility reasons) and therefore you need CSS to munge your markup back into whatever you want it to look like. I have not yet seen a similar argument for RDF. (Doesn't have to be accessibility-related, of course; I haven't seen *any* "killer app" for RSS 1.0 that required it to be RDF.)

Posted by: Mark Pilgrim on September 6, 2002 09:38 AM

Actually, joe, I think that 14 percent is pretty good considering the early base that 0.9x had from the My Netscape days, and Userland's push of same. That 14% came in bucking a pre-established trend, and that ain't bad. I also have a strong feeling that this number will be changing in the next six months.

Regardless, what I say doesn't matter now. The decision has been made. This is incredibly frustrating.

Posted by: Shelley aka Bb on September 6, 2002 09:40 AM

Well, it's made for now. Doesn't mean it'll *stay* made.

Repeating myself somewhat -- I like RDF, what little I understand of it; it's just the XML serialization that strikes me as horrendously messed up. *This is fixable*, and being fixed.

RSS 2 (or whatever it ends up being called) is highly unlikely to be the end of the line. RDF isn't banished forever (especially not by a throwaway line from Mark). I can easily envision an uptake in future development, when RDF tools are ubiquitous and Bb's RDF book makes it all clear to everyone, even stupids like me.

If I were you, BB, I'd work on anticipating that day... and perhaps on some XSLT stylesheets to move RSS 2 to an RDF-compatible form, insofar as possible. (*Is* it possible?)

But I'm just a big stupid, so don't mind me.

Posted by: Dorothea Salo on September 6, 2002 10:05 AM

Dorothea, if RDF is pulled out now, and this new 'RSS 2.0' continues the direction it's going, that will most likely be the end of RDF/RSS -- you don't need two specifications for syndication.

As for the book, boy I know you and Joe mean well with this, and that was a kindness, but this reference is about the same to me as Mike calling you sexy, Dorothea. This wasn't the book. This was about me genuinely believing that ultimately, the community will benefit by the extra overhead of RDF now, for future gain.

If I can't marshal effective arguments for this, then this is my problem for not communicating my interest effectively. Okay, this happens.

My frustration is that I don't know if my arguments are 'poor', or if decisions were made and no matter how effective my arguments could be, nothing I, or anyone, would say will work at this point to change minds.

If the reason is the former, and my writing isn't at least opening the door to further discussion because I'm not providing effective counter-points, then I'm a hell of a person to write the RDF book, aren't I?

And if it's because of the latter, then what does that say about this whole process? Expediency over quality?

I hope this explains my frustration, if nothing else.

Posted by: Shelley aka Bb on September 6, 2002 10:28 AM

Sorry, Bb, didn't mean to wound. I will point out, though, that it's hard to convince people of something when they don't have enough knowledge to follow you. Ergo not succeeding at this task of convincing says nothing about how successful your book will be; two very different writing tasks starting from two very different places.

As for expediency vs. quality -- I'm the wrong person to tackle that one. I don't understand the quality issues, and I wasn't involved in the decision. (If it is a decision. I'm not *quite* sure it is.)

As for two specs -- what I'm saying is, if we're too blind right now to see the benefits of RDF, that doesn't imply we will remain blind forever. No two specs, therefore -- just evolution toward RDF of the existing spec.


But perhaps this isn't possible? I wouldn't know.

Posted by: Dorothea Salo on September 6, 2002 10:50 AM

Following up on Joe's point, my RSS 2.0 prototype is what I wanted RSS 2.0 to look like. Clean core namespace -- title-link-description, and maybe language -- and everything else in modules. Others agree with this philosophy:
http://w3future.com/weblog/2002/09/06.html#a123

I advocated this philosophy to Dave Winer in email last night, and he replied that he would prefer to keep the existing (albeit optional) elements in the core namespace. I feel this is unnecessary, and I agree with Sam Ruby that this represents a missed opportunity.

Posted by: Mark Pilgrim on September 6, 2002 10:51 AM

For reference, Sam's post is here: http://www.intertwingly.net/blog/?entry=804

Posted by: Mark Pilgrim on September 6, 2002 10:53 AM

Lies, damn lies and statistics. Wasn't that Twain's saying? The format stats on Syndic8 may be misleading. There are a ton of feeds from aggregators like moreover and newsisfree. There are also a significant number of dead or broken feeds in there. Those numbers are not as clear as you might think. They should not be used to argue the validity of one protocol over another, as it's more complex than it appears.

All these arguments about human-readability get on my nerves. Sure, it's handy to hack out an XML document by hand. But the amount of stuff that gets left out, like oh say, encoding, can have significant impact on the target system. Refusing to use a tool designed to manage the creation of content smacks of refusing to use a CRT because you liked console switches. Please, spare us. But at the same time expecting to be able to do XSLT on what might be a resource impaired reader is also being extreme. That cell phone or poorly written proprietary desktop app might not have the ability to use transforms. So let's strike a balance somewhere in between.

The RDF folks, however, need to do a much more convincing pitch. Show me the Visicalc for RDF and I'll be prepared to believe.

Posted by: Bill Kearney on September 6, 2002 11:48 AM

I didn't see anyone refusing to use tools, not even me. I *do* see people who want to be able to read, understand, and occasionally reproduce by hand the output of tools. I'm kinda ornery that way. Probably my problem as much as anyone else's.

The XSLT stylesheet I suggested was meant at least as much for human as machine consumption. Such a thing would help me, at least, understand "oh, that corresponds to that, and this goes there -- gee, it starts to make sense now."

Plus it would be a gimme for toolmakers who want to hedge their bets. Hardcode one and XSLT to the other. Or don't even XSLT -- just use the stylesheet as a guide to hardcoding your own transform.

Posted by: Dorothea Salo on September 6, 2002 11:55 AM

Okay, channeled my frustration into useful actions. I hope. Please look at the next posting I just did (http://weblog.burningbird.net/archives/000516.php).

Doesn't this at least open the door for talk? From this new posting, am I being unreasonable? Do I have a voice in this?

Posted by: Shelley aka Bb on September 6, 2002 01:41 PM

