September 23, 2002
Who is your audience, and what are you trying to accomplish?
Just posted the following over at RSS-Dev (edited to remove typos):
There seem to be three separate threads running along the lines of "Who are we and what are we trying to accomplish", mixed in with proofs and justification of keeping RDF in the mix. How can the energy expended on these threads be coalesced into a determined course?
I asked the question, here and elsewhere, who is your audience? This isn't marketing or make work. This is a genuine attempt to understand what this group hopes to accomplish other than working with cool technology for the sake of the technology. What is the business of this group?
If RSS, past and current, is based on providing syndication and aggregation feeds, and nothing more, then I agree with those that say RDF adds nothing to the mix, and not because RDF adds complexity -- the reason is because the business of RSS isn't necessarily compatible with the business of RDF.
In the last few weeks, Phil Ringnalda has been working on an application to process RSS 1.0 files and combine this with FOAF to provide a sophisticated interface allowing us to find who has posted or commented on what topic. Yesterday he hit what is probably the core difference between the business of RSS and the business of RDF -- the fact that tools generate labels for blank nodes, and that these labels will vary each time the same file is parsed. (See http://philringnalda.com/archives/002327.php.) RDF/RSS (RSS 1.0) has blank nodes.
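To make the blank node problem concrete, here is a minimal sketch in Python. It is not a real RDF parser, and the `parse` function, the `_:genid` label scheme, and the sample data are all made up for illustration. It only mimics the behavior described above: each parse invents a fresh label for the unlabeled node, so a naive merge of two parses of the same file ends up with two copies of that node.

```python
import itertools

# Hypothetical toy "parser": real RDF tools differ. This only mimics the
# habit of inventing a fresh label for an unlabeled (blank) node per parse.
_fresh = itertools.count(1)

def parse(statements):
    """Turn (subject, predicate, object) tuples into a set of triples.

    A subject of None stands for a blank node. All None subjects within
    one call share one generated label, as if they were a single node.
    """
    label = f"_:genid{next(_fresh)}"
    return {(label if s is None else s, p, o) for s, p, o in statements}

# The same (made-up) document, parsed twice.
doc = [
    (None, "foaf:name", "Example Author"),
    (None, "foaf:weblog", "http://example.org/blog"),
    ("http://example.org/rss", "dc:title", "Example feed"),
]

first = parse(doc)
second = parse(doc)
merged = first | second  # naive merge: plain set union

named = [t for t in merged if not t[0].startswith("_:")]
blank = [t for t in merged if t[0].startswith("_:")]
print(len(named), len(blank))
```

The triple with a real URI as its subject collapses into one entry on merge. The blank node triples do not, because nothing tells the store that the two generated labels refer to the same thing.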
RDF is a meta-language for describing items that exist in such a way that this data can be processed with the same set of tools and combined with a great deal of confidence that this mergence results in a valid pool of rich data. It is literally a markup version of the relational data model, and as such, is extremely useful and necessary to help with the chaos that XML created. However, there is an implied persistence to the items described with RDF, the same as there is with relational databases. Data may change and be removed, but there is no temporal self-destruct attached to the items.
RSS, as the majority of those who use it see it (the users, not the tool developers), is a syndication feed -- nothing more than recently updated items that can be polled and aggregated. There is no implied persistence. In fact, the business of RSS is based on impermanence.
This is a major difference in 'business' between the two concepts. From a database perspective, this is equivalent to using an RDBMS when a flat file of comma-delimited data is all you need.
If this group wants to continue providing a specification that defines syndication feeds, then it needs to consider that RDF not only doesn't buy the group anything -- it can harm the tool developers that use the spec. (Not to mention that trying to use RDF inappropriately can actually negatively impact the acceptance of the RDF specification.)
If, however, this group sees that what they're working on transcends throwaway syndication feeds, then it needs to formally define exactly what the business is _before_ trying to create a spec that implements it. Hence my questions: who is your audience and what are you trying to accomplish?
Specific instances of technology aren't an answer to these questions. This isn't answered by, "Well, we'll just continue as is and use XSLT to handle any problems in the future" or "We'll use modules". If you find yourself answering these questions by referencing technology, then either you're missing the point, or (more likely) I'm doing a piss-poor job of explaining myself.
What is the problem this group is trying to resolve? What is the benefit this group is trying to provide that no other technology or specification provides? Who is your audience? Not the tool developers -- people don't write tools for no reason. Who are the consumers of the tools developed?
What are you trying to accomplish?
This understanding of the basic business goes beyond a name, though the name of 'RSS' is drastically adding to the problem by forcing a type of business on this group that this group really doesn't want, as well as adding an element of competition that is both unnecessary and harmful.
Perhaps this group really isn't interested in throwaway syndication feeds. Perhaps this group is interested in finding ways of describing publication units that may or may not be smaller or bigger than an individual web page, and a side benefit of this is that the data can be used for aggregation purposes. Or not. I don't know -- the group hasn't told me what the business is.
If you continually have to justify the use of something over and over again, either you're wrong, or your audience is wrong. In either case, you need to re-focus your efforts, and either find a different audience, or stop beating a dead effort.
Posted by Bb at September 23, 2002 07:58 AM
My thinking about the intersection is this. When I create a persistent RDBMS object, like say an article, or a comment like this I may want to look up later, or a word document for a group of collaborators, and I want to publish it to its audience with some description, I use metadata (possibly automatically generated) to describe it, and provide this metadata in a standard form (RDF).
Now this 'publishing event' has occurred. And I want to popularize it to some community, either a group, or the public at large. I may do this on a per-item basis, or on an hourly or daily basis, collecting the objects I publish. So, I need a format which publicises these objects.
For better or for worse, it seems that RSS is taking up the role of being the event bus. It says nothing about the ephemerality of the objects themselves, but rather that they were just published or modified.
I agree with you, Rahul, that RSS really is tantamount to an event bus. But normally in these circumstances, you want to keep the data small and simple. Normally, you don't attach the entity meta-data that pinged the event to the event itself -- only enough information to determine the event with the ability to re-create the event.
I think you're on to something here.
Well, looking at RSS as it is now, it seems to me that we've got an odd mixture of persistent and non-persistent data. Sure, individual items in a blog don't persist -- but the blog does, the blog's author does, the main blog URL does, etc.
My question is, is there a purpose/use/case for syndicating the permanent blog-related info as well as the individual items? If so, RDF looks pretty good for that, but I don't know that RSS is the place for it. If not... I'm where Shelley seems to be going -- why bother?
Maybe there are two efforts going here -- syndication of ephemeral content, and publication/aggregation of non-ephemeral metadata.
Or maybe I'm blowing smoke out of somewhere unmentionable.
I look at it this way:
An RSS feed is a resource.
When you download (HTTP GET) the feed, you get an instance of that resource (a snapshot at that moment in time).
When you interpret that instance as RDF, it (the instance) asserts the truth of a set of statements.
If you download the feed again, you might get a different instance which could assert different triples, but even if it asserts the same triples, they are being asserted in a different context. The triples which the previous instance asserted are still asserted in their context.
I imagine that's as clear as mud, so let me try an example. Let's say I have an RSS feed at <http://example.org/rss>. You download it at 10:00 and note the information it says, such as its title being "Example feed". This means "At 10:00, <http://example.org/rss> said 'the title of <http://example.org/rss> is "Example feed".'" You probably believe it, and add "The title of <http://example.org/rss> is 'Example feed'." to your database.
Let's say you download it again at 11:00, but now it says that its title is "Example RSS". The general interpretation is "At 11:00, <http://example.org/rss> said 'the title of <http://example.org/rss> is "Example RSS".'" You might choose to change the value of the title of <http://example.org/rss> in your database to reflect this new assertion.
Even though the title asserted by the feed changed, the general statements "At 10:00..." and "At 11:00..." are permanent and unchanging.
Context is one of the big open issues with RDF right now. I don't think there is any consensus on how best to represent it, but this doesn't prevent people from working out their own solutions or deciding what statements to believe.
I don't see a conflict between RSS and RDF. An instance of an RSS 1.0 feed represents a set of statements which the instance asserts to be true. I am free to believe or disbelieve it. The most likely decision is to believe the most recent instance, discard the triples from the earlier ones, and ignore the contextual issues.
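The 10:00 and 11:00 example above can be sketched in a few lines of Python. This is only one possible way to model it: the `Assertion` record, its field names, and the `believed` function are invented here, and it is not a proposal for how RDF should represent context. The log of what was said, and when, is never altered. The set of current beliefs is derived from it by trusting the most recent instance.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Assertion:
    """One statement, plus the context it was asserted in (hypothetical model)."""
    source: str     # the feed instance that made the statement
    time: str       # download time as zero-padded "HH:MM"
    subject: str
    predicate: str
    value: str

def believed(log):
    """Keep, per (subject, predicate), the value from the latest assertion."""
    current = {}
    # Zero-padded "HH:MM" strings sort in chronological order.
    for a in sorted(log, key=lambda a: a.time):
        current[(a.subject, a.predicate)] = a.value
    return current

FEED = "http://example.org/rss"

# The permanent record: what the feed said, and when it said it.
log = [
    Assertion(FEED, "10:00", FEED, "title", "Example feed"),
    Assertion(FEED, "11:00", FEED, "title", "Example RSS"),
]

beliefs = believed(log)
print(beliefs[(FEED, "title")])
```

Both entries stay in `log` no matter what the feed says next. Only `beliefs` changes, which is the "believe the most recent instance" policy in its simplest form.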
You wrote about this in your recent article RDF: As Simple as A, B, C: "RSS captures a rich set of information about a specific web page or weblog posting ... What a pity to put this into a form that will only be thrown away." At the time, you mentioned that you store that information in RDF format in the header of your archive pages, which, as I wrote in my blog earlier today, is a much more sensible use of RDF than putting it into an ephemeral syndication feed. As you've said today, RDF is only relevant when the information being syndicated has some permanence.
Phil writes, "As you've said today, RDF is only relevant when the information being syndicated has some permanence."
I disagree. Consider movie listings, or job adverts. Both are (relatively) transient forms of information dissemination. On the scale of minutes, hours, days, maybe weeks. Or last minute cheap flight offers. In all cases, the core RSS vocabulary of channels, items and their titles and links is enough to convey a basic blurb about the events and associated things. But additional vocabulary could in each case be used to make sure interested parties are notified of the associated short-term opportunity (see a movie; apply for a job; get a cheapo holiday) before the chance disappears. RDF is as applicable to this task as to the description of long-lived resources. Perhaps more so, in that precise description and filtering is all the more urgent when time is of the essence.
Dan
Dorothea, I think you have it dead on. And this also goes to what Phil said about me attaching my RDF/RSS to my individual posting.
Dan, you've gone beyond simple syndication/aggregation when you talk about jobs, tickets, movies. This goes back to my core question -- what is the purpose of this group? If it's syndication/aggregation of web content, then you don't need the power of RDF. But you're not talking syndication/aggregation anymore. You're talking Rahul's event bus. And again, you don't attach entity meta-data to an event.
RDF is most likely the solution -- now what is the problem?
Anyone care to phrase it in "We need a..." plain language paragraph?
Why isn't the RDF model simply acceptable as the substrate upon which multiple meta-data feeds are based, with context (and temporality) left in the domain of the RDF-consuming application? I don't seem to see the point of objecting to simple syndication and aggregation via RDF due to the constraints of the application consuming RDF triples.
Shelley,
The metadata one attaches depends upon the application. In the syndication of an article, one needs stuff like title and description, but say I was publishing a video file (enclosure in radio terms). Then I'd put in a URL to it, if you like, a sort of xlink. And I may want to add metadata such as actors, directors, basically whatever allows the client app to categorize as necessary.
Namespaces in modules define the metadata a community needs to summarize the object.
But with metadata, RSS would seem to be a rather interesting event bus, one that transports 'references' or 'glorified links with attributes' around.
Do you really want to transport that stuff in a newsfeed, though? Presumably it exists in a permanent data store somewhere. Why not just link to it?
Maybe not in a newsfeed, but in a datafeed (or a metadatafeed or a resultsfeed), sure. And maybe yes in a datafeed if those small snippets of data were useful to the application. Perhaps just based on locality of reference: if I was syndicating a movie review I'd want to ship along with that review (or perhaps some intermediary service would inject into the syndication feed) quick artist/director bios or local playing times.
It seems that most informational websites have two kinds of data, news and archives, each of which can contain both native content and links to content elsewhere on the Internet. (This model deliberately leaves out web applications except insofar as search and other information-retrieval applications may be concerned.) Weblogs and RSS feeds address the whole "keeping the site fresh with news and updates" conundrum but they don't help with the other side (accumulating a useful and retrievable archive of information that is visible to the rest of the Internet). If RDF can help with this other side, we may be able to get beyond blogrolls and "search my archives."
(sorry, I may be using an email quoting idiom in a weblog context...:)
"""Dan, you've gone beyond simple syndication/aggregation when you talk about jobs, tickets, movies. This goes back to my core question -- what is the purpose of this group? If it's syndication/aggregation of web content, then you don't need the power of RDF."""
I don't think this is as obvious as you suggest. It all hinges on what we mean by 'web content'. I have in mind ordinary Web sites, not fancy ecommerce Web service systems. The sort of Web content that lots of people are right now busy clicking on.
The examples I give are certainly intended to show syndication and aggregation of news items from very mainstream Web content sources. The shift is that these are news items from ordinary Web sites (that are about, in my examples, jobs, movies, etc.), and not from 'news' websites (journalists) nor from the Weblog/blog and url-sharing community. This is all very ordinary mainstream web content, and we're talking about tech for publishing, sharing, merging, aggregating, hoarding, using. The problem spaces don't seem to have a crisp dividing line, unless we want to seek terminology that separates (for some reason) journalism and weblogs from the rest of the information-sharing Web world.
Things aren't so black and white: we _could_ share all this 'Web content' with plain old HTML. The more structure we impose (with XML, namespaces, RDF, modules...), the greater the opportunities for automation, sophistication etc when dealing with this content. Less work for humans (we hope). To cast things as 'this isn't _xyz_ content therefore RDF is too powerful / irrelevant' seems overly boolean. I believe RDF is yet to prove its value in the app areas I sketch, but to dismiss RDF because it is 'obviously' irrelevant to 'syndication/aggregation of web content' seems to be taking a somewhat restricted notion of Web content. Lots of Web content is in areas such as those I sketched (see http://dmoz.org/ for ~4 million sites).
Hmm, maybe we really are just talking past each other thru using 'syndication' and 'aggregation' in different ways. Do you have a particular definition in mind?
This thread got me thinking about what makes RSS useful in the first place. What is RSS about at its core?
My understanding now is that the key feature of RSS is the list of items in the channel. Everything else--textareas, images, titles, descriptions--is secondary (but useful). Boiled down to its essence, the essential part of RSS is rss:items.
With that in mind, I threw together this quick description of RDF Channel, which manages to capture the semantics of RSS in five terms, four of which are optional. In fact, RSS can be viewed as an instance of RDF Channel.
Food for thought.
I may be missing something here, but where does it say that RSS entries aren't permanent? I've always assumed that the entries in an RSS (or RDF) feed describe the entities referred to by their *permalink* URI.
RSS is a set of Web links. Like any set of links, some of them disappear over time, but ideally they *shouldn't*.
To Dave Menendez, I agree that the essence of an RSS feed is its items, but RSS does a much better job of capturing itself in that regard than RDF does. Keep it simple.
To Danny, I agree. I expect the items in an RSS feed to be persistent. But sometimes RDF advocates are making a much "deeper" statement, too deep for my little mind to comprehend. This might be one of those times.
To Dave Winer, simplicity means different things to different people. The RDF data model is simpler than XML, for example. (RSS in XML is simple because it ignores the XML information it doesn't need, which is most of it.) From your link, I'm guessing that you're referring to authoring simplicity. Well, RDF Channel (which for now is just a thought experiment, by the way) seems just as easy to author in its most basic form, and someone could make a program akin to HTML Tidy to help out those unfamiliar with the RDF/XML syntax.
As for persistency, I think the confusion stems from a lack of experience in combining RDF graphs. (I'm talking about collective experience, not about individuals. There doesn't seem to be any good writing about methods and algorithms and such.) A single RSS 1.0 instance by itself is pretty easy to model, but if you want to add that graph to an existing data store you run into a lot of issues you might not have considered.
I haven't seen Shelley's original question answered in this thread...
what is it you are trying to accomplish?
put that another way...
what is the problem you're trying to solve?
Try to write down that question... without giving a technical solution.
I'd love to hear people express this.
For example...
"We need a way to determine when new content has been posted to a site and to know, via a summary and link, what that content is about and where it is located. We need to do this in a way that's on demand."
To me that's what earlier versions of RSS have attempted to tackle. Maybe that's not what RSS *is* (and that's a very difficult question) - but to me that's the problem it's trying to solve.
Now what is it? Try and express it without suggesting the solution.
As I see it, RSS is a way to associate a resource (called a "channel") with a list of resources. This list changes over time, and the contents of the list at any given time have the connotation of being "current".
There's also a whole bunch of stuff that lets you give additional information about these resources, but the list is the key.
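Taken at face value, that description is small enough to sketch in Python. This is a toy under stated assumptions: the `Channel` class, the `new_items` function, and the example.org links are invented for illustration, and nothing here comes from any RSS specification. It models a channel as a resource with a list of item links, and treats "current" as whatever the latest snapshot lists.

```python
from dataclasses import dataclass, field

@dataclass
class Channel:
    """A channel resource and the item links it currently lists (toy model)."""
    link: str
    items: list = field(default_factory=list)  # item links, newest first

def new_items(previous, current):
    """Links in the current snapshot that the previous snapshot did not list."""
    seen = set(previous.items)
    return [link for link in current.items if link not in seen]

# Two snapshots of the same (made-up) channel, polled at different times.
earlier = Channel("http://example.org/rss",
                  ["http://example.org/2", "http://example.org/1"])
later = Channel("http://example.org/rss",
                ["http://example.org/3", "http://example.org/2"])

added = new_items(earlier, later)
print(added)
```

Note that item 1 has dropped off the later list. That says only that the channel no longer calls it current, not that the item itself has gone away.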
Shelley,
I sent a response to your mail (see http://groups.yahoo.com/group/rss-dev/message/4005); I assume the noise level must have made that vanish...
Hi Dave. That's a statement of what RSS is.
I want to know the problem it is attempting to solve. I think that was Shelley's query as well.
That's a subtle but, I think, very important difference.
>As I see it, RSS is a way to associate a resource (called a "channel") with a list of resources. This list changes over time, and the contents of the list at any given time have the connotation of being "current".
What is the problem being solved?