<Glazblog/>

Microformats 2.0

Summary: current microformats are a "quick and dirty" way of doing things. We can do better.

Microformats, hAtom, Webslices... A set of nice extensions to the HTML4/XHTML1 language that all share the same problem: they tamper with the value set of the class attribute. Before Microformats, that value set was unrestricted, and a web author could pick ANY class for his document, including hfeed, entry-title or entry-content... That's not possible any more. I said it in the past and I say it again: despite my huge interest in microformats, that's bad design. As the HTML5 spec says, "Authors may use any value in the class attribute".

That's bad design because a web site author (let's take the example of the hAtom-enabled AOL Sports) has to tweak his content template to enable microformats, even though the template often already contains everything needed for hAtom, only with different class names. Tweaking a content template is always dangerous: it can introduce errors that make the web site's data unavailable to visitors and customers.

A much better solution is to tweak only the metadata of the document, namely the contents of the HEAD element. So what do we already have that could enable some sort of Microformats 2.0?

  • the META element is all we need to declare a CSS selector targeting hfeeds or hentrys, for instance:
    <meta scheme="hAtom" name="hentry" content="body .hentry"/>

    I think this is clear enough... We're declaring that, for the hAtom microformat, hentry elements are the elements matching the CSS3 selector in the content attribute, namely elements contained in the BODY of the document and carrying the class hentry.

    One could perfectly imagine a much simpler declaration in which all entries are DL elements, the first DT element represents the entry title and all following DD elements represent the entry contents... No need for a class here.

    If you really want to be clear and valid, you need to add a profile attribute to the HEAD element of your document. Oh well.

  • the content attribute above can also be scoped:
    <meta scheme="hAtom" name="hentry" content="body .hentry"/>
    <meta scheme="hAtom" name="entry-title" content=".entry-title"/>

    In that case, the hAtom format could say that the entry-title definition is valid only in the scope of elements matching the hentry definition. In other words, entry-title elements are then the elements matching the selector body .hentry .entry-title

  • it's trivial for a web page to retrieve the elements matching such a selector using the W3C Selectors API spec. I urge all browser vendors to implement that spec as soon as possible. It's also trivial to style them dynamically: just reuse the selector from the content attribute to form a style rule, and add that rule to the document through the DOM.
  • since it's CSS-based, it's also trivial to add XBL or XBL 2.0 bindings to those elements, from chrome or page JS.
  • microformats do NOT have to care about each other any more... Today, each new microformat based on the class attribute introduces new "predefined", meaningful values for that attribute. hAtom uses hentry, so no other microformat can use it, and that's bad design.
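Here's a minimal sketch, in page JavaScript, of the scoping, retrieval and styling ideas above. The names resolveSelectors, findAll and highlight are hypothetical, and the scoping rule (every property is looked up inside the root property, joined with a descendant combinator) is an assumption about how a format like hAtom could define it, not something any spec says. It also assumes each declared selector is a single selector rather than a comma-separated group, since plain concatenation would mis-scope a group. querySelectorAll is the W3C Selectors API call; insertRule is the standard CSSOM method on a style sheet.

```javascript
// Hypothetical helpers, not part of any microformats spec.
// Assumption: each declared selector is a single selector (no top-level
// commas), so a plain descendant combinator is enough to scope it.

// decls maps property names to the CSS selectors declared in META content
// attributes, e.g. { hentry: "body .hentry", "entry-title": ".entry-title" }.
function resolveSelectors(decls, rootName) {
  const root = decls[rootName];
  if (root === undefined) {
    throw new Error(`no declaration for root property "${rootName}"`);
  }
  const resolved = { [rootName]: root };
  for (const [name, selector] of Object.entries(decls)) {
    if (name !== rootName) {
      // Scope the property inside the root: "body .hentry .entry-title"
      resolved[name] = `${root} ${selector}`;
    }
  }
  return resolved;
}

// Browser only: retrieve the matching elements with the Selectors API.
function findAll(doc, selector) {
  return Array.from(doc.querySelectorAll(selector));
}

// Browser only: reuse the very same selector in a style rule added via the DOM.
function highlight(doc, selector) {
  const style = doc.createElement("style");
  doc.head.appendChild(style);
  style.sheet.insertRule(`${selector} { outline: 2px dashed orange; }`, 0);
}
```

With the class-free DL variant mentioned earlier, resolveSelectors({ hentry: "body dl", "entry-title": "dt:first-of-type" }, "hentry") gives "body dl dt:first-of-type" for entry-title, and that string can be passed unchanged to both findAll and highlight.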
Let's take an example, an in-page webslice. Here's the original document, compliant with the current Microsoft webslices "specification":
<html>
<head>
<title>my own in-page webslice example</title>
</head>
<body>
<h1>some dummy title we don't want to see in the webslices</h1>
<div id="item1345" class="hslice">
<div class="entry-title">Canon EOS 40D</div>
<div class="endtime" title="2008-03-15T11:34:00">Expires tomorrow same time</div>
<dl class="entry-content">
<dt>pixmania.fr</dt> <dd>989,00 EUR</dd>
<dt>ldlc.fr</dt> <dd>1147,20 EUR</dd>
</dl>
</div>
<div id="item7398" class="hslice">
<div class="entry-title">Canon EOS 5D</div>
<div class="endtime" title="2008-03-15T11:34:00">Expires tomorrow same time</div>
<dl class="entry-content">
<dt>pixmania.fr</dt> <dd>2099,00 EUR</dd>
<dt>ldlc.fr</dt> <dd>2399,00 EUR</dd>
</dl>
</div>
</body>
</html>

Please note how the title attribute is used on endtime entries. I hate it: browsers will show that ISO date as a tooltip when the pointer hovers over the element. Anyway, the code above could become something like this:

<html>
<head>
<title>my own in-page webslice example</title>
<meta scheme="MS-webslice" name="hslice" content="div[id^='item']"/>
<meta scheme="MS-webslice" name="entry-title" content="div:first-of-type"/>
<meta scheme="MS-webslice" name="endtime" content=".expiration"/>
<meta scheme="MS-webslice" name="entry-content" content="dl"/>
</head>
<body>
<h1>some dummy title we don't want to see in the webslices</h1>
<div id="item1345">
<div>Canon EOS 40D</div>
<div><span class="expiration" title="2008-03-15T11:34:00">Expires tomorrow same time</span></div>
<dl>
<dt>pixmania.fr</dt> <dd>989,00 EUR</dd>
<dt>ldlc.fr</dt> <dd>1147,20 EUR</dd>
</dl>
</div>
<div id="item7398">
<div>Canon EOS 5D</div>
<div class="expiration" title="2008-03-15T11:34:00">Expires tomorrow same time</div>
<dl>
<dt>pixmania.fr</dt> <dd>2099,00 EUR</dd>
<dt>ldlc.fr</dt> <dd>2399,00 EUR</dd>
</dl>
</div>
</body>
</html>

It's totally different and yet it's still the same thing. It's much cleaner and more powerful. It's safer for the web author, who probably has nothing to change in his content template and keeps total control over the values of his class attributes and over the structure of the document. Multiple entry-content elements don't all need to be tagged with a class; they can be identified by their context. It's also easier to mix multiple microformats in the same document.
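To illustrate that last point, here's a minimal sketch of a page script consuming META declarations like the ones in the webslice example. groupByScheme, readMetas and extractSlices are hypothetical names, not part of the Microsoft webslices format or of any microformats spec, and the record shape (id, title, endtime) is likewise an assumption. Because each consumer only reads the declarations of its own scheme, an hAtom parser and a webslice reader can share the same HEAD without fighting over class names.

```javascript
// Hypothetical consumer for META-declared webslices; not an MS API.

// Pure: group { scheme, name, content } declarations by scheme, so that
// several microformats can share one page without clashing.
function groupByScheme(metas) {
  const byScheme = {};
  for (const { scheme, name, content } of metas) {
    if (!byScheme[scheme]) byScheme[scheme] = {};
    byScheme[scheme][name] = content;
  }
  return byScheme;
}

// Browser: read every META element carrying a scheme attribute.
function readMetas(doc) {
  return Array.from(doc.querySelectorAll("head meta[scheme]"), (m) => ({
    scheme: m.getAttribute("scheme"),
    name: m.getAttribute("name"),
    content: m.getAttribute("content"),
  }));
}

// Browser: one record per slice. Sub-properties are searched with
// querySelector on the slice element itself, which scopes them to that slice.
function extractSlices(doc, decls) {
  return Array.from(doc.querySelectorAll(decls.hslice), (slice) => {
    const pick = (name) => (decls[name] ? slice.querySelector(decls[name]) : null);
    const title = pick("entry-title");
    const end = pick("endtime");
    return {
      id: slice.id,
      title: title ? title.textContent.trim() : null,
      endtime: end ? end.getAttribute("title") : null,
    };
  });
}
```

In a page, the call would be extractSlices(document, groupByScheme(readMetas(document))["MS-webslice"]); an hAtom reader would take the "hAtom" entry of the same grouped object and never see the webslice declarations.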

I am calling for a Microformats 2.0 effort based on the existence of the W3C Selectors API, and the META element.

Comments

1. On Friday 14 March 2008, 16:35 by Tobu

If you want flexibility and clean separation from the content, there is GRDDL: http://www.w3.org/TR/grddl/#grddl-x... .

2. On Friday 14 March 2008, 18:58 by Stephen Paul Weber

Couldn't get past "Authors may use any value in the class attribute"

With microformats, authors still can. class is space-separated, you can have as many values as you want.

3. On Friday 14 March 2008, 19:10 by Michał Bartoszkiewicz

@Stephen Paul Weber: Yes, you may still use other classes, but you can't use 'hentry' if you don't want the associated microformat's semantics. And ANY class value you use may be redefined to mean something in the future and break your site.

4. On Friday 14 March 2008, 21:51 by FraGag

Sounds like a good idea to me. The need to opt in to microformats will prevent ill-formed data coming from websites that happen to use a class later adopted by a new microformat.

However, this may make it impossible for some websites to offer microformats if the HEAD can't be modified. Currently, wikis can show examples, but if they don't allow adding META elements to the HEAD, microformats will not be available. (Perhaps plug-ins could be developed to allow this for wikis, but other websites may not be so lucky.)

5. On Saturday 15 March 2008, 05:23 by Stephen Paul Weber

Yes, you can; parsers have to be smart enough to see invalid data and not parse it.