EPUB3 fun #3

Clearly, WYSIWYG editability was, as too often in W3C specs, the last thing on the EPUB3 spec editors' minds... EPUB3 is not a W3C spec but an IDPF spec, but the result is the same, unfortunately. Here is a concrete example. In EPUB2, an author is specified in the OPF metadata of the package like this:

<dc:creator opf:file-as="Murakami, Haruki" opf:role="aut">Haruki Murakami</dc:creator>

That is quite easy to map into a simple and efficient UI.
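
To illustrate why the mapping is easy, here is a minimal Python sketch that reads the EPUB2 form into one flat record per creator. The function name read_epub2_creators and the record keys are made up for this illustration; the namespace URIs are the usual Dublin Core and OPF ones, and the surrounding <metadata> wrapper is an assumption added so the snippet parses on its own.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
OPF = "http://www.idpf.org/2007/opf"

# The <metadata> wrapper is added here only so the fragment is well-formed.
EPUB2_METADATA = """
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/"
          xmlns:opf="http://www.idpf.org/2007/opf">
  <dc:creator opf:file-as="Murakami, Haruki" opf:role="aut">Haruki Murakami</dc:creator>
</metadata>
"""


def read_epub2_creators(xml_text):
    """Return one flat dict per <dc:creator>; hypothetical helper."""
    root = ET.fromstring(xml_text.strip())
    creators = []
    for el in root.iter("{%s}creator" % DC):
        creators.append({
            "name": (el.text or "").strip(),
            # Everything a form needs sits on the element itself.
            "file_as": el.get("{%s}file-as" % OPF),
            "role": el.get("{%s}role" % OPF),
        })
    return creators


creators = read_epub2_creators(EPUB2_METADATA)
```

Each record maps directly onto three form fields (name, file-as, role), with no cross-references to resolve.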

But in EPUB3, things are quite different, since the properties refining the <dc:creator> element are expressed in standalone <meta> elements using an ID/IDREF reference mechanism:

<dc:creator id="mainauthor">Haruki Murakami</dc:creator>
<meta refines="#mainauthor" property="role" scheme="marc:relators" id="role">aut</meta>
<meta refines="#mainauthor" property="alternate-script" xml:lang="ja">村上 春樹</meta>
<meta refines="#mainauthor" property="file-as">Murakami, Haruki</meta>
<meta refines="#mainauthor" property="display-seq">1</meta>

Please note the alternate-script property, which was not expressible at all in EPUB2. Please also note the display-seq property, which allows specifying a rendering order for the element (in the list of creators). These are cool features but...

  1. any mechanism based on ID/IDREF introduces an extra keyword, the ID of the element ("mainauthor" here)... Asking a non-techie user to provide an ID is clearly suboptimal in terms of UX. That means the editing environment should pick one ID for the author, at the risk of using a human language not understood by the package's author or even a meaningless random ID.
  2. there can be multiple alternate-script properties and that is really, really tricky to offer in a simple and clean UI.
  3. CSS cannot style the above since there is no ID/IDREF mechanism in CSS; I understand that it was difficult to change the content model of DC elements but still, this is really a pity.
  4. the display-seq property is absolutely suboptimal: since <dc:creator> elements can be refined by a role property, the display sequence should apply to all elements of the same localName having the same role, and not to all elements of the same localName regardless of role; having the display sequence's value in PCDATA instead of inside an attribute on the <dc:creator> itself seems to me a design error. Even worse, the spec says in section 4.3.2 that "When the display-seq property is attached to some, but not all, of the members in a set, only the elements identified as having a sequence should be included in any rendering". Weird!
  5. dealing with such metadata is expensive: for each <dc:creator> found, you have to look for all <meta> elements refining it. Please note the spec does not say what happens to a <meta> element when the IRI in its refines attribute has no target (I think such an element should be made invalid).
  6. unless I missed it, I think nothing is said about conflicting properties; for instance, a creator having two roles specified, "aut" and "pmn".
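
Points 5 and 6 can be made concrete with a rough Python sketch of what an editing tool has to do before it can show a single author form. Everything here is an assumption for illustration: the function name resolve_creators, the SINGLE_VALUED policy (the spec, as noted above, does not define conflicts), the handling of refines values as plain "#id" fragments only, and the extra <meta> pointing at "#nobody", which is added to show a dangling reference.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

DC = "http://purl.org/dc/elements/1.1/"
OPF = "http://www.idpf.org/2007/opf"

# Hypothetical editor policy: properties for which two values count as a conflict.
SINGLE_VALUED = {"role", "file-as", "display-seq"}

EPUB3_METADATA = """
<metadata xmlns="http://www.idpf.org/2007/opf"
          xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:creator id="mainauthor">Haruki Murakami</dc:creator>
  <meta refines="#mainauthor" property="role" scheme="marc:relators" id="role">aut</meta>
  <meta refines="#mainauthor" property="alternate-script" xml:lang="ja">村上 春樹</meta>
  <meta refines="#mainauthor" property="file-as">Murakami, Haruki</meta>
  <meta refines="#mainauthor" property="display-seq">1</meta>
  <!-- hypothetical dangling reference, not part of the original example -->
  <meta refines="#nobody" property="role" scheme="marc:relators">pmn</meta>
</metadata>
"""


def resolve_creators(xml_text):
    """Attach refining <meta> values to creators; report dangling refines."""
    root = ET.fromstring(xml_text.strip())
    # First pass: index every element that carries an id.
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    refinements = defaultdict(lambda: defaultdict(list))
    dangling = []
    # Second pass: group <meta> values by target id and property name.
    for meta in root.iter("{%s}meta" % OPF):
        target = meta.get("refines")
        if not target:
            continue
        target_id = target[1:] if target.startswith("#") else target
        if target_id not in by_id:
            dangling.append(target)
            continue
        value = (meta.text or "").strip()
        refinements[target_id][meta.get("property")].append(value)
    # Third pass: build one record per creator.
    creators = []
    for el in root.iter("{%s}creator" % DC):
        props = refinements.get(el.get("id"), {})
        creators.append({
            "name": (el.text or "").strip(),
            "properties": {k: list(v) for k, v in props.items()},
            "conflicts": sorted(
                k for k, v in props.items()
                if k in SINGLE_VALUED and len(v) > 1
            ),
        })
    return creators, dangling


creators, dangling = resolve_creators(EPUB3_METADATA)
```

Compared with reading two attributes off the element, this needs an id index, a grouping pass over every <meta>, and two policies (dangling targets, duplicate values) that the tool has to invent on its own.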

All in all, EPUB3 creator/contributor metadata are painful to deal with and edit. Using ID/IDREF mechanisms in a specification (also) made for authoring environments, while we have the full power of XML to avoid them, seems to me a strategic error.

Don't get me wrong, I do understand why the <meta> element and its refines attribute were introduced. But I think the cost to pay for that is too high:

  1. we probably don't need a potentially multivalued alternate-script property. I think a monovalued original-script attribute on <dc:creator> or <dc:contributor> would probably have been enough.
  2. file-as and role were perfect as attributes in EPUB2. Being able to declare a scheme for role seems useless to me: most package authors will use MARC roles anyway, for compatibility with EPUB2. The file-as meta property is, compared to the file-as attribute of EPUB2, useless bloat in terms of footprint, speed of access, editability and maintainability.
  3. the display-seq property seems to me far from meeting package authors' expectations. Its specification makes it painful to deal with since creators/contributors can become hidden if they have no display sequence while others have one. This property seems to me useless and the ordering of similar elements inside the <metadata> element is most certainly enough. I even bet that this feature will be drastically underused.
  4. all in all, that says the <meta> element of EPUB3 is a suboptimal solution for problems touching only an extreme minority of package authors. The complexity it induces is, in my humble opinion, counter-productive and EPUB2 metadata were better designed and specified.
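
The hiding behaviour mentioned in point 3 is easy to see in a small Python sketch of the quoted rule. The function name rendering_order and the (name, display_seq) tuple shape are invented for this illustration, and breaking ties by document order is an assumption, not something taken from the spec.

```python
def rendering_order(creators):
    """creators: list of (name, display_seq or None) in document order.

    If no creator has a display-seq, fall back to document order.
    If some do, only those are rendered, sorted by sequence value.
    """
    sequenced = [
        (seq, index, name)
        for index, (name, seq) in enumerate(creators)
        if seq is not None
    ]
    if not sequenced:
        return [name for name, _ in creators]
    # Sorting on (seq, index) keeps document order for equal sequence values.
    return [name for _, _, name in sorted(sequenced)]


no_sequence = rendering_order([("A", None), ("B", None)])
partial = rendering_order([("A", 2), ("B", None), ("C", 1)])
```

With no display-seq at all, both creators are rendered in document order; as soon as "A" and "C" get one, "B" silently drops out of the rendering.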