Tuesday 30 October 2012

EPUB3 fun #12

The EPUB 3 Content Documents spec reads:

The EPUB 3 CSS Profile includes @media and @import rules with media queries as defined in the Media Queries [MediaQueries] specification.

But it says nothing about stylesheets linked through a <link> element and Media Queries! So, normatively per EPUB 3.0, <link> elements with Media Queries are currently forbidden...

Friday 14 September 2012

EPUB3 fun #11, not all metadata are metadata

Excerpt from EPUB 3 Publications section 4.3.2 (strong emphasis mine):

Cardinality: In the metadata section: zero or more; Attached to other metadata: zero or one

Other metadata? Outside of <metadata>?!? I don't understand. At all.

It can't be related to the META-INF/metadata.xml file since section 2.5.4 of EPUB3 Container Format reads (strong emphasis mine):

This file, if present, must be used for container-level metadata. This version of the OCF specification does not specify any container-level metadata.

And the other sections of the OPF file do not contain metadata but a manifest, a spine and other stuff but no metadata...

I think - but I can't be sure - it in fact means:

Cardinality: as a primary expression (i.e. applies to the Publication): zero or more; as a subexpression (i.e. refining a primary expression or another subexpression): zero or one for the refined expression or subexpression

Anyway, the original prose is very badly worded, to say the least. Carefully reading the spec, I have no idea what "other metadata" are, so there is at least an important definition (sic) here that remains undefined...

EPUB3 fun #10, all your chains of ID/IDREF are belong to us

Remember when I said the following?

any mechanism based on ID/IDREF introduces an extra keyword, the ID of the element ("mainauthor" here)... Asking a non-techie user to provide an ID is clearly suboptimal in terms of UX. That means the editing environment should pick one ID for the author, at the risk of using a human language not understood by the package's author or even a meaningless random ID.

Seems the situation is even worse than I expected... The EPUB3 Publications specification explicitly allows chains of subexpressions through ID/IDREF mechanisms and refines attributes. I guess it'll be clearer with this example:

<dc:identifier id="bookID">
<!-- meta element below refines the dc:identifier element above -->
<meta refines="#bookID"
id="meta-authority1">My metadata Authority</meta>
<!-- link element below refines the meta element above... -->
<link refines="#meta-authority1"

One ID/IDREF is already hard enough to deal with in UI but a chain?!? I can find one single word only to describe such a mechanism in a Standard oriented towards creation of visual products, i.e. oriented towards Wysiwyg-ness and nicely composed UIs: ridiculous. EPUB3 metadata are clearly a deep weakness in the EPUB3 format because of the incredible complexity they introduce in editing environments. Honestly, I am not sure I will implement this; it would drastically uglify a UI that is already complex enough.

Tuesday 4 September 2012

BlueGriffon EPUB3 Validator

I am glad to report my EPUB3 Validator add-on to Firefox is now available. A few details about it:

  • JavaScript only
  • not based on epubcheck's code at all, no Java code, does not rely on Java at all
  • my own xml content model validator
  • my own CSS parser and CSS Profile validator
  • Firefox >=12
  • Mac OS X, Windows, Linux
  • Free upgrades for life after purchase.

More information there.

Wednesday 29 August 2012


I have taken all packages available from the epub-samples EPUB3 repository and pushed them through my own EPUB3 validator. Some of these "clean" packages are not that clean, so here are the validation logs. Please note my validator also checks all stylesheets in the package according to the EPUB3 CSS Profile (AFAIK, no other validator is able to do that).

Monday 27 August 2012

EPUB3 fun #8, update

I said a few days ago that validation of EPUB3 Content Documents is impossible. That's actually incorrect and I apologize for that. The EPUB3 Content Documents spec normatively cites a RelaxNG schema for XHTML5 markup and datatypes. Still, I have no idea from reading the spec what flavor of xhtml5 it represents... I'm left with a RNG to understand if a given html5 element is implemented or not, and how, in EPUB3. Hum.

Saturday 25 August 2012

EPUB3 fun #9

Validation of meta elements in EPUB3 triggers a few surprises... In fact, that's not the meta elements themselves, it's the properties they carry. Let me explain:

  • EPUB3 meta elements carry a property attribute
  • a property is a keyword optionally preceded by a property vocabulary prefix and a colon
  • EPUB3 Media Overlays use the reserved media prefix mapped to namespace http://www.idpf.org/epub/vocab/overlays/#
  • the Media Overlays spec says the media:duration property has a cardinality of "exactly one for the Publication and for each Media Overlay"

In fact, that's quite painful and costly to test in the following case:

  1. suppose the namespace above is declared (yes, I know it's already reserved but it's not forbidden to do it!!!) on the package's prefix attribute with prefix foo
  2. suppose we have one <meta property="media:duration"> in the OPF for the Publication
  3. but also a <meta property="foo:duration"> for the same value...
  4. that's invalid per spec, see the last item of the list above

You can't compare the prefixes, that's not enough. You really need to rely on URIs and that's where validation is expensive.
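To make the cost concrete, here is a minimal sketch of the URI-based check, not my validator's actual code. It assumes the meta property values have already been extracted from the OPF as plain strings; the helper names (parse_prefix_attribute, expand, count_property) are hypothetical, and the handling of prefixless properties is simplified.

```python
import re

# Reserved prefix, with the namespace quoted in the list above.
RESERVED = {"media": "http://www.idpf.org/epub/vocab/overlays/#"}
DURATION = RESERVED["media"] + "duration"


def parse_prefix_attribute(value):
    """Turn 'foo: http://a/ bar: http://b/' into {'foo': 'http://a/', 'bar': 'http://b/'}."""
    return dict(re.findall(r"([A-Za-z_][\w.-]*):\s+(\S+)", value))


def expand(prop, prefixes):
    """Expand 'prefix:name' to a full URI; return None when the prefix is unknown."""
    prefix, sep, name = prop.partition(":")
    if not sep:
        return prop  # prefixless property, left untouched in this sketch
    base = prefixes.get(prefix)
    return base + name if base else None


def count_property(properties, prefix_attr, target_uri):
    """Count the properties that resolve to target_uri, whatever prefix they use."""
    prefixes = dict(RESERVED)
    prefixes.update(parse_prefix_attribute(prefix_attr))
    return sum(1 for prop in properties if expand(prop, prefixes) == target_uri)


# The case described above: the same namespace declared a second time as "foo".
declared = "foo: http://www.idpf.org/epub/vocab/overlays/#"
found = count_property(["media:duration", "foo:duration"], declared, DURATION)
```

A prefix-only comparison sees a single media:duration here; only the expanded URIs reveal that the "exactly one" cardinality is violated.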

Conclusion: Media Overlays 3.0 have been released with EPUB3. The associated media prefix should be dropped and its related properties should be prefixless in EPUB3. If the issue is the media:narrator property too close to a value in another property vocabulary, it should be changed.

Thursday 23 August 2012


"So Twitter just joined W3C. Should we cut our specs into modules < 140 chars?"

Tuesday 21 August 2012

EPUB3 fun #8

EPUB3 is based on the xml serialization of HTML5. When I say HTML5, I really mean the W3C version. But there is a problem; that also happens with many other specs normatively referenced by EPUB3 but let's focus on HTML5 right now: the link to the HTML5 normative reference has for href http://www.w3.org/TR/html5/ . In other terms, the latest version (WD, LCWD, CR, PR, REC) published by W3C.

What's the version to consider for the implementation of an EPUB3 reader or editor? I have no idea. What has changed between the time the HTML5 WD was normatively referenced and now? I have no idea. What if for instance an "at-risk" feature is dropped in a future WD of the spec? I have no idea.

This is not only a theoretical issue, it has a deep and immediate practical impact: validation of the Content Documents inside an EPUB3 ebook is impossible. More globally, full validation of an EPUB3 package is then impossible.

EPUB3 fun #7

Hum, yet another bug in an EPUB3 spec... Excerpt (verbatim, nothing added or removed) from section of EPUB Content Documents 3.0:

The prefix attribute

The prefix attribute definition is unchanged, but the attribute is defined to be in the namespace http://www.idpf.org/2007/ops when used in Content Documents.

Unchanged from what?!?

Friday 10 August 2012

EPUB3 fun #6

In the EPUB ebook world, there's one sentence I heard so many times from so many different sources that I did not even feel the need to verify it:

"Validation of EPUB is extremely important and we heavily rely on the EPUB Validator"

First thought: "excellent!". People need to distribute validated packages because they don't want to suffer big issues on some of the many ebook readers on the market. Excellent.

Unfortunately, after a closer look, the situation seems to me a bit different from that idealistic (I could even say utopian) view... Let me explain (please note I have spent time carefully reading the epubcheck source code for instance):

  • if you count the number of major Standards (de jure, de facto, proprietary) involved in the validation of a given *.epub EPUB3 file, you'll find a few dozen of them. And again, only the major ones, the ones a serious industrial validator must absolutely validate against.
  • some validations are complex and expensive, for instance a serious validation of encryption.xml or signatures.xml will involve much more than just a RNG-based validation of the XML instance... Validation of external property vocabularies can be extremely tricky or painful/expensive to implement.
  • the first step of validation is related to the ZIP package itself, and the epub30-ocf spec has a few very technical requirements there. I sincerely doubt all of them are validated, I sincerely doubt EPUB3 packages all around the world currently pass or will pass all the conformance requirements there.
  • in the W3C space alone, an EPUB3 validator must at least validate against a dozen specs.
  • the complexity of some of these specs is huge, drastically impacting users (ebook authors) either on a learning curve's basis or on a financial one. Or both.

EPUB3 is probably too complex as a spec. In EPUB2, most people did not understand the difference between spine, ncx and guide. The most common question was "why do we have multiple tables of contents and which one is the good one?". In EPUB3, it's a bit better but only a bit. We have landmarks, multiple tables of contents and still a spine. Personally I still wonder why there is a manifest of files; probably only because of the MIME types. Hey, even OSes rely on file extensions to infer a MIME type!!!

I'd love to see appear an EPUB4 strictly xhtml5/svg/mathml-based. No other XML namespace allowed. No more OCF, OPF, NCX. Manifest of files coming from the ZIP list of entries. Direct inclusion of non-conflicting property vocabularies. No need to have an OCF, we can have a nice index.xhtml. Rely on a vocabulary of classes/IDs/roles (not the ARIA role...) expressing extra constraints/behaviours on existing html5 elements. Would be enough for most ebook authors and publishers and would drastically decrease the complexity of the publishing chain, IMHO.
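To illustrate the "manifest of files coming from the ZIP list of entries" idea, here is a minimal sketch using only the Python standard library. The function name manifest_from_zip is hypothetical, the fallback MIME type is my own choice, and the in-memory archive only stands in for a real package; this is an illustration of the idea, not a proposal for a normative algorithm.

```python
import io
import mimetypes
import zipfile


def manifest_from_zip(epub_file):
    """Return (path, guessed MIME type) pairs for every file entry of the archive."""
    entries = []
    with zipfile.ZipFile(epub_file) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # directory entry, not a file
            mime, _encoding = mimetypes.guess_type(name)
            # Fallback type for unknown extensions: an assumption of this sketch.
            entries.append((name, mime or "application/octet-stream"))
    return entries


# Tiny in-memory archive standing in for a real package.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("index.xhtml", "<html/>")
    archive.writestr("images/cover.png", b"")

demo_entries = manifest_from_zip(buffer)
```

The MIME type is inferred from the file extension alone, which is exactly what the OSes mentioned above already do.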

Wednesday 8 August 2012

EPUB3 fun #5

Dates and time... Dates and time are painful, dates and time are complex, "dates and time are the most complex objects we use on a daily basis" used to say Reuters' Misha Wolf in the HTML WG in the good ol'days.

Excerpt from section 2.2.7 of OPF 2.01:

The date element has one optional OPF event attribute. The set of values for event are not defined by this specification; possible values may include: creation, publication, and modification.

Excerpt from section 3.4.6 of EPUB Publications 3.0:

The date element must only be used to define the publication date of the EPUB Publication.
Only one date element is allowed.

Ahem... So EPUB3 allows only two dates for a package: the publication date and the last-modification date (through a <meta property="dcterms:modified"> element). It's impossible to preserve the date of creation or various modification dates. Is it only me, or is this change from EPUB2 to EPUB3 weird?

Please note that to insert a publication date into an OPF document ready for distribution, you must modify it and rezip the whole package. So if an OPF document does have a <dc:date> publication element, its content value must be equal to the last-modified date!!! In other terms, this is totally useless. Wow.

Tuesday 7 August 2012

EPUB3 fun #4

Excerpt from section 4.2.4 of EPUB Publications 3.0:

<package …
prefix="foaf: http://xmlns.com/foaf/spec/
dbp:  http://dbpedia.org/ontology/">

Excerpt from an unnumbered (sic) section of EPUB3 Fixed Layout Documents:

Implementors should note that future revisions of [Publications30] may establish the vocabulary represented by the URI http://www.idpf.org/vocab/rendition/# as a reserved vocabulary. In this case, the result will be that a) explicit mapping declaration using the prefix attribute will no longer be applicable, and b) the prefix ‘rendition’ will be reserved for this vocabulary. Future revisions of [Publications30] may also integrate the properties defined here into the Package Document default vocabulary. In this case the properties defined herein will be allowed to occur in Package Documents without a prefix.

In other terms, an attribute prefix="foobar: http://www.idpf.org/vocab/rendition/#" is needed on the package element for the time being to be able to parse <meta property="foobar:orientation">. But we're already warned that that foobar may become rendition w/o the need to declare the corresponding prefix on the package element. And well, even rendition could be itself dropped in the future. So in the long run, it will probably be <meta property="orientation"> w/o property prefix or prefix declaration.

<troll>I think it is not complex enough. Prefixes should be declared by extra meta elements themselves using a prefixed property, just to be able to have a few circular references...</troll>

Hey guys, why don't you finish shaking the shaker and get rid of the rendition prefix for good right now? Just for the record, I have in front of me here two EPUB3 documents: the first one uses the rendition property prefix without prefix declaration, the second uses the rendering property prefix instead of rendition but since the URI is ok, this is valid. In other terms, it's a mess already... I'm sure many epub3 Readers will look for rendition and only rendition, failing to correctly catch and resolve the prefix at all. Of course, the situation will get worse if the prefix becomes reserved or even integrated! Pfff....

Monday 6 August 2012

EPUB3 fun #3

Clearly, Wysiwyg-editability was, as too often in W3C specs, the last concern on EPUB3 spec editors' mind... EPUB3 is not a W3C spec but an IDPF spec, but the result is the same, unfortunately. I have a concrete example here: in EPUB2, here's how an author is specified in the OPF metadata of the package:

<dc:creator opf:file-as="Murakami, Haruki" opf:role="aut">Haruki Murakami</dc:creator>

That is quite easy to map into a simple and efficient UI.

But in EPUB3, that's pretty different since the properties refining the <dc:creator> element are expressed in standalone <meta> elements using an ID/IDREF reference mechanism:

<dc:creator id="mainauthor">Haruki Murakami</dc:creator>
<meta refines="#mainauthor" property="role" scheme="marc:relators" id="role">aut</meta>
<meta refines="#mainauthor" property="alternate-script" xml:lang="ja">村上 春樹</meta>
<meta refines="#mainauthor" property="file-as">Murakami, Haruki</meta>
<meta refines="#mainauthor" property="display-seq">1</meta>

Please note the alternate-script property that was not expressible at all in EPUB2. Please also note the display-seq property that allows specifying a rendering order for the element (in the list of creators). These are cool features but...

  1. any mechanism based on ID/IDREF introduces an extra keyword, the ID of the element ("mainauthor" here)... Asking a non-techie user to provide an ID is clearly suboptimal in terms of UX. That means the editing environment should pick one ID for the author, at the risk of using a human language not understood by the package's author or even a meaningless random ID.
  2. there can be multiple alternate-script properties and that is really, really tricky to offer in a simple and clean UI.
  3. CSS cannot style the above since there is no ID/IDREF mechanism in CSS; I understand that it was difficult to change the content model of DC elements but still, this is really a pity.
  4. the display-seq property is absolutely suboptimal: since <dc:creator> elements can be refined by a role property, the display sequence should apply for all elements of the same localName having the same role and not only all elements of the same localName; having the display sequence's value in PCDATA instead of inside an attribute on the <dc:creator> itself seems to me a design error. Even worse, the spec says in section 4.3.2 that "When the display-seq property is attached to some, but not all, of the members in a set, only the elements identified as having a sequence should be included in any rendering". Weird!
  5. dealing with such metadata is expensive: for each <dc:creator> found, you have to look for all <meta> elements refining it. Please note the spec does not say what happens to a <meta> element when the IRI in the refines attributes has no target (I think such element should be made invalid).
  6. unless I missed it, I think nothing is said about conflicting properties; for instance a creator having two roles specified, "aut" and "pmn".
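Point 5 above can be sketched as follows, using Python's xml.etree.ElementTree. This is an illustration under stated assumptions, not what any given Reading System does: the SAMPLE fragment is adapted from the example above, with an extra dangling <meta> that I added for the demonstration, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

OPF = "http://www.idpf.org/2007/opf"
DC = "http://purl.org/dc/elements/1.1/"

# Adapted from the example above; the last <meta> is an invented dangling reference.
SAMPLE = """<metadata xmlns="http://www.idpf.org/2007/opf"
          xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:creator id="mainauthor">Haruki Murakami</dc:creator>
  <meta refines="#mainauthor" property="role" scheme="marc:relators" id="role">aut</meta>
  <meta refines="#mainauthor" property="file-as">Murakami, Haruki</meta>
  <meta refines="#nobody" property="file-as">Dangling</meta>
</metadata>"""


def index_refinements(metadata):
    """Group <meta refines="#id"> elements by target ID in a single pass."""
    by_target = defaultdict(list)
    for meta in metadata.iter(f"{{{OPF}}}meta"):
        refines = meta.get("refines", "")
        if refines.startswith("#"):
            by_target[refines[1:]].append(meta)
    return by_target


def describe_creators(metadata):
    """Map each creator's text to its refining properties (lists, so conflicts stay visible)."""
    by_target = index_refinements(metadata)
    creators = {}
    for creator in metadata.iter(f"{{{DC}}}creator"):
        props = defaultdict(list)
        for meta in by_target.get(creator.get("id"), []):
            props[meta.get("property")].append(meta.text)
        creators[creator.text] = dict(props)
    return creators


def dangling_refines(metadata):
    """List the refines targets that match no id in the metadata section."""
    ids = {el.get("id") for el in metadata.iter() if el.get("id")}
    return sorted(target for target in index_refinements(metadata) if target not in ids)


metadata = ET.fromstring(SAMPLE)
creators = describe_creators(metadata)
orphans = dangling_refines(metadata)
```

Building the index once avoids rescanning every <meta> for each <dc:creator>, but the editing environment still has to keep that index in sync with every ID change, which is precisely the cost I am complaining about.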

All in all, EPUB3 creator/contributor metadata are painful to deal with and edit. Using ID/IDREF mechanisms in a specification (also) made for authoring environments while we have the full power of XML to avoid them seems to me a strategic error.

Don't get me wrong, I do understand why the <meta> and its refines attribute were introduced. But I think the cost to pay for that is too high.

  1. we probably don't need a potentially multivalued alternate-script property. I think a monovalued original-script attribute on <dc:creator> or <dc:contributor> was probably enough.
  2. file-as and role were perfect as attributes in EPUB2. Having the possibility to declare a scheme for role seems to me useless, most package authors will use marc roles anyway for compatibility reasons with EPUB2. file-as meta property is, compared to the file-as attribute of EPUB2, useless bloat in terms of footprint, speed of access, editability and maintainability.
  3. the display-seq property seems to me far from meeting package authors' expectations. Its specification makes it painful to deal with since creators/contributors can become hidden if they have no display sequence while others have one. This property seems to me useless and the ordering of similar elements inside the <metadata> element is most certainly enough. I even bet that this feature will be drastically underused.
  4. all in all, that says the <meta> element of EPUB3 is a suboptimal solution for problems touching only an extreme minority of package authors. The complexity it induces is, in my humble opinion, counter-productive and EPUB2 metadata were better designed and specified.

EPUB3 fun #2

Aaaaah, forward compatibility... Here are two interesting excerpts from the EPUB Publications 3.0 specification.

In section 3.4.14: Authors may include the guide element in the Package Document for EPUB 2 Reading System forwards compatibility purposes

In section 3.4.1: The (version) attribute (of the package element) must have the value 3.0 to indicate compliance with this version of the specification.

Let's look now at two bits coming from the Open Packaging Format (OPF) 2.0.1 specification:

In section 1.4.12: the version attribute of the package element is specified with a value of 2.0

In section 1.3.2: In addition, to be processed as an OPF 2.0 package, a version attribute with a value of 2.0 must be specified on the package element. A package element that omits the version attribute must be processed as an OEBPS 1.2 package

In summary, <package version="3.0"> is absolutely needed for an EPUB to be parsed as EPUB3 and EPUB2 absolutely requires <package version="2.0"> since the absence of the version attribute defaults to OEBPS 1.2!!! Conclusion, "EPUB 2 Reading System forwards compatibility" described in the first quote above is absolutely illusory. Even worse, compatibility can be achieved if and only if EPUB Reading Systems deliberately ignore the version attribute on the package element... Ooops, to say the least.
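A literal reading of the two OPF 2.0.1 quotes above can be written down in a few lines. This is only a sketch of that reading: the function name is hypothetical, and treating any value other than 2.0 as "unsupported" is my assumption, since the quoted text does not say what an EPUB 2 Reading System should do in that case.

```python
def classify_for_opf2_reader(version):
    """Decide how a strict OPF 2.0.1 processor would treat a package element."""
    if version is None:
        return "OEBPS 1.2"  # omitted version attribute, per section 1.3.2
    if version == "2.0":
        return "OPF 2.0"
    return "unsupported"  # assumption: behaviour not covered by the quoted text


epub3_outcome = classify_for_opf2_reader("3.0")
```

With version="3.0", such a reader never reaches the point where the guide element could help.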

EPUB3 fun #1

According to the section 3.4.6 of the EPUB Publications 3.0 spec, only one <dc:source> element is allowed in the OPF file for an EPUB3 package; it specifies a primary metadata expression. But according to section 3.4.7, it's also possible to use a <meta> element (in the http://www.idpf.org/2007/opf namespace, not the html one...) to specify a primary expression for the same source of the package. So


is not valid per spec but since nothing appears to be said in the spec about uniqueness of definition for primary expressions

<meta property="dcterms:source">urn:isbn:9780375704025</meta>

is then perfectly valid... Of course, the behaviour of EPUB3 browsing or editing environments in that case is undefined :-(

And guess what? I have in front of me right now an "EPUB3" book having both <dc:source> and <meta property="dcterms:source"> elements. Lovely...

Tuesday 19 June 2012

Printing a web page #2

So the issue I described yesterday does not seem to hit all platforms. But it's still painful to me on my Mac so, as usual, I spent a few minutes hacking a Firefox add-on. It adds a new menu item to the File menu to print "as on screen" and it does exactly what it says.

It's a 0.9 but I tested it on a dozen web sites having very specific @media print styles and it works as expected. Auto-update is not enabled yet, I'll enable it for 1.0. Enjoy!

Monday 18 June 2012

Printing a web page

I hit a problem printing two well-known (in my personal environment) web pages this morning. I needed to print the W3C's FAQ and the home page of ERCIM, the European foot of the W3C. Because both have @media print stylesheets, you can see in the following screenshots a comparison between the screen rendering of the web pages and the corresponding print results:

W3C screenshot and print

Oops, I really needed the logo and a better presentation but the print stylesheet hides it (I checked the print option to have it printed)

ERCIM home page screenshot and print

Wow, that's a bit of a minimal print stylesheet :-(

So I needed a print of the web pages "as on the screen". But the print dialog of my browsers (I tested Firefox, Safari and Chromium on my Mac) has no such option:

system print dialog

Yeah, yeah, I know, I can still make a screenshot of the pages, open them in my browser and ask the browser to print the screenshot. But if *I* can do it, that's overkill for the average non-geeky user. The web page could also have alternate print stylesheets but we hit a similar problem: there's no way to select them and such a UI would certainly be too complex for many users... A "Print as on screen" option is clearly missing here and I wish Microsoft, Apple and Linux distros could add it to their respective system print dialog, or that browser vendors could add such a new menu entry living next to the existing "Print" menu item.

Tuesday 10 April 2012

A floating table of contents for W3C specs

If like me you spend a lot of time reading, reviewing, navigating into W3C specs, you have probably also spent a lot of time scrolling to find the table of contents of these documents. Again, again and again. So many times you're probably fed up with it, like I am. In that case, this add-on for Firefox is for you. Install it and browse your favorite W3C spec. When it's loaded, look for the small box in the top-left corner. Make your mouse pointer hover over it to open the floating table of contents. Click anywhere (including on a link) in the expanded box to shrink it back. Enjoy :-)

Update: v1.1 is here. Next updates will be automatic.

Friday 10 February 2012

Blaming CSS WG is too easy, Brendan

Brendan, I have the highest respect for you and you do know it. But I cannot let you affirm in public that the current situation about prefixes was caused first by the CSS Working Group.

  1. the CSS Working Group gathers you guys, browser vendors. The Working Group decides nothing "by itself", it's an empty shell where browser vendors discuss, sometimes fight, and reach consensus. If a consensus was reached about introducing vendor prefixes in the past, the MEMBERSHIP of the Group is fully responsible for it and that DOES include Mozilla, or more probably Netscape at that time but eh we were all there at that time.
  2. even the W3C Process is decided by the W3C Members. The CSS WG is totally out of scope here. It's like saying you blame an airport because you don't like showing an ID at the airport, the airport being bound by national and international regulations. Nothing it can do about it. Have someone at the W3C Advisory Board (you used to have Arun) or call for Process changes in a constructive way. But please, tell Mozilla reps to stop asking for Process changes in the CSS WG when the bits to change are NOT in our hands. That too sucks our time.
  3. the CSS Working group kept drafts under its wings too often not because of pains caused by the W3C Process but because of the perfectionism of the WG Members. And they include Mozilla. People who criticize W3C say it lacks pragmatism, but trust me on that please, I too often said "perfectionism on CSS 2.1 sucks our time for CSS 3". CSS 2.1 took too many years to emerge also because you browser vendors wanted the most perfect spec and could never end the loop.
  4. let's take CSS Variables, a top request by the whole CSS Community since 1997. Dave Hyatt and I had a complete, simple, nice, understandable, easily implementable proposal, Dave even implemented it in WebKit and shipped it. Finally removed it because other browser vendors ended up in endless discussions on the feature itself. The CSS WG is not guilty here. The browser vendors are. They argued and argued and argued ad nauseam, for something that could have hit browsers FOUR YEARS AGO and we still don't even have an alternative. Don't blame the CSS WG, blame your representatives to the CSS WG. Where is the lack of pragmatism and the lack of speed?
  5. when the whole mobile Web is full of a feature implemented by WebKit, documented by Apple in two lines only on their developer's web site, protected by IPRs owned by Apple, never submitted by any browser vendor to the Working Group, and when Microsoft, Mozilla and Opera want to implement that super feature because lack of it harms their market share, don't blame the CSS WG. Fight the legal issue about IPR, implement a counter-measure legally feasible we could rapidly standardize or get Apple to submit their feature to the WG. But you just cannot tell the WG is guilty in any way here.
  6. I have said dozens and dozens of times in the CSS WG that prefixes are a terrible burden on Web Authors' shoulders. It's only recently (relative to the age of the WG) that we adopted a rule allowing us to get rid of prefixes when we think a feature is stable enough. But this rule comes, again, from CONSENSUS AMONG BROWSER VENDORS. The Working Group here is nothing more than a meeting room. Each time a new proposal to simplify (or get rid of) prefixes was submitted in the past, one browser vendor at least objected. So what should we do? Break consensus? On what basis? My chair's hat?!? Playing Heads or Tails? Even with the new rule, discussions about the stabilization of a given feature and removal of prefixes always lead to one browser vendor objecting to the proposal supported by the others. And it's not always the same browser vendor. How can you say here the WG is first guilty?
  7. as far as I know, the list of prefixed (and even sometimes unprefixed while they should be prefixed!) properties implemented by WebKit is the following one:

    many of them being "documented" by Apple. Some are well known, come from a CSS WG spec/draft/document, and have their counterparts in Gecko, Presto and Trident. Some are totally unknown to us and were never submitted for standardization. So again, don't blame the CSS WG for prefixed properties that NEVER reached us. They are going to hit the market and spread because of a company, not because of a standards body. For the ones you implement and people don't use, well, Apple has nice editing tools, nice IDEs. I told you a loooong time ago that a browser is only the top of the ecosystem's iceberg and that editing is still a major part of it.

  8. we already have an existing case with extremely ugly de facto standardization coming from WebKit mobile market dominance, and that's the infamous meta viewport tag. I myself heard some Apple engineers acknowledge it was a rather big mistake. We have CSS equivalents for this on our radar, submitted by Opera. We could have moved towards a de jure solution faster here but apparently there is little interest shown by browser vendors except, of course, Opera. Blame again the CSS WG for that? Not you, Brendan, please. You know standardization too well for that.
  9. all of that said, please tell me how you are going to get rid of the -webkit-* prefix (or even implement the prefixed version in Moz) for -webkit-text-size-adjust given the IPRs?

It is too easy to always beat the CSS WG or the W3C. If you recall correctly, I have been one of the first ones to shout at the W3C in the XHTML2 fiasco. It was so early that Netscape still existed. I am still the first one to say it when they mess things up. But here, that's just unfair. I only agree that the CSS Working Group could have done better but hey, the Working Group members - hear YOU BROWSER VENDORS - decided there. So putting that organization at the top of your recriminations just carries a false message and hides an inconvenient truth.

Mozilla and Opera gave Apple a hand to create the WHAT-WG. You guys can speak together, for instance when your high-level representatives had a private secret meeting at Apple while the CSS WG was by pure coincidence in the meeting room on the other side of the corridor, was it November 2010? Meet again and let Apple know it must submit to the corresponding WGs technical proposals for the features that hit the Web and spread too wide to remain proprietary. Apple too must preserve the Open Web. So it says for HTML and APIs, so why not for CSS?
