<Glazblog/>


Thursday 1 February 2018

LibreOffice and EPUB

LibreOffice 6.0 is now available, and it's through the inevitable Korben that I discovered this morning it has a built-in EPUB export. So let's take a closer look at that new beast and evaluate how it deals with that painful task. Conformant EPUB? And which version of EPUB? Reusable XHTML and CSS? We'll see.

After installation (on a Mac), I created a new trivial text document; it contains a paragraph, a level 1 header, an image, a table, and an unordered list of three items. I did not touch fonts, styles, margins, etc. at all.

Trivial text document in LibreOffice 6.0

Then I discovered LibreOffice now has two new menu items: File > Export As... > Export directly as EPUB and File > Export As... > Export as EPUB... .

Export directly as EPUB

It directly opens a filepicker to select a destination *.epub file. Let's unzip the saved package and take a look at its guts:

  • the mimetype file is correctly placed as first file in the package and it's correctly stored without compression
  • other files are correctly stored using Deflate
  • the META-INF/container.xml is stored in last position in the zip, which is probably a mistake
  • the OPF file says it's an EPUB 3.0 package and its metadata are clean; AFAICT, the OPF file is conformant to the spec
  • XML and XHTML files in the package are serialized without carriage returns (if you except one after the XML prolog) or indentation...
  • an NCX is present
  • the Navigation Document (called toc.xhtml) and the NCX live side by side in an OEBPS folder (sigh)
  • there is an empty OEBPS/styles/stylesheet.css file
  • the content files are in an OEBPS/sections folder
  • that folder contains 2 files (!) section001.xhtml and section002.xhtml
  • looking at these files, LibreOffice seems to have split the original document at section breaks, hence the two sections found in the EPUB package
  • there is no title element in these files
  • there is clearly a problem with exported CSS styles, the body of each generated document having no margins, paddings. And since there is no CSS-reset either...
  • the set of LibreOffice styles (the leftmost dropdown in the toolbar) are not exported to CSS; the whole export relies on CSS inline styles (style attributes) and not on classes
  • the original document uses the "Liberation Serif" font, which is not registered under that name in the OS X Font Book (an old issue, well known in the OOXML world...). The final rendition in a browser is then buggy, font-wise. The font-family declarations in the document don't use a fallback to serif.
  • there is a very weird font-effect: outline property serialized on all paragraphs in table cells
  • strangely again, all these paragraphs have text-decoration: overline; text-shadow: 1px 1px 1px #666666; while the original text is neither overlined nor shadowed
  • when a paragraph (a p in terms of OOXML) contains one single run of text (a r in OOXML), the output could be optimized getting rid of a span and adding its inline styles to the parent paragraph. The output is too verbose and will trigger issues in html editors, Wysiwyg or not.
  • the margin values in the document use a mix of inches and pixels, which is kind of weird
  • the image in the original document is lost in the EPUB package
  • headers are not generated as h1, h2, ... but as p elements with styles.
  • the EPUB version does not correctly deal with the unordered list and all list items become regular paragraphs. No ol or ul, no bullet, no counter, no list-style-type. Semantics are lost.
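
The first checks in the list above (position and compression of the mimetype file, presence of META-INF/container.xml) are easy to automate. Here is a minimal Python sketch of how I would script them. It is not what LibreOffice or any existing validator does; the names check_container and build_sample are hypothetical, and the sample archive is a toy built in memory, not a complete EPUB.

```python
import io
import zipfile


def check_container(data):
    """Return a list of problems found in the ZIP layout of an EPUB package."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        entries = zf.infolist()
        if not entries or entries[0].filename != "mimetype":
            problems.append("mimetype is not the first entry")
        else:
            first = entries[0]
            # The mimetype entry must be stored, not deflated.
            if first.compress_type != zipfile.ZIP_STORED:
                problems.append("mimetype is compressed")
            if zf.read(first) != b"application/epub+zip":
                problems.append("mimetype has unexpected content")
        if "META-INF/container.xml" not in zf.namelist():
            problems.append("META-INF/container.xml is missing")
    return problems


def build_sample(mimetype_first=True):
    """Build a tiny in-memory archive for demonstration (hypothetical content)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        if mimetype_first:
            zf.writestr("mimetype", "application/epub+zip",
                        compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", "<container/>",
                    compress_type=zipfile.ZIP_DEFLATED)
        if not mimetype_first:
            zf.writestr("mimetype", "application/epub+zip",
                        compress_type=zipfile.ZIP_STORED)
    return buf.getvalue()


if __name__ == "__main__":
    print(check_container(build_sample(True)))
    print(check_container(build_sample(False)))
```

To run it against a real file, read the *.epub bytes from disk and pass them to check_container. It only looks at the container layout and says nothing about the OPF or the content documents.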

Firefox Quantum viewing the resulting section002.xhtml file. You can clearly see where the html+CSS export is buggy:


How iBooks sees that EPUB:


Export as EPUB...

Aaah, that one is quite different since it first opens the following dialog:

Export as EPUB... dialog

The dialog offers the following choices:

  1. export as EPUB2 or EPUB3 (nice!)
  2. Split at Page Breaks or Headings (very nice feature but why not also a "Don't split" option?)

and validating the dialog goes to the aforementioned *.epub filepicker.

Conclusion

This is an excellent start, really, and splitting the document at headers or page breaks is an excellent idea. Unfortunately, there are too many holes in the xhtml+CSS export at this time to make it really usable unless your document has almost unstyled paragraphs only. Some generated styles (overline?!?) are not present in the original document, it generates only paragraphs and tables losing the header or list semantics, the LibreOffice styles are not serialized in a CSS stylesheet (bug?) and more. This will help some individuals but I am not sure it will help EPUB publication chains, at least for now.

Update: Wow. I added an extra test: I compared the result of "Export to XHTML" and the XHTML inside a "Export to EPUB". In the former, styles are correctly exported as a stylesheet, classes are correctly used, h1 and ol/li are correctly used, the image is preserved, and the general rendering is MUCH better. So the Export to EPUB has one of the two following problems: it reuses the "Export to XHTML" code and splitting introduced a lot of bugs, OR it has its own export-to-xhtml code and it's a mistake since the existing one does quite a decent job...

Second update: LibreOffice's trunk does a significantly better job: stylesheet is correctly generated, xhtml files are CR'd and indented correctly, images are preserved.

Thursday 18 January 2018

Announcing WebBook Level 1, a new Web-based format for electronic books

TL;DR: the title says it all, and it's available there.

Eons ago, at a time BlueGriffon was only a Wysiwyg editor for the Web, my friend Mohamed Zergaoui asked why I was not turning BlueGriffon into an EPUB editor... I had been observing the electronic book market since the early days of Cytale and its Cybook but I was not involved into it on a daily basis. That seemed not only an excellent idea, but also a fairly workable one. EPUB is based on flavors of HTML so I would not have to reinvent the wheel.

I started diving into the EPUB specs the very same day, EPUB 2.0.1 (released in 2009) at that time. I immediately discovered a technology that was not far away from the Web but that was also clearly not the Web. In particular, I immediately saw that two crucial features were missing: it was impossible to aggregate a set of Web pages into an EPUB book through a trivial zip, and it was impossible to unzip an EPUB book and make it trivially readable inside a Web browser even with graceful degradation.

When the IDPF started working on EPUB 3.0 (with its 3.0.1 revision) and 3.1, I said this was coming too fast, and that the lack of Test Suites with interoperable implementations as we often have in W3C exit criteria was a critical issue. More importantly, the market was, in my opinion, not ready to absorb so quickly two major and one minor revisions of EPUB given the huge cost on both publishing chains and existing ebook bases. I also thought - and said - the EPUB 3.x specifications were suffering from clear technical issues, including the two missing features quoted above.

Today, times have changed and the Standards Committee that oversaw the future of EPUB, the IDPF, has now merged with the World Wide Web Consortium (W3C). As Jeff Jaffe, CEO of the W3C, said at that time,

Working together, Publishing@W3C will bring exciting new capabilities and features to the future of publishing, authoring and reading using Web technologies

Since the beginning of 2017, and with a steep acceleration during spring 2017, the Publishing@W3C activity has restarted work on the EPUB 3.x line and the future EPUB 4 line, creating an EPUB 3 Community Group (CG) for the former and a Publishing Working Group (WG) for the latter. Even if I had some reservations about the work division between these two entities, the whole thing seemed to be a very good idea. In fact, I started advocating for the merger between IDPF and W3C back in 2012, at a moment when only a handful of people were willing to listen. It seemed to me that Publishing was an underrated first-class user of Web technologies and EPUB's growth was suffering from two critical ailments:

  1. IDPF members were not at W3C so they could not confront their technical choices to browser vendors and the Web industry. It also meant they were inventing new solutions in a silo without bringing them to W3C standardization tables and too often without even knowing if the rendering engine vendors would implement them.
  2. on the other hand, W3C members had too little knowledge of the Publishing activity, which was historically quite skeptical about the Web... Working Groups at W3C were lacking ebook expertise and were therefore designing things without having ebooks in mind.

I was then particularly happy when the merger I advocated for was announced.

As I recently wrote on Medium, I am not any more. I am not convinced by the current approach implemented by Publishing@W3C on many counts:

  • the organization of the Publishing@W3C activity, with a Publishing Business Group (BG) formally ruling (see Process section, second paragraph) the EPUB3 CG and a Steering Committee (see Process section, first paragraph) recreated the former IDPF structure inside W3C.  The BG Charter even says that it « advises W3C on the direction of current and future publishing activity work » as if the IDPF and W3C did not merge and as if W3C was still only a Liaison. It also says « the initial members of the Steering Committee shall be the individuals who served on IDPF’s Board of Directors immediately prior to the effective date of the Combination of IDPF with W3C », maintaining the silo we wanted to eliminate.
  • the EPUB3 Community Group faces a major technical challenge, recently highlighted by representatives of the Japanese Publishing Industry: EPUB 3.1 represents too much of a technical change compared to EPUB 3.0.1 and is not implementable at a reasonable cost in a reasonable timeframe for them. Since EPUB 3 is recommended by the Japanese Government as the official ebook format in Japan, that's a bit of a blocker for EPUB 3.1 and its successors. The EPUB3 CG is then actively discussing a potential rescindment of EPUB 3.1, an extraction of the good bits we want to preserve, and the release of an EPUB 3.0.2 specification based on 3.0.1 plus those good bits. In short, the EPUB 3.1 line, which saw important clarifying changes from 3.0.1, is dead.
  • the Publishing Working Group is working on a collection of specifications known as Web Publications (WP), Packaged Web Publications (PWP), and EPUB 4. What these specifications represent is extremely complicated to describe. With a daily observation of the activities of the Working Group, I still can't firmly say what they're up to, even if I am already convinced that some technological choices (for instance JSON-LD for manifests) are highly questionable and do not « lead Publishing to its full Web potential », to paraphrase the famous W3C motto. It must also be said that the EPUB 3.1 hiatus in the EPUB3 CG shakes the EPUB 4 plan to the ground, since it's now extremely clear the ebook market is not ready at all to move to yet another EPUB version, potentially incompatible with EPUB 3.x (for the record, backwards-compatibility in the EPUB world is a myth).
  • the original sins of EPUB quoted above, including the two missing major features quoted in the second paragraph of the present article, are a minor requirement only. Editability of EPUB, one of the greatest flaws of that ecosystem, is still not a first-class requirement if not a requirement at all. Convergence with the Web is severely encumbered by personal agendas and technical choices made by one implementation vendor for its own sake; the whole W3C process based on consensus is worked around not because there is no consensus (the WG minutes show consensus all the time) but mostly because the rendering engine vendors are still not in the loop and their potential crucial contributions are sadly missed. And they are not in the loop because they don't understand a strategy that seems decorrelated from the Web; the financial impact of any commitment to the Publishing@W3C is then an understandable no-go.
  • the original design choices of EPUB, using painful-to-edit-or-render XML dialects, were also an original sin. We're about to make the same mistake, again and again, either retaining things that partly block the software ecosystem or imagining new silos that won't be editable nor grokable by a Web Browser. Simplicity, Web-centricity and mainstream implementations are not in sight.

Since the whole organization of Publishing@W3C is governed by the merger agreement between IDPF and W3C, I do not expect to change anyone's mind with the present article. I only felt the need to express my opinion, in both public and private fora. Unsurprisingly, the feedback to my private warnings was fairly negative. In short, it works as expected and I should stop spitting in the soup. Well, if that works as expected, the expectations were pretty low, sorry to say, and were not worth a merger between two Standard Bodies.

I have then decided to work on a different format for electronic books, called WebBook. A format strictly based on Web technologies and when I say "Web technologies", I mean the most basic ones: html, CSS, JavaScript, SVG and friends; the class of specifications all Web authors use and master on a daily basis. Not all details are decided or even ironed, the proposal is still a work in progress at this point, but I know where I want to go to.

I will of course happily accept all feedback. If people like my idea, great! If people disagree with it, too bad for me but fine! At least during the early moments of my proposal, and because my guts tell me my goals are A Good Thing™️, I'm running this as a Benevolent Dictator, not as a consensus-based effort. Convince me and your suggestions will make it in.

I have started from a list of requirements, something that was never done that way in the EPUB world:

  1. one URL is enough to retrieve a remote WebBook instance, there is no need to download every resource composing that instance

  2. the contents of a WebBook instance can be placed inside a Web site’s directory and are directly readable by a Web browser using the URL for that directory

  3. the contents of a WebBook instance can be placed inside a local directory and are directly readable by a Web browser opening its index.html or index.xhtml topmost file

  4. each individual resource in a WebBook instance, on a Web site or on a local disk, is directly readable by a Web browser

  5. any html document can be used as content document inside a WebBook instance, without restriction

  6. any stylesheet, replaced resource (images, audio, video, etc.) or additional resource useable by a html document (JavaScript, manifests, etc.) can be used inside the navigation document or the content documents of a WebBook instance, without restriction

  7. the navigation document and the content documents inside a WebBook instance can be created and edited by any html editor

  8. the metadata, table of contents contained in the navigation document of a WebBook instance can be created and edited by any html editor

  9. the WebBook specification is backwards-compatible

  10. the WebBook specification is forwards-compatible, at the potential cost of graceful degradation of some content

  11. WebBook instances can be recognized without having to detect their MIME type

  12. it’s possible to deliver electronic books in a form that is compatible with both WebBook and EPUB 3.0.1

I also made a strong design choice: the Level 1 of the specification will not be a fit-all-cases document. WebBook will start small, simple and extensible, and each use case will be evaluated individually, sequentially and will result in light extensions at a speed the Publishing industry can bear with. So don't tell me WebBook Level 1 doesn't support a given type of ebook or is not at parity level with EPUB 3.x. It's on purpose.

With that said, the WebBook Level 1 is available here and, again, I am happily accepting Issues and PR on github. You'll find in the spec references to:

  • « Moby Dick » released as a WebBook instance
  • « Moby Dick » released as an EPUB3-compatible WebBook instance
  • a script usable with Node.js to automagically convert an EPUB3 package into an EPUB3-compatible WebBook

My EPUB Editor BlueGriffon is already modified to deal with WebBook. The next public version will allow users to create EPUB3-compatible WebBooks.

I hope this proposal will show stakeholders of the Publishing@W3C activity another path to greater convergence with the Web is possible. Should this proposal be considered by them, I will of course happily contribute to the debate, and hopefully the solution.

Tuesday 3 January 2017

One BlueGriffon

There are - there were - two BlueGriffons: the Web Editor (aka BlueGriffon) and the EPUB2/EPUB3 editor (aka BlueGriffon EPUB Edition). With forthcoming v2.2, these are going to merge into a single product so they will never be out of sync again. Technically, the merge is already achieved and the next days will be spent on testing/ironing. Licenses for each product will work with that v2.2. BlueGriffon licenses will, as before, enable the commercial features of the product while licenses of the EPUB Edition will enable the commercial features AND EPUB2/EPUB3 editing. We'll also add an upgrade at special cost from the former to the latter. Stay tuned!

BlueGriffon 2.2

Tuesday 26 February 2013

EPUB NG

Following the W3C Workshop on electronic books in NYC two weeks ago, Dave Cramer (Hachette), Hadrien Gardeur (Feedbooks) and myself (Disruptive Innovations) have started a new Google Group called EPUB NG. Don't misunderstand us, it's called EPUB New Generation only because we needed a name and we start from what's available on the market right now, EPUB3. We're not forking, we're not doing a secret thing, we only needed a space where we could start discussions about the largest issues I found in current specs and what Dave recently called EPUB Zero.

So if you're interested in throwing ideas about a new, simpler, lighter format for electronic books more in line with W3C standards and Web habits, start reading us and ping one of us to request an invite. Please detail your affiliation and background in the electronic books space. Thanks!

Tuesday 30 October 2012

EPUB3 fun #12

The EPUB 3 Content Documents spec reads:

The EPUB 3 CSS Profile includes @media and @import rules with media queries as defined in the Media Queries [MediaQueries] specification.

But it says nothing about stylesheets linked through a <link> element and Media Queries! So, normatively per EPUB 3.0, <link> elements with Media Queries are currently forbidden...

Tuesday 23 October 2012

BlueGriffon EPUB Edition released

I am happy and proud to let you know I just released BlueGriffon EPUB Edition, the very first Wysiwyg cross-platform editor for EPUB2 and EPUB3 ebooks offering full UI-based control on EPUB packages, including all of EPUB2 and EPUB3 metadata. A more complete list of features and screenshots are available. Implemented with a permanent requirement of conformance to the IDPF specifications, it took more than a year to emerge from the original BlueGriffon Web editor because it's not only some light zip-package management over (x)html editing. EPUB metadata are complex to author and edit, very complex, and it's not a surprise to me if BlueGriffon EPUB Edition is alone right now in that segment of the market.

On the Mozilla side, this is quite good news I must say. Most current EPUB readers and authoring tools are based on WebKit or the rendering engine inside Apple Pages. BlueGriffon EPUB Edition shows that Gecko is a 100% viable solution as a rendering engine for EPUB. It also shows that XUL is still a superb technology allowing very complex consumer- or business-oriented applications.

To celebrate the launch of BlueGriffon EPUB Edition, a discount coupon is available until the 25th 00:00am Pacific time, giving you a 25€ discount per license purchased.

Tuesday 9 October 2012

BlueGriffon EPUB Edition ETA

If everything goes well, BlueGriffon EPUB Edition will be released 23-oct-2012!

OS X, Windows, Linux. Full EPUB2 and EPUB3 metadata support. epub:type support. CSS Pro Editor. SVG Editor. And more! Stay tuned!

Friday 14 September 2012

EPUB3 fun #11, not all metadata are metadata

Excerpt from EPUB 3 Publications section 4.3.2 (strong emphasis mine):

Cardinality: In the metadata section: zero or more; Attached to other metadata: zero or one

Other metadata? Outside of <metadata>?!? I don't understand. At all.

It can't be related to the META-INF/metadata.xml file since section 2.5.4 of EPUB3 Container Format reads (strong emphasis mine):

This file, if present, must be used for container-level metadata. This version of the OCF specification does not specify any container-level metadata.

And the other sections of the OPF file do not contain metadata but a manifest, a spine and other stuff but no metadata...

I think - but I can't be sure - it in fact means:

Cardinality: as a primary expression (i.e. applies to the Publication): zero or more; as a subexpression (i.e. refining a primary expression or another subexpression): zero or one for the refined expression or subexpression

Anyway, the original prose is very badly worded, to say the least. Carefully reading the spec, I have no idea what "other metadata" are, so there is at least an important definition (sic) here that remains undefined...

EPUB3 fun #10, all your chains of ID/IDREF are belong to us

Remember when I said the following?

any mechanism based on ID/IDREF introduces an extra keyword, the ID of the element ("mainauthor" here)... Asking a non-techie user to provide an ID is clearly suboptimal in terms of UX. That means the editing environment should pick one ID for the author, at the risk of using a human language not understood by the package's author or even a meaningless random ID.

Seems the situation is even worse than I expected... The EPUB3 Publications specification explicitly allows chains of subexpressions through ID/IDREF mechanisms and refines attributes. I guess it'll be clearer with this example:

<dc:identifier id="bookID">
B64F547F-F2D1-4ACE-B8D2-D164785CDEB8
</dc:identifier>
<!-- meta element below refines the dc:identifier element above -->
<meta refines="#bookID"
property="meta-auth"
id="meta-authority1">My metadata Authority</meta>
<!-- link element below refines the meta element above... -->
<link refines="#meta-authority1"
rel="xml-signature"
href="../META-INF/signature.xml#Signature"/>

One ID/IDREF is already hard enough to deal with in UI but a chain?!? I can find one single word only to describe such a mechanism in a Standard oriented towards creation of visual products, i.e. oriented towards Wysiwyg-ness and nicely composed UIs: ridiculous. EPUB3 metadata are clearly a deep weakness in the EPUB3 format because of the incredible complexity they introduce in editing environments. Honestly, I am not sure I will implement this; it would drastically uglify a UI that is already complex enough.
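
To give an idea of what an editing environment has to do just to know what a given element ultimately refines, here is a minimal Python sketch that walks the chain from the example above. It is only an illustration under my own assumptions: the function name refines_path is hypothetical, the sample wraps the three elements in a metadata element with the usual OPF and Dublin Core namespaces, and the error handling (dangling or circular references) is my choice, not something I am quoting from the spec.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"

SAMPLE = """<metadata xmlns="http://www.idpf.org/2007/opf"
          xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:identifier id="bookID">B64F547F-F2D1-4ACE-B8D2-D164785CDEB8</dc:identifier>
  <meta refines="#bookID" property="meta-auth"
        id="meta-authority1">My metadata Authority</meta>
  <link refines="#meta-authority1" rel="xml-signature"
        href="../META-INF/signature.xml#Signature"/>
</metadata>"""


def refines_path(element, by_id):
    """Return the ids visited when following refines="#id" from element."""
    path = []
    current = element
    while True:
        target = current.get("refines")
        if not target or not target.startswith("#"):
            # No (local) refines attribute: we reached a primary expression.
            return path
        ref_id = target[1:]
        if ref_id in path:
            raise ValueError("circular refines chain at #" + ref_id)
        path.append(ref_id)
        current = by_id.get(ref_id)
        if current is None:
            raise ValueError("dangling refines reference #" + ref_id)


metadata = ET.fromstring(SAMPLE)
# Index every element carrying an id attribute.
by_id = {el.get("id"): el for el in metadata.iter() if el.get("id")}
link = metadata.find("{%s}link" % OPF_NS)

if __name__ == "__main__":
    print(refines_path(link, by_id))
```

Starting from the link element, the walk goes through meta-authority1 and ends at bookID. The code is the easy part; the hard part is showing that chain to a non-techie user in a UI.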

Tuesday 4 September 2012

BlueGriffon EPUB3 Validator

I am glad to report my EPUB3 Validator add-on to Firefox is now available. A few details about it:

  • JavaScript only
  • not based on epubcheck's code at all, no Java code, does not rely on Java at all
  • my own xml content model validator
  • my own CSS parser and CSS Profile validator
  • Firefox >=12
  • Mac OS X, Windows, Linux
  • Free upgrades for life after purchase.

More information there.

Wednesday 29 August 2012

epub-samples

I have taken all packages available from the epub-samples EPUB3 repository and pushed them through my own EPUB3 validator. Some of these "clean" packages are not that clean, so here are the validation logs. Please note my validator also checks all stylesheets in the package according to the EPUB3 CSS Profile (AFAIK, no other validator is able to do that).

Monday 27 August 2012

EPUB3 fun #8, update

I said a few days ago that validation of EPUB3 Content Documents is impossible. That's actually incorrect and I apologize for that. The EPUB3 Content Documents spec normatively cites a RelaxNG schema for XHTML5 markup and datatypes. Still, I have no idea from reading the spec what flavor of xhtml5 it represents... I'm left with an RNG to understand if a given html5 element is implemented or not, and how, in EPUB3. Hum.

Saturday 25 August 2012

EPUB3 fun #9

Validation of meta elements in EPUB3 triggers a few surprises... In fact, that's not the meta elements themselves, it's the properties they carry. Let me explain:

  • EPUB3 meta elements carry a property attribute
  • a property is a keyword optionally preceded by a property vocabulary prefix and a colon
  • EPUB3 Media Overlays use the reserved media prefix mapped to namespace http://www.idpf.org/epub/vocab/overlays/#
  • the Media Overlays spec says the media:duration property has a cardinality of "exactly one for the Publication and for each Media Overlay"

In fact, that's quite painful and costly to test in the following case:

  1. suppose the namespace above is declared (yes, I know it's already reserved but it's not forbidden to do it !!!) on the package's prefix attribute with prefix foo
  2. suppose we have one <meta property="media:duration"> in the OPF for the Publication
  3. but also a <meta property="foo:duration"> for the same value...
  4. that's invalid per spec, see the last item of the list above

You can't compare the prefixes, that's not enough. You really need to rely on URIs and that's where validation is expensive.
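
Here is a minimal Python sketch of that URI-based comparison, under assumptions I want to state loudly: the function names are hypothetical, only the reserved media prefix is modeled (the other reserved prefixes are left out), the prefix attribute is assumed to be a whitespace-separated list of "prefix: URI" pairs, and the list of properties is assumed to hold only the meta elements that apply to the Publication.

```python
MEDIA_OVERLAYS_URI = "http://www.idpf.org/epub/vocab/overlays/#"

# Only the reserved prefix discussed above is modeled in this sketch.
RESERVED_PREFIXES = {"media": MEDIA_OVERLAYS_URI}


def parse_prefix_attribute(value):
    """Parse 'foo: http://a/ bar: http://b/' into {'foo': 'http://a/', ...}."""
    tokens = value.split()
    mapping = {}
    for name, uri in zip(tokens[0::2], tokens[1::2]):
        if not name.endswith(":"):
            raise ValueError("expected 'prefix:' but found %r" % name)
        mapping[name[:-1]] = uri
    return mapping


def expand_property(prop, prefixes):
    """Expand 'prefix:name' to a full URI; None if the prefix is unknown or absent."""
    prefix, sep, name = prop.partition(":")
    if not sep or prefix not in prefixes:
        return None
    return prefixes[prefix] + name


def count_publication_durations(prefix_attr, properties):
    """Count properties resolving to the Media Overlays duration URI."""
    prefixes = dict(RESERVED_PREFIXES)
    prefixes.update(parse_prefix_attribute(prefix_attr))
    target = MEDIA_OVERLAYS_URI + "duration"
    return sum(1 for p in properties if expand_property(p, prefixes) == target)


if __name__ == "__main__":
    declared = "foo: http://www.idpf.org/epub/vocab/overlays/#"
    props = ["media:duration", "foo:duration", "dcterms:modified"]
    # Two properties resolve to the same URI, so the "exactly one" rule is broken.
    print(count_publication_durations(declared, props))
```

A validator comparing the strings media:duration and foo:duration would see two different properties and miss the error. Every property has to be expanded to its URI before any cardinality can be checked.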

Conclusion: Media Overlays 3.0 have been released with EPUB3. The associated media prefix should be dropped and its related properties should be prefixless in EPUB3. If the issue is the media:narrator property too close to a value in another property vocabulary, it should be changed.

Tuesday 21 August 2012

EPUB3 fun #8

EPUB3 is based on the xml serialization of HTML5. When I say HTML5, I really mean the W3C version. But there is a problem; that also happens with many other specs normatively referenced by EPUB3 but let's focus on HTML5 right now: the link to the HTML5 normative reference has for href http://www.w3.org/TR/html5/ . In other terms, the latest version (WD, LCWD, CR, PR, REC) published by W3C.

What's the version to consider for the implementation of an EPUB3 reader or editor? I have no idea. What has changed between the time the HTML5 WD was normatively referenced and now? I have no idea. What if for instance an "at-risk" feature is dropped in a future WD of the spec? I have no idea.

This is not only a theoretical issue, it has a deep and immediate practical impact: validation of the Content Documents inside an EPUB3 ebook is impossible. More globally, full validation of an EPUB3 package is then impossible.

EPUB3 fun #7

Hum, yet another bug in an EPUB3 spec... Excerpt (verbatim, nothing added or removed) from section 2.1.3.1.3 of EPUB Content Documents 3.0:

The prefix attribute

The prefix attribute definition is unchanged, but the attribute is defined to be in the namespace http://www.idpf.org/2007/ops when used in Content Documents.

Unchanged from what?!?

Friday 10 August 2012

EPUB3 fun #6

In the EPUB ebook world, there's one sentence I heard so many times from so many different sources I did not even feel the need to verify it:

"Validation of EPUB is extremely important and we heavily rely on the EPUB Validator"

First thought: "excellent!". People need to distribute validated packages because they don't want to suffer big issues on some of the many ebook readers on the market. Excellent.

Unfortunately, after a closer look, the situation seems to me a bit different from that idealistic (I could even say utopian) view... Let me explain (please note I have spent time carefully reading the epubcheck source code for instance):

  • if you count the number of major Standards (de jure, de facto, proprietary) involved in the validation of a given *.epub EPUB3 file, you'll find a few dozen of them. And again, only the major ones, the ones a serious industrial validator must absolutely validate against.
  • some validations are complex and expensive, for instance a serious validation of encryption.xml or signatures.xml will involve much more than just a RNG-based validation of the XML instance... Validation of external property vocabularies can be extremely tricky or painful/expensive to implement.
  • the first step of validation is related to the ZIP package itself, and the epub30-ocf spec has a few very technical requirements there. I sincerely doubt all of them are validated, I sincerely doubt EPUB3 packages all around the world currently pass or will pass all the conformance requirements there.
  • in the W3C space alone, an EPUB3 validator must validate against at least a dozen specs.
  • the complexity of some of these specs is huge, drastically impacting users (ebook authors) either on a learning curve's basis or on a financial one. Or both.

EPUB3 is probably too complex as a spec. In EPUB2, most people did not understand the difference between spine, ncx and guide. The most common question was "why do we have multiple tables of contents and which one is the good one?". In EPUB3, it's a bit better but only a bit. We have landmarks, multiple tables of contents and still a spine. Personally I still wonder why there is a manifest of files; probably only because of the MIME types. Hey, even OSes rely on file extensions to infer a MIME type!!!

I'd love to see appear an EPUB4 strictly xhtml5/svg/mathml-based. No other XML namespace allowed. No more OCF, OPF, NCX. Manifest of files coming from the ZIP list of entries. Direct inclusion of non-conflicting property vocabularies. No need to have an OCF, we can have a nice index.xhtml. Rely on a vocabulary of classes/IDs/roles (not the ARIA role...) expressing extra constraints/behaviours on existing html5 elements. Would be enough for most ebook authors and publishers and would drastically decrease the complexity of the publishing chain, IMHO.
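
To show how little is needed for the "manifest of files coming from the ZIP list of entries" idea, here is a minimal Python sketch. It is only an illustration of the proposal, not an existing format or tool: the function name manifest_from_zip and the small extension table are hypothetical, and a real implementation would need a far more complete table.

```python
import io
import posixpath
import zipfile

# Hypothetical, deliberately tiny extension-to-MIME-type table.
EXTENSION_TYPES = {
    ".xhtml": "application/xhtml+xml",
    ".html": "text/html",
    ".css": "text/css",
    ".svg": "image/svg+xml",
    ".png": "image/png",
    ".jpg": "image/jpeg",
}


def manifest_from_zip(data):
    """Infer a {path: media type} manifest from the ZIP list of entries."""
    manifest = {}
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for name in zf.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            ext = posixpath.splitext(name)[1].lower()
            manifest[name] = EXTENSION_TYPES.get(ext, "application/octet-stream")
    return manifest


def build_sample():
    """Build a toy in-memory package (hypothetical file names, dummy content)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("index.xhtml", "<html/>")
        zf.writestr("styles/main.css", "body { margin: 1em; }")
        zf.writestr("images/cover.png", b"dummy image bytes")
    return buf.getvalue()


if __name__ == "__main__":
    for path, media_type in sorted(manifest_from_zip(build_sample()).items()):
        print(path, media_type)
```

The trade-off is the one mentioned above: file extensions become the only source of truth for media types, exactly as they already are for most OSes.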

Wednesday 8 August 2012

EPUB3 fun #5

Dates and time... Dates and time are painful, dates and time are complex, "dates and time are the most complex objects we use on a daily basis" used to say Reuters' Misha Wolf in the HTML WG in the good ol'days.

Excerpt from section 2.2.7 of OPF 2.0.1:

The date element has one optional OPF event attribute. The set of values for event are not defined by this specification; possible values may include: creation, publication, and modification.

Excerpt from section 3.4.6 of EPUB Publications 3.0:

The date element must only be used to define the publication date of the EPUB Publication.
...
Only one date element is allowed.

Ahem... So EPUB3 allows only two dates for a package: the publication date and the last-modification date (through a <meta property="dcterms:modified"> element). It's impossible to preserve the date of creation or various modification dates. Is it only me, or is this change from EPUB2 to EPUB3 weird?

Please note that to insert a publication date into an OPF document ready for distribution, you must modify it and rezip the whole package. So if an OPF document does have a <dc:date> publication element, its content value must be equal to the last-modified date!!! In other terms, this is totally useless. Wow.

Tuesday 7 August 2012

EPUB3 fun #4

Excerpt from section 4.2.4 from EPUB Publications 3.0:

<package …
prefix="foaf: http://xmlns.com/foaf/spec/
dbp:  http://dbpedia.org/ontology/">
…
</package>

Excerpt from an unnumbered (sic) section of EPUB3 Fixed Layout Documents:

Implementors should note that future revisions of [Publications30] may establish the vocabulary represented by the URI http://www.idpf.org/vocab/rendition/# as a reserved vocabulary. In this case, the result will be that a) explicit mapping declaration using the prefix attribute will no longer be applicable, and b) the prefix ‘rendition’ will be reserved for this vocabulary. Future revisions of [Publications30] may also integrate the properties defined here into the Package Document default vocabulary. In this case the properties defined herein will be allowed to occur in Package Documents without a prefix.

In other terms, an attribute prefix="foobar: http://www.idpf.org/vocab/rendition/#" is needed on the package element for the time being to be able to parse <meta property="foobar:orientation">. But we're already warned that that foobar may become rendition w/o the need to declare the corresponding prefix on the package element. And well, even rendition could be itself dropped in the future. So in the long run, it will probably be <meta property="orientation"> w/o property prefix or prefix declaration.

<troll>I think it is not complex enough. Prefixes should be declared by extra meta elements themselves using a prefixed property, just to be able to have a few circular references...</troll>

Hey guys, why don't you finish shaking the shaker and get rid of the rendition prefix for good right now? Just for the record, I have in front of me here two EPUB3 documents: the first one uses the rendition property prefix without prefix declaration, the second uses the rendering property prefix instead of rendition but since the URI is ok, this is valid. In other terms, it's a mess already... I'm sure many EPUB3 readers will look for rendition and only rendition, failing to correctly catch and resolve the prefix at all. Of course, the situation will get worse if the prefix becomes reserved or even integrated! Pfff....
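For the record, here is roughly what resolving the prefix properly looks like, as a Python sketch. It follows the strict reading only: a property counts as a rendition property if its prefix is mapped to the rendition URI by the prefix attribute of the package element. The helper names and the sample document are hypothetical, and the sketch deliberately does not handle a reserved or default-vocabulary rendition, since that is only a possible future.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"
RENDITION_URI = "http://www.idpf.org/vocab/rendition/#"

# Hypothetical package using an arbitrary prefix name mapped to the rendition URI.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0"
         prefix="foobar: http://www.idpf.org/vocab/rendition/#">
  <metadata>
    <meta property="foobar:orientation">landscape</meta>
  </metadata>
</package>"""


def parse_prefix_attribute(value):
    """Turn 'foaf: http://a/ dbp: http://b/' into {'foaf': 'http://a/', 'dbp': 'http://b/'}."""
    tokens = (value or "").split()
    mapping = {}
    # Tokens alternate between 'name:' and the URI that name maps to.
    for name, uri in zip(tokens[0::2], tokens[1::2]):
        if name.endswith(":"):
            mapping[name[:-1]] = uri
    return mapping


def rendition_properties(opf_text):
    """Collect meta properties whose declared prefix resolves to the rendition URI."""
    root = ET.fromstring(opf_text)
    prefixes = parse_prefix_attribute(root.get("prefix"))
    found = {}
    for meta in root.iter("{%s}meta" % OPF_NS):
        prefix, sep, local = (meta.get("property") or "").partition(":")
        # Match on the URI the prefix maps to, never on the prefix string itself.
        if sep and prefixes.get(prefix) == RENDITION_URI:
            found[local] = (meta.text or "").strip()
    return found


print(rendition_properties(SAMPLE_OPF))
```

With this strict reading, a document using rendition:orientation without any prefix declaration yields nothing at all, which is exactly the first of the two documents mentioned above.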

Monday 6 August 2012

EPUB3 fun #3

Clearly, Wysiwyg-editability was, as too often in W3C specs, the last concern on EPUB3 spec editors' mind... EPUB3 is not a W3C spec but an IDPF spec, but the result is the same, unfortunately. I have a concrete example here: in EPUB2, here's how an author is specified in the OPF metadata of the package:

<dc:creator opf:file-as="Murakami, Haruki" opf:role="aut">Haruki Murakami</dc:creator>

That is quite easy to map into a simple and efficient UI.

But in EPUB3, that's pretty different since the properties refining the <dc:creator> element are expressed in standalone <meta> elements using an ID/IDREF reference mechanism:

<dc:creator id="mainauthor">Haruki Murakami</dc:creator>
<meta refines="#mainauthor" property="role" scheme="marc:relators" id="role">aut</meta>
<meta refines="#mainauthor" property="alternate-script" xml:lang="ja">村上 春樹</meta>
<meta refines="#mainauthor" property="file-as">Murakami, Haruki</meta>
<meta refines="#mainauthor" property="display-seq">1</meta>

Please note the alternate-script property that was not expressible at all in EPUB2. Please also note the display-seq property that allows specifying a rendering order for the element (in the list of creators). These are cool features but...

  1. any mechanism based on ID/IDREF introduces an extra keyword, the ID of the element ("mainauthor" here)... Asking a non-techie user to provide an ID is clearly suboptimal in terms of UX. That means the editing environment should pick one ID for the author, at the risk of using a human language not understood by the package's author or even a meaningless random ID.
  2. there can be multiple alternate-script properties and that is really, really tricky to offer in a simple and clean UI.
  3. CSS cannot style the above since there is no ID/IDREF mechanism in CSS; I understand that it was difficult to change the content model of DC elements but still, this is really a pity.
  4. the display-seq property is absolutely suboptimal: since <dc:creator> elements can be refined by a role property, the display sequence should apply for all elements of the same localName having the same role and not only all elements of the same localName; having the display sequence's value in PCDATA instead of inside an attribute on the <dc:creator> itself seems to me a design error. Even worse, the spec says in section 4.3.2 that "When the display-seq property is attached to some, but not all, of the members in a set, only the elements identified as having a sequence should be included in any rendering". Weird!
  5. dealing with such metadata is expensive: for each <dc:creator> found, you have to look for all <meta> elements refining it. Please note the spec does not say what happens to a <meta> element when the IRI in the refines attribute has no target (I think such an element should be made invalid).
  6. unless I missed it, I think nothing is said about conflicting properties; for instance a creator having two roles specified, "aut" and "pmn".

All in all, EPUB3 creator/contributor metadata are painful to deal with and edit. Using ID/IDREF mechanisms in a specification (also) made for authoring environments, while we have the full power of XML to avoid them, seems to me a strategic error.
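To give an idea of the extra work the refines mechanism implies for a tool, here is a minimal Python sketch that gathers the <meta> refinements of each <dc:creator>. The function name is hypothetical, the sample reuses the Murakami metadata shown earlier plus one made-up orphan <meta>, and silently ignoring that orphan is my own choice since the spec says nothing about it.

```python
import xml.etree.ElementTree as ET

OPF = "{http://www.idpf.org/2007/opf}"
DC = "{http://purl.org/dc/elements/1.1/}"

# Metadata from the example above, plus a hypothetical <meta> whose target does not exist.
SAMPLE = """<metadata xmlns="http://www.idpf.org/2007/opf"
          xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:creator id="mainauthor">Haruki Murakami</dc:creator>
  <meta refines="#mainauthor" property="role" scheme="marc:relators" id="role">aut</meta>
  <meta refines="#mainauthor" property="alternate-script" xml:lang="ja">村上 春樹</meta>
  <meta refines="#mainauthor" property="file-as">Murakami, Haruki</meta>
  <meta refines="#mainauthor" property="display-seq">1</meta>
  <meta refines="#nobody" property="file-as">Orphan</meta>
</metadata>"""


def creators_with_refinements(metadata_text):
    """Attach to each dc:creator the values of the meta elements refining it."""
    metadata = ET.fromstring(metadata_text)
    # First pass: index every refinement by the ID it points to.
    by_target = {}
    for meta in metadata.findall(OPF + "meta"):
        refines = meta.get("refines") or ""
        if refines.startswith("#"):
            entry = (meta.get("property"), (meta.text or "").strip())
            by_target.setdefault(refines[1:], []).append(entry)
    # Second pass: resolve each creator's ID against that index.
    creators = []
    for creator in metadata.findall(DC + "creator"):
        props = {}
        for prop, value in by_target.get(creator.get("id"), []):
            # Keep lists: nothing prevents two roles or two alternate scripts.
            props.setdefault(prop, []).append(value)
        creators.append({"name": (creator.text or "").strip(), "refinements": props})
    return creators


for creator in creators_with_refinements(SAMPLE):
    print(creator["name"], creator["refinements"])
```

Compare that with EPUB2, where reading two attributes on the <dc:creator> element itself was enough.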

Don't get me wrong, I do understand why the <meta> and its refines attribute were introduced. But I think the cost to pay for that is too high.

  1. we probably don't need a potentially multivalued alternate-script property. I think a monovalued original-script attribute on <dc:creator> or <dc:contributor> was probably enough.
  2. file-as and role were perfect as attributes in EPUB2. Having the possibility to declare a scheme for role seems to me useless: most package authors will use MARC roles anyway for compatibility reasons with EPUB2. The file-as meta property is, compared to the file-as attribute of EPUB2, useless bloat in terms of footprint, speed of access, editability and maintainability.
  3. the display-seq property seems to me far from meeting package authors' expectations. Its specification makes it painful to deal with since creators/contributors can become hidden if they have no display sequence while others have one. This property seems to me useless and the ordering of similar elements inside the <metadata> element is most certainly enough. I even bet that this feature will be drastically underused.
  4. all in all, that says the <meta> element of EPUB3 is a suboptimal solution for problems touching only an extreme minority of package authors. The complexity it induces is, in my humble opinion, counter-productive and EPUB2 metadata were better designed and specified.
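The display-seq rule quoted from section 4.3.2 can be sketched in a few lines of Python. The function name and the input shape (a list of name and sequence pairs, in document order, with the sequence already converted to an integer) are assumptions of mine. Falling back to document order when nobody has a sequence is also an assumption, in line with the remark above that the ordering inside <metadata> should be enough.

```python
def display_order(creators):
    """creators: list of (name, display_seq or None) tuples, in document order."""
    sequenced = [
        (seq, index, name)
        for index, (name, seq) in enumerate(creators)
        if seq is not None
    ]
    if not sequenced:
        # Nobody has a display-seq: assume plain document order.
        return [name for name, _ in creators]
    # Some members have a display-seq: the others are dropped from the rendering.
    return [name for _, _, name in sorted(sequenced)]


print(display_order([("First Author", None), ("Second Author", 2), ("Third Author", 1)]))
# ['Third Author', 'Second Author']
```

The first author of the document vanishes from the rendering just because the two others carry a sequence number.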

EPUB3 fun #2

Aaaaah, forward compatibility... Here are two interesting excerpts from the EPUB Publications 3.0 specification.

In section 3.4.14: Authors may include the guide element in the Package Document for EPUB 2 Reading System forwards compatibility purposes

In section 3.4.1: The (version) attribute (of the package element) must have the value 3.0 to indicate compliance with this version of the specification.

Let's look now at two bits coming from the Open Packaging Format (OPF) 2.0.1 specification:

In section 1.4.12: the version attribute of the package element is specified with a value of 2.0

In section 1.3.2: In addition, to be processed as an OPF 2.0 package, a version attribute with a value of 2.0 must be specified on the package element. A package element that omits the version attribute must be processed as an OEBPS 1.2 package

In summary, <package version="3.0"> is absolutely needed for an EPUB to be parsed as EPUB3, and EPUB2 absolutely requires <package version="2.0"> since the absence of the version attribute defaults to OEBPS 1.2!!! Conclusion: the "EPUB 2 Reading System forwards compatibility" described in the first quote above is absolutely illusory. Even worse, compatibility can be achieved if and only if EPUB Reading Systems deliberately ignore the version attribute on the package element... Ooops, to say the least.
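Here is how the quoted rules translate for a Reading System that follows them to the letter, as a small Python sketch. The function name and the returned labels are hypothetical, and returning "unknown" for any other value is my assumption since the excerpts do not cover that case.

```python
import xml.etree.ElementTree as ET


def declared_format(opf_text):
    """Map the version attribute of the package element to a format, per the excerpts."""
    version = ET.fromstring(opf_text).get("version")
    if version is None:
        # OPF 2.0.1, section 1.3.2: no version attribute means OEBPS 1.2.
        return "OEBPS 1.2"
    if version == "2.0":
        return "OPF 2.0"
    if version == "3.0":
        return "EPUB 3.0"
    # Not covered by the quoted excerpts; assumption for this sketch.
    return "unknown"


print(declared_format('<package xmlns="http://www.idpf.org/2007/opf" version="3.0"/>'))
# EPUB 3.0
```

A strict EPUB2 Reading System built this way never takes the "OPF 2.0" branch for an EPUB3 package, with or without a guide element in it.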
