LibeOffice 6.0 is now available. And it's through the inevitable Korben I discovered this morning it has a builtin EPUB export. So let's take a closer look at that new beast and evaluate how it deals with that painful task. Conformant EPUB? And which version of EPUB? Reusable XHTML and CSS? We'll see.

After installation (on a Mac), I created a new trivial text document; it contains a paragraph, a level 1 header, an image, a table, and a unordered list of three items. I did not touch at all fonts, styles, margins, etc.

Trivial text document in LibreOffice 6.0

Then I discovered LibreOffice now has two new menu items: File > Export As... > Export directly as EPUB and File > Export As... > Export as EPUB... .

Export directly as EPUB

It directly opens a filepicker to select a destination *.epub file. Let's unzip the saved package and take a look at its guts:

  • the mimetype file is correctly placed as first file in the package and it's correctly stored without compression
  • other files are correctly stored using Deflate
  • the META-INF/container.xml is stored in last position in the zip, which is probably a mistake
  • the OPF file says it's a EPUB 3.0 package and its metadata are clean ; AFAICT, the OPF file is conformant to the spec
  • XML and XHTML files in the package are serialized without carriage returns (if you except one after the XML prolog) or indentation...
  • a NCX is present
  • the Navigation Document (called toc.xhtml) and the NCX live side by side in a OEBPS folder (sigh)
  • there is a  empty OEBPS/styles/stylesheet.css file
  • the content files are in a OEBPS/sections folder
  • that folder contains 2 files (!) section001.xhtml and section002.xhtml
  • looking at these files, LibreOffice seems to have split the original document at section breaks, hence the two sections found in the EPUB package
  • there is no title element in these files
  • there is clearly a problem with exported CSS styles, the body of each generated document having no margins, paddings. And since there is no CSS-reset either...
  • the set of LibreOffice styles (the leftmost dropdown in the toolbar) are not exported to CSS; the whole export relies on CSS inline styles (style attributes) and not on classes
  • the original document uses the "Liberation Serif" font, that is not registered under that name into the OS X fontbook (old issue well known in the OOXML world...). The final rendition in a browser is then buggy, font-wise. The font-family declarations in the document don't use a fallback to serif.
  • there is a very weird font-effect: outline property serialized on all paragraphs in table cells
  • strangely again, all these paragraphs have text-decoration: overline; text-shadow: 1px 1px 1px #666666; while the original text is not overlined nor shadowed
  • when a paragraph (a p in terms of OOXML) contains one single run of text (a r in OOXML), the output could be optimized getting rid of a span and adding its inline styles to the parent paragraph. The output is too verbose and will trigger issues in html editors, Wysiwyg or not.
  • the margin values in the document use a mix of inches and pixels, which is kind of weird
  • the image in the original document is lost in the EPUB package
  • headers are not generated as h1, h2, ... but as p elements with styles.
  • the EPUB version does not correctly deal with the unordered list and all list items become regular paragraphs. No ol or ul, bullet, no counter, no list-style-type. Semantics is lost.

Firefox Quantum viewing the resulting section002.xhtml file. You can clearly see where the html+CSS export is buggy:

Firefox Quantum viewing the resulting section002.xhtml file

How iBooks sees that EPUB:

how iBooks sees that EPUB

Export as EPUB...

Aaah, that one is quite different since it first opens the following dialog:

Export as EPUB... dialog

The dialog offers the following choices:

  1. export as EPUB2 or EPUB3 (nice!)
  2. Split at Page Breaks or Headings (very nice feature but why not also a "Don't split" option?)

and validating the dialog goes to the aforementioned *.epub filepicker.

Conclusion

This is an excellent start, really, and splitting the document at headers or page breaks is an excellent idea. Unfortunately, there are too many holes in the xhtml+CSS export at this time to make it really usable unless your document has almost unstyled paragraphs only. Some generated styles (overline?!?) are not present in the original document, it generates only paragraphs and tables losing the header or list semantics, the LibreOffice styles are not serialized in a CSS stylesheet (bug?) and more. This will help some individuals but I am not sure it will help EPUB publication chains, at least for now.

Update: Wow. I added an extra test: I compared the result of "Export to XHTML" and the XHTML inside a "Export to EPUB". In the former, styles are correctly exported as a stylesheet, classes are correctly used, h1 and ol/li are correctly used, the image is preserved, and the general rendering is MUCH better. So the Export to EPUB has one of the two following problems: it reuses the "Export to XHTML" code and splitting introduced a lot of bugs, OR it has its own export-to-xhtml code and it's a mistake since the existing one does quite a decent job...

Second update: LibreOffice's trunk does a significantly better job: stylesheet is correctly generated, xhtml files are CR'd and indented correctly, images are preserved.