LibreOffice and EPUB
By glazou on Thursday 1 February 2018, 10:31 - Standards
LibreOffice 6.0 is now available. And it's through the inevitable Korben that I discovered this morning that it has a built-in EPUB export. So let's take a closer look at that new beast and evaluate how it deals with that painful task. Conformant EPUB? And which version of EPUB? Reusable XHTML and CSS? We'll see.
After installation (on a Mac), I created a new trivial text document; it contains a paragraph, a level 1 header, an image, a table, and an unordered list of three items. I did not touch fonts, styles, margins, etc. at all.

Then I discovered LibreOffice now has two new menu items: File > Export As... > Export directly as EPUB and File > Export As... > Export as EPUB...
Export directly as EPUB
It directly opens a filepicker to select a destination *.epub file. Let's unzip the saved package and take a look at its guts:
- the mimetype file is correctly placed as first file in the package and it's correctly stored without compression
- other files are correctly stored using Deflate
- the META-INF/container.xml is stored in last position in the zip, which is probably a mistake
- the OPF file says it's an EPUB 3.0 package and its metadata are clean; AFAICT, the OPF file is conformant to the spec
- XML and XHTML files in the package are serialized without carriage returns (except for one after the XML prolog) or indentation...
- an NCX is present
- the Navigation Document (called toc.xhtml) and the NCX live side by side in an OEBPS folder (sigh)
- there is an empty OEBPS/styles/stylesheet.css file
- the content files are in an OEBPS/sections folder
- that folder contains 2 files (!), section001.xhtml and section002.xhtml
- looking at these files, LibreOffice seems to have split the original document at section breaks, hence the two sections found in the EPUB package
- there is no title element in these files
- there is clearly a problem with exported CSS styles, the body of each generated document having no margins or paddings. And since there is no CSS reset either...
- the set of LibreOffice styles (the leftmost dropdown in the toolbar) is not exported to CSS; the whole export relies on CSS inline styles (style attributes) and not on classes
- the original document uses the "Liberation Serif" font, which is not registered under that name in the OS X Font Book (old issue well known in the OOXML world...). The final rendition in a browser is then buggy, font-wise. The font-family declarations in the document don't use a fallback to serif.
- there is a very weird font-effect: outline property serialized on all paragraphs in table cells
- strangely again, all these paragraphs have text-decoration: overline; text-shadow: 1px 1px 1px #666666; while the original text is neither overlined nor shadowed
- when a paragraph (a p in terms of OOXML) contains one single run of text (an r in OOXML), the output could be optimized by getting rid of the span and adding its inline styles to the parent paragraph. The output is too verbose and will trigger issues in HTML editors, WYSIWYG or not.
- the margin values in the document use a mix of inches and pixels, which is kind of weird
- the image in the original document is lost in the EPUB package
- headers are not generated as h1, h2, ... but as p elements with styles.
- the EPUB version does not correctly deal with the unordered list and all list items become regular paragraphs. No ol or ul, no bullet, no counter, no list-style-type. Semantics is lost.
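The container-level checks at the top of that list (mimetype first and uncompressed, position of META-INF/container.xml) are easy to repeat with a few lines of Python. This is only a rough sketch using the standard zipfile module; the function names check_epub_container and build_sample are mine, and the sample package built in memory is a made-up stand-in, not LibreOffice's actual output.

```python
import io
import zipfile


def check_epub_container(data: bytes) -> dict:
    """Report a few container-level facts about an EPUB given as raw bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        infos = zf.infolist()  # entries in central-directory order
        names = [info.filename for info in infos]
        mimetype = next((i for i in infos if i.filename == "mimetype"), None)
        return {
            # header_offset == 0 means the entry is physically first in the archive
            "mimetype_first": mimetype is not None and mimetype.header_offset == 0,
            "mimetype_stored": mimetype is not None
            and mimetype.compress_type == zipfile.ZIP_STORED,
            "mimetype_value_ok": mimetype is not None
            and zf.read(mimetype) == b"application/epub+zip",
            # index in the listing, or -1 if the file is missing
            "container_index": names.index("META-INF/container.xml")
            if "META-INF/container.xml" in names
            else -1,
            "entry_count": len(names),
        }


def build_sample() -> bytes:
    """Build a tiny hypothetical package that mimics the layout described above."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("OEBPS/content.opf", "<package/>",
                    compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr("META-INF/container.xml", "<container/>",
                    compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


if __name__ == "__main__":
    for key, value in check_epub_container(build_sample()).items():
        print(key, value)
```

To run it against a real export, read the *.epub file as bytes and pass them to check_epub_container. A container_index equal to entry_count - 1 corresponds to the "stored in last position" observation.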
Firefox Quantum viewing the resulting section002.xhtml file. You can clearly see where the html+CSS export is buggy:

How iBooks sees that EPUB:

Export as EPUB...
Aaah, that one is quite different since it first opens the following dialog:

The dialog offers the following choices:
- export as EPUB2 or EPUB3 (nice!)
- Split at Page Breaks or Headings (very nice feature but why not also a "Don't split" option?)
and validating the dialog goes to the aforementioned *.epub filepicker.
Conclusion
This is an excellent start, really, and splitting the document at headers or page breaks is an excellent idea. Unfortunately, there are too many holes in the xhtml+CSS export at this time to make it really usable unless your document has almost unstyled paragraphs only. Some generated styles (overline?!?) are not present in the original document; the export generates only paragraphs and tables, losing the header and list semantics; the LibreOffice styles are not serialized in a CSS stylesheet (bug?); and more. This will help some individuals but I am not sure it will help EPUB publication chains, at least for now.
Update: Wow. I added an extra test: I compared the result of "Export to XHTML" and the XHTML inside an "Export to EPUB". In the former, styles are correctly exported as a stylesheet, classes are correctly used, h1 and ol/li are correctly used, the image is preserved, and the general rendering is MUCH better. So the Export to EPUB has one of the two following problems: it reuses the "Export to XHTML" code and splitting introduced a lot of bugs, OR it has its own export-to-xhtml code and it's a mistake since the existing one does quite a decent job...
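That kind of comparison can be made a bit more systematic by counting elements and style/class attributes in each file. The sketch below is one possible way to do it with Python's xml.etree.ElementTree; tally_markup is a name I made up, and the two sample strings are hand-written illustrations of the two shapes of markup described above, not the real LibreOffice output.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tally_markup(xhtml: str) -> Counter:
    """Count element local names plus style/class attribute usage."""
    counts = Counter()
    for el in ET.fromstring(xhtml).iter():
        if not isinstance(el.tag, str):
            continue  # skip anything that is not a regular element
        local = el.tag.split("}", 1)[-1]  # drop the "{namespace}" prefix if any
        counts[local] += 1
        if "style" in el.attrib:
            counts["@style"] += 1
        if "class" in el.attrib:
            counts["@class"] += 1
    return counts


# Hypothetical sample: everything is a styled p/span, as in the EPUB export.
EPUB_LIKE = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><head/><body>'
    '<p style="font-size: 18pt"><span style="font-weight: bold">Title</span></p>'
    '<p style="margin-left: 0.5in"><span>Item one</span></p>'
    "</body></html>"
)

# Hypothetical sample: semantic elements and classes, as in the XHTML export.
XHTML_LIKE = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title></head><body>'
    '<h1 class="Heading1">Title</h1>'
    '<ul><li class="P1">Item one</li></ul>'
    "</body></html>"
)

if __name__ == "__main__":
    for label, doc in (("epub-like", EPUB_LIKE), ("xhtml-like", XHTML_LIKE)):
        counts = tally_markup(doc)
        print(label, {k: counts[k] for k in ("p", "span", "h1", "ul", "li", "@style", "@class")})
```

Feeding it section002.xhtml from the package and the file produced by "Export to XHTML" should make the missing h1/ul/li elements and the inline-style-only output visible at a glance.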
Second update: LibreOffice's trunk does a significantly better job: stylesheet is correctly generated, xhtml files are CR'd and indented correctly, images are preserved.

Comments
Hi. Did you report the issues to the Libreoffice bug tracker? https://www.libreoffice.org/get-hel...
@sam: https://bugs.documentfoundation.org...