<Glazblog/>

Thursday 7 June 2018

Browser detection inside a WebExtension

Just for the record, if you really need to know about the browser container of your WebExtension, do NOT rely on StackOverflow answers... Most of them are based, directly or not, on the User Agent string. So spoofable, so unreliable. Some will recommend relying on a given API, implemented by Firefox and not Edge, or Chrome and not the others. In general, that's valid for a limited time only... You can't even rely on chrome, browser or msBrowser since there are polyfills for that to make WebExtensions cross-browser.

So the best and cleanest way is probably to rely on chrome.extension.getURL("/"). It can start with "moz", "chrome" or "ms-browser". Unlikely to change in the near future. Simple to code, works in both content and background scripts.
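
A minimal sketch of that check could look like the following. The helper name detectBrowserFromExtensionURL and the labels it returns are assumptions made for the example; only chrome.extension.getURL("/") and the three prefixes come from the paragraph above. Keep in mind that other Chromium-based browsers also serve extensions from chrome-extension:// URLs, so "chrome" really means "the Chromium family" here.

```javascript
// Hypothetical helper: maps the extension's base URL to a browser label.
function detectBrowserFromExtensionURL(url) {
  if (url.startsWith("moz")) return "firefox";      // moz-extension://...
  if (url.startsWith("chrome")) return "chrome";    // chrome-extension://...
  if (url.startsWith("ms-browser")) return "edge";  // ms-browser-extension://...
  return "unknown";
}

// Only meaningful inside a WebExtension (content or background script);
// the guard keeps the snippet harmless anywhere else.
if (typeof chrome !== "undefined" && chrome.extension &&
    typeof chrome.extension.getURL === "function") {
  console.log(detectBrowserFromExtensionURL(chrome.extension.getURL("/")));
}
```

Keeping the prefix test in a pure function means it can be exercised outside a browser, with the call to getURL made in exactly one place.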

My pleasure :-)

Wednesday 2 May 2018

Nominating Florian Rivoal for a seat at the W3C Advisory Board

The World Wide Web Consortium (W3C) is at crossroads. There are multiple reasons for that:

  • it's slow, extremely slow. Its Process, that rules the daily life of the Membership, cannot be changed fast even in front of a major hiccup. It's not an exaggeration to say that important problems do take years to fix, even when there is a clear, critical and immediate issue.
  • it's opaque and weirdly managed. Elections at the Advisory Board (AB) and the Technical Architecture Group (TAG) are the only elections I know where the scores of the candidates remain secret after the end of the election. Even the candidates, successful or not, don't know their scores. It's the only organization I know where votes of the Membership can be biased by changes to the requests during the course of a vote... It's the only Standards Committee I know where the management regularly interferes with the local Process. The finances of the W3C are also opaque, with a Staff that has seen a salary freeze for many years now, a non-incorporated structure (even as a Foundation) and extremely complicated relationships between the MIT-based foot of W3C and its European or Asian feet.
  • it's unable to acknowledge the fact that some crucial parts of the Open Web Platform (OWP) it voluntarily abandoned long ago are now gone. html, DOM and other major layers of the OWP are now in the smart hands of the WHATWG. As an example, nobody really cares about the W3C versions of html and DOM because they're not what's implemented, because they diverge from implementations; it's just not usable in a real production environment. A clear side-effect is a counter-productive state of war with WHATWG that affects everyone and everything.
  • the merger with IDPF, mostly done behind curtains and with little interaction with the Membership, is suboptimal, to say the least. The former IDPF has recreated its ivory tower inside the W3C. The original Charters of the Publishing Groups were not conformant to W3C Process; the spirit of these Groups itself is not conformant to the W3C spirit and usual way of producing Standards. As I said above, W3C is at crossroads but it has experience and expertise about delivering Standards to billions of people, like it or not.
  • the W3C Director, our Maaaaaster, is mostly gone. He's on a sabbatical now but he's been mostly absent from W3C daily activities for the last decade. Decisions, technical or process-wise, are always made « in the name of the Director », but the Director does not show up any more if you except big events like Plenary Meetings. In my 7.5 years tenure as CSS WG Co-chair, not a single spec transition conference call was held by the Director (and the CSS WG is one of the major groups of the W3C). The W3C is unable to acknowledge that absence and materialize it in its Process.
  • a part of Standardization relies on some absolutely crucial Invited Experts that invest their own budget for us. Even the attendance cost to the Plenary Meetings can be too much for some of them. We started discussing a sponsoring scheme for Invited Experts a decade ago and we're still nowhere at all.

So I think the reasons why I ran myself in the past for a seat at the W3C Advisory Board still stand. The W3C needs reforms, and probably a management change. But I wrote here a while ago, « I am now 50 years old, I have been contributing to W3C for, er, almost 22 years and that's why I will not run any more. We need younger people, we need different perspectives, we need different ways of doing, we need different futures. We need a Consortium of 2017, we still have a Consortium of 2000, we still have the people of 2000. »

I pinged Florian Rivoal about all of that. Florian is an extremely talented, multicultural, brilliant French engineer (but he also holds an MBA from INSEAD, often ranked #1 MBA in the whole world) based in Japan. He has been a crucial contributor to the CSS Working Group, the Publishing activity and many other areas of the daily W3C activities as an Advisory Committee representative for his various past employers. I do trust him, I like his vision, I like his diplomatic talent, appreciate that he deeply and truly cares for the future of the World Wide Web and the future of the W3C, and I love his technical expertise.

After a short chat, I told Florian that I wanted to nominate him for a seat at the W3C Advisory Board. After some thought, Florian accepted. Kodansha (one of the largest publishing companies in Japan) has agreed to sponsor his participation in the AB.

You can read his official candidacy in French, English, Japanese, Chinese and Korean. If your employer is a W3C Member and you're the AC-rep of your employer, please consider giving your vote to Florian; he would be a great addition to the W3C Advisory Board. And if you're not the AC-rep but your employer is a W3C Member, please consider telling your AC-rep about Florian's candidacy with a recommendation to vote for him.

Thank you.

Thursday 1 February 2018

LibreOffice and EPUB

LibreOffice 6.0 is now available. And it's through the inevitable Korben that I discovered this morning it has a built-in EPUB export. So let's take a closer look at that new beast and evaluate how it deals with that painful task. Conformant EPUB? And which version of EPUB? Reusable XHTML and CSS? We'll see.

After installation (on a Mac), I created a new trivial text document; it contains a paragraph, a level 1 header, an image, a table, and an unordered list of three items. I did not touch fonts, styles, margins, etc. at all.

Trivial text document in LibreOffice 6.0

Then I discovered LibreOffice now has two new menu items: File > Export As... > Export directly as EPUB and File > Export As... > Export as EPUB...

Export directly as EPUB

It directly opens a filepicker to select a destination *.epub file. Let's unzip the saved package and take a look at its guts:

  • the mimetype file is correctly placed as first file in the package and it's correctly stored without compression
  • other files are correctly stored using Deflate
  • the META-INF/container.xml is stored in last position in the zip, which is probably a mistake
  • the OPF file says it's an EPUB 3.0 package and its metadata are clean; AFAICT, the OPF file is conformant to the spec
  • XML and XHTML files in the package are serialized without carriage returns (if you except one after the XML prolog) or indentation...
  • a NCX is present
  • the Navigation Document (called toc.xhtml) and the NCX live side by side in an OEBPS folder (sigh)
  • there is an empty OEBPS/styles/stylesheet.css file
  • the content files are in an OEBPS/sections folder
  • that folder contains 2 files (!) section001.xhtml and section002.xhtml
  • looking at these files, LibreOffice seems to have split the original document at section breaks, hence the two sections found in the EPUB package
  • there is no title element in these files
  • there is clearly a problem with exported CSS styles, the body of each generated document having no margins, paddings. And since there is no CSS-reset either...
  • the set of LibreOffice styles (the leftmost dropdown in the toolbar) is not exported to CSS; the whole export relies on CSS inline styles (style attributes) and not on classes
  • the original document uses the "Liberation Serif" font, that is not registered under that name into the OS X fontbook (old issue well known in the OOXML world...). The final rendition in a browser is then buggy, font-wise. The font-family declarations in the document don't use a fallback to serif.
  • there is a very weird font-effect: outline property serialized on all paragraphs in table cells
  • strangely again, all these paragraphs have text-decoration: overline; text-shadow: 1px 1px 1px #666666; while the original text is not overlined nor shadowed
  • when a paragraph (a p in terms of OOXML) contains one single run of text (an r in OOXML), the output could be optimized by getting rid of the span and adding its inline styles to the parent paragraph. The output is too verbose and will trigger issues in html editors, Wysiwyg or not.
  • the margin values in the document use a mix of inches and pixels, which is kind of weird
  • the image in the original document is lost in the EPUB package
  • headers are not generated as h1, h2, ... but as p elements with styles.
  • the EPUB version does not correctly deal with the unordered list and all list items become regular paragraphs. No ol or ul, no bullet, no counter, no list-style-type. Semantics are lost.
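
The first two checks in the list above (mimetype as first entry, stored without compression) can be reproduced by reading the first ZIP local file header of the package. The following is only a sketch under stated assumptions: the helper names are hypothetical, it looks at the first local header only, and it does not validate the rest of the archive.

```javascript
// Reads the first ZIP local file header found at the start of `bytes`
// (a Uint8Array) and returns the entry name and whether it is stored
// without compression (compression method 0).
function inspectFirstZipEntry(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  // A local file header is 30 bytes plus the name; signature is "PK\x03\x04".
  if (bytes.byteLength < 30 || view.getUint32(0, true) !== 0x04034b50) {
    throw new Error("not a ZIP local file header");
  }
  const method = view.getUint16(8, true);       // 0 = stored, 8 = Deflate
  const nameLength = view.getUint16(26, true);
  const name = new TextDecoder().decode(bytes.subarray(30, 30 + nameLength));
  return { name, stored: method === 0 };
}

// Hypothetical helper: true when the package starts with an
// uncompressed "mimetype" entry.
function startsWithStoredMimetype(bytes) {
  const entry = inspectFirstZipEntry(bytes);
  return entry.name === "mimetype" && entry.stored;
}
```

In Node.js, the bytes returned by fs.readFileSync on the *.epub file can be passed directly, since a Buffer is a Uint8Array.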

Firefox Quantum viewing the resulting section002.xhtml file. You can clearly see where the html+CSS export is buggy:

Firefox Quantum viewing the resulting section002.xhtml file

How iBooks sees that EPUB:

how iBooks sees that EPUB

Export as EPUB...

Aaah, that one is quite different since it first opens the following dialog:

Export as EPUB... dialog

The dialog offers the following choices:

  1. export as EPUB2 or EPUB3 (nice!)
  2. Split at Page Breaks or Headings (very nice feature but why not also a "Don't split" option?)

and validating the dialog goes to the aforementioned *.epub filepicker.

Conclusion

This is an excellent start, really, and splitting the document at headers or page breaks is an excellent idea. Unfortunately, there are too many holes in the xhtml+CSS export at this time to make it really usable unless your document has almost unstyled paragraphs only. Some generated styles (overline?!?) are not present in the original document, it generates only paragraphs and tables losing the header or list semantics, the LibreOffice styles are not serialized in a CSS stylesheet (bug?) and more. This will help some individuals but I am not sure it will help EPUB publication chains, at least for now.

Update: Wow. I added an extra test: I compared the result of "Export to XHTML" and the XHTML inside an "Export to EPUB". In the former, styles are correctly exported as a stylesheet, classes are correctly used, h1 and ol/li are correctly used, the image is preserved, and the general rendering is MUCH better. So the Export to EPUB has one of the two following problems: it reuses the "Export to XHTML" code and splitting introduced a lot of bugs, OR it has its own export-to-xhtml code and it's a mistake since the existing one does quite a decent job...

Second update: LibreOffice's trunk does a significantly better job: stylesheet is correctly generated, xhtml files are CR'd and indented correctly, images are preserved.

Thursday 18 January 2018

Announcing WebBook Level 1, a new Web-based format for electronic books

TL;DR: the title says it all, and it's available there.

Eons ago, at a time BlueGriffon was only a Wysiwyg editor for the Web, my friend Mohamed Zergaoui asked why I was not turning BlueGriffon into an EPUB editor... I had been observing the electronic book market since the early days of Cytale and its Cybook but I was not involved in it on a daily basis. That seemed not only an excellent idea, but also a fairly workable one. EPUB is based on flavors of HTML so I would not have to reinvent the wheel.

I started diving into the EPUB specs the very same day, EPUB 2.0.1 (released in 2009) at that time. I immediately discovered a technology that was not far away from the Web but that was also clearly not the Web. In particular, I immediately saw that two crucial features were missing: it was impossible to aggregate a set of Web pages into an EPUB book through a trivial zip, and it was impossible to unzip an EPUB book and make it trivially readable inside a Web browser even with graceful degradation.

When the IDPF started working on EPUB 3.0 (with its 3.0.1 revision) and 3.1, I said this was coming too fast, and that the lack of Test Suites with interoperable implementations as we often have in W3C exit criteria was a critical issue. More importantly, the market was, in my opinion, not ready to absorb so quickly two major and one minor revisions of EPUB given the huge cost on both publishing chains and existing ebook bases. I also thought - and said - the EPUB 3.x specifications were suffering from clear technical issues, including the two missing features quoted above.

Today, times have changed and the Standards Committee that oversaw the future of EPUB, the IDPF, has now merged with the World Wide Web Consortium (W3C). As Jeff Jaffe, CEO of the W3C, said at that time,

Working together, Publishing@W3C will bring exciting new capabilities and features to the future of publishing, authoring and reading using Web technologies

Since the beginning of 2017, and with a steep acceleration during spring 2017, the Publishing@W3C activity has restarted work on the EPUB 3.x line and the future EPUB 4 line, creating an EPUB 3 Community Group (CG) for the former and a Publishing Working Group (WG) for the latter. Even if I had some reservations about the work division between these two entities, the whole thing seemed to be a very good idea. In fact, I started advocating for the merger between IDPF and W3C back in 2012, at a moment only a handful of people were willing to listen. It seemed to me that Publishing was an underrated first-class user of Web technologies and EPUB's growth was suffering from two critical ailments:

  1. IDPF members were not at W3C so they could not confront their technical choices to browser vendors and the Web industry. It also meant they were inventing new solutions in a silo without bringing them to W3C standardization tables and too often without even knowing if the rendering engine vendors would implement them.
  2. on the other hand, W3C members had too little knowledge of the Publishing activity, that was historically quite skeptical about the Web... Working Groups at W3C were lacking ebook expertise and were therefore designing things without having ebooks in mind.

I was then particularly happy when the merger I advocated for was announced.

As I recently wrote on Medium, I am not any more. I am not convinced by the current approach implemented by Publishing@W3C on many counts:

  • the organization of the Publishing@W3C activity, with a Publishing Business Group (BG) formally ruling (see Process section, second paragraph) the EPUB3 CG and a Steering Committee (see Process section, first paragraph) recreated the former IDPF structure inside W3C.  The BG Charter even says that it « advises W3C on the direction of current and future publishing activity work » as if the IDPF and W3C did not merge and as if W3C was still only a Liaison. It also says « the initial members of the Steering Committee shall be the individuals who served on IDPF’s Board of Directors immediately prior to the effective date of the Combination of IDPF with W3C », maintaining the silo we wanted to eliminate.
  • the EPUB3 Community Group faces a major technical challenge, recently highlighted by representatives of the Japanese Publishing Industry: EPUB 3.1 represents too much of a technical change compared to EPUB 3.0.1 and is not implementable at a reasonable cost in a reasonable timeframe for them. Since EPUB 3 is recommended by the Japanese Government as the official ebook format in Japan, that's a bit of a blocker for EPUB 3.1 and its successors. The EPUB3 CG is then actively discussing a potential rescindment of EPUB 3.1, an extraction of the good bits we want to preserve, and the release of an EPUB 3.0.2 specification based on 3.0.1 plus those good bits. In short, the EPUB 3.1 line, that saw important clarifying changes from 3.0.1, is dead.
  • the Publishing Working Group is working on a collection of specifications known as Web Publications (WP), Packaged Web Publications (PWP), and EPUB 4. What these specifications represent is extremely complicated to describe. With a daily observation of the activities of the Working Group, I still can't firmly say what they're up to, even if I am already convinced that some technological choices (for instance JSON-LD for manifests) are highly questionable and do not « lead Publishing to its full Web potential », to paraphrase the famous W3C motto. It must also be said that the EPUB 3.1 hiatus in the EPUB3 CG shakes the EPUB 4 plan to the ground, since it's now extremely clear the ebook market is not ready at all to move to yet another EPUB version, potentially incompatible with EPUB 3.x (for the record, backwards-compatibility in the EPUB world is a myth).
  • the original sins of EPUB quoted above, including the two missing major features quoted in the second paragraph of the present article, are a minor requirement only. Editability of EPUB, one of the greatest flaws of that ecosystem, is still not a first-class requirement if not a requirement at all. Convergence with the Web is severely encumbered by personal agendas and technical choices made by one implementation vendor for its own sake; the whole W3C process based on consensus is worked around not because there is no consensus (the WG minutes show consensus all the time) but mostly because the rendering engine vendors are still not in the loop and their potential crucial contributions are sadly missed. And they are not in the loop because they don't understand a strategy that seems decorrelated from the Web; the financial impact of any commitment to the Publishing@W3C is then an understandable no-go.
  • the original design choices of EPUB, using painful-to-edit-or-render XML dialects, were also an original sin. We're about to make the same mistake, again and again, either retaining things that partly block the software ecosystem or imagining new silos that won't be editable nor grokable by a Web Browser. Simplicity, Web-centricity and mainstream implementations are not in sight.

Since the whole organization of Publishing@W3C is governed by the merger agreement between IDPF and W3C, I do not expect to change anyone's mind with the present article. I only felt the need to express my opinion, in both public and private fora. Unsurprisingly, the feedback to my private warnings was fairly negative. In short, it works as expected and I should stop spitting in the soup. Well, if that works as expected, the expectations were pretty low, sorry to say, and were not worth a merger between two Standards Bodies.

I have then decided to work on a different format for electronic books, called WebBook. A format strictly based on Web technologies and when I say "Web technologies", I mean the most basic ones: html, CSS, JavaScript, SVG and friends; the class of specifications all Web authors use and master on a daily basis. Not all details are decided or even ironed, the proposal is still a work in progress at this point, but I know where I want to go to.

I will of course happily accept all feedback. If people like my idea, great! If people disagree with it, too bad for me but fine! At least during the early moments of my proposal, and because my guts tell me my goals are A Good Thing™️, I'm running this as a Benevolent Dictator, not as a consensus-based effort. Convince me and your suggestions will make it in.

I have started from a list of requirements, something that was never done that way in the EPUB world:

  1. one URL is enough to retrieve a remote WebBook instance, there is no need to download every resource composing that instance

  2. the contents of a WebBook instance can be placed inside a Web site’s directory and are directly readable by a Web browser using the URL for that directory

  3. the contents of a WebBook instance can be placed inside a local directory and are directly readable by a Web browser opening its index.html or index.xhtml topmost file

  4. each individual resource in a WebBook instance, on a Web site or on a local disk, is directly readable by a Web browser

  5. any html document can be used as content document inside a WebBook instance, without restriction

  6. any stylesheet, replaced resource (images, audio, video, etc.) or additional resource usable by an html document (JavaScript, manifests, etc.) can be used inside the navigation document or the content documents of a WebBook instance, without restriction

  7. the navigation document and the content documents inside a WebBook instance can be created and edited by any html editor

  8. the metadata, table of contents contained in the navigation document of a WebBook instance can be created and edited by any html editor

  9. the WebBook specification is backwards-compatible

  10. the WebBook specification is forwards-compatible, at the potential cost of graceful degradation of some content

  11. WebBook instances can be recognized without having to detect their MIME type

  12. it’s possible to deliver electronic books in a form that is compatible with both WebBook and EPUB 3.0.1

I also made a strong design choice: the Level 1 of the specification will not be a fit-all-cases document. WebBook will start small, simple and extensible, and each use case will be evaluated individually, sequentially and will result in light extensions at a speed the Publishing industry can bear with. So don't tell me WebBook Level 1 doesn't support a given type of ebook or is not at parity level with EPUB 3.x. It's on purpose.

With that said, the WebBook Level 1 is available here and, again, I am happily accepting Issues and PRs on GitHub. You'll find in the spec references to:

  • « Moby Dick » released as a WebBook instance
  • « Moby Dick » released as an EPUB3-compatible WebBook instance
  • a script usable with Node.js to automagically convert an EPUB3 package into an EPUB3-compatible WebBook

My EPUB Editor BlueGriffon is already modified to deal with WebBook. The next public version will allow users to create EPUB3-compatible WebBooks.

I hope this proposal will show stakeholders of the Publishing@W3C activity that another path to greater convergence with the Web is possible. Should this proposal be considered by them, I will of course happily contribute to the debate, and hopefully the solution.

Thursday 11 January 2018

Web. Period.

Well. I have published something on Medium. #epub #web #future #eprdctn

Thursday 7 December 2017

Office Open XML shading patterns for JavaScript

Working on Office Open XML and html these days, I ended up reading and implementing section 17.18.78 of the ISO/IEC 29500 spec. It's the one dedicated to shading patterns. In words we're used to, predefined background images serving as pattern masks. It's not too long a list, but the PNG or data URLs were not available as a public resource, and I found that rather painful. I am therefore making my own JavaScript implementation of ST_Shd public. Feel free to use it under MPL 2.0 if you need it.
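
To give an idea of what consuming ST_Shd from JavaScript can involve, here is a deliberately simplified sketch. It is not the implementation mentioned above, which relies on pattern images: it only handles the pctNN values of ST_Shd and approximates them with a single flat color, blending the pattern color over the fill color. The helper names are hypothetical, and the val, color and fill parameters are assumed to carry the values of the corresponding attributes of a w:shd element.

```javascript
// Parses a six-digit hex color ("RRGGBB") into [r, g, b], or null
// for anything else (for instance "auto").
function parseHexColor(hex) {
  if (!/^[0-9A-Fa-f]{6}$/.test(hex)) return null;
  return [0, 2, 4].map((i) => parseInt(hex.slice(i, i + 2), 16));
}

// Hypothetical helper: approximates a "pctNN" shading by a flat CSS color.
// Returns null for every value this sketch does not handle.
function approximatePercentShading(val, color, fill) {
  const match = /^pct(\d{1,2})$/.exec(val);
  const fg = parseHexColor(color);
  const bg = parseHexColor(fill);
  if (!match || !fg || !bg) return null;
  const ratio = Number(match[1]) / 100;
  // Linear blend of the pattern color over the background fill.
  const mixed = fg.map((c, i) => Math.round(c * ratio + bg[i] * (1 - ratio)));
  return "#" + mixed.map((c) => c.toString(16).padStart(2, "0")).join("");
}

console.log(approximatePercentShading("pct50", "000000", "FFFFFF")); // #808080
```

A flat color loses the texture of the real pattern, which is precisely why pattern images are the better answer when fidelity matters.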

Monday 15 May 2017

W3C Advisory Board elections

The W3C Advisory Board (AB) election 2017 just started, and I am not running this time. I have said multiple times the way people are elected is far too conservative, giving a high premium to "big names" or representatives of "big companies" on one hand, tending to preserve a status-quo in terms of AB membership on the other. Newcomers and/or representatives of smaller companies have almost zero chance to be elected. Even with the recent voting system changes, the problem remains.

Let me repeat here my proposal for both AB and TAG: two consecutive mandates only; after two consecutive mandates, elected members cannot run for re-election for at least one year.

But let's focus on current candidates. Or to be more precise, on their electoral program:

  1. Mike Champion (Microsoft), who has been on the AB for years, has a clear program that takes 2/3rds of his AB nominee statement.
    1. increase speed on standards
    2. bridge the gap existing between "fast" implementors and "slow" standards
    3. better position W3C internally
    4. better position W3C externally
    5. help the Web community
  2. Rick Johnson (VitalSource Technologies | Ingram Content Group) does not have a detailed program. He wants to help the Publishing side of W3C.
  3. Charles McCathie Nevile (Yandex) wants
    1. more pragmatism
    2. to take "into account the broad diversity of its membership both in areas of interest and in size and power" but he has "been on the AB longer than any current participant, including the staff", which does not promote diversity at all
  4. Natasha Rooney (GSMA) has a short statement with no program at all.
  5. Chris Wilson (Google Inc.), who has also been elected to the AB twice already, wants:
    1. to engage better developers and vendors
    2. to focus better W3C resources, with more agility and efficiency
    3. to streamline process and policies to let us increase speed and quality
  6. Zhang Yan (China Mobile Communications Corporation) does not really have a clear program besides "focus on WEB technology for 5G, AI and the Internet of things and so on"
  7. Judy Zhu (Alibaba (China) Co., Ltd.) wants:
    1. to make W3C more globalized (good luck on that one...)
    2. to make W3C Process more usable/effective/efficient
    3. increase W3C/industries collaboration (but isn't it an industrial consortium already?)
    4. increase agility
    5. focus more on security and privacy

If I except the mentions of agility and Process, let me express a gut feeling: this is terribly depressing. Candidacy statements from ten years ago look exactly the same. They quote the same goals. They're even phrased the same way... But in the meantime, we have major topics on the meta-radar (non-exhaustive list):

  • the way the W3C Process is discussed, shaped and amended is so incredibly long it's ridiculous. Every single major topic Members raised in the last ten years took at least 2 years (if not six years) to solve, leaving Groups in a shameful mess. The Process is NOT a Technical Report that requires time, stability and implementations. It's our Law, that impacts our daily life as Members. When an issue is raised, it's because it's a problem right now and people raising the issue expect to see the issue solved in less than "years", far less than years.
  • no mention at all of finances! The finances of the W3C are almost a taboo, that only a few well-known zealots like yours truly discuss, but they feed all W3C activities. After years of instability, and even danger, can the W3C afford keeping its current width without cutting some activities and limiting its scope? Can the W3C avoid new revenue streams? Which ones?
  • similarly, no mention of transparency! I am not speaking of openness of our technical processes here, I am very clearly and specifically speaking of the transparency of the management of the Consortium itself. The way W3C is managed is far too vertical and it starts being a real problem, and a real burden. We need changes there. Now.
  • the role of the Director, another taboo inside W3C, must be discussed. It is time to acknowledge the fact the Director is not at the W3C any more. It's time to stop signing all emails "in the name of the Director" and handling all transition conference calls "in the name of the Director" but almost never "with the Director". I'm not even sure we need another Director. It's time to acknowledge the fact Tim should become Honorary Director - with or without veto right - and distribute his duties to others.
  • we need a feedback loop and very serious evaluation of the recent reorganization of the W3C. My opinion is as follows: nobody knows who to contact and it's a mess of epic magnitude. The Functional leaders centralize input and then re-dispatch it to others, de facto resurrecting Activities and adding delays to everything. The reorg is IMHO a failure, and a rather expensive one in terms of effectiveness.
  • W3C is still not a legal entity, and it does not start being a burden... it's been a burden for eons. The whole architecture of W3C, with regional feet and a too powerful MIT, is a scandalous reminiscence of the past.
  • our election system for AB and TAG is too conservative. People stay there for ages, while all our technical world seems to be reshaped every twelve months. My conclusion is simple, and more or less matches what Mike Champion said: the Consortium is not tailored any more to match its technical requirements. Where we diverge: I understand Mike prefers evolution to revolution, I think evolution is not enough any more and revolution is not avoidable any more. We probably need to rethink the W3C from the ground up.
  • Incubation has been added to W3C Process in a way that is perceived by some as a true scandal. I am not opposed at all to Incubation, but W3C has shown a lack of caution, wisdom, consensus and obedience to its own Process that is flabbergasting. W3M acts fast when it needs to remind a Member about the Process, but W3M itself seems to work around the Process all the time. The way Charters under review are modified during the Charter Review itself is the blatant example of that situation.

Given how far the candidacy statements are from what I think are the real and urgent issues of the W3C, I'm not even sure I am willing to vote... I will eventually cast a ballot, sure, but I stand by my opinion above: this is depressing.

I am now 50 years old, I have been contributing to W3C for, er, almost 22 years and that's why I will not run any more. We need younger people, we need different perspectives, we need different ways of doing, we need different futures. We need a Consortium of 2017, we still have a Consortium of 2000, we still have the people of 2000. If I was 20 today, born with the Web as a daily fact, how would I react looking at W3C? I think I would say « oh that old-school organization... » and that alone explains this whole article.

Conclusion for all W3C AB candidates: if you want my vote, you'll have to explain better, much better, where you stand in front of these issues. What do you propose, how do you suggest to implement it, what's your vision for W3C 2020. Thanks.

Tuesday 17 January 2017

E0

Long looooong ago, I wrote a deep review of the XHTML 2.0 spec that was one of the elements that led to the resuming of the HTML activity at the W3C and the final dismissal of XHTML 2.0. 

Long ago, I started a similar effort on EPUB that led to Dave Cramer's EPUB Zero. It's time (fr-FR) to draw some conclusions.

This document is maintained on GitHub and accepting contributions.

Daniel Glazman

OCF

  1. EPUB publications are not "just a zip". They are zips with special constraints. I question the "mimetype file in first uncompressed position" constraint since I think the vast majority of reading systems don't care (and can't care) because most of the people creating EPUB at least partially by hand don't know/care. The last three constraints (zip container fields) on the ZIP package described in section 4.2 of the spec are usually not implemented by Reading Systems.
  2. the container element of the META-INF/container.xml file has a version attribute that is always "1.0", whatever the EPUB version. That forces editors, filters and reading systems to dive into the default rendition to know the EPUB version and that's clearly one useless expensive step too much.
  3. having multiple renditions per publication reminds me of MIME multipart/alternative. When Borenstein and Freed added it to the draft of RFC 1341 some 25 years ago, Mail User Agents developers (yours truly counted) envisioned and experimented far more than alternatives between text/html and text/plain only. I am under the impression multiple renditions start from the same good will but fail to meet the goals for various reasons:
    1. each additional rendition drastically increases the size of the publication...
    2. most authoring systems, filters and converters on the market don't deal very well with multiple renditions
    3. EPUB 2 defined the default rendition as the first rendition with an application/oebps-package+xml mimetype while the EPUB 3 family of specs defines it as the first rendition in the container
    4. while a MIME-compliant Mail User Agent will let you compose a message in text/html and output for you the multipart/alternative between that text/html and its text/plain serialization, each Publication rendition must be edited separately.
    5. in the case of multiple renditions, each rendition has its own metadata and it's then legitimate to think the publication needs its own metadata too. But the META-INF/metadata.xml file has always been described with "This version of the OCF specification does not define metadata for use in the metadata.xml file. Container-level metadata may be defined in future versions of this specification and in IDPF-defined EPUB extension specifications." by all EPUB specifications. The META-INF/metadata.xml file should be dropped.
    6. encryption, rights and signatures are per-publication resources while they should be per-rendition resources.
  4. the full-path attribute on rootfile elements is the only path in a publication that is relative to the publication's directory. All other URIs (for instance in href attributes) are relative to the current document instance. I think full-path should be deprecated in favor of href here, and finally superseded by href for the next major version of EPUB.
  5. we don't need the mimetype attribute on rootfile elements since the prose of EPUB 3.1 says the target of a rootfile must be a Package Document, i.e. an OPF file... While EPUB 2 OCF could directly target, for instance, a PDF file, that is no longer the case for OCF 3.
  6. absolute URIs (for instance /files/xhtml/index.html with a leading slash, cf. path-absolute construct in RFC 3986) are harmful to EPUB+Web convergence
  7. if multiple renditions are dropped, the META-INF/container.xml file becomes useless and it can be dropped.
  8. the prose for the META-INF/manifest.xml file makes me wonder why this still exists: "does not mandate a format", "MUST NOT be used". Don't even mention it! Just say that extra unspecified files inside the META-INF directory must be ignored (Cf. OCF section 3.5.1), and possibly reserve the metadata.xml file name, period. Oh, and a ZIP is also a manifest of files...
  9. I am not an expert of encryption, signatures, XML-ENC Core or XML DSIG Core, so I won't discuss encryption.xml and signatures.xml files
  10. the rights.xml file still has no specified format. Strange. Cf. item 8 above.
  11. Resource obfuscation is so weak and useless it's hilarious. Drop it. It is also painful in EPUB+Web convergence.
  12. RNG schemas are not enough in the OCF spec. Section 3.5.2.1 about the container.xml file, for instance, should have models and attribute lists for each element, similarly to the Packages spec.
  13. not sure I have ever seen links and link elements in a container.xml file... (Cf. issue #374). The way these links are processed is unspecified anyway. Why are these elements normatively specified since extra elements are allowed - and explicitly ignored by spec - in the container?
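
To make item 2 above concrete, here is a minimal Python sketch of what a tool has to do today just to learn the EPUB version of a publication. It assumes a regular OCF container with a readable Package Document; the function name epub_version is mine and comes from no spec or library.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by META-INF/container.xml.
CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"


def epub_version(source):
    """Return the version attribute of the default rendition's package element.

    `source` is a path or a file-like object holding an EPUB.
    Hypothetical helper, for illustration only.
    """
    with zipfile.ZipFile(source) as zf:
        # Step 1: container.xml only says where the Package Document lives;
        # its own version attribute is "1.0" whatever the EPUB version.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find(
            f"{{{CONTAINER_NS}}}rootfiles/{{{CONTAINER_NS}}}rootfile"
        )
        if rootfile is None:
            raise ValueError("no rootfile element in container.xml")
        # full-path is relative to the root of the container.
        opf_path = rootfile.get("full-path")
        # Step 2: open and parse the Package Document to read the real version.
        package = ET.fromstring(zf.read(opf_path))
        return package.get("version")
```

Two reads and two XML parses for one version string: that is the extra step I am complaining about.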

Packages

  1. a Package consists of one rendition only.
  2. I have never understood the need for a manifest of resources inside the package, probably because my own publications don't use Media Overlays or media fallbacks.
  3. fallbacks are an inner mechanism also similar to multipart/alternative for renditions. I would drop it.
  4. I think the whole package should have properties identifying the required technologies for the rendition of the Package (e.g. script), and avoid these properties on a per-item basis that makes no real sense. The feature is either present in the UA for all content documents or for none.
  5. the spine element represents the default reading order of the package. Basically, it's a list. We have lists in html, don't we? Why do we need a painful and complex proprietary xml format here?
  6. the name of the linear attribute, that discriminates between primary and supplementary content, is extremely badly chosen. I always forget what really is linear because of that.
  7. Reading Systems are free, per spec, to completely ignore the linear attribute, making it pointless from an author's point of view.
  8. I have never seen the collection element used and I don't really understand why it contains link elements and not itemref elements
  9. metadata fun, as always with every EPUB spec. Implementing refines in 3.0 was a bit of a hell (despite warnings to the EPUB WG...), and it's gone from 3.1, replaced by new attributes. So no forwards compatibility, no backwards compatibility. Yet another parser and serializer for EPUB-compliant user agents.
  10. the old OPF guide element is now an html landmarks list, proving it's feasible to move OPF features to html
  11. the Navigation Document, an html document, is mandatory... So all the logic mentioned above could live there.
  12. Fixed Layout delenda est. Let's use CSS Fragmentation to make sure there's no orphaned content in the document post-pagination, and if CSS Fragmentation is not enough, make extension contributions to the CSS WG.
  13. without 3.0 refines, there is absolutely nothing left in 3.1 preventing the Package's metadata from being expressed in html; in 3.0, the refines attribute was a blocker, implying an extension of the model of the meta html element or another ugly IDREF mechanism in html.
  14. the prefix attribute on the package element is a good thing and should be preserved
  15. the rendition-flow property is weird, its values being paginated, scrolled-continuous, scrolled-doc and auto. Where is paginated-doc, the simplest paginated mode to implement?
  16. no more NCX, finally...
  17. the Navigation Document is already an html document with nav elements having a special epub:type/role (see issue #941), so it's easy to make it contain an equivalent to the spine or more.

Content Documents

  1. let's get rid of the epub namespace, please...

Media Overlays

  1. SMIL support across rendering engines is in very bad shape and the SMIL polyfill does not totally help. Drop Media Overlays for the time being and let's focus on visual content.

Alternate Style Tags

  1. terrible spec... Needed because of Reading Systems' limitations but still absolutely terrible spec...
  2. If we get rid of backwards compatibility, we can drop it. Submit extensions to Media Queries if needed.

On the fly conclusions

  1. backwards compatibility is an enormous burden on the EPUB ecosystem
  2. build a new generation of EPUB that is not backwards-compatible
  3. the mimetype file is useless
  4. file extension of the publication MUST be well-defined
  5. one rendition only per publication and no more links/link elements
  6. in that case, we don't need the container.xml file any more
  7. metadata.xml and manifest.xml files removed
  8. we may still need encryption.xml, signatures.xml and rights.xml inside a META-INF directory (or directly in the package's root after all) to please the industry.
  9. application/oebps-package+xml mimetype is not necessary
  10. the EPUB spec evolution model is/was "we must deal with all cases in the world, we move very fast and we fix the mistakes afterwards". I respectfully suggest a drastic change for a next generation: "let's start from a low-level common ground only and expand slowly but cleanly"
  11. the OPF file is not needed any more. The root of the unique rendition in a package should be the html Navigation Document.
  12. the metadata of the package should be inside the body element of the Navigation Document
  13. the spine of the package can be a new nav element inside the body of the Navigation Document
  14. intermediary summary: let's get rid of both META-INF/container.xml and the OPF file... Let's have the Navigation Document mandatorily named index.xhtml so a directory browsing of the uncompressed publication through http will render the Navigation Document.
  15. let's drop the Alternate Style Tags spec for now. Submission of new Media Queries to CSS WG if needed.
  16. let's drop Media Overlays spec for now.

CONCLUSION

EPUB is a monster, made to address very diverse markets and ecosystems, too many markets and ecosystems. It's weak, complex, a bit messy, disconnected from the reality of the Web it's supposed to be built upon and some claim (link in fr-FR) it's too close to real books to be disruptive and innovative.

I am therefore suggesting we sever backwards compatibility ties and restart almost from scratch, entirely and purely from W3C Standards. Here's the proposed result:


1. E0 Publication

An E0 Publication is a ZIP container. Files in the ZIP must be stored as is (no compression) or use the Deflate algorithm. File and directory names must be encoded in UTF-8 and follow some restrictions (see EPUB 3.1 filename restrictions).

The file name of an E0 Publication MUST use the e0 file extension.

An E0 Publication MUST contain a Navigation Document. It MAY contain files encryption.xml, signatures.xml and rights.xml (see OCF 3.1 for more information). All these files must be placed directly inside the root of the E0 Publication.

An E0 Publication can also contain Content Documents and their resources.

Inside an E0 Publication, all internal references (links, anchors, references to replaced elements, etc) MUST be strictly relative. With respect to section 4.2 of RFC 3986, only path-noscheme and path-empty are allowed for IRIs' relative-part. External references are not restricted.
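
The rule on internal references is easy to test. Here is a minimal Python sketch; the function name is_strictly_relative is only an illustration, and I lean on urllib.parse.urlsplit from the standard library instead of a full RFC 3986 parser, so take it as an approximation of the grammar, not a validator.

```python
from urllib.parse import urlsplit


def is_strictly_relative(ref):
    """True if `ref` has a path-noscheme or path-empty relative-part.

    Hypothetical helper. A reference with a scheme ("http:...") or an
    authority ("//host/...") is not strictly relative, and neither is a
    path-absolute reference such as "/files/xhtml/index.html".
    """
    parts = urlsplit(ref)
    if parts.scheme or parts.netloc:
        return False
    # What is left is path-absolute, path-noscheme or path-empty;
    # only a leading slash has to be rejected.
    return not parts.path.startswith("/")
```

With that helper, chapter1.html#s2, ../images/cover.png and #toc pass, while /files/xhtml/index.html does not.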

2. E0 Navigation Document

An E0 Navigation Document is an html document. Its file name MUST be index.xhtml if the document is an XML document and index.html if it is not an XML document. An E0 Publication cannot contain both index.html and index.xhtml files.
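
The container-level rules stated so far fit in a few lines of Python. This is only a sketch under my own assumptions: the function name check_e0_container and the wording of the messages are invented for the example, and it does not check UTF-8 names, the EPUB 3.1 filename restrictions, or whether the Navigation Document is really XML or not.

```python
import zipfile

# Only "stored" and Deflate are allowed for files in the ZIP.
ALLOWED_METHODS = {zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED}
NAV_NAMES = ("index.html", "index.xhtml")


def check_e0_container(source, file_name):
    """Return a list of problems found in a candidate E0 Publication.

    `source` is a path or file-like object, `file_name` the name of the
    publication file. Hypothetical helper, for illustration only.
    """
    problems = []
    if not file_name.endswith(".e0"):
        problems.append("file name does not use the e0 extension")
    with zipfile.ZipFile(source) as zf:
        infos = zf.infolist()
        for info in infos:
            if info.compress_type not in ALLOWED_METHODS:
                problems.append(f"{info.filename}: compression method not allowed")
        # The Navigation Document sits at the root, under exactly one name.
        names = {info.filename for info in infos}
        found = [name for name in NAV_NAMES if name in names]
        if len(found) != 1:
            problems.append("expected exactly one of index.html or index.xhtml at the root")
    return problems
```

An empty list means the container passes these few checks, nothing more.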

An E0 Navigation Document contains at least one header element (for metadata) and at least two nav html elements (for spine and table of contents) in its body element.

2.1. E0 Metadata in the Navigation Document

E0 metadata are designed to be editable in any Wysiwyg html editor, and potentially rendered as regular html content by any Web browser.

E0 metadata are expressed inside a mandatory header html element inside the Navigation Document. That element must carry the "metadata" ID and the vocab attribute with value "http://www.idpf.org/2007/opf". All metadata inside that header element are then expressed using html+RDFa Lite 1.1. E0 metadata reuse EPUB 3.1 metadata and the corresponding uniqueness rules, expressed in a different way.

Refinements of metadata are expressed through nesting of elements.

Example:

<header id="metadata"
        vocab="http://www.idpf.org/2007/opf">
  <h1>Reading Order</h1>
  <ul>
    <li>Author: <span property="dc:creator">glazou
      (<span property="file-as">Glazman, Daniel</span>)</span></li>
    <li>Title: <span property="dc:title">E0 Publications</span></li>
  </ul>
</header>

The mandatory title element of the Navigation Document, contained in its head element, should have the same text content as the first "dc:title" metadata inside that header element.
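
Reading such metadata back does not require a dedicated parser. Here is a minimal Python sketch based on html.parser from the standard library; the class name MetadataParser and the shape of the result (dicts with property, value and refines keys) are my own choices for the example, not part of the proposal. Text surrounding a nested element, like decorative parentheses, stays in the value of the parent property.

```python
from html.parser import HTMLParser

# Elements without an end tag; they must not be pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class MetadataParser(HTMLParser):
    """Collect `property` values inside <header id="metadata"> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_header = False
        self.stack = []   # one entry per open element inside the header
        self.items = []   # top-level metadata entries

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        attrs = dict(attrs)
        if not self.in_header:
            self.in_header = tag == "header" and attrs.get("id") == "metadata"
            return
        node = None
        if "property" in attrs:
            node = {"property": attrs["property"], "value": "", "refines": []}
            # Nesting expresses refinement: attach to the closest open property.
            parent = next((n for n in reversed(self.stack) if n is not None), None)
            (parent["refines"] if parent else self.items).append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        if tag in VOID or not self.in_header:
            return
        if self.stack:
            node = self.stack.pop()
            if node is not None:
                node["value"] = node["value"].strip()
        else:
            self.in_header = False  # the metadata header itself was closed

    def handle_data(self, data):
        # Text goes to the innermost open property element only.
        if self.in_header and self.stack and self.stack[-1] is not None:
            self.stack[-1]["value"] += data
```

Feed it the Navigation Document with parser.feed(...) and read parser.items afterwards.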

2.2. E0 Spine

The spine of an E0 Publication is expressed in its Navigation Document as a new nav element holding the "spine" ID. The spine nav element is mandatory.

See EPUB 3.1 Navigation Document.

2.3. E0 Table of Contents

The Table of Contents of an E0 Publication is expressed in its Navigation Document as a nav element carrying the "toc" ID. The Table of Contents nav element is mandatory.

See EPUB 3.1 Navigation Document.

2.4. E0 Landmarks

The Landmarks of an E0 Publication are expressed in its Navigation Document as a nav element carrying the "landmarks" ID. The Landmarks nav element is optional.

See EPUB 3.1 Navigation Document.

2.5. Other nav elements

The Navigation Document may include one or more additional nav elements. These additional nav elements should have a role attribute to provide a machine-readable semantic, and must have a human-readable heading as their first child.

IDs "metadata", "spine", "landmarks" and "toc" are reserved in the Navigation Document and must not be used by these extra nav elements.
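
The structural rules of section 2 can be checked mechanically. Here is a minimal Python sketch, again on top of html.parser; the names NavStructure and check_navigation_document and the message strings are hypothetical, and the sketch looks only at reserved IDs, not at roles or headings.

```python
from html.parser import HTMLParser

# Reserved IDs and the element that must carry each of them.
REQUIRED = {"metadata": "header", "spine": "nav", "toc": "nav"}
OPTIONAL = {"landmarks": "nav"}


class NavStructure(HTMLParser):
    """Record which elements carry the reserved IDs (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.seen = {}  # reserved id -> list of tag names carrying it

    def handle_starttag(self, tag, attrs):
        ident = dict(attrs).get("id")
        if ident in REQUIRED or ident in OPTIONAL:
            self.seen.setdefault(ident, []).append(tag)


def check_navigation_document(html):
    """Return a list of problems with the reserved IDs of a Navigation Document."""
    parser = NavStructure()
    parser.feed(html)
    problems = []
    for ident, tag in {**REQUIRED, **OPTIONAL}.items():
        tags = parser.seen.get(ident, [])
        if ident in REQUIRED and not tags:
            problems.append(f'missing <{tag} id="{ident}">')
        if len(tags) > 1:
            problems.append(f'id "{ident}" is used more than once')
        if any(t != tag for t in tags):
            problems.append(f'id "{ident}" must be carried by a <{tag}> element')
    return problems
```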

2.6. Example of a Navigation Document

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta content="text/html; charset=UTF-8" http-equiv="content-type">
    <title>Moby-Dick</title>
  </head>
  <body>
    <header id="metadata"
            vocab="http://www.idpf.org/2007/opf">
      <ul>
        <li>Author:
          <span property="dc:creator">Herman Melville
            (<span property="file-as">Melville Herman</span>)</span></li>
        <li>Title: <span property="dc:title">Moby-Dick</span></li>
        <li>Identifier: <span property="dc:identifier">glazou.e0.samples.moby-dick</span></li>
        <li>Language: <span property="dc:language">en-US</span></li>
        <li>Last modification: <span property="dcterms:modified">2017-01-17T11:16:41Z</span></li>
        <li>Publisher: <span property="dc:publisher">Harper & Brothers, Publishers</span></li>
        <li>Contributor: <span property="dc:contributor">Daniel Glazman</span></li>
      </ul>
    </header>
    <nav id="spine">
      <h1>Default reading order</h1>
      <ul>
        <li><a href="cover.html">Cover</a></li>
        <li><a href="titlepage.html">Title</a></li>
        <li><a href="toc-short.html">Brief Table of Contents</a></li>
        ...
      </ul>
    </nav>
    <nav id="toc" role="doc-toc">
      <h1>Table of Contents</h1>
      <ol>
        <li><a href="titlepage.html">Moby-Dick</a></li>
        <li><a href="preface_001.html">Original Transcriber’s Notes:</a></li>
        <li><a href="introduction_001.html">ETYMOLOGY.</a></li>
        ...
      </ol>
    </nav>
  </body>
</html>

3. Directories

An E0 Publication may contain any number of directories and nested directories.

4. E0 Content Documents

E0 Content Documents are referenced from the Navigation Document. E0 Content Documents are html documents.

E0 Content Documents should contain <link rel="prev"...> and <link rel="next"...> elements in their head element conformant to the reading order of the spine present in the Navigation Document. Content Documents not present in that spine don't need such elements.
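
An authoring tool can derive these link elements from the spine. Here is a minimal Python sketch; the function name prev_next_links is hypothetical, and I assume all Content Documents sit in the same directory as the Navigation Document, so the hrefs of the spine can be reused unchanged.

```python
from html import escape


def prev_next_links(spine, current):
    """Return the <link> elements for `current`.

    `spine` is the list of hrefs found in the spine nav, in reading order.
    Hypothetical helper, for illustration only.
    """
    if current not in spine:
        return []  # documents outside the spine need no such elements
    i = spine.index(current)
    links = []
    if i > 0:
        links.append(f'<link rel="prev" href="{escape(spine[i - 1], quote=True)}">')
    if i < len(spine) - 1:
        links.append(f'<link rel="next" href="{escape(spine[i + 1], quote=True)}">')
    return links
```

For the Moby-Dick spine shown earlier, titlepage.html gets a prev link to cover.html and a next link to toc-short.html.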

The epub:type attribute is superseded by the role attribute and must not be used.

5. E0 Resources

E0 Publications can contain any number of extra resources (CSS stylesheets, images, videos, etc.) referenced from either the Navigation Document or Content Documents.

Tuesday 27 December 2016

WAI-ARIA 1.1 and DPUB-ARIA 1.0 in BlueGriffon

I have just added support for WAI-ARIA 1.1 and DPUB-ARIA 1.0 to the trunk of BlueGriffon. The next public version (2.2) will include those changes. In the meantime, an OS X trial version is available from here. You'll find the changes implemented inside the new panel Panels > ARIA.

Wednesday 30 November 2016

Rest in peace, Opera...

I think we can now safely say Opera, the browser maker, is no more. My opinions about the acquisition of the browser by a Chinese trust were recently confirmed and people are being let go or fleeing en masse. Rest in Peace Opera, you brought good, very good things to the Web and we'll miss you.

In fact, I'd love to see two things appear:

  • Vivaldi is of course the new Opera, it was clear from day 1. Even the name was chosen for that. The transformation will be completed the day Vivaldi joins W3C and sends representatives to the Standardization tables.
  • Vivaldi and Brave should join forces, in my humble opinion.

Tuesday 20 September 2016

W3C

I have always said that standardization at the W3C is blood on the walls in a hushed atmosphere. I would not change one iota of that statement. But the W3C is also the story of an industry from its earliest hours, and of sincere friendships built in the explosion of a new era. Tonight, on the sidelines of the W3C Technical Plenary Meeting in Lisbon, I had an unforgettable dinner with my old pals Yves and Olivier, whom I have known and appreciated for oh, such a long time. A delightful, friendly and funny moment over a fabulous meal in a dream of a little Lisbon eatery. Bursts of laughter, confidences, a great evening, in short a true moment of happiness. Thanks to them for this brilliant evening, and to Olivier for the address, excellent in every respect. I'll sign up again whenever you want, guys, and it's an honor to have you as friends :-)

Wednesday 30 March 2016

Something ridiculous in ES6

ES6 introduced a change in Gecko that remained under my radar for a looong time. Testing BlueGriffon 2.0 features, I discovered a feature that was completely borked and I was unable to explain it... Everything in my code looked fine. After some debugging, I nailed it down in a codepen. Gecko and Chrome show the behaviour I dislike (window.bar is undefined if bar is a constant) while Safari does not.

I find this, mandatory per ES6 spec, completely ridiculous. It's not understandable from a JS author's perspective and apparently broke "a ton of stuff". This is clearly the kind of thing where gurus and purists should have thought of users (JS authors) and did not.

Please TC39, change that. It's ugly and painful.

Wednesday 3 June 2015

In praise of Rick Boykin and his Bulldozer editor

Twenty years ago this year, Rick Boykin started a side project while working at NASA. That project, presented a few months later as a poster session at the 4th International Web Conference in Boston (look in section II. Infrastructure), was Bulldozer, one of the first Wysiwyg editors natively made for the Web. I still remember his Poster session at the conference as the most surprising and amazing short demo of the conference. His work on Bulldozer was a masterpiece and I sincerely regretted he stopped working on it, or so it seemed, when he left NASA the next year.

I thanked you twenty years ago, Rick, and let me thank you again today. Happy 20th birthday, Bulldozer. You paved the way and I remember you.

Friday 9 January 2015

No comment

HTTP headers
Screenshot by Robin Berjon following a message of mine

Monday 15 September 2014

Molly needs you, again!

There are bad Mondays. This is a bad Monday. And this is a bad Monday because I just discovered two messages - among others - posted by our friend Molly Holzschlag (ANC is Absolute Neutrophil Count):

First message

Second message

If you care about our friend Molly and value all that she gave to Web Standards and CSS across all these years, please consider donating again to the fund some of her friends set up a while ago to support her health and daily life expenses. There are no little donations, there are only love messages. Send Molly a love message. Please.

Thank you.

Friday 20 June 2014

Leaving Samsung

This is my last day at Samsung. I am open to job opportunities. I'll continue co-chairing the CSS Working Group, but under my Disruptive Innovations' wings, starting immediately.

Thursday 20 March 2014

Samsung Web Tech Talk on JavaScript trends

Samsung Research America will host a meetup about JavaScript trends in its San Jose R&D center on the 7th of April. This is a free event and anyone can attend.

Date: 07-apr-2014
Time: 5pm - 8pm
Location: 95 Plumeria Drive, San Jose, CA

Agenda:

  • 4:30pm - 5:00pm Welcome
  • 5:00pm - 5:30pm "Web Technologies on Mobile - Opportunities and Challenges", Andreas Gal, VP Mobile at Mozilla
  • 5:35pm - 6:00pm "Supersonic JavaScript", Ariya Hidayat, Shape Security
  • 6:20pm - 6:45pm "JavaScript in the Small", Satish Chandra, Samsung
  • 6:45pm - 8:00pm Open Discussion
  • 8:00pm - 9:00pm Networking

Feel free to attend using the link at the top of this article!

Thursday 6 February 2014

Next Game Frontier, The conference dedicated to Web Gaming

Last October, I was attending the famous Paris Web conference in Paris, France. In the main lobby of the venue, two Microsoftees (David Catuhe and David Rousset) were demo'ing a game based on their own Open Source framework, babylon.js. Yes, Microsoftees and an Open Source JS framework over WebGL... I was looking at their booth and the people queuing to try the game, and started explaining to them that there are conferences about Gaming, there are conferences about Web technologies in general and html5 in particular, but there is no conference dedicated to Gaming based on Web technologies...

To my surprise, the two Davids reacted very positively to my proposal and we started immediately discussing a plan for such a conference.

People, I am immensely happy to announce the First Edition of the Next Game Frontier conference, the conference dedicated to Web Gaming, co-organized this year by Microsoft and Samsung Electronics.

Web site: Next Game Frontier

Location: Microsoft France campus, Issy-les-Moulineaux, France

Date: 13th of March 2014

Next Game Frontier on Lanyrd

Free registration but number of seats limited so register ASAP!

Schedule:

9:00 - 9:30 Breakfast

9:30 - 9:45 Opening Keynote (D. Glazman, D. Catuhe & D. Rousset)
9:45 - 10:45 Microsoft session - Create a 3D game with WebGL and Babylon.js (D. Catuhe & D. Rousset)

10:45 - 11:00 Break

11:00 - 12:00 Mozilla session - The Web as a platform for games, from WebGL to AsmJS (T. Nitot)

12:00 - 13:15 Lunch

13:15 - 14:15 Create 3D assets for the mobile world & the Web, the point of view of a 3D designer (M. Rousseau)
14:15 - 15:15 Samsung session - Enhancing HTML5 gaming using WebCL (Samsung) & Turbulenz (Partner)

15:15 - 15:30 Break

15:30 - 16:30 Three.js (J. Etienne from http://learningthreejs.com)
16:30 - 17:30 Minko.io (Jean-Marc Le Roux from http://aerys.in)

17:30 - 18:30 Roundtable - Open discussions about Web Gaming - Microsoft, Mozilla, Samsung, Ubisoft moderated by a journalist

Save the date, and register now but please, don't register if you don't plan to come. Thanks!

Tuesday 24 September 2013

META Seal of Recognition

I am extremely happy and proud to let you know BlueGriffon received last Thursday in Berlin, Germany, the « META Seal of Recognition » Award from the Multilingual Europe Technology Alliance for being the very first editor to implement the three main data categories of the W3C Internationalization Tag Set 2.0 (ITS 2.0) :-)

That implementation was done under a contract from DFKI and funding from the European Commission (project LT-Web), 7th Framework Programme (FP7), grant agreement n° 287815. The code is Open Source and will be available with forthcoming version 1.8 of BlueGriffon.

Here is the press release about it:

At the fourth annual META-FORUM conference in Berlin on September 19/20, it was announced that Disruptive Innovations was awarded the META Seal of Recognition for BlueGriffon. The META Seal of Recognition recognises excellence in software, products, and services which actively contribute to the European Multilingual Information Society. The META Technology Council, a panel of 30 experts drawn from the European Language Technology landscape, recognises the contribution BlueGriffon makes to the European Multilingual Information Society.

META, the Multilingual Europe Technology Alliance brings together researchers, commercial technology providers, private and corporate language technology users, language professionals and other information society stakeholders. META is preparing the necessary ambitious joint effort towards furthering language technologies as a means towards realising the vision of a Europe united as one single digital market and information space.

The META Seal of Recognition is awarded annually to select products and services which actively contribute to the initiative’s goals. This year is the third time the META Seal of Recognition has been awarded at a special ceremony as part of META-FORUM 2013 held in Berlin, Germany.

For more information see http://www.meta-net.eu/meta-seal

Wednesday 17 July 2013

Internationalization Tag Set (ITS) 2.0 in BlueGriffon

I have been contracted by German company DFKI under a European contract to implement a part of the Internationalization Tag Set (ITS) 2.0 specification into BlueGriffon and I now have a first runnable prototype. So there is a new floating panel in BlueGriffon:

Local ITS state Global ITS rules

Features:

  • The "Locally" tab shows the ITS state of the container element of the selection. The ITS state is computed from the local ITS attributes, the global ITS rules applying to the element and potentially the ITS state inherited from the ancestors of the element (the inheritance rules of ITS 2.0 are fully implemented). That tab of course allows overriding that local state and applying local attributes.
  • Three data categories are implemented under the current contract: Translate, Localization Note and Terminology
  • The "Global" tab allows creating and manipulating global ITS rulesets attached to the document, either inline (through a <script type="application/its+xml"> element) or external (through a link element). The order of rulesets attached to the document can be modified. Parameters and ITS rules can be added to the rulesets or moved into the rulesets. During the creation of a ruleset, both XPath and CSS query languages are available. The rule creation/modification dialog has a magic button that automatically computes an XPath or CSS selector for the currently selected element. All global properties defined by the spec are editable with respect to the cardinality defined by the spec. For XPath, the code looks for an already defined HTML namespace in the ITS rules and adds one (that is reported to the user) if that namespace is not present.
  • Parameters are correctly expanded in XPath and CSS selectors during global rules' application.
  • All operations are undoable.
  • The code was architected with extensibility in mind and it will be pretty easy to add new ITS 2.0 data categories in the future.

All the above will be available in forthcoming BlueGriffon 1.8 to all users for free, thanks to the European Commission!
