<Glazblog/>

Why html5 elements INS and DEL suck

I have said it many times here, on W3C mailing lists and in public between 1998 and now, but apparently it must be said again and again: the current HTML5 Last Call Working Draft, which falls well short of the quality of other LCWDs at the W3C and did not meet the basic requirements for a LCWD in the W3C Process, still has not addressed this problem. So let me repeat it: the html5 ins and del elements suck and should be dropped in favor of a better solution.

  • ins and del are, by definition, both inline-level and block-level elements. If, in a Wysiwyg editor, you select the textual contents of a paragraph, turn on a "Visible Modification Marks" feature and hit the Delete or Backspace key, the editor can choose between <del><p>....</p></del> and <p><del>...</del></p>. The user has no way to tell the two apart, yet they are NOT strictly equivalent. In the latter case, it is still theoretically possible to place the caret in the paragraph but BEFORE or AFTER the del element and insert new characters. In the former case, the whole paragraph is deleted and the user can't insert anything inside it any more.
  • In the latter case just above, it's impossible for the user to know whether a caret placed at the beginning of the paragraph is before the paragraph, inside the paragraph but before the del element, or at the beginning of the del element.
  • Much more importantly, ins and del cannot cover one trivial case: since XML has no equivalent to SGML inclusions (see for instance this link for a rather clean explanation), the following is impossible: <ul><del><li>a</li></del><li>b</li></ul>. It is, for instance, totally impossible to mark an element as entirely deleted if the content model of its parent container does not allow the del element...
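
To make the three points above concrete, here is a minimal sketch in Python using the standard xml.etree.ElementTree module. It checks that the two serializations show the same text but are different trees, and that wrapping an li in del breaks the parent's content model. The helper names and the ALLOWED_CHILDREN table are illustrative assumptions; the table is a deliberately simplified stand-in for a real content model, not something taken from any spec.

```python
import xml.etree.ElementTree as ET

# The two serializations an editor may pick when the user deletes
# the whole textual contents of a paragraph.
block_level = ET.fromstring("<del><p>some text</p></del>")
inline_level = ET.fromstring("<p><del>some text</del></p>")


def visible_text(element):
    """Concatenate all text nodes: roughly what the user sees."""
    return "".join(element.itertext())


def shape(element):
    """Describe the nesting of tag names as a nested tuple."""
    return (element.tag, tuple(shape(child) for child in element))


same_text = visible_text(block_level) == visible_text(inline_level)
same_tree = shape(block_level) == shape(inline_level)

# Hypothetical, simplified content model: ul accepts only li children.
ALLOWED_CHILDREN = {"ul": {"li"}}


def invalid_children(element):
    """List the child tags the simplified content model does not allow."""
    allowed = ALLOWED_CHILDREN.get(element.tag)
    if allowed is None:
        return []
    return [child.tag for child in element if child.tag not in allowed]


deleted_item = ET.fromstring("<ul><del><li>a</li></del><li>b</li></ul>")

print(same_text, same_tree, invalid_children(deleted_item))
```

The user-visible text is identical while the trees differ, which is exactly why a caret position cannot be interpreted unambiguously, and the del wrapper is flagged as an invalid child of ul.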

The situation is unfortunately very clear: the ins and del elements as they exist now in the various html specs are unable to provide editing environments with a workable and predictable solution for Visible Modification Marks, the primary reason why the elements were originally introduced in HTML 4. As a matter of fact, almost no Wysiwyg editor implements them.

For the n-th time in 13 years, I strongly recommend dropping the ins and del elements in favor of the following attributes. All elements inside the body element should be able to carry them.

  • change attribute ; possible values: inserted or deleted, optionally followed by a whitespace and one of the keywords reviewed or to-be-reviewed.
  • review-by attribute ; an arbitrary value meaningful only when the change attribute contains the to-be-reviewed value and meant to be displayed for human consumption ; can be for instance a name, an email address, a twitter id, etc.
  • reviewed-by attribute ; an arbitrary value meaningful only when the change attribute contains the reviewed value and meant to be displayed for human consumption ; can be for instance a name, an email address, a twitter id, etc.
  • the cite and datetime attributes as currently defined in the html5 spec
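
As an illustration only, the value grammar of the change attribute described above could be checked like this. It is a sketch in Python: the function names are hypothetical, and accepting several whitespace characters between the two keywords (and ignoring leading or trailing whitespace) is my assumption, since the proposal only says "a whitespace".

```python
import re

# Grammar from the proposal: "inserted" or "deleted", optionally followed
# by whitespace and one of "reviewed" or "to-be-reviewed".
# Assumption: one or more whitespace characters are accepted as separator.
CHANGE_RE = re.compile(r"(inserted|deleted)(?:\s+(reviewed|to-be-reviewed))?")


def parse_change(value):
    """Return (kind, review_state), or None if the value is not valid."""
    match = CHANGE_RE.fullmatch(value.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)


def reviewer_attribute(value):
    """Name of the companion attribute that is meaningful for this value."""
    parsed = parse_change(value)
    if parsed is None or parsed[1] is None:
        return None
    return "reviewed-by" if parsed[1] == "reviewed" else "review-by"


print(parse_change("deleted to-be-reviewed"))
print(reviewer_attribute("deleted to-be-reviewed"))
```

With such attributes, a fully deleted list item would simply be written <li change="deleted to-be-reviewed" review-by="...">a</li>, with no wrapper element and therefore no content model conflict.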

This is the minimal set of attributes needed to resolve the issue. Another attribute "tagging" the potential reviews of the proposed change could also be added.

I really hope this change is going to happen. Again, the current ins and del html elements are totally hopeless.

Comments

1. On Saturday 25 June 2011, 13:52 by Josh T.

You're absolutely right that ins and del need to be sorted out, although I've never really thought about it. I would suggest that some of these attributes have a prefix like the ARIA attributes do. For example: "mod-change" or "mod-reviewer".

2. On Saturday 25 June 2011, 14:41 by Daniel Glazman

@Josh T.: I have no opinion here ; works for me as long as we focus on attributes and not both-block-and-inline elements.

3. On Saturday 25 June 2011, 15:01 by karl

A lot more complicated, I guess, would be a system that has no notion of nested markup, because inserting and erasing content has nothing to do with its markup structure.

We can imagine something like this. I use [fndel] here so as not to imply any specific markup. I don't know what it should be.

<p class="1stpar">balabla [fndel] abablabla</p>
<p class="2ndpar"> glglgl ajhk [/fndel] djhfjkhk dsjhdh jh</p>

This would be useful. I remember an XTech talk about something like this. Maybe by Jeni Tennison. Not sure anymore.

4. On Saturday 25 June 2011, 15:08 by karl

Found the original article that was used for the presentation:
http://academic.research.microsoft....

5. On Monday 27 June 2011, 00:19 by zcorpan

The a element has the same content model. Same issues, I guess?

Maybe you should create a microdata format with your suggested features and use those instead of ins and del.

6. On Monday 27 June 2011, 16:49 by François Yergeau

Two remarks:

- (With my minimalist hat firmly screwed on my head) It seems to me that a single "reviewer" attribute would do the job of both "review-by" and "reviewed-by", which are mutually exclusive. So the proposed set is not minimal, or did I miss something? (Takes hat off, it's hot!)

- I note in passing that the XML editor XMetal uses processing instructions to implement change marks ; it therefore runs parallel to the markup structure, as Karl suggests. But I also note that in a publishing system that processes content coming from XMetal, we transform these PIs into attributes (introducing synthetic <span> elements where needed), for later consumption by XSLT.

7. On Tuesday 28 June 2011, 14:16 by yt75

HTML sits somewhat in between two domains (or layers, if you want): one that would be LaTeX or all the markup languages used in today's CMSes (markmin and the like), and one that would be a kind of PostScript or TeX equivalent.
Not necessarily a major issue, but meanwhile "IT" in general (the people practising it) is still totally shying away from its huge need for identifiers in order to get written, thinking it would be different from other domains (or that it could work like words), which of course isn't the case. Why not a simple, abstract-syntax-only format with numerical IDs (distributed the way GS1 bar codes are) used as operator identifiers, for instance?
Discussion around this :
http://groups.google.com/group/comp...