Wysiwyg editing is hard #1
By glazou on Friday 6 August 2010, 12:30 - Nvu
You have probably never asked yourself this, but how complicated is deleting the selection in a markup-based Wysiwyg editor? In a word: complicated. Below, I will try to explain what a markup-based Wysiwyg editor does when you hit the delete key.
Let's take a simple example:
<h1>aa[aa</h1>
<ul>
<li>bbbb
<ul>
<li>cc]cc</li>
<li>dddd</li>
</ul>
</li>
<li>eeee</li>
</ul>
The selection is everything between the square brackets. Now the user hits the delete key. The natural algorithm for this is the following (I'm not saying it's the only one; see DOMRange.extractContents for another):
- find all the "visible" nodes in the various ranges of the selection (here we have only one range). A "visible" node is a text node or an empty element.
- split the start and end nodes of each range of the selection if necessary, for instance if these nodes are text nodes and the selection splits them into two nodes of non-zero length.
- for each visible node in each range of the selection, remove the node and recursively remove all its parents if they are "removable". A "removable" parent is an empty element or, depending on its tag name, an element containing only text nodes made of whitespace (\s in regexps); some elements, body for instance, are explicitly not "removable".
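The three steps above can be sketched in Python on a toy tree. This is a minimal sketch under stated assumptions, not Nvu's actual code: `Node`, `el`, `txt`, `walk`, `split_text`, `delete_selection` and `to_html` are hypothetical names, the selection is a single range whose boundaries fall strictly inside two distinct text nodes, and the tag-dependent "removable" rule is simplified to apply to every tag except body. The sketch splits the boundaries before collecting the visible nodes, so that the collected list holds only the selected halves.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


# Hypothetical minimal DOM-like tree for illustration; not a browser API.
@dataclass(eq=False)  # eq=False: nodes compare (and are found) by identity
class Node:
    tag: str | None = None  # None marks a text node
    text: str = ""
    children: list[Node] = field(default_factory=list)
    parent: Node | None = None


def el(tag: str, *kids: Node) -> Node:
    node = Node(tag)
    for kid in kids:
        kid.parent = node
        node.children.append(kid)
    return node


def txt(s: str) -> Node:
    return Node(text=s)


def walk(node: Node):
    """Yield nodes in document order."""
    yield node
    for child in node.children:
        yield from walk(child)


def is_visible(node: Node) -> bool:
    # A "visible" node is a text node or an empty element.
    return node.tag is None or not node.children


NOT_REMOVABLE = {"body"}  # assumption: a real editor protects more tags


def is_removable(node: Node) -> bool:
    # Empty, or holding only whitespace text nodes (\s in regexps).
    # Assumption: the tag-name-dependent part of the rule is ignored.
    if node.tag is None or node.tag in NOT_REMOVABLE:
        return False
    return all(c.tag is None and re.fullmatch(r"\s*", c.text) for c in node.children)


def split_text(node: Node, offset: int) -> Node:
    """Split a text node at offset and return the new right-hand node."""
    right = Node(text=node.text[offset:], parent=node.parent)
    node.text = node.text[:offset]
    siblings = node.parent.children
    siblings.insert(siblings.index(node) + 1, right)
    return right


def delete_selection(root: Node, start: Node, start_off: int, end: Node, end_off: int) -> None:
    if start is end or not (0 < start_off < len(start.text) and 0 < end_off < len(end.text)):
        raise ValueError("sketch only handles offsets strictly inside two distinct text nodes")
    # Split the boundary text nodes (end first, so `start` stays untouched).
    split_text(end, end_off)              # `end` keeps the selected left part
    start = split_text(start, start_off)  # the selected part is the right half
    # Collect the visible nodes from start to end, in document order.
    nodes = list(walk(root))
    selected = [n for n in nodes[nodes.index(start):nodes.index(end) + 1] if is_visible(n)]
    # Remove each visible node, then climb and remove removable parents.
    for node in selected:
        parent = node.parent
        parent.children.remove(node)
        node.parent = None
        while parent.parent is not None and is_removable(parent):
            grand = parent.parent
            grand.children.remove(parent)
            parent.parent = None
            parent = grand


def to_html(node: Node) -> str:
    if node.tag is None:
        return node.text
    return f"<{node.tag}>{''.join(to_html(c) for c in node.children)}</{node.tag}>"
```

Run on the h1/ul example, selecting from offset 2 of "aaaa" to offset 2 of "cccc", this sketch should reproduce the expected result that follows: the li that held "bbbb" survives because it still contains the nested ul. A paragraph emptied by the deletion, on the other hand, would be removed as "removable".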
This method is enough for the test case above, and here's the expected result:
<h1>aa</h1>
<ul>
<li>
<ul>
<li>cc</li>
<li>dddd</li>
</ul>
</li>
<li>eeee</li>
</ul>
But now imagine we have some markup that requires merging after the deletion of the selection. Here's one:
<ul>
<li>aaaa</li>
<li>bb[bb</li>
</ul>
<ul>
<li>cc]cc</li>
<li>dddd</li>
</ul>
What the user really expects here when the selection is deleted is the merging of the two ULs into a single list. So we need to improve our algorithm a bit:
- find all the "visible" nodes in the various ranges of the selection (here we have only one range). A "visible" node is a text node or an empty element.
- split the start and end nodes of each range of the selection if necessary, for instance if these nodes are text nodes and the selection splits them into two nodes of non-zero length.
- traverse all the ranges in our selection and record the list of all traversed nodes, with the traversal direction (start, up, down, next).
For our first example, the list of traversed nodes is:
| node | direction |
|---|---|
| "aa" | start |
| h1 | up |
| ul | next |
| li | down |
| "bbbb" | down |
| ul | next |
| li | down |
| "cc" | down |
For our second example, with its two adjacent lists, the list of traversed nodes is:
| node | direction |
|---|---|
| "bb" | start |
| li | up |
| ul | up |
| ul | next |
| li | down |
| "cc" | down |
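A traversal that records these (node, direction) pairs can be sketched as follows, again on a hypothetical toy tree (`Node`, `el`, `txt`, `traverse` and `label` are illustrative names, and the boundaries are assumed to be already split). The movement rule is an assumption, chosen because it reproduces both tables: from the current node, go "down" to the first child if the node contains the end of the range, otherwise move to the "next" sibling, and go "up" when no sibling is left.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical minimal DOM-like tree for illustration; not a browser API.
@dataclass(eq=False)  # eq=False: nodes compare (and are found) by identity
class Node:
    tag: str | None = None  # None marks a text node
    text: str = ""
    children: list[Node] = field(default_factory=list)
    parent: Node | None = None


def el(tag: str, *kids: Node) -> Node:
    node = Node(tag)
    for kid in kids:
        kid.parent = node
        node.children.append(kid)
    return node


def txt(s: str) -> Node:
    return Node(text=s)


def next_sibling(node: Node) -> Node | None:
    siblings = node.parent.children
    i = siblings.index(node)
    return siblings[i + 1] if i + 1 < len(siblings) else None


def is_ancestor(node: Node, other: Node) -> bool:
    other = other.parent
    while other is not None:
        if other is node:
            return True
        other = other.parent
    return False


def traverse(start: Node, end: Node) -> list[tuple[Node, str]]:
    """Record (node, direction) pairs from start to end.

    Assumes the boundaries are already split and start precedes end.
    """
    path = [(start, "start")]
    node, step = start, "start"
    while node is not end:
        if step != "up" and is_ancestor(node, end):
            node, step = node.children[0], "down"  # enter the subtree holding end
        elif next_sibling(node) is not None:
            node, step = next_sibling(node), "next"
        else:
            node, step = node.parent, "up"  # no sibling left: climb
        path.append((node, step))
    return path


def label(node: Node) -> str:
    return node.tag if node.tag is not None else f'"{node.text}"'
```

On the two adjacent lists, `traverse` should return the six steps of the table above, and on the h1/ul example the eight steps of the first table. Note that a whitespace text node between the two ULs would show up in the path as an extra "next" step, which is the problem discussed next.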
But that's not all: there can be non-significant text nodes in the DOM between elements, like a carriage return between the two ULs. So, to make sure we merge correctly during the deletion, the next step of our algorithm should be this one:
- for each element that has the direction "next" in the list of traversed nodes above, find its previous "visible" sibling, discarding text nodes containing only whitespace if the element is a block. If that previous visible sibling is also an element of the same type as the reference node, and if these elements are "mergeable", then append all the children of the reference node to the previous visible sibling's children and delete the reference node. Two "mergeable" elements are, for instance, two paragraphs, two h1 elements, or two dl elements; two inline text elements like span or strong are also mergeable if
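The merge step can be sketched in the same toy model, with more assumptions: `BLOCKS` and `MERGEABLE` are illustrative tag sets (not Nvu's real tables), inline elements such as span or strong are left out because their merge condition is not spelled out above, and `merge_step` takes a path with the same (node, direction) shape as the traversal tables. All names here are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical minimal DOM-like tree for illustration; not a browser API.
@dataclass(eq=False)  # eq=False: nodes compare (and are found) by identity
class Node:
    tag: str | None = None  # None marks a text node
    text: str = ""
    children: list[Node] = field(default_factory=list)
    parent: Node | None = None


def el(tag: str, *kids: Node) -> Node:
    node = Node(tag)
    for kid in kids:
        kid.parent = node
        node.children.append(kid)
    return node


def txt(s: str) -> Node:
    return Node(text=s)


# Assumption: illustrative tag sets only.
BLOCKS = {"p", "div", "li", "ul", "ol", "dl", "h1", "h2", "h3", "h4", "h5", "h6"}
MERGEABLE = {"p", "ul", "ol", "dl", "h1", "h2", "h3", "h4", "h5", "h6"}


def previous_visible_sibling(node: Node) -> Node | None:
    siblings = node.parent.children
    for sib in reversed(siblings[: siblings.index(node)]):
        # Skip whitespace-only text nodes when the reference node is a block.
        if node.tag in BLOCKS and sib.tag is None and not sib.text.strip():
            continue
        return sib
    return None


def merge_step(path: list[tuple[Node, str]]) -> None:
    for node, step in path:
        if step != "next" or node.tag is None or node.parent is None:
            continue
        prev = previous_visible_sibling(node)
        if prev is None or prev.tag != node.tag or node.tag not in MERGEABLE:
            continue
        # Append the reference node's children to prev, then delete the node.
        for child in node.children:
            child.parent = prev
        prev.children.extend(node.children)
        node.children = []
        node.parent.children.remove(node)
        node.parent = None


def to_html(node: Node) -> str:
    if node.tag is None:
        return node.text
    return f"<{node.tag}>{''.join(to_html(c) for c in node.children)}</{node.tag}>"
```

With a carriage return between the two ULs, the whitespace text node is skipped and the second list's items move into the first. On the h1/ul example nothing merges: the "next" ul entries find an h1 or a text node as previous sibling, not another ul.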
The rest of the algorithm is similar to what we did for the h1/ul example above:
- for each visible node in each range of the selection, remove the node and recursively remove all its parents if they are "removable". A "removable" parent is an empty element or, depending on its tag name, an element containing only text nodes made of whitespace (\s in regexps); some elements, body for instance, are explicitly not "removable".
Here's the result:
<ul>
<li>aaaa</li>
<li>bb</li>
<li>cc</li>
<li>dddd</li>
</ul>
Next time, we'll dive into some high-level functions like indentation or list creation.

Comments
Very interesting. I know for sure that I am totally ignorant of the rules and conventions of Wysiwyg editing, and maybe I missed the whole point of your article.
But when I think about those examples from a user's point of view, those results are not what I expect.
If I select only part of a title, or part of a list item, for deletion, it's because I really want to merge the remaining text parts, not only the lists.
In the second example, the result I expect is more like this:
<ul>
<li>aaaa</li>
<li>bbcc</li>
<li>dddd</li>
</ul>
And in the first example, it may look a little shocking, but it should be this:
<h1>aacc</h1>
<ul>
<li>dddd</li>
<li>eeee</li>
</ul>
Or maybe this:
<h1>aacc</h1>
<ul>
<ul>
<li>dddd</li>
</ul>
<li>eeee</li>
</ul>
I suppose that, with your results, to obtain mine I would only need to hit the backspace key 3 times in the first example and 2 times in the second.
But still, from my user's point of view, I don't understand why.
If I selected the "dots" of the list items for deletion, it's to make them disappear.
If I wanted your results in the first example, I would first select the "aa" and delete it, then select the "cc" and delete it too: two selections, to make sure I don't break the list structure.
And for the second example, I would use the same selection to obtain my result, then place the caret between "bb" and "cc" and hit the enter key.
Or, in a perfect world, the UI would let me make my selection from the middle of the "bbbb" item to the beginning of the "cccc" item (just before the item's dot), and understand that I want to merge the two lists.
<ul>
<li>aaaa</li>
<li>bb[bb</li>
</ul>
<ul>
]<li>cccc</li>
<li>dddd</li>
</ul>
Afterwards, I would delete the first "cc" separately.
Well, again, maybe I have totally missed the subject or its implications, and I'm sorry about that (and by the way sorry for my very limited English, but since the article is in English, I tried).
Are you sure that the expected behaviour in your first "simple" case is that the selected parts of the 'h1' and 'li' tags should be deleted, but the nodes themselves should stay?
That wouldn't be my own expectation: I'd expect that after deleting everything from the 'aa' to the 'cc', these two would be joined; something like:
<h1>aacc</h1>
<ul>
<ul>
<li>dddd</li>
</ul>
</li>
<li>eeee</li>
</ul>
In part, the reasoning being that the bullet point is part of the selection I'm deleting, so the remaining "cc" is no longer part of the list. I'm also deleting anything separating it from the heading, so it should end up as part of the 'h1' line.
Actually, I misread the structure of the first example and didn't notice the nested list. I think that in this case, since I'm deleting the "bbbb" element containing it, the remainder of that list should be raised to the top level, leaving:
<h1>aacc</h1>
<ul>
<li>dddd</li>
<li>eeee</li>
</ul>
...as Chocolat suggested.