Warning: work in progress. Last update: Thu Aug 17 15:35:19 CEST 2006

This is about implementing, in or around Gecko, a new text editor for source code with syntax highlighting and code completion. The highlight styles are of course controlled by CSS stylesheets, and specifying a grammar and the corresponding tokenizer should require only minimal work.

A new editor

Of course, the text editor in Gecko is not enough, and we would need to write a new editor object on top of nsPlainTextEditor.

  • all text lines live inside a p element, so they can be numbered and have bindings
  • BR elements are automatically turned into a break between two paragraphs
  • when the CR key is pressed, the p is split in two, or a new p is created before or after it; syntax highlighting is then refreshed starting from the last child of the first of the two p elements
  • whenever text is inserted or deleted while syntax highlighting is enabled, the refresh starts with the element preceding the inserted or deleted text
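
The split-on-CR rule above can be modelled outside Gecko. The following is only a sketch under stated assumptions: Span, Paragraph and split_paragraph are hypothetical names, a p element is reduced to a list of highlighted spans, and returning None when the first paragraph ends up empty is a choice made for this example, not something specified above.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    role: str   # highlight class of the token (hypothetical model)
    text: str


@dataclass
class Paragraph:
    spans: list = field(default_factory=list)


def split_paragraph(paragraphs, p_index, span_index, offset):
    """Split one paragraph in two at a character offset inside one span.

    Returns (paragraph index, span index) of the last child of the first
    of the two paragraphs, which is where the highlight refresh would
    start, or None if that first paragraph has no child (assumption).
    """
    para = paragraphs[p_index]
    target = para.spans[span_index]
    head = para.spans[:span_index]
    tail = para.spans[span_index + 1:]
    left, right = target.text[:offset], target.text[offset:]
    if left:
        head.append(Span(target.role, left))
    if right:
        tail.insert(0, Span(target.role, right))
    # Replace the original paragraph by the two halves.
    paragraphs[p_index] = Paragraph(head)
    paragraphs.insert(p_index + 1, Paragraph(tail))
    return (p_index, len(head) - 1) if head else None


doc = [Paragraph([Span("property", "color"), Span("colon", ":"), Span("value", " red")])]
refresh_start = split_paragraph(doc, 0, 2, 2)
print(refresh_start, len(doc))
```

Both halves of the split span keep the old role here, so they are stale until the refresh runs; that is why the refresh has to start at the end of the first paragraph rather than in the second one.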

Grammars for Syntax Highlight

The grammar of the language to highlight is described in an XML file like this one.

  • the root element must be grammar
  • the grammar element contains zero or more stylesheet elements followed by one or more context elements, each context element having a name attribute.
  • a stylesheet element has one mandatory attribute href holding a URL for a CSS stylesheet to be applied to any document matched against the current grammar.
  • a context element consists of an optional error element, an optional ignore element and one or more token elements
  • the error element takes the following attributes
    • mandatory skipuntil attribute holding the name of an existing context element in the current grammar or the empty string
    • mandatory expecting attribute holding the name of an existing context element in the current grammar or the empty string
    • an optional pairs attribute of value yes or no
  • the ignore element takes no attributes and contains zero or more type elements
  • a type element holds a single mandatory attribute name
  • a token element can hold three mandatory attributes and three optional ones
    • mandatory type attribute defining a token's name (see below about the Tokenizer)
    • mandatory role attribute, intended to be used as the value of the class attribute of the span element surrounding the token in the editor
    • mandatory expecting attribute, intended to be used as the value of the expecting attribute of the span element surrounding the token in the editor, and specifying the context that should be expected after the current token. If a given token opens on multiple contexts, multiple token elements with the same type attribute can be used to represent it. If the value of this attribute is the empty string, then the expected context is the first of all contexts defined in the grammar document, in traversal order.
    • optional string attribute. If present, the string value of the token must be exactly the value of this attribute.
    • optional values attribute. If present, it specifies the name of a table of values used for code completion (see Code completion below)
    • optional boolean multiline attribute, default value being "false", indicating line breaks are admitted anywhere in the token
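
To make the rules above concrete, here is a small grammar and a checker for it. This is a sketch, not the actual grammar file format shipped anywhere: the grammar content (a CSS-like language, the highlight.css href, the context and token names) is invented for illustration, and check_grammar is a hypothetical helper that only covers the rules listed in this section.

```python
import xml.etree.ElementTree as ET

# Hypothetical grammar for a CSS-like language; every name and value is illustrative.
GRAMMAR = """
<grammar>
  <stylesheet href="highlight.css"/>
  <context name="selector">
    <error skipuntil="selector" expecting="selector" pairs="yes"/>
    <ignore><type name="whitespace"/></ignore>
    <token type="ident" role="selector" expecting="selector"/>
    <token type="symbol" string="{" role="brace" expecting="declaration"/>
  </context>
  <context name="declaration">
    <token type="ident" role="property" expecting="declaration" values="properties"/>
    <token type="symbol" string="}" role="brace" expecting=""/>
  </context>
</grammar>
"""


def check_grammar(root):
    """Return a list of rule violations; an empty list means the grammar passes."""
    if root.tag != "grammar":
        return ["the root element must be grammar"]
    errors = []
    children = list(root)
    contexts = [c for c in children if c.tag == "context"]
    names = {c.get("name") for c in contexts}

    # Zero or more stylesheet elements, then one or more context elements.
    n_sheets = 0
    while n_sheets < len(children) and children[n_sheets].tag == "stylesheet":
        n_sheets += 1
    if not contexts or any(c.tag != "context" for c in children[n_sheets:]):
        errors.append("expected stylesheet elements followed by at least one context")
    for sheet in children[:n_sheets]:
        if sheet.get("href") is None:
            errors.append("stylesheet without href")

    def bad_ref(value):
        # A context reference must be present and be "" or an existing context name.
        return value is None or (value != "" and value not in names)

    for ctx in contexts:
        label = ctx.get("name")
        if not label:
            errors.append("context without name")
        if len(ctx.findall("error")) > 1 or len(ctx.findall("ignore")) > 1:
            errors.append(f"context {label}: more than one error or ignore element")
        for err in ctx.findall("error"):
            if bad_ref(err.get("skipuntil")) or bad_ref(err.get("expecting")):
                errors.append(f"context {label}: bad skipuntil or expecting on error")
            if err.get("pairs") not in (None, "yes", "no"):
                errors.append(f"context {label}: pairs must be yes or no")
        for typ in ctx.findall("ignore/type"):
            if typ.get("name") is None:
                errors.append(f"context {label}: type without name")
        tokens = ctx.findall("token")
        if not tokens:
            errors.append(f"context {label}: at least one token is required")
        for tok in tokens:
            if tok.get("type") is None or tok.get("role") is None:
                errors.append(f"context {label}: token without type or role")
            if bad_ref(tok.get("expecting")):
                errors.append(f"context {label}: bad expecting on token")
            if tok.get("multiline", "false") not in ("true", "false"):
                errors.append(f"context {label}: multiline must be true or false")
    return errors


print(check_grammar(ET.fromstring(GRAMMAR.strip())))  # prints [] when no rule is violated
```

Note how the closing brace token uses an empty expecting attribute: per the rule above, that sends the editor back to the first context of the grammar, here selector.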

Syntax highlight itself

The editor has the following methods:

  • initSyntaxHighlight(in nsIEditor editor, in Document grammar, in Tokenizer tokenizer, in CodeCompletion cc) where editor is what you think it is, grammar is a DOM document containing a grammar as defined above, cc is an object dedicated to code completion (possibly null to indicate there is no code completion), and tokenizer is a tokenizer for the given grammar implementing the following methods:
    • Tokenizer::init(in string Text) initializes the tokenizer. Other methods like nextToken() and nextURLToken() will tokenize the string passed to init().
    • Tokenizer::nextToken() and Tokenizer::nextContinuedToken()
      • argument: a boolean indicating if whitespace has to be ignored or not
      • returns: the type of the next consumed token
      • returns: the string containing the text value of the token
      • returns: a boolean indicating if an end delimiter was found for the returned token
      • returns: an errorcode, 0 if no error
    • Tokenizer::nextURLToken().
      • argument: a boolean indicating if whitespace has to be ignored or not
      • returns: the string containing the text value of the token assuming the token is a URL
      • returns: a boolean indicating if an end delimiter was found for the returned token
      • returns: an errorcode, 0 if no error
  • validate(in Node startElement, in long nbelements) where startElement is a DOM element in the edited document where the validation starts, and nbelements is the number of elements that must validate without error before validation stops; a value of -1 indicates that the whole document must be validated or re-validated. If startElement is null, then the validation starts with the first child of the first p element in the document to validate; in that case, the element is always matched against the first of all contexts defined in the grammar document.
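
The shape of the Tokenizer interface can be illustrated with a toy implementation. This is a sketch under assumptions, not the Gecko interface: ToyTokenizer, the Token tuple, the four token types, the error code 1 for an unterminated string and the None return at end of input are all invented for this example. nextContinuedToken() and nextURLToken() are left out.

```python
import re
from typing import NamedTuple, Optional


class Token(NamedTuple):
    type: str     # token type, to be matched against token elements of the grammar
    text: str     # text value of the token
    closed: bool  # True if an end delimiter was found (always True for undelimited tokens)
    error: int    # 0 if no error


class ToyTokenizer:
    # Tried in order; the last pattern matches any single character,
    # so some pattern always matches while input remains.
    _PATTERNS = [
        ("whitespace", re.compile(r"\s+")),
        ("string", re.compile(r'"[^"\n]*"?')),
        ("ident", re.compile(r"[A-Za-z_][A-Za-z0-9_-]*")),
        ("symbol", re.compile(r".", re.S)),
    ]

    def init(self, text: str) -> None:
        """Store the text that later next_token() calls will consume."""
        self._text = text
        self._pos = 0

    def next_token(self, ignore_whitespace: bool) -> Optional[Token]:
        """Consume and return the next token, or None at end of input (assumption)."""
        while self._pos < len(self._text):
            for type_, pattern in self._PATTERNS:
                match = pattern.match(self._text, self._pos)
                if match:
                    break
            self._pos = match.end()
            if type_ == "whitespace" and ignore_whitespace:
                continue
            text = match.group()
            # Only strings have an end delimiter in this toy language.
            closed = type_ != "string" or (len(text) >= 2 and text.endswith('"'))
            return Token(type_, text, closed, 0 if closed else 1)
        return None


tokenizer = ToyTokenizer()
tokenizer.init('a "b" "c')
token = tokenizer.next_token(True)
while token is not None:
    print(token)
    token = tokenizer.next_token(True)
```

Reporting an unterminated string as a token with closed set to False, rather than failing, lets the editor still wrap the text in a span and apply the error handling described by the grammar.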

Code completion

A CodeCompletion should implement the following methods:

  • getValuesListForToken(in string token, in string valuesType, in Element elt) taking as arguments the string representation of the token requiring code completion (can be the empty string), the string representing the type of code completion, which corresponds to a values attribute for that token in the grammar, and the element surrounding that token in the editor; it returns an array of strings.
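
A CodeCompletion object could be as simple as a lookup in named tables. The sketch below is an assumption-laden illustration: TableCodeCompletion and the properties table are hypothetical, prefix matching is one possible completion policy that the description above does not mandate, and the surrounding element is accepted but ignored.

```python
class TableCodeCompletion:
    """Hypothetical completion object backed by named tables of values."""

    def __init__(self, tables):
        # Maps a values attribute from the grammar to its candidate strings.
        self._tables = tables

    def get_values_list_for_token(self, token, values_type, elt=None):
        """Return the candidates of the given table that start with token.

        An empty token returns the whole table; an unknown table returns
        an empty list. elt is unused in this sketch.
        """
        candidates = self._tables.get(values_type, [])
        return sorted(value for value in candidates if value.startswith(token))


cc = TableCodeCompletion({"properties": ["color", "background", "border", "content"]})
print(cc.get_values_list_for_token("co", "properties"))  # prints ['color', 'content']
```

A real implementation would probably use elt to take the surrounding context into account, for instance to propose different values depending on the preceding token.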