Using HTMLDocument


Constructing documents using HTMLDocument

The HTMLDocument module allows HTML documents to be constructed by building up the document tree. Because the document text is kept separate from the markup language, several common mistakes that would make the document invalid can be avoided, such as forgetting the closing quote on an attribute value or placing the closing tags for elements in the wrong order. The majority of cross-site scripting attacks against web applications are also blocked.

The HTMLDocument::new function creates a new HTML document and takes two parameters. The first is the DTD declaration for the document; HTML4Strict is generally the most suitable unless you will be processing Kaya’s output further using XML tools. The second is the document title, which should give a very short summary of the document and ideally be unique to it.

The following code creates a new HTML document and retrieves the body element into which the visible contents of the page are placed.

doc = HTMLDocument::new(HTML4Strict,"Kaya HTML example");
body = doc.body;

The document is then built up by defining child elements for the document body, each of which may have its own child elements and perhaps some text.
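As a rough sketch of how these pieces fit into a complete program, the web application below builds and returns a small document. The `webapp` declaration, the `Webapp` import and the `webmain` entry point are assumptions about the surrounding Kaya program, not part of the HTMLDocument module. The second paragraph illustrates the separation of text and markup described at the start of this section: its contents should be escaped on output and shown as literal text, not run as a script.

```kaya
webapp example;

import Webapp;
import HTMLDocument;

// Assumed web application entry point: it returns the finished
// document, which is then sent to the browser.
HTMLDocument webmain() {
    doc = HTMLDocument::new(HTML4Strict,"Kaya HTML example");
    body = doc.body;
    void(addParagraph(body,"Hello from Kaya."));
    // This string is plain text, so the tags inside it should be
    // escaped when the document is output rather than interpreted.
    void(addParagraph(body,"<script>alert('hi')</script>"));
    return doc;
}
```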

Constructing a basic document structure

Most of the functions for adding block-level elements (paragraphs, headings, lists, etc.) to an HTML document take the location to which the new element will be added as their first parameter, and return the new element itself, so that you can add further elements to it or, for example, set its CSS class. If you are not going to use the element further, the void coercion provides an easy way to discard it. New elements are added as the last child of the target element.

void(addHeading(body,1,"Kaya HTML example"));
paragraph = addParagraph(body,"This is an example of a short paragraph.");
list = addList(body,Unordered,0);
void(pushListItem(list,"First bullet point"));
item2 = pushListItem(list,"Second bullet point");
item3 = pushListItem(list,"Third bullet point");
setClass(item3,"finalitem");
addString(paragraph," It doesn't have much content yet.");

This produces the HTML code below. Other than demonstrating that it could be done, there was no need to use addString when constructing the paragraph: all of the text could have been passed to addParagraph with the same result. However, it does show that the document need not be built up in the order in which it is eventually output.

<body>
 <h1>Kaya HTML example</h1>
 <p>This is an example of a short paragraph. It doesn't have much content yet.</p>
 <ul>
  <li>First bullet point</li>
  <li>Second bullet point</li>
  <li class='finalitem'>Third bullet point</li>
 </ul>
</body>
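Because each of these functions returns the element it creates, small helper functions can package up common structures. The function below is a hypothetical example (addHeadedList is not part of HTMLDocument) that adds a second-level heading followed by a bulleted list and returns the list, so that the caller can add items or set classes later. It uses only the functions shown above and assumes the enclosing module imports HTMLDocument.

```kaya
// Hypothetical helper; assumes "import HTMLDocument;" in the enclosing module.
// Adds a level-2 heading and a bulleted list of the given items to parent,
// then returns the list so that it can be edited further.
ElementTree addHeadedList(ElementTree parent, String title, [String] items) {
    void(addHeading(parent,2,title));
    list = addList(parent,Unordered,0);
    for item in items {
        void(pushListItem(list,item));
    }
    return list;
}

// Usage, with body obtained as earlier in this tutorial:
// versions = addHeadedList(body,"Recent versions",["0.2.4","0.2.5"]);
```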

Lists can be nested in HTML, but remember that the sub-list is a child of the list item, not of the list itself. This time, we give the list some initial contents – see the addList documentation for more details.

sublist = addList(item2,Ordered,3,
                  ["Sub-item 1",
                   "Sub-item 2",
                   "Sub-item 3"]);
subitem3 = getListItem(sublist,2);
setClass(subitem3,"finalitem");

getListItem can then be used to retrieve items for later editing. Items are counted from zero, so index 2 is the third sub-item. The list in the final HTML now looks like this:

 <ul>
  <li>First bullet point</li>
  <li>Second bullet point
   <ol>
    <li>Sub-item 1</li>
    <li>Sub-item 2</li>
    <li class='finalitem'>Sub-item 3</li>
   </ol>
  </li>
  <li class='finalitem'>Third bullet point</li>
 </ul>

To retrieve the contents of a block-level element other than a list, use the getChildren function, which will return a list of all sub-elements of a block-level element (ignoring any text that is not part of a sub-element).

Adding extra text and inline elements to the block

Just as the block structure of the document is built up by appending blocks inside other blocks, the text content of a block can be built up piece by piece. The previous section showed addString being used to add extra text; inline elements such as emphasis or computer code can be added too.

HTML has a large number of inline elements, which Kaya represents with the InlineElement data type. Most of this type's constructors take no parameters, but a few, such as Hyperlink(String uri), take a required parameter. In general, the easiest way to add these elements to a document is with the appendInlineElement function.

p = addParagraph(body,"This section explains how ");
void(appendInlineElement(p,ComputerCode,"appendInlineElement"));
addString(p," is used in ");
void(appendInlineElement(p,Hyperlink("http://kayalang.org/"),"Kaya"));
addString(p," to build up sentences.");

This function works in the same way as the functions used for building up block-level elements: the first parameter is the location to add to, and it returns the newly added element. The code above generates this HTML:

<p>This section explains how <code>appendInlineElement</code>
is used in <a href="http://kayalang.org/">Kaya</a> to build 
up sentences.</p>
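Combining addParagraph, appendInlineElement and addString gives a simple way to produce sentences with links in them. The function below is a hypothetical helper, not part of HTMLDocument; it builds a paragraph of the form "See label for details.", where label is a hyperlink to the given URL, and it assumes the enclosing module imports HTMLDocument.

```kaya
// Hypothetical helper; assumes "import HTMLDocument;" in the enclosing module.
// Adds "See <label> for details." to parent, with label linking to url,
// and returns the new paragraph.
ElementTree addSeeAlso(ElementTree parent, String url, String label) {
    p = addParagraph(parent,"See ");
    void(appendInlineElement(p,Hyperlink(url),label));
    addString(p," for details.");
    return p;
}

// Usage, with body obtained as earlier in this tutorial:
// void(addSeeAlso(body,"http://kayalang.org/","the Kaya website"));
```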

Images are added in a similar way. The addImage function uses an ImageData structure to define an image, giving its source URL, alternative text for when images cannot be displayed, and the image's dimensions (which may be left unspecified if they are not known).

img = ImageData("http://kayalang.org/images/logos/poweredby1.png",
                "Powered by Kaya",
                Specified(80,15));
void(addImage(footer,img));

Sometimes it is helpful to add an inline element to an existing element after that element has already had contents added; for example, you may want to emphasise every occurrence of a certain word. The addInlineElementAt function does this: all characters between the given start and end positions, inclusive and counted from zero, are ‘highlighted’ with the new inline element. The highlighted range must start and end within the same element, or an Exception is thrown.

p = addParagraph(parent,"abcdefghijklmnopqrstuvwxyz");
void(addInlineElementAt(p,Emphasis,5,10));
void(addInlineElementAt(p,StrongEmphasis,6,8));
// <p>abcde<em>f<strong>ghi</strong>jk</em>lmnopqrstuvwxyz</p>
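As a further illustration of the zero-based, inclusive positions, the fragment below emphasises a single word. As in the example above, parent stands for any existing block element. The positions were counted by hand for this particular string; a real application would compute them, for instance by searching the text for the word.

```kaya
p = addParagraph(parent,"Kaya builds documents safely.");
// "safely" starts at position 22 and ends at position 27
// (counting from zero), so those are the inclusive bounds.
void(addInlineElementAt(p,Emphasis,22,27));
// Expected, following the example above:
// <p>Kaya builds documents <em>safely</em>.</p>
```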

Adding existing blocks to a document

It may be useful, especially for creating templates, to have a function that generates a block that can then be added anywhere in a document. The key to generating these blocks is to provide them with a temporary location while they are being constructed. The anonymousBlock function is the easiest way to create this temporary location.

ElementTree requiredFieldMessage() {
    // build the message inside a temporary block
    temp = anonymousBlock;
    message = appendInlineElement(temp,StrongEmphasis,
                                  "(This field is required)");
    // return the new element, not the temporary block
    return message;
}

This element can then be inserted into an existing document tree using appendExisting (to add to the end of an element) or prependExisting (to add to the start).

appendExisting(field5description,requiredFieldMessage());

On some occasions, the function may take a long time or a lot of memory to complete. Some efficiency gains can be made in this case by calling the function lazily using appendGenerator instead of appendExisting. The code above would be changed to:

appendGenerator(field5description,@requiredFieldMessage);

For more information on this, see the functions tutorial.
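Template functions can also take parameters. In the hypothetical sketch below, helpNote builds a short paragraph containing a link inside a temporary block, in the same way as requiredFieldMessage, and prependExisting places it at the start of an existing element. The names helpNote and form, and the "/help" URL, are placeholders for parts of your own application. Each call builds a fresh element, so the note can be added to several places in the same document.

```kaya
// Hypothetical template function; assumes "import HTMLDocument;".
// Builds "Need help? Read the guide" (with the second part linking to url)
// in a temporary block and returns the paragraph, not the block.
ElementTree helpNote(String url) {
    temp = anonymousBlock;
    p = addParagraph(temp,"Need help? ");
    void(appendInlineElement(p,Hyperlink(url),"Read the guide"));
    return p;
}

// Usage, where form is an existing element of the document:
// prependExisting(form,helpNote("/help"));
```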

Adding tables of data to a document

Data tables are constructed slightly differently from most other parts of the document. The easiest way to construct a table is with initialiseTable, which builds the table from the InitialTableData structure passed to it.

header = [["Version","Date"]];
footer = [];
tablebody = [[
              ["0.2.1","2006-11-15"],
              ["0.2.2","2006-11-26"],
              ["0.2.3","2006-12-04"],
              ["0.2.4","2007-01-30"],
              ["0.2.5","2007-04-20"]
            ]];
itd = InitialTableData(header,footer,tablebody);
table = initialiseTable(body,itd);

This generates the following table:

<table>
 <thead>
  <tr><th>Version</th><th>Date</th></tr>
 </thead>
 <tbody>
  <tr><td>0.2.1</td><td>2006-11-15</td></tr>
  <tr><td>0.2.2</td><td>2006-11-26</td></tr>
  <tr><td>0.2.3</td><td>2006-12-04</td></tr>
  <tr><td>0.2.4</td><td>2007-01-30</td></tr>
  <tr><td>0.2.5</td><td>2007-04-20</td></tr>
 </tbody>
</table>

The InitialTableData format allows complex data tables, including tables with multiple body sections, to be constructed quickly. In a real application, the contents of the body arrays would probably be generated from a database query or read from an external data file.
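As a sketch of a table with more structure, the fragment below adds a footer row and splits the releases into two body sections, one per year. It assumes that the footer takes a list of rows in the same format as the header (the example above only shows an empty footer), and that each body section becomes its own tbody element. As before, body is the document body.

```kaya
// Each table part is a list of rows, and each row a list of cell strings.
// tablebody is a list of body sections; here, one section per year.
header = [["Version","Date"]];
footrows = [["Latest","2007-04-20"]];   // assumed to use the header's format
tablebody = [
              [["0.2.1","2006-11-15"],
               ["0.2.2","2006-11-26"],
               ["0.2.3","2006-12-04"]],
              [["0.2.4","2007-01-30"],
               ["0.2.5","2007-04-20"]]
            ];
itd = InitialTableData(header,footrows,tablebody);
table = initialiseTable(body,itd);
```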

Table cells can later be retrieved for further editing and adding markup using the getTableCell function.

// first, get the part of the table the cell is in
tbody = getTableBodySections(table)[0];
cell = getTableCell(tbody,4,0);
addString(cell," ");
void(appendInlineElement(cell,Emphasis,"(latest version)"));

The last row of the table is now:

  <tr><td>0.2.5 <em>(latest version)</em></td>
      <td>2007-04-20</td></tr>
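Retrieved cells can also be styled. The fragment below continues from the example above and marks the date of the first version listed with a CSS class. It assumes that setClass works on table cells in the same way as on list items; the class name is only an example.

```kaya
tbody = getTableBodySections(table)[0];
// Row 0, column 1: the date of the first version listed.
oldest = getTableCell(tbody,0,1);
setClass(oldest,"oldest");
```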

Reading HTML from a string

Normally, any HTML code in a string is escaped and displayed literally as part of the element's text. This protects against cross-site scripting attacks, but can get in the way of legitimate users (and indeed application authors!) who know HTML. The readFromString function converts a String into a set of HTML elements and adds them to the document at the specified point. If the data comes from users who are only partially trusted, one of the standard ‘whitelists’, or a customised one, can be used to restrict the HTML elements and attributes they may use; see the WhiteList documentation for more details.

readFromString(body,getContent(currentPage()),
               AllElements(Safe),HTML4Strict);

This code uses the HTML provided by the getContent and currentPage functions (part of the application, not the Kaya standard library) and adds it to the body of the document. The whitelist allows both block and inline elements, though only a restricted set of tags and attributes are allowed, and the incoming code is expected to be HTML 4.

The document type of the incoming code (or code fragment) does not have to match that of the output document – it simply provides a hint to the parser about what the code might do. If your template or content code is generated by an XML tool, or by an author obeying XHTML rules, you should use the XHTML1Strict document type for reading (even though you are probably using HTML4Strict for output), as this is considerably faster due to the fewer degrees of freedom allowed to the input. Conversely, if you will be accepting input of unknown quality, then using TagSoup will be slower but will ensure that the document parsing always succeeds (the other document types will throw an Exception rather than attempting error recovery if the input is ambiguous).
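The choice of input document type can be sketched as follows. templateText, userComment and commentBlock are hypothetical names standing for parts of your own application, and AllElements(Safe) is reused from the example above; trusted template code might warrant a different whitelist, as described in the WhiteList documentation.

```kaya
// Author-written template that follows XHTML rules:
// parsing as XHTML1Strict is faster, even though the output is HTML 4.
readFromString(body,templateText,AllElements(Safe),XHTML1Strict);

// Visitor-supplied comment of unknown quality:
// TagSoup is slower, but parsing always succeeds instead of throwing.
readFromString(commentBlock,userComment,AllElements(Safe),TagSoup);
```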

For advanced templating, it may be useful to call user-defined Kaya functions as part of processing the input string, for which the readFromTemplate function is provided. More details on this function and the template syntax are in the templating tutorial.
