Trade in bad habits for great code

Writing code to handle XML transformations in XSLT is much easier than in any other commonly used programming language. But the XSLT language has such a different syntax and processing model from classical programming languages that it takes time to grasp all of XSLT's subtle nuances.

This article is in no way meant as an extensive and complex XSLT tutorial. Instead, it starts with an explanation of the topics that pose the biggest difficulties for inexperienced XML and XSLT developers. Later, it moves to topics related to the overall design of stylesheets and their performance.

Working with namespaces

Although it's increasingly rare to see XML documents without namespaces, there still seems to be some confusion related to their proper use in different technologies. Many documents use prefixes to denote elements in a namespace, and this explicit notation of namespaces doesn't typically lead to confusion. The example in Listing 1 shows a simple SOAP message that uses two namespaces—one for the SOAP envelope and one for the actual payload.

Listing 1. XML document with namespaces
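A minimal sketch of such a message follows. The env prefix is bound to the SOAP 1.2 envelope namespace; the payload namespace urn:example:payment and the element names Payment, Amount, and Recipient are hypothetical, chosen only for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- env: SOAP 1.2 envelope namespace; p: hypothetical payload namespace -->
<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope"
              xmlns:p="urn:example:payment">
  <env:Body>
    <p:Payment>
      <p:Amount currency="USD">124.95</p:Amount>
      <p:Recipient>Example Corp.</p:Recipient>
    </p:Payment>
  </env:Body>
</env:Envelope>
```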

As elements in the source document have prefixes, it's clear that they belong to a namespace. No one will have problems processing such a document in XSLT. It is sufficient to duplicate namespace declarations from the source document in the stylesheet. Although you can use arbitrary prefixes, it's usually more convenient to use the same prefixes as in typical input documents, as in Listing 2.

Listing 2. Stylesheet that accesses information in a namespaced document
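A stylesheet of this kind might look like the sketch below. It assumes an input SOAP 1.2 message whose body carries a hypothetical p:Payment element with a p:Amount child in the made-up namespace urn:example:payment; only the env and p prefixes come from the text.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The prefixes declared on the root element are usable in every XPath expression below. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:env="http://www.w3.org/2003/05/soap-envelope"
                xmlns:p="urn:example:payment">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Every step carries a prefix; an unprefixed step would select nothing. -->
    <xsl:value-of select="/env:Envelope/env:Body/p:Payment/p:Amount"/>
  </xsl:template>

</xsl:stylesheet>
```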

As you can see, this code declares namespace prefixes env and p on the root element xsl:stylesheet. Such declarations are then inherited by all elements in the stylesheet so you can use them in any embedded XPath expression. Also note that in XPath expressions, you must prefix all elements with the appropriate namespace prefix. If you forget to mention a prefix in any step, your expression will return nothing—an error for which it's difficult to track the cause.

Documents that use namespaces are typically the cause of trouble when the use of namespaces is not apparent at first blush. If you have a lot of elements in one namespace, you can define this namespace as a default using the xmlns attribute. Elements from the default namespace do not use prefixes; therefore, it's easy to miss that they're actually in a namespace. Imagine that you have to transform the XHTML document in Listing 3.

Listing 3. XHTML document using a default namespace
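As an illustration, a small XHTML document with a default namespace could look like this; the title and paragraph text are placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The xmlns attribute puts html and all of its unprefixed descendants in the XHTML namespace. -->
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
  <head>
    <title>Sample page</title>
  </head>
  <body>
    <p>Hello, world.</p>
  </body>
</html>
```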

It might be that you simply glanced over xmlns='http://www.w3.org/1999/xhtml', or it might be that this default namespace declaration is preceded by a dozen other attributes and you simply didn't see what was in column 167, even on your widescreen display. It is quite natural to write XPath expressions like /html/head/title, but such expressions return an empty node set, because in XPath 1.0 an unprefixed name matches only elements in no namespace, and the input document contains no such elements. All elements in the input document belong to the http://www.w3.org/1999/xhtml namespace, and this must be reflected in the XPath expressions.

To access namespaced elements in XPath, you must define a prefix for their namespace. For example, if you want to access a title in the sample XHTML document, you have to define a prefix for the XHTML namespace, then use this prefix in all XPath steps, as the example stylesheet in Listing 4 shows.

Listing 4. The transformation must use namespace prefixes even for input documents that use a default namespace
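One way to write such a stylesheet is sketched below. The prefix h is an arbitrary choice; any prefix works as long as it is bound to the XHTML namespace URI.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- h is bound to the same URI that the input document declares as its default namespace. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:h="http://www.w3.org/1999/xhtml">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- /html/head/title would select nothing; each step needs the h prefix. -->
    <xsl:value-of select="/h:html/h:head/h:title"/>
  </xsl:template>

</xsl:stylesheet>
```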

Again, you have to be very careful about prefixes in XPath expressions. One missing prefix, and you'll get the wrong result.

Unfortunately, XSLT version 1.0 has no concept similar to a default namespace; therefore, you must repeat namespace prefixes again and again. This problem was rectified in XSLT version 2.0, where you can specify a default namespace that applies to un-prefixed elements in an XPath expression. In XSLT 2.0, you can simplify the previous stylesheet as in Listing 5.

Listing 5. Declaration of an XPath default namespace in XSLT 2.0
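A sketch of the XSLT 2.0 variant, using the standard xpath-default-namespace attribute on the root element. It requires an XSLT 2.0 processor.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- xpath-default-namespace applies to unprefixed element names in XPath expressions and match patterns. -->
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xpath-default-namespace="http://www.w3.org/1999/xhtml">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- No prefixes needed: html, head, and title are resolved in the XHTML namespace. -->
    <xsl:value-of select="/html/head/title"/>
  </xsl:template>

</xsl:stylesheet>
```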

Improper use of node test text()

Most stylesheets contain dozens of simple templates that are responsible for processing leaf elements in input documents. For example, you store a price inside a <price> element, and you want to output it as a new paragraph in HTML with the currency and a label added.

In many stylesheets I have seen, templates that handle this functionality can fail miserably. The reason is the use of the text() node test inside the template body, which in 99 percent of cases leads to broken code. What's wrong with the following template?
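A template of the kind in question might look like the following sketch. The "Price:" label and the USD currency are assumptions made for illustration; the part that matters is select="text()".

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <!-- Fragile: text() selects the text node children of price, not its full content. -->
  <xsl:template match="price">
    <p>Price: <xsl:value-of select="text()"/> USD</p>
  </xsl:template>

</xsl:stylesheet>
```

For a plain input such as <price>124.95</price>, this produces <p>Price: 124.95 USD</p>, which is why the problem often goes unnoticed.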

The XPath expression inside the xsl:value-of instruction is shorthand for the expression child::text(). This expression selects all text nodes that are children of the <price> element. Typically, there's only one such node, and everything works as expected. But imagine that you put a comment or processing instruction in the middle of the <price> element:
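For instance, an input like the following, where the wording of the comment is made up, splits the content of the element into two text nodes:

```xml
<!-- The inner comment separates the text nodes "12" and "4.95". -->
<price>12<!-- to be confirmed -->4.95</price>
```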

The expression now returns two text nodes: 12 and 4.95. But the semantics of xsl:value-of is such that it returns only the first node of the node set. In this case, you'll get the wrong output:
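Assuming a template that wraps the selected value in a paragraph with a hypothetical "Price:" label and USD currency, an XSLT 1.0 processor would emit something like this:

```xml
<!-- Only the first text node ("12") is used; "4.95" is silently dropped. -->
<p>Price: 12 USD</p>
```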

Because xsl:value-of expects a single node, you must use it with an expression that returns a single node. In many situations, a reference to the current node (.) is the right approach. The correct form of the example template above, then, is:
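A sketch of the corrected template, again with an assumed "Price:" label and USD currency:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <!-- Robust: "." is the price element itself, so its whole string value is used. -->
  <xsl:template match="price">
    <p>Price: <xsl:value-of select="."/> USD</p>
  </xsl:template>

</xsl:stylesheet>
```

With this version, a price element whose text nodes 12 and 4.95 are separated by a comment still yields <p>Price: 124.95 USD</p>.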

The current node (.) now returns the whole <price> element. The xsl:value-of instruction automatically returns the string value of a node that is a concatenation of all text node descendants. Such an approach guarantees that you will always get the whole content of an element regardless of included comments, processing instructions, or sub-elements.

In XSLT 2.0, the semantics of the xsl:value-of instruction changed: it returns the string values of all selected nodes (joined with a single space by default), not just of the first one. But it's still better to reference the element whose content should be returned rather than its text nodes. This way, code won't break when new sub-elements are added to provide more granular markup.

Don't lose the context node

Each template (xsl:template) or iteration (xsl:for-each) is instantiated with a current node. All relative XPath expressions are evaluated starting from this current node. If you start an XPath expression with /, the expression won't be evaluated against the current node; instead, the evaluation will start at the document root node. The result of such expressions will always be the same, and it won't be related to the current node.

Imagine that you want to process the simple invoice in Listing 6.

Listing 6. Sample invoice
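A small invoice along these lines could look as follows. Only the invoice and item element names come from the XPath expressions discussed in this section; the child element names description, quantity, and price, and all values, are assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<invoice>
  <item>
    <description>Pilsner beer</description>
    <quantity>6</quantity>
    <price>1.69</price>
  </item>
  <item>
    <description>Sausage</description>
    <quantity>3</quantity>
    <price>0.59</price>
  </item>
</invoice>
```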

If you forget to write expressions relative to the current node, you can easily end up with the wrong stylesheet, as in Listing 7.

Listing 7. Example of a bad stylesheet that loses context
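The mistake can be sketched like this, assuming an invoice whose item elements have hypothetical description, quantity, and price children:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <xsl:template match="/">
    <table>
      <xsl:for-each select="/invoice/item">
        <tr>
          <!-- WRONG: these paths start at the document root and ignore the current item. -->
          <td><xsl:value-of select="/invoice/item/description"/></td>
          <td><xsl:value-of select="/invoice/item/quantity"/></td>
          <td><xsl:value-of select="/invoice/item/price"/></td>
        </tr>
      </xsl:for-each>
    </table>
  </xsl:template>

</xsl:stylesheet>
```

In XSLT 1.0, every row of the resulting table repeats the values of the first item.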

The expression /invoice/item in xsl:for-each correctly selects all items in the invoice. But expressions inside xsl:for-each are wrong, as they start with /, which means that they're absolute. Such expressions always return a description, the quantity, and price of the first item (remember from the previous section that xsl:value-of returns only the first node from a node set), because an absolute expression does not depend on the current node, which corresponds to the currently processed item.

To easily fix this problem, use a relative expression inside xsl:for-each, as in Listing 8.

Listing 8. Use of relative XPath expressions inside the iteration body
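A sketch of the corrected iteration, under the same assumption that each item has description, quantity, and price children:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <xsl:template match="/">
    <table>
      <xsl:for-each select="/invoice/item">
        <tr>
          <!-- Relative paths are evaluated against the item currently being processed. -->
          <td><xsl:value-of select="description"/></td>
          <td><xsl:value-of select="quantity"/></td>
          <td><xsl:value-of select="price"/></td>
        </tr>
      </xsl:for-each>
    </table>
  </xsl:template>

</xsl:stylesheet>
```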

Avoid broken links in non-Microsoft browsers

XSLT is good at automating common tasks. One such boring and laborious task is preparing a table of contents. With XSLT, you can generate such a table automatically. You simply generate anchors, then links pointing back to them. In HTML, you create an anchor simply by putting a unique identifier inside the id attribute:

When you construct a link back to this anchor, add the label after the fragment identifier (#) to indicate that this is a link to a particular place inside the document:
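The anchor and the link that points to it can be sketched together as follows; the heading text is a placeholder, and the two elements would normally sit in different places on the page.

```xml
<!-- Anchor: the id value is the bare label, with no "#". -->
<h2 id="label">Section title</h2>

<!-- Link: the "#" appears only here, in front of the label. -->
<a href="#label">Section title</a>
```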

A real stylesheet typically produces labels and links by using the generate-id() function or a real identifier provided in the input document.
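As a rough sketch of the generate-id() approach, the stylesheet below builds a table of contents for a hypothetical input made of a document element containing section elements, each with a title and some para children. Those element names are assumptions; generate-id() itself is a standard XSLT function that returns the same identifier for the same node throughout one transformation.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <xsl:template match="/document">
    <html>
      <body>
        <!-- Table of contents: one link per section, with "#" in front of the generated id. -->
        <ul>
          <xsl:for-each select="section">
            <li><a href="#{generate-id(.)}"><xsl:value-of select="title"/></a></li>
          </xsl:for-each>
        </ul>
        <xsl:apply-templates select="section"/>
      </body>
    </html>
  </xsl:template>

  <!-- Anchor: the same generated id, this time without "#". -->
  <xsl:template match="section">
    <h2 id="{generate-id(.)}"><xsl:value-of select="title"/></h2>
    <xsl:apply-templates select="para"/>
  </xsl:template>

  <xsl:template match="para">
    <p><xsl:value-of select="."/></p>
  </xsl:template>

</xsl:stylesheet>
```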

The problem with this linking task is actually not in XSLT itself but in some 'too clever' Web browsers. I've seen many stylesheets in which a fragment identifier (#) was added to the anchor by mistake. The output of the stylesheet was then tested only in Windows® Internet Explorer®. Unfortunately, Internet Explorer can recover from many errors in HTML code, so there's no problem with links from the user perspective. But if you try the same page in such browsers as Mozilla Firefox or Opera, the links are broken, because these browsers can't recover from the excessive #.

To avoid other similar problems, the best you can do is test your stylesheet-generated output in multiple browsers.

Simplify stylesheets by changing the context node

If you process business documents, you might be tempted to apply the same xsl:value-of and xsl:for-each approach for mixed content handling. This approach is well suited to documents with regular structure, but mixed content typically varies in its internal structure and is difficult to handle correctly this way. So, whenever you see mixed content, try to forget about simple xsl:value-of and xsl:for-each and move your interest to templates.
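A minimal sketch of the template-based style for mixed content follows. The element names para, emphasis, and link and the url attribute are hypothetical; the point is that each template handles one element and delegates its content, in whatever order it occurs, to xsl:apply-templates.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <!-- Text nodes are copied by the built-in rules; child elements are dispatched to the templates below. -->
  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <xsl:template match="emphasis">
    <em><xsl:apply-templates/></em>
  </xsl:template>

  <xsl:template match="link">
    <a href="{@url}"><xsl:apply-templates/></a>
  </xsl:template>

</xsl:stylesheet>
```

Because each template recurses with xsl:apply-templates, inline elements can nest or repeat in any order without changes to the stylesheet.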

Ineffectiveness in your stylesheets

If you write small transformations operating on rather small datasets (for example, a view layer in a Web application), you're probably not very concerned about the performance of the transformation itself, as it typically accounts for only a small fraction of the total processing time. But when an XSLT stylesheet performs complex operations or works on a large input document, it's time to start thinking about the performance impact of constructs used in the stylesheet.

In general, it's difficult to make any judgments solely from XSLT code, as it depends on the particular XSLT implementation—whether it can handle some code well and possibly speed it up by using some sort of optimization.

Regardless, some things are good to skip in real stylesheets. If you want to save the planet, use the descendant-or-self shorthand (//) very carefully. When you use //, the XSLT processor has to inspect the whole tree (or subtree) in its full depth. In larger documents, this can be a very expensive operation. It is wise to write more specific expressions that explicitly specify where to look for nodes. For example, to get a buyer's address, it's better to write /Invoice/BuyerParty/Party/Address instead of //BuyerParty//Address or even //Address. The first variant is typically much faster, because only a fraction of the nodes have to be inspected during evaluation. Such targeted expressions are also less likely to be affected by the document structure evolution, where new elements with the same name but a different meaning can be added into different contexts in the input document.

Another trick: when you do a lot of lookups, define a lookup key using xsl:key, then use the key() function to perform the lookup.
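A sketch of a keyed lookup is shown below. It assumes a hypothetical input in which an order element contains product elements (each with a code attribute and a name child) somewhere in the document, plus line elements that refer to a product through a product attribute. The key name product-by-code is likewise made up.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Index every product element by the value of its code attribute. -->
  <xsl:key name="product-by-code" match="product" use="@code"/>

  <xsl:template match="/">
    <xsl:for-each select="/order/line">
      <!-- Keyed lookup instead of a scan such as //product[@code = current()/@product]. -->
      <xsl:value-of select="key('product-by-code', @product)/name"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Processors typically build the index once per document, so repeated lookups avoid rescanning the tree, though the actual gain depends on the implementation.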

You can make plenty of other optimizations, but their impact depends on the XSLT processor you use.

XSLT 1.0 or 2.0?

Which XSLT version you use depends on several factors, but generally, I recommend using XSLT 2.0. This version of the language contains many new instructions and functions that can greatly simplify many tasks, and shorter, more straightforward code is always easier to maintain. Moreover, in XSLT 2.0, you can write schema-aware stylesheets, which use a schema to validate both input and output documents. Schema-aware stylesheets can use information contained in a schema to automatically detect some types of errors and mistakes in your stylesheets.

Conclusion

This article covered some areas that tend to be more challenging in XSLT. I hope that you now have a better understanding of some XSLT features and that you will be able to write better XSLT stylesheets.
