XSLT

By Doug Tidwell
August 2001
0-596-00053-7, Order Number: 0537
473 pages, $39.95

Chapter 5
Creating Links and Cross-References

Contents:

Generating Links with the id() Function
Generating Links with the key() Function
Generating Links in Unstructured Documents
Summary

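If you're creating a web site, publishing a book, or creating an XML transaction, chances are many pieces of information will refer to other things. This chapter discusses several ways to link XML elements. It reviews three techniques: generating links with the id() function, generating links with the key() function, and generating links in unstructured documents.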

Generating Links with the id() Function

Our first attempt at linking will be with the XPath id() function.

The ID, IDREF, and IDREFS Datatypes

Three of the basic datatypes supported by XML Document Type Definitions (DTDs) are ID, IDREF, and IDREFS. Here's a simple DTD that illustrates these datatypes:

<!--glossary.dtd-->
<!--The containing tag for the entire glossary-->
<!ELEMENT glossary  (glentry+) >

<!--A glossary entry-->
<!ELEMENT glentry  (term,defn+) >

<!--The word being defined-->
<!ELEMENT term  (#PCDATA) >
<!--The id is used for cross-referencing, and the 
    xreftext is the text used by cross-references.-->
<!ATTLIST term
               id  ID    #REQUIRED 
               xreftext  CDATA    #IMPLIED  >

<!--The definition of the term-->
<!ELEMENT defn  (#PCDATA | xref | seealso)* >

<!--A cross-reference to another term-->
<!ELEMENT xref   EMPTY  >

<!--refid is the ID of the referenced term-->
<!ATTLIST xref
               refid  IDREF    #REQUIRED >

<!--seealso refers to one or more other definitions-->
<!ELEMENT seealso EMPTY>
<!ATTLIST seealso
                  refids   IDREFS  #REQUIRED >

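In this DTD, each <term> element is required to have an id attribute, and each <xref> element must have a refid attribute. The ID and IDREF datatypes work according to two rules: each value of an ID attribute must be unique within the document, and each value of an IDREF attribute must match the value of an ID attribute somewhere in the document.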

To round out our example, the <seealso> element contains an attribute of type IDREFS. This datatype contains one or more values, each of which must match a value of an ID elsewhere in the document. Multiple values, if present, are separated by whitespace.

There are some complications of ID and related datatypes, but we'll discuss them later. For now, we'll focus on how the id() function works.

An XML Document in Need of Links

To illustrate the value of linking, we'll use a small glossary written in XML. The glossary contains some <glentry> elements, each of which contains a single <term> and one or more <defn> elements. In addition, a definition is allowed to contain a cross-reference (<xref>) to another <term>. Here's a short sample document:

<?xml version="1.0" ?>
<!DOCTYPE glossary SYSTEM "glossary.dtd">
<glossary>
  <glentry>
    <term id="applet">applet</term>
    <defn>
      An application program,
      written in the Java programming language, that can be 
      retrieved from a web server and executed by a web browser. 
      A reference to an applet appears in the markup for a web 
      page, in the same way that a reference to a graphics
      file appears; a browser retrieves an applet in the same 
      way that it retrieves a graphics file. 
      For security reasons, an applet's access rights are limited
      in two ways: the applet cannot access the file system of the 
      client upon which it is executing, and the applet's 
      communication across the network is limited to the server 
      from which it was downloaded. 
      Contrast with <xref refid="servlet"/>.
      <seealso refids="wildcard-char DMZlong pattern-matching"/>
    </defn>
  </glentry>

  <glentry>
    <term id="DMZlong" xreftext="demilitarized zone">demilitarized 
      zone (DMZ)</term>
    <defn>
      In network security, a network that is isolated from, and 
      serves as a neutral zone between, a trusted network (for example, 
      a private intranet) and an untrusted network (for example, the
      Internet). One or more secure gateways usually control access 
      to the DMZ from the trusted or the untrusted network.
    </defn>
  </glentry>

  <glentry>
    <term id="DMZ">DMZ</term>
    <defn>
      See <xref refid="DMZlong"/>.
    </defn>
  </glentry>

  <glentry>
    <term id="pattern-matching">pattern-matching character</term>
    <defn>
      A special character such as an asterisk (*) or a question mark 
      (?) that can be used to represent zero or more characters. 
      Any character or set of characters can replace a pattern-matching 
      character.
    </defn>
  </glentry>

  <glentry>
    <term id="servlet">servlet</term>
    <defn>
      An application program, written in the Java programming language, 
      that is executed on a web server. A reference to a servlet 
      appears in the markup for a web page, in the same way that a 
      reference to a graphics file appears. The web server executes
      the servlet and sends the results of the execution (if there are
      any) to the web browser. Contrast with <xref refid="applet" />.
    </defn>
  </glentry>

  <glentry>
    <term id="wildcard-char">wildcard character</term>
    <defn>
      See <xref refid="pattern-matching"/>.
    </defn>
  </glentry>
</glossary>

In this XML listing, each <term> element has an id attribute that identifies it uniquely. Many <xref> elements also refer to other terms in the listing. Notice that each time we refer to another term, we don't use the actual text of the referenced term. When we write our stylesheet, we'll use the XPath id() function to retrieve the text of the referenced term; if the name of a term changes (as buzzwords go in and out of fashion, some marketing genius might want to rename the "pattern-matching character," for example), we can rerun our stylesheet and be confident that all references to the new term contain the correct text.

Finally, some <term> elements have an xreftext attribute because some of the actual terms are longer than we'd like to use in a cross-reference. When we have an <xref> to the term ASCII (American Standard Code for Information Interchange), it would get pretty tedious if the entire text of the term appeared throughout our document. For this term, we'll use the xreftext attribute's value, ensuring that the cross-reference contains the less-intimidating text ASCII.

A Stylesheet That Uses the id() Function

Let's look at our desired output. What we want is an HTML document, such as that shown in Figure 5-1, that displays the various definitions in an easy-to-read format, with the cross-references formatted as hyperlinks.

In the HTML document, we'll need to address several things in our stylesheet:

Figure 5-1. HTML document with generated cross-references

Here's the template that takes care of our first task, generating the HTML <title> and the <h1>:

<xsl:template match="glossary">
  <html>
    <head>
      <title>
        <xsl:text>Glossary Listing: </xsl:text>
        <xsl:value-of select="glentry[1]/term"/>
        <xsl:text> - </xsl:text>
        <xsl:value-of select="glentry[last()]/term"/>
      </title>
    </head>
    <body>
      <h1>
        <xsl:text>Glossary Listing: </xsl:text>
        <xsl:value-of select="glentry[1]/term"/>
        <xsl:text> - </xsl:text>
        <xsl:value-of select="glentry[last()]/term"/>
      </h1>
      <xsl:apply-templates select="glentry"/>
    </body>
  </html>
</xsl:template>

We generate the <title> and <h1> using the XPath expressions glentry[1]/term for the first <term> in the document, and using glentry[last()]/term for the last term.

Our next step is to process all the <glentry> elements. We'll generate an HTML paragraph for each one, and then we'll generate a named anchor point, using the id attribute as the name of the anchor. Here's the template:

<xsl:template match="glentry">
  <p>
    <b>
      <a name="{term/@id}"/>
      <xsl:value-of select="term"/>
      <xsl:text>: </xsl:text>
    </b>
    <xsl:apply-templates select="defn"/>
  </p>
</xsl:template>

In this template, we're using an attribute value template to generate the name attribute of the HTML <a> element. The XPath expression term/@id retrieves the id attribute of the <term> child of the <glentry> element we're currently processing (in our DTD, the id attribute is declared on <term>, not on <glentry>). We use this attribute to generate a named anchor. We then write the term itself in bold and apply the template for the <defn> element. In our output document, each glossary entry contains a paragraph with the highlighted term and its definition.

Our next step is to process the cross-reference. Here's the template for the <xref> element:

<xsl:template match="xref">
  <a href="#{@refid}">
    <xsl:choose>
      <xsl:when test="id(@refid)/@xreftext">
        <xsl:value-of select="id(@refid)/@xreftext"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="id(@refid)"/>
      </xsl:otherwise>
    </xsl:choose>
  </a>
</xsl:template>

We create the <a> element in two steps: first we generate the href attribute, then we retrieve the text of the link.

For the first step, we know that the href attribute must contain a hash mark (#) followed by the name of the anchor point. Because we generated all the named anchors from the id attributes of the various <term> elements, we know the name of the anchor point is the same as the id.

Now all that's left is for us to retrieve the text. This retrieval is the most complicated part of the process (relatively speaking, anyway). Remember that we want to use the xreftext attribute of the <term> element, if there is one, and use the text of the <term> element, otherwise. To implement an if-then-else statement, we use the <xsl:choose> element. In the previous sample, we used a test expression of id(@refid)/@xreftext to see if the xreftext attribute exists. (Remember, an empty node-set is considered false. If the attribute doesn't exist, the node-set will be empty and the <xsl:otherwise> element will be evaluated.) If the test is true, we use id(@refid)/@xreftext to retrieve the cross-reference text. The first part of the XPath expression (id(@refid)) returns the node that has an ID that matches the value @refid; the second part (@xreftext) retrieves the xreftext attribute of that node. We insert the text of the xreftext attribute inside the <a> element.
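
The template above evaluates id(@refid) up to three times. As a minor variation, and only a sketch, the result can be stored once in a variable and reused. The variable name target is our own choice and does not appear in the original stylesheet:

```xml
<xsl:template match="xref">
  <!-- Look up the referenced <term> once and keep the resulting node-set -->
  <xsl:variable name="target" select="id(@refid)"/>
  <a href="#{@refid}">
    <xsl:choose>
      <!-- An empty node-set is false, so this tests whether xreftext exists -->
      <xsl:when test="$target/@xreftext">
        <xsl:value-of select="$target/@xreftext"/>
      </xsl:when>
      <xsl:otherwise>
        <!-- Fall back to the text of the <term> element itself -->
        <xsl:value-of select="$target"/>
      </xsl:otherwise>
    </xsl:choose>
  </a>
</xsl:template>
```

The output is the same as before; the variable only saves repeating the lookup expression.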

Finally, we handle any <seealso> elements. The difference here is that the refids attribute can reference any number of glossary terms, so we'll use the id() function differently. Here's the template for <seealso>:

<xsl:template match="seealso">
  <b>
    <xsl:text>See also: </xsl:text>
  </b>
  <xsl:for-each select="id(@refids)">
    <a href="#{@id}">
      <xsl:choose>
        <xsl:when test="@xreftext">
          <xsl:value-of select="@xreftext"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:value-of select="."/>
        </xsl:otherwise>
      </xsl:choose>
    </a>
    <xsl:if test="not(position()=last())">
      <xsl:text>, </xsl:text>
    </xsl:if>
  </xsl:for-each>
  <xsl:text>. </xsl:text>
</xsl:template>

There are a couple of important differences here. First, we call the id() function in an <xsl:for-each> element. Calling the id() function with an attribute of type IDREFS returns a node-set; each node in the node-set is the match for one of the IDs in the attribute.

The second difference is that referencing the correctly named anchor is more difficult. When we processed the <xref> element, we knew that the correct anchor name was the value of the refid attribute. When processing <seealso>, the refids attribute doesn't do us any good because it may contain any number of IDs. All is not lost, however. What we did previously was use the id attribute of each node returned by the id() function -- a minor inconvenience, but another difference in processing an attribute of type IDREFS instead of IDREF.

The final difference is that we want to add commas after all items except the last. The <xsl:if> element shown previously does just this. If the position() of the current item is the last, we don't output the comma and space (defined here with the <xsl:text> element). We formatted all references here as a sentence; as an exercise, feel free to process the items in a more sophisticated way. For example, you could generate an HTML list from the IDREFS, or maybe format things differently if the refids attribute only contains a single ID.

We've done several useful things with the id() function. We've been able to use attributes of type ID to discover the links between related pieces of information, and we've converted the XML into HTML links, renderable in an ordinary household browser. If this is the only kind of linking and referencing you need to do, that's great. Unfortunately, there are times when we need to do more, and on those occasions, the id() function doesn't quite cut it. We'll mention the limitations of the id() function briefly, then we'll discuss XSLT functions that let us overcome them.

Limitations of IDs

To this point, we've been able to generate cross-references easily. There are some limitations of the ID datatype and the id() function, though:

To get around all of these limitations, XSLT defines the key() function. We'll discuss that function in the next section.

Generating Links with the key() Function

Now that we've covered the id() function in great detail, we'll move on to XSLT's key() function. Each key() function effectively creates an index of the document. You can then use that index to find all elements that have a particular property. Conceptually, key() works like a database index. If you have a database of (U.S. postal) addresses, you might want to index that database by the people's last names, by the states in which they live, by their Zip Codes, etc. Each index takes a certain amount of time to build, but it saves processing time later. If you want to find all the people who live in the state of Idaho, you can use the index to find all those people directly; you don't have to search the entire database.

We'll discuss the details of how the key() function works, then we'll compare it to the id() function.

Defining a key()

You define a key() function with the <xsl:key> element:

<xsl:key name="language-index" match="defn" use="@language"/>

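The key has three attributes: name, which names the key; match, a pattern that determines which nodes are indexed; and use, an XPath expression that is evaluated for each matching node to produce the value under which that node is indexed.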

A Slightly More Complicated XML Document in Need of Links

To illustrate the full power of the key() function, we'll modify our original glossary slightly. Here's an excerpt:

<glentry>
  <term id="DMZlong" xreftext="demilitarized zone">demilitarized 
    zone (DMZ)</term>
  <defn topic="security" language="en">
    In network security, a network that is isolated from, and 
    serves as a neutral zone between, a trusted network (for example, 
    a private intranet) and an untrusted network (for example, the
    Internet). One or more secure gateways usually control access 
    to the DMZ from the trusted or the untrusted network.
  </defn>
  <defn topic="security" language="it">
    [Pretend this is an Italian definition of DMZ.]
  </defn>
  <defn topic="security" language="es">
    [Pretend this is a Spanish definition of DMZ.]
  </defn> 
  <defn topic="security" language="jp">
    [Pretend this is a Japanese definition of DMZ.]
  </defn>
  <defn topic="security" language="de">
    [Pretend this is a German definition of DMZ.]
  </defn> 
</glentry>

<glentry>
  <term id="DMZ" acronym="yes">DMZ</term>
  <defn topic="security" language="en">
    See <xref refid="DMZlong"/>.
  </defn>
</glentry>

In our modified document, we've added two new attributes to <defn>: topic and language. We also added the acronym attribute to the <term> element. We've modified our DTD to add these attributes and enumerate their valid values:

<!--The word being defined-->
<!ELEMENT term  (#PCDATA) >
<!--The id is used for cross-referencing, and the 
    xreftext is the text used by cross-references.-->
<!ATTLIST term
               id        ID       #REQUIRED 
               xreftext  CDATA    #IMPLIED  
               acronym   (yes|no) "no">

<!--The definition of the term-->
<!ELEMENT defn  (#PCDATA | xref | seealso)* >

<!--The topic defines the subject of the definition, the
    language code defines the language of this definition,
    and the acronym is yes or no (default is no).-->
<!ATTLIST defn
                topic    (Java|general|security) "general"
                language (en|de|es|it|jp)        "en">

The topic attribute defines the computing topic to which this definition applies, and the language attribute defines the language in which this definition is written. The acronym attribute defines whether or not this term is an acronym.

Now that we've created a more flexible XML document, we can use the key() function to do several useful things:

Thinking back to our earlier discussion, these are all things we can't do with the id() function. If the language, topic, and acronym attributes were defined to be of type ID, only one definition could be written in English, only one definition could apply to the security topic, and only one term could be an acronym. Clearly, that's an unacceptable limitation on our document.
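
As a sketch of what keys for the new attributes might look like, the declarations below index <defn> elements by topic and <term> elements by acronym. The key names topic-index and acronym-index are hypothetical, chosen here to parallel language-index:

```xml
<!-- Index every <defn> by the value of its topic attribute -->
<xsl:key name="topic-index" match="defn" use="@topic"/>

<!-- Index every <term> by the value of its acronym attribute -->
<xsl:key name="acronym-index" match="term" use="@acronym"/>
```

With these in place, key('topic-index', 'security') would return every <defn> on the security topic, and key('acronym-index', 'yes') every <term> flagged as an acronym. One caveat: the default values declared in the DTD (topic="general", acronym="no") appear in the tree only if the XML parser actually reads the DTD, so elements that omit these attributes may otherwise not be indexed under the default value.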

Stylesheets That Use the key() Function

We've mentioned some useful things we can do with the key() function, so now we'll build some stylesheets that use it. Our first stylesheet will list all definitions written in a particular language. We'll go through the various parts of the stylesheet, explaining all the things we had to add to make everything work. The first thing we'll do, of course, is define the key() function:

<xsl:key name="language-index" match="defn" use="@language"/>

Notice that the match attribute we used was the simple element name defn. This tells the XSLT processor to match all <defn> elements at all levels of the document. Because of the structure of our document, we could have written match="/glossary/glentry/defn", as well. Although this XPath expression is more restrictive, it matches the same elements because all <defn> elements must appear inside <glentry> elements, which in turn appear inside the <glossary> element.

Next, we set up our stylesheet to determine what value of the language attribute we're searching for. We'll do this with a global <xsl:param> element:

<xsl:param name="targetLanguage"/>

Recall from our earlier discussion of the <xsl:param> element that any top-level <xsl:param> is a global parameter to the stylesheet and may be set or initialized from outside the stylesheet. The way to do this varies from one XSLT processor to another. Here's how it's done with Xalan. (The command should be on one line.)

java org.apache.xalan.xslt.Process -in moreterms.xml -xsl crossref2.xsl 
-param targetLanguage it

If you use Michael Kay's Saxon processor, the syntax looks like this:

java com.icl.saxon.StyleSheet moreterms.xml crossref2.xsl targetLanguage=it
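
If no value is passed in, targetLanguage is an empty string and the key lookup finds nothing. One way to guard against that, assuming English is a sensible fallback for this glossary, is to give the parameter a default value:

```xml
<!-- The inner single quotes make 'en' a string literal; without them,
     select="en" would be an XPath expression selecting <en> child elements -->
<xsl:param name="targetLanguage" select="'en'"/>
```

A value supplied from outside the stylesheet still overrides the default.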

Now that we've defined our key() function and defined a parameter to specify which language we're looking for, we need to generate our output. Here's the modified template that generates the HTML <title> and <h1> tags:

<xsl:template match="glossary">
  <html>
    <head>
      <title>
        <xsl:text>Glossary Listing: </xsl:text>
        <xsl:value-of select="key('language-index', 
          $targetLanguage)[1]/preceding-sibling::term"/>
        <xsl:text> - </xsl:text>
        <xsl:value-of select="key('language-index', 
          $targetLanguage)[last()]/preceding-sibling::term"/>
      </title>
    </head>
    <body>
      <h1>
        <xsl:text>Glossary Listing: </xsl:text>
        <xsl:value-of select="key('language-index', 
          $targetLanguage)[1]/ancestor::glentry/term"/>
        <xsl:text> - </xsl:text>
        <xsl:value-of select="key('language-index', 
          $targetLanguage)[last()]/ancestor::glentry/term"/>
      </h1>
      <xsl:for-each select="key('language-index', $targetLanguage)">
        <xsl:apply-templates select="ancestor::glentry"/>
      </xsl:for-each>
    </body>
  </html>
</xsl:template>

There are a couple of significant changes here. When we were using the id() function, it was easy to find the first and last terms in the document. Because we're now trying to list only the definitions that are written in a particular language, that won't work. Reading the XPath expressions in the <xsl:value-of> elements from left to right, we find the first and last <defn> elements returned by the key() function, then use the preceding-sibling axis to reference the <term> element that preceded it. We could also have written our XPath expressions using the ancestor axis:

<h1>
  <xsl:text>Glossary Listing: </xsl:text>
  <xsl:value-of select="key('language-index', 
    $targetLanguage)[1]/ancestor::glentry/term"/>
  <xsl:text> - </xsl:text>
  <xsl:value-of select="key('language-index', 
    $targetLanguage)[last()]/ancestor::glentry/term"/>
</h1>

Now that we've successfully generated the HTML <title> and <h1> elements, we need to process the actual definitions for the chosen language. To do this, we'll use the targetLanguage parameter. Here's how the rest of the template looks:

<xsl:for-each select="key('language-index', $targetLanguage)">
  <xsl:apply-templates select="ancestor::glentry"/>
</xsl:for-each>

In this code, we've selected all the values from the language-index key that match the targetLanguage parameter. For each one, we use the ancestor axis to select the <glentry> element. We've already written the templates that process these elements correctly, so we can just reuse them.
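
Because an XPath location path returns a node-set with no duplicates, the same selection can also be written without the <xsl:for-each>. This is an equivalent sketch, not the form used in the complete stylesheet:

```xml
<!-- Select the <glentry> ancestors of all matching <defn> elements in one step;
     the selected entries are processed in document order -->
<xsl:apply-templates
  select="key('language-index', $targetLanguage)/ancestor::glentry"/>
```

The two forms differ only if a single <glentry> holds more than one <defn> in the target language: the <xsl:for-each> version would process that entry once per definition, while the single path expression processes it once.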

The final change we make is to select only those <defn> elements whose language attributes match the targetLanguage parameter. We do this with a simple XPath expression:

<xsl:apply-templates select="defn[@language=$targetLanguage]"/>

Here's the complete stylesheet:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html" indent="yes"/>
<xsl:strip-space elements="*"/>

  <xsl:key name="language-index" match="defn" use="@language"/>

  <xsl:param name="targetLanguage"/>

  <xsl:template match="/">
    <xsl:apply-templates select="glossary"/>
  </xsl:template>

  <xsl:template match="glossary">
    <html>
      <head>
        <title>
          <xsl:text>Glossary Listing: </xsl:text>
          <xsl:value-of select="key('language-index', 
            $targetLanguage)[1]/preceding-sibling::term"/>
          <xsl:text> - </xsl:text>
          <xsl:value-of select="key('language-index', 
            $targetLanguage)[last()]/preceding-sibling::term"/>
        </title>
      </head>
      <body>
        <h1>
          <xsl:text>Glossary Listing: </xsl:text>
          <xsl:value-of select="key('language-index', 
            $targetLanguage)[1]/ancestor::glentry/term"/>
          <xsl:text> - </xsl:text>
          <xsl:value-of select="key('language-index', 
            $targetLanguage)[last()]/ancestor::glentry/term"/>
        </h1>
        <xsl:for-each select="key('language-index', $targetLanguage)">
          <xsl:apply-templates select="ancestor::glentry"/>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="glentry">
    <p>
      <b>
        <a name="{term/@id}"/>
        <xsl:value-of select="term"/>
        <xsl:text>: </xsl:text>
      </b>
      <xsl:apply-templates select="defn[@language=$targetLanguage]"/>
    </p>
  </xsl:template>

  <xsl:template match="defn">
    <xsl:apply-templates 
     select="*|comment()|processing-instruction()|text()"/>
  </xsl:template>

  <xsl:template match="xref">
    <a href="#{@refid}">
      <xsl:choose>
        <xsl:when test="id(@refid)/@xreftext">
          <xsl:value-of select="id(@refid)/@xreftext"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:value-of select="id(@refid)"/>
        </xsl:otherwise>
      </xsl:choose>
    </a>
  </xsl:template>

  <xsl:template match="seealso">
    <b>
      <xsl:text>See also: </xsl:text>
    </b>
    <xsl:for-each select="id(@refids)">
      <a href="#{@id}">
        <xsl:choose>
          <xsl:when test="@xreftext">
            <xsl:value-of select="@xreftext"/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:value-of select="."/>
          </xsl:otherwise>
        </xsl:choose>
      </a>
      <xsl:if test="not(position()=last())">
        <xsl:text>, </xsl:text>
      </xsl:if>
    </xsl:for-each>
    <xsl:text>.  </xsl:text>
  </xsl:template>

</xsl:stylesheet>

Given our sample document and a targetLanguage of en, we get these results:

<html>
  <head>
    <title>Glossary Listing: applet - wildcard character</title>
  </head>
  <body>
    <h1>Glossary Listing: applet - wildcard character</h1>
    <p>
      <b><a name="applet"></a>applet: </b>
      An application program,
      written in the Java programming language, that can be 
      retrieved from a web server and executed by a web browser. 
      A reference to an applet appears in the markup for a web 
      page, in the same way that a reference to a graphics
      file appears; a browser retrieves an applet in the same 
      way that it retrieves a graphics file. 
      For security reasons, an applet's access rights are limited
      in two ways: the applet cannot access the file system of the 
      client upon which it is executing, and the applet's 
      communication across the network is limited to the server 
      from which it was downloaded. 
      Contrast with <a href="#servlet">servlet</a>.
      ...

Changing the targetLanguage to it, the results are now different:

<html>
  <head>
    <title>Glossary Listing: applet - servlet</title>
  </head>
  <body>
    <h1>Glossary Listing: applet - servlet</h1>
    <p>
      <b><a name="applet"></a>applet: </b>
      [Pretend this is an Italian definition of applet.]
    </p>
    <p>
      <b><a name="DMZlong"></a>demilitarized 
      zone (DMZ): </b>
      [Pretend this is an Italian definition of DMZ.]
    </p>
    <p>
      <b><a name="servlet"></a>servlet: </b>
      [Pretend this is an Italian definition of servlet.]
    </p>
  </body>
</html>

With this stylesheet, we have a way to create a useful subset of our glossary. Notice that we're still using our original technique of ID, IDREF, and IDREFS to process the <xref> and <seealso> elements. If you want, you could redefine the processing to use the key() function instead. Here's how you'd define a key() function to mimic our earlier use of ID and IDREF:
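
The template that follows looks up a key named term-ids, whose declaration is not shown in this text. A declaration along these lines, assuming the key indexes each <term> element by its id attribute, would be consistent with how the key is used:

```xml
<!-- Assumed declaration: index every <term> by the value of its id attribute -->
<xsl:key name="term-ids" match="term" use="@id"/>
```

Like every <xsl:key> element, it must be a top-level child of <xsl:stylesheet>. With the key declared, the <xref> template can call key() where it previously called id():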

<xsl:template match="xref">
  <a href="#{@refid}">
    <xsl:choose>
      <xsl:when test="key('term-ids', @refid)[1]/@xreftext">
        <xsl:value-of select="key('term-ids', @refid)[1]/@xreftext"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="key('term-ids', @refid)[1]"/>
      </xsl:otherwise>
    </xsl:choose>
  </a>
</xsl:template>

As an exercise for the reader, you can modify this stylesheet so that it lists only definitions that apply to a particular topic, or only terms that are acronyms.

The key() function and the IDREFS datatype

For all its flexibility, the key() function doesn't support anything like the IDREFS datatype. We can try to use the key() function the same way we used id():

<xsl:template match="seealso">
  <b>
    <xsl:text>See also: </xsl:text>
  </b>
  <xsl:for-each select="key('term-ids', @refids)">
    <a>
  ...

But the <xsl:for-each> doesn't have anything to work with. That's because the key value we're looking for is "wildcard-char DMZlong pattern-matching". When we were dealing with the id() function, this string was broken into three tokens because anything with a datatype of ID can't contain a space. With the key() function, we can search on anything, including the contents of an element. (See "Generating Links in Unstructured Documents" for an example of this.) For this reason, our call to the key() function asking for all the <term> elements with an id attribute equal to "wildcard-char DMZlong pattern-matching" returns nothing. Any attribute with a datatype of ID can't contain spaces, so we get no results.

There are several ways to deal with this problem; we'll go through our choices next.

Solution #1: Replace the IDREFS datatype

If you consider this a problem and refuse to use the id() function, there are several approaches you can take. The most drastic (but probably the simplest to implement) is to not use the IDREFS datatype at all. You could change the <seealso> element so that it contains a list of references to other elements:

<seealso>
  <item refid="wildcard-char"/>
  <item refid="DMZlong"/>
  <item refid="pattern-matching"/>
</seealso>

This approach has the advantage that we can use the value of all the refid attributes of all <item> elements with the key() function. That means we can search on anything, not just values of attributes. The disadvantage, of course, is that we had to change the structure of our XML document to make this approach work. If you have control of the structure of your XML document, that's possible; it's entirely likely, of course, that you can't change the XML document at all. A variation on this approach would be to use a stylesheet to transform the IDREFS datatype into the previous structure.
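
As a sketch of how the restructured <seealso> element might be processed, the template below passes the refid attributes of the <item> children to key(). It assumes a key named term-ids that indexes <term> elements by id; the declaration is repeated here so the fragment is complete:

```xml
<!-- Assumed key: index every <term> by its id attribute -->
<xsl:key name="term-ids" match="term" use="@id"/>

<xsl:template match="seealso">
  <b>
    <xsl:text>See also: </xsl:text>
  </b>
  <!-- item/@refid is a node-set; key() looks up each attribute value in turn -->
  <xsl:for-each select="key('term-ids', item/@refid)">
    <a href="#{@id}">
      <xsl:choose>
        <xsl:when test="@xreftext">
          <xsl:value-of select="@xreftext"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:value-of select="."/>
        </xsl:otherwise>
      </xsl:choose>
    </a>
    <!-- Comma after every item except the last -->
    <xsl:if test="not(position()=last())">
      <xsl:text>, </xsl:text>
    </xsl:if>
  </xsl:for-each>
  <xsl:text>. </xsl:text>
</xsl:template>
```

When the second argument of key() is a node-set, the result is the union of the lookups for each node's string value, so a single call resolves all of the <item> references.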

Solution #2: Use the XPath contains() function

A second approach is to leave the structure of the XML document unchanged, then use the XPath contains() function to find all <term> elements whose id attributes are contained in the value of the refids attribute of the <seealso> element. Here's how that would work:

<xsl:template match="seealso">
  <b>
    <xsl:text>See also: </xsl:text>
  </b>
  <xsl:variable name="id_list" select="@refids"/>
  <xsl:for-each select="//term">
    <xsl:if test="contains($id_list, @id)">
      <a href="#{@id}">
        <xsl:choose>
          <xsl:when test="@xreftext">
            <xsl:value-of select="@xreftext"/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:value-of select="."/>
          </xsl:otherwise>
        </xsl:choose>
      </a>
      <xsl:if test="not(position()=last())">
        <xsl:text>, </xsl:text>
      </xsl:if>
    </xsl:if>
  </xsl:for-each>
  <xsl:text>.  </xsl:text>
</xsl:template>

We've done a couple of things here: First, we've saved the value of the refids attribute of the <seealso> element in the variable id_list. That's because we can't access it within the <xsl:for-each> element; there, the context node is a <term> element, so @refids would refer to an attribute of that <term>. We can find a given <seealso> element from within a given <term> element, but it's too difficult to find that element generically from every <term> element. The simplest way to find the element is to save the value in a variable.

Second, we look at all of the <term> elements in the document. For each one, if our variable (containing the refids attribute of the <seealso> element) contains the value of the current <term> element's id attribute, then we process that <term> element.

Here are the results our stylesheet generates:

<html>
  <head>
    <title>Glossary Listing: applet - wildcard character</title>
  </head>
  <body>
    <h1>Glossary Listing: applet - wildcard character</h1>
    <p>
      <b><a name="applet"></a>applet: </b>
      An application program,
      written in the Java programming language, that can be
      retrieved from a web server and executed by a web browser.
      A reference to an applet appears in the markup for a web
      page, in the same way that a reference to a graphics
      file appears; a browser retrieves an applet in the same
      way that it retrieves a graphics file.
      For security reasons, an applet's access rights are limited
      in two ways: the applet cannot access the file system of the
      client upon which it is executing, and the applet's
      communication across the network is limited to the server
      from which it was downloaded.
      Contrast with <a href="#servlet">servlet</a>.
      <b>See also: </b><a 
      href="#DMZlong">demilitarized zone</a>, <a href="#DMZ">
      DMZ</a>, <a href="#pattern-matching">pattern-matching 
      character</a>, <a href="#wildcard-char">wildcard 
      character</a>. 
    </p>
      ...

There are a couple of problems here. The most mundane is that in our stylesheet, we don't know how many <term> elements have id attributes contained in our variable. That means it's difficult to insert commas correctly between the matching <term>s. In the output here, we were lucky that the last match was in fact the last term in the document, so the punctuation is correct. For any <seealso> element whose refids attribute doesn't contain the id attribute of the last <term> element in the document, this stylesheet won't work.

The more serious problem is that one of the matches is, in fact, wrong. If you look closely at the output, we get a match for the term DMZ, even though there isn't an exact match for its id in our variable. That's because the XPath contains() function says (correctly) that the value DMZlong contains the ids DMZlong and DMZ.

So our second attempt at solving this problem doesn't require us to change the structure of the XML document, but in this case, we have to change some of our IDs so that the problem we just mentioned doesn't occur. That's probably going to be a maintenance nightmare and a serious drawback to this approach.
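One common way around the substring problem, offered here as a sketch rather than as part of the stylesheet above, is to compare whole tokens: pad both the list of IDs and the candidate id with spaces before calling contains(). The fragment assumes the id_list variable name and the glossary structure from the earlier example, and it leaves out the comma handling to keep the focus on the test:

<xsl:template match="seealso">
  <b>
    <xsl:text>See also: </xsl:text>
  </b>
  <!-- Pad the normalized list so every ID has a space on both sides -->
  <xsl:variable name="id_list" 
    select="concat(' ', normalize-space(@refids), ' ')"/>
  <xsl:for-each select="//term">
    <!-- ' DMZ ' is not a substring of ' DMZlong wildcard-char ' -->
    <xsl:if test="contains($id_list, concat(' ', @id, ' '))">
      <a href="#{@id}">
        <xsl:value-of select="."/>
      </a>
      <xsl:text> </xsl:text>
    </xsl:if>
  </xsl:for-each>
</xsl:template>

With this test, DMZ no longer matches a list that contains only DMZlong. The links still appear in document order rather than in the order of the refids attribute, and the comma problem described above remains.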

Solution #3: Use recursion to process the IDREFS datatype

Here we use a recursive template to tokenize the refids attribute into individual IDs, then process each one individually. This style of programming takes a while to get used to, but it can be fairly simple. Here's the crux of our stylesheet:

<xsl:template match="seealso">
  <b>
    <xsl:text>See also: </xsl:text>
  </b>
  <xsl:call-template name="resolveIDREFS">
    <xsl:with-param name="stringToTokenize" select="@refids"/>
  </xsl:call-template>
</xsl:template>

<xsl:template name="resolveIDREFS">
  <xsl:param name="stringToTokenize"/>
  <xsl:variable name="normalizedString">
    <xsl:value-of 
      select="concat(normalize-space($stringToTokenize), ' ')"/>
  </xsl:variable>
  <xsl:choose>
    <xsl:when test="$normalizedString!=' '">
      <xsl:variable name="firstOfString" 
        select="substring-before($normalizedString, ' ')"/>
      <xsl:variable name="restOfString" 
        select="substring-after($normalizedString, ' ')"/>
      <a href="#{$firstOfString}">
        <xsl:choose>
          <xsl:when 
            test="key('term-ids', $firstOfString)[1]/@xreftext">
            <xsl:value-of 
              select="key('term-ids', $firstOfString)[1]/@xreftext"/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:value-of 
              select="key('term-ids', $firstOfString)[1]"/>
          </xsl:otherwise>
        </xsl:choose>
      </a>
      <xsl:if test="$restOfString!=''">
        <xsl:text>, </xsl:text>
      </xsl:if>
      <xsl:call-template name="resolveIDREFS">
        <xsl:with-param name="stringToTokenize" 
          select="$restOfString"/>
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <xsl:text>.</xsl:text>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

The first thing we did was invoke the named template resolveIDREFS in the template for the <seealso> element. While invoking the template, we pass in the value of the refids attribute and let recursion work its magic.

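The resolveIDREFS template works like this:

  1. Normalize the whitespace in the string and append a single space, so that every ID is followed by exactly one space.

  2. If the normalized string is anything other than that single space, split it at the first space: firstOfString holds the first ID, and restOfString holds everything after it.

  3. Write a link for the first ID, using the key() function to find the matching <term>. The link text is the xreftext attribute of the <term> if it has one, or the text of the <term> otherwise.

  4. If restOfString isn't empty, write a comma and a space. Then call the template again with restOfString.

  5. If the normalized string is just the single space, there are no more IDs to process, so write the closing period.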

One technique in particular is worth mentioning here: the way we handled whitespace in the attribute value. We pass the string we want to tokenize as a parameter to the template, but we need to normalize the whitespace. We use two XPath functions to do this: normalize-space() and concat(). The call looks like this:

<xsl:template name="resolveIDREFS">
  <xsl:param name="stringToTokenize"/>
  <xsl:variable name="normalizedString">
    <xsl:value-of 
      select="concat(normalize-space($stringToTokenize), ' ')"/>
  </xsl:variable>

The normalize-space() function removes all leading and trailing whitespace from a string and replaces internal whitespace characters with a single space. Remember that whitespace inside an attribute isn't significant; our <seealso> element could be written like this:

  <seealso refids="  wildcard-char 





DMZlong 
pattern-matching       "/>

When we pass this attribute to normalize-space(), the returned value is wildcard-char DMZlong pattern-matching. All whitespace at the start and end of the value has been removed, and each run of whitespace between IDs has been replaced with a single space.

Because we're using the substring-before() and substring-after() functions to find the first token and the rest of the string, it's important that there be at least one space in the string. (It's possible, of course, that an IDREFS attribute contains only one ID.) We use the concat() function to add a space to the end of the string. When the string contains only that space, we know we're done.

Although this approach is more tedious, it does everything we need it to do. We don't have to change our XML document, and we correctly resolve all the IDs in the IDREFS datatype.

Solution #4: Use an extension function

The final approach is to write an extension function that tokenizes the refids attribute and returns a node-set containing all id values we need to search for. Xalan ships with an extension that does just that. We invoke the extension function on the value of the refids attribute, then use an <xsl:for-each> element to process all items in the node-set. We'll cover extension functions in a later chapter, but for now, here's what the stylesheet looks like:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:java="http://xml.apache.org/xslt/java"
  exclude-result-prefixes="java">

<xsl:output method="html" indent="yes"/>
<xsl:strip-space elements="*"/>

  <xsl:key name="term-ids" match="term" use="@id"/>

  <xsl:template match="/">
    <xsl:apply-templates select="glossary"/>
  </xsl:template>

  <xsl:template match="glossary">
    <html>
      <head>
        <title>
          <xsl:text>Glossary Listing: </xsl:text>
          <xsl:value-of select="glentry[1]/term"/>
          <xsl:text> - </xsl:text>
          <xsl:value-of select="glentry[last()]/term"/>
        </title>
      </head>
      <body>
        <h1>
          <xsl:text>Glossary Listing: </xsl:text>
          <xsl:value-of select="glentry[1]/term"/>
          <xsl:text> - </xsl:text>
          <xsl:value-of select="glentry[last()]/term"/>
        </h1>
        <xsl:apply-templates select="glentry"/>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="glentry">
    <p>
      <b>
        <a name="{term/@id}"/>
        <xsl:value-of select="term"/>
        <xsl:text>: </xsl:text>
      </b>
      <xsl:apply-templates select="defn"/>
    </p>
  </xsl:template>

  <xsl:template match="defn">
    <xsl:apply-templates 
     select="*|comment()|processing-instruction()|text()"/>
  </xsl:template>

  <xsl:template match="xref">
    <a href="#{@refid}">
      <xsl:choose>
        <xsl:when test="key('term-ids', @refid)[1]/@xreftext">
          <xsl:value-of select="key('term-ids', @refid)[1]/@xreftext"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:value-of select="key('term-ids', @refid)[1]"/>
        </xsl:otherwise>
      </xsl:choose>
    </a>
  </xsl:template>

  <xsl:template match="seealso">
    <b>
      <xsl:text>See also: </xsl:text>
    </b>
    <xsl:for-each 
      select="java:org.apache.xalan.lib.Extensions.tokenize(@refids)">
      <a href="#{key('term-ids', .)/@id}">
        <xsl:choose>
          <xsl:when test="key('term-ids', .)/@xreftext">
            <xsl:value-of select="key('term-ids', .)/@xreftext"/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:value-of select="key('term-ids', .)"/>
          </xsl:otherwise>
        </xsl:choose>
      </a>
      <xsl:if test="not(position()=last())">
        <xsl:text>, </xsl:text>
      </xsl:if>
    </xsl:for-each>
    <xsl:text>.</xsl:text>
  </xsl:template>

</xsl:stylesheet>

In this case, the tokenize function (defined in the Java class org.apache.xalan.lib.Extensions) takes a string as input, then converts the string into a node-set in which each token in the original string becomes a node.

Be aware that using extension functions limits the portability of your stylesheets. The extension function here does what we want, but we couldn't use this extension function with Saxon, XT, or the XSLT tools from Oracle or Microsoft. They may or may not supply similar functions, and if they do, you'll have to modify your stylesheet slightly to use them. If it's important to you that you be able to switch XSLT processors at some point in the future, using extensions will limit your ability to do that.
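If portability matters, one option is to test for the extension at runtime and fall back to the recursive template when it isn't there. The following is a sketch under several assumptions: the java prefix is declared as in the stylesheet above, the resolveIDREFS template from Solution #3 is included in the same stylesheet, and the processor's function-available() function recognizes this Java-namespace function name (that last point varies from processor to processor):

<xsl:template match="seealso">
  <b>
    <xsl:text>See also: </xsl:text>
  </b>
  <xsl:choose>
    <!-- Use the Xalan tokenizer only if the processor provides it -->
    <xsl:when test="function-available(
        'java:org.apache.xalan.lib.Extensions.tokenize')">
      <xsl:for-each 
        select="java:org.apache.xalan.lib.Extensions.tokenize(@refids)">
        <a href="#{.}">
          <xsl:value-of select="."/>
        </a>
        <xsl:if test="not(position()=last())">
          <xsl:text>, </xsl:text>
        </xsl:if>
      </xsl:for-each>
      <xsl:text>.</xsl:text>
    </xsl:when>
    <!-- Otherwise, tokenize with the portable recursive template -->
    <xsl:otherwise>
      <xsl:call-template name="resolveIDREFS">
        <xsl:with-param name="stringToTokenize" select="@refids"/>
      </xsl:call-template>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

To keep the sketch short, the extension branch writes the ID itself as the link text instead of looking up the xreftext attribute. The stylesheet still names a processor-specific function, but a processor without that function can take the <xsl:otherwise> branch instead of failing.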

Hopefully at this point you're convinced of at least one of the following two things:

Advantages of the key() Function

Now that we've taken the key() function through its paces, you can see that it has several advantages:

Generating Links in Unstructured Documents

Before we leave the topic of linking, we'll discuss one more useful technique. So far, all of this chapter's examples have been structured nicely. When there was a relationship between two pieces of information, we had an id and refid pair to match them. What happens if the XML document you're transforming isn't written that way? Fortunately, we can use the key() function and a new function, generate-id(), to create structure where there isn't any.

An Unstructured XML Document in Need of Links

For our example here, we'll take out all of the id and refid attributes that have served us well so far. This may be a contrived example, but it demonstrates how we can use the key() and generate-id() functions to generate links between parts of our document.

In our new sample document, we've stripped out the references that neatly tied things together before:

<?xml version="1.0" ?>
<!DOCTYPE glossary SYSTEM "unstructuredglossary.dtd">
<glossary>
  <glentry>
    <term>applet</term>
    <defn>
      An application program,
      written in the Java programming language, that can be 
      retrieved from a web server and executed by a web browser. 
      A reference to an applet appears in the markup for a web 
      page, in the same way that a reference to a graphics
      file appears; a browser retrieves an applet in the same 
      way that it retrieves a graphics file. 
      For security reasons, an applet's access rights are limited
      in two ways: the applet cannot access the file system of the 
      client upon which it is executing, and the applet's 
      communication across the network is limited to the server 
      from which it was downloaded. 
      Contrast with <refterm>servlet</refterm>.
    </defn>
  </glentry>

  <glentry>
    <term>demilitarized zone</term>
    <defn>
      In network security, a network that is isolated from, and 
      serves as a neutral zone between, a trusted network (for example, 
      a private intranet) and an untrusted network (for example, the
      Internet). One or more secure gateways usually control access 
      to the DMZ from the trusted or the untrusted network.
    </defn>
  </glentry>

  <glentry>
    <term>DMZ</term>
    <defn>
      See <refterm>demilitarized zone</refterm>.
    </defn>
  </glentry>

  <glentry>
    <term>pattern-matching character</term>
    <defn>
      A special character such as an asterisk (*) or a question mark 
      (?) that can be used to represent zero or more characters. 
      Any character or set of characters can replace a pattern-matching 
      character.
    </defn>
  </glentry>

  <glentry>
    <term>servlet</term>
    <defn>
      An application program, written in the Java programming language, 
      that is executed on a web server. A reference to a servlet 
      appears in the markup for a web page, in the same way that a 
      reference to a graphics file appears. The web server executes
      the servlet and sends the results of the execution (if there are
      any) to the web browser. Contrast with <refterm>applet</refterm>.
    </defn>
  </glentry>

  <glentry>
    <term>wildcard character</term>
    <defn>
      See <refterm>pattern-matching character</refterm>.
    </defn>
  </glentry>
</glossary>

To generate cross-references between the <refterm> elements and the associated <term> elements, we'll need to do three things:

  1. Define a key for all terms. We'll use this key to find terms that match the text of the <refterm> element.

  2. Generate a new ID for each <term> we find.

  3. For each <refterm>, use the key() function to find the <term> element that matches the text of <refterm>. Once we've found the matching <term>, we call generate-id() to find the newly created ID.

We'll go through the relevant parts of the stylesheet. First, we define the key:

<xsl:key name="terms" match="term" use="."/>

Notice that we use the value of the <term> element itself as the lookup value for the key. Given a string, we can find all <term> elements with that same text.

Second, we need to generate a named anchor point for each <term> element:

<xsl:template match="glentry">
  <p>
    <b>
      <a name="{generate-id(term)}">
        <xsl:value-of select="term"/>
        <xsl:text>: </xsl:text>
      </a>
    </b>
    <xsl:apply-templates select="defn"/>
  </p>
</xsl:template>

Third, we find the appropriate reference for a given <refterm>. Given the text of a <refterm>, we can use the key() function to find the <term> that matches. Passing the <term> to the generate-id() function returns the same ID generated when we created the named anchor for that <term>:

<xsl:template match="refterm">
  <a href="#{generate-id(key('terms', .))}">
    <xsl:value-of select="."/>
  </a>
</xsl:template>

Our generated HTML output creates cross-references similar to those in our earlier stylesheets:

    <h1>Glossary Listing: applet - wildcard character</h1>
    <p>
        <b><a name="N11">applet: </a></b>
  An application program,
  written in the Java programming language, that can be 
  retrieved from a web server and executed by a web browser. 
  A reference to an applet appears in the markup for a web 
  page, in the same way that a reference to a graphics
  file appears; a browser retrieves an applet in the same 
  way that it retrieves a graphics file. 
  For security reasons, an applet's access rights are limited
  in two ways: the applet cannot access the file system of the 
  client upon which it is executing, and the applet's 
  communication across the network is limited to the server 
  from which it was downloaded. 
  Contrast with <a href="#N53">servlet</a>.
</p>
...
    <p>
        <b><a name="N53">servlet: </a></b>
  An application program, written in the Java programming language, 
  that is executed on a web server. A reference to a servlet 
  appears in the markup for a web page, in the same way that a 
  reference to a graphics file appears. The web server executes
  the servlet and sends the results of the execution (if there are
  any) to the web browser. Contrast with <a href="#N11">applet</a>.
</p>

Using the key() and generate-id() functions, we've been able to create IDs and references automatically. This approach isn't perfect; we have to make sure the text of the <refterm> element matches the text of the <term> exactly.
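Because the match depends on the text being identical, it can help to make the lookup a little more forgiving and to avoid writing links that point nowhere. The following is one possible sketch, not part of the stylesheet above: it normalizes whitespace on both sides of the lookup and writes a link only when the key finds a matching <term>:

<!-- Index each term by its whitespace-normalized text -->
<xsl:key name="terms" match="term" use="normalize-space(.)"/>

<xsl:template match="refterm">
  <xsl:variable name="target" 
    select="key('terms', normalize-space(.))"/>
  <xsl:choose>
    <!-- A matching term exists: link to its generated ID -->
    <xsl:when test="$target">
      <a href="#{generate-id($target)}">
        <xsl:value-of select="."/>
      </a>
    </xsl:when>
    <!-- No match: write the text without a link -->
    <xsl:otherwise>
      <xsl:value-of select="."/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

This doesn't fix a misspelled <refterm>, but a misspelling now shows up as plain text instead of as a broken link.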

This example, like all of the examples we've shown so far, uses a single input file. A more likely scenario is that we have one XML document that contains terms, and we want to reference definitions in a second XML document that contains definitions, but no IDs. We can combine the technique we've described here with the document() function to import a second XML document and generate links between the two. We'll talk about the document() function in a later chapter; for now, just remember that there are ways to use more than one XML input document in your transformations.

The generate-id() Function

Before we leave the topic of linking, we'll go over the details of the generate-id() function. This function takes a node-set as its argument, and works as follows:

Summary

In this chapter, we've examined several ways to generate links and cross-references between different parts of a document. If your XML document has a reasonable amount of structure, you can use the id() and key() functions to define many different relationships between the parts of a document. Even if your XML document isn't structured, you may be able to use key() and generate-id() to create simple references. In the next chapter, we'll look at sorting and grouping, two more ways to organize the information in our XML documents.


© 2001, O'Reilly & Associates, Inc.