An Introduction to Perl's XML::XSLT module

ArticleCategory: [Choose a category for your article]

Applications

AuthorImage:[Here we need a little image from you]

[Photo of the Author]

TranslationInfo:[Author and translation history]

original in en Egon Willighagen 

AboutTheAuthor:[A small biography about the author]

Joined the Dutch LF team in 1999 and became second editor earlier this year. Is an informational chemistry student at the University of Nijmegen. Plays basketball and enjoys hiking.

Abstract:[Here you write a little summary]

In this article the Perl module XML::XSLT is introduced. It shows some of the capabilities of the W3C's XSLT standard and how it can be used to help you manage and publish XML documents to the web.

ArticleIllustration:[This is the title picture for your article]

[Illustration]

ArticleBody:[The article body]

Introduction

XSL Transformations (XSLT) is a W3C Recommendation, and can thus be considered a standard. XSLT is part of XSL, the Extensible Stylesheet Language. Its purpose is, as the name says, to style or lay out an XML document. Formatting objects play a major role in XSL in laying out the information, but in that process the data often needs to be transformed as well. That is where XSLT comes in.

In contrast to XSL itself, XSLT is already a Recommendation and stable. XSLT processors are being developed in several programming languages. The most mature ones include XT, written in Java by James Clark, and Xalan, formerly developed by Lotus and now maintained by the Apache Software Foundation. In Perl, two projects are ongoing: XML::XSLT and XML::Sablotron. The former is the older one and is written entirely in Perl, while the latter is an interface to Sablotron, an XSLT processor written in C++.

XML::XSLT module

The current version is 0.21 and can be downloaded from CPAN. Recently the project has also moved to SourceForge, where a CVS tree is available. This article, though, is based on version 0.21. The Perl module is developed by Geert Josten, a chemistry student at the University of Nijmegen, but nowadays many other people contribute to its development. With the CVS tree up, development of XML::XSLT is expected to speed up, which is necessary to implement the full W3C XSLT Recommendation.

The Perl code below shows how the module is used:

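#!/usr/bin/perl
use strict;
use warnings;
use XML::XSLT;

my $xmlfile = "example.xml";
my $xslfile = "example.xsl";

# Read the XSLT stylesheet from a file
my $parser = XML::XSLT->new ($xslfile, "FILE");

# Transform the XML file and print the result
$parser->transform_document ($xmlfile, "FILE");
$parser->print_result();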

This example transforms an XML file (example.xml) with an XSLT stylesheet (example.xsl). The stylesheet does not have to be read from a file, though; it can also be passed as a DOM tree:

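#!/usr/bin/perl
use strict;
use warnings;
use XML::XSLT;
use XML::DOM;

# Parse the stylesheet into a DOM tree with XML::DOM
my $domparser = XML::DOM::Parser->new;
my $doc = $domparser->parsefile ("example.xsl");

# Hand the DOM tree to XML::XSLT as the stylesheet
my $parser = XML::XSLT->new ($doc, "DOM");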

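or as a string:

#!/usr/bin/perl
use strict;
use warnings;
use XML::XSLT;

# The XML declaration must be the very first thing in the string,
# so the text starts directly after qq{
my $xsl_string = qq{<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
};

my $parser = XML::XSLT->new ($xsl_string, "STRING");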

The same input types ("FILE", "DOM" and "STRING") are also accepted by the transform_document() method used in the first example.
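A minimal sketch of transform_document() with the other two input types, following the calling convention described in this article (a source followed by "FILE", "DOM" or "STRING"). The file names and the inline document are hypothetical, and a separate XML::XSLT object is created for each input because the article does not say whether one object can transform several documents.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::XSLT;
use XML::DOM;

# The document to transform, as a DOM tree built with XML::DOM
my $dom = XML::DOM::Parser->new->parsefile("example.xml");

my $from_dom = XML::XSLT->new("example.xsl", "FILE");
$from_dom->transform_document($dom, "DOM");
$from_dom->print_result();

# A document given as a string (hypothetical content)
my $xml_string = qq{<?xml version="1.0"?>
<doc><para>Hello, XSLT</para></doc>
};

my $from_string = XML::XSLT->new("example.xsl", "FILE");
$from_string->transform_document($xml_string, "STRING");
$from_string->print_result();
```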

Here is a script that converts an XML file based on an XSLT stylesheet. It takes two command-line arguments: the filenames of the XSLT sheet and of the XML file. Note that this script uses the "FILE" mechanism.
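A minimal sketch of such a script, using only the three XML::XSLT calls shown earlier in this article (new(), transform_document() and print_result(), all with the "FILE" type). The argument order, stylesheet first and XML file second, follows the description above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::XSLT;

# Expect exactly two arguments: the XSLT stylesheet and the XML document
die "usage: $0 stylesheet.xsl document.xml\n" unless @ARGV == 2;
my ($xslfile, $xmlfile) = @ARGV;

# Both arguments are filenames, hence the "FILE" type
my $parser = XML::XSLT->new($xslfile, "FILE");
$parser->transform_document($xmlfile, "FILE");
$parser->print_result();
```

Run it as, for example, `perl script.pl verysimple.xsl worksheet.xml`; the script name and the worksheet filename are only placeholders.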

Now that we know how an XSLT processor can be used in Perl to convert XML documents, we can have a look at the XSLT standard.

The XSL Transformation Standard

XSL Transformations were designed to facilitate the publishing of data stored in XML. While XSL formatting is used for layout and design, XSLT is used for basic transformations of your XML data, such as sorting, selecting information and combining information from several sources. In real life, however, it turned out that XSLT alone is often suitable for layout and design as well.

XML::XSLT does not cover all XSLT commands yet, but all commands used in this article are supported.

XSLT documents define how an XML document should be transformed. They do so by defining templates for the elements in the document. Below are several examples of XSLT documents that all apply to one XML document containing a Gnumeric worksheet (Gnumeric is the GNOME spreadsheet application). (So now you know that Gnumeric stores its files as XML documents. Normally they are gzipped: try gunzipping a *.gnumeric file.)

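If you inspect the worksheet, you'll see that besides the actual data, layout information is stored as well, for example the page layout and the cell widths and heights. We will make several XSLT sheets that each do a specific job: summarizing the document, listing the sheets in the workbook, and listing the contents of a single column.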

We will introduce the basics of XSLT by writing an XSLT sheet that makes a very simple summary (verysimple.xsl):

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="*">
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="text()"/>

  <xsl:template match="Item">
    <xsl:value-of select="./name"/> : <xsl:value-of select="./val-string"/>
  </xsl:template>

</xsl:stylesheet>

The first template matches all elements in the XML document and simply applies templates to their children. The second template matches all character data (text nodes) in the XML document and outputs nothing, so text is not copied to the output automatically. The last template actually does what we want: for each Item in the Summary of the Gnumeric document, it outputs the text of the name and val-string child elements. Check this! Compare the output and see if it is what you expect based on the XML document.

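But the first template already matches Item, doesn't it? Then why does the processor apply the third template and not the first? Because the more specific template overrides the general one. The XSLT Recommendation settles such conflicts with template priorities: a pattern that names an element, like Item, has a higher default priority (0) than the wildcard * (-0.5), so the more specific template wins.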
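To make the priority rule concrete, here is a small Perl sketch of how an XSLT 1.0 processor could compute the default priority of a simple pattern and choose between matching templates. It is a toy under stated assumptions, not XML::XSLT's code: it handles only single patterns (no "|" unions), ignores import precedence and explicit priority attributes, and on a tie lets the last template win, which is the recovery the Recommendation allows.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Default priority of a simple XSLT 1.0 pattern
sub default_priority {
    my ($pattern) = @_;
    # An optional child:: or attribute:: axis (or '@') does not change the priority
    (my $p = $pattern) =~ s/^(?:child::|attribute::|@)//;

    return 0     if $p =~ /^[A-Za-z_][\w.-]*(?::[A-Za-z_][\w.-]*)?$/;    # a name: Item, gmr:Item
    return 0     if $p =~ /^processing-instruction\(\s*(["']).*\1\s*\)$/;
    return -0.25 if $p =~ /^[A-Za-z_][\w.-]*:\*$/;                       # prefix:*
    return -0.5  if $p =~ /^(?:\*|node\(\)|text\(\)|comment\(\)|processing-instruction\(\))$/;
    return 0.5;                                                          # anything else: /, a/b, a[...]
}

# Among the patterns that match a node (given in stylesheet order),
# pick the highest priority; '>=' lets the later template win a tie.
sub winning_template {
    my ($best, $best_priority);
    for my $pattern (@_) {
        my $priority = default_priority($pattern);
        if (!defined $best_priority || $priority >= $best_priority) {
            ($best, $best_priority) = ($pattern, $priority);
        }
    }
    return $best;
}

# Both "*" and "Item" match an Item element; their order does not matter
print winning_template('*', 'Item'), "\n";
print winning_template('Item', '*'), "\n";
```

Both calls print Item, which shows that the choice depends on the patterns, not on where the templates appear in the stylesheet.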

Note that XML::XSLT prints a lot of whitespace that originates from the XSLT sheet. I do not believe there is a way to avoid this in this version. But since our output will be XHTML, where extra whitespace is mostly harmless, we do not care for now. The next example has the same functionality but adds XHTML markup, so that we can view the result with a web browser (simple.xsl):

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="*">
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="text()"/>

  <xsl:template match="Item">
    <b><xsl:value-of select="./name"/></b>: <i><xsl:value-of select="./val-string"/></i><br />
  </xsl:template>

  <xsl:template match="/">
    <html>
      <head>
        <title>Summary Gnumeric File</title>
      </head>
      <body bgcolor="white">
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>

There is now an additional template for the root node (/). It makes it possible to put the XHTML code around all other output; the output of the first example is now placed inside the body. Why? When XML::XSLT starts processing, it searches for a template that matches the root. It then prints the XHTML code up to the opening body element. Next it applies templates to all child nodes. When that is all done, it continues with the root template again and prints the closing body and html tags.
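The flow described above can be mimicked in a few lines of plain Perl. This is a toy model, not how XML::XSLT is implemented: the document is a hypothetical, hand-built tree (elements as hashes, text as plain strings), templates are looked up by element name with "*" as the fallback, and only the templates of simple.xsl are modeled.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, simplified document tree; the root node is named "/"
my $doc = {
    name     => '/',
    children => [
        { name => 'Summary', children => [
            { name => 'Item', children => [
                { name => 'name',       children => ['application'] },
                { name => 'val-string', children => ['gnumeric'] },
            ] },
        ] },
    ],
};

# The templates of simple.xsl, keyed by the name they match; '*' is the fallback
my %templates = (
    '/'    => sub {
        my ($node) = @_;
        return '<html><body>' . apply_templates($node) . '</body></html>';
    },
    '*'    => sub { apply_templates($_[0]) },
    'Item' => sub {
        my ($node) = @_;
        return '<b>' . child_text($node, 'name') . '</b>: <i>'
             . child_text($node, 'val-string') . '</i><br />';
    },
);

# <xsl:apply-templates/>: process every child of the node in document order
sub apply_templates {
    my ($node) = @_;
    return join '', map { process($_) } @{ $node->{children} };
}

# Find the template for one node and run it
sub process {
    my ($node) = @_;
    return '' unless ref $node;    # text nodes: <xsl:template match="text()"/>
    my $template = $templates{ $node->{name} } // $templates{'*'};
    return $template->($node);
}

# <xsl:value-of select="name"/>: text of the first child element with that name
sub child_text {
    my ($node, $name) = @_;
    my ($child) = grep { ref $_ && $_->{name} eq $name } @{ $node->{children} };
    return $child ? join('', grep { !ref $_ } @{ $child->{children} }) : '';
}

my $html = process($doc);
print "$html\n";
```

The root template emits the opening tags, apply_templates() recurses through Summary (handled by the "*" template) down to Item, and control then returns to the root template for the closing tags, in the same order as described in the text.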

There is also some additional XHTML code in the Item template. Note that you can mix XSLT commands with output data: an XSLT processor copies every element that is not in the xsl namespace to the output.

From now on the examples only give new or changed templates; the complete stylesheets are all linked. To finish our first example, we will add a header and see a second use of the apply-templates command (finalsimple.xsl):

  <xsl:template match="Summary">
    <h2>Summary</h2>
    <ul>
      <xsl:apply-templates/>
    </ul>
  </xsl:template>

The for-each command

The xsl:for-each command gives additional control over the processing of the XML document, especially in combination with xsl:sort, but that command is not yet implemented in XML::XSLT.

To add information about the sheets in the Gnumeric Workbook we will make use of xsl:for-each (foreach.xsl):

  <xsl:template match="Sheets">
    <xsl:for-each select="Sheet">
      <h2><xsl:value-of select="Name"/></h2>
      <ul>
        Rows: <xsl:value-of select="MaxRow"/><br />
        Cols: <xsl:value-of select="MaxCol"/><br />
      </ul>
    </xsl:for-each>
  </xsl:template>

Too bad the XML document used has only one worksheet. You might want to try it on a Gnumeric file with more worksheets.

As mentioned before, we cannot sort elements with XML::XSLT at this moment. That is a pity, because the XML data in the Gnumeric file is not sorted. If we could sort it, we would be able to generate an XHTML table that gives the exact content of the worksheet. But that is not possible now. What we can do, however, is list all information in a certain row or column. This is explained in the next example.
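For comparison, here is the sorting step written in plain Perl rather than XSLT: a minimal sketch of what two sort keys (row first, then column) would achieve, applied to hypothetical cell data that is assumed to have been extracted from a Gnumeric file already. Empty cells are not filled in, so a real table would need extra work for sparse sheets.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical cells in document order, which is not sorted
my @cells = (
    { Row => 1, Col => 3, Content => '1200'   },
    { Row => 0, Col => 1, Content => 'Name'   },
    { Row => 1, Col => 1, Content => 'Jan'    },
    { Row => 0, Col => 3, Content => 'Salary' },
);

# Sort by row, then by column, as two sort keys would do
my @sorted = sort { $a->{Row} <=> $b->{Row} || $a->{Col} <=> $b->{Col} } @cells;

# Group the sorted cells into rows, remembering the order of the rows
my (%rows, @order);
for my $cell (@sorted) {
    push @order, $cell->{Row} unless $rows{ $cell->{Row} };
    push @{ $rows{ $cell->{Row} } }, $cell->{Content};
}

# Build a simple XHTML table, one <tr> per row
my $table = "<table>\n"
  . join('', map { '<tr>' . join('', map { "<td>$_</td>" } @{ $rows{$_} }) . "</tr>\n" } @order)
  . "</table>\n";
print $table;
```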

The if command

To list all data in the third column (which holds the salary of a rich student in the Netherlands), we can make use of the xsl:if command (if.xsl):

  <xsl:template match="Sheets">
    <xsl:for-each select="Sheet">
      <h2><xsl:value-of select="Name"/></h2>
      <ul>
        Rows: <xsl:value-of select="MaxRow"/><br />
        Cols: <xsl:value-of select="MaxCol"/><br />
        <xsl:apply-templates select="Cells"/><br />
      </ul>
    </xsl:for-each>
  </xsl:template>

  <xsl:template match="Cells">
    Content of Col 3:
    <xsl:for-each select="Cell">
      <xsl:if test="@Col='3'">
        <xsl:value-of select="Content"/><xsl:text>, </xsl:text>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>

Since the Sheets template does not apply templates to its children, the Cells element would never be processed, so we have to do that explicitly. By using the select attribute with the xsl:apply-templates command, we make the processor apply templates to the Cells child element only.

The Cells template loops over all child Cell elements (again, make sure to check this against the source XML file!), but only prints the value if the attribute Col has the value "3". Notice that the @ sign selects attributes, whereas a name without it selects child elements.

Now that the templates get more complex, it is worth keeping track of the current element, which XSLT calls the current node. Processing starts with the root node as the focus, and each time a template is applied, the focus moves to the node it matched. For example, when the Cells template is applied, the processor is focused on an instance of this element, thus a Cells element. Every selection is relative to this focus: select="." in the Cells template refers to the Cells element. The select="Cell" in the xsl:for-each command selects all Cell child elements, but inside the loop the focus is on one of these Cell elements at a time. So test="@Col" refers to an attribute of Cell and not of Cells. From within the loop one could refer to an attribute of the Cells element with select="../@name", were it not that Cells has no attributes.
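The focus rules can be illustrated with a Perl sketch that walks a hypothetical, hand-built Cells element the same way the Cells template of if.xsl does. The data structure (hashes with name, attrs and children keys) and the cell values are assumptions made for this example; the loop variable plays the role of the current node.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A hypothetical Cells element: each Cell has Col and Row attributes
# and a Content child element
my $cells = { name => 'Cells', children => [] };
for my $data ([1, 0, 'Name'], [3, 0, 'Salary'], [1, 1, 'Jan'], [3, 1, '1200']) {
    my ($col, $row, $text) = @$data;
    push @{ $cells->{children} }, {
        name     => 'Cell',
        attrs    => { Col => $col, Row => $row },
        children => [ { name => 'Content', text => $text } ],
    };
}

# Inside the Cells template the current node is $cells.
# <xsl:for-each select="Cell"> moves the focus to each Cell child in turn.
my @values;
for my $current (grep { $_->{name} eq 'Cell' } @{ $cells->{children} }) {
    # test="@Col='3'": '@' reads an attribute of the current node (a Cell)
    next unless $current->{attrs}{Col} eq '3';

    # select="Content": a plain name selects child elements of the current node
    my ($content) = grep { $_->{name} eq 'Content' } @{ $current->{children} };
    push @values, $content->{text};
}

# <xsl:text>, </xsl:text> after each value, as in if.xsl
my $output = 'Content of Col 3: ' . join('', map { "$_, " } @values);
print "$output\n";
```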

The xsl:text command outputs its content exactly as written. Normally, whitespace in a stylesheet is not treated as important output: whitespace-only text is stripped, and XML::XSLT may also drop the space in the ", " sequence. By using xsl:text one can make sure that the text is output exactly as intended.
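The XSLT 1.0 Recommendation strips whitespace-only text nodes from the stylesheet, except inside xsl:text, and copies other literal text as it is, line breaks and indentation included. The Perl sketch below applies that rule to the two ways of writing the separator in if.xsl; it models the Recommendation only, and XML::XSLT 0.21 may handle whitespace differently.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Literal output of a list of stylesheet text nodes, each given as
# [ text, inside_xsl_text ]: whitespace-only nodes are dropped unless
# they sit inside xsl:text
sub literal_output {
    return join '', map {
        my ($text, $in_xsl_text) = @$_;
        ($in_xsl_text || $text =~ /\S/) ? $text : '';
    } @_;
}

# With <xsl:text>, </xsl:text>: the separator, then a whitespace-only
# node (line break and indentation before </xsl:if>)
my $with_text = literal_output([', ', 1], ["\n      ", 0]);

# Without xsl:text: comma, space, line break and indentation form one node
my $without_text = literal_output([", \n      ", 0]);

printf "with xsl:text:    [%s]\n", $with_text;
printf "without xsl:text: [%s]\n", $without_text;
```

With xsl:text the output is exactly a comma and a space; without it, the line break and indentation of the stylesheet end up in the output too.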

Conclusion

There is much more to both XSLT and the XML::XSLT module. This short article only gives an introduction to the module. It has probably raised more questions than it answered, but that is good. Leave them on the talkback page or post them on the mailing list on the XML::XSLT web site.

References

Glossary

CDATA
CDATA is character data: any sequence of characters not containing "]]>". See the XML Recommendation.

DOM
Document Object Model. An interface that can be used to access documents, for example XML documents. See the DOM web site.

Gnumeric
A spreadsheet program for GNOME.
