You need to have a schema... or at least a DTD
It is often stated in the XML literature that XML was not designed to be a syntax for the manual input of data. It was designed for machine-to-machine communication, but was intended to be easily readable by humans (which, in turn, makes for easier debugging, easier building of XPath queries, and that sort of thing). This wisdom is commonly ignored. Many products use manually entered XML configuration files. Why? There are several reasons:
- It is easier than coming up with a new syntax for the configuration file.
- Code to parse XML is readily available, making it unnecessary to write a custom parser.
- It's not all that difficult to manually enter XML, at least in small quantities.
- There are some unexpected benefits, like being able to use XML stylesheets to upgrade configuration files for new releases. (See "Managing XML Documents Versions and Upgrades with XSLT," by Vadim Zaliva.)
I recently moved to a new, smaller office, and I no longer had the bookshelf space for my binders full of printed-out PDF articles. I got rid of all the binders and put the PDF files in a directory. I needed an index, so I built a manually entered XML file with an entry for each book, listing the title, any topics I wanted it listed under, and the name of the PDF file, like this:
<book>
  <title>Agile Development of Safety-Critical Software for Machinery</title>
  <topic>Agile / Safety Critical</topic>
  <topic>Agile Development</topic>
  <file>Katara-18052010.pdf</file>
</book>
All the <book> elements were wrapped in a <library> element. I added a processing instruction to the front to point to an XML stylesheet:
<?xml-stylesheet href="/libsheet.xsl" type="text/xsl"?>
The stylesheet used the Muench method, modified to work with multiple group membership, to sort and group the books by topic. I just had to point my web browser to the XML file, and I had my index. Now, the format of the <book> entries could hardly be simpler: three child elements, no attributes, so simple that a schema seemed unnecessary. I use XML Copy Editor, which validates that my XML is well-formed before it saves it. What could go wrong? But I was studying XML schemas, so I decided to make a schema for my simple index file, just for practice. I added an xsi:schemaLocation attribute to my <library> element to point to the schema:
<library xmlns="http://www.TheXMLAdventure.com/schemas/pdfdoc/docindex.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.TheXMLAdventure.com/schemas/pdfdoc/docindex.xsd docindex.xsd">
XML Copy Editor can also validate an XML document against a schema, and I was shocked to find that my index, which had grown to 618 books, contained about a dozen entries that did not match the schema. Most of them were entries where I had forgotten to put in the <file> element, an error I would not have detected until I tried to open the PDF file. I have come to the conclusion that even if your XML data is simple, if you are entering it manually, you need a schema, at least if you care about the integrity of your data. If you prefer, you could use a DTD, but that has some disadvantages:
- Unlike schemas, DTDs are not XML and, at least to me, are a rather ugly construct.
- DTDs have no direct way to express occurrence limits such as "at most three topics" (beyond the ?, *, and + operators), and they cannot check data types.
But you ought to use something to keep user errors from creeping into your XML data. I have now created schemas for all my existing XML projects.
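The right tool for this is a schema processor, such as XML Copy Editor or xmllint (xmllint --schema docindex.xsd library.xml, where library.xml is a placeholder for the index file). Python's standard library has no XML Schema validator, but to show the kind of check that caught my missing <file> elements, here is a rough Python sketch. The occurrence rules in RULES are my own guess at a sensible content model for <book>, inferred from the example entry; they are not the actual docindex.xsd, and unlike a real schema the sketch ignores element order and unexpected elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model for <book>, inferred from the example entry:
# exactly one <title>, at least one <topic>, exactly one <file>.
RULES = {"title": (1, 1), "topic": (1, None), "file": (1, 1)}

def check_library(xml_text):
    """Return one message for each <book> child count that breaks RULES."""
    root = ET.fromstring(xml_text)
    problems = []
    # "{*}" matches the tag in any namespace (or none), so the default
    # namespace declared on <library> does not get in the way (Python 3.8+).
    for n, book in enumerate(root.findall("{*}book"), start=1):
        title = book.findtext("{*}title", default="(no title)")
        for tag, (low, high) in RULES.items():
            count = len(book.findall("{*}" + tag))
            if count < low or (high is not None and count > high):
                problems.append(f"book {n} ({title}): {count} <{tag}> element(s)")
    return problems
```

An entry with a forgotten <file> element would be reported as something like "book 2 (B): 0 <file> element(s)", which is the mistake I kept making.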
XML for LEDs
When I think of XML, I usually think of storing data, like customer orders or employee information, and transforming it from one form to another. But I recently ran into a couple of uses for XML that I would not have thought of. Both involve controlling LEDs. The first is BBXML, Darin Franklin's XML interface for BetaBrite LED signs. These are the LED signs with scrolling messages that you see all over the place; I have also heard them called ticker tape signs or Times Square signs. These signs are programmed over an RS-232 connection using an arcane protocol that is not very convenient to work with. For example, if you wanted to display "THE XML ADVENTURE" on the sign, you would send it an ASCII string like this, where _01, _02, and _04 stand for the ASCII control characters with those codes:
_01Z00_02A0THE XML ADVENTURE_04
The meaning of the characters is as follows:
- _01 - Start Of Header.
- Z - Type of sign. "Z" means "all types".
- 00 - Address of sign. "00" means broadcast to all signs.
- _02 - Start Of Text.
- A - Write TEXT file.
- 0 - Message area. "0" means priority text message.
- THE XML ADVENTURE - The text of the message.
- _04 - End Of Transmission.
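To make the field layout concrete, here is a small Python sketch that splits a packet written in the _NN notation above back into its fields. It only recognizes the simple write-TEXT packet shown here (command A), and the group names are my own labels for the fields in the list.

```python
import re

# _NN is the notation used above for the ASCII control character with code NN.
PACKET = re.compile(
    r"_01(?P<sign_type>.)"      # Start Of Header, then the type of sign
    r"(?P<address>\d\d)"        # two-digit sign address
    r"_02(?P<command>A)"        # Start Of Text, then "A" = write TEXT file
    r"(?P<label>.)"             # message area
    r"(?P<text>.*)_04"          # message text, then End Of Transmission
)

def describe(packet):
    """Return the fields of a simple write-TEXT packet as a dict."""
    match = PACKET.fullmatch(packet)
    if match is None:
        raise ValueError("not a simple write-TEXT packet")
    return match.groupdict()

print(describe("_01Z00_02A0THE XML ADVENTURE_04"))
```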
Using BBXML, you can express the message in XML, like this:
<alphasign>
  <text label="0">THE XML ADVENTURE</text>
</alphasign>
All the processing in BBXML is done in the alphasign.xsl XML stylesheet. So if you had the above XML in a file called commands.xml, you would just issue this command to convert the XML to the protocol required by the BetaBrite sign:
xsltproc alphasign.xsl commands.xml > commands.txt
(xsltproc is the XSLT processor from the libxslt package for Unix/Linux. If you are using a different processor, you would modify the command accordingly.) Once the XML is converted to the BetaBrite protocol, it can be sent to the sign with the Unix/Linux cat command:
cat commands.txt > /dev/betabrite
(On Windows, you would copy it to a COM port.)
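For comparison, here is roughly what the xsltproc and cat steps do for this one-message example, sketched in Python. It is not BBXML: it handles only <text> elements, emits one packet per element, and assumes a label of "0" when none is given. The real sign protocol has more to it than the fields listed above, so treat this as an illustration. The /dev/betabrite path is the same device name used in the cat command.

```python
import xml.etree.ElementTree as ET

SOH, STX, EOT = b"\x01", b"\x02", b"\x04"   # Start Of Header, Start Of Text, End Of Transmission

def alphasign_to_packets(xml_text, sign_type=b"Z", address=b"00"):
    """Turn each <text label="..."> into a write-TEXT packet (command "A").

    Hypothetical stand-in for alphasign.xsl, covering only this simple case.
    Defaults: "Z" = all sign types, "00" = broadcast to all signs."""
    packets = []
    for text in ET.fromstring(xml_text).iter("text"):
        label = text.get("label", "0").encode("ascii")   # "0" = priority text message
        body = (text.text or "").encode("ascii")
        packets.append(SOH + sign_type + address + STX + b"A" + label + body + EOT)
    return packets

def send(packets, device="/dev/betabrite"):
    """Same job as `cat commands.txt > /dev/betabrite`."""
    with open(device, "wb") as sign:
        for packet in packets:
            sign.write(packet)
```

For example, send(alphasign_to_packets(open("commands.xml").read())) would play the part of both commands.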
BBXML would be particularly useful if you have data that you need to publish in several places, perhaps in a PDF file, on a website, and, in abbreviated form, on a BetaBrite sign. The information would start out as an XML file, with different XML stylesheets to tailor it to the various output media. The output of the stylesheet for BetaBrite signs would then be fed to BBXML. The BBXML website has an excellent user's guide for BBXML, which tells you pretty much everything you need to know about using BBXML with BetaBrite RS-232-based signs.
More recent BetaBrite signs are USB-based, and I have not yet found a driver (for Unix/Linux or Windows) that allows you to copy data directly to the sign. Once such a driver is available, BBXML will work with USB-based signs, too.
The second application comes from Front2BackDev, one of my favorite XQuery blogs. It uses XQuery and a MarkLogic server to control Philips Hue LED light bulbs. These light bulbs have a controller that hooks up to your network and communicates with the bulbs wirelessly, making it possible to set each light individually to a particular color and brightness. Philips has apps for Android and iPhone, but the controller also has a REST-based API. The example at Front2BackDev uses XQuery to call this API and step all the lights in the house through a list of colors. This could be enhanced to do all sorts of interesting things, like adjusting the lights in the house for a different ambiance at different hours of the day, or controlling lights used as information radiators in Continuous Integration systems. In a later post, Front2BackDev used this in conjunction with geofences defined in Google Maps. An app on a smartphone would send its GPS coördinates to a MarkLogic server, which would figure out where in the geofences the phone was and change the color of all the lights in the house accordingly.
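The Front2BackDev code is XQuery running on MarkLogic. As a rough illustration of the same idea, here is a Python sketch that builds the REST calls. It assumes the Hue bridge's original (v1) API as I understand it: a PUT to /api/<username>/lights/<id>/state with a JSON body, where hue runs from 0 to 65535 and sat and bri top out at 254. The bridge address, username, and light IDs are placeholders, so check Philips' developer documentation before relying on any of it.

```python
import json
import urllib.request

def hue_state_request(bridge_ip, username, light_id, hue, sat=254, bri=254):
    """Build (but do not send) a PUT that sets one light's color.

    Assumes the v1 endpoint /api/<username>/lights/<id>/state."""
    url = f"http://{bridge_ip}/api/{username}/lights/{light_id}/state"
    body = json.dumps({"on": True, "hue": hue, "sat": sat, "bri": bri}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="PUT",
                                  headers={"Content-Type": "application/json"})

def color_cycle(bridge_ip, username, light_ids, hues):
    """One request per (color, light): every light steps through the list together."""
    return [hue_state_request(bridge_ip, username, light_id, hue)
            for hue in hues for light_id in light_ids]
```

To actually step the lights, you would pass each request to urllib.request.urlopen(), pausing between colors with time.sleep().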
Grouping data with multiple group membership using the Muench method
Grouping data with an XSLT 1.0 XML stylesheet is generally done using the Muench method, named after Steve Muench, who popularized the technique in his book Building Oracle XML Applications. Recently, I needed to group data where it was possible to be a member of multiple groups. I could not find anything in the literature about using the Muench method with multiple group membership, so after I figured out how, I decided to write it down, in case anyone else runs into the same problem. Suppose you have a file of XML employee information, and you want to display a list of employees, grouped by department. The employee information looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="/deptlist.xsl" type="text/xsl"?>
<company>
  <emp id="01276" gender="Female">
    <dept>Accounting</dept>
    <name><first>Sarah</first><last>Collins</last></name>
    <DOB>1978-04-11</DOB>
  </emp>
  <emp id="01001" gender="Male">
    <dept>Payroll</dept>
    <name><first>Fred</first><last>Smith</last></name>
    <DOB>1969-10-20</DOB>
  </emp>
  <emp id="01711" gender="Male">
    <dept>Personnel</dept>
    <name><first>Juan</first><last>Muñoz</last></name>
    <DOB>1980-01-04</DOB>
  </emp>
  <emp id="00941" gender="Male">
    <dept>Purchasing</dept>
    <name><first>Sam</first><last>Francisco</last></name>
    <DOB>1972-08-31</DOB>
  </emp>
  <emp id="01868" gender="Female">
    <dept>Maintenance</dept>
    <name><first>Betty</first><last>Carson</last></name>
    <DOB>1978-11-21</DOB>
  </emp>
</company>
Using the Muench method, the XML stylesheet, deptlist.xsl, might look like this:
01 <?xml version="1.0" encoding="UTF-8"?>
02 <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
03   <xsl:key name="depts" match="emp" use="dept"/>
04   <xsl:template match="/">
05     <xsl:for-each select="//emp[generate-id(.)=generate-id(key('depts', dept)[1])]">
06       <xsl:sort select="dept"/>
07       <xsl:for-each select="key('depts', dept)">
08         <xsl:sort select="name/first"/>
09         <xsl:sort select="name/last"/>
10         <xsl:if test="position() = 1">
11           <xsl:element name="br"/>
12           <xsl:element name="h3">
13             <xsl:value-of select="dept"/>
14           </xsl:element>
15         </xsl:if>
16         <p>
17           <xsl:value-of select="name/first"/>
18           <xsl:text> </xsl:text>
19           <xsl:value-of select="name/last"/>
20         </p>
21       </xsl:for-each>
22     </xsl:for-each>
23   </xsl:template>
24 </xsl:stylesheet>
On line 3, the xsl:key builds an index of <emp> elements, keyed by their <dept> values. On line 5, the xsl:for-each looks at each <emp> element and calls generate-id(), which returns an identifier unique to that node (two different <emp> elements always get different IDs, even if their contents are identical). It selects the <emp> element if its ID matches the ID of the first entry in the index for that department. The purpose of this is to select exactly one <emp> element for each department. This lets us enumerate the departments, so we can have a group per department. Line 6 sorts these <emp> elements by department, so we get a sorted list of departments. On line 7, the inner xsl:for-each processes each <emp> element in the department we just selected. Lines 8 and 9 sort those <emp> elements by first name, then by last name. Line 10 checks for the first <emp> in a department and generates an HTML H3 heading with its department name. Lines 16 through 20 display the first name, a space, and the last name for each employee. The result looks like this:
Accounting
Sarah Collins
Maintenance
Betty Carson
Payroll
Fred Smith
Personnel
Juan Muñoz
Purchasing
Sam Francisco
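It can help to model the stylesheet in another language to see exactly what lines 3 through 13 do. Here is a Python sketch of my own (not part of any XSLT tool) that mimics the XSLT 1.0 rules the stylesheet depends on: xsl:key indexes an <emp> under every <dept> it has, key() given a node-set returns the union of the matches in document order, and xsl:value-of on a node-set returns the first node's value. With one <dept> per employee it produces the grouping shown above. Run on the multi-department data that follows, it also lumps everyone into just two groups, the failure described below.

```python
import xml.etree.ElementTree as ET

def full_name(emp):
    return f'{emp.findtext("name/first")} {emp.findtext("name/last")}'

def muench_by_dept(company_xml):
    """Model of deptlist.xsl: returns [(heading, [names...]), ...]."""
    root = ET.fromstring(company_xml)
    emps = list(root.iter("emp"))                      # document order

    # Line 3: each <emp> is indexed under the value of every <dept> child.
    index = {}
    for emp in emps:
        for dept in emp.findall("dept"):
            index.setdefault(dept.text, []).append(emp)

    def key(values):
        # key('depts', node-set): union of the matches, in document order.
        wanted = {id(e) for v in values for e in index.get(v, [])}
        return [e for e in emps if id(e) in wanted]

    groups = []
    for emp in emps:
        depts = [d.text for d in emp.findall("dept")]
        members = key(depts)
        # Line 5: keep this <emp> only if it is key('depts', dept)[1].
        if members and members[0] is emp:
            # Lines 8-9: sort by first name, then last name.
            members.sort(key=lambda e: (e.findtext("name/first"), e.findtext("name/last")))
            # Line 13: value-of select="dept" on the first sorted <emp>
            # yields that employee's first <dept>.
            heading = members[0].findtext("dept")
            # Line 6 sorts on the representative's first <dept>.
            groups.append((depts[0], heading, [full_name(e) for e in members]))
    groups.sort(key=lambda g: g[0])
    return [(heading, names) for _, heading, names in groups]
```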
That is how the Muench method normally works. But what if it is possible to belong to multiple groups? Suppose our employees can belong to more than one department? If we add additional <dept> tags to some of the employees, our XML employee information might look like this:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="/deptlist.xsl" type="text/xsl"?>
<company>
  <emp id="01276" gender="Female">
    <dept>Accounting</dept>
    <dept>Payroll</dept>
    <name><first>Sarah</first><last>Collins</last></name>
    <DOB>1978-04-11</DOB>
  </emp>
  <emp id="01001" gender="Male">
    <dept>Payroll</dept>
    <name><first>Fred</first><last>Smith</last></name>
    <DOB>1969-10-20</DOB>
  </emp>
  <emp id="01711" gender="Male">
    <dept>Personnel</dept>
    <dept>Maintenance</dept>
    <name><first>Juan</first><last>Muñoz</last></name>
    <DOB>1980-01-04</DOB>
  </emp>
  <emp id="00941" gender="Male">
    <dept>Purchasing</dept>
    <dept>Maintenance</dept>
    <name><first>Sam</first><last>Francisco</last></name>
    <DOB>1972-08-31</DOB>
  </emp>
  <emp id="01868" gender="Female">
    <dept>Maintenance</dept>
    <dept>Sales</dept>
    <name><first>Betty</first><last>Carson</last></name>
    <DOB>1978-11-21</DOB>
  </emp>
</company>
When we run this XML information against our XSL stylesheet, the results are not what we might expect, and definitely not what we want:
Payroll
Fred Smith
Sarah Collins
Purchasing
Sam Francisco
Juan Muñoz
Betty Carson
Only two headings appear, and employees from different departments are lumped together under them. The reason is line 5, where it says
<xsl:for-each select="//emp[generate-id(.)=generate-id(key('depts', dept)[1])]">
For an employee with a single department, key('depts', dept) returns the employees in that department, as intended. For an employee with several <dept> elements, however, dept is a node-set, and in XSLT 1.0 key() then returns the union of the employees in all of those departments. The first <emp> in that union ends up representing several departments at once, so the groups are merged, and the heading shows just one department name: the first <dept> of whichever employee is listed first in the group. Most departments never get a heading of their own. Luckily, it does not take many changes to the XML stylesheet to make it work with multi-department employees. Here is the modified XML stylesheet:
01 <?xml version="1.0" encoding="UTF-8"?>
02 <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
03   <xsl:key name="depts" match="emp" use="dept"/>
04   <xsl:template match="/">
05     <xsl:for-each select="//emp/dept[generate-id(parent::*)=generate-id(key('depts',.)[1])]">
06       <xsl:sort select="."/>
07       <xsl:variable name="thisdept" select="." />
08       <xsl:for-each select="key('depts',.)">
09         <xsl:sort select="name/first"/>
10         <xsl:sort select="name/last"/>
11         <xsl:if test="position() = 1">
12           <xsl:element name="br"/>
13           <xsl:element name="h3">
14             <xsl:value-of select="$thisdept"/>
15           </xsl:element>
16         </xsl:if>
17         <p>
18           <xsl:value-of select="name/first"/>
19           <xsl:text> </xsl:text>
20           <xsl:value-of select="name/last"/>
21         </p>
22       </xsl:for-each>
23     </xsl:for-each>
24   </xsl:template>
25 </xsl:stylesheet>
We change line 5 so that instead of looking at each <emp> element, it looks at each <dept> within an <emp> element. We replace the current node (".") with its parent ("parent::*"), so we are generating the same ID we did before, but we may now examine an <emp> element more than once if it has more than one <dept> element. We compare that ID with the ID of the first <emp> element in the index for that department, and select the <dept> element if they match. This selects exactly one <dept> element per department name, and because the for-each walks <dept> elements rather than <emp> elements, an employee with several departments is handled once for each of them. On line 6, since we are sorting <dept> elements rather than <emp> elements, we change "dept" to the current node ("."), since it already is a <dept> element. On line 7, we set a variable to the department we are working with. We will need this in a minute. On line 8, in the for-each that goes through all the index entries for the department we are working with, we again change "dept" to the current node ("."). On line 14, when we make the H3 heading, we use the value of the variable we set on line 7, so we get the name of the department we are working with, not the first <dept> of whichever employee happens to sort first. Now when we run our XML employee information against our XML stylesheet, we get this result:
Accounting
Sarah Collins
Maintenance
Betty Carson
Juan Muñoz
Sam Francisco
Payroll
Fred Smith
Sarah Collins
Personnel
Juan Muñoz
Purchasing
Sam Francisco
Sales
Betty Carson
This is what we were looking for. The departments are listed in alphabetical order, and each employee is listed (in alphabetical order) under each department that he or she is a member of.
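A Python model of the modified stylesheet, in the same spirit as a sketch rather than a replacement for the XSLT, shows why walking <dept> elements fixes things: each department name gets exactly one representative <dept>, and the heading comes from that <dept> rather than from an employee. The line-number comments refer to the modified stylesheet; as in lines 9 and 10, each group's members are sorted by first name and then last name.

```python
import xml.etree.ElementTree as ET

def muench_multi(company_xml):
    """Model of the modified deptlist.xsl: one group per department name."""
    root = ET.fromstring(company_xml)

    # Line 3 (unchanged): every <emp> is indexed under each of its <dept> values.
    index = {}
    for emp in root.iter("emp"):
        for dept in emp.findall("dept"):
            bucket = index.setdefault(dept.text, [])
            if not bucket or bucket[-1] is not emp:   # key() never returns duplicates
                bucket.append(emp)

    groups = []
    for emp in root.iter("emp"):
        for dept in emp.findall("dept"):              # line 5 walks <dept>, not <emp>
            thisdept = dept.text                      # line 7: $thisdept
            members = index[thisdept]                 # line 8: key('depts', .)
            # Line 5: generate-id(parent::*) = generate-id(key('depts', .)[1])
            if members[0] is emp:
                ordered = sorted(members, key=lambda e: (e.findtext("name/first"),
                                                         e.findtext("name/last")))
                names = [f'{e.findtext("name/first")} {e.findtext("name/last")}'
                         for e in ordered]
                groups.append((thisdept, names))      # line 14 uses $thisdept
    groups.sort(key=lambda g: g[0])                   # line 6: xsl:sort select="."
    return groups
```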
A quick XSLT gallery
XSLT comes to the rescue again! I have a small website of tutorials that uses the old Coppermine software. Each tutorial is a gallery. The first image of each gallery is the title of the tutorial, and the titles of subsequent images are "Step 1", "Step 2", etc. The description for each image explains how to do that step. I recently had to move my website to a new host that was incompatible with Coppermine. I quickly looked around for photo gallery software, but I found that much of it does not accommodate having a long description for each image. I needed to get my site back up quickly, and I didn't want to rush into choosing gallery software, so I made a quick gallery with the help of XSLT. First I dumped the data from the Coppermine database to produce XML for each page that looked like this:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="../gallery.xsl" type="text/xsl"?>
<page>
  <title>Step 1</title>
  <image>LRG-20100804-001.jpg</image>
  <text>
    <p>This is the first step in this process.</p>
    <p>Be <em>very</em> careful to follow these instructions.</p>
  </text>
  <next>page2</next>
  <prev>page0</prev>
</page>
Most of this I could do using an SQL select statement with the CONCAT() function and appropriate literals. I added the <next> and <prev> elements manually. The data was saved in files named page0.xml, page1.xml, etc., with a separate directory for each tutorial. The gallery XSLT stylesheet, which was saved in the parent directory so that it could be shared by all the tutorials, looked like this:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="html" indent="yes"/>
  <xsl:template match="/">
    <html>
      <head>
        <title>
          <xsl:value-of select="page/title"/>
        </title>
        <style>
          .icon { width:50px; height:50px; border-style:none; }
        </style>
      </head>
      <body>
        <h1>
          <xsl:value-of select="page/title"/>
        </h1>
        <!-- Image: xsl:text keeps layout whitespace out of the src URL -->
        <div>
          <xsl:element name="img">
            <xsl:attribute name="src">
              <xsl:text>images/</xsl:text>
              <xsl:value-of select="page/image"/>
            </xsl:attribute>
          </xsl:element>
        </div>
        <!-- Description: copy the contents of the text element, not the element itself -->
        <div>
          <xsl:copy-of select="page/text/node()"/>
        </div>
        <!-- Navigation buttons -->
        <div>
          <xsl:if test="page/next">
            <xsl:element name="a">
              <xsl:attribute name="href">
                <xsl:value-of select="page/next"/>
              </xsl:attribute>
              <img src="/icons/next.png" class="icon" style="float:right;"/>
            </xsl:element>
          </xsl:if>
          <xsl:if test="page/prev">
            <xsl:element name="a">
              <xsl:attribute name="href">
                <xsl:value-of select="page/prev"/>
              </xsl:attribute>
              <img src="/icons/back.png" class="icon" style="float:left;"/>
            </xsl:element>
          </xsl:if>
        </div>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
The result was a quick gallery with buttons to go forward and backward through the tutorials. Now, there are some disadvantages to implementing a gallery this way. By using stylesheets that are processed in the user's browser, you are at the mercy of how each browser implements XML stylesheets. Also, processing the individual files this way makes it trickier to do things that require looking at all of them, like automatically generating an index of tutorials, or generating "page x of y" page numbers. And finally, it would be nice if I could have avoided manually adding the <next> and <prev> tags.
I am in transition between host sites, so my MarkLogic server was not available, but if it had been, I could have used an XQuery program to look at the <title> elements and insert the appropriate <next> and <prev> elements using the xdmp:node-insert-child() function. But if my MarkLogic server had been available, I probably would have just done the whole thing using XQuery. Still, with the tutorials XML-ized the way they are now, it will be easy to use XQuery to convert them to whatever form I need, either to serve them from the MarkLogic server or to convert them for some gallery package, if I find one I like.
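Without a MarkLogic server, a small script can still spare you from typing the links by hand. Here is a Python sketch of that idea, a swapped-in alternative to the xdmp:node-insert-child() approach rather than a translation of it. It assumes the file numbers give the tutorial order (page0.xml is the title page) instead of reading the <title> elements, and it links to the file name without the .xml extension, as in the example page. Because ElementTree's parser drops processing instructions, link_directory() writes the xml-stylesheet instruction back itself; the stylesheet path is a parameter, and the "../gallery.xsl" default (the parent directory) is an assumption.

```python
import re
import xml.etree.ElementTree as ET
from pathlib import Path

def link_pages(pages, names):
    """Give each <page> element <next>/<prev> children pointing at its neighbours.

    pages are <page> root elements in tutorial order; names[i] is the link
    target for pages[i] (for example "page3")."""
    last = len(pages) - 1
    for i, page in enumerate(pages):
        for tag in ("next", "prev"):                  # drop any hand-made links
            for old in page.findall(tag):
                page.remove(old)
        if i < last:
            ET.SubElement(page, "next").text = names[i + 1]
        if i > 0:
            ET.SubElement(page, "prev").text = names[i - 1]
    return pages

def link_directory(directory, stylesheet="../gallery.xsl"):
    """Rewrite page0.xml, page1.xml, ... in one tutorial directory, in place."""
    numbered = []
    for path in Path(directory).glob("page*.xml"):
        match = re.fullmatch(r"page(\d+)", path.stem)
        if match:
            numbered.append((int(match.group(1)), path))
    paths = [path for _, path in sorted(numbered)]
    pages = link_pages([ET.parse(path).getroot() for path in paths],
                       [path.stem for path in paths])
    for path, page in zip(paths, pages):
        with open(path, "w", encoding="utf-8") as f:
            f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
            f.write(f'<?xml-stylesheet href="{stylesheet}" type="text/xsl"?>\n')
            f.write(ET.tostring(page, encoding="unicode") + "\n")
```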