Grouping data with multiple group membership using the Muench method

Grouping data with an XSLT 1.0 stylesheet is generally done using the Muench method, named after Steve Muench, who popularized the technique in his book Building Oracle XML Applications. Recently, I needed to group data in which an item could be a member of multiple groups. I could not find anything in the literature about using the Muench method with multiple group membership, so after I figured out how, I decided to write it down in case anyone else runs into the same problem.

Suppose you have a file of XML employee information, and you want to display a list of employees, grouped by department. The employee information looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="/deptlist.xsl" type="text/xsl"?>
<company>
   <emp id="01276" gender="Female">
      <dept>Accounting</dept>
      <name>
         <first>Sarah</first>
         <last>Collins</last>
      </name>
      <DOB>1978-04-11</DOB>
   </emp>
   <emp id="01001" gender="Male">
      <dept>Payroll</dept>
      <name>
         <first>Fred</first>
         <last>Smith</last>
      </name>
      <DOB>1969-10-20</DOB>
   </emp>
   <emp id="01711" gender="Male">
      <dept>Personnel</dept>
      <name>
         <first>Juan</first>
         <last>Muñoz</last>
      </name>
      <DOB>1980-01-04</DOB>
   </emp>
   <emp id="00941" gender="Male">
      <dept>Purchasing</dept>
      <name>
         <first>Sam</first>
         <last>Francisco</last>
      </name>
      <DOB>1972-08-31</DOB>
   </emp>
   <emp id="01868" gender="Female">
      <dept>Maintenance</dept>
      <name>
         <first>Betty</first>
         <last>Carson</last>
      </name>
      <DOB>1978-11-21</DOB>
   </emp>
</company>

Using the Muench method, the XML stylesheet, deptlist.xsl, might look like this:

01    <?xml version="1.0" encoding="UTF-8"?>
02    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
03    <xsl:key name="depts" match="emp" use="dept"/>
04    <xsl:template match="/">
05    <xsl:for-each select="//emp[generate-id(.)=generate-id(key('depts', dept)[1])]">
06        <xsl:sort select="dept"/>
07        <xsl:for-each select="key('depts', dept)">
08            <xsl:sort select="name/first"/>
09            <xsl:sort select="name/last"/>
10            <xsl:if test="position() = 1">
11                <xsl:element name="br"/>
12                <xsl:element name="h3">
13                    <xsl:value-of select="dept"/>
14                </xsl:element>
15            </xsl:if>
16            <p>
17                <xsl:value-of select="name/first"/>
18                <xsl:text> </xsl:text>
19                <xsl:value-of select="name/last"/>
20            </p>
21        </xsl:for-each>
22    </xsl:for-each>
23    </xsl:template>
24    </xsl:stylesheet>

On line 3, the xsl:key builds an index of <emp> elements, keyed on their <dept> values. On line 5, the xsl:for-each looks at each <emp> element and calls generate-id() on it, which returns an identifier string that is unique to that node, so two different <emp> elements always have different IDs, even if their contents are identical. The <emp> element is selected if its ID matches the ID of the first entry in the index for its department. The purpose of this is to select exactly one <emp> element for each department, which lets us enumerate the departments so we can have one group per department. Line 6 sorts these <emp> elements by department, so we get a sorted list of departments.

On line 7, the inner xsl:for-each processes every <emp> element in the department we just selected. Lines 8 and 9 sort those <emp> elements by first name and then last name. Line 10 checks for the first <emp> in the sorted group, and lines 11 through 14 generate a line break and an HTML H3 heading with the department name. Lines 16 through 20 display the first name, a space, and the last name for each employee. The result looks like this:

Accounting
Sarah Collins

Maintenance
Betty Carson

Payroll
Fred Smith

Personnel
Juan Muñoz

Purchasing
Sam Francisco
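
The selection step on line 5 can also be tried on its own. The following is a minimal sketch, not part of the original stylesheet, that lists only the distinct departments from the single-department file. It uses the count() form of the node-identity test, which is a commonly seen alternative to comparing generate-id() values. The <ul>/<li> output and the group size in parentheses are additions made here purely for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<!-- Index every <emp> under the string value of its <dept> child. -->
<xsl:key name="depts" match="emp" use="dept"/>
<xsl:template match="/">
    <ul>
        <!-- Keep an <emp> only if it is the first node the key returns for
             its department. Because "." is a single node, the union below
             has one member only when the key's first node is that same
             node (assuming every <emp> has a <dept>). -->
        <xsl:for-each select="//emp[count(. | key('depts', dept)[1]) = 1]">
            <xsl:sort select="dept"/>
            <li>
                <xsl:value-of select="dept"/>
                <!-- Illustrative addition: how many employees are indexed
                     under this department. -->
                <xsl:text> (</xsl:text>
                <xsl:value-of select="count(key('depts', dept))"/>
                <xsl:text>)</xsl:text>
            </li>
        </xsl:for-each>
    </ul>
</xsl:template>
</xsl:stylesheet>
```

Run against the file above, where every employee has exactly one department, this should produce one list item per department, each with a count of 1.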

That is how the Muench method normally works. But what if it is possible to belong to multiple groups? Suppose our employees can belong to more than one department? If we add additional <dept> tags to some of the employees, our XML employee information might look like this:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="/deptlist.xsl" type="text/xsl"?>
<company>
   <emp id="01276" gender="Female">
      <dept>Accounting</dept>
      <dept>Payroll</dept>
      <name>
         <first>Sarah</first>
         <last>Collins</last>
      </name>
      <DOB>1978-04-11</DOB>
   </emp>
   <emp id="01001" gender="Male">
      <dept>Payroll</dept>
      <name>
         <first>Fred</first>
         <last>Smith</last>
      </name>
      <DOB>1969-10-20</DOB>
   </emp>
   <emp id="01711" gender="Male">
      <dept>Personnel</dept>
      <dept>Maintenance</dept>
      <name>
         <first>Juan</first>
         <last>Muñoz</last>
      </name>
      <DOB>1980-01-04</DOB>
   </emp>
   <emp id="00941" gender="Male">
      <dept>Purchasing</dept>
      <dept>Maintenance</dept>
      <name>
         <first>Sam</first>
         <last>Francisco</last>
      </name>
      <DOB>1972-08-31</DOB>
   </emp>
   <emp id="01868" gender="Female">
      <dept>Maintenance</dept>
      <dept>Sales</dept>
      <name>
         <first>Betty</first>
         <last>Carson</last>
      </name>
      <DOB>1978-11-21</DOB>
   </emp>
</company>

When we run this XML information against our XSL stylesheet, the results are not what we might expect, and definitely not what we want:

Payroll
Fred Smith
Sarah Collins

Maintenance
Betty Carson
Juan Muñoz
Sam Francisco
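
One way to investigate output like this is to print what key('depts', dept) actually returns for each employee. The stylesheet below is a diagnostic sketch written for this purpose, not part of the original deptlist.xsl. It uses text output and first names only to keep the result short.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text"/>
<!-- Same key as in deptlist.xsl. -->
<xsl:key name="depts" match="emp" use="dept"/>
<xsl:template match="/">
    <xsl:for-each select="//emp">
        <!-- The employee whose <dept> children are passed to key(). -->
        <xsl:value-of select="name/first"/>
        <xsl:text>: </xsl:text>
        <!-- Everyone the key returns for that employee, in document order. -->
        <xsl:for-each select="key('depts', dept)">
            <xsl:value-of select="name/first"/>
            <xsl:if test="position() != last()">
                <xsl:text>, </xsl:text>
            </xsl:if>
        </xsl:for-each>
        <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
</xsl:template>
</xsl:stylesheet>
```

With a conforming XSLT 1.0 processor and the multi-department file, the line for Sarah should read "Sarah: Sarah, Fred" and the line for Sam should read "Sam: Juan, Sam, Betty". In other words, each lookup returns everyone who shares any department with that employee.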

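Only two groups appear, and four of the six departments are missing. The reason is line 5, where it says <xsl:for-each select="//emp[generate-id(.)=generate-id(key('depts', dept)[1])]">. For an employee with a single department, key('depts', dept) returns the employees in that department. For an employee with several <dept> elements, however, the dept argument is a node-set with more than one node, and in XSLT 1.0 key() then returns the union of the index entries for every one of those values. The test on line 5 therefore selects an employee only if he or she comes first in document order among everyone who shares any of his or her departments, so most departments never get a representative. The inner for-each on line 7 has the same problem: it lists that whole union, and the heading on line 13 shows only the first <dept> of whichever employee sorts first, which need not be the department that brought the group together.

Luckily, it does not take many changes to the XML stylesheet to make it work with multi-department employees. Here is the modified XML stylesheet: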

01    <?xml version="1.0" encoding="UTF-8"?>
02    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
03    <xsl:key name="depts" match="emp" use="dept"/>
04    <xsl:template match="/">
05    <xsl:for-each select="//emp/dept[generate-id(parent::*)=generate-id(key('depts',.)[1])]">
06        <xsl:sort select="."/>
07        <xsl:variable name="thisdept" select="." />
08        <xsl:for-each select="key('depts',.)">
09            <xsl:sort select="name/first"/>
10            <xsl:sort select="name/last"/>
11            <xsl:if test="position() = 1">
12                <xsl:element name="br"/>
13                <xsl:element name="h3">
14                    <xsl:value-of select="$thisdept"/>
15                </xsl:element>
16            </xsl:if>
17            <p>
18                <xsl:value-of select="name/first"/>
19                <xsl:text> </xsl:text>
20                <xsl:value-of select="name/last"/>
21            </p>
22        </xsl:for-each>
23    </xsl:for-each>
24    </xsl:template>
25    </xsl:stylesheet>

We change line 5 so that instead of looking at each <emp> element, it looks at each <dept> within an <emp> element. We replace the current node (“.”) with the parent of the current node (“parent::*”), so we are generating the same ID we did before, but we may examine each <emp> element multiple times if it has more than one <dept> element. We compare that ID with the ID of the first <emp> element in the index for that department, and select the <dept> element if they match. Because the argument to key() is now a single <dept> element, each lookup is for exactly one department. This selects one <dept> element per department, and since the for-each looks at each <dept> element, we can deal with having multiples.

On line 6, since we are sorting <dept> elements rather than <emp> elements, we change “dept” to the current node (“.”), since it already is a <dept> element. On line 7, we set a variable to the department we are working with. We will need this in a minute. On line 8, in the for-each where we go through all the <emp> elements indexed under the department we are working with, we change “dept” to the current node (“.”), since it already is a <dept> element. On line 14, when we make the H3 heading, the current node is an <emp> element, so we use the value of the variable we set on line 7. That gives us the name of the department we are working with, rather than the first <dept> of whichever employee sorts first in the group. Now when we run our XML employee information against our XML stylesheet, we get this result:

Accounting
Sarah Collins

Maintenance
Betty Carson
Juan Muñoz
Sam Francisco

Payroll
Fred Smith
Sarah Collins

Personnel
Juan Muñoz

Purchasing
Sam Francisco

Sales
Betty Carson

This is what we were looking for. The departments are listed in alphabetical order, and each employee is listed (in alphabetical order) under each department that he or she is a member of.
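
A variation that may be worth considering is to index the <dept> elements themselves rather than the <emp> elements. This is a sketch of a different arrangement, not the stylesheet described above. The key name "deptnodes" is made up for this example. Because the outer loop's current node is already the <dept>, the heading can be written there directly, without the variable or the position() test.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<!-- Assumption for this sketch: index <dept> elements by their own text. -->
<xsl:key name="deptnodes" match="emp/dept" use="."/>
<xsl:template match="/">
    <!-- Select the first <dept> node for each distinct department name. -->
    <xsl:for-each select="//emp/dept[generate-id(.) = generate-id(key('deptnodes', .)[1])]">
        <xsl:sort select="."/>
        <br/>
        <h3>
            <xsl:value-of select="."/>
        </h3>
        <!-- Step from every <dept> with this name up to its parent <emp>. -->
        <xsl:for-each select="key('deptnodes', .)/parent::emp">
            <xsl:sort select="name/first"/>
            <xsl:sort select="name/last"/>
            <p>
                <xsl:value-of select="name/first"/>
                <xsl:text> </xsl:text>
                <xsl:value-of select="name/last"/>
            </p>
        </xsl:for-each>
    </xsl:for-each>
</xsl:template>
</xsl:stylesheet>
```

For the employee file used here, this should give the same grouping as the modified stylesheet. Which form reads more clearly is largely a matter of taste.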

