Monday, 9 September 2013

Sort elements of arbitrary XML document recursively

I'm trying to sort and canonicalize some XML documents. The desired end
result is that:
1. every element's children are in alphabetical order (by element name)
2. every element's attributes are in alphabetical order
3. comments are removed
4. all elements are properly spaced (i.e. "pretty-printed").
I have achieved all of these goals except #1.
I have been using this answer as my template. Here is what I have so far:
import javax.xml.transform.stream.StreamResult
import javax.xml.transform.stream.StreamSource
import javax.xml.transform.TransformerFactory
import org.apache.xml.security.c14n.Canonicalizer
// Initialize the security library
org.apache.xml.security.Init.init()
// Create some variables
// Get arguments
// Make sure required arguments have been provided
if(!error) {
// Create some variables
def ext = fileInName.tokenize('.').last()
fileOutName = fileOutName ?: "${fileInName.lastIndexOf('.').with { it != -1 ? fileInName[0..<it] : fileInName }}_CANONICALIZED_AND_SORTED.${ext}"
def fileIn = new File(fileInName)
def fileOut = new File(fileOutName)
def xsltFile = new File(xsltName)
def temp1 = new File("./temp1")
def temp2 = new File("./temp2")
def os
def is
// Sort the XML attributes, remove comments, and remove extra whitespace
println "Canonicalizing..."
Canonicalizer c = Canonicalizer.getInstance(Canonicalizer.ALGO_ID_C14N_OMIT_COMMENTS)
os = temp1.newOutputStream()
c.setWriter(os)
c.canonicalize(fileIn.getBytes())
os.close()
// Sort the XML elements
println "Sorting..."
def factory = TransformerFactory.newInstance()
is = xsltFile.newInputStream()
def transformer = factory.newTransformer(new StreamSource(is))
is.close()
is = temp1.newInputStream()
os = temp2.newOutputStream()
transformer.transform(new StreamSource(is), new StreamResult(os))
is.close()
os.close()
// Write the XML output in "pretty print"
println "Beautifying..."
def parser = new XmlParser()
def printer = new XmlNodePrinter(new IndentPrinter(fileOut.newPrintWriter(), " ", true))
printer.print parser.parseText(temp2.getText())
// Cleanup
temp1.delete()
temp2.delete()
println "Done!"
}
Full script is here.
XSLT:
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="foo">
    <foo>
      <xsl:apply-templates>
        <xsl:sort select="name()"/>
      </xsl:apply-templates>
    </foo>
  </xsl:template>
</xsl:stylesheet>
Sample Input XML:
<foo b="b" a="a" c="c">
  <qwer>
    <zxcv c="c" b="b"/>
    <vcxz c="c" b="b"/>
  </qwer>
  <baz e="e" d="d"/>
  <bar>
    <fdsa g="g" f="f"/>
    <asdf g="g" f="f"/>
  </bar>
</foo>
Desired Output XML:
<foo a="a" b="b" c="c">
  <bar>
    <asdf f="f" g="g"/>
    <fdsa f="f" g="g"/>
  </bar>
  <baz d="d" e="e"/>
  <qwer>
    <vcxz b="b" c="c"/>
    <zxcv b="b" c="c"/>
  </qwer>
</foo>
How can I make the transform apply to all elements so all of an element's
children will be in alphabetical order?
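One direction that might work, offered as a minimal sketch rather than a confirmed fix: move the xsl:sort out of the foo-specific template and into the generic copy template, so that it runs at every level of the tree. The sketch assumes the documents contain no mixed content (text interleaved with child elements) and that sorting on name() alone is sufficient. Attributes are applied in a separate step before the children, because an xsl:apply-templates with no select attribute only visits child nodes and would otherwise drop them.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Generic copy template: matches every node and attribute.
       For attributes and text nodes, xsl:copy ignores its content,
       so the two apply-templates calls only take effect on elements
       (and the document root). -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <!-- Copy the attributes first; their order is not changed here. -->
      <xsl:apply-templates select="@*"/>
      <!-- Then process the children sorted by name. Each child goes
           through this same template, so the sort recurses. -->
      <xsl:apply-templates select="node()">
        <xsl:sort select="name()"/>
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

With this variant the match="foo" template is no longer needed. One caveat: name() returns an empty string for text nodes, so in a document with mixed content the text would be sorted ahead of the sibling elements. The sample input above has no such content.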
