Today I am going to write about how to extract the XML structure from a file. The basic idea is to bulk read the file from disk, place the content in an XML variable and traverse elements in the variable and ultimately output a resultset showing the structure of xml file.

I often use this to determine what kind of XML the file is by comparing the returned resultset with a lookup-table.
See comment in code to understand what happens

Feedback

I would suggest #Nodes table uses a clustered index (surrogate) and using try catch in sps :)BTW new sql server 2008 datatype hierarchy makes such operations much easier.Thanks for the stored procedures. I am always having problems with reading xml from files.And is there a way to directly assign the file contenct to an xml or at least text variable?

I have an XSLT 2.0 transform that does a similar thing -- not quite the same, it returns an XML document whose tree is all the unique-named element paths rather than a list of unique parent-child element names at each level.

However, it does seem to be a good deal faster for large/deep files, so it might make a useful preprocessor, since the rowset returned by dbo.uspGetVariableStructureXML will be the same in both cases.

When I say large files, my test file of 103MB returned 2500 rows with a maximum level of 16. dbo.uspGetVariableStructureXML took 8m 56s to run, the XSLT using Saxon b 9.1 took 25s on the same machine. In both cases, about 20s of that seems to have been loading the XML file.