If this is your first visit, be sure to
check out the FAQ by clicking the
link above. You may have to register
before you can post: click the register link above to proceed. To start viewing messages,
select the forum that you want to visit from the selection below.

Unanswered: How to parse XML file

Hi All,

I have a XML file of the following format receives from one of our external applications. I have a requirement to extract distributorid,sellingind for each distributor element and store the same into a table having two columns(distributorid,sellingind). The size of the xml file could be upto 2-3 GB. Instead of using java, I would like to use db2 in case if it supports. I had gone thru the documentation and identified 2 approaches.

1. DB2 XML extender
2. Federate using XML Wrapper

I had gone thru the documentation available over the internet, I still not understood which way is the better approach.

The largest column size for a CLOB in DB2 (which is where DB2 XML Extender would store the XML) is 2GB (unless it has been changed in DB2 9). The largest column size that can be logged (CLOBs can be defined as NOT LOGGED) is 1 GB.

XML Extender will probably not be supported beyond version 9 or 10, since its functionality has been replaced by "Pure XML" in V9. So use it at your own risk.

The largest column size for a CLOB in DB2 (which is where DB2 XML Extender would store the XML) is 2GB (unless it has been changed in DB2 9). The largest column size that can be logged (CLOBs can be defined as NOT LOGGED) is 1 GB.

XML Extender will probably not be supported beyond version 9 or 10, since its functionality has been replaced by "Pure XML" in V9. So use it at your own risk.

totally agree with Marcus_A. XML extender is not the best choice if you will deal a lot with XML.

For sandbox purpose, you can try DB2 v9(the expression version is free, I think) and use the 'pure xml' feature. You can store a XML data that larger than 2G. Because by theory, there is no size limitation for XML data though there are system limitation.

Then use XMLTABLE function to split the document into smaller piece before query on the data.

By all means, there is no cheap/easy way to query on a 2G~3G document.

Your case is one of the perfect example for XMLTABLE, the following stmt is from DB2/zos

(1) CAST(? as XML) will bring in your document(in text format),
(2) '/Distributor' will grab each 'Distributor' element from root, which will contruct to each row of the result table X
(3) COLUMNS "DistributorId" INTEGER,
"sellingind" CHAR(5)) X
will get the context of "DistributorId" and "sellingind" element from that particular 'Distributor'.
(4) eventrually, the insert from sub-select will put that data into your regular relational table.

I am pretty positive that DB2 can deal with 1GB document. It will be a challenge for 3GB. But if you are a big customer, DB2 will make it work for you.

I am pretty positive that DB2 can deal with 1GB document. It will be a challenge for 3GB. But if you are a big customer, DB2 will make it work for you.

The data type in the client will be CLOB or so, which introduces the 2GB limitation again.

Your only chance to handle documents bigger than 2GB would be the XML wrapper - if that wrapper supports it (You should give that a try/read the manual.) From a client application, you have to split the document yourself first.

The data type in the client will be CLOB or so, which introduces the 2GB limitation again.

Your only chance to handle documents bigger than 2GB would be the XML wrapper - if that wrapper supports it (You should give that a try/read the manual.) From a client application, you have to split the document yourself first.

p.s: The 2GB LOB limit has not been lifted (yet).

CAST(? as xml) bind in the hostvar as XML(not lobs) using streaming tech, which don't have such limitation. But I don't know whether any one tries it before.

As has already been said, the XML file to be passed in the "?" can be at most (a bit less than) 2GB large; the above query can safely be called several times, on pieces of the original file, but the "?" must each time be a full, valid XML file. Hence, in cutting apart 1.99 GB pieces of the file, you will have to add an extra <?xml ...> header between some consecutive
"</Distributor>"
and
"<Distributor>"