Read and write XML?

I receive a telegram in XML format and I need to parse it and send my data in the same format back. My main problem is that I'm not allowed to use any open source software or other 3rd party software except VS2010. I have to write it in C++.

Now my question: is there a good tutorial for writing such a parser? I was hoping to write something that can be used much like the GetPrivateProfile functions, which are handy for INI files.
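For the writing half of the task, a minimal sketch might look like the following. It assumes attribute values are sent in double quotes and that the five predefined XML entities are all the escaping needed. `escapeAttribute` and `writeParam` are hypothetical helper names, and the `param`, `key` and `value` names are taken from later posts in this thread. The code avoids newer C++ features so that it should build with VS2010, though that has not been checked here.

```cpp
#include <cstddef>
#include <string>

// Escapes the five characters XML reserves, so the text is safe inside a
// double-quoted attribute value.
std::string escapeAttribute(const std::string& raw) {
    std::string out;
    for (std::size_t i = 0; i < raw.size(); ++i) {
        switch (raw[i]) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += raw[i];   break;
        }
    }
    return out;
}

// Builds one <param key="..." value="..."/> element (hypothetical helper).
std::string writeParam(const std::string& key, const std::string& value) {
    return "<param key=\"" + escapeAttribute(key) +
           "\" value=\"" + escapeAttribute(value) + "\"/>";
}
```

Escaping on output matters even for a fixed format: a value containing a quote or an ampersand would otherwise produce a telegram the other side cannot parse.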

Re: Read and write XML?

My company doesn't allow this, so I fear there's no point in discussing it. I'm pretty unhappy about it too. So I need to write it myself, but to be honest I have no idea how to start. Right now I fear it will be endless programming with CString::Find(), CString::Left(), CString::Right(), CString::Mid() and so on... I still hope there is a simpler way to program this parser.

Re: Read and write XML?

Originally Posted by grka

My company doesn't allow this, so I fear there's no point in discussing it. I'm pretty unhappy about it too. So I need to write it myself, but to be honest I have no idea how to start. Right now I fear it will be endless programming with CString::Find(), CString::Left(), CString::Right(), CString::Mid() and so on... I still hope there is a simpler way to program this parser.

I don't get it -- so having you write code that could (no, will) have bugs is OK, but highly tested libraries that are used by thousands of programmers and come with tons of documentation can't be used? Makes absolutely no sense.

First of all, you don't program a parser in an ad hoc style as you're doing now. Do you know what a recursive descent parser is? How about an LALR parser? Do you have the production rules for the XML grammar? All of these things are requirements for properly coding a parser. So it's either you study up on these topics or, as others have stated, you use one that has already been written.

Re: Read and write XML?

Thank you for the information about the SDK; I think that should be allowed.

Our company makes security software, and to be sure we know exactly what our software is doing we have to write everything ourselves. There are only a very few third-party components we are allowed to use, and for each of them some developers had to do a complete code review and a lawyer had to check all the licences. To keep the cost of this extra work as low as possible, the boss has ordered us not to use open source. If we wanted to, we would first have to write a paper explaining why it would be cheaper (including the costs of the lawyer and the code review) than writing it ourselves, and then we wouldn't be allowed to release our software until the lawyer and the code review gave their okay. That usually takes several months, and I don't have that long before our customer wants its software.

The reason for all this is that our company already had a problem with open source software in the past that cost it over 1 million dollars.

Re: Read and write XML?

Originally Posted by grka

Thank you for the information about the SDK; I think that should be allowed.

Our company makes security software, and to be sure we know exactly what our software is doing we have to write everything ourselves. There are only a very few third-party components we are allowed to use, and for each of them some developers had to do a complete code review and a lawyer had to check all the licences. To keep the cost of this extra work as low as possible, the boss has ordered us not to use open source. If we wanted to, we would first have to write a paper explaining why it would be cheaper (including the costs of the lawyer and the code review) than writing it ourselves, and then we wouldn't be allowed to release our software until the lawyer and the code review gave their okay. That usually takes several months, and I don't have that long before our customer wants its software.

The reason for all this is that our company already had a problem with open source software in the past that cost it over 1 million dollars.

Well, does your company know what it takes to write a parser correctly? This is where the "suits" should stay out of the programming business.

As I stated, you cannot write a parser in an ad hoc style and have it be maintainable, extensible, and understandable. If you go the ad hoc route, every bug you encounter will become a nightmare as you try to untangle the ad hoc code to accommodate the fix. To top that off, you can forget about adding anything to an ad hoc parser: you basically have to start from scratch again just to extend it.

There are formal ways of writing a parser. You start by writing out, step by step, the rules for what the parser must accept. These are called the production rules. Once you have them, you choose the kind of parser to write from them. The easiest to write is a recursive descent parser. With a parser written this way, bugs are usually easy to fix and extending it becomes much easier.

Re: Read and write XML?

Well I understand you sooo well but I have no chance to go against our rule here. Others tried it in the past already without success :-(

The telegram above is the only telegram I receive, so I don't need a fully flexible parser. I just need to know how many tag1 and tag2 elements are inside tag0 and then get the attributes above from these 3 tags. For tag2 I also need to know how many param tags are inside, along with their key and value attributes. There are no other tag names or attribute names in my telegram, and no other telegram formats at all. So I think the parser shouldn't be THAT large to write.

Re: Read and write XML?

It is really not helpful for me to discuss this because I can't change a rule in a large company. A lot of people in much higher positions than me have tried this before.

So please can we stop discussing things that I can't change and instead talk about how I can parse this single telegram? There won't be any different formats in this special case in the coming years. It's for one special customer, and in our business our software usually runs for years without any changes.

If I receive anything that doesn't fit this telegram format, it's a mistake. The only "variable thing" in this telegram is the number of tag1 and tag2 elements, and their order can differ. E.g. I can get 3 tag2, then 2 tag1, then 4 tag2 in one telegram. I want to read it and save the attributes in a struct, which is then added to a list.

Re: Read and write XML?

Originally Posted by grka

The telegram above is the only telegram I receive, so I don't need a fully flexible parser. I just need to know how many tag1 and tag2 elements are inside tag0 and then get the attributes above from these 3 tags. For tag2 I also need to know how many param tags are inside, along with their key and value attributes. There are no other tag names or attribute names in my telegram, and no other telegram formats at all. So I think the parser shouldn't be THAT large to write.

It doesn't matter whether the parser is flexible or not. The correct way to write the parser is exactly how I described. You should have a mockup of the rules you just stated above in a "formal" pseudo-code or language. Then from there, you write the code that parses the tokens and acts on them. You asked us, and that is exactly how you write a parser.

Not only would you write your parser correctly, you would also never need to throw it out the window if a requirement changes.

To do this simply, I would start by defining the required syntax, then produce a syntax diagram (http://en.wikipedia.org/wiki/Syntax_diagram), then decide what type of parser is required. If you can get by with one token of look-ahead, then a recursive descent parser (as per Paul's post #7) will be the easiest (http://en.wikipedia.org/wiki/Recursive_descent_parser). IMO I would steer clear of an 'ad hoc' approach and write an RDP if possible. These aren't too difficult to write once you have the syntax diagram (which is the hard bit!) and understand the basics.

Re: Read and write XML?

There won't be any different formats in this special case in the coming years. It's for one special customer and in our business our software usually runs for years without any changes.

The problem is that, with what you posted, there could be many different combinations of comments, orderings of the arguments in the tags, etc. This is why you can't just jump in and write CString::Find() and CString::Mid() calls in an ad hoc fashion. You will go crazy patching here and there to fix bugs, making the code an absolute mess.

When that customer adds a comment at the end, or the order of those arguments in the tag changes, etc. and your ad-hoc parser fails, that customer will not be happy. They will see a perfectly formed file, with the only difference being an extra comment, or the "index" value is placed before the "type" value.

If I receive anything that doesn't fit to this telegram format it's going to be a mistake.

Who will detect this mistake? Your code, or a phone call from your customer and then your manager asking why your parser failed (then you have to debug)? What if that string argument is not terminated with a double quote?

This is all the more reason to write the parser correctly. What if a "bad" telegram happens to defeat your ad hoc parser and your parser assumes the telegram is OK? The processing afterwards assumes the data is OK when it isn't, then what? The parser has to detect bad telegrams in a consistent, deterministic manner.

What you need to do first is write the rules of what makes a valid telegram using EBNF or another formal method. If the tag values must be in a certain order, that is part of the rules. Look at the links that 2kaud provided. Never code a parser without the rules being laid out in a formal way. "Word of mouth" rules or "the data looks like this example" don't count -- you need to know exactly what is expected, whether the order of the arguments matters or not, etc.
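As a rough illustration of going from rules to code, here is a sketch of a recursive descent parser for the telegram as it is described in this thread. The grammar in the comment is an assumption, since the actual telegram is not shown: it takes tag1 and param to be empty elements and attribute values to be double quoted, and it ignores the XML declaration, entities and namespaces. All type and function names are made up for the example.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Assumed rules (EBNF); each rule becomes one member function below.
//   telegram = "<tag0" attrs ">" { tag1 | tag2 } "</tag0>" ;
//   tag1     = "<tag1" attrs "/>" ;
//   tag2     = "<tag2" attrs ">" { param } "</tag2>" ;
//   param    = "<param" attrs "/>" ;
//   attrs    = { name "=" '"' text '"' } ;   (any order, no duplicates)
typedef std::map<std::string, std::string> Attrs;
struct Element { std::string name; Attrs attrs; std::vector<Attrs> params; };
struct Telegram { Attrs attrs; std::vector<Element> elements; };

class TelegramParser {
public:
    explicit TelegramParser(const std::string& text) : s_(text), pos_(0) {}

    Telegram telegram() {
        Telegram t;
        skipMisc(); expect("<tag0"); t.attrs = attrs(); expect(">");
        for (skipMisc(); !peek("</tag0>"); skipMisc()) {
            if (peek("<tag1")) t.elements.push_back(tag1());
            else if (peek("<tag2")) t.elements.push_back(tag2());
            else fail("expected <tag1>, <tag2> or </tag0>");
        }
        expect("</tag0>");
        skipMisc();
        if (pos_ != s_.size()) fail("unexpected text after </tag0>");
        return t;
    }

private:
    Element tag1() {
        Element e; e.name = "tag1";
        expect("<tag1"); e.attrs = attrs(); expect("/>");
        return e;
    }
    Element tag2() {
        Element e; e.name = "tag2";
        expect("<tag2"); e.attrs = attrs(); expect(">");
        for (skipMisc(); peek("<param"); skipMisc()) e.params.push_back(param());
        expect("</tag2>");
        return e;
    }
    Attrs param() {
        expect("<param"); Attrs a = attrs(); expect("/>");
        return a;
    }
    Attrs attrs() {
        Attrs a;
        for (;;) {
            skipSpace();
            std::size_t start = pos_;
            while (pos_ < s_.size() && isNameChar(s_[pos_])) ++pos_;
            if (pos_ == start) return a;              // no further attribute
            std::string name = s_.substr(start, pos_ - start);
            expect("="); expect("\"");
            std::size_t end = s_.find('"', pos_);     // closing quote
            if (end == std::string::npos || s_.find('<', pos_) < end) fail("bad attribute value");
            if (a.count(name)) fail("duplicate attribute " + name);
            a[name] = s_.substr(pos_, end - pos_);
            pos_ = end + 1;
        }
    }
    static bool isNameChar(char c) {
        return std::isalnum(static_cast<unsigned char>(c)) || c == '_' || c == '-';
    }
    void skipSpace() {
        while (pos_ < s_.size() && std::isspace(static_cast<unsigned char>(s_[pos_]))) ++pos_;
    }
    void skipMisc() {                                 // whitespace and <!-- comments -->
        for (skipSpace(); s_.compare(pos_, 4, "<!--") == 0; skipSpace()) {
            std::size_t end = s_.find("-->", pos_ + 4);
            if (end == std::string::npos) fail("unterminated comment");
            pos_ = end + 3;
        }
    }
    bool peek(const std::string& lit) {
        skipSpace();
        if (s_.compare(pos_, lit.size(), lit) != 0) return false;
        std::size_t next = pos_ + lit.size();
        // A literal ending in a name must not run on: "<tag1" is not "<tag10".
        return !(isNameChar(lit[lit.size() - 1]) && next < s_.size() && isNameChar(s_[next]));
    }
    void expect(const std::string& lit) {
        if (!peek(lit)) fail("expected " + lit);
        pos_ += lit.size();
    }
    void fail(const std::string& what) const { throw std::runtime_error(what); }
    std::string s_; std::size_t pos_;
};
```

`TelegramParser(text).telegram()` either returns the parsed structure or throws `std::runtime_error`. A comment between elements or a changed attribute order is accepted, while a missing closing quote or an unknown tag is rejected rather than silently misread. If the real rules differ, for example if the attributes must come in a fixed order, the grammar comment is the place to change first.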

I will give you a real life example: Some time ago a colleague of mine had to write a specialized expression evaluator, and he did not want to use libraries. So he did the same thing you tried to do -- writing Find() functions and all sorts of searching, endless if() and switch() statements, all without a plan. Eventually the code failed when he was told that the expression could contain parentheses.

I came along and looked at what he was doing, and suggested he first write the rules of the expression. He was not familiar with this step, so I spent about an hour writing the production rules for the expression evaluator. I then asked him to rewrite the parser using a recursive descent model, i.e. write functions that followed the production rules. Within a day or so, he had a working expression evaluator in C++, with complete error checking for invalid expressions.
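The colleague's actual code is not available, so the following is only a sketch of the same idea: three assumed production rules for arithmetic with parentheses, and one function per rule. `Evaluator` and its member names are invented for the example.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Production rules (an assumed, minimal expression grammar):
//   expr   = term   { ("+" | "-") term } ;
//   term   = factor { ("*" | "/") factor } ;
//   factor = number | "(" expr ")" | "-" factor ;
class Evaluator {
public:
    explicit Evaluator(const std::string& text) : s_(text), pos_(0) {}

    double evaluate() {
        double v = expr();
        if (peek() != '\0') throw std::runtime_error("unexpected trailing input");
        return v;
    }

private:
    double expr() {
        double v = term();
        for (;;) {
            if (accept('+')) v += term();
            else if (accept('-')) v -= term();
            else return v;
        }
    }
    double term() {
        double v = factor();
        for (;;) {
            if (accept('*')) v *= factor();
            else if (accept('/')) {
                double d = factor();
                if (d == 0.0) throw std::runtime_error("division by zero");
                v /= d;
            }
            else return v;
        }
    }
    double factor() {
        if (accept('-')) return -factor();
        if (accept('(')) {
            double v = expr();                      // parentheses just recurse
            if (!accept(')')) throw std::runtime_error("expected ')'");
            return v;
        }
        return number();
    }
    double number() {
        peek();                                     // skip leading spaces
        std::size_t start = pos_;
        while (isDigit()) ++pos_;
        if (pos_ == start) throw std::runtime_error("expected a number");
        if (pos_ < s_.size() && s_[pos_] == '.') { ++pos_; while (isDigit()) ++pos_; }
        return std::strtod(s_.substr(start, pos_ - start).c_str(), 0);
    }
    bool isDigit() const {
        return pos_ < s_.size() && std::isdigit(static_cast<unsigned char>(s_[pos_]));
    }
    char peek() {                                   // next non-space char, '\0' at end
        while (pos_ < s_.size() && std::isspace(static_cast<unsigned char>(s_[pos_]))) ++pos_;
        return pos_ < s_.size() ? s_[pos_] : '\0';
    }
    bool accept(char c) {
        if (peek() != c) return false;
        ++pos_;
        return true;
    }
    std::string s_; std::size_t pos_;
};
```

Supporting parentheses costs four lines in `factor()` because the grammar already says where they belong, which is the point of the anecdote. A call such as `Evaluator("2 * (3 + 4)").evaluate()` returns the value, and malformed input throws `std::runtime_error`.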

The bottom line is this -- when you do things the right way the first time, there are no surprises later on.

Re: Read and write XML?

1) If your program runs on Windows, you are already using libraries made by Microsoft.
2) I understand the security software issue, but I doubt it extends to interfacing your app with the outside world. That part has no security at all other than the actual data transfer, which you'll have to write yourself anyway.
3) MSXML is part of Windows itself, and Windows uses it for many parts of its UI and IE. Its headers are part of the Windows SDK. Are you telling me you can't even use this? Then what is the difference between that and using something like the CreateFile() API to open a file, which is just as much a part of the Windows API?

If you want a fully compliant XML parser... expect to spend weeks, even months, of development time to cover all the features XML offers and still perform well enough for your needs.

If you only want "simple xml" with a specific guaranteed subset (mainly leaving out namespaces, entities, multiple encodings, validation, ...), then writing a parser may be doable in a few days.

If you only want to parse one very specific XML document with a fixed layout, then you can do it with a regex (which is also an external library, by the way, even though it's part of the C++ standard).
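A minimal sketch of the regex route, using the `<regex>` header from the standard library. It assumes that `key` always precedes `value` in a param tag, that values are double quoted and contain no quote characters, and that param is written as an empty element. `extractParams` is a hypothetical helper name.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Collects (key, value) pairs from <param key="..." value="..."/> elements.
// Only matches when key comes before value; any other layout is skipped.
std::vector<std::pair<std::string, std::string> > extractParams(const std::string& xml) {
    static const std::regex paramTag(
        "<param\\s+key\\s*=\\s*\"([^\"]*)\"\\s+value\\s*=\\s*\"([^\"]*)\"\\s*/>");
    std::vector<std::pair<std::string, std::string> > result;
    std::sregex_iterator it(xml.begin(), xml.end(), paramTag), end;
    for (; it != end; ++it)
        result.push_back(std::make_pair((*it)[1].str(), (*it)[2].str()));
    return result;
}
```

Note the trade-off: a param tag with the attributes swapped is simply not matched and goes unreported, so this only fits if the layout really is guaranteed.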
