Operating Files using c...???

Hello friends, is it possible to read a Microsoft Office document (.docx format) in C?
I know it's a complicated task. The file to be read is well formatted: it contains tables and text in different font sizes, but no images. How can I solve this? I just want to read the document and display it on the console screen. Thanks in advance!

Is it possible to read a Microsoft Office document (.docx format) in C?

Yes.

Originally Posted by Rehman khan

I know it's a complicated task. The file to be read is well formatted: it contains tables and text in different font sizes, but no images. How can I solve this? I just want to read the document and display it on the console screen. Thanks in advance!

You could investigate the Office Open XML format (not to be confused with the, um, more open OpenDocument format). A caveat is that, from what I have heard, Microsoft does not quite implement the format according to the specification that it pushed for standardisation itself, but hopefully that will not be a problem. Alternatively, if you can get the tables exported in, say, CSV format, then your life will be much easier.
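To give an idea of how little code the CSV route needs, here is a minimal sketch in C. It assumes the export contains no quoted fields and no commas inside a field, and the names split_csv_line and print_csv_table are made up for this example rather than taken from any library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Split one CSV line in place by overwriting each comma with '\0'.
   ASSUMPTION: no quoted fields and no commas inside a field.
   Returns the number of fields stored in fields[]. */
size_t split_csv_line(char *line, char *fields[], size_t max_fields)
{
    size_t count = 0;
    char *p = line;

    assert(line != NULL && fields != NULL);

    /* Drop a trailing "\n" or "\r\n" left behind by fgets(). */
    line[strcspn(line, "\r\n")] = '\0';

    while (count < max_fields) {
        fields[count++] = p;
        p = strchr(p, ',');
        if (p == NULL)
            break;
        *p++ = '\0';   /* terminate this field, step to the next one */
    }
    return count;
}

/* Print every row of an already opened CSV stream as fixed-width columns. */
void print_csv_table(FILE *in, FILE *out)
{
    char line[1024];
    char *fields[32];

    while (fgets(line, sizeof line, in) != NULL) {
        size_t n = split_csv_line(line, fields, 32);
        for (size_t i = 0; i < n; i++)
            fprintf(out, "%-16s", fields[i]);
        fputc('\n', out);
    }
}
```

Calling print_csv_table() with a FILE * from fopen() and stdout as the second argument would dump the table to the console. A real export with quoted fields needs a proper CSV parser instead of this splitter.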

Have a look at the code complexity for any major file format. You're literally talking about writing an OpenOffice/LibreOffice filter or thereabouts to make it display. Can you pull SOME plain-text information out of an office file? Yeah, quite easily. A .docx file is just a ZIP archive, so it can be unpacked with a ZIP library (zlib handles the DEFLATE compression inside it), but *interpreting* the contents is another matter entirely. You have to parse one of the most hideously documented and inconsistent standards of XML data known to man in order to work out what text is displayed where and in what format. And that's *before* you even touch on things like images, tables, etc.
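As a rough illustration of the "quite easily" part: once the main XML part (normally word/document.xml) has been unpacked from the archive, the visible text sits inside <w:t> elements and each paragraph is closed by </w:p>. The sketch below is a naive tag scanner, not an XML parser. The function name docx_xml_to_text is invented for this example, and it ignores entities such as &amp;, tabs, fonts and table layout entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the text found inside <w:t>...</w:t> elements of a WordprocessingML
   string into out, writing '\n' at every </w:p>.
   ASSUMPTIONS: no '>' inside attribute values; entities are NOT decoded.
   Returns the number of characters written (excluding the terminator). */
size_t docx_xml_to_text(const char *xml, char *out, size_t outsz)
{
    size_t n = 0;
    int in_text = 0;

    assert(xml != NULL && out != NULL && outsz > 0);

    while (*xml != '\0' && n + 1 < outsz) {
        if (*xml == '<') {
            const char *end = strchr(xml, '>');
            if (end == NULL)
                break;   /* malformed input: unterminated tag */
            /* Match "<w:t>" or "<w:t attr...>", but not <w:tbl>, <w:tab/>,
               <w:tr> or <w:tc>, which share the same prefix. */
            if (strncmp(xml, "<w:t", 4) == 0 && (xml[4] == '>' || xml[4] == ' '))
                in_text = (end[-1] != '/');   /* self-closing has no text */
            else if (strncmp(xml, "</w:t>", 6) == 0)
                in_text = 0;
            else if (strncmp(xml, "</w:p>", 6) == 0)
                out[n++] = '\n';
            xml = end + 1;
        } else {
            if (in_text)
                out[n++] = *xml;
            xml++;
        }
    }
    out[n] = '\0';
    return n;
}
```

Table cells come out as one paragraph per line with no column structure, which is exactly the gap between "pulling out some text" and "displaying the document".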

For older versions of Office, the problem was even worse. Projects like antiword and wvWare are incredibly complex just to do simple extractions of data from those files.

If you're set on doing this, you're going to need to learn the ZIP container format (plus zlib for the compression), XML, and read the documentation of the docx formats. Good luck doing that. To my knowledge, outside of closed-source offerings from Microsoft itself, there isn't anything short of a full office suite that's capable of displaying an MS Office document anywhere near reliably (and even there, it's cited as the worst-operating part of suites like OpenOffice/LibreOffice because there are just so many things that aren't documented and don't work how they should).
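For a feel of the first step, a .docx file is a ZIP container, and every entry in it starts with a 30-byte local file header beginning with the bytes "PK\3\4". The sketch below only inspects the first header in a buffer that has already been read from the file. The function name zip_first_entry is hypothetical. Actually decompressing an entry would be a job for zlib or a ZIP library such as libzip, which is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Read a little-endian 16-bit value from a byte buffer. */
static unsigned le16(const unsigned char *p)
{
    return (unsigned)p[0] | ((unsigned)p[1] << 8);
}

/* Inspect the first ZIP local file header in buf.
   On success, copies the entry name into name (truncating if needed),
   stores the compression method (0 = stored, 8 = DEFLATE) in *method,
   and returns 0. Returns -1 if buf does not start with a complete header. */
int zip_first_entry(const unsigned char *buf, size_t len,
                    char *name, size_t namesz, unsigned *method)
{
    static const unsigned char sig[4] = { 'P', 'K', 3, 4 };
    size_t name_len;

    assert(buf != NULL && name != NULL && namesz > 0 && method != NULL);

    if (len < 30 || memcmp(buf, sig, 4) != 0)
        return -1;

    *method = le16(buf + 8);     /* compression method field */
    name_len = le16(buf + 26);   /* file name length field */
    if (len - 30 < name_len)
        return -1;               /* name runs past the end of the buffer */
    if (name_len >= namesz)
        name_len = namesz - 1;   /* truncate to fit the caller's buffer */

    memcpy(name, buf + 30, name_len);   /* name starts right after the header */
    name[name_len] = '\0';
    return 0;
}
```

Reading the first few hundred bytes of the file with fread() and passing them in is enough to try it. In many Word-produced files the first entry is [Content_Types].xml, but that ordering is not something to rely on.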

If you want to display a Word document, use Word. You could probably do some old-style embedding tricks like the way that modern browsers "embed" PDF and Java plugin content into their windows (used to be called OLE in my day, but apparently that's old-hat now), but that would need Word on the PC and is no different to just opening up the document in Word, really. Otherwise, you have an awfully long, rocky road in front of you. Virtually nobody in the world has seriously attempted it in the last 20 years, except for large organisations with huge codebases: millions of lines of code developed over decades. Even the OpenOffice/LibreOffice import filters came from StarOffice originally (a commercial product from Star Division, which was bought by Sun, the company also responsible for Java, and later ended up at Oracle).
