Hi, I can open, edit, and close a text file (.txt) with no problem. How can I do the same with an MS Word file (.doc)? Right now I'm just trying to print the contents of a simple Word file that contains "This is a test file for perl", using this:

#!C:\Perl\bin\perl.exe
use CGI qw(:all);
print header;
open(TF, "perltest.doc") || die "Can't open perltest.doc: $!";
$text = <TF>;
close(TF);
print "perltest is -- $text";

First off, an MS Word doc is not a plain text document. It is a rich text document in the most general sense (Microsoft calls the format Rich Text+, as they put other things in the file, like revision tracking).

You need a module that will parse the data in the Word document down to a text format (if one exists; if it does, I don't know where to get it). I doubt that such a module exists, as converting RTF down to text is an ugly process and removes all document formatting. No one ever gets what they think they will get when running documents through conversions.

Maybe someone knows more on the subject. My advice is not to beat your head on this too much. But then again, don't give up either!

I found this and read a little in the Perl docs, but I have not had time to try it. It might add two lines of text to a *.doc file and then save it as a *.doc file. Don't worry about checking this unless you just want to, because like I said, I haven't had time to check it myself yet.

use strict;
use Win32::OLE;
use Win32::OLE::Const 'Microsoft Word';

my(@line) = ('Here is the first line of text.',
             'Here is the second line of text.');
my($outputFile) = 'perltest.doc';
unlink($outputFile);

Don't forget these files are very complex. airo's example actually opens Word (invisibly, though) using OLE/COM. A Word document can be saved as RTF (which is a bit like HTML and is readable as ASCII). Native Word files, however, are binary. That means individual bytes (or combinations of them) in the file represent data: for example, 4 bytes for a size, then 4 bytes telling the size of the following text, then ASCII characters containing the text (maybe the author field in Word), and so on. If Microsoft has documented this structure, you can maybe use it. Still, it's difficult to make use of that in Perl (with unpack). Word documents can also contain OLE objects, VBA code, styles, images, and settings. You need to take care of all of those.
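To give an idea of what "binary with unpack" means, here is a minimal sketch. The layout is made up for illustration (a 4-byte little-endian length followed by that many ASCII bytes, repeated); it is NOT the real Word format, and read_records is just a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical record layout (an assumption, not the real .doc format):
#   4 bytes  unsigned little-endian length N
#   N bytes  ASCII text
sub read_records {
    my ($data) = @_;
    my @fields;
    my $pos = 0;
    while ($pos + 4 <= length $data) {
        # Decode the 4-byte length prefix.
        my $len = unpack 'V', substr($data, $pos, 4);
        $pos += 4;
        last if $pos + $len > length $data;   # truncated record, stop
        push @fields, substr($data, $pos, $len);
        $pos += $len;
    }
    return @fields;
}

# Build a sample buffer with the same layout, then parse it back.
my $buffer = pack('V/a*', 'Some Author')
           . pack('V/a*', 'This is a test file for perl');
my @fields = read_records($buffer);
print "$_\n" for @fields;
```

Even for this toy layout you have to track offsets and guard against truncated data yourself. A real Word file has many more kinds of records than this, which is why going through Word itself is usually easier.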

Either use plain text files, or make an OLE/COM connection using the sample from airo above. If you program in VBA (in Word), you'd recognize these lines.
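For the original question (reading the text rather than writing it), the OLE route might look roughly like this. It is an untested sketch that assumes Windows with Word and the Win32::OLE module installed; read_word_text is a made-up helper name, and the file name comes from the command line.

```perl
use strict;
use warnings;
use File::Spec;

# Return the plain text of a Word document by asking Word itself via OLE.
# Win32::OLE is loaded lazily so the script still compiles where it is missing.
sub read_word_text {
    my ($file) = @_;
    require Win32::OLE;

    # Word expects an absolute path.
    my $path = File::Spec->rel2abs($file);

    # Start Word; 'Quit' is called automatically if the object is destroyed.
    my $word = Win32::OLE->new('Word.Application', 'Quit')
        or die "Can't start Word: " . Win32::OLE->LastError();
    $word->{Visible} = 0;

    my $doc = $word->Documents->Open($path)
        or die "Can't open $path: " . Win32::OLE->LastError();

    # Content is the whole main document range; Text is its unformatted text.
    my $text = $doc->Content->Text;

    $doc->Close(0);   # 0 = wdDoNotSaveChanges
    $word->Quit;
    return $text;
}

# Only run when a file name is given, e.g.: perl readword.pl perltest.doc
my $file = shift @ARGV;
print "perltest is -- ", read_word_text($file), "\n" if defined $file;
```

The calls are the same Documents.Open and Content.Text you would use in VBA, which is why the VBA documentation is the best guide to what else is available.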