Scan Microsoft Office files

I am trying to put together a script that will scan Microsoft Office
files and store "keywords" for each one, so that the files are
searchable by content and not just by title.

If you open a Word file with Perl and look at the raw contents, it is
basically text mixed in with a lot of binary formatting data. I was
hoping someone here might know of a module that can strip out the
formatting data and keep just the plain-text words. Or is a regex my
only option?


"Will Fawcett" <> wrote in message
news:...
>I am trying to put together a script that will allow me to scan
> Microsoft Office files and store "keywords" for those files so they
> are searchable by content not just title.
>
> If you open a word file with Perl and look at the actual source it is
> basically a text file with a bunch of bogus code. I was hoping someone
> here might have heard of a module out there that can step out the
> ambiguous code out and just store plain text words. Or is RegEx my
> only option?
>
> -Will

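use Win32::OLE qw(in);   # 'in' lets foreach walk an OLE collection

# Attach to the running Word instance and take the paragraphs of its open document.
my $word  = Win32::OLE->GetActiveObject('Word.Application') or die "Word is not running\n";
my $paras = $word->ActiveDocument->Paragraphs;
foreach my $para ( in $paras ) {
    my $style = $para->Style->{NameLocal};   # paragraph style name
    my $text  = $para->Range->{Text};        # plain text of the paragraph
    print "$style\t$text\n";
}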
This assumes Word is running and has a document open. The Word VBA help
files list all the methods and properties, and a search on Win32::OLE
will bring up many tutorials and references.
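
Once you have the plain text out of Word, the "keywords" step from the original question could look roughly like the sketch below. This is only one way to do it, not something from the posts above: extract_keywords is a made-up helper name, and the stop-word list and the minimum word length of 3 are assumptions you would tune for your own documents. It runs on any string, so you can try it without Word installed.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of any module): reduce a block of
# extracted text to its distinct keywords, most frequent first.
# The stop-word list and default minimum length are illustrative only.
my %STOP = map { $_ => 1 } qw(the and for are that this with not but from);

sub extract_keywords {
    my ($text, $min_len) = @_;
    $min_len = 3 unless defined $min_len;

    my %count;
    # Pull out runs of letters (allowing one inner apostrophe), lowercased.
    for my $word ( lc($text) =~ /[a-z]+(?:'[a-z]+)?/g ) {
        next if length($word) < $min_len or $STOP{$word};
        $count{$word}++;
    }

    # Descending frequency, then alphabetical so the order is stable.
    return sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count;
}

my @keywords = extract_keywords(
    "The report covers the budget. Budget figures are not final."
);
print join(", ", @keywords), "\n";   # prints: budget, covers, figures, final, report
```

In the paragraph loop you would feed it each $text value (or all of them joined together) instead of the sample sentence, and store the returned list against the file name.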
