Simple Java Search Engine for Word Docs

I am a complete newbie when it comes to programming; my only real experience is a program I wrote in Visual Basic 3 years ago. I guess I know the theory behind programming concepts, but I have no idea how to apply it in Java. Can someone on here help me out?

Basically, I need to create a search engine that searches for words or groups of words in Word documents. In essence, I need to program the Find (Ctrl + F) feature in Microsoft Word, so my search engine will also need to report where in the document each word was found.

Can anyone help me out? I need this done by Sunday. Is this even possible? I am panicking so much...

I understand I will need some code to first open the document and probably put its contents in an array to search through. However, I have no idea how to do even that.

Generally, I would recommend using the Apache POI library to read Microsoft Office files, including Word documents. http://poi.apache.org/ Or if running on Windows, one could use ActiveX to drive the Word program directly. Neither of these is very accessible for a beginner.

Something that might be possible for you is to copy the text from Word to text files by hand. You could probably write a Java program to search text files without having to be an expert.
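To give an idea of the scale involved, here is a minimal sketch of such a text-file search. The class name LineSearch and the method findLines are made up for illustration, and it assumes a plain substring match that ignores case (so "red" would also match "bored"); only standard JDK calls are used.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class; the names are illustrative only.
class LineSearch {

    // Returns the 1-based numbers of the lines that contain the query,
    // ignoring upper/lower case. This is a substring match, not a whole-word match.
    static List<Integer> findLines(List<String> lines, String query) {
        String needle = query.toLowerCase(Locale.ROOT);
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (lines.get(i).toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(i + 1); // people count lines from 1, not 0
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to a text file, args[1] is the search term.
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        for (int lineNumber : findLines(lines, args[1])) {
            System.out.println("line " + lineNumber + ": " + lines.get(lineNumber - 1));
        }
    }
}
```

Keeping the search in its own method that takes a list of lines means it can be tried out on a few hand-written strings before any file reading is involved.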

Or you could use third-party tools. Google Desktop is good at searching Word documents.

But I only want to search through .txt files, so there shouldn't be the mass of coding there is with XML.

hfx642

7 Years Ago

Your original post said "I need to basically create a search engine to search for words or group of words in Word Documents."
Are you searching Word Documents or a text file?
There is a BIG difference!

So far I have managed to make my code read a .txt file and print it out, but I am not sure where to go from here. If I post my code here, would you be able to give me some insight into where I should head next?

What end result do you want to achieve?

If you could imagine working infinitely fast, and having infinite patience, could you imagine how you could do this work by hand? (You have to understand how something can be done "by hand" before you can teach the computer to do it.)

The end result seems quite far-fetched at the moment, but I would like the program to be able to:
1) search for a word
2) show the location of the word (the line it is written on will do)
3) rank each result against what the user wants (for example, if I am looking for "red car" and the words "red" and "car" are a few words away from each other, then that result would get a low score)
4) display the results in order
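Goal 3 could be sketched roughly as follows. This is only one possible reading of the idea: the class ProximityScore and its methods are hypothetical names, and the scoring formula (1 divided by the number of word positions between the two words) is an assumption chosen for simplicity, not something fixed by the requirements.

```java
import java.util.Locale;

// Hypothetical scoring helper; names and formula are illustrative assumptions.
class ProximityScore {

    // Smallest gap, in word positions, between any occurrence of a and any
    // occurrence of b on the line. Returns -1 if either word is missing.
    static int minDistance(String line, String a, String b) {
        String wordA = a.toLowerCase(Locale.ROOT);
        String wordB = b.toLowerCase(Locale.ROOT);
        String[] words = line.toLowerCase(Locale.ROOT).split("\\W+");
        int lastA = -1; // position of the most recent occurrence of a
        int lastB = -1; // position of the most recent occurrence of b
        int best = -1;
        for (int i = 0; i < words.length; i++) {
            if (words[i].equals(wordA)) lastA = i;
            if (words[i].equals(wordB)) lastB = i;
            if (lastA >= 0 && lastB >= 0 && lastA != lastB) {
                int distance = Math.abs(lastA - lastB);
                if (best < 0 || distance < best) best = distance;
            }
        }
        return best;
    }

    // Adjacent words score 1.0, words further apart score less,
    // and a line missing either word scores 0.
    static double score(String line, String a, String b) {
        int distance = minDistance(line, a, b);
        return distance < 0 ? 0.0 : 1.0 / distance;
    }
}
```

With this formula, "The red car is fast" scores higher for "red" and "car" than "A red and very shiny car", and sorting the lines by score in descending order would cover goal 4.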

I understand that there will be loops where each line is scanned individually to see if it contains any results. Once a result is found, it will have to be logged in a buffer. Once the whole document has been scanned, the results will be shown in a table.
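That plan might look something like the sketch below. The class SearchTable, the Hit record, and the placeholder score (how many times the word appears on the line) are all assumptions made for illustration; a different scoring rule could be dropped in without changing the overall loop, buffer and table structure.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of the scan -> buffer -> table plan.
class SearchTable {

    // One buffered result: where the match was found and how it scored.
    static final class Hit {
        final int lineNumber;
        final int score;
        final String text;

        Hit(int lineNumber, int score, String text) {
            this.lineNumber = lineNumber;
            this.score = score;
            this.text = text;
        }
    }

    // Placeholder score: how many words on the line equal the query (case ignored).
    static int countMatches(String line, String query) {
        String needle = query.toLowerCase(Locale.ROOT);
        int count = 0;
        for (String word : line.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (word.equals(needle)) count++;
        }
        return count;
    }

    // Scans every line, buffers the matching ones, then sorts best score first.
    static List<Hit> search(List<String> lines, String query) {
        List<Hit> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            int score = countMatches(lines.get(i), query);
            if (score > 0) hits.add(new Hit(i + 1, score, lines.get(i)));
        }
        hits.sort(Comparator.comparingInt((Hit h) -> h.score).reversed());
        return hits;
    }

    // Formats the buffered hits as a simple fixed-width text table.
    static String toTable(List<Hit> hits) {
        StringBuilder table = new StringBuilder(String.format("%-6s%-7s%s%n", "Line", "Score", "Text"));
        for (Hit h : hits) {
            table.append(String.format("%-6d%-7d%s%n", h.lineNumber, h.score, h.text));
        }
        return table.toString();
    }
}
```

Printing the result of toTable(search(lines, "red")) would then give one row per matching line, with the highest-scoring lines at the top.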