I have a bunch of PDF documents that I need to use in a website I am making, and I need to be able to search them as well. Is it better to save these files in the database or in a folder on disk? In either case, how do I search them? I will basically be searching for 1 or 2 words and returning the list of PDFs that contain them. What is the best and easiest way to do all of this? Each PDF will change once a year at most, sometimes less often, and I do not need to keep revision history.

I will be storing no more than 1000 documents.
– RandomBen, Mar 1 '10 at 16:27

Voyager - This is different from all of those. My real concern is searching the files: whether SQL Server does a better job, or whether there is a better way to do it by searching a folder of files.
– RandomBen, Mar 1 '10 at 16:31

4 Answers

You can store the PDFs in a table using a varbinary field and an extension field. Then you can take advantage of the full-text search engine to search inside the PDFs. You will have to install a PDF IFilter on your SQL Server. I do not know if this is the easiest way to do it, but I know it works well. I am using that schema to store hundreds of thousands of documents and it performs great.

This is the same argument, over and over again, about saving things in the file system vs. saving them in the database. Sadly, there is no right or wrong answer; it all depends on the scope of your project. Take a look at this Stack Overflow question. It's about saving images in a DB, but it's the same principle.

As people say, I suppose there are many advantages and disadvantages either way, but if I had to make this decision, I definitely wouldn't save PDF files in the database. I'm not talking only about efficiency; I'm thinking about what you would do in the future if you had to change your database engine, for example. I always try to use the most standard database types possible. =)

I would probably make a database table where I map document information such as the name, a description, who uploaded it, etc. to a filename. I would not store the entire files in the database.

This way, you need to keep the files on disk synchronized with the database. When someone deletes a file (using the web interface), remove the entry from the database and delete the file on disk.
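To make the idea concrete, here is a minimal sketch of a metadata table that maps document information to a filename, with a delete routine that removes both the row and the file. It is written in Python with SQLite purely for illustration (the thread is about SQL Server and .NET), and the table name, column names, and function names are all assumptions, not anything from the original answer.

```python
import sqlite3
from pathlib import Path


def open_catalog(db_path):
    """Open (or create) the metadata table; all names here are illustrative."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS documents (
               id INTEGER PRIMARY KEY,
               name TEXT NOT NULL,
               description TEXT,
               uploaded_by TEXT,
               filename TEXT NOT NULL UNIQUE
           )"""
    )
    conn.commit()
    return conn


def add_document(conn, folder, filename, data, name, description="", uploaded_by=""):
    """Write the file into the folder, then record its metadata row."""
    (Path(folder) / filename).write_bytes(data)
    cur = conn.execute(
        "INSERT INTO documents (name, description, uploaded_by, filename) "
        "VALUES (?, ?, ?, ?)",
        (name, description, uploaded_by, filename),
    )
    conn.commit()
    return cur.lastrowid


def delete_document(conn, folder, doc_id):
    """Remove the database row and the file it points to.

    Returns False if no row with that id exists.
    """
    row = conn.execute(
        "SELECT filename FROM documents WHERE id = ?", (doc_id,)
    ).fetchone()
    if row is None:
        return False
    conn.execute("DELETE FROM documents WHERE id = ?", (doc_id,))
    conn.commit()
    # missing_ok (Python 3.8+) tolerates a file that was already removed by hand
    Path(folder, row[0]).unlink(missing_ok=True)
    return True
```

The sketch does not handle every failure case (for example, a crash between the row delete and the file delete), so a periodic check that compares the folder contents against the table may still be worthwhile.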

Images are one thing, and I understand that. This deals more specifically with searching the documents, though, which is what I was wondering about. I was not even sure if it was possible to search documents in a folder from a .NET website.
– RandomBen, Mar 1 '10 at 16:29
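On the question of whether a folder of files can be searched at all: one possible approach, sketched below in Python rather than .NET, is to extract the text of each file once, build a small word index, and answer 1 or 2 word queries from that index. The `extract_text` argument is a hypothetical callable that you would have to supply (for example, backed by a PDF text-extraction library or an IFilter); nothing in this thread specifies one. With at most 1000 documents that change about once a year, rebuilding the index whenever a file changes should be cheap.

```python
import re
from pathlib import Path


def build_index(folder, extract_text, pattern="*.pdf"):
    """Map each lowercase word to the set of file names that contain it.

    extract_text is a caller-supplied function (an assumption of this sketch)
    that takes a Path and returns the document's text as a string.
    """
    index = {}
    for path in sorted(Path(folder).glob(pattern)):
        words = set(re.findall(r"\w+", extract_text(path).lower()))
        for word in words:
            index.setdefault(word, set()).add(path.name)
    return index


def search(index, query):
    """Return the sorted names of files that contain every word in the query."""
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return []
    matches = set(index.get(terms[0], set()))
    for term in terms[1:]:
        matches &= index.get(term, set())
    return sorted(matches)
```

This matches whole words only and requires all query words to appear somewhere in the file; phrase matching, stemming, and ranking are left out to keep the sketch short.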