12 Answers

Moving your images into a database and writing the code to extract them may be more hassle than it's worth. It all comes back to the business requirements: the need to protect the images, or the requirement for performance.

I'd suggest sticking to the tried and true system of storing the filepath or directory location in the DB, and keeping the files on disk. Here's why:

A filesystem is easier to maintain. Some thought has to be put into the structure and organization of the images, e.g. a directory for each customer, a subdirectory for each [Attribute-X], and another subdirectory for each [Attribute-Y]. Keeping too many images in one directory (i.e. hundreds of thousands) will end up slowing down file access.
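The directory layout described above could be sketched like this (the attribute names and sharding scheme are illustrative assumptions, not from the answer):

```python
import os

def image_path(root, customer_id, attr_x, attr_y, filename):
    """Build a sharded path so no single directory accumulates
    hundreds of thousands of files. customer_id/attr_x/attr_y are
    hypothetical grouping attributes."""
    return os.path.join(root, str(customer_id), attr_x, attr_y, filename)

# The DB would store this path (or its relative portion) alongside metadata.
path = image_path("/srv/images", 42, "products", "thumbnails", "widget.jpg")
```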

If the idea of storing in a DB is a counter-measure against filesystem loss (e.g. a disk goes down, or a directory is deleted by accident), then I'd counter with the suggestion that if you use source control, it's no problem to retrieve any lost/missing/deleted files.

If you ever need to scale and move to a content-distribution scenario, you'd have to move back out to the filesystem or perform a big extract to push everything out to the providers.

As for having your images stored as varbinary or a blob of some kind (depending on your platform), I'd suggest it's more hassle than it's worth. The effort you'll need to expend means more code that you'll have to maintain, unit test, and defend against defects.

If your environment can support SQL Server 2008, you can have the best of both worlds with its new FILESTREAM datatype.

It doesn't matter where you store them in terms of preventing "theft". If you deliver the bytestream to a browser, it can be intercepted and saved by anyone. There's no way to prevent that (I'm assuming you're talking about delivering the images to a browser).

If you're just talking about securing images on the machine itself, it also doesn't matter. The operating system can be locked down as securely as the database, preventing anyone from getting at the images.

In terms of performance (when presenting images to a browser), I personally think it'll be faster serving from a filesystem. You have to present the images in separate HTTP transactions anyway, which would almost certainly require multiple trips to the database. I suspect (although I have no hard data) that it would be faster to store the image URLs in the database which point to static images on the file system - then the act of getting an image is a simple file open by the web server rather than running code to access the database.

You're probably going to have to get a whole ton of "but the filesystem is a DB" answers. This isn't one of them.

The filesystem option depends on many factors; for example, does the server have write permissions to the directory? (And yes, I have seen servers where Apache couldn't write to DocumentRoot.)

If you want 100% cross-compatibility across platforms, then the Database option is the best way to go. It'll also let you store image-related metadata such as a user ID, the date uploaded, and even alternate versions of the same image (such as cached thumbnails).
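As a rough sketch of that metadata idea (the table and column names are illustrative assumptions; SQLite is used here only for brevity):

```python
import sqlite3

# In-memory DB for illustration; a real app would use its own RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE images (
        id          INTEGER PRIMARY KEY,
        user_id     INTEGER NOT NULL,
        uploaded_at TEXT    NOT NULL,
        kind        TEXT    NOT NULL,   -- e.g. 'original' or 'thumbnail'
        data        BLOB    NOT NULL
    )
""")
conn.execute(
    "INSERT INTO images (user_id, uploaded_at, kind, data) VALUES (?, ?, ?, ?)",
    (7, "2009-06-18", "original", b"\x89PNG fake bytes"),
)
row = conn.execute(
    "SELECT kind, length(data) FROM images WHERE user_id = 7"
).fetchone()
```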

On the downside, you need to write custom code to read images from the DB and serve them to the user, whereas any standard web server will serve the images as they are.

When it comes to the bottom line, though, you should just choose the option that fits your project, and your server configuration.

Of course you can make this scalable and distributed; you just need to keep the image directories synced between servers (for JackM), or use shared storage connected to multiple web frontend servers.

Anyway, the stealing part was covered in your other question, and preventing it is basically impossible. People who can access the images will always be able (with more or less work) to save them locally, even if it means taking a screenshot, pasting it into Photoshop, and saving it.

I had the same thought about security; once the image is viewed in the browser, you can theoretically kiss it goodbye. The security problem may originally have been ACLs on the filesystem, or perhaps some kind of browser authentication issues.
–
p.campbellJun 18 '09 at 17:57

It depends on how many images you expect to handle, and what you have to do with them. I have an application that needs to temporarily store between 100K and several million images a day. I write them in 2gb contiguous blocks to the filesystem, storing the image identifier, filename, beginning position and length in a database.

For retrieval I just keep the indices in memory, the files open read only and seek back and forth to fetch them. I could find no faster way to do it. It is far faster than finding and loading an individual file. Windows can get pretty slow once you've dumped that many individual files into the filesystem.
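A minimal sketch of this pack-and-seek approach (the index structure and class shape here are assumptions, not the poster's actual code, and an in-memory buffer stands in for the 2gb block file):

```python
import io

class ImageBlockStore:
    """Append images to one large file; keep (offset, length) in an index."""

    def __init__(self, block_file):
        self.block_file = block_file   # open file object, e.g. open(path, 'r+b')
        self.index = {}                # image_id -> (offset, length)

    def write(self, image_id, data):
        self.block_file.seek(0, io.SEEK_END)
        offset = self.block_file.tell()
        self.block_file.write(data)
        self.index[image_id] = (offset, len(data))

    def read(self, image_id):
        offset, length = self.index[image_id]
        # Same idea as fileStream.Seek(blockOffset, SeekOrigin.Begin) in the
        # comment below: one seek, one read, no per-file open/close cost.
        self.block_file.seek(offset)
        return self.block_file.read(length)

store = ImageBlockStore(io.BytesIO())
store.write("img1", b"first image bytes")
store.write("img2", b"second image bytes")
```

In a real deployment the index (image identifier, filename, offset, length) would live in the database as the answer describes, and be loaded into memory at startup.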

I suppose it could contribute to security, since the images would be somewhat difficult to retrieve without the index data.

For scalability, it would not take long to put a web service in front of it and distribute it across several machines.

I'd call this a hybrid solution, and it would address a need for security. How fast is the seeking through 2gb of a large file?
–
p.campbellJun 18 '09 at 15:43

It seems quite fast. It isn't a sequential read. I'm just doing fileStream.Seek(blockOffset,SeekOrigin.Begin) and then reading the next imageLength bytes. I guess I could keep track of my current location and seek to an offset, but that would probably add more complexity than it was worth. If I wanted additional security, I could encrypt or mount a TrueCrypt volume and write the blocks there, but for a non-DMZ server, that too would be overkill.
–
R UbbenJun 18 '09 at 16:46

I take it you never delete or modify the files once they're written? If you did, you'd end up writing your own version of a file system!
–
Mark RansomJun 18 '09 at 22:13

No. It is a lot easier to just delete the index entry, or if I modified one, add a new image. This is intended for temporary storage, but if I made it permanent, I would still do it that way. Speed is a lot more expensive than disk space. Besides, if this was a permanent system of record, auditors would have a problem with the fact it was possible to delete images.
–
R UbbenJun 18 '09 at 22:27

For a web application I look after, we store the images in the database, but make sure they're well cached in the filesystem.

A request from one of the web server frontends for an image requires a quick memcache check to see if the image has changed in the database and, if not, serves it from the filesystem. If it has changed, it fetches it from the central database and puts a copy in the filesystem.

This gives most of the advantages of storing them in the filesystem while keeping some of the advantages of the database: we only have one location for all the data, which makes backups easier and means we can scale across quite a few machines without issue. It also doesn't put excessive load on the database.
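A rough sketch of that cache check (a plain dict stands in for memcache, and the DB accessors are hypothetical callables, not a real API):

```python
import os

def serve_image(image_id, version_cache, fetch_version, fetch_blob,
                cache_dir="/var/cache/images"):
    """version_cache stands in for memcache; fetch_version/fetch_blob are
    hypothetical DB accessors returning the current version number and
    the image bytes respectively."""
    path = os.path.join(cache_dir, str(image_id))
    db_version = fetch_version(image_id)
    if version_cache.get(image_id) == db_version and os.path.exists(path):
        with open(path, "rb") as f:       # unchanged: serve from filesystem
            return f.read()
    data = fetch_blob(image_id)           # changed or missing: refresh the cache
    with open(path, "wb") as f:
        f.write(data)
    version_cache[image_id] = db_version
    return data
```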

I'm not going to vote you down, but I think your answer is wrong. Filesystems can be every bit as scalable as DBMSs. I'd be interested in why you think that's the case.
–
paxdiabloJun 18 '09 at 3:50

@Pax: Agreed. I am imagining the webfarm at Flickr reading a DB to serve all its images.
–
p.campbellJun 18 '09 at 3:55

Perhaps I was misled then. I'm working on a major rewrite of a web application due to scaling issues. The architect recommended to the stakeholders that it be completely rewritten [the code and data model are awful], but the big issue was scaling. His main reason was that, since the images are all file-based, you can't scale up with multiple servers and use load balancers.
–
Jack MarchettiJun 18 '09 at 3:59

Maybe the location of the images is stored in a database?
–
Jack MarchettiJun 18 '09 at 4:02

@JackM, you can scale up filesystem images with multiple servers - you just have to have a copy on each server. A single NFS mounted location would be as bad as a non-replicated database. In addition, having a billion images in a single directory may also cause problems but that's bad design, and no different from not indexing the images in the DB. More than likely your architects are making work for themselves :-)
–
paxdiabloJun 18 '09 at 4:11

Saving your files to the DB will provide some security, in the sense that another user would need access to the DB in order to retrieve the files. But as far as efficiency goes, you'll be running a SQL query for every image loaded, putting all the load on the server side. Do yourself a favor and find a way to protect your images inside the filesystem instead; there are many.

The biggest out-of-the-box advantage of a database is that it can be accessed from anywhere on the network, which is essential if you have more than one server.

If you want to access a filesystem from other machines you need to set up some kind of sharing scheme, which could add complexity and could have synchronization issues.

If you do go with storing images in the database, you can still use local (unshared) disks as caches to reduce the strain on the DB. You'd just need to query the DB for a timestamp or something to see if the file is still up-to-date (if you allow files that can change).

If the issue is scalability you'll take a massive loss by moving things into the database. You can round-robin webservers via DNS but adding the overhead of both a CGI process and a database lookup to each image is madness. It also makes your database that much harder to maintain and your images that much harder to process.

As to the other part of your question, you can secure access to a file as easily as a database record, but at the end of the day, as long as there is a URL that returns a file, you have limited options to prevent that URL being used (at least without making cookies and/or JavaScript compulsory).