
56 Answers

I'm in charge of some applications that manage many TB of images. We've found storing file paths in the database to be best.

There are a couple of issues:

database storage is usually more expensive than file system storage

you can super-accelerate file system access with standard off the shelf products

for example, many web servers use the operating system's sendfile() system call to asynchronously send a file directly from the file system to the network interface. Images stored in a database don't benefit from this optimization.

things like web servers, etc, need no special coding or processing to access images in the file system

databases win out where transactional integrity between the image and metadata is important.

it is more complex to manage integrity between db metadata and file system data

it is difficult (within the context of a web application) to guarantee data has been flushed to disk on the filesystem
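To make the sendfile() point above concrete, here is a minimal Python sketch; the socket-pair demo and the read/sendall fallback are illustrative, not a real web server:

```python
import os
import socket
import tempfile

def serve_file(sock: socket.socket, path: str) -> None:
    """Send a file over a connected stream socket. Where os.sendfile()
    exists, the kernel copies disk -> socket directly, never pulling
    the bytes into user space; otherwise fall back to read/sendall."""
    with open(path, "rb") as f:
        if hasattr(os, "sendfile"):
            size = os.fstat(f.fileno()).st_size
            offset = 0
            while offset < size:
                offset += os.sendfile(sock.fileno(), f.fileno(),
                                      offset, size - offset)
        else:
            sock.sendall(f.read())

# demo over a local socket pair with a throwaway "image"
a, b = socket.socketpair()
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"fake image bytes")
serve_file(a, tmp.name)
a.close()
received = b.recv(1024)
os.unlink(tmp.name)
```

An image served out of a database BLOB cannot take this path: the bytes must first be read into the application process before they can be written to the socket.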

what off the shelf products are available for "super-accelerating" the file system?
– Andrei Rînea, Oct 4 '08 at 10:53

While I only manage 3TB of files, I definitely agree. Databases are for structured data, not blobs.
– derobert, Mar 22 '09 at 19:31

@derobert: quite so, if you will never use a data element in a query, as a condition or for a join, it probably doesn't belong in the database. Then again, if you have a nice database function to query images for likeness...
– Nils Weinander, May 18 '09 at 14:34

what off the shelf products are available for "super-accelerating" the file system?
– ablmf, Jul 31 '09 at 15:16

Re: "super-accelerating" products: Most web servers can now take advantage of the sendfile() system call to deliver static files asynchronously to the client. It offloads to the operating system the task of moving the file from disk to the network interface. The OS can do this much more efficiently, operating in kernel space. This, to me, seems like a big win for file system vs. db for storing/serving images.
– Alan Donnelly, Nov 20 '10 at 17:07

Normally, I'm strongly against taking the most expensive and hardest-to-scale part of your infrastructure (the database) and putting all the load on it. On the other hand, it greatly simplifies your backup strategy, especially when you have multiple web servers and need to keep the data synchronized somehow.

Second the recommendation on file paths. I've worked on a couple of projects that needed to manage large-ish asset collections, and any attempts to store things directly in the DB resulted in pain and frustration long-term.

The only real "pro" I can think of for storing them in the DB is the potential for easy securing of individual image assets. If there are no file paths to use, and all images are streamed straight out of the DB, there's no danger of a user finding files they shouldn't have access to.

That seems like it would be better solved with an intermediary script pulling data from a web-inaccessible file store, though. So the DB storage isn't REALLY necessary.

In my experience, sometimes the simplest solution is to name the images according to the primary key. So it's easy to find the image that belongs to a particular record, and vice versa. But at the same time you're not storing anything about the image in the database.
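A minimal sketch of that primary-key naming scheme, assuming a flat image directory; the root path and extension are placeholders:

```python
from pathlib import Path

IMAGE_ROOT = Path("/var/app/images")  # hypothetical storage root

def image_path(record_id: int, ext: str = "jpg") -> Path:
    """The filename *is* the primary key, so finding the image for
    a record needs no lookup table..."""
    return IMAGE_ROOT / f"{record_id}.{ext}"

def record_id_for(path: Path) -> int:
    """...and going from a file back to its record is just as direct."""
    return int(path.stem)
```

Nothing about the image lives in the database; the naming convention alone ties the two together.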

File paths in the DB are definitely the way to go. I've heard story after story from customers with TBs of images for whom trying to store any significant number of images in a DB became a nightmare; the performance hit alone is too much.

Small static images (not more than a couple of megs) that are not frequently edited should be stored in the database. This method has several benefits, including easier portability (images are transferred with the database), easier backup/restore (images are backed up with the database) and better scalability (a file system folder with thousands of little thumbnail files sounds like a scalability nightmare to me).

Serving up images from a database is easy, just implement an http handler that serves the byte array returned from the DB server as a binary stream.
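As a sketch of that handler idea, using an in-memory SQLite table as a stand-in for the real image store; the schema, data, and response shape are made up for illustration:

```python
import sqlite3

# In-memory stand-in for the real image table; schema and data are made up.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, mime TEXT, data BLOB)")
db.execute("INSERT INTO images VALUES (1, 'image/png', ?)",
           (b"\x89PNG fake bytes",))

def serve_image(image_id):
    """The whole 'http handler': fetch the row, return the byte array
    as a (status, headers, body) triple for the web layer to stream."""
    row = db.execute("SELECT mime, data FROM images WHERE id = ?",
                     (image_id,)).fetchone()
    if row is None:
        return 404, {}, b""
    mime, data = row
    return 200, {"Content-Type": mime,
                 "Content-Length": str(len(data))}, bytes(data)
```

Easy, but note that every request now flows through the database and the application process rather than being handed off to the web server.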

This might be a bit of a long shot, but if you're using (or planning on using) SQL Server 2008 I'd recommend having a look at the new FileStream data type.

FileStream solves most of the problems around storing the files in the DB:

The Blobs are actually stored as files in a folder.

The Blobs can be accessed using either a database connection or over the filesystem.

Backups are integrated.

Migration "just works".

However SQL's "Transparent Data Encryption" does not encrypt FileStream objects, so if that is a consideration, you may be better off just storing them as varbinary.

From the MSDN Article:

Transact-SQL statements can insert, update, query, search, and back up FILESTREAM data. Win32 file system interfaces provide streaming access to the data.
FILESTREAM uses the NT system cache for caching file data. This helps reduce any effect that FILESTREAM data might have on Database Engine performance. The SQL Server buffer pool is not used; therefore, this memory is available for query processing.

The word on the street is that unless you are a database vendor trying to prove that your database can do it (like, say, Microsoft boasting about Terraserver storing a bajillion images in SQL Server), it's not a very good idea. When the alternative (storing images on file servers and paths in the database) is so much easier, why bother? Blob fields are kind of like the off-road capabilities of SUVs: most people don't use them, those who do usually get in trouble, and then there are those who do, but only for the fun of it.

I would go with the file system approach. As noted by a few others, most web servers are built to send images from a file path. You'll have much higher performance if you don't have to write or stream out BLOB fields from the database. Having filesystem storage for the images makes it easier to set up static pages when the content isn't changing or you want to limit the load on the database.

One thing that I haven't seen anyone mention yet but is definitely worth noting is that there are issues associated with storing large amounts of images in most filesystems too. For example if you take the approach mentioned above and name each image file after the primary key, on most filesystems you will run into issues if you try to put all of the images in one big directory once you reach a very large number of images (e.g. in the hundreds of thousands or millions).

One common solution to this is to hash them out into a balanced tree of subdirectories.
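One way that hashing scheme is often sketched, with two hex characters per directory level (256 buckets each); md5 and the jpg extension here are arbitrary illustrative choices:

```python
import hashlib
from pathlib import Path

def sharded_path(root: Path, image_id: int, ext: str = "jpg") -> Path:
    """Two hash-derived directory levels of 256 buckets each spread
    the files out so that no single directory grows huge."""
    digest = hashlib.md5(str(image_id).encode()).hexdigest()
    return root / digest[:2] / digest[2:4] / f"{image_id}.{ext}"
```

With 65,536 leaf directories, even ten million images average only about 150 files per directory, comfortably below the point where directory listings become painful.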

It's better to use a filesystem that has no problem with large directories.
– Seun Osewa, Oct 29 '08 at 3:46

I had an app with millions of files in one directory (server running RHEL 4); even listing the directory contents (piping to a file) took days and created an output file hundreds of MB in size. Now they are in a database, I have a single file that I can move or back up quite easily.
– Richard, Jun 17 '09 at 15:24

@Seun Osewa: every file system has limitations, and if you know of one that has no problem storing millions of entries in the same directory, please let me know!
– Guillaume, Nov 4 '10 at 12:51

@Seun Osewa: the database is up to 28GB now, with 5.4M records. I ended up having to partition the database table, so I have several files to back up that are approx 5GB in size. I'm moving the individual images onto Amazon S3 now, so I only have to store the filename in the DB (and Amazon can do the backups).
– Richard, Nov 12 '10 at 7:41

I'm not sure how much of a "real world" example this is, but I currently have an application out there that stores details for a trading card game, including the images for the cards. Granted, the record count for the database is only 2851 records to date, but given that certain cards are released multiple times with alternate artwork, it was actually more efficient size-wise to scan the "primary square" of the artwork and then dynamically generate the border and miscellaneous effects for the card when requested.

The original creator of this image library created a data access class that renders the image based on the request, and it does so quite fast for viewing an individual card.

This also eases deployment/updates when new cards are released, instead of zipping up an entire folder of images and sending those down the pipe and ensuring the proper folder structure is created, I simply update the database and have the user download it again. This currently sizes up to 56MB, which isn't great, but I'm working on an incremental update feature for future releases. In addition, there is a "no images" version of the application that allows those over dial-up to get the application without the download delay.

This solution has worked great to date since the application itself is targeted as a single instance on the desktop. There is a web site where all of this data is archived for online access, but I would in no way use the same solution for this. I agree the file access would be preferable because it would scale better to the frequency and volume of requests being made for the images.

Hopefully this isn't too much babble, but I saw the topic and wanted to provide some of my insights from a relatively successful small/medium-scale application.

The only reason we store images in our tables is because each table (or set of tables per range of work) is temporary and dropped at the end of the workflow. If there was any sort of long term storage we'd definitely opt for storing file paths.

It should also be noted that we work with a client/server application internally so there's no web interface to worry about.

I once worked on an image processing application. We stored the uploaded images in a directory that was something like /images/[today's date]/[id number]. But we also extracted the metadata (exif data) from the images and stored that in the database, along with a timestamp and such.
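A rough sketch of that layout, with the EXIF extraction stubbed out as a plain dict (real parsing would use a library such as Pillow); the root path and column names are illustrative:

```python
import datetime
from pathlib import Path

UPLOAD_ROOT = Path("/images")  # hypothetical root

def upload_path(image_id: int, day: datetime.date) -> Path:
    """The /images/[today's date]/[id number] layout described above."""
    return UPLOAD_ROOT / day.isoformat() / str(image_id)

def metadata_row(image_id: int, exif: dict,
                 when: datetime.datetime) -> dict:
    """What actually goes in the database: the path plus the EXIF
    fields and timestamp pulled out of the upload."""
    return {
        "id": image_id,
        "path": str(upload_path(image_id, when.date())),
        "exif": exif,
        "uploaded_at": when.isoformat(),
    }
```

The date component doubles as a natural shard, keeping any single directory from accumulating more than one day's uploads.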

Not having a separate backup strategy can be a big deal when you are writing applications that are installed on premises (like SharePoint). When you create a SharePoint backup, everything is in the DB, which makes it very easy.
– spoon16, Oct 2 '08 at 23:40

Security by obscurity is not really an access control strategy!
– Jon Cage, Oct 9 '08 at 10:46

File system, for sure. Then you get to use all of the OS functionality to deal with these images: backups, the web server, even just scripting batch changes using tools like ImageMagick. If you store them in the DB, you'll need to write your own code to solve these problems.

One thing you need to keep in mind is the size of your data set. I believe that Dillie-O was the only one who even remotely hit the point.

If you have a small, single-user consumer app, then I would say DB. I have a DVD management app that uses the file system (in Program Files, at that) and it is a PIA to back up. I wish EVERY time that they would store them in a DB and let me choose where to save that file.

For a larger commercial application I would start to change my thinking. I used to work for a company that developed a county clerk's information management application. We would store the images on disk, in an encoded format [to deal with FS problems with large numbers of files] based on the county-assigned instrument number. This was useful on another front as well, since the image could exist before the DB record (due to their workflow).

As others have said, SQL Server 2008 comes with a FILESTREAM type that allows you to store a filename or identifier as a pointer in the db while automatically storing the image on your filesystem, which is a great scenario.

If you're on an older database, then storing images as blob data won't get you anything from the database in the way of search features, so it's probably best to store an address on a filesystem, and store the image that way.

That way you also save space on your filesystem, as it will hold only the exact (or even compressed) size of each image.

Also, you could save the files with some structure or naming scheme that lets you browse the raw images without any db hits, or transfer the files in bulk to another system, hard drive, or S3, updating only the location in your program while keeping the structure. That spares you the cost of pulling the images back out of your db when you need to grow storage.

You could probably also add a caching layer for commonly hit image URLs to your web engine/program, saving yourself some load there as well.

I'm the lead developer on an enterprise document management system in which some customers store hundreds of gigabytes of documents. Terabytes in the not too distant future. We use the file system approach for many of the reasons mentioned on this page plus another: archiving.

Many of our customers must conform to industry specific archival rules, such as storage to optical disk or storage in a non-proprietary format. Plus, you have the flexibility of simply adding more disks to a NAS device. If you have your files stored in your database, even with SQL Server 2008's file stream data type, your archival options just became a whole lot narrower.

I don't see anyone claiming that a filesystem is faster than a DB 100% of the time (read Mark Harrison's answer). That's a bit of a strawman. There are probably situations in which it's preferable not to wear your seatbelt, but generally speaking, wearing a seatbelt is a good idea.
– Calvin, Apr 8 '09 at 16:56