How scalable is this?

Hello all
We have a box from which we ftp data.
The data is stored in the filesystem, in a hierarchy of directories and subdirectories, for example:
productx
 └── year
      └── month
           └── day
                └── hour
                     └── minute

So to look up a file, you find the right product, navigate down to the minute the product was produced, and then scan that directory for the right file.
There are rarely more than half a dozen files for the same product produced in the same minute, and commonly only a few.
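For concreteness, a lookup under this layout is just a deterministic path build plus a small directory scan. A minimal sketch, assuming a POSIX filesystem and the root path and zero-padded naming shown here (both are assumptions, not details from the box itself):

```python
import glob
import os
from datetime import datetime

def product_dir(root, product, ts):
    """Build the directory for a product at a given timestamp.

    Layout from the question: product/year/month/day/hour/minute,
    zero-padded so paths sort lexically (an assumed convention).
    """
    return os.path.join(
        root, product,
        f"{ts.year:04d}", f"{ts.month:02d}", f"{ts.day:02d}",
        f"{ts.hour:02d}", f"{ts.minute:02d}",
    )

def find_files(root, product, ts):
    """List the handful of files produced for this product in this minute."""
    return sorted(glob.glob(os.path.join(product_dir(root, product, ts), "*")))

# Example: files for productx produced at 2004-03-07 14:05
# find_files("/data", "productx", datetime(2004, 3, 7, 14, 5))
```

The point of the sketch is that the lookup cost is fixed (six path components) regardless of how many products or days exist; only the final per-minute scan grows, and the question says that stays at a few files.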

Some people want to use this box for a much higher volume product, and there are concerns that such a lookup system won't scale, so I thought I'd ask the experts for their opinions.
The people who work with the box regularly say it can scale, but those who don't work with it think that not using a modern RDBMS is ridiculous.
Thanks much for your input, and I'll also give at least the few best answers some points, too, maybe 50 each.
Vic

Where is the concern about scalability? In terms of storage, access time, or FTP server load?

It would seem to me that storage would not be an issue with this sort of layout unless there could be more files in a particular minute directory than there were inodes on the disk that contained them. The layout also lends itself to multiple mount points, so even a humongous amount of 'product' could be easily handled.

Likewise I doubt that access time would be an issue. There are other limits that would come into play long before traversing the file system(s) would be a problem.

Server load could well be an issue. Presumably, if a much higher volume product is to be handled, there would be a higher volume of FTP traffic, both in placing the product on the server and in retrieving it. And this is an area where I don't see an RDBMS helping all that much. Distributing the existing file system across multiple servers looks to be a better solution as far as scalability is concerned: that way both the data storage and the FTP load are distributed. With an RDBMS you'd still have a single choke point, even if there were multiple FTP servers.
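One way to sketch that distribution idea: hash the product name to pick a server, so uploaders and downloaders agree on placement without any central lookup. The server names below are placeholders, and this is only an illustration of the approach, not the box's actual setup:

```python
import hashlib

def server_for(product, servers):
    """Pick an FTP server for a product by hashing its name.

    Deterministic: every client that knows the server list computes
    the same answer, so no central database is needed for routing.
    """
    digest = hashlib.md5(product.encode("utf-8")).hexdigest()
    return servers[int(digest, 16) % len(servers)]

# Example: route productx to one of three FTP servers
# server_for("productx", ["ftp1.example", "ftp2.example", "ftp3.example"])
```

A caveat with simple modulo sharding is that adding a server remaps most products; if servers will be added often, something like consistent hashing keeps the remapping small.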

So it would seem to me that a file system based FTP repository would scale better than an RDBMS based system.

Well, since access to the data is by FTP, the network and the FTP service will be the first limits you hit. The actual data transfer via FTP is efficient, but the overhead of getting from the initial FTP connect to the get/put is significant. You also have to consider what limits the OS and hardware impose: maximum number of processes, maximum open files, installed memory, etc. Obviously, if there are lots and lots of FTP sessions active, you could easily get into a situation where the server starts to swap, which would significantly impact access time. And that's where a number of servers, each with a portion of the storage, has an advantage.
