With big flat databases it will not work: the script takes too long and you get a timeout error.

I searched the internet and found a module called File::Sort. Should this module work? Or is there another way to easily sort big flat databases? It has to work on NT and UNIX!!!

How do I use the File::Sort module in a script like the one I used above?

And is it possible to use this module without installing it? I cannot install modules myself and my provider won't do it for me!! Some modules can be uploaded to a certain directory, and you can point to that directory using something like this:
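For the record, the usual trick for a pure-Perl module is a `use lib` line pointing at the directory you uploaded it to. A minimal sketch, assuming you put the module's files in a `lib` directory next to your script (the directory name is my assumption):

```perl
#!/usr/bin/perl
use strict;

# assumption: you uploaded the module's files (e.g. File/Sort.pm)
# into a "lib" directory next to this CGI script
use FindBin;
use lib "$FindBin::Bin/lib";

# now a plain "use" will find the uploaded copy, e.g.:
# use File::Sort qw( sort_file );
print "module search path now includes: $FindBin::Bin/lib\n";
```

FindBin resolves the script's own directory, which matters for CGI scripts because the current working directory is often not where the script lives.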

I want to use the sort module the same way I did using the subroutine.

# subroutine sort_data
# Subroutine that does the actual sort.
# Accepts 4 params, viz.:
# 1. The column number to sort. Column numbers start from 0.
# 2. The type of sort: numeric or alphabetic. Default is alphabetic.
# 3. The order of sort. Default order is ascending.
# 4. The reference of the array that needs to be sorted.
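That comment block describes the interface but not the body. A minimal sketch of what such a sort_data could look like (the pipe delimiter, the in-place sort, and the parameter spellings 'numeric'/'asc' are my assumptions, not the original poster's code):

```perl
#!/usr/bin/perl
use strict;

# sort_data: sort an array of pipe-delimited records in place.
#   $col   - column number to sort on (starting at 0)
#   $type  - 'numeric' or 'alpha' (default 'alpha')
#   $order - 'asc' or 'desc' (default 'asc')
#   $aref  - reference to the array to sort
sub sort_data {
    my ( $col, $type, $order, $aref ) = @_;
    $type  ||= 'alpha';
    $order ||= 'asc';

    my @sorted = sort {
        my $x = ( split /\|/, $a )[$col];
        my $y = ( split /\|/, $b )[$col];
        $type eq 'numeric' ? $x <=> $y : lc($x) cmp lc($y);
    } @$aref;

    @sorted = reverse @sorted if $order eq 'desc';
    @$aref  = @sorted;
}

my @db = ( "3|cherry", "1|apple", "2|banana" );
sort_data( 0, 'numeric', 'asc', \@db );
print join( "\n", @db ), "\n";
# prints:
# 1|apple
# 2|banana
# 3|cherry
```

Splitting inside the sort block is wasteful on big arrays (each record gets split many times); a Schwartzian transform would fix that, but this shows the interface.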


Note that throwing 9,000 lines into memory for sorting will probably make your program crawl (and maybe time out, as you already experienced), regardless of whether or not you're using a module. Modules are just groups of specially written functions that exist outside of your main program. The effect on the machine that's running the code is the same, with the exception that modules generally are tested and optimized.

File::Sort's description is "Sort a file or merge sort multiple files". Are you sure this is what you want/need? Perhaps [url=http://search.cpan.org/search?dist=Sort-Fields]Sort::Fields[/url] may be closer to what you need. Docs are [url=http://search.cpan.org/doc/JNH/Sort-Fields-0.90/Fields.pm]here[/url].
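For what it's worth, Sort::Fields exports a fieldsort function that splits each line on a pattern and sorts by field specs. A sketch, assuming pipe-delimited records (the sample data is made up; field numbers here start at 1, and an 'n' suffix means numeric, '-' means descending):

```perl
#!/usr/bin/perl
use strict;
use Sort::Fields;

my @lines = ( "beta|b|20\n", "alpha|a|3\n", "gamma|c|10\n" );

# split on the pipe; sort alphabetically on field 1,
# breaking ties numerically descending on field 3
my @sorted = fieldsort '\|', [ 1, '-3n' ], @lines;
print @sorted;
```

If this is what your data looks like, fieldsort saves you writing the split-and-compare code by hand, but it still loads the list into memory, so the earlier warning about 9,000 lines applies.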

With a flatfile of 9,000 records, I'd urge you to consider MySQL or another database program. Any sorting algorithm on that much flatfile data (considering that the entire file will be slurped into memory and then sorted record by record) will be slow.

The code that you wrote in your first post of this thread looks fine... if it times out, it just means that the function you are asking for is taking an exorbitant amount of time.

1) You may find you get thrown off your host.
2) It will slow down the rest of your site.
3) Why not use MySQL? There are features in MySQL that let you do exactly what you are asking, and it is much faster and safer.
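To illustrate the MySQL route: you let the database do the sorting with an ORDER BY, instead of slurping the file into your CGI script. A sketch only; the database name, table, columns, and credentials here are all made up, and you'd need DBI and DBD::mysql available on the server:

```perl
#!/usr/bin/perl
use strict;
use DBI;

# hypothetical connection details -- substitute your own
my $dbh = DBI->connect( 'dbi:mysql:database=links', 'user', 'password',
                        { RaiseError => 1 } );

# the database sorts the records -- no need to hold 9,000
# lines in your script's memory at once
my $sth = $dbh->prepare( 'SELECT id, title, hits FROM links ORDER BY title' );
$sth->execute;
while ( my @row = $sth->fetchrow_array ) {
    print join( '|', @row ), "\n";
}
$sth->finish;
$dbh->disconnect;
```

The same DBI code works on NT and UNIX, which covers the portability requirement from the first post.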

Hmm after reading these posts, I'm left wondering how big is too big for a flatfile DB?

I'm running a site that has a similar DB to the one discussed here. It is only 830 lines long at the moment, but could grow to over the 9000+ lines that are being discussed here. I do very similar things to what is being attempted with sorting the contents then displaying the sorted output.

Here is the thing... My script is reading the entire file into memory, then picking out any number of entries as verified by a pattern match, then sorting alphabetically, weeding out the duplicates if necessary, then displaying the output. Even though it has over 800 lines, the script never takes more than 1 second to run. Will it start having the same timeout problems when and if it reaches over 9,000+ lines? Right now the filesize is only 300K as read on disk.

Also, could there be a difference in Unix and NT handling of large files as well?

My script is reading the entire file into memory, then picking out any number of entries as verified by a pattern match, then sorting alphabetically, weeding out the duplicates if necessary, then displaying the output.

It sounds like you don't need to slurp the file into memory. You can go through each line of the file one by one, keep only the lines that match, throw the rest away, perform your functions, sort then output. Example:

[perl]#!/usr/bin/perl -w

use strict;

# initialize the matches hashref. i prefer hashrefs over hashes because they
# can easily be thrown around throughout programs. it's a good idea to always
# append _href to hash ref names or _aref for array ref names -- this way,
# you always know what reference type you're working on.

my $matches_href = {};

open( FILE, "<test.db" ) or die $!;

# instead of my @db = <FILE>, using while ( <FILE> ) will read one line at
# a time, saving the overhead of keeping possibly large files in memory.

while ( <FILE> ) {
    chomp;

    # here, grab the id number and throw the rest of the line in an array.
    # this assumes that the id number is the first element of the line :)

    my ( $id, @row ) = split( /\|/ );

    # this line checks the search criteria, and if it succeeds, it adds the
    # line to the matches href. Note that $id is the key and the value is an
    # array ref which contains the remainder of the matched line. If you
    # don't use an array ref (if you add @row instead of \@row), you will
    # receive "Can't use ('something') as an ARRAY ref" errors later on in
    # your program.

    $matches_href->{ $id } = \@row if $row[0] =~ /Perl/ && $id;
}

close FILE or die $!;

# sort the records based on the first element of the arrayref:
#     my @sorted = sort {
#         lc $matches_href->{$a}[0] cmp lc $matches_href->{$b}[0]
#     } keys %$matches_href;
# if you wanted to sort based on the id instead (remember that the id was
# the key, and isn't in the arrayref), sort the keys directly:
#     my @sorted = sort { $a <=> $b } keys %$matches_href;
# before you sort, you can do whatever else you wanted to weed out
# duplicates.[/perl]

You can probably stop the timeouts by sending dummy HTML every 500 lines or so (comments should work). I found that after about 4,000 lines or so, the sorts can time out the browser. Writing your own manual database based on flatfiles is kinda like re-inventing the wheel. MySQL is probably the best way to go, and well worth looking into.
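A sketch of the dummy-HTML idea (the 500-line interval and the comment text are arbitrary, and the loop here iterates a stand-in range rather than your real records):

```perl
#!/usr/bin/perl
use strict;

$| = 1;    # unbuffer STDOUT so the keep-alive bytes actually reach the browser
print "Content-type: text/html\n\n";

my $count = 0;
for my $record ( 1 .. 1200 ) {    # stand-in for your real record loop
    # ... process the record here ...
    print "<!-- still working -->\n" if ++$count % 500 == 0;
}
```

The `$| = 1` line matters: without it, Perl buffers the output and the browser sees nothing until the script finishes, which defeats the purpose.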

Even if you get the thing to not time out, you still eat up processor time. If volume goes up, you will be right back where you were! If you can split the db into smaller dbs based on some criteria, you might get away with it.

Thanks for your input Jasmine... But I seem to have come to the conclusion, after some experimentation, that my "slurping" of the whole file is actually more efficient for me. Reason: I have to output nearly the entire file anyway after processing the data, so slurping it all into memory tends to allow it to output faster. :-)