Storing/processing binary file input help needed

I need to read a binary file and store it into a buffer in memory (system
has large amount of RAM, 2GB+) then pass it to a function. The function
accepts input as 32 bit unsigned longs (DWORD). I can pass a max of 512
words to it at a time. So I would pass them in chunks of 512 words until the
whole file has been processed. I haven't worked with binary files before so
I'm confused with how to store the binary file into memory. What sort of
array do I use? Does C allow char only? Can I declare a DWORD buffer since
that's what the function is taking as input? Or do I need to know the format
of the original data that binary file is encoding and store it in that?
That's the part that is really confusing me.

I believe I'll need to use fread to copy the file to that array. I plan on
getting the size of the file, then determining how many DWORDs are present in it
(for example 9000) and use that as the number-of-objects parameter in fread. So
in this case:

fread(buffer, 4,9000,fp); //each DWORD is 4 bytes, 900 DWORDs in my binary
file

Is that right?

Once I get the file into the buffer, I can then do a loop where I pass 512
elements of the array to a function until all 9000 elements are processed. I
hope that's right. Any other tips on improving speed and efficiency would be
appreciated. Thanks.

On Tue, 06 Jan 2004 08:10:52 +0000, Arnold wrote:
> Once I get the file into the buffer, I can then do a loop where I pass 512
> elements of the array to a function until all 9000 elements are processed. I
> hope that's right. Any other tips on improving speed and efficiency would be
> appreciated. Thanks.

As an alternative to the mmap solution from Glanni, the easiest way to do
this would be to read 512 words, process them, write back result, repeat
until end-of-file. No need to read the whole file in memory.

You can write back results in place, if they should occupy the same
storage, or to some other file. If the data has to be replaced, it is
often best to write the output to a new file, then move the new file over
the old file. That way you will not corrupt the original file if your
program crashes half way through.
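
A minimal sketch of this read-process-repeat loop, assuming the file holds raw 32-bit words in the machine's own byte order. process_words() is a hypothetical stand-in for the real function (its prototype is not given in this thread), and uint32_t is used in place of DWORD.

```c
#include <stdint.h>
#include <stdio.h>

#define CHUNK_WORDS 512  /* maximum the processing function accepts per call */

/* Hypothetical stand-in for the real processing function: it only sums
   the words so that the example has an observable effect. */
uint32_t checksum;
void process_words(const uint32_t *words, size_t count)
{
    for (size_t i = 0; i < count; i++)
        checksum += words[i];
}

/* Reads fp in chunks of at most CHUNK_WORDS 32-bit words and hands each
   chunk to process_words(). Returns the total number of words processed.
   Trailing bytes that do not form a whole word are ignored. */
size_t process_stream(FILE *fp)
{
    uint32_t buf[CHUNK_WORDS];
    size_t total = 0, n;

    while ((n = fread(buf, sizeof buf[0], CHUNK_WORDS, fp)) > 0) {
        process_words(buf, n);   /* n may be below CHUNK_WORDS at the end */
        total += n;
    }
    return total;
}
```

The stream should be opened in binary mode ("rb"), and ferror(fp) can be checked afterwards to tell a read error from a normal end of file.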

I am not a C wizard but I have some suggestions.
> I need to read a binary file and store it into a buffer in memory (system
> has large amount of RAM, 2GB+) then pass it to a function. The function
> accepts input as 32 bit unsigned longs (DWORD). I can pass a max of 512
> words to it at a time. So I would pass them in chunks of 512 words until the
> whole file has been processed.

By "words", do you mean chunks of chars delimited by an ASCII space?
Or is each "word" 512 bytes in size?
> I'm confused with how to store the binary file into memory. What sort of
> array do I use? Does C allow char only? Can I declare a DWORD buffer since
> that's what the function is taking as input? Or do I need to know the format
> of the original data that binary file is encoding and store it in that?
> That's the part that is really confusing me.

By "binary file" and "file format", do you mean the first two letters
in a file, as in a DOS executable (for example MZ in an .exe file), or
the format of the data in the file (fields and records with some kind
of delimiter)? If it is the second, then it is more a matter of the
file's record design.
>
> I believe I'll need to use fread to copy the file to that array. I plan on
> getting the size of the file, then determining how many DWORDs are present in it
> (for example 9000) and use that as the number-of-objects parameter in fread. So
> in this case:
>
> fread(buffer, 4,9000,fp); //each DWORD is 4 bytes, 900 DWORDs in my binary
> file
>
> Is that right?

Is it always just 512 elements, or is the count unknown until run time?
If it is unknown, isn't this the time to use a linked list rather than
an array?

> Once I get the file into the buffer, I can then do a loop where I pass 512
> elements of the array to a function until all 9000 elements are processed. I
> hope that's right. Any other tips on improving speed and efficiency would be
> appreciated. Thanks.

Optimizing in C is not a matter of managing individual instructions, as
it is in asm.

"Martijn Lievaart" <> wrote in message
news...
> On Tue, 06 Jan 2004 08:10:52 +0000, Arnold wrote:
>
> > Once I get the file into the buffer, I can then do a loop where I pass 512
> > elements of the array to a function until all 9000 elements are processed. I
> > hope that's right. Any other tips on improving speed and efficiency would be
> > appreciated. Thanks.
>
> As an alternative to the mmap solution from Glanni, the easiest way to do
> this would be to read 512 words, process them, write back result, repeat
> until end-of-file. No need to read the whole file in memory.

I thought of that but speed is a concern so I want to keep the number of
disk accesses at a minimum.
>
> You can write back results in place, if they should occupy the same
> storage, or to some other file. If the data has to be replaced, it is
> often best to write the output to a new file, then move the new file over
> the old file. That way you will not corrupt the original file if your
> program crashes half way through.
>

In my case, I don't have to write any data back to the original file. Thanks
for the suggestions.
> HTH,
> M4
>

"sathyashrayan" <> wrote in message
news:...
> "Arnold" <> wrote in message
news:<g4uKb.395$>...
>
> I am not a C wizard but I have some suggestions.
>
> > I need to read a binary file and store it into a buffer in memory (system
> > has large amount of RAM, 2GB+) then pass it to a function. The function
> > accepts input as 32 bit unsigned longs (DWORD). I can pass a max of 512
> > words to it at a time. So I would pass them in chunks of 512 words until the
> > whole file has been processed.
>
> By "words", do you mean chunks of chars delimited by an ASCII space?
> Or is each "word" 512 bytes in size?

Each word is a DWORD, so each one is 32 bits. I can pass a maximum of 512
DWORDs at a time to the function.

> > I'm confused with how to store the binary file into memory. What sort of
> > array do I use? Does C allow char only? Can I declare a DWORD buffer since
> > that's what the function is taking as input? Or do I need to know the format
> > of the original data that binary file is encoding and store it in that?
> > That's the part that is really confusing me.
>
> By "binary file" and "file format", do you mean the first two letters
> in a file, as in a DOS executable (for example MZ in an .exe file), or
> the format of the data in the file (fields and records with some kind
> of delimiter)? If it is the second, then it is more a matter of the
> file's record design.

It is the second.
>
> >
> > I believe I'll need to use fread to copy the file to that array. I plan on
> > getting the size of the file, then determining how many DWORDs are present in it
> > (for example 9000) and use that as the number-of-objects parameter in fread. So
> > in this case:
> >
> > fread(buffer, 4,9000,fp); //each DWORD is 4 bytes, 900 DWORDs in my binary
> > file
> >
> > Is that right?
>
>
> Is it always just 512 elements, or is the count unknown until run time?
> If it is unknown, isn't this the time to use a linked list rather than
> an array?
>

512 is the maximum the function can handle at a time, so that is fixed,
except for the last iteration, as the file won't necessarily contain a
multiple of 512 DWORDs.
>
>
> > Once I get the file into the buffer, I can then do a loop where I pass 512
> > elements of the array to a function until all 9000 elements are processed. I
> > hope that's right. Any other tips on improving speed and efficiency would be
> > appreciated. Thanks.
>
> Optimizing in C is not a matter of managing individual instructions, as
> it is in asm.

"Arnold" <> wrote in message
news:g4uKb.395$...
> I need to read a binary file and store it into a buffer in memory (system
> has large amount of RAM, 2GB+) then pass it to a function. The function
> accepts input as 32 bit unsigned longs (DWORD). I can pass a max of 512
> words to it at a time. So I would pass them in chunks of 512 words until the
> whole file has been processed. I haven't worked with binary files before so
> I'm confused with how to store the binary file into memory. What sort of
> array do I use? Does C allow char only? Can I declare a DWORD buffer since
> that's what the function is taking as input? Or do I need to know the format
> of the original data that binary file is encoding and store it in that?
> That's the part that is really confusing me.
>
> I believe I'll need to use fread to copy the file to that array. I plan on
> getting the size of the file, then determining how many DWORDs are present in it
> (for example 9000) and use that as the number-of-objects parameter in fread. So
> in this case:
>
> fread(buffer, 4,9000,fp); //each DWORD is 4 bytes, 900 DWORDs in my binary
> file
>
> Is that right?
>

You don't need to read the whole file, you can read 512 bytes at a time into
a buffer of appropriate size:

char buffer[512];
size_t x;
x = fread(buffer, 1, 512, fp); // element size is 1, so x is the number of
                               // bytes actually read; don't forget to check it
...

You can then pass a pointer to this buffer to your function, which has been
prototyped to accept an array of DWORD and the number of elements to process
(which will be x/4 from the fread above),
e.g.

Of course this makes an assumption that the data in the file is stored in
the same byte order as the processor you are running your program on (most
likely you are using an Intel Pentium so Little-Endian is the byte order you
are assuming). If the file uses another byte order then you can write
(or google for) a macro that will do the conversion for you.
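
A minimal sketch of such a conversion, written as functions rather than a macro. The names swap32 and swap32_buffer are made up for this example; it assumes the file's byte order is exactly the reverse of the machine's.

```c
#include <stddef.h>
#include <stdint.h>

/* Reverses the byte order of one 32-bit value. */
uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) <<  8) |
           ((v & 0x00FF0000u) >>  8) |
           ((v & 0xFF000000u) >> 24);
}

/* Converts count words in place, e.g. right after fread() has filled
   the buffer and before the words are handed to the real function. */
void swap32_buffer(uint32_t *words, size_t count)
{
    for (size_t i = 0; i < count; i++)
        words[i] = swap32(words[i]);
}
```

Swapping twice gives back the original value, so the same function converts in both directions.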

On Tue, 06 Jan 2004 08:10:52 GMT, "Arnold" <>
wrote:
>I need to read a binary file and store it into a buffer in memory (system
>has large amount of RAM, 2GB+) then pass it to a function. The function
>accepts input as 32 bit unsigned longs (DWORD). I can pass a max of 512
>words to it at a time. So I would pass them in chunks of 512 words until the
>whole file has been processed. I haven't worked with binary files before so
>I'm confused with how to store the binary file into memory. What sort of
>array do I use? Does C allow char only? Can I declare a DWORD buffer since
>that's what the function is taking as input? Or do I need to know the format
>of the original data that binary file is encoding and store it in that?
>That's the part that is really confusing me.

The I/O function (fread as you suggest below) does not care how you
define the buffer. However, how you use the buffer may make a
difference. If you define the buffer as unsigned char, then you are
guaranteed that all possible 256 values are acceptable (unsigned char
cannot have trap values) and the buffer will be portable (at least for
systems which have CHAR_BIT defined as 8). If you define the buffer
as DWORD, are you sure that all 4 billion plus possible values that
could come from a binary file are acceptable and your program will
never execute on a machine with a different sizeof(unsigned long)?
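
One way to combine the two, sketched under assumptions: read into an unsigned char buffer, then assemble each 32-bit value explicitly. This assumes CHAR_BIT is 8 and that the file stores each value least-significant byte first; load_le32 and decode_le32 are names invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Builds one 32-bit value from 4 bytes stored least-significant byte
   first. The little-endian layout is an assumption about the file. */
uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Decodes nbytes / 4 words from bytes into out and returns how many
   words were written. Leftover bytes (nbytes % 4) are ignored. */
size_t decode_le32(const unsigned char *bytes, size_t nbytes, uint32_t *out)
{
    size_t count = nbytes / 4;
    for (size_t i = 0; i < count; i++)
        out[i] = load_le32(bytes + 4 * i);
    return count;
}
```

Because the values are built with shifts, the result does not depend on the byte order or alignment rules of the machine running the program.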
>
>I believe I'll need to use fread to copy the file to that array. I plan on
>getting the size of the file, then determining how many DWORDs are present in it
>(for example 9000) and use that as the number-of-objects parameter in fread. So
>in this case:

There is no portable way to get the file size (unless you read the
entire file) so you probably need to use a system specific extension
or function for this.
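
A minimal sketch of the "read the entire file" route in standard C only: the buffer grows as data arrives, so the size never has to be known in advance. read_all is a name invented for this example, and the 64 KB starting capacity is an arbitrary choice.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads everything from fp into a malloc'd buffer that doubles when full.
   On success returns the buffer and stores the byte count in *len;
   on failure returns NULL. The caller frees the buffer. */
unsigned char *read_all(FILE *fp, size_t *len)
{
    size_t cap = 65536, used = 0, n;
    unsigned char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    while ((n = fread(buf + used, 1, cap - used, fp)) > 0) {
        used += n;
        if (used == cap) {                    /* buffer full: double it */
            unsigned char *bigger;
            if (cap > (size_t)-1 / 2) { free(buf); return NULL; }
            bigger = realloc(buf, cap * 2);
            if (bigger == NULL) { free(buf); return NULL; }
            buf = bigger;
            cap *= 2;
        }
    }
    if (ferror(fp)) { free(buf); return NULL; }
    *len = used;
    return buf;
}
```

The number of DWORDs is then len / 4. Seeking to the end with fseek and calling ftell is a common shortcut, but the C standard does not require binary streams to support SEEK_END meaningfully.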
>
>fread(buffer, 4,9000,fp); //each DWORD is 4 bytes, 900 DWORDs in my binary
>file

You meant 9000.
>
>Is that right?
>
>Once I get the file into the buffer, I can then do a loop where I pass 512
>elements of the array to a function until all 9000 elements are processed. I
>hope that's right. Any other tips on improving speed and efficiency would be
>appreciated. Thanks.

How you pass a quantity of array elements will determine the
suitability of your design. (Actually, the method of passing the
argument(s) should drive the design.) What is the prototype for the
receiving function?

The odds of the file containing an exact multiple of 512 DWORDs are
about 1 in 500 so you may want to be able to handle the last set as a
smaller quantity.
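
A sketch of handling that shorter last set when the data is already in memory. The receiving function's prototype is not known here, so process_words(pointer, count) is a hypothetical stand-in that only records how it was called.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_WORDS 512   /* most the receiving function takes per call */

/* Hypothetical receiver: counts calls and remembers the last chunk size. */
size_t calls, last_count;
void process_words(const uint32_t *words, size_t count)
{
    (void)words;
    calls++;
    last_count = count;
}

/* Passes data to process_words() in chunks of MAX_WORDS; the final
   chunk carries whatever is left over. */
void process_all(const uint32_t *data, size_t total)
{
    size_t done = 0;
    while (done < total) {
        size_t remaining = total - done;
        size_t count = remaining < MAX_WORDS ? remaining : MAX_WORDS;
        process_words(data + done, count);
        done += count;
    }
}
```

With the 9000 words from the example, this makes 17 calls of 512 words and one final call of 296.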

On Tue, 06 Jan 2004 18:01:29 +0000, Arnold wrote:
>
> "Martijn Lievaart" <> wrote in message
> news...
>> On Tue, 06 Jan 2004 08:10:52 +0000, Arnold wrote:
>>
>> > Once I get the file into the buffer, I can then do a loop where I pass 512
>> > elements of the array to a function until all 9000 elements are processed. I
>> > hope that's right. Any other tips on improving speed and efficiency would be
>> > appreciated. Thanks.
>>
>> As an alternative to the mmap solution from Glanni, the easiest way to do
>> this would be to read 512 words, process them, write back result, repeat
>> until end-of-file. No need to read the whole file in memory.
>
> I thought of that but speed is a concern so I want to keep the number of
> disk accesses at a minimum.

Memory mapping the file is probably still the best way, but suffers from a
size limit. To get around this, you can also read in large chunks of the
file. Instead of 512 words, read a few hundred KB at a time and operate on
that. Experiment with buffer sizes to see what gives the best result.
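
A rough sketch of that idea: one large fread per block, then the block is fed to the processing function 512 words at a time. The block size is a parameter so different sizes can be timed. process_words() is again a hypothetical stand-in, and process_blocked is a name made up for this example.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_WORDS 512   /* per-call limit of the processing function */

/* Hypothetical stand-in for the real function: sums the words. */
uint32_t total_sum;
void process_words(const uint32_t *words, size_t count)
{
    for (size_t i = 0; i < count; i++)
        total_sum += words[i];
}

/* Reads fp in blocks of block_words 32-bit words and passes each block
   on in pieces of at most MAX_WORDS. Stores the number of words handled
   in *words_done. Returns 0 on success, -1 on error. */
int process_blocked(FILE *fp, size_t block_words, size_t *words_done)
{
    uint32_t *block;
    size_t n;

    if (block_words == 0 || block_words > (size_t)-1 / sizeof *block)
        return -1;
    block = malloc(block_words * sizeof *block);
    if (block == NULL)
        return -1;

    *words_done = 0;
    while ((n = fread(block, sizeof *block, block_words, fp)) > 0) {
        for (size_t off = 0; off < n; off += MAX_WORDS) {
            size_t count = n - off < MAX_WORDS ? n - off : MAX_WORDS;
            process_words(block + off, count);
        }
        *words_done += n;
    }
    free(block);
    return ferror(fp) ? -1 : 0;
}
```

Calling it with block_words set to 65536 reads 256 KB at a time; choosing a multiple of 512 keeps every piece full-sized except possibly the very last one.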

I'm not sure what will be faster. Large buffers reduce the number of
system calls slightly (good), but decrease locality of reference (bad).
The mmap solution does not suffer from either of these disadvantages, I think.

Note that the number of disk accesses will be the same whatever solution
you choose. You have to read the whole file, period. I guess the main speed
factors are the number of system calls and how effectively you use your
memory. Also, you should try to do some useful work while waiting for the
disk, maybe asynchronous I/O or multithreading can be of help?

(If you look into multithreading, be sure you know which synchronisation
mechanisms are lightweight and which are heavyweight; there is a huge
difference.)

I would just try a simple solution. If it isn't fast enough, try others.
Profile to see where your program spends its time. If most of the time is
spent on calculations, all of the above will give only very marginal
speedups. If run on a fast machine, maybe a naive implementation will be
fast enough for your needs. Remember the old truism about optimizing:
Don't (until you have proven you need it).
