
Read binary file with line delimiter

Hello to all,

First post in this forum. I hope someone can help me.

I want to read a binary file using "ff77" as the line separator, so that I can then parse each line one by one with a regex,
since the file is big. I have a small Ruby script, shown below, but I'm new to C++ and don't know how to replicate in C++
what this Ruby code does.

Code:

#!/usr/bin/env ruby
BEGIN { $/ = "\xff\x77" }   # Line separator = FF77
File.open(ARGV[0], "rb")    # Open in binary mode (handle is unused: gets below reads ARGV[0] through ARGF)
# Process each line one by one
while gets
  line = $_.unpack('H*')[0]               # Store the bytes of each line, as a hex string, in "line"
  next unless line =~ /(..)(\d+)([A-B])/  # Regex with back-references
  printf("%d %s %s\n", $1, $2, $3)        # Print the back-referenced patterns
end

I've been looking for a way to set the line delimiter and found the getline function, but it seems getline only accepts a single character
as delimiter, and I need a separator of 4 characters.

Re: Read binary file with line delimiter

I want to read a binary file using "ff77" as the line separator, so that I can then parse each line one by one with a regex

Opening a file in binary mode means that you're on your own and you get no help from C++ as to what is or are "end-of-line" character(s). That luxury goes to opening a file in text mode (and even that is limited).

In other words, there is no such thing as a "line separator" to the C++ stream when you open a file in binary mode. You have to parse the line yourself with the knowledge of what is a "line separator".
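To make the point about binary mode concrete, here is a minimal sketch of reading a file's raw bytes into a std::string. The helper name read_binary_file is made up for illustration, and reading everything at once is only reasonable for small files; a large file would need to be read in pieces.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Hypothetical helper (not from the thread): reads every byte of `path`
// into a std::string. In binary mode no newline translation happens, so
// the bytes arrive exactly as stored and nothing is treated as a "line".
std::string read_binary_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

The returned string may contain any byte value, including 0xFF, 0x77 and embedded zero bytes, so it should be searched with std::string member functions rather than C string functions.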

Re: Read binary file with line delimiter

Hello Paul,

Thanks for the answer. I used the term "line separator" as a way to describe how the data is separated into blocks, since each block begins with
77 and ends with FF. So, when FF77 is found, it means a new block begins.

The issue is I don't know how to separate each block to parse it one at a time.

Re: Read binary file with line delimiter

Originally Posted by Philidor

Hello Paul,

That is similar to what I'm asking for help with. I'm really a newbie in programming; the Ruby code wasn't written by me.

Then you need someone already versed in C++ or programming in general to write this code. Or take the time to learn how to conceptualize a problem, write a plan on how to solve the problem using pencil and paper (no code), and then translate what you wrote to C++ code.

Maybe use an if statement to match ff 77 to know where a block begins. Maybe there is a more direct method in C++,

There isn't one. C++ is not Ruby, and I think this was your initial mistake. You equated what you can do with Ruby in one or two lines of code, and hoped that C++ could do the same thing with similar effort. That is not the case.

For C++, and really, any programming language you have to:

1) Read a block into memory.
2) Search the block of memory for your delimited string sequence.
3) While doing this, retain where the text began and where the delimiter was found -- between these two points is the text.
4) Save this text in some sort of container.
5) Skip over the found delimiter, set the pointer to the characters after the delimiter, and repeat steps 2 through 5.
...
Basically, it is a delimited file parser, with the delimiter equal to "ff77". This is not trivial if you don't know how to write a program. Throw into the mix that you have to read the file in chunks, so you have to check whether a read gave you only a "partial line", and know that your next read will give you the rest of that line.
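Steps 2 through 5 above can be sketched for a block that is already in memory. This is only an illustration: the name split_on is made up, and it assumes the delimiter is the two raw bytes 0xFF 0x77. If the block holds hex text instead, the same function works with "ff77" as the delimiter.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): splits `block` on every
// occurrence of `delim` and returns the pieces between delimiters.
// The delimiters themselves are not included in the pieces.
std::vector<std::string> split_on(const std::string& block,
                                  const std::string& delim) {
    assert(!delim.empty());  // an empty delimiter would never advance
    std::vector<std::string> pieces;
    std::string::size_type start = 0;                        // step 3: where the text began
    std::string::size_type hit = block.find(delim, start);   // step 2: search for the delimiter
    while (hit != std::string::npos) {
        pieces.push_back(block.substr(start, hit - start));  // step 4: save the text
        start = hit + delim.size();                          // step 5: skip the delimiter
        hit = block.find(delim, start);
    }
    pieces.push_back(block.substr(start));  // whatever follows the last delimiter
    return pieces;
}
```

A call such as split_on(block, std::string("\xff\x77", 2)) returns one string per block. The last element is whatever followed the final delimiter, which is the "partial line" that a chunked reader has to carry over to the next read.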

Maybe you or somebody else could help me store each block in a variable, so that I have the option to
parse this string later.

You want a comma-delimited file parser program or function (but allow the "comma" to be some other set of characters that delimits the text). That is as close as you can come to a "canned solution" in C++ (even though it isn't really canned, it's just that someone wrote the function to do so).

Re: Read binary file with line delimiter

Hello Paul,

Thanks for the help.

I've been able to do steps 2 to 4 and partially 5, but I don't know how to set the correct condition for the "while loop" so that it stops once no further delimiter is found in the current block of memory that is being read.

What I've done is:

Code:

while (not end of current block of memory) { // This is the condition I don't know how to write
    x1 = curr_string.find("ff77", x2 - 1, 4);
    x2 = curr_string.find("ff77", x1 + 1, 4);
    string temp = curr_string.substr(x1, x2 - x1);
}

Re: Read binary file with line delimiter

Originally Posted by Philidor

Hello Paul,

Thanks for the help.

I've been able to do steps 2 to 4 and partially 5, but I don't know how to set the correct condition for the "while loop" so that it stops once no further delimiter is found in the current block of memory that is being read.

You know how big the block is. The string variable has a size() member function.

Why not start with something simple? Assume the file is comma delimited (a simple 1 character delimiter), and you had to extract the text between the commas. Forget about the file for now; how about a simple hard-coded string?
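One possible shape for that simpler exercise is sketched below. The function name split_commas and the sample string in the note after the code are assumptions for illustration, not code from this thread.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): returns the fields of a
// comma-delimited string, including empty fields between adjacent commas.
std::vector<std::string> split_commas(const std::string& text) {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (;;) {
        // One find per iteration: the next comma at or after `start`.
        const std::string::size_type comma = text.find(',', start);
        if (comma == std::string::npos) {
            fields.push_back(text.substr(start));  // last field, no comma after it
            break;
        }
        fields.push_back(text.substr(start, comma - start));
        start = comma + 1;  // skip over the comma
    }
    return fields;
}
```

For a hard-coded input such as "Test1,Test2,Test3" this yields the three fields Test1, Test2 and Test3. Once this works, moving to a longer delimiter mostly means searching for a string instead of a single character and skipping its full length.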

Re: Read binary file with line delimiter

Hello Paul,

Thanks for the suggestion, I'll try to think about how to write a function that works for this.

One question: would this approach be fine given that the real file I need to read is more than 2 GB? I don't know whether I should read, for example, 1000 bytes at a time and apply the code you suggest, or open the complete file.

Re: Read binary file with line delimiter

Originally Posted by Philidor

One question: would this approach be fine given that the real file I need to read is more than 2 GB? I don't know whether I should read, for example, 1000 bytes at a time and apply the code you suggest, or open the complete file.

What you would do is read (much more than) 1000 bytes into a buffer. Then you parse the buffer for the character sequence that terminates each line.

The issue is that a read may straddle a line or the character sequence itself, which means that the next read of 1,000 bytes completes the string (or line terminator), and you have to take that into consideration.

Re: Read binary file with line delimiter

Hello Paul,

Thanks for your reply. I'm taking your suggestions and I've been trying with the code below. The positions where the commas occur are fine, but I get an error (Run exit value 1) when assigning the substring to V[i] (in red).

I'm putting the condition "pos2<10000" because when a value is not found I receive the value 18446744073709551615.

There are also some logic errors (you only need 1 find in the while loop) but stepping through the code with the debugger and comparing the result with the function design should enable these to be found fairly easily.

Last edited by 2kaud; October 9th, 2013 at 06:38 PM.


Re: Read binary file with line delimiter

Originally Posted by Philidor

Hello Paul,

Thanks for your reply. I'm taking your suggestions and I've been trying with the code below. The positions where the commas occur are fine, but I get an error (Run exit value 1) when assigning the substring to V[i] (in red).

Well, one thing is that you should not assume your string is less than 10,000 characters.

Code:

while (pos2<10000) {

The std::string has a size() function that returns you the number of characters. You should be using the value of size(), and not hard-code 10,000.

I'm putting the condition "pos2<10000" because when a value is not found I receive the value 18446744073709551615.
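The value 18446744073709551615 is 2^64 - 1, which is what std::string::npos prints as when the string's size type is a 64-bit unsigned integer. find() returns npos to signal "not found", so the result can be compared against that constant directly. A minimal sketch, with a made-up helper name:

```cpp
#include <string>

// Hypothetical helper (not from the thread): true when `needle` occurs
// somewhere in `text`.
bool contains(const std::string& text, const std::string& needle) {
    // find() returns std::string::npos when there is no match. Compare
    // against that constant rather than a hard-coded limit such as 10000.
    return text.find(needle) != std::string::npos;
}
```

In a search loop the same comparison, pos != std::string::npos, would take the place of pos2<10000.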

Re: Read binary file with line delimiter

Hello 2kaud and Paul,

Thanks for your help. I was able to write a function that returns the Vector elements, as Paul said, with comma delimiters, and then I changed the delimiter to "FF77"; the code below seems to work. The element "Test1" is not considered, since in the real file the first characters shouldn't be considered, so that part is not incorrect.

I deleted 1 find in the loop. Maybe you can check whether the code so far has any issues or something to improve.

Besides any issue you see that could be improved, I have 2 problems:
1- I get exit value 1 when using the 2 lines in red to get the position of the last field separator.
2- I wanted to replace the delimiter string with a variable, but for some reason the error says that 2 parameters are expected and 3 were provided (this happens if I use the line in blue and replace "FF77" with Sep in all places).
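On problem 2, one possible explanation, assuming Sep was declared as a std::string (the declaration is not shown here): the three-argument form of find takes a const char* plus a character count, while the overload that accepts a std::string takes at most two arguments, the string and a starting position. A minimal sketch with a made-up function name:

```cpp
#include <string>

// Hypothetical helper (not from the thread): position of the first `sep`
// at or after `from`, or std::string::npos when there is none.
std::string::size_type next_separator(const std::string& text,
                                      const std::string& sep,
                                      std::string::size_type from) {
    // Two-argument overload: find(const std::string&, size_type).
    // The three-argument form, find(const char*, size_type, size_type),
    // expects a character pointer plus a length, so text.find(sep, from, 4)
    // does not compile when sep is a std::string.
    return text.find(sep, from);
}
```

If the three-argument form is wanted anyway, text.find(sep.c_str(), from, sep.size()) is an equivalent call for a separator without embedded zero bytes.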

Re: Read binary file with line delimiter

Code:

for (int i=0; i<=sVector.size();i++)

You are going beyond the bounds of the vector. Vectors (and arrays) in C++ start from 0 and go to n-1, where "n" is the number of elements. If that vector has 10 elements in it, you are erroneously going from 0 to 10 instead of 0 to 9. That's why you have a failure at the end of your program.
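A minimal sketch of a loop that stays within bounds. The function total_length is made up for illustration; only the name sVector comes from the line quoted above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): total number of characters
// across all elements of the vector.
std::size_t total_length(const std::vector<std::string>& sVector) {
    std::size_t total = 0;
    // Valid indices are 0 .. size()-1, so the condition is `<`, not `<=`.
    for (std::size_t i = 0; i < sVector.size(); ++i) {
        total += sVector[i].size();
    }
    return total;
}
```

Using std::size_t for the index also avoids comparing a signed int with the unsigned value that size() returns.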