(f)scanf Question - Grab String of Spaces

I have a file full of data that I want to tokenize. My function works
as long as the data I want to grab isn't padded with whitespace, but
I want to preserve that padding. Can I modify the fscanf format to
include it in the match?

When it gets to "MyKey3", it fails to match P3 and so returns 2
elements. I want P3 to be " ". Shouldn't "%32[^,]" match anything
but ",", spaces included? Is there a way around this, or a different
way I should be tokenizing such data?


On Apr 7, 2:13 pm, Eric Sosman <> wrote:
> NvrBst wrote:
> > I have a file full of data that I want to tokenize. My function works
> > as long as the data I want to grab doesn't have padded whitespaces,
> > however, I want to preserve the padded whitespaces. Can I modify
> > fscanf to include them in the match?
>
> > ---Example File---
> > MyKey1: INT, 3341, 1
> > MyKey2: STRING, Hello World, 1
> > MyKey3: STRING, , 1
>
> > --Format is Like so "KEYWORD: TYPE, Data1, Data2"---
> > fscanf(fFile, "%32[^:]: %32[^,], %32[^,], %d\n", P1, P2, P3, &P4);
>
> > When it gets to "MyKey3" it fails to match P3 thus returns 2
> > elements. I want P3 to be " ". Shouldn't "%32[^,]" be matching
> > anything but ",", aka spaces as well? A way around this? Different
> > way I should be tokenizing such data?
>
> Yes, "%32[^,]" matches anything other than a comma, including
> spaces. But the spaces have already been swallowed by the " "
> you put right before it. If you want to preserve the spaces, don't
> write a " " directive to gobble them up. (If you want to gobble
> exactly one space, try "%*1[ ]" instead.)

Ahh, I didn't know " " gobbles more than one space. The %*1[ ] made
everything work perfectly. Thank you kindly.
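
For later readers, here is a minimal sketch of the fix described above,
using sscanf on an in-memory line so it can be tried without a data file.
The helper name parse_line is made up for illustration. It also assumes
the MyKey3 line in the real file had two spaces between the commas (one
separator, one data character), which is what the expected P3 of " "
implies.

```c
#include <stdio.h>

/* Hypothetical helper: parse one "KEYWORD: TYPE, Data1, Data2" line.
 * Each buffer holds up to 32 characters plus the terminator.
 * Returns the number of fields converted (4 on success). */
static int parse_line(const char *line, char key[33], char type[33],
                      char data[33], int *n)
{
    /* %*1[ ] consumes exactly one space and stores nothing, so any
     * further spaces are left for the following %32[^,] to capture. */
    return sscanf(line, "%32[^:]:%*1[ ]%32[^,],%*1[ ]%32[^,], %d",
                  key, type, data, n);
}
```

With the line "MyKey3: STRING,  , 1" this should return 4 and leave data
equal to " ". The original format, with a literal " " before each
%32[^,], skips both spaces, finds a comma where P3 should start, and
stops after two conversions.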

NvrBst <> wrote:
> I have a file full of data that I want to tokenize. My function works
> as long as the data I want to grab doesn't have padded whitespaces,
> however, I want to preserve the padded whitespaces. Can I modify
> fscanf to include them in the match?
>
>
> ---Example File---
> MyKey1: INT, 3341, 1
> MyKey2: STRING, Hello World, 1
> MyKey3: STRING, , 1
>
> --Format is Like so "KEYWORD: TYPE, Data1, Data2"---
> fscanf(fFile, "%32[^:]: %32[^,], %32[^,], %d\n", P1, P2, P3, &P4);
>
> When it gets to "MyKey3" it fails to match P3 thus returns 2
> elements. I want P3 to be " ". Shouldn't "%32[^,]" be matching
> anything but ",", aka spaces as well? A way around this? Different
> way I should be tokenizing such data?
>
> Note: P1/P2/P3 are just "char[32+1]"'s. P4 is an int.
>
> Thanks in Advance; I'm using GNU GCC 4.3.2 on a Ubuntu Machine w/
> Latest Eclipse CDT.

I suggest writing your own overflow-free getline() function to read a
whole line from the file, something like this:

feof() doesn't do quite what you seem to be assuming it does. It can
be called *after* fgetc() returns EOF, to determine whether it did so
because it reached the end of the file or because it encountered an
error. You should check whether ch is equal to EOF (which is why it
needs to be an int) *instead* of calling feof(). See section 12 of
the comp.lang.c FAQ, <http://www.c-faq.com/>.

I don't think checking for '\r' makes much sense; if the file is in
text mode, and you're on a system where end-of-line is represented as
a CR LF pair, then that sequence will be converted to '\n' anyway. If
you're on a system where end-of-line is represented as '\n', you might
see '\r' in a text file copied from another system, but adding
special-case code for it is questionable.

Calling realloc() for each character read is likely to be inefficient,
and may cause heap fragmentation on some systems. A common scheme is
to double the allocated size when you run out of room; you can then do
a final realloc() to shrink it down to what's needed. The cast is
superfluous, and can mask errors.

--
Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
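
A rough sketch of a line reader that follows the three points above: it
tests the fgetc() result against EOF instead of calling feof(), has no
special case for '\r', and doubles the buffer instead of growing it one
byte at a time. This is not the code that was originally posted; the
name read_line and the starting capacity of 16 are assumptions chosen
for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read one line of any length from fp.
 * Returns a malloc'd string without the trailing '\n', or NULL at
 * end-of-file with nothing read, on a read error, or if allocation
 * fails. The caller frees the result. */
char *read_line(FILE *fp)
{
    size_t cap = 16, len = 0;
    char *buf = malloc(cap);
    int ch;                          /* int, so EOF differs from every char */

    if (buf == NULL)
        return NULL;

    while ((ch = fgetc(fp)) != EOF && ch != '\n') {
        if (len + 1 == cap) {        /* keep one byte free for '\0' */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)ch;
    }

    /* Nothing left to return, or the read failed part-way through. */
    if (ch == EOF && (len == 0 || ferror(fp))) {
        free(buf);
        return NULL;
    }

    buf[len] = '\0';
    char *fit = realloc(buf, len + 1);   /* optional final shrink */
    return fit != NULL ? fit : buf;
}
```

A line returned this way can then be handed to sscanf with the same
format string that was being used with fscanf.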

In article <>,
Keith Thompson <> wrote:
>Calling realloc() for each character read is likely to be inefficient,
>and may cause heap fragmentation on some systems. A common scheme is
>to double the allocated size when you run out of room; you can then do
>a final realloc() to shrink it down to what's needed.

All the realloc() implementations I've checked recently effectively do
this internally anyway. They don't have the quadratic behaviour you'd
get if they had to copy each time, or every Nth time. (This might not
be true if other *alloc() calls were interleaved with the realloc()s;
I didn't test that.)

True, but calling feof() still isn't the right way to check whether
you've run out of input. If there's an error, fgetc() will return EOF
and feof() will return 0 (but ferror() will return a true value).

You're right that feof() is called after fgetc() in the posted code,
and calling both feof() and ferror() would probably make the code
work, but checking whether fgetc() returned EOF is still better.

> By the way, feof() and ferror() can be called at any time
> on an open stream; there's no need to wait until some other
> I/O function indicates an abnormality.

Right -- but if you just called fgetc() and it *didn't* return EOF,
then feof() and ferror() should both return 0, and there's no point in
calling them.

--
Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
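
The idiom being recommended can be sketched as follows: loop on the
fgetc() result, and only after it returns EOF ask ferror() which of the
two possible causes it was. The function name count_bytes and its -1
error convention are assumptions for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: count the bytes remaining on fp.
 * Returns the count, or -1 if reading stopped because of an error. */
long count_bytes(FILE *fp)
{
    long n = 0;
    int ch;

    while ((ch = fgetc(fp)) != EOF)
        n++;

    /* Only now is it meaningful to ask why fgetc() returned EOF. */
    if (ferror(fp))
        return -1;

    return n;   /* not an error, so the end-of-file indicator is set */
}
```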

Eric Sosman <> writes:
> Keith Thompson wrote:
>> Eric Sosman <> writes:
>>> [...]
>>> By the way, feof() and ferror() can be called at any time
>>> on an open stream; there's no need to wait until some other
>>> I/O function indicates an abnormality.
>>
>> Right -- but if you just called fgetc() and it *didn't* return EOF,
>> then feof() and ferror() should both return 0, and there's no point in
>> calling them.
>
> feof() and ferror() are "sticky:" if a transient failure bollixes
> one I/O operation and then a subsequent operation succeeds, the success
> of the second does not clear the stream's eof or error indicator. One
> situation where this arises with some frequency is in handling input
> from an interactive device that allows further input after an end-of-
> input indication like ^Z or ^D: One fgetc() could return EOF due to
> the transient end-of-input condition, and the next fgetc() could
> succeed and return an actual input character. feof() would return
> true even after the second fgetc() succeeded.

The standard's description of fgetc() says:

    If the end-of-file indicator for the stream is set, or if the
    stream is at end-of-file, the end-of-file indicator for the stream
    is set and the fgetc function returns EOF. Otherwise, the fgetc
    function returns the next character from the input stream pointed
    to by stream. If a read error occurs, the error indicator for the
    stream is set and the fgetc function returns EOF.

In other words, the standard doesn't allow for a "transient
end-of-input condition", though you can explicitly reset the indicator
by calling fseek(stream, 0, SEEK_CUR).

But a quick experiment shows that at least one implementation doesn't
behave as the standard specifies; fgetc() can return something other
than EOF even when the end-of-file indicator is set.

> Perhaps a more usual case is to "summarize" the outcome of a lot
> of I/O operations, as an alternative to testing each one for failure
> at the time it's attempted. For example, a program might make a large
> number of fprintf() calls from a large number of places in the code,
> such that testing each individual fprintf()'s return value would be
> cumbersome. As an alternative, the program could simply ignore the
> returned values until the very end, finishing up with something like
>
>     if (ferror(stream)) {
>         ... something went wrong ...
>     }
>     else if (fclose(stream) != 0) {
>         ... something else went wrong ...
>     }
>     else {
>         ... all is well ...
>     }

Yes, that can work (but it's not what's going on in the posted code,
which calls feof() after each fgetc()).

> (Of course, this technique is not a panacea. If the program's
> fourth fprintf() call bumps up against "disk quota exceeded," it would
> be nicer to discover the problem fairly promptly than to wait until
> after another four million fprintf()'s had also failed ...)

It's also possible that one fprintf() call can hit a "disk quota
exceeded" error, but the next call, either because it produces less
output or because some disk space has been freed, might be successful,
so you could get gaps in your output. If the implementation is
conforming, the error flag should prevent any further fprintf() calls
from succeeding until the flag is reset, but if the implementation
doesn't conform -- well, then there are no guarantees anyway.

--
Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
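
The "summarize at the end" technique quoted above might be wrapped up
like this. The name finish_output and the 0/-1 return convention are
assumptions for illustration. Unlike the quoted outline, this sketch
closes the stream even when ferror() already reports a failure, so the
FILE is never leaked.

```c
#include <stdio.h>

/* Hypothetical helper: close an output stream and report whether every
 * write to it succeeded. Returns 0 on success, -1 otherwise. */
int finish_output(FILE *stream)
{
    int failed = ferror(stream);   /* set if any earlier write failed */

    if (fclose(stream) != 0)       /* flushing buffered data can fail too */
        failed = 1;

    return failed ? -1 : 0;
}
```

As noted in the thread, this only tells you that something failed, not
when, so long-running output may still deserve earlier checks.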


Eric Sosman <> wrote:
>
> feof() and ferror() are "sticky:" if a transient failure bollixes
> one I/O operation and then a subsequent operation succeeds, the success
> of the second does not clear the stream's eof or error indicator. One
> situation where this arises with some frequency is in handling input
> from an interactive device that allows further input after an end-of-
> input indication like ^Z or ^D: One fgetc() could return EOF due to
> the transient end-of-input condition, and the next fgetc() could
> succeed and return an actual input character.

Not in a conforming C implementation. It's the underlying end-of-file
and error indicators that are sticky, not feof() and ferror(). And
fgetc() is required to fail if either of the indicators is set; it is
not allowed to return any subsequent input, even if there is some. If
you want to read past EOF (when that's possible), you have to call
clearerr() to reset the indicators first.
--
Larry Jones

Mom would be a lot more fun if she was a little more gullible. -- Calvin
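
A small sketch of the indicator behaviour described above: once fgetc()
has hit end-of-file, feof() keeps reporting it until the indicator is
cleared, and clearerr() is the direct way to clear it. The function name
drain_and_clear and its out-parameters are made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: read fp to end-of-file, then record what feof()
 * reports before and after clearerr() resets the stream's indicators. */
void drain_and_clear(FILE *fp, int *eof_before, int *eof_after)
{
    while (fgetc(fp) != EOF)
        ;                            /* discard everything up to EOF */

    *eof_before = (feof(fp) != 0);   /* indicator is set and stays set */
    clearerr(fp);                    /* clears both the EOF and error indicators */
    *eof_after = (feof(fp) != 0);
}
```

After the clearerr() call, a further fgetc() is free to try the
underlying source again, which is what makes reading past an interactive
^D or ^Z possible on systems that allow it.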

wrote:
> Eric Sosman <> wrote:
>> feof() and ferror() are "sticky:" if a transient failure bollixes
>> one I/O operation and then a subsequent operation succeeds, the success
>> of the second does not clear the stream's eof or error indicator. One
>> situation where this arises with some frequency is in handling input
>> from an interactive device that allows further input after an end-of-
>> input indication like ^Z or ^D: One fgetc() could return EOF due to
>> the transient end-of-input condition, and the next fgetc() could
>> succeed and return an actual input character.
>
> Not in a conforming C implementation. It's the underlying end-of-file
> and error indicators that are sticky, not feof() and ferror(). And
> fgetc() is required to fail if either of the indicators is set, it is
> not allowed to return any subsequent input, even if there is some. If
> you want to read past EOF (when that's possible), you have to call
> clearerr() to reset the indicators first.

I think you're right about feof(), because 7.19.7.1p2
says that fgetc() fails if the eof indicator is set, and all
the other input functions work "as if" by calling fgetc().
But I don't see any similar language about ferror() and the
error indicator. Can you offer a citation?
