Counting number of words in a text file...

Posted 18 December 2009 - 11:29 PM

Hi,

I was attempting to write a program that counts the number of words in a given text file (sorry, I didn't have the program write a .txt file before counting, so you'll probably need to create an output.txt file first). The problem is that the program does not report the right number of words in that file. (NOTE: this is not a homework assignment, just something I'm doing myself while I learn C programming.) Thanks in advance for all the help!

Replies To: Counting number of words in a text file...

Re: Counting number of words in a text file...

Posted 18 December 2009 - 11:35 PM

Welcome to DIC!

Please give us some more details of your problem.
( a ) Does your code compile?
( b ) Any errors or warnings? If there are then share them with us.
( c ) Is the program producing any output?
( d ) How is the actual output different to what you want / expect? Give details and, ideally, examples.

Re: Counting number of words in a text file...

Posted 18 December 2009 - 11:41 PM

In reply to janotte, on 18 Dec, 2009 - 10:35 PM:

Thank you for the quick response.

If you did not see the changes I made to my original post, please reread it.

1. My code compiles without any errors.
2. I am using Microsoft Visual C++ 2008 Express Edition to compile the code.
3. I made a simple output.txt file that contains only one word: "text".
4. When I run my code from the Windows Vista command prompt (CMD), the program tells me that the file has 2 words. That is wrong: a file containing only the word "text" has one word.

Re: Counting number of words in a text file...

So your test file "output.txt" contains only a single word?
Is that what this sentence means?

Yes, that is correct. When you open the output.txt file I am testing with, you see ONE SINGLE WORD. (I've attached the output.txt file if you still want to see it.)

Using if ((s = fgetc(f)) != ' ') makes the program report that output.txt has 4 words, and using if (s != ' ') reports 3 words. Both are incorrect.

As far as I understand, fgetc(f) returns a value for the character it reads, and s receives that value. I think that's right, but correct me if I'm wrong.
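That understanding is close. fgetc() returns the next character converted to an int (an unsigned char value), or the special negative value EOF at end of file or on a read error. Because EOF must be distinguishable from every real character, the variable that receives it should be an int, not a char: with a plain char, the EOF test either never fires (if char is unsigned) or fires early on a byte such as 0xFF (if char is signed). A minimal sketch of the usual reading loop; the function name count_chars is hypothetical:

```c
#include <stdio.h>

/* Hypothetical helper: counts every character in an already-opened stream.
   The return value of fgetc() is kept in an int so that EOF (a negative
   value) can be told apart from every real character. */
long count_chars(FILE *fp)
{
    long n = 0;
    int c;                              /* int, not char */

    while ((c = fgetc(fp)) != EOF)      /* stops at end of file or on a read error */
        n++;
    return n;
}
```

Note that a loop like this counts characters, not words; that is also what a loop which adds one for every non-space character effectively does.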

So here is my code once again in full (I've added comments this time, so you can see what I was thinking when I typed the code)

// #include "stdafx.h"  (needed only in Visual C++ projects that use precompiled headers)
#include <stdio.h>

int main(void)
{
    int count = 0;
    FILE *f;   // file pointer, so the program can keep track of the file being accessed
    int s;     // holds one character; int rather than char so the EOF test works

    f = fopen("output.txt", "r");   // open output.txt in read-only mode
    if (f == NULL)                  // stop if the file could not be opened
    {
        perror("output.txt");
        return 1;
    }
    while ((s = fgetc(f)) != EOF)   // read one character at a time until EOF is reached
    {
        if (s != ' ')
            count++;                // add one for every character that is not a space
    }
    fclose(f);
    printf("\n%d words in output.txt.\n", count);   // print the final value of the counter
    return 0;
}

I realize now what the problem is: I am just counting the characters (other than spaces) in the text file, not the words. So now I'm wondering how to fix the code so the program recognizes whether or not it is in the middle of a word and doesn't count that character. For example, if the file contains TEXT TEXT, the program should recognize the first T as the start of a word and count it, skip ahead until it reaches the third T, recognize that as the start of the second word and count it, and then stop.
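That idea can be written with a flag that remembers whether the previous character was part of a word, so that a word is counted only at its first character. A minimal sketch, assuming words are separated by any whitespace (spaces, tabs, newlines) as classified by isspace(); the function name count_words is hypothetical:

```c
#include <ctype.h>
#include <stdio.h>

/* Sketch of the "am I inside a word?" idea: a word starts whenever a
   non-whitespace character follows whitespace (or the start of the file).
   isspace() treats spaces, tabs and newlines as separators. */
long count_words(FILE *fp)
{
    long words = 0;
    int in_word = 0;                    /* 1 while we are in the middle of a word */
    int c;

    while ((c = fgetc(fp)) != EOF) {
        if (isspace(c)) {
            in_word = 0;                /* any whitespace ends the current word */
        } else if (!in_word) {
            in_word = 1;                /* first character of a new word */
            words++;
        }
    }
    return words;
}
```

With this approach, "TEXT TEXT" gives 2, extra spaces or blank lines between words do not change the count, and an empty file gives 0. In the program above, it would replace the while loop: call count_words(f) after the fopen() check and print the result.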

Thanks for being patient with me and helping me out one step at a time.

Re: Counting number of words in a text file...

Posted 19 December 2009 - 08:51 PM

Hi x2x3i5x,

I think you may want to take a look here:

while ((s = fgetc(f)) != EOF)   // read one character at a time until EOF is reached
{
    if (s != ' ')
        count++;                // add one for every character that is not a space
}

In the nested if statement if (s != ' '), it looks to me like you are checking whether s is not a space, and incrementing count if it isn't. Wouldn't you want to do the opposite if you are counting words? You would want to test whether s equals a space; if it does, you have reached the end of a word, and a new word will start on the next character.

So I think something like this would count the words:

while ((s = fgetc(f)) != EOF)   // read one character at a time until EOF is reached
{
    if (s == ' ')
        count++;                // add one to the counter for every space found
}

Now I didn't really test that, so I'm sorry if it doesn't work. But I hope you at least get the idea.

Re: Counting number of words in a text file...

Posted 19 December 2009 - 09:09 PM

In reply to Fib, on 19 Dec, 2009 - 07:51 PM:

No, it doesn't work. Now the program tells me I have no words in output.txt, but the file contains the one word "text", so the program gave the wrong answer.

Re: Counting number of words in a text file...

Posted 19 December 2009 - 09:30 PM

Hmm, I think if you initialize count to 1, it will be correct: if there is only one word in the text file, the loop never encounters a space, and if there are multiple words, it encounters one space between each pair. That's why I think count should be initialized to 1.

Try putting more words in the text file and testing it. Then let me know.
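Here is a minimal sketch of the "start at 1, add one per space" rule from this post, so its behaviour can be checked on a few inputs. The function name is hypothetical; the rule assumes exactly one space between words on a single line.

```c
#include <stdio.h>

/* Sketch of the "count spaces, start at 1" rule.
   It assumes exactly one space between words, so it is only correct
   for well-formed, single-line text. */
long count_words_by_spaces(FILE *fp)
{
    long count = 1;                     /* the first word has no space before it */
    int c;

    while ((c = fgetc(fp)) != EOF) {
        if (c == ' ')
            count++;                    /* each space is assumed to start a new word */
    }
    return count;
}
```

It handles "text" (1) and "one two three" (3), but it also shows the rule's limits: an empty file still reports 1, two spaces in a row add an extra word, and words separated only by a newline are not split at all.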

After double-checking, setting count = 0 worked fine. I don't know why it didn't work before, but oddly, it does now. I still have one problem: when there are extra spaces between words, the answer is off by one. If I write a sentence in the text file normally, the program counts the words correctly. How can I prevent the program from miscounting when there is (accidentally) an extra space before a word or between words?

Thanks for all the help so far!!

I did a problem like this, but with a string. Maybe you could read the file into a string and then use strtok() to count the words. Here is my code; maybe it could help you.
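The strtok() suggestion can also be applied without loading the whole file at once: read it line by line with fgets() and split each line into tokens. This is a minimal sketch, not the code mentioned in the post (which is not shown here); the function name and the 1024-byte buffer size are assumptions. A line longer than the buffer is read in pieces, so a word that straddles a piece boundary would be counted twice.

```c
#include <stdio.h>
#include <string.h>

#define LINE_BUF_SIZE 1024   /* assumed buffer size (hypothetical limit) */

/* Sketch of the strtok() idea: read the file one line at a time with
   fgets() and count the tokens in each line. Spaces, tabs and line
   endings are all treated as separators. */
long count_words_strtok(FILE *fp)
{
    char line[LINE_BUF_SIZE];
    const char *seps = " \t\r\n";
    long words = 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        /* strtok() modifies line in place, replacing separators with '\0' */
        char *tok = strtok(line, seps);
        while (tok != NULL) {
            words++;
            tok = strtok(NULL, seps);
        }
    }
    return words;
}
```

Because runs of separators produce no empty tokens, extra spaces and blank lines do not change the count, and an empty file gives 0.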

1. How can I get the program to correctly report 0 words when the text file is empty? I was thinking of either checking whether the first character is EOF (using fgetc) or checking whether string[0] is the null character, but somehow I can't get it working....

As it stands, the program always tells me (incorrectly) that the text file has 1 word when it is empty.

2. If the file has two words separated by a long empty line (see the attached file if you don't understand what I mean), the program reports only one word. How do I fix that?
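For question 1, the idea of checking the first character works if the character is pushed back afterwards with ungetc(), so the counting code still sees it. A minimal sketch; is_empty is a hypothetical name. (A counter that only counts actual tokens, rather than starting at 1, reports 0 for an empty file without any special case.) For question 2, a blank line between words is only a problem if the program splits on spaces alone; including '\n' (and '\r' for Windows line endings) among the separators should fix it.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if the next read hits end of file
   (or a read error), 0 otherwise. The character that was peeked at is
   pushed back with ungetc(), so later reads still see it. */
int is_empty(FILE *fp)
{
    int c = fgetc(fp);
    if (c == EOF)
        return 1;
    ungetc(c, fp);          /* put the character back for the next fgetc() */
    return 0;
}
```

Calling is_empty(f) right after fopen() and printing 0 words when it returns 1 covers the empty-file case without changing the rest of the counting loop.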

Your program does work, but I have a few questions:
1. What is size_t n;? If I get rid of that and simply declare n after r in the int declaration, the code still works.
2. What does "%*100s" mean where you wrote "r = fscanf(fp, "%*100s""?

Re: Counting number of words in a text file...

Posted 24 December 2009 - 12:53 AM

x2x3i5x said:

what is size_t n;?

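If the file can have more than 32,767 words, int may not be big enough: 32,767 is the smallest maximum the C standard guarantees for int (most current compilers give 2,147,483,647). size_t is an unsigned type; on a typical 32-bit system it can count up to 4,294,967,295 words. (Strictly speaking, it is not the greatest unsigned type; that is uintmax_t.)
You can also use the long int type, which is guaranteed to hold at least 4294967296/2 - 1 = 2,147,483,647 words.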

x2x3i5x said:

what does "%*100s" mean

* : assignment suppression; the word is read but not saved anywhere, so no variable is needed for it.
100 : the maximum length of a word. A word longer than 100 characters is split into pieces of at most 100 characters, each counted as a separate word; a five-character word takes only those five characters and no more.
There is also a variant that uses flags, but I thought it might be harder for you.
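The fscanf() code discussed here is not shown in the thread, so here is a minimal sketch of how such a loop could look; the function name count_words_fscanf is hypothetical, and the original loop may differ. Because the * suppresses storing, a successful call returns 0 (no assignments), and the call returns EOF once only whitespace or nothing remains, so counting calls until EOF counts the words.

```c
#include <stdio.h>

/* Sketch of the fscanf() approach. "%*100s" skips leading whitespace,
   then reads up to 100 non-whitespace characters and discards them.
   A successful call returns 0 (nothing was stored); at end of input
   it returns EOF. */
size_t count_words_fscanf(FILE *fp)
{
    size_t n = 0;                       /* unsigned; 32 or 64 bits on typical desktop systems */

    while (fscanf(fp, "%*100s") != EOF)
        n++;                            /* one call per word (or per 100-character piece) */
    return n;
}
```

This also makes the splitting behaviour concrete: a 150-character word is read as a 100-character piece plus a 50-character piece and counted twice. To print a size_t, C99 provides printf("%zu", n); older compilers such as Visual C++ 2008 may not support %zu, in which case printf("%lu", (unsigned long)n) works.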