I have been playing around with parsing text files. Basically, what I want to do is create a tokenizer (see "Lexical analysis" on Wikipedia), but I am unsure what the best way to do it is. Right now I read one character at a time and build the tokens from that, but I am not sure that is very efficient. Does anyone have an idea? What is the best way to do this for a beginner? This is something I am quite new to.

Also, I have been wanting to try the java.nio package. Would this be a good use of it?

And if you wonder about any code: my code is so far from functional that it wouldn't serve any actual purpose to post it. Quite literally, it does nothing at all.

To clarify, though, I didn't mean that I am new to IO, but to parsing text files. I am really not sure how to do it (and I am of course talking about the IO part of parsing a text file). But looking at that link, I think it will still help me :). Thanks! I will return if I have more questions.

December 24th, 2011, 11:14 AM

Norm

Re: What is the best way to read from a text file?

Quote:

the IO part of parsing a text file

Do you mean you want to use file I/O to move around in the text on disk, versus reading the text into a buffer/array/String and moving around in the text in memory?

December 26th, 2011, 07:40 AM

Kerr


The second one, I guess. Let me take an example. If the first character the tokenizer finds is a letter, it will create a string containing that letter and all the letters that follow. When it encounters something that is not a letter, it will stop building the string and return it. That string is a token. The next time the tokenizer is called, it will continue from where it stopped. This time it may encounter a digit, so it will create a new string from that digit and all the digits that directly follow it, until it encounters a character that is not a digit. And so on, until the end of the file is reached.

I am just not sure what the best way to do something like this is, or whether this is the best approach.
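
A minimal sketch of the approach described above, reading one character at a time from a Reader. The class name SimpleTokenizer, the whitespace skipping, and the rule that any other character becomes a one-character token are assumptions made for illustration; they are not from the post.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: groups runs of letters and runs of digits into tokens. */
class SimpleTokenizer {
    private final Reader in;
    private int next; // one character of lookahead, or -1 at end of input

    SimpleTokenizer(Reader in) throws IOException {
        this.in = in;
        this.next = in.read();
    }

    /** Returns the next token, or null once the input is exhausted. */
    String nextToken() throws IOException {
        // Assumption: whitespace only separates tokens and is never returned.
        while (next != -1 && Character.isWhitespace(next)) {
            next = in.read();
        }
        if (next == -1) {
            return null;
        }
        StringBuilder token = new StringBuilder();
        if (Character.isLetter(next)) {
            // Collect the letter and all letters that directly follow it.
            while (next != -1 && Character.isLetter(next)) {
                token.append((char) next);
                next = in.read();
            }
        } else if (Character.isDigit(next)) {
            // Collect the digit and all digits that directly follow it.
            while (next != -1 && Character.isDigit(next)) {
                token.append((char) next);
                next = in.read();
            }
        } else {
            // Assumption: anything else is a one-character token.
            token.append((char) next);
            next = in.read();
        }
        return token.toString();
    }

    /** Convenience helper: tokenizes a whole String held in memory. */
    static List<String> tokenize(String text) {
        try {
            SimpleTokenizer t = new SimpleTokenizer(new StringReader(text));
            List<String> tokens = new ArrayList<>();
            for (String tok = t.nextToken(); tok != null; tok = t.nextToken()) {
                tokens.add(tok);
            }
            return tokens;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(tokenize("abc123 + x9")); // prints [abc, 123, +, x, 9]
    }
}
```

To read from a file instead, the same class could be given a BufferedReader wrapped around a FileReader; the tokenizer itself only depends on Reader.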

December 26th, 2011, 08:09 AM

Norm


That sounds like a reasonable approach.
The String class has methods for looking at the contents of a String and for extracting parts of the String.

December 26th, 2011, 08:10 AM

Mr.777


Quote:

Originally Posted by Kerr

The second one, I guess. Let me take an example. If the first character the tokenizer finds is a letter, it will create a string containing that letter and all the letters that follow. When it encounters something that is not a letter, it will stop building the string and return it. That string is a token. The next time the tokenizer is called, it will continue from where it stopped. This time it may encounter a digit, so it will create a new string from that digit and all the digits that directly follow it, until it encounters a character that is not a digit. And so on, until the end of the file is reached.

I am just not sure what the best way to do something like this is, or whether this is the best approach.

What's your approach? What constraints do you need to follow? It varies with how developers think.

December 26th, 2011, 08:27 AM

Kerr


Quote:

Originally Posted by Norm

That sounds like a reasonable approach.
The String class has methods for looking at the contents of a String and for extracting parts of the String.

I suppose I could just read the entire file into a string and then move through it. Would that be better than using a BufferedReader (like I am now)?

December 26th, 2011, 08:29 AM

Kerr


Quote:

Originally Posted by Mr.777

What's your approach? What constraints do you need to follow? It varies with how developers think.

I am not sure, hence the question. I am quite new to this, and my way of learning tends to be to jump in head first and see what works and what does not. So you could say that this is just one giant training exercise for me :p.

The closest thing to an approach is the one I describe above, though.

December 26th, 2011, 08:45 AM

Mr.777


Quote:

Originally Posted by Kerr

I suppose I could just read the entire file into a string and then move through it. Would that be better than using a BufferedReader (like I am now)?

Well, reading the entire file into a String can be quite an expensive operation. Suppose the file is 600 MB (I know that is huge, but let's assume it).
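
To make the trade-off concrete, here is a small sketch that contrasts loading a whole file into a String with streaming it through a BufferedReader. It uses java.nio.file.Files for both; the class name, the helper names, and the UTF-8 encoding are assumptions chosen for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch comparing whole-file loading with streaming. */
class WholeFileVsStream {
    /** Loads the entire file into memory; memory use grows with the file size. */
    static String readAll(Path file) {
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Streams the file one character at a time; only the reader's buffer is held. */
    static long countChars(Path file) {
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            long count = 0;
            while (in.read() != -1) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes the given text to a temporary file so the example is self-contained. */
    static Path writeTemp(String content) {
        try {
            Path tmp = Files.createTempFile("tokens", ".txt");
            Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = writeTemp("let x = 42");
        System.out.println(readAll(tmp).length() + " " + countChars(tmp)); // prints 10 10
        Files.delete(tmp);
    }
}
```

For small source files either version is likely fine; the streaming version matters mainly when the input may be very large.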

December 26th, 2011, 08:57 AM

Kerr


Quote:

Originally Posted by Mr.777

Well, reading the entire file into a String can be quite an expensive operation. Suppose the file is 600 MB (I know that is huge, but let's assume it).

I know, which is one of the reasons I hesitate to do that. I don't think it is too much of a problem, since this is not a serious project, but I prefer to do it in a good way. At the moment I am using a BufferedReader, and I think I may stick with that.

December 26th, 2011, 09:19 AM

Norm


You can do some buffering yourself. Just as the getToken method moves through a String you have read, token by token, you can call the BufferedReader methods to get the next line once the current String in memory has been used up. That way you go through the lines in the file one by one.
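
One possible shape for this, as a rough sketch: keep the current line and a position in it, scan tokens with String methods such as charAt and substring, and call readLine only when the line is used up. Only the name getToken comes from the post; the class name, the fields, and the token rules are assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: holds one line in memory and refills it from the reader. */
class LineBufferedTokenizer {
    private final BufferedReader in;
    private String line = ""; // the line currently being scanned
    private int pos = 0;      // index of the next unread character in line

    LineBufferedTokenizer(BufferedReader in) {
        this.in = in;
    }

    /** Returns the next token, or null once the input is exhausted. */
    String getToken() {
        // Skip whitespace; when the current line is used up, read the next one.
        while (true) {
            while (pos < line.length() && Character.isWhitespace(line.charAt(pos))) {
                pos++;
            }
            if (pos < line.length()) {
                break;
            }
            try {
                line = in.readLine();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            pos = 0;
            if (line == null) {
                line = ""; // keep later calls safe; they will also return null
                return null;
            }
        }
        int start = pos;
        char first = line.charAt(pos);
        if (Character.isLetter(first)) {
            while (pos < line.length() && Character.isLetter(line.charAt(pos))) pos++;
        } else if (Character.isDigit(first)) {
            while (pos < line.length() && Character.isDigit(line.charAt(pos))) pos++;
        } else {
            pos++; // assumption: any other character is a one-character token
        }
        return line.substring(start, pos);
    }

    /** Convenience helper: tokenizes text that may span several lines. */
    static List<String> tokenize(String text) {
        LineBufferedTokenizer t =
                new LineBufferedTokenizer(new BufferedReader(new StringReader(text)));
        List<String> tokens = new ArrayList<>();
        for (String tok = t.getToken(); tok != null; tok = t.getToken()) {
            tokens.add(tok);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("foo 12\nbar7")); // prints [foo, 12, bar, 7]
    }
}
```

Note that with this design a token can never span a line break, because readLine drops the line terminator. That may or may not suit a given language.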

December 26th, 2011, 10:13 AM

Kerr


Quote:

Originally Posted by Norm

You can do some buffering yourself. Just as the getToken method moves through a String you have read, token by token, you can call the BufferedReader methods to get the next line once the current String in memory has been used up. That way you go through the lines in the file one by one.

I don't think I have to read the file line by line, to be honest. BufferedReader has a rather large internal buffer (8192 characters by default) that it appears to fill, so that should be more than enough :).

December 26th, 2011, 10:18 AM

Norm


I don't know how you would get at the data in the BufferedReader's buffer without calling a read method, which removes the data from the buffer. One of the constructors allows you to specify the size of the buffer.
I don't see how the amount that is buffered is relevant to your project.
You could use the mark and reset methods to move around in the contents of the file.
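
As an illustration of mark and reset, the sketch below uses them to look at the next character without consuming it, which is handy when a token ends at the first character that does not belong to it. The names PeekDemo, peek, readWord, and demo are hypothetical, and the buffer size of 16 is arbitrary; it is there only to show the two-argument constructor.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical sketch: one-character lookahead built on mark() and reset(). */
class PeekDemo {
    /** Returns the next character without consuming it, or -1 at end of input. */
    static int peek(BufferedReader in) throws IOException {
        in.mark(1);        // remember this position; allow 1 character of read-ahead
        int c = in.read(); // read one character (or -1)
        in.reset();        // rewind to the marked position
        return c;
    }

    /** Reads a run of letters and leaves the character after it unread. */
    static String readWord(BufferedReader in) throws IOException {
        StringBuilder word = new StringBuilder();
        int c;
        while ((c = peek(in)) != -1 && Character.isLetter(c)) {
            word.append((char) in.read());
        }
        return word.toString();
    }

    /** Returns the leading word, a '|' separator, and the character that follows the word. */
    static String demo(String text) {
        // The second constructor argument sets the buffer size in characters.
        try (BufferedReader in = new BufferedReader(new StringReader(text), 16)) {
            String word = readWord(in);
            int following = in.read(); // still available, because peek did not consume it
            return word + "|" + (following == -1 ? "" : String.valueOf((char) following));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo("count=5")); // prints count|=
    }
}
```

Keeping a single lookahead character in a field, instead of calling mark and reset, would achieve the same effect with fewer calls; which one reads better is a matter of taste.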

December 26th, 2011, 11:57 AM

Kerr


Quote:

Originally Posted by Norm

I don't know how you would get at the data in the BufferedReader's buffer without calling a read method, which removes the data from the buffer. One of the constructors allows you to specify the size of the buffer.
I don't see how the amount that is buffered is relevant to your project.
You could use the mark and reset methods to move around in the contents of the file.

I think I misunderstood your post, lol. I thought you meant that I should read an entire line (the "readLine" method) and then go through it, rather than just go through the characters directly (the "read" method) :P.

I have to confess that my brain is not functioning that well at the moment, because I am rather tired.

December 26th, 2011, 12:02 PM

Norm


I'm sure there are many different ways to scan through the characters coming from a file.
readLine gives you a String and skips over the end-of-line characters.
read() gives you a single character (or fills an array, with the other read() method).

It's up to you.
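
For reference, a short sketch of the three calls side by side on the same BufferedReader. The class and method names are made up for the example, and the input comes from a StringReader so that it runs without a file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical sketch: readLine(), read(), and read(char[], int, int) compared. */
class ReadVariants {
    static String demo(String text) {
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            // readLine returns the line without its terminator, or null at end of input.
            String firstLine = in.readLine();
            // read() returns one character as an int, or -1 at end of input.
            int single = in.read();
            // read(char[], int, int) fills part of an array and returns how many
            // characters were stored, or -1 at end of input.
            char[] buf = new char[8];
            int n = in.read(buf, 0, buf.length);
            String rest = n > 0 ? new String(buf, 0, n) : "";
            return firstLine + "/" + (char) single + "/" + rest;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo("ab\ncdef")); // prints ab/c/def
    }
}
```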

December 26th, 2011, 12:08 PM

Kerr


Quote:

Originally Posted by Norm

I'm sure there are many different ways to scan through the characters coming from a file.
readLine gives you a String and skips over the end-of-line characters.
read() gives you a single character (or fills an array, with the other read() method).

It's up to you.

Since I would have to step through the characters anyway, I figured I might just as well use the read() method.

December 26th, 2011, 12:10 PM

Norm


It makes sense to do it whichever way you find easiest.

December 29th, 2011, 10:57 PM

bgroenks96


How you go about handling this task depends on what the goal is. What do you want to do with the parsed information? Write it to another file?

If you want to keep it in memory, I recommend StringBuilder (java.lang.StringBuilder)

January 4th, 2012, 10:26 AM

piulitza


There is one more way to read from a file: using the Scanner class. You can use this constructor:

Code :

Scanner in = new Scanner (new File(url));

After that you can read Strings, ints, or doubles from the file, just as you would when using this class to read from the keyboard. The class also has a hasNext() method, which checks whether there is still input left before the end of the file. I find this the easiest way to read from a .txt file.
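
A slightly fuller sketch of the Scanner approach, under a few assumptions: the class name and the summarize helper are invented for the example, the locale is pinned to Locale.ROOT so that doubles use a decimal point, and main writes its own temporary file so the example is self-contained.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Locale;
import java.util.Scanner;

/** Hypothetical sketch: reads whitespace-separated items with Scanner. */
class ScannerDemo {
    /** Labels each item as an int, a double, or a plain word. */
    static String summarize(Scanner in) {
        in.useLocale(Locale.ROOT); // assumption: numbers use '.' as the decimal separator
        StringBuilder out = new StringBuilder();
        while (in.hasNext()) { // false once no more tokens remain
            if (in.hasNextInt()) {
                out.append("int:").append(in.nextInt());
            } else if (in.hasNextDouble()) {
                out.append("double:").append(in.nextDouble());
            } else {
                out.append("word:").append(in.next());
            }
            out.append(' ');
        }
        return out.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("scanner", ".txt");
        Files.write(file.toPath(), "x 42 2.5".getBytes(StandardCharsets.UTF_8));
        try (Scanner in = new Scanner(file, "UTF-8")) {
            System.out.println(summarize(in)); // prints word:x int:42 double:2.5
        }
        file.delete();
    }
}
```

Scanner splits on whitespace by default, so input such as "abc123" comes back as a single item. A tokenizer that must split letters from digits would still need its own character-level logic.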

January 4th, 2012, 05:41 PM

Kerr


Ok, I may have finished the tokenizer class now. I don't think it's perfect, far from it, so I thought I would post it here, and if anyone has any input, feel free to give it :). It is meant to be part of a simple scripting language I am making. The tokenizer divides a source file into a stream of tokens, which can then be used to build an abstract syntax tree. I have some issues of my own with the current implementation of the class. For example, I use some static inner classes and then create static instances of them, which feels odd (they are stateless, so I figured it is better to have one instance rather than create a new one every time they are needed... which is a lot). Also, I am rather terrible at comments.
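
On the point about stateless helpers with a single shared instance: one common Java idiom for that is an enum, where each constant is created exactly once. The sketch below is purely illustrative; the names CharClasses, CharClass, Kind, and classify are hypothetical and are not taken from the tokenizer described in the post.

```java
/** Hypothetical sketch: stateless character classifiers as enum constants. */
class CharClasses {
    /** A stateless test for whether a character belongs to some token kind. */
    interface CharClass {
        boolean matches(int c);
    }

    // Each enum constant is a single shared instance, so nothing is re-created per token.
    enum Kind implements CharClass {
        LETTER {
            @Override
            public boolean matches(int c) {
                return Character.isLetter(c);
            }
        },
        DIGIT {
            @Override
            public boolean matches(int c) {
                return Character.isDigit(c);
            }
        }
    }

    /** Returns the first kind that matches the character, or null if none does. */
    static Kind classify(int c) {
        for (Kind k : Kind.values()) {
            if (k.matches(c)) {
                return k;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(classify('a') + " " + classify('7') + " " + classify('+'));
        // prints LETTER DIGIT null
    }
}
```

Static final instances of static nested classes work just as well; the enum mainly saves the boilerplate and makes the fixed set of kinds explicit.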

Quote:

Originally Posted by bgroenks96

How you go about handling this task depends on what the goal is. What do you want to do with the parsed information? Write it to another file?

If you want to keep it in memory, I recommend StringBuilder (java.lang.StringBuilder)

I want to keep it in memory. I am creating a primitive scripting language (for training purposes; it's nothing serious). StringBuilder is the way to go for that, I guess.

Quote:

Originally Posted by piulitza

There is one more way to read from a file: using the Scanner class. You can use this constructor:

Code :

Scanner in = new Scanner (new File(url));

After that you can read Strings, ints, or doubles from the file, just as you would when using this class to read from the keyboard. The class also has a hasNext() method, which checks whether there is still input left before the end of the file. I find this the easiest way to read from a .txt file.

I can check it out, but I am not sure I will use it, since I may need more in-depth control of how things are read from the file (and because I kind of like doing things the hard way :p).