I have an input file with junk characters in it (see the attached file; you will see some junk characters in it).

These characters are causing problems in the workflow.

I want to remove these characters with a Java program that scans the file and removes any junk character it finds.

Can anyone tell me the right way to do this? (I noticed that the editor I am using to post this thread strips these junk characters when I paste them in.)

On the command line, I was able to do this with:
sed -i 's/\o013//g' RawInput.txt
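For comparison, a minimal Java counterpart of that sed one-liner could look like the sketch below. It assumes the file is small enough to read into memory, and `StripVerticalTab` and `strip` are hypothetical names. The Java string `"\013"` is an octal escape for the same character as sed's `\o013` (vertical tab, code 11), and `String.replace` removes every occurrence in the file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch of a Java equivalent of the sed command: delete every vertical tab.
final class StripVerticalTab {

    // Removes every vertical tab (octal 013, decimal 11) from the text.
    static String strip(String text) {
        return text.replace("\013", "");
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the file name from the sed example; pass another as args[0].
        Path file = Paths.get(args.length > 0 ? args[0] : "RawInput.txt");
        // ISO-8859-1 maps each byte to one char, so all other bytes round-trip unchanged.
        String text = new String(Files.readAllBytes(file), StandardCharsets.ISO_8859_1);
        // Overwrites the file in place, like sed -i.
        Files.write(file, strip(text).getBytes(StandardCharsets.ISO_8859_1));
    }
}
```

Like `sed -i`, this overwrites the input file, so keep a backup while testing.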

But I want a general program that removes all junk characters, not just this one.
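One way to generalize this, sketched under an assumption: treat as junk every ASCII control character except tab, line feed and carriage return (vertical tab, octal 013, is one of them). The class name `JunkCharRemover` is hypothetical, and the definition of "junk" is kept in a single regular expression so it is easy to change once you know exactly which characters are bad.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Pattern;

final class JunkCharRemover {

    // ASSUMPTION: "junk" = ASCII control characters (0x00-0x1F and 0x7F)
    // except tab, line feed and carriage return. Adjust to match your data.
    private static final Pattern JUNK = Pattern.compile("[\\p{Cntrl}&&[^\\t\\n\\r]]");

    // Returns the text with every junk character removed.
    static String clean(String text) {
        return JUNK.matcher(text).replaceAll("");
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java JunkCharRemover <input> <output>");
            System.exit(1);
        }
        Path in = Paths.get(args[0]);
        Path out = Paths.get(args[1]);
        // ISO-8859-1 maps each byte to exactly one char, so decoding never fails
        // and every byte that is not junk is written back unchanged.
        String text = new String(Files.readAllBytes(in), StandardCharsets.ISO_8859_1);
        Files.write(out, clean(text).getBytes(StandardCharsets.ISO_8859_1));
    }
}
```

Usage: `java JunkCharRemover RawInput.txt CleanInput.txt` (the output name is just an example). Because the removed bytes are all below 0x80, they never occur inside UTF-8 multi-byte sequences, so non-ASCII text in a UTF-8 file survives the round trip intact.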

November 10th, 2009, 10:52 AM

helloworld922

Re: How to eliminate junk characters from the file...

If you can define precisely what a junk character is, it's possible to come up with a good regular expression that finds the junk characters and removes them (or you could simply parse the input by hand and drop them). Are they anything that doesn't look like a path? Or anything preceded by a comma (including the comma)? Are junk characters always separated from the good characters by whitespace? (I'm guessing not, since there's some ",8" stuff after the paths, but I don't know whether that's really junk.)
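The "parse it by hand" option can be sketched as a whitelist: instead of listing the bad characters, keep only the ones you know are good. The allowed set below (printable ASCII plus tab, line feed and carriage return) is an assumption for illustration, and `WhitelistFilter`, `isAllowed` and `keepAllowed` are hypothetical names.

```java
final class WhitelistFilter {

    // HYPOTHETICAL whitelist: printable ASCII (space through '~') plus
    // tab, line feed and carriage return. Everything else counts as junk.
    static boolean isAllowed(char c) {
        return (c >= ' ' && c <= '~') || c == '\t' || c == '\n' || c == '\r';
    }

    // Scans the text one character at a time and keeps only allowed characters.
    static String keepAllowed(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isAllowed(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The vertical tab (octal 013) is dropped.
        System.out.println(keepAllowed("abc\013def"));
    }
}
```

This prints `abcdef`. The same whitelist written as a regular expression would be `text.replaceAll("[^\\x20-\\x7E\\t\\n\\r]", "")`. Note that a whitelist like this also drops legitimate non-ASCII characters such as accented letters, and that no character-level filter can decide whether something like ",8" is junk; that depends on the file's format.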