Doug Cutting
added a comment - 07/Jun/10 23:11 For this issue, I'm imagining a Java tool. A similar C tool would also be tremendously useful.
Some potential details:
The Avro schema used would simply be "bytes".
Compression would be enabled by default.
Input and output could be files named on the command line, or standard input and standard output.
Hadoop URIs should be accepted as input and output, so that one can use this to, e.g., pipe output to a compressed, splittable file in HDFS.
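To make the "bytes" schema concrete, here is a minimal sketch of the reading half of such a tool, assuming each line of text becomes one bytes datum (Avro's Java mapping represents bytes as java.nio.ByteBuffer). The class and method names are hypothetical, lines are assumed to end in '\n', and the Avro writer, compression and Hadoop URI handling are left out.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Avro: splits a text stream into one
// ByteBuffer per line, ready to be appended as "bytes" data.
final class LineRecordsSketch {

  static List<ByteBuffer> readLines(InputStream rawIn) throws IOException {
    // Buffer the input so that each read() does not become a system call.
    InputStream in = new BufferedInputStream(rawIn);
    List<ByteBuffer> records = new ArrayList<>();
    ByteArrayOutputStream line = new ByteArrayOutputStream();
    boolean pending = false; // true while a line has unterminated content

    int b;
    while ((b = in.read()) != -1) {
      if (b == '\n') {
        // End of line: emit the accumulated bytes (possibly empty).
        records.add(ByteBuffer.wrap(line.toByteArray()));
        line.reset();
        pending = false;
      } else {
        line.write(b);
        pending = true;
      }
    }
    // Emit a final line that has no trailing newline.
    if (pending) {
      records.add(ByteBuffer.wrap(line.toByteArray()));
    }
    return records;
  }
}
```

Empty lines are kept as empty records, so the line count of the text and the record count of the resulting file stay equal.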

Doug Cutting
added a comment - 11/Jun/10 19:50 Looks good! A few nits:
I prefer naming these "totext" and "fromtext", or perhaps "tolines" and "fromlines", and classes named ToTextTool and FromTextTool. Avro's implicit.
System.getProperty("line.separator").getBytes() should be stored in a constant.
inStream and outStream should be buffered for performance so that every call to read or write doesn't result in a system call. DataFileStream doesn't automatically add buffering. Hadoop streams are always buffered, and DataFileWriter adds buffering, but adding it redundantly shouldn't cause problems either.
compressionLevel should be optional, no? You can use withOptionalArg, e.g.:
OptionSpec<Integer> level = p.accepts("level", "compression level")
    .withOptionalArg().ofType(Integer.class);
OptionSet opts = p.parse(...);
if (opts.hasArgument(level))
  compressionLevel = level.value(opts);
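
The constant and buffering nits could look roughly like the following. This is a sketch using only java.io; the class and method names are hypothetical and are not taken from the patch.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical illustration of the two nits, not code from the patch.
final class TextStreamSketch {

  // Looked up once, instead of calling System.getProperty for every line.
  // getBytes() with no argument uses the platform default charset.
  static final byte[] LINE_SEPARATOR =
      System.getProperty("line.separator").getBytes();

  // Wrapping is cheap, and wrapping an already-buffered stream is harmless.
  static InputStream buffered(InputStream in) {
    return new BufferedInputStream(in);
  }

  static OutputStream buffered(OutputStream out) {
    return new BufferedOutputStream(out);
  }

  // Writes each record followed by the platform line separator.
  static void writeLines(Iterable<byte[]> records, OutputStream rawOut)
      throws IOException {
    OutputStream out = buffered(rawOut);
    for (byte[] record : records) {
      out.write(record);
      out.write(LINE_SEPARATOR);
    }
    // Flush so buffered bytes reach the underlying stream; the caller
    // remains responsible for closing rawOut.
    out.flush();
  }
}
```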

Doug Cutting
added a comment - 11/Jun/10 21:27 I think the default compression level should be 1: fast, but compressed.
Also, where do we document that '-' means standard in or standard out? The Util class is package-private, so that doesn't count. Perhaps we should add it to the help string?
Other than that, +1.
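
One way to make the '-' convention visible is to put it in the help string next to the code that interprets it. The sketch below assumes that; the usage text, class name and method names are hypothetical, and Hadoop URIs are not handled.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper, not Avro's package-private Util class.
final class StdStreamsSketch {

  static final String STD_MARKER = "-";

  // Hypothetical usage text: the point is only that '-' is documented here.
  static final String HELP =
      "Usage: fromtext [--level N] <input|-> <output|->\n"
      + "  '-' means standard input (as input) or standard output (as output).";

  static boolean isStd(String name) {
    return STD_MARKER.equals(name);
  }

  // Opens the named file, or standard input when the name is "-".
  static InputStream openInput(String name) throws IOException {
    InputStream raw = isStd(name) ? System.in : new FileInputStream(name);
    return new BufferedInputStream(raw);
  }

  // Opens the named file, or standard output when the name is "-".
  static OutputStream openOutput(String name) throws IOException {
    OutputStream raw = isStd(name) ? System.out : new FileOutputStream(name);
    return new BufferedOutputStream(raw);
  }
}
```

Keeping the marker, the help text and the open methods in one place means the documentation cannot drift from the behaviour without someone noticing.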