The expense of ByteArrayOutputStream is the resizing of the underlying array; your fixed-block routine eliminates much of that. If the resizing isn't expensive enough to matter (i.e. in your testing ByteArrayOutputStream is "fast enough" and doesn't cause undue memory pressure), then subclassing ByteArrayOutputStream, as suggested by vanza, may work for you.
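A minimal sketch of that subclassing approach: ByteArrayOutputStream keeps its internal array (`buf`) and write count (`count`) as protected fields, so a subclass can expose the filled region directly and skip the extra copy that `toByteArray()` makes. The class name and `buffer()` accessor here are my own, not from the question:

```java
import java.io.ByteArrayOutputStream;

// Subclass that presizes the buffer and exposes it without a defensive copy.
class DirectByteArrayOutputStream extends ByteArrayOutputStream {
    DirectByteArrayOutputStream(int initialCapacity) {
        super(initialCapacity); // one allocation up front avoids most resizing
    }

    // The live internal array; the valid bytes are [0, size()).
    byte[] buffer() {
        return buf;
    }
}

public class Demo {
    public static void main(String[] args) {
        DirectByteArrayOutputStream out = new DirectByteArrayOutputStream(1024);
        out.write(42);
        out.write(7);
        System.out.println(out.size());      // prints 2
        System.out.println(out.buffer()[0]); // prints 42
    }
}
```

The caveat is that the returned array is live: writing more data may replace it, so callers must use it before the stream is touched again.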

I don't know your compression algorithm, so I can't say why your list of blocks is making it less flexible, or why the compression algorithm would need to know about the blocks at all. But since the block size can be dynamic, you may be able to tune it to better suit the compression algorithm you're using.

If the compression algorithm can work on a "stream" (i.e. fixed-size chunks of data), then the block size shouldn't matter, since you can hide all of those details from the implementation. The ideal case is when the compression algorithm wants its data in chunks that match the size of the blocks you're allocating; that way you wouldn't have to copy data to feed the compressor.
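To illustrate hiding those details, here is a sketch of an OutputStream that collects bytes into fixed-size blocks and hands each complete block off whole, e.g. to the compressor. The class name and the `Consumer` callback are assumptions for the sketch, not anything from the question:

```java
import java.io.OutputStream;
import java.util.function.Consumer;

// Hides the block list behind a plain OutputStream. If blockSize matches the
// chunk size the compressor wants, each full block is handed over uncopied.
class BlockedOutputStream extends OutputStream {
    private final int blockSize;
    private final Consumer<byte[]> onFullBlock; // e.g. feeds the compressor
    private byte[] current;
    private int pos;

    BlockedOutputStream(int blockSize, Consumer<byte[]> onFullBlock) {
        this.blockSize = blockSize;
        this.onFullBlock = onFullBlock;
        this.current = new byte[blockSize];
    }

    @Override
    public void write(int b) {
        current[pos++] = (byte) b;
        if (pos == blockSize) {          // block is full: hand it off whole
            onFullBlock.accept(current);
            current = new byte[blockSize];
            pos = 0;
        }
    }
}
```

Callers just see an OutputStream; whether the blocks are 4 KB or 64 KB is an internal tuning knob.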

While you can certainly use an ArrayList for this, you're looking at a memory overhead of roughly 4-8 times, assuming the Byte objects aren't newly allocated but come from the shared cache (this is how small Integer values work, and the same cache exists for Byte), and you lose all cache locality.

You could subclass ByteArrayOutputStream, but even there you get overhead you don't need (the methods are synchronized). So I personally would just roll my own class that grows dynamically when you write to it. That's less efficient than your current method, but simple, and we all know how the amortized costs work out; otherwise you can obviously use your solution as well. As long as you wrap the solution in a clean interface, you'll hide the complexity and still get the good performance.
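A minimal sketch of such a roll-your-own class, doubling the backing array on overflow for amortized O(1) appends and skipping ByteArrayOutputStream's synchronization (the class name is mine):

```java
import java.util.Arrays;

// Unsynchronized growable byte buffer: doubling on overflow gives amortized
// constant-time appends without ByteArrayOutputStream's synchronized methods.
class GrowableByteBuffer {
    private byte[] data = new byte[16];
    private int size;

    void append(byte b) {
        if (size == data.length) {
            data = Arrays.copyOf(data, data.length * 2); // amortized doubling
        }
        data[size++] = b;
    }

    int size() {
        return size;
    }

    // Copies out only the bytes actually written.
    byte[] toArray() {
        return Arrays.copyOf(data, size);
    }
}
```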

Put another way: no, you pretty much can't do this more efficiently than what you're already doing, and every built-in Java collection will perform worse for one reason or another.

Well, for whatever reason the methods in the class are declared synchronized, although I think you can override synchronized methods with unsynchronized ones? Not sure. If not, the synchronization overhead is quite useless for your situation.
– Voo Jun 26 '11 at 3:23

Well, it takes 4-8 times the memory and you lose the cache locality. Worse yet, you eliminate any chance for the JIT to vectorize the code, or at least to work on larger values at once (i.e. you can't work on word-sized values and do some bit tricks). "Slightly" worse may be a bit optimistic.
– Voo Jun 26 '11 at 1:12

True enough, thanks for the info. This would be one of the worst solutions if performance is a concern.
– Calvin Jun 26 '11 at 1:28