Hi,
I'm having some trouble with JAI: I would like to read a huge BMP file (~4 GB) and write it into a pyramid TIFF. I used to process the BMP file with a BufferedImage, but that no longer works with big files (memory problems):

Rather than calling mosaic.setSource() inside the loop, you should call
pbMosaic.setSource() in the loop and then one mosaic.setParameterBlock()
afterwards. It should be more efficient that way, although perhaps not
by much if no rendering is done in the meantime.
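A sketch of that reordering (untested; `mosaic` is assumed to be the RenderedOp and `sources` the list of translated inputs from the pseudo code):

```java
// Build the ParameterBlock up completely first,
// then notify the mosaic node exactly once.
ParameterBlockJAI pbMosaic = new ParameterBlockJAI("mosaic");
for (int i = 0; i < sources.size(); i++) {
    pbMosaic.setSource(sources.get(i), i);  // cheap; nothing renders here
}
mosaic.setParameterBlock(pbMosaic);         // one update instead of N
```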

I think what you need to do is to set a RenderingHint on the mosaic
operator to tell it to use 1024x1024 tiles. It could be that it's
getting an inappropriate tile size by default (which may even be the
entire image). You can check this by asking mosaic what its tile size
is and printing it out. You probably should also tell the mosaic what
size to be via RenderingHints (see the MosaicDescriptor docs). It
"should" figure it out automatically, but it's safer to be explicit.
Again, print out what it thinks the size is first as a test, to see if
it got it right. If that print takes forever, then the simple act of
determining the output size is triggering a rendering, which would be a
bad thing (it shouldn't need to do that).
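For instance (a sketch; `pbMosaic`, `totalWidth`, and `totalHeight` come from your own setup), the tile size can be forced through an ImageLayout hint:

```java
// Force 1024x1024 tiles (and an explicit image size) on the mosaic
// via an ImageLayout passed as a RenderingHint.
ImageLayout layout = new ImageLayout();
layout.setTileWidth(1024);
layout.setTileHeight(1024);
layout.setWidth(totalWidth);     // assumed known from your inputs
layout.setHeight(totalHeight);
RenderingHints hints = new RenderingHints(JAI.KEY_IMAGE_LAYOUT, layout);
RenderedOp mosaic = JAI.create("mosaic", pbMosaic, hints);
// Sanity check -- this should print instantly, without rendering:
System.out.println(mosaic.getTileWidth() + "x" + mosaic.getTileHeight());
```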

Finally, this all depends on the tiff writer being able to make use of
tiled inputs when writing. It's certainly possible to do so, but I
don't recall how well that's implemented in the IIO tiff plugin. You
might need to tell the tiff writer to write in a tiled format. It's
quite possible the default is to write untiled (which could force
everything to be loaded) or striped (which could force an entire row of
tiles to be loaded). Worst case would be striped where the available
memory was insufficient to hold an entire row of tiles, meaning that
you'd have to re-read the entire row of tiles for every stripe in the
output. THAT would be bad!! So you definitely want to make sure the
output is tiled. I think you can specify that via parameters to the
tiff writer.
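As a quick check that your TIFF plugin can produce tiled output at all (this uses the standard javax.imageio lookup; with jai-imageio on the classpath its TIFF writer is found the same way):

```java
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;

public class CheckTiledTiff {
    public static void main(String[] args) {
        ImageWriter writer = ImageIO.getImageWritersByFormatName("tiff").next();
        ImageWriteParam param = writer.getDefaultWriteParam();
        // If this prints false, the plugin cannot write tiled TIFFs and
        // everything will be forced through untiled or striped output.
        System.out.println("canWriteTiles = " + param.canWriteTiles());
        writer.dispose();
    }
}
```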

Ah, yeah, that. So, in the pseudo code you included, you have a chain of processing steps like this:

planarImage -> subsampleaverage -> imagewrite

Which is good, as long as the connections are always approximately tile-sized.

But if the scale argument passed to subsampleaverage is large, you will have a problem. The subsampleaverage routine, if you read it carefully, creates a box 1/scale by 1/scale pixels in size. If you tried to use this operator to create a 100x100 thumbnail from a large 10k x 10k image, you would use a scale of 100/10000=.01. This requires that for each of the 100x100 output pixels, an average operation is done on an area of the input image containing about (1/.01)^2=10k pixels. I'm not a fan.

Use the 'scale' operator instead. Set the interpolation to bilinear, and repeatedly down sample the image by powers of 2 until you can set the scale to exactly the remaining difference in scale without using a scale less than .5. This is essentially like creating a pyramid, where the only level you care about is the one on top.
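To make the step concrete, here is a small helper (an illustration, not code from the thread) that decomposes an overall scale factor into repeated halvings plus one final step of at least .5:

```java
import java.util.ArrayList;
import java.util.List;

public class ScaleSchedule {
    /** Splits an overall scale (0 < target <= 1) into repeated 0.5 steps
     *  plus one final step in [0.5, 1], as described above. */
    public static List<Double> steps(double target) {
        List<Double> result = new ArrayList<>();
        double remaining = target;
        while (remaining < 0.5) {
            result.add(0.5);     // one power-of-2 downsample
            remaining *= 2;      // ...accounted for in what's left
        }
        result.add(remaining);   // final step, never below 0.5
        return result;
    }

    public static void main(String[] args) {
        // 100x100 thumbnail from 10000x10000 => overall scale 0.01
        System.out.println(steps(0.01)); // six halvings, then a 0.64 step
    }
}
```

Feeding each factor in turn to the 'scale' operator gives the same overall reduction, but every individual averaging neighborhood stays small.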

Note that nearest neighbor interpolation is used to resample the last level because averaging pixels together when scale is between .5 and 1 is dicey -- you will have many pixels that are smudged together with others, degrading the data, so it is often better to use nearest neighbor interpolation instead.

No, you pass in a mosaic the first time, but each time you down sample the image, you write the result to a file on disk, and reread the file for the next scale operator. This is done so it won't matter how big the intermediate result is. If you didn't do this, and just essentially did write(scale(scale(scale(scale(mosaic))))), you would (eventually) get a result, but you would end up reading each input tile many, many, many times. So, you can trade some temporary disk space for processing time, and if your mosaic is very large, it is a trade well worth making.

But, as I wrote in the first post, I will need to build a whole pyramid TIFF. So in my TIFF File, I should write :

1. the whole full res mosaic Image (this writing step is ok thanks to the first mosaic : ~50000*50000)
2. a thumbnail (~1280*1024) of the whole Image
3. lower resolution images (width and height divided by 4 each time, until width and height are around 2k pixels).

So I thought I could just get some downsampled mosaic for each resolution step and write it.

Moreover, I will need a small JPEG file (~192*148), but in fact, if I can write the thumbnail or any lower resolution image, the downsampling and writing of this file should be trivial.

I tried to downsample as you wrote, but the writing is terribly slow, and, as the second-highest resolution will be around 12000*12000, I may have problems writing it if it's not mosaic-like.

I'm also trying to downsample each initial file and write it, so I get 1000 1280*1024 files, 1000 320*256 files, and so on (and each PlanarImage would be associated with 1000 different files), but I would rather not do it. I think this is a really bad idea, and quite inefficient.

You should get decent performance if you create each level from the preceding level's results as I indicated above, and make sure you are writing into a tiled image.

Specifically, use ImageWrite to create a TIFF image for each level, with a fairly large tile size, like 512x512. That first operation will take a long, long time. If everything works right, then when down sampling by powers of 4, it should actually take something like 95% of the total processing time, just to create that first image.

Using a single disk, single CPU that fairly imitates much of the desktops out there, my pyramid tiff builder cranks through about 100 MB / minute. It's *very* stupid, though, and of the optimizations we've talked about here, it only does tiling. Is your routine anywhere near that fast?

Finally, if things are running slowly, increase the heap size and the JAI tile cache. I tend to run these things with something like 1 GB of ram given to tile cache, so as to forestall overrunning the cache as long as possible. It will eventually happen, with big images, but the less often we have to purge the cache the better performance is :-/

Edit 3 :
With scale = 4, it seemed to work; the file was already 1 GB when I stopped the writing.
Programming is sometimes random :(
Scale 0.5f worked until 300 MB and then failed.
Scale 2 is at 1.7 GB and still running. Is it really random? I must be missing something (it failed at 4 GB).

In fact, I can't even write the original PlanarImage. Maybe it comes from the PlanarImage.

Steps 2 and 3 could be the same, *if* you can write to an image at sequence N+1 in a tiff file while reading image N from the same tiff file. But, on to more interesting business.

First, the 4 GB limit may just be a TIFF file format limitation. I hear the imageio-ext project has a promising bigtiff imageio driver that may scale to, well, a very great deal more. So check that out if you need a single file more than 4 GB as a result.

But first, when you say scale=4 let the program run to 4 GB before failing, do you mean the arguments to the 'scale' operator were 4f instead of .25f? Basically, can you be more specific about the edits you made?

yes it was 4f instead of 0.25f.
I'm running it again to see whether it goes over 4 GB.

For the 4GB limitation, I think it should not be an issue. Each level of the pyramid is jpeg compressed, so the final pyramid TIFF file should not be over 2 or 3 GB.
But it may explain why I can't write the first mosaic, because it's around 6GB uncompressed.

I read that libtiff (a tiff library for c language) could switch the size field of the header depending on the final size.

edit :
With scale=4f, the file is currently 8.3 GB and still being written. I will not have enough disk space.
I stopped it at 11GB.

What you say about libtiff is just the difference between bigtiff and tiff -- they're both tiff files, but bigtiff uses 64 bit size fields instead of 32 bit. You can look around for an all-Java bigtiff driver, or you could get at the libtiff driver you mentioned through imageio-ext-tiff + GDAL (but this is pretty hairy.)

So, I guess the mystery is why you cannot write the scale=0.5 mosaic. Could the mosaic you are building not actually be tiled, or have a tile size and offset so discordant with the output tile size and offset that a single output tile is dependent on a huge number of input tiles?

It may indeed come from the tiles. I was writing the small TIFF files in tiles of 1280*1024 (1 file = 1 tile); I put it back to 512*512.

It seems better: I tried scale=1f; it wrote a TIFF of around 7 GB but crashed, probably because of the TIFF limitation.
I'm currently running scale=0.25 and... in fact, it's finished. It was rather slow (around 15 minutes, but I don't care right now), but it seems to have succeeded.

OK, I can read the file (even if there are some problems coming from the frame juxtaposition, but that has nothing to do with programming). So I guess it's OK.
At 0.25f, the file is only 480 MB.

The 2 other lower resolution files were written as well (a few seconds - very quick).

The last step is to read from these files and write the TIFF pyramid. That'll be for tomorrow.

Congratulations. That doesn't seem like such a bad running time, when you realize it is reordering and encoding mostly with disk space. If you can, try using 64-bit Java with heap and tile cache sizes set to the amount of RAM you have on the system... that will probably speed it up pretty nicely.

The issue was solved by converting each file to a tiled TIFF image. Now I'm working on building the pyramid TIFF.

Do I have to use the same tile size in the small TIFF files and in the Mosaic operator? If not, I guess it means I can use untiled TIFFs for the first part (image size: 1280*1024, not that big)? I guess there will be a heavy retiling step then.

I still don't know why my BMP images didn't work.

I will probably have some more questions.
Once it's done, I will post some code.

Edit :
I have some problems with the downsampling operation.
I don't know which operator to use (scale, subsampleaverage, ...). I get OutOfMemory errors again when downscaling.

Edit2 :
Erf, I'm stupid. I created a new project and forgot to increase the JVM memory...

Don't use the 'stream' operator to read the bmp input files. Use "ImageRead"; it does a much better job of fetching just what is needed. This is a probable cause of your memory issues.

Another possible cause of memory issues is the input images. If they are compressed, then reading even one pixel often requires reading the whole file. And then it may be kept in memory in its entirety. I generally find converting my input images to tiled tiffs to be a good way to prevent OutOfMemory issues -- and I would never dream of using compressed images, unless each one is already essentially the size of one tile. And if they *are* the size of one tile, I will try very hard to make all of the JAI operators use the same tile size, so that every time ImageWrite requests a tile, it results in only one of your input images being read.

When you set up your JAI tile cache, also ensure ImageIO caching is enabled. See the javadocs for ImageIO.
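For reference, the relevant switches (standard ImageIO, see its javadocs; the temp-dir choice below is just an example):

```java
import java.io.File;
import javax.imageio.ImageIO;

public class EnableIioCache {
    public static void main(String[] args) {
        // Let ImageIO spool stream data to disk rather than hold it in memory.
        ImageIO.setUseCache(true);
        // Optionally point the cache at a disk with plenty of free space.
        ImageIO.setCacheDirectory(new File(System.getProperty("java.io.tmpdir")));
        System.out.println("useCache = " + ImageIO.getUseCache());
    }
}
```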

The Mosaic operator can be finicky if you provide an ImageLayout without a SampleModel and ColorModel. It will usually work, but it can do weird things sometimes, so it is safer to just set the SampleModel and ColorModel from the first tile:

Note that the SampleModel was read from the image, and then resized to a tile. This can be done because the Mosaic operator, and probably any ImageOp subclass, applies the SampleModel on a per-tile basis.
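The resize itself needs nothing JAI-specific -- `SampleModel.createCompatibleSampleModel` does it. A sketch (the 512x512 tile size and the in-memory stand-in image are assumptions; in the real chain the first image comes from ImageRead):

```java
import java.awt.image.BufferedImage;
import java.awt.image.SampleModel;

public class TileSampleModelDemo {
    public static void main(String[] args) {
        // Stand-in for the first input frame (1280x1024, 24-bit).
        BufferedImage first =
                new BufferedImage(1280, 1024, BufferedImage.TYPE_3BYTE_BGR);
        // Same bands and data type, but sized to one 512x512 tile:
        SampleModel tileModel =
                first.getSampleModel().createCompatibleSampleModel(512, 512);
        // Pass tileModel and first.getColorModel() into the mosaic's
        // ImageLayout via setSampleModel(...) / setColorModel(...).
        System.out.println(tileModel.getWidth() + "x" + tileModel.getHeight()
                + ", bands=" + tileModel.getNumBands());
    }
}
```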

If you set up the image layout on the mosaic operator properly, you won't need to mess with the image layout of the ImageWrite operation; indeed, any decisions you make there are pointless, because if you have a bad layout at the mosaic stage, you can't fix it at the write stage, and if you have a good layout at the mosaic stage and let the writer make its own choices, it will usually do the right thing. You do need to set up the tiling to use on the TIFFImageWriteParam, so keep that code as it is.

If you still have problems, make sure you have a recent Java version, and the latest JAI and JAI-ImageIO releases. The daily builds are often better for solving problems like this, since the current "stable" builds have numerous issues that are fixed in the dailies. Also, it is often useful to forcefully disable medialib acceleration, since the medialib implementations have a weird set of limitations, the mosaic operator especially, and medialib is not generally much faster than pure Java these days.

For whatever it is worth, I routinely use these tools to build multi-resolution image pyramids at the 10-100 gigabyte level, coming from upwards of 50 image fragments mosaicked together. I've also done subsetting and display tools for things like Deep Zoom tilesets, where there are many thousands of files, each one a single tile. There are potential pits to fall into, but as long as you're not doing something wrong this set of tools is immensely powerful.

Whatever means I use to write the TIFF (TIFFImageWriter, JAI encoder, ImageReadDescriptor, ImageIO.write), I always have the same problem at around 615 MB.

So it may come from the PlanarImage, but I displayed the sizes of the full picture and tiles, and the number of tiles. Everything seems OK. Tomorrow I will try to write every tile of the PlanarImage into small files to see if the PlanarImage is correct.

It works fine for "medium" files, but as soon as the result file is bigger than ~500-600 MB:

java.lang.IndexOutOfBoundsException
at java.io.BufferedInputStream.read(BufferedInputStream.java:310)
at com.sun.media.jai.codecimpl.BMPImage.read24Bit(BMPImageDecoder.java:717)
at com.sun.media.jai.codecimpl.BMPImage.computeTile(BMPImageDecoder.java:1228)
at com.sun.media.jai.codecimpl.BMPImage.getTile(BMPImageDecoder.java:1300)
at javax.media.jai.RenderedImageAdapter.getTile(RenderedImageAdapter.java:148)
at javax.media.jai.NullOpImage.computeTile(NullOpImage.java:162)
at com.sun.media.jai.util.SunTileScheduler.scheduleTile(SunTileScheduler.java:914)
at javax.media.jai.OpImage.getTile(OpImage.java:1129)
at com.sun.media.jai.opimage.TranslateIntOpImage.getTile(TranslateIntOpImage.java:132)
at javax.media.jai.PlanarImage.getData(PlanarImage.java:2085)
at javax.media.jai.PlanarImage.getExtendedData(PlanarImage.java:2440)
at com.sun.media.jai.opimage.MosaicOpImage.computeTile(MosaicOpImage.java:432)
at com.sun.media.jai.util.SunTileScheduler.scheduleTile(SunTileScheduler.java:904)
at javax.media.jai.OpImage.getTile(OpImage.java:1129)
at javax.media.jai.PlanarImage.getData(PlanarImage.java:2085)
at javax.media.jai.RenderedOp.getData(RenderedOp.java:2276)
at com.sun.media.imageioimpl.plugins.tiff.TIFFImageWriter.writeTile(TIFFImageWriter.java:1904)
at com.sun.media.imageioimpl.plugins.tiff.TIFFImageWriter.write(TIFFImageWriter.java:2686)
at com.sun.media.imageioimpl.plugins.tiff.TIFFImageWriter.insert(TIFFImageWriter.java:2903)
at com.sun.media.imageioimpl.plugins.tiff.TIFFImageWriter.writeInsert(TIFFImageWriter.java:2862)
at com.sun.media.imageioimpl.plugins.tiff.TIFFImageWriter.writeToSequence(TIFFImageWriter.java:2754)
at Main3.main(Main3.java:191)


That last line (Main3.java:191) is the "encoder" line.

Any ideas? It's driving me crazy.

I guess the PlanarImage is correct, so I need to find a way to write it. I tried to loop with pmosaic.getTile(x,y), but it returns a Raster and I am unable to use it.

I suspect your use of FileOutputStream is the cause of your troubles. This seems like what you have to do, because of the encode() method, but FileOutputStream is not random access. The output image will have to be written in byte order, which is often *very* bad for performance and requires holding large parts of the output file in memory at certain parts of the encoding operation.

Instead of JAI's encoder, try the ImageWrite operator [1], which is part of jai-imageio. The first argument may be a RandomAccessFile, which the encoder can use to much greater advantage than a FileOutputStream. You would also need to set up a custom TIFFImageWriteParam [2] to specify the tile settings you want, and set it as the second to last parameter of the ImageWrite operator.
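A sketch of that chain (untested; the file name, tile size, and parameter names are assumptions -- check the ImageWriteDescriptor javadocs for the exact parameter list in your jai-imageio version):

```java
// Request explicitly tiled TIFF output...
TIFFImageWriteParam writeParam = new TIFFImageWriteParam(Locale.getDefault());
writeParam.setTilingMode(ImageWriteParam.MODE_EXPLICIT);
writeParam.setTiling(512, 512, 0, 0);

// ...and hand everything to the ImageWrite operator, which pulls
// tiles from the chain on demand instead of streaming in byte order.
ParameterBlockJAI pb = new ParameterBlockJAI("ImageWrite");
pb.addSource(mosaic);                         // the PlanarImage to encode
pb.setParameter("Output", new File("pyramid.tif"));
pb.setParameter("Format", "tiff");
pb.setParameter("WriteParam", writeParam);
JAI.create("ImageWrite", pb);
```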

Edit 4 :
Still memory problems.
Maybe I need to clean the TileCache or something like that?

Edit 5 :
I guess there is a problem with the "translate" operator. The size limit comes from the two float values. If I put the maximum size of the resulting image, it will write the whole file, though it's not readable. I don't know what these values are; I thought they were offsets in pixels, but I guess I'm wrong, because with the correct offsets there is no writing.
OK, I got these right.

I think I've got it: I get something working if the size of the resulting file is under 500 MB. If it's more, I get an IndexOutOfBoundsException during encoder.encode(planarImage). If it can help, the first frame of the exception is at java.io.BufferedInputStream.read(BufferedInputStream.java:310).

Still working on it.
I used the code from the link just above to get the PlanarImage, but I think there are still problems with tiles and memory.

I will post my new code during the week-end.

Edit 8 :
I think I can raise this "memory limit" by decreasing the tile width and height of the PlanarImage.
...or maybe not, in fact.

I still do not understand why it would call the BufferedInputStream.read functions, or why it crashes there (the input streams are for inputs, yet the crash is in the encode step).

It should be, but you have to both set the mosaic's ImageLayout hint, as Bob suggested, and set the TIFF image writer with the parameters to write a tiled tiff. If you are doing both, I would try a smaller scale test to verify that it's actually doing what you want it to. Also, depending on your TIFF writer, you may have to set the tile size separately from requesting tiling, like so:
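With the standard javax.imageio API that looks like this (the 512x512 size is an example; TIFF tile dimensions must be multiples of 16):

```java
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;

public class TiledTiffParam {
    public static void main(String[] args) {
        ImageWriter writer = ImageIO.getImageWritersByFormatName("tiff").next();
        ImageWriteParam param = writer.getDefaultWriteParam();
        param.setTilingMode(ImageWriteParam.MODE_EXPLICIT); // request tiling...
        param.setTiling(512, 512, 0, 0);                    // ...and its size
        System.out.println(param.getTileWidth() + "x" + param.getTileHeight());
        writer.dispose();
    }
}
```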

However, I may have made a mistake while describing my problem. In fact, I don't really need to process a huge BMP, because this huge BMP is created by me from 1000 images (res: 1024*1280 each).

So maybe it will be easier to try to open, write (in the tiled tiff) and close each of these small files.

I guess there should be a loop for each layer/level. Something like

for (i = 0; i < ...; i++) {

}

What code should I use inside? Open every image as a buffered stream and call TIFFImageWriter.write(bufferedImage)? What about the metadata and fields? Do I just need to write them for the first tiled image of each level? I'm afraid the tile offsets would be wrong this way. Maybe I can add them as a field myself.

RenderedImage is just an interface, which can be implemented in any way
you want. Sounds like you should have something implementing
RenderedImage that presents the huge image to the caller with tiles of
size 1024x1280. Whenever a tile is requested, your class reads the
appropriate image, massages the metadata appropriately (probably
wrapping a new Raster around the underlying data with an appropriate
offset), and returns that.

Thus to the caller it looks like a huge image, but each tile image is
read only when needed.

You'll want to use a tile cache to avoid re-reading tiles, to the limits
of memory you have available. To that end, you might write the whole
thing as a custom sourceless JAI operator.
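A sketch of the index arithmetic such a wrapper needs (the grid shape and row-major file ordering are assumptions for illustration):

```java
public class FrameGrid {
    static final int TILE_W = 1024, TILE_H = 1280; // one input frame = one tile
    private final int cols;                        // frames per row

    public FrameGrid(int cols) { this.cols = cols; }

    /** Index of the input file backing tile (tileX, tileY). */
    public int fileIndex(int tileX, int tileY) { return tileY * cols + tileX; }

    /** Pixel origin of that tile inside the huge virtual image,
     *  used as the Raster's offset when wrapping the frame's data. */
    public int originX(int tileX) { return tileX * TILE_W; }
    public int originY(int tileY) { return tileY * TILE_H; }

    public static void main(String[] args) {
        FrameGrid grid = new FrameGrid(40);  // e.g. 40 x 25 = 1000 frames
        System.out.println(grid.fileIndex(3, 2));                     // -> 83
        System.out.println(grid.originX(3) + "," + grid.originY(2));  // -> 3072,2560
    }
}
```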

I know there's been talk of such things in the past (images backed by
tiles where each tile is its own image file) but I don't recall if any
of them ever got implemented or not...

-Bob

jai-interest@javadesktop.org wrote:
> Ok, I still got OutOfMemoryException with your code.
>
> However, i may have done a mistake while describing my problem. In fact, I don't really need to process from a huge BMP because this huge BMP is created by me from 1000 images (res : 1024*1280 each).
>
> So maybe it will be easier to try to open, write (in the tiled tiff) and close each of these small files.
>
> I guess there should be a loop for each layer/level. Something like
>
> for (i = 0; i < ...; i++) {
>
> }
>
> What code should I use inside ? open every images as a buffered Stream and make a TIFFImageWriter.write(bufferedImage) ? how about the metadata and fields ? Do i just need to write it for the first tiledImage of each level ? I'm afraid that the tile offset would be wrong this way. Maybe I can add them as a fied myself.
>
> I'm going to try something like that :
>
> for (i = 0; i < ...; i++) {
>     BufferedImage bi = ImageIO.read(new File(Image[i]));
>     if (i == 0) writer.writeToSequence(IIOMetadata);
>     else writer.write(bi);
> }
>
> and look at the fields.
>
> Maybe the TiledImageClass would be more appropriate but I think there will still be an OutOfMemoryError since it seems to contain all the data before writing.
> [Message sent by forum member 'pierre36']
>
> http://forums.java.net/jive/thread.jspa?messageID=400115

If these input tiles are to be seen like the squares on a checkerboard, then the JAI 'Mosaic' operator will do what Bob suggests. Combined with the ImageRead and Translate operators, you can construct a PlanarImage that requires very little memory and will be available very quickly since it does almost nothing until you need actual tiles.

Given a list of objects that contain the file and offsets for your mosaic:
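Something along these lines (untested sketch; `Frame`, its accessors, and `tileHints` -- the ImageLayout RenderingHints discussed earlier -- are hypothetical names):

```java
// Each frame is read lazily and shifted to its place; the mosaic
// then presents them all as one big checkerboard image.
ParameterBlockJAI pbMosaic = new ParameterBlockJAI("mosaic");
for (Frame f : frames) {
    RenderedOp read = JAI.create("ImageRead", f.file());   // lazy reader
    ParameterBlock pbT = new ParameterBlock();
    pbT.addSource(read);
    pbT.add((float) f.xOffset()).add((float) f.yOffset()); // placement
    pbMosaic.addSource(JAI.create("translate", pbT));
}
pbMosaic.setParameter("mosaicType", MosaicDescriptor.MOSAIC_TYPE_OVERLAY);
RenderedOp mosaic = JAI.create("mosaic", pbMosaic, tileHints);
```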

So you get out a PlanarImage that takes up very little memory and does no real work until you ask it to. If you pass this to ImageIO.write(), it will compute tiles as it needs to, but when you run out of memory, it will drop old tiles and just create new ones on demand. If your Java program is *only* doing this one thing, be sure to call JAI.getTileCache().setMemoryCapacity() to something like 70% of the Java heap size, so you don't needlessly discard tiles you might need again.

Well, there are a couple of problems to solve if you want it to scale, and scale, and scale, and...

One is reading the data in. The ImageRead operator in the aforementioned jai-imageio project can read many input image formats on demand, but JAI's love of tiling clashes badly with a BMP file, which is striped. So, an input image that supports tiling will work better than one that doesn't. Much.

The other is lazily computing the additional levels of the pyramid, since even a level several steps down could be too large to fit in memory. Fortunately you can do this with the Scale operator, a stock JAI offering. By creating each downsampled level as a Scale of the prior level, you can essentially materialize everything out of thin air.

The following implementation is an example. I suspect it can be made faster (certainly by increasing the Java heap size, and bringing the JAI cache size to within 100 MB of the heap size), but I didn't want to distract from the above notes too much. Also, I'm using Image, a helper of my own, but for these purposes just pretend it's doing JAI.create("op", arg1, arg2).
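In that spirit, a condensed sketch of the level loop (untested; the file names, the /4 factor per level, and positional ImageWrite parameters are assumptions):

```java
// Each level is scaled from the previous level's file on disk, so no
// level's tiles ever need to coexist in memory with another's.
String prev = "level0.tif";                    // the full-res tiled mosaic
for (int level = 1; level <= nLevels; level++) {
    RenderedOp src = JAI.create("ImageRead", new File(prev));

    ParameterBlock pbScale = new ParameterBlock();
    pbScale.addSource(src);
    pbScale.add(0.25f).add(0.25f)              // width and height / 4
           .add(0f).add(0f)
           .add(Interpolation.getInstance(Interpolation.INTERP_BILINEAR));
    RenderedOp scaled = JAI.create("scale", pbScale);

    String out = "level" + level + ".tif";
    ParameterBlock pbWrite = new ParameterBlock();
    pbWrite.addSource(scaled);
    pbWrite.add(new File(out)).add("tiff");    // Output, Format
    JAI.create("ImageWrite", pbWrite);         // renders this level to disk

    prev = out;                                // next level reads this file
}
```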

I already gave 1.5 GB to the JVM, which is the max. And the goal is to be able to process images even bigger than 4 GB. So I need to avoid loading the whole image/file into memory, and only load parts/tiles of it while doing the same operations as before (writing to a TIFF file + metadata + downsampling).