full-res pan-and-zoom JS viewer on a sufia/hyrax app

The repository currently has around 10,000 TIFF scanned page and photographic images. They are stored (for better or worse) as TIFFs with no compression, and can be 100MB and up, each. (Typically around 7500 × 4900 pixels). It’s a core use case for us that viewers be able to pan-and-zoom on the full-res images in the browser. OpenSeadragon is the premiere open source solution to this, although some samvera/hydra stack apps use other JS projects that wrap OpenSeadragon with more UI/UX, like UniversalViewer. All of our software is deployed on AWS EC2 instances.

OpenSeadragon works by loading ’tiles’: Sub-regions of the source image, at the appropriate zoom level, for what’s in the viewport. In samvera/hydra community it seems maybe popular to use an image server that complies with the IIIF Image API as a tile source, but OpenSeadragon (OSD) can work with a variety of types of tile sources, with an easy plug-in architecture for adding your own too.

Our team ended up spending 4 or 5 weeks investigating various options, finding various levels of suitability, before arriving at a solution that was best for us. I’m not sure if our needs and environment are different than most others in the community; if others are having better success with some solutions than we did; if others are using other solutions not investigated by us. At any rate, I share our experiences hoping to give others a head-start. It can sometimes be difficult for me/us to figure out what use cases, options, or components in the samvera/hyrax stack are mature, production-battle-tested, and/or commonly used, and which are at more of the proof-of-concept stage.

As tile sources for OpenSeadragon, we investigated, experimented with, and developed at least basic proofs of concept for: riiif, imgix.com, cantaloupe, and pre-generated “Deep Zoom Image” tiles to be stored on AWS S3. We found riiif unsuitable; imgix worked but was going to have some fairly high latency for users; cantaloupe worked pretty fine, but may take fairly expensive server resources to handle heavy load; the pre-generated DZI images actually ended up being what we chose to put into production, with excellent performance and maintainability for minimal cost.

Details on each:

Riiif

A colleague and I learned about riiif at advanced hydra camp in May and Minneapolis. riiif is a Rails engine that lets you add a IIIF server to a new or existing Rails app. It was easy to set up a simple proof of concept at the workshop. It was easy to incorporate authorization and access controls for your image. I left with the impression that riiif was more-or-less a community standard, and was commonly used in production.

So we started out our work assuming we would be using riiif, and got to work on implementing it.

Knowing that tile generation would likely be CPU-intensive, disk-IO-intensive, and RAM intensive, we decided at the outset that the actual riiif IIIIF image server would live on a different server than our main Rails app, so it could be scaled independently and wouldn’t have resource contention with the main app. I included the riiif stuff in the same Rails app and repo, but used some rails routing definition tricks so that our main app server(s) would refuse to serve riiif IIIIF routes, and the “image server” would refuse to serve anything but IIIF routes.

Since the riiif image server was obviously also not on the same server as our fedora repo, and shared disk can be expensive and/or unreliable in AWS-land, riiif would be using its HTTPFileResolver to fetch the originals from fedora. Since our originals are big and we figured this would be slow, we planned to give it enough disk space to cache all of them. And additionally worked up code to ‘ping’ the riiif server with an ‘info’ request for all of our images, forcing it to download them to it’s local cache on bootstrapped startup or future image uploads, thereby “pre-loading” them.

Later, in experiments with other tools, I think we saw that downloading even huge files from a fedora on one AWS EC2 to another EC2 on same account is actually pretty quick (AWS internal network is awfully fast), and this may have been a premature optimization. However, it did allow us to do some performance testing knowing that time to download originals from fedora was not a factor.

riiif worked fine in development, and even worked okay with only one user using in a deployed staging app. (Sometimes you had to wait 2-3 seconds for viewer tiles, which is not ideal, but not as disastrous as it got…).

But when we did a test with 3 or 4 (manual, human) users simultaneously opening viewers (on the same or different objects), things got very rough. People waiting 10+ seconds for tiles, or even timing out on OpenSeadragon’s default 30 second timeout.

We spent a lot of time trying to diagnose and trying different things. Among other things, we tried having riiif useGraphicsMagick instead of ImageMagick. When testing image operations individually, GM did perform around 50% faster than IM, and I recommend using it instead of IM wherever appropriate. We also tried increasing the size of our EC2 instance. We tried an m4.large, then a c4.xlarge, and then also keeping our data on a RAID-arrayed EBS trying to increase disk access speeds. But things were still disastrous under multi-user simultaneous use.

Originally, we were using riiif additionally for generating thumbnails for our ‘show’ pages and search results pages. I had the impression from a slack conversation this was a common choice, and it certainly is convenient if you already have an image server to use it this way. (Also makes it super easy to generate multiple resolutions for responsive srcset attribute). At one point in trying to get riiif working better, we turned off riiif for thumbs, using riiif only on the viewer, to significantly reduce load on the image server. But still, no dice.

After much investigation, we saw that CPU use would often go to 99-100%, and disk IO levels were also through the roof during concurrent use tests. (RAM was actually okay). Also doing a manual command-line imagemagick conversion on the server at the same time as concurrent use testing, operations were seen to sometimes take 30+ seconds that would only take a few seconds on an otherwise unloaded server. We gave up on riiif before diagnosing exactly what was going on (this kind of lower-level OS/infrastructure profiling and diagnosis is kinda hard!), but my guess is saturated disk IO. If you look at what riiif does, this seems plausible. Riiif will do a separate shell-out to an imagemagick or graphicsmagick command line operation for every image request, if the derivative requested is not yet cached.

If you open up an OpenSeadragon viewer, OSD can start out by sending requests for a dozen+ different tiles. Each one, with a riiif-backed tile source, will result in an independent shell-out to imagemagick/graphicsmagick command line tool, with one of our 100MB+ source image TIFFs as input argument. With several people concurrently using the viewer, this could be several dozens of imagemagick shellouts, each trying to use a 100MB+ file on disk as input. You could easily imagine this filling up even a fairly fat disk IO pipeline, and/or all of these processes fighting for access to various OS concurrency locks involved in reading from the file system, etc. But this is just hypothesis supported by what we observed, we didn’t totally nail down a diagnosis.

At some point, after spending a week+ on trying to solve this, and with deadlines looming, we decided to explore the other tile source alternatives we’ll discuss, even without being entirely sure what was going on with riiif.

It may be that other people have more success with riiif. Perhaps they are using smaller original sources; or running on AWS EC2 may have exacerbated things for us; or we just had bad luck for some as of yet undiscovered reason.

But we got curious how many people were using riiif in production, and what their image corpus looked like. We asked on samvera-tech listserv, and got only a handful of answers, and none of them were using riiif! I have an in-progress side-project I’m working on that gives some dependency-use statistics for public github repos holding hydra/samvera apps — of 20 public repos I had listed, 3 of them had riiif as a dependency, but I’m not sure how many of those are in production. I did happen to talk to one other developer on samvera slack who confirmed they were having similar problems to ours. Still interested in hearing from anyone that is using riiif successfully in production, and if so what your source images are like, and how many concurrent viewers it can handle.

Cantaloupe

Wanting to try alternatives to riiif, the obvious choice was another IIIF server. There aren’t actually a whole lot of mature, reliable, open source IIIF server options, I think IIIF as a technology hasn’t caught on much outside of library/cultural heritage digital repositories. We knew from the samvera-tech listserv question that Loris (written in python) was a popular choice in the community, but Loris is currently looking for a new maintainer, which gave us pause.

We eventually decided on Cantaloupe, written in Java, as the best bet to try first. While it being in Java gave us a bit of concern, as nobody on the team is much of a Java dev, even without being Java experts we could tell looking at the source code that it was impressively clean and readable. The docs are good, and revealed attention to performance issues. The Github repo has all the signs of being an active and well-maintained project.

Cantaloupe too was having a bit of performance trouble when we tried using it for show-page thumbnails too (we have a thumb for every page in a work on our ‘show’ page, as in default sufia/hyrax, and our works can have 200+ pages). So we decided fairly early on that we’d just keep using a pre-generated derivative for thumbs, and stick to our priority use case, the viewer, in our tests of all our alternatives from here on out.

And then, Cantaloupe did pretty well, so long as it had a powerful and RAM-ful enough EC2.

Cantaloupe lets you configure caching separately for originals and derivatives, but even with no caching turned on, it was somehow doing noticeably better than riiif. Max 1-2 second wait times even with 3-4 simultaneous viewers. I’m not really sure how it pulled off doing better, but it did!

Our largest image is a whopping 1G TIFF. When we asked cantaloupe to generate derivatives for that, it unfortunately crashed with a Java OOM, and was then unresponsive until it was manually restarted. We gave the server and cantaloupe more RAM, now it handled that fine too (although our EC2 was getting more expensive). We hadn’t quite figured out how to approach defining how many simultaneous viewers we needed to support and how much EC2 was necessary to do that before moving on to other alternatives.

We started out running cantaloupe on an EC2 c4.xlarge (4 core Xeon E5 and 7.5 GB RAM), and ended up with a m4.2xlarge (8 core and 32 GB RAM) which could handle our 1G image, and generally seemed to handle load better with lower latency. We used the JAI image processor for Cantaloupe. (It seemed to perform better than the Java2D processor; Cantaloupe also supports ImageMagick or GraphicsMagick as a processor, but we didn’t get to trying those out much).

Auth

If you’re not using riiif, and have images meant to be only available to certain/all logged-in users, you need to think about auth. With any external image server, you could do auth by proxying all access through your rails app, which would check auth in the usual way. But I worry about taking up web worker processes/threads in the main app with dozens of image requests. It would be better to keep the servers entirely separate.

There also might be a way to have apache/nginx proxying directly, rather than through the rails app, which would make me more comfortable, but you’d have to figure out how to use a custom auth plugin for apache or nginx.

Cantaloupe also has the very nice option of writing your own custom auth in ruby (even though the server is Java; thanks JRuby!), so we could actually check the existing Rails session (just make sure the cantaloupe server knows your Rails secret key, and figure out the right classes to call to decrypt and load data from the session cookie), and then Fedora/Solr to check auth in the usual samvera way.

Any live checking of auth before delivering an image tile is of course going to increase image response latency.

These were the options we thought of, but we didn’t get to any of them before ultimately deciding to choose pre-generated tile images.

However, Cantaloupe was definitely our second choice — and first choice if we really were to need need a full IIIIF server — it for sure looked like it could have worked well, although at potentially somewhat expensive AWS charges.

Imgix.com

Imgix.com is a commercial cloud-hosted image server. I had a good opinion of it from using it for thumbnail-generation on ecommerce projects while working at Friends of the Web last year. Imgix pricing is pretty affordable.

Imgix does not conform to IIIF API, but it does do pretty much all the image operations that IIIF can do, plus more. Definitely everything we needed for an OpenSeadragon tile source.

I figured, let’s give it a try, get out of the library/cultural-heritage silo, and use a popular, successful, well-regarded commercial service operating in the general web app space.

OpenSeadragon can not use imgix.com as a tile source out of the box, but OSD makes it pretty easy to write your own tile source. In a day I had a working proof of concept for an OSD imgix.com tile source, and in a couple more had filed off all the rough edges.

It totally worked. But. It was slow. Imgix.com was willing to take our 100MB TIFF sources, but it was clear this was not really the use case it was designed for. It was definitely slow downloading our original sources from our web app–the difference, I guess, between downloading directly from fedora on the same AWS subnet, and downloading via our Rails app from who knows where. (I did have to make a bugfix/improvement to samvera code to make sure HTTP headers were delivered quicker for a streaming download, to avoid timing out imgix. Once that was done, no more imgix timeout problems). We tried pinging it to “pre-load” all originals as we had been doing with riiif — but as a cloud service, and one not expecting originals to be so huge, we had no control over when imgix purged originals from cache, and we found it did sometimes purge not-recently-accessed originals fairly quickly.

Also imgix has a (not really unreasonable) 512MB max for original images; our one 1G TIFF was not gonna be possible (and of course, that’s the one you really want a pan-and-zoom viewer for, it’s huge!).

On the plus side:

with imgix, we don’t need to worry about the infrastructure at all, it’s outsourced. We don’t need to plan some redundancy for max uptime or scaling for heavy use, they’re already doing it.

The response times are unlikely to change under heavy use, it’s already a cloud-scale service designed for heavy use.

Can handle the extra load of using it for thumbs too, just as well as it can for viewer tiles.

Our cost estimates had it looking cheaper (by 50%+) than hosting our own Cantaloupe on an EC2.

Once originals and derivatives being accessed (say tiles for a given viewer) were cached, it was lightning fast, reliably just 10s of ms for a tile image. But again, you have no control over how long these stay in cache before being purged.

For non-public images, imgix offers signed-and-expiring URLs. The downside of these is you kind of lose HTTP cacheability of your images. And imgix.com doesn’t provide any way to auth imgix to get your originals, so if they’re not public you would have to put in some filters recognizing imgix.com IP addresses (which are subject to change, although they’re good at giving you advance notice), and let them in to private images.

But ultimately the latency was just too high. For images where the originals were cached but not the derivatives, it could take 1-4 seconds to get our tile derivatives; if the originals were not cached, it could take 10 or higher. (One option would be trying to give it a much smaller lzw or zip compressed TIFF as a source, instead of our uncompressed original originals, cutting down transfer time for fetching originals. But I think this would be probably unlikely to improve latency sufficiently, and we moved on to pre-generated DZI. We would need to give imgix a lossless full-res original of some kind, cause full-res zoom is the whole goal here!)

I think imgix is potentially a workable last resort (and I still love it for creating thumbs for more reasonably sized sources), but it wasn’t as good an option as other alternatives explored for this use case, a tile source for enormous TIFFs.

Pre-Generated Deep Zoom Tiles

Eventually we came back to an earlier idea we originally considered, but then assumed would be too expensive and abandoned/forgot about. When I realized Cantaloupe was recommending pyramidal TIFFs , which require some preprocessing/prerendering anyway, why not go all the way and preprocess/prerender every tile, and store them somewhere (say, cheap S3?)? OpenSeadragon has a number of tile sources it supports that are or can be pre-generated images, including the first format it ever supported, Deep Zoom Image (file suffix .dzi). (I had earlier done a side-project using OpenSeadragon and Deep Zoom tiles to put the awesome Beehive Collective Mesoamerica Resiste poster online).

But then we learned about riiif and it looked cool and we got on that tip, and kind of assumed pre-generating so many images would be unworkable. It took us a while to get back to exploring pre-generated Deep Zoom tiles. But it actually works great (of course we had learned a lot of domain knowledge about manipulating giant images and tile sources at this point!).

We use vips (rather than imagemagick or graphicsmagick) to generate all the tiles. vips is really optimized for speed, CPU and RAM usage, and if you’re creating all the tiles at once vips can read the source image in once and slice it up into tiles. We do this as a background job, that we have running on a different server than the Rails app; the built-in sufia bg jobs still run on our single rails app server. (In sufia 7.3, out of the box they can’t run on a different server without a shared file system; I think this may have been improved in as-yet-unreleased-hyrax-master).

We hook into the sufia/hyrax actor stack to trigger the bg job on file upload. A small EC2 (t2.medium 4 GB RAM, 2 core CPU) with five resque workers can handle the dzi creation much faster than the existing actor stack can trigger them when doing a bulk ingest (the existing actor stack is slow, and the dzi creation can’t happen until the file is actually in fedora, so that the job can retrieve it from fedora to make it into dzi tiles. So DZI’s can’t be generated any faster than sufia can do the ingests no matter what). The files are uploaded to an S3 bucket.

We also provide a rake task to create the .dzi files for all Files in our fedora repo, for initial bootstrapping or if corruption ever needs to be fixed, etc. For our 8000-file staging server, running the creation task on our EC2 t2.medium, it takes around 7 hours to create and upload them all to S3 (I use some multi-threaded concurrency in the uploading), and results in ~3.2 million S3 keys taking up approx 32GB.

Believe it or not, this is actually the cheapest option, taking account of S3 storage and our bg jobs EC2 instance for dzi creation (that we’ll probably try to move more bg jobs to in the future). Cheaper than imgix, cheaper than our own Cantaloupe on an EC2 big enough to handle it.

If you have 800K or 8 million images instead of 8000, it’ll get more complicated/expensive. But S3 is so cheap, and a spot-priced temporary fleet of EC2s to do a mass dzi-creation ingest affordable enough you might be surprised how affordable it is. Alas fedora makes it a lot less straightforward to parallelize ingest than if it were a more conventional stack, but there’s probably some way to do it. Unless/until fedora itself becomes your bottleneck. There are costs to our stack.

It handles our 1GB original source just fine (takes about 30 seconds to create all tiles for this one). It’s also definitely fast for the end-user. Once the tiles are pre-generated, it’s just a request from S3. Which I’m seeing taking like 40-80ms in Chrome Dev Tools. Under a really lot of load (I’m guessing 100+ users concurrently using viewer), or to reduce latency beyond that 40-80ms, the standard approach would be to put a CDN in front of S3. Probably either Amazon’s own CloudFront, or CloudFlare. This should be simple and affordable. But this is already reliably faster than any of our other options, and can handle way more concurrent load without a CDN compared to our other options, we aren’t worrying about it for now. When/if we want to add a CDN, it oughta only be a couple clicks and a hostname change to implement.

And of course, there’s very little server maintenance to deal with, once the files are generated, they’re just static assets on S3, there’s nothing to “crash” really. (Well, except S3 itself, which happens very occasionally. If you wanted to be very safe, you’d mirror your S3 bucket to another AWS region or another cloud provider). Just one pretty standard and easily-scalable bg-job-running server for creating the DZIs on image upload.

We’re still punting on auth for now. Which talking on slack channel, seems to be a common choice with auth and IIIF image servers. One dev told me they just didn’t allow non-public images to be viewed in the image viewer (or via their image server) at all, which I guess works if your non-public images are all really just in-progress-in-workflow only viewable to staff. As is true for us here. Another dev told me they just don’t worry about it, no links will be generated to non-public images, but they’ll be there via the image server without auth — which again works if your non-public images aren’t actually sensitive or legally un-shareable, they’re just in-process-not-quite-ready. Which is also true for us, for now anyway. (I would notever count on “nobody knows the URL”-based-security for actual sensitive or legally un-shareable content, for anything where it actually matters if someone happens to come across it. For our current and foreseeable future content, it doesn’t really. knock on wood. It does make me nervous!).

IIIF vs Not

Some readers may have noticed that two of the alternatives we examined are not IIIF servers, and the one we ended up with — pre-generated DZI tiles — is not a dynamic image server at all. You may be reacting with shocked questions: You can do that? But what are you missing? Can you still use UniversalViewer? What about those other IIIF things?

Well, the first thing we miss is truly dynamic image generation. We instead need to pre-generate all the image derivatives we need upon image upload, including the DZI tiles. If we wanted a feature like, say, user enters a number of pixels and we deliver a JPG scaled to the user-specified width, a dynamic image server would be a lot more convenient. But I only thought of that feature when brainstorming for things that would be hard without a dynamic image server, it’s not something we are likely to prioritize. For thumbs and downloads at various preset sizes, pre-generating should work just fine with regards to performance and cost, especially with a bg job running on it’s own jobs server and stored on S3 (both don’t happen out of the box on sufia, but may in latest unreleased-master hyrax).

So, UniversalViewer. UniversalViewer uses OpenSeadragon to do the actual pan-and-zoom viewer. Mirador seems to as well. I think OpenSeadragon is pretty much the only viable open source JS pan-and-zoom viewer, which is fine, because OSD is pretty great. UV, I believe, just wraps OSD in some additional UI/UX, with some additional features like table of contents viewing, downloads, etc.

We decided, even when we were still planning on using riiif, to not use UniversalViewer but instead develop directly with OpenSeadragon. Some of the fancier UV features we didn’t really need right now, and it was unclear if it would take more time to customize UV UX to our needs, or just develop a relatively light-weight UI of our own on top of OSD.

So using OpenSeadragon directly, we don’t need to give it an IIIF Image API URL, we can give it anything it handles (or you write a plugin for), and it works just fine. No code changes necessary except giving it a URL pointing to a different thing. No problem, everything just worked, it required no extra work in our front-end to use DZI instead of IIIF. (We did do some extra work to add some feature toggles so we could switch between various back-ends easily). No problem at all, the format of your tile source, so long as OSD can handle it, is a very loosely coupled dependency.

But what if you want to use UV or Mirador? (And we might in the future ourselves, if we need features they provide and we discover they are non-trivial to develop in our homegrown UI). They take IIIF as input, right?

To be clear, we need to distinguish between the IIIF Image API (the one where a server provides image derivatives on demand), and the IIIF Manifest spec. The Manifest spec, part of the IIIF Presentation API, defines a JSON-ld file that “represents a single object and any intellectual work or works embodied within that object… includes the descriptive, rights and linking information for the object… embeds the sequence(s) of canvases that should be rendered to the user.”

It’s the IIIF Manifest that is input to UV or Mirador. Normally these tools would extract one or more IIIF Image API URLs out of the Manifest, and just hand them to OpenSeadragon. Do they do anything else with an IIIF Image API url except hand it to OSD? I’m not sure, but I don’t think so. So if they just handed any other URI that OSD can handle to OSD, it should work fine? I think so.

Of course, as with all linked data work, standards-compliant doesn’t actually make it interoperable. We need mutually-recognizable shared vocabulary. UV or Mirador would have to recognize the image resource URL supplied in the Manifest as being a DZI, or at any rate at least as something that can be passed to OSD. As far as I know UV or Mirador won’t do this now. It should probably be pretty trivial to get them to, though, perhaps by supporting configuration for “recognize this @context uri as being something you can pass to OSD.” If we in the future have need for UV/Mirador (or IIIF Manifests), I’d look into getting them to do that, but we don’t right now.

What about these other tools that take IIIF Manifests and aggregate images from different sites? Probably the same deal, they just gotta recognize an agreed-upon identifier meaning “DZI format”, and know they can pass such to OpenSeadragon.

I’m not sure if any such tools currently exist used for real work or even real recreation, rather than as more of a demo. I’ll always choose to achieve greatness rather than mediocrity for our current actual real prioritized use cases, above engineering for a hoped-for-but-uncertain future. Of course, when you can do both without increasing expense or sacrificing quality for your present use cases, that’s great too, and it’s always good to keep an eye out for those opportunities.

But I’m feeling pretty good about our DZI choice at the moment. It just works so well, cheaply, with minimal expected ongoing maintenance, compared to other options — and works better for end-users too, with reliable nearly instantaneous delivery of tiles even under heavy load. Now, if you have a lot more images than us, the cost-benefit calculus may end up different. Especially because a dynamic image server scales (gets more expensive) with number of concurrent users/requests more or less regardless of number of images, while the pre-gen DZI solution gets more expensive with more images more or less regardless of concurrent request level. If you have a whole lot of images (say two orders of magnitude bigger than our 10K), your app typically gets pretty low use (and you don’t care about it supporting lots of concurrent use), and maybe additionally if your original source images aren’t nearly as large as ours, pre-gen DZI might not be such a sweet spot. However, you might be surprised, pre-generated DZI is in the end just such a simple solution, and S3 storage is pretty affordable.