Comments

On some devices (but not all), attempting to open a large archive (> 500 MB) results in NSData failing to initialize with NSDataReadingMappedAlways, giving a generic Cocoa "Cannot allocate memory" error.


Did you get this error with a minimal project, or after inserting your code into -[AppDelegate application:didFinishLaunchingWithOptions:]? It's possible you were already running low on memory when you tried to open the archive.

If you're still getting the error at the start, try calling mmap with various options to see whether it returns MAP_FAILED, i.e. fails to memory-map. It may also be that NSDataReadingMappedAlways doesn't actually mmap for whatever reason. If so, I may move the code over to using mmap directly.


I'm getting the same issue, previously on iOS 7.x and now on iOS 8. The zip file is 597 MB and was created by ZipZap. Smaller files open correctly.

The error occurs on a first-generation iPad mini. I'll try mmap too. Is there any theoretical limit to mmap size on this platform? Presumably there's some large address-space limit, but the memory manager should be paging in and out as required.

It looks like the devices are artificially limiting the amount of addressable memory available to each process. I can't see anything in the iOS SDK or even lower level BSD/POSIX level that would increase the addressable memory e.g. I tried setrlimit but it doesn't seem to affect the process, presumably because it's not run as superuser.

The addressable memory doesn't really correspond to the physical memory available. After all, the benefit of mmap is that it only pages in and occupies physical memory when needed.

If we mmap smaller chunks we can eventually access more memory, but not significantly more; e.g. the iPad mini in 64-bit mode is still limited to something like 2 GB, compared to over 16 GB in the simulator.

All the device memory-address limits are well below the theoretical limit for standard zip files (4 GB).


Back the archive with a cache of smaller memory maps, which can then be purged as needed.

Note that this can't really be an NSCache or similar object that reacts to real memory pressure; it has to be something that reacts to virtual memory (address space) pressure.

We can't make the smaller memory maps fit each zip entry exactly, since memory maps have to start at 4096-byte (or 16384-byte) aligned offsets from the start of the file, while zip entries can start anywhere.


My opinion is that the library should be able to handle large files whichever device is being used. The memory map should allow that, but it isn't behaving consistently between models, and it's not easy to predict when it won't work. I'm working on an app which needs to allow import/export of large files for backup purposes. My nightmare scenario is that someone exports a file which they can't later import!

I was until recently a Windows developer, so I'm a little new to the platform; please excuse me if this is a silly question, but can't we map a compliant window around each file as we decode/encode it, and release it once done? I really like having access to an NSData* for an archive entry, and I use it to encrypt at that level for security.


... can't we map a compliant window around each file as we decode/encode it, and release it once done? I really like having access to an NSData* for an archive entry, and I use it to encrypt at that level for security.

Not a silly question at all!

For reading, we allow random access to individual entries, so it's not easy to predict when data is no longer needed. Using a memory map keeps this efficient, since the system pager can evict parts of the map under memory pressure. The library already does some wrapping with decryption/decompression, but the bottom layer of the stack is still backed by the memory map. Quite likely I'll have to rewire that bottom layer to read straight from the zip file instead. The central directory entries can continue to live in the memory map, since we're necessarily limiting that to at most a 64K trailer.


That sounds pretty reasonable to me. It's a shame the mapping isn't more robust on this platform; it's the standard answer on Windows for lots of complex stuff.

My encryption operates at the NSData level, so that should be unaffected as long as you still pass that back. I made a tiny change to expose the zip comment, as I needed somewhere to put the encryption IV and salt, but I can reintegrate that again!

Note: this code runs inside the main method of an NSOperation. Because updateEntries: can take a long time, we call it after each iteration so that we can stop should the operation get canceled (I assume there isn't a better way to do it, is there?).


@chrisze While you can update entries in a loop like that, it's pretty inefficient. In particular, every time through the loop you are compressing/storing the previous entries all over again, which adds to the slowness. You really have two choices:

In the loop, compose all the entries. After the loop, call updateEntries:error: once only. However, if there's an error, the update aborts atomically, i.e. the archive remains unchanged.

In the loop, get the archive's current entries, make a new array by adding your new entry, then call updateEntries:error: with that array. You could arrange to call updateEntries:error: in batches to balance between efficiency and safety.

Why is the second choice more efficient than your code? If the array passed to updateEntries:error: has the archive's existing entries at the front, ZipZap skips writing those entries out again.
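A minimal sketch of the batched second choice, assuming ZipZap's ZZArchive / ZZArchiveEntry API; the batch size and the `fileURLs`, `operation`, and `archive` variables are illustrative stand-ins:

```objectivec
// Existing entries lead the array, so unchanged data is not rewritten
// on each update; flushing in batches allows cancellation between batches.
NSMutableArray *pending = [NSMutableArray array];
for (NSURL *fileURL in fileURLs) {
    if (operation.cancelled) break;

    [pending addObject:
        [ZZArchiveEntry archiveEntryWithFileName:fileURL.lastPathComponent
                                        compress:YES
                                       dataBlock:^(NSError **error) {
            return [NSData dataWithContentsOfURL:fileURL
                                         options:0
                                           error:error];
        }]];

    if (pending.count == 10) {   // hypothetical batch size
        NSArray *entries =
            [archive.entries arrayByAddingObjectsFromArray:pending];
        NSError *error = nil;
        if (![archive updateEntries:entries error:&error]) break;
        [pending removeAllObjects];
    }
}
```

A larger batch size means fewer rewrites of the archive trailer; a smaller one means less lost work on cancellation or error.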


Thanks @pixelglow. That's a good idea. It would be awesome to have a method that takes just one entry and can be efficiently called in a loop. If you're archiving a large number of files, it's important to be able to cancel the operation.


@phoenix42 At the moment, I'm swamped with other work but I hope to commit a fix in about 2 weeks.

Alternatively, you can attempt a patch based on the outline I gave above, i.e.:

Read in the central directory from the last 64K of the file; this can continue to be a memory map. We'll likely have to subclass NSData to support an arbitrary file offset during mapping.

Rewire any classes that use the entry fileData to read directly from the zip file instead. The archive should store an NSFileHandle, and the entries should get a weak reference to it for reading their data. Alternatively, the channel could wrap the NSFileHandle so that it works seamlessly with NSData-based archives.

Confirm that the unit tests all pass.

Either item is fairly self-contained, if you're not up to going the whole hog. Let me know whether you're proceeding or waiting for me to fix it.


Hi there, I'm getting lots of failures in my app for large ZIPs, and I traced them down to this issue.
Has there been any progress on this bug? If I offered to somehow sponsor this bugfix, would that speed things up? I don't think I'm clever enough to tackle it myself...
Thanks for your awesome work; hope we can fix this one :)


@xaphod I haven't fixed the large zip issue, mainly due to lack of time. As a contractor, I do have to prioritise paying work. I did make a start on an arbitrarily offset mmap NSData though.

Would appreciate the sponsorship! What method would you suggest?

I suggest using ChangeTip, which handles tips in Bitcoin. I'll set a bounty of $1,000, representing how much work I think it will take to get done. Then whoever believes this work is worthwhile tips me any amount. If the total amount tipped by everyone exceeds $1,000, or I think I can get it done for less, I'll claim all the tips and proceed with the work. Otherwise, I'll refund everyone their tips (by leaving them unclaimed).

Does that sound reasonable?

Note: the ChangeTip site says you can only tip on pull requests and commits. I don't know whether it will work on an issue thread like this one; someone can try with a tiny amount first.

Full disclosure: my main commercial driver for zipzap is Instaviz, my iOS diagram sketching app. I use zip files for the file format, with the intention of allowing embedded images. The current version doesn't have embedded images, and the DOT format compresses so well that I haven't hit the large file limit in my own work. If this work isn't funded, I will still get to it when I end up supporting embedded images in Instaviz -- maybe in 6+ months' time, depending on sales etc. So in a sense, you're tipping here for certainty and immediacy.


@pixelglow Thanks for taking the time to answer, and for the level of details & openness. I appreciate it 👍

Although I like the idea of ChangeTip (I haven't checked it out in detail yet, as I had an appmergency :S), I'm afraid at the $1,000 level it won't work. I don't have a wildly successful app yet, and the little I do earn isn't as much as I'm spending on advertising or design assets. I was thinking more like a max of $100 from me... but even if all six people on this thread kicked in that amount, it wouldn't be enough.

So in the meantime I've instrumented my app (WeDownload), which uses ZipZap, so that in future versions I can see how many people are hitting this issue. If it's only a handful per week, that's one thing... but if it's lots of people, I might have to find something else you want... :)

Thanks so much for your hard work on ZipZap, and good luck on Instaviz (looks great!)


@pixelglow I was just reading your earlier post about possible solutions ("Absent Apple "fixing" iOS devices to allow more memory addresses, we could..."), and something strikes me as missing. Perhaps you ruled it out for a reason I don't see?

What about using an NSInputStream and pointing it at the file? This would require that the zip file resides on disk somewhere, but that's a very reasonable requirement if it doesn't fit into memory in the first place.

The advantage of a stream-based approach (to me, err, as not a super advanced senior emperor developer) would be that you use standard Apple-supported APIs without needing to customize NSData, so it should be more future-proof. NSInputStreams backed by files support seeking to arbitrary locations.


@xaphod The October 10 post is closer to my current thinking. Incorporating your file/stream-based idea, it would be a hybrid approach:

Memory map the central directory. This is guaranteed by the format never to exceed 64K and memory-mapping will improve performance. This is pretty straightforward work.

Stream or otherwise read in the data for the actual entries. In general there's no definite performance improvement from memory maps here, except for stored (uncompressed) entries accessed via the data accessor. This needs significant rework in the implementation to work off files/streams instead of memory maps, but we'll need it to avoid the memory-map caps on individual entries, e.g. if we had a >1 GB entry.

NSInputStream is not seekable, but it will make a good abstraction over the entry's data blob! So we could provide it as-is for stored entries, and wrapped with the unzip code for compressed entries.


This is guaranteed by the format never to exceed 64K and memory-mapping will improve performance.

Looks like I was wrong.

The end of central directory record is at most around 65K. But the central directory itself can be up to 4 GB, although with reasonably sized file names, e.g. 40 bytes max, it would occupy a more reasonable maximum of only about 5 MB. I still think memory-mapping is the way to go here, but pathological cases may still stump the new implementation.

An instance of NSInputStream created from a path to a file on disk will be seekable by getting/setting its NSStreamFileCurrentOffsetKey property.

Indeed, I missed that. I'd still like the NSInputStream we use to be strictly bounded by its offset and length in the file, to prevent client abuse, though. And we'd need multiple NSInputStreams referencing the same file with independent stream positions, since we don't restrict clients to streaming in order. Ah, fun.
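The seekable-stream idea above could be sketched like this in Objective-C; the file path and `entryOffset` are hypothetical:

```objectivec
// Seeking a file-backed NSInputStream with NSStreamFileCurrentOffsetKey,
// as described above.
NSInputStream *stream =
    [NSInputStream inputStreamWithFileAtPath:@"/path/to/archive.zip"];
[stream open];

// Jump to the (hypothetical) start of an entry's data.
[stream setProperty:@(entryOffset) forKey:NSStreamFileCurrentOffsetKey];

uint8_t buffer[4096];
NSInteger read = [stream read:buffer maxLength:sizeof buffer];
// ... hand buffer off to the inflate layer ...

[stream close];
```

Note this plain stream isn't bounded by the entry's length, so a wrapper would still be needed to stop clients reading past the entry's end.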


I know this is a bit of an old issue, but I recently came across the same problem when trying to open a large file on an iPad 4. I don't know if recent changes allowed this to work, but I'm using a different method of initialization to solve my problem.

I had a kernel memory inspector that I placed in the loop iterating over the entry list to print out memory usage. With it, I can see that the system safely frees memory once it gets the memory warning from the OS.