we could do like 10-15gb megawarcs and should be fine even with just 32gb of ram
then it takes some of the pain out of the small files
just need to work out if we can megawarc quicker than things come in
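roughly what i have in mind for the packing step, just a sketch not the real megawarc tool (the dirs and the 15gb cap are placeholders):
```python
# sketch: pack small .warc.gz files into one ~10-15 GB megawarc
# (hypothetical incoming/outgoing dirs and size cap, not the actual megawarc tool)
import os
import shutil
import time

SIZE_CAP = 15 * 1024**3  # stop a megawarc once it passes ~15 GB

def pack(incoming_dir, outgoing_dir):
    out_path = os.path.join(outgoing_dir, "megawarc-%d.warc.gz" % int(time.time()))
    packed, written = [], 0
    with open(out_path, "wb") as out:
        for name in sorted(os.listdir(incoming_dir)):
            if not name.endswith(".warc.gz"):
                continue
            src = os.path.join(incoming_dir, name)
            # gzip members can be concatenated, so appending whole .warc.gz
            # files keeps every record readable and streams with low memory use
            with open(src, "rb") as f:
                shutil.copyfileobj(f, out)
            packed.append(src)
            written += os.path.getsize(src)
            if written >= SIZE_CAP:
                break
    # drop the small files only after the megawarc is safely on disk
    for src in packed:
        os.remove(src)
    return out_path, written
```
since it just streams file to file it should stay well under 32gb of ram no matter how big the megawarc gets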

what do you think of deduplication
should we still do deduplication on the main server
or shall we leave it to the discoverer
that will mean we'll get some duplicate articles
but given that we check with the IA CDX API, at least images should not be duplicated
and other static stuff
which makes it a lot less painful to duplicate an article
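the CDX check i mean is roughly this, just a sketch (the endpoint and query params are the public wayback CDX API, the extension list and skip rule are made up as an example):
```python
# sketch: skip URLs the IA CDX API already knows about,
# so images and other static assets don't get grabbed twice
import json
import urllib.parse
import urllib.request

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

def already_archived(url):
    query = urllib.parse.urlencode({
        "url": url,
        "output": "json",
        "limit": "1",                  # one hit is enough to know it exists
        "filter": "statuscode:200",    # only count successful captures
    })
    with urllib.request.urlopen(CDX_ENDPOINT + "?" + query, timeout=30) as resp:
        rows = json.loads(resp.read().decode("utf-8"))
    # first row is the field header, so any extra row means at least one capture
    return len(rows) > 1

def should_fetch(url):
    # always refetch the article HTML itself, only dedupe the static stuff
    if url.endswith((".jpg", ".png", ".gif", ".css", ".js")):
        return not already_archived(url)
    return True
```
so worst case we store an article twice but never re-grab the heavy static assets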