Jordan Westhoff's Blog

SHOOTOUT! Data Usage Statistics!

Over the weekend I spent a good deal of time looking at just how intensive a full RAW data workflow can be. I also wanted to compare the burden of Sony’s 4K RAW against ARRI’s 2.8K RAW (recorded via an S.Two recording device) and see which required more data overhead to work with. For this post I looked only at how much drive space was required and set aside CPU usage, since pretty much all of my machines were running at almost full bore whenever renders were required.

While storage is not really a problem for a lot of industry professionals, it can be quite the burden for independents or students. Not every student has several terabytes (a terabyte is 1,000 gigabytes, or 1,024 in the binary convention many operating systems use) of unused, high-speed storage. A lot of people wonder if they can get away with slower, basic desktop drives for this much data, but it really comes down to how long you want to wait. Slow drives serve information, well, slowly. Waiting for 300GB of renders to load can take ages, and when deadlines are at stake, it really isn’t viable.
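To put "ages" into numbers, here is a rough back-of-the-envelope sketch. It assumes a constant sustained read speed and decimal units (1 GB = 1,000 MB, as drive makers use). The two drive speeds are hypothetical round figures, not measurements from our setup.

```python
# Rough load/copy time for a batch of files at a sustained throughput.
# Assumption: the drive speeds used below are illustrative, not measured.

def transfer_time_seconds(size_gb: float, throughput_mb_s: float) -> float:
    """Seconds to read size_gb gigabytes at throughput_mb_s megabytes per second.

    Uses decimal units (1 GB = 1000 MB).
    """
    if throughput_mb_s <= 0:
        raise ValueError("throughput must be positive")
    return size_gb * 1000 / throughput_mb_s


def format_minutes(seconds: float) -> str:
    """Human-readable minutes with one decimal place."""
    return f"{seconds / 60:.1f} min"


if __name__ == "__main__":
    renders_gb = 300  # the 300GB of renders mentioned above
    # Hypothetical sustained read speeds in MB/s.
    drives = {"basic desktop HDD": 100, "fast RAID / SSD": 500}
    for name, speed in drives.items():
        print(name, format_minutes(transfer_time_seconds(renders_gb, speed)))
```

At those assumed speeds, the 300GB of renders takes about 50 minutes on the slower drive and about 10 minutes on the faster one. That is before any seek or network overhead.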

Below, I’ve compiled a good deal of raw statistics from our recent shootout project. Since I was in charge of managing the data, handling image processing and running the servers we worked off of, I have the entirety of the raw footage as well as a significant portion of the renders. All of that adds up to tons and tons of space. In fact, there is so much that I thought comprehensive statistics might help visualize where all of that information is going.

There are a couple of things to note before we get started. Some of the statistics break down total usage by camera, covering all of the data used, start to finish, on each camera platform. One or two others split the data into intermediate stages.

For the ARRI D-21, this required converting the raw S.Two DPX files to .ARI files. We then exported them again from ARRI RAW Converter to DPX or TIFF sequences for color grading and a final export.

For the Sony in HD, it was simply a matter of copying the camera files from the onboard SD card, grading them and re-exporting.

In 4K, however, there was significantly more to do. Dumping the card produced a proprietary MXF wrapper containing all of the files. These had to be opened in Sony RAW Viewer to convert them to 16-bit DPX files. Those DPX files could then be graded and exported again to a DPX or TIFF sequence for analysis and editing.

As you can imagine, each of these steps adds to the storage total, and that creates quite a trend in the statistics.
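Every intermediate stage writes out a full uncompressed frame sequence, so you can estimate each stage’s footprint before you render it. The sketch below assumes uncompressed RGB and ignores file headers. It uses 4096x2160 at 24 fps purely as an example; those values are my assumptions, not measurements from the shootout.

```python
# Rough size of an uncompressed RGB frame sequence (DPX/TIFF style).
# Assumptions: no compression, header bytes ignored, constant frame size.

BYTES_PER_PIXEL = {
    "dpx_10bit_rgb": 4,   # three 10-bit samples packed into one 32-bit word
    "dpx_16bit_rgb": 6,   # three 16-bit samples, 2 bytes each
    "tiff_16bit_rgb": 6,  # uncompressed 16-bit RGB TIFF
}


def frame_bytes(width: int, height: int, fmt: str) -> int:
    """Bytes of image data in one frame."""
    return width * height * BYTES_PER_PIXEL[fmt]


def sequence_bytes(width: int, height: int, fmt: str, fps: float, seconds: float) -> int:
    """Bytes for a whole sequence of round(fps * seconds) frames."""
    frames = round(fps * seconds)
    return frame_bytes(width, height, fmt) * frames


if __name__ == "__main__":
    # Hypothetical example: one minute of 4K, 16-bit DPX at 24 fps.
    size = sequence_bytes(4096, 2160, "dpx_16bit_rgb", fps=24, seconds=60)
    print(f"{size / 1e9:.1f} GB")
```

Under those assumptions, one minute of 4096x2160 16-bit DPX works out to roughly 76.4 GB, or about 53 MB per frame. Every extra intermediate pass adds that much again.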

This accounts for all of the ‘mission critical’ information stored for the shootout.

Here, we can see just how much data there was overall. In total, the final aggregate size of all resources exceeded 1.6TB! This included all footage, start to finish (ARRI, Sony and Sony 4K), as well as renders, CC passes, graphics for our final video and any other data in between. Keep in mind that the actual camera footage totaled only about 10 minutes per camera, and less for the Sony 4K. That footage made up a significant portion of the overall data, but more on that later. The totals are this small because most of the shots in the shootout were of charts or color scenes; the longest scenes were barely over 50 seconds apiece. Shooting an entire film on any of these platforms would therefore consume an incredible amount of data. The chart above is broken into four categories, and since each is a bit vague, I’ll take a moment to explain them.
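To see why a full film gets out of hand, you can scale a measured test shoot linearly. This simple sketch assumes the data rate per recorded minute stays constant. It also introduces a shooting ratio, meaning recorded minutes per finished minute. The example figures are hypothetical, not the shootout’s own numbers.

```python
# Linear extrapolation of storage from a short test shoot to a longer project.
# Assumption: data per recorded minute stays constant (same camera, format, pipeline).

def projected_gb(measured_gb: float, measured_minutes: float,
                 runtime_minutes: float, shooting_ratio: float) -> float:
    """Estimate storage for a project.

    shooting_ratio: minutes recorded per minute of finished runtime (e.g. 5 for 5:1).
    """
    if measured_minutes <= 0:
        raise ValueError("measured_minutes must be positive")
    gb_per_minute = measured_gb / measured_minutes
    recorded_minutes = runtime_minutes * shooting_ratio
    return gb_per_minute * recorded_minutes


if __name__ == "__main__":
    # Hypothetical numbers, not the shootout's: 400 GB for 10 recorded minutes,
    # scaled to a 90-minute film shot at a 5:1 ratio.
    print(f"{projected_gb(400, 10, 90, 5):,.0f} GB")
```

With those made-up inputs, the estimate lands around 18,000 GB, or roughly 18 TB.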

The first and largest is the Footage Archive. This is an aggregate of just the base footage captured from each camera. In the case of the ARRI, it also includes some intermediate files. Essentially, all of the footage classified here was ready to go into editing, minus any major color correction.

The Shootout Archive contains all of the intermediates of the pick scenes and the color-corrected scenes. Any footage that was reviewed and judged good enough for analysis continued down the chain of picture processing. The files in this directory are S.Two renders that were then processed in ARRI’s ARC, as well as the chosen Sony HD and 4K clips, which also went through their respective processing steps.

Shootout MAIN is the working directory for all of the analysis, as well as for the video production portion of the project. It holds all of the final renders, color correction finals, stock footage, B-roll, preliminary screening renders and narration, plus all of the graphics our team generated.

Finally, there is a Web Access Point directory. This was a separate directory created on a network server to give each member of the team fast, reliable intermediary storage for their own assets in production. These could be screen captures, editing files, project files, you name it. This miscellaneous working space is what made the workflow so efficient: each member had a fast directory to work from and could contribute to the final project as it was assembled in real time.
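If you want to produce a breakdown like this for your own project, one way is to total the file sizes under each top-level folder and convert those totals to percentages. Here is a minimal standard-library sketch. It assumes each category lives in its own top-level folder; the folder names in the demo are borrowed from the categories above, but the layout is hypothetical. It skips symbolic links so nothing is counted twice.

```python
import os
import tempfile
from pathlib import Path


def directory_totals(root) -> dict:
    """Total bytes of regular files under each top-level folder of root."""
    totals = {}
    for child in sorted(Path(root).iterdir()):
        if not child.is_dir():
            continue  # loose files at the root are ignored
        size = 0
        for dirpath, _dirnames, filenames in os.walk(child):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if not os.path.islink(path):  # skip symlinks to avoid double counting
                    size += os.path.getsize(path)
        totals[child.name] = size
    return totals


def shares(totals: dict) -> dict:
    """Convert byte totals into percentages of the grand total."""
    grand = sum(totals.values())
    if grand == 0:
        return {name: 0.0 for name in totals}
    return {name: 100 * size / grand for name, size in totals.items()}


if __name__ == "__main__":
    # Demo on a throwaway folder tree with tiny placeholder files.
    with tempfile.TemporaryDirectory() as tmp:
        for folder, n_bytes in [("Footage Archive", 300), ("Shootout MAIN", 100)]:
            (Path(tmp) / folder).mkdir()
            (Path(tmp) / folder / "clip.dpx").write_bytes(b"\0" * n_bytes)
        for name, pct in shares(directory_totals(tmp)).items():
            print(f"{name}: {pct:.1f}%")
```

On a real archive you would point `directory_totals` at the storage root instead of the temporary demo folder.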

Each day of shooting generated a different storage requirement depending on the scenes shot.

Since the shootout was spread over three days (technically four, when you count the 4K shoot), it was useful to look at how usage varied by day. Some of the graph labels were cut off, but the four largest portions correspond to the longest shots. Day 1 files came close to taking the lead in storage, but our Day 3 files took it with 19.6% of total data usage. These figures include only the files from the ARRI and from the Sony in HD video mode. The third largest, at 17%, was our fourth day of shooting, which comprises all of the Sony 4K RAW files. Each of the much smaller portions is broken up by shot; some scenes took many shots and some took far fewer.
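A by-day chart like this falls out naturally if the footage is organized into day and shot folders. Here is a small sketch that groups file sizes by the first one or two path components. The Day/Shot folder naming is a hypothetical convention, and the sizes in the demo are placeholders, not our real numbers.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def usage_by_prefix(records, depth: int) -> dict:
    """Sum sizes of (relative_path, size_bytes) records by their first `depth` path parts."""
    totals = defaultdict(int)
    for rel_path, size in records:
        parts = PurePosixPath(rel_path).parts
        totals["/".join(parts[:depth])] += size
    return dict(totals)


def percent_of_total(totals: dict) -> dict:
    """Percentages rounded to one decimal place, like the chart labels."""
    grand = sum(totals.values())
    if grand == 0:
        return {key: 0.0 for key in totals}
    return {key: round(100 * size / grand, 1) for key, size in totals.items()}


if __name__ == "__main__":
    # Hypothetical layout: Day<N>/Shot<NN>/<frame files>
    records = [
        ("Day1/Shot01/f0001.dpx", 500),
        ("Day1/Shot02/f0001.dpx", 300),
        ("Day3/Shot01/f0001.dpx", 200),
    ]
    print(percent_of_total(usage_by_prefix(records, depth=1)))  # per day
    print(usage_by_prefix(records, depth=2))                    # per shot
```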

This gives a better look at how your production workflow can affect the data needs of each project.

Here is a final look at how much each stage of production contributes to the total. This figure ties directly into the final, cleaned-up and organized storage stats for the shootout in its entirety. Of the approximately 1.6TB required, the most costly stage of production was generating all of the intermediate files. This was especially true of the 4K tests, which accounted for almost half of this data despite shooting for only about 20% as long as the ARRI and Sony HD tests. Both RAW tests required multiple intermediate steps, and at their respective resolutions those steps chewed through tons of space. We chose to work with DPX and TIFF files since those are lossless formats and overall exhibited the best quality.
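Intermediates dominate because each full-resolution pass writes another complete copy of the footage. This sketch totals the footprint of a pipeline in which every stage is an uncompressed frame sequence. The stage lists, durations and bytes-per-pixel values are illustrative assumptions, not our actual pipeline figures. Camera-original files are left out because their compressed sizes vary.

```python
# Footprint of a pipeline where every stage writes a full uncompressed copy.
# Assumptions: uncompressed RGB, headers ignored, same resolution at every stage.

def pipeline_bytes(width: int, height: int, fps: float, seconds: float,
                   stage_bytes_per_pixel: list) -> int:
    """Sum the sizes of all stages; each entry is one stage's bytes per pixel."""
    frames = round(fps * seconds)
    pixels_per_frame = width * height
    return sum(pixels_per_frame * bpp * frames for bpp in stage_bytes_per_pixel)


if __name__ == "__main__":
    # Hypothetical HD run: 10 minutes, two 10-bit DPX passes (4 bytes/pixel each).
    hd = pipeline_bytes(1920, 1080, 24, 600, [4, 4])
    # Hypothetical 4K run: 2 minutes (20% as long), two 16-bit DPX passes (6 bytes/pixel).
    dci_4k = pipeline_bytes(4096, 2160, 24, 120, [6, 6])
    print(f"HD: {hd / 1e9:.1f} GB, 4K: {dci_4k / 1e9:.1f} GB")
```

Under these made-up inputs, two minutes of 4K outweighs ten minutes of HD. That is the same kind of trend the chart shows, even though the real numbers differ.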

All in all, shooting RAW is a very exhausting process, from both a processing and a storage perspective. Your storage needs will depend on the camera and the codec/format you choose to edit in. Even so, it’s always safe to budget one to two terabytes for shooting a short, and always, always remember to BACKUP your information! None of the statistics here include the backups we set up to safeguard our information. At any point, our information was backed up in two additional places: a hardware RAID attached to a workstation on the other end of campus, and a full minute-to-minute backup stored on a NAS. The NAS also pulled all of the web assets from each member to keep their assets online and safe at the same time. Feel free to contact me if you have questions!
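Backups multiply the space you need. Here is a quick sketch of the setup described above. It assumes every location holds one full copy of the working data and ignores RAID parity overhead, which depends on the RAID level. The location names are just labels.

```python
# Storage needed when each location keeps a full copy of the working data.
# Assumption: no deduplication or compression; RAID parity overhead ignored.

def backup_plan(working_tb: float, locations: list) -> dict:
    """Per-location capacity plus the overall total, in terabytes."""
    plan = {name: working_tb for name in locations}
    plan["total"] = working_tb * len(locations)
    return plan


if __name__ == "__main__":
    # The shootout's ~1.6 TB working set, plus the two backup locations described above.
    plan = backup_plan(1.6, ["working servers", "RAID workstation", "NAS"])
    print(f"total: {plan['total']:.1f} TB")
```

Applying the same arithmetic to the one-to-two-terabyte budget for a short gives roughly 3 to 6 TB of total capacity once two backups are included.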

In the future I’ll be making a post dedicated to the labyrinth of storage and why some types are better than others, as well as a look at what I’m using to manage all of this information! Thanks for reading!