The AS-11 video standard used by the Digital Production Partnership comes with an XML sidecar. Generally, all the metadata contained in this file is also embedded within the .MXF, aside from the MD5 checksum. It makes sense to do a fixity check as soon as the drive arrives, so why not automate that task, especially if a delivery of 100 episodes comes in on one day?

The script works well, but if anyone finds any issues or, preferably, has ideas for improvement, please let me know via the issues page on GitHub: AS-11 Fixity Issues

If you do not know how to run a script, or how to run this particular script, please get in touch and I’ll talk you through it if I can spare the time, though I’d recommend trying to help yourself first by using Stack Overflow and the various bash guides out there. If you already know bash, head straight to the GitHub page; what follows is an explanation for those unfamiliar with scripting:

A word of warning: the various if statements and while loops require indenting, which will not be conveyed well in these code blocks. Please look at my code on GitHub in order to view it correctly.

1. FIRST STEPS:

I found that if I ran the script multiple times on the same directory, it simply ran through everything again and appended the same info to the end of the spreadsheet. The following prevents this from happening:
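A minimal sketch of such a guard, assuming the report is a CSV file at a path you choose. The name fixity_report.csv and the function report_exists are hypothetical placeholders, not taken from the script on GitHub:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a report file already exists at the given path.
report_exists() {
    [ -f "$1" ]
}

# Assumed report name, for illustration only.
report="fixity_report.csv"

if report_exists "$report"; then
    # A report from an earlier run is present, so do not append duplicate rows.
    echo "Report $report already exists; not running again."
fi
```

In a full script, the work of creating the report and checking the files would go in the else branch of that if statement.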

We need some headings for our spreadsheet report, so now’s a good time to create them. This next section only runs when the if statement in the previous section of code returns a false value, which is why the next statement starts with “else”:

This will create a new file with the extension “.csv” and will add that list of headings to the first line.
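One way this step could look, as a sketch only. The headings below are assumed from the fields the report is described as containing, and write_header is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: create the report and write the column headings.
write_header() {
    # ">" creates (or truncates) the file, so the headings always land on line 1.
    echo "Filename,Title,Episode Number,MD5 from XML,MD5 from MXF,Fixity Check" > "$1"
}
```

Calling write_header "fixity_report.csv" would then give you a one-line CSV that later rows can be appended to with ">>".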

2. FINDING ALL XML & MXF IN YOUR SUBFOLDERS:

Now we have to actually find all XML files in all subfolders. AS-11 files will most likely come in on a hard drive or LTO in a subfolder structure. See that second line? I ended up replacing my original code with what is listed as the first question in the BashFAQ.

The variables should hopefully be self-explanatory. The last one just grabs the filename with no extension, so AS11.XML is stored in memory as just AS11. This allows us to search for AS11.MXF later on.
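A sketch of how such a loop can look, using the while/read pattern described in the BashFAQ. The names sourcepath and filenoext match the variables used later in this post; xmlfile, filename and the list_sidecars wrapper are assumptions made for illustration:

```shell
#!/bin/sh
# Hypothetical helper: print folder, filename and extension-less name
# for every XML file found beneath the directory given as $1.
list_sidecars() {
    # -iname matches .xml and .XML alike; find prints one path per line.
    find "$1" -type f -iname '*.xml' |
    while IFS= read -r xmlfile; do
        sourcepath=$(dirname "$xmlfile")   # folder containing the sidecar
        filename=$(basename "$xmlfile")    # e.g. AS11.XML
        filenoext="${filename%.*}"         # e.g. AS11
        printf '%s|%s|%s\n' "$sourcepath" "$filename" "$filenoext"
    done
}
```

IFS= and -r stop read from trimming whitespace or interpreting backslashes, so folder names with spaces survive intact. Paths containing newlines would still break this line-by-line approach.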

3. LET’S EXTRACT THE CHECKSUM FROM THE XML!!

Now we get to the really fun part. We have to find the XML element containing the MD5 value. We do this by calling the xmlstarlet program. This is a handy tool that parses XML for you and allows you to evaluate and transform your documents. The easiest way to pick out the XML value is with XPath, which addresses an element by its path through the document. Click here for the basics on XPath. You can see that the MD5 value is located in Programme/Technical/Additional/MediaChecksumValue.

From what I’ve seen, the DPP metadata app produces very consistent XML files, which makes scripting much easier. Here’s the command I use to find the md5 value:

XPath is usually much simpler if a namespace isn’t declared at the start of a document (Mediainfo is like this). Anyhow, you can look up the xmlstarlet documentation for how all this works, and I may do another blog purely on this later on. Suffice to say, the 32-character MD5 is now stored as the variable “md5xml”.
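As an illustration, here is a sketch of one way to pull that value out with xmlstarlet. It uses local-name() to match the element regardless of namespace, rather than declaring the namespace with -N, so it may well differ from the command in the script on GitHub. The function name extract_md5 is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the text of the MediaChecksumValue element in $1.
extract_md5() {
    # local-name() compares only the element name, ignoring any namespace
    # the document declares. Nothing is printed if the element is absent.
    xmlstarlet sel -t -v "//*[local-name()='MediaChecksumValue']" "$1"
}

# Usage, with a hypothetical sidecar path:
# md5xml=$(extract_md5 "/path/to/AS11.xml")
```

The trade-off is that this would match a MediaChecksumValue element anywhere in the file, which seems acceptable if the sidecars are as consistent as described.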

4. LET’S GET THE MD5 FROM THOSE MXF FILES!

We now need to get the current checksum of our mxf media file. I’m using md5deep for this, as it easily allows you to get checksums from files that are in subfolders, also known as a recursive search.

md5mxf=($(md5deep -e "$sourcepath/${filenoext}.mxf"))

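Notice how we’re calling the “filenoext” variable that we declared earlier on, and adding on .mxf. The -e flag makes md5deep show a progress estimate while it hashes. md5deep prints the checksum followed by the file path; the outer parentheses turn that output into a bash array, so “$md5mxf” (the array’s first element) holds just the checksum of your media file.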
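If md5deep isn’t installed on a given machine, the same value can be had from other tools. The sketch below is an assumption-laden alternative, not part of the original script: file_md5 is a hypothetical name, and it falls back to md5sum (GNU coreutils) or md5 (macOS/BSD):

```shell
#!/bin/sh
# Hypothetical helper: print only the MD5 hash of the file given as $1.
file_md5() {
    if command -v md5deep >/dev/null 2>&1; then
        md5deep -q "$1"                 # -q omits the filename from the output
    elif command -v md5sum >/dev/null 2>&1; then
        md5sum "$1" | cut -d ' ' -f 1   # keep the hash, drop the filename
    else
        md5 -q "$1"                     # macOS / BSD equivalent
    fi
}

# Usage, mirroring the variables in this post:
# md5mxf=$(file_md5 "$sourcepath/${filenoext}.mxf")
```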

5. FIXITY CHECK AND SPREADSHEET REPORT!

Now we’re pretty much at the end. We need to compare the value of the md5 from the xml, and the fresh md5 from the mxf:

The following is just in case some other XML files happen to be in your directories. This is unlikely, as in normal use you’ll only have AS-11 files on your drive. If a null value is found when searching for the MD5 value in the XML, the file is treated as not being an AS-11 sidecar and its info isn’t written to our report.

if [[ "${md5xml}" == "" ]] ; then
    # No MD5 value was found, so this XML is not an AS-11 sidecar
    echo "not a sidecar"
fi

If it is an AS11 sidecar xml, then this will verify that both are identical, and print a whole bunch of info to that csv file. It will print the filename, the title of the show and the episode number (retrieved from the xml), the md5 xml value, the mxf md5 value, as well as a judgement on the outcome of the fixity check.
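The comparison and the report line can be sketched as below. This is an illustration under assumptions: fixity_row is a hypothetical name, the title and episode number are passed in as arguments rather than extracted here, the PASS/FAIL wording is a placeholder, and comparing in lowercase is a precaution of mine in case the two sources differ in case:

```shell
#!/bin/sh
# Hypothetical helper: print one CSV row for the report.
# Arguments: filename, title, episode number, MD5 from XML, MD5 from MXF.
fixity_row() {
    # Normalise both checksums to lowercase before comparing.
    xml_sum=$(printf '%s' "$4" | tr '[:upper:]' '[:lower:]')
    mxf_sum=$(printf '%s' "$5" | tr '[:upper:]' '[:lower:]')
    if [ -z "$xml_sum" ]; then
        # No checksum in the XML: not a sidecar, so write nothing to the report.
        echo "not a sidecar" >&2
        return 1
    elif [ "$xml_sum" = "$mxf_sum" ]; then
        result="PASS"
    else
        result="FAIL"
    fi
    # Text fields are quoted so that commas in a title do not shift the columns.
    printf '"%s","%s","%s",%s,%s,%s\n' "$1" "$2" "$3" "$xml_sum" "$mxf_sum" "$result"
}

# Usage, appending to an assumed report file:
# fixity_row "$filename" "$title" "$episode" "$md5xml" "$md5mxf" >> "$report"
```

Titles that themselves contain double quotes would need extra escaping, which this sketch does not attempt.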

6. OK THAT’S ENOUGH

Verifying all those checksums by eye is an error-prone pain, especially if a shipment of 50-100 files is delivered.

I would like to rewrite the script in Python, purely because it makes it easier to run on multiple operating systems. Bash works really smoothly in OSX/Linux, but it gets quite awkward getting it to run on Windows. Python also has built-in XML support, so it doesn’t require calling out to xmlstarlet.

If you’ve any questions, just hit me up on Twitter with the user name kieranjol, and my email is in the about page.