ClipToolz Convert v3 still broken for Hitfilm use

I've occasionally experimented with ClipToolz V2 out of interest, but the formats I wanted to convert showed large colour changes in the output files, so they weren't usable.

These are the only three formats I'd tried before and recently tried again, playing with a clip uploaded by @jensfieldler in another thread.

ProRes HD (All options) changes the palette significantly - See attached images and flip between them to see the image change, as well as the histogram.

ProRes 4K gets it 98% correct, with very small changes (sometimes up, sometimes down on different frames) to all the values displayed in the Histogram and the image looks fine.

DNxHD HD gets the values mostly correct, although there are some gaps in the Histogram, which looks like the range was compressed from 0-255 to 16-235, then stretched back to 0-255 again. Image looks fine, although it's not the best choice to see if banding or any other artifacts are visible.

As the Luma range button is stuck on 16-235 and 0-255 is not selectable for DNxHD, and I suspect the original .MP4 file was 0-255, I'm not sure what you're supposed to do to avoid this.

You will also notice, as you flip through the images, that the DNxHD one is a different frame. The playhead was not moved; the first frame is actually duplicated, so an extra frame is added at the beginning and one is lost at the end of the clip.

That was all found previously with ClipToolz V2 (which was free), but I just tried a demo of Convert V3 to see if it was worth buying, and it does exactly the same thing in the same way. The screenshots were taken from the output of V3. So that's a no.

I sent the results to the developer a few days ago, but he hasn't responded yet, so until he fixes it, just be aware of the limitations when converting files for use with Hitfilm.

Also: I do miss the ability to drag'n'drop files into it that V2 allows. That's been removed from V3 for some reason.

Comments

What you're seeing all comes down to color space issues with the codecs involved, not ClipToolz or FFMpeg/FFMBC. Most intermediate codecs force broadcast color standards whether you want them or not.

ProRes HD enforces Rec 709. How it handles full range data depends on how the data is presented to the codec. YUV full range will produce a different result than full range RGB and, for whatever reason, RGB tends to be more accurate, with full range YUV possibly being responsible for the "QuickTime Gamma Bug".

The solution, in so much as there is one, is to make sure you're feeding the encoder with "legal" levels in the right color space to begin with. For DNxHD that means color correcting/grading full range footage to 16-235 in YUV before sending it to the encoder.

@Aladdin4D, well, whatever is responsible, as a tool to simply allow smoother editing in Hitfilm (it's been suggested by everyone who's recommended it that you just drag files in, use the resultant output in Hitfilm and edit as normal, but now much smoother and more responsive) it's not very helpful.

If DNxHD conversion is supposed to work as you suggest: then it's not doing that properly either. There is no clipping of the range, just gaps in the middle; the shapes of the two ends of the range are identical to the original. And the first frame duplication is an annoying bug.

I've tried using the ProRes 4K output, but I'm not sure the resulting files, at approximately 10x larger, are worth the effort (and that's using the LT setting). I'd like to use the ProRes HD LT output, as although artifacts are very visible in fast motion sections: it's fairly manageable as a proxy, to be swapped out for the final render. But, the colours being messed up makes it utterly pointless.

MPEG Streamclip will do the same. Again, it's how the codec deals with full range input, not the tool or codec implementation, that's causing the issue. Do note the forum discussion I linked to about issues with DNxHD: those are users running DNxHD, designed and developed by Avid, in Media Composer, also designed and developed by Avid, so a 100% Avid native solution generates the exact same issues when dealing with full range input. The fact that ClipToolz and/or FFMpeg/FFMBC do the same thing is a sign that they have done a very good job of duplicating Avid's own output, not an anomaly or bug.

"If DNxHD conversion is supposed to work as you suggest: then it's not doing that properly either. There is no clipping of the range, just gaps in the middle; the shapes of the two ends of the range are identical to the original."

OK, it's not my suggestion, it's just a fact. You get the gap pattern when values are clipped (or, if you prefer, chopped out of existence) and the remaining values remapped. Let's say an instructor gives you eight one-centimeter blocks ranging in color from black to white and tells you to arrange them along an 8 centimeter line from black to white. The sides of the blocks have to touch, because that's the only way they'll all fit along the line. Now the instructor tells you to remove the first and last blocks, but the remaining blocks still need to take up the full 8 centimeters. The only way to do that is to spread out the remaining blocks, creating gaps between them.

@Aladdin4D, OK, so the only useful solution for use with Hitfilm is ProRes 4K, if coming from 0-255 .MP4 files.

Yes, I (think I) understand about clipping, and from your own example, it's not doing it. If it was, there would be no Full White and Full Black, but there are in the Histogram.

What it's (apparently) done is compress 0-255 to 16-235, throwing away the values where there would effectively need to be 2 stored in a single 'slot', then stretched it all back out again. So, 0 is back where it was and 255 is back where it was, but the ones that got lost because there weren't enough slots available are now just blanks.

So, to take your example: it's squashed 8 numbers into 6 slots, lost numbers 3 and 6, as they don't fit, leaving 1,2,4,5,7,8, then stretched that back out to 8 slots, with gaps where 3 and 6 used to be.
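The compress-then-stretch idea described above can be checked numerically. This is only a sketch under stated assumptions: it uses simple linear scaling with round-to-nearest on a single 8-bit channel, and the function names are made up for illustration. It does not claim to reproduce what Convert or ffmpeg actually do internally.

```python
def compress_to_video(v):
    """Squeeze a full-range code (0-255) into video range (16-235)."""
    return 16 + round(v * 219 / 255)


def expand_to_full(v):
    """Stretch a video-range code (16-235) back out to 0-255, clamped."""
    out = round((v - 16) * 255 / 219)
    return max(0, min(255, out))


# Push every possible input code through the round trip.
round_trip = [expand_to_full(compress_to_video(v)) for v in range(256)]

used = set(round_trip)                   # output codes that still occur
gaps = sorted(set(range(256)) - used)    # empty histogram bins

print("distinct output codes:", len(used))
print("empty histogram bins:", len(gaps))
print("0 stays 0 and 255 stays 255:",
      round_trip[0] == 0 and round_trip[255] == 255)
```

Under these assumptions the ends of the range survive, but only 220 of the 256 codes can come back out, so 36 bins are left empty across the histogram.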

My understanding of clipping is that there would be flat spots at the left and right sides of the Histogram, where the first and last values (0-15 and 236-255) are removed. If it did that and then stretched it back out over the full 0-255 range again, the Histogram would have a very different shape.

Encoder will not accept full range input so anything outside the accepted range is truncated.

Profile says the output has to be full range so on output the encoder takes the remaining values and remaps them to the full range profile created earlier.

The result is a clip with full white and full black 0-255 but with gaps.

What you're describing is right to a point with other encoders (think your ProRes results) just not DNxHD. A DNxHD encoder will not accept any "illegal" values on input and will always always always truncate anything outside the accepted range even though you're telling the encoder its output needs to be full range. It has the full range profile demanding 256 different values but it only has 220 values left to work with so it starts at 0 and works its way up to 255 leaving gaps along the way to make up the difference. Again this isn't just a suggestion or a theory on my part. This annoying behavior is incredibly well documented, verified, duplicated, confirmed, proved, SMPTE Standardized with sample code etc etc etc etc many many times over.
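For comparison, here is a sketch of the truncate-then-remap behaviour described in this post, again assuming plain linear scaling on one 8-bit channel with illustrative function names. It is a model of the description, not of any real DNxHD encoder.

```python
def clip_to_video(v):
    """Discard anything outside 16-235 by clamping to the nearest legal code."""
    return max(16, min(235, v))


def expand_to_full(v):
    """Remap the surviving 16-235 codes onto the full 0-255 range."""
    out = round((v - 16) * 255 / 219)
    return max(0, min(255, out))


mapped = [expand_to_full(clip_to_video(v)) for v in range(256)]

used = set(mapped)
gaps = 256 - len(used)

# How many different input codes end up stacked on pure black / pure white.
crushed_to_black = sum(1 for m in mapped if m == 0)
crushed_to_white = sum(1 for m in mapped if m == 255)

print("distinct output codes:", len(used))
print("empty histogram bins:", gaps)
print("inputs landing on 0:", crushed_to_black)
print("inputs landing on 255:", crushed_to_white)
```

In this model there are still 220 surviving codes and 36 gaps, and full black and full white are both present in the output. The visible difference from a compress-then-stretch is at the ends: inputs 0-16 all land on 0 and inputs 235-255 all land on 255, so the histogram piles up at both extremes instead of keeping its original end shapes.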

@Aladdin4D I appreciate the help, but I think you've got "I know I'm right, so I'll ignore the evidence" blindness.

Download the images from PhotoBucket, put them on your PC, and use Windows Preview to view and cycle through them. Specifically, flip back and forth between the Original (i.e. the MP4 file as it came into Hitfilm) and the DNxHD one.

There is no truncating going on. The full range of 0-255 is represented (with very minor adjustments), but with gaps.

@Palacono I think you and Aladdin at this point are seeing the same thing, but describing it in different ways. The result you are describing--a full range image with gaps--is exactly what Aladdin is describing. Aladdin's description is a bit more precise, and your description/definition of clipping is a bit wrong, but the end result is still a truncated, then stretched signal with value gaps.

The source of all this range buffoonery (as fun a word as "truncate" @DrFnord) btw is old broadcast standards. Broadcast video bottoms out at 16,16,16 for black because the bandwidth for the lower values, "superblacks" is reserved for timing/sync codes. In broadcast video luma is measured in "IRE" units where the video range is measured in 0-100. 16,16,16 black is 7.5 IRE units and white is 100 IRE units. (If your camera has a "setup," "floor," or, "pedestal" menu that says "0/7.5" this is setting your black level.) However, 100 IRE units is 235, 235, 235 white. 255, 255, 255 is 109% IRE. 100-109 IRE is called "superwhite." A video image with superwhites bleeds into the audio--so if you ever see a TV ad where whites cause buzzing audio you know it's an improperly mastered file with superwhites...

To make matters more annoying, older cameras might clamp at 8.5 to 100 IRE (analog cameras, DigiBeta, etc) while MiniDV records 0 to 109 IRE. A pain when MiniDV cams like the Canon XL1s started being used by news crews.

So, broadcast video (Rec 601) has a narrower bandwidth than computer video (Rec 709). HD TV broadcast still uses a Rec 601 colorspace. ProRes and DNxHD are both codecs more-or-less designed for dealing with broadcast standards, but the output settings are at computer standards. Therefore the encoder is rolling off the illegal 601 levels, but expanding the range for 709 output. Incidentally, broadcast video is a "YUV/Lab" colorspace where video is three channels--Y/L for Luma, U/a for Green-Magenta chroma and V/b for Blue-Yellow chroma. Computer video is, of course, RGB.

This doesn't solve the issue, but I thought you'd appreciate the "why" of it all.

Please note that the above specifically refers to NTSC video (North America). Other broadcast standards like PAL, SECAM and NTSC-J actually have wider color gamuts, but since at the time of standardization the US was the leading media producer, NTSC rules kinda bled over to everyone else.

"Clipping" by the way is referring to over/underexposed footage where lots of the image hits the edge of the colorspace. A large area of white/superwhite rolls off to pure white, which just means subtle details are lost, since you don't have "near-white" details to enhance with contrast/brightness adjustments. Same with black. However, a Rec 601 truncate followed by a Rec 709 stretch won't change the shape of a histogram too much. You'll get a little more plateau at the pure black/white points, but the overall shape will be the same--just stretched a bit with the gaps.

Incidentally, to this day I still animate using a 16-235 luma range, even when my output is for computers. However, that 16-235 luma range gives a little extra headroom when doing final grading adjustment like contrast curves, or for doing glows or comping things in Add or Screen blend modes. I guess it's conceptually similar to how cinema cams shoot flat/low contrast footage to allow more latitude in grading, but, in my case it's a holdover from my broadcast days.

@Triem23 OK, well perhaps what confused things for me was using the word 'truncate'.

Truncate[truhng-keyt]

verb (used with object), truncated, truncating.

1. to shorten by cutting off a part; cut short:

Truncate detailed explanations.

2. Mathematics, Computers. to shorten (a number) by dropping a digit or digits: The numbers 1.4142 and 1.4987 can both be truncated to 1.4.

As far as I can see: No values at either end of the range have been "cut off". The full range was compressed, then re-expanded, leaving the gaps. I.e. the original value for 'slot' 255 is still there, as are most of the others from the range 236-255 and 0-15.

I.e. it has quite evidently not taken the values 0-15 and 236-255, discarded them, and expanded the values for 16-235 into the original full 0-255 range. If it had, the Histogram would look very different.

So why are we using the word truncate instead of compress?

Whether it is supposed to be doing what it does (rather than actually truncating the values) according to whatever standard is valid, I have no idea. All I know is that the tool gives non-intuitive results (IMO), as well as having the duplicated frame bug for DNxHD.

I don't use DNxHD for myself. I have generated them for tests only. I have generated DNxHD from Vegas and from Convert. For easy (lower overhead) to edit stuff I use for myself I use my own AVC transcode.

Convert does not have "issues". The issue is the whole mess of "video levels" versus computer (full range) levels. There is no hard and fast rule. Where this commonly comes up is with camera sources that are full range. Most DSLR output and GoPro are prime examples.

Video levels are usually quoted as 16-235. The reason this exists is good old analog TV signaling; our digital world is still very much affected by the old analog one. First, levels 1-15, for example, are not illegal. They are legal but simply outside the normal use range. In Rec 709, values 0 and 255 are illegal. Analog monitors were all over the map with what they displayed and how they were adjusted, thus the fuzzy factor in signaling. Digital is not fuzzy, but we keep the subrange for "normal" levels. Computer video players pretty much all expand 16..235 to 0..255 on playback since that is what a monitor wants.

With all that said, what is "correct"?

ffmbc/ffmpeg via Convert seem to do a conversion to video levels when given a full range source and encoding to DNxHD. Is that wrong? I don't think you can say that with absolute certainty. I don't think the DNxHD bitstream can distinguish between full range and video levels. AVC/H.264 for example does have such a mechanism and GoPro and DSLRs I have looked at do correctly mark the video file as full range. Interestingly most video players ignore this flag. Quicktime does honor the flag. Hitfilm 4 update 1 started honoring the flag.
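One way to see whether a source file carries the full range flag mentioned above is to ask ffprobe for the stream's `color_range` and `pix_fmt`. The sketch below only builds the command and parses its JSON output; the sample JSON string is hand-written for illustration, not captured from a real file, and the helper names are hypothetical.

```python
import json


def ffprobe_command(path):
    """Argument list asking ffprobe for the first video stream's range info."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "stream=pix_fmt,color_range",
        "-of", "json",
        path,
    ]


def is_flagged_full_range(ffprobe_json):
    """True if the stream is marked full range ("pc") or uses a yuvj* format."""
    streams = json.loads(ffprobe_json).get("streams", [])
    if not streams:
        return False
    stream = streams[0]
    return (stream.get("color_range") == "pc"
            or stream.get("pix_fmt", "").startswith("yuvj"))


# Hand-written sample in the shape ffprobe's JSON writer produces.
sample = '{"streams": [{"pix_fmt": "yuvj420p", "color_range": "pc"}]}'

print(" ".join(ffprobe_command("clip.mp4")))
print("flagged full range:", is_flagged_full_range(sample))
```

To use it on a real file, the command list could be passed to `subprocess.run(..., capture_output=True, text=True)` and the resulting `stdout` handed to `is_flagged_full_range`. A missing flag does not prove the data is video range; it only means the file does not say.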

Really to DNxHD, or with any codec, 8-bits is just 8-bits. They just encode what you give them. Valid, invalid, right or wrong. 8-bits of data is just 8-bits.

I can get ffmpeg to keep full range data when encoding to DNxHD/Prores. I simply use "-pix_fmt yuvj422p"; the normal, default 4:2:2 format is yuv422p. As you stated, Convert does not let you check a level option with DNxHD. It probably should/could.
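As a rough sketch of the command line this post describes, the helper below assembles an ffmpeg argument list that switches between `yuv422p` and `yuvj422p`. The file names, the helper name and the `120M` bitrate are placeholders: DNxHD only accepts specific bitrate, resolution and frame rate combinations, and whether a given ffmpeg build accepts `yuvj422p` for its DNxHD encoder is taken from the post rather than verified here.

```python
def dnxhd_command(src, dst, full_range=False, bitrate="120M"):
    """Build (but do not run) an ffmpeg DNxHD transcode command.

    full_range=True selects the yuvj422p pixel format mentioned in the post;
    otherwise the default video-range yuv422p is used.
    """
    pix_fmt = "yuvj422p" if full_range else "yuv422p"
    return [
        "ffmpeg", "-i", src,
        "-c:v", "dnxhd",
        "-b:v", bitrate,        # placeholder: must match a valid DNxHD profile
        "-pix_fmt", pix_fmt,
        "-c:a", "pcm_s16le",
        dst,
    ]


cmd = dnxhd_command("gopro_clip.mp4", "gopro_clip_dnxhd.mov", full_range=True)
print(" ".join(cmd))
```

Keeping the command as a list makes it easy to hand to `subprocess.run` once the bitrate has been matched to the source's resolution and frame rate.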

Even with full range data in DNxHD it still comes down to what does the app reading the data do. Nothing, or a video to computer levels conversion. Different apps might do different things.

Here is what I have done from Vegas. Vegas does not alter data on decode. Whatever the data is, they leave it up to you, the user, to properly interpret it. 8-bits is 8-bits. Vegas does the same thing when passing data to its various encoders. Does an encoder expect video levels for "proper playback"? It is up to the user to worry about what their source, timeline and encode requirements are.

I have encoded full range data from Vegas to DNxHD and played it back in Hitfilm and nothing was clipped. So Vegas did not alter my data and the Quicktime encoder did not alter my data. What was on my timeline ended up exactly in the file.

So this is different than ffmbc/ffmpeg. Which is "correct"? I'm not going to try and answer that, but I will simply state that this whole levels thing is just a mess. I got seriously sick of answering levels encoding problems in the Vegas creative cow forum. Always DSLR and GoPro users. I was one of those until I learned about this levels thing. I was adjusting the shadows in my GoPro footage to look proper on playback even though it looked fine in the Vegas viewer. After learning the levels mess and how to adjust data for encoding, the viewer and playback matched as expected.

Hitfilm tries to hide some of this levels mess from its users. Hitfilm data is always full range. Hitfilm only displays on a computer monitor window and this (full range) is what you want for proper display here. When Hitfilm encodes to MP4/AVC it does a computer to video levels conversion so playback will look the same as your Hitfilm viewer display. Encodes to image sequences remain full range which makes sense because they are destined for use in an editor. Only a final playback stream, post grading, cares about precise levels. Let the last editor in the chain worry about that.

In Hitfilm, MP4+AVC if not marked as full range has a video to computer levels conversion done on import/decode. It will clip/truncate levels on import. As stated, in HF4 U1 the full range flag is honored and the conversion is not performed. Quicktime AVC (mov files) also have a levels conversion performed for non full range flag settings. Curiously the values are not exactly the same as a Hitfilm native MP4 decode so I assume Hitfilm is having Quicktime do this levels computation. I am less sure about DNxHD via Quicktime but I do not see clipping like I do with suitable AVC sources. Never tested MPEG-2 (aka XDCAM and similar).

"Really to DNxHD, or with any codec, 8-bits is just 8-bits. They just encode what you give them. Valid, invalid, right or wrong. 8-bits of data is just 8-bits."

Partially true, or true to a point, but the reason we still have "legal" levels isn't just about legacy analogue signals, and what the data is does matter. Rec 709 and what we commonly refer to as YUV today in the digital world (to be pedantic, really YCbCr) were designed for a digital world, not an analogue one, so what use does limiting things to 16-235 have in a digital world? Accuracy and speed when converting RGB to YUV and YUV to RGB.

RGB24 to YUV is the easier of the two and if you limit the values to 16-235 you gain the ability to use an integer based approximation (speed) to do the conversion that's pretty accurate and the resulting YUV data can be converted back to RGB24 without many issues. RGB24 0-255 can be converted to YUV using an integer based approximation but you have to be willing to sacrifice some accuracy and the resulting YUV data cannot be converted back to RGB24 without running into problems.
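To give a feel for what an integer based approximation looks like, here is a sketch for the luma channel only. The integer weights 47, 157 and 16 are an assumption made for this example: they come from scaling the Rec 709 luma weights (0.2126, 0.7152, 0.0722) by 219/255 and by 256, then rounding. They are not taken from any particular codec.

```python
def luma_709_float(r, g, b):
    """Reference: video-range Rec 709 luma from 8-bit R'G'B', in floating point."""
    return 16 + 219 * (0.2126 * r + 0.7152 * g + 0.0722 * b) / 255


def luma_709_int(r, g, b):
    """Integer-only approximation: multiply, add a rounding term, shift by 8."""
    return ((47 * r + 157 * g + 16 * b + 128) >> 8) + 16


# Compare the two over a coarse grid of RGB values (every 5th code).
worst = max(
    abs(luma_709_int(r, g, b) - luma_709_float(r, g, b))
    for r in range(0, 256, 5)
    for g in range(0, 256, 5)
    for b in range(0, 256, 5)
)

print("black ->", luma_709_int(0, 0, 0))
print("white ->", luma_709_int(255, 255, 255))
print("worst error vs float, in code values:", worst)
```

With these weights black lands on 16 and white on 235, and the integer result stays within one code value of the floating point reference over the sampled grid.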

YUV to RGB24 is always problematic. YUV can represent a much wider range of information than can ever be duplicated in RGB24. Converting 709 YUV to RGB24 returns a nominal range of 16-235 but will on occasion return values below 16 and above 235. Full range conversion will return values that are totally outside of anything RGB24 can handle, with no way to recover or accurately remap. The only ways around it are to increase the bits per pixel and to resort to floating point for everything.
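The point that YUV can hold combinations RGB24 cannot represent can be illustrated with a small sketch. It assumes 8-bit video-range Rec 709 Y'CbCr (Y in 16-235, Cb/Cr in 16-240) and the standard Rec 709 luma weights; the function names and the sampling step are choices made for this example.

```python
def ycbcr709_to_rgb(y, cb, cr):
    """Video-range Rec 709 Y'CbCr -> unclamped R'G'B' floats on a 0-255 scale."""
    yn = (y - 16) / 219
    pb = (cb - 128) / 224
    pr = (cr - 128) / 224
    r = yn + 1.5748 * pr
    b = yn + 1.8556 * pb
    g = (yn - 0.2126 * r - 0.0722 * b) / 0.7152
    return tuple(255 * c for c in (r, g, b))


def in_gamut(rgb, eps=1e-6):
    """True if every channel fits in 0-255 (small tolerance for float error)."""
    return all(-eps <= c <= 255 + eps for c in rgb)


# Sample the legal Y'CbCr volume on a coarse grid and count valid RGB results.
total = valid = 0
for y in range(16, 236, 4):
    for cb in range(16, 241, 4):
        for cr in range(16, 241, 4):
            total += 1
            valid += in_gamut(ycbcr709_to_rgb(y, cb, cr))

fraction = valid / total
print("white (235,128,128) in gamut:", in_gamut(ycbcr709_to_rgb(235, 128, 128)))
print("(235,128,240) in gamut:", in_gamut(ycbcr709_to_rgb(235, 128, 240)))
print("fraction of sampled legal Y'CbCr triples that fit in RGB:", fraction)
```

Only a minority of the sampled "legal" triples convert to RGB values inside 0-255. Real footage mostly stays inside that region, but processing or lossy encoding can push values out of it, at which point an 8-bit RGB pipeline has to clip.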

Blimey, OK, I'll just insert these images here then to better illustrate what I was talking about.

The bottom line is: ProRes 4K is the only one that messes about with the colours the least, and if what @Aladdin4D says (in his PM to me) about what should be happening is correct: Convert does not do the truncate and stretch properly on DNxHD, it does compress and stretch, which would almost be semi-useful as a 'least bad' method of doing whatever it thinks it needs to do...if it didn't duplicate the first frame and chop off the last by mistake.

Apparently, it should be doing something like the third image (i.e. with gaps), i.e. 16 gets remapped to 0, and 236 gets remapped to 255 etc.

The pictures are showing exactly what Aladdin described and what I would expect to see, honestly.

My guess is that ProRes 4K doesn't screw up colors because 4K isn't a broadcast standard (yet), and when 4K becomes a broadcast standard it will throw out all this colorspace noodling that's tied to a long-dead analog format. With the other three codecs there's still faffing about with color transitions because of Rec 601 vs Rec 709.

@Triem23, well, @Aladdin4D explained this to me in a PM, so now I don't know what's what any more, because it's not doing that... but you say the images are correct.

"Full range with 256 distinct values 0-255 as input and you want the same full range output. The DNx encoder snapshots a profile but it will not accept 256 distinct values it only accepts 220 distinct values represented by 16-235 so anything below 16 or above 235 is tossed out, cut off, discarded, destroyed, truncated, it doesn't matter what word you use it's just gone never to return.

On output the remaining values get remapped to full range 0-255 using the snapshot profile in an attempt to maintain some accuracy. It's remapping, 16 does not remain 16 it's remapped and changed to 0 with another value up the chain being remapped and changed to 16. Likewise 235 doesn't remain 235 it gets remapped and changed to 255 with another value down the chain getting remapped and changed to 235."

One way to compare transcode quality that I always do is the following. In Hitfilm speak.

Put the original on a layer.

Put the transcode on a layer above the original. Set the blend mode to Difference.

Put a grade layer above these and add the Exposure effect and set it to something like 3-5.

On something that is "perfect" you will always get a pure black result even with the grade layer jacking the exposure. No lossy transcode is perfect.

With good transcodes, and the grade layer OFF, it will still appear black. The diffs should be that small. The exposure adjust of the grade layer boosts the differences to the point where you can see them and where your monitor displays them well.
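The same comparison can be sketched outside Hitfilm on raw pixel values. This is an assumption-laden toy: the pixel values are made up, and the Exposure effect is modelled as a plain gain of 2 to the power of the setting, which may not match Hitfilm's actual math.

```python
def difference(a, b):
    """Per-pixel absolute difference, like a Difference blend of two layers."""
    return [abs(x - y) for x, y in zip(a, b)]


def boost_exposure(pixels, stops):
    """Multiply by 2**stops and clamp to 8 bits so small diffs become visible."""
    gain = 2 ** stops
    return [min(255, round(p * gain)) for p in pixels]


# Made-up 8-bit samples from one channel of an original and its transcode.
original = [12, 64, 128, 200, 255]
transcode = [12, 65, 127, 202, 255]

diff = difference(original, transcode)
boosted = boost_exposure(diff, 4)

print("raw difference:", diff)        # [0, 1, 1, 2, 0]
print("boosted by 4 stops:", boosted)  # [0, 16, 16, 32, 0]
```

A difference of one or two code values is effectively black on screen; after the boost it becomes a visible dark grey, which is the point of the grade layer in the steps above.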

Convert is just a frontend to ffmbc/ffmpeg. ffmpeg is a fine tool but it certainly has its idiosyncrasies like any software. Some you cannot work around. One has to separate the codec from the encoding tool. Codecs really don't care about anything, but an encoding tool/system might mess with your data doing something that it thinks is the correct thing to do for a given situation. Different apps tend to behave differently with regards to out of range video.

Generally with a full range source like GoPro, and using ffmpeg, I have found the best transcode is to stay in AVC and set the encode output full range which AVC supports as does ffmpeg. This gets the ffmpeg front end out of the way.

With DNxHD output I see ffmpeg messing with the data some and Vegas not messing with it. Doing a video to computer adjustment on the ffmpeg output gets it close to the Vegas output. It cannot be exact since the double conversion causes some loss and I am magnifying the differences hugely with exposure.

I'm not sure Youtube is the best way to show what I see but what the heck. With the exposure grade layer it shows up better. This has a lot of clipped sky through the trees and clipping always shows more difference than normal video in my experience. Note: the fish example is also clipped a lot in the blue channel.

Since there was some talk about Prores in this thread I looked at an ffmpeg Prores encode difference comparison. The result is interesting. We can see some block tiling going on in the difference. It may be related to the clipped sky areas. Also there can likely be different encoder parameters for various areas of the image, which also may, or may not, be related to clipping.

Prores is always 10-bit (AFAIK) and DNxHD can be 8-bit or 10-bit and that might have something to do with it. My DNxHD test was only 8-bit.