I understand it is 10-bit Y'CbCr 422. You are right, I have not really had much success reading files into MATLAB that are not in RGB space.

The alternative would be 'perhaps' to read in a binary file and parse the data. I am not sure if I can use any such file format/extension/container available in ffmpeg to generate the binary file(s). I recall not having much success with *.yuv files which were sent to me by someone I was collaborating with a few months back. They ended up packing the data into TXT files and marking Y,U,V channels separately, solely because we were handicapped by MATLAB and were short on time for some analysis.

Alternatively, if I know what color space conversion is used (say based on BT.601 or BT.709), I can handle the RGB data as long as it is 10-bit. (I think the option would be to use a 16-bit container for the 10 bits, as 10-bit support is not offered in ffmpeg for good reasons.) I could also use some help here (which commands to use) to get a handle on the color space conversion and to generate an RGB image sequence. Then, hopefully, I can combine these images back into a v210 encoded video sequence.

ffmpeg doesn't have a native 10bit RGB implementation. Its 10bit RGB implementation (gbrp10le or gbrp10be) is actually stored as YUV, and uses a lossless RGB<=>YUV transform function. The problem is that MATLAB probably isn't going to understand that, unless it has ffmpeg to translate or uses the same lossless transform function.

ffmpeg does have a native 16bit RGB implementation, and more common image formats like TIFF, PNG, SGI support it. But then you introduce other variables in the 10bit 422 => 16bit RGB conversion: not only the matrix used, but also the chroma upsampling (and downsampling, if going back to YUV 422) algorithm used. In theory, only "Nearest Neighbor" or point resizing is reversible & lossless. I guess if you're "mainly" interested in the Y' component, then the chroma issues are less relevant.

(dpx/cineon image sequence can hold 10bit YUV 422 as one of the variants, but ffmpeg doesn't offer it, only the RGB implementation )

Yes, there are losses from round tripping it. Binary compares and amplified differences don't match. There is always some rounding loss from going Y'CbCr<=>RGB, as they are non-overlapping color models.

If you still want to pursue this, the syntax is -pix_fmt bgr48le for 16bit RGB. You can specify a scaling algorithm by using -sws_flags, e.g. nearest neighbor would use "neighbor". I think the default is bicubic.

The "%04d" is the number of placeholder digits in the image sequence. So output%04d.tiff would be output0000.tiff, output0001.tiff, etc., whereas output%05d.tiff would give output00000.tiff, output00001.tiff, etc.
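The placeholder follows printf-style formatting, so the expanded names can be previewed outside ffmpeg. A small Python sketch, using the same example names as above; the helper name `sequence_names` is made up for illustration, and which number a real sequence starts at depends on ffmpeg's start number setting:

```python
# Preview the file names that a printf-style image sequence pattern expands to.
# `sequence_names` is a hypothetical helper, not part of ffmpeg.
def sequence_names(pattern, count, start=0):
    """Expand a pattern such as 'output%04d.tiff' for `count` consecutive frames."""
    return [pattern % i for i in range(start, start + count)]


print(sequence_names("output%04d.tiff", 3))
print(sequence_names("output%05d.tiff", 2))
```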

You cannot force a matrix in ffmpeg, it uses Rec601 for almost everything (or at least it used to), unless the source file has a Bt709 flag, the output is v210 (it automatically uses 709 for v210 output from an RGB source, even SD dimensions) or unless you use the colormatrix filter (but that filter changes the values in YUV before the conversion to RGB, and is an 8bit filter)

I did try the command you mentioned, and it returned me 16-bit per channel deep image sequence (auto selected rgb48le instead of bgr48le). Loaded into MATLAB to plot histograms and check ranges to just verify all the 16 bits are being used. I was just curious to see how this is true 16-bit while using all of them.

I think I understand from your description that video to image sequence used Rec601 by default and image sequence to video (output v210) uses Rec709. Correct me if I am wrong.

Let me briefly describe what may be the small problem/missing piece in my workflow and the potential solution.

1. I will convert my video sequence to RGB (48bit), and manipulate in MATLAB (add a frame index as text for alignment later).
2. Convert these sequences (images) back to v210 encoded AVI sequence.
3. This AVI sequence is used to test performance of an encoder/decoder. I can also capture decoder output in v210 encoded AVI on a disk.
4. To compare my reference (in step 3) to decoder output, I only need to extract Y' channel from both these sequences. These two sets of Y' data as long as readable in MATLAB would suffice for my workflow.

How I arrive at end of Step 2 is perhaps not super important. Chroma interpolation, color space conversion (back and forth) would be 'ok' theoretically.

As a side bar, how can I convert the image sequences to a v210 encoded sequence by specifying frame rate etc.

I would really appreciate any pointers. Thanks again for walking me through some of the commands and getting me started.

I did try the command you mentioned, and it returned me 16-bit per channel deep image sequence (auto selected rgb48le instead of bgr48le). Loaded into MATLAB to plot histograms and check ranges to just verify all the 16 bits are being used. I was just curious to see how this is true 16-bit while using all of them.

So what did matlab show ?

I think I understand from your description that video to image sequence used Rec601 by default and image sequence to video (output v210) uses Rec709. Correct me if I am wrong.

The thing is ffmpeg is always changing, there are commits almost daily. So what I say now, might not be true tomorrow.

ffmpeg used to use Rec601 by default for all YUV<=>RGB conversion (I don't know if that's still the case), unless the output was v210, then it would automatically switch to 709. That would potentially screw up your workflow, because if it used 601 converting to RGB, then 709 back to v210.....
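To see why a 601/709 mismatch matters, here is a rough Python sketch of the two matrices on normalized values (Y' in 0..1, Cb/Cr in -0.5..0.5). It uses the published luma coefficients for BT.601 and BT.709 and ignores range scaling and rounding, so it only illustrates the size of the error; it is not what ffmpeg does internally.

```python
# Luma coefficients (Kr, Kb) from the BT.601 and BT.709 recommendations.
COEFFS = {"bt601": (0.299, 0.114), "bt709": (0.2126, 0.0722)}


def ycbcr_to_rgb(y, cb, cr, matrix):
    """Normalized Y'CbCr -> R'G'B' with the chosen matrix (no range scaling)."""
    kr, kb = COEFFS[matrix]
    kg = 1.0 - kr - kb
    r = y + 2.0 * (1.0 - kr) * cr
    b = y + 2.0 * (1.0 - kb) * cb
    g = (y - kr * r - kb * b) / kg
    return r, g, b


def rgb_to_ycbcr(r, g, b, matrix):
    """Normalized R'G'B' -> Y'CbCr with the chosen matrix (no range scaling)."""
    kr, kb = COEFFS[matrix]
    kg = 1.0 - kr - kb
    y = kr * r + kg * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))
    cr = (r - y) / (2.0 * (1.0 - kr))
    return y, cb, cr


# A coloured pixel: decode with 601, re-encode with 601 and with 709.
pixel = (0.5, 0.2, -0.1)
rgb = ycbcr_to_rgb(*pixel, matrix="bt601")
print("same matrix     :", rgb_to_ycbcr(*rgb, matrix="bt601"))
print("mismatched (709):", rgb_to_ycbcr(*rgb, matrix="bt709"))
```

With matching matrices the values come back unchanged (up to floating point). With mismatched ones Y' shifts for any pixel that has non-zero chroma, while pure greys (Cb = Cr = 0) are unaffected.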

And it never used to read source flags, but it does now . You have to verify / test the results to be sure. I might run some quick tests later if I have time

Some of the new filters might give you more control. For example -vf scale now has 'in_color_matrix' and 'out_color_matrix' switches. But you have to be careful, because some of the filters work in 8bit: https://www.ffmpeg.org/ffmpeg-filters.html#scale-1

1. I will convert my video sequence to RGB (48bit), and manipulate in MATLAB (add a frame index as text for alignment later).

You probably want to sort out for sure which matrix is used for the conversion. If I have time I'll look into it later

As a side bar, how can I convert the image sequences to a v210 encoded sequence by specifying frame rate etc.

Use -r for FPS e.g. -r 30

Otherwise use the same printf-style syntax for the image sequence as input:

It's not necessary to input the -pix_fmt, because ffmpeg will automatically use " -pix_fmt yuv422p10le" when "v210" is specified

eg.

Code:

ffmpeg -i input%04d.tiff -r 30 -c:v v210 -an output.avi

And you probably want to check/confirm what matrix is being used for the trip back to v210

4. To compare my reference (in step 3) to decoder output, I only need to extract Y' channel from both these sequences. These two sets of Y' data as long as readable in MATLAB would suffice for my workflow.

Yes , confirmed, 601 is still used by default for Y'CbCr => RGB conversion, AND back RGB => Y'CbCr. Even from RGB to v210 (I was wrong above about v210 automatically using 709, or something has changed)

Both options are imperfect, there is more chroma aliasing when you view test charts, probably not noticeable on "normal" content

And RGB to YUV should be the same for -vf scale, but -vf colormatrix would probably be bt601:bt709. I didn't test these, because if you use the default 601 for both trips it should be OK, and it looks cleaner (less chroma aliasing)

Thank you! I am only using ffmpeg on the mac. AVI is still the desired format.

Observing something odd: the original sequence is 10 sec and 2.59 GB. The reconstructed sequence is 19 or 20 sec and also 2.59 GB. The reconstructed sequence also does not 'look' right (seems like judder) when played back in VLC or QT Pro. Windows Explorer shows the original sequence at 2211857 kbps and the reconstructed one at 200 kbps (a little unbelievable, as the quality would look really bad at this rate).

To say the least I am confused :-S

EDIT: The original sequence is also at 50fps. I did not change the frame rate which could possibly introduce judder.
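A likely explanation for the doubled running time (an assumption, not something confirmed in this thread): when ffmpeg reads an image sequence it assumes 25 fps unless the input rate is set with -framerate placed before -i, so 500 frames shot at 50 fps play back in 20 seconds. The Python sketch below only does the arithmetic and builds an argument list; the helper names and file names are illustrative.

```python
# Assumption: ffmpeg's image sequence reader defaults to 25 fps unless
# -framerate is given before -i.
DEFAULT_IMAGE_SEQUENCE_FPS = 25


def playback_seconds(frame_count, input_fps=DEFAULT_IMAGE_SEQUENCE_FPS):
    """Running time of the rebuilt video for a given input frame rate."""
    return frame_count / input_fps


def build_v210_command(pattern, fps, output):
    """ffmpeg argument list; -framerate precedes -i so it applies to the input."""
    return ["ffmpeg", "-framerate", str(fps), "-i", pattern,
            "-c:v", "v210", "-an", output]


frames = 10 * 50  # 10 seconds of 50 fps material
print(playback_seconds(frames))      # read at the assumed 25 fps default
print(playback_seconds(frames, 50))  # read at the original 50 fps
print(" ".join(build_v210_command("input%04d.tiff", 50, "output.avi")))
```

Since v210 frames are a fixed size, the same 500 frames give the same file size either way, which would match the observation that only the duration changed.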

I wouldn't pay any attention to that 200kb/s. v210 is uncompressed, so it is always the same bitrate (there might be a few bytes difference between different headers, metadata or container differences), but the coded frame size will always be the same for each frame. It's analogous to how a BMP will always be the same size for a given frame dimension and bit depth

Filesize = Bitrate * Running Time
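As a worked example of that formula, here is a Python sketch of the v210 arithmetic. v210 packs 6 pixels into 16 bytes and pads each row to a multiple of 128 bytes (48 pixels). The 1920x1080 frame size is an assumption, since the resolution is not stated in this thread; 50 fps and 10 seconds come from the posts above.

```python
def v210_frame_bytes(width, height):
    """Bytes per v210 frame: rows are padded to a multiple of 48 pixels (128 bytes)."""
    stride = ((width + 47) // 48) * 128
    return stride * height


def bitrate_kbps(width, height, fps):
    """Video payload bitrate in kilobits per second (1 kbit = 1000 bits)."""
    return v210_frame_bytes(width, height) * 8 * fps / 1000


def file_size_bytes(width, height, fps, seconds):
    """Filesize = Bitrate * Running Time, expressed in bytes."""
    return v210_frame_bytes(width, height) * fps * seconds


# Assumed 1920x1080 source at 50 fps, 10 seconds long.
print(v210_frame_bytes(1920, 1080))
print(bitrate_kbps(1920, 1080, 50))
print(file_size_bytes(1920, 1080, 50, 10))
```

Under that assumption the payload bitrate works out to 2211840 kbps, very close to the 2211857 kbps reported for the original file, with the small remainder plausibly being container overhead.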

Thank you! That helps.

I need to figure out now how to extract Y' - something I understand from the files without doing color space conversion and chroma interpolation. I believe we are still mixing some chroma information in generating the Green Channel in case of RGB48. Green Channel could otherwise be treated as Y' but this conversion is a little more intricate. I wish there was a simpler way..

Yes, as soon as you go into RGB, there is some loss, both due to rounding and illegal (negative) values from non overlapping colorspace. You would have to avoid RGB completely

ffmpeg doesn't have a 10bit greyscale pix_fmt, only 8bit and 16bit. If your source was 8 or 16bit I think you could use -pix_fmt gray for 8bit or gray16be / gray16le for 16bit .

Or you can read up on the lossless GBR matrix , the lossless YUV<=>RGB transform equation, maybe program it into matlab . RCT (reversible color transform) is used in FFV1 for example
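Another way to avoid RGB completely is to parse the v210 words directly, since the Y' samples are stored untouched. The Python sketch below assumes the raw v210 bytes of one row have already been pulled out of the container (that step is not shown), and uses the v210 layout: each 32-bit little-endian word carries three 10-bit samples, in the repeating order Cb Y Cr, Y Cb Y, Cr Y Cb, Y Cr Y. The function names are illustrative.

```python
import struct


def v210_row_stride(width):
    """Bytes per v210 row: padded to a multiple of 48 pixels (128 bytes)."""
    return ((width + 47) // 48) * 128


def unpack_v210_luma_row(row_bytes, width):
    """Return the `width` 10-bit Y' samples stored in one v210 row."""
    n_words = len(row_bytes) // 4
    words = struct.unpack("<%dI" % n_words, row_bytes[:n_words * 4])
    samples = []
    for w in words:
        # Three 10-bit samples per word, lowest bits first; top 2 bits unused.
        samples.append(w & 0x3FF)
        samples.append((w >> 10) & 0x3FF)
        samples.append((w >> 20) & 0x3FF)
    # Sample order is Cb Y Cr Y Cb Y Cr Y Cb Y Cr Y ..., so every
    # odd-indexed sample is luma.
    return samples[1::2][:width]


# Six pixels with known Y' values and mid-grey chroma (512).
cb = cr = 512
y = [64, 100, 200, 300, 400, 940]
row = struct.pack(
    "<4I",
    cb | (y[0] << 10) | (cr << 20),
    y[1] | (cb << 10) | (y[2] << 20),
    cr | (y[3] << 10) | (cb << 20),
    y[4] | (cr << 10) | (y[5] << 20),
)
print(unpack_v210_luma_row(row, 6))
```

The same masks and shifts translate to MATLAB's bitand and bitshift on uint32 data, so no color space conversion or chroma interpolation is involved.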

Thanks a lot for the help! Sorry I went on a break and now back to tackling and moving forward.

I am using the 'filter_complex' and 'extractplanes' as follows though I get y.avi to be encoded again in some compressed format. The quality is quite bad and for a 2.7 GB sequence, the y.avi is only 6.8 MB.

I wanted to try out this filter and see what I get in the luma and chroma streams.

There is no 10bit format for greyscale in ffmpeg . The chroma values are greyed out with this

Code:

ffmpeg -i input.mov -vf extractplanes=y -c:v v210 -an output.avi

But the filesize is the same, because it's still v210 . Even though the CbCr are set to 1 value, they are still present. That will affect standard conversion to RGB format using normal methods

If you export to image sequence, you should be able to use -vf extractplanes with -pix_fmt gray16be or gray16le (difference between be and le is big vs. little endian) . Not sure what image format supports that directly, but it looks like TIFF can

Not sure what happens if you use extractplanes with -pix_fmt rgb48le. In theory, it should only be giving you the Y channel expressed as 16bit RGB

Gives a warning : Full chroma interpolation for destination format 'rgb48le' not yet implemented
Results in a 3 channel 48bit TIFF which has exact same R, G, and B channels. If I do a diff pairwise for the matrices in the 3 channels, I get zeros.

Results in a grayscale 16-bit image. Gave a message that gray16le was used by default instead of gray16be.
Resulting tiff image is single channel, almost 1/3 the size as in previous case.

I imagine here, the luma channel only is extracted and no color space conversion is happening?

I also realized that in both of the above cases the signal range was [4096, 60032] for one of my sequences. If I divide this by 2^6, I get 64 for the lowest gray level, which seems very reasonable for the standard black level of a 10-bit video. I therefore concluded that the data bits were packed in the 10 MSBs of the 16-bit word. Correct me if this is incorrect.
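That check can be written down as a short Python sketch: if the 10-bit data really sits in the 10 MSBs, the lower 6 bits of every 16-bit sample are zero and a right shift by 6 recovers the 10-bit code. The function name is made up for illustration, and the sample values are the range limits quoted above.

```python
def unpack_msb_aligned(samples16, bits=10):
    """Recover `bits`-bit codes from 16-bit words that hold them in the MSBs."""
    shift = 16 - bits
    low_mask = (1 << shift) - 1
    if any(s & low_mask for s in samples16):
        # Non-zero low bits would mean the data was rescaled or dithered,
        # not simply shifted left.
        raise ValueError("low bits are not all zero; not a plain left shift")
    return [s >> shift for s in samples16]


print(unpack_msb_aligned([4096, 60032]))
```

Both limits are exact multiples of 64, which is consistent with the MSB-aligned reading, though two values alone do not prove it for the whole image.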

I think Case 2 fits my requirement quite well.

Questions:

I am trying to understand if I am doing any color space conversions, when I am going gray16le from v210 avi file?

I don't know what is ACTUALLY going on "behind the scenes" in ffmpeg, there might be colorspace conversions , rounding losses going on with any of these. You would have to look at the code or ask someone that knows the actual code

In 10bit YUV, 64-940 is the "legal range" out of 0-1023 (analogous to 16-235 out of 0-255 in 8bit)

I don't know how valid this is, because there is no native 10bit Y' greyscale format in ffmpeg. If you did some tests on known test graphics/images (known values, in Y' for v210), you might be able to figure out how valid these tests are

Yeah, it appears to be happy to read the gray16le tiff. I am using the OS X version of MATLAB 2014a. I can also display the image within MATLAB without any data parsing or bit rearrangement.

The images are rendered also in Finder and in Preview app.

Out of curiosity I did the following:

Compare the channels from the RGB48 (they are same w.r.t. each other) to the single channel tiff file in gray16le. The difference comes to zero, which means they are both producing equivalent results and the color space conversions (whatever they are or not) are the same. Of course we are limited by 1 count of precision in integer numbers. Perhaps the luma channel is replicated since we instructed ffmpeg to extract only the y channel and it populates it to all 3 R,G,B channels

I am not sure if I understand myself the differences in the two approaches (case 1 and case 2 as you had suggested) to 'guess' if there is some color space conversion or not.

Yes, thank you for confirming that. I was also alluding to the 64-940 valid range for the 10-bits similar to the 16-235 luma (16-240 for chroma) for 8-bit.

I did a quick test round tripping it. MD5's don't compare (or any method of comparison such as amplified differences), so there are losses incurred somewhere. It might be from dithering during bit depth conversions

Neither original, method 1 , or method 2, match with each other

EDIT: it looks like a levels shift

In theory, you should just be able to copy over the values if you had a 10bit RGB image format. e.g 0-1023 in Y' is 0-1023 in 10bit R, G, B. That is "grey" when R=G=B. You should be able to copy it back over as well without any loss. eg. a Y' value =100, would be R=B=G=100 .

Yes ffmpeg is scaling the video . I used a "full" range v210 test video 0-1023 gradient , and result in the 1st trip to TIFF was scaled to "standard" range. I guess if your source v210 is all standard range it might be ok. Or you might be able to adjust a switch in -vf scale or swscale .
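To illustrate why that kind of scaling cannot be undone exactly, here is a Python sketch of a textbook full-range to standard-range mapping for 10-bit luma (0-1023 squeezed into 64-940). This is an assumed formula for illustration only; it is not taken from ffmpeg's code and the actual rounding there may differ.

```python
def full_to_limited(y_full):
    """Map a full-range 10-bit code (0..1023) into standard range (64..940)."""
    return 64 + round(y_full * 876 / 1023)


def limited_to_full(y_limited):
    """Inverse mapping, clipped to the valid 10-bit range."""
    value = round((y_limited - 64) * 1023 / 876)
    return min(1023, max(0, value))


codes = range(1024)
distinct = len({full_to_limited(v) for v in codes})
changed = sum(1 for v in codes if limited_to_full(full_to_limited(v)) != v)
print(full_to_limited(0), full_to_limited(1023))
print(distinct, "distinct output codes for 1024 input codes")
print(changed, "input codes do not survive the round trip")
```

Because 1024 input codes are squeezed into 877 output codes, some neighbouring values merge, so a bit-identical round trip is impossible once this scaling has been applied.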

Level shift as an offset or scaling?

I am sorry, I don't understand the round trip that you are referring to. Is that (v210 <--> RGB48le)? Or you mean using the test sequence with all gray levels (0-1023) (which do end up in 64-940 range post conversion), through method 1 and method 2 give different results?

I tested v210 => TIFF => v210 with both methods. Ideally the 2nd v210 should be bit identical to 1st v210 if it were a lossless round trip

Method 1 didn't equal Method 2 either , ie each gave different results. (both were clamped, but still not equal to each other)

EDIT:
Yes, it's definitely clamping the levels on the roundtrip , even with "standard" range input v210 , comparing greyscale only on both, ignoring chroma completely. It's visible to naked eye, just do the test with a "normal" video and view the Y' channel

I couldn't find any switches that work properly to set in range/ out range (vf scale is supposed to have in_range, out_range switches, doesn't work properly here maybe because it's not 8bit YUV)

Thank you for checking all of that for me, it is very valuable information.

I am attempting to better understand the roundtrip in method 1 and method 2. From the extracted luma channel, are you only reconstructing the grayscale v210 video? Video 1 and Video 2 (how do we go to v210 with only grayscale - luma channel tiffs?) do not match although they are clamped at the same video levels. Am I correct? I am wondering if there are some colorspace conversions which are different in the two cases. Thank you again!

I am attempting to better understand the roundtrip in method 1 and method 2. From the extracted luma channel, are you only reconstructing the grayscale v210 video

It's clamped during the first step, with either method to TIFF. The "roundtrip" is going from TIFF back to v210. The CbCr channels are a single value as expected, not absent. v210 by definition will still have CbCr because it's not grey10

So I thought doing the round trip back to v210 (from the greyscale TIFF) might unclamp or reverse it, and hopefully give you the original Y' channel, but that's not the case.

Video 1 and Video 2 (how do we go to v210 with only grayscale - luma channel tiffs?) do not match although they are clamped at the same video levels. Am I correct? I am wondering if there are some colorspace conversions which are different in the two cases. Thank you again!

Yes, video1 and video2 have the same approximate levels, and both are obviously clamped. ie. They look the same to each other with the naked eye, and are obviously different than the original when looking at the Y' channel; but video1 & video2 have bit differences that can be detected by various methods (difference testing, ssim, psnr, md5 etc...)

The clamp makes those two methods useless for analysis or any manipulations. I'm not sure what's causing it, maybe a bug in the extractplanes, because a "normal" conversion to 16bit RGB TIFF doesn't exhibit those issues
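For anyone repeating these comparisons on extracted Y' samples, here is a minimal Python sketch of the checks mentioned above (MD5, maximum difference, PSNR). It assumes the luma of each video is already available as a flat list of 10-bit integers; the function names are illustrative and this is not the tooling used for the tests in this thread.

```python
import hashlib
import math
import struct


def md5_of_samples(samples):
    """MD5 over the samples packed as little-endian 16-bit words."""
    packed = struct.pack("<%dH" % len(samples), *samples)
    return hashlib.md5(packed).hexdigest()


def max_abs_diff(ref, test):
    """Largest per-sample difference; 0 means the planes are identical."""
    return max(abs(a - b) for a, b in zip(ref, test))


def psnr(ref, test, peak=1023):
    """PSNR in dB for 10-bit data; infinite when the planes are identical."""
    if len(ref) != len(test):
        raise ValueError("planes must have the same number of samples")
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


original = [64, 200, 512, 940]
roundtrip = [64, 201, 512, 936]
print(md5_of_samples(original) == md5_of_samples(roundtrip))
print(max_abs_diff(original, roundtrip))
print(psnr(original, roundtrip))
```

An MD5 match proves a lossless round trip; when it fails, the maximum difference and PSNR give a sense of whether the damage is a one-count rounding error or a visible levels change like the clamp described above.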