Opened 18 months ago
Closed 18 months ago
#1598 closed defect (fixed)
Y4M reader not handling chroma-format `420mpeg2` correctly
Reported by: | mindfreeze | Owned by: |
---|---|---|---
Priority: | minor | Milestone: | VTM-21.0
Component: | VTM | Version: | VTM-20.0
Keywords: | | Cc: | ksuehring, XiangLi, fbossen, jvet@…
Description
Hi,
Background
We have been trying to integrate VVC into a popular compression testing framework, AreWeCompressedYet (AWCY)[1]. We typically store the input videos as Y4M files for ease of file handling and maintainability. We have initial support for VVC-VTM in an experimental YUV pipeline, where the videos are converted from y4m to yuv on the fly, but this becomes problematic when we want to scale the jobs. Native y4m support is therefore useful, as the other encoders have been built and tested with y4m for many years.
Y4M is a not-so-well-defined format (in fact there is no official spec per se), so different authors/encoders have slightly different implementations, which makes implementing a reader a bit tricky.
The page people tend to refer to as a loose spec is the multimedia wiki[2], which is obviously not fully defined either, so discrepancies remain.
VVC-VTM has initial support for encoding and decoding Y4M files, which works well in most cases. I have been testing and comparing the YUV encoding pipeline with the Y4M encoding pipeline for various videos, typically those available in xiph's media collection[10] and other public resources.
In this testing, there was one edge case where VVC-VTM does not handle c420mpeg2 Y4M files correctly.
Not handling in the sense that the bitstream produced in the YUV pipeline does not match the one produced from the same file in the Y4M pipeline. If I hack the chroma-format tag to 420 or 420jpeg, it works and gives bit-identical output.
What is 420mpeg2?
I did a bit of digging into this, and the tag seems to have been around since ~2006 in mjpegtools[3], as well as in other public implementations of y4m handlers[4,5,6,7,8].
From the yuv4mpeg documentation: 420mpeg2 - 4:2:0 MPEG-2 (horiz. cositing).
Many samples on the internet use 420mpeg2.
Proposed solution
Typically, y4m readers that support many y4m inputs, say HDRTools, FFmpeg, or other libraries, handle this tag in the input parsing the same way as the other 420 input formats[4,6,7,8].
So one quick fix would be to handle it the same way as 420jpeg in VVC-VTM[9], mimicking the other public implementations.
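As a rough illustration of the idea (this is not the actual VTM code; the type and function names below are invented), the header parser could map the 420mpeg2 tag onto the same 4:2:0 path it already uses for 420jpeg:

#include <string>

// Hypothetical sketch, not the actual VTM code: map the Y4M "C" (chroma) header
// tag to an internal chroma format, treating 420mpeg2 like the other 4:2:0 tags
// for the purpose of reading the frame data (the sample layout is identical;
// only the nominal chroma siting differs).
enum class ChromaFormat { C400, C420, C422, C444, Unknown };

static ChromaFormat parseY4mChromaTag(const std::string &tag)
{
  if (tag == "420" || tag == "420jpeg" || tag == "420mpeg2")
  {
    return ChromaFormat::C420;  // same plane sizes and read path for all 4:2:0 variants
  }
  if (tag == "422")  return ChromaFormat::C422;
  if (tag == "444")  return ChromaFormat::C444;
  if (tag == "mono") return ChromaFormat::C400;
  return ChromaFormat::Unknown;
}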
Samples
https://media.xiph.org/video/aomctc/test_set/b1_syn/EuroTruckSimulator2_1920x1080p60_v2.y4m
https://media.xiph.org/video/aomctc/test_set/b1_syn/STARCRAFT_1080p60.y4m
https://media.xiph.org/video/aomctc/test_set/a4_360p/BlueSky_360p25_v2.y4m
Sample command lines
EncoderAppStatic -i $INPUT.Y4M -c encoder_randomaccess_vtm.cfg --ReconFile=$INPUT-recon.yuv --QP=$X -b $INPUT-bitstream.bin
EncoderAppStatic -i $INPUT.YUV -c encoder_randomaccess_vtm.cfg --ReconFile=$INPUT-recon.yuv --SourceWidth=$WIDTH --SourceHeight=$HEIGHT --FrameRate=$FPS --InputBitDepth=$DEPTH --QP=$X -b $INPUT-bitstream.bin
Steps to reproduce
Prerequisite: build the latest VVC-VTM.
1. Encode a video with the Y4M input.
2. Convert the y4m to YUV.
3. Encode the same video in the YUV pipeline.
4. Cross-check the MD5 of the bitstreams; a mismatch is observed.
5. Modify the Y4M header to be non-420mpeg2[A.1].
6. Repeat steps 1..4, and you will get a bit-identical bitstream as expected.
[1]: https://github.com/xiph/awcy
[2]: https://wiki.multimedia.cx/index.php/YUV4MPEG2
[3]: https://mjpeg.sourceforge.io/
[4]: https://gitlab.com/standards/HDRTools/-/blob/master/common/src/InputY4M.cpp#L209
[5]: https://github.com/xiph/daala/blob/master/tools/y4m_input.c#L180
[6]: https://github.com/image-rs/y4m/blob/58375d69120a33e2a21320e17449e84e4de9949d/src/lib.rs#L249
[7]: https://gitlab.com/AOMediaCodec/avm/-/blob/main/common/y4minput.c#L909
[8]: https://source.ffmpeg.org/?p=ffmpeg.git;a=blob;f=libavformat/yuv4mpegenc.c;h=2fa5ee2714ddba9f15c998a9295f153b26a21985;hb=HEAD#l104
[9]: https://vcgit.hhi.fraunhofer.de/jvet/VVCSoftware_VTM/-/blob/master/source/Lib/Utilities/VideoIOYuv.cpp#L233
[10]: https://media.xiph.org/
[A.1]: Steps to modify a header
echo "YUV4MPEG2 W640 H360 F25:1 Ip C420jpeg" > BlueSky_360p25_v3.y4m
tail -n+2 BlueSky_360p25_v2.y4m >> BlueSky_360p25_v3.y4m
Please do let me know if you have any questions,
Best,
V
Change history (15)
comment:1 Changed 18 months ago by fbossen
Thank you for the detailed report. Could you clarify how you convert y4m to yuv (step 2 to reproduce the issue)?
comment:2 follow-up: ↓ 5 Changed 18 months ago by mindfreeze
Hi Frank,
Thanks for the reply,
I tend to rely on daalatools' y4m2yuv as it is part of our AWCY testing pipeline. We can totally use ffmpeg too (it works well).
If you do not have daalatools there, here are the steps to build it:
git clone https://github.com/xiph/daala.git daalatool
cd daalatool
./autogen.sh && ./configure
make && make tools
daalatool/tools/y4m2yuv -i $input.y4m -o $output.yuv
If you have FFmpeg, then it is also straightforward,
ffmpeg -i $input.y4m $output.yuv
I believe HDRTools should also be able to do this, but I haven't tested it.
Also, here is a sample script which I used for cross-checking the YUV and Y4M pipelines for any input video. If you set the paths to the binaries correctly, it will encode 2 frames in both y4m and yuv, compute libvmaf for objective metrics, and then cross-check using cmp and MD5: https://code.videolan.org/-/snippets/1775
It is a bit janky but might serve as a base if you would like to do more testing :)
Best,
V
comment:3 Changed 18 months ago by XiangLi
Thanks for the report. Could you clarify with which sequences you observed the mismatch (the following three sequences)? Could you also provide the encoding command line for the mismatching case?
https://media.xiph.org/video/aomctc/test_set/b1_syn/EuroTruckSimulator2_1920x1080p60_v2.y4m
https://media.xiph.org/video/aomctc/test_set/b1_syn/STARCRAFT_1080p60.y4m
https://media.xiph.org/video/aomctc/test_set/a4_360p/BlueSky_360p25_v2.y4m
comment:4 Changed 18 months ago by mindfreeze
I tested with a bunch of videos, probably close to 144 in Y4M, and out of those, 17 encoded but had a mismatch against the YUV pipeline. Digging further, all of them had c420mpeg2 as the Y4M chroma-format tag. If we change the tag, they give output bit-identical to the YUV pipeline.
Yes, those were among the failing ones; I believe these three samples should be enough to test. So I think any video with this tag is currently handled incorrectly by VTM.
I believe the sample command-line section in the ticket is sufficient as the encoding command; I gave both the Y4M and YUV commands. Additionally, the code snippet above links a bash script to test the YUV/Y4M pipelines.
Please do let me know if that is sufficient or not.
comment:5 in reply to: ↑ 2 Changed 18 months ago by fbossen
Replying to mindfreeze:
Hi Frank,
Thanks for the reply,
I tend to rely on daalatools' y4m2yuv as it is part of our AWCY testing pipeline. We can totally use ffmpeg too (it works well).
If you do not have daalatools there, here are the steps to build it:
git clone https://github.com/xiph/daala.git daalatool
cd daalatool
./autogen.sh && ./configure
make && make tools
daalatool/tools/y4m2yuv -i $input.y4m -o $output.yuv
If you have FFmpeg, then it is also straightforward,
ffmpeg -i $input.y4m $output.yuv
I believe HDRTools should also be able to do this, but I haven't tested it.
Also, here is a sample script which I used for cross-checking the YUV and Y4M pipelines for any input video. If you set the paths to the binaries correctly, it will encode 2 frames in both y4m and yuv, compute libvmaf for objective metrics, and then cross-check using cmp and MD5: https://code.videolan.org/-/snippets/1775
It is a bit janky but might serve as a base if you would like to do more testing :)
Best,
V
Thank you for providing detailed instructions for the conversion. It seems that daalatools and ffmpeg do not produce the same YUV output for y4m files with the 420mpeg2 chroma format. I suspect daalatools might be filtering the chroma samples such that the YUV output always conforms to the same chroma format (including chroma sample location).
What would make the most sense to me here is to update the VTM software to consider the chroma sample location signalled in the y4m file header (which it currently ignores) and signal it in the VUI. Also, when writing an output y4m file, the exact chroma format would be derived from the chroma sample location. In other words, if you feed a "420mpeg2" file to the encoder, the decoder would also output a "420mpeg2" file. AFAICT this would be similar to what the avm code does. However, your script using intermediate YUV files would need to be modified to produce the same result.
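As a rough sketch of the mapping I have in mind (illustrative only; these helper functions are invented for this example and are not actual VTM or avm code), 420mpeg2 would correspond to chroma sample location type 0 and 420jpeg to type 1:

#include <string>

// Illustrative only; these helpers are not actual VTM or avm code.
// VUI chroma sample location type: 0 = chroma horizontally co-sited with the
// left luma sample (MPEG-2 style), 1 = chroma centred between luma samples
// (JPEG style).
static int chromaTagToVuiLocType(const std::string &tag)
{
  if (tag == "420mpeg2") return 0;  // horizontally co-sited
  if (tag == "420jpeg")  return 1;  // centred
  return -1;                        // unknown tag: leave the location unsignalled
}

// Reverse mapping, so the decoder can write back the original tag
// when producing an output y4m file.
static std::string vuiLocTypeToY4mTag(int locType)
{
  switch (locType)
  {
  case 0:  return "420mpeg2";
  case 1:  return "420jpeg";
  default: return "420";            // generic 4:2:0 tag when the location is unknown
  }
}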
comment:6 follow-up: ↓ 7 Changed 18 months ago by mindfreeze
I suspect daalatools might be filtering the chroma samples such that the YUV output always conforms to the same chroma format (including chroma sample location).
Just to be clear, are you suggesting that daalatools might currently be ignoring the chroma sampling location when converting to YUV? That would be a different problem altogether.
On a quick check of https://github.com/xiph/daala/blob/master/tools/y4m2yuv.c, I do not see anything obvious that handles the chroma format in daalatools. I will double-check with the daala folks later.
FFmpeg does signal the chroma sampling location in y4m files, I believe: https://source.ffmpeg.org/?p=ffmpeg.git;a=blob;f=libavformat/yuv4mpegenc.c;h=2fa5ee2714ddba9f15c998a9295f153b26a21985;hb=HEAD#l104
So does HDRTools: https://gitlab.com/standards/HDRTools/-/blob/master/common/src/InputY4M.cpp#L209
Also, when writing an output y4m file, the exact chroma format would be derived from the chroma sample location. In other words, if you feed a "420mpeg2" file to the encoder, the decoder would also output a "420mpeg2" file. AFAICT this would be similar to what the avm code does.
From my limited understanding, this makes sense, and I too believe AVM has the same behaviour, i.e. it passes the chroma location info through from the input file to the output file, including the frame rate. (AVM handles it the same way VP9/AV1 does.)
If there is a plan to touch the input/output passthrough of Y4M, it would be beneficial if the frame rate were also passed through for Y4M files, since right now it is not and the output is always at 50fps because the frame rate of y4m inputs is ignored.
Thanks again for the reply :)
comment:7 in reply to: ↑ 6 Changed 18 months ago by fbossen
Replying to mindfreeze:
I suspect daalatools might be filtering the chroma samples such that the YUV output always conforms to the same chroma format (including chroma sample location).
Just to be clear, are you suggesting that daalatools might currently be ignoring the chroma sampling location when converting to YUV? That would be a different problem altogether.
AFAICT daalatools doesn't ignore the chroma sampling location, but assumes that the chroma sampling location is always the same for a YUV file. If the chroma sampling location of a Y4M file doesn't match the assumed chroma sampling location of the YUV file, a format conversion occurs when converting Y4M to YUV (chroma sample values are modified). ffmpeg doesn't seem to do such a conversion.
From daalatools:
else if(strcmp(_y4m->chroma_type,"420mpeg2")==0){
  _y4m->src_c_dec_h=_y4m->dst_c_dec_h=_y4m->src_c_dec_v=_y4m->dst_c_dec_v=2;
  _y4m->dst_buf_read_sz=_y4m->pic_w*_y4m->pic_h;
  /*Chroma filter required: read into the aux buf first.*/
  _y4m->aux_buf_sz=_y4m->aux_buf_read_sz=
   2*((_y4m->pic_w+1)/2)*((_y4m->pic_h+1)/2);
  _y4m->convert=y4m_convert_42xmpeg2_42xjpeg;
}
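For intuition, the conversion is essentially a quarter-sample horizontal shift of each chroma row (MPEG-2 chroma is co-sited with the left luma column, while JPEG chroma sits midway between luma columns). A minimal bilinear sketch of such a shift is shown below; it is illustrative only, since daalatools applies a longer-tap filter (y4m_convert_42xmpeg2_42xjpeg above), and the function name is made up:

#include <cstdint>
#include <vector>

// Illustrative only: shift one chroma row right by a quarter of a chroma sample
// using bilinear interpolation, which is roughly what a 420mpeg2 -> 420jpeg
// re-siting does. daalatools uses a higher-quality multi-tap filter instead.
static std::vector<uint8_t> shiftChromaRowQuarterSample(const std::vector<uint8_t> &in)
{
  std::vector<uint8_t> out(in.size());
  for (size_t x = 0; x < in.size(); x++)
  {
    const uint8_t cur  = in[x];
    const uint8_t next = (x + 1 < in.size()) ? in[x + 1] : cur;  // clamp at the right edge
    out[x] = static_cast<uint8_t>((3 * cur + next + 2) >> 2);    // 0.75*cur + 0.25*next, rounded
  }
  return out;
}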
If there is a plan to touch the input/output passthrough of Y4M, it would be beneficial if the frame rate were also passed through for Y4M files, since right now it is not and the output is always at 50fps because the frame rate of y4m inputs is ignored.
Frame rate should already be passed through if you feed a Y4M file to the encoder.
comment:8 follow-up: ↓ 9 Changed 18 months ago by mindfreeze
AFAICT daalatools doesn't ignore the chroma sampling location, but assumes that the chroma sampling location is always the same for a YUV file
Thank you for the clarification. I am on the same page now :)
Frame rate should already be passed through if you feed a Y4M file to the encoder.
Sorry, I meant when decoding to a y4m file.
Encode y4m -> get bitstream -> decode the bitstream to y4m. During decoding, it falls back to the default frame rate and keeps 50fps.
FWIW, this is not related to this ticket; I wonder, should I open a separate ticket for it?
comment:9 in reply to: ↑ 8 Changed 18 months ago by fbossen
Replying to mindfreeze:
Frame rate should already be passed through if you feed a Y4M file to the encoder.
Sorry, I meant when decoding to a y4m file.
Encode y4m -> get bitstream -> decode the bitstream to y4m. During decoding, it falls back to the default frame rate and keeps 50fps.
FWIW, this is not related to this ticket; I wonder, should I open a separate ticket for it?
The decoder shouldn't default to 50fps if the encoder was given a y4m file. If that happens, then it's a bug. Opening a separate ticket would be appropriate.
comment:10 Changed 18 months ago by mindfreeze
The decoder shouldn't default to 50fps if the encoder was given a y4m file. If that happens, then it's a bug. Opening a separate ticket would be appropriate.
Thanks,
I just created a separate ticket with some PoC examples there :)
https://jvet.hhi.fraunhofer.de/trac/vvc/ticket/1599#ticket
comment:11 Changed 18 months ago by fbossen
The following should address the issue
https://vcgit.hhi.fraunhofer.de/jvet/VVCSoftware_VTM/-/merge_requests/2588
comment:12 Changed 18 months ago by mindfreeze
Hi,
I had a chance to test the patch.
In my testing, the non-c420mpeg2 cases stay intact, but c420mpeg2 gives non-identical decoded streams (I would imagine the bitstream itself differs due to the extra HRDParameters in the stream now).
How I tested:
- A. Take the y4m input and encode it.
- B. Convert the Y4M to YUV using daalatools (y4m2yuv) and encode it.
- Decode both A and B.
- Compare the decoded streams and reconstructed files of A and B.
I am not sure whether using daalatools for the conversion to YUV is OK here; do you have any suggestions on how I should convert Y4M to YUV for testing this patchset?
Example 1: a C420mpeg2 file, https://media.xiph.org/video/aomctc/test_set/e_nonpristine/Shaky_Baseball_3840x2160_5994fps.y4m
You could also use this file: https://media.xiph.org/video/aomctc/test_set/a4_360p/BlueSky_360p25_v2.y4m
Output MD5s
3ec0f53e1c3e19dfda084ee913eceb5f y4m/Shaky_Baseball_3840x2160_5994fps.y4m-32-2f-bitstream.bin
62eebdcb2d9b964d0a417bd622ecd747 y4m/Shaky_Baseball_3840x2160_5994fps.y4m-32-2f-decoded.y4m
37cdf2e9aa7a2549e3653d90a8f5c721 y4m/Shaky_Baseball_3840x2160_5994fps.y4m-32-2f-decoded.yuv
f63a1b1a6697e9b459b88c2894195220 y4m/Shaky_Baseball_3840x2160_5994fps.y4m-32-2f-recon.yuv
a96c26acbc030643cb3a3c9ec17c2030 yuv/Shaky_Baseball_3840x2160_5994fps.y4m-32-2f-bitstream.bin
19594df242194865208fd1ff017d2b2d yuv/Shaky_Baseball_3840x2160_5994fps.y4m-32-2f-decoded.y4m
64b002af90e6d6969bac8b8f3dec872d yuv/Shaky_Baseball_3840x2160_5994fps.y4m-32-2f-decoded.yuv
c7dd7cf2c39ad9914524bd7e213342a3 yuv/Shaky_Baseball_3840x2160_5994fps.y4m-32-2f-recon.yuv
Encode-output:
YUV Pipeline: 2 a 42.4000 35.7974 37.7242 38.4374 35.4319
Y4M Pipeline: 2 a 2540.9790 35.7997 37.7699 38.5086 35.4483
Example 2: a C420jpeg file, https://media.xiph.org/video/aomctc/test_set/a5_270p/FourPeople_480x270_60.y4m
(You could also use https://media.xiph.org/video/aomctc/test_set/a4_360p/BlueSky_360p25_v2.y4m)
Output MD5s
d7ec2bdc205b82c8d6a0b52dbcef6b00 y4m/FourPeople_480x270_60.y4m-32-2f-bitstream.bin
71183e20012ac982c64c1e3e11ec2721 y4m/FourPeople_480x270_60.y4m-32-2f-decoded.y4m
6a564f192a437addad6a253916105d27 y4m/FourPeople_480x270_60.y4m-32-2f-decoded.yuv
f503ecee677b4f4f211a783b93381e8e y4m/FourPeople_480x270_60.y4m-32-2f-recon.yuv
3448d6dfab5d84897a2425ee28517683 yuv/FourPeople_480x270_60.y4m-32-2f-bitstream.bin
71183e20012ac982c64c1e3e11ec2721 yuv/FourPeople_480x270_60.y4m-32-2f-decoded.y4m
6a564f192a437addad6a253916105d27 yuv/FourPeople_480x270_60.y4m-32-2f-decoded.yuv
f503ecee677b4f4f211a783b93381e8e yuv/FourPeople_480x270_60.y4m-32-2f-recon.yuv
Encode-output:
YUV Pipeline:2 a 5.5440 26.1547 35.0009 35.7001 27.6602
Y4M Pipeline:2 a 335.7600 26.1547 35.0009 35.7001 27.6602
If I load the bitstreams into YUView to inspect them, the YUV pipeline's decoded file shows more visible chroma artefacts and different compression artefacts.
To cross-check whether daalatools is doing some odd conversion, I loaded the YUV and Y4M source files and compared them side by side; they looked identical on a quick visual inspection.
Relevant bitstreams/files, if helpful: https://people.videolan.org/~mindfreeze/vvc-vtm/1598/
comment:13 Changed 18 months ago by fbossen
Testing with the most recent commit (fffeeda) in https://vcgit.hhi.fraunhofer.de/jvet/VVCSoftware_VTM/-/merge_requests/2588 and the script below.
Things seem to work ok. Note that I am using ffmpeg for the y4m to yuv conversion to avoid the filtering that daalatools applies.
#!/bin/sh

#src=/Users/bossen/Downloads/Shaky_Baseball_3840x2160_5994fps.y4m
src=/Users/bossen/Downloads/FourPeople_480x270_60.y4m
enc=bin/umake/clang-14.0/x86_64/release/EncoderApp
dec=bin/umake/clang-14.0/x86_64/release/DecoderApp
md5=md5

ffmpeg -i $src $src.yuv

w=$(head -1 $src | tr ' ' '\n' | grep -E '\bW' | tr -d 'W')
h=$(head -1 $src | tr ' ' '\n' | grep -E '\bH' | tr -d 'H')
fr=$(head -1 $src | tr ' ' '\n' | grep -E '\bF' | tr -d 'F')
chr=$(head -1 $src | tr ' ' '\n' | grep -E '\bC')
loc=6
case $chr in
  C420jpeg) loc=1 ;;
  C420mpeg2) loc=0
esac
echo $w $h $fr $chr $loc

bs1=str1.266
bs2=str2.266
dec1=dec1.y4m
dec2=dec2.y4m

props="-wdt $w -hgt $h -fr $fr --ChromaSampleLocType=$loc --ChromaLocInfoPresent=1 -vui 1 -hrd 1 --ProgressiveSource=1"

$enc -c cfg/encoder_randomaccess_vtm.cfg -i $src -v 6 -f 2 -q 50 -b $bs1
$enc -c cfg/encoder_randomaccess_vtm.cfg -i $src.yuv -v 6 -f 2 -q 50 -b $bs2 $props
$md5 $bs1
$md5 $bs2

dopt="--OutputBitDepth=8"
$dec -b $bs1 -o $dec1 $dopt
$dec -b $bs2 -o $dec2 $dopt
$md5 $dec1
$md5 $dec2

head -n 1 $src
head -n 1 $dec1
head -n 1 $dec2
comment:14 Changed 18 months ago by mindfreeze
Things seem to work ok. Note that I am using ffmpeg for the y4m to yuv conversion to avoid the filtering that daalatools applies.
Thanks for the explanation,
This makes sense.
In my testing here, I can also confirm that the proposed patchset resolves the issue ;)
comment:15 Changed 18 months ago by XiangLi
- Milestone set to VTM-21.0
- Resolution set to fixed
- Status changed from new to closed
The fix (!2588) has been merged.