Opened 3 years ago
Closed 3 years ago
#1542 closed defect (fixed)
QPA algorithm does not work properly when internal bitdepth is 8
Reported by: | JaviCru | Owned by: | |
---|---|---|---|
Priority: | minor | Milestone: | VTM-16.1 |
Component: | VTM | Version: | VTM-15.0 |
Keywords: | QPA, internal-bitdepth | Cc: | ksuehring, XiangLi, fbossen, jvet@… |
Description
When the QPA algorithm (perceptually optimised QP adaptation, -qpa 1 --SliceChromaQPOffsetPeriodicity=1) is enabled and the internal bit depth is the default set in the main .cfg files (--internal-bitdepth=10), the result is as expected: a considerable rate saving and an improvement in perceptual quality.
However, if we force the internal bitdepth to be 8 (--internal-bitdepth=8), the result is a considerable increase in bitrate compared to not enabling the QPA algorithm.
As an example, I have encoded frames 1, 101, 201, 301, 401 and 501 of the BQTerrace sequence in All Intra mode, with base QPs 22, 27, 32, 37 and 42. Here are the average results:
InternalBitDepth=8, QPA off
base QP | Mbps | MS-SSIM |
---|---|---|
22 | 2.659 | 0.996 |
27 | 0.989 | 0.986 |
32 | 0.503 | 0.979 |
37 | 0.268 | 0.966 |
42 | 0.138 | 0.942 |

InternalBitDepth=10, QPA off
base QP | Mbps | MS-SSIM |
---|---|---|
22 | 2.724 | 0.996 |
27 | 1.021 | 0.987 |
32 | 0.517 | 0.979 |
37 | 0.275 | 0.967 |
42 | 0.142 | 0.944 |

InternalBitDepth=8, QPA on
base QP | Mbps | MS-SSIM |
---|---|---|
22 | 3.388 | 0.998 |
27 | 1.561 | 0.992 |
32 | 0.601 | 0.983 |
37 | 0.309 | 0.972 |
42 | 0.167 | 0.952 |

InternalBitDepth=10, QPA on
base QP | Mbps | MS-SSIM |
---|---|---|
22 | 2.454 | 0.996 |
27 | 0.820 | 0.986 |
32 | 0.408 | 0.977 |
37 | 0.210 | 0.963 |
42 | 0.115 | 0.935 |

As can be seen, for a base QP of 22 with QPA enabled, the bit rate at an internal bit depth of 8 is almost 1 Mbps higher than at a bit depth of 10 (3.388 versus 2.454 Mbps), when very similar values would be expected.
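To make the comparison concrete, the following minimal sketch computes the relative bitrate change caused by enabling QPA at each bit depth, using the Mbps figures reported above. The array names and the `percentChange` helper are hypothetical and only serve this illustration.

```cpp
#include <array>

// Mbps figures copied from the results above, ordered by base QP 22, 27, 32, 37, 42.
constexpr std::array<double, 5> kOff8  = {2.659, 0.989, 0.503, 0.268, 0.138};  // 8-bit, QPA off
constexpr std::array<double, 5> kOn8   = {3.388, 1.561, 0.601, 0.309, 0.167};  // 8-bit, QPA on
constexpr std::array<double, 5> kOff10 = {2.724, 1.021, 0.517, 0.275, 0.142};  // 10-bit, QPA off
constexpr std::array<double, 5> kOn10  = {2.454, 0.820, 0.408, 0.210, 0.115};  // 10-bit, QPA on

// Hypothetical helper: relative bitrate change in percent when switching QPA on.
constexpr double percentChange(double rateOff, double rateOn) {
    return 100.0 * (rateOn - rateOff) / rateOff;
}
```

For base QP 22, `percentChange(kOff8[0], kOn8[0])` is roughly +27 %, whereas `percentChange(kOff10[0], kOn10[0])` is roughly -10 %, so QPA moves the rate in opposite directions at the two bit depths.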
I attach the RD curves corresponding to the tables, where you can visually appreciate the mentioned rate increase.
I suspect that the problem, if any, may be in the calculation of picture or block energy (activity), or in the algorithm that modifies the Lagrangian during the rate control.
Attachments (1)
Change history (6)
Changed 3 years ago by JaviCru
comment:1 Changed 3 years ago by crhelmrich
Hi,
if I remember correctly (it's been a while), this is a feature, not a bug. When run at an internal bit-depth of 8, the WPSNR based perceptual QPA assumes the output bit-depth will also be 8. Since using bit-depth 8 instead of 10 leads, on some content, to more banding artifacts at the same rate-quality level (as specified by the base QP), I tried to reflect this in the WPSNR measure. This, in turn, then leads to slightly more bits being spent when the respective QPA is being enabled, in order to counteract the introduction of more visible banding (and possibly other) artifacts. Note that (MS-)SSIM does not consider such aspects, so they may not be reflected in the VQA measurements when comparing 8 and 10 bit, as you do.
I think you can balance out this difference to 10-bit simply by increasing the base QP by 1 or 2 when using --internal-bitdepth=8, without any visual side effects.
Best,
Christian
--
Christian Helmrich
Fraunhofer HHI
comment:2 Changed 3 years ago by JaviCru
Hello Christian.
After reading your reply, I have started to study the implementation of the WPSNR metric, focusing for now only on its use to compare two frames, in the same way that other metrics such as PSNR, SSIM, VMAF, etc. are used.
I have exported the calculation of the metric from the reference software VTM 16.0 (functions "xCalculateAddPSNR", "xFindDistortionPlane" and "calcWeightedSquaredError" from EncGOP.cpp), and I have performed the following experiment: comparison of metrics in three scenarios.
Scenario 1: original (BQTerrace frame 100) and reconstructed frames at 8-bit depth.
Scenario 2: original and reconstructed frames at 10-bit depth. Both frames of this scenario have been obtained by converting the frames of scenario 1 (x*4 or x<<2).
Scenario 3: original and reconstructed frames at 12-bit depth, obtained in the same way (x*16 or x<<4).
By doing this, I want to compare how the bit depth value affects the different internal variables until the final value of the metric is reached.
These are the results obtained. The metrics other than WPSNR have been obtained from the VMAF software.
BD | QP | PSNR_Y | PSNR_Cb | PSNR_Cr | MS-SSIM | VMAF | WPSNR_Y | WPSNR_Cb | WPSNR_Cr |
---|---|---|---|---|---|---|---|---|---|
8 | 22 | 44.63996 | 44.840436 | 46.071737 | 0.998258 | 96.387476 | 42.4463 | 38.3298 | 38.7811 |
10 | 22 | 44.665469 | 44.865945 | 46.097246 | 0.998258 | 96.387476 | 45.4566 | 41.3401 | 41.7914 |
12 | 22 | 44.671835 | 44.872311 | 46.103612 | 0.998258 | 96.387476 | 48.4669 | 44.3504 | 44.8017 |
8 | 32 | 35.962792 | 41.132409 | 43.402503 | 0.98733 | 91.37576 | 33.9514 | 34.7698 | 36.2343 |
10 | 32 | 35.988302 | 41.157918 | 43.428012 | 0.98733 | 91.37576 | 36.9617 | 37.7801 | 39.2446 |
12 | 32 | 35.994667 | 41.164284 | 43.434377 | 0.98733 | 91.37576 | 39.9720 | 40.7904 | 42.2549 |
8 | 42 | 31.370339 | 38.109276 | 40.358537 | 0.959213 | 73.714746 | 29.608 | 31.8346 | 33.3096 |
10 | 42 | 31.395849 | 38.134785 | 40.384046 | 0.959213 | 73.714746 | 32.6183 | 34.8449 | 36.3199 |
12 | 42 | 31.402214 | 38.141151 | 40.390412 | 0.959213 | 73.714746 | 35.6286 | 37.8552 | 39.3302 |
As can be seen, the WPSNR values differ considerably across bit depths. Specifically, WPSNR rises by 3.0103 dB with each 2-bit increase in bit depth, that is, by 10·log10(2) dB, or a factor of 2 in the ratio inside the logarithm.
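For contrast, the PSNR columns drift by only about 0.03 dB across bit depths. A plausible explanation, sketched below, is that shifting both frames left scales the error and the content peak identically, while the measuring tool uses a peak of 2^bitDepth - 1 instead of 255 << (bitDepth - 8). That peak definition is an assumption about the tool, not something stated in this ticket, and both function names are hypothetical.

```cpp
#include <cmath>
#include <cstdint>

// Scenario construction as described above: an 8-bit sample x becomes x << (bitDepth - 8).
constexpr std::uint32_t upscaleSample(std::uint32_t x8, int bitDepth) {
    return x8 << (bitDepth - 8);
}

// PSNR offset in dB that results purely from the peak definition,
// ASSUMING the tool uses peak = 2^bitDepth - 1 while the upscaled content
// can only reach 255 << (bitDepth - 8).
inline double psnrPeakOffsetDb(int bitDepth) {
    const double toolPeak    = static_cast<double>((1u << bitDepth) - 1u);
    const double contentPeak = static_cast<double>(upscaleSample(255u, bitDepth));
    return 20.0 * std::log10(toolPeak / contentPeak);
}
```

Under that assumption the offset is about 0.0255 dB at 10 bits and about 0.0319 dB at 12 bits, which agrees with the differences between the PSNR_Y rows (44.665469 - 44.63996 and 44.671835 - 44.63996). The 3.0103 dB WPSNR steps are two orders of magnitude larger and cannot be explained this way.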
After seeing these results, I debugged the case QP=32 (WPSNR_Y_8bit = 33.9514, WPSNR_Y_10bit = 36.9617 and WPSNR_Y_12bit = 39.9720). Below are the values of the most important intermediate variables for the calculation of the WPSNR metric, as returned by the "calcWeightedSquaredError" and "xFindDistortionPlane" functions, together with the ratios between the different bit-depth scenarios.
Variable | 8-bit | 10-bit | 12-bit | 10-bit/8-bit | 12-bit/8-bit |
---|---|---|---|---|---|
wmse | 561135.6746 | 2244542.6986 | 8978170.7943 | 4 | 16 |
sumAct | 8413.0185 | 33652.0739 | 134608.2957 | 4 | 16 |
uiTotalDiff | 51468771 | 411750171 | 3294001372 | 8 | 64 |
The calculation of the WPSNR value is performed as follows (using Matlab language):
    uiSSDtemp = uiTotalDiff;
    maxval = uint32(bitshift(255, bitDepth - 8));  % 255 for 8-bit, 1020 for 10-bit, 4080 for 12-bit
    pic_size = uint32(width * height);
    fRefValue = double(maxval) * double(maxval) * double(pic_size);
    if uiSSDtemp == 0
        WPSNR = 999.99;
    else
        WPSNR = 10.0 * log10(fRefValue / double(uiSSDtemp));
    end
Therefore, I think that a correction factor should be included in the final formula to eliminate the 3 dB increase per bit-depth step, and thus obtain the same (or a very similar) WPSNR quality value regardless of bit depth, as is the case with the other metrics. Now the question is: what is the "correct" WPSNR value? 33.9514, 36.9617, 39.9720, or something else?
PS: if this is a bug, it is possible that it also occurs when modifying the Lagrangian, and therefore my initial question has to do with this.
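The 3.0103 dB step can be reproduced from the uiTotalDiff values listed above with a small C++ port of the Matlab calculation. This is only a sketch: the function name `wpsnrDb` is hypothetical, and the 1920x1080 picture size used in the usage note is an assumption that cancels out when two bit depths are compared.

```cpp
#include <cmath>
#include <cstdint>

// Port of the Matlab snippet: WPSNR in dB from the accumulated weighted squared error.
inline double wpsnrDb(std::uint64_t weightedSsd, int bitDepth,
                      std::uint32_t width, std::uint32_t height) {
    if (weightedSsd == 0) {
        return 999.99;  // same sentinel as in the Matlab code for identical pictures
    }
    // 255 for 8-bit, 1020 for 10-bit, 4080 for 12-bit
    const double maxval   = static_cast<double>(255u << (bitDepth - 8));
    const double refValue = maxval * maxval *
                            static_cast<double>(width) * static_cast<double>(height);
    return 10.0 * std::log10(refValue / static_cast<double>(weightedSsd));
}
```

Evaluating `wpsnrDb(411750171, 10, 1920, 1080) - wpsnrDb(51468771, 8, 1920, 1080)` gives approximately 3.0103 dB: the reference value grows by a factor of 16 per 2-bit step, but uiTotalDiff grows only by a factor of 8, leaving a factor of 2 inside the logarithm.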
comment:3 follow-up: ↓ 4 Changed 3 years ago by crhelmrich
Thanks for the detailed analysis! My apologies, you are right. It seems that, during the last years, I forgot to commit a fix which I mentioned in Sec. 5 of our ITU paper about XPSNR [1], an improved variant of the WPSNR method, to the VTM codebase. The following two assignments are wrong in VTM for 8-bit input:
In EncGOP.cpp, around line 4561:
    // integer weighted distortion
    sumAct = 16.0 * sqrt ((3840.0 * 2160.0) / double((W << chromaShiftHor) * (H << chromaShiftVer))) * double(1 << BD);
In EncSlice.cpp, around line 189:
    {
      const double hpEnerPic = 16.0 * sqrt ((3840.0 * 2160.0) / double(picOrig.width * picOrig.height)) * double(1 << uBitDepth);
The correct usage of the bit-depth (BD, uBitDepth) parameter is demonstrated in our XPSNR assessment plug-in for FFmpeg, see line 343 of file vf_xpsnr.c at https://github.com/fraunhoferhhi/xpsnr
Could you please try
- "double(1 << (2 * BD - 10))" instead of "double(1 << BD)" in EncGOP.cpp and, likewise,
- "double(1 << (2 * uBitDepth - 10))" instead of "double(1 << uBitDepth)" in EncSlice.cpp
and let me know if this fixes the issue? I will then create a corresponding VTM merge request. The 10-bit results are considered correct and should not change after this fix.
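As a rough illustration of what the proposed change does, the sketch below compares the two bit-depth scale factors and wraps the expression quoted from EncSlice.cpp in a standalone function. The names `scaleOld`, `scaleNew` and the `useFix` switch are hypothetical and do not exist in VTM.

```cpp
#include <cmath>

// Bit-depth scale factor in the expressions quoted above: 1 << BD.
constexpr double scaleOld(int bitDepth) {
    return static_cast<double>(1 << bitDepth);
}

// Proposed replacement: 1 << (2 * BD - 10).
constexpr double scaleNew(int bitDepth) {
    return static_cast<double>(1 << (2 * bitDepth - 10));
}

// Hypothetical standalone version of the hpEnerPic expression from EncSlice.cpp.
inline double hpEnerPic(int width, int height, int bitDepth, bool useFix) {
    const double resolutionTerm =
        16.0 * std::sqrt((3840.0 * 2160.0) / (double(width) * double(height)));
    return resolutionTerm * (useFix ? scaleNew(bitDepth) : scaleOld(bitDepth));
}
```

Both factors equal 1024 at a bit depth of 10, which is consistent with the statement that the 10-bit results should not change. At a bit depth of 8 the factor drops from 256 to 64, and at 12 it rises from 4096 to 16384, so the new factor grows by 16 per 2-bit step, the same rate as maxval squared in the WPSNR reference value.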
Thanks and best,
Christian
--
Christian Helmrich
Fraunhofer HHI
[1] C. Helmrich, S. Bosse, H. Schwarz, D. Marpe, and T. Wiegand, «A Study of the Extended Perceptually Weighted Peak Signal-to-Noise Ratio (XPSNR) for Video Compression with Different Resolutions and Bit Depths», ITU Journal: ICT Discoveries – Special Issue: The Future of Video and Immersive Media, vol. 3, no. 1, online, May 2020. Open access: http://handle.itu.int/11.1002/pub/8153d78b-en
comment:4 in reply to: ↑ 3 Changed 3 years ago by JaviCru
> let me know if this fixes the issue? I will then undraft the merge request. The 10-bit QPA and WPSNR behavior is considered correct and should not change after this fix.
Hi Christian,
This change resolves both the issue raised in the ticket and the discrepancy in the WPSNR metric. The 8-bit curve with QPA enabled is now similar to its 10-bit counterpart.
Thank you very much for your interest in this issue.
Javier.
comment:5 Changed 3 years ago by XiangLi
- Milestone set to VTM-16.1
- Resolution set to fixed
- Status changed from new to closed
Fixed as suggested.
RD curves