Opened 2 years ago

Closed 2 years ago

#1542 closed defect (fixed)

QPA algorithm does not work properly when internal bitdepth is 8

Reported by: JaviCru Owned by:
Priority: minor Milestone: VTM-16.1
Component: VTM Version: VTM-15.0
Keywords: QPA, internal-bitdepth Cc: ksuehring, XiangLi, fbossen, jvet@…

Description

When the QPA algorithm (perceptually optimised QP adaptation, -qpa 1 --SliceChromaQPOffsetPeriodicity=1) is enabled and the internal bit depth is left at the default set in the main .cfg files (--internal-bitdepth=10), the result is as expected: a considerable rate saving and an improvement in perceptual quality.

However, if we force the internal bitdepth to be 8 (--internal-bitdepth=8), the result is a considerable increase in bitrate compared to not enabling the QPA algorithm.

As an example, I have encoded frames 1, 101, 201, 301, 401 and 501 of the BQTerrace sequence in All Intra mode, with base QPs 22, 27, 32, 37 and 42. Here are the average results:

InternalBitDepth=8, QPA off

base QP Mbps MS-SSIM
22 2.659 0.996
27 0.989 0.986
32 0.503 0.979
37 0.268 0.966
42 0.138 0.942

InternalBitDepth=10, QPA off

base QP Mbps MS-SSIM
22 2.724 0.996
27 1.021 0.987
32 0.517 0.979
37 0.275 0.967
42 0.142 0.944

InternalBitDepth=8, QPA on

base QP Mbps MS-SSIM
22 3.388 0.998
27 1.561 0.992
32 0.601 0.983
37 0.309 0.972
42 0.167 0.952

InternalBitDepth=10, QPA on

base QP Mbps MS-SSIM
22 2.454 0.996
27 0.820 0.986
32 0.408 0.977
37 0.210 0.963
42 0.115 0.935

As can be seen, for a base QP of 22 with QPA enabled, the bit rate at bit depth 8 (3.388 Mbps) is almost 1 Mbps higher than at bit depth 10 (2.454 Mbps), whereas very similar values would be expected.
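The effect can also be expressed as a relative rate change per base QP. The following is a minimal Python sketch that only re-uses the Mbps values from the four tables above; the names `MBPS` and `qpa_rate_change_percent` are illustrative and not part of VTM. It compares bit rates only and ignores the accompanying MS-SSIM changes, so it is not a BD-rate computation.

```python
# Average bit rates in Mbps, copied from the four tables above.
# Key: (internal bit depth, QPA enabled).
BASE_QPS = [22, 27, 32, 37, 42]
MBPS = {
    (8, False):  [2.659, 0.989, 0.503, 0.268, 0.138],
    (10, False): [2.724, 1.021, 0.517, 0.275, 0.142],
    (8, True):   [3.388, 1.561, 0.601, 0.309, 0.167],
    (10, True):  [2.454, 0.820, 0.408, 0.210, 0.115],
}


def qpa_rate_change_percent(bit_depth):
    """Relative bit-rate change (in percent) caused by enabling QPA, per base QP."""
    off = MBPS[(bit_depth, False)]
    on = MBPS[(bit_depth, True)]
    return [100.0 * (r_on - r_off) / r_off for r_on, r_off in zip(on, off)]


if __name__ == "__main__":
    for bd in (8, 10):
        for qp, change in zip(BASE_QPS, qpa_rate_change_percent(bd)):
            print(f"bit depth {bd:2d}, base QP {qp}: {change:+.1f} %")
```

With these numbers, enabling QPA raises the rate at every base QP for bit depth 8 and lowers it at every base QP for bit depth 10.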

I attach the RD curves corresponding to the tables, where you can visually appreciate the mentioned rate increase.

I suspect that the problem, if any, may be in the calculation of picture or block energy (activity), or in the algorithm that modifies the Lagrangian during the rate control.

Attachments (1)

RD_QPA_bitdepth.png (216.6 KB) - added by JaviCru 2 years ago.
RD curves

Change history (6)

Changed 2 years ago by JaviCru

RD curves

comment:1 Changed 2 years ago by crhelmrich

Hi,

if I remember correctly (it's been a while), this is a feature, not a bug. When run at an internal bit-depth of 8, the WPSNR based perceptual QPA assumes the output bit-depth will also be 8. Since using bit-depth 8 instead of 10 leads, on some content, to more banding artifacts at the same rate-quality level (as specified by the base QP), I tried to reflect this in the WPSNR measure. This, in turn, then leads to slightly more bits being spent when the respective QPA is being enabled, in order to counteract the introduction of more visible banding (and possibly other) artifacts. Note that (MS-)SSIM does not consider such aspects, so they may not be reflected in the VQA measurements when comparing 8 and 10 bit, as you do.

I think you can balance out this difference to 10-bit simply by increasing the base QP by 1 or 2 when using --internal-bitdepth=8, without any visual side effects.

Best,

Christian
--
Christian Helmrich
Fraunhofer HHI

comment:2 Changed 2 years ago by JaviCru

Hello Christian.

After reading your reply, I have started to study the implementation of the WPSNR metric, focusing for now only on its use to compare two frames, in the same way that other metrics such as PSNR, SSIM, VMAF, etc. are used.

I have exported the calculation of the metric from the reference software VTM 16.0 (functions "xCalculateAddPSNR", "xFindDistortionPlane" and "calcWeightedSquaredError" from EncGOP.cpp), and I have performed the following experiment: comparison of metrics in three scenarios.

Scenario 1: original (BQTerrace frame 100) and reconstructed frames at 8-bit depth.

Scenario 2: original and reconstructed frames at 10-bit depth. Both frames of this scenario have been obtained by converting the frames of scenario 1 (x*4 or x<<2).

Scenario 3: original and reconstructed frames at 12-bit depth, obtained in the same way (x*16 or x<<4).

By doing this, I want to compare how the bit depth value affects the different internal variables until the final value of the metric is reached.

These are the results obtained. The metrics other than WPSNR have been obtained from the VMAF software.

BD QP PSNR_Y PSNR_Cb PSNR_Cr MS-SSIM VMAF WPSNR_Y WPSNR_Cb WPSNR_Cr
8 22 44.63996 44.840436 46.071737 0.998258 96.387476 42.4463 38.3298 38.7811
10 22 44.665469 44.865945 46.097246 0.998258 96.387476 45.4566 41.3401 41.7914
12 22 44.671835 44.872311 46.103612 0.998258 96.387476 48.4669 44.3504 44.8017
8 32 35.962792 41.132409 43.402503 0.98733 91.37576 33.9514 34.7698 36.2343
10 32 35.988302 41.157918 43.428012 0.98733 91.37576 36.9617 37.7801 39.2446
12 32 35.994667 41.164284 43.434377 0.98733 91.37576 39.9720 40.7904 42.2549
8 42 31.370339 38.109276 40.358537 0.959213 73.714746 29.608 31.8346 33.3096
10 42 31.395849 38.134785 40.384046 0.959213 73.714746 32.6183 34.8449 36.3199
12 42 31.402214 38.141151 40.390412 0.959213 73.714746 35.6286 37.8552 39.3302

As can be seen, the WPSNR result differs considerably across bit depths, whereas the other metrics barely change. Specifically, the WPSNR drops by 3.0103 dB for each 2-bit reduction in bit depth (12 to 10 to 8 bits). Or, in other words, a difference of

https://i.imgur.com/5IOVGaZ.gif

After seeing these results, I debugged the case where QP=32 (WPSNR_Y_8bit = 33.9514, WPSNR_Y_10bit = 36.9617 and WPSNR_Y_12bit = 39.9720). Below are the values of the most important intermediate variables in the WPSNR calculation, as returned by the "calcWeightedSquaredError" and "xFindDistortionPlane" functions, together with the ratios between the different bit-depth scenarios.

Variable     8-bit        10-bit        12-bit        (10-bit/8-bit)  (12-bit/8-bit)
wmse         561135.6746  2244542.6986  8978170.7943  4               16
sumAct       8413.0185    33652.0739    134608.2957   4               16
uiTotalDiff  51468771     411750171     3294001372    8               64

The calculation of the WPSNR value is performed as follows (using Matlab language):

  uiSSDtemp = uiTotalDiff;

  maxval = uint32(bitshift(255, bitDepth - 8)); % 255 for 8-bit, 1020 for 10-bit,
                                                % 4080 for 12-bit
  pic_size = uint32(width * height);
  fRefValue = double(maxval) * double(maxval) * double(pic_size);
 
  if uiSSDtemp == 0
      WPSNR = 999.99;
  else
      WPSNR = 10.0 * log10(fRefValue / double(uiSSDtemp));
  end
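The 3.0103 dB step follows directly from the ratios in the table: the squared peak value grows by a factor of 16 per 2-bit step, while uiTotalDiff grows only by a factor of 8, which leaves a factor of 2, i.e. 10*log10(2) dB. Below is a minimal Python sketch of the Matlab calculation above that checks this with the uiTotalDiff values from the table. The `PIC_SIZE` value is a placeholder assumption (it cancels in the differences, so the absolute values printed by such a script need not match the table), and the function name `wpsnr` is illustrative.

```python
import math


def wpsnr(total_diff, bit_depth, pic_size):
    """Python port of the Matlab snippet above (same formula, same special case)."""
    if total_diff == 0:
        return 999.99
    maxval = 255 << (bit_depth - 8)  # 255, 1020, 4080 for 8, 10, 12 bits
    ref_value = float(maxval) * float(maxval) * float(pic_size)
    return 10.0 * math.log10(ref_value / float(total_diff))


# uiTotalDiff values for QP=32, taken from the table above.
UI_TOTAL_DIFF = {8: 51468771, 10: 411750171, 12: 3294001372}

# Placeholder picture size (assumption); it cancels out in the differences below.
PIC_SIZE = 1920 * 1080

values = {bd: wpsnr(diff, bd, PIC_SIZE) for bd, diff in UI_TOTAL_DIFF.items()}
step_8_to_10 = values[10] - values[8]
step_10_to_12 = values[12] - values[10]

if __name__ == "__main__":
    print(f"8 -> 10 bit:  {step_8_to_10:.4f} dB")
    print(f"10 -> 12 bit: {step_10_to_12:.4f} dB")
    print(f"10*log10(2):  {10.0 * math.log10(2.0):.4f} dB")
```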

Therefore, I think that a correction factor should be included in the final formula to eliminate the 3 dB increase per bit-depth step, and thus obtain the same (or a very similar) WPSNR quality value regardless of bit depth, as is the case with the other metrics. Now the question is: what is the "correct" WPSNR value? 33.9514, 36.9617, 39.9720, or something else?

PS: if this is a bug, it may also affect the modification of the Lagrangian, which would relate it to my initial question.

Last edited 2 years ago by JaviCru

comment:3 follow-up: Changed 2 years ago by crhelmrich

Thanks for the detailed analysis! My apologies, you are right. It seems that, over the last few years, I forgot to commit to the VTM codebase a fix which I mentioned in Sec. 5 of our ITU paper about XPSNR [1], an improved variant of the WPSNR method. The following two assignments are wrong in VTM for 8-bit input:

In EncGOP.cpp, around line 4561:

  // integer weighted distortion
  sumAct = 16.0 * sqrt ((3840.0 * 2160.0) / double((W << chromaShiftHor) * (H << chromaShiftVer))) * double(1 << BD);

In EncSlice.cpp, around line 189:

{
  const double hpEnerPic = 16.0 * sqrt ((3840.0 * 2160.0) / double(picOrig.width * picOrig.height)) * double(1 << uBitDepth);

The correct usage of the bit-depth (BD, uBitDepth) parameter is demonstrated in our XPSNR assessment plug-in for FFmpeg, see line 343 of file vf_xpsnr.c at https://github.com/fraunhoferhhi/xpsnr

Could you please try

  • "double(1 << (2 * BD - 10))" instead of "double(1 << BD)" in EncGOP.cpp and, likewise,
  • "double(1 << (2 * uBitDepth - 10))" instead of "double(1 << uBitDepth)" in EncSlice.cpp

as in merge request https://vcgit.hhi.fraunhofer.de/jvet/VVCSoftware_VTM/-/merge_requests/2204 and let me know if this fixes the issue? I will then undraft the merge request. The 10-bit QPA and WPSNR behavior is considered correct and should not change after this fix.
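To illustrate why the 10-bit behavior stays unchanged, here is a minimal Python sketch of the bit-depth factor before and after the proposed change. It mirrors the hpEnerPic expression quoted above from EncSlice.cpp; the function names and the `fixed` flag are hypothetical, and the 1920x1080 picture size in the usage lines is an assumption chosen only for the example.

```python
import math

UHD_SAMPLES = 3840.0 * 2160.0  # reference picture size used in the quoted expression


def bit_depth_scale(bit_depth, fixed):
    """Bit-depth factor: 1 << BD (original) or 1 << (2 * BD - 10) (proposed)."""
    shift = 2 * bit_depth - 10 if fixed else bit_depth
    return float(1 << shift)


def hp_ener_pic(width, height, bit_depth, fixed):
    """Sketch of the hpEnerPic expression with either bit-depth factor."""
    resolution_term = math.sqrt(UHD_SAMPLES / float(width * height))
    return 16.0 * resolution_term * bit_depth_scale(bit_depth, fixed)


if __name__ == "__main__":
    for bd in (8, 10, 12):
        old = hp_ener_pic(1920, 1080, bd, fixed=False)
        new = hp_ener_pic(1920, 1080, bd, fixed=True)
        print(f"bit depth {bd:2d}: original {old:9.1f}, proposed {new:9.1f}")
```

Both factors equal 1024 at a bit depth of 10, so only the other bit depths are affected: at 8 bits the proposed factor is 64 instead of 256, and at 12 bits it is 16384 instead of 4096.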

Thanks and best,

Christian
--
Christian Helmrich
Fraunhofer HHI

[1] C. Helmrich, S. Bosse, H. Schwarz, D. Marpe, and T. Wiegand, «A Study of the Extended Perceptually Weighted Peak Signal-to-Noise Ratio (XPSNR) for Video Compression with Different Resolutions and Bit Depths», ITU Journal: ICT Discoveries – Special Issue: The Future of Video and Immersive Media, vol. 3, no. 1, online, May 2020. Open access: http://handle.itu.int/11.1002/pub/8153d78b-en

Last edited 2 years ago by crhelmrich

comment:4 in reply to: ↑ 3 Changed 2 years ago by JaviCru

let me know if this fixes the issue? I will then undraft the merge request. The 10-bit QPA and WPSNR behavior is considered correct and should not change after this fix.

Hi Christian,

This fix resolves both the issue raised in the ticket and the one in the WPSNR metric. Now the 8-bit curve with QPA enabled is similar to its 10-bit counterpart.

Thank you very much for your interest in this issue.

Javier.

comment:5 Changed 2 years ago by XiangLi

  • Milestone set to VTM-16.1
  • Resolution set to fixed
  • Status changed from new to closed

Fixed as suggested.