#1542 closed defect (fixed)

QPA algorithm does not work properly when internal bitdepth is 8

Reported by:	JaviCru	Owned by:
Priority:	minor	Milestone:	VTM-16.1
Component:	VTM	Version:	VTM-15.0
Keywords:	QPA, internal-bitdepth	Cc:	ksuehring, XiangLi, fbossen, jvet@…

Description

When the QPA algorithm (perceptually optimised QP adaptation, -qpa 1 --SliceChromaQPOffsetPeriodicity=1) is enabled and the internal bit depth is left at the default set in the main .cfg files (--internal-bitdepth=10), the result is as expected: a considerable rate saving and an improvement in perceptual quality.

However, if we force the internal bit depth to 8 (--internal-bitdepth=8), the bitrate increases considerably compared to encoding without QPA.

As an example, I have encoded frames 1, 101, 201, 301, 401 and 501 of the BQTerrace sequence in All Intra mode, with base QPs 22, 27, 32, 37 and 42. Here are the average results:

InternalBitDepth=8, QPA off

base QP	Mbps	MS-SSIM
22	2.659	0.996
27	0.989	0.986
32	0.503	0.979
37	0.268	0.966
42	0.138	0.942

InternalBitDepth=10, QPA off

base QP	Mbps	MS-SSIM
22	2.724	0.996
27	1.021	0.987
32	0.517	0.979
37	0.275	0.967
42	0.142	0.944

InternalBitDepth=8, QPA on

base QP	Mbps	MS-SSIM
22	3.388	0.998
27	1.561	0.992
32	0.601	0.983
37	0.309	0.972
42	0.167	0.952

InternalBitDepth=10, QPA on

base QP	Mbps	MS-SSIM
22	2.454	0.996
27	0.820	0.986
32	0.408	0.977
37	0.210	0.963
42	0.115	0.935

As can be seen, at a base QP of 22 with QPA enabled, the 8-bit encode needs almost 1 Mbps more than the 10-bit encode (3.388 vs. 2.454 Mbps), whereas the two would be expected to be very similar.

I attach the RD curves corresponding to the tables, where the rate increase is clearly visible.
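
To put numbers on the comparison, the per-QP bitrate change caused by enabling QPA can be computed directly from the four tables above. The following is a minimal Python sketch; the variable and function names are illustrative and not part of VTM.

```python
# Average bitrates in Mbps, copied from the four tables above (base QP 22..42).
base_qps = [22, 27, 32, 37, 42]
mbps = {
    (8, "off"): [2.659, 0.989, 0.503, 0.268, 0.138],
    (10, "off"): [2.724, 1.021, 0.517, 0.275, 0.142],
    (8, "on"): [3.388, 1.561, 0.601, 0.309, 0.167],
    (10, "on"): [2.454, 0.820, 0.408, 0.210, 0.115],
}


def qpa_rate_change_percent(bit_depth):
    """Relative bitrate change (in %) from QPA off to QPA on, per base QP."""
    off = mbps[(bit_depth, "off")]
    on = mbps[(bit_depth, "on")]
    return [100.0 * (r_on - r_off) / r_off for r_on, r_off in zip(on, off)]


change_8bit = qpa_rate_change_percent(8)
change_10bit = qpa_rate_change_percent(10)

for qp, c8, c10 in zip(base_qps, change_8bit, change_10bit):
    print(f"QP {qp}: 8-bit {c8:+.1f} %, 10-bit {c10:+.1f} %")
```

With these figures, QPA raises the bitrate at every base QP for 8-bit (about +27 % at QP 22) and lowers it at every base QP for 10-bit (about -10 % at QP 22).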

I suspect that the problem, if any, may be in the calculation of picture or block energy (activity), or in the algorithm that modifies the Lagrangian during the rate control.

Attachments (1)

RD_QPA_bitdepth.png (216.6 KB) - added by JaviCru 4 years ago.: RD curves


Change history (6)

Changed 4 years ago by JaviCru

Attachment RD_QPA_bitdepth.png added

RD curves

comment:1 Changed 4 years ago by crhelmrich

Hi,

if I remember correctly (it's been a while), this is a feature, not a bug. When run at an internal bit-depth of 8, the WPSNR based perceptual QPA assumes the output bit-depth will also be 8. Since using bit-depth 8 instead of 10 leads, on some content, to more banding artifacts at the same rate-quality level (as specified by the base QP), I tried to reflect this in the WPSNR measure. This, in turn, then leads to slightly more bits being spent when the respective QPA is being enabled, in order to counteract the introduction of more visible banding (and possibly other) artifacts. Note that (MS-)SSIM does not consider such aspects, so they may not be reflected in the VQA measurements when comparing 8 and 10 bit, as you do.

I think you can balance out this difference to 10-bit simply by increasing the base QP by 1 or 2 when using --internal-bitdepth=8, without any visual side effects.

Best,

Christian
--
Christian Helmrich
Fraunhofer HHI

comment:2 Changed 4 years ago by JaviCru

Hello Christian.

After reading your reply, I have started to study the implementation of the WPSNR metric, focusing for now only on its use to compare two frames, in the same way that other metrics such as PSNR, SSIM, VMAF, etc. are used.

I have exported the calculation of the metric from the reference software VTM 16.0 (functions "xCalculateAddPSNR", "xFindDistortionPlane" and "calcWeightedSquaredError" from EncGOP.cpp), and I have performed the following experiment: comparison of metrics in three scenarios.

Scenario 1: original (BQTerrace frame 100) and reconstructed frames at 8-bit depth.

Scenario 2: original and reconstructed frames at 10-bit depth. Both frames of this scenario were obtained by converting the frames of scenario 1 (x*4 or x<<2).

Scenario 3: original and reconstructed frames at 12-bit depth, obtained in the same way (x*16 or x<<4).

By doing this, I want to compare how the bit depth value affects the different internal variables until the final value of the metric is reached.

These are the results obtained. The metrics other than WPSNR have been obtained from the VMAF software.

BD	QP	PSNR_Y	PSNR_Cb	PSNR_Cr	MS-SSIM	VMAF	WPSNR_Y	WPSNR_Cb	WPSNR_Cr
8	22	44.63996	44.840436	46.071737	0.998258	96.387476	42.4463	38.3298	38.7811
10	22	44.665469	44.865945	46.097246	0.998258	96.387476	45.4566	41.3401	41.7914
12	22	44.671835	44.872311	46.103612	0.998258	96.387476	48.4669	44.3504	44.8017
8	32	35.962792	41.132409	43.402503	0.98733	91.37576	33.9514	34.7698	36.2343
10	32	35.988302	41.157918	43.428012	0.98733	91.37576	36.9617	37.7801	39.2446
12	32	35.994667	41.164284	43.434377	0.98733	91.37576	39.9720	40.7904	42.2549
8	42	31.370339	38.109276	40.358537	0.959213	73.714746	29.608	31.8346	33.3096
10	42	31.395849	38.134785	40.384046	0.959213	73.714746	32.6183	34.8449	36.3199
12	42	31.402214	38.141151	40.390412	0.959213	73.714746	35.6286	37.8552	39.3302

As can be seen, the WPSNR values differ considerably across bit depths, unlike the other metrics. Specifically, the difference is 3.0103 dB per bit-depth step: the 8-bit value is 3.0103 dB below the 10-bit value, which in turn is 3.0103 dB below the 12-bit value. Or, in other words, a difference of

After seeing these results, I debugged the case where QP=32 (WPSNR_Y_8bit = 33.9514, WPSNR_Y_10bit = 36.9617 and WPSNR_Y_12bit = 39.9720). The table below shows the values of the most important intermediate variables in the WPSNR calculation, as returned by the "calcWeightedSquaredError" and "xFindDistortionPlane" functions, together with the ratios between the different bit-depth scenarios.

	8-bit	10-bit	12-bit	(10-bit/8-bit)	(12-bit/8-bit)
wmse	561135.6746	2244542.6986	8978170.7943	4	16
sumAct	8413.0185	33652.0739	134608.2957	4	16
uiTotalDiff	51468771	411750171	3294001372	8	64

The calculation of the WPSNR value is performed as follows (using Matlab language):

  uiSSDtemp = uiTotalDiff;

  maxval = uint32(bitshift(255, bitDepth - 8)); % 255 for 8-bit, 1020 for 10-bit,
                                                % 4080 for 12-bit
  pic_size = uint32(width * height);
  fRefValue = double(maxval) * double(maxval) * double(pic_size);
 
  if uiSSDtemp == 0
      WPSNR = 999.99;
  else
      WPSNR = 10.0 * log10(fRefValue / double(uiSSDtemp));
  end
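
The 3.0103 dB step can be reproduced from the uiTotalDiff values in the table above. The following Python sketch mirrors the Matlab snippet; the function name wpsnr and the picture size are assumptions made for illustration (the picture size cancels out when two bit depths are compared, so its exact value does not affect the result).

```python
import math

# uiTotalDiff per bit depth for QP = 32, copied from the table above.
total_diff = {8: 51468771, 10: 411750171, 12: 3294001372}

# Assumed luma sample count; it cancels in the differences computed below.
PIC_SIZE = 1920 * 1080


def wpsnr(ssd, bit_depth, pic_size):
    """10*log10(maxval^2 * pic_size / ssd), as in the Matlab snippet."""
    if ssd == 0:
        return 999.99
    maxval = 255 << (bit_depth - 8)  # 255, 1020, 4080 for 8, 10, 12 bit
    return 10.0 * math.log10(maxval * maxval * pic_size / ssd)


step_8_to_10 = wpsnr(total_diff[10], 10, PIC_SIZE) - wpsnr(total_diff[8], 8, PIC_SIZE)
step_10_to_12 = wpsnr(total_diff[12], 12, PIC_SIZE) - wpsnr(total_diff[10], 10, PIC_SIZE)

print(f"8 -> 10 bit: {step_8_to_10:+.4f} dB")
print(f"10 -> 12 bit: {step_10_to_12:+.4f} dB")
```

Going from 8 to 10 bits multiplies maxval squared by 16 but uiTotalDiff only by 8, so the ratio inside the logarithm doubles, and 10*log10(2) is approximately 3.0103 dB.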

Therefore, I think a correction factor should be included in the final formula to eliminate the 3 dB increase per bit-depth step, and thus obtain the same (or a very similar) WPSNR value at every bit depth, as is the case with the other metrics. Now the question is: which WPSNR value is the "correct" one? 33.9514, 36.9617, 39.9720, or something else?

PS: if this is a bug, it may also affect the modification of the Lagrangian, which is what my initial question was about.


comment:3 follow-up: ↓ 4 Changed 4 years ago by crhelmrich

Thanks for the detailed analysis! My apologies, you are right. It seems that, over the last few years, I forgot to commit to the VTM codebase a fix which I mentioned in Sec. 5 of our ITU paper about XPSNR [1], an improved variant of the WPSNR method. The following two assignments are wrong in VTM for 8-bit input:

In EncGOP.cpp, around line 4561:

  // integer weighted distortion
  sumAct = 16.0 * sqrt ((3840.0 * 2160.0) / double((W << chromaShiftHor) * (H << chromaShiftVer))) * double(1 << BD);

In EncSlice.cpp, around line 189:

{
  const double hpEnerPic = 16.0 * sqrt ((3840.0 * 2160.0) / double(picOrig.width * picOrig.height)) * double(1 << uBitDepth);

The correct usage of the bit-depth (BD, uBitDepth) parameter is demonstrated in our XPSNR assessment plug-in for FFmpeg, see line 343 of file vf_xpsnr.c at https://github.com/fraunhoferhhi/xpsnr

Could you please try

"double(1 << (2 * BD - 10))" instead of "double(1 << BD)" in EncGOP.cpp and, likewise,
"double(1 << (2 * uBitDepth - 10))" instead of "double(1 << uBitDepth)" in EncSlice.cpp

and let me know if this fixes the issue? I will then create a corresponding VTM merge request. The 10-bit results are considered correct and should not change after this fix.
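
The effect of the proposed change on the bit-depth factor can be illustrated with a short Python sketch. It mirrors the two VTM expressions quoted above; the function name hp_ener_pic_scale and the 1920x1080 resolution are assumptions made for illustration only.

```python
import math


def hp_ener_pic_scale(width, height, bit_depth, fixed):
    """16 * sqrt(4K area / picture area) * 2^shift, as in the quoted VTM lines.

    fixed=False uses the original factor 1 << BD,
    fixed=True uses the proposed factor 1 << (2 * BD - 10).
    """
    shift = (2 * bit_depth - 10) if fixed else bit_depth
    return 16.0 * math.sqrt((3840.0 * 2160.0) / (width * height)) * float(1 << shift)


W, H = 1920, 1080  # assumed picture size, for illustration only

ratios = {}
for bd in (8, 10, 12):
    old = hp_ener_pic_scale(W, H, bd, fixed=False)
    new = hp_ener_pic_scale(W, H, bd, fixed=True)
    ratios[bd] = old / new
    print(f"BD={bd}: old={old:.0f}, new={new:.0f}, old/new={ratios[bd]}")
```

At BD = 10 both expressions give the same value, which is consistent with the statement that the 10-bit results should not change. At BD = 8 the original expression is 4 times larger than the proposed one, and at BD = 12 it is 4 times smaller.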

Thanks and best,

Christian
--
Christian Helmrich
Fraunhofer HHI

[1] C. Helmrich, S. Bosse, H. Schwarz, D. Marpe, and T. Wiegand, «A Study of the Extended Perceptually Weighted Peak Signal-to-Noise Ratio (XPSNR) for Video Compression with Different Resolutions and Bit Depths», ITU Journal: ICT Discoveries – Special Issue: The Future of Video and Immersive Media, vol. 3, no. 1, online, May 2020. Open access: http://handle.itu.int/11.1002/pub/8153d78b-en


comment:4 in reply to: ↑ 3 Changed 4 years ago by JaviCru

> let me know if this fixes the issue? I will then undraft the merge request. The 10-bit QPA and WPSNR behavior is considered correct and should not change after this fix.

Hi Christian,

This change fixes both the issue raised in this ticket and the bit-depth dependence of the WPSNR metric. The 8-bit curve with QPA enabled is now similar to its 10-bit counterpart.

Thank you very much for your interest in this issue.

Javier.

comment:5 Changed 4 years ago by XiangLi

Milestone set to VTM-16.1
Resolution set to fixed
Status changed from new to closed

Fixed as suggested.
