Opened 9 months ago

Last modified 9 months ago

#93 new defect

Encoding results mismatch due to Hadamard SAD 8bits SIMD path

Reported by: cgisquet Owned by:
Priority: major Milestone:
Component: BMS Version: BMS-2.0.1
Keywords: Hadamard SIMD mismatch overflow Cc: patrice.onno@…, ksuehring, XiangLi, fbossen, jvet@…


During CE11, we observed occasional encoding mismatches with --InternalBitDepth=8. So far, only the RA class A sequences ParkRunning and Campfire have been tested, but our analysis suggests that any sequence could be affected.

It should affect all versions of BMS and VTM when encoding using --InternalBitDepth=8. This is an encoder-only, non-normative bug. Decoding of encoded sequences doesn't trigger checksum mismatches between encoder and decoder.

An example command line that triggers the issue:
<ENC> -c cfg/encoder_randomaccess_vtm.cfg -i Campfire_3840x2160_30fps_bt709_420_videoRange.yuv --InputBitDepth=10 --ip 32 -wdt 3840 -hgt 2160 -fr 30 --Level=5.1 -f 300 --CostMode=lossy --SEIDecodedPictureHash=1 --InputChromaFormat=420 -o out.yuv -b out.bin -q 34 --InternalBitDepth=8

An /encoding/ mismatch occurs on the first CTB row of the second encoded frame (POC 16, inter). Decoding is fine, however, since the bug only affects the encoder's mode decision.

Results (SCALAR and AVX2 paths):
POC 16 TId: 0 ( B-SLICE, QP 35 ) 694032 bits [Y 36.4960 dB U 38.3312 dB V 38.8743 dB] [ET 1366 ] [L0 0 ] [L1 0 ]
C path:
POC 16 TId: 0 ( B-SLICE, QP 35 ) 693736 bits [Y 36.4962 dB U 38.3519 dB V 38.8577 dB] [ET 1597 ] [L0 0 ] [L1 0 ]

To reduce the effort needed to reproduce the case, we think cropping the sequence to the top 3840x256 (e.g. with ffmpeg's crop filter) is enough:

POC 16 TId: 0 ( B-SLICE, QP 35 ) 70032 bits [Y 36.9338 dB U 39.2011 dB V 39.3624 dB] [ET 170 ] [L0 0 ] [L1 0 ] [MD5:41d3f4acbb4d2efc988d7a1aa232e063,236cee1e6166fdfda3f534d4c10ede64,defcbe15edc5580b7d8a33d321d7ac2a]
POC 16 TId: 0 ( B-SLICE, QP 35 ) 69888 bits [Y 36.9326 dB U 39.1721 dB V 39.3503 dB] [ET 174 ] [L0 0 ] [L1 0 ] [MD5:38ce14608907f5037d61975ad7f1845f,e36bec12d6141cc4cc374f006456a6e8,720adfb19d587ef6133481e85be34a00]
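
If ffmpeg is not at hand, the crop can also be done with a few lines of code. The following is only a minimal sketch: the function name cropTopRows420 is hypothetical, and it assumes a planar 4:2:0 frame (Y, then U, then V) with a fixed number of bytes per sample (2 for the 10-bit input used here, assuming it is stored in 16-bit containers). Because the kept rows are the topmost ones, each cropped plane is simply a prefix of the original plane.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Keeps the top `cropHeight` luma rows of one planar 4:2:0 frame.
// `frame` holds the Y plane, then U, then V; every sample occupies
// `bytesPerSample` bytes. Hypothetical helper, not part of BMS/VTM.
inline std::vector<uint8_t> cropTopRows420(const std::vector<uint8_t>& frame,
                                           int width, int height,
                                           int cropHeight, int bytesPerSample)
{
  assert(width % 2 == 0 && height % 2 == 0 && cropHeight % 2 == 0);
  assert(cropHeight > 0 && cropHeight <= height);

  const std::size_t lumaRow    = static_cast<std::size_t>(width) * bytesPerSample;
  const std::size_t chromaRow  = lumaRow / 2;               // half the width in 4:2:0
  const std::size_t lumaSize   = lumaRow * height;
  const std::size_t chromaSize = chromaRow * (height / 2);  // half the height in 4:2:0
  assert(frame.size() == lumaSize + 2 * chromaSize);

  const uint8_t* y = frame.data();
  const uint8_t* u = y + lumaSize;
  const uint8_t* v = u + chromaSize;

  std::vector<uint8_t> out;
  out.reserve(lumaRow * cropHeight + 2 * chromaRow * (cropHeight / 2));
  // The top rows of each plane are contiguous, so copy one prefix per plane.
  out.insert(out.end(), y, y + lumaRow * cropHeight);
  out.insert(out.end(), u, u + chromaRow * (cropHeight / 2));
  out.insert(out.end(), v, v + chromaRow * (cropHeight / 2));
  return out;
}
```

Applied frame by frame with width 3840, height 2160, cropHeight 256 and bytesPerSample 2, this would produce the cropped input; the encoder would then have to be run with -hgt 256 instead of -hgt 2160.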

Digging further, we traced the issue in this example to the 8-bit path of xCalcHAD16x8_SSE in RdCostX86.h.

This is likely an unforeseen overflow, since the >= 10-bit path uses larger accumulators. Forcing that path for 8-bit content as well seems to fix the mismatch for this frame.

However, we have not tested whether other paths or functions can overflow in the same way, nor whether they can be formally proven correct.
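
To illustrate the kind of overflow suspected here, below is a scalar model, not the intrinsics code from RdCostX86.h. It computes an unnormalized 16x8 Hadamard transform in 32 bits and then sums the absolute coefficients in an accumulator of configurable width. The function names, the single-accumulator layout and the test block are assumptions made for illustration only; where exactly the SIMD path wraps has not been established. With 8-bit content the residuals lie in [-255, 255], and since the 16x8 transform has a gain of 128, a single coefficient can reach 255 * 128 = 32640, which leaves only 127 of headroom below the int16 maximum of 32767.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <vector>

constexpr int kWidth  = 16;
constexpr int kHeight = 8;

// In-place fast Walsh-Hadamard transform of `n` values spaced `stride`
// elements apart; `n` must be a power of two.
inline void hadamard1D(int32_t* data, int n, int stride)
{
  for (int half = 1; half < n; half *= 2)
    for (int base = 0; base < n; base += 2 * half)
      for (int i = base; i < base + half; ++i)
      {
        const int32_t a = data[i * stride];
        const int32_t b = data[(i + half) * stride];
        data[i * stride]          = a + b;
        data[(i + half) * stride] = a - b;
      }
}

// Sum of absolute 16x8 Hadamard coefficients of a residual block. The sum is
// kept modulo 2^accBits and read back as a signed value of that width, which
// models a wrapping accumulator (hypothetical layout, see the text above).
inline int64_t sumAbsHadamard16x8(std::vector<int32_t> block, int accBits)
{
  assert(block.size() == static_cast<std::size_t>(kWidth * kHeight));
  assert(accBits == 16 || accBits == 32);

  for (int y = 0; y < kHeight; ++y) hadamard1D(&block[y * kWidth], kWidth, 1);
  for (int x = 0; x < kWidth; ++x)  hadamard1D(&block[x], kHeight, kWidth);

  const uint64_t mask = (uint64_t{1} << accBits) - 1;
  uint64_t acc = 0;
  for (int32_t c : block)
    acc = (acc + static_cast<uint64_t>(std::abs(c))) & mask;  // wrap on overflow

  // Reinterpret the wrapped value as a signed number of accBits bits.
  const uint64_t signBit = uint64_t{1} << (accBits - 1);
  return (acc & signBit) ? static_cast<int64_t>(acc) - static_cast<int64_t>(mask) - 1
                         : static_cast<int64_t>(acc);
}

// Residual block with three samples at the 8-bit maximum, all others zero.
inline std::vector<int32_t> exampleBlock()
{
  std::vector<int32_t> block(kWidth * kHeight, 0);
  block[0] = block[1] = block[2] = 255;
  return block;
}
```

For exampleBlock(), the 32-bit accumulator gives 48960, while the 16-bit one wraps to -16576, even though every individual coefficient fits comfortably in 16 bits. A cost that flips sign or shrinks like this would be enough to change a mode decision without affecting decodability, which is consistent with the behaviour reported above.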

Change history (1)

comment:1 Changed 9 months ago by cgisquet

'C path' above should obviously read 'SSE path' (e.g. for an SSE4.1-capable CPU without AVX2).
