Opened 2 years ago
Closed 2 years ago
#1547 closed defect (fixed)
sb_coded_flag non-present inference logic error
Reported by: | swarrington | Owned by: | |
---|---|---|---|
Priority: | critical | Milestone: | |
Component: | spec | Version: | VVC D10 vH |
Keywords: | Cc: | ksuehring, bbross, XiangLi, fbossen, jvet@… |
Description
p.178 definition of inferred value for sb_coded_flag[xS][yS] is as follows:
When sb_coded_flag[ xS ][ yS ] is not present, it is inferred to be equal to 1.
Notably, this is different than HEVC's definition for the inferred value of coded_sub_block_flag[xS][yS]:
When coded_sub_block_flag[ xS ][ yS ] is not present, it is inferred as follows:
– If one or more of the following conditions are true, coded_sub_block_flag[ xS ][ yS ] is inferred to be equal to 1:
– ( xS, yS ) is equal to ( 0, 0 ).
– ( xS, yS ) is equal to ( LastSignificantCoeffX >> 2, LastSignificantCoeffY >> 2 ).
– Otherwise, coded_sub_block_flag[ xS ][ yS ] is inferred to be equal to 0.
Since sb_coded_flag[xS][yS] is only signaled for subblock indices of (i<lastSubBlock && i > 0) (as per p. 81 table), this implies that sb_coded_flag for subblock indices >= lastSubBlock should be inferred to be 1. This appears to cause unintended behaviour.
An example of potential unintended behaviour for this can be seen in the following case: Regular (non-transform-skip) flow, TU size is 8x8, and lastSubBlock==2. Thus:
subblock 3 @ (1,1): no coeffs
subblock 2 @ (1,0): lastSubBlock, has coeffs
subblock 1 @ (0,1): has coeffs
subblock 0 @ (0,0): (don't care about coeffs)
sb_coded_flag will not be signaled on i==3 or i==2 as it is only signaled for i<lastSubBlock. However, sb_coded_flag will be signaled on i==1. On i==1, the process for deriving ctxInc for sb_coded_flag[0][1] is executed and will use eq.1548 on p. 433:
csbfCtx += sb_coded_flag[ xS + 1 ][ yS ]
Thus, csbfCtx for [0][1] will query value of sb_coded_flag[1][1]. But as [1][1] position corresponds to i==3 (>= lastSubBlockPos), sb_coded_flag[1][1] has not been signaled: it will follow the non-signaled inference rule and have a value of 1. So csbfCtx on [1][0] will be initialized as though there is a coded subblock in [1][1] when there actually is not. This does not seem to follow the intent of querying the flag in eq.1548.
Notably, VTM appears to use the HEVC flow for the configuration of sb_coded_flag in CABACReader::residual_coding_subblock:
// swarrington note: cctx.m_sigCoeffGroupFlag is zero before this point. bool sigGroup = ( isLast || !minSubPos ); if( !sigGroup ) { sigGroup = m_BinDecoder.decodeBin( cctx.sigGroupCtxId() ); } if( sigGroup ) { cctx.setSigGroup(); }
Because m_sigCoeffGroupFlag is initially zeroed for all positions, then is always set to 1 if either sb_coded_flag is present or this is either i==lastSubBlock or i==0. This has the effect of following the H.265 flow for coded_sub_block_flag rather than the VVC spec for sb_coded_flag. Following the VVC spec it seems this code should rather initialize all positions to 1 and then conditionally set to 0 if there is a flag present with value 0.
Because of the reasoning above I think this is a spec issue rather than a VTM issue.
Change history (7)
comment:1 Changed 2 years ago by fbossen
comment:2 Changed 2 years ago by jonathang
If I understand this correctly, the spec is not "wrong" per se, as decoding of sb_coded_flag just has different behaviour compared to how coded_sub_block_flag was decoded in HEVC. But if the analysis on VTM is correct, then VVC spec and VTM have been diverged for almost 2 years on the decoding behaviour for a very low-level syntax element. And, all our conformance bitstreams are tested on VTM.
comment:3 Changed 2 years ago by wadewan
I took a look at this with my team and we agree with the original reporter and Frank that this appears to be a mismatch between the VTM and spec.
It seems sufficiently clear that the intent in Section 9.3.4.2.6 was to derive the context for the current sb_coded_flag based on whether neighboring sub-blocks were coded or not. And any sub-block beyond lastSubBlock is clearly not coded.
The change in question seems to be a clean up of TS residual coding (JVET-N0280) that had the side effect of changing/breaking regular residual coding. While different than what was done in HEVC, we don't believe there is a significant benefit like coding efficiency or improved conformance gained by choosing the text solution compared to the VTM solution.
In addition, we believe most (perhaps all?) implementations, streams, etc. that existed prior to this bug report have based their behavior on the VTM. We believe it would be problematic and an unreasonable burden to change all those instances and products in development in order to align with the behavior described in the text. So we prefer the text be edited to match the VTM as the solution to this mismatch.
comment:4 Changed 2 years ago by iole_moccagatta
I checked the documents mentioned by Frank.
The inference rule in JVET-N0280 is aligned with VTM.
The inference rule in N1001 (Draft 5) is also aligned with VTM till v6, and is modified in v7. The modification refers to JVET-N0280 TSRC. The inference rule in N1001 v7 match the current spec.
Considering that JVET-N0280 reference rule is aligned with VTM, then its integration in N1001 should not have modified the inference rule. So, maybe something went wrong when editing N1001 v7 ?
Also, I agree with Wade on majority of implementations and streams basing their behavior on VTM, including conformance streams.
Considering that, and what seems lack of intention for this change, I too prefer the text to be edited to match the VTM.
comment:5 Changed 2 years ago by crosewarne
Some comments on suggested text fixes:
I think the necessary change is just to align the inference rule for sb_coded_flag with VTM behaviour.
Regarding additional text describing inference of transform coefficient levels, note the following line in the VVC spec:
7.4.12.11 Residual coding semantics
When sb_coded_flag[ xS ][ yS ] is equal to 0, all transform coefficient levels of the subblock at location ( xS, yS ) are inferred to be equal to 0.
But I think the above is redundant because of the following:
7.4.12.10 Transform unit semantics
The transform coefficient levels are represented by the arrays TransCoeffLevel[ x0 ][ y0 ][ cIdx ][ xC ][ yC ]. ...
When the value of TransCoeffLevel[ x0 ][ y0 ][ cIdx ][ xC ][ yC ] is not specified in clause 7.3.11.11, it is inferred to be equal to 0.
Proposed texts to align inference rules for sb_coded_flag with VTM behaviour also describe additional cases of when transform coefficient levels of a sub-block are inferred to be 0, and oppositely, when 'at least one' is nonzero.
I think the text in 7.4.12.10, along with residual_coding syntax of 7.3.11.11, already specifies transform coefficient levels.
Accordingly, I think just adjusting the text of the inference rule for sb_coded_flag is necessary.
Then, the line in 7.4.12.11 describing inference of transform coefficient levels of a sub-block as zero is redundant and could be removed. Moreover it is not necessary to further elaborate this (especially new text targeting the opposite case of 'at least one..is non-zero' but not specifying actual values).
comment:6 Changed 2 years ago by jonathang
I think crosewarne's comment above on what is necessary is correct.
However, the same redundancies are present in the HEVC specification in section 7.4.9.10 Transform unit semantics, and the semantics of coded_sub_block_flag in section 7.4.9.11
Therefore, I assume there is editorial intent.
The text I proposed in JVET-Z0250 is following the format of the original HEVC text. Whether that is still desired may be left to editor discretion.
comment:7 Changed 2 years ago by yk
- Resolution set to fixed
- Status changed from new to closed
Thanks! Fixed in JVET-Z2005-v1.
Thanks for the detailed report. This is an interesting one. Indeed, looks like the spec may not be quite right.
The draft spec used to be aligned with the VTM and HEVC. I tracked down the change of the inference rule to the integration of JVET-N0280 (residual coding for TS) into JVET-N1001 (happened at v7). Interestingly, the proposed spec text in JVET-N0280 did not include a change to the inference rule. Not sure what happened during text integration. Ben and Tung probably know better.