Opened 6 years ago
Closed 6 years ago
#94 closed defect (fixed)
Small issues and typos in inter prediction processes in VVC Draft
Reported by: | vdrugeon | Owned by: | |
---|---|---|---|
Priority: | minor | Milestone: | VVC D5 v3 |
Component: | spec | Version: | VVC D4 v7 |
Keywords: | Cc: | ksuehring, bbross, XiangLi, fbossen, jvet@… |
Description
In section 7.4.5.6 Motion vector difference semantics, the value of MotionModelIdc[ x ][ y ] is used. However, x and y have never been defined at this stage. Only x0 and y0 are available as input parameter to mvd_coding. Should the value of MotionModelIdc[ x0 ][ y0 ] be used instead?
In section 8.3.1 General decoding process for coding units coded in inter prediction mode, in step 2, predSamplesCr is mentioned twice in the list of outputs. The first instance should be predSamplesCb.
In section 8.3.1 General decoding process for coding units coded in inter prediction mode, in step 3, when the decoding process for the residual signal is called for all three components, the current text states "...and the variable cIdx as inputs,...". However, the variable cIdx is not defined in this section. Actually, the variable cIdx should be set when calling this process. So the three calls to 8.3.5 should be modified to set the variable cIdx to the appropriate value:
– The decoding process for the residual signal of coding blocks coded in inter prediction mode as specified in clause 8.3.5 is invoked with the location ( xTb0, yTb0 ) set equal to the luma location ( xCb, yCb ), the width nTbW set equal to the luma coding block width cbWidth, the height nTbH set equal to the luma coding block height cbHeight and the variable cIdx set equal to 0 as inputs, and the array resSamplesL as output.
– The decoding process for the residual signal of coding blocks coded in inter prediction mode as specified in clause 8.3.5 is invoked with the location ( xTb0, yTb0 ) set equal to the chroma location ( xCb / 2, yCb / 2 ), the width nTbW set equal to the chroma coding block width cbWidth / 2, the height nTbH set equal to the chroma coding block height cbHeight / 2 and the variable cIdx set equal to 1 as inputs, and the array resSamplesCb as output.
– The decoding process for the residual signal of coding blocks coded in inter prediction mode as specified in clause 8.3.5 is invoked with the location ( xTb0, yTb0 ) set equal to the chroma location ( xCb / 2, yCb / 2 ), the width nTbW set equal to the chroma coding block width cbWidth / 2, the height nTbH set equal to the chroma coding block height cbHeight / 2 and the variable cIdx set equal to 2 as inputs, and the array resSamplesCr as output.
Same thing for step 4 in the same section:
– The picture reconstruction process for a colour component as specified in clause 8.4.5 is invoked with the block location ( xB, yB ) set equal to ( xCb, yCb ), the block width bWidth set equal to cbWidth, the block height bHeight set equal to cbHeight, the variable cIdx set equal to 0, the (cbWidth)x(cbHeight) array predSamples set equal to predSamplesL and the (cbWidth)x(cbHeight) array resSamples set equal to resSamplesL as inputs, and the output is a modified reconstructed picture before in-loop filtering.
– The picture reconstruction process for a colour component as specified in clause 8.4.5 is invoked with the block location ( xB, yB ) set equal to ( xCb / 2, yCb / 2 ), the block width bWidth set equal to cbWidth / 2, the block height bHeight set equal to cbHeight / 2, the variable cIdx set equal to 1, the (cbWidth / 2)x(cbHeight / 2) array predSamples set equal to predSamplesCb and the (cbWidth / 2)x(cbHeight / 2) array resSamples set equal to resSamplesCb as inputs, and the output is a modified reconstructed picture before in-loop filtering.
– The picture reconstruction process for a colour component as specified in clause 8.4.5 is invoked with the block location ( xB, yB ) set equal to ( xCb / 2, yCb / 2 ), the block width bWidth set equal to cbWidth / 2, the block height bHeight set equal to cbHeight / 2, the variable cIdx set equal to 2, the (cbWidth / 2)x(cbHeight / 2) array predSamples set equal to predSamplesCb and the (cbWidth / 2)x(cbHeight / 2) array resSamples set equal to resSamplesCb as inputs, and the output is a modified reconstructed picture before in-loop filtering.
In section 8.3.2.2 Derivation process for luma motion vectors for merge mode, when calling the derivation process for merging candidates from neighbouring coding units (first process called in this process), there is a typo where "the luma coding block width" is listed twice in the inputs. In addition, in the title of section 8.3.2.3, the process is called "derivation process for spatial merging candidates". The word spatial is missing when calling the process in section 8.3.2.2.
In section 8.3.2.4 Derivation process for subblock-based temporal merging candidates, when the inputs are listed, the meaning of X could be added: "Inputs to this process are, with X being 0 or 1:"
Change history (16)
comment:1 Changed 6 years ago by vdrugeon
comment:2 Changed 6 years ago by vdrugeon
In section 8.3.2.12, I think that the process would be a bit more clear if X was also one of the inputs, since the text mentions "where X being the value of X this process is invoked for". Currently it is unclear how this process knows the value of X it is invoked for.
In addition, when this process is called from process 8.3.2.4 (Derivation process for subblock-based temporal merging candidates), it is called twice: once for L0 and once for L1. Therefore, it seems that this process is intended to be called for one particular value of X only, contrary to many other processes in section 8.3 that are called with X being equal to both 0 and 1.
However, process 8.3.2.12 is also called from process 8.3.2.11 (Derivation process for temporal luma motion vector prediction), but only once for LX. This may not be an issue as long as process 8.3.2.11 is also called with X being one of the inputs. It would be consistent with section 8.3.2.2 (Derivation process for luma motion vectors for merge mode) that calls process 8.3.2.11 also twice: once for L0 and once for L1.
Generally, I think that it would be beneficial to clarify in which processes X is expected to take both values 0 and 1, and in which processes X is expecting to be equal to just one of these two values.
comment:3 Changed 6 years ago by vdrugeon
In section 8.3.3.5 Derivation process for neighbouring affine blocks partition information, the third neighbouring luma location is set as:
( xN2, yN2 ) = ( xCb + cbWidth − 1, yCb )
However, this is not a neighbouring location, this is inside the block. Should it be the following instead?
( xN2, yN2 ) = ( xCb + cbWidth, yCb − 1 )
comment:4 Changed 6 years ago by vdrugeon
At the end of section 8.3.3.4 Derivation process for luma affine control point motion vector predictors, it is written:
The affine control point motion vector predictor cpMvpLX with X being 0 is derived as follows:
cpMvpLX = cpMvpListLX[ mvp_lX_flag[ xCb ][ yCb ] ]
I think that it should be "with X being 0 or 1" otherwise cpMvpL1 is never derived.
In section 8.3.3.6 Derivation process for constructed affine control point motion vector prediction candidates, when deriving availableConsFlagLX, I think that one condition may be missing. Please check the following:
There are three cases that should be considered, where availableConsFlagLX should be set to 1:
- when availableFlagLX[ 0 ] = availableFlagLX[ 1 ] = availableFlagLX[ 2 ] = 1: this is covered
- when availableFlagLX[ 0 ] = availableFlagLX[ 1 ] = 1: this may be only partly covered
- when availableFlagLX[ 0 ] = availableFlagLX[ 2 ] = 1: this is covered
In the second case, the text only mentions what should happen when availableFlagLX[ 0 ] = availableFlagLX[ 1 ] = 1 AND MotionModelIdc[ xCb ][ yCb ] = 2. In which case, the cpMvLX[2] is derived and availableConsFlagLX is set to 1.
However, based on the source code in PU::fillAffineMvpCand, I believe that availableConsFlagLX should also be set to 1 when availableFlagLX[ 0 ] = availableFlagLX[ 1 ] = 1 and MotionModelIdc[ xCb ][ yCb ] = 1. The only difference to above is that cpMvLX[2] is not needed in that case and does not need to be derived.
Could you please check if my understanding is correct or not? If it is correct, then I believe that the following condition should be added in the text after the bullet checking availableFlagLX[ 0 ] = availableFlagLX[ 1 ] = 1 AND MotionModelIdc[ xCb ][ yCb ] = 2:
"– Otherwise, if availableFlagLX[ 0 ] is equal to 1, and availableFlagLX[ 1 ] is equal to 1, and MotionModelIdc[ xCb ][ yCb ] is equal to 1, availableConsFlagLX is set equal to 1"
comment:5 Changed 6 years ago by vdrugeon
In the inputs of section 8.3.4.1 General decoding process for inter blocks, the chroma motion vectors mvCL0 and mvCL1 should also be listed as an array:
"– the chroma motion vectors mvCL0[ xSbIdx ][ ySbIdx ] and mvCL1[ xSbIdx ][ ySbIdx ] with xSbIdx = 0 .. numSbX − 1, and ySbIdx = 0 .. numSbY − 1,"
In section 8.3.1 General decoding process for coding units coded in inter prediction mode, if MotionModelIdc[ xCb ][ yCb ] is equal to 0, the prediction list utilization flags predFlagL0[ xSbIdx ][ ySbIdx ] and predFlagL1[ xSbIdx ][ ySbIdx ] come out of process 8.3.2.1 as an array. However, if MotionModelIdc[ xCb ][ yCb ] is equal to 1 or 2, the prediction list utilization flags predFlagL0 and predFlagL1 come out of process 8.3.3.1 as simply two flags.
The issue is that process 8.3.4.1 that follows uses two simple flags predFlagL0 and predFlagL1 as inputs. It is currently unclear from the text which value of the arrays predFlagL0[ xSbIdx ][ ySbIdx ] and predFlagL1[ xSbIdx ][ ySbIdx ] should be used as inputs when MotionModelIdc[ xCb ][ yCb ] is equal to 0. If the whole array should be used as inputs instead of two simple flags, the text of section 8.3.4.1 needs to be adapted. In addition, it should be clarified how to build the arrays from the two simple flags when MotionModelIdc[ xCb ][ yCb ] is equal to 1 or 2.
In section 8.3.4.1 General decoding process for inter blocks, "The width and the height of the current luma coding sublock subCbWidth, subCbHeight are derived as follows:"
However, the following equations derive sbWidth and sbHeight, which are also the names of the variables used in the rest of the text. So I think that the sentence should be: "The width and the height of the current luma coding sublock sbWidth and sbHeight are derived as follows:"
In section 8.3.4.1 General decoding process for inter blocks, the fractional sample interpolation process specified in clause 8.3.4.3 is invoked with the motion vectors mvLX[ xSb ][ ySb ], mvCLX[ xSb ][ ySb ] as inputs. However, the arrays mvLX and mvCLX are indexed by subblocks and not by pixels. So the inputs should be mvLX[ xSbIdx ][ ySbIdx ] and mvCLX[ xSbIdx ][ ySbIdx ]. In addition, should it be process 8.3.4.3.1 that is invoked instead?
In section 8.3.4.1 General decoding process for inter blocks, should the weighted sample prediction process specified in clause 8.3.4.4 for chroma be invoked on a sub-block basis, same as for luma? Currently, the block width and height are used as input instead of the subblock width and height. Should it be the following instead?
"...with the coding subblock width nCbW set equal to sbWidth / 2, the coding subblock height nCbH set equal to sbHeight / 2, ..."
In section 8.3.5 Decoding process for the residual signal of coding blocks coded in inter prediction mode, when the size of the block is greater than the maximum transform block size, the process calls itself with "the intra prediction mode predModeIntra" as input. This is probably a copy-paste from the intra section and predModeIntra needs to be removed from the list of inputs.
comment:6 Changed 6 years ago by bbross
- Milestone set to VVC D3 v2
- Version changed from VVC D2 v7 to VVC D3 v1
comment:7 Changed 6 years ago by bbross
- Version changed from VVC D3 v1 to VVC D3 v6
comment:8 Changed 6 years ago by bbross
- Version changed from VVC D3 v6 to VVC D3 v7
comment:9 Changed 6 years ago by bbross
- Version changed from VVC D3 v7 to VVC D3 v8
comment:10 Changed 6 years ago by bbross
- Version changed from VVC D3 v8 to VVC D4 v2
comment:11 Changed 6 years ago by bbross
- Version changed from VVC D4 v2 to VVC D4 v3
comment:12 Changed 6 years ago by bbross
- Version changed from VVC D4 v3 to VVC D4 v4
comment:13 Changed 6 years ago by bbross
- Version changed from VVC D4 v4 to VVC D4 v6
comment:14 Changed 6 years ago by bbross
- Version changed from VVC D4 v6 to VVC D4 v7
comment:15 Changed 6 years ago by bbross
- Milestone changed from VVC D3 v2 to VVC D5 v3
comment:16 Changed 6 years ago by bbross
- Resolution set to fixed
- Status changed from new to closed
Fixed in D5 v3
In section 8.3.2.2, step 10, the first bullet point should mention "and X being replaced by 0 or 1" (as does the second bullet point).
In section 7.4.3.1 SPS semantics, for the semantics of log2_sbtmvp_default_size_minus2, it is written: "log2_sbtmvp_default_size_minus2 specifies the inferred value of the syntax element log2_sbtmvp_active_size_minus2 in the slice headers of slices with slice_type not equal to I in the CVS when slice_sbtmvp_size_override_flag is equal to 0."
However, slice_sbtmvp_size_override_flag is not defined anywhere. It probably refers to sbtmvp_size_override_flag actually defined in the slice header.
In section 8.3.2.4:
"The availability flag availableFlagSbCol is derived as follows.
– If either slice_temporal_mvp_enable_flag or sps_sbtmvp_flag is equal to 0, availableFlagSbCol is set equal to 0."
The names of the flags are wrong. The first flag misses a "d" at the end of "enabled". The second flag is called "sps_sbtmvp_enabled_flag" in the SPS syntax.
The function DiffPicOrderCnt is used throughout the document, but never defined.
Section 8.3.2.5 is rather difficult to follow (is it because the variable names are not consistent?):
At the beginning, tempMv[0] is set twice. Should the second equation set tempMv[1] instead?
One of the outputs is listed as tempMV, but only tempMv is defined in the rest of the process. Do I understand correctly that these two are actually the same thing?
A bit further down, the text states that a variable mvTemp is derived, but the actual text derives another variable mvTemporal, which is never initialised (it may never be set) and is also never used in the rest of the process. Should it be understood as actually being the same as the output variable tempMV or tempMv?
If not, the equations xColCb and yColCb do not need to use tempMv, because this variable is always equal to 0.
ctrRefIdxL0 and ctrRefIdxL1 are never derived. However, new undefined variables centerRefIdxL0 and centerRefIdxL1 are used as input to process 8.3.2.12. I assume that ctrRefIdxLX = centerRefIdxLX? In that case, the variables need to use the same names and they need to be derived somewhere before calling process 8.3.2.12.
The variable names are also wrong in the output of 8.3.2.12. I assume the outputs should be assigned to ctrMvL0, ctrMvL1, ctrPredFlagL0 and ctrPredFlagL1?
If colPredMode[xColCb][yColCb] is not equal to MODE_INTER, the variables ctrMvL0, ctrMvL1, ctrRefIdxL0 and ctrRefIdxL1 are never derived.
The process 8.3.2.12 is called "Derivation process for collocated motion vectors" and not "derivation process for temporal motion vector prediction". The derivation process for temporal motion vector prediction is in section 8.3.2.11 and has completely different inputs.
In section 8.3.2.12, I think that the process would be a bit more clear if X was also one of the inputs, since the text mentions "where X being the value of X this process is invoked for". Currently it is unclear how this process knows the value of X it is invoked for.