SJ 20770-2000 8 kbit/s speech coding using conjugate structure algebraic code excited linear prediction

Basic Information

Standard ID: SJ 20770-2000

Standard Name: 8 kbit/s speech coding using conjugate structure algebraic code excited linear prediction

Chinese Name: 使用共轭结构代数码激励线性预测的8kbit/s 语音编码

Standard category: Electronic Industry Standard (SJ)

Status: in force

Date of release: 2000-10-20

Date of implementation: 2000-10-20

Standard Classification Number

Standard classification number: L5895

Associated Standards

Adoption of international standards: ITU-T G.729-1996, IDT; ITU-T G.729 Annex A-1996, IDT

Publication Information

Publishing house: Industrial Electronics Press

Publication date: 2000-10-20

Other Information

Drafters: Hu Yi, Dai Shuang, Liu Hai, Cheng Chan, Du Mingyu, Wu Yaruo

Drafting unit: The 30th Research Institute of the Ministry of Information Industry

Focal point unit: China Electronics Standardization Institute

Proposing unit: Ministry of Information Industry of the People's Republic of China

Publishing department: Ministry of Information Industry of the People's Republic of China


Introduction to the standard:

This standard specifies the 8 kbit/s speech coding algorithm using conjugate-structure algebraic-code-excited linear prediction (CS-ACELP).

Excerpt from the standard:

Military Standard FL5895 of the Electronic Industry of the People's Republic of China
SJ 20770—2000
Coding of speech at 8 kbit/s using conjugate-structure algebraic-code-excited linear prediction (CS-ACELP)
Published on October 20, 2000
Implementation on October 20, 2000
Approved by the Ministry of Information Industry of the People's Republic of China
Foreword
This standard adopts ITU-T G.729 (March 1996) "Coding of speech at 8 kbit/s using conjugate-structure algebraic-code-excited linear prediction (CS-ACELP)" and ITU-T G.729 Annex A (November 1996) "Reduced complexity 8 kbit/s CS-ACELP speech codec". ITU-T G.729 is the full version of the 8 kbit/s CS-ACELP speech codec, and Annex A to G.729 is a reduced-complexity version of it. The two versions are interoperable at the bit-stream level: an encoder of either version can communicate with a decoder of the other. Because Annex A mainly describes the parts of G.729 that were changed to reduce the algorithmic complexity of the codec, the corresponding clauses of the full G.729 must still be consulted for the unchanged parts. The content of G.729 Annex A is therefore included in Appendix A (supplementary) as an integral part of this standard. This standard contains the following:
Chapter 1: Overview;
Chapter 2: Encoder overview;
Chapter 3: Encoder functional description;
Chapter 4: Decoder functional description;
Chapter 5: Bit-accurate description of the CS-ACELP coder;
Appendix A: Reduced complexity 8 kbit/s CS-ACELP speech codec (supplementary).
Electronic Industry Military Standard of the People's Republic of China
Detailed specification for coding of speech at 8 kbit/s using conjugate-structure algebraic-code-excited linear prediction (CS-ACELP)
1 Overview
This standard specifies an algorithm for coding speech signals at 8 kbit/s using conjugate-structure algebraic-code-excited linear prediction (CS-ACELP).
The coder is designed to operate on a digital signal. The analog input signal is first band-limited with telephone-bandwidth filtering (ITU-T Rec. G.712), sampled at 8000 Hz, and converted to 16-bit linear PCM before being input to the encoder. The decoder output is converted back to an analog signal by the inverse process. Other input/output formats, such as the 64 kbit/s PCM data specified in ITU-T Rec. G.711, shall be converted to 16-bit linear PCM before encoding, and from 16-bit linear PCM to the appropriate format after decoding. This standard defines the bit stream from the encoder to the decoder.
The structure of this standard is as follows: Chapter 2 gives a general description of the CS-ACELP algorithm; Chapters 3 and 4 describe the principles of the CS-ACELP encoder and decoder, respectively; Chapter 5 describes the software that defines the coder in 16-bit fixed-point arithmetic.
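As an illustration of the input conversion mentioned above, the following C sketch expands one G.711 mu-law byte to a linear PCM sample. It follows the bit manipulation found in widely circulated public-domain G.711 code rather than any procedure defined in this standard; the function name `ulaw_to_linear` is hypothetical, A-law conversion is not shown, and the result is on the 16-bit scale (about -32124 to 32124).

```c
#include <stdint.h>

/* Hypothetical helper: expand one G.711 mu-law byte to a linear PCM sample
 * on the 16-bit scale (range -32124 .. 32124). */
int16_t ulaw_to_linear(uint8_t code)
{
    const int bias = 0x84;                        /* 132, bias added before mu-law compression */
    code = (uint8_t)~code;                        /* mu-law code words are stored bit-inverted */
    int magnitude = ((code & 0x0F) << 3) + bias;  /* 4-bit mantissa plus bias */
    magnitude <<= (code & 0x70) >> 4;             /* 3-bit segment number acts as a left shift */
    return (int16_t)((code & 0x80) ? (bias - magnitude) : (magnitude - bias));
}
```

For example, the byte 0xFF decodes to 0 and 0x80 decodes to the largest positive value, 32124.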
2 Overview of the encoder
The CS-ACELP coder is based on the code-excited linear prediction (CELP) coding model. It operates on speech frames of 10 ms, each corresponding to 80 samples at a sampling rate of 8000 samples per second. For every 10 ms frame, the speech signal is analysed to extract the parameters of the CELP model (LP filter coefficients, adaptive- and fixed-codebook indices and gains). These parameters are encoded and transmitted. The bit allocation of the coder parameters is shown in Table 1.
Table 1 Bit allocation of the 8 kbit/s CS-ACELP algorithm (10 ms frame)
Parameter                   Codeword         1st subframe   2nd subframe   Bits/frame
Line spectral pair (LSP)    L0, L1, L2, L3                                 18
Adaptive-codebook delay     P1, P2           8              5              13
Pitch-delay parity          P0               1                             1
Fixed-codebook index        C1, C2           13             13             26
Fixed-codebook sign         S1, S2           4              4              8
Codebook gain (stage 1)     GA1, GA2         3              3              6
Codebook gain (stage 2)     GB1, GB2         4              4              8
Total                                                                      80
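The frame budget can be checked with a small C sketch. The per-field bit counts are those of ITU-T G.729 Table 1, which this standard adopts identically; the array and function names are hypothetical. Eighty bits every 10 ms gives the nominal 8 kbit/s rate.

```c
#include <stddef.h>

/* Hypothetical table of the per-frame bit allocation (codeword group, bits per 10 ms frame). */
struct field { const char *codewords; int bits; };

static const struct field frame_fields[] = {
    { "L0, L1, L2, L3", 18 },  /* LSP, predictive two-stage VQ */
    { "P1, P2",         13 },  /* adaptive-codebook delay: 8 + 5 */
    { "P0",              1 },  /* pitch-delay parity */
    { "C1, C2",         26 },  /* fixed-codebook index: 13 + 13 */
    { "S1, S2",          8 },  /* fixed-codebook sign: 4 + 4 */
    { "GA1, GA2",        6 },  /* codebook gain, stage 1: 3 + 3 */
    { "GB1, GB2",        8 },  /* codebook gain, stage 2: 4 + 4 */
};

/* Sum the bits of all fields in one frame. */
int bits_per_frame(void)
{
    int total = 0;
    for (size_t i = 0; i < sizeof frame_fields / sizeof frame_fields[0]; i++)
        total += frame_fields[i].bits;
    return total;
}

/* 100 frames per second (10 ms each) gives the bit rate in bit/s. */
int bit_rate_bps(void)
{
    return bits_per_frame() * 100;
}
```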
In the decoder, these parameters are used to retrieve the excitation and synthesis-filter parameters. The speech is reconstructed by passing the excitation through the long-term (pitch) synthesis filter and then through the short-term (LP) synthesis filter; the output speech is finally obtained after postfiltering and enhancement. The block diagram of the decoder synthesis model is shown in Figure 1.
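A minimal floating-point sketch of this synthesis path for one 40-sample subframe follows. It assumes an integer pitch delay (the standard uses fractional delays with 1/3-sample resolution and interpolation), omits postfiltering, and is not the bit-exact fixed-point code of Chapter 5; all names are hypothetical. The adaptive codebook realizes the long-term synthesis filter by reading the past excitation at the pitch delay, and the LP synthesis filter 1/Â(z) = 1/(1 + sum of â_i z^-i) then shapes the result.

```c
#define L_SUBFR  40  /* subframe length in samples */
#define LP_ORDER 10  /* order of the LP synthesis filter */

/* exc: points into a buffer holding at least 'delay' past excitation samples before exc[0];
 * delay >= 1 (integer pitch delay only, an assumption of this sketch).
 * code: fixed-codebook vector c(n); gp, gc: adaptive- and fixed-codebook gains.
 * a: LP coefficients with a[0] == 1; mem: last LP_ORDER outputs, mem[0] newest. */
void celp_synth_subframe(double *exc, int delay, const double *code,
                         double gp, double gc, const double a[LP_ORDER + 1],
                         double mem[LP_ORDER], double *out)
{
    for (int n = 0; n < L_SUBFR; n++) {
        /* Long-term part: adaptive-codebook vector = past excitation at the pitch delay.
         * For delay < 40 this reuses samples produced earlier in this subframe. */
        double v = exc[n - delay];
        exc[n] = gp * v + gc * code[n];          /* u(n) = gp v(n) + gc c(n) */

        /* Short-term part: s(n) = u(n) - sum_{i=1..10} a_i s(n-i) */
        double s = exc[n];
        for (int i = 1; i <= LP_ORDER; i++)
            s -= a[i] * mem[i - 1];
        for (int i = LP_ORDER - 1; i > 0; i--)  /* shift the filter memory */
            mem[i] = mem[i - 1];
        mem[0] = s;
        out[n] = s;
    }
}
```

The excitation is written back into the buffer so that the next subframe's adaptive codebook can read it.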
Figure 1 Block diagram of the CELP synthesis model (received bit stream → parameter decoding; excitation codebook → long-term synthesis filter → short-term synthesis filter → post-filter → output speech)
2.1 Encoder
The encoding principle is shown in Figure 2. The input signal is high-pass filtered and scaled in the preprocessing block, and the preprocessed signal serves as the input for all subsequent analysis. LP analysis is performed once per 10 ms frame to compute the LP filter coefficients. These coefficients are converted to line spectral pairs (LSPs) and quantized using predictive two-stage vector quantization (VQ) with 18 bits. The excitation signal is chosen with an analysis-by-synthesis search procedure, in which the error between the original and reconstructed speech is minimized according to a perceptually weighted distortion measure; that is, the error signal is filtered with a perceptual weighting filter whose coefficients are derived from the unquantized LP filter coefficients. The amount of perceptual weighting is made adaptive to improve the performance for input signals with a flat frequency response.
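The perceptual weighting filter is commonly written as W(z) = A(z/γ1)/A(z/γ2), where replacing z by z/γ multiplies each coefficient a_i by γ^i. The C sketch below assumes that form (taken from ITU-T G.729; the adaptation of γ1 and γ2 is not shown), works on floating-point data with a zero initial filter state, and uses hypothetical function names.

```c
#define LP_ORDER 10  /* order of the LP filter A(z) */

/* Bandwidth expansion: coefficients of A(z/gamma), out[i] = a[i] * gamma^i. */
void weight_lpc(const double a[LP_ORDER + 1], double gamma, double out[LP_ORDER + 1])
{
    double g = 1.0;
    for (int i = 0; i <= LP_ORDER; i++) {
        out[i] = a[i] * g;
        g *= gamma;
    }
}

/* Filter x through W(z) = A(z/g1) / A(z/g2); assumes a[0] == 1 and zero initial state. */
void perceptual_weight(const double a[LP_ORDER + 1], double g1, double g2,
                       const double *x, double *y, int len)
{
    double num[LP_ORDER + 1], den[LP_ORDER + 1];
    weight_lpc(a, g1, num);
    weight_lpc(a, g2, den);
    for (int n = 0; n < len; n++) {
        double acc = 0.0;
        for (int i = 0; i <= LP_ORDER && i <= n; i++)  /* zeros: A(z/g1) */
            acc += num[i] * x[n - i];
        for (int i = 1; i <= LP_ORDER && i <= n; i++)  /* poles: 1/A(z/g2), den[0] == 1 */
            acc -= den[i] * y[n - i];
        y[n] = acc;
    }
}
```

With γ1 equal to γ2 the zeros and poles cancel and the filter passes the input unchanged, which is a convenient sanity check.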
Figure 2 Block diagram of the CS-ACELP encoder (original speech → preprocessing → LP analysis, quantization and interpolation → synthesis filter and perceptual weighting, driven by the LPC information; pitch analysis with the adaptive codebook; fixed-codebook search with the fixed codebook; parameter coding → transmitted bit stream)
The excitation parameters (fixed- and adaptive-codebook parameters) are determined for each 5 ms subframe (40 samples). The first subframe uses the interpolated quantized and unquantized LP filter coefficients, while the second subframe uses the quantized and unquantized LP filter coefficients. An open-loop pitch delay is estimated once every 10 ms frame from the perceptually weighted speech signal. The following operations are then repeated for each subframe. The target signal x(n) is computed by filtering the LP residual through the weighted synthesis filter W(z)/Â(z). The initial states of these filters are updated by filtering the error between the LP residual and the excitation; this is equivalent to subtracting the zero-input response of the weighted synthesis filter from the weighted speech signal. The impulse response h(n) of the weighted synthesis filter is computed. Closed-loop pitch analysis (to find the adaptive-codebook delay and gain) is then performed using the target signal x(n) and the impulse response h(n), searching around the open-loop pitch delay. A fractional pitch delay with a resolution of 1/3 sample is used. The pitch delay is encoded with 8 bits in the first subframe and differentially encoded with 5 bits in the second subframe. The target signal is updated by subtracting the (filtered) adaptive-codebook contribution, and this new target signal is used in the fixed algebraic codebook search to find the optimum excitation. A 17-bit algebraic codebook is used for the fixed-codebook excitation. The gains of the adaptive- and fixed-codebook contributions are vector quantized with 7 bits per subframe (with moving-average (MA) prediction applied to the fixed-codebook gain). Finally, the filter memories are updated using the determined excitation signal.
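The 8-bit and 5-bit pitch-delay indices mentioned above can be sketched as follows. The index formulas and the second-subframe search range are taken from ITU-T G.729 rather than from this excerpt; a delay is represented as an integer part plus frac/3 with frac in {-1, 0, 1}, and the function names are hypothetical.

```c
/* 8-bit index of the first-subframe delay T1 = t_int + frac/3.
 * Fractional delays cover [19 1/3, 84 2/3]; only integer delays are used in [85, 143]. */
int encode_p1(int t_int, int frac)
{
    if (t_int <= 85)
        return (t_int - 19) * 3 + frac - 1;  /* 0 .. 197 */
    return (t_int - 85) + 197;               /* 198 .. 255, frac must be 0 */
}

/* Lower bound tmin of the 10-sample window searched in the second subframe,
 * centred on int(T1) and clamped to the delay range [20, 143]. */
int second_subframe_tmin(int t1_int)
{
    int tmin = t1_int - 5;
    if (tmin < 20)
        tmin = 20;
    int tmax = tmin + 9;
    if (tmax > 143) {
        tmax = 143;
        tmin = tmax - 9;
    }
    return tmin;
}

/* 5-bit differential index of T2 = t_int + frac/3, searched in [tmin - 2/3, tmin + 9 2/3]. */
int encode_p2(int t_int, int frac, int tmin)
{
    return (t_int - tmin) * 3 + frac + 2;    /* 0 .. 31 */
}
```

For instance, a first-subframe delay of 60 gives a second-subframe window starting at 55, and the delays 54 1/3 and 65 2/3 map to the extreme indices 0 and 31.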
2.2 Decoder
Figure 3 Block diagram of the CS-ACELP decoder (fixed codebook and adaptive codebook → short-term filter → post-processing)
The decoder block diagram is shown in Figure 3. First, the parameter indices are extracted from the received bit stream. These indices are decoded to obtain the coder parameters corresponding to a 10 ms speech frame: the LSP coefficients, the two fractional pitch delays, the two fixed-codebook vectors, and the two sets of adaptive- and fixed-codebook gains. The LSP coefficients are interpolated and converted to LP filter coefficients for each subframe. Then, for each 40-sample subframe, the following steps are performed:
a. the excitation is constructed by adding the adaptive- and fixed-codebook vectors, each scaled by its respective gain;
b. the speech is reconstructed by filtering the excitation through the LP synthesis filter;
c. the reconstructed speech signal is passed through a post-processing stage, which includes an adaptive postfilter based on the long-term and short-term synthesis filters, followed by high-pass filtering and level adjustment.
2.3 Delay
This coder encodes speech and other audio signals in 10 ms frames. In addition, there is a look-ahead of 5 ms, resulting in a total algorithmic delay of 15 ms. A practical implementation of the coder incurs additional delays:
a. the processing time needed for the encoding and decoding operations;
b. the transmission time on the communication link;
c. the multiplexing delay when the audio data are combined with other data.
2.4 Speech Coder Description
The speech coding algorithm in this standard is specified in terms of bit-exact fixed-point mathematical operations. The ANSI C code in Chapter 5 is an integral part of this standard and gives the bit-exact fixed-point program. The mathematical descriptions of the encoder (Chapter 3) and decoder (Chapter 4) can be implemented in several other ways, possibly leading to a codec implementation that does not comply with this standard. Therefore, if there is a discrepancy between the mathematical description in Chapters 3 and 4 and the C code in Chapter 5, the C code in Chapter 5 shall prevail.
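Bit-exactness relies on 16-bit arithmetic that saturates instead of wrapping around. The sketch below models the behavior of two such operations, a saturating addition and a Q15 multiplication. It is modeled on the add and mult basic operators of the ITU-T reference code but is not a substitute for them, and the names sat_add16 and mult_q15 are hypothetical.

```c
#include <stdint.h>

/* Clamp a 32-bit intermediate result to the 16-bit range. */
static int16_t saturate16(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* 16-bit addition that saturates instead of wrapping around. */
int16_t sat_add16(int16_t a, int16_t b)
{
    return saturate16((int32_t)a + b);
}

/* Q15 x Q15 -> Q15 multiplication: keep the upper bits of the product and saturate.
 * Assumes >> on a negative int32_t is an arithmetic shift, as on common compilers. */
int16_t mult_q15(int16_t a, int16_t b)
{
    return saturate16(((int32_t)a * b) >> 15);
}
```

In Q15, 16384 represents 0.5, so mult_q15(16384, 16384) yields 8192 (0.25); the only overflowing case, -1 times -1, saturates to 32767.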
2.5 Symbol Conventions
This standard uses the following notational conventions:
a. Codebooks are denoted by uppercase letters (e.g. C).
b. Time signals are denoted by their symbol followed by the sample time index in parentheses (e.g. s(n)); the symbol n is the sample time index.
c. Superscript indices in parentheses (e.g. g^(m)) indicate the time dependency of variables. The variable m refers to the subframe index, and the variable n to the sample index.
d. Recursion indices are identified by a superscript in square brackets (e.g. E^[k]).
e. Subscript indices identify a particular element of a coefficient array.
f. The symbol ^ identifies the quantized version of a parameter (e.g. ĝ_c).
g. Parameter ranges are given in square brackets and include the boundaries (e.g. [0.6, 0.9]).
h. log denotes the logarithm to base 10.
i. int denotes truncation to an integer.
j. The decimal floating-point numbers used are rounded versions of the values used in the 16-bit fixed-point ANSI C implementation.
Table 2 lists the most relevant symbols used throughout this standard, Table 3 the most relevant signals, Table 4 the relevant variables, and Table 5 the relevant constants. Table 6 summarizes the abbreviations used.
Table 2 Glossary of symbols
Name       Reference       Description
1/Â(z)     Formula (2)     LP synthesis filter
H_h1(z)    Formula (1)     Input high-pass filter
H_p(z)     Formula (78)    Long-term postfilter
H_f(z)     Formula (84)    Short-term postfilter
H_t(z)     Formula (86)    Tilt-compensation filter
H_h2(z)    Formula (91)    Output high-pass filter
P(z)       Formula (46)    Pre-filter for the fixed codebook
W(z)       Formula (27)    Weighting filter
Table 3 Glossary of signals
Fixed-codebook contribution
Correlation between the target signal and h(n)
Error signal
Impulse response of the weighting and synthesis filters
Residual signal
Preprocessed speech signal
Reconstructed speech signal
Windowed speech signal
Postfilter output
Gain-scaled postfilter output
Weighted speech signal
Target signal
Second target signal
Excitation to the LP synthesis filter
Adaptive-codebook contribution
Convolution v(n)*h(n)
Convolution c(n)*h(n)
Table 4 Glossary of variables
Adaptive-codebook gain
Fixed-codebook gain
Long-term postfilter gain
Short-term postfilter gain
Tilt postfilter gain
Normalized gain
Open-loop pitch delay
LP coefficients (a_0 = 1.0)
Reflection coefficients
Reflection coefficient of the tilt postfilter
LAR coefficients
LSF normalized frequencies
MA predictor for LSF quantization
LSP coefficients
Autocorrelation coefficients
Modified autocorrelation coefficients
LSP weighting coefficients
LSP quantizer output

Table 5 Glossary of constants
Sampling frequency: 8000 Hz
Bandwidth expansion
Perceptual weighting filter weighting factor (γ1): 0.94/0.98
Perceptual weighting filter weighting factor (γ2): 0.60/[0.4, 0.7]
Postfilter weighting factor
Pitch postfilter weighting factor
Tilt postfilter weighting factor
Fixed (algebraic) codebook: see Table 7
Moving-average predictor codebook: see 3.2.4
First-stage LSP codebook: see 3.2.4
Second-stage LSP codebook (low part): see 3.2.4
Second-stage LSP codebook (high part): see 3.2.4
Gain codebook (first stage)
Gain codebook (second stage)
Correlation lag window: see formula (6)
LPC analysis window: see formula (3)

Table 6 Glossary of abbreviations
CELP        Code-excited linear prediction
CS-ACELP    Conjugate-structure algebraic CELP
MA          Moving average
MSB         Most significant bit
MSE         Mean-squared error
LAR         Log area ratio
LP          Linear prediction
LSP         Line spectral pair
LSF         Line spectral frequency
VQ          Vector quantization

3 Encoder Functional Description
This chapter describes the different functions of the encoder represented by the blocks in Figure 2. A detailed signal flow is shown in Figure 4.
3.1 Preprocessing
The input signal of the speech encoder is a 16-bit PCM signal. Two preprocessing steps are required before the encoding process: 1) signal level adjustment; 2) high-pass filtering.
Level adjustment consists of dividing the input by 2 to reduce the possibility of overflow in the fixed-point implementation. High-pass filtering removes unwanted low-frequency components. A second-order pole/zero filter with a cutoff frequency of 140 Hz is used. Level adjustment and high-pass filtering are combined by dividing the coefficients of the filter numerator by 2. The resulting filter is:
$$H_{h1}(z) = \frac{0.46363718 - 0.92724705\,z^{-1} + 0.46363718\,z^{-2}}{1 - 1.9059465\,z^{-1} + 0.9114024\,z^{-2}} \qquad (1)$$
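For experimentation, formula (1) can be run as a floating-point direct-form I biquad. This is a sketch only: the normative filter is the fixed-point one in Chapter 5, and the type and function names are hypothetical. Because the numerator coefficients already include the division by 2, a 4 kHz (half-sample-rate) input comes out at about 0.49 of its amplitude, while a constant input decays to about 0.5% of its level.

```c
/* Filter state kept between calls so the filter can run across frames. */
typedef struct { double x1, x2, y1, y2; } hp140_state;

/* Direct-form I implementation of H_h1(z), formula (1), in double precision. */
void hp140_filter(hp140_state *st, const double *in, double *out, int len)
{
    const double b0 = 0.46363718, b1 = -0.92724705, b2 = 0.46363718;  /* numerator (already halved) */
    const double a1 = -1.9059465, a2 = 0.9114024;  /* denominator 1 + a1 z^-1 + a2 z^-2 */
    for (int n = 0; n < len; n++) {
        double y = b0 * in[n] + b1 * st->x1 + b2 * st->x2
                 - a1 * st->y1 - a2 * st->y2;
        st->x2 = st->x1;  /* shift input history */
        st->x1 = in[n];
        st->y2 = st->y1;  /* shift output history */
        st->y1 = y;
        out[n] = y;
    }
}
```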
The filtered input signal is denoted s(n) and is used in all subsequent encoder operations.
3.2 Linear Prediction Analysis and Quantization
The short-term analysis and synthesis filters are based on a 10th-order linear prediction (LP) filter. The LP synthesis filter is defined as:
$$\frac{1}{\hat{A}(z)} = \frac{1}{1 + \sum_{i=1}^{10} \hat{a}_i z^{-i}} \qquad (2)$$
where â_i, i = 1, ..., 10, are the (quantized) LP coefficients. Short-term prediction, or linear prediction analysis, is performed once per speech frame using the autocorrelation method with a 30 ms asymmetric window. Every 80 samples (10 ms), the autocorrelation coefficients of the windowed speech are computed and converted to LP coefficients using the Levinson-Durbin algorithm. The LP coefficients are then transformed to the LSP domain for quantization and interpolation. The interpolated quantized and unquantized LSPs are converted back to LP filter coefficients, from which the synthesis and weighting filters are constructed for each subframe.
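The Levinson-Durbin step can be sketched in C as below, using the sign convention of formula (2), A(z) = 1 + sum of a_i z^-i. This floating-point sketch takes the autocorrelations as given; the analysis window and the modification of the autocorrelations (see the correlation lag window in Table 5) are omitted, and the function name is hypothetical.

```c
#define LP_ORDER 10  /* maximum supported order p */

/* Levinson-Durbin recursion: from autocorrelations r[0..p] (p <= LP_ORDER) compute
 * a[0..p] of A(z) = 1 + sum a_i z^-i and reflection coefficients k[1..p].
 * Returns the final prediction-error energy, or -1.0 if r[0] <= 0. */
double levinson_durbin(const double *r, int p, double *a, double *k)
{
    double tmp[LP_ORDER + 1];
    if (r[0] <= 0.0)
        return -1.0;
    a[0] = 1.0;
    for (int i = 1; i <= p; i++)
        a[i] = 0.0;
    double err = r[0];                        /* E[0] = r(0) */
    for (int i = 1; i <= p; i++) {
        double acc = r[i];
        for (int j = 1; j < i; j++)
            acc += a[j] * r[i - j];
        double ki = -acc / err;               /* i-th reflection coefficient */
        k[i] = ki;
        for (int j = 1; j < i; j++)           /* a_j[i] = a_j[i-1] + k_i a_{i-j}[i-1] */
            tmp[j] = a[j] + ki * a[i - j];
        for (int j = 1; j < i; j++)
            a[j] = tmp[j];
        a[i] = ki;
        err *= (1.0 - ki * ki);               /* E[i] = (1 - k_i^2) E[i-1] */
    }
    return err;
}
```

For a first-order autoregressive signal with r(k) = 0.5^k, the recursion returns a_1 = -0.5, all higher coefficients zero, and a residual energy of 0.75.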
Figure 4 Signal flow of the CS-ACELP encoder (preprocessing; windowing and autocorrelation, Levinson-Durbin method, A(z) → LSP, LSP quantization (codewords L0 to L3) and interpolation, LSP → Â(z); open-loop pitch search on the weighted speech; computation of the impulse response and target signal; closed-loop pitch search (adaptive codebook); algebraic (fixed) codebook search with pre-filter P(z); gain VQ (GA1, GB1, GA2, GB2) with gain prediction; memory and filter-state update)