This standard specifies the 8 kbit/s speech coding algorithm using conjugate-structure algebraic-code-excited linear prediction (CS-ACELP). SJ 20770-2000: 8 kbit/s speech coding using conjugate-structure algebraic-code-excited linear prediction.
Military Standard of the Electronic Industry of the People's Republic of China
FL5895
SJ 20770-2000
Coding of speech at 8 kbit/s using conjugate-structure algebraic-code-excited linear prediction (CS-ACELP)
Published on October 20, 2000. Implemented on October 20, 2000.
Approved by the Ministry of Information Industry of the People's Republic of China.

Foreword

This standard adopts ITU-T G.729 (March 1996), "Coding of speech at 8 kbit/s using conjugate-structure algebraic-code-excited linear prediction (CS-ACELP)", and ITU-T G.729 Annex A (November 1996), "Reduced complexity 8 kbit/s CS-ACELP speech codec".

ITU-T G.729 is the full version of the 8 kbit/s CS-ACELP speech codec. Annex A of ITU-T G.729 is a reduced-complexity version of the G.729 speech codec. The reduced-complexity codec is interoperable with the full version at the bit-stream level: a reduced-complexity encoder can be used with a full-version decoder, and vice versa.

Annex A of G.729 mainly describes the parts that were changed from the full G.729 implementation in order to reduce the algorithmic complexity of the codec. For the parts that were not changed, the corresponding clauses of the full version of G.729 still apply. The content of Annex A of G.729 is therefore included as Appendix A (supplement) of this standard and forms part of this standard.

This standard contains the following:
Chapter 1: Overview
Chapter 2: Encoder overview
Chapter 3: Encoder functional description
Chapter 4: Decoder functional description
Chapter 5: Bit-accurate description of the CS-ACELP encoder
Appendix A: Reduced complexity 8 kbit/s CS-ACELP speech codec (supplement)
1 Overview

This standard specifies the 8 kbit/s speech coding algorithm using conjugate-structure algebraic-code-excited linear prediction (CS-ACELP).

This coder is designed to operate on digital signals. The analog input signal is first passed through a voice-band filter (ITU Rec. G.712), then sampled at 8000 Hz and converted to 16-bit linear PCM (the digital signal) for input to the encoder. The decoder output is converted back to an analog signal by the inverse process. Other input/output signals, such as the 64 kbit/s PCM data specified in ITU Rec. G.711, must be converted to 16-bit linear PCM before encoding, and converted from 16-bit linear PCM to the corresponding format after decoding. This standard defines the bit stream from the encoder to the decoder.

The standard is organized as follows: Chapter 2 gives a general description of the CS-ACELP algorithm; Chapters 3 and 4 describe the principles of the CS-ACELP encoder and decoder, respectively; Chapter 5 gives the software implementation of this coder in 16-bit fixed-point arithmetic.

2 Overview of the encoder

The CS-ACELP coder is based on the code-excited linear prediction (CELP) coding model. It operates on speech frames of 10 ms, each containing 80 samples at a sampling rate of 8000 samples per second. Each 10 ms frame of speech is analyzed to extract the CELP model parameters (LP filter coefficients, adaptive and fixed codebook indices, and gains). These parameters are then encoded and transmitted.
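The frame arithmetic above can be sketched in a few lines of Python. This is only an illustration of the numbers stated in the text (8000 Hz sampling, 10 ms frames, 8 kbit/s); the function name `split_into_frames` and the choice to drop a trailing partial frame are assumptions made for the example, not part of the standard.

```python
SAMPLE_RATE_HZ = 8000   # sampling rate stated in the text
FRAME_MS = 10           # frame length stated in the text
BIT_RATE_BPS = 8000     # 8 kbit/s coder

# Derived quantities: samples and coded bits per 10 ms frame.
SAMPLES_PER_FRAME = SAMPLE_RATE_HZ * FRAME_MS // 1000
BITS_PER_FRAME = BIT_RATE_BPS * FRAME_MS // 1000


def split_into_frames(pcm):
    """Split a list of 16-bit linear PCM samples into complete frames.

    Hypothetical helper: trailing samples that do not fill a whole
    frame are dropped, which is an illustrative choice only.
    """
    n_full = len(pcm) // SAMPLES_PER_FRAME
    return [
        pcm[i * SAMPLES_PER_FRAME:(i + 1) * SAMPLES_PER_FRAME]
        for i in range(n_full)
    ]


# 200 dummy samples yield two complete frames; 40 samples are left over.
frames = split_into_frames(list(range(200)))
```

Both derived constants come out to 80, so each 80-sample frame is represented by 80 coded bits.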
The bit allocation of the encoder parameters is shown in Table 1.

Table 1 Bit allocation of the 8 kbit/s CS-ACELP algorithm (10 ms frame)

Parameter                    Codeword         Subframe 1   Subframe 2   Bits per frame
Line spectral pairs (LSP)    L0, L1, L2, L3                             18
Adaptive codebook delay      P1, P2           8            5            13
Pitch delay parity           P0               1                         1
Fixed codebook index         C1, C2           13           13           26
Fixed codebook sign          S1, S2           4            4            8
Codebook gains (stage 1)     GA1, GA2         3            3            6
Codebook gains (stage 2)     GB1, GB2         4            4            8
Total                                                                   80

In the decoder, these parameters are used to recover the excitation and the synthesis filter parameters. Speech is reconstructed by passing the excitation signal through the long-term (pitch) synthesis filter and the short-term (LP) synthesis filter; the output speech is then obtained after post-filtering, which enhances the signal. The block diagram of the decoder synthesis model is shown in Figure 1.

[Figure 1 Block diagram of the CELP synthesis model. Blocks: received bit stream, parameter decoding, excitation codebook, long-term synthesis filter, short-term synthesis filter, post-filter, output speech.]

2.1 Encoder

The principle of the encoder is shown in Figure 2. The input signal is high-pass filtered and scaled in the pre-processing block, and the pre-processed signal serves as the input for all subsequent analysis. First, LP analysis is performed once per 10 ms frame to compute the LP filter coefficients. The LP coefficients are then converted to line spectral pairs (LSP) and quantized with 18 bits using predictive two-stage vector quantization (VQ). The excitation signal is chosen by an analysis-by-synthesis search, in which the error between the original and the reconstructed speech is minimized according to a perceptually weighted distortion measure.
That is, the error signal is filtered by the perceptual weighting filter before it is minimized. The coefficients of the perceptual weighting filter are derived from the unquantized LP filter. The amount of perceptual weighting is made adaptive in order to improve the performance for input signals with a flat frequency response.

[Figure 2 Block diagram of the CS-ACELP encoder. Blocks: input speech, pre-processing, LP analysis, quantization and interpolation, synthesis filter, perceptual weighting, pitch analysis, adaptive codebook, fixed codebook, fixed codebook search, parameter encoding, transmitted bit stream. The LPC information is passed to the synthesis filter, the perceptual weighting and the codebook searches.]

The excitation parameters (fixed and adaptive codebook parameters) are determined once per subframe of 5 ms (40 samples). The first subframe uses interpolated quantized and unquantized LP filter coefficients, while the second subframe uses the quantized and unquantized LP filter coefficients. The open-loop pitch delay is estimated once per 10 ms frame from the perceptually weighted speech signal. The following operations are then repeated for each subframe.

The target signal x(n) is obtained by filtering the LP residual through the weighted synthesis filter W(z)/Â(z). The initial states of these filters are updated by filtering the error between the LP residual and the excitation. This is equivalent to subtracting the zero-input response of the weighted synthesis filter from the weighted speech signal. The impulse response h(n) of the weighted synthesis filter is then computed. Next, closed-loop pitch analysis is performed (to find the adaptive codebook delay and gain): using the target signal x(n) and the impulse response h(n), it searches around the value of the open-loop pitch delay. The fractional pitch delay has a resolution of 1/3 of the sampling interval.
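The equivalence mentioned above, between starting the filter from its stored state and subtracting a zero-input response, rests on linear superposition: the output of a filter with non-zero memory is the sum of its zero-input response and its zero-state response. The sketch below illustrates this with a generic all-pole filter 1/A(z), assuming the convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p. The coefficients, memory values and the name `all_pole_filter` are hypothetical; the real coder uses the weighted synthesis filter W(z)/Â(z) in fixed-point arithmetic.

```python
def all_pole_filter(a, x, memory):
    """Filter x through 1/A(z), with A(z) = 1 + a[0] z^-1 + ... + a[p-1] z^-p.

    memory holds the past outputs, most recent first: [y(-1), ..., y(-p)].
    """
    hist = list(memory)
    y = []
    for xn in x:
        yn = xn - sum(ai * hi for ai, hi in zip(a, hist))
        y.append(yn)
        hist = [yn] + hist[:-1]   # shift the output history
    return y


# Hypothetical stable second-order filter (poles at 0.5 and 0.4).
a = [-0.9, 0.2]
memory = [0.5, -0.25]                 # state left over from earlier samples
x = [1.0, 0.0, -0.5, 0.25, 0.0]       # input for the current segment

full = all_pole_filter(a, x, memory)                 # input + memory
zir = all_pole_filter(a, [0.0] * len(x), memory)     # zero-input response
zsr = all_pole_filter(a, x, [0.0] * len(a))          # zero-state response

# Superposition: full response = zero-input response + zero-state response.
max_diff = max(abs(f - (zi + zs)) for f, zi, zs in zip(full, zir, zsr))
```

Because `full - zir` equals `zsr`, removing the zero-input response from a signal leaves exactly the part that the current subframe's input can still influence, which is what a subframe-by-subframe search needs as its target.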
The pitch delay of the first subframe is encoded with 8 bits, and the pitch delay of the second subframe is differentially encoded with 5 bits. The target signal x(n) is updated by subtracting the filtered adaptive codebook contribution, and the new target signal x'(n) is used in the fixed (algebraic) codebook search to find the best excitation. The fixed codebook excitation uses a 17-bit algebraic codebook. The adaptive and fixed codebook gains are quantized with 7 bits, with moving-average (MA) prediction applied to the fixed codebook gain. Finally, the filter memories are updated using the determined excitation signal.

2.2 Decoder

The decoder block diagram is shown in Figure 3.

[Figure 3 Block diagram of the CS-ACELP decoder. Blocks: fixed codebook, adaptive codebook, short-term filter, post-processing.]

First, the parameter indices are extracted from the received bit stream and decoded to obtain the coder parameters for the corresponding 10 ms speech frame. These parameters are the LSP coefficients, the 2 fractional pitch delays, the 2 fixed codebook vectors, and the 2 sets of adaptive and fixed codebook gains. The LSP coefficients are interpolated and converted to LP filter coefficients for each subframe. Then, for each 40-sample subframe, the following operations are performed:
a. The excitation signal is constructed by adding the adaptive and fixed codebook vectors, each scaled by its gain;
b. The speech signal is reconstructed by filtering the excitation through the LP synthesis filter;
c. The reconstructed speech signal is post-processed. Post-processing consists of an adaptive post-filter based on the long-term and short-term synthesis filters, followed by high-pass filtering and scaling.

2.3 Delay

This coder encodes speech and other audio signals in 10 ms frames. In addition, there is a look-ahead of 5 ms, resulting in a total algorithmic delay of 15 ms.
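As a small worked check of the delay figure, the algorithmic delay is simply the frame length plus the look-ahead. The sketch below uses only the values stated in the text; the function name `algorithmic_delay_ms` is a hypothetical helper introduced for the example.

```python
SAMPLE_RATE_HZ = 8000   # sampling rate stated in the text
FRAME_MS = 10           # frame length
LOOKAHEAD_MS = 5        # look-ahead


def algorithmic_delay_ms(frame_ms=FRAME_MS, lookahead_ms=LOOKAHEAD_MS):
    """Algorithmic delay: one full frame must be buffered, plus the look-ahead."""
    return frame_ms + lookahead_ms


delay_ms = algorithmic_delay_ms()
# The same delay expressed in samples at the 8000 Hz sampling rate.
delay_samples = delay_ms * SAMPLE_RATE_HZ // 1000
```

This figure covers only the buffering inherent in the algorithm; it does not include processing, transmission or multiplexing time.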
A practical implementation of this coder incurs further delays, namely:
a. The processing time required for the encoding and decoding operations;
b. The transmission time on the communication link;
c. The multiplexing delay when audio data is combined with other data.

2.4 Speech coder description

The speech coding algorithm specified in this standard is described in terms of bit-accurate fixed-point mathematical operations. The ANSI C code in Chapter 5 is an integral part of this standard and provides the bit-accurate fixed-point program. The mathematical descriptions of the encoder (Chapter 3) and decoder (Chapter 4) could be implemented in several other ways, which may result in a codec that does not conform to this standard. Therefore, if the mathematical description in Chapters 3 and 4 conflicts with the C code in Chapter 5, the C code in Chapter 5 shall prevail.

2.5 Symbol conventions

This standard uses the following symbol conventions:
a. Codebooks are denoted by uppercase letters (e.g. C);
b. Time signals are denoted by their symbol and the sample index in parentheses (e.g. s(n)); the symbol n is the sample index;
c. Superscripts in parentheses (e.g. g^(m)) indicate time-dependent variables. The variable m refers to the subframe index, and the variable n to the sample index;
d. Recursion indices are identified by a superscript in square brackets (e.g. E^[k]);
e. Subscript indices identify a particular element of a coefficient array;
f. The symbol ^ identifies the quantized version of a parameter (e.g. ĝ_c);
g. Parameter ranges are given by their limits in square brackets (e.g. [0.6, 0.9]);
h. log denotes the logarithm to base 10;
i. int denotes rounding to an integer;
j. The decimal floating-point values given are rounded versions of the values used in the 16-bit fixed-point ANSI C implementation.

Table 2 lists the most commonly used symbols. Table 3 lists the most commonly used signals.
Table 4 summarizes the most commonly used variables. Table 5 lists the relevant constants. Table 6 summarizes the abbreviations used in the standard.

[The symbol columns of Tables 2 to 5 are not recoverable in this extract; the surviving references, descriptions and values are listed in their original order.]

Table 2 Symbols

Reference       Description
Formula (2)     LP synthesis filter
Formula (1)     Input high-pass filter
Formula (78)    Long-term post-filter
Formula (84)    Short-term post-filter
Formula (86)    Tilt-compensation filter
Formula (91)    Output high-pass filter
Formula (46)    Fixed codebook pre-filter
Formula (27)    Weighting filter

Table 3 Signals

Fixed codebook contribution
Correlation between the target signal and h(n)
Error signal
Impulse response of the weighted synthesis filter
Residual signal
Pre-processed speech signal
Reconstructed speech signal
Windowed speech signal
Post-filter output
Gain-scaled post-filter output
Weighted speech signal
Target signal
Second target signal
Excitation of the LP synthesis filter
Adaptive codebook contribution
Convolution v(n) * h(n)
Convolution c(n) * h(n)

Table 4 Variables

Adaptive codebook gain
Fixed codebook gain
Long-term post-filter gain
Short-term post-filter gain
Tilt post-filter gain
Normalization gain
Open-loop pitch delay
LP coefficients (a_0 = 1.0)
Reflection coefficients
Reflection coefficient of the tilt post-filter
LAR coefficients
LSF normalized frequencies
MA predictor for LSF quantization
LSP coefficients
Autocorrelation coefficients
Modified autocorrelation coefficients
LSP weighting coefficients
LSP quantizer output

Table 5 Constants

Sampling frequency
Bandwidth expansion
Weighting factor of the perceptual weighting filter: 0.94/0.98
Weighting factor of the perceptual weighting filter: 0.60/[0.4 to 0.7]
Weighting factor of the post-filter
Weighting factor of the pitch post-filter
Weighting factor of the tilt post-filter
Fixed (algebraic) codebook: see Table 7
Moving-average predictor codebook: see 3.2.4
First-stage LSP codebook
Second-stage LSP codebook (low part)
Second-stage LSP codebook (high part)
Gain codebook (first stage)
Gain codebook (second stage)
Correlation lag window: see formula (6)
LPC analysis window: see formula (3)

Table 6 Abbreviations

CELP        Code-excited linear prediction
CS-ACELP    Conjugate-structure algebraic CELP
MA          Moving average
MSB         Most significant bit
MSE         Mean squared error
LAR         Log area ratio
LP          Linear prediction
LSP         Line spectral pair
LSF         Line spectral frequency
VQ          Vector quantization

3 Encoder functional description

This chapter describes the functions of the encoder blocks shown in Figure 2. Figure 4 shows the detailed signal flow.

3.1 Pre-processing

The input to the speech encoder is a 16-bit PCM signal. Two pre-processing steps are applied before the encoding process: 1) signal scaling; 2) high-pass filtering.

Scaling divides the input by 2 to reduce the possibility of overflow in the fixed-point implementation. High-pass filtering removes unwanted low-frequency components. A second-order pole/zero filter with a cut-off frequency of 140 Hz is used. Scaling and high-pass filtering are combined by dividing the numerator coefficients of the filter by 2. The resulting filter is:

$$H_{h1}(z) = \frac{0.46363718 - 0.92724705\,z^{-1} + 0.46363718\,z^{-2}}{1 - 1.9059465\,z^{-1} + 0.9114024\,z^{-2}} \qquad (1)$$

The filtered signal is denoted s(n) and is used in all subsequent encoder operations.

3.2 Linear prediction analysis and quantization

The short-term analysis and synthesis filters are based on 10th-order linear prediction (LP) filters. The LP synthesis filter is defined as:

$$\frac{1}{\hat{A}(z)} = \frac{1}{1 + \sum_{i=1}^{10} \hat{a}_i z^{-i}} \qquad (2)$$

where â_i, i = 1, ..., 10, are the (quantized) linear prediction (LP) coefficients. Short-term prediction, or linear prediction analysis, is performed once per speech frame using the autocorrelation method with a 30 ms asymmetric window. Every 80 samples (10 ms), the autocorrelation coefficients of the windowed speech are computed and converted to LP coefficients using the Levinson algorithm. The LP coefficients are then converted to the LSP domain for quantization and interpolation.
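The pre-processing filter of 3.1 can be sketched as a direct-form second-order recursion using the coefficients of formula (1). This is a floating-point illustration only, not the bit-accurate fixed-point routine of Chapter 5; the function name `preprocess` and the test signals are assumptions made for the example.

```python
# Coefficients of the combined scaling and high-pass filter, formula (1).
B = (0.46363718, -0.92724705, 0.46363718)   # numerator (already divided by 2)
A = (1.0, -1.9059465, 0.9114024)            # denominator


def preprocess(samples):
    """Apply the second-order high-pass filter with built-in scaling by 1/2.

    Floating-point sketch; the filter state starts at zero.
    """
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x0 in samples:
        y0 = B[0] * x0 + B[1] * x1 + B[2] * x2 - A[1] * y1 - A[2] * y2
        out.append(y0)
        x2, x1 = x1, x0   # shift input history
        y2, y1 = y1, y0   # shift output history
    return out


# A constant (DC) input is almost completely removed once the transient decays.
dc_out = preprocess([1000.0] * 4000)

# An input alternating at the Nyquist frequency passes with a gain close to 1/2.
nyquist_out = preprocess([1000.0 if i % 2 == 0 else -1000.0 for i in range(4000)])
```

The two test signals show both effects of the combined filter: low-frequency content is suppressed, and high-frequency content is passed at roughly half amplitude because the scaling by 2 is folded into the numerator.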
The interpolated quantized and unquantized LSP coefficients are converted back to LP filter coefficients, which are used to construct the synthesis and weighting filters for each subframe.

[Figure 4 Signal flow of the CS-ACELP encoder. Blocks: pre-processing; windowing, autocorrelation and Levinson-Durbin LP analysis; A(z) to LSP conversion; LSP quantization (L0, L1, L2, L3); LSP interpolation and conversion back to A(z) and Â(z); perceptual weighting; open-loop pitch delay search (once per frame); target signal x(n) computation; impulse response computation; closed-loop pitch search (adaptive codebook); algebraic (fixed) codebook search with pre-filter P(z); gain prediction and gain VQ (GA1, GB1, GA2, GB2); memory update and filter state update. Note 1: the LP analysis uses the Levinson-Durbin method.]
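The autocorrelation method and Levinson recursion described in 3.2 can be sketched as follows. This is a minimal floating-point illustration under stated assumptions: it omits the 30 ms asymmetric window, the lag window and all fixed-point details of the standard, and it assumes the convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p. The function names are hypothetical.

```python
def autocorrelation(x, order):
    """Autocorrelation r(k) = sum_n x(n) x(n-k) for k = 0..order."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Levinson-Durbin recursion.

    Returns (a, err): a[0] = 1 and a[1..order] are the LP coefficients of
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order; err is the final
    prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err


# Autocorrelation of a first-order process with correlation 0.5 between
# neighbouring samples: the recursion should find A(z) = 1 - 0.5 z^-1.
coeffs, pred_err = levinson_durbin([1.0, 0.5, 0.25], 2)

r_small = autocorrelation([1.0, 2.0, 3.0], 1)
```

In the coder the order would be 10 and the input would be the windowed speech of the current frame; the second-order example is kept small so that the result can be checked by hand.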