SJ 20771-2000 MOS evaluation method for sound quality of military communication systems

Basic Information

Standard ID: SJ 20771-2000

Standard Name: MOS evaluation method for sound quality of military communication systems

Chinese Name: 军用通信系统音质的MOS评价法

Standard category: Electronic Industry Standard (SJ)

Status: In force

Date of Release: 2000-10-20

Date of Implementation: 2000-10-20

Standard Classification Number

Standard Classification Number: FL 5895

Publication Information

Publishing house: Industrial Electronics Press

Publication date: 2000-10-20

Other Information

Drafters: Zhang Zhiyi, Wu Yaruo, Du Mingyu

Drafting unit: The 30th Research Institute of the Ministry of Information Industry

Focal point unit: China Electronics Standardization Institute

Publishing department: Ministry of Information Industry of the People's Republic of China


Introduction to standards:

This standard specifies the test speech materials, procedures, technical conditions and requirements of the Mean Opinion Score (MOS) evaluation method for the sound quality of military communication systems. It applies to the subjective evaluation of the sound quality of voice encoders, voice terminals (or equipment) and voice systems developed and produced for military communication systems and networks, and gives a quantitative MOS result for the tested system. It is also applicable to the evaluation of voice encoding equipment and voice systems in communication systems and networks in general.

Some standard content:

Military Standard of the People's Republic of China for Electronic Industry
FL 5895
SJ 20771-2000
MOS method of speech quality assessment for military communication systems
Published on October 20, 2000
Implemented on October 20, 2000
Approved by the Ministry of Information Industry of the People's Republic of China
1 Scope
2 Referenced Documents
3 Definitions
4 General Requirements
5 Detailed Requirements
Appendix A Basic Test Voice Materials (Supplement)
Appendix B Filter Parameters and Transfer Characteristics (Supplement)
Appendix C MOS Method Test Voting Opinion Record Table (Supplement)
Appendix D MOS Evaluation Opinion Reference (Supplement)
1 Scope
1.1 Subject Content
This standard specifies the test speech materials, procedures, technical conditions and requirements of the Mean Opinion Score (MOS) evaluation method for the sound quality of military communication systems.
1.2 Scope of Application
This standard applies to the subjective evaluation of the sound quality of voice encoders, voice terminals (or equipment) and voice systems developed and produced for military communication systems and networks, and gives the MOS evaluation results of the tested systems. It is also applicable to the evaluation of voice coding equipment and voice systems in communication systems and networks in general.
2 Referenced Documents
GB 1781-92 Test tape
GB 2019-87 Basic parameters and technical requirements for tape recorders
GB 7347-87 Long-term average power spectrum of Chinese speech
GB/T 13427-92 Technical requirements for delta modulation terminal equipment
GB/T 13504-92 Diagnostic rhyme test (DRT) method for speech intelligibility
ITU-T G.726 16, 24, 32, 40 kbit/s adaptive differential pulse code modulation (ADPCM) (September 11, 1992)
ITU-T G.729 8 kbit/s speech coding using conjugate-structure algebraic-code-excited linear prediction (March 1996)
3 Definitions
3.1 Weighted mean opinion score (weighted MOS)
The MOS score obtained by averaging the voting opinions after each has been weighted by its specified value; see formula (1) in 5.4.5.
3.2 Weighted standard deviation
The standard deviation obtained by weighting the variance of the voting opinions by voting probability; see formula (2) in 5.4.5. This deviation applies only to the statistics of the sound quality MOS evaluation method.
3.3 Test speech unit
A speech unit composed of three test sentences and used for one round of listening and voting.
3.4 Test speech source table (test speech source list)
A table composed of the test speech units selected so that each speaker utters one test speech unit (the units may be the same or different).
3.5 Test speech source set
A test speech source that is pronounced according to the speech source table, recorded on tape by analog recording or generated as data files on a CD or disk by digital acquisition, and available for selection in testing.
4 General requirements
4.1 Speech material collection
The MOS evaluation method evaluates the overall sound quality of the tested speech system. The speech material should consist of a set of phonetically balanced sentences whose amplitude distribution is basically consistent; its long-term average power spectrum should generally comply with GB 7347, and the set should be sufficiently large.
4.2 Test requirements
4.2.1 To make the evaluation results reliable, the test team must meet the requirements of 5.4.1.
4.2.2 The stimuli of the reference test speech conditions and of the tested speech system conditions must meet the requirements of 5.4.3.1.
4.2.3 The speech source used for each condition must contain at least one speech unit for each speaker.
4.2.4 The difference in average energy (or A-weighted sound level) between the speech signal units of all test conditions must be kept within ±2 dB.
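The level requirement in 4.2.4 can be checked numerically before a test. The following is a minimal sketch, assuming each speech unit is available as a list of samples scaled to the range -1.0 to 1.0, and assuming that the ±2 dB tolerance is measured against the median level of all units; the standard does not state the reference point, and the function names are illustrative only.

```python
import math
import statistics


def level_db(samples):
    """Average energy of one speech unit in dB relative to full scale (1.0)."""
    if not samples:
        raise ValueError("empty speech unit")
    mean_square = sum(x * x for x in samples) / len(samples)
    if mean_square == 0.0:
        raise ValueError("silent speech unit")
    return 10.0 * math.log10(mean_square)


def units_outside_tolerance(units, tolerance_db=2.0):
    """Names of units whose level is more than tolerance_db from the median level.

    units maps a unit name to its list of samples. Using the median as the
    reference level is an assumption of this sketch, not a rule of the standard.
    """
    levels = {name: level_db(samples) for name, samples in units.items()}
    reference = statistics.median(levels.values())
    return sorted(name for name, lv in levels.items()
                  if abs(lv - reference) > tolerance_db)
```

Units reported by `units_outside_tolerance` would need their gain adjusted before being included in the test material.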
5 Detailed requirements
5.1 Composition of speech material set
5.1.1 Test sentence table
To meet the needs of large-scale evaluation tests, 10 sentence tables are provided. Each table contains 18 sentences, and every three sentences form one test speech unit.
5.1.2 Sentence length
A sentence generally consists of about 10 syllables; see Appendix A (Supplement).
5.2 Establishment of speech source
5.2.1 Speakers
At least three men and three women who speak standard Mandarin should pronounce the speech. If conditions permit, radio and television announcers are preferred.
5.2.2 Pronunciation environment
Recording should be done in a quiet office with background noise below 25 to 30 dB(A) and a reverberation time below 0.5 s (at 500 Hz). If conditions permit, it is best to record the speech source in an acoustically treated recording studio.
5.2.3 Pronunciation requirements
Pronounce at a speed of 3 to 4 syllables (characters) per second, with normal intonation and tone, keeping the volume as even as possible. Pause for about 1 s between sentences and about 3 to 4 s between test speech units.
5.2.4 Recording method
Two methods are available: analog recording and digital recording.
5.2.4.1 Analog recording
A first-class recorder that complies with GB 2019 should be used whenever possible. If another high-fidelity recorder is used, its nonlinear distortion should be less than 0.3% and its jitter less than 0.03%. Low-noise tape that meets, or nearly meets, the requirements of GB 1781 should be used.
5.2.4.2 Digital recording
5.2.4.2.1 Recording conditions
Recording should use a sampling frequency of not less than 8000 Hz and linear PCM encoding of not less than 16 bits.
5.2.4.2.2 Recording file format
Generate a data file in the specified format for each test speech unit. For ease of operation, each file name should indicate the gender and number of the speaker.
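The recording conditions of 5.2.4.2.1 lend themselves to an automatic check. The sketch below assumes the recordings are stored as WAV files, which is only one possible reading of "the specified format", and uses Python's standard `wave` module. The file naming scheme in `unit_file_name` is hypothetical; the standard only requires that the name show the speaker's gender and number.

```python
import wave


def recording_problems(path, min_rate_hz=8000, min_bits=16):
    """List the ways a WAV file falls short of the recording conditions."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getcomptype() != "NONE":
            problems.append("not linear PCM")
        if wav.getframerate() < min_rate_hz:
            problems.append(
                f"sampling frequency {wav.getframerate()} Hz is below {min_rate_hz} Hz")
        bits = wav.getsampwidth() * 8
        if bits < min_bits:
            problems.append(f"{bits}-bit samples are below {min_bits} bits")
    return problems


def unit_file_name(gender, speaker_no, unit_no):
    """Hypothetical naming scheme: gender letter, speaker number, unit number."""
    if gender not in ("M", "F"):
        raise ValueError("gender must be 'M' or 'F'")
    return f"{gender}{speaker_no:02d}_unit{unit_no:02d}.wav"
```

An empty list from `recording_problems` means the file passed the checks covered here; it says nothing about noise, level or pronunciation quality.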
5.2.5 Pronunciation sound level
The pronunciation sound level should meet the requirements of 5.6 of GB/T 13504.
5.2.6 Test speech set
Following the sentence table in 5.1.1, the male and female speakers pronounce the test speech units of the table in turn. For convenience in testing, each speaker pronounces only one unit of the table; this yields one standard speech source table, which is recorded on the available medium according to the requirements above. Pronouncing all the sentence tables in this way yields the standard test speech source set.
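The construction in 5.2.6 can be illustrated with a short sketch. It assumes a sentence table is a list of 18 sentences (5.1.1), that consecutive groups of three form the units, and that units are assigned to speakers in order; the round-robin assignment and the function names are assumptions of this example, since definition 3.4 allows speakers to share or differ in units.

```python
def split_into_units(sentences, unit_size=3):
    """Group a sentence table into test speech units of unit_size sentences."""
    if len(sentences) % unit_size != 0:
        raise ValueError("sentence count must be a multiple of the unit size")
    return [sentences[i:i + unit_size]
            for i in range(0, len(sentences), unit_size)]


def build_source_table(sentences, speakers):
    """Assign one unit to each speaker, cycling through the units in order.

    Returns a list of (speaker, unit_index, unit_sentences) tuples.
    """
    units = split_into_units(sentences)
    table = []
    for idx, speaker in enumerate(speakers):
        unit_index = idx % len(units)
        table.append((speaker, unit_index, units[unit_index]))
    return table
```

With 18 sentences and six speakers (three men and three women), each of the six units is pronounced exactly once, which gives one speech source table per sentence table.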
5.3 Establishment of test speech materials
The test speech materials should include the speech materials of the tested speech system and the reference test speech materials.
5.3.1 Establishment of tested speech materials
5.3.1.1 Scale of speech materials used
For any MOS evaluation of the sound quality of a speech system, at least one standard test speech source table should be used. If necessary, two tables should preferably be used.
5.3.1.2 Generation of test signals for the system under test
The generation of the test signal is shown in Figure 1.
[Figure 1 Test signal generation: test source signal, system, test signal]
5.3.1.2.1 Test signal acquisition
Analog recording is not recommended for acquisition. Generally, the signal is sampled at 8000 Hz and recorded digitally in linear PCM of not less than 16 bits, and a data file is generated that is labeled with the conditions of the system under test and of the speech source signal.
5.3.1.2.2 Interface requirements for playback and acquisition
Pay special attention to the input and output impedance and the electrical signal interface of the speech system under test and of the signal acquisition device. When the impedance is mismatched or the level is unsuitable, match the impedance and adjust the level to within the optimal operating dynamic range of the system and the acquisition card.
5.3.2 Reference Test Voice Conditions
In the MOS evaluation test of the speech system under test, additive noise distortion conditions, multiplicative noise distortion conditions and standard speech encoder conditions should be introduced and evaluated together with the conditions under test.
5.3.2.1 Additive Noise Distortion Conditions
These represent commonly occurring noise distortion. The noise should be continuous, Gaussian-distributed random white noise, at eight signal-to-noise ratio (SNR) levels: -10, -5, 0, 5, 10, 15, 20 and 30 dB.
5.3.2.2 Multiplicative Noise Distortion Conditions
Multiplicative noise simulates the quantization noise of a waveform encoder: it appears only when the signal is present and is random noise that fluctuates with the signal. It should likewise cover eight SNR levels: -10, -5, 0, 5, 10, 15, 20 and 30 dB.
5.3.2.3 Standard speech coder conditions
The MOS test should also include distorted speech conditions from some standard speech coders. The following speech coders may be selected:
a. Waveform coders: ITU-T G.726 32, 24 and 16 kbit/s ADPCM, and GB/T 13427 16 kbit/s CVSD;
b. Parametric coders: ITU-T G.729 8 kbit/s, or 2.4 and 4.8 kbit/s IMBE coders;
c. Reference source signal: speech sampled at 8 kHz and coded in linear 16-bit PCM.
5.3.3 Generation of reference test speech conditions
Each reference distortion condition generally includes the speech recording of one test speech source table.
5.3.3.1 Generation of additive noise distortion conditions
The conditions can be generated by digital or analog methods. Figure 2 is a block diagram of the generation of additive noise distortion conditions.
[Figure 2 Generation of additive noise distortion conditions. Blocks: level adjustment (speech), random noise source or generator, level adjustment (noise), 200-3400 Hz filter]
S(t): analog speech signal; S(n): digital speech signal; N(t): analog noise signal; N(n): digital noise signal; SN(t): analog distortion signal; SN(n): digital distortion signal
When generated in analog mode, the level of the signal S(t) is generally kept fixed and only the noise level is adjusted. When generated in digital mode, the required distortion conditions are produced file by file. The digital filtering can be implemented with a fifth-order elliptic low-pass filter and a third-order Chebyshev high-pass filter; the main parameters are given in B1 of Appendix B (Supplement), and the filter characteristics are shown in Figure B1 of Appendix B (Supplement). Whichever method is used to generate the distorted signal, the output signal must not be distorted (clipped) at its maximum level. For distortion (SNR) conditions generated digitally, the speech level should also be kept basically constant.
5.3.3.2 Generation of multiplicative noise distortion conditions
Multiplicative noise is generated according to Figure 3.
[Figure 3 Generation of multiplicative noise. Blocks: level adjustment, random noise or sequence generator, level adjustment, 200-3400 Hz bandpass filter; signals SN(n) and SN'(t)]
The characteristics of the bandpass filter are the same as those in 5.3.3.1.
5.3.3.3 Generation of speech conditions for standard speech encoders
The speech conditions for the various encoders are equally valid whether generated by hardware or by software. They are generated in accordance with 5.3.1. Each condition should include at least the test speech of one test speech source table. The speech of the source table is selected directly from the test speech source set.
5.4 Evaluation procedures
5.4.1 The test team
Generally, the team consists of 24 to 32 men and women, half of whom are close in age to the users of the system under test. When this size cannot be reached, the number of voting rounds is increased to make up for the shortage of listeners.
5.4.2 Personnel Training
Before the formal test, a short training session is held so that the listeners master the test requirements and precautions and audition non-test speech at each quality level. Through the audition they acquire a general idea of each quality level. The purpose is not to become familiar with the test speech, but to learn which voting opinion to give for speech of the quality heard. Listeners with previous MOS testing experience do not need training. During training, the participants can first gain a general understanding of the boundaries between the levels in the "MOS Rating Level Reference" (see Appendix D). These boundaries are for reference only. When a test has many conditions, for example more than 200 test units, some speech from the test conditions may also be played in random order for training.
5.4.3 Preparation before the test
5.4.3.1 Randomization of test speech materials
The MOS test speech units must be presented to the listeners in random order. Before the test, all test speech units of the tested system and all reference test speech units must be edited into a randomized playback order; a randomization program can be used for this. Units that were adjacent in the original order must be more than 10 positions apart after randomization, that is, $|P(i) - P(j)| > 10$, where $i$ and $j$ are adjacent original sequence numbers and $P(\cdot)$ is the position in the randomized order. The order of the speech units before randomization should be generated according to the test conditions.
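A plain shuffle almost never satisfies the spacing rule of 5.4.3.1 for a long list, so a randomization program has to enforce it explicitly. The following is a minimal sketch, assuming the rule means that units adjacent in the original order must end up more than 10 positions apart. It places units one at a time at random admissible positions and restarts on a dead end; the function name and the restart strategy are choices made for this example, not something the standard prescribes.

```python
import random


def randomize_order(n_units, min_gap=10, seed=None, max_attempts=1000):
    """Return position[i], the playback position of original unit i.

    Guarantees abs(position[i] - position[i + 1]) > min_gap for every pair
    of originally adjacent units, or raises RuntimeError if no such order
    is found within max_attempts random restarts.
    """
    rng = random.Random(seed)
    for _ in range(max_attempts):
        free = list(range(n_units))
        position = []
        for _ in range(n_units):
            candidates = [p for p in free
                          if not position or abs(p - position[-1]) > min_gap]
            if not candidates:
                break  # dead end: abandon this attempt and restart
            choice = rng.choice(candidates)
            free.remove(choice)
            position.append(choice)
        else:
            return position
    raise RuntimeError("no valid order found; too few units for this gap?")
```

The playback list follows by inverting the result: the unit played at position `p` is the index `i` with `position[i] == p`. Recording the seed alongside the voting forms makes the order reproducible for later error checking.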
5.4.3.2 Opinion form
When no voting machine is available, a voting opinion form must be prepared in advance. In addition to the voting opinion levels, the form must list the voting order and the randomized order consistent with the playback order, to avoid errors during voting and to make it easy to tally the results, look them up and correct errors. The form of the record table is shown in Appendix C (Supplement).
5.4.4 Test conditions
5.4.4.1 Listening workstation
The block diagram of the MOS evaluation experiment is shown in Figure 4. The listening workstation consists of a multimedia computer, a power amplifier, headphones and the listening environment. The requirements are as follows:
a. Multimedia computer
In addition to the basic configuration of a PC, it should include:
CD-ROM drive: 8x or faster
Operating system: Win95, Win97 or Win98
Application: a self-written playback program or Cool Edit Pro
Sound card: 16 bits or above, or a product of equivalent performance
b. Power amplifier: power not less than 30 W, driving multiple pairs of headphones (N = 24 to 32).
c. Listening conditions: the requirements for headphones and the listening environment follow 5.1.2 of GB/T 13504.
[Figure 4 MOS evaluation experiment block diagram. Blocks: tested speech material library, multimedia computer, playback room, specialized listening room]
5.4.4.2 Listening sound level
Set according to 8.5.1 of GB/T 13504.
5.4.4.3 Playback
Playback is by computer, paced with a pause of 3 to 4 s between test units so that the listeners can consider and vote. There is usually a short break of 1 to 2 min after every 25 to 30 consecutive votes. Replaying is generally unnecessary, but a test speech unit with poor sound quality may be repeated once or twice if necessary, under the control of a self-written program or Cool Edit that supports repeated playback.
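The pacing rules of 5.4.4.3 are easy to encode in a self-written playback program. The sketch below only plans the session and estimates its length; it does not play audio. The default values (3.5 s pause, a break after every 25 votes, a 90 s break) are picked from inside the ranges the text gives and are otherwise arbitrary, and the function name is illustrative.

```python
def playback_schedule(unit_durations_s, pause_s=3.5,
                      votes_per_block=25, break_s=90.0):
    """Plan a listening session.

    Returns (events, total_seconds), where events is a list of
    ("play", unit_index), ("pause", seconds) and ("break", seconds) tuples.
    No break is scheduled after the final unit.
    """
    events = []
    total = 0.0
    last_index = len(unit_durations_s) - 1
    for i, duration in enumerate(unit_durations_s):
        events.append(("play", i))
        total += duration
        events.append(("pause", pause_s))
        total += pause_s
        if (i + 1) % votes_per_block == 0 and i != last_index:
            events.append(("break", break_s))
            total += break_s
    return events, total
```

For 50 units of 10 s each, the defaults give one break and a session of 765 s, which helps decide how many units fit in one sitting.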
5.4.4.4 Voting
The listener must listen to the first sentence of each test speech unit and, after careful consideration, mark the vote (√) in the corresponding column of the form for that unit. The voting opinions are divided into five levels, the highest being excellent; refer to Appendix D (Supplement) for the definitions.
5.4.5 Scoring
Count the voting opinions for the test speech conditions of the tested speech system and for each reference test condition. Let $N_k(m)$ be the number of votes for opinion level $k$ cast on the $m$-th test speech unit of condition $l$, and let $N_m$ be the total number of votes for test unit $m$. The statistics are then:
a. Weighted mean opinion score
The weights of the opinion levels are given in 5.4.4.4. The weighted mean opinion score of the $m$-th test unit of test condition $l$ is calculated as follows:
$$\mathrm{MOS}(l,m) = \frac{1}{N_m}\sum_{k=1}^{5} N_k(m)\,W(k) \qquad (1)$$
where $W(1)=5$, $W(2)=4$, and so on, with $k = 1, 2, \dots, 5$.
b. Standard deviation
Using the weighted standard deviation, the deviation of test unit $m$ is:
$$\sigma(l,m) = \left[\frac{1}{N_m}\sum_{k=1}^{5} N_k(m)\,\bigl(W(k)-\mathrm{MOS}(l,m)\bigr)^{2}\right]^{1/2} \qquad (2)$$
c. Representation of results
The evaluation result for test speech system condition $l$ is given by the following formula, with $m = 1, 2, 3, \dots, M$:
$$\mathrm{MOS}(l) = \dots \pm \dots$$
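The per-unit statistics of 5.4.5 can be computed directly from a vote tally. The sketch below assumes the weights continue the stated pattern W(1)=5, W(2)=4 down to W(5)=1, and that the weighted standard deviation divides by the total number of votes, in line with the "weighted by voting probability" wording of definition 3.2. The condition-level result is not computed because its formula is cut off in this excerpt.

```python
import math

# W(1)=5, W(2)=4, "and so on": level 1 is the best opinion, level 5 the worst.
WEIGHTS = {1: 5, 2: 4, 3: 3, 4: 2, 5: 1}


def weighted_mos(votes):
    """Weighted mean opinion score of one test unit.

    votes maps opinion level k (1..5) to the number of votes for that level.
    """
    total = sum(votes.values())
    if total == 0:
        raise ValueError("no votes recorded for this unit")
    return sum(n * WEIGHTS[k] for k, n in votes.items()) / total


def weighted_std(votes):
    """Weighted standard deviation of the votes for one test unit."""
    total = sum(votes.values())
    mos = weighted_mos(votes)
    variance = sum(n * (WEIGHTS[k] - mos) ** 2
                   for k, n in votes.items()) / total
    return math.sqrt(variance)
```

For a 24-listener tally such as `{1: 6, 2: 12, 3: 4, 4: 2, 5: 0}`, `weighted_mos` returns 94/24, about 3.92, and `weighted_std` returns about 0.86.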