Statistical interpretation of data - Detection and treatment of outliers in the sample from exponential distribution
ICS 03.120.30
National Standard of the People's Republic of China
GB/T 8056-2008
Replaces GB/T 8056-1987
Statistical interpretation of data - Detection and treatment of outliers in the sample from exponential distribution
Published on 2008-07-16
Implemented on 2009-01-01
Issued by the General Administration of Quality Supervision, Inspection and Quarantine of the People's Republic of China and the Standardization Administration of the People's Republic of China
1 Scope
2 Normative references
3 Terms, definitions and symbols
3.1 Terms and definitions
3.2 Symbols and abbreviations
4 Outlier judgment
4.1 Sources and judgment of outliers
4.2 Three cases of outliers
4.3 Upper limit of the number of outliers detected
4.4 Single outlier case
4.5 Multiple outlier cases
5 Outlier processing
5.1 Processing method
5.2 Processing rules
5.3 Record
6 Judgment rules for single outliers
6.1 Choice of test statistics
6.2 Test rules for upper side cases
6.3 Test rules for lower side cases
6.4 Test rules for two-sided cases
7 Rules for judging multiple outliers
7.1 Test steps
7.2 Example of multiple outlier test
8 Rules for judging outliers in fixed-number truncated samples
8.1 Fixed-number truncated samples
8.2 Rules for testing outliers
8.3 Example of outlier test in fixed-number truncated samples
Appendix A (Normative Appendix) Critical value table
References
Foreword
This standard replaces GB/T 8056-1987, "Statistical processing and interpretation of data - Judgment and treatment of abnormal values in exponential samples".
Compared with GB/T 8056-1987, the main technical changes in this standard are:
- general terms, definitions and symbols are added;
- "judgment and treatment of abnormal values in exponential samples" is changed to "judgment and treatment of outliers in samples from exponential distribution";
- the terms "detected abnormal values" and "highly abnormal values" are changed to "outliers" and "statistical outliers", and their meanings and differences are further clarified;
- definitions of the detection level and the deletion level are added. The provision "the detection level is generally 1 %, 5 % or 10 %" in the original standard is changed to "unless otherwise agreed by the parties to an agreement based on this standard, the detection level shall be 0.05", and it is explicitly stipulated that, unless otherwise agreed by the parties to an agreement based on this standard, the deletion level shall be 0.01;
- test steps for "statistical outliers" are added for each situation;
- "no abnormal values" and "no highly abnormal values" are changed to "no outliers found" and "no statistical outliers found", respectively;
- examples of the two-sided outlier test, the multiple outlier test, and the outlier test for fixed-number truncated samples are added.
Appendix A of this standard is a normative appendix.
This standard was proposed by, and is under the jurisdiction of, the National Technical Committee for Standardization of Statistical Methods.
Drafting organizations: Ningbo Institute of Technology, China National Institute of Standardization, Peking University, Shanghai Normal University, and Fuzhou Chunlun Tea Co., Ltd.
Main drafters: Jing Guangzhu, Ding Wenxing, Yu Zhenfan, Liang Fangchu, Sun Shanze, Fei Liang, and Fu Tianlong.
Previous edition replaced by this standard: GB/T 8056-1987.
Introduction
Scientific research, industrial and agricultural production, and management all depend on data, and organizing, analysing and interpreting those data depends on statistical methods. Statistics is the discipline that studies how to organize, analyse and correctly interpret numerical data. People obtain numerical data from many different sources; such data are usually disordered and must be organized and condensed before they can be used. Sound statistical methods make it possible to arrange data in an orderly way and to express the characteristics of a large body of data through graphs or a small number of important parameters. This helps avoid incorrect interpretation and minimizes the cost of obtaining satisfactory data, thereby improving economic benefit.
"Statistical processing and interpretation of data" comprises the following national standards:
Determination of statistical tolerance intervals (GB/T 3359)
Estimation and confidence interval of the mean (GB/T 3360)
Comparison of two means in the case of paired observations (GB/T 3361)
Estimation and test of binomial distribution parameters (GB/T 4088)
Estimation and test of Poisson distribution parameters (GB/T 4089)
Normality test (GB/T 4882)
Judgment and treatment of outliers in normal samples (GB/T 4883)
Estimation and test of mean and variance of normal distribution (GB/T 4889)
Power of test of mean and variance of normal distribution (GB/T 4890)
Judgment and treatment of outliers in samples of type I extreme value distribution (GB/T 6380)
Parameter estimation of gamma distribution (Pearson type III distribution) (GB/T 8055)
Judgment and treatment of outliers in samples of exponential distribution (GB/T 8056)
There is no corresponding international standard for this standard.
Statistical interpretation of data - Detection and treatment of outliers in the sample from exponential distribution
1 Scope
This standard specifies the general principles and implementation steps for judging and treating outliers in samples from an exponential distribution. This standard applies to samples from exponential populations.
2 Normative references
The provisions of the following documents become provisions of this standard through reference in this standard. For dated references, subsequent amendments (excluding corrigenda) and revisions do not apply to this standard; however, parties to an agreement based on this standard are encouraged to investigate whether the latest versions of these documents can be used. For undated references, the latest version applies to this standard.
GB/T 4086.4 Numerical tables of statistical distributions - F distribution
ISO 3534-1 Statistics - Vocabulary and symbols - Part 1: General statistical terms and terms used in probability
ISO 3534-2 Statistics - Vocabulary and symbols - Part 2: Applied statistics
3 Terms, definitions and symbols
The terms, definitions and symbols established in ISO 3534-1 and ISO 3534-2, together with the following, apply to this standard. For ease of reference, some terms are quoted directly from those standards.
3.1 Terms and definitions
3.1.1
Exponential distribution
A continuous distribution with the distribution function
$$F(x) = \begin{cases} 1 - e^{-x/\beta}, & x \ge 0 \\ 0, & x < 0 \end{cases}$$
where β > 0.
3.1.2
Outlier
One or more observations in a sample that lie far from the other observations, suggesting that they may come from a different population.
Note: Outliers are divided into stragglers and statistical outliers according to their degree of significance.
3.1.3
Statistical outlier
An outlier that is statistically significant at the deletion level (3.1.6).
3.1.4
Straggler
An outlier that is statistically significant at the detection level (3.1.5) but not significant at the deletion level (3.1.6).
3.1.5
Detection level
The significance level of the statistical test specified for detecting outliers.
Note: Unless otherwise agreed by the parties to an agreement based on this standard, the detection level shall be 0.05.
3.1.6
Deletion level
The significance level of the statistical test specified for testing whether a detected outlier is highly outlying.
Note: The deletion level shall not exceed the detection level. Unless otherwise agreed by the parties to an agreement based on this standard, the deletion level shall be 0.01.
3.1.7
p-quantile
The smallest value x for which the distribution function F(x) is not less than p (0 < p < 1).
3.2 Symbols and abbreviations
$n$: sample size (number of observations)
$\bar{x}$: sample mean
$\alpha$: the significance level used to detect outliers, referred to as the detection level
$\alpha^*$: the significance level used to test whether a detected outlier is a statistical outlier, referred to as the deletion level ($\alpha^* < \alpha$)
$x_{(i)}$: the i-th observation after the observations are sorted from smallest to largest
$T_{n,n}$: when the sample size n ≤ 100, the statistic used to test whether the largest observation $x_{(n)}$ is an outlier
$T_{n,1}$: when the sample size n ≤ 100, the statistic used to test whether the smallest observation $x_{(1)}$ is an outlier
$E_{n,n}$: when the sample size n > 100, the statistic used to test whether the largest observation $x_{(n)}$ is an outlier
$E_{n,1}$: when the sample size n > 100, the statistic used to test whether the smallest observation $x_{(1)}$ is an outlier
(symbol not legible in the source): in a truncated sample, the statistic used to judge whether $x_{(1)}$ is an outlier
$F_p(\nu_1, \nu_2)$: the p-quantile of the F distribution with $\nu_1$ and $\nu_2$ degrees of freedom
$T_{1-\alpha}(n, n)$: the critical value for the test based on the statistic $T_{n,n}$ when the detection level is α
(symbol not legible in the source): the critical value for the test based on the statistic $T_{n,1}$ when the detection level is α
4 Outlier judgment
4.1 Source and judgment of outliers
4.1.1 Source
Outliers fall into two categories according to their cause:
a) The first category is an extreme manifestation of the inherent variability of the population. Such outliers belong to the same population as the remaining observations in the sample.
b) The second category results from accidental deviations in test conditions or test methods, or from errors in observation, recording or calculation. Such outliers do not belong to the same population as the remaining observations in the sample.
4.1.2 Determination
An outlier can usually be identified directly from technical or physical evidence, for example when the experimenter already knows that the test deviated from the specified test method, or that the test instrument was faulty. When no such cause is evident, the methods specified in this standard can be used.
4.2 Three situations of outliers
This standard judges outliers in a sample under the following three situations:
a) Upper-side situation: according to actual conditions or past experience, the outliers are all high-end values;
b) Lower-side situation: according to actual conditions or past experience, the outliers are all low-end values;
c) Two-sided situation: according to actual conditions or past experience, the outliers may be either high-end or low-end values.
Notes:
1) The upper-side and lower-side situations are collectively referred to as the one-sided situation;
2) If it cannot be established that the situation is one-sided, it shall be treated as two-sided.
4.3 Upper limit of the number of outliers detected
An upper limit on the number of outliers to be detected in the sample shall be specified (it should be small relative to the sample size). When the number of detected outliers reaches this limit, the sample shall be studied carefully and handled with caution.
4.4 Single outlier case
The test rules are as follows:
a) The null hypothesis is that all observations come from the same population. According to actual conditions or past experience, select one of the situations in 4.2 as the alternative hypothesis, and select the statistic for judging outliers according to statistical principles (see 6.1 and 8.2);
b) Determine an appropriate significance level;
c) Determine the critical value of the test from the significance level and the sample size;
d) Calculate the value of the statistic from the observations and make a judgment by comparing it with the critical value.
4.5 Multiple outlier case
When more than one outlier is allowed to be detected, apply the test rules specified in 4.4 repeatedly, and stop according to the following rules:
a) If no outlier is detected, the whole test stops.
b) If an outlier is detected and the total number of detected outliers has reached the upper limit (4.3), the test stops; otherwise, remove the detected outlier and continue testing the remaining observations with the same detection level and the same rules.
5 Outlier processing
5.1 Processing methods
The methods for processing outliers are:
a) Retain the outliers and use them in subsequent data processing;
b) Correct the outliers when their actual cause is found, otherwise retain them;
c) Remove the outliers without adding new observations;
d) Remove the outliers and add new observations, or replace them with appropriate interpolated values.
5.2 Processing rules
For detected outliers, every effort should be made to find their technical and physical causes as the basis for handling them. Depending on the nature of the actual problem, the cost of finding and determining the cause of the outliers, the benefit of correctly identifying outliers, and the risk of wrongly removing normal observations should be weighed in order to adopt one of the following three rules:
a) If a technical or physical cause of the outlier is found, the outlier shall be removed or corrected; otherwise it shall not be removed or corrected.
b) If a technical or physical cause of the outlier is found, the outlier shall be removed or corrected; otherwise, stragglers are retained and statistical outliers are removed or corrected. When the same test rule is applied repeatedly to detect multiple outliers, each detected outlier shall be tested further to determine whether it is a statistical outlier. If an outlier detected at some step is a statistical outlier, that outlier and all outliers detected before it (including stragglers) shall be removed or corrected.
c) All detected outliers (including stragglers) shall be removed or corrected.
5.3 Record
Observations that are removed or corrected, together with the reasons, shall be recorded for future reference.
6 Rules for judging a single outlier
6.1 Selection of the test statistic
When the sample size n ≤ 100, use the statistic $T_{n,n}$ (or $T_{n,1}$) for the test; when the sample size n > 100, use the statistic $E_{n,n}$ (or $E_{n,1}$) for the test.
6.2 Test rules for the upper-side case
6.2.1 Sample size n ≤ 100
When the sample size n ≤ 100, the implementation steps are as follows:
a) Calculate the value of the statistic $T_{n,n}$:
$$T_{n,n} = \frac{x_{(n)}}{\sum_{i=1}^{n} x_i} \qquad (1)$$
b) Determine the detection level α and find the critical value $T_{1-\alpha}(n, n)$ in Table A.1 of Appendix A;
c) When $T_{n,n} > T_{1-\alpha}(n, n)$, judge $x_{(n)}$ to be an outlier; otherwise judge that $x_{(n)}$ has not been found to be an outlier;
d) For a detected outlier $x_{(n)}$, determine the deletion level $\alpha^*$ and find the critical value $T_{1-\alpha^*}(n, n)$ in Table A.1. When $T_{n,n} > T_{1-\alpha^*}(n, n)$, judge $x_{(n)}$ to be a statistical outlier; otherwise judge that $x_{(n)}$ has not been found to be a statistical outlier (that is, $x_{(n)}$ is a straggler).
6.2.2 Sample size n > 100
When the sample size n > 100, the implementation steps are as follows:
a) Calculate the value of the statistic $E_{n,n}$:
$$E_{n,n} = \frac{(n-1)\,[x_{(n)} - x_{(n-1)}]}{\sum_{i=1}^{n} x_{(i)} - x_{(n)} + x_{(n-1)}} \qquad (2)$$
b) Determine the detection level α and find $F_{1-\alpha}(2, 2n-2)$ in the quantile table of the F distribution (see GB/T 4086.4);
c) When $E_{n,n} > F_{1-\alpha}(2, 2n-2)$, judge $x_{(n)}$ to be an outlier; otherwise judge that $x_{(n)}$ has not been found to be an outlier;
d) For a detected outlier $x_{(n)}$, determine the deletion level $\alpha^*$ and find $F_{1-\alpha^*}(2, 2n-2)$ in the quantile table of the F distribution (see GB/T 4086.4). When $E_{n,n} > F_{1-\alpha^*}(2, 2n-2)$, judge $x_{(n)}$ to be a statistical outlier; otherwise judge that $x_{(n)}$ has not been found to be a statistical outlier (that is, $x_{(n)}$ is a straggler).
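As a rough illustration of the steps in 6.2.2, the Python sketch below computes a spacing-based statistic for the largest observation and compares it with upper quantiles of the F(2, 2n-2) distribution. Two points are assumptions made for this sketch and not text of the standard. First, the statistic is taken as (n-1) times the gap between the two largest observations, divided by the sum of all observations minus that gap, a form that follows an F(2, 2n-2) distribution when all observations are exponential. Second, the quantile is computed from the closed form that holds when the first degree of freedom is 2, F_p(2, m) = (m/2)((1-p)^(-2/m) - 1), in place of a lookup in GB/T 4086.4. All function names and the simulated data are hypothetical.

```python
import random


def e_upper(data):
    """Spacing-based statistic for the largest observation (assumed form)."""
    xs = sorted(data)
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 observations")
    gap = xs[-1] - xs[-2]                 # x_(n) - x_(n-1)
    return (n - 1) * gap / (sum(xs) - gap)


def f_quantile_df1_2(p, m):
    """p-quantile of F(2, m); this closed form is valid only when the first df is 2."""
    return (m / 2.0) * ((1.0 - p) ** (-2.0 / m) - 1.0)


def judge_upper_large_sample(data, alpha=0.05, alpha_star=0.01):
    """Return (E, verdict) following steps a) to d); intended for n > 100."""
    n = len(data)
    e = e_upper(data)
    m = 2 * n - 2
    if e <= f_quantile_df1_2(1.0 - alpha, m):
        return e, "no outlier found"
    if e > f_quantile_df1_2(1.0 - alpha_star, m):
        return e, "statistical outlier"
    return e, "straggler"


# Hypothetical data: 150 simulated exponential lifetimes plus one planted large value.
rng = random.Random(0)
sample = [rng.expovariate(1.0) for _ in range(150)]
sample.append(max(sample) + 25.0)
print(judge_upper_large_sample(sample))
```

The defaults 0.05 and 0.01 mirror the detection and deletion levels given in 3.1.5 and 3.1.6; any other agreed levels can be passed explicitly.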
6.2.3 Test example for the upper case
15 samples are randomly selected from a type of electronic product and subjected to life tests under certain conditions. The failure times are (unit: kh):
Experience shows that the life of this type of electronic product follows an exponential distribution, and the user is concerned with whether the data contain upper-side outliers. The test method in 6.2.1 is therefore used.
In this example the sample size is n = 15, the sample maximum is $x_{(15)} = 5.1020$, and $\sum_{i=1}^{15} x_i = 16.78$. Formula (1) gives $T_{15,15} = 5.1020 / 16.78 \approx 0.3041$.
With the detection level α = 0.05, the critical value found in Table A.1 is $T_{0.95}(15, 15) = 0.3346$. Because $T_{15,15} < T_{0.95}(15, 15)$, $x_{(15)}$ is not found to be an outlier.
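The arithmetic of this example can be checked with a short Python sketch. It computes the ratio of the largest observation to the sum of all observations from the summary values quoted in the example and compares it with the tabulated critical value 0.3346. As an extra cross-check that is not part of the standard, it also evaluates the exact upper-tail probability of this ratio for an exponential sample, P(T > t) = Σ_{j=1}^{⌊1/t⌋} (-1)^(j-1) C(n, j) (1 - jt)^(n-1), which is a known result for the maximum-to-sum ratio. The function names, and the use of a tail probability in place of Table A.1, are assumptions made for illustration.

```python
import math


def t_upper(x_max, total):
    """T statistic: largest observation divided by the sum of all observations."""
    return x_max / total


def upper_tail_prob(t, n):
    """Exact P(max/sum > t) for n i.i.d. exponential observations."""
    if t >= 1.0:
        return 0.0
    if t < 1.0 / n:
        return 1.0                        # the largest value is at least sum/n
    k = min(int(math.floor(1.0 / t)), n)
    return sum(
        (-1) ** (j - 1) * math.comb(n, j) * max(0.0, 1.0 - j * t) ** (n - 1)
        for j in range(1, k + 1)
    )


def classify(t, n, alpha=0.05, alpha_star=0.01):
    """Assumed p-value version of steps c) and d) in 6.2.1."""
    p = upper_tail_prob(t, n)
    if p >= alpha:
        return "no outlier found"
    return "statistical outlier" if p < alpha_star else "straggler"


n, x_max, total = 15, 5.1020, 16.78       # summary values quoted in the example
critical = 0.3346                         # T_0.95(15, 15) as quoted from Table A.1
t = t_upper(x_max, total)

print("T =", round(t, 4), "exceeds critical value:", t > critical)
print("tail probability at the critical value:", upper_tail_prob(critical, n))
print("verdict:", classify(t, n))
```

The tail probability evaluated at 0.3346 comes out very close to 0.05, which is consistent with that value being the 0.95 critical point for n = 15.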
Determine the detection level α=0.05, find the critical value To.95(15,15)=0.3346 in Table A.1, because T1s.1s3), the test stops: otherwise, the same detection level and opposite rules are used to continue testing the remaining observations after removing the detected outliers. 5 Outlier processing
5.1 Processing methods
The methods for processing outliers are:
a) retain the outliers and use them for subsequent data processing; b) correct the outliers when the actual reasons are found, otherwise retain them; c) eliminate the outliers and do not add observations; d) remove the outliers and add new observations or replace them with appropriate interpolation values. 5.2 Processing rules
For the detected outliers, their technical and physical reasons should be found as much as possible as a basis for processing the outliers. According to the nature of the actual problem, the cost of finding and determining the cause of outliers, the benefits of correctly determining outliers and the risk of incorrectly eliminating normal observations should be weighed to determine whether to implement one of the following three rules: a) If the cause of the outlier is found technically or physically, it should be eliminated or corrected; otherwise, it should not be eliminated or corrected. b) If the cause of the outlier is found technically or physically, it should be eliminated or corrected; otherwise, the outlier should be retained and the statistical outlier should be eliminated or corrected. In the case of repeatedly using the same test rule to test multiple outliers, each time an outlier is detected, it should be tested again to see if it is a statistical outlier. If an outlier detected in a certain time is a statistical outlier, this outlier and the outliers detected before it (including outliers) should be eliminated or corrected. c) All detected outliers (including outliers) should be eliminated or corrected: 5.3 Record
The observations that are eliminated or corrected, and the reasons, should be recorded for future reference.
6 Rules for judging a single outlier
6.1 Selection of the test statistic
When the sample size $n \le 100$, use the statistic $T_{n(n)}$ (or $T_{n(1)}$) for the test; when the sample size $n > 100$, use the statistic $E_{n(n)}$ (or $E_{n(1)}$) for the test.
6.2 Test rules for the upper case
6.2.1 Test when sample size n ≤ 100
When the sample size $n \le 100$, the implementation steps are as follows:
a) Calculate the value of the statistic $T_{n(n)}$:
$$T_{n(n)} = \frac{x_{(n)}}{\sum_{i=1}^{n} x_i} \qquad (1)$$
b) Determine the detection level $\alpha$, and find the critical value $T_{1-\alpha}(n,n)$ in Table A.1 of Appendix A.
c) When $T_{n(n)} > T_{1-\alpha}(n,n)$, $x_{(n)}$ is judged to be an outlier; otherwise, it is judged that $x_{(n)}$ was not found to be an outlier.
d) For the detected outlier $x_{(n)}$, determine the rejection level $\alpha^*$, and find the critical value $T_{1-\alpha^*}(n,n)$ in Table A.1. When $T_{n(n)} > T_{1-\alpha^*}(n,n)$, $x_{(n)}$ is judged to be a statistical outlier; otherwise, it is judged that $x_{(n)}$ was not found to be a statistical outlier (that is, $x_{(n)}$ is a divergent value).
6.2.2 Sample size n > 100
When the sample size $n > 100$, the implementation steps are as follows:
a) Calculate the value of the statistic $E_{n(n)}$:
$$E_{n(n)} = \frac{(n-1)\,[x_{(n)} - x_{(n-1)}]}{\sum_{i=1}^{n} x_i - x_{(n)} + x_{(n-1)}} \qquad (2)$$
b) Determine the detection level $\alpha$, and find $F_{1-\alpha}(2, 2n-2)$ in the quantile table of the F distribution (see GB/T 4086.4).
c) When $E_{n(n)} > F_{1-\alpha}(2, 2n-2)$, $x_{(n)}$ is judged to be an outlier; otherwise, it is judged that $x_{(n)}$ was not found to be an outlier.
d) For the detected outlier $x_{(n)}$, determine the rejection level $\alpha^*$, and find $F_{1-\alpha^*}(2, 2n-2)$ in the quantile table of the F distribution (see GB/T 4086.4). When $E_{n(n)} > F_{1-\alpha^*}(2, 2n-2)$, $x_{(n)}$ is judged to be a statistical outlier; otherwise, it is judged that $x_{(n)}$ was not found to be a statistical outlier (i.e., it is a divergent value).
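The upper-side procedures of 6.2.1 and 6.2.2 can be sketched in Python as follows. This is only an illustration under stated assumptions, not part of the standard. All function names are hypothetical. For $n \le 100$ the caller must supply the critical values read from Table A.1. The form of $E_{n(n)}$ is a reconstruction chosen to be consistent with the $F(2, 2n-2)$ reference distribution. For $n > 100$ the sketch replaces the table lookup with the closed-form quantile of an F distribution with 2 numerator degrees of freedom, $F_{p}(2, m) = \frac{m}{2}\left[(1-p)^{-2/m} - 1\right]$, which is a convenience assumed here rather than something the standard prescribes. The default levels 0.05 and 0.01 follow the defaults stated in the foreword.

```python
def t_statistic_upper(xs):
    """T_{n(n)}: largest observation divided by the sample total."""
    return max(xs) / sum(xs)


def e_statistic_upper(xs):
    """E_{n(n)} (reconstructed form): (n-1) * top gap / (total - top gap)."""
    s = sorted(xs)
    n = len(s)
    gap = s[-1] - s[-2]                 # x_(n) - x_(n-1)
    return (n - 1) * gap / (sum(s) - gap)


def f_quantile_2(p, m):
    """p-quantile of F(2, m); closed form valid only for 2 numerator df."""
    return (m / 2.0) * ((1.0 - p) ** (-2.0 / m) - 1.0)


def classify(stat, crit_detect, crit_reject):
    """Apply steps c) and d): detection level first, then rejection level."""
    if not stat > crit_detect:
        return "no outlier found"
    if stat > crit_reject:
        return "statistical outlier"
    return "divergent value"


def upper_test_small_n(xs, crit_detect, crit_reject):
    """n <= 100: critical values come from Table A.1 (supplied by the caller)."""
    if len(xs) > 100:
        raise ValueError("use upper_test_large_n for n > 100")
    return classify(t_statistic_upper(xs), crit_detect, crit_reject)


def upper_test_large_n(xs, alpha=0.05, alpha_star=0.01):
    """n > 100: compare E_{n(n)} with F quantiles on (2, 2n-2) df."""
    n = len(xs)
    if n <= 100:
        raise ValueError("use upper_test_small_n for n <= 100")
    m = 2 * n - 2
    e = e_statistic_upper(xs)
    return classify(e, f_quantile_2(1 - alpha, m), f_quantile_2(1 - alpha_star, m))
```

Keeping the critical values as explicit arguments for the small-sample case avoids embedding any part of Table A.1 in the code, so the table in Appendix A remains the single source of those values.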
6.2.3 Test example for the upper case
15 samples are randomly selected from a type of electronic product and subjected to life tests under certain conditions. The failure times are (unit: kh):
Experience shows that the life of this type of electronic product follows an exponential distribution, and here the user is concerned about whether there are upper outliers in the data. Based on this, the test method in 6.2.1 can be used.
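In this example, the sample size is $n = 15$, the sample maximum is $x_{(n)} = 5.1020$, and $\sum_{i=1}^{n} x_i = 16.78$. Formula (1) gives $T_{15(15)} = 5.1020/16.78 \approx 0.304$.
With the detection level $\alpha = 0.05$, the critical value found in Table A.1 is $T_{0.95}(15,15) = 0.3346$. Because $T_{15(15)} < T_{0.95}(15,15)$, no outlier is found.
[...] determine the level $\alpha^*$, and find the critical value $T(n,1)$ in Table A.2.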
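As a check on the arithmetic of the upper-side example, the following short Python sketch recomputes the statistic from the quoted values ($n = 15$, $x_{(n)} = 5.1020$, $\sum x_i = 16.78$) and compares it with the critical value 0.3346 from Table A.1. The variable names are illustrative only. The individual observations are not reproduced here, so the sketch starts from the sample maximum and the sample total.

```python
# Values quoted in the example (not recomputed from raw observations).
n = 15
x_max = 5.1020        # sample maximum x_(n), in kh
total = 16.78         # sum of all 15 failure times, in kh
crit_detect = 0.3346  # T_{0.95}(15, 15) from Table A.1

t_stat = x_max / total            # formula (1)
is_outlier = t_stat > crit_detect

print(f"n = {n}, T = {t_stat:.4f}, outlier detected: {is_outlier}")
# prints: n = 15, T = 0.3041, outlier detected: False
```

Because the statistic stays below the detection-level critical value, step d) with the rejection level is not needed for this sample.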