Statistical interpretation of data - Detection and treatment of outliers in the sample from type I extreme value distribution
ICS 03.120.30
National Standard of the People's Republic of China
GB/T 6380-2008
Replaces GB/T 6380-1986
Statistical interpretation of data - Detection and treatment of outliers in the sample from type I extreme value distribution
Published on July 28, 2008
General Administration of Quality Supervision, Inspection and Quarantine of the People's Republic of China
Standardization Administration of the People's Republic of China
Implementation on January 1, 2009
1 Scope
2 Normative references
3 Terms, definitions and symbols
3.1 Terms and definitions
3.2 Symbols and abbreviations
4 Outlier determination
4.1 Source and determination of outliers
4.2 Upper limit of the number of outliers to be detected
4.3 Single outlier case
4.4 Multiple outlier case
5 Outlier processing
5.1 Processing methods
5.2 Processing rules
5.3 Record
6 Rules for judging single outliers
6.1 Selection of test method
6.2 Dixon test method
6.3 Irwin test method
7 Rules for judging multiple outliers
7.1 Test steps
7.2 Example of multiple outlier test
Appendix A (Normative Appendix) Critical value table
References
This standard replaces GB/T 6380-1986 "Statistical processing and interpretation of data - Judgment and treatment of abnormal values in samples from type I extreme value distribution".
Compared with GB/T 6380-1986, the main changes in the technical content of this standard are:
- The format of the standard has been revised according to the requirements of GB/T 1.1-2000 "Guidelines for standardization work Part 1: Structure and writing rules of standards";
- Terms, definitions and symbols have been added;
- The wording "judgment and treatment of abnormal values in type I extreme value distribution samples" in the standard name has been changed to "judgment and treatment of outliers in type I extreme value distribution samples";
- The terms "detected outliers" and "highly abnormal outliers" have been changed to "divergent values" and "statistical outliers", and the meanings of the two and the difference between them have been further clarified;
- The definitions of detection level and removal level have been added. The wording on the detection level has been changed from "the detection level is generally 1%, 5% or 10%" to "unless otherwise agreed by the parties to an agreement based on this standard, the detection level should be 0.05";
- The value of the removal level is now clearly stipulated: unless otherwise agreed by the parties to an agreement based on this standard, the removal level should be 0.01;
- The test steps for "statistical outliers" in the various cases have been added;
- "Rules for judging multiple outliers" have been added;
- "No outliers" and "no highly abnormal outliers" have been changed to "no outliers found" and "no statistical outliers found" respectively.
Appendix A of this standard is a normative appendix.
This standard was proposed by, and is under the jurisdiction of, the National Technical Committee for Standardization of Statistical Methods Application.
Drafting units of this standard: Ningbo Institute of Technology, China National Institute of Standardization, Peking University, Tianjin University, Hainan Product Quality Supervision and Inspection Institute.
Main drafters of this standard: Jing Guangzhu, Ding Wenxing, Yu Zhenfan, Cai Junwei, Sun Shanze, Ma Jinshi, Huang Yan, etc.
The previous version of the standard replaced by this standard is: GB/T 6380-1986.
Scientific research, industrial and agricultural production, and management work all depend on data, and the organization, analysis and interpretation of these data depend on statistical methods. Statistics is the discipline that studies how to organize, analyse and correctly interpret numerical data. Numerical data obtained from different sources are generally disorganized and must be organized and simplified before they can be used. With sound statistical methods, data can be arranged in an orderly manner, and the characteristics of a large amount of data can be expressed with graphs or a small number of important parameters. This avoids incorrect interpretations and minimizes the cost of obtaining satisfactory data, thereby improving economic benefits.
The national standards on the statistical processing and interpretation of data include the following:
- Determination of statistical tolerance intervals (GB/T 3359)
- Estimation and confidence interval of the mean (GB/T 3360)
- Comparison of two means in the case of paired observations (GB/T 3361)
- Estimation and test of binomial distribution parameters (GB/T 4088)
- Estimation and test of Poisson distribution parameters (GB/T 4089)
- Normality test (GB/T 4882)
- Judgment and treatment of outliers in normal samples (GB/T 4883)
- Estimation and test of mean and variance of normal distribution (GB/T 4889)
- Power of test of mean and variance of normal distribution (GB/T 4890)
- Judgment and treatment of outliers in samples of type I extreme value distribution (GB/T 6380)
- Parameter estimation of gamma distribution (Pearson type III distribution) (GB/T 8055)
- Judgment and treatment of outliers in samples of exponential distribution (GB/T 8056)
There is no corresponding international standard for this standard.
Statistical processing and interpretation of data
Judgment and treatment of outliers in samples with type I extreme value distribution
1 Scope
This standard specifies the general principles and implementation methods for judging and treating upper outliers in samples with type I extreme value distribution and lower outliers in samples with type I minimum value distribution.
This standard applies to samples of size 5 to 50 drawn from a population with the type I extreme value distribution or the type I minimum value distribution.
Note: Since a random variable with the type I minimum value distribution follows the type I extreme value distribution after transformation, the method for detecting upper outliers is given only for the type I extreme value distribution.
2 Normative references
The clauses in the following documents become clauses of this standard through reference in this standard. For dated references, subsequent amendments (excluding errata) and revisions do not apply to this standard. However, parties to agreements based on this standard are encouraged to investigate whether the latest versions of these documents can be used. For undated references, the latest versions apply to this standard.
ISO 3534-1:2006 Statistics - Vocabulary and symbols - Part 1: General statistical terms and terms used in probability
ISO 3534-2:2006 Statistics - Vocabulary and symbols - Part 2: Applied statistics
3 Terms, definitions and symbols
The terms, definitions and symbols established in ISO 3534-1:2006 and ISO 3534-2:2006 and the following terms, definitions and symbols apply to this standard.
3.1 Terms and definitions
3.1.1
Type I extreme value distribution (Gumbel distribution)
A continuous distribution with the following distribution function:
$$F(x) = \exp\left(-e^{-(x-a)/b}\right)$$
where $b > 0$, $-\infty < a < +\infty$.
3.1.2
Type I minimum value distribution
A continuous distribution with the following distribution function:
$$F(x) = 1 - \exp\left(-e^{(x-a)/b}\right)$$
where $b > 0$, $-\infty < a < +\infty$.
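The note in clause 1 relies on the fact that negating a type I minimum value variable gives a type I extreme value variable. The following Python sketch checks this numerically. It assumes the distribution functions are $F(x) = \exp(-e^{-(x-a)/b})$ for the type I extreme value distribution and $F(x) = 1 - \exp(-e^{(x-a)/b})$ for the type I minimum value distribution; the function names and the parameter values are hypothetical and serve only as an illustration.

```python
import math


def type1_max_cdf(x, a, b):
    """Assumed type I extreme value CDF: exp(-exp(-(x - a) / b)), b > 0."""
    return math.exp(-math.exp(-(x - a) / b))


def type1_min_cdf(x, a, b):
    """Assumed type I minimum value CDF: 1 - exp(-exp((x - a) / b)), b > 0."""
    return 1.0 - math.exp(-math.exp((x - a) / b))


def negated_min_cdf(x, a, b):
    """CDF of X = -T when T follows the type I minimum value distribution.

    P(X <= x) = P(T >= -x) = 1 - F_min(-x).
    """
    return 1.0 - type1_min_cdf(-x, a, b)


# Hypothetical location and scale, chosen only for illustration.
a, b = 2.0, 1.5
for x in (-6.0, -2.0, 0.0, 3.0):
    # X = -T should follow the type I extreme value distribution
    # with location -a and the same scale b.
    lhs = negated_min_cdf(x, a, b)
    rhs = type1_max_cdf(x, -a, b)
    print(f"x = {x:5.1f}  P(-T <= x) = {lhs:.6f}  F_max(x; -a, b) = {rhs:.6f}")
```

Under these assumptions a lower outlier of T becomes an upper outlier of X = -T, which is why only the upper-outlier test needs to be specified.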
3.1.3
Outlier
One or more observations in a sample that lie far from the other observations, suggesting that they may come from a different population.
Note: According to their significance, outliers are divided into divergent values and statistical outliers.
3.1.4
Statistical outlier
An outlier that is statistically significant at the removal level (3.1.7).
3.1.5
Divergent value (straggler)
An outlier that is significant at the detection level (3.1.6) but not significant at the removal level (3.1.7).
3.1.6
Detection level
The significance level of the statistical test specified for detecting outliers.
Note: Unless otherwise agreed by the parties to an agreement based on this standard, the detection level shall be 0.05.
3.1.7
Removal level
The significance level of the statistical test specified for deciding whether an outlier is a statistical outlier.
Note: The removal level should not exceed the detection level. Unless otherwise agreed by the parties to an agreement based on this standard, the removal level should be 0.01.
3.2 Symbols and abbreviations
$n$: sample size (number of observations)
$\alpha$: the significance level used in testing for outliers, referred to as the detection level
$\alpha^*$: the significance level used in testing for statistical outliers, referred to as the removal level ($\alpha^* < \alpha$)
$D_n$: the statistic used to test whether the largest observation $x_{(n)}$ is an outlier when $5 \le n \le 30$
$D_{1-\alpha}(n)$: the critical value of the test based on the statistic $D_n$ at detection level $\alpha$
$I_n$: the statistic used to test whether the largest observation $x_{(n)}$ is an outlier when $30 < n \le 50$
$I_{1-\alpha}(n)$: the critical value of the test based on the statistic $I_n$ at detection level $\alpha$
4 Outlier determination
4.1 Source and determination of outliers
4.1.1 Source
Outliers are divided into two categories according to their cause. Outliers of the first type are extreme manifestations of the inherent variability of the population; such outliers belong to the same population as the rest of the observations in the sample. Outliers of the second type result from accidental deviations from the test conditions and test methods, or from errors in observation, recording or calculation; such outliers do not belong to the same population as the rest of the observations in the sample.
4.1.2 Judgment
Outliers can usually be judged directly on technical or physical grounds, for example when the experimenter already knows that the experiment deviated from the specified test method, or that there was a problem with the test instrument. When such reasons are unclear, the method specified in this standard can be used.
4.2 Upper limit of the number of outliers to be detected
The upper limit of the number of outliers to be detected in the sample should be specified (it should be small compared with the sample size). When the number of detected outliers reaches this upper limit, the sample should be studied and handled with care.
4.3 Single outlier case
The test rules are as follows:
a) The null hypothesis is that all observations come from the same population, and the alternative hypothesis is that the observed data contain an upper outlier. Select the statistic for judging outliers according to statistical principles (see 6.1);
b) Determine an appropriate significance level;
c) Determine the critical value of the test from the significance level and the sample size;
d) Calculate the value of the corresponding statistic from the observations, and make a judgment by comparing the value obtained with the critical value.
4.4 Multiple outlier case
When the upper limit of the number of outliers to be detected is greater than 1, the test rules specified in 4.3 are applied repeatedly, and the test is stopped according to the following rules:
a) If no outlier is detected, the entire test is stopped;
b) If an outlier is detected, the test is stopped when the total number of detected outliers reaches the upper limit (see 4.2); otherwise, the detected outlier is removed and the remaining observations are tested again with the same detection level and the same rules.
5 Outlier processing
5.1 Processing methods
The methods for processing outliers are:
a) retain the outliers and use them in subsequent data processing;
b) correct the outliers when the actual cause is found, otherwise retain them;
c) eliminate the outliers and do not add further observations;
d) eliminate the outliers and add new observations, or replace them with appropriate interpolated values.
5.2 Processing rules
For detected outliers, the technical and physical causes should be sought as far as possible and used as the basis for processing them. The cost of finding and determining the cause of the outliers, the benefit of correctly judging the outliers, and the risk of wrongly eliminating normal observations should be weighed according to the nature of the actual problem, in order to decide which one of the following three rules to apply:
a) If a technical or physical cause of the outlier is found, the outlier should be eliminated or corrected; otherwise, it should not be eliminated or corrected;
b) If a technical or physical cause of the outlier is found, the outlier should be eliminated or corrected; otherwise, divergent values should be retained and statistical outliers should be eliminated or corrected. When the same test rule is applied repeatedly to test for multiple outliers, each detected outlier should be tested again to see whether it is a statistical outlier. If the outlier detected in a given round is a statistical outlier, this outlier and all outliers detected before it (including divergent values) should be eliminated or corrected;
c) All detected outliers (including divergent values) should be eliminated or corrected.
5.3 Record
The eliminated or corrected observations and the reasons for eliminating or correcting them should be recorded for future reference.
6 Rules for judging single outliers
6.1 Selection of test methods
When the sample size is $5 \le n \le 30$, the Dixon test method is used; when the sample size is $30 < n \le 50$, the Irwin test method is used.
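The selection rule can be written as a small helper. This is only a sketch: the function name is hypothetical, and it assumes that n = 30 belongs to the Dixon range, in line with the range $30 < n \le 50$ given for the Irwin test in 3.2 and 6.3.1.

```python
def select_single_outlier_test(n):
    """Return the name of the single-outlier test for sample size n.

    Assumption: n = 30 is handled by the Dixon test; the Irwin test
    covers 30 < n <= 50. Sizes outside 5..50 are out of scope.
    """
    if not 5 <= n <= 50:
        raise ValueError("this standard covers sample sizes 5 to 50 only")
    return "Dixon" if n <= 30 else "Irwin"


for size in (5, 30, 31, 50):
    print(size, select_single_outlier_test(size))
```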
6.2 Dixon test method
6.2.1 Test steps
When the sample size is $5 \le n \le 30$, the steps are as follows:
a) From the smallest observation $x_{(1)}$, the largest observation $x_{(n)}$, the second largest observation $x_{(n-1)}$ and the third largest observation $x_{(n-2)}$ of the sample, calculate the value of the statistic $D_n$:
$$D_n = \begin{cases} \dfrac{x_{(n)} - x_{(n-1)}}{x_{(n)} - x_{(1)}}, & 5 \le n \le 8 \\[2ex] \dfrac{x_{(n)} - x_{(n-2)}}{x_{(n)} - x_{(1)}}, & 9 \le n \le 30 \end{cases} \tag{1}$$
b) Determine the detection level $\alpha$ and find the critical value $D_{1-\alpha}(n)$ in Table A.1 of Appendix A.
c) When $D_n > D_{1-\alpha}(n)$, $x_{(n)}$ is judged to be an outlier; otherwise it is judged that no outlier is found.
d) For the detected outlier $x_{(n)}$, determine the removal level $\alpha^*$ and find the critical value $D_{1-\alpha^*}(n)$ in Table A.1 of Appendix A. When $D_n > D_{1-\alpha^*}(n)$, $x_{(n)}$ is judged to be a statistical outlier; otherwise it is judged that $x_{(n)}$ is not found to be a statistical outlier (that is, $x_{(n)}$ is a divergent value).
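Steps a) to d) can be sketched in Python as follows. This is an illustrative sketch, not part of the standard: the function names are hypothetical, the two forms of the statistic are assumed to apply to $5 \le n \le 8$ and $9 \le n \le 30$ respectively, and the critical values from Table A.1 must be supplied by the caller because the table is not encoded here.

```python
def dixon_statistic(sample):
    """Dixon statistic D_n for the largest observation (assumed formula (1))."""
    x = sorted(sample)
    n = len(x)
    if not 5 <= n <= 30:
        raise ValueError("the Dixon test applies to 5 <= n <= 30")
    # x(n-1) is used for 5 <= n <= 8, x(n-2) for 9 <= n <= 30.
    offset = 1 if n <= 8 else 2
    spread = x[-1] - x[0]
    if spread == 0:
        raise ValueError("all observations are equal")
    return (x[-1] - x[-1 - offset]) / spread


def judge_largest(sample, crit_detection, crit_removal):
    """Classify the largest observation.

    crit_detection and crit_removal are the Table A.1 critical values
    at the detection level and the removal level, looked up by the caller.
    """
    d = dixon_statistic(sample)
    if d <= crit_detection:
        verdict = "no outlier found"
    elif d > crit_removal:
        verdict = "statistical outlier"
    else:
        verdict = "divergent value"
    return d, verdict
```

Passing the critical values in as arguments keeps the sketch independent of how Table A.1 is stored.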
6.2.2 Example of the Dixon test
An automatic shearing machine cuts steel material. Each day, the lengths of the first 100 pieces cut are recorded as one batch of data. Six batches are recorded in one week, and the maximum values of the batches are as follows (unit: mm):
321.46 319.62 320.44 319.51 329.73 320.41
Based on experience, the population can be regarded as following the type I extreme value distribution. If the user is concerned about whether there are upper outliers in the data, the method of this standard can be used.
In this example $n = 6$, the smallest observation is $x_{(1)} = 319.51$, the largest observation is $x_{(6)} = 329.73$, and the second largest observation is $x_{(5)} = 321.46$. The value of the statistic $D_6$ is calculated according to formula (1):
$$D_6 = \frac{x_{(6)} - x_{(5)}}{x_{(6)} - x_{(1)}} = \frac{329.73 - 321.46}{329.73 - 319.51} = 0.8092$$
The detection level is set to $\alpha = 0.05$, and the critical value $D_{0.95}(6) = 0.681$ is found in Table A.1 of Appendix A. Since $D_6 = 0.8092 > 0.681 = D_{0.95}(6)$, $x_{(6)} = 329.73$ is judged to be an outlier. For the detected outlier $x_{(6)} = 329.73$, the removal level is set to $\alpha^* = 0.01$, and the critical value $D_{0.99}(6) = 0.796$ is found in Table A.1 of Appendix A. Since $D_6 = 0.8092 > 0.796 = D_{0.99}(6)$, $x_{(6)} = 329.73$ is judged to be a statistical outlier. Verification showed that this value had been misrecorded and that the actual value is 319.73.
6.3 Irwin test method
6.3.1 Test steps
When the sample size is $30 < n \le 50$, the steps are as follows:
a) From the smallest observation $x_{(1)}$, the largest observation $x_{(n)}$ and the second largest observation $x_{(n-1)}$ of the sample, calculate the value of the statistic $I_n$:
$$I_n = \frac{x_{(n)} - x_{(n-1)}}{s} \tag{2}$$
where $s$ is obtained from formulas (3) and (4). The summation there is performed over all sample observations after removing the smallest observation and the largest observation, that is, over the remaining $n - 2$ observations.
b) Determine the detection level $\alpha$ and find the critical value $I_{1-\alpha}(n)$ in Table A.2 of Appendix A.
c) When $I_n > I_{1-\alpha}(n)$, $x_{(n)}$ is judged to be an outlier; otherwise it is judged that no outlier is found.
d) For the detected outlier $x_{(n)}$, determine the removal level $\alpha^*$ and find the critical value $I_{1-\alpha^*}(n)$ in Table A.2 of Appendix A. When $I_n > I_{1-\alpha^*}(n)$, $x_{(n)}$ is judged to be a statistical outlier; otherwise it is judged that $x_{(n)}$ is not found to be a statistical outlier (that is, $x_{(n)}$ is a divergent value).
6.3.2 Example of the Irwin test method
The observed annual maximum flows of a river at a certain place are as follows (unit: km'/s):
1.69 1.22 0.75 1.26 1.73 1.74
3.09 1.57 1.97 2.23
1.18 2.12 1.38 0.90
2.10 2.02 1.74
Experience shows that the annual maximum flow approximately follows the type I extreme value distribution. It is necessary to judge whether the maximum value $x_{(40)} = 4.31$ is an outlier.
Sorting the observations shows that the smallest observation is $x_{(1)} = 0.75$, the largest observation is $x_{(40)} = 4.31$, and the second largest observation is $x_{(39)} = 3.09$. Using all observations other than the smallest observation $x_{(1)}$ and the largest observation $x_{(40)}$, first calculate the value of $s$ according to formula (3), and then calculate the value of the statistic $I_{40}$ according to formula (2):
$$I_{40} = \frac{x_{(40)} - x_{(39)}}{s} = \frac{4.31 - 3.09}{s} = 2.43$$
The detection level is set to $\alpha = 0.05$, and the critical value $I_{0.95}(40) = 2.88$ is found in Table A.2 of Appendix A. Since $I_{40} = 2.43 < 2.88 = I_{0.95}(40)$, it is judged that $x_{(40)} = 4.31$ is not found to be an outlier.
7 Rules for judging multiple outliers
7.1 Test steps
When the sample may contain multiple outliers that need to be tested, follow the rules of 4.4. Depending on the sample size, the specific method for judging outliers follows the steps in 6.2 or 6.3 respectively.
7.2 Example of multiple outlier test
Eleven specimens are taken at random from a certain insulating material and subjected to a life test under given conditions. The failure times are (unit: h):
4.09, 17.31, 60.78, 62.16, 64.15, 70.67, 71.85, 75.50, 79.35, 80.00, 88.01
Experience shows that the life $T$ of this insulating material follows the type I minimum value distribution. Therefore $X = -T$ follows the type I extreme value distribution. Here $x_{(1)} = -88.01$, $x_{(2)} = -80.00$, ..., $x_{(8)} = -62.16$, $x_{(9)} = -60.78$, $x_{(10)} = -17.31$, $x_{(11)} = -4.09$. If the upper limit of the number of outliers to be detected is 2, the method of this standard can be used.
First, judge whether $x_{(11)}$ is an outlier. Since $n = 11$, the value of the statistic $D_{11}$ is calculated according to formula (1):
$$D_{11} = \frac{x_{(11)} - x_{(9)}}{x_{(11)} - x_{(1)}} = \frac{(-4.09) - (-60.78)}{(-4.09) - (-88.01)} = 0.675$$
The detection level is set to $\alpha = 0.05$, and the critical value $D_{0.95}(11) = 0.656$ is found in Table A.1 of Appendix A. Since $D_{11} = 0.675 > 0.656 = D_{0.95}(11)$, $x_{(11)} = -4.09$ is judged to be an outlier, that is, 4.09 in the original data is judged to be an outlier. For the detected outlier $x_{(11)} = -4.09$, the removal level is set to $\alpha^* = 0.01$, and the critical value $D_{0.99}(11) = 0.748$ is found in Table A.1 of Appendix A. Since $D_{11} = 0.675 < 0.748 = D_{0.99}(11)$, it is judged that $x_{(11)} = -4.09$ is not found to be a statistical outlier (that is, $x_{(11)} = -4.09$ is a divergent value, so 4.09 in the original data is a divergent value).
The remaining 10 observations are then tested again. The sample size is now 10, and the value of the statistic $D_{10}$ is calculated according to formula (1):
$$D_{10} = \frac{x_{(10)} - x_{(8)}}{x_{(10)} - x_{(1)}} = \frac{(-17.31) - (-62.16)}{(-17.31) - (-88.01)} = 0.634$$
The detection level is still $\alpha = 0.05$, and the critical value $D_{0.95}(10) = 0.676$ is found in Table A.1 of Appendix A. Since $D_{10} = 0.634 < 0.676 = D_{0.95}(10)$, it is judged that $x_{(10)} = -17.31$ is not found to be an outlier (that is, 17.31 in the original data is not found to be an outlier). The whole test stops at this point.
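The repeated testing of 4.4, applied to this example, can be sketched in Python as follows. This is only an illustrative sketch: the function names are hypothetical, and the lookup dictionary holds just the three critical values quoted in this example, not the full Table A.1.

```python
def dixon_statistic(sample):
    """Dixon statistic D_n for the largest observation (assumed formula (1))."""
    x = sorted(sample)
    n = len(x)
    if not 5 <= n <= 30:
        raise ValueError("the Dixon test applies to 5 <= n <= 30")
    offset = 1 if n <= 8 else 2  # x(n-1) for n <= 8, x(n-2) for n >= 9
    return (x[-1] - x[-1 - offset]) / (x[-1] - x[0])


# Critical values quoted in this example: (n, significance level) -> value.
CRITICAL = {(11, 0.05): 0.656, (11, 0.01): 0.748, (10, 0.05): 0.676}


def detect_upper_outliers(sample, max_outliers, detection=0.05, removal=0.01):
    """Apply the single-outlier test repeatedly, following the rules of 4.4."""
    remaining = sorted(sample)
    found = []
    while len(found) < max_outliers:
        n = len(remaining)
        d = dixon_statistic(remaining)
        if d <= CRITICAL[(n, detection)]:
            break  # rule a): no outlier detected, stop the whole test
        kind = ("statistical outlier" if d > CRITICAL[(n, removal)]
                else "divergent value")
        # rule b): remove the detected outlier and test the rest again
        found.append((remaining.pop(), d, kind))
    return found


failure_times = [4.09, 17.31, 60.78, 62.16, 64.15, 70.67,
                 71.85, 75.50, 79.35, 80.00, 88.01]
# T follows the type I minimum value distribution, so test X = -T.
x = [-t for t in failure_times]
for value, d, kind in detect_upper_outliers(x, max_outliers=2):
    print(f"original value {-value}: D = {d:.4f}, {kind}")
```

With the data above, the sketch flags only the original value 4.09, as a divergent value, and stops in the second round, which matches the outcome of the worked example.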
Appendix A
(Normative Appendix)
Critical value table
The critical values for the Dixon test are given in Table A.1, and the critical values for the Irwin test are given in Table A.2.
Table A.1 Critical value table of the Dixon test
Statistic: $D_n = \dfrac{x_{(n)} - x_{(n-1)}}{x_{(n)} - x_{(1)}}$ for $5 \le n \le 8$; $D_n = \dfrac{x_{(n)} - x_{(n-2)}}{x_{(n)} - x_{(1)}}$ for $9 \le n \le 30$
Table A.2 Critical value table of the Irwin test
References
[1] Ma Fengshi, Xu Qizhou. Outlier test for extreme value distribution [J]. Mathematical Statistics and Applied Probability, 1986, 1(1): 81-91.
[2] Fei Heliang. Testing methods for abnormal data of extreme value and Weibull distributions [J]. Journal of Applied Mathematics, 1998, 21(4): 549-561.
[3] F E Grubbs. Sample criteria for testing outlying observations [J]. Annals of Mathematical Statistics, 1950, 21: 27-58.
[4] J O Irwin. On a criterion for the rejection of outlying observations [J]. Biometrika, 1925, 17: 238-250.
[5] W J Dixon. Analysis of extreme values [J]. Annals of Mathematical Statistics, 1950, 21.
[6] W J Dixon. Processing data for outliers [J]. Biometrics, 1953, 9(1): 74-89.