Statistical interpretation of data; Detection and handling of outlying observations in the sample of type I extreme value distribution

Basic Information

Standard ID: GB 6380-1986

Standard Name:Statistical interpretation of data; Detection and handling of outlying observations in the sample of type I extreme value distribution

Chinese Name: 数据的统计处理和解释 I型极值分布样本异常值的判断和处理

Standard category:National Standard (GB)

Status: Abolished

Date of Release: 1986-05-13

Date of Implementation:1987-05-01

Date of Expiration:2009-01-01

standard classification number

Standard ICS number:Sociology, Services, Organization and management of companies (enterprises), Administration, Transport>>Quality>>03.120.30 Application of statistical methods

Standard Classification Number:Comprehensive>>Basic Subjects>>A41 Mathematics

associated standards

alternative situation:Replaced by GB/T 6380-2008

Publication information

publishing house:China Standards Press

Publication date:1987-05-01

other information

Release date:1986-05-13

Review date:2004-10-14

Drafters: Ma Fengshi, Xu Qizhou, Shi Daoji

Drafting unit:Data Processing and Interpretation Subcommittee Working Group

Focal point unit:National Technical Committee for Application of Statistical Methods and Standardization

Proposing unit:National Technical Committee for Application of Statistical Methods and Standardization

Publishing department:National Bureau of Standards

competent authority:National Standardization Administration

Introduction to standards:

This standard specifies the general principles and implementation methods for judging and handling outliers in random samples from a type I extreme value distribution population.


Some standard content:

National Standard of the People's Republic of China
Statistical interpretation of data - Detection and handling of outlying observations in the sample of type I extreme value distribution
UDC 519.25
GB 6380-86
1 Introduction
1.1 This standard specifies the general principles and implementation methods for detecting and handling outliers in random samples from a type I extreme value distribution.
The type I extreme value distribution is also called the Gumbel distribution. Its distribution function is
$$F(x) = \exp\left(-e^{-y}\right)$$
and its probability density function is
$$f(x) = \frac{1}{b}\exp\left(-y - e^{-y}\right)$$
where y = (x - a)/b, b > 0.
[Figure: probability density curve f(x) for a = 0, b = 1]
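The distribution function and density of the type I extreme value (Gumbel) distribution can be evaluated directly from their definitions. The following is a minimal sketch, not part of the standard; the function names `gumbel_cdf` and `gumbel_pdf` are illustrative, and `a` and `b` denote the location and scale parameters with y = (x - a)/b.

```python
import math


def gumbel_cdf(x, a=0.0, b=1.0):
    """Distribution function F(x) = exp(-exp(-y)), with y = (x - a) / b."""
    if b <= 0:
        raise ValueError("scale parameter b must be positive")
    y = (x - a) / b
    return math.exp(-math.exp(-y))


def gumbel_pdf(x, a=0.0, b=1.0):
    """Density f(x) = (1 / b) * exp(-y - exp(-y)), with y = (x - a) / b."""
    if b <= 0:
        raise ValueError("scale parameter b must be positive")
    y = (x - a) / b
    return math.exp(-y - math.exp(-y)) / b


# Standard case a = 0, b = 1: both functions equal exp(-1) at x = 0.
print(round(gumbel_cdf(0.0), 4))  # 0.3679
print(round(gumbel_pdf(0.0), 4))  # 0.3679
```

For a = 0, b = 1 the density peaks at x = 0 with height exp(-1), about 0.37, and has a longer right tail than left tail.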
The type I extreme value distribution is widely applied in fields such as hydrology, meteorology, seismology and reliability.
1.2 An outlier (or outlying observation) is an individual value (or values) in a sample that deviates markedly from the remaining observations of the sample to which it belongs.
An outlier may be an extreme manifestation of the inherent random variability of the population; such an outlier belongs to the same population as the remaining observations in the sample. An outlier may also result from an accidental deviation in test conditions or test methods, or from an error in observation, calculation or recording; such an outlier does not belong to the same population as the remaining observations in the sample.
1.3 For other statistical terms used in this standard, see GB 3358-82 "Statistical Terms and Symbols".
1.4 Application conditions: this standard applies when there is sufficient theoretical basis or experience to ensure that, apart from individual outliers, the majority of the observations in the sample under test (called the sample body) come from the same type I extreme value distribution population.
When the sample body comes from the same type I minimum value distribution, the sample obtained after the transformation Z = -X can be regarded as coming from a type I extreme value distribution. The distribution function of the type I minimum value distribution is
$$F(x) = 1 - \exp\left(-e^{y}\right)$$
and its probability density function is
$$f(x) = \frac{1}{b}\exp\left(y - e^{y}\right)$$
where y = (x - a)/b, b > 0.
[Figure: probability density curve f(x) of the type I minimum value distribution for a = 0, b = 1]
2 Statistical principles for judging outliers
2.1 When implementing this standard, an upper limit on the number of outliers detected in a sample should be specified (a small proportion of the number of observations in the sample). When this upper limit is exceeded, the representativeness of the sample should be carefully examined and dealt with.
2.2 Test rule for judging a single outlier
According to the actual situation, select an appropriate outlier test rule (see clause 4 of this standard) and specify the significance level α of the statistical test used to detect outliers, referred to as the detection level. Determine the critical value of the statistic from α and the number of observations n. Substitute the observations into the statistic given by the test rule. If the value obtained exceeds the critical value, the maximum observation under test is judged to be an outlier; otherwise, it is judged that there is no outlier. The detection level should be 1% or 5%.
2.3 Test rule for judging multiple outliers
When the number of detected outliers is allowed to be greater than 1, the method specified in this standard is to apply the same single-outlier test rule repeatedly: test all observations using the specified detection level and the test rule specified in 2.2. If no outlier is detected, the whole test stops. If an outlier is detected, remove it and test the remaining observations with the same detection level and the same rule, and so on, until no outlier is detected or the number of detected outliers exceeds the upper limit.
3 General rules for handling outliers
3.1 For outliers detected by statistical methods, the technical and physical causes should be sought as far as possible, as a basis for handling them.
3.2 The ways of handling outliers are: the outlier is retained in the sample and takes part in the subsequent data analysis; the outlier may be eliminated, that is, removed from the sample, with appropriate observations added to the sample; the outlier is corrected when the actual cause is found.
3.3 According to the nature of the actual problem, the user of this standard should weigh the cost of finding the cause of an outlier, the benefit of correctly judging outliers and the risk of wrongly eliminating normal observations, and decide to implement one of the following three rules:
a. No outlier may be eliminated or corrected unless there is a sufficient technical or physical explanation of its abnormality.
b. Outliers with a sufficient technical or physical explanation of their abnormality may be eliminated or corrected. In addition, outliers that are statistically highly abnormal may also be eliminated or corrected. "Statistically highly abnormal" means the following: specify a significance level α* for the statistical test of whether an outlier is highly abnormal, referred to as the elimination level, which is smaller than the detection level α. After the test of 2.2 has been carried out, replace the detection level α by the elimination level α* and test the detected outlier again according to 2.2. If this test is significant at the elimination level α*, the outlier is highly abnormal. When the same test rule is applied repeatedly, each detected outlier must be tested again to see whether it is highly abnormal at the elimination level. If the outlier detected in a given round is highly abnormal, this outlier and the outliers detected before it may be eliminated or corrected.
Except in special circumstances, the elimination level should generally be 1% and should not exceed 5%. When an elimination level is used, the detection level may be 5% or slightly larger.
c. All detected outliers may be eliminated or corrected.
3.4 The detected outliers and the reasons for eliminating or correcting them should be recorded for reference.
4 Rules for judging and handling outliers
4.1 This standard stipulates that the Dixon-type test method is used when the sample size is 5 ≤ n ≤ 30, and the Irwin-type test method is used when the sample size is 30 < n ≤ 50.
4.2 Dixon-type test method (sample size 5 ≤ n ≤ 30)
4.2.1 Test steps
a. From the sample observations, select the minimum observation X(1), the maximum observation X(n), the second largest observation X(n-1) and the third largest observation X(n-2), and calculate the statistic D, which depending on the sample size n is either
$$D = r_{10} = \frac{X_{(n)} - X_{(n-1)}}{X_{(n)} - X_{(1)}}$$
or
$$D = r_{20} = \frac{X_{(n)} - X_{(n-2)}}{X_{(n)} - X_{(1)}}$$
The examples in 4.2.2 use r10 for n = 6 and r20 for n = 10 and n = 11.
b. Determine the detection level α and find the critical value $D_{1-\alpha}(n)$ corresponding to n and α from Appendix 1.
c. When $D > D_{1-\alpha}(n)$, judge the maximum observation X(n) to be an outlier; otherwise it cannot be judged to be an outlier.
d. Given the elimination level α*, find the critical value $D_{1-\alpha^*}(n)$ corresponding to n and α* from Appendix 1. When $D > D_{1-\alpha^*}(n)$, X(n) is judged to be highly abnormal; otherwise it is judged to be an outlier that is not highly abnormal.
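The test steps above can be sketched in a few lines of code. This is an illustrative sketch only: the function names `dixon_r10`, `dixon_r20` and `classify` are hypothetical, the critical values are passed in by the caller because the critical value table is not reproduced here, and the choice between r10 and r20 is left to the caller. The data and the two critical values are those quoted in Example 1 of 4.2.2.

```python
def dixon_r10(sample):
    """r10 = (X(n) - X(n-1)) / (X(n) - X(1)) for the largest observation."""
    x = sorted(sample)
    if len(x) < 3 or x[-1] == x[0]:
        raise ValueError("need at least 3 observations with a non-zero range")
    return (x[-1] - x[-2]) / (x[-1] - x[0])


def dixon_r20(sample):
    """r20 = (X(n) - X(n-2)) / (X(n) - X(1)) for the largest observation."""
    x = sorted(sample)
    if len(x) < 3 or x[-1] == x[0]:
        raise ValueError("need at least 3 observations with a non-zero range")
    return (x[-1] - x[-3]) / (x[-1] - x[0])


def classify(d, crit_detect, crit_elim=None):
    """Compare D with the critical values at the detection and elimination levels."""
    if d <= crit_detect:
        return "not an outlier"
    if crit_elim is not None and d > crit_elim:
        return "highly abnormal outlier"
    return "outlier"


# Daily maxima (mm) from Example 1; n = 6, so the example uses r10.
lengths = [321.46, 319.62, 320.44, 319.51, 329.73, 320.41]
d = dixon_r10(lengths)
# 0.681 and 0.796 are the 5% and 1% critical values quoted in Example 1.
verdict = classify(d, crit_detect=0.681, crit_elim=0.796)
print(round(d, 3), verdict)  # 0.809 highly abnormal outlier
```

Keeping the critical values as explicit arguments avoids hard-coding table entries that are not available in this text.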
4.2.2 Examples
Example 1 (using the Dixon-type test method) A shearing machine is started to cut pieces of steel, and the lengths of the first 100 pieces cut each day are recorded as one batch of data. Six batches were recorded in one week, and the daily maximum values obtained were as follows (unit: mm): 321.46, 319.62, 320.44, 319.51, 329.73, 320.41. Check whether the maximum value is abnormal.
According to experience, the above sample can be considered to come from the same type I extreme value distribution. From these observations, the minimum is X(1) = 319.51, the maximum is X(6) = 329.73 and the second largest is X(5) = 321.46. For n = 6, calculate the statistic
$$D = r_{10} = \frac{X_{(6)} - X_{(5)}}{X_{(6)} - X_{(1)}} = \frac{329.73 - 321.46}{329.73 - 319.51} = 0.809$$
Taking the detection level α = 5%, Appendix 1 gives the critical value $D_{0.95}(6) = 0.681$. Since $D = 0.809 > 0.681 = D_{0.95}(6)$, X(6) = 329.73 is judged to be an outlier. Taking the elimination level α* = 1%, Appendix 1 gives $D_{0.99}(6) = 0.796$; since $D > D_{0.99}(6)$, X(6) = 329.73 is judged to be highly abnormal. Verification showed that this value had been misrecorded and that the actual value was 319.73.
Example 2 (repeated use of the Dixon-type test method to judge multiple outliers) Eleven specimens are randomly selected from a certain insulating material and life-tested under given conditions. The failure times are (unit: h): 4.09, 17.31, 60.78, 62.16, 64.15, 70.67, 71.85, 75.50, 79.35, 80.00, 88.01. Check whether 4.09 and 17.31 are abnormal.
Theoretical work shows that the life T of this insulating material follows the type I minimum value distribution. The sample obtained after the transformation X = -T can therefore be regarded as coming from a type I extreme value distribution, with X(1) = -88.01, X(2) = -80.00, ..., X(8) = -62.16, X(9) = -60.78, X(10) = -17.31, X(11) = -4.09. Check whether the maximum observation X(11) and the second largest observation X(10) are abnormal. Here n = 11, and the statistic is
$$D = r_{20} = \frac{X_{(11)} - X_{(9)}}{X_{(11)} - X_{(1)}} = \frac{(-4.09) - (-60.78)}{(-4.09) - (-88.01)} = 0.676$$
Taking the detection level α = 5%, Appendix 1 gives the critical value $D_{0.95}(11) = 0.656$. Since $D > D_{0.95}(11)$, X(11) = -4.09 is judged to be an outlier. Continue by testing the remaining 10 observations, still with detection level α = 5%. For n = 10, the statistic is
$$D = r_{20} = \frac{X_{(10)} - X_{(8)}}{X_{(10)} - X_{(1)}} = \frac{(-17.31) - (-62.16)}{(-17.31) - (-88.01)} = 0.634$$
Appendix 1 gives the critical value $D_{0.95}(10) = 0.676$. Since $D < D_{0.95}(10)$, X(10) = -17.31 cannot be judged to be an outlier.
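The repeated procedure of 2.3, combined with the transformation X = -T used in Example 2, can be sketched as follows. This is a hedged illustration rather than the standard's own algorithm listing: `repeated_upper_test` and its parameters are hypothetical names, the upper limit of 2 outliers is an assumption made for this illustration, and the critical-value dictionary contains only the two values quoted in Example 2, so the sketch cannot go beyond n = 10 without further table entries.

```python
def dixon_r20(x_sorted):
    """r20 = (X(n) - X(n-2)) / (X(n) - X(1)) for an ascending-sorted sample."""
    return (x_sorted[-1] - x_sorted[-3]) / (x_sorted[-1] - x_sorted[0])


def repeated_upper_test(sample, statistic, critical_values, max_outliers):
    """Repeatedly test the largest observation, removing it while it is an outlier.

    critical_values maps the current sample size n to the critical value at the
    detection level; a missing n raises KeyError.
    Stops when no outlier is detected or the count exceeds max_outliers.
    """
    remaining = sorted(sample)
    outliers = []
    while len(outliers) <= max_outliers:
        d = statistic(remaining)
        if d <= critical_values[len(remaining)]:
            break  # largest remaining observation is not an outlier
        outliers.append(remaining.pop())  # remove the detected outlier
    return outliers, remaining


# Failure times (h) from Example 2; T follows a type I minimum value distribution.
failure_times = [4.09, 17.31, 60.78, 62.16, 64.15, 70.67,
                 71.85, 75.50, 79.35, 80.00, 88.01]
x = [-t for t in failure_times]  # X = -T turns small T into large X

# 5% critical values quoted in Example 2 for n = 11 and n = 10.
critical = {11: 0.656, 10: 0.676}
# max_outliers=2 is an assumed upper limit, not a value given by the example.
outliers, remaining = repeated_upper_test(x, dixon_r20, critical, max_outliers=2)
outlier_times = [-v for v in outliers]  # map back to the original time scale
print(outlier_times)  # [4.09]
```

Only 4.09 h is flagged; the second round leaves 17.31 h in the sample, in agreement with the worked example.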
4.3 Irwin-type test method (sample size 30 < n ≤ 50)
4.3.1 Test steps
a. From the sample observations, select the minimum observation X(1), the maximum observation X(n) and the second largest observation X(n-1), and calculate the statistic
$$I = \frac{X_{(n)} - X_{(n-1)}}{S}, \qquad S^2 = \frac{1}{n-3}\sum \left(X_i - \bar{X}\right)^2$$
Here the sum is taken over all sample observations after excluding the minimum observation X(1) and the maximum observation X(n), and $\bar{X}$ is the mean of those same observations.
b. Determine the detection level α and find the critical value $I_{1-\alpha}(n)$ corresponding to n and α from Appendix 2.
c. When $I > I_{1-\alpha}(n)$, judge the maximum observation X(n) to be an outlier; otherwise it cannot be judged to be an outlier.
d. Given the elimination level α*, find the critical value $I_{1-\alpha^*}(n)$ corresponding to n and α* from Appendix 2. When $I > I_{1-\alpha^*}(n)$, X(n) is judged to be highly abnormal; otherwise it is judged to be an outlier that is not highly abnormal.
4.3.2 Example
Example 3 (using the Irwin-type test method) The year-by-year observations of the annual maximum flow of a river at a certain location are as follows (unit: km3/s):
1.69, 1.22, 0.75, 1.26, 1.73, 1.74, 3.09, 1.57, 1.97, 2.23, 2.03, 1.58, 0.90, 2.40, 1.65, 1.96, 2.30, 1.79, 1.48, 2.22, 1.91, 3.06, 2.08, 1.06, 4.31, 1.56, 1.88, 2.10, 2.02, 1.74, 1.18, 2.12, 1.38, 0.90, 1.45, 1.78, 1.97, 2.27, 2.34, 2.44.
Check whether the maximum value X(40) = 4.31 is abnormal. The annual maximum flow can be considered to follow a type I extreme value distribution approximately. Sorting the observations shows that the minimum is X(1) = 0.75, the maximum is X(40) = 4.31 and the second largest is X(39) = 3.09. For all data after removing X(1) and X(40), first calculate S = 0.502, and then calculate the value of the statistic
$$I = \frac{X_{(40)} - X_{(39)}}{S} = \frac{4.31 - 3.09}{0.502} = 2.43$$
Taking the detection level α = 5%, Appendix 2 gives the critical value $I_{0.95}(40) = 2.84$. Since $I < I_{0.95}(40)$, X(40) = 4.31 cannot be judged to be an outlier.
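The calculation in Example 3 can be checked numerically. The sketch below is an illustration under one stated assumption: S is taken to be the sample standard deviation of the observations that remain after dropping the minimum and the maximum (divisor one less than the number of retained values), which reproduces the value S = 0.502 quoted in the example. The function name `irwin_statistic` is hypothetical.

```python
import statistics


def irwin_statistic(sample):
    """Return (I, S) with I = (X(n) - X(n-1)) / S.

    S is the sample standard deviation of the data after excluding the
    minimum X(1) and the maximum X(n) (assumed divisor: n - 3).
    """
    x = sorted(sample)
    trimmed = x[1:-1]  # drop one minimum and one maximum
    s = statistics.stdev(trimmed)
    return (x[-1] - x[-2]) / s, s


# Annual maximum flows from Example 3 (n = 40).
flows = [1.69, 1.22, 0.75, 1.26, 1.73, 1.74, 3.09, 1.57, 1.97, 2.23,
         2.03, 1.58, 0.90, 2.40, 1.65, 1.96, 2.30, 1.79, 1.48, 2.22,
         1.91, 3.06, 2.08, 1.06, 4.31, 1.56, 1.88, 2.10, 2.02, 1.74,
         1.18, 2.12, 1.38, 0.90, 1.45, 1.78, 1.97, 2.27, 2.34, 2.44]

i_stat, s = irwin_statistic(flows)
# 2.84 is the 5% critical value for n = 40 quoted in Example 3.
is_outlier = i_stat > 2.84
print(round(s, 3), round(i_stat, 2), is_outlier)  # 0.502 2.43 False
```

The statistic stays below the quoted critical value, so 4.31 is not flagged, matching the conclusion of the example.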
Appendix A
(Supplement)
Critical value table of the Dixon-type test method
[Table not reproduced: critical values for the statistics (X(n) - X(n-1))/(X(n) - X(1)) and (X(n) - X(n-2))/(X(n) - X(1))]
Critical value table of the Irwin-type test method
[Table not reproduced]
Note: This table is the result of statistical simulation with M = 10000 repetitions on a DPS8-45 computer.
Additional remarks:
This standard was proposed by the National Technical Committee for the Application of Statistical Methods.
This standard was drafted by the Working Group of the Data Processing and Interpretation Subcommittee of the National Technical Committee for the Application of Statistical Methods.
The main drafters of this standard are Ma Fengshi, Xu Qizhou and Shi Daoji.