Statistical interpretation of data; Determination of a statistical tolerance interval
National Standard of the People's Republic of China
Statistical processing and interpretation of data
Determination of a statistical tolerance interval
Statistical interpretation of data - Determination of a statistical tolerance interval
UDC 519 (083.4)
GB3359-82
This standard specifies the method for determining statistical tolerance intervals based on samples. A statistical tolerance interval is an interval that contains at least a specified proportion of the population at a given confidence level. A statistical tolerance interval can be two-sided or one-sided, and the endpoints of the interval are called statistical tolerance limits, also known as the natural limits of the process.
1 Introduction
1.1 The method given in this standard applies only when the sampling units are selected at random from the population under consideration and are independent of each other, and when the characteristic under study follows a normal distribution. The requirement of normality is more important here than when making inferences about a mean or about the difference between two means.
1.2 When the assumption of normality is rejected, or there is reason to doubt its validity, the variable can be transformed so that it becomes normal, or the method described in the introduction to Appendix A of this standard can be used.
It is possible to determine the statistical tolerance interval for non-normal distribution forms by other methods, but these methods are not given in this standard. Only a simple case is given in Appendix A.
1.3 When determining the statistical tolerance interval, all information about the source of the data and the method of collection should be given, especially the smallest unit or the number of significant figures that has practical meaning. This information helps the statistical analysis.
1.4 Individual suspicious observations should not be arbitrarily eliminated or corrected unless there are clear experimental, technical or other reasons for doing so. In every case, the eliminated or corrected data should be reported and explained.
1.5 As mentioned in 1.1, the confidence level 1-α is the probability that the statistical tolerance interval contains at least a proportion p of the population. The risk that the interval contains less than a proportion p of the population is α. The values usually taken for 1-α are 0.95 and 0.99 (that is, α = 0.05 and α = 0.01). If many samples are drawn and a statistical tolerance interval is determined from each sample at the same confidence level 0.95, the fraction of those intervals that contain at least the required proportion of the population is close to 95%.
1.6 When the standard deviation of the population is known (and the mean is unknown), use Tables 1 and 2. When both the mean and the standard deviation are unknown, use Tables 3 and 4.
When both the mean μ and the standard deviation σ are known, the distribution of the characteristic under investigation (assumed normal) is completely determined: exactly a proportion p of the population lies to the right of $\mu - u_p\sigma$ or to the left of $\mu + u_p\sigma$ (one-sided interval), or between $\mu - u_{(1+p)/2}\sigma$ and $\mu + u_{(1+p)/2}\sigma$ (two-sided interval). Here $u_p$ is the p quantile of the standard normal variable; the values of these quantiles are given in the last row of Tables B1 and B2.
1.7 Calculations can often be simplified by changing the origin or the unit.
This standard is formulated with reference to the international standard ISO 3207 "Statistical interpretation of data - Determination of a statistical tolerance interval" (first edition, 1975).
Published by the National Bureau of Standards on December 30, 1982
Implementation on January 1, 1984
Table 1 One-sided statistical tolerance interval (variance known)*

Technical characteristics of the population:
Technical characteristics of the sampling units:
Observations eliminated:

Statistical items
Sample size: n =
Sum of observations: $\sum x_i =$
Known value of the population variance: $\sigma^2 =$
Standard deviation: $\sigma =$
Population proportion selected for the statistical tolerance interval: p =
Confidence level: 1-α =
$k_1(n, p, 1-\alpha) =$

Calculation
$\bar{x} = \sum x_i / n =$
$k_1(n, p, 1-\alpha)\,\sigma =$ **

Results
a. One-sided interval "to the left": the probability is 1-α that at least a proportion p of the population lies below $L_s$.
$L_s = \bar{x} + k_1(n, p, 1-\alpha)\,\sigma =$
b. One-sided interval "to the right": the probability is 1-α that at least a proportion p of the population lies above $L_i$.
$L_i = \bar{x} - k_1(n, p, 1-\alpha)\,\sigma =$

*See Example 1 of this standard.
**For different n, p = 0.90, 0.95, 0.99 and 1-α = 0.95, 0.99, the values of $k_1(n, p, 1-\alpha)$ can be obtained from Table B1.
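As an illustration of where the Table B1 coefficients come from, the sketch below uses the closed form that follows from the sample mean being normal with standard deviation σ/√n: $k_1 = u_p + u_{1-\alpha}/\sqrt{n}$. The function name `k1_one_sided` is introduced here for illustration only and is not part of this standard; the tabulated values remain the reference.

```python
from math import sqrt
from statistics import NormalDist


def k1_one_sided(n: int, p: float, confidence: float) -> float:
    """One-sided tolerance coefficient, sigma known and mean estimated.

    Assumption (not stated in the standard): k1 = u_p + u_(1-alpha) / sqrt(n),
    which follows from the sample mean being N(mu, sigma^2 / n).
    """
    u = NormalDist().inv_cdf  # quantile function of the standard normal
    return u(p) + u(confidence) / sqrt(n)


# Case used in Example 1 of this standard: n = 12, p = 0.95, 1 - alpha = 0.95
k1 = k1_one_sided(12, 0.95, 0.95)
print(round(k1, 2))  # compare with the Table B1 value quoted in Example 1 (2.12)
```

As n grows, the second term vanishes and the coefficient tends to $u_p$, which is consistent with the last row of Table B1 giving the normal quantiles.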
Table 2 Two-sided statistical tolerance interval (variance known)*

Technical characteristics of the population:
Technical characteristics of the sampling units:
Observations eliminated:

Statistical items
Sample size: n =
Sum of observations: $\sum x_i =$
Known value of the population variance: $\sigma^2 =$
Standard deviation: $\sigma =$
Population proportion selected for the statistical tolerance interval: p =
Confidence level: 1-α =
$k_1'(n, p, 1-\alpha) =$

Calculation
$\bar{x} = \sum x_i / n =$
$k_1'(n, p, 1-\alpha)\,\sigma =$ **

Results
The probability is 1-α that at least a proportion p of the population lies between the limits $L_i$ and $L_s$***.
$L_i = \bar{x} - k_1'(n, p, 1-\alpha)\,\sigma =$
$L_s = \bar{x} + k_1'(n, p, 1-\alpha)\,\sigma =$

*See Example 2 of this standard.
**For different n, p = 0.90, 0.95, 0.99 and 1-α = 0.95, 0.99, the values of $k_1'(n, p, 1-\alpha)$ can be obtained from Table B2.
***These limits are symmetric about $\bar{x}$, but they are not "symmetric in probability". Therefore it cannot be said that "at the confidence level 1-α, no more than (1-p)/2 of the population lies below $L_i$ and no more than (1-p)/2 lies above $L_s$".
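The two-sided coefficient has no closed form, but it can be found numerically. The sketch below assumes the usual construction for known σ: with confidence 1-α the sample mean is off by at most $d\sigma$, where $d = u_{1-\alpha/2}/\sqrt{n}$, and $k_1'$ is the value for which the interval $\bar{x} \pm k_1'\sigma$ still covers a proportion p in that worst case. This construction and the name `k1_two_sided` are assumptions made for illustration; Table B2 remains the reference.

```python
from math import sqrt
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def k1_two_sided(n: int, p: float, confidence: float) -> float:
    """Two-sided tolerance coefficient for known sigma, found by bisection."""
    # Largest standardized offset of the sample mean at the given confidence.
    d = _STD.inv_cdf((1 + confidence) / 2) / sqrt(n)

    def coverage(k: float) -> float:
        # Population fraction inside mean +/- k*sigma when the mean is off by d*sigma.
        return _STD.cdf(d + k) - _STD.cdf(d - k)

    lo, hi = 0.0, 10.0  # coverage(lo) < p < coverage(hi) for any practical p
    for _ in range(60):  # coverage is increasing in k, so bisection converges
        mid = (lo + hi) / 2
        if coverage(mid) < p:
            lo = mid
        else:
            hi = mid
    return hi


# Case used in Example 2 of this standard: n = 12, p = 0.90, 1 - alpha = 0.95
print(round(k1_two_sided(12, 0.90, 0.95), 2))  # compare with Table B2 (1.89)
```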
Table 3 One-sided statistical tolerance interval (variance unknown)*

Technical characteristics of the population:
Technical characteristics of the sampling units:
Observations eliminated:

Statistical items
Sample size: n =
Sum of observations: $\sum x_i =$
Sum of squares of observations: $\sum x_i^2 =$
Population proportion selected for the statistical tolerance interval: p =
Confidence level: 1-α =
$k_2(n, p, 1-\alpha) =$

Calculation
$\bar{x} = \sum x_i / n =$
$\sum (x_i - \bar{x})^2 = \sum x_i^2 - (\sum x_i)^2 / n =$
$S = \sqrt{\sum (x_i - \bar{x})^2 / (n-1)} =$ (estimate of the standard deviation σ)
$k_2(n, p, 1-\alpha)\,S =$ **

Results
a. One-sided interval "to the left": the probability is 1-α that at least a proportion p of the population lies below $L_s$.
$L_s = \bar{x} + k_2(n, p, 1-\alpha)\,S =$
b. One-sided interval "to the right": the probability is 1-α that at least a proportion p of the population lies above $L_i$.
$L_i = \bar{x} - k_2(n, p, 1-\alpha)\,S =$

*See Example 3 of this standard.
**For different n, p = 0.90, 0.95, 0.99 and 1-α = 0.95, 0.99, the values of $k_2(n, p, 1-\alpha)$ can be obtained from Table B3.
Table 4 Two-sided statistical tolerance interval (variance unknown)*

Technical characteristics of the population:
Technical characteristics of the sampling units:
Observations eliminated:

Statistical items
Sample size: n =
Sum of observations: $\sum x_i =$
Sum of squares of observations: $\sum x_i^2 =$
Population proportion selected for the statistical tolerance interval: p =
Confidence level: 1-α =
$k_2'(n, p, 1-\alpha) =$

Calculation
$\bar{x} = \sum x_i / n =$
$\sum (x_i - \bar{x})^2 = \sum x_i^2 - (\sum x_i)^2 / n =$
$S = \sqrt{\sum (x_i - \bar{x})^2 / (n-1)} =$ (estimate of the standard deviation σ)
$k_2'(n, p, 1-\alpha)\,S =$ **

Results
The probability is 1-α that at least a proportion p of the population lies between the limits $L_i$ and $L_s$***.
$L_i = \bar{x} - k_2'(n, p, 1-\alpha)\,S =$
$L_s = \bar{x} + k_2'(n, p, 1-\alpha)\,S =$

*See Example 4 of this standard.
**For different n, p = 0.90, 0.95, 0.99 and 1-α = 0.95, 0.99, the values of $k_2'(n, p, 1-\alpha)$ can be obtained from Table B4.
***These limits are symmetric about $\bar{x}$, but they are not "symmetric in probability". Therefore it cannot be said that "at the confidence level 1-α, no more than (1-p)/2 of the population lies below $L_i$ and no more than (1-p)/2 lies above $L_s$".

The observed values of the breaking load of cotton yarn are as follows (in hundredths of a newton): 228.6, 232.7, 238.8, 317.2, 315.8, 275.1, 222.2, 236.7, 224.7, 251.2, 210.4, 270.7.
These 12 observations come from a single batch of 12,000 spools, packed in 120 boxes of 100 spools each. Twelve boxes are randomly selected from the batch and one spool is randomly selected from each box. The test piece is a 50 cm length of yarn cut from each spool about 5 m from the outer end of the thread, and the test is carried out on the central part of each test piece. Experience has shown that the breaking load measured under these conditions effectively follows a normal distribution. The calculation results are as follows:
Sample size: n = 12
Sum of observations: $\sum x_i = 3\,024.1$
Mean: $\bar{x} = 3\,024.1 / 12 = 252.0$
Sum of squares of observations: $\sum x_i^2 = 775\,996.09$
Sum of squares of differences from the mean: $\sum (x_i - \bar{x})^2 = \sum x_i^2 - (\sum x_i)^2 / n = 13\,897.69$
Estimate of variance: $S^2 = 13\,897.69 / 11 = 1\,263.4$
Estimate of standard deviation: $S = \sqrt{1\,263.4} = 35.5$
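The calculation results for the cotton yarn data can be checked with a few lines of code. The sketch below follows the same computational route as the forms in Tables 3 and 4 (sum, sum of squares, then the shortcut formula for the sum of squared deviations); the variable names are illustrative only.

```python
from math import sqrt

# Breaking loads of the 12 test pieces, in hundredths of a newton
loads = [228.6, 232.7, 238.8, 317.2, 315.8, 275.1,
         222.2, 236.7, 224.7, 251.2, 210.4, 270.7]

n = len(loads)
total = sum(loads)                       # sum of observations
total_sq = sum(x * x for x in loads)     # sum of squares of observations
mean = total / n
ss_dev = total_sq - total ** 2 / n       # sum of squared differences from the mean
variance = ss_dev / (n - 1)              # estimate of the variance
s = sqrt(variance)                       # estimate of the standard deviation

print(n, round(total, 1), round(mean, 1))
print(round(total_sq, 2), round(variance, 1), round(s, 1))
```

The shortcut formula subtracts two large numbers, so for data with many significant figures a two-pass computation (or `statistics.stdev`) is numerically safer.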
It can also be seen from the experiment that, within a single batch of cotton yarn, the distribution of the breaking load is very close to the normal distribution.
Example 1: One-sided statistical tolerance interval (variance known, see Table 1).
Assume that previous observations show that, for different batches of cotton yarn made from the same raw material, the mean varies but the variance stays the same, with standard deviation σ = 33.15.
Calculate the limit $L_i$ such that it can be concluded, with a confidence level 1-α = 0.95, that at least a proportion 0.95 (95%) of the breaking loads observed under the same conditions lie above $L_i$. Table B1 gives:
$k_1(12, 0.95, 0.95) = 2.12$
From this we get:
$L_i = \bar{x} - k_1\sigma = 252.0 - 2.12 \times 33.15 = 181.7$
Of course, if a higher population proportion (such as 99%) is chosen, a smaller limit $L_i$ is obtained.
Example 2: Two-sided statistical tolerance interval (variance known, see Table 2).
Under the same conditions as Example 1, calculate $L_i$ and $L_s$ such that it can be concluded, with a confidence level 1-α = 0.95, that at least a proportion p = 0.90 (90%) of the breaking loads of this batch of cotton yarn fall between $L_i$ and $L_s$. Table B2 gives:
$k_1'(12, 0.90, 0.95) = 1.89$
From this we get:
$L_i = \bar{x} - k_1'\sigma = 252.0 - 1.89 \times 33.15 = 189.3$
$L_s = \bar{x} + k_1'\sigma = 252.0 + 1.89 \times 33.15 = 314.7$
The following misunderstanding must be avoided: "At most 5% of the population falls to the left of $L_i$, and at most 5% falls to the right of $L_s$". For example, the limit to the left of which no more than 5% of the population falls is the value $L_i = 181.7$ found in Example 1, not 189.3.
Example 3: One-sided statistical tolerance interval (variance unknown, see Table 3).
Assume that the standard deviation of the population is unknown and must be estimated from the sample. Use the same conditions as in the case where the standard deviation is known (Example 1), namely p = 0.95 and 1-α = 0.95.
Technical characteristics of the population: A batch of cotton yarn consisting of 12,000 spools, packed in 120 boxes of 100 spools each.
Technical characteristics of the sampling units: 12 boxes are randomly selected from the batch, and one spool is randomly selected from each box. From each spool, a 50 cm test piece is cut about 5 m from the outer end of the thread. The test is carried out on the central part of each test piece.
Observations eliminated: None

Statistical items
Sample size: n = 12
Sum of observations: $\sum x_i = 3\,024.1$
Sum of squares of observations: $\sum x_i^2 = 775\,996.09$
Population proportion selected for the statistical tolerance interval: p = 0.95 (95%)
Confidence level: 1-α = 0.95
$k_2(12, 0.95, 0.95) = 2.74$ (from Table B3)

Calculation
$\bar{x} = 3\,024.1 / 12 = 252.0$
$\frac{\sum (x_i - \bar{x})^2}{n-1} = \frac{775\,996.09 - (3\,024.1)^2 / 12}{11} = 1\,263.4$
$\sigma^* = S = \sqrt{1\,263.4} = 35.5$
$k_2(12, 0.95, 0.95)\,S = 97.3$

Result
With a confidence level of 0.95, it is concluded that at least a proportion 0.95 (95%) of the breaking loads of this batch of cotton yarn lie above $L_i$:
$L_i = 252.0 - 2.74 \times 35.5 = 154.7$
Note that this value of $L_i$ is smaller than the value of $L_i$ in Example 1 (where the variance is known), because using S to estimate σ makes the coefficient larger (2.74 instead of 2.12).
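The exact coefficient $k_2$ depends on the noncentral t distribution, which the Python standard library does not provide. As a rough cross-check of the Table B3 value, the sketch below uses a closed-form approximation found in engineering statistics handbooks (commonly attributed to Natrella). It is an approximation brought in for illustration, not the method used to compute Table B3, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def k2_one_sided_approx(n: int, p: float, confidence: float) -> float:
    """Approximate one-sided tolerance coefficient, mean and sigma both unknown.

    Assumption: handbook approximation
        a = 1 - z_c^2 / (2 (n - 1)),  b = z_p^2 - z_c^2 / n,
        k = (z_p + sqrt(z_p^2 - a b)) / a,
    where z_p and z_c are standard normal quantiles for p and the confidence level.
    """
    z_p = NormalDist().inv_cdf(p)
    z_c = NormalDist().inv_cdf(confidence)
    a = 1 - z_c ** 2 / (2 * (n - 1))
    b = z_p ** 2 - z_c ** 2 / n
    return (z_p + sqrt(z_p ** 2 - a * b)) / a


# Case used in Example 3: n = 12, p = 0.95, 1 - alpha = 0.95
k2 = k2_one_sided_approx(12, 0.95, 0.95)
print(round(k2, 2))  # close to, but not identical with, the tabulated 2.74
```

For small n the approximation runs slightly below the tabulated coefficient, so the value from Table B3 should be used in actual calculations.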
Example 4: Two-sided statistical tolerance interval (variance unknown, see Table 4).
Under the same conditions as Example 3, calculate $L_i$ and $L_s$ such that it can be concluded, with a confidence level 1-α = 0.95, that at least a proportion p = 0.90 (90%) of the breaking loads of this batch of cotton yarn fall between $L_i$ and $L_s$. Table B4 gives:
$k_2'(12, 0.90, 0.95) = 2.66$
From this we obtain:
$L_i = \bar{x} - k_2' S = 252.0 - 2.66 \times 35.5 = 157.6$
$L_s = \bar{x} + k_2' S = 252.0 + 2.66 \times 35.5 = 346.4$
Note that the value of $L_i$ is smaller than in Example 2, and the value of $L_s$ is larger than in Example 2. The reason is that when S is used to estimate σ, the coefficient is larger (2.66 instead of 1.89).
The interval is wider because this is the price paid for not knowing the population standard deviation σ. Of course, if we are not fully confident that the value σ = 33.15 used in Examples 1 and 2 is correct, it may be more appropriate to use the estimate S, as in Examples 3 and 4.
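The two-sided coefficient for unknown σ also has no simple exact form. As a rough cross-check of the Table B4 value, the sketch below combines two well-known approximations: Howe's formula $k \approx \sqrt{(n-1)(1+1/n)\,u_{(1+p)/2}^2 / \chi^2_{\alpha,\,n-1}}$ and the Wilson-Hilferty approximation for the chi-square quantile. Both are assumptions brought in for illustration, not the method behind Table B4, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def chi2_quantile_wh(q: float, df: int) -> float:
    """Wilson-Hilferty approximation to the q quantile of chi-square with df degrees of freedom."""
    z = NormalDist().inv_cdf(q)
    c = 2 / (9 * df)
    return df * (1 - c + z * sqrt(c)) ** 3


def k2_two_sided_approx(n: int, p: float, confidence: float) -> float:
    """Approximate two-sided tolerance coefficient (Howe's formula), sigma unknown."""
    df = n - 1
    z = NormalDist().inv_cdf((1 + p) / 2)
    chi2_low = chi2_quantile_wh(1 - confidence, df)  # lower alpha quantile
    return sqrt(df * (1 + 1 / n) * z ** 2 / chi2_low)


# Case used in Example 4: n = 12, p = 0.90, 1 - alpha = 0.95
k = k2_two_sided_approx(12, 0.90, 0.95)
print(round(k, 2))  # close to the tabulated 2.66
```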
Appendix A
Case of arbitrary distribution
(Supplement)
This appendix specifies a method that uses the sample extreme values, the minimum $x_m$ and the maximum $x_M$.
A.1 Introduction
When a simple random sample of size n is drawn from a population with a continuous distribution, information about the spread of the distribution can be obtained from the sample minimum $x_m$ and the sample maximum $x_M$, or from only one of them, without any further assumptions. Order statistics other than the sample extremes can also be used, but they are not covered in this appendix.
A.1.1 One-sided case
The sample size n, the confidence level 1-α and the proportion p of the population that falls above $x_m$ (or below $x_M$) are related by:
$p^n = \alpha$
If n and p are fixed, 1-α can be calculated from this relation. The probability that at least a proportion p of the population falls above $x_m$ (or below $x_M$) is then not less than 1-α.
If n and 1-α are fixed, p can be calculated from this relation. The probability that at least a proportion p of the population falls above $x_m$ (or below $x_M$) is then not less than 1-α.
If p and 1-α are fixed, the minimum sample size n can be determined from this relation, so that it can be concluded, with a confidence level of not less than 1-α, that at least a proportion p of the population falls above the minimum (or below the maximum) of a sample of size n.
A.1.2 Two-sided case
The sample size n, the proportion p of the population that falls between $x_m$ and $x_M$, and the confidence level 1-α are related by:
$n p^{n-1} - (n-1) p^n = \alpha$
If n and p are fixed, 1-α can be calculated from this relation. The probability that at least a proportion p of the population falls between $x_m$ and $x_M$ is then not less than 1-α.
If n and 1-α are fixed, p can be calculated from this relation. The probability that at least a proportion p of the population falls between $x_m$ and $x_M$ is then not less than 1-α.
If p and 1-α are fixed, the minimum sample size n can be determined from this relation, so that it can be concluded, with a confidence level of not less than 1-α, that at least a proportion p of the population falls between the minimum and the maximum of a sample of size n.
A.2 Example
A fatigue test of rotating stress is carried out on the components of an aircraft engine. The sample size is 15. The observed values of endurance are arranged in order from small to large into the following table:
A graphical test shows that the hypothesis of normality for the component population is rejected. Therefore the methods for determining the statistical tolerance interval used in Examples 3 and 4 are not applicable.
Sample extreme values: $x_m = 0.200$, $x_M = 8.800$
Confidence level: 1-α = 0.95
A.2.1 What is the maximum proportion of the population that falls below $x_m = 0.200$?
For 1-α = 0.95 and n = 15, Table B5 gives a value of p slightly above 0.75 (75%) as the minimum proportion of the population that falls above $x_m$. The maximum proportion of the population that falls below $x_m$, namely 1-p, is therefore slightly below 0.25 (25%). Nomogram 1 gives p of about 0.82.
A.2.2 What sample size must be taken to conclude, with a confidence level of 0.95, that at least a proportion p = 0.90 (90%) of the population of components falls below the sample maximum?
For 1-α = 0.95 and p = 0.90, Table B5 gives n = 29.
A.2.2.1 What is the minimum proportion of the population of components that falls between $x_m = 0.200$ and $x_M = 8.800$ at a confidence level of 0.95?
For 1-α = 0.95 and n = 15, Table B6 gives a value of p slightly below 0.75 (75%). Nomogram 2 gives p of about 0.72.
A.2.2.2 What sample size must be taken to conclude, with a confidence level of 0.95, that at least a proportion p = 0.90 (90%) of the population of components falls between the sample minimum and the sample maximum?
For 1-α = 0.95 and p = 0.90, Table B6 gives n = 46.
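The relations of A.1.1 and A.1.2 are simple enough to evaluate directly instead of reading Table B5, Table B6 or the nomograms. The sketch below solves them for p or for n; the function names are introduced here for illustration and are not part of this standard.

```python
from math import ceil, log


def one_sided_p(n: int, confidence: float) -> float:
    """Proportion p from p**n = alpha (A.1.1)."""
    return (1 - confidence) ** (1 / n)


def one_sided_n(p: float, confidence: float) -> int:
    """Smallest n with p**n <= alpha (A.1.1)."""
    return ceil(log(1 - confidence) / log(p))


def two_sided_alpha(n: int, p: float) -> float:
    """Left-hand side of n p**(n-1) - (n-1) p**n = alpha (A.1.2)."""
    return n * p ** (n - 1) - (n - 1) * p ** n


def two_sided_p(n: int, confidence: float) -> float:
    """Largest p satisfying the two-sided relation, found by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(60):  # two_sided_alpha is increasing in p on (0, 1)
        mid = (lo + hi) / 2
        if two_sided_alpha(n, mid) < 1 - confidence:
            lo = mid
        else:
            hi = mid
    return lo


def two_sided_n(p: float, confidence: float) -> int:
    """Smallest n for which the two-sided relation gives at most alpha."""
    n = 2
    while two_sided_alpha(n, p) > 1 - confidence:
        n += 1
    return n


# Values corresponding to the questions in A.2 (n = 15, 1 - alpha = 0.95)
print(round(one_sided_p(15, 0.95), 2), one_sided_n(0.90, 0.95))
print(round(two_sided_p(15, 0.95), 2), two_sided_n(0.90, 0.95))
```

These results can be compared with the values read from the nomograms and tables in A.2 (about 0.82, n = 29, about 0.72, n = 46).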
Appendix B
Statistical tables
(Supplement)
Table B1 One-sided statistical tolerance interval, σ known, μ unknown: $L_s = \bar{x} + k_1\sigma$ or $L_i = \bar{x} - k_1\sigma$
Values of the coefficient $k_1(n, p, 1-\alpha)$
1-α = 0.95
1-α = 0.99
Table B2 Two-sided statistical tolerance interval, σ known, μ unknown: $\bar{x} \pm k_1'\sigma$
Values of the coefficient $k_1'(n, p, 1-\alpha)$
1-α = 0.95
1-α = 0.99
Table B3 One-sided statistical tolerance interval, μ and σ unknown: $L_s = \bar{x} + k_2 S$ or $L_i = \bar{x} - k_2 S$
Values of the coefficient $k_2(n, p, 1-\alpha)$
1-α = 0.95