
Runs Test for Detecting Non-randomness


Purpose:
Detect Non-Randomness
The runs test (Bradley, 1968) can be used to decide if a data set is from a random process. A run is defined as a series of increasing values or a series of decreasing values. The number of increasing, or decreasing, values is the length of the run. In a random data set, the probability that the (I+1)th value is larger or smaller than the Ith value follows a binomial distribution, which forms the basis of the runs test.
Typical Analysis and Test Statistics:
The first step in the runs test is to count the number of runs in the data sequence. The literature defines runs in several ways, but in every case the formulation must produce a dichotomous sequence of values. For example, a series of 20 coin tosses might produce the following sequence of heads (H) and tails (T):
    H H T T H T H H H H T H H T T T T T H H
The number of runs for this series is nine. There are 11 heads and 9 tails in the sequence.
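The counts quoted for the coin-toss sequence can be checked with a short sketch. The helper name `count_runs` is hypothetical (it is not part of the original text); it treats a run as a maximal block of identical consecutive symbols.

```python
from itertools import groupby

def count_runs(symbols):
    """Count maximal blocks of identical consecutive symbols."""
    # groupby yields one group per block of equal adjacent items.
    return sum(1 for _ in groupby(symbols))

tosses = "H H T T H T H H H H T H H T T T T T H H".split()

print(count_runs(tosses))                      # 9 runs
print(tosses.count("H"), tosses.count("T"))    # 11 heads, 9 tails
```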
Definition:
We will code values above the median as positive and values below the median as negative. A run is defined as a series of consecutive positive (or negative) values. The runs test is defined as:
H0: the sequence was produced in a random manner
Ha: the sequence was not produced in a random manner
Test Statistic: The test statistic is
    Z = (R - Rbar)/sR
where R is the observed number of runs, Rbar is the expected number of runs, and sR is the standard deviation of the number of runs. The values of Rbar and sR are computed as follows:
    Rbar = 2 n1 n2/(n1 + n2) + 1
    sR^2 = 2 n1 n2 (2 n1 n2 - n1 - n2) / ((n1 + n2)^2 (n1 + n2 - 1))
where n1 and n2 are the number of positive and negative values in the series.
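As a minimal sketch, the statistic can be computed from a numeric series as follows. It uses the standard expressions for the mean and variance of the number of runs, Rbar = 2 n1 n2/(n1 + n2) + 1 and sR^2 = 2 n1 n2 (2 n1 n2 - n1 - n2)/((n1 + n2)^2 (n1 + n2 - 1)). The function name `runs_test_z` is hypothetical. Dropping values exactly equal to the median is an assumption made here; the text does not say how such ties are handled.

```python
import math
from itertools import groupby
from statistics import median

def runs_test_z(values):
    """Return (R, Rbar, sR, Z) for a runs test about the median."""
    med = median(values)
    # Assumption: values exactly equal to the median are dropped.
    signs = [v > med for v in values if v != med]
    n1 = sum(signs)            # number of positive (above-median) values
    n2 = len(signs) - n1       # number of negative (below-median) values
    # Assumes both groups are non-empty, otherwise sR is zero.
    runs = sum(1 for _ in groupby(signs))
    rbar = 2 * n1 * n2 / (n1 + n2) + 1
    var = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
           / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    s_r = math.sqrt(var)
    return runs, rbar, s_r, (runs - rbar) / s_r

# A strictly alternating toy series: every value starts a new run.
runs, rbar, s_r, z = runs_test_z([1, 9, 2, 8, 3, 7, 4, 6])
print(runs, rbar, round(z, 4))
```

For the toy series, n1 = n2 = 4 and R = 8, so Rbar = 5 and Z is about 2.29. A series this short is below the large-sample threshold, so the value is shown only to illustrate the arithmetic.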
Significance Level: α
Critical Region: The runs test rejects the null hypothesis if
    |Z| > Z_{1-α/2}
For a large-sample runs test (where n1 > 10 and n2 > 10), the test statistic is compared to a standard normal table. That is, at the 5% significance level, a test statistic with an absolute value greater than 1.96 indicates non-randomness. For a small-sample runs test, published tables give critical values that depend on the values of n1 and n2 (Mendenhall, 1982).
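The large-sample critical value need not be read from a table. A small sketch using Python's standard library, where the helper name `critical_value` is hypothetical:

```python
from statistics import NormalDist

def critical_value(alpha):
    """Two-sided critical value: the (1 - alpha/2) standard normal quantile."""
    return NormalDist().inv_cdf(1 - alpha / 2)

print(round(critical_value(0.05), 2))   # 1.96
```

This applies only to the large-sample case. For small n1 or n2 the tabulated critical values should be used instead.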
Runs Test Example
A runs test was performed for 200 measurements of beam deflection contained in the LEW.DAT data set.
 
H0:  the sequence was produced in a random manner
Ha:  the sequence was not produced in a random manner  

Test statistic:  Z = 2.6938
Significance level:  α = 0.05
Critical value (upper tail):  Z_{1-α/2} = 1.96
Critical region: Reject H0 if |Z| > 1.96 
Since the test statistic is greater than the critical value, we conclude that the data are not random at the 0.05 significance level.
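The same decision can be expressed as a two-sided p-value under the large-sample normal approximation. The sketch below takes the reported Z = 2.6938 as given rather than recomputing it from LEW.DAT, and the variable names are illustrative.

```python
from statistics import NormalDist

z = 2.6938       # test statistic reported above
alpha = 0.05     # significance level used above

# Two-sided p-value: probability of |Z| at least this large under H0.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject_h0 = p_value < alpha

print(f"p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

The p-value is roughly 0.007, below 0.05, which agrees with the comparison of |Z| against 1.96.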
Question:
The runs test can be used to answer the following question:
  • Were these sample data generated from a random process?
Importance:
Randomness is one of the key assumptions in determining if a univariate statistical process is in control. If the assumptions of constant location and scale, randomness, and fixed distribution are reasonable, then the univariate process can be modeled as:
    y(i) = A0 + E(i)
where E(i) is an error term. If the randomness assumption is not valid, then a different model needs to be used. This will typically be either a time series model or a non-linear model (with time as the independent variable).
Related Techniques:
Autocorrelation
Run Sequence Plot
Lag Plot
Case Study:
Heat flow meter data
Software:
Most general-purpose statistical software programs support a runs test. Both Dataplot code and R code can be used to generate the analyses in this section.
