
Frequency distribution




Introduction:
A frequency distribution is a series in which observations with similar or closely related values are put into separate groups, the groups being arranged in order of magnitude. It is simply a table in which the data are grouped into classes and the number of cases falling in each class is recorded. It shows the frequency of occurrence of the different values of a single phenomenon.
A frequency distribution is constructed for three main reasons:
1. To facilitate the analysis of data.
2. To estimate the frequencies of the unknown population distribution from the distribution of sample data.
3. To facilitate the computation of various statistical measures.

Raw data:
The statistical data collected are generally raw, or ungrouped, data. Let us consider the daily wages (in Rs) of 30 labourers in a factory.

80  70  55  50  60  65  40  30  80  90
75  45  35  65  70  80  82  55  65  80
60  55  38  65  75  85  90  65  45  75

The above figures are raw or ungrouped data: they are recorded as they occur, without any prior arrangement. In this form the data convey little useful information and are rather confusing. A better way is to arrange the figures in ascending or descending order of magnitude; such an arrangement is commonly known as an array. An array does not, however, reduce the bulk of the data. The above data, when formed into an array, take the following form:

30  35  38  40  45  45  50  55  55  55
60  60  65  65  65  65  65  70  70  75
75  75  80  80  80  80  82  85  90  90
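
As a minimal sketch of forming an array, the raw wage figures listed earlier can be sorted in Python. The variable names here are illustrative and not part of the original text.

```python
# Daily wages (in Rs) of the 30 labourers, in the order recorded.
wages = [80, 70, 55, 50, 60, 65, 40, 30, 80, 90,
         75, 45, 35, 65, 70, 80, 82, 55, 65, 80,
         60, 55, 38, 65, 75, 85, 90, 65, 45, 75]

# An array is the same data in ascending (or descending) order of magnitude.
array = sorted(wages)

# Once sorted, the minimum and maximum are the first and last items.
smallest, largest = array[0], array[-1]

print(array)
print("minimum:", smallest, "maximum:", largest)
```

Sorting rearranges the observations but keeps all 30 of them, so the bulk of the data is unchanged.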

The array shows at once the maximum and minimum values. It also gives a rough idea of how the items are distributed over the range. When the number of items is large, however, forming an array is tedious and cumbersome. The data should therefore be condensed for better understanding, and this may be done in two ways, depending on the nature of the data.
a) Discrete (or) Ungrouped frequency distribution:
In this form of distribution, the frequency refers to a discrete value. The data are presented so that the exact measurements of the units are clearly indicated.
There are definite differences between the values of the variable in different groups of items. Each class is distinct and separate from the others, and there is no continuity from one class to the next. Examples of such data are the number of rooms in a house, the number of companies registered in a country, the number of children in a family, etc.
The process of preparing this type of distribution is very simple. We just count the number of times a particular value is repeated; this count is called the frequency of that class. To facilitate counting, prepare a column of tallies.
In one column, place all possible values of the variable from the lowest to the highest. Then, for each observation, put a bar (vertical line) opposite the value to which it relates.
To facilitate counting, the bars are arranged in blocks of five, with some space left between blocks. Finally, we count the bars to get the frequency.
Example 1:
In a survey of 40 families in a village, the number of children per family was recorded and the following data obtained.

1  0  3  2  1  5  6  2  2  1
0  3  4  2  1  6  3  2  1  5
3  3  2  4  2  2  3  0  2  1
4  5  3  3  4  4  1  2  4  5

Represent the data in the form of a discrete frequency distribution.

Solution:
Frequency distribution of the number of children

Number of children    Tally marks       Frequency
0                     |||               3
1                     ||||| ||          7
2                     ||||| |||||       10
3                     ||||| |||         8
4                     ||||| |           6
5                     ||||              4
6                     ||                2
Total                                   40
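
The counting procedure of Example 1 can be sketched in Python as follows. This is only an illustration: `Counter` does the counting, and the helper `tally` is a hypothetical function that draws bars in blocks of five.

```python
from collections import Counter

# Number of children in each of the 40 surveyed families (Example 1).
children = [1, 0, 3, 2, 1, 5, 6, 2, 2, 1,
            0, 3, 4, 2, 1, 6, 3, 2, 1, 5,
            3, 3, 2, 4, 2, 2, 3, 0, 2, 1,
            4, 5, 3, 3, 4, 4, 1, 2, 4, 5]

# Count how many times each value occurs.
frequency = Counter(children)


def tally(count):
    """Draw `count` bars in blocks of five, with a space between blocks."""
    full, rest = divmod(count, 5)
    blocks = ["|||||"] * full + (["|" * rest] if rest else [])
    return " ".join(blocks)


# List every possible value from the lowest to the highest.
for value in range(min(children), max(children) + 1):
    print(f"{value:<10}{tally(frequency[value]):<14}{frequency[value]}")
print(f"{'Total':<10}{'':<14}{sum(frequency.values())}")
```

The frequencies always add up to the number of observations, which is a useful check on the tally.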

b) Continuous frequency distribution:
In this form of distribution, the frequency refers to groups of values. This becomes necessary for variables which can take any fractional value and for which an exact measurement is not possible. A discrete variable can also be presented in the form of a continuous frequency distribution.
Wage distribution of 100 employees

Weekly wages (Rs)    Number of employees
50-100               4
100-150              12
150-200              22
200-250              33
250-300              16
300-350              8
350-400              5
Total                100

Nature of class:
The following are some basic technical terms used when a continuous frequency distribution is formed or data are classified according to class intervals.

a) Class limits:
The class limits are the lowest and the highest values that can be included in the class. For example, take the class 30-40. The lowest value of the class is 30 and the highest value is 40. The two boundaries of a class are known as the lower limit and the upper limit of the class. The lower limit of a class is the value below which there can be no item in the class. The upper limit of a class is the value above which there can be no item in that class. For the class 60-79, 60 is the lower limit and 79 is the upper limit, i.e. in this class there can be no value less than 60 or more than 79. The way in which class limits are stated depends upon the nature of the data. In statistical calculations, the lower class limit is denoted by L and the upper class limit by U.

b) Class Interval:
The class interval may be defined as the size of each grouping of data. For example, 50-75, 75-100 and 100-125 are class intervals. Each grouping begins with the lower limit of a class interval and ends at the lower limit of the next succeeding class interval.

c) Width or size of the class interval:
The difference between the lower and upper class limits is called the width or size of the class interval and is denoted by 'C'.
d) Range:
The difference between the largest and the smallest values of the observations is called the range and is denoted by R, i.e.
R = Largest value - Smallest value
R = L - S
e) Mid-value or mid-point:
The central point of a class interval is called the mid-value or mid-point. It is found by adding the upper and lower limits of a class and dividing the sum by 2.

(i.e.) Mid-value = (L + U) / 2

For example, if the class interval is 20-30, then the mid-value is (20 + 30) / 2 = 25.

f) Frequency:
The number of observations falling within a particular class interval is called the frequency of that class.
Let us consider the frequency distribution of the weights of persons working in a company.

Weight (in kgs)    Number of persons
30-40              25
40-50              53
50-60              77
60-70              95
70-80              80
80-90              60
90-100             30
Total              420

In the above example, the class frequencies are 25, 53, 77, 95, 80, 60 and 30. The total frequency is 420. The total frequency indicates the total number of observations considered in a frequency distribution.
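
As a small illustration, the mid-values and the total frequency of the weight distribution can be computed in Python. The dictionary layout and the function name `mid_value` are assumptions made for this sketch.

```python
# Weight classes (lower limit, upper limit) in kg mapped to class frequencies.
weights = {(30, 40): 25, (40, 50): 53, (50, 60): 77, (60, 70): 95,
           (70, 80): 80, (80, 90): 60, (90, 100): 30}


def mid_value(lower, upper):
    """Mid-value of a class: (L + U) / 2."""
    return (lower + upper) / 2


mid_values = [mid_value(lower, upper) for (lower, upper) in weights]
total_frequency = sum(weights.values())

for (lower, upper), m in zip(weights, mid_values):
    print(f"{lower}-{upper}: mid-value {m}, frequency {weights[(lower, upper)]}")
print("Total frequency:", total_frequency)
```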

g) Number of class intervals:
The number of class intervals in a frequency distribution is a matter of importance. The number of class intervals should not be too many. For an ideal frequency distribution, the number of class intervals can vary from 5 to 15. To decide the number of class intervals for the whole data, we take the lowest and the highest of the values; the difference between them enables us to decide the class intervals.
Thus the number of class intervals can be fixed arbitrarily, keeping in view the nature of the problem under study, or it can be decided with the help of Sturges' Rule. According to Sturges, the number of classes can be determined by the formula
K = 1 + 3.322 log10 N
where N = total number of observations
log10 = logarithm to the base 10
K = number of class intervals.
Thus if the number of observations is 10, the number of class intervals is
K = 1 + 3.322 log10 10 = 4.322 ≈ 4
If 100 observations are being studied, the number of class intervals is
K = 1 + 3.322 log10 100 = 7.644 ≈ 8, and so on.
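
Sturges' rule is easy to evaluate in code. The following Python sketch uses a hypothetical function name, `sturges_classes`, and returns the unrounded value of K so that the rounding step stays visible.

```python
import math


def sturges_classes(n):
    """Sturges' rule: K = 1 + 3.322 * log10(N), before rounding."""
    return 1 + 3.322 * math.log10(n)


for n in (10, 100):
    k = sturges_classes(n)
    # Print N, the computed K, and K rounded to a whole number of classes.
    print(n, round(k, 3), round(k))
```

For N = 10 and N = 100 this gives the values 4.322 and 7.644 worked out above.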

h) Size of the class interval:
The size of the class interval is inversely proportional to the number of class intervals in a given distribution. The approximate value of the size (or width or magnitude) of the class interval 'C' is obtained by using Sturges' rule as
Size of class interval = C = Range / Number of class intervals
= Range / (1 + 3.322 log10 N)
where Range = Largest value - Smallest value in the distribution.
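
As a worked illustration, the formula can be applied to the 30 daily wages given under "Raw data". The code below is a sketch with illustrative variable names; rounding the result to a convenient width is a judgment left to the analyst.

```python
import math

# Daily wages (in Rs) of the 30 labourers from the raw data.
wages = [80, 70, 55, 50, 60, 65, 40, 30, 80, 90,
         75, 45, 35, 65, 70, 80, 82, 55, 65, 80,
         60, 55, 38, 65, 75, 85, 90, 65, 45, 75]

value_range = max(wages) - min(wages)        # 90 - 30 = 60
k = 1 + 3.322 * math.log10(len(wages))       # Sturges' rule with N = 30
width = value_range / k                      # C = Range / K

print("Range:", value_range)
print("K:", round(k, 3))
print("C:", round(width, 2))
```

Here C comes out slightly above 10, so a class width of 10 would be a natural choice for these data.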

Types of class intervals:

There are three methods of classifying the data according to class intervals, namely
a) Exclusive method
b) Inclusive method
c) Open-end classes

a) Exclusive method:
When the class intervals are so fixed that the upper limit of one class is the lower limit of the next class, it is known as the exclusive method of classification. The following data are classified on this basis.


Expenditure (Rs.)    No. of families
0-5000               60
5000-10000           95
10000-15000          122
15000-20000          83
20000-25000          40
Total                400

It is clear that the exclusive method ensures continuity of data, inasmuch as the upper limit of one class is the lower limit of the next class. In the above example, there are 60 families whose expenditure is between Rs.0 and Rs.4999.99. A family whose expenditure is Rs.5000 would be included in the class interval 5000-10000. This method is widely used in practice.
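
The exclusive rule, in which a value equal to an upper limit goes into the next class, can be sketched in Python as follows. The function name `exclusive_class` and the use of `bisect_right` are choices made for this illustration only.

```python
from bisect import bisect_right

# Class limits of the expenditure table: 0-5000, 5000-10000, ..., 20000-25000.
limits = [0, 5000, 10000, 15000, 20000, 25000]


def exclusive_class(value, limits):
    """Return (lower, upper) such that lower <= value < upper, else None."""
    if not limits[0] <= value < limits[-1]:
        return None  # value lies outside every class
    i = bisect_right(limits, value)
    return limits[i - 1], limits[i]


print(exclusive_class(4999.99, limits))  # falls in 0-5000
print(exclusive_class(5000, limits))     # falls in 5000-10000
```

Because every value belongs to exactly one class, fractional values such as 4999.99 cause no difficulty.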

b) Inclusive method:
In this method, the overlapping of the class intervals is avoided. Both the lower and upper limits are included in the class interval. This type of classification may be used for a grouped frequency distribution of a discrete variable, like the number of members in a family or the number of workers in a factory, where the variable may take only integral values. It cannot be used with fractional values like age, height, weight etc.
This method may be illustrated as follows:

Class interval    Frequency
5-9               7
10-14             12
15-19             15
20-29             21
30-34             10
35-39             5
Total             70
Thus to decide whether to use the inclusive method or the exclusive method, it is important to determine whether the variable under observation is a continuous or a discrete one. In the case of continuous variables, the exclusive method must be used. The inclusive method should be used in the case of a discrete variable.
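
A short Python sketch shows why the inclusive method suits only integral values. The three classes and the function name `inclusive_class` are hypothetical and chosen for illustration.

```python
# Inclusive classes: both the lower and the upper limit belong to the class.
classes = [(5, 9), (10, 14), (15, 19)]


def inclusive_class(value, classes):
    """Return the class (lower, upper) with lower <= value <= upper, else None."""
    for lower, upper in classes:
        if lower <= value <= upper:
            return lower, upper
    return None  # no class contains this value


print(inclusive_class(9, classes))    # 9 is the upper limit of 5-9
print(inclusive_class(10, classes))   # 10 is the lower limit of 10-14
print(inclusive_class(9.5, classes))  # a fractional value fits no class
```

A value such as 9.5 falls in the gap between 5-9 and 10-14, which is why continuous variables call for the exclusive method.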
c) Open end classes:
In open-end classes, a class limit is missing at the lower end of the first class interval, at the upper end of the last class interval, or at both. The necessity of open-end classes arises in a number of practical situations, particularly relating to economic and medical data, when there are a few very high values or a few very low values which are far apart from the majority of observations.

An example of open-end classes is as follows:

Salary range      No. of workers
Below 2000        7
2000-4000         5
4000-6000         6
6000-8000         4
8000 and above    3
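
Open-end classes can be represented in code by leaving a limit unspecified. In the Python sketch below, `None` stands for a missing limit, the function names are hypothetical, and the closed classes are assumed to follow the exclusive method (a salary of exactly 2000 goes into 2000-4000).

```python
# None marks a missing (open) class limit.
classes = [(None, 2000), (2000, 4000), (4000, 6000), (6000, 8000), (8000, None)]


def label(lower, upper):
    """Human-readable name of a class, allowing for open ends."""
    if lower is None:
        return f"Below {upper}"
    if upper is None:
        return f"{lower} and above"
    return f"{lower}-{upper}"


def open_end_class(salary, classes):
    """Return the label of the class containing `salary`."""
    for lower, upper in classes:
        above_lower = lower is None or salary >= lower
        below_upper = upper is None or salary < upper
        if above_lower and below_upper:
            return label(lower, upper)
    return None


for salary in (1500, 2000, 50000):
    print(salary, "->", open_end_class(salary, classes))
```

Very high or very low salaries are absorbed by the open classes without extending the table.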

Construction of frequency table:
Constructing a frequency distribution depends on the nature of the given data. Hence, the following general considerations may be borne in mind to ensure a meaningful classification of data.
1. The number of classes should preferably be between 5 and 20. However, there is no rigidity about it.
2. As far as possible, one should avoid class-interval sizes such as 3, 7, 11, 26, etc. Preferably, one should have class intervals of five or multiples of 5, like 10, 20, 25, 100, etc.
3. The starting point, i.e. the lower limit of the first class, should be either zero or 5 or a multiple of 5.
4. To ensure continuity and to get correct class intervals, we should adopt the exclusive method.
5. Wherever possible, it is desirable to use class intervals of equal size.
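
The considerations above can be put together in a short Python sketch that builds a frequency table for the 30 daily wages from the raw data. The starting point of 30, the width of 10 and the upper end of 100 are illustrative choices that follow the guidelines; they are not prescribed by the text.

```python
# Daily wages (in Rs) of the 30 labourers from the raw data.
wages = [80, 70, 55, 50, 60, 65, 40, 30, 80, 90,
         75, 45, 35, 65, 70, 80, 82, 55, 65, 80,
         60, 55, 38, 65, 75, 85, 90, 65, 45, 75]

start, stop, width = 30, 100, 10   # equal classes 30-40, 40-50, ..., 90-100

# Exclusive method: a wage w is counted in the class with lower <= w < upper.
table = {}
for lower in range(start, stop, width):
    upper = lower + width
    table[(lower, upper)] = sum(lower <= w < upper for w in wages)

for (lower, upper), count in table.items():
    print(f"{lower}-{upper}: {count}")
print("Total:", sum(table.values()))
```

This gives seven equal classes, and the class frequencies add up to the 30 observations.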

