FREQUENCY DISTRIBUTION
Introduction:
A frequency distribution is a series in which a number of observations with similar or closely related values are put into separate bunches or groups, the groups being arranged in order of magnitude. It is simply a table in which the data are grouped into classes and the number of cases falling in each class is recorded. It shows the frequency of occurrence of the different values of a single phenomenon.
A frequency distribution is constructed for three main reasons:
1. To facilitate the analysis of data.
2. To estimate the frequencies of an unknown population distribution from the distribution of sample data.
3. To facilitate the computation of various statistical measures.
Raw data:
The statistical data collected are generally raw or ungrouped data. Let us consider the daily wages (in Rs.) of 30 labourers in a factory.
80  70  55  50  60  65  40  30  80  90
75  45  35  65  70  80  82  55  65  80
60  55  38  65  75  85  90  65  45  75
The above figures are nothing but raw or ungrouped data, recorded as they occur without any prior consideration. This representation of data does not furnish any useful information and is rather confusing to the mind. A better way is to arrange the figures in ascending or descending order of magnitude, which is commonly known as an array. But this does not reduce the bulk of the data. When formed into an array, the above data take the following form:
30  35  38  40  45  45  50  55  55  55
60  60  65  65  65  65  65  70  70  75
75  75  80  80  80  80  82  85  90  90
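Forming an array amounts to sorting the observations. The following is a minimal Python sketch using the 30 daily wages listed above; the variable names are illustrative and not from the source.

```python
# Daily wages (in Rs.) of the 30 labourers, in the order they were recorded
raw_wages = [
    80, 70, 55, 50, 60, 65, 40, 30, 80, 90,
    75, 45, 35, 65, 70, 80, 82, 55, 65, 80,
    60, 55, 38, 65, 75, 85, 90, 65, 45, 75,
]

# An array arranges the same figures in order of magnitude
ascending_array = sorted(raw_wages)
descending_array = sorted(raw_wages, reverse=True)

# The extreme values can now be read off at once
minimum, maximum = ascending_array[0], ascending_array[-1]
print("Array:", ascending_array)
print("Minimum:", minimum, "Maximum:", maximum)  # Minimum: 30 Maximum: 90
```

The sorted list still holds all 30 values, which illustrates why an array alone does not reduce the bulk of the data.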
The array helps us to see at once the maximum and minimum values. It also gives a rough idea of the distribution of the items over the range. However, when we have a large number of items, forming an array is very difficult, tedious and cumbersome. The data should therefore be condensed for better understanding, and this may be done in two ways, depending on the nature of the data.
a) Discrete (or) ungrouped frequency distribution:
In this form of distribution, each frequency refers to a discrete value. The data are presented in such a way that exact measurements of units are clearly indicated. There are definite differences between the values of different groups of items: each class is distinct and separate from the other classes, and there is no continuity from one class to the next. Examples are the number of rooms in a house, the number of companies registered in a country, the number of children in a family, etc.
The process of preparing this type of distribution is very simple. We just have to count the number of times a particular value is repeated; this count is called the frequency of that class. To facilitate counting, prepare two columns. In the first column, place all possible values of the variable from the lowest to the highest. The second column is a column of tallies: for each observation, put a bar (vertical line) opposite the particular value to which it relates. To make counting easier, the bars are grouped in blocks of five, with some space left between each block. Finally, we count the number of bars to get the frequency.
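The counting procedure can be sketched in Python as below. This is a minimal illustration under a few assumptions: it lists only the values actually observed (not every possible value), draws tallies as plain bars in blocks of five separated by a space, and reuses the daily wage data from the raw-data example. `tally` and `discrete_frequency_table` are hypothetical names.

```python
from collections import Counter

def tally(count, block=5):
    """Render a count as vertical bars grouped in blocks of `block`, separated by spaces."""
    full_blocks, remainder = divmod(count, block)
    groups = ["|" * block] * full_blocks
    if remainder:
        groups.append("|" * remainder)
    return " ".join(groups)

def discrete_frequency_table(values):
    """Return (value, tally marks, frequency) rows from the lowest to the highest value."""
    counts = Counter(values)
    return [(value, tally(counts[value]), counts[value]) for value in sorted(counts)]

raw_wages = [
    80, 70, 55, 50, 60, 65, 40, 30, 80, 90,
    75, 45, 35, 65, 70, 80, 82, 55, 65, 80,
    60, 55, 38, 65, 75, 85, 90, 65, 45, 75,
]

for value, marks, frequency in discrete_frequency_table(raw_wages):
    print(f"{value:>5}  {marks:<12} {frequency:>3}")
```

`Counter` does the counting; the tally column is kept only to mirror the manual method.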
Example 1:
Represent the data in the form of a discrete frequency distribution.
Solution:
Frequency distribution of the number of children
Number of children    Tally marks    Frequency
0                     |||            3
1                     ||||| ||       7
2                     ||||| |||||    10
3                     ||||| |||      8
4                     ||||| |        6
5                     ||||           4
6                     ||             2
Total                                40
b) Continuous frequency distribution:
In this form of distribution, the frequency refers to groups of values. This becomes necessary in the case of variables which can take any fractional value, where an exact measurement is not possible. A discrete variable can also be presented in the form of a continuous frequency distribution.
Wage distribution of 100 employees
Weekly wages (Rs)    Number of employees
50-100               4
100-150              12
150-200              22
200-250              33
250-300              16
300-350              8
350-400              5
Total                100
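A continuous frequency distribution is obtained by counting how many observations fall in each class. The Python sketch below is a minimal illustration, not the source's procedure: it groups the 30 daily wages from the raw-data example into classes of width 10 starting at 30 (our choice), and it assumes the exclusive convention, in which a class includes its lower limit but not its upper limit. `continuous_frequency_table` is a hypothetical name.

```python
def continuous_frequency_table(values, start, width, n_classes):
    """Count values in equal classes lower <= x < upper (exclusive convention)."""
    table = []
    for i in range(n_classes):
        lower = start + i * width
        upper = lower + width
        frequency = sum(1 for x in values if lower <= x < upper)
        table.append((f"{lower}-{upper}", frequency))
    return table

raw_wages = [
    80, 70, 55, 50, 60, 65, 40, 30, 80, 90,
    75, 45, 35, 65, 70, 80, 82, 55, 65, 80,
    60, 55, 38, 65, 75, 85, 90, 65, 45, 75,
]

table = continuous_frequency_table(raw_wages, start=30, width=10, n_classes=7)
for label, frequency in table:
    print(f"{label:<8} {frequency}")
print("Total   ", sum(f for _, f in table))
```

Seven classes of width 10 condense the 30 figures into a short table whose total equals the number of observations.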
Nature of class:
The following are some basic technical terms used when a continuous frequency distribution is formed or data are classified according to class intervals.
a) Class limits:
The class limits are the lowest and the highest values that can be included in the class. For example, take the class 30-40: the lowest value of the class is 30 and the highest is 40. These two boundaries are known as the lower limit and the upper limit of the class.
The lower limit of a class is the value below which there can be no item in the class. The upper limit of a class is the value above which there can be no item in that class. In the class 60-79, 60 is the lower limit and 79 is the upper limit; i.e. in this class there can be no value which is less than 60 or more than 79. The way in which class limits are stated depends upon the nature of the data. In statistical calculations, the lower class limit is denoted by L and the upper class limit by U.
b) Class interval:
The class interval may be defined as the size of each grouping of data. For example, 50-75, 75-100, 100-125… are class intervals. Each grouping begins with the lower limit of a class interval and ends at the lower limit of the next succeeding class interval.
c) Width or size of the class interval:
The difference between the lower and upper class limits is called the width or size of the class interval and is denoted by ‘C’.
d) Range:
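The difference between the largest and smallest values of the observations is called the range and is denoted by ‘R’, i.e.

$$R = \text{Largest value} - \text{Smallest value} = L - S$$

where, in this formula only, L denotes the largest value and S the smallest value.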
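As a small worked example (not in the original), the range of the 30 daily wages listed earlier can be computed directly; `raw_wages` is an illustrative name.

```python
raw_wages = [
    80, 70, 55, 50, 60, 65, 40, 30, 80, 90,
    75, 45, 35, 65, 70, 80, 82, 55, 65, 80,
    60, 55, 38, 65, 75, 85, 90, 65, 45, 75,
]

largest = max(raw_wages)   # L: the largest value
smallest = min(raw_wages)  # S: the smallest value
R = largest - smallest
print(f"R = {largest} - {smallest} = {R}")  # R = 90 - 30 = 60
```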
e) Mid-value or mid-point:
The central point of a class interval is called the mid-value or mid-point. It is found by adding the upper and lower limits of a class and dividing the sum by 2, i.e.

$$\text{Mid-value} = \frac{L + U}{2}$$

For example, if the class interval is 20-30, then the mid-value is

$$\frac{20 + 30}{2} = 25$$

f) Frequency:
The number of observations falling within a particular class interval is called the frequency of that class.
Let us consider the frequency distribution of the weights of persons working in a company.
Weight (in kg)    Number of persons
30-40             25
40-50             53
50-60             77
60-70             95
70-80             80
80-90             60
90-100            30
Total             420
In the above example, the class frequencies are 25, 53, 77, 95, 80, 60 and 30. The total frequency is equal to 420. The total frequency indicates the total number of observations considered in a frequency distribution.
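The mid-values and the total frequency of the weight distribution above can be checked with a short Python sketch. The list layout and the `mid_value` helper are illustrative assumptions; the formula is the mid-value definition (L + U) / 2.

```python
# ((lower limit, upper limit), number of persons), from the weight table above
weight_table = [
    ((30, 40), 25), ((40, 50), 53), ((50, 60), 77), ((60, 70), 95),
    ((70, 80), 80), ((80, 90), 60), ((90, 100), 30),
]

def mid_value(lower, upper):
    """Mid-value of a class: (L + U) / 2."""
    return (lower + upper) / 2

for (lower, upper), frequency in weight_table:
    print(f"{lower}-{upper}: mid-value {mid_value(lower, upper)}, frequency {frequency}")

total_frequency = sum(frequency for _, frequency in weight_table)
print("Total frequency:", total_frequency)  # Total frequency: 420
```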
g) Number of class intervals:
The number of class intervals in a frequency distribution is a matter of importance, and there should not be too many of them. For an ideal frequency distribution, the number of class intervals can vary from 5 to 15. To decide the number of class intervals for the whole data, we take the lowest and the highest values; the difference between them will help us decide the class intervals. Thus the number of class intervals can be fixed arbitrarily, keeping in view the nature of the problem under study, or it can be decided with the help of Sturges’ rule. According to this rule, the number of classes can be determined by the formula
$$K = 1 + 3.322 \log_{10} N$$

where
N = total number of observations,
log10 = common logarithm (to base 10),
K = number of class intervals.
Thus, if the number of observations is 10, then the number of class intervals is

$$K = 1 + 3.322 \log_{10} 10 = 4.322 \approx 4$$

If 100 observations are being studied, the number of class intervals is

$$K = 1 + 3.322 \log_{10} 100 = 7.644 \approx 8$$

and so on.
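Sturges’ rule is a one-line formula, so the two worked values can be reproduced with a minimal Python sketch; `sturges_k` is a hypothetical name, and rounding to the nearest whole number follows the worked examples above.

```python
import math

def sturges_k(n):
    """Number of class intervals by Sturges' rule: K = 1 + 3.322 log10 N (unrounded)."""
    return 1 + 3.322 * math.log10(n)

for n in (10, 100):
    k = sturges_k(n)
    # Rounded to the nearest whole number, as in the worked examples
    print(f"N = {n}: K = {k:.3f}, about {round(k)} classes")
```

This prints K = 4.322 (about 4 classes) for N = 10 and K = 7.644 (about 8 classes) for N = 100.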
h) Size of the class interval:
Since the size of the class interval is inversely proportional to the number of class intervals in a given distribution, the approximate value of the size (or width or magnitude) of the class interval ‘C’ is obtained by using Sturges’ rule as

$$C = \frac{\text{Range}}{\text{Number of class intervals}} = \frac{\text{Range}}{1 + 3.322 \log_{10} N}$$

where Range = Largest value - Smallest value in the distribution.
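Applied to the 30 daily wages from the raw-data example, the formula can be evaluated with a short Python sketch; `sturges_width` is a hypothetical name and the choice of data set is ours.

```python
import math

def sturges_width(values):
    """Approximate class width C = Range / (1 + 3.322 log10 N), by Sturges' rule."""
    value_range = max(values) - min(values)
    k = 1 + 3.322 * math.log10(len(values))
    return value_range / k

raw_wages = [
    80, 70, 55, 50, 60, 65, 40, 30, 80, 90,
    75, 45, 35, 65, 70, 80, 82, 55, 65, 80,
    60, 55, 38, 65, 75, 85, 90, 65, 45, 75,
]

C = sturges_width(raw_wages)
print(f"C = {C:.2f}")
```

Here Range = 60 and N = 30, so C comes out a little over 10, which suggests classes of width 10 for these data.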
Types of class intervals:
There are three methods of classifying the data according to class intervals, namely:
a) Exclusive method
b) Inclusive method
c) Open-end classes
a) Exclusive method:
When the class intervals are so fixed that the upper limit of one class is the lower limit of the next class, it is known as the exclusive method of classification. The following data are classified on this basis.
Expenditure (Rs.)    No. of families
0-5000               60
5000-10000           95
10000-15000          122
15000-20000          83
20000-25000          40
Total                400
It is clear that the exclusive method ensures continuity of data, inasmuch as the upper limit of one class is the lower limit of the next class. In the above example, there are 60 families whose expenditure is between Rs.0 and Rs.4999.99. A family whose expenditure is Rs.5000 would be included in the class interval 5000-10000. This method is widely used in practice.
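The rule that a class includes its lower limit but not its upper limit can be written as a small Python function. This is a sketch assuming the classes are given as a sorted list of limits; `exclusive_class` is a hypothetical name, and under this convention a value equal to the upper limit of the last class (here 25000) falls outside every class.

```python
def exclusive_class(x, limits):
    """Return the class (lower, upper) with lower <= x < upper, or None if x is outside."""
    for lower, upper in zip(limits, limits[1:]):
        if lower <= x < upper:
            return (lower, upper)
    return None

# Class limits of the expenditure table above (Rs.)
limits = [0, 5000, 10000, 15000, 20000, 25000]

print(exclusive_class(4999.99, limits))  # (0, 5000)
print(exclusive_class(5000, limits))     # (5000, 10000)
```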
b) Inclusive method:
In this method, the overlapping of class intervals is avoided: both the lower and upper limits are included in the class interval. This type of classification may be used for a grouped frequency distribution of a discrete variable, such as the number of members in a family or the number of workers in a factory, where the variable can take only integral values. It cannot be used with fractional values like age, height, weight, etc.
This method may be illustrated as follows:
Class interval    Frequency
5-9               7
10-14             12
15-19             15
20-29             21
30-34             10
35-39             5
Total             70
Thus, to decide whether to use the inclusive method or the exclusive method, it is important to determine whether the variable under observation is continuous or discrete. In the case of continuous variables, the exclusive method must be used. The inclusive method should be used in the case of discrete variables.
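The inclusive rule, and why it fails for fractional values, can be shown with a minimal Python sketch using the class intervals of the inclusive table; `inclusive_class` is a hypothetical name.

```python
def inclusive_class(x, classes):
    """Return the class (lower, upper) with lower <= x <= upper, or None if none matches."""
    for lower, upper in classes:
        if lower <= x <= upper:
            return (lower, upper)
    return None

# Class intervals of the inclusive table above
classes = [(5, 9), (10, 14), (15, 19), (20, 29), (30, 34), (35, 39)]

print(inclusive_class(9, classes))    # (5, 9)
print(inclusive_class(10, classes))   # (10, 14)
# A fractional value lying between two classes fits nowhere, which is
# why the inclusive method is unsuitable for continuous variables
print(inclusive_class(9.5, classes))  # None
```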
c) Open-end classes:
In this type of classification, a class limit is missing either at the lower end of the first class interval or at the upper end of the last class interval, or both are not specified. The necessity of open-end classes arises in a number of practical situations, particularly in economic and medical data, when there are a few very high values or a few very low values which are far apart from the majority of observations.
An example of open-end classes is as follows:
Salary range      No. of workers
Below 2000        7
2000 – 4000       5
4000 – 6000       6
6000 – 8000       4
8000 and above    3
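Classifying a value into open-end classes only needs the interior limits. The Python sketch below uses `bisect_right` from the standard library and assumes, since the source does not say so for this table, that each interior limit belongs to the class beginning there (the exclusive convention); `open_end_class`, `cut_points` and `labels` are hypothetical names.

```python
from bisect import bisect_right

# Interior limits of the salary table above; the first and last classes are open
cut_points = [2000, 4000, 6000, 8000]
labels = ["Below 2000", "2000-4000", "4000-6000", "6000-8000", "8000 and above"]

def open_end_class(salary):
    """Label of the class containing `salary`; each cut point starts the next class."""
    return labels[bisect_right(cut_points, salary)]

for salary in (1500, 2000, 7999, 8000, 25000):
    print(salary, "->", open_end_class(salary))
```

Because the first and last classes have no outer limit, any very low or very high salary still lands in a class.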
Construction of frequency table:
Constructing a frequency distribution depends on the nature of the given data. Hence, the following general considerations may be borne in mind to ensure a meaningful classification of the data.
1. The number of classes should preferably be between 5 and 20. However, there is no rigidity about it.
2. As far as possible, one should avoid class intervals of sizes such as 3, 7, 11, 26, etc. Preferably, one should have class intervals of five or multiples of 5, like 10, 20, 25, 100, etc.
3. The starting point, i.e. the lower limit of the first class, should be either zero, 5 or a multiple of 5.
4. To ensure continuity and to get correct class intervals, we should adopt the “exclusive” method.
5. Wherever possible, it is desirable to use class intervals of equal size.
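One way these guidelines might be combined is sketched below in Python. It is an illustrative assumption, not a procedure from the source: the width starts from Sturges’ approximate value and is rounded to the nearest multiple of 5, the first class starts at the largest multiple of 5 not above the smallest value, and equal exclusive classes are added until the largest value is covered. `suggest_classes` is a hypothetical name.

```python
import math

def suggest_classes(values):
    """Hypothetical helper: equal, exclusive classes chosen along the lines of the guidelines."""
    n = len(values)
    # Sturges' approximate width, rounded to the nearest multiple of 5 (at least 5)
    raw_width = (max(values) - min(values)) / (1 + 3.322 * math.log10(n))
    width = max(5, 5 * round(raw_width / 5))
    # Start the first class at 0 or a multiple of 5 not above the smallest value
    start = 5 * math.floor(min(values) / 5)
    # Enough equal classes (lower <= x < upper) to include the largest value
    n_classes = int((max(values) - start) // width) + 1
    return [(start + i * width, start + (i + 1) * width) for i in range(n_classes)]

raw_wages = [
    80, 70, 55, 50, 60, 65, 40, 30, 80, 90,
    75, 45, 35, 65, 70, 80, 82, 55, 65, 80,
    60, 55, 38, 65, 75, 85, 90, 65, 45, 75,
]

print(suggest_classes(raw_wages))
```

For the daily wage data this gives seven classes of width 10, from 30-40 to 90-100, within the 5 to 20 classes of the first guideline.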