
Collection of data


Everybody collects, interprets and uses information, much of it in numerical or statistical form. So much information is available that people need to be able to absorb, select and reject it. In everyday life, in business and in industry, certain statistical information is necessary, and it is important to know where to find it and how to collect it. As consumers, everybody has to compare prices and quality before deciding what goods to buy. As employees of a firm, people want to compare their salaries, working conditions, promotion opportunities and so on. The firms, for their part, want to control costs and expand their profits.
One of the main functions of statistics is to provide information that helps in making decisions. Statistics provides this type of information by giving a description of the present, a profile of the past and an estimate of the future. The following are some of the objectives of collecting statistical information.
1.   To describe the methods of collecting primary statistical information.
2.   To consider the steps involved in carrying out a survey.
3.   To   analyse   the   process   involved   in   observation   and
4.   To define and describe sampling.
5.   To analyse the basis of sampling.
6.   To describe a variety of sampling methods.
A statistical investigation is comprehensive: it requires the systematic collection of data about some group of people or objects, describing and organizing the data, analyzing the data with the help of different statistical methods, summarizing the analysis, and using the results for making judgements, decisions and predictions. The validity and accuracy of the final judgement is most crucial and depends heavily on how well the data was collected in the first place. The quality of the data greatly affects the conclusions, so utmost importance must be given to this process, and every possible precaution should be taken to ensure accuracy while collecting the data.

Nature of data:
Different types of data can be collected for different purposes. The data can be collected in connection with time, with geographical location, or with both time and location. The following are the three types of data:
1.   Time series data.
2.   Spatial data.
3.   Spatio-temporal data.

Time series data:
It is a collection of numerical values gathered over a period of time. The data may have been collected at regular or at irregular intervals of time.
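The distinction between regular and irregular intervals can be checked mechanically. The following is a minimal sketch in Python, assuming the observation times are given as plain numbers such as years; the function name is_regular is illustrative only.

```python
def is_regular(times):
    """Return True if consecutive observation times are equally spaced."""
    if len(times) < 3:
        return True  # fewer than two gaps, so there is nothing to compare
    # Gap between each observation and the one before it.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return all(gap == gaps[0] for gap in gaps)


# Yearly observations for 2001 to 2004 are regularly spaced.
print(is_regular([2001, 2002, 2003, 2004]))  # True
# Dropping one year makes the intervals irregular.
print(is_regular([2001, 2002, 2004]))        # False
```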
Example 1:
This example concerns data on three types of expenditure, in rupees, for a family over the four years 2001, 2002, 2003 and 2004; the table itself is not reproduced here.

Spatial Data:
If the data collected is connected with a place, it is termed spatial data. For example, the data may be

1.   Number of runs scored by a batsman in different test matches in a test series at different places
2.   District-wise rainfall in Tamil Nadu
3.   Prices of silver in four metropolitan cities
Example 2:
The population of the southern states of India in 1991.
[Table not reproduced; the only surviving row label is Andhra Pradesh.]

Spatio-Temporal Data:
If the data collected is connected to time as well as place, it is known as spatio-temporal data.
Example 3:

[Table not reproduced; the surviving row labels are Tamil Nadu and Andhra Pradesh.]
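The three types of data differ only in what the observations vary over. As a rough sketch in Python, assuming each observation is stored as a (time, place, value) tuple, a data set can be classified by checking which of the two dimensions takes more than one value. The function name classify and the years used are illustrative; the values are left as None because the example tables are not available.

```python
def classify(records):
    """Classify a data set of (time, place, value) tuples by what it varies over."""
    times = {t for t, _, _ in records if t is not None}
    places = {p for _, p, _ in records if p is not None}
    varies_in_time = len(times) > 1
    varies_in_place = len(places) > 1
    if varies_in_time and varies_in_place:
        return "spatio-temporal"
    if varies_in_time:
        return "time series"
    if varies_in_place:
        return "spatial"
    return "single observation"


# Values are placeholders (None); only the time and place labels matter here.
one_place = [(2001, "Tamil Nadu", None), (2002, "Tamil Nadu", None)]
one_year = [(1991, "Tamil Nadu", None), (1991, "Andhra Pradesh", None)]
both = [(1981, "Tamil Nadu", None), (1991, "Andhra Pradesh", None)]

print(classify(one_place))  # time series
print(classify(one_year))   # spatial
print(classify(both))       # spatio-temporal
```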

Categories of data:
Any statistical data can be classified under two categories depending upon the sources utilized. These categories are:
1.   Primary data
2.   Secondary data

Primary data:
Primary data is data collected by the investigator himself for the purpose of a specific inquiry or study. Such data is original in character and is generated by surveys conducted by individuals, research institutions or other organisations.

Example 4:
If a researcher wants to know the impact of the noon-meal scheme for school children, he has to undertake a survey and collect data on the opinions of parents and children by asking relevant questions. Data collected for such a purpose is called primary data.
Primary data can be collected by the following five methods.
1.   Direct personal interviews.
2.   Indirect Oral interviews.
3.   Information from correspondents.
4.   Mailed questionnaire method.
5.   Schedules sent through enumerators.

1. Direct personal interviews:
The persons from whom information is collected are known as informants. The investigator personally meets them and asks questions to gather the necessary information. It is a suitable method for intensive rather than extensive field surveys; it best suits an intensive study of a limited field.

Merits:
1.   People willingly supply information because they are approached personally. Hence, the response is higher in this method than in any other method.
2.   The information collected is likely to be uniform and accurate, because the investigator is there to clear the doubts of the informants.
3.   Supplementary information on the informant's personal aspects can be noted. Information on character and environment may help later in interpreting some of the results.
4.   Answers to questions about which the informant is likely to be sensitive can be gathered by this method.
5.   The wording of one or more questions can be altered to suit any informant. Explanations may also be given in other languages. Inconvenience and misinterpretation are thereby avoided.

Limitations:
1.   It is very costly and time-consuming.
2.   It is very difficult when the number of persons to be interviewed is large and they are spread over a wide area.
3.   Personal prejudice and bias are greater under this method.

2. Indirect Oral Interviews:
Under this method the investigator contacts witnesses, neighbours, friends or other third parties who are capable of supplying the necessary information. This method is preferred if the required information concerns addiction or the cause of a fire, theft, murder and the like. If a fire has broken out at a certain place, the persons living in the neighbourhood and witnesses are likely to give information on its cause. In some cases, the police interrogate third parties who are supposed to have knowledge of a theft or a murder and get some clues. Enquiry committees appointed by governments generally adopt this method to get people's views and all possible details of facts relating to the enquiry. This method is suitable whenever direct sources do not exist, cannot be relied upon, or would be unwilling to part with the information.
The validity of the results depends upon a few factors, such as the nature of the person whose evidence is being recorded, the ability of the interviewer to draw out information from the third parties by means of appropriate questions and cross-examination, and the number of persons interviewed. For this method to succeed, one person or one group alone should not be relied upon.

3. Information from correspondents:
The investigator appoints local agents or correspondents in different places and compiles the information sent by them. Information reaches newspapers and some government departments by this method. The advantage of this method is that it is cheap and appropriate for extensive investigations. But it may not ensure accurate results, because the correspondents are likely to be negligent, prejudiced or biased. This method is adopted where information is to be collected periodically from a wide area over a long time.

4.  Mailed questionnaire method:
Under this method a list of questions is prepared and sent to all the informants by post. The list of questions is technically called a questionnaire. A covering letter accompanying the questionnaire explains the purpose of the investigation and the importance of correct information, and requests the informants to fill in the blank spaces provided and to return the form within a specified time. This method is appropriate where the informants are literate and are spread over a wide area.

Merits:
1.   It is relatively cheap.
2.   It is preferable when the informants are spread over a wide area.

Limitations:
1.   The greatest limitation is that the informants must be literate and able to understand and reply to the questions.
2.   Some of the persons who receive the questionnaires may not return them.
3.   It is difficult to verify the correctness of the information furnished by the respondents.
With a view to minimizing non-response and collecting correct information, the questionnaire should be carefully drafted. There is no hard and fast rule, but the following general principles may be helpful in framing the questionnaire. A covering letter and a self-addressed, stamped envelope should accompany the questionnaire. The covering letter should politely point out the purpose of the survey and the privilege of the respondent, who is one among the few associated with the investigation. It should assure the respondent that the information will be kept confidential and never misused. It may promise a copy of the findings, free gifts, concessions and so on.
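How serious non-response is in a mailed survey can be summarised by the response rate, that is, the share of questionnaires sent out that come back. The sketch below is a minimal Python illustration; the function name response_rate and the counts of 200 sent and 50 returned are made up for the example.

```python
def response_rate(sent, returned):
    """Fraction of mailed questionnaires that were returned."""
    if sent <= 0:
        raise ValueError("at least one questionnaire must have been sent")
    if not 0 <= returned <= sent:
        raise ValueError("returned must lie between 0 and sent")
    return returned / sent


# Hypothetical counts: 200 questionnaires posted, 50 returned.
rate = response_rate(sent=200, returned=50)
print(f"response rate: {rate:.0%}")          # response rate: 25%
print(f"non-response rate: {1 - rate:.0%}")  # non-response rate: 75%
```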

Characteristics of a good questionnaire:
1.   The number of questions should be kept to a minimum.
2.   Questions should be in logical order, moving from easy to more difficult questions.
3.   Questions should be short and simple. Technical terms and vague expressions capable of different interpretations should be avoided.
4.   Questions fetching YES or NO answers are preferable. There may be some multiple-choice questions, but questions requiring lengthy answers are to be avoided.
5.   Personal questions and questions which require memory power and calculations should also be avoided.
6.   Questions should enable cross-checking, so that deliberate or unconscious mistakes can be detected to an extent.
7.   Questions should be carefully framed so as to cover the entire scope of the survey.
8.   The wording of the questions should be proper, without hurting feelings or arousing resentment.
9.   As far as possible, confidential information should not be sought.
10.  The physical appearance should be attractive, and sufficient space should be provided for answering each question.
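The cross-check mentioned in point 6 can be illustrated with two questions that should agree, for instance one asking for age and another asking for year of birth. The following Python sketch assumes each completed questionnaire is stored as a dictionary with the hypothetical keys id, age and birth_year; the survey year and the sample answers are invented for illustration.

```python
def inconsistent(responses, survey_year, tolerance=1):
    """Return the ids of respondents whose stated age and birth year disagree.

    A difference of up to `tolerance` years is allowed, because a respondent
    may or may not have had a birthday yet in the survey year.
    """
    flagged = []
    for response in responses:
        implied_age = survey_year - response["birth_year"]
        if abs(implied_age - response["age"]) > tolerance:
            flagged.append(response["id"])
    return flagged


# Hypothetical answers to the two cross-checking questions.
answers = [
    {"id": 1, "age": 30, "birth_year": 1974},  # implied age 30: consistent
    {"id": 2, "age": 25, "birth_year": 1970},  # implied age 34: inconsistent
    {"id": 3, "age": 41, "birth_year": 1962},  # implied age 42: within tolerance
]
print(inconsistent(answers, survey_year=2004))  # [2]
```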

5.  Schedules sent through Enumerators:
Under this method enumerators or interviewers take the schedules, meet the informants and fill in their replies. A distinction is often made between a schedule and a questionnaire. A schedule is filled in by the interviewer in a face-to-face situation with the informant. A questionnaire is filled in by the informant, who receives and returns it by post. The schedule method is suitable for extensive surveys.
Merits:
1.   It can be adopted even if the informants are illiterate.
2.   Answers to questions of a personal and pecuniary nature can be collected.
3.   Non-response is minimal, as enumerators go personally and contact the informants.
4.   The information collected is reliable, as the enumerators can be properly trained for the task.
5.   It is the most popular method.

Limitations:
1.   It is the costliest method.
2.   Extensive training has to be given to the enumerators so that they collect correct and uniform information.
3.   Interviewing requires experience. Unskilled investigators are likely to fail in their work.
Before the actual survey, a pilot survey is conducted. The questionnaire or schedule is pre-tested in the pilot survey: a few among the people from whom the actual information is needed are asked to reply. If they misunderstand a question, find it difficult to answer or dislike its wording, it is to be altered. Further, it is to be ensured that every question fetches the desired answer.
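The pre-test described above amounts to tallying which questions the pilot respondents had trouble with. A minimal Python sketch follows, assuming each pilot respondent's feedback is recorded as the set of question numbers found unclear or objectionable. The function name, the 50 percent threshold and the sample feedback are assumptions made for illustration, not part of the method as described.

```python
from collections import Counter


def questions_to_revise(pilot_feedback, threshold=0.5):
    """Return question numbers flagged by at least `threshold` of pilot respondents.

    pilot_feedback: one set per pilot respondent, holding the numbers of the
    questions that respondent misunderstood, found hard or disliked.
    """
    n = len(pilot_feedback)
    if n == 0:
        return []
    counts = Counter(q for flagged in pilot_feedback for q in flagged)
    return sorted(q for q, c in counts.items() if c / n >= threshold)


# Hypothetical feedback from five pilot respondents.
feedback = [{3}, {3, 7}, set(), {3}, set()]
print(questions_to_revise(feedback))  # [3]
```

Question 3 is flagged by three of the five respondents and so crosses the threshold; question 7 is flagged only once and is left alone.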

Merits and Demerits of primary data:
1.      The collection of data by the method of personal survey is possible only if the area covered by the investigator is small. Collection of data by sending enumerators is bound to be expensive. Care should be taken to ensure that the enumerators correctly record the information provided by the informants.
2.      Collection of primary data by framing schedules or by distributing and collecting questionnaires by post is less expensive and can be completed in a shorter time.
3.      If the questions are embarrassing, complicated or probe into the personal affairs of individuals, the schedules may not be filled in with accurate and correct information, and hence this method is unsuitable.
4.      The information collected as primary data is more reliable than that collected from secondary data.

Secondary Data:
Secondary data are data which have already been collected and analysed by some earlier agency for its own use, and are later used by a different agency. According to W. A. Neiswanger, "A primary source is a publication in which the data are published by the same authority which gathered and analysed them. A secondary source is a publication, reporting the data which have been gathered by other authorities and for which others are responsible."
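Neiswanger's definition turns on a single comparison: whether the authority that published the data is the one that gathered them. The Python sketch below expresses that rule; the function name source_type and the agency names are hypothetical.

```python
def source_type(gathered_by, published_by):
    """Classify a publication as a primary or secondary source.

    It is primary when the publishing authority is the one that gathered
    the data, and secondary otherwise.
    """
    same_authority = gathered_by.strip().casefold() == published_by.strip().casefold()
    return "primary" if same_authority else "secondary"


# Hypothetical agencies, used only to exercise the rule.
print(source_type("Agency A", "Agency A"))  # primary
print(source_type("Agency A", "Agency B"))  # secondary
```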
Sources of Secondary data:
In most studies the investigator finds it impracticable to collect first-hand information on all related issues, and as such he makes use of data collected by others. There is a vast amount of published information from which statistical studies may be made, and fresh statistics are constantly in a state of production. The sources of secondary data can broadly be classified under two heads:
1.   Published sources, and
2.   Unpublished sources.
1.  Published Sources:
The various sources of published data are:
1.   Reports and official publications of
(i)  International bodies such as the International Monetary Fund, International Finance Corporation and United Nations Organisation.
(ii) Central and State Governments, such as the Report of the Tandon Committee and the Pay Commission.
2.   Semi-official publications of various local bodies such as Municipal Corporations and District Boards.
3.   Private publications, such as the publications of
(i)  Trade and professional bodies such as the Federation of Indian Chambers of Commerce and the Institute of Chartered Accountants.
(ii) Financial and economic journals such as 'Commerce', 'Capital' and 'Indian Finance'.
(iii) Annual reports of joint stock companies.
(iv) Publications brought out by research agencies, research scholars, etc.
It should be noted that the publications mentioned above vary with regard to the periodicity of publication. Some are published at regular intervals (yearly, monthly, weekly etc.), whereas others are ad hoc publications, i.e., with no regularity about the periodicity of publication.

2. Unpublished Sources
Not all statistical material is published. There are various sources of unpublished data, such as records maintained by various Government and private offices and studies made by research institutions, scholars, etc. Such sources can also be used where necessary.
Precautions in the use of Secondary data
The following are some of the points to be considered in the use of secondary data:
1.   How the data has been collected and processed
2.   The accuracy of the data
3.   How far the data has been summarized
4.   How comparable the data is with other tabulations
5.   How to interpret the data, especially when figures collected for one purpose are used for another
Generally speaking, with secondary data, people have to compromise between what they want and what they are able to find.
Merits and Demerits of Secondary Data:
1.   Secondary data is cheap to obtain. Many government publications are relatively cheap, and libraries stock quantities of secondary data produced by the government, by companies and by other organisations.
2.   Large quantities of secondary data can be obtained through the internet.
3.   Much of the secondary data available has been collected for many years and can therefore be used to plot trends.
4.   Secondary data is of value to:
-    The government, to help in making decisions and planning future policy.
-    Business and industry, in areas such as marketing and sales, in order to appreciate the general economic and social conditions and to provide information on competitors.
-    Research organisations, by providing social, economic and industrial information.
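The remark that long-running secondary data can be used to plot trends can be made concrete with a least-squares trend line. The Python sketch below computes only the slope of that line, that is, the average change per year. The function name trend_slope and the yearly figures are invented for illustration and do not come from any published source.

```python
def trend_slope(years, values):
    """Least-squares slope of values against years (average change per year)."""
    n = len(years)
    if n != len(values) or n < 2:
        raise ValueError("need at least two (year, value) pairs of equal length")
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    # Sum of cross-deviations and sum of squared deviations in the years.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("the years must not all be identical")
    return sxy / sxx


# Made-up yearly figures that rise by 2 units each year.
print(trend_slope([2001, 2002, 2003, 2004], [10, 12, 14, 16]))  # 2.0
```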

