Statistics
The word “statistics” is used in both its singular and its plural senses.
In the singular sense, statistics may be defined as the science of collection, presentation, analysis and interpretation of numerical data.
In the plural sense, statistics means numerical facts or observations collected with a definite purpose.
Statistical Data
Statistical data are of two types: (i) primary data and (ii) secondary data.
Primary Data: Primary data is data that is collected by a researcher from first-hand sources, using methods like surveys, interviews, or experiments. It is collected with the research project in mind, directly from primary sources.
Secondary Data: Secondary data is data gathered from studies, surveys, or experiments that have been run by other people or for other research.
Typically, a researcher will begin a project by working with secondary data. This allows time to formulate questions and gain an understanding of the issues being dealt with before the costly and time-consuming operation of collecting primary data.
Presentation of Data
Presentation of data refers to the organization of data into tables, graphs or charts, so that logical and statistical conclusions can be derived from the collected measurements.
The raw data can be arranged in any one of the following ways:
(i) Serial order or alphabetical order (ii) Ascending order (iii) Descending order
Let the marks obtained by 30 students of Class X in a class test, out of 50 marks, according to their roll numbers be:
Roll Number | Marks | Roll Number | Marks | Roll Number | Marks |
1 | 39 | 11 | 19 | 21 | 40 |
2 | 25 | 12 | 1 | 22 | 12 |
3 | 5 | 13 | 10 | 23 | 41 |
4 | 33 | 14 | 8 | 24 | 33 |
5 | 19 | 15 | 12 | 25 | 19 |
6 | 21 | 16 | 17 | 26 | 21 |
7 | 12 | 17 | 19 | 27 | 33 |
8 | 41 | 18 | 17 | 28 | 5 |
9 | 12 | 19 | 17 | 29 | 1 |
10 | 21 | 20 | 41 | 30 | 21 |
The data in this form are called raw data or ungrouped data. This way of arranging data does not give enough information to analyze or interpret the data.
Now suppose we wish to judge the standard of achievement of the students. The data in this form do not give us a clear picture of the group.
If we arrange them in ascending or descending order, it gives us a slightly better picture.
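In ascending order, the data look as follows:
1, 1, 5, 5, 8, 10, 12, 12, 12, 12, 17, 17, 17, 19, 19, 19, 19, 21, 21, 21, 21, 25, 33, 33, 33, 39, 40, 41, 41, 41
In descending order, the data look as follows:
41, 41, 41, 40, 39, 33, 33, 33, 25, 21, 21, 21, 21, 19, 19, 19, 19, 17, 17, 17, 12, 12, 12, 12, 10, 8, 5, 5, 1, 1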
The raw data when put in ascending or descending order of magnitude is called an array or arrayed data.
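As a small illustration (a sketch, not part of the original lesson), an array can be produced in Python simply by sorting the raw marks. The list `marks` is copied from the roll-number table; the variable names are our own.

```python
# Marks of the 30 students of Class X, listed by roll number 1 to 30
# (copied from the raw-data table).
marks = [39, 25, 5, 33, 19, 21, 12, 41, 12, 21,
         19, 1, 10, 8, 12, 17, 19, 17, 17, 41,
         40, 12, 41, 33, 19, 21, 33, 5, 1, 21]

# An array is the raw data arranged by order of magnitude.
ascending = sorted(marks)
descending = sorted(marks, reverse=True)

print("Ascending: ", ascending)
print("Descending:", descending)

# Once arrayed, the extremes can be read off the two ends.
print("Minimum:", ascending[0], "Maximum:", ascending[-1])
```

Note that `sorted` returns a new list and leaves the raw data (in roll-number order) unchanged.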
If the number of observations is large, arranging the data in ascending, descending or serial order is a tedious job, and it tells us little beyond the minimum and maximum values of the data.
To make the data easier to understand, we can tabulate them as in the table given below.
Marks | Tally Mark | No of Students (Frequency) |
1 | || | 2 |
5 | || | 2 |
8 | | | 1 |
10 | | | 1 |
12 | |||| | 4 |
17 | ||| | 3 |
19 | |||| | 4 |
21 | |||| | 4 |
25 | | | 1 |
33 | ||| | 3 |
39 | | | 1 |
40 | | | 1 |
41 | ||| | 3 |
This way of presenting data is known as a frequency distribution. The marks are called the variate, and the number of students who have secured a particular mark is called the frequency of the variate. In general, the number of times an observation occurs in the given data is called the frequency of the observation.
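A minimal Python sketch of the same counting, using the standard-library `collections.Counter`, which maps each distinct value (the variate) to the number of times it occurs (its frequency). The `marks` list is the raw data by roll number; the other names are illustrative.

```python
from collections import Counter

# Marks of the 30 students, by roll number (from the raw-data table).
marks = [39, 25, 5, 33, 19, 21, 12, 41, 12, 21,
         19, 1, 10, 8, 12, 17, 19, 17, 17, 41,
         40, 12, 41, 33, 19, 21, 33, 5, 1, 21]

# Counter maps each distinct mark (variate) to how often it occurs (frequency).
frequency = Counter(marks)

print("Marks | Frequency")
for mark in sorted(frequency):
    print(f"{mark} | {frequency[mark]}")

# The frequencies of all the variates add up to the number of observations.
print("Total |", sum(frequency.values()))
```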
The presentation of data can be further condensed into class groups. In this presentation all observations are divided into groups. These groups are called classes or class intervals.
We can arrange the above data into classes as follows:
Class | Number of students (Frequency) |
1-10 | 6 |
11-20 | 11 |
21-30 | 5 |
31-40 | 5 |
41-50 | 3 |
Frequency Distribution
A frequency distribution in statistics shows the number of occurrences (frequency) of distinct values, or of values falling within given intervals, in the form of a list, table or graph.
Frequency distributions are of two types:
(i) Discrete frequency distribution (ii) Continuous or grouped frequency distribution
Discrete Frequency Distribution: Discrete data are generated by counting, and each observation is exact. When an observation is repeated, every repetition is counted. The number of times an observation is repeated is called the frequency of that observation.
The process of preparing this type of distribution is very simple: we find out the number of times each value occurs in the given data (i.e. its frequency). Tally marks can be used to keep count of the occurrences.
For example, suppose there was a test (maximum marks 5) and the students scored the following marks:
For these data, a discrete frequency distribution would look like the following:
Marks | Tally Bars | Frequency |
1 | ||||/ | 5 |
2 | ||||/ | | 6 |
3 | |||| | 4 |
4 | ||| | 3 |
5 | || | 2 |
Continuous or Grouped Frequency Distribution: A grouped frequency distribution is an arrangement of class intervals and corresponding frequencies in a table.
If the number of observations in the data is large and the difference between the greatest and the smallest observations is large, then we condense the data into classes or groups. Such a presentation of data is known as a grouped frequency distribution.
For example, the marks obtained by the 30 students of Class X given earlier (out of 50 marks) can be grouped as follows:
Marks (Class Intervals) | Tally Bars | No. of students (Frequency) |
0-10 | ||||/ | | 6 |
11-20 | ||||/ ||||/ | | 11 |
21-30 | ||||/ | 5 |
31-40 | ||||/ | 5 |
41-50 | ||| | 3 |
(i) Exclusive Method: When the class intervals are so fixed that the upper limit of one class is the lower limit of the next class, it is known as the exclusive method of classification. In this method the upper limit of a class is not included in that class.
Thus, in the class 0-10 of marks obtained by students, a student who has obtained 10 marks is not included in this class. He is counted in the next class, 10-20.
Take again the marks obtained by the 30 students of Class X in the class test, out of 50 marks, listed earlier by roll number.
For this data set, the exclusive method gives the following distribution:
Class | Number of students (Frequency) |
0-10 | 5 |
10-20 | 12 |
20-30 | 5 |
30-40 | 4 |
40-50 | 4 |
(ii) Inclusive Method: In this method the classes are so formed that the upper limit of a class is included in that class.
For the same data, the inclusive method gives the following distribution:
Class | Number of students (Frequency) |
0-9 | 5 |
10-19 | 12 |
20-29 | 5 |
30-39 | 4 |
40-49 | 4 |
Exclusive Method | | Inclusive Method | |
Class | Number of students (Frequency) | Class | Number of students (Frequency) |
0-10 | 5 | 0-9 | 5 |
10-20 | 12 | 10-19 | 12 |
20-30 | 5 | 20-29 | 5 |
30-40 | 4 | 30-39 | 4 |
40-50 | 4 | 40-49 | 4 |
Total | 30 | Total | 30 |
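To see the two methods side by side, here is a short Python sketch. The helper names `exclusive_counts` and `inclusive_counts` are hypothetical, and the inclusive version assumes whole-number marks, so that a class such as 10-19 can be read as 10 ≤ x ≤ 19.

```python
# Marks of the 30 students, by roll number (from the raw-data table).
marks = [39, 25, 5, 33, 19, 21, 12, 41, 12, 21,
         19, 1, 10, 8, 12, 17, 19, 17, 17, 41,
         40, 12, 41, 33, 19, 21, 33, 5, 1, 21]


def exclusive_counts(data, lowers, width):
    # Class lower-(lower + width): the upper limit is NOT included,
    # so a value equal to it is counted in the next class.
    return [sum(1 for x in data if lower <= x < lower + width) for lower in lowers]


def inclusive_counts(data, lowers, width):
    # Class lower-(lower + width - 1): both limits are included
    # (assumes whole-number data, so no value falls between classes).
    return [sum(1 for x in data if lower <= x <= lower + width - 1) for lower in lowers]


lowers = [0, 10, 20, 30, 40]
exc = exclusive_counts(marks, lowers, 10)
inc = inclusive_counts(marks, lowers, 10)

for lower, e, i in zip(lowers, exc, inc):
    print(f"{lower}-{lower + 10}: {e}    {lower}-{lower + 9}: {i}")
print("Totals:", sum(exc), sum(inc))
```

For whole-number marks the two methods put every observation in the same class, which is why the frequencies agree; they differ only in how the class limits are written.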
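If you look at the exclusive method, the class interval is $20 - 10 = 10$, while for the inclusive method the class interval appears to be $19 - 10 = 9$. However, $9$ is not the correct class interval. Whenever the inclusive method is used, it is necessary to make an adjustment to determine the correct class interval and to have continuity.
If $a$–$b$ is a class in the inclusive method, then in the exclusive method it becomes $\left(a - \frac{h}{2}\right)$–$\left(b + \frac{h}{2}\right)$, where $h$ = (lower limit of a class) − (upper limit of the preceding class).
For our example, $h = 10 - 9 = 1$, so $\frac{h}{2} = 0.5$. Hence for the inclusive method we get: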
Class | Number of students (Frequency) |
-0.5 – 9.5 | 5 |
9.5 – 19.5 | 12 |
19.5 – 29.5 | 5 |
29.5 – 39.5 | 4 |
39.5 – 49.5 | 4 |
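The adjustment can be written as a small Python function. This is a sketch assuming the inclusive classes are listed in order and equally spaced, so that $h$ can be read from the first two classes; `to_class_boundaries` is a name chosen for illustration.

```python
def to_class_boundaries(classes):
    """Convert inclusive classes (lower, upper) into continuous class boundaries.

    h is the lower limit of a class minus the upper limit of the preceding
    class; h/2 is subtracted from every lower limit and added to every upper
    limit. Assumes at least two classes with the same gap between them.
    """
    h = classes[1][0] - classes[0][1]
    half = h / 2
    return [(lower - half, upper + half) for lower, upper in classes]


inclusive = [(0, 9), (10, 19), (20, 29), (30, 39), (40, 49)]
for lower, upper in to_class_boundaries(inclusive):
    print(f"{lower} to {upper}")
```

The result has no gaps: each upper boundary equals the next lower boundary, exactly as in the table above.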
Cumulative Frequency Distribution: The cumulative frequency of a class is the sum of the frequency of that class and the frequencies of all the classes that come before it in a frequency distribution. In other words, you add up a value and all of the values that came before it.
In the above example, the frequencies are grouped frequencies or class frequencies.
If, however, the frequency of the first class is added to that of the second, this sum is added to that of the third, and so on, then the frequencies so obtained are known as cumulative frequencies.
There are two types of cumulative frequencies, namely less than and greater than. For less than cumulative frequencies we add up the frequencies from above (starting with the first class), and for greater than cumulative frequencies we add up the frequencies from below (starting with the last class).
For the less than cumulative frequency distribution:
Class | Cumulative frequency (Number of students) |
Less than 10 | 5 |
Less than 20 | 17 |
Less than 30 | 22 |
Less than 40 | 26 |
Less than 50 | 30 |
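For the greater than cumulative frequency distribution, each entry counts the students who scored at least the stated mark:
Class | Cumulative frequency (Number of students) |
Greater than or equal to 0 | 30 |
Greater than or equal to 10 | 25 |
Greater than or equal to 20 | 13 |
Greater than or equal to 30 | 8 |
Greater than or equal to 40 | 4 |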
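A minimal Python sketch of both running totals, taking as input the class frequencies of the exclusive-method table (classes 0-10, 10-20, ..., 40-50). It uses the standard-library `itertools.accumulate`; the variable names are illustrative. Adding from below counts the students scoring at least each lower limit.

```python
from itertools import accumulate

# Exclusive classes 0-10, 10-20, ..., 40-50 and their frequencies.
lower_limits = [0, 10, 20, 30, 40]
upper_limits = [10, 20, 30, 40, 50]
frequencies = [5, 12, 5, 4, 4]

# "Less than" cumulative frequencies: add the frequencies from the top down.
less_than = list(accumulate(frequencies))

# "Greater than or equal to" cumulative frequencies: add them from the bottom up,
# then reverse the result so it lines up with the lower limits again.
at_least = list(accumulate(reversed(frequencies)))[::-1]

for upper, cf in zip(upper_limits, less_than):
    print(f"Less than {upper}: {cf}")
for lower, cf in zip(lower_limits, at_least):
    print(f"Greater than or equal to {lower}: {cf}")
```

Both lists end (or start) with the total number of observations, 30, which is a quick check on the arithmetic.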
To prepare a discrete frequency distribution from the given raw data, we use the following steps.
- Step 1: Obtain the given raw data and arrange it in ascending order.
- Step 2: Prepare a table with three columns: the first for the ‘variable’ under study (such as marks, weight or height), the second for ‘Tally marks’ and the third for the total, i.e. the ‘frequency’ corresponding to each value of the variable.
- Step 3: Place all the values of the variable in the first column in ascending order.
- Step 4: Count how many times each value occurs in the arranged data, marking one tally against it for each occurrence. This count is the frequency of that value.
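The four steps can be followed almost literally in Python. This sketch (with hypothetical helper names `tally` and `discrete_frequency_table`) uses the standard-library `itertools.groupby`, which groups only consecutive equal items, so it gives correct counts only because Step 1 has already sorted the data. The string `||||/` is our plain-text stand-in for four bars crossed by a fifth, and the marks data from earlier is reused as input.

```python
from itertools import groupby


def tally(count):
    """Return tally marks for count, written in bunches of five.

    '||||/' is a plain-text stand-in for four bars crossed by a fifth.
    """
    bunches, rest = divmod(count, 5)
    groups = ["||||/"] * bunches
    if rest:
        groups.append("|" * rest)
    return " ".join(groups)


def discrete_frequency_table(data):
    """Return (value, tally, frequency) rows in ascending order of value."""
    # Step 1: arrange the raw data in ascending order.
    arranged = sorted(data)
    rows = []
    # Steps 3-4: after sorting, equal values sit next to each other, so each
    # run found by groupby is one value, and the run length is its frequency.
    for value, run in groupby(arranged):
        frequency = len(list(run))
        rows.append((value, tally(frequency), frequency))
    return rows


# Marks of the 30 students, by roll number (from the raw-data table).
marks = [39, 25, 5, 33, 19, 21, 12, 41, 12, 21,
         19, 1, 10, 8, 12, 17, 19, 17, 17, 41,
         40, 12, 41, 33, 19, 21, 33, 5, 1, 21]

# Step 2: the three columns are the variable, the tally marks and the frequency.
print("Marks | Tally Marks | Frequency")
for value, marks_tally, frequency in discrete_frequency_table(marks):
    print(f"{value} | {marks_tally} | {frequency}")
```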
Example: Below are the ages of 25 students of Class X in a school. Prepare a frequency distribution.
Solution: Frequency Distribution of Ages of 25 school students
Age | Tally Bars | Number of students (Frequency) |
14 | |||| | 4 |
15 | ||||/ ||| | 8 |
16 | ||||/ ||||/ | 10 |
17 | ||| | 3 |
Total | | 25 |
The following steps are used in the construction of a grouped frequency distribution.
- Step 1: Determine the maximum and minimum values of the variate occurring in the data.
- Step 2: Decide upon the number of classes to be formed. The number of classes should usually be between 5 and 15.
- Step 3: Find the difference between the maximum value and the minimum value; this difference is called the range. Divide the range by the number of classes to be formed to determine the class interval.
- Step 4: Make sure that the classes include both the minimum and the maximum values occurring in the data.
- Step 5: Take each item from the data, one at a time, and put a tally mark (|) against the class to which the item belongs. Record tally marks in bunches of five: the fifth mark is drawn diagonally across the first four.
- Step 6: Count the tally marks in each class; this gives the frequency of the class.
- Step 7: Check that the total of all frequencies is the same as the total number of observations.
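A rough Python sketch of these steps follows. It assumes the class size is given (as in the example that follows) and derives the number of classes from it, instead of choosing the number of classes first; classes are taken as lower ≤ x < upper (the exclusive method) and start at the minimum value. The function name `grouped_frequency` is hypothetical, and the marks data from earlier is reused as input.

```python
import math


def grouped_frequency(data, class_size):
    """Return (lower, upper, frequency) rows; each class is lower <= x < upper."""
    # Step 1: maximum and minimum values of the variate.
    smallest, largest = min(data), max(data)
    # Steps 2-3: the range divided by the class size, rounded up, gives the
    # number of classes needed.
    data_range = largest - smallest
    n_classes = math.ceil(data_range / class_size)
    # Step 4: the maximum must fall inside the last class; since the upper limit
    # is excluded, add a class if the maximum sits exactly on that limit.
    if smallest + n_classes * class_size <= largest:
        n_classes += 1
    rows = []
    for i in range(n_classes):
        lower = smallest + i * class_size
        upper = lower + class_size
        # Steps 5-6: count the observations that belong to this class.
        frequency = sum(1 for x in data if lower <= x < upper)
        rows.append((lower, upper, frequency))
    # Step 7: the frequencies must add up to the number of observations.
    assert sum(row[2] for row in rows) == len(data)
    return rows


# Marks of the 30 students, by roll number (from the raw-data table).
marks = [39, 25, 5, 33, 19, 21, 12, 41, 12, 21,
         19, 1, 10, 8, 12, 17, 19, 17, 17, 41,
         40, 12, 41, 33, 19, 21, 33, 5, 1, 21]

for lower, upper, frequency in grouped_frequency(marks, 10):
    print(f"{lower}-{upper}: {frequency}")
```

With a class size of 10 this produces five classes starting at the minimum mark 1 (1-11, 11-21, and so on), which for whole-number marks correspond to 1-10, 11-20, and so on.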
Example: The electricity bills of 30 houses in a locality are given below. Construct a grouped frequency distribution with a class size of 10.
Solution: $\text{Range} = \text{Maximum value} - \text{Minimum value}$.
Given class size $= 10$.
$\text{Number of classes} = \dfrac{\text{Range}}{\text{Class size}}$, rounded up if it is not a whole number.
Therefore, we should have 10 classes. The classes should be formed in such a way that both the minimum value and the maximum value are covered. Therefore, the distribution is:
Electricity bill | Tally Bars | Number of houses (Frequency) |
14 – 24 | |||| | 4 |
24 – 34 | || | 2 |
34 – 44 | ||| | 3 |
44 – 54 | ||| | 3 |
54 – 64 | | | 1 |
64 – 74 | || | 2 |
74 – 84 | ||||/ | 5 |
84 – 94 | ||| | 3 |
94 – 104 | ||| | 3 |
104 – 114 | |||| | 4 |
Total | | 30 |