Statistics

The word “statistics” is used in both a singular and a plural sense.

In the singular sense, statistics may be defined as the science of collecting, presenting, analysing and interpreting numerical data.

In the plural sense, statistics means numerical facts or observations collected for a definite purpose.

Statistical Data

Statistical data are of two types: (i) primary data and (ii) secondary data.

Primary Data: Primary data is data that is collected by a researcher from first-hand sources, using methods like surveys, interviews, or experiments. It is collected with the research project in mind, directly from primary sources.

Secondary Data: Secondary data is data gathered from studies, surveys, or experiments that have been run by other people or for other research.

Typically, a researcher will begin a project by working with secondary data. This allows time to formulate questions and understand the issues involved before the costly and time-consuming task of collecting primary data.

Presentation of Data

Presentation of data refers to organizing data into tables, graphs or charts so that logical and statistical conclusions can be drawn from the collected measurements.

The raw data can be arranged in any one of the following ways:

(i) Serial order or alphabetical order     (ii) Ascending order     (iii) Descending order

The raw data, when put in ascending or descending order of magnitude, is called an array.

Let the marks obtained by 30 students of Class X in a class test, out of 50 marks, according to their roll numbers be:

39, 25, 5, 33, 19, 21, 12, 41, 12, 21, 19, 1, 10, 8, 12, 17, 19, 17, 17, 41, 40, 12, 41, 33, 19, 21, 33, 5, 1, 21

The data in this form are called raw data or ungrouped data. This way of arranging data does not give enough information to analyze or interpret the data.

Roll Number   Marks   Roll Number   Marks   Roll Number   Marks
1             39      11            19      21            40
2             25      12            1       22            12
3             5       13            10      23            41
4             33      14            8       24            33
5             19      15            12      25            19
6             21      16            17      26            21
7             12      17            19      27            33
8             41      18            17      28            5
9             12      19            17      29            1
10            21      20            41      30            21

Now suppose we wish to judge the standard of achievement of the students. The data in this form do not give us a clear picture of the group.

If we arrange them in ascending or descending order, it gives us a slightly better picture.

In ascending order, the data look as follows:

1, 1, 5, 5, 8, 10, 12, 12, 12, 12, 17, 17, 17, 19, 19, 19, 19, 21, 21, 21, 21, 25, 33, 33, 33, 39, 40, 41, 41, 41

In descending order, the data look as follows:

41, 41, 41, 40, 39, 33, 33, 33, 25, 21, 21, 21, 21, 19, 19, 19, 19, 17, 17, 17, 12, 12, 12, 12, 10, 8, 5, 5, 1, 1

The raw data when put in ascending or descending order of magnitude is called an array or arrayed data.
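When the data are processed by computer, forming an array is a single sort. The following is a minimal Python sketch using the 30 marks above; the variable names are illustrative only.

```python
# Marks of the 30 students of Class X, in roll-number order.
marks = [39, 25, 5, 33, 19, 21, 12, 41, 12, 21, 19, 1, 10, 8, 12,
         17, 19, 17, 17, 41, 40, 12, 41, 33, 19, 21, 33, 5, 1, 21]

# An array is the raw data sorted by magnitude.
ascending = sorted(marks)                  # smallest mark first
descending = sorted(marks, reverse=True)   # largest mark first

print(ascending[:6])   # [1, 1, 5, 5, 8, 10]
print(descending[:5])  # [41, 41, 41, 40, 39]

# The two ends of the array give the minimum and maximum directly.
print(ascending[0], ascending[-1])  # 1 41
```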

If the number of observations is large, arranging the data in ascending, descending or serial order is tedious, and the arrangement tells us little beyond the minimum and maximum values of the data.

To make the data easier to understand, we can tabulate them as shown below.

Marks   Tally Marks   Number of Students (Frequency)
1       ||            2
5       ||            2
8       |             1
10      |             1
12      ||||          4
17      |||           3
19      ||||          4
21      ||||          4
25      |             1
33      |||           3
39      |             1
40      |             1
41      |||           3

This way of presenting data is known as a frequency distribution. Marks are called the variate, and the number of students who have secured a particular mark is called the frequency of the variate. In general, the number of times an observation occurs in the given data is called the frequency of the observation.
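Building this frequency distribution amounts to counting each distinct value. A minimal sketch in Python, assuming the marks are held in a list, uses the standard library's `collections.Counter` to do the counting that the tally marks do by hand:

```python
from collections import Counter

# Marks of the 30 students of Class X, in roll-number order.
marks = [39, 25, 5, 33, 19, 21, 12, 41, 12, 21, 19, 1, 10, 8, 12,
         17, 19, 17, 17, 41, 40, 12, 41, 33, 19, 21, 33, 5, 1, 21]

# Counter maps each distinct mark (the variate) to its frequency.
freq = Counter(marks)

# One row per distinct mark, in ascending order of the variate.
for mark in sorted(freq):
    print(mark, freq[mark])

# The frequencies must add up to the number of observations.
print(sum(freq.values()))  # 30
```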

The presentation of data can be further condensed into class groups. In this presentation all observations are divided into groups. These groups are called classes or class intervals.

We can arrange the above data into classes as follows:

Class    Number of Students (Frequency)
1-10     6
11-20    11
21-30    5
31-40    5
41-50    3

Frequency Distribution

A frequency distribution shows the number of occurrences (frequency) of distinct values, or of values falling within given intervals, in the form of a list, table or graph.

Frequency distributions are of two types:

(i) Discrete frequency distribution     (ii) Continuous or grouped frequency distribution

Discrete Frequency Distribution: Discrete data are generated by counting, so every observation is an exact value. When an observation is repeated, each repetition is counted. The number of times an observation occurs is called the frequency of that observation.

The process of preparing this type of distribution is very simple: we find the number of times each value occurs in the given data (i.e. its frequency). Tally marks can be used to keep the count.

For example, suppose the students taking a test (maximum marks 5) scored the following marks:

1, 1, 2, 3, 4, 3, 2, 1, 1, 4, 5, 2, 4, 2, 2, 1, 3, 3, 2, 5

For the above data, the discrete frequency distribution looks like the following:

Marks   Tally Marks   Frequency
1       ||||/         5
2       ||||/ |       6
3       ||||          4
4       |||           3
5       ||            2
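Tally marks are only a manual counting aid, so they can be generated from the counts. The sketch below, in Python, counts the test marks above and renders each count as tally groups. Writing a completed bunch of five as `||||/` is an assumption of this sketch (a plain-text stand-in for four strokes crossed by a fifth), and the function name `tally` is hypothetical.

```python
from collections import Counter

# Marks (out of 5) scored in the test, in the order given above.
scores = [1, 1, 2, 3, 4, 3, 2, 1, 1, 4, 5, 2, 4, 2, 2, 1, 3, 3, 2, 5]

def tally(n):
    """Render a count as tally marks, writing each bunch of five as '||||/'."""
    fives, rest = divmod(n, 5)
    groups = ["||||/"] * fives
    if rest:
        groups.append("|" * rest)
    return " ".join(groups)

freq = Counter(scores)
for mark in range(1, 6):
    print(mark, tally(freq[mark]), freq[mark])
```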

Continuous or Grouped Frequency Distribution: A grouped frequency distribution is an arrangement of class intervals and corresponding frequencies in a table.

If the number of observations in data is large and the difference between the greatest and the smallest observations is large, then we condense the data into classes or groups. Such a presentation of data is known as the grouped frequency distribution.

Let the marks obtained by 30 students of Class X in a class test, out of 50 marks, according to their roll numbers be:

39, 25, 5, 33, 19, 21, 12, 41, 12, 21, 19, 1, 10, 8, 12, 17, 19, 17, 17, 41, 40, 12, 41, 33, 19, 21, 33, 5, 1, 21

Marks (Class Interval)   Tally Marks     Number of Students (Frequency)
0-10                     ||||/ |         6
11-20                    ||||/ ||||/ |   11
21-30                    ||||/           5
31-40                    ||||/           5
41-50                    |||             3
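Grouping can be checked mechanically as well. The Python sketch below, an illustration rather than a prescribed method, counts the same 30 marks into the classes 0-10, 11-20, 21-30, 31-40 and 41-50. Both class limits are treated as included, which works here only because the marks are whole numbers.

```python
# Marks of the 30 students of Class X, in roll-number order.
marks = [39, 25, 5, 33, 19, 21, 12, 41, 12, 21, 19, 1, 10, 8, 12,
         17, 19, 17, 17, 41, 40, 12, 41, 33, 19, 21, 33, 5, 1, 21]

# Class limits; both limits belong to the class (whole-number marks).
classes = [(0, 10), (11, 20), (21, 30), (31, 40), (41, 50)]

freq = {}
for lower, upper in classes:
    freq[(lower, upper)] = sum(1 for m in marks if lower <= m <= upper)

for (lower, upper), f in freq.items():
    print(f"{lower}-{upper}: {f}")

# The class frequencies must add up to the number of observations.
print(sum(freq.values()))  # 30
```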

(i) Exclusive Method: When the class intervals are fixed so that the upper limit of one class is the lower limit of the next class, the classification is known as the exclusive method. In this method the upper limit of a class is not included in that class.

Thus, in the class 0-10 of marks obtained by students, a student who has obtained 10 marks is not included in this class. He is counted in the next class, 10-20.

Let the marks obtained by 30 students of Class X in a class test, out of 50 marks, according to their roll numbers be:

39, 25, 5, 33, 19, 21, 12, 41, 12, 21, 19, 1, 10, 8, 12, 17, 19, 17, 17, 41, 40, 12, 41, 33, 19, 21, 33, 5, 1, 21

For this data set, exclusive method (distribution) would look like the following:

Class    Number of Students (Frequency)
0-10     5
10-20    12
20-30    5
30-40    4
40-50    4

(ii) Inclusive Method: In this method the classes are so formed that the upper limit of a class is included in that class.

So for the above data, the distribution would look like the following:

Class    Number of Students (Frequency)
0-9      5
10-19    12
20-29    5
30-39    4
40-49    4
Exclusive Method                          Inclusive Method
Class    Number of Students (Frequency)   Class    Number of Students (Frequency)
0-10     5                                0-9      5
10-20    12                               10-19    12
20-30    5                                20-29    5
30-40    4                                30-39    4
40-50    4                                40-49    4
Total    30                               Total    30
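The two methods differ only in whether the upper limit belongs to a class. The Python sketch below (function names are hypothetical) counts the marks both ways. For whole-number marks the results agree, but a hypothetical value such as 9.5 falls into no inclusive class at all, which is the gap between classes that the adjustment described next removes.

```python
# Marks of the 30 students of Class X, in roll-number order.
marks = [39, 25, 5, 33, 19, 21, 12, 41, 12, 21, 19, 1, 10, 8, 12,
         17, 19, 17, 17, 41, 40, 12, 41, 33, 19, 21, 33, 5, 1, 21]

def exclusive_freq(data, lowers, width):
    """Exclusive classes: lower <= x < lower + width (upper limit left out)."""
    return [sum(1 for x in data if lo <= x < lo + width) for lo in lowers]

def inclusive_freq(data, lowers, width):
    """Inclusive classes: lower <= x <= lower + width - 1 (both limits kept)."""
    return [sum(1 for x in data if lo <= x <= lo + width - 1) for lo in lowers]

lowers = [0, 10, 20, 30, 40]
print(exclusive_freq(marks, lowers, 10))  # [5, 12, 5, 4, 4]
print(inclusive_freq(marks, lowers, 10))  # [5, 12, 5, 4, 4]

# A hypothetical value of 9.5 shows the gap left by inclusive classes.
print(exclusive_freq([9.5], lowers, 10))  # [1, 0, 0, 0, 0]
print(inclusive_freq([9.5], lowers, 10))  # [0, 0, 0, 0, 0]
```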

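In the exclusive method the classes run continuously from 0 to 50, whereas in the inclusive method they end at 49 and there is a gap between consecutive classes (for example, between 9 and 10). So 49 is not the correct upper boundary, and a class written as 0-9 really has a width of 10, not 9. Whenever the inclusive method is used, the class limits must be adjusted to obtain the correct class boundaries and to restore continuity.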

If a - b is a class in the inclusive method, then in the exclusive method it becomes (a - h) - (b + h), where h = \frac{\text{lower limit of the next class} - \text{upper limit of the class}}{2}

For our example, h = \frac{10 - 9}{2} = 0.5. Hence, after adjusting the inclusive classes, we get:

Class           Number of Students (Frequency)
-0.5 – 9.5      5
9.5 – 19.5      12
19.5 – 29.5     5
29.5 – 39.5     4
39.5 – 49.5     4
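The adjustment is mechanical: compute h from a pair of consecutive classes, subtract it from every lower limit and add it to every upper limit. The Python sketch below reproduces the adjusted classes above; it assumes, as in this example, that the gap between consecutive classes is the same throughout.

```python
# Inclusive classes from the example above.
inclusive = [(0, 9), (10, 19), (20, 29), (30, 39), (40, 49)]

# h = (lower limit of the next class - upper limit of a class) / 2
h = (inclusive[1][0] - inclusive[0][1]) / 2   # (10 - 9) / 2 = 0.5

# Each class a - b becomes (a - h) - (b + h).
boundaries = [(a - h, b + h) for a, b in inclusive]

print(h)           # 0.5
print(boundaries)  # [(-0.5, 9.5), (9.5, 19.5), (19.5, 29.5), (29.5, 39.5), (39.5, 49.5)]
```

After the adjustment, the upper boundary of each class equals the lower boundary of the next, so the classes are continuous.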

Cumulative Frequency Distribution: The cumulative frequency of a class is the sum of the frequency of that class and the frequencies of all the classes before it in a frequency distribution. In other words, you add up a value and all of the values that came before it.

In the above example the frequencies are grouped-frequencies or class-frequencies.

If, however, the frequency of the first class is added to that of the second and this sum is added to that of the third and so on, then the frequencies so obtained are known as cumulative frequencies.

There are two types of cumulative frequency, namely less than and greater than. For less than cumulative frequencies we add up the frequencies from the top of the table, and for greater than cumulative frequencies we add them up from the bottom.

For less than cumulative frequency distribution:

Class                          Cumulative Frequency
Less than 10                   5
Less than 20                   17
Less than 30                   22
Less than 40                   26
Less than 50                   30

For greater than cumulative frequency distribution:

Class                          Cumulative Frequency
Greater than or equal to 0     30
Greater than or equal to 10    25
Greater than or equal to 20    13
Greater than or equal to 30    8
Greater than or equal to 40    4
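Both kinds of cumulative frequency are running totals of the class frequencies, one taken from the top and one from the bottom. The Python sketch below uses the exclusive classes 0-10, 10-20, ..., 40-50, so a class includes its lower limit; that is why the totals taken from the bottom count values greater than or equal to each lower limit. It uses the standard library's `itertools.accumulate`.

```python
from itertools import accumulate

# Exclusive classes 0-10, 10-20, ..., 40-50 and their frequencies.
lowers = [0, 10, 20, 30, 40]
freqs = [5, 12, 5, 4, 4]

# "Less than" cumulative frequencies: running total from the first class down.
less_than = list(accumulate(freqs))

# "Greater than or equal to" cumulative frequencies: running total from the
# last class up, reversed so the list lines up with the classes again.
at_least = list(accumulate(reversed(freqs)))[::-1]

for lo, lt, ge in zip(lowers, less_than, at_least):
    print(f"Less than {lo + 10}: {lt}    Greater than or equal to {lo}: {ge}")
```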

To prepare a discrete frequency distribution from the given raw data we use the following steps.

  • Step 1: Obtain the given raw data and organize that in ascending order.
  • Step 2: Prepare a table with three columns – first for ‘variable’ under study such as marks, weight, height etc., second for ‘Tally marks’ and third for the total, representing corresponding ‘frequency’ to each value or size of the variable.
  • Step 3: Place all the values of the variable in the first column in ascending order.
  • Step 4: Count how many times each value occurs in the arranged data, marking each occurrence with a tally mark in the second column. This count, entered in the third column, is the frequency of that value.
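The four steps above can be sketched in a few lines of Python. This is only an illustration, assuming the raw data are held in a list; the function name `discrete_frequency_table` is hypothetical, and the tally column is left out. It is applied here to the ages of the 25 students from the example that follows.

```python
from collections import Counter

def discrete_frequency_table(raw):
    """Return (value, frequency) rows following Steps 1-4."""
    arranged = sorted(raw)          # Step 1: arrange the data in ascending order
    counts = Counter(arranged)      # Step 4: count occurrences of each value
    # Steps 2-3: one row per distinct value, in ascending order.
    return [(value, counts[value]) for value in sorted(counts)]

ages = [15, 16, 16, 14, 17, 17, 16, 15, 15, 16, 16, 17, 15,
        16, 16, 14, 16, 15, 14, 15, 16, 16, 15, 14, 15]

for age, f in discrete_frequency_table(ages):
    print(age, f)
```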

Example: Below are the ages of 25 students of Class X in a school. Prepare a frequency distribution.

15, 16, 16, 14, 17, 17, 16, 15, 15, 16, 16, 17, 15, 16, 16, 14, 16, 15, 14, 15, 16, 16, 15, 14, 15

Solution:  Frequency Distribution of Ages of 25 school students

Age     Tally Marks     Number of Students (Frequency)
14      ||||            4
15      ||||/ |||       8
16      ||||/ ||||/     10
17      |||             3
Total                   25

The following steps are used in constructing a grouped frequency distribution.

  • Step 1: Determine the maximum and minimum values of the variate occurring in the data.
  • Step 2: Decide on the number of classes to be formed. As a rule, the number of classes should be between 5 and 15.
  • Step 3: Find the difference between the maximum and minimum values, and divide this difference by the number of classes to determine the class interval. The difference between the maximum and minimum values in the data is called the range.
  • Step 4: Make sure the classes cover both the minimum and the maximum values occurring in the data.
  • Step 5: Take the items from the data one at a time and put a tally mark (|) against the class to which each item belongs. Record tally marks in bunches of five: the fifth mark is drawn diagonally across the first four.
  • Step 6: Count the tally marks in each class; this gives the frequency of the class.
  • Step 7: Check that the total of all frequencies is the same as the total number of observations.
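The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not a prescribed procedure: the function name `grouped_frequency_table` is hypothetical, the classes start at the minimum value and use the exclusive method (lower limit included, upper limit excluded), and the number of classes is taken as range // class size + 1 so that the maximum always falls inside a class. It is applied to the electricity bills from the example that follows.

```python
def grouped_frequency_table(data, class_size):
    """Return (lower, upper, frequency) rows for exclusive classes [lower, upper)."""
    lowest, highest = min(data), max(data)          # Step 1
    data_range = highest - lowest                   # Step 3: the range
    # Steps 2 and 4: enough classes, starting at the minimum, for the
    # maximum to fall inside the last class (upper limits are excluded).
    n_classes = int(data_range // class_size) + 1
    table = []
    for i in range(n_classes):
        lower = lowest + i * class_size
        upper = lower + class_size
        # Steps 5-6: count the items that fall in [lower, upper).
        freq = sum(1 for x in data if lower <= x < upper)
        table.append((lower, upper, freq))
    # Step 7: the frequencies must add up to the number of observations.
    if sum(row[2] for row in table) != len(data):
        raise ValueError("frequencies do not add up to the number of observations")
    return table

bills = [30, 32, 45, 54, 74, 78, 108, 112, 66, 76, 88, 40, 14, 20, 15,
         35, 44, 66, 75, 84, 95, 96, 102, 110, 88, 74, 112, 14, 34, 44]

for lower, upper, f in grouped_frequency_table(bills, 10):
    print(f"{lower}-{upper}: {f}")
```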

Example: The electricity bills of 30 houses in a locality are given below. Construct a grouped frequency distribution with a class size of 10.

30, 32, 45, 54, 74, 78, 108, 112, 66, 76, 88, 40, 14, 20, 15, 35, 44, 66, 75, 84, 95, 96, 102, 110, 88, 74, 112, 14, 34, 44

Solution: First find the range of the variate. Maximum = 112 and Minimum = 14, so Range = 112 - 14 = 98. Given a class size of 10, the number of classes = 98/10 = 9.8, so we need 10 classes. The classes are formed so that both the minimum and the maximum values are covered, starting at 14 and using the exclusive method. Therefore the distribution is:

Bill Amount   Tally Marks   Number of Houses (Frequency)
14 – 24       ||||          4
24 – 34       ||            2
34 – 44       |||           3
44 – 54       |||           3
54 – 64       |             1
64 – 74       ||            2
74 – 84       ||||/         5
84 – 94       |||           3
94 – 104      |||           3
104 – 114     ||||          4
Total                       30