Statistics

The word “statistics” is used in both singular and plural senses.

In the singular sense, statistics may be defined as the science of collection, presentation, analysis and interpretation of numerical data.

In the plural sense, statistics means numerical facts or observations collected with a definite purpose.

Statistical Data

Statistical data are of two types (i) primary data (ii) secondary data.

Primary Data: Primary data is data that is collected by a researcher from first-hand sources, using methods like surveys, interviews, or experiments. It is collected with the research project in mind, directly from primary sources.

Secondary Data: Secondary data is data gathered from studies, surveys, or experiments that have been run by other people or for other research.

Typically, a researcher will begin a project by working with secondary data. This allows time to formulate questions and gain an understanding of the issues being dealt with before the costly and time-consuming operation of collecting primary data.

Presentation of Data

Presentation of data refers to the organization of data into tables, graphs or charts, so that logical and statistical conclusions can be derived from the collected measurements.

The raw data can be arranged in any one of the following ways:

(i) Serial order or alphabetical order     (ii) Ascending order     (iii) Descending order

Raw data put in ascending or descending order of magnitude are called an array.

Let the marks obtained by 30 students of Class X in a class test, out of 50 marks, according to their roll numbers be: $39, 25, 5, 33, 19, 21, 12, 41, 12, 21, 19, 1, 10, 8, 12, 17, 19, 17, 17, 41, 40, 12, 41, 33, 19, 21, 33, 5, 1, 21$

The data in this form are called raw data or ungrouped data. This way of arranging data does not give enough information to analyze or interpret the data.

Roll Number   Marks   Roll Number   Marks   Roll Number   Marks
1             39      11            19      21            40
2             25      12            1       22            12
3             5       13            10      23            41
4             33      14            8       24            33
5             19      15            12      25            19
6             21      16            17      26            21
7             12      17            19      27            33
8             41      18            17      28            5
9             12      19            17      29            1
10            21      20            41      30            21

Now suppose we wish to judge the standard of achievement of the students. The data in this form do not give us a clear picture of the group.

If we arrange them in ascending or descending order, it gives us a slightly better picture.

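In ascending order, the data look as follows: $1, 1, 5, 5, 8, 10, 12, 12, 12, 12, 17, 17, 17, 19, 19, 19, 19, 21, 21, 21, 21, 25, 33, 33, 33, 39, 40, 41, 41, 41$

In descending order, the data look as follows: $41, 41, 41, 40, 39, 33, 33, 33, 25, 21, 21, 21, 21, 19, 19, 19, 19, 17, 17, 17, 12, 12, 12, 12, 10, 8, 5, 5, 1, 1$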

The raw data when put in ascending or descending order of magnitude is called an array or arrayed data.
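As a small illustration, the array can be produced with a sort. This is only a sketch in Python; the variable names `marks`, `ascending` and `descending` are chosen here for illustration and are not part of the original text.

```python
# Marks of the 30 students, listed by roll number (taken from the text above).
marks = [39, 25, 5, 33, 19, 21, 12, 41, 12, 21, 19, 1, 10, 8, 12,
         17, 19, 17, 17, 41, 40, 12, 41, 33, 19, 21, 33, 5, 1, 21]

ascending = sorted(marks)                 # smallest value first
descending = sorted(marks, reverse=True)  # largest value first

print(ascending)
print(descending)
# The ends of the array give the minimum and the maximum directly.
print("minimum:", ascending[0], "maximum:", ascending[-1])
```

Once the data are arrayed, the minimum and maximum can be read off the two ends, but little else is visible at a glance.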

If the number of observations is large, then arranging the data in ascending, descending or serial order is a tedious job, and it tells us little beyond the minimum and maximum of the data.

So, to make the data easier to understand, we can tabulate them in the form given below.

Marks   Tally Marks   No. of Students (Frequency)
1       ||            2
5       ||            2
8       |             1
10      |             1
12      ||||          4
17      |||           3
19      ||||          4
21      ||||          4
25      |             1
33      |||           3
39      |             1
40      |             1
41      |||           3

This way of presenting data is known as a frequency distribution. The marks are called the variate, and the number of students who have secured a particular number of marks is called the frequency of the variate. In general, the number of times an observation occurs in the given data is called the frequency of the observation.
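The counting behind a frequency distribution can be sketched in a few lines of Python. The sketch below uses `collections.Counter` from the standard library; the names `marks` and `frequency` are illustrative only.

```python
from collections import Counter

# Marks of the 30 students, listed by roll number (taken from the text above).
marks = [39, 25, 5, 33, 19, 21, 12, 41, 12, 21, 19, 1, 10, 8, 12,
         17, 19, 17, 17, 41, 40, 12, 41, 33, 19, 21, 33, 5, 1, 21]

# Counter maps each distinct mark (the variate) to the number of
# times it occurs (its frequency).
frequency = Counter(marks)

print("Marks  Frequency")
for mark in sorted(frequency):
    print(f"{mark:<6} {frequency[mark]}")
print("Total ", sum(frequency.values()))
```

The total of the frequencies should always equal the number of observations, which is a quick check on the tabulation.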

The presentation of data can be further condensed into class groups. In this presentation all observations are divided into groups. These groups are called classes or class intervals.

We can arrange the above data into classes as follows:

Class    Number of students (Frequency)
1-10     6
11-20    11
21-30    5
31-40    5
41-50    3

The class $1-10$ means the marks obtained between $1$ and $10$, including both $1$ and $10$. The number of observations falling in a particular class is called the frequency of that class or class frequency. This type of presentation of data is called a grouped frequency distribution.
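The class frequencies can be checked by counting directly from the raw marks. The following Python sketch assumes the classes $1-10$, $11-20$, and so on, with both limits belonging to the class; the names `classes` and `class_frequency` are illustrative.

```python
# Marks of the 30 students, listed by roll number (taken from the text above).
marks = [39, 25, 5, 33, 19, 21, 12, 41, 12, 21, 19, 1, 10, 8, 12,
         17, 19, 17, 17, 41, 40, 12, 41, 33, 19, 21, 33, 5, 1, 21]

# Each class is a (lower limit, upper limit) pair; both limits are included.
classes = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50)]

class_frequency = {
    (lower, upper): sum(1 for m in marks if lower <= m <= upper)
    for lower, upper in classes
}

for (lower, upper), count in class_frequency.items():
    print(f"{lower}-{upper}: {count}")
print("Total:", sum(class_frequency.values()))
```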

Frequency Distribution

In statistics, a frequency distribution shows the number of occurrences (frequency) of each distinct value, or of the values falling within each interval, in the form of a list, table, or graph.

Frequency distributions are of two types:

(i) Discrete frequency distribution     (ii) Continuous or grouped frequency distribution

Discrete Frequency Distribution: Discrete data are generated by counting, and each observation is exact. When an observation is repeated, the repetitions are counted. The number of times an observation is repeated is called the frequency of that observation.

The process of preparing this type of distribution is very simple. We list the distinct values in the data and then find the number of times each value occurs in the data set (i.e. its frequency). Tally marks can be used to count the occurrences.

For example, suppose $20$ students took a test (maximum marks $5$). A discrete frequency distribution of their marks would look like the following:

Marks   Tally Bars   Frequency
1       ||||/        5
2       ||||/ |      6
3       ||||         4
4       |||          3
5       ||           2

(Here ||||/ stands for a bundle of five tally marks, the fifth drawn across the first four.)

The above method of condensing the raw data is convenient only where the values in the raw data are largely repeating and the difference between the greatest and the smallest observations is not very large.

Continuous or Grouped Frequency Distribution: A grouped frequency distribution is an arrangement of class intervals and corresponding frequencies in a table.

If the number of observations in data is large and the difference between the greatest and the smallest observations is large, then we condense the data into classes or groups. Such a presentation of data is known as the grouped frequency distribution.

For the marks obtained by the 30 students of Class X in the class test (out of 50 marks) listed earlier, the grouped frequency distribution is:

Marks (Class Intervals)   Tally Bars       No. of students (Frequency)
1-10                      ||||/ |          6
11-20                     ||||/ ||||/ |    11
21-30                     ||||/            5
31-40                     ||||/            5
41-50                     |||              3

There are two methods of classifying the data according to the class intervals, viz. (i) the ‘Exclusive’ method, and (ii) the ‘Inclusive’ method.

(i) Exclusive Method: When the class intervals are so fixed that the upper limit of one class is the lower limit of the next class it is known as the exclusive method of classification. In this method the upper limit of a class is not included in the class.

Thus, in the class $0-10$ of marks obtained by students, a student who has obtained $10$ marks is not included in this class. He is counted in the next class, $10-20$.

Consider again the marks obtained by the $30$ students of Class X in a class test, out of $50$ marks, listed by roll number under Presentation of Data.

For this data set, the distribution under the exclusive method would look like the following:

Class    Number of students (Frequency)
0-10     5
10-20    12
20-30    5
30-40    4
40-50    4

(ii) Inclusive Method: In this method the classes are so formed that the upper limit of a class is included in that class.

So for the above data, the distribution would look like the following:

Class    Number of students (Frequency)
0-9      5
10-19    12
20-29    5
30-39    4
40-49    4

If you compare the two methods:

Exclusive Method                           Inclusive Method
Class    Number of students (Frequency)    Class    Number of students (Frequency)
0-10     5                                 0-9      5
10-20    12                                10-19    12
20-30    5                                 20-29    5
30-40    4                                 30-39    4
40-50    4                                 40-49    4
Total    30                                Total    30

It is evident from the above example that both the inclusive and exclusive methods give us the same class frequency although the class intervals are apparently different in the two cases.
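The difference between the two methods comes down to whether the upper limit is tested with a strict or a non-strict inequality. The Python sketch below counts the same marks both ways; the helper names `count_exclusive` and `count_inclusive` are hypothetical and used only for illustration.

```python
# Marks of the 30 students, listed by roll number (taken from the text above).
marks = [39, 25, 5, 33, 19, 21, 12, 41, 12, 21, 19, 1, 10, 8, 12,
         17, 19, 17, 17, 41, 40, 12, 41, 33, 19, 21, 33, 5, 1, 21]

def count_exclusive(data, lower, upper):
    """Exclusive method: the upper limit is NOT part of the class."""
    return sum(1 for x in data if lower <= x < upper)

def count_inclusive(data, lower, upper):
    """Inclusive method: the upper limit IS part of the class."""
    return sum(1 for x in data if lower <= x <= upper)

# Classes 0-10, 10-20, ..., 40-50 (exclusive method).
exclusive = [count_exclusive(marks, lo, lo + 10) for lo in range(0, 50, 10)]
# Classes 0-9, 10-19, ..., 40-49 (inclusive method).
inclusive = [count_inclusive(marks, lo, lo + 9) for lo in range(0, 50, 10)]

print(exclusive)  # [5, 12, 5, 4, 4]
print(inclusive)  # [5, 12, 5, 4, 4]
```

The two lists agree here because the marks are whole numbers, so no observation falls in the gap between, say, $9$ and $10$.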

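In the exclusive method the classes run continuously from $0$ to $50$, covering an interval of $50$, while in the inclusive method they run from $0$ to $49$ and appear to cover an interval of only $49$, because there is a gap of $1$ between the upper limit of each class and the lower limit of the next. However, $49$ is not the correct interval. Whenever the inclusive method is used, it is necessary to make an adjustment to determine the correct class interval and to make the classes continuous.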

If $a - b$ is a class in the inclusive method, then in the exclusive method it becomes $(a - h) - (b + h)$, where $h = \frac{\text{lower limit of the next class} \ - \ \text{upper limit of the class}}{2}$

For our example, $h = \frac{10-9}{2} = 0.5$. Hence, for the inclusive method we get

Class          Number of students (Frequency)
-0.5 – 9.5     5
9.5 – 19.5     12
19.5 – 29.5    5
29.5 – 39.5    4
39.5 – 49.5    4

It should be noted that before the adjustment the classes covered an interval of $49 - 0 = 49$, but after the adjustment they cover $49.5 - (-0.5) = 50$.
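The adjustment can be sketched as a small Python function. It assumes that the gap between consecutive classes is the same throughout, so that a single value of $h$ applies to every class; the function name `to_class_boundaries` is hypothetical.

```python
def to_class_boundaries(inclusive_classes):
    """Turn inclusive class limits into continuous class boundaries.

    Assumes at least two classes and the same gap between every pair of
    consecutive classes.
    """
    first_upper = inclusive_classes[0][1]
    second_lower = inclusive_classes[1][0]
    # h is half the gap between one class and the next, e.g. (10 - 9) / 2.
    h = (second_lower - first_upper) / 2
    return [(lower - h, upper + h) for lower, upper in inclusive_classes]

inclusive_classes = [(0, 9), (10, 19), (20, 29), (30, 39), (40, 49)]
boundaries = to_class_boundaries(inclusive_classes)

for lower, upper in boundaries:
    print(f"{lower} - {upper}")
# Total interval covered after the adjustment.
print(boundaries[-1][1] - boundaries[0][0])
```

After the adjustment, the upper boundary of each class coincides with the lower boundary of the next, so the classes are continuous.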

The mid-value of a class is called the class mark. For example, the class-mark or mid value of the class $10-20$ is $15$. In fact

Class mark $= \frac{\text{lower limit} + \text{upper limit}}{2}$

Or Class mark $= \text{lower limit} + \frac{1}{2} (\text{upper limit} - \text{lower limit})$
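Both forms of the class mark formula give the same value, as the short Python check below suggests. The function names `class_mark` and `class_mark_alt` are illustrative only.

```python
def class_mark(lower, upper):
    """Mid-value of a class: (lower limit + upper limit) / 2."""
    return (lower + upper) / 2

def class_mark_alt(lower, upper):
    """Same value, written as lower limit + half the difference."""
    return lower + (upper - lower) / 2

print(class_mark(10, 20))       # 15.0
print(class_mark_alt(10, 20))   # 15.0
print(class_mark(9.5, 19.5))    # 14.5
```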

Cumulative Frequency Distribution: The cumulative frequency of a class is the sum of the frequency of that class and the frequencies of all classes below it in a frequency distribution. In other words, you add up a value and all of the values that came before it.

In the above example the frequencies are grouped-frequencies or class-frequencies.

If, however, the frequency of the first class is added to that of the second and this sum is added to that of the third and so on, then the frequencies so obtained are known as cumulative frequencies.

There are two types of cumulative frequencies viz. less than and greater than. For less than cumulative frequencies we add up the frequencies from above and for greater than cumulative frequencies we add up frequencies from below.

For less than cumulative frequency distribution:

Class           Number of students (Cumulative Frequency)
Less than 10    5
Less than 20    17
Less than 30    22
Less than 40    26
Less than 50    30

For greater than cumulative frequency distribution:

Class              Number of students (Cumulative Frequency)
Greater than 0     30
Greater than 10    24
Greater than 20    13
Greater than 30    8
Greater than 40    3
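Both cumulative tables can be reproduced from the raw marks. In the Python sketch below, the "less than" column is a running total of the exclusive class frequencies, while the "greater than" column is counted directly with a strict inequality (a mark of exactly $10$ is not greater than $10$). Reading "greater than" strictly is an assumption made here because it matches the figures in the table; the variable names are illustrative.

```python
from itertools import accumulate

# Marks of the 30 students, listed by roll number (taken from the text above).
marks = [39, 25, 5, 33, 19, 21, 12, 41, 12, 21, 19, 1, 10, 8, 12,
         17, 19, 17, 17, 41, 40, 12, 41, 33, 19, 21, 33, 5, 1, 21]
limits = [0, 10, 20, 30, 40, 50]

# Class frequencies for the exclusive classes 0-10, 10-20, ..., 40-50.
class_freq = [sum(1 for m in marks if lo <= m < hi)
              for lo, hi in zip(limits, limits[1:])]

# "Less than" type: running total of the class frequencies from the top.
less_than = dict(zip(limits[1:], accumulate(class_freq)))

# "Greater than" type: counted from the raw marks with a strict >.
greater_than = {lo: sum(1 for m in marks if m > lo) for lo in limits[:-1]}

for upper, total in less_than.items():
    print(f"Less than {upper}: {total}")
for lower, total in greater_than.items():
    print(f"Greater than {lower}: {total}")
```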

Construction of a Discrete Frequency Distribution

To prepare a discrete frequency distribution from the given raw data we use the following steps.

• Step 1: Obtain the given raw data and arrange it in ascending order.
• Step 2: Prepare a table with three columns – first for ‘variable’ under study such as marks, weight, height etc., second for ‘Tally marks’ and third for the total, representing corresponding ‘frequency’ to each value or size of the variable.
• Step 3: Place all the values of the variable in the first column in ascending order.
• Step 4: Count how many times each value occurs in the arranged data. This count is the frequency of that value.

Example: Below are the ages of 25 students of Class X in a school. Prepare a frequency distribution. $15, 16, 16, 14, 17, 17, 16, 15, 15, 16, 16, 17, 15, 16, 16, 14, 16, 15, 14, 15, 16, 16, 15, 14, 15$

Solution:  Frequency Distribution of Ages of 25 school students

Age     Tally Bars      Number of students (Frequency)
14      ||||            4
15      ||||/ |||       8
16      ||||/ ||||/     10
17      |||             3
Total                   25
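The four steps above can be sketched in Python as follows. The helper `tally` is hypothetical; it writes a bundle of five tally marks as `||||/`, a plain-text stand-in for four strokes crossed by a fifth.

```python
from collections import Counter

# Ages of the 25 students (taken from the example above).
ages = [15, 16, 16, 14, 17, 17, 16, 15, 15, 16, 16, 17, 15,
        16, 16, 14, 16, 15, 14, 15, 16, 16, 15, 14, 15]

def tally(count):
    """Render a count as tally marks, bundling every five as '||||/'."""
    bundles, rest = divmod(count, 5)
    groups = ["||||/"] * bundles
    if rest:
        groups.append("|" * rest)
    return " ".join(groups)

frequency = Counter(ages)           # Step 4: count each distinct value

print(f"{'Age':<6}{'Tally Bars':<16}Frequency")
for age in sorted(frequency):       # Steps 1 and 3: values in ascending order
    print(f"{age:<6}{tally(frequency[age]):<16}{frequency[age]}")
print(f"{'Total':<22}{sum(frequency.values())}")
```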

Construction of a Grouped Frequency Distribution

The following steps are used in the construction of a grouped frequency distribution.

• Step 1: Determine the maximum and minimum values of the variate occurring in the data.
• Step 2: Decide upon the number of classes to be formed. Note that the number of classes should be in the range of 5 to 15.
• Step 3: Find the difference between the maximum value and the minimum value, and divide this difference by the number of classes to be formed to determine the class interval. The difference between the maximum value and the minimum value in the data is called the range.
• Step 4: Make sure that the classes cover both the minimum and the maximum values occurring in the data.
• Step 5: Take the items from the data one at a time and put a tally mark (|) against the class to which each item belongs. If there are more than four tally marks, record them in bunches of five, the fifth mark being drawn diagonally across the first four.
• Step 6: Count the tally marks in each class; this total is the frequency of the class.
• Step 7: Check that the total of all frequencies is the same as the total number of observations.

Example: The electricity bills for $30$ houses in a locality are given below. Construct a grouped frequency distribution with a class size of $10$. $30, 32, 45, 54, 74, 78, 108, 112, 66, 76, 88, 40, 14, 20, 15, 35, 44, 66, 75, 84, 95, 96, 102, 110, 88, 74, 112, 14, 34, 44$

Solution: For the given variate, Maximum $= 112$ and Minimum $= 14$.

Therefore Range $= 112 - 14 = 98$

Given class size $= 10 \Rightarrow$ Number of classes $= 98/10 = 9.8$

Since the number of classes must be a whole number, we round up and take $10$ classes.

We should make the classes in such a way that the minimum value is covered and the maximum value is covered.

Therefore the distribution is

Electricity Bill   Tally Bars   Number of houses (Frequency)
14 – 24            ||||         4
24 – 34            ||           2
34 – 44            |||          3
44 – 54            |||          3
54 – 64            |            1
64 – 74            ||           2
74 – 84            ||||/        5
84 – 94            |||          3
94 – 104           |||          3
104 – 114          ||||         4
Total                           30
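The whole construction can be sketched in Python. The function `grouped_frequency` below is a hypothetical helper, not a standard routine; it assumes whole-number data, starts the first class at the minimum value, and uses the exclusive method, so the upper limit of each class belongs to the next class.

```python
import math

# Electricity bills of the 30 houses (taken from the example above).
bills = [30, 32, 45, 54, 74, 78, 108, 112, 66, 76, 88, 40, 14, 20, 15,
         35, 44, 66, 75, 84, 95, 96, 102, 110, 88, 74, 112, 14, 34, 44]

def grouped_frequency(data, class_size):
    """Build exclusive classes of width class_size, starting at min(data)."""
    low, high = min(data), max(data)
    data_range = high - low                          # range = max - min
    n_classes = math.ceil(data_range / class_size)   # 9.8 rounds up to 10
    # The upper limit is excluded, so add one more class if the maximum
    # would land exactly on the last upper limit.
    if low + n_classes * class_size <= high:
        n_classes += 1
    counts = [0] * n_classes
    for value in data:
        counts[(value - low) // class_size] += 1     # index of value's class
    return [((low + i * class_size, low + (i + 1) * class_size), count)
            for i, count in enumerate(counts)]

table = grouped_frequency(bills, 10)
for (lower, upper), count in table:
    print(f"{lower:>3} - {upper:<4}{count}")
print("Total", sum(count for _, count in table))
```

Checking that the total equals the number of observations, as in Step 7, guards against an observation being missed or counted twice.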