Statistics is the study of the collection, analysis, interpretation, presentation, and organization of data. Statistics is a vast subject but we will keep it simple for Class 8 students.

In statistics, we first learn how to collect and tabulate data. The subject is all about gathering data and then using that data to draw inferences.

Key Terms to Understand

Data: The word ‘data’ means information in the form of numerical figures or a set of given facts. Data is a set of values of qualitative or quantitative variables.

i) The marks obtained by $10$ pupils of a class in a monthly test are: $51, 56, 50, 57, 61, 52, 66, 55, 67, 60$. This is quantitative data.

ii) However, a collection of colors like Red, Blue, Green would be qualitative data, that is, something that generally cannot be measured with numerical results.

Raw or Ungrouped Data: The data obtained in original form is called raw data or ungrouped data. Clearly, the data shown above is raw data. Normally, when we collect data, we will get raw data.

Grouped Data: For the sake of convenience (in study and comparison), we may condense the raw data into classes or groups. Such a presentation is known as grouped data.

Array: An arrangement of raw numerical data in ascending order of magnitude is called an Array. Example: $1, 3, 5, 6, 7, 8, 10, 13, 15, 20, 50, 100, 200$
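
As a small illustration, the marks of the $10$ pupils given earlier can be arranged into an array. This is only a sketch in Python; the names `marks` and `array` are chosen here for illustration.

```python
# Marks of the 10 pupils from the quantitative-data example (raw data).
marks = [51, 56, 50, 57, 61, 52, 66, 55, 67, 60]

# sorted() returns a new list in ascending order and leaves `marks` unchanged.
array = sorted(marks)

print(array)  # [50, 51, 52, 55, 56, 57, 60, 61, 66, 67]
```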

Tabulation or Presentation of Data: Arranging the data in tables in a condensed form is called tabulation or presentation of data.

Variable: A quantity which is being measured in an experiment or survey is called a variable. Thus, the age of students in a class, monthly income in an office, the number of children residing in a locality, etc. are examples of variables.

Variate: A particular value of a variable is called a variate.

Observation: Each numerical figure in a data set is called an observation.

Frequency:  The number of times a particular observation occurs is called its frequency.

Frequency distribution: The tabular arrangement of data showing the frequency of each observation is called a frequency distribution.

Frequency Distribution of Ungrouped Data

Let’s use the following data to learn Frequency distribution.

Data: $2, 1, 3, 1, 2, 3, 2, 2, 1, 3, 2, 1, 5, 4, 3, 3, 2, 4, 2, 6$ (this data could be anything: the number of TVs in homes, the number of children in families in a community, etc.)

Step 1: Arranging the above data in ascending order, we get: $1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 5, 6$

Step 2: Organize the data into a frequency distribution. In simple terms, this means counting the number of times each value appears.

Thus, we may prepare a frequency table as under:

| Number of children | Number of families (frequency) |
|---|---|
| $1$ | $4$ |
| $2$ | $7$ |
| $3$ | $5$ |
| $4$ | $2$ |
| $5$ | $1$ |
| $6$ | $1$ |
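
The counting in Step 2 can also be done by a short program. The sketch below assumes Python and uses the ordered data from Step 1; `Counter` is part of Python's standard library, while the names `data` and `frequency` are illustrative.

```python
from collections import Counter

# The data from Step 1, already arranged in ascending order.
data = [1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 5, 6]

# Counter maps each distinct observation to the number of times it occurs.
frequency = Counter(data)

# Print one row per observation: value, then its frequency.
for value in sorted(frequency):
    print(value, frequency[value])
```

Each printed row corresponds to one row of the frequency table: the observation, followed by its frequency.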

Grouped Frequency Distribution: When the data set is large, the frequency distribution above can become cumbersome. In such cases, we can simplify the data by condensing it into groups.

Inclusive Form:

Let’s say we put the above data into groups, or class intervals, such as $0-2$, $3-4$ and $5-6$.

Here $0-2$ means between $0$ and $2$, including both $0$ and $2$. So, this form is known as the inclusive form. $0$ is called the lower limit and $2$ the upper limit of the class $0-2$.

Here also you will follow the same steps 1 and 2. Thus, the above frequency distribution may be presented in inclusive form as under:

| Class Interval | Frequency |
|---|---|
| $0-2$ | $11$ |
| $3-4$ | $7$ |
| $5-6$ | $2$ |
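
The inclusive grouping can be checked with a short Python sketch. The function name `inclusive_frequencies` is made up for this illustration; the data is the ordered list from Step 1.

```python
# The data from Step 1, arranged in ascending order.
data = [1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 5, 6]

# Inclusive classes as (lower limit, upper limit); both limits belong to the class.
classes = [(0, 2), (3, 4), (5, 6)]

def inclusive_frequencies(values, class_limits):
    """Count the values v with lower <= v <= upper for each class."""
    return {
        f"{lower}-{upper}": sum(1 for v in values if lower <= v <= upper)
        for lower, upper in class_limits
    }

grouped = inclusive_frequencies(data, classes)
print(grouped)  # {'0-2': 11, '3-4': 7, '5-6': 2}
```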

Exclusive Form:

Now suppose we group the same data using exclusive classes instead of inclusive ones.

Here $0-2$ means between $0$ and $2$, including $0$ but excluding $2$. So, this form is known as exclusive form. $0$ is called the lower limit and $2$ is upper limit of the class $0-2$. Thus $0-2$ would mean $0$ and more but less than $2$.

Here also you will follow the same steps 1 and 2. Thus, the above frequency distribution may be presented in exclusive form as under:

| Class Interval | Frequency |
|---|---|
| $0-2$ | $4$ |
| $2-4$ | $12$ |
| $4-6$ | $3$ |
| $6-8$ | $1$ |
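
The same check can be sketched for the exclusive form. This Python example assumes continuous classes $0-2$, $2-4$, $4-6$ and $6-8$, where each class starts at the upper limit of the previous one; the function name `exclusive_frequencies` is made up for this illustration.

```python
# The data from Step 1, arranged in ascending order.
data = [1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 5, 6]

# Exclusive classes: the lower limit belongs to the class, the upper limit does not.
classes = [(0, 2), (2, 4), (4, 6), (6, 8)]

def exclusive_frequencies(values, class_limits):
    """Count the values v with lower <= v < upper for each class."""
    return {
        f"{lower}-{upper}": sum(1 for v in values if lower <= v < upper)
        for lower, upper in class_limits
    }

grouped = exclusive_frequencies(data, classes)
print(grouped)  # {'0-2': 4, '2-4': 12, '4-6': 3, '6-8': 1}
```

Because the upper limit is excluded, an observation such as $2$ is counted in the class $2-4$ and not in $0-2$.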

Class-interval: Each group into which the raw data is condensed is called a class-interval. Each class is bounded by two figures, which are called class limits. The figure on the left side is called the lower limit and the figure on the right side is called the upper limit of the class.

Thus $0-2$ is a class with upper limit $2$ and lower limit $0$.

Class-boundaries: In the exclusive form, the lower and upper limits of a class are known as its class boundaries.

Thus, the boundaries of the class $10-20$ in exclusive form are $10$ and $20$. The boundaries in inclusive form are obtained by subtracting $0.5$ from the lower limit and adding $0.5$ to the upper limit. Thus the boundaries of the class $10-20$ in inclusive form are $9.5$ and $20.5$.
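
The rule for boundaries can be written as a small Python sketch. The function `class_boundaries` and its `gap` parameter are assumptions made for this illustration: `gap` is the distance between the upper limit of one inclusive class and the lower limit of the next, which is $1$ for whole-number data and gives the $0.5$ adjustment described above.

```python
def class_boundaries(lower, upper, inclusive, gap=1):
    """Return (lower boundary, upper boundary) of a class.

    For exclusive classes the limits are already the boundaries.
    For inclusive classes, half of the assumed gap between neighbouring
    classes is subtracted from the lower limit and added to the upper limit.
    """
    if inclusive:
        half = gap / 2
        return (lower - half, upper + half)
    return (lower, upper)

print(class_boundaries(10, 20, inclusive=False))  # (10, 20)
print(class_boundaries(10, 20, inclusive=True))   # (9.5, 20.5)
```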

Class-size: The difference between the true upper limit and the true lower limit is called the class size.
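
As a worked illustration, take the class $10-20$ and treat its class boundaries as the true limits:

$\text{Class size (exclusive form)} = 20 - 10 = 10$

$\text{Class size (inclusive form)} = 20.5 - 9.5 = 11$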

Class Mark or Mid-value: $\text{Class mark} = \frac{1}{2} \times (\text{upper limit} + \text{lower limit})$

Thus the class mark of $10-20$ is $\frac{1}{2}\times (20+10) = 15$
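
The same formula as a minimal Python sketch; the function name `class_mark` is chosen here for illustration.

```python
def class_mark(lower_limit, upper_limit):
    """Mid-value of a class: half the sum of its upper and lower limits."""
    return (upper_limit + lower_limit) / 2

print(class_mark(10, 20))  # 15.0
print(class_mark(0, 2))    # 1.0
```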

Range: The difference between the maximum value and the minimum value of the observations is called its range.
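
For example, the range of the marks of the $10$ pupils given earlier can be found with a short Python sketch (the variable names are illustrative):

```python
# Marks of the 10 pupils from the quantitative-data example.
marks = [51, 56, 50, 57, 61, 52, 66, 55, 67, 60]

# Range = maximum value - minimum value.
data_range = max(marks) - min(marks)

print(data_range)  # 67 - 50 = 17
```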

Graphical Representation of Statistical Data

The tabular representation of data is an ideal way of presenting them in a systematic manner.

However, to make the data more noticeable and easier to understand, we use a pictorial representation of the data, such as a bar graph or a line diagram. Once we represent the data graphically, it is easier to compare values and to see the trend within the data set.

Bar Graph (Or Column Graph)

A bar graph is a pictorial representation of numerical data in the form of rectangles (or bars) of uniform width and varying heights.

These rectangles are drawn either vertically or horizontally, keeping equal space between them.

How to Draw a Bar Graph?

Step 1. Take a sheet of graph paper. Draw the $x$-axis and $y$-axis.

Step 2. Mark points at equal intervals along the $x$-axis. Below these points write the names of the data items whose values are to be plotted. In our case, it is the number of “Children” the family has.

Step 3. Choose a suitable scale, and on that scale determine the heights of the bars for the given numerical values. The height of a column represents the frequency of the corresponding observation. Here this is the number of “Families”, or “Frequency”. In our example, there are 4 families that have 1 child.

Step 4. Mark off these heights parallel to the $y$-axis from the points taken in Step 2.

Step 5. On the $x$-axis, draw bars (or columns) of equal width for the heights marked in Step 4. The bars should be centered on the points marked on the $x$-axis. These bars represent the given numerical data.
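
The idea behind these steps can be tried out with a rough text-only sketch in Python. It is not a graph-paper drawing: it prints horizontal bars made of `#` characters, one per observation, with each bar's length equal to its frequency. The function name `text_bar_chart` and the choice of symbol are assumptions made for this illustration.

```python
# Frequency table from the ungrouped example:
# number of children -> number of families.
frequency = {1: 4, 2: 7, 3: 5, 4: 2, 5: 1, 6: 1}

def text_bar_chart(freq, symbol="#"):
    """Return one line per observation: label, bar, and frequency."""
    lines = []
    for value in sorted(freq):
        bar = symbol * freq[value]  # bar length equals the frequency
        lines.append(f"{value} | {bar} ({freq[value]})")
    return lines

chart = text_bar_chart(frequency)
print("\n".join(chart))
```

All bars use the same symbol and the same line height, which mirrors the requirement that the bars have uniform width and differ only in length.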

So if we were to plot a bar graph for figure 1, we would get the following: