Measures of Center

  • Introduction
  • Mean
  • Median
  • Comparing the Mean and the Median
  • Let's Summarize

CO-4: Distinguish among different measurement scales, choose the appropriate descriptive and inferential statistical methods based on these distinctions, and interpret the results.

LO 4.4: Using appropriate graphical displays and/or numerical measures, describe the distribution of a quantitative variable in context: a) describe the overall pattern, b) describe striking deviations from the pattern

LO 4.7: Define and describe the features of the distribution of one quantitative variable (shape, center, spread, outliers).

Introduction

Intuitively speaking, a numerical measure of center describes a "typical value" of the distribution.

The two main numerical measures for the center of a distribution are the mean and the median.

In this unit on Exploratory Data Analysis, we will be calculating these results based upon a sample, and so we will often emphasize that the values calculated are the sample mean and sample median.

Each one of these measures is based on a completely different idea of describing the center of a distribution.

We will first present each one of the measures, and then compare their properties.

Mean

LO 4.8: Define and calculate the sample mean of a quantitative variable.

The mean is the average of a set of observations (i.e., the sum of the observations divided by the number of observations).

If the n observations are written as

$$x_1, x_2, \ldots, x_n$$

their mean can be written mathematically as:

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

We read the symbol $\bar{x}$ as "x-bar." The bar notation is commonly used to represent the sample mean, i.e. the mean of the sample.

Using any appropriate letter to represent the variable (x, y, etc.), we can indicate the sample mean of this variable by adding a bar over the variable notation.

EXAMPLE: Best Actress Oscar Winners

We will continue with the Best Actress Oscar winners example (Link to the Best Actress Oscar Winners data).

34 34 26 37 42 41 35 31 41 33 30 74 33 49 38 61 21 41 26 80 43 29 33 35 45 49 39 34 26 25 35 33

The mean age of the 32 actresses is:

$$\bar{x} = \frac{34 + 34 + 26 + \cdots + 35 + 33}{32} = \frac{1233}{32} \approx 38.5$$

We add all of the ages to get 1233 and divide by the number of ages, which was 32, to get 38.5.

We denote this result as x-bar and call it the sample mean.
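The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: the variable names (ages, n, total, x_bar) are chosen here for illustration and are not part of the course materials.

```python
# Ages of the 32 Best Actress Oscar winners, copied from the list above.
ages = [34, 34, 26, 37, 42, 41, 35, 31, 41, 33, 30, 74, 33, 49, 38, 61,
        21, 41, 26, 80, 43, 29, 33, 35, 45, 49, 39, 34, 26, 25, 35, 33]

n = len(ages)      # number of observations
total = sum(ages)  # sum of the observations
x_bar = total / n  # sample mean: sum divided by the number of observations

print(n, total, round(x_bar, 1))  # 32 1233 38.5
```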

Note that the sample mean gives a measure of center which is higher than our approximation of the center from looking at the histogram (which was 35). The reason for this will be clear soon.

EXAMPLE: World Cup Soccer

Often we have large sets of data and use a frequency table to display the data more efficiently.

Data were collected from the last three World Cup soccer tournaments. A total of 192 games were played. The table below lists the number of goals scored per game (not including any goals scored in shootouts).

Total # Goals/Game    Frequency
0                     17
1                     45
2                     51
3                     37
4                     25
5                     11
6                     3
7                     2
8                     1

To find the mean number of goals scored per game, we would need to find the sum of all 192 numbers, and then divide that sum by 192.

Rather than add 192 numbers, we use the fact that the same numbers appear many times. For example, the number 0 appears 17 times, the number 1 appears 45 times, the number 2 appears 51 times, etc.

If we add up 17 zeros, we get 0. If we add up 45 ones, we get 45. If we add up 51 twos, we get 102. Repeated addition is multiplication.

Thus, the sum of the 192 numbers

               = 0(17) + 1(45) + 2(51) + 3(37) + 4(25) + 5(11) + 6(3) + 7(2) + 8(1) = 453.

The sample mean is then 453 / 192 ≈ 2.359.

Note that, in this example, the values of 1, 2, and 3 are the most common and our average falls in this range representing the bulk of the data.
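The same shortcut (multiply each value by its frequency, add the products, then divide by the total count) can be sketched in Python as follows. The dictionary name goal_counts and the other variable names are illustrative assumptions; the numbers are taken from the frequency table above.

```python
# Frequency table from the World Cup example:
# key = goals scored in a game, value = number of games with that many goals.
goal_counts = {0: 17, 1: 45, 2: 51, 3: 37, 4: 25, 5: 11, 6: 3, 7: 2, 8: 1}

# Total number of games is the sum of the frequencies.
n_games = sum(goal_counts.values())

# Sum of all 192 observations: each value times how often it occurs.
total_goals = sum(goals * freq for goals, freq in goal_counts.items())

mean_goals = total_goals / n_games

print(n_games, total_goals, round(mean_goals, 3))  # 192 453 2.359
```

Working from the table rather than from 192 separate numbers gives exactly the same mean, because the table records every observation; it only groups the repeated values.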


Median

LO 4.9: Define and calculate the sample median of a quantitative variable.

The median M is the midpoint of the distribution. It is the number such that half of the observations fall above, and half fall below.

To find the median:

  • Order the data from smallest to largest.
  • Consider whether n, the number of observations, is even or odd.
    • If n is odd, the median M is the center observation in the ordered list. This observation is the one "sitting" in the (n + 1) / 2 spot in the ordered list.
    • If n is even, the median M is the mean of the two center observations in the ordered list. These two observations are the ones "sitting" in the (n / 2) and (n / 2) + 1 spots in the ordered list.
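
The steps above can be sketched as a small Python function. The function name sample_median and the two short test lists are made up for illustration; note that Python lists are indexed from 0, so the k-th spot in the ordered list is index k - 1.

```python
def sample_median(values):
    """Return the median of a non-empty list of numbers (illustrative sketch)."""
    if len(values) == 0:
        raise ValueError("the median of an empty list is undefined")

    ordered = sorted(values)  # step 1: order the data from smallest to largest
    n = len(ordered)          # step 2: is n even or odd?

    if n % 2 == 1:
        # Odd n: the (n + 1) / 2 spot, which is index (n + 1) // 2 - 1 == n // 2.
        return ordered[n // 2]

    # Even n: mean of the n / 2 and (n / 2) + 1 spots,
    # which are indices n // 2 - 1 and n // 2.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


# Hypothetical data, not from the course: n = 7 (odd) and n = 8 (even).
print(sample_median([7, 1, 5, 3, 9, 11, 13]))      # 7
print(sample_median([7, 1, 5, 3, 9, 11, 13, 15]))  # 8.0
```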

EXAMPLE: Median (1)

For a simple visualization of the location of the median, consider the following two simple cases of n = 7 and n = 8 ordered observations, with each observation represented by a solid circle:

When there are n = 7 ordered observations, the median M is the center observation, which is located in the (7 + 1) / 2 = 4th spot in the ordered list. When there are n = 8 ordered observations, the median M is the mean of the two center observations, which in this case are located at the 8 / 2 = 4th and 8 / 2 + 1 = 5th spots in the ordered list.

Comments:

  • In the images above, the dots are equally spaced. This need not indicate that the data values are actually equally spaced, as we are only interested in listing them in order.
  • In fact, in the above pictures, two subsequent dots could have exactly the same value.
  • It is clear that the value of the median will be in the same position regardless of the distance between data values.

EXAMPLE: Median (2)

To find the median age of the Best Actress Oscar winners, we first need to order the data.

It would be useful, then, to use the stemplot, a diagram in which the data are already ordered.

  • Here n = 32 (an even number), so the median M will be the mean of the two center observations.
  • These are located at the (n / 2) = 32 / 2 = 16th and (n / 2) + 1 = (32 / 2) + 1 = 17th spots in the ordered list.

Counting from the top, we find that:

  • the 16th ranked observation is 35
  • the 17th ranked observation also happens to be 35

Therefore, the median M = (35 + 35) / 2 = 35

Stemplot of the ages (stem|leaves), with the 16th and 17th leaves highlighted. The highlighted entries are surrounded by *:

2|1
2|56669
3|013333444
3|*5**5*5789
4|11123
4|599
5|
5|
6|1
6|
7|4
7|
8|0
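
As a cross-check on the counting, the same median can be found in Python by sorting the ages and picking out the 16th and 17th values. This is a sketch only: the variable names are illustrative, and statistics.median from Python's standard library is used as an independent check rather than as the course's method.

```python
from statistics import median

# Ages of the 32 Best Actress Oscar winners, as listed in the mean example.
ages = [34, 34, 26, 37, 42, 41, 35, 31, 41, 33, 30, 74, 33, 49, 38, 61,
        21, 41, 26, 80, 43, 29, 33, 35, 45, 49, 39, 34, 26, 25, 35, 33]

ordered = sorted(ages)
n = len(ordered)  # 32, an even number

# The 16th and 17th ranked observations sit at indices 15 and 16 (0-based).
lower_middle = ordered[n // 2 - 1]
upper_middle = ordered[n // 2]
m = (lower_middle + upper_middle) / 2

print(lower_middle, upper_middle, m)  # 35 35 35.0
print(median(ages))                   # 35.0
```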

Comparing the Mean and the Median

LO 4.10: Choose the appropriate measures for a quantitative variable based upon the shape of the distribution.

As we have seen, the mean and the median, the most common measures of center, each describe the center of a distribution of values in a different way.

  • The mean describes the center as an average value, in which the actual values of the data points play an important role.
  • The median, on the other hand, locates the middle value as the center, and the order of the data is the key.

To get a deeper understanding of the differences between these two measures of center, consider the following example. Here are two datasets:

Data set A → 64 65 66 68 70 71 73
Data set B → 64 65 66 68 70 71 730

For dataset A, the mean is 68.1, and the median is 68.

Looking at dataset B, notice that all of the observations except the last one are close together. The observation 730 is very large, and is certainly an outlier.

In this case, the median is still 68, but the mean will be influenced by the high outlier, and shifted up to 162.
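
The figures for the two datasets can be reproduced with Python's standard statistics module. This is a minimal sketch; the variable names data_a and data_b are chosen here for illustration.

```python
from statistics import mean, median

data_a = [64, 65, 66, 68, 70, 71, 73]
data_b = [64, 65, 66, 68, 70, 71, 730]  # same as A, but the last value is an outlier

for label, data in (("A", data_a), ("B", data_b)):
    # The mean uses the actual magnitudes; the median uses only the ordering.
    print(f"{label}: mean={mean(data):.1f} median={median(data)}")

# A: mean=68.1 median=68
# B: mean=162.0 median=68
```

Changing 73 to 730 changes only one value, yet the mean more than doubles, while the median does not move because the middle position in the ordered list is still held by 68.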

The message that we should take from this example is:

The mean is very sensitive to outliers (because it factors in their magnitude), while the median is resistant (or robust) to outliers.

Therefore:

Conclusions… When to use which measures?

  • Use the sample mean as a measure of center for symmetric distributions with no outliers.
  • Otherwise, the median will be a more appropriate measure of the center of our data.

Let's Summarize

  • The two principal numerical measures for the center of a distribution are the mean and the median. The mean is the average value, while the median is the middle value.
  • The mean is very sensitive to outliers (as it factors in their magnitude), while the median is resistant to outliers.
  • The mean is an appropriate measure of center for symmetric distributions with no outliers. In all other cases, the median is often a better measure of the center of the distribution.