It’s A Math, Math World (Probability)

Today’s blog post is a “Statistics 101” aimed at the early statistics student or the non-statistician. It is part of a multi-part series on basic stats: last week we summarized descriptive statistics, and next week we look at inferential statistics. In the meantime, we have some groundwork to complete.

Before we dive into inferential statistics, we need to look at the notion and application of probability. Statistical inference is built on probability, which underlies tools such as margins of error, confidence intervals, and p-values. In general terms, probability is how “likely” something is to occur. We often throw probabilities around in a subjective fashion (e.g., “It’s 50-50 that I will go to the store” or “I’m 90% certain that I know that answer”), but probability is a clearly defined concept in mathematics: it rests on a small set of axioms that we accept, and its other rules can be proven from them.

Under the relative-frequency (frequentist) interpretation, the probability of an event E, written P(E), is the long-run relative frequency with which E occurs over an indefinitely long series of repetitions of a chance experiment.
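To see the relative-frequency idea in action, here is a small simulation sketch. It assumes a fair six-sided die, modeled with Python's `random.randint(1, 6)`; the helper names `relative_frequency` and `rolled_a_one` are our own, hypothetical choices. As the number of simulated tosses grows, the fraction of tosses showing a ‘1’ should settle near 1/6.

```python
import random

def relative_frequency(event, n_trials, rng):
    """Fraction of n_trials simulated fair-die tosses for which event(face) is True."""
    hits = sum(1 for _ in range(n_trials) if event(rng.randint(1, 6)))
    return hits / n_trials

def rolled_a_one(face):
    """The event E: the die shows a '1'."""
    return face == 1

rng = random.Random(42)  # fixed seed so the run is reproducible
for n in (10, 1_000, 100_000):
    freq = relative_frequency(rolled_a_one, n, rng)
    print(f"n = {n:>7}: relative frequency of a '1' = {freq:.4f} (1/6 is about {1/6:.4f})")
```

With only 10 tosses the frequency can be far from 1/6; with 100,000 it is typically within a few thousandths.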

We start with an “experiment”, which is anything that produces “outcomes”. The set of all possible outcomes is called the sample space, and the “rules of probability” let us assign a probability to each outcome in it.

For example, when we toss a die once, the sample space is S = {1, 2, 3, 4, 5, 6}, because one of these 6 numbers will appear every time we toss a die.

A random variable (r.v.) is a variable whose value is determined by the outcome of a chance experiment, so it can take on different values, each with a certain probability. A random variable can be either discrete (taking a finite or countably infinite number of values) or continuous (taking any value in an interval). In the die-tossing experiment, the face that comes up is a discrete r.v. Let A and B be events (sets of outcomes) from that experiment. Each event can be assigned a probability as follows:

  • Let A = the event that a ‘1’ occurs on the die toss

When all outcomes are equally likely, as with a fair die, P(A) = (# of favorable outcomes) / (total # of outcomes)

P(A) = 1/6

  • Let B = the event (collection of outcomes) that a die roll is an even number

P(B) = (# of favorable outcomes) / (total # of outcomes)

Favorable outcomes are {2, 4, 6}, so the numerator = 3

Total number of outcomes = 6

P(B) = 3/6 = 1/2
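The counting rule for A and B can be sketched in a few lines of Python. This assumes a fair die, so every outcome is equally likely; the helper name `classical_probability` is hypothetical. `Fraction` keeps the answers exact, so they match the hand calculations.

```python
from fractions import Fraction

# Sample space for one toss of a fair die; each outcome assumed equally likely.
S = {1, 2, 3, 4, 5, 6}

def classical_probability(event, sample_space):
    """P(event) = (# of favorable outcomes) / (total # of outcomes)."""
    favorable = event & sample_space  # outcomes in the event that are also in S
    return Fraction(len(favorable), len(sample_space))

A = {1}                            # a '1' occurs
B = {x for x in S if x % 2 == 0}   # an even number occurs: {2, 4, 6}

print("P(A) =", classical_probability(A, S))  # P(A) = 1/6
print("P(B) =", classical_probability(B, S))  # P(B) = 1/2
```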

Every event can be assigned a probability, as can every individual outcome (point) in the sample space. These probabilities must obey the following rules.

Let A be an event in sample space S

  • 0 ≤ P(A) ≤ 1
  • P(S) = 1
  • Let A’ (the complement of A) be the event that A does not occur; then P(A’) = 1 – P(A)

Taking A = S in the complement rule and using P(S) = 1 gives P(S’) = 0

This makes sense: every time a die is tossed, one of the 6 outcomes must come up, so the probability that some outcome in the sample space occurs is 1. Its complement S’, the event that no outcome in S happens, therefore has probability zero.
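These rules can be checked mechanically for the die. The sketch below assumes a fair die (every outcome gets probability 1/6) and uses hypothetical helper names `prob` and `complement`; it confirms P(S) = 1, P(S’) = 0, and the complement rule for the “even number” event B.

```python
from fractions import Fraction

S = frozenset({1, 2, 3, 4, 5, 6})
# Assumed fair die: each outcome gets probability 1/6.
p = {outcome: Fraction(1, 6) for outcome in S}

def prob(event):
    """Probability of an event (a subset of S) = sum of its outcome probabilities."""
    return sum((p[o] for o in event), Fraction(0))

def complement(event):
    """Outcomes in S that are not in the event."""
    return S - event

B = frozenset({2, 4, 6})

# Rule 0 <= P(A) <= 1, checked on a few sample events.
for event in (frozenset(), frozenset({1}), B, S):
    assert 0 <= prob(event) <= 1

print("P(S)  =", prob(S))              # P(S)  = 1
print("P(S') =", prob(complement(S)))  # P(S') = 0
print("P(B') =", prob(complement(B)), "and 1 - P(B) =", 1 - prob(B))
```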

So far we have looked at probabilities in the discrete case, where the random variable has a finite number of values that can be enumerated. In the continuous case, the random variable can take on any value on a continuum. For example, if Y = the lifespan of a refrigerator and it is bounded by 0 and 7 years, then Y can take on any value in that interval. Rather than listing probabilities in a roster or table, we describe them with a function called a probability density: the probability that Y falls in an interval is the area under the graph of the density over that interval. This graph is called a density curve.
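The post does not say what shape the refrigerator-lifespan density has, so the sketch below assumes, for illustration only, a uniform density on [0, 7] (every lifespan in the interval equally likely). The hypothetical helper `area_under` approximates the area under the curve with the midpoint rule, which shows that a probability for a continuous r.v. is an area: P(2 ≤ Y ≤ 5) comes out near 3/7, and the whole curve has area 1.

```python
def uniform_density(y, low=0.0, high=7.0):
    """Density of a uniform distribution on [low, high] (an assumption for illustration)."""
    return 1.0 / (high - low) if low <= y <= high else 0.0

def area_under(density, a, b, steps=100_000):
    """Approximate the integral of `density` from a to b with the midpoint rule."""
    width = (b - a) / steps
    return sum(density(a + (i + 0.5) * width) for i in range(steps)) * width

print(f"P(2 <= Y <= 5) is about {area_under(uniform_density, 2, 5):.4f}")  # exact: 3/7
print(f"P(0 <= Y <= 7) is about {area_under(uniform_density, 0, 7):.4f}")  # whole curve: 1
```

A different (non-uniform) lifespan model would change the numbers, but not the idea that probability equals area under the density.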

The most common density curve is the Normal Distribution, the familiar “bell curve” used in many real-world applications, including the natural sciences. Remember when your teacher or professor graded “on a curve”? That is an application of this distribution. A normal distribution is defined by 2 parameters: its mean and its standard deviation. The formula for the density is rather complicated, and the areas under it have no simple closed form, so we don’t use it to calculate probabilities by hand. Instead, we refer to the Standard Normal Distribution, which has mean = 0 and standard deviation = 1, and for which we have tables of cumulative probabilities.
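For the curious, here is the normal density written out, along with one way the Standard Normal table values can be computed. The density with mean μ and standard deviation σ is f(x) = 1/(σ√(2π)) · exp(−(x − μ)²/(2σ²)), and the cumulative probability P(Z ≤ z) can be expressed through the error function, available as `math.erf`. The function names below are hypothetical; this is a sketch, not how any particular table was produced.

```python
import math

def normal_pdf(x, mean, stdev):
    """Height of the normal density curve at x."""
    coeff = 1.0 / (stdev * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mean) ** 2) / (2.0 * stdev ** 2))

def standard_normal_cdf(z):
    """P(Z <= z) for a standard normal Z, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(f"Peak of the standard normal curve: {normal_pdf(0, 0, 1):.4f}")  # 0.3989
print(f"P(Z < 2) = {standard_normal_cdf(2):.4f}")                       # 0.9772, as in the table
```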

That is fine, but what do we do when we have a normal distribution which is NOT standard normal?

We can standardize any normal distribution to become standard normal. Then, we can use the Standard Normal tables to look up our values.

Suppose X has a normal distribution with mean = 5 and stdev = 10. Then we can standardize it to Z* as follows:

Z* = (X - mean)/stdev = (X - 5)/10, which has a standard normal distribution.
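A quick simulation sketch makes standardization concrete: draw many values of X from a normal distribution with mean 5 and stdev 10 (using Python's `random.gauss`), apply Z* = (X - 5)/10 to each, and check that the results have a mean near 0 and a standard deviation near 1. The helper name `standardize` and the sample size are our own assumptions.

```python
import random
import statistics

MEAN, STDEV = 5, 10  # parameters of X from the example

def standardize(x, mean=MEAN, stdev=STDEV):
    """Z* = (X - mean) / stdev."""
    return (x - mean) / stdev

rng = random.Random(1)  # fixed seed for reproducibility
xs = [rng.gauss(MEAN, STDEV) for _ in range(100_000)]
zs = [standardize(x) for x in xs]

print(f"sample mean of Z*:  {statistics.mean(zs):.3f}")   # close to 0
print(f"sample stdev of Z*: {statistics.stdev(zs):.3f}")  # close to 1
```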

Example: find P(X > 25).

P(X > 25) = P(Z* > (25 - 5)/10) = P(Z* > 2) = 1 - P(Z* < 2) = 1 - 0.9772 (from the table) = 0.0228

So the probability that X is greater than 25 is 0.0228.
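The table lookup can be reproduced with `statistics.NormalDist` from Python's standard library (Python 3.8 or later). The sketch computes the answer two ways: by standardizing and using the standard normal distribution, and directly from a normal distribution with mean 5 and stdev 10. Both should agree with the hand calculation.

```python
from statistics import NormalDist

mean, stdev = 5, 10
x = 25

z = (x - mean) / stdev                          # standardize: z = 2.0
p_via_z = 1 - NormalDist().cdf(z)               # 1 - P(Z* < 2), like the table lookup
p_direct = 1 - NormalDist(mean, stdev).cdf(x)   # same probability, no standardizing

print(f"z = {z}")                                # z = 2.0
print(f"P(X > 25) via Z*:   {p_via_z:.4f}")      # 0.0228
print(f"P(X > 25) directly: {p_direct:.4f}")     # 0.0228
```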

Next week, we will look at sampling distributions, the Central Limit Theorem (CLT), confidence intervals, and hypothesis tests. After that, we will move on to narrower topics and be a little more laser-focused.

