It’s A Math, Math World – Intro to DOE

In the field of statistics, data collected without the proper “context” is meaningless. Data has to be collected using proper statistical procedures and proper experimental design for it to be valid and meaningful. In other words, we design experiments in order to (1) set up a direct comparison between treatments of interest, (2) minimize bias, (3) minimize the error in comparison and (4) make inferences about causation.

There is an entire field of statistics called Design of Experiments (DOE) or Experimental Design that tries to find the best design for a particular situation. Today we begin a long series of posts dedicated to DOE, starting with an introduction and some terminology.

Experiment – a test in which changes are made to the input variables in a system in order to observe and study the resulting changes in the output variables.

Explanatory Variables – the input variables

Response Variables – the output variables

Treatments – different procedures being compared in an experiment.

Factor – a variable whose levels combine to form the treatments in an experiment.

Level – an individual setting of a factor.

Ex. Suppose we have a kiln and we are baking ceramic pieces at different temperatures (500, 600 and 800 degrees F) and different humidity percentages (10%, 20%, 30%).

Treatments are the different combinations of temperature and humidity (3 × 3 = 9 treatments).

Factors are temperature and humidity.

Levels of temperature are 500, 600 and 800 degrees F.

Levels of humidity are 10%, 20% and 30%.
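To make the factor/level/treatment distinction concrete, here is a short Python sketch (the variable names are illustrative) that builds every treatment in the kiln example by crossing the levels of the two factors.

```python
from itertools import product

# Factor levels from the kiln example
temperatures_f = [500, 600, 800]   # levels of the temperature factor (degrees F)
humidities_pct = [10, 20, 30]      # levels of the humidity factor (%)

# Each treatment is one combination of a temperature level and a humidity level
treatments = list(product(temperatures_f, humidities_pct))

for temp, hum in treatments:
    print(f"{temp} F, {hum}% humidity")

print(len(treatments))  # 3 levels x 3 levels = 9 treatments
```

Adding a third factor would simply mean adding another list to `product`, and the number of treatments would multiply by that factor's number of levels.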

Experimental Units are the things to which we apply the treatments.

Response – an outcome we observe after applying a treatment to an experimental unit.

Measurement Units are the actual objects on which the response is measured (these may differ from the experimental units).

Ex. If a teaching method is applied to an entire classroom and each child then takes a standardized test, the classroom is the experimental unit and the children are the measurement units.

Control – There are two uses for this word.

An experiment is controlled if the experimenter assigns treatments to experimental units; otherwise it is an observational study.

A control treatment is a “standard” treatment that is used as a baseline or standard of comparison for other treatments.  In clinical research, this could be either a common “gold standard” therapy or a placebo treatment.

Confounding – occurs when the effects of one treatment or factor cannot be distinguished from the effects of another treatment or factor.  The two items are said to be confounded.

Ex. Consider an experiment in which you plant 2 varieties of corn: variety 1 in New Jersey and variety 2 in Nebraska. We are unable to distinguish between state effects and variety effects; therefore the state and variety factors are confounded.
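A minimal Python sketch with made-up yields shows why the two effects cannot be separated: every variety 1 plot is in New Jersey and every variety 2 plot is in Nebraska, so the "variety difference" and the "state difference" are computed from exactly the same plots and come out identical.

```python
from statistics import mean

# Hypothetical yields, made-up numbers for illustration only.
# Variety and state are perfectly aligned, which is what creates the confounding.
plots = [
    {"variety": 1, "state": "NJ", "yield": 150},
    {"variety": 1, "state": "NJ", "yield": 160},
    {"variety": 2, "state": "NE", "yield": 170},
    {"variety": 2, "state": "NE", "yield": 180},
]

def contrast(key, a, b):
    """Difference in mean yield between level b and level a of one factor."""
    ya = [p["yield"] for p in plots if p[key] == a]
    yb = [p["yield"] for p in plots if p[key] == b]
    return mean(yb) - mean(ya)

variety_effect = contrast("variety", 1, 2)
state_effect = contrast("state", "NJ", "NE")
print(variety_effect, state_effect)  # same plots, same number: confounded
```

Planting both varieties in both states would break the alignment and let the two contrasts differ.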

As we can see, experiments usually involve several factors, and our goal is to discover which factors influence the response. There are several strategies for planning and conducting these experiments.

1. The best-guess approach involves choosing a certain subset of factors to test simultaneously based on theoretical knowledge of the system being studied. It can work reasonably well but has some disadvantages. Suppose the initial guess is incorrect. Then the experiment has to be modified and run again until it is successful, which costs time and money. Also, if it succeeds, the experimenter stops and may incorrectly assume that he has found the best solution.

2. The one-factor-at-a-time approach consists of starting with a baseline (starting level) for each factor and then varying each factor, one at a time, over its range while holding the other factors constant at their baseline levels. The major disadvantage of this method is that it fails to recognize any possible interaction between the factors. An interaction is the failure of one factor to produce the same effect on the response at different levels of another factor.

3. A factorial design is the correct approach to dealing with several factors. In this type of experimental design, several factors are varied together instead of one at a time. We will look at these very soon.
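The sketch below uses made-up responses for two hypothetical factors, A and B, each at a low and a high level, to show what one-factor-at-a-time can miss. Starting from the baseline (low, low), OFAT estimates each factor's effect separately and, by implicitly assuming no interaction, predicts the (high, high) response by adding them. The factorial design runs all four combinations and estimates the main effects and the AB interaction with the usual contrasts for a 2 × 2 design.

```python
# Hypothetical responses for a 2 x 2 experiment (made-up numbers for illustration).
# Keys are (level of factor A, level of factor B), coded -1 = low, +1 = high.
y = {(-1, -1): 20, (+1, -1): 30, (-1, +1): 25, (+1, +1): 55}

# One-factor-at-a-time: start at baseline (-1, -1) and move one factor at a time.
# OFAT never actually runs the (high, high) combination.
ofat_effect_a = y[(+1, -1)] - y[(-1, -1)]   # A varied, B held at baseline
ofat_effect_b = y[(-1, +1)] - y[(-1, -1)]   # B varied, A held at baseline
ofat_prediction = y[(-1, -1)] + ofat_effect_a + ofat_effect_b  # assumes no interaction

# Factorial: every combination is run, so the interaction can be estimated
main_a = (y[(+1, -1)] + y[(+1, +1)]) / 2 - (y[(-1, -1)] + y[(-1, +1)]) / 2
main_b = (y[(-1, +1)] + y[(+1, +1)]) / 2 - (y[(-1, -1)] + y[(+1, -1)]) / 2
interaction_ab = (y[(+1, +1)] + y[(-1, -1)] - y[(+1, -1)] - y[(-1, +1)]) / 2

print("OFAT predicts", ofat_prediction, "at (high, high); the factorial run observed", y[(+1, +1)])
print("Factorial estimates: A =", main_a, "B =", main_b, "AB =", interaction_ab)
```

A nonzero AB estimate means the effect of A depends on the level of B, which is exactly the information the OFAT runs never collect.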

Next time, we will begin our look at various types of experimental designs with examples.


Note: Sources of research for this blog post include:

1) Design and Analysis of Experiments (Montgomery), 7th Edition.

2) A First Course in Design and Analysis of Experiments (Oehlert).

Like what you read? Get blogs delivered right to your inbox as I post them so you can start standing out in your job and career. There is not a better way to learn or review college level stats topics than by reading, It’s A Math, Math World





It’s A Math, Math World (Sample Size Calc)

In our last blog post, we examined how sample size and power are interrelated. Today, we are going to look at some examples of power and sample size calculations. The calculations come from a presentation given by Laura Lee Johnson, Ph.D., a statistician with the National Center for Complementary and Alternative Medicine.

We are going to be looking at the sample size calculations for a study to test a new sleep aid. We will perform various calculations by changing the values of the parameters and seeing what happens to the sample sizes.

  • Study effect of new sleep aid
  • 1 sample test
  • Outcome: change from baseline sleep time to sleep time after taking the medication for one week
  • Two-sided test, α = 0.05, power = 90%
  • Difference of interest = 1 hr (4 hours of sleep to 5)
  • Standard deviation = 2 hr


  • 1 sample test
  • 2-sided test, α = 0.05, 1-β = 90%
  • σ = 2 hr (standard deviation)
  • δ = 1 hr (difference of interest)
  • n = 43


  • Change difference of interest from 1 hr to 2 hr
  • n goes from 43 to 11



  • Change power from 90% to 80%
  • n goes from 11 to 8
  • (Small sample: start thinking about using the t distribution)



  • Change the standard deviation from 2 to 3
  • n goes from 8 to 18
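The numbers above are consistent with the standard normal-approximation formula for a two-sided one-sample test, n = ((z₁₋α/₂ + z₁₋β)·σ/δ)², rounded up. The Python sketch below assumes that formula (the presentation's exact method is not shown here) and uses `statistics.NormalDist` from the standard library for the normal quantiles; the function name `one_sample_n` is illustrative.

```python
import math
from statistics import NormalDist

def one_sample_n(delta, sigma, alpha=0.05, power=0.90):
    """Normal-approximation sample size for a two-sided one-sample test:
    n = ((z_{1-alpha/2} + z_{power}) * sigma / delta) ** 2, rounded up."""
    z = NormalDist()                     # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for a two-sided test
    z_beta = z.inv_cdf(power)            # quantile matching the desired power
    return math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)

scenarios = [
    ("original (delta=1, sigma=2, power=90%)", 1, 2, 0.90),
    ("delta 1 hr -> 2 hr", 2, 2, 0.90),
    ("power 90% -> 80%", 2, 2, 0.80),
    ("sigma 2 -> 3", 2, 3, 0.80),
]
for label, delta, sigma, power in scenarios:
    print(label, one_sample_n(delta, sigma, power=power))
```

Because n grows with σ²/δ², doubling δ cuts n by roughly a factor of four, while raising σ from 2 to 3 multiplies it by about 2.25. For small n like these, a t-based calculation would give slightly larger values.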

We now look at a 2 sample randomized parallel design and compare the sample sizes needed.


  • Original design (2-sided test, α = 0.05, 1-β = 90%, σ = 2 hr, δ = 1 hr)
  • Two sample randomized parallel design
  • Needed 43 in the one-sample design
  • In the 2-sample design, need about twice that in each group (85 per group)!
  • About 4 times as many people are needed in this design (n = 170 in total)


  • Change difference of interest from 1 hr to 2 hr
  • n goes from 170 to 44



  • Change power from 90% to 80%
  • n goes from 44 to 32


  • Change the standard deviation from 2 to 3
  • n goes from 32 to 72
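The two-sample numbers above (170, 44, 32, 72, all totals across both groups) also match the usual normal-approximation formula for comparing two means with equal group sizes and a common σ: n per group = 2·((z₁₋α/₂ + z₁₋β)·σ/δ)², rounded up. This is an assumed formula, not necessarily the one used in the presentation, and `two_sample_total_n` is an illustrative name.

```python
import math
from statistics import NormalDist

def two_sample_total_n(delta, sigma, alpha=0.05, power=0.90):
    """Normal-approximation sample size for a two-sided comparison of two means
    (equal group sizes, common standard deviation sigma).
    Per group: n = 2 * ((z_{1-alpha/2} + z_{power}) * sigma / delta) ** 2, rounded up.
    Returns the total across both groups."""
    z = NormalDist()  # standard normal distribution
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    per_group = math.ceil(2 * (z_sum * sigma / delta) ** 2)
    return 2 * per_group

scenarios = [
    ("original (delta=1, sigma=2, power=90%)", 1, 2, 0.90),
    ("delta 1 hr -> 2 hr", 2, 2, 0.90),
    ("power 90% -> 80%", 2, 2, 0.80),
    ("sigma 2 -> 3", 2, 3, 0.80),
]
for label, delta, sigma, power in scenarios:
    print(label, two_sample_total_n(delta, sigma, power=power))
```

The factor of 2 inside the formula is why each group needs roughly the whole one-sample n twice over, and rounding per group before doubling keeps the two arms equal in size.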



  • Changes in the detectable difference have HUGE impacts on sample size
    • 20 point difference →   25 patients/group
    • 10 point difference → 100 patients/group
    •  5  point difference → 400 patients/group
  • Changes in α, β, σ, the number of samples, and whether the test is 1- or 2-sided can all have a large impact on your sample size calculation
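One way to see why the detectable difference matters so much: in normal-approximation formulas of this kind (assumed here, since the slides do not show their formula), n depends on δ only through 1/δ², so halving δ quadruples n. That is exactly the 25 → 100 → 400 pattern, and σ² enters the same way in the numerator.

```latex
n_{\text{per group}} = \frac{2\,(z_{1-\alpha/2} + z_{1-\beta})^{2}\,\sigma^{2}}{\delta^{2}}
\quad\Longrightarrow\quad
\frac{n(\delta/2)}{n(\delta)} = \frac{\delta^{2}}{(\delta/2)^{2}} = 4,
\qquad 25 \to 100 \to 400 \ \text{ as } \ \delta = 20 \to 10 \to 5
```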

Next time, we will begin looking at experimental design and clinical trials.








It’s A Math, Math World (Power/Smpl Size I)

In this post, we will examine type I and type II errors and their relation to sample size and power calculations.

We start with a few definitions:

In a clinical trial, there are 2 types of error that we want to control for:

  • Type I error (False Positive or Consumer’s Risk): This is a decision that finds that the new treatment works better when in fact it really does not.

This error rate is controlled by the FDA or other regulatory agencies. Depending on the setting, α = 0.05, 0.01 or 0.001 might be required.

  • Type II error (False Negative or Producer’s Risk): This is a decision that fails to find that the treatment works better when in fact it does.

This error rate is controlled more by the company, which has more say in setting it, but an irresponsible type II error rate will adversely influence drug approval. For research, a type II error rate of β = 0.20 (80% power) is usually adequate.

Power = 1 – Type II Error: The chance to detect a difference when one exists.

If there is no bias, then the quality of the study improves as the sample size increases (the standard error of an estimate shrinks in proportion to 1/√n).

  • The more subjects you have, the smaller the error of the estimates and, for a fixed type I error rate, the smaller the type II error rate.
  • If the sample size is too small, then, with the type I error rate held fixed, an effective therapy may not be discovered.
  • If the sample size is too large, then the study is too expensive and difficult to carry out.


It is important to either:

  • Find the minimum sample size to obtain a specified power.
  • Determine the specific power for a given sample size.
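Both tasks can be sketched under a simple assumed model: a two-sided one-sample z-test with known standard deviation σ, where δ is the true difference we want to detect. The function names below are illustrative and not from any particular package; real trial designs usually rely on formulas tailored to the outcome type and design listed next.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution

def power_one_sample(n, delta, sigma, alpha=0.05):
    """Power of a two-sided one-sample z-test (known sigma) to detect a true
    difference delta with n subjects: P(reject H0 | true difference = delta)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = delta * math.sqrt(n) / sigma
    # Probability the test statistic falls in either rejection region
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)

def min_n_for_power(target_power, delta, sigma, alpha=0.05):
    """Smallest n whose power reaches the target (simple upward search)."""
    n = 1
    while power_one_sample(n, delta, sigma, alpha) < target_power:
        n += 1
    return n

print(round(power_one_sample(43, delta=1, sigma=2), 3))  # power for a given n
print(min_n_for_power(0.90, delta=1, sigma=2))           # n for a given power
```

Note that when δ = 0 the "power" reduces to α, the type I error rate, which is a quick sanity check on the function.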

However, there are many different power and sample size formulas for different:

Outcome types:

  • Continuous
  • Proportions
  • Survival data

Trial purpose:

  • Superiority vs. equivalence (or non-inferiority)

Design of Trial:

  • Matched vs. unmatched study
  • Cluster vs. independent sampling
  • Adjusted for covariates vs. unadjusted analysis

Next time, we will look at specific examples of power calculations.


It’s A Math, Math World (Randomized Block Designs)

(Note: bolded sections and diagrams are from the Research Methods Knowledge Base website at …)

We saw in our last post that we always want to reduce variability in our data. Stratification is used as a means of controlling sources of variation in the data that potentially relate to the outcome. When we combine stratification (blocking) with random assignment of treatments within each block, we get a Randomized Block Design.

Randomized block designs require that the researcher divide the sample into relatively homogeneous subgroups or blocks (analogous to “strata” in stratified sampling). Then, the experimental design you want to use is implemented within each block or homogeneous subgroup. The key idea is that the variability within each block is less than the variability of the entire sample. Thus each estimate of the treatment effect within a block is more efficient than estimates across the entire sample. And, when we pool these more efficient estimates across blocks, we should get an overall more efficient estimate than we would without blocking.

Here, we can see a simple example. Let’s assume that we originally intended to conduct a simple posttest-only randomized experimental design. But we recognize that our sample has several intact or homogeneous subgroups. For instance, in a study of college students, we might expect that students are relatively homogeneous with respect to class or year. So, we decide to block the sample into four groups: freshman, sophomore, junior, and senior. If our hunch is correct, that the variability within class is less than the variability for the entire sample, we will probably get more powerful estimates of the treatment effect within each block. Within each of our four blocks, we would implement the simple posttest-only randomized experiment.
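The assignment step can be sketched in a few lines of Python. The roster, the student IDs and the function name `randomize_within_blocks` are hypothetical; the point is that random assignment happens separately inside each class-year block, so every block contributes both program and comparison students.

```python
import random

def randomize_within_blocks(units, seed=None):
    """Randomized block design: group units by block, then randomly split each
    block into program and comparison groups, so the posttest-only randomized
    experiment is carried out separately inside every block."""
    rng = random.Random(seed)
    blocks = {}
    for unit_id, block in units:
        blocks.setdefault(block, []).append(unit_id)

    assignment = {}
    for block, members in blocks.items():
        shuffled = members[:]        # copy so the original order is untouched
        rng.shuffle(shuffled)        # randomization happens within the block only
        half = len(shuffled) // 2
        for unit_id in shuffled[:half]:
            assignment[unit_id] = (block, "program")
        for unit_id in shuffled[half:]:
            assignment[unit_id] = (block, "comparison")
    return assignment

# Hypothetical roster: (student id, class year), four students per year
years = ["freshman", "sophomore", "junior", "senior"]
roster = [(f"s{i:02d}", years[i % 4]) for i in range(16)]

for student, (year, group) in sorted(randomize_within_blocks(roster, seed=1).items()):
    print(student, year, group)
```

Splitting each block in half keeps the program and comparison groups balanced within every class year, which is what lets the within-block comparisons be pooled later.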

You will only benefit from a blocking design if you are correct that the blocks are more homogeneous than the entire sample. If you are wrong, you will actually be hurt by blocking (you’ll get a less powerful estimate of the treatment effect). How do you know if blocking is a good idea? You need to consider carefully whether the groups are relatively homogeneous.

How Blocking Reduces Noise

So how does blocking work to reduce noise in the data? To see how it works, you have to begin by thinking about the non-blocked study. The figure shows the pretest-posttest distribution for a hypothetical pre-post randomized experimental design. We use the ‘X’ symbol to indicate a program group case and the ‘O’ symbol for a comparison group member. You can see that for any specific pretest value, the program group tends to outscore the comparison group by about 10 points on the posttest. That is, there is about a 10-point posttest mean difference.

Now, let’s consider an example where we divide the sample into three relatively homogeneous blocks. To see what happens graphically, we’ll use the pretest measure to block. This will ensure that the groups are very homogeneous. Let’s look at what is happening within the third block. Notice that the mean difference is still the same as it was for the entire sample: about 10 points within each block. But also notice that the variability of the posttest is much less than it was for the entire sample. Remember that the treatment effect estimate is a signal-to-noise ratio. The signal in this case is the mean difference. The noise is the variability. The two figures show that we haven’t changed the signal in moving to blocking; there is still about a 10-point posttest difference. But we have changed the noise: the variability on the posttest is much smaller within each block than it is for the entire sample. So, the treatment effect will have less noise for the same signal.
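The signal-to-noise argument can be checked with a small simulation. The Python sketch below assumes a made-up model that mirrors the figure (it is not the data behind it): posttest equals pretest, plus 10 points for program cases, plus a little random noise. It compares the mean difference and the posttest spread for the entire sample with the same quantities inside three equal-sized blocks formed from the pretest.

```python
import random
from statistics import mean, stdev

rng = random.Random(2024)  # fixed seed so the run is reproducible

# Assumed model: posttest = pretest + 10 for program cases + small random noise
N = 300
people = []
for _ in range(N):
    pre = rng.gauss(50, 10)
    program = rng.random() < 0.5              # simple random assignment
    post = pre + (10 if program else 0) + rng.gauss(0, 3)
    people.append((pre, program, post))

def signal_and_noise(group):
    """Return the mean posttest difference (program - comparison) and the
    average of the two groups' posttest standard deviations."""
    prog = [post for _, p, post in group if p]
    comp = [post for _, p, post in group if not p]
    return mean(prog) - mean(comp), (stdev(prog) + stdev(comp)) / 2

overall = signal_and_noise(people)
print(f"entire sample: difference {overall[0]:.1f}, noise (SD) {overall[1]:.1f}")

# Block on the pretest: sort by pretest and cut into three equal-sized blocks
people.sort(key=lambda row: row[0])
blocks = [people[i * N // 3:(i + 1) * N // 3] for i in range(3)]
for k, block in enumerate(blocks, start=1):
    diff, noise = signal_and_noise(block)
    print(f"block {k}: difference {diff:.1f}, noise (SD) {noise:.1f}")
```

Under this model the difference should stay near 10 in every block, while the within-block standard deviation should be well below the whole-sample value: the same signal with less noise.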

Because the blocks are homogeneous, the blocking design yields a stronger (higher signal-to-noise) estimate of the treatment effect. If the blocks weren’t homogeneous (that is, if their variability were as large as the entire sample’s), we would actually get worse estimates than in the simple randomized experimental case.
