## NEW POSTS COMING SOON!!!!

New posts will be starting in the next few weeks. I will be continuing the discussion of Design of Experiments.

## It’s A Math, Math World – Intro to DOE

In the field of statistics, data collected without the proper “context” is meaningless. Data has to be collected using proper statistical procedures and proper experimental design for it to be valid and meaningful. In other words, we design experiments in order to (1) set up a direct comparison between treatments of interest, (2) minimize bias, (3) minimize the error in the comparison and (4) make inferences about causation.

There is an entire field of statistics called Design of Experiments (DOE) or Experimental Design that tries to find the best design for a particular situation. Today we will begin a long series of posts dedicated to DOE and we start with an introduction and some terminology.

**Experiment** – a test in which changes are made to the input variables in a system in order to observe and study the changes made to the output variables.

**Explanatory Variables** – the input variables

**Response Variables** – the output variables

**Treatments** – different procedures being compared in an experiment.

**Factor** – a variable whose settings combine with those of other factors to form the treatments in an experiment.

**Level** – an individual setting of a factor.

Ex. Suppose we have a kiln and we are baking ceramic pieces at different temperatures (500, 600 and 800 degrees F) and different humidity percentages (10%, 20%, 30%).

*Treatments* are different combinations of temp and humidity.

*Factors* are temperature and humidity

*Levels* of temperature are 500, 600 and 800

*Levels* of humidity are 10%, 20% and 30%
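To make the terminology concrete, here is a minimal sketch that lists every treatment in the kiln example by crossing the levels of the two factors. The variable names are just illustrative choices.

```python
from itertools import product

# Factor levels taken from the kiln example above.
temperatures_f = [500, 600, 800]   # temperature levels, degrees F
humidities_pct = [10, 20, 30]      # humidity levels, percent

# Each treatment is one (temperature, humidity) combination.
treatments = list(product(temperatures_f, humidities_pct))

for temp, hum in treatments:
    print(f"{temp} F at {hum}% humidity")

print(len(treatments), "treatments in total")
```

With 3 levels of each factor, crossing them gives 3 × 3 = 9 treatments.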

**Experimental Units** are the things to which we apply the treatments.

**Response** – an outcome we observe after applying a treatment to an experimental unit.

**Measurement Units** are the actual objects on which the response is measured (may differ from the experimental units).

Ex. If you are applying a standardized test to a classroom of children, then the classroom is the experimental unit and the children are the measurement units.

**Control** – There are two uses for this word.

An experiment is *controlled* if the experimenter assigns treatments to experimental units; otherwise it is an observational study.

A *control* treatment is a “standard” treatment that is used as a baseline or standard of comparison for other treatments. In clinical research, this could be either a common “gold standard” therapy or a placebo treatment.

**Confounding** – occurs when the effects of one treatment or factor cannot be distinguished from the effects of another treatment or factor. The two items are said to be confounded.

Ex. Consider an experiment in which you plant 2 varieties of corn: variety 1 in NJ and variety 2 in Nebraska. We are unable to distinguish between state effects and variety effects; therefore, the state and variety factors are *confounded*.

As we can see, experiments usually involve several factors and our goal is to discover which factors influence the response. There are different strategies to approaching how to plan and conduct these experiments.

1. The **best-guess approach** involves choosing a certain subset of factors to test simultaneously based on theoretical knowledge of the system being studied. It can work reasonably well but has some disadvantages. Suppose the initial guess is incorrect. Then the experiment has to be modified and run again until it is successful which costs time and money. Also, if it succeeds, the experimenter stops, and he may assume incorrectly that he has the best solution.

2. The **one-factor-at-a-time approach** consists of starting with a baseline set of levels for each factor and then varying each factor, one at a time, over its range while holding the other factors constant at their baseline levels. The major disadvantage of this method is that it fails to recognize any possible interaction between the factors. An *interaction* is the failure of one factor to produce the same effect on the response at different levels of another factor.

3. **Factorial analysis** is the correct approach to dealing with several factors. This is an experimental design in which several factors are varied *together*, instead of one at a time. We will look at these very soon.
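As a small illustration of an interaction, the sketch below uses made-up mean responses for two factors, A and B, each at a low and a high level. The numbers are hypothetical and chosen only to show the idea. The interaction is computed as half the difference between the two effects of A, which is one common convention for two-level designs.

```python
# Hypothetical mean responses for a 2x2 factorial (illustrative numbers only).
# Keys are (level of factor A, level of factor B).
response = {
    ("low", "low"): 20.0,
    ("high", "low"): 30.0,
    ("low", "high"): 25.0,
    ("high", "high"): 55.0,
}

# Effect of A at each level of B.
effect_a_at_b_low = response[("high", "low")] - response[("low", "low")]
effect_a_at_b_high = response[("high", "high")] - response[("low", "high")]

# Interaction: half the difference between the two effects of A.
interaction_ab = (effect_a_at_b_high - effect_a_at_b_low) / 2

print("Effect of A when B is low: ", effect_a_at_b_low)
print("Effect of A when B is high:", effect_a_at_b_high)
print("AB interaction:            ", interaction_ab)
```

Here A raises the response by 10 when B is low but by 30 when B is high. A one-factor-at-a-time study starting from the (low, low) baseline would only ever see the 10 and would never run the (high, high) combination.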

Next time, we will begin our look at various types of experimental designs with examples.

*Note: Sources of research for this blog post include:*

*1) Design and Analysis of Experiments (Montgomery), 7th Edition.*

*2) A First Course in Design and Analysis of Experiments (Oehlert).*

*Like what you read? Get blogs delivered right to your inbox as I post them so you can start standing out in your job and career. There is not a better way to learn or review college level stats topics than by reading, It’s A Math, Math World*


## It’s A Math, Math World (Sample Size Calc)

In our last blog post, we examined the ideas of how sample size and power are interrelated. Today, we are going to look at some examples of power calculations. The data calculations are from a presentation given by Laura Lee Johnson, Ph.D. who is a Statistician with the National Center for Complementary and Alternative Medicine.

We are going to be looking at the sample size calculations for a study to test a new sleep aid. We will perform various calculations by changing the values of the parameters and seeing what happens to the sample sizes.

- Study effect of new sleep aid
- 1 sample test
- Baseline to sleep time after taking the medication for one week
- Two-sided test, α = 0.05, power = 90%
- Difference = 1 (4 hours of sleep to 5)
- Standard deviation = 2 hr

**CASE #1**

- 1 sample test
- 2-sided test, α = 0.05, 1 − β = 90%
- σ = 2 hr (standard deviation)
- δ = 1 hr (difference of interest)

**CASE #2**

- Change difference of interest from 1 hr to 2 hr
- n goes from 43 to 11

**CASE #3**

- Change power from 90% to 80%
- n goes from 11 to 8
- (Small sample: start thinking about using the t distribution)

**CASE #4**

- Change the standard deviation from 2 to 3
- n goes from 8 to 18
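The post does not show the formula behind these numbers. A minimal sketch, assuming the usual normal-approximation formula for a two-sided one-sample test, n = ((z for α/2 + z for power) × σ / δ)², rounded up, reproduces all four cases. The function name `one_sample_n` is my own.

```python
from math import ceil
from statistics import NormalDist


def one_sample_n(delta, sigma, alpha=0.05, power=0.90):
    """Sample size for a two-sided one-sample test (normal approximation).

    Assumed formula: n = ((z_{1-alpha/2} + z_{power}) * sigma / delta)^2,
    rounded up to the next whole subject.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(((z_alpha + z_beta) * sigma / delta) ** 2)


print("Case 1:", one_sample_n(delta=1, sigma=2))              # 43
print("Case 2:", one_sample_n(delta=2, sigma=2))              # 11
print("Case 3:", one_sample_n(delta=2, sigma=2, power=0.80))  # 8
print("Case 4:", one_sample_n(delta=2, sigma=3, power=0.80))  # 18
```

Each case changes one input from the previous case, which is why the sample size moves 43 → 11 → 8 → 18.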

We now look at a 2 sample randomized parallel design and compare the sample sizes needed.

**CASE #1A**

- Original design (2-sided test, α = 0.05, 1 − β = 90%, σ = 2 hr, δ = 1 hr)
- Two sample randomized parallel design
- Needed 43 in the one-sample design
- In 2-sample need twice that, in each group!
- 4 times as many people are needed in this design

**CASE #2A**

- Change difference of interest from 1 hr to 2 hr
- n goes from 170 to 44

**CASE #3A**

- Change power from 90% to 80%
- n goes from 44 to 32

**CASE #4A**

- Change the standard deviation from 2 to 3
- n goes from 32 to 72
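The two-sample totals can be checked the same way. This sketch assumes the usual normal-approximation formula for comparing two means with equal group sizes and a common σ, where each group needs 2 × ((z for α/2 + z for power) × σ / δ)² subjects, rounded up. The function name is my own.

```python
from math import ceil
from statistics import NormalDist


def two_sample_n_per_group(delta, sigma, alpha=0.05, power=0.90):
    """Per-group sample size for a two-sided two-sample comparison of means.

    Assumed formula: n = 2 * ((z_{1-alpha/2} + z_{power}) * sigma / delta)^2,
    rounded up, with equal group sizes and a common standard deviation.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)


cases = {
    "1A": dict(delta=1, sigma=2, power=0.90),
    "2A": dict(delta=2, sigma=2, power=0.90),
    "3A": dict(delta=2, sigma=2, power=0.80),
    "4A": dict(delta=2, sigma=3, power=0.80),
}

for name, params in cases.items():
    per_group = two_sample_n_per_group(**params)
    print(f"Case {name}: {per_group} per group, {2 * per_group} in total")
```

The totals come out to 170, 44, 32 and 72, matching the cases above. The 85 per group in Case 1A is essentially the "twice 43, in each group" rule, with a small difference due to rounding.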

**Changes in the detectable difference have HUGE impacts on sample size:**

- 20 point difference → 25 patients/group
- 10 point difference → 100 patients/group
- 5 point difference → 400 patients/group
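As a rough sketch of why the impact is so large (assuming the usual normal-approximation formula for comparing two means, which the post itself does not show), the per-group sample size is approximately

$$
n_{\text{per group}} \approx \frac{2\,(z_{1-\alpha/2} + z_{1-\beta})^{2}\,\sigma^{2}}{\delta^{2}}
$$

so n is proportional to 1/δ². Halving the detectable difference therefore multiplies the sample size by 4, which matches the 25 → 100 → 400 pattern.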

**Changes in α, β, σ, the number of samples, and whether the test is 1- or 2-sided can all have a large impact on your sample size calculation.**

Next time, we will begin looking at experimental design and clinical trials.


## It’s A Math, Math World (Power/Smpl Size I)

In this post, we will examine type I and type II errors and their relation to sample size and power calculations.

We start with a few definitions:

In a clinical trial, there are 2 types of error that we want to control for:

- Type I error (False Positive or Consumer’s Risk):
**This is a decision that finds that the new treatment works better when in fact it really does not.**

This error rate is controlled by the FDA or other regulatory agencies. Depending on the setting, α = 0.05, 0.01 or 0.001 might be required.

- Type II error (False Negative or Producer’s Risk):
**This is a decision that fails to find that the treatment works better when in fact it does.**

This error rate is controlled more by the company. They have more say in setting this rate, but an irresponsible type II error rate will adversely influence drug approval. For research, a type II error β = 0.20 is usually adequate.

Power = 1 – Type II Error: **The chance to detect a difference when one exists**.

If there is no bias, then the quality of the study improves as the sample size increases.

- If you have more subjects, then the error of the estimates is smaller and, for a fixed type I error rate, the type II error rate is lower.
- If sample size is too small, then, given type I error is maintained, effective therapy may not be discovered.
- If sample size is too large, then the study is too expensive and difficult to be done.

**MAIN IDEA:**

It is important to either:

- Find the minimum sample size to obtain a specified power.
- Determine the specific power for a given sample size.
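As a sketch of the second task, the function below approximates the power of a two-sided one-sample test for a given sample size. It assumes a normal approximation and ignores the negligible chance of rejecting in the wrong direction. The function name and the input values (a difference of 1 with a standard deviation of 2) are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_power(n, delta, sigma, alpha=0.05):
    """Approximate power of a two-sided one-sample test (normal approximation).

    Assumed formula: power = Phi(delta * sqrt(n) / sigma - z_{1-alpha/2}).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(delta * sqrt(n) / sigma - z_alpha)


# Illustrative inputs: detect a difference of 1 when the SD is 2.
for n in (10, 20, 30, 43):
    print(f"n = {n:2d}: power = {one_sample_power(n, delta=1, sigma=2):.3f}")
```

Running the loop shows power climbing toward 90% as n grows, which is the other side of the minimum sample size calculation.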

**However there are many formulas for power and sample size for different:**

**Outcome types:**

- Continuous
- Proportions
- Survival data

**Trial purpose:**

- Superiority vs. non-inferiority or equivalence

**Design of Trial:**

- Matched vs. unmatched study
- Cluster vs. independent sampling
- Adjusted for covariates vs. unadjusted analysis

Next time, we will look at specific examples of power calculations.


## It’s A Math, Math World (Randomized Block Designs)

(Note bolded sections and diagrams are from the Research Methods Knowledge Base website at http://www.socialresearchmethods.net/kb/expblock.php)

We saw in our last post that we always want to reduce variability in our data. Stratification is used as a means of controlling sources of variation in the data that potentially relate to the outcome. When we combine stratification with blocking, we get a Randomized Block Design.

**They require that the researcher divide the sample into relatively homogeneous subgroups or blocks (analogous to “strata” in stratified sampling). Then, the experimental design you want to implement is implemented within each block or homogeneous subgroup. The key idea is that the variability within each block is less than the variability of the entire sample. Thus each estimate of the treatment effect within a block is more efficient than estimates across the entire sample. And, when we pool these more efficient estimates across blocks, we should get an overall more efficient estimate than we would without blocking.**

**Here, we can see a simple example. Let’s assume that we originally intended to conduct a simple posttest-only randomized experimental design. But, we recognize that our sample has several intact or homogeneous subgroups. For instance, in a study of college students, we might expect that students are relatively homogeneous with respect to class or year. So, we decide to block the sample into four groups: freshman, sophomore, junior, and senior. If our hunch is correct, that the variability within class is less than the variability for the entire sample, we will probably get more powerful estimates of the treatment effect within each block. Within each of our four blocks, we would implement the simple post-only randomized experiment.**

You will only benefit from a blocking design if you are correct that the blocks are more homogeneous than the entire sample. If you are wrong, you will actually be hurt by blocking (you’ll get a less powerful estimate of the treatment effect). How do you know if blocking is a good idea? You need to consider carefully whether the groups are relatively homogeneous.

## How Blocking Reduces Noise

**So how does blocking work to reduce noise in the data? To see how it works, you have to begin by thinking about the non-blocked study. The figure shows the pretest-posttest distribution for a hypothetical pre-post randomized experimental design. We use the ‘X’ symbol to indicate a program group case and the ‘O’ symbol for a comparison group member. You can see that for any specific pretest value, the program group tends to outscore the comparison group by about 10 points on the posttest. That is, there is about a 10-point posttest mean difference.**

**Now, let’s consider an example where we divide the sample into three relatively homogeneous blocks. To see what happens graphically, we’ll use the pretest measure to block. This will assure that the groups are very homogeneous. Let’s look at what is happening within the third block. Notice that the mean difference is still the same as it was for the entire sample: about 10 points within each block. But also notice that the variability of the posttest is much less than it was for the entire sample. Remember that the treatment effect estimate is a signal-to-noise ratio. The signal in this case is the mean difference. The noise is the variability. The two figures show that we haven’t changed the signal in moving to blocking: there is still about a 10-point posttest difference. But we have changed the noise: the variability on the posttest is much smaller within each block than it is for the entire sample. So, the treatment effect will have less noise for the same signal.**

Because the blocks are homogeneous, the blocking design yields a more precise estimate of the treatment effect: the same signal with less noise. If the blocks weren’t homogeneous (their variability was as large as the entire sample’s), we would actually get worse estimates than in the simple randomized experimental case.
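The noise reduction can be illustrated with a small simulation. Everything in this sketch is an assumption made for illustration: the pretest range, the noise level, the 10-point program effect (chosen to echo the example), and the three equal-width pretest blocks.

```python
import random
from statistics import pstdev

rng = random.Random(2024)  # fixed seed so the sketch is repeatable

# Simulated subjects: posttest = pretest + 10-point program effect + noise.
subjects = []
for _ in range(300):
    pretest = rng.uniform(0, 90)
    treated = rng.random() < 0.5
    posttest = pretest + (10 if treated else 0) + rng.gauss(0, 5)
    subjects.append((pretest, treated, posttest))

# Noise without blocking: spread of the posttest over the whole sample.
overall_sd = pstdev([post for _, _, post in subjects])

# Block on the pretest: three bands of equal width (0-30, 30-60, 60-90).
blocks = {0: [], 1: [], 2: []}
for pretest, _, posttest in subjects:
    blocks[min(int(pretest // 30), 2)].append(posttest)

# Noise with blocking: spread of the posttest inside each block.
block_sds = {b: pstdev(posts) for b, posts in blocks.items()}

print(f"Posttest SD, entire sample: {overall_sd:.1f}")
for b, sd in block_sds.items():
    print(f"Posttest SD, block {b + 1}:      {sd:.1f}")
```

The program effect built into the data is the same 10 points in every block, but the posttest spread inside each block is much smaller than the spread across the whole sample.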


## It’s A Math, Math World (Block Design)

**Blocking** is a method to create balance in treatment assignments over time as recruitment progresses. Randomization, by its very nature, can create treatment groups of unequal sizes.

First, we have to define *a treatment assignment ratio (allocation ratio) as the ratio of the number of persons in one treatment group relative to another*. For example:

A ratio of 1:1 indicates a treatment and control group of equal sizes

A ratio of 1:1:1:1:5 indicates 4 test treatments and 1 control treatment, with 5 times as many control subjects as in any of the test treatment groups. (Note: The control group is always listed last).

**The block size must be a multiple of the sum of the numbers in the allocation ratio.**

**Example:**

Three treatments A, B and C

Allocation Ratio is 1:1:2 (¼ allocated to A, ¼ allocated to B and ½ allocated to C)

Sum of the ratio terms = 1+1+2 = 4

Then block sizes must be a multiple of 4. Within each block, we randomly assign the treatments according to the allocation ratio.
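A minimal sketch of building one such block is shown here. The helper `permuted_block` is a hypothetical name, and the seed and the block size of 8 are arbitrary choices for the illustration.

```python
import random


def permuted_block(ratio, block_size, rng):
    """Return one randomly ordered block that honors the allocation ratio.

    ratio maps each treatment label to its integer term in the ratio.
    """
    ratio_sum = sum(ratio.values())
    if block_size % ratio_sum != 0:
        raise ValueError("block size must be a multiple of the ratio sum")
    copies = block_size // ratio_sum
    # Lay out the right number of each treatment, then shuffle the order.
    block = [label for label, term in ratio.items() for _ in range(term * copies)]
    rng.shuffle(block)
    return block


rng = random.Random(7)             # arbitrary seed, for reproducibility
ratio = {"A": 1, "B": 1, "C": 2}   # the 1:1:2 allocation from the example
block = permuted_block(ratio, block_size=8, rng=rng)

print(block)
print({label: block.count(label) for label in ratio})
```

Whatever order the shuffle produces, a block of 8 always contains 2 A's, 2 B's and 4 C's, so the 1:1:2 balance holds at the end of every block.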

There is one downside of using the same block size of 4 in this case. Once the first 3 people had been assigned treatments in that block, we would know the treatment assignment for the 4th person. Thus, the mask can be broken on the study this way.

However, by varying the block sizes (sometimes 4, sometimes 8, etc), we can prevent this from happening.

Methods of Varying Block Sizes:

**1) Alternating**

**Ex**: 1:1 allocation with treatments A and B

4-6-4-6-4-6-4-6-…

1st Block size 4

2nd Block size 6

3rd Block size 4

etc.

This approach is the easiest, but the pattern can be figured out. The treatment assignments of the 4th, 10th, 14th, 20th, etc. persons are determined by previous assignments.

**2) Random Varying**

4-4-6-4-6-?-?-?

Where “?” = either 4 or 6

**A two step process:**

**First, select a random number to determine block size**

If uᵢ < 0.5, then block *i* has 4 subjects

If uᵢ ≥ 0.5, then block *i* has 6 subjects

**Second, once the block size has been determined, select the random ordering of treatments within the block**

Ex. AABABAABA

The nurse will not know whether the 10th subject will be a “B” or an “A” because he/she does not know what the first and second block sizes are.
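The two-step process can be sketched as follows for a 1:1 allocation. The function name `varying_block_schedule` and the seed are assumptions made for the illustration.

```python
import random


def varying_block_schedule(n_subjects, rng, sizes=(4, 6)):
    """1:1 allocation to A and B using randomly chosen block sizes."""
    schedule = []
    block_sizes = []
    while len(schedule) < n_subjects:
        # Step 1: a uniform random number picks the block size.
        size = sizes[0] if rng.random() < 0.5 else sizes[1]
        # Step 2: randomly order equal numbers of A and B within the block.
        block = ["A"] * (size // 2) + ["B"] * (size // 2)
        rng.shuffle(block)
        schedule.extend(block)
        block_sizes.append(size)
    # The last block may be cut short if recruitment stops mid-block.
    return schedule[:n_subjects], block_sizes


schedule, block_sizes = varying_block_schedule(20, random.Random(11))
print("Block sizes:", block_sizes)
print("Assignments:", "".join(schedule))
```

Only someone who can see the block sizes can tell where a block ends, so staff who see only the assignments cannot work out the next one from the previous ones.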

Next time, we will look at stratification to reduce variation in our clinical trial data.


## It’s A Math, Math World (Randomization)

Some of the information in this article (i.e. some definitions and examples) is attributed to a lecture at Rutgers University by Adele Gilpin during the spring 2004 semester.

In a clinical trial, we want to help control for bias and variance. To accomplish this, we want to make our treatment and control groups as *comparable* or similar as possible on certain characteristics which are pre-defined. The way we will do this is to **randomize** which patient is assigned to which treatment group. This helps accomplish 3 things:

1. Provides study groups with known statistical properties at baseline

2. Provides a statistical basis for tests of significance

3. Eliminates selection bias

**Randomization is the process of assigning patients to treatment groups in which there is a known unbiased probability for each outcome of assignment**. The randomization method must be both *reproducible* and *well documented* for the regulatory authorities. This means you can’t flip a coin to perform the randomization since a sequence of coin flips is not reproducible and the coin could be biased. This randomization is done using a computer generated sequence of random numbers as we shall see later.

Other properties of a good randomization scheme include:

1. Release of patient assignments prevented until necessary conditions are satisfied

2. Assignments masked to all involved parties until no longer needed

3. Future assignments not predictable from past assignments (not true for blocked designs)

4. Clear audit trail for assignments

Randomization helps protect against selection bias in the assignment process; however, it *does not ensure* comparable study groups. They can still differ by chance. Another popular misconception is that computer-generated “random” numbers are truly random. If you generate random numbers by computer, you start with a “seed” number. If you use the same seed in the same process, you will get the same string of numbers every time. This is why the sequence is reproducible.

An example of a simple computer randomization scheme:

**Generate random uniform U(0,1) numbers Xᵢ**

**If: Xᵢ ≥ 0.5 then subject gets treatment A**

**Xᵢ < 0.5 then subject gets treatment B**
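A minimal sketch of this scheme in Python, using the standard library generator with an explicit seed, is shown here. The seed value and the function name are arbitrary choices for the illustration, not a recommendation for a real trial system.

```python
import random


def simple_randomization(n_subjects, seed):
    """Assign each subject to A or B from a seeded uniform(0, 1) draw."""
    rng = random.Random(seed)
    return ["A" if rng.random() >= 0.5 else "B" for _ in range(n_subjects)]


first = simple_randomization(20, seed=12345)
second = simple_randomization(20, seed=12345)

print("Assignments:", "".join(first))
print("Same seed gives the same sequence:", first == second)
print("Group sizes: A =", first.count("A"), " B =", first.count("B"))
```

Rerunning with the same seed reproduces the list exactly, which is what makes the scheme auditable. Nothing in the scheme forces the two group sizes to be equal.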

The treatment assignment ratio (or allocation ratio) is the ratio of the number of persons to be in one treatment group relative to another.

**Ex**. a ratio of 1:1 indicates a treatment group and control group of equal numbers of participants

**Ex**. a ratio of 1:1:1:1:2.5 indicates 4 treatment groups of equal size and a control group that has 2.5 times as many people assigned to it as any of the treatment groups.

Randomization can cause an imbalance in the group sizes if left to its own devices. There are ways to compensate for this. One way is to use *blocking*. Blocking is used to maintain balance in treatment assignments over time as recruitment progresses. This will be discussed in the next blog post.


## It’s A Math, Math World (Variance Control)

Some of the information in this article (i.e. some definitions and examples) is attributed to a lecture at Rutgers University by Adele Gilpin during the spring 2004 semester.

In any clinical trial, as well as controlling bias, we want to control variance. What is the difference between bias and variance? **Bias causes the sample mean of one treatment group to be larger or smaller than the true mean. On the other hand, variance inflates the variability of the observed treatment group means. Large variance in a clinical trial reduces the power of the statistical tests**.

**Ways to minimize variance in a clinical trial:**

- Design considerations
- Crossover designs:

Ex. (pain score on treatment 1) – (pain score on treatment 2)

Design balances out any order effect of the treatment administration.

- Stratification and matching:

Stratification uses randomization to balance groups on a **few** characteristics.

Matching is an extreme case of stratification in which balancing is done on **several** characteristics. It is very expensive and labor intensive, and it can cause recruitment flow problems because it can be difficult to find patients if you match on too many variables.

- Increased sample size:

It increases precision of the estimates of treatment effects. It may not be feasible to recruit the necessary number of patients. It also adds expense to an already expensive process.

- Conduct considerations
- Patient selection:

Inclusion and exclusion criteria are used to make the patient sample as similar as possible. This reduces variance of the response means but also limits how far the study results generalize to the general population. Thus, we don’t know if the treatment works on those excluded, and the FDA might only approve the treatment for a specific subpopulation. The market for the product may be limited. Also, using tight inclusion/exclusion criteria can make it more difficult to recruit enough patients.

- Study Site/Investigator selection:

The sites may have different patient profiles (based on geography for example).

Site staff may have different abilities resulting in different treatment outcomes.

- Standardization of Procedures:

Written protocol

Calibration of instruments and measurement conditions

Training of personnel

Site visits

Central readings and assays (use of a central laboratory)

- Analysis Considerations: (note: to be considered in upcoming blog post)

Use of baseline covariates for adjustment (as in linear regression)

Subgroup analysis

Outlier detection and trimming procedures

- Patient Assignment Considerations: (note: to be considered in upcoming blog post)

Randomization

Stratification

Blocking


## It’s A Math, Math World (Clinical Trial Bias)

Some of the information in this article (i.e. some definitions and examples) is attributed to a lecture at Rutgers University by Adele Gilpin during the spring 2004 semester.

In any clinical trial, we want to ensure that our response is not biased. By definition, *bias is systematic error introduced into sampling or testing by selecting or encouraging one outcome over another*. If the same error occurs in the treated and untreated groups, then the study is still internally valid, but systematic error can cause an entire collection of measurements to lose their meaning.

There are 2 types of bias we will look at: (1) Treatment-related, and (2) Non-treatment related.

Treatment related bias: Systematic error that is treatment related can really harm a trial. This is a bias related to treatment assignment that affects the observed treatment differences in the trial. It is important to ensure, to the best of one’s ability, that no such bias exists *before the trial begins*.

Treatment related bias can occur in 3 ways:

- During treatment assignment: This occurs through an assignment process that allows groups to be different at baseline.
- During treatment process: If you have comparable groups, you may treat them differently other than the assigned treatment administration. For example, one group might receive systematically better care than another.
- During Measurement or Data Collection process: In terms of measurement, an investigator may listen more carefully to the heart sounds while reading the mercury column when taking BP because they expect patients in that group to have higher BP. In terms of data collection, an investigator might document adverse events more carefully in one group because he/she thinks that treatment is more dangerous than the other.

Non-treatment related bias: This is study error not related to treatment assignment. This can cause a “conservative bias” that can make it more difficult to detect a treatment effect. Conservative bias is not good for the developer of the treatment or for patients with the condition of interest because it is harder to determine whether the treatment is effective against the primary outcome.

Requirements to reduce bias:

- Establish comparable study groups that are free of selection bias.
- Use a data collection schedule in which the probability of observing an event is the same for all patients.
- Use data collection procedures that are reproducible and standardized over all treatment groups.

Methods of Bias control:

**Masking (blinding)**:

This is used to conceal the intervention assignment from the patient, the investigator, or both.

**Randomization:**

This is used to create comparable treatment and control groups.

**Standardization:**

Written treatment protocol

Tested forms and other documentation, including manuals

Written definitions of what response is

Standard equipment that is tested and calibrated

Training and certification of study personnel

**Surveillance:**

Each trial must be carefully and independently monitored to ensure protocol and regulatory compliance.


## It’s A Math, Math World (Clinical Trials)

Some of the information in this article (i.e. definitions) is from the book: Fundamentals of Clinical Trials (Friedman, Furberg, DeMets, 1998, 3rd edition)

Statistics plays a huge part in the biopharma industry, especially in clinical trials. *A clinical trial is a prospective study used to compare the effect(s) of an intervention(s) against a control in human subjects.* It is prospective rather than retrospective, meaning the subjects are followed forward in time and the data are collected during the trial. Each subject must be followed forward from a well-defined point which is designated as *time-zero* or *baseline*. This time point may be different for every subject because subjects are recruited and may enter the study at different times.

A clinical trial must test an intervention(s) which is defined as “prophylactic, diagnostic or therapeutic agent, device, regime, procedure, etc.” that is tested in an attempt to alter some aspect of the participants (study subjects). There must also be a control group to which the intervention group is compared. At baseline, the control group must be as similar as possible to the active treatment group(s) such that any effect of the intervention(s) can be measured properly and attributed to the intervention. Sometimes an investigational treatment is compared to the best intervention being used currently. Other times, if there is no “best treatment”, a non-active, inert *placebo* treatment is used instead as a control.

In our future work, we will be considering the application of clinical trials to pharmaceutical research; the testing of investigational medications in humans. The clinical trials process is in 4 phases:

**Phase 1**: This is the first phase of human drug testing. In this phase, the drug is being tested in healthy human “volunteers” (note we do not call them patients because they are not sick) to study the safety, absorption, and tolerability of the drug. Also, the investigators try to determine a safe starting dose for future research studies. This phase involves a relatively small group of people.

**Phase 2**: In this phase, the goals are to determine efficacy of the drug, whether the drug has beneficial biologic activity, and the rate of adverse events, or side effects. These trials usually occur in several hundred people who have the medical condition that the drug is intended to treat.

**Phase 3**: In this phase, the drug is tested against other drugs currently used therapeutically to treat the condition the patients have. These trials are conducted in several thousand people and occur just before the drug goes to the FDA for approval.

**Phase 4**: These trials are conducted after the drug is approved and is on the market. Sometimes, tests are done to check safety concerns or to test the drug to treat conditions it was not approved to treat. These are called *off-label indications*.

In future posts, I will examine other aspects of clinical trials and how statistics ties into them. We are done with the basic stats for now. We are diving into the applications to clinical research.
