## It’s A Math, Math World (Basic Stats 101)

Today’s blog post is a “Statistics 101” aimed at the early statistics student or non-statistician. It is the first of a two-part post on basic stats: today we summarize descriptive statistics, and next week we look at inferential statistics. Let me know your opinions!

We will start with some definitions.

Statistics is the science of collecting, organizing, describing, analyzing and presenting data. It has applications in almost all fields, such as the natural sciences, social sciences, manufacturing and business. There are two main reasons statistics is applied so widely.

- Everyone has data to be analyzed. As a matter of fact, we are sometimes overwhelmed with the volume of data that we have. We need some way to interpret it.
- All aspects of academia and business are concerned with understanding and controlling *variation* in their data. In business, variation in production causes loss of revenue when substandard products are rejected in the factory or recalled from the marketplace. In research and academia, variation in data can mean that current processes are inconsistent and need to be re-tooled for more accurate results. For example, in medical research, uncontrolled variation in a clinical trial could result in unsafe products being approved due to invalid data.

This leads us to the two branches of statistics:

- Descriptive statistics
- Inferential statistics

We will look at descriptive statistics today and inferential statistics, briefly, next week. To understand this area of study, we need to understand the basic concept behind the study of statistics.

We consider a *population* of measurements (our data). For example, consider the weight of every US citizen as our population of interest. The average weight of the population is unknown and we call that a *parameter*. Theoretically, we could weigh all 300 million people in the US and average their weights, but this process would be time consuming and very expensive.

Therefore, we randomly select a representative subset of this population of measurements. This is called a *sample*. The sample size is determined in advance by statistical methodology and sometimes by the cost restrictions of the collecting organization. The term “random” is defined below and is the subject of a lot of research and debate.

For example, we select 5,000 measurements, at random, from the population of US weights and find their average. If this sample is “random” (i.e., picked such that every element of the population is equally likely to get picked), then the sample average, known as a *statistic*, should estimate the population parameter within a certain error range.
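To make the population/sample idea concrete, here is a small Python sketch. It is only an illustration: the population is simulated rather than real, and the population size (200,000), the mean of 180 pounds, and the standard deviation of 40 pounds are made-up assumptions, not actual US figures.

```python
import random
import statistics

# Fixed seed so the run is repeatable.
rng = random.Random(42)

# Hypothetical population: 200,000 simulated weights in pounds.
# The mean (180) and standard deviation (40) are assumed values for illustration.
population = [rng.gauss(180, 40) for _ in range(200_000)]

# The parameter: the true population average (unknown in a real study).
population_mean = statistics.fmean(population)

# A simple random sample: every element is equally likely to be picked.
sample = rng.sample(population, 5_000)

# The statistic: the sample average, used to estimate the parameter.
sample_mean = statistics.fmean(sample)

print(f"population mean (parameter): {population_mean:.2f}")
print(f"sample mean (statistic):     {sample_mean:.2f}")
print(f"estimation error:            {sample_mean - population_mean:+.2f}")
```

In a real study we could not compute `population_mean` at all; it appears here only so that the sample average can be compared against it.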

Getting back to descriptive statistics, we have the following:

For numeric data, the two most common descriptive statistics (i.e. statistics that describe the data) are the measures of *central tendency* and *dispersion*. The measures of central tendency describe the “center” of the data and include the *mean* and *median*. The measures of dispersion include the *range*, *variance*, and *standard deviation* and describe the “spread” of the data around the “center”.
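As a rough sketch, these measures can be computed with Python's built-in `statistics` module. The eight weights below are made-up values chosen only to keep the arithmetic easy to follow.

```python
import statistics

# Hypothetical sample of eight weights in pounds (made-up values).
weights = [150, 160, 170, 175, 180, 185, 195, 225]

# Measures of central tendency: the "center" of the data.
mean = statistics.mean(weights)
median = statistics.median(weights)

# Measures of dispersion: the "spread" of the data around the center.
data_range = max(weights) - min(weights)
variance = statistics.variance(weights)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(weights)      # square root of the sample variance

print(f"mean:               {mean}")
print(f"median:             {median}")
print(f"range:              {data_range}")
print(f"variance:           {variance:.2f}")
print(f"standard deviation: {std_dev:.2f}")
```

For this data the mean (180) sits above the median (177.5) because the single large value of 225 pulls the average upward. This is one reason both measures are usually reported together.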

Descriptive statistics also include the use of *plots* to “graphically” describe the data. For example, we often use the following types of plots:

- Box Plots
- Histograms
- Line Graphs
- Bar Charts
- Pie Charts
- Scatter Plots (for bivariate data)
- etc.
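As a minimal illustration of what a histogram does, the sketch below groups values into bins and prints one row of asterisks per bin. The data and the bin width of 20 pounds are assumptions chosen for the example; in practice a plotting library or a statistical package would draw the chart for you.

```python
from collections import Counter

# Hypothetical sample of weights in pounds (made-up values).
weights = [150, 160, 170, 175, 180, 185, 195, 225, 172, 168, 181, 190]

bin_width = 20  # assumed bin width, chosen only for illustration

# Assign each value to the lower edge of its bin, e.g. 175 falls in the 160 bin.
counts = Counter((w // bin_width) * bin_width for w in weights)

# Print one row per bin, from the lowest to the highest occupied bin.
# Empty bins in between are shown with no asterisks.
for lower in range(min(counts), max(counts) + bin_width, bin_width):
    print(f"{lower:>3}-{lower + bin_width - 1:<3} | {'*' * counts[lower]}")
```

Each asterisk is one measurement, so the longest rows show where most of the data falls.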

Please let me know your comments. I eventually want to move into Statistical applications, concepts and SAS programming areas, but thought that a solid foundation in basic stats would be a good place to start. Is this background helpful to you? Should I just move on to more advanced concepts? Please advise me. Thank you.

Michael, this is useful. I am working in the market research department of a large telecommunication company. What you have presented here is a good start. I enjoy reading all levels of statistical discussions and this is a sort of refresher. As far as I am concerned, keep going at the pace you are. I like the idea that you will be bringing in SAS. Maybe include market research as well at some point. Looking forward to next chapter.

Thanks, Jay. Should I delve into some probability also or is that too deep? Your suggestions would be helpful.

I’ll keep following this. Brings back some vague memories from a statistics class taken long, long ago! Thanks.

Hi Maureen, I will get into more advanced stuff pretty quickly. I am just “testing the waters” to see what topics seem to get “traction”. Any suggestions you have would be greatly appreciated. Thank you for following my blog. I usually post every Monday evening (Eastern Time Zone).

Hi Michael, I am enjoying the education that you are giving me. It is great that you are starting with a refresher of the basics and I look forward to expanding my understanding of statistical analysis. You are presenting the material in a basic, clear fashion. Thanks so much! (maybe you should go into teaching :-))

Hi Nancy, it is good to hear from you. Thank you for the feedback. I was undecided about the direction in which to take this blog, but I am getting great feedback and “traction” from the latest post. I may continue with “Statistics 101” and then proceed to more advanced material and some statistical programming (i.e., SAS). Thanks for following my blog!

Good to start on the ground level. Combining SAS with stats discussions will be interesting.

Michael, I would say I like the current pace. I like the idea and will definitely follow this tutorial. It would be great to have some health care examples…

Hi Irina, I will try to get some health care examples if I can keep it brief and concise. I want to keep it within grasp for everyone.

Please follow the tutorial. Pls do it on daily basis.

It is very useful.

Great site I’m happy I stumbled onto it via my friend’s blog. Going to have to add another blog to the morning routine.

Great blog! I really love how it is easy on my eyes and the facts are well written. I am wondering how I might be notified whenever a new post has been made. I have subscribed to your rss feed which ought to do the trick! Have a nice day!

Hello there… Great stuff

I do Bioinformatics, and I find this interesting as it serves as a retrospective refresher… SAS is interesting, I have been working with it but I still like to call myself a learner…

It is going to be more interesting if you could get into probabilities, probabilistic modelling and areas like that under different contexts (genetics, BI, Marketing, behavior…etc)..

Best of luck, I am an enlisting follower 🙂

I post every Monday evening. Just go to http://www.mpobrien.com for the blog site. The RSS feed should do it also.

After searching Google I found your internet site. I think both are good and I is going to be coming back again to you and them in the future. Thanks

I do try writing about that exact matter but all I can think of is stuff already written, Maybe I have writing block, how you do it?

The BEST page I have read all day!