These
are small files written in Microsoft Excel 97, PowerPoint, and Word. They
demonstrate various common concepts of statistics or help you to perform
certain kinds of calculations. You can download any of them by
clicking on the appropriate item below. You will then get a
chance to save the compressed file(s) and to specify where on your computer's
hard drive to save them. Be sure to put them somewhere you can find
them later!
Sample size estimation is a common
problem when designing a study. The pages in this file will help you with
studies involving comparison of counts, means, or proportions. It is
important to have an idea about what constitutes a clinically important
difference in outcomes. For example, if the main outcome variable of your
study is systolic blood pressure, is it important to doctors/journal readers to
be able to detect an improvement of 1, 2, 5, 10, or 20 Torr? It is also important to have some idea about
the variability you are likely to observe in the data you will eventually
collect. This is most often obtained from previous studies like
yours. For our blood pressure example,
you should search for and obtain three to five papers that report means
and standard deviations of systolic blood pressure in their results sections, like this one: [Pesola GR, Pesola HR, Nelson MJ,
Westfal RE (January 2001). "The
normal difference in bilateral indirect blood pressure recordings in
normotensive individuals". American Journal of Emergency
Medicine 19 (1): 43–5.] You should always obtain statistical
consultation on these questions before you begin collecting
data.
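As a rough illustration of how the clinically important difference and the expected variability combine, here is a minimal Python sketch of the usual normal-approximation formula for comparing two means with equal group sizes. It is not the calculation used in the spreadsheet. The function name, the 5% two-sided significance level, the 80% power, and the standard deviation of 15 are all assumptions chosen for the example; take your own standard deviation from the papers you collect.

```python
import math
from statistics import NormalDist


def n_per_group_two_means(delta, sd, alpha=0.05, power=0.80):
    """Approximate subjects needed per group to detect a difference `delta`
    between two means (two-sided test, equal standard deviation `sd`,
    equal group sizes, normal approximation)."""
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # critical value for the test
    z_beta = z.inv_cdf(power)             # value corresponding to the power
    return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


# sd=15 is an assumed value for illustration only, not taken from any study.
for delta in (1, 2, 5, 10, 20):
    print(delta, n_per_group_two_means(delta, sd=15))
```

Halving the difference you want to detect roughly quadruples the required sample size, which is why deciding on the clinically important difference matters so much.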
Frequency distributions are one of the most
important tools used in the analysis of experimental data. Many statistical
tests are carried out by choosing a frequency distribution that closely matches
the distribution of your observations. The mathematical properties of the
matched frequency distribution are then used to calculate the probability of
observing data like yours just by chance. Frequency distributions are also used
to estimate confidence intervals. Choosing the most appropriate distribution to
represent your data often requires help from an experienced
statistician.
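To make the idea of "observing data like yours just by chance" concrete, here is a small Python sketch that uses the binomial distribution. The scenario is invented for illustration: 9 of 10 patients improve when pure chance would give each patient a 50% probability of improving. The function name is also just an assumption for this example.

```python
from math import comb


def binomial_upper_tail(successes, trials, p):
    """Probability of seeing `successes` or more out of `trials`
    when each trial succeeds independently with probability `p`."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical data: 9 improvements in 10 patients, chance probability 0.5.
p_value = binomial_upper_tail(9, 10, 0.5)
print(f"{p_value:.4f}")
```

The result is 11/1024, or about 0.011, so a result this extreme would be unusual if chance alone were at work.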
The frequency distributions most commonly used in
biostatistics include the Student’s t, normal, chi-square, binomial, gamma, and
F distributions. Many others are available to model certain types of
observations. When a frequency distribution is scaled so that the total area
under its curve equals one, it is called a "probability density
function." These curves are mathematically complicated and their values
are usually obtained from a table or calculated with a computer.
Each distribution has one or more parameters that
are used to set its center, shape, degree of asymmetry, and other properties.
For example, the parameters of the normal distribution are the mean
and the standard deviation. These two parameters uniquely specify its center
and shape. (The standard normal distribution is the special case with a mean
of 0 and a standard deviation of 1.)
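The following Python sketch illustrates both points for the normal distribution: the mean and standard deviation fix the curve, and the area under the curve is one whatever values they take. It uses `NormalDist` from the Python standard library and a simple trapezoid rule. The helper name and the two parameter pairs are assumptions chosen for the example.

```python
from statistics import NormalDist


def area_under_pdf(dist, lo, hi, steps=10_000):
    """Approximate the area under dist.pdf between lo and hi
    with the trapezoid rule."""
    width = (hi - lo) / steps
    total = 0.5 * (dist.pdf(lo) + dist.pdf(hi))
    for i in range(1, steps):
        total += dist.pdf(lo + i * width)
    return total * width


# Two illustrative parameter choices: the standard normal, and a curve
# centred at 120 with a standard deviation of 15 (made-up values).
for mean, sd in [(0, 1), (120, 15)]:
    dist = NormalDist(mu=mean, sigma=sd)
    area = area_under_pdf(dist, mean - 8 * sd, mean + 8 * sd)
    print(f"mean={mean}, sd={sd}: peak height={dist.pdf(mean):.4f}, area={area:.6f}")
```

Changing the mean slides the curve left or right, and increasing the standard deviation makes it wider and lower, but the total area stays at one.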
This spreadsheet demonstrates several distributions
that are commonly used in biostatistics. The parameters are
adjustable with ‘scrollbars’, and the graphs of the distributions are drawn so
that you can get an idea about the effect of different combinations of
parameters. You can also use the spreadsheet in place of a table of the
distribution. The values generated are accurate to about 10 decimal
places.
Statistical Frequency Distributions
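If you do not have the spreadsheet to hand, the same kind of table lookup can be sketched in Python for the normal distribution. This is an illustration only, not a description of how the spreadsheet works. The cutoff of 1.96 and the probability of 0.975 are just familiar example values.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Table lookup in one direction: the area above a given z value.
upper_tail = 1 - standard_normal.cdf(1.96)

# Table lookup in the other direction: the z value below which
# 97.5% of the distribution lies.
critical_z = standard_normal.inv_cdf(0.975)

print(f"P(Z > 1.96) = {upper_tail:.4f}")
print(f"z for 97.5% = {critical_z:.4f}")
```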
Confidence intervals and hypothesis tests are
ways of describing your uncertainty about your findings. Understanding
the standard error of the mean and the central limit theorem will help you
better understand these concepts.
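A short simulation can make the standard error of the mean and the central limit theorem less abstract. The Python sketch below repeatedly draws samples from a strongly skewed (exponential) population with mean 1 and standard deviation 1, then compares the spread of the sample means with the theoretical standard error, the standard deviation divided by the square root of the sample size. The sample size, number of repetitions, and random seed are arbitrary choices for this example.

```python
import math
import random
import statistics


def sample_means(n, repetitions, rng):
    """Draw `repetitions` samples of size `n` from an exponential
    population (mean 1, standard deviation 1); return each sample's mean."""
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(repetitions)
    ]


rng = random.Random(12345)   # fixed seed so the run is repeatable
n = 30
means = sample_means(n, repetitions=5000, rng=rng)

observed_se = statistics.stdev(means)   # spread of the sample means
theoretical_se = 1.0 / math.sqrt(n)     # population sd / sqrt(n)

print(f"mean of sample means: {statistics.fmean(means):.3f}")
print(f"observed SE: {observed_se:.3f}  theoretical SE: {theoretical_se:.3f}")
```

Although the population itself is far from normal, a histogram of `means` would look roughly bell-shaped, which is what justifies normal-based confidence intervals for a mean.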
Comparing the means or proportions from more than
two groups means that there is more than one possible two-way comparison. As the number of groups increases, the
number of possible two-way comparisons increases rapidly. Adjustments in
your statistical procedures should take this fact into account to avoid
underestimation of your experiment-wise Type I error rate. Statistical
consultation should always be obtained to avoid making this type of mistake in
the statistical interpretation of your data.
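The arithmetic behind this warning is easy to sketch in Python. With k groups there are k(k-1)/2 two-way comparisons. If each is tested at the 5% level and the tests are treated as independent (a simplifying assumption), the chance of at least one false positive across the whole experiment grows quickly. The Bonferroni adjustment shown here is only one simple correction, used as an example; the function names are made up for this illustration.

```python
from math import comb


def pairwise_comparisons(groups):
    """Number of distinct two-way comparisons among `groups` groups."""
    return comb(groups, 2)


def experimentwise_error(comparisons, alpha=0.05):
    """Chance of at least one false positive across all comparisons,
    assuming the tests are independent (a simplification)."""
    return 1 - (1 - alpha) ** comparisons


def bonferroni_alpha(comparisons, alpha=0.05):
    """Per-comparison significance level under the Bonferroni adjustment."""
    return alpha / comparisons


for groups in (2, 3, 5, 10):
    m = pairwise_comparisons(groups)
    print(
        f"{groups} groups: {m} comparisons, "
        f"experiment-wise error {experimentwise_error(m):.3f}, "
        f"Bonferroni level {bonferroni_alpha(m):.4f}"
    )
```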