Statistical analysis

Statistical analysis is used to calculate the mean and standard deviation, to run t-tests, and to measure correlation between data sets. We do not cover this as a specific unit; however, the information will be incorporated into our curriculum as well as your Internal Assessments. This information has been modified from the old IB Biology curriculum.
Guidance for Statistical Analysis in Practical Work
1. Importance of Statistical Analysis
- Statistical analysis helps to determine the significance of experimental results, providing insights into the reliability and validity of findings.
- It allows students to make data-driven decisions and strengthens conclusions by quantifying differences and patterns.
- Mean and Standard Deviation: Used to calculate the central tendency and dispersion of data. Essential for understanding the spread and reliability of the data.
- T-Test: Compares means between two groups to assess if they are statistically different from each other.
- Chi-Squared Test: Assesses the association between categorical variables. Useful in genetics or categorical datasets.
- ANOVA (Analysis of Variance): For comparing the means of more than two groups to see if at least one mean is different. This is especially useful in ecological studies with multiple factors.
- Correlation and Regression: Used to determine relationships between variables and predict outcomes. Helpful in environmental science and ecological studies.
- Clearly define hypotheses before testing.
- Select an appropriate statistical test based on the research question and data type.
- Ensure correct use of software or manual calculations and follow guidelines on significance thresholds (e.g., p < 0.05 for significance).
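As a rough illustration of the t-test described above, the sketch below computes a pooled-variance (Student's) t statistic for two small groups using only Python's standard library. The plant heights are made-up numbers, the function name `students_t` is hypothetical, and 2.306 is the usual two-tailed table value for 8 degrees of freedom at a 0.05 significance level. In practice a spreadsheet or statistics package would normally report the p-value directly.

```python
import math
import statistics

def students_t(group_a, group_b):
    """Return (t, degrees_of_freedom) for a pooled-variance two-sample t-test."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    # statistics.variance uses the sample (n - 1) formula.
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error, df

# Hypothetical plant heights in cm (illustrative values only).
sunlight = [12.1, 13.4, 11.8, 14.0, 12.7]
shade = [10.2, 9.8, 11.1, 10.5, 9.9]

t_value, df = students_t(sunlight, shade)
T_CRITICAL_DF8 = 2.306  # two-tailed, alpha = 0.05, df = 8 (from a t-table)
significant = abs(t_value) > T_CRITICAL_DF8
print(f"t = {t_value:.2f}, df = {df}, significant at 0.05: {significant}")
```

Comparing |t| with a table value is the manual route; it gives the same reject or fail-to-reject decision as comparing a software-reported p-value with 0.05.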
Developing Statistical Hypotheses
Overview Guide: Developing Statistical Hypotheses
- Purpose of Hypotheses in Statistical Analysis
  - Hypotheses provide a clear focus for your investigation and define the relationships or differences you expect to find.
  - They guide the choice of statistical tests and the interpretation of data.
- When to Develop a Hypothesis
  - Before Data Collection: Hypotheses should be created during the planning phase of practical work, before data collection, to ensure unbiased data analysis.
  - After Observing Patterns: If observations suggest trends, you can form hypotheses to test these patterns formally.
- Types of Hypotheses
  - Null Hypothesis (H₀): Assumes no effect or no difference; it is what you test against.
    - Example: "There is no difference in growth rate between plants in sunlight and shade."
  - Alternative Hypothesis (H₁): Suggests an effect or difference exists.
    - Example: "Plants in sunlight grow at a faster rate than those in shade."
- Key Considerations in Hypothesis Formation
  - Measurability: Hypotheses must be based on measurable data and clear variables.
  - Specificity: Be precise about the population or variables (e.g., age group, species, environmental condition).
  - Relevance: Ensure hypotheses relate directly to your research question or aim.
- Examples of Hypothesis Statements
  - Comparison: "There is a significant difference in the heart rates of students before and after exercise."
  - Relationship: "There is a positive correlation between sunlight exposure and plant height."
- Final Tips
  - Always define your hypothesis prior to analysis.
  - Align your hypothesis with the aims of your study for a focused investigation.
Understanding the P-Value, Confidence Level, and Significance
What is a P-Value?
- The p-value is a probability metric that helps determine the statistical significance of your test results.
- It indicates the likelihood of obtaining results as extreme as the observed ones, assuming the null hypothesis (H₀) is true.
Interpreting the P-Value
- Low P-Value (p < 0.05): Strong evidence against the null hypothesis. You can reject H₀, suggesting a significant effect or difference.
- High P-Value (p ≥ 0.05): Weak evidence against the null hypothesis. You do not reject H₀, suggesting that observed differences might be due to chance.
- What It Tells You: How compatible the observed results are with chance variation alone, assuming H₀ is true. It does not show the effect size or practical relevance.
Significance Threshold
- A common significance level is 0.05, meaning you accept a 5% risk of rejecting H₀ when it is actually true (a false positive). However, levels like 0.01 or 0.1 might also be used depending on the study.
Confidence Level:
- Reflects how reliably your method captures the true value (e.g., at a 95% confidence level, about 95 out of 100 confidence intervals calculated from repeated experiments would contain the true effect).
- Higher confidence levels (e.g., 99%) give more certainty but produce wider intervals unless the sample size is increased.
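To make the idea of a confidence level more concrete, here is a small simulation sketch. It repeatedly draws samples from a hypothetical normal population, builds a 95% interval around each sample mean, and counts how often the interval contains the true mean. The population mean, standard deviation, sample size and number of repeats are all assumed values chosen for illustration, and the population standard deviation is treated as known so that the simple z value of 1.96 applies.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

TRUE_MEAN = 50.0   # hypothetical population mean
TRUE_SD = 10.0     # hypothetical population SD, treated as known
SAMPLE_SIZE = 25
REPEATS = 2000
Z_95 = 1.96        # z value for a 95% confidence level

half_width = Z_95 * TRUE_SD / math.sqrt(SAMPLE_SIZE)
hits = 0
for _ in range(REPEATS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(SAMPLE_SIZE)]
    sample_mean = statistics.mean(sample)
    # Does this repeat's interval contain the true mean?
    if sample_mean - half_width <= TRUE_MEAN <= sample_mean + half_width:
        hits += 1

coverage = hits / REPEATS
print(f"Intervals containing the true mean: {coverage:.1%}")
```

The printed proportion should land close to 95%, which is what the confidence level describes: the long-run success rate of the interval procedure, not the certainty of any single interval.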
Significance Level (α):
- The threshold for determining statistical significance, often set at 0.05.
- Determines whether you reject or fail to reject H₀, based on how the p-value compares to α.
Using the P-Value in Your Analysis
- Calculate the p-value after performing your statistical test (e.g., t-test, chi-square).
- Compare the p-value to your chosen significance level to determine if your results are statistically significant.
- Example 1: You test whether two groups have different means. If p = 0.03, it’s less than 0.05, so you reject H₀ and conclude a significant difference.
- Example 2: Testing for correlation, if p = 0.08, it’s greater than 0.05, so you fail to reject H₀, indicating no significant correlation.
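The comparison in the two examples above can be written as a very small helper. This is only a sketch of the decision rule; the function name `interpret_p_value` is hypothetical, and the default of 0.05 is simply the common threshold mentioned on this page.

```python
def interpret_p_value(p_value, alpha=0.05):
    """Compare a p-value to the significance level and describe the decision."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value < alpha:
        return "reject H0"
    return "fail to reject H0"

print(interpret_p_value(0.03))  # Example 1 from the text
print(interpret_p_value(0.08))  # Example 2 from the text
```

Note that the same p-value can lead to a different decision under a stricter threshold: `interpret_p_value(0.03, alpha=0.01)` returns "fail to reject H0".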
Making Conclusions and Understanding Significance
- Statistically Significant ≠ Practically Important:
  - A significant result (low p-value) does not mean the effect is large or meaningful outside statistical context.
  - Consider the real-world implications and context when evaluating significance.
- Non-Significant Results Aren’t Always Unimportant:
  - Non-significance might suggest small sample size or high variability, not necessarily the absence of any effect.
  - Even without statistical significance, the observed trend might still have practical relevance.
  - Discuss the potential impact of the findings, regardless of p-value, and consider reporting effect size and confidence intervals for a fuller analysis.
- When Results Are Not Statistically Significant:
  - Do not assume no effect exists. Instead, consider sample size, potential experimental errors, and if further investigation is warranted.
  - Reporting non-significant findings transparently adds value to the research by documenting potential influences and insights gained.
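One way to report an effect size alongside a p-value is Cohen's d, the difference between two group means divided by their pooled standard deviation. This page does not name a particular effect-size measure, so treat the choice of Cohen's d, the function name `cohens_d` and the made-up plant heights below as assumptions for illustration.

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Difference in means expressed in units of the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / (n_a + n_b - 2)
    return (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(pooled_var)

# Hypothetical plant heights in cm (illustrative values only).
sunlight = [12.1, 13.4, 11.8, 14.0, 12.7]
shade = [10.2, 9.8, 11.1, 10.5, 9.9]

d = cohens_d(sunlight, shade)
print(f"Cohen's d = {d:.2f}")
```

Unlike a p-value, d does not shrink or grow just because the sample size changes, which is why it helps when judging whether a difference is large enough to matter in practice.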
Applying Statistical Tests and Presenting Data
- Common Tests:
  - Mean and Standard Deviation: For understanding data spread and variability.
  - T-Test: To compare means between two groups.
  - Chi-Squared Test: For assessing associations in categorical data.
  - ANOVA: For comparing means across more than two groups.
  - Correlation and Regression: For assessing relationships and predicting outcomes.
- Data Presentation:
  - Use tables for organized data display with units, averages, and uncertainties.
  - Visualize data with labeled graphs and error bars to clearly represent findings.
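As one illustration of the chi-squared test listed above, the sketch below works through the goodness-of-fit form of the test, which compares observed counts with the counts expected under a hypothesis (here a 3:1 ratio from a genetics cross). The counts are invented, the function name `chi_squared_statistic` is hypothetical, and 3.841 is the standard table value for 1 degree of freedom at a 0.05 significance level. A test of association between two categorical variables uses the same formula, with expected counts taken from a contingency table instead.

```python
def chi_squared_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical offspring counts for two phenotypes (illustrative values only).
observed = [290, 110]
total = sum(observed)
expected = [total * 3 / 4, total * 1 / 4]  # counts predicted by a 3:1 ratio

chi2 = chi_squared_statistic(observed, expected)
CHI2_CRITICAL_DF1 = 3.841  # alpha = 0.05, df = 1 (from a chi-squared table)
reject_h0 = chi2 > CHI2_CRITICAL_DF1
print(f"chi-squared = {chi2:.2f}, reject H0: {reject_h0}")
```

Here the statistic is below the critical value, so the observed counts are consistent with the 3:1 ratio and H₀ is not rejected.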
Correlation and Causation
One of the most common errors we find is the confusion between correlation and causation in science. In theory, these are easy to distinguish — an action or occurrence can cause another (such as smoking causes lung cancer), or it can correlate with another (such as smoking is correlated with alcoholism). If one action causes another, then they are most certainly correlated. But just because two things occur together does not mean that one caused the other, even if it seems to make sense.
- Correlation describes the strength and direction of a linear relationship between two variables.
  - It can be positive (y increases as x increases) or negative (y decreases as x increases).
- Causation describes the relationship between two variables, where one variable has a direct effect on another.
- Correlation does not automatically indicate causation. Just because two variables change in relation to one another does not mean that one causes the other.
  - E.g. CO2 levels and crime have both risen, but CO2 levels don't cause crime.
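The strength and direction of a linear relationship are usually summarised with Pearson's correlation coefficient r, which runs from -1 (perfect negative) to +1 (perfect positive). The sketch below computes r by hand with the standard library; the function name `pearson_r` is hypothetical and the sunlight and height values are invented for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical measurements (illustrative values only).
hours_of_sunlight = [2, 4, 6, 8, 10]
plant_height_cm = [5.1, 7.9, 11.2, 13.8, 17.0]

r = pearson_r(hours_of_sunlight, plant_height_cm)
print(f"r = {r:.3f}")
```

A value of r near +1 says the two variables rise together; it says nothing about whether one causes the other.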
Mean
The sum of all the data points divided by the number of data points.
Measure of central tendency for normally distributed data.
DO NOT calculate a mean from values that are already averages.
DO NOT calculate a mean when the measurement scale is not linear (e.g. pH units are not measured on a linear scale).
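The two warnings above can be checked with a few lines of Python. This is a sketch with invented numbers: the first part shows that a mean of group means differs from the mean of all the raw data when group sizes differ, and the second shows that averaging pH values directly differs from averaging the hydrogen ion concentrations (using pH = -log10[H+]) and converting back.

```python
import math
import statistics

# Pitfall 1: averaging values that are already averages.
group_a = [10.0, 12.0]               # hypothetical group of 2 measurements
group_b = [20.0, 22.0, 24.0, 26.0]   # hypothetical group of 4 measurements

mean_of_means = statistics.mean(
    [statistics.mean(group_a), statistics.mean(group_b)]
)
pooled_mean = statistics.mean(group_a + group_b)
print(f"mean of means = {mean_of_means}, mean of all data = {pooled_mean}")

# Pitfall 2: averaging on a non-linear (logarithmic) scale such as pH.
ph_values = [3.0, 5.0]  # hypothetical readings
naive_mean_ph = statistics.mean(ph_values)
h_concentrations = [10 ** -ph for ph in ph_values]
ph_of_mean_concentration = -math.log10(statistics.mean(h_concentrations))
print(f"mean of pH values = {naive_mean_ph}")
print(f"pH of mean [H+]   = {ph_of_mean_concentration:.2f}")
```

In both cases the shortcut gives a different answer from the calculation on the underlying data, which is why the mean should be taken from raw values on a linear scale.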
Standard Deviation
Averages do not tell us everything about a sample. Samples can be very uniform with the data all bunched around the mean or they can be spread out a long way from the mean. The statistic that measures this spread is called the standard deviation. The wider the spread of scores, the larger the standard deviation. For data that has a normal distribution, about 68% of the data lies within one standard deviation of the mean.
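The 68% figure can be checked with a quick simulation. The sketch below generates normally distributed values with an assumed mean of 100 and standard deviation of 15 (arbitrary choices for illustration) and counts the share that fall within one standard deviation of the sample mean.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical normally distributed measurements.
values = [random.gauss(100.0, 15.0) for _ in range(10_000)]

mean = statistics.mean(values)
sd = statistics.stdev(values)  # sample standard deviation (n - 1)

within_one_sd = sum(1 for v in values if abs(v - mean) <= sd) / len(values)
print(f"Share within one SD of the mean: {within_one_sd:.1%}")
```

With this many values the printed share should come out close to 68%; small samples from a real experiment will wander further from that figure.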

How to Calculate the Standard Deviation:
1. Calculate the mean (x̅) of a set of data.
2. Subtract the mean from each point of data to determine (x-x̅). You'll do this for each data point, so you'll have multiple (x-x̅).
3. Square each of the resulting numbers to determine (x-x̅)^2. As in step 2, you'll do this for each data point, so you'll have multiple (x-x̅)^2.
4. Add the values from the previous step together to get ∑(x-x̅)^2. Now you should be working with a single value.
5. Calculate (n-1) by subtracting 1 from your sample size. Your sample size is the total number of data points you collected.
6. Divide the answer from step 4 by the answer from step 5.
7. Calculate the square root of your previous answer to determine the standard deviation.
8. Report your standard deviation to the same number of decimal places as your raw data, so you may need to round your answer.
9. The standard deviation should have the same unit as the raw data you collected. For example, SD = ± 0.5 cm.
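The steps above can be sketched in Python as follows. The function name `sample_standard_deviation` and the leaf lengths are hypothetical; the result is compared with `statistics.stdev` from the standard library, which uses the same (n-1) formula.

```python
import math
import statistics

def sample_standard_deviation(data):
    """Follow the manual steps: mean, deviations, squares, sum, n-1, divide, root."""
    if len(data) < 2:
        raise ValueError("need at least two data points")
    mean = sum(data) / len(data)                     # step 1
    deviations = [x - mean for x in data]            # step 2
    squared = [d ** 2 for d in deviations]           # step 3
    sum_of_squares = sum(squared)                    # step 4
    degrees_of_freedom = len(data) - 1               # step 5
    variance = sum_of_squares / degrees_of_freedom   # step 6
    return math.sqrt(variance)                       # step 7

# Hypothetical leaf lengths in cm, recorded to one decimal place.
leaf_lengths_cm = [4.2, 4.8, 5.1, 4.5, 4.9]

sd = sample_standard_deviation(leaf_lengths_cm)
print(f"SD = {sd:.4f} cm (unrounded)")
print(f"SD = ± {sd:.1f} cm (rounded to match the raw data)")
print(f"statistics.stdev gives {statistics.stdev(leaf_lengths_cm):.4f} cm")
```

Writing the calculation out once by hand makes it easier to trust the built-in function or a spreadsheet formula afterwards.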
Where To Start
It can be very hard to know which statistical tests are the most relevant for your research question and your data. Here is a flow chart that helps guide you in the best direction.
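As a rough companion to that kind of decision process, the sketch below encodes only the test descriptions given earlier on this page (it is not a transcription of the flow chart). The function name `suggest_test` and the goal labels are hypothetical, and a real choice also depends on things like sample size and whether the data are normally distributed.

```python
def suggest_test(goal, n_groups=2):
    """Suggest a test from this page's list.

    goal: 'describe', 'compare_means', 'categorical' or 'relationship'
    n_groups: number of groups, used only when comparing means
    """
    if goal == "describe":
        return "mean and standard deviation"
    if goal == "compare_means":
        if n_groups < 2:
            raise ValueError("comparing means needs at least two groups")
        return "t-test" if n_groups == 2 else "ANOVA"
    if goal == "categorical":
        return "chi-squared test"
    if goal == "relationship":
        return "correlation and regression"
    raise ValueError(f"unknown goal: {goal!r}")

print(suggest_test("compare_means", n_groups=2))
print(suggest_test("compare_means", n_groups=4))
print(suggest_test("categorical"))
```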
Class Materials:
Error Analysis
Significant Figures
Precision Measurements and Uncertainties
Precision Lab
Topic 1 Statistics (ppt)
Biostatistics Practical Problems
Graphing In Edexcel
Graphing in Edexcel Practice problems
Standard Deviation (ppt)
Standard Deviation (notes)
Standard Deviation Practice problems
Hydroponics Standard Deviation Practice problems
t-Test (ppt)
t-Test (notes)
Correlation and Causation (ppt)
Correlation and Causation (notes)
Correlation reading
Correlations of cancer (pdf)
Data set #1 (pdf)
Data set #2 (pdf)
Data set #3 (pdf)
T-test reading
T-Testing in Biology University of
Statistics Review
Useful Links
Review of means
Click here for calculating SD with tools
Click here for Flash Card questions on Statistical Analysis
Click here for tips on Excel graphing.
“Using error bars in experimental Biology” by Geoff Cumming, Fiona Fidler, and David L. Vaux. (Journal of Cell Biology)
Are two sets of data really different? Click here to perform Student’s t-test
Click here to perform Student’s t-test via copy and paste
Example graph (from The Biology Teacher, September 2013)
Graphic Calculator Tour
Easy Calculation
Statistics calculator
MERLIN software for Excel
Chi-square calculator
Chi-square table
T-test calculator
Standard deviation reading
T-Test Table, Excel and calculations can be found here.
There are many statistical tools to establish a statistically significant correlation. Read more here or read an article about Cause and Correlation by Wisegeek here.
Difference Between Correlation and Causation article
Excellent Handbook of Biological Statistics from John McDonald
Basic Statistical Tools, from the Natural Resources Management Department
And The Little Handbook of Statistical Practice is very useful.
Sumanas statistics animations
Field Studies Council stats page, including the t-test
Open Door Website stats page and help with graphs and tables.
Making Population Pyramids on Excel
Spreadsheet Data Analysis Tutorial
Video over Table
Making Table g
Making Tables
This is an ecocolumn design you can use in the long-term IA’s 1 - from learner.org
Here’s another ecocolumn design you can use for the long-term IA project - from fastplants.org
Video Clips
Watch Hans Rosling’s brilliant Joy of Statistics here. For a short clip: