Criterion D: Treatment of data (6)
This criterion assesses the extent to which the student has effectively communicated and processed the
data in ways that are relevant to the research question. The student should utilize techniques associated
with the appropriate experimental or social science method of inquiry.
If there is insufficient data, any treatment will be superficial. Recognize the potential for such a shortfall and revisit the method before reaching the data collection or analysis stage. Alternatively, a lack of primary data can be supplemented with secondary data from databases or simulations to provide sufficient material for analysis.
Clarifications for Treatment of Data
- Minor errors (those that do not affect the conclusion) should not prevent a report from achieving full marks for the criterion or performance level.
- Data can be primary or secondary, and qualitative or quantitative.
- Clear means that the presentation or method of processing can be understood easily, including appropriate details such as the labelling of graphs and tables or the use of units, decimal places and significant figures, where appropriate.
- The raw data presented might be a sample if there is a large amount (for example, survey results or data logging); the remaining data can be included in an appendix.
Any treatment of the data must be appropriate to the focus of the investigation in an attempt to answer your research question. The conclusions drawn must be based on the evidence from the data rather than on assumptions. Given the scope of the internal assessment and the time allocated, it is more than likely that variability in the data will lead to a tentative conclusion and may identify patterns or trends rather than establishing causal links. This should be recognized, and the extent of the variability should be considered in the conclusion.
Guidelines for Communicating Raw and Processed Data
- Labeling: Clearly label qualitative observations, such as photos or drawings.
- Concise Presentation: Keep text, tables, calculations, and graphics concise and focused.
- Scientific Units: Use correct units and symbols throughout.
- Formatting: Ensure consistent use of units, decimal places, significant figures, and include uncertainties where necessary.
- Data Clarity: Present raw and processed data clearly and directly in relation to the research question.
- Sample Calculations: Include sample calculations or relevant screenshots when appropriate.
- Graphs: Only include relevant graphs that add value, with best-fit lines or curves as needed. Avoid duplicating data across multiple graph types.
Guidelines for Units and Decimal Places
- Use of Units: Use SI units or other metric units (e.g., mL, cm³, L, dm³). Avoid non-metric units (e.g., °F, inches), and convert them if necessary.
- Decimal Places: Maintain a consistent number of decimal places based on the precision of your measurements. Minor inconsistencies are acceptable if there is an overall effort to keep raw and processed data consistent.
- Significant Figures: While not required, if using significant figures, ensure calculations are rounded and reported with consistent precision.
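To illustrate the point about consistent decimal places, here is a minimal sketch in Python; the mass values are invented placeholders, and the two-decimal-place format is just one possible choice based on the precision of the balance assumed here.

```python
# Minimal sketch: reporting measurements with a consistent number of decimal places.
# The masses are invented; 2 decimal places assumes a balance precise to 0.01 g.
raw_masses_g = [12.3456, 7.1, 9.87]

for mass in raw_masses_g:
    print(f"{mass:.2f} g")   # every value reported to the same precision
```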
Guidelines for Presenting Data
- Tables and Graphs: Ensure all data tables and graphs are clearly labeled, titled, and numbered. Include units with uncertainties in headers, and concise column/row headings.
- Raw and Processed Data: Separate tables for raw and processed data are not required. Present a representative sample for large data sets, using appendices for additional data if needed.
- Electronic and Secondary Data: Data from electronic devices (e.g., rates, automatically generated graphs) is considered raw and should be processed further. Secondary data (e.g., screenshots from databases) must be cited and can be processed to support the research question.
- Analysis and Graphing: Use processed data to identify patterns and trends. Label all graph elements (axes, legends, titles) to ensure clarity and support data analysis.
- Qualitative Observations: Include qualitative data where relevant, to provide context for raw data.
- Processing Calculations:
- Show processing results like percentages, means, or standard deviations in the relevant rows or columns.
- Use screenshots to display formulas in complex spreadsheet calculations.
- Provide worked examples for less conventional processing.
- Statistical Analysis:
- State null and alternative hypotheses when using statistical tests, even if these are calculated by software.
- When using software like MS Excel, report only the p-value and interpret it in relation to the hypotheses; other statistical details (like degrees of freedom) are handled by the program.
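As a rough illustration of the processing and statistical points above, the sketch below calculates treatment means and sample standard deviations and then runs an unpaired t-test, stating and interpreting the hypotheses. All values, treatment names and the 0.05 significance level are invented for illustration; they are not taken from any particular investigation.

```python
# Minimal sketch: summary statistics and a t-test for two hypothetical treatments.
# All values are invented for illustration only.
import numpy as np
from scipy import stats

# Hypothetical plant-height data (cm) for two light treatments
full_sun = np.array([12.1, 13.4, 11.8, 12.9, 13.0])
shade    = np.array([10.2, 11.0, 10.8,  9.9, 10.5])

for name, data in (("full sun", full_sun), ("shade", shade)):
    mean = data.mean()
    sd = data.std(ddof=1)          # sample standard deviation
    print(f"{name}: mean = {mean:.1f} cm, s = {sd:.1f} cm, n = {len(data)}")

# H0: the mean height in full sun equals the mean height in shade.
# H1: the mean heights differ.
t_stat, p_value = stats.ttest_ind(full_sun, shade)
print(f"p = {p_value:.3f}")

# Interpret the p-value against the stated hypotheses (alpha = 0.05 assumed here).
if p_value < 0.05:
    print("Reject H0: the difference in means is statistically significant.")
else:
    print("Fail to reject H0: no significant difference detected.")
```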
Guidelines for Using Statistics in Data Analysis
- Purpose of Statistics: Go beyond visual trends in graphs: use statistical tests to validate whether a trend is significant and meaningful (see the sketch after this list).
- Selecting Tests: Choose appropriate statistical tests, explain your choice, and relate it to your hypothesis. Briefly outline the hypothesis and interpret test results in the analysis.
- Statistical Protocol:
- Present null and alternative hypotheses.
- Include degrees of freedom, critical values, and probability levels to provide context for the test results.
- Key Statistical Indicators:
- Variation: Show data variability with standard deviation, standard error, trend lines, R² values, range, and error bars.
- Significance Testing: Conduct significance tests to determine the likelihood that observed trends are due to chance.
- Outliers: Identify and appropriately respond to any outlier data.
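As a sketch of testing whether an apparent trend is statistically meaningful, the example below fits a linear regression with scipy.stats.linregress and reports the slope, R² and p-value. The temperature and dissolved-oxygen values are invented placeholders, not real measurements.

```python
# Minimal sketch: is an apparent linear trend statistically significant?
# The data are invented placeholders.
from scipy import stats

temperature_c = [10, 15, 20, 25, 30, 35]               # independent variable
dissolved_o2_mg_l = [11.3, 10.1, 9.2, 8.4, 7.5, 7.0]   # dependent variable

result = stats.linregress(temperature_c, dissolved_o2_mg_l)

print(f"slope = {result.slope:.3f} mg/L per degree C")
print(f"R^2   = {result.rvalue**2:.3f}")   # how well the trend line fits the data
print(f"p     = {result.pvalue:.4f}")      # significance of the slope

# A small p-value (e.g. < 0.05) suggests the downward trend is unlikely to be
# due to chance alone; an R^2 close to 1 indicates a close fit to the line.
```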
Guidelines for Addressing Uncertainty in Measurement
- Determining Uncertainty: Base measurement uncertainty on the instrument’s precision and realistic use. For example, handheld callipers don’t need precision to 0.01 mm for plant height due to practical limitations.
- Justifying Uncertainty: Explain the chosen uncertainty level, especially when repeated measurements reveal greater uncertainty than the instrument’s precision (see the sketch after this list).
- Counting Data: Counts (±1) generally don’t need additional uncertainty notation, though derived calculations (e.g., percentage germination) may have an uncertainty margin.
- Uncertainty in Data Presentation:
- Include uncertainties in table headers along with units, unless the uncertainty varies within the column.
- For visual data, use error bars for the dependent variable in scatter plots, showing uncertainty or R² values.
- Use box-and-whisker plots when appropriate. If uncertainty is too small to see, note it in your report.
- Statistical Measures of Uncertainty:
- Statistical tests (e.g., t-test, chi-squared, ANOVA) inherently measure uncertainty through significance levels (p-values).
- For ANOVA, consider a post-hoc test (e.g., Tukey test) to assess treatment differences.
- Handling Negligible Uncertainty: If a particular uncertainty is too small to impact results, briefly justify excluding it in the analysis.
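A minimal sketch of justifying a realistic uncertainty from repeated measurements, as described above: half the range of the repeats is compared with the instrument's stated precision, and the larger of the two is reported. The readings and the 0.1 cm instrument uncertainty are invented for illustration.

```python
# Minimal sketch: choosing a realistic uncertainty for a repeated measurement.
# Readings and instrument precision are invented for illustration.
repeat_readings_cm = [14.2, 14.6, 14.3, 14.8, 14.4]   # five repeats of one plant height
instrument_uncertainty_cm = 0.1                        # e.g. a ruler read to +/- 0.1 cm

mean_cm = sum(repeat_readings_cm) / len(repeat_readings_cm)
half_range_cm = (max(repeat_readings_cm) - min(repeat_readings_cm)) / 2

# Report whichever uncertainty is larger: the spread of repeats or the instrument precision.
uncertainty_cm = max(half_range_cm, instrument_uncertainty_cm)

print(f"height = {mean_cm:.1f} +/- {uncertainty_cm:.1f} cm")
```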
Outliers
Outliers that have been identified should not be systematically removed from calculations. The impact of an outlier on the results needs to be considered. Removing outliers so that the results fit the general model “better” is not good practice.
Outliers may be identified statistically from the data. A common rule flags values that are more than 1.5 times the interquartile range below the first quartile, or more than 1.5 times the interquartile range above the third quartile. If you consider excluding these, a justification is required. This is especially true of data in ESS scientific investigations, as the sample size is usually small (n ≤ 30) or very small (n < 15). However, if observations are made that can explain why an outlier occurred, or if a weakness in the method is identified and corrected, then the student may choose to include the analysis both with and without the outliers in order to reveal their impact.
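The 1.5 × IQR rule above can be applied directly; the sketch below flags any value outside the quartile-based bounds, using invented data.

```python
# Minimal sketch: flagging outliers with the 1.5 x IQR rule (invented data).
import numpy as np

values = np.array([4.1, 4.3, 4.0, 4.4, 4.2, 4.5, 6.8])  # 6.8 looks suspicious

q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

outliers = values[(values < lower) | (values > upper)]
print(f"bounds: {lower:.2f} to {upper:.2f}, outliers: {outliers}")
```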
Outliers are most likely to occur as the result of human error, methodological flaws or irregularity in the equipment or environment. The quantity in question can be re-measured. The scientific method requires rigour and integrity in gathering data, while the IB requires academic integrity from students. Both of these are more important than attempts to make data appear consistent. Although there is no single agreed-upon method for rejecting outliers, common sense and careful analysis are always helpful.
Data points that produce zero results can sometimes be considered outliers; this depends on the experiment being conducted. Any seed germination/growth experiment should include a viability test on the seeds to check what percentage will not germinate, and this needs to be considered in the data processing. For example, if 10% of the seeds never germinate, then when one-tenth of the seeds in a treatment do not germinate, this is expected and that data point can be removed. If half of the seeds in a ten-seed treatment did not germinate, one data point can be removed, but the other four should be included. The approach must be consistent across all treatments.
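As a minimal sketch of the viability adjustment just described, assuming ten seeds per treatment and a 10% failure rate from the viability test (both invented), non-germinating seeds up to the expected number are discounted, and only the excess is kept as genuine zero results.

```python
# Minimal sketch: discounting expected non-viable seeds consistently across treatments.
# Counts and the 10% viability-failure rate are invented for illustration.
seeds_per_treatment = 10
viability_failure_rate = 0.10                                            # from the viability test
expected_failures = round(seeds_per_treatment * viability_failure_rate)  # = 1 seed

non_germinating = {"control": 1, "low salinity": 2, "high salinity": 5}

for treatment, failures in non_germinating.items():
    genuine_zeros = max(0, failures - expected_failures)
    print(f"{treatment}: {failures} did not germinate, "
          f"{genuine_zeros} counted as genuine zero results")
```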
Guidelines for Data Processing
- Efficient Processing: Present data processing clearly and appropriately for the topic, ensuring it supports the research question.
- Data Processing Techniques: Use relevant tools or statistical techniques based on the data and research question. Include clear graphing, with titles, labels, scales, and realistic trend lines as needed.
- Adequate Data for Trends: Ensure there is enough data to identify trends, enabling conclusions. Insufficient data may require supplementation from secondary sources.
- Graphing Raw vs. Processed Data: Graphing raw data is part of processing and helps derive values (e.g., gradients). However, avoid duplicating data unnecessarily across different graph types.
- Choosing Graph Types: Choose graphs suited to data type. For example, order bar graphs by a meaningful criterion. For continuous variables, a trend line with error bars can help illustrate relationships or patterns.
- Trend Lines and Correlation:
- Use trend lines only if justified by data. Add R² values to show fit quality.
- Avoid trend lines for nominal data (e.g., unranked city districts); instead, order data if appropriate (e.g., by income levels).
- Correlation coefficient (r) and coefficient of determination (R²) help assess linear relationships, but interpret these values carefully.
- Standard Deviation and Standard Error:
- Use standard deviation for normally distributed data to show variation around the mean.
- Calculate standard error for larger samples (n > 30) to reflect reliability.
- Smaller sample sizes (n < 10) are generally inadequate for t-tests or similar tests.
- Error Bars:
- Error bars indicating the highest and lowest values around the mean help show data variation (see the sketch after this list).
- Overlapping error bars can indicate insufficient distinction between data points.
- Add R² values to graphs with trend lines to indicate how well the line represents the data.
- Sample Size for Statistical Tests:
- Large samples (n > 30) provide reliable results for most tests.
- Smaller samples (5–14) are limited, though some tests (e.g., Mann-Whitney U) can handle small samples.
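Bringing together the points above about error bars and small-sample tests, the sketch below plots two small samples as means with standard-deviation error bars and then applies a Mann-Whitney U test, which copes with small samples. The site names and dissolved-oxygen values are invented placeholders.

```python
# Minimal sketch: error bars for two small samples plus a Mann-Whitney U test.
# All values are invented placeholders.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

upstream   = np.array([7.2, 6.9, 7.5, 7.1, 6.8])   # e.g. dissolved oxygen / mg per L
downstream = np.array([5.9, 6.3, 5.7, 6.1, 6.0])

labels = ["upstream", "downstream"]
means = [upstream.mean(), downstream.mean()]
sds = [upstream.std(ddof=1), downstream.std(ddof=1)]

# Bar chart of means with standard-deviation error bars.
plt.bar(labels, means, yerr=sds, capsize=5)
plt.ylabel("Dissolved oxygen / mg per L")
plt.title("Mean dissolved oxygen by site (error bars: +/- 1 standard deviation)")
plt.savefig("error_bars.png")

# Mann-Whitney U test: suitable for small samples that may not be normally distributed.
# H0: the two sites come from the same distribution.
u_stat, p_value = stats.mannwhitneyu(upstream, downstream)
print(f"U = {u_stat:.1f}, p = {p_value:.3f}")
```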