Statistical Thinking: A Simulation Approach to Modeling Uncertainty (UM STAT 216 edition)
3.6 Causation and Random Assignment
Medical researchers may be interested in showing that a drug helps improve people’s health (the cause of improvement is the drug), while educational researchers may be interested in showing a curricular innovation improves students’ learning (the curricular innovation causes improved learning).
To attribute a causal relationship, there are three criteria a researcher needs to establish:
- Association of the Cause and Effect: There needs to be an association between the cause and effect.
- Timing: The cause needs to happen BEFORE the effect.
- No Plausible Alternative Explanations: ALL other possible explanations for the effect need to be ruled out.
Please read more about each of these criteria at the Web Center for Social Research Methods.
The third criterion can be quite difficult to meet. To rule out ALL other possible explanations for the effect, we want to compare the world with the cause applied to the world without the cause. In practice, we do this by comparing two different groups: a “treatment” group that gets the cause applied to them, and a “control” group that does not. To rule out alternative explanations, the groups need to be “identical” with respect to every possible characteristic (aside from the treatment) that could explain differences. This way the only characteristic that will be different is that the treatment group gets the treatment and the control group doesn’t. If there are differences in the outcome, then it must be attributable to the treatment, because the other possible explanations are ruled out.
So, the key is to make the control and treatment groups “identical” when you are forming them. One thing that makes this task (slightly) easier is that they don’t have to be exactly identical, only probabilistically equivalent. This means, for example, that if you were matching groups on age, you wouldn’t need the two groups to have identical age distributions; they would only need to have roughly the same AVERAGE age. Here roughly means “the average ages should be the same within what we expect because of sampling error.”
Now we just need to create the groups so that they have, on average, the same characteristics … for EVERY POSSIBLE CHARACTERISTIC that could explain differences in the outcome.
It turns out that creating probabilistically equivalent groups is a really difficult problem. One method that works pretty well for doing this is to randomly assign participants to the groups. This works best when you have large sample sizes, but even with small sample sizes random assignment has the advantage of at least removing the systematic bias between the two groups (any differences are due to chance and will probably even out between the groups). As Wikipedia’s page on random assignment points out,
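In the spirit of this book’s simulation approach, the idea can be sketched in a few lines of Python. The ages below are invented for illustration: randomly splitting a pool of participants into two groups produces groups whose average ages differ only by chance.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical participant pool: 200 invented ages.
ages = [random.gauss(35, 10) for _ in range(200)]

# Random assignment: shuffle, then split down the middle.
random.shuffle(ages)
treatment, control = ages[:100], ages[100:]

# The groups are probabilistically equivalent: the difference in
# average age is small and attributable only to chance.
diff = statistics.mean(treatment) - statistics.mean(control)
print(round(diff, 2))
```

With 100 people per group, the chance difference in average age is on the order of a year or two, which is exactly the “within sampling error” sense of roughly equal described above.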
Random assignment of participants helps to ensure that any differences between and within the groups are not systematic at the outset of the experiment. Thus, any differences between groups recorded at the end of the experiment can be more confidently attributed to the experimental procedures or treatment. … Random assignment does not guarantee that the groups are matched or equivalent. The groups may still differ on some preexisting attribute due to chance. The use of random assignment cannot eliminate this possibility, but it greatly reduces it.
We use the term internal validity to describe the degree to which cause-and-effect inferences are accurate and meaningful. Causal attribution is the goal for many researchers. Thus, by using random assignment we have a pretty high degree of evidence for internal validity; we have a much higher belief in causal inferences. Much like evidence used in a court of law, it is useful to think about validity evidence on a continuum. For example, a visualization of the internal validity evidence for a study that employed random assignment in the design might be:
The degree of internal validity evidence is high (in the upper-third). How high depends on other factors such as sample size.
To learn more about random assignment, you can read the following:
- The research report, Random Assignment Evaluation Studies: A Guide for Out-of-School Time Program Practitioners
3.6.1 Example: Does sleep deprivation cause a decrease in performance?
Let’s consider the criteria with respect to the sleep deprivation study we explored in class.
3.6.1.1 Association of cause and effect
First, we ask, Is there an association between the cause and the effect? In the sleep deprivation study, we would ask, “Is sleep deprivation associated with a decrease in performance?”
This is what a hypothesis test helps us answer! If the result is statistically significant, then we have an association between the cause and the effect. If the result is not statistically significant, then there is not sufficient evidence for an association between cause and effect.
In the case of the sleep deprivation experiment, the result was statistically significant, so we can say that sleep deprivation is associated with a decrease in performance.
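In keeping with the simulation approach used throughout this book, such a hypothesis test can be sketched as a permutation test. The performance scores below are invented for illustration; they are NOT the actual sleep deprivation study data.

```python
import random
import statistics

random.seed(2)  # fixed seed for reproducibility

# Invented performance scores (NOT the actual study data).
deprived = [5, 8, 10, 12, 13, 14, 17, 18, 21, 26]
rested = [12, 15, 17, 20, 22, 24, 25, 26, 29, 34]

observed = statistics.mean(rested) - statistics.mean(deprived)

# Permutation test: if sleep deprivation had no effect, the group
# labels would be arbitrary, so reshuffle them many times and count
# how often chance alone produces a difference at least this large.
pooled = deprived + rested
extreme = 0
trials = 5000
for _ in range(trials):
    random.shuffle(pooled)
    diff = statistics.mean(pooled[:10]) - statistics.mean(pooled[10:])
    if diff >= observed:
        extreme += 1

p_value = extreme / trials
print(p_value)  # a small p-value is evidence of an association
```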
3.6.1.2 Timing
Second, we ask, Did the cause come before the effect? In the sleep deprivation study, the answer is yes. The participants were sleep deprived before their performance was tested. It may seem like this is a silly question to ask, but as the link above describes, it is not always so clear to establish the timing. Thus, it is important to consider this question any time we are interested in establishing causality.
3.6.1.3 No plausible alternative explanations
Finally, we ask, Are there any plausible alternative explanations for the observed effect? In the sleep deprivation study, we would ask, “Are there plausible alternative explanations for the observed difference between the groups, other than sleep deprivation?” Because this is a question about plausibility, human judgment comes into play. Researchers must make an argument about why there are no plausible alternatives. As described above, a strong study design can help to strengthen the argument.
At first, it may seem like there are a lot of plausible alternative explanations for the difference in performance. There are a lot of things that might affect someone’s performance on a visual task! Sleep deprivation is just one of them! For example, artists may be more adept at visual discrimination than other people. This is an example of a potential confounding variable. A confounding variable is a variable that might affect the results, other than the causal variable that we are interested in.
Here’s the thing though. We are not interested in figuring out why any particular person got the score that they did. Instead, we are interested in determining why one group was different from another group. In the sleep deprivation study, the participants were randomly assigned. This means that there is no systematic difference between the groups, with respect to any confounding variables. Yes—artistic experience is a possible confounding variable, and it may be the reason why two people score differently. BUT: There is no systematic difference between the groups with respect to artistic experience, and so artistic experience is not a plausible explanation as to why the groups would be different. The same can be said for any possible confounding variable. Because the groups were randomly assigned, it is not plausible to say that the groups are different with respect to any confounding variable. Random assignment helps us rule out plausible alternatives.
3.6.1.4 Making a causal claim
Now, let’s see about making a causal claim for the sleep deprivation study:
- Association: There is a statistically significant result, so the cause is associated with the effect
- Timing: The participants were sleep deprived before their performance was measured, so the cause came before the effect
- Plausible alternative explanations: The participants were randomly assigned, so the groups are not systematically different on any confounding variable. The only systematic difference between the groups was sleep deprivation. Thus, there are no plausible alternative explanations for the difference between the groups, other than sleep deprivation
Thus, the internal validity evidence for this study is high, and we can make a causal claim. For the participants in this study, we can say that sleep deprivation caused a decrease in performance.
Key points: Causation and internal validity
To make a cause-and-effect inference, you need to consider three criteria:
- Association of the Cause and Effect: There needs to be an association between the cause and effect. This can be established by a hypothesis test.
Random assignment removes any systematic differences between the groups (other than the treatment), and thus helps to rule out plausible alternative explanations.
Internal validity describes the degree to which cause-and-effect inferences are accurate and meaningful.
Confounding variables are variables that might affect the results, other than the causal variable that we are interested in.
Probabilistic equivalence means that there is not a systematic difference between groups. The groups are the same on average.
How can we make "equivalent" experimental groups?
The Definition of Random Assignment According to Psychology
Kendra Cherry, MS, is a psychosocial rehabilitation specialist, psychology educator, and author of the "Everything Psychology Book."
Random assignment refers to the use of chance procedures in psychology experiments to ensure that each participant has the same opportunity to be assigned to any given group in a study, eliminating any potential bias in the experiment at the outset. Participants are randomly assigned to different groups, such as the treatment group versus the control group. In clinical research, randomized clinical trials are known as the gold standard for meaningful results.
Simple random assignment techniques might involve tactics such as flipping a coin, drawing names out of a hat, rolling dice, or assigning random numbers to a list of participants. It is important to note that random assignment differs from random selection.
While random selection refers to how participants are randomly chosen from a target population as representatives of that population, random assignment refers to how those chosen participants are then assigned to experimental groups.
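The distinction can be made concrete with a short Python sketch; the participant labels and sample sizes are hypothetical:

```python
import random

random.seed(3)  # fixed seed for reproducibility

# Hypothetical sampling frame for the target population.
population = [f"person_{i}" for i in range(1000)]

# Random SELECTION: chance decides who is in the study at all.
sample = random.sample(population, 40)

# Random ASSIGNMENT: chance decides which group each chosen
# participant ends up in.
random.shuffle(sample)
treatment_group, control_group = sample[:20], sample[20:]
```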
Random Assignment In Research
To determine if changes in one variable will cause changes in another variable, psychologists must perform an experiment. Random assignment is a critical part of the experimental design that helps ensure the reliability of the study outcomes.
Researchers often begin by forming a testable hypothesis predicting that one variable of interest will have some predictable impact on another variable.
The variable that the experimenters will manipulate in the experiment is known as the independent variable, while the variable that they will then measure for different outcomes is known as the dependent variable. While there are different ways to look at relationships between variables, an experiment is the best way to get a clear idea if there is a cause-and-effect relationship between two or more variables.
Once researchers have formulated a hypothesis, conducted background research, and chosen an experimental design, it is time to find participants for their experiment. How exactly do researchers decide who will be part of an experiment? As mentioned previously, this is often accomplished through something known as random selection.
In order to generalize the results of an experiment to a larger group, it is important to choose a sample that is representative of the qualities found in that population. For example, if the total population is 60% female and 40% male, then the sample should reflect those same percentages.
Choosing a representative sample is often accomplished by randomly picking people from the population to be participants in a study. Random selection means that everyone in the group stands an equal chance of being chosen to minimize any bias. Once a pool of participants has been selected, it is time to assign them to groups.
By randomly assigning the participants into groups, the experimenters can be fairly sure that each group will have the same characteristics before the independent variable is applied.
Participants might be randomly assigned to the control group, which does not receive the treatment in question. The control group may receive a placebo or receive the standard treatment. Participants may also be randomly assigned to the experimental group, which receives the treatment of interest. In larger studies, there can be multiple treatment groups for comparison.
There are simple methods of random assignment, like rolling dice. However, there are also more complex techniques that involve random number generators to remove any human error.
There can also be random assignment to groups with pre-established rules or parameters. For example, if you want to have an equal number of men and women in each of your study groups, you might separate your sample into two groups (by sex) before randomly assigning each of those groups into the treatment group and control group.
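This blocked (stratified) random assignment can be sketched as follows; the participant labels are hypothetical:

```python
import random

random.seed(4)  # fixed seed for reproducibility

# Hypothetical sample, already separated by sex.
men = [f"man_{i}" for i in range(20)]
women = [f"woman_{i}" for i in range(20)]

treatment, control = [], []
# Randomize within each stratum so both groups end up with
# exactly 10 men and 10 women.
for stratum in (men, women):
    random.shuffle(stratum)
    treatment.extend(stratum[:10])
    control.extend(stratum[10:])
```

Randomizing within strata guarantees balance on the stratifying variable while leaving everything else to chance.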
Random assignment is essential because it increases the likelihood that the groups are the same at the outset. With all characteristics being equal between groups, other than the application of the independent variable, any differences found between group outcomes can be more confidently attributed to the effect of the intervention.
Example of Random Assignment
Imagine that a researcher is interested in learning whether or not drinking caffeinated beverages prior to an exam will improve test performance. After randomly selecting a pool of participants, each person is randomly assigned to either the control group or the experimental group.
The participants in the control group consume a placebo drink prior to the exam that does not contain any caffeine. Those in the experimental group, on the other hand, consume a caffeinated beverage before taking the test.
Participants in both groups then take the test, and the researcher compares the results to determine if the caffeinated beverage had any impact on test performance.
A Word From Verywell
Random assignment plays an important role in the psychology research process. Not only does this process help eliminate possible sources of bias, but it also makes it easier to generalize the results of a tested sample of participants to a larger population.
Random assignment helps ensure that members of each group in the experiment are the same, which means that the groups are also likely more representative of what is present in the larger population of interest. Through the use of this technique, psychology researchers are able to study complex phenomena and contribute to our understanding of the human mind and behavior.
Dtsch Arztebl Int. 2020 Feb; 117(7).
Methods for Evaluating Causality in Observational Studies
1 Institute for Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center of the Johannes Gutenberg University of Mainz
2 Institute of Clinical Physiology of the Italian National Research Council, Lecce, Italy
3 Technical University Dresden, University Hospital Carl Gustav Carus, Medical Clinic 1, Dresden
4 Department of Pediatric Surgery, Faculty of Medicine, Johannes Gutenberg University of Mainz
5 Institute of Genetic Epidemiology, Helmholtz Zentrum München—German Research Center for Environmental Health, Neuherberg; Chair of Genetic Epidemiology, Institute for Medical Information Processing, Biometry, and Epidemiology, Ludwig-Maximilians-Universität, München
In clinical medical research, causality is demonstrated by randomized controlled trials (RCTs). Often, however, an RCT cannot be conducted for ethical reasons, and sometimes for practical reasons as well. In such cases, knowledge can be derived from an observational study instead. In this article, we present two methods that have not been widely used in medical research to date.
The methods of assessing causal inferences in observational studies are described on the basis of publications retrieved by a selective literature search.
Two relatively new approaches—regression-discontinuity methods and interrupted time series—can be used to demonstrate a causal relationship under certain circumstances. The regression-discontinuity design is a quasi-experimental approach that can be applied if a continuous assignment variable is used with a threshold value. Patients are assigned to different treatment schemes on the basis of the threshold value. For assignment variables that are subject to random measurement error, it is assumed that, in a small interval around a threshold value, e.g., cholesterol values of 160 mg/dL, subjects are assigned essentially at random to one of two treatment groups. If patients with a value above the threshold are given a certain treatment, those with values below the threshold can serve as control group. Interrupted time series are a special type of regression-discontinuity design in which time is the assignment variable, and the threshold is a cutoff point. This is often an external event, such as the imposition of a smoking ban. A before-and-after comparison can be used to determine the effect of the intervention (e.g., the smoking ban) on health parameters such as the frequency of cardiovascular disease.
The approaches described here can be used to derive causal inferences from observational studies. They should only be applied after the prerequisites for their use have been carefully checked.
The fact that correlation does not imply causality was frequently mentioned in 2019 in the public debate on the effects of diesel emission exposure ( 1 , 2 ). This truism is well known and generally acknowledged. A more difficult question is how causality can be unambiguously defined and demonstrated ( box 1 ) . According to the eighteenth-century philosopher David Hume, causality is present when two conditions are satisfied: 1) B always follows A—in which case, A is called a “sufficient cause” of B; 2) if A does not occur, then B does not occur—in which case, A is called a “necessary cause” of B ( 3 ). These strict logical criteria are only rarely met in the medical field. In the context of exposure to diesel emissions, they would be met only if fine-particle exposure always led to lung cancer, and lung cancer never occurred without prior fine-particle exposure. Of course, neither of these is true. So what is biological, medical, or epidemiological causality? In medicine, causality is generally expressed in probabilistic terms, i.e. exposure to a risk factor such as cigarette smoking or diesel emissions increases the probability of a disease, e.g., lung cancer. The same understanding of causality applies to the effects of treatment: for instance, a certain type of chemotherapy increases the likelihood of survival in patients with a diagnosis of cancer, but does not guarantee it.
Causality in epidemiological observational studies (modified from Parascondola and Weed)
- Causality as production: A produces B. Causality is to be distinguished from mere temporal sequence. It does not suffice to note that A is always followed by B; rather, A must in some way produce, lead to, or create B. However, it remains unclear what ‘producing’, ‘leading to’, or ‘creating’ exactly means. On a practical level, the notion of production is what is illustrated in the diagrams of cause-and-effect relationships that are commonly seen in medical publications.
- Sufficient and necessary causes: A is a sufficient cause of B if B always happens when A has happened. A is a necessary cause of B if B only happens when A has happened. Although these relationships are logically clear and seemingly simple, this type of deterministic causality is hardly ever found in real-life scientific research. Thus, smoking is neither a sufficient nor a necessary cause of lung cancer. Smoking is not always followed by lung cancer (not a sufficient cause), and lung cancer can occur in the absence of tobacco exposure (not a necessary cause, either).
- Sufficient component cause: This notion was developed in response to the definitions of sufficient and necessary causes. In this approach, it is assumed that multiple causes act together to produce an effect where no single one of them could do so alone. There can also be different combinations of causes that produce the same effect.
- Probabilistic causality: In this scenario, the cause (A) increases the probability (P) that the effect (B) will occur: in symbols, P(B | A) > P(B | not A). Sufficient and necessary causes, as defined above in ( 2 ), are only those extreme cases in which P(B | A) = 1 and P(B | not A) = 0, respectively. When these probabilities take on values that are neither 0 nor 1, causality is no longer deterministic, but rather probabilistic (stochastic). There is no assumption that a cause must be followed by an effect. This viewpoint corresponds to the method of proceeding in statistically oriented scientific disciplines.
- Causal inference: This is the determination that a causal relationship exists between two types of event. Causal inferences are made by analyzing the changes in the effect that arise when there are changes in the cause. Causal inference goes beyond the mere assertion of an association and is connected to a number of specific concepts: some that have been widely discussed recently are counterfactuals, potential outcomes, causal diagrams, and structural equation models ( 36 , 37 ).
- Triangulation: Not all questions can be answered with an experiment or a randomized controlled trial. Alternatively, methodological pluralism is needed, or, as it is now sometimes called, triangulation: confidence in a finding increases when the same finding is arrived at from multiple data sets, multiple scientific disciplines, multiple theories, and/or multiple methods ( 35 ).
- The criterion of consequentiality: The claim that a causal relationship exists has consequences on a societal level (taking action or not taking action). Olsen has called for the formulation of a criterion to determine when action should be taken and when not ( 7 ).
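The probabilistic criterion from the list above, P(B | A) > P(B | not A), amounts to comparing two conditional proportions. A minimal sketch with invented counts:

```python
# Invented 2x2 counts for illustration: exposure A, disease B.
cases_exposed, total_exposed = 30, 100
cases_unexposed, total_unexposed = 10, 100

p_b_given_a = cases_exposed / total_exposed          # P(B | A) = 0.30
p_b_given_not_a = cases_unexposed / total_unexposed  # P(B | not A) = 0.10

# Probabilistic causality requires P(B | A) > P(B | not A).
cause_raises_probability = p_b_given_a > p_b_given_not_a
print(cause_raises_probability)  # prints: True
```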
In many scientific disciplines, causality must be demonstrated by an experiment. In clinical medical research, this purpose is achieved with a randomized controlled trial (RCT) ( 4 ). An RCT, however, often cannot be conducted for either ethical or practical reasons. If a risk factor such as exposure to diesel emissions is to be studied, persons cannot be randomly allocated to exposure or non-exposure. Nor is any randomization possible if the research question is whether or not an accident associated with an exposure, such as the Chernobyl nuclear reactor disaster, increased the frequency of illness or death. The same applies when a new law or regulation, e.g., a smoking ban, is introduced.
When no experiment can be conducted, observational studies need to be performed. The object under study—i.e., the possible cause—cannot be varied in a targeted and controlled way; instead, the effect this factor has on a target variable, such as a particular illness, is observed and documented.
Several publications in epidemiology have dealt with the ways in which causality can be inferred in the absence of an experiment, starting with the classic work of Bradford Hill and the nine aspects of causality (viewpoints) that he proposed ( box 2 ) ( 5 ) and continuing up to the present ( 6 , 7 ).
The Bradford Hill criteria for causality (modified from ( 5 ))
- Strength: the stronger the observed association between two variables, the less likely it is due to chance.
- Consistency: the association has been observed in multiple studies, populations at risk, places, and times, and by different researchers.
- Specificity: it is a strong argument for causality when a specific population suffers from a specific disease.
- Temporality: the effect must be temporally subsequent to the cause.
- Biological gradient: the association displays a dose–response effect, e.g., the incidence of lung cancer is greater when more cigarettes are smoked per day.
- Plausibility: a plausible mechanism linking the cause to the effect is helpful, but not absolutely required. What is biologically plausible depends upon the state-of-the-art knowledge of the time.
- Coherence: the causal interpretation of the data should not conflict with biological knowledge about the disease.
- Experiment: experimental evidence should be adduced in support, if possible.
- Analogy: an association speaks for causality if similar causes are already known to have similar effects.
Aside from the statistical uncertainty that always arises when only a sample of an affected population is studied, rather than its entirety ( 8 ), the main obstacle to the study of putative causal relationships comes from confounding variables (“confounders”). These are so named because they can, depending on the circumstances, either obscure a true effect or simulate an effect that is, in fact, not present ( 9 ). Age, for example, is a confounder in the study of the association between occupational radiation exposure and cataract ( 10 ), because both cumulative radiation exposure and the risk of cataract rise with increasing age.
The various statistical methods of dealing with known confounders in the analysis of epidemiological data have already been presented in other articles in this series ( 9 , 11 , 12 ). In the current article, we discuss two new approaches that have not been widely applied in medical and epidemiological research to date.
Methods of evaluating causal inferences in observational studies
The main advantage of an RCT is randomization, i.e., the random allocation of the units of observation (patients) to treatment groups. Potential confounders, whether known or unknown, are thereby distributed to the treatment groups at random as well, although differences between groups may arise through sample variance. Whenever randomization is not possible, the effect of confounders must be taken into account in the planning of the study and in data analysis, as well as in the interpretation of study findings.
Classic methods of dealing with confounders in study planning are stratification and matching ( 13 , 14 ), as well as so-called propensity score matching (PSM) ( 11 ).
The best-known and most commonly used method of data analysis is regression analysis, e.g., linear, logistic, or Cox regression ( 15 ). This method is based on a mathematical model created in order to explain the probability that any particular outcome will arise as the combined result of the known confounders and the effect under study.
Regression analyses are used in the analysis of clinical or epidemiological data and are found in all commonly used statistical software packages. However, they are often used inappropriately because the prerequisites for their correct application have not been checked. They should not be used, for example, if the sample is too small, if the number of variables is too large, or if a correlation between the model variables makes the results uninterpretable ( 16 ).
Regression-discontinuity methods have been little used in medical research to date, but they can be helpful in the study of cause-and-effect relationships from observational data ( 17 ). Regression-discontinuity design is a quasi-experimental approach ( box 3 ) that was developed in educational psychology in the 1960s ( 18 ). It can be used when a threshold value of a continuous variable (the “assignment variable”) determines the treatment regimen to which each patient in the study is assigned ( box 4 ) .
Terms used to characterize experiments ( 18 )
- Experiment/trial A study in which an intervention is deliberately introduced in order to observe an effect.
- Randomized experiment/trial An experiment in which persons, patients, or other units of observation are randomly assigned to one of two or more treatment groups (or intervention groups).
- Quasi-experiment An experiment in which the units of observation are not randomly assigned to the treatment/intervention groups.
- Natural experiment A study in which a natural event (e.g., an earthquake) is compared with a comparison scenario.
- Non-experimental observational study A study in which the size and direction of the association between two variables is determined.
In the simplest case, that of a linear regression, the parameters in the following model are to be estimated:
y_i = β0 + β1 · z_i + β2 · (x_i − x_c) + e_i
where
- i = 1, …, N indexes the statistical units
- y is the outcome
- β0 is the y-intercept
- z is a dichotomous variable (0, 1) indicating whether the patient was treated (1) or not treated (0)
- x is the assignment variable
- x_c is the threshold
- β1 is the effect of treatment
- β2 is the regression coefficient of the assignment variable
- e is the random error
A possible assignment variable could be, for example, the serum cholesterol level: consider a study in which patients with a cholesterol level of 160 mg/dL or above are assigned to receive a therapy. Since the cholesterol level (the assignment variable) is subject to random measurement error, it can be assumed that patients whose level of cholesterol is close to the threshold (160 mg/dL) are randomly assigned to the different treatment regimens. Thus, in a small interval around the threshold value, the assignment of patients to treatment groups can effectively be considered random ( 18 ). This sample of patients with near-threshold measurements can thus be used for the analysis of treatment efficacy. For this line of argument to be valid, it must truly be the case that the value being measured is subject to measurement error, and that there is practically no difference between persons with measured values slightly below or slightly above the threshold. Treatment allocation in this narrow range can be considered quasi-random.
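Under these assumptions, the regression model above can be estimated by ordinary least squares. The sketch below simulates a cholesterol-threshold study with an assumed treatment effect of −4 (all numbers are invented for illustration) and recovers that effect from the fitted coefficient:

```python
import numpy as np

rng = np.random.default_rng(5)  # fixed seed for reproducibility

# Simulated study, invented numbers: cholesterol is the assignment
# variable, 160 mg/dL is the threshold, treatment at or above it.
n = 500
chol = rng.uniform(120, 200, n)
z = (chol >= 160).astype(float)  # treatment indicator (0/1)

# Generate outcomes with an assumed treatment effect of -4.
y = 50 - 4.0 * z + 0.1 * (chol - 160) + rng.normal(0, 2, n)

# Fit y = b0 + b1*z + b2*(x - x_c) by ordinary least squares.
X = np.column_stack([np.ones(n), z, chol - 160])
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]
print(round(b1, 2))  # estimated treatment effect, close to -4
```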
This method can be applied if the following prerequisites are met:
- The assignment variable is a continuous variable that is measured before the treatment is provided. If the assignment variable is totally independent of the outcome and has no biological, medical, or epidemiological significance, the method is theoretically equivalent to an RCT ( 19 ).
- The treatment must not affect the assignment variable ( 18 ).
- The patients in the two treatment groups with near-threshold values of the assignment variable must be shown to be similar in their baseline properties, i.e., covariables, including possible confounders. This can be demonstrated either with statistical techniques or graphically ( 20 ).
- The range of the assignment variable in the vicinity of the threshold must be optimally set: it must be large enough to yield samples of adequate size in the treatment groups, yet small enough that the effect of the assignment variable itself does not alter the outcome being studied. Methods of choosing this range appropriately are available in the literature ( 21 , 22 ).
- The treatment can be decided upon solely on the basis of the assignment variable (deterministic regression-discontinuity methods) or additionally on the basis of other clinical factors (fuzzy regression-discontinuity methods).
Example 1: The one-year mortality of neonates as a function of the intensity of medical and nursing care was to be studied, where the intensity of care was determined by a birth-weight threshold: infants with very low birth weight (<1500 g) (group A) were cared for more intensively than heavier infants (group B) ( 23 ). The question to be answered was whether the greater intensity of care in group A led to a difference in mortality between the two groups. It was assumed that children with birth weight near the threshold are identical in all other respects, and that their assignment to group A or group B is quasi-random, because the measured value (birth weight) is subject to a relatively small error. Thus, for example, one might compare children weighing 1450–1500 g to those weighing 1501–1550 g at birth to study whether, and how, a greater intensity of care affects mortality.
In this example, it is assumed that the variable “birth weight” has a random measuring error, and thus that neonates whose (true) weight is near the threshold will be randomly allocated to one or the other category. But birth weight itself is an important factor affecting infant mortality, with lower birth weight associated with higher mortality ( 23 ); thus, the interval taken around the threshold for the purpose of this study had to be kept narrow. The study, in fact, showed that the children treated more intensively because their birth weight was just below threshold had a lower mortality than those treated less intensively because their birth weight was just above threshold.
Example 2: A regression-discontinuity design was used to evaluate the effect of a measure taken by the Canadian government: the introduction of a minimum age of 19 years for alcohol consumption. The researchers compared the number of alcohol-related disorders and of violent attacks, accidents, and suicides under the influence of alcohol in the months leading up to (group A) and subsequent to (group B) the 19th birthday of the persons involved. It was found that persons in group B had a greater number of alcohol-related inpatient treatments and emergency hospitalizations than persons in group A. With the aid of this quasi-experimental approach, the researchers were able to demonstrate the success of the measure ( 24 ). It may be assumed that the two groups differed only with respect to age, and not with respect to any other property affecting alcohol consumption.
Interrupted time series
Interrupted time series are a special type of regression-discontinuity design in which time is the assignment variable. The cutoff point is often an external event that is unambiguously identifiable as having occurred at a certain point in time, e.g., an industrial accident or a change in the law. A before-and-after comparison is made in which the analysis must still take adequate account of any relevant secular trends and seasonal fluctuations (Box 5).
In the simplest case of a study involving an interrupted time series, the temporal sequence is analyzed with a piecewise regression. The following model is used to study both a shift in slope and a shift in the level of an outcome before and after an intervention, e.g., the introduction of a law banning smoking ( figure 2 ):
y = β0 + β1·time + β2·intervention + β3·time·intervention + e,

where
- y is the outcome, e.g., cardiovascular diseases
- intervention is a dummy variable for the time before (0) and after (1) the intervention (e.g., the smoking ban)
- time is the time since the beginning of the study
- β0 is the baseline incidence of cardiovascular diseases
- β1 is the slope in the incidence of cardiovascular diseases over time before the introduction of the smoking ban
- β2 is the change in the incidence level of cardiovascular diseases after the introduction of the smoking ban (level effect)
- β3 is the change in the slope over time (cf. β1) after the introduction of the smoking ban (slope effect)
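A sketch of fitting this piecewise (segmented) regression by ordinary least squares on simulated monthly data; the numbers used here (60 months of observation, an intervention at month 30, and the true coefficient values) are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(60, dtype=float)     # months since the beginning of the study
post = (t >= 30).astype(float)     # intervention dummy: e.g. smoking ban at month 30
# assumed true values: baseline 100, pre-ban slope +0.5, level effect -8, slope effect -0.3
y = 100.0 + 0.5 * t - 8.0 * post - 0.3 * t * post + rng.normal(scale=1.0, size=60)

# design matrix for y = b0 + b1*time + b2*intervention + b3*time*intervention + e
X = np.column_stack([np.ones(60), t, post, t * post])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # [b0, b1, b2 (level effect), b3 (slope effect)]
```

The level effect (b2) and slope effect (b3) recovered by the fit correspond to β2 and β3 in the model above.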
The prerequisites for the use of this method must be met ( 18 , 25 ):
- Interrupted time series are valid only if a single intervention took place in the period of the study.
- The time before the intervention must be clearly distinguishable from the time after the intervention.
- There is no required minimum number of data points, but studies with only a small number of data points or small effect sizes must be interpreted with caution. The power of a study is greatest when the number of data points before the intervention equals the number after the intervention ( 26 ).
- Although the equation in Box 5 has a linear specification, polynomial and other nonlinear regression models can be used as well. Meticulous study of the temporal sequence is very important when a nonlinear model is used.
- If an observation at time t —e.g., the monthly incidence of cardiovascular diseases—is correlated with previous observations (autoregression), then the appropriate statistical techniques must be used (autoregressive integrated moving average [ARIMA] models).
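Before relying on plain least squares, the residual series can be checked for autoregression, for example via its lag-1 autocorrelation. The sketch below simulates an AR(1) error series (the autoregressive coefficient of 0.7 and the series length are illustrative assumptions) and computes that autocorrelation.

```python
import numpy as np

rng = np.random.default_rng(2)
# simulate AR(1) residuals: each month's error carries over part of the previous one
e = np.zeros(120)
for i in range(1, 120):
    e[i] = 0.7 * e[i - 1] + rng.normal()

# lag-1 autocorrelation of the residual series
r1 = np.corrcoef(e[:-1], e[1:])[0, 1]
print(round(r1, 2))  # clearly greater than 0: plain OLS standard errors are not reliable
```

A lag-1 autocorrelation well away from zero, as here, is the signal that an ARIMA-type model rather than ordinary regression is needed.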
Example 1: In one study, the rates of acute hospitalization for cardiovascular diseases before and after the temporary closure of Heathrow Airport because of volcanic ash were determined to investigate the putative effect of aircraft noise ( 27 ). The intervention (airport closure) took place from 15 to 20 April 2010. The hospitalization rate was found to have decreased among persons living in the urban area with the most aircraft noise. The number of observation points was too low, however, to show a causal link conclusively.
Example 2: In another study, the rates of hospitalization before and after the implementation of a smoking ban (the intervention) in public areas in Italy were determined ( 28 ). The intervention occurred in January 2004 (the cutoff time). The number of hospitalizations for acute coronary events was measured from January 2002 to November 2006 ( figure 1 ) . The analysis took account of seasonal dependence, and an effect modification for two age groups—persons under age 70 and persons aged 70 and up—was determined as well. The hospitalization rate declined in the former group, but not the latter.
Figure 1: Age-standardized hospitalization rates for acute coronary events (ACE) in persons under age 70 before and after the implementation of a smoking ban in public places in Italy, studied with the corresponding methods ( 30 ). The observed and predicted rates are shown (circles and solid lines, respectively). The dashed lines show the seasonally adjusted trend in ACE before and after the introduction of the nationwide smoking ban.
The necessary distinction between causality and correlation is often emphasized in scientific discussions, yet it is often not applied strictly enough. Furthermore, causality in medicine and epidemiology is mostly probabilistic in nature, i.e., an intervention alters the probability that the event under study will take place. A good illustration of this principle is offered by research on the effects of radiation, in which a strict distinction is maintained between deterministic radiation damage on the one hand, and probabilistic (stochastic) radiation damage on the other ( 29 ). Deterministic radiation damage—radiation-induced burns or death—arises with certainty whenever a subject receives a certain radiation dose (usually a high one). On the other hand, the risk of cancer-related mortality after radiation exposure is a stochastic matter. Epidemiological observations and biological experiments should be evaluated in tandem to strengthen conclusions about probabilistic causality ( box 1 ) .
While RCTs still retain their importance as the gold standard of clinical research, they cannot always be carried out. Some indispensable knowledge can only be obtained from observational studies. Confounding factors must be eliminated, or at least accounted for, early on when such studies are planned. Moreover, the data that are obtained must be carefully analyzed. And, finally, a single observational study hardly ever suffices to establish a causal relationship.
In this article, we have presented two newer methods that are relatively simple and which, therefore, could easily be used more widely in medical and epidemiological research ( 30 ). Either one should be used only after the prerequisites for its applicability have been meticulously checked. In regression-discontinuity methods, the assumption of continuity must be verified: in other words, it must be checked whether other properties of the treatment and control groups are the same, or at least equally balanced. The rules of group assignment and the role played by the continuous assignment variable must be known as well. Regression-discontinuity methods can generate causal conclusions, but any such conclusion will not be generalizable if the treatment effects are heterogeneous over the range of the assignment variable. The estimate of effect size is applicable only in a small, predefined interval around the threshold value. It must also be checked whether the outcome and the assignment variable are in a linear relationship, and whether there is any interaction between the treatment and assignment variables that needs to be considered.
In the analysis of interrupted time series, the assumption of continuity must be tested as well. Furthermore, the method is valid only if the occurrence of any other intervention at the same time point as the one under study can be ruled out ( 20 ). Finally, the type of temporal sequence must be considered, and more complex statistical methods must be applied, as needed, to take such phenomena as autoregression into account.
Observational studies often suggest causal relationships that will then be either supported or rejected after further studies and experiments. Knowledge of the effects of radiation exposure was derived, at first, mainly from observations on victims of the Hiroshima and Nagasaki atomic bomb explosions ( 31 ). These findings were reinforced by further epidemiological studies on other populations exposed to radiation (e.g., through medical procedures or as an occupational hazard), by physical considerations, and by biological experiments ( 32 ). A classic example from the mid-19th century is the observational study by Snow ( 33 ): until then, the biological cause of cholera was unknown. Snow found that there had to be a causal relationship between the contamination of a well and a subsequent outbreak of cholera. This new understanding led to improved hygienic measures, which did, indeed, prevent infection with the cholera pathogen. Cases such as these prove that it is sometimes reasonable to take action on the basis of an observational study alone ( 6 ). They also demonstrate, however, that further studies are necessary for the definitive establishment of a causal relationship.
The effect of a smoking ban on the incidence of cardiovascular diseases
- Causal inferences can be drawn from observational studies, as long as certain conditions are met.
- Confounding variables are a major impediment to the demonstration of causal links, as they can either obscure or mimic such a link.
- Random assignment leads to the even distribution of known and unknown confounders among the intervention groups that are being compared in the study.
- In the regression-discontinuity method, it is assumed that the assignment of patients to treatment groups is random within a small range of the assignment variable around the threshold, with the result that the confounders are randomly distributed as well.
- The interrupted time series is a variant of the regression-discontinuity method in which a given point in time splits the subjects into a before group and an after group, with random distribution of confounders to the two groups.
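The key point about random assignment, that it balances known and unknown confounders up to sampling error, can be illustrated with a short simulation; the cohort size and the age distribution below are hypothetical.

```python
import random

random.seed(42)
# hypothetical cohort: each participant's age is a potential confounder
ages = [random.gauss(50, 12) for _ in range(10_000)]

assigned = ages[:]
random.shuffle(assigned)                  # random assignment: shuffle, then split in half
treatment, control = assigned[:5_000], assigned[5_000:]

mean = lambda xs: sum(xs) / len(xs)
diff = mean(treatment) - mean(control)
print(round(diff, 2))  # difference in average age is small (sampling error only)
```

The same argument applies to any other participant variable, measured or not, which is why randomization rules out confounding by design rather than by measurement.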
Translated from the original German by Ethan Taub, M.D.
Conflict of interest statement The authors state that they have no conflict of interest.
Counteracting Methodological Errors in Behavioral Research, pp. 39–54
- Gideon J. Mellenbergh
- First Online: 17 May 2019
A substantial part of behavioral research is aimed at the testing of substantive hypotheses. In general, a hypothesis-testing study investigates the causal influence of an independent variable (IV) on a dependent variable (DV). The discussion is restricted to IVs that can be manipulated by the researcher, such as experimental (E) and control (C) conditions. Association between IV and DV does not imply that the IV has a causal influence on the DV. The association can be spurious because it is caused by another variable (OV). OVs that cause spurious associations come from (1) the participant, (2) the research situation, and (3) the reactions of the participants to the research situation. If participants select their own (E or C) condition, or others select a condition for them, the assignment to conditions is usually biased (e.g., males prefer the E-condition and females the C-condition), and participant variables (e.g., participants' sex) may cause a spurious association between the IV and DV. This selection bias is a systematic error of a design. It is counteracted by random assignment of participants to conditions. Random assignment guarantees that all participant variables are related to the IV only by chance, and turns systematic error into random error. Random errors decrease the precision of parameter estimates. Random error variance is reduced by including auxiliary variables in the randomized design. A randomized block design uses an auxiliary variable to divide the participants into relatively homogeneous blocks and randomly assigns participants to the conditions within each block. A covariate is an auxiliary variable that is used in the statistical analysis of the data to reduce the error variance. Cluster randomization randomly assigns clusters (e.g., classes of students) to conditions, which raises specific problems. Random assignment should not be confused with random selection: random assignment controls for selection bias, whereas random selection makes it possible to generalize study results from a sample to the population.
- Cluster randomization
- Cross-over design
- Independent and dependent variables
- Random assignment and random selection
- Randomized block design
Authors and affiliations.
Emeritus Professor Psychological Methods, Department of Psychology, University of Amsterdam, Amsterdam, The Netherlands
Gideon J. Mellenbergh
Correspondence to Gideon J. Mellenbergh .
© 2019 Springer Nature Switzerland AG
Mellenbergh, G.J. (2019). Random Assignment. In: Counteracting Methodological Errors in Behavioral Research. Springer, Cham. https://doi.org/10.1007/978-3-030-12272-0_4
DOI : https://doi.org/10.1007/978-3-030-12272-0_4
Published : 17 May 2019
Publisher Name : Springer, Cham
Print ISBN : 978-3-319-74352-3
Online ISBN : 978-3-030-12272-0
2.3 Analyzing Findings
By the end of this section, you will be able to:
- Explain what a correlation coefficient tells us about the relationship between variables
- Recognize that correlation does not indicate a cause-and-effect relationship between variables
- Discuss our tendency to look for relationships between variables that do not really exist
- Explain random sampling and assignment of participants into experimental and control groups
- Discuss how experimenter or participant bias could affect the results of an experiment
- Identify independent and dependent variables
Did you know that as sales in ice cream increase, so does the overall rate of crime? Is it possible that indulging in your favorite flavor of ice cream could send you on a crime spree? Or, after committing crime do you think you might decide to treat yourself to a cone? There is no question that a relationship exists between ice cream and crime (e.g., Harper, 2013), but it would be pretty foolish to decide that one thing actually caused the other to occur.
It is much more likely that both ice cream sales and crime rates are related to the temperature outside. When the temperature is warm, there are lots of people out of their houses, interacting with each other, getting annoyed with one another, and sometimes committing crimes. Also, when it is warm outside, we are more likely to seek a cool treat like ice cream. How do we determine if there is indeed a relationship between two things? And when there is a relationship, how can we discern whether it is attributable to coincidence or causation?
Correlation means that there is a relationship between two or more variables (such as ice cream consumption and crime), but this relationship does not necessarily imply cause and effect. When two variables are correlated, it simply means that as one variable changes, so does the other. We can measure correlation by calculating a statistic known as a correlation coefficient. A correlation coefficient is a number from -1 to +1 that indicates the strength and direction of the relationship between variables. The correlation coefficient is usually represented by the letter r .
The number portion of the correlation coefficient indicates the strength of the relationship. The closer the number is to 1 (be it negative or positive), the more strongly related the variables are, and the more predictable changes in one variable will be as the other variable changes. The closer the number is to zero, the weaker the relationship, and the less predictable the relationships between the variables becomes. For instance, a correlation coefficient of 0.9 indicates a far stronger relationship than a correlation coefficient of 0.3. If the variables are not related to one another at all, the correlation coefficient is 0. The example above about ice cream and crime is an example of two variables that we might expect to have no relationship to each other.
The sign—positive or negative—of the correlation coefficient indicates the direction of the relationship ( Figure 2.12 ). A positive correlation means that the variables move in the same direction. Put another way, it means that as one variable increases so does the other, and conversely, when one variable decreases so does the other. A negative correlation means that the variables move in opposite directions. If two variables are negatively correlated, a decrease in one variable is associated with an increase in the other and vice versa.
The example of ice cream and crime rates is a positive correlation because both variables increase when temperatures are warmer. Other examples of positive correlations are the relationship between an individual’s height and weight or the relationship between a person’s age and number of wrinkles. One might expect a negative correlation to exist between someone’s tiredness during the day and the number of hours they slept the previous night: the amount of sleep decreases as the feelings of tiredness increase. In a real-world example of negative correlation, student researchers at the University of Minnesota found a weak negative correlation ( r = -0.29) between the average number of days per week that students got fewer than 5 hours of sleep and their GPA (Lowry, Dean, & Manders, 2010). Keep in mind that a negative correlation is not the same as no correlation. For example, we would probably find no correlation between hours of sleep and shoe size.
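The correlation coefficient r can be computed directly; the small data set below (daily temperatures, ice cream sales, and crime reports) is invented purely for illustration.

```python
import numpy as np

# invented data: daily temperature (deg C), ice cream sales, and crime reports
temp      = np.array([18, 21, 24, 27, 30, 33, 36])
ice_cream = np.array([120, 135, 160, 180, 210, 240, 260])
crime     = np.array([40, 42, 48, 51, 55, 60, 66])

r_ice = np.corrcoef(temp, ice_cream)[0, 1]   # r always lies between -1 and +1
r_crime = np.corrcoef(temp, crime)[0, 1]
print(round(r_ice, 2), round(r_crime, 2))    # both strongly positive
```

Both coefficients come out close to +1: as temperature rises, so do both variables. Of course, as the text goes on to explain, neither correlation says anything about ice cream causing crime.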
As mentioned earlier, correlations have predictive value. Imagine that you are on the admissions committee of a major university. You are faced with a huge number of applications, but you are able to accommodate only a small percentage of the applicant pool. How might you decide who should be admitted? You might try to correlate your current students’ college GPA with their scores on standardized tests like the SAT or ACT. By observing which correlations were strongest for your current students, you could use this information to predict relative success of those students who have applied for admission into the university.
Link to Learning
Manipulate this interactive scatterplot to practice your understanding of positive and negative correlation.
Correlation Does Not Indicate Causation
Correlational research is useful because it allows us to discover the strength and direction of relationships that exist between two variables. However, correlation is limited because establishing the existence of a relationship tells us little about cause and effect . While variables are sometimes correlated because one does cause the other, it could also be that some other factor, a confounding variable , is actually causing the systematic movement in our variables of interest. In the ice cream/crime rate example mentioned earlier, temperature is a confounding variable that could account for the relationship between the two variables.
Even when we cannot point to clear confounding variables, we should not assume that a correlation between two variables implies that one variable causes changes in another. This can be frustrating when a cause-and-effect relationship seems clear and intuitive. Think back to our discussion of the research done by the American Cancer Society and how their research projects were some of the first demonstrations of the link between smoking and cancer. It seems reasonable to assume that smoking causes cancer, but if we were limited to correlational research , we would be overstepping our bounds by making this assumption.
Unfortunately, people mistakenly make claims of causation as a function of correlations all the time. Such claims are especially common in advertisements and news stories. For example, recent research found that people who eat cereal on a regular basis achieve healthier weights than those who rarely eat cereal (Frantzen, Treviño, Echon, Garcia-Dominic, & DiMarco, 2013; Barton et al., 2005). Guess how the cereal companies report this finding. Does eating cereal really cause an individual to maintain a healthy weight, or are there other possible explanations, such as that someone at a healthy weight is more likely to regularly eat a healthy breakfast than someone who is obese or someone who avoids meals in an attempt to diet ( Figure 2.13 )? While correlational research is invaluable in identifying relationships among variables, a major limitation is the inability to establish causality. Psychologists want to make statements about cause and effect, but the only way to do that is to conduct an experiment to answer a research question. The next section describes how scientific experiments incorporate methods that eliminate, or control for, alternative explanations, which allow researchers to explore how changes in one variable cause changes in another variable.
The temptation to make erroneous cause-and-effect statements based on correlational research is not the only way we tend to misinterpret data. We also tend to make the mistake of illusory correlations, especially with unsystematic observations. Illusory correlations , or false correlations, occur when people believe that relationships exist between two things when no such relationship exists. One well-known illusory correlation is the supposed effect that the moon’s phases have on human behavior. Many people passionately assert that human behavior is affected by the phase of the moon, and specifically, that people act strangely when the moon is full ( Figure 2.14 ).
There is no denying that the moon exerts a powerful influence on our planet. The ebb and flow of the ocean’s tides are tightly tied to the gravitational forces of the moon. Many people believe, therefore, that it is logical that we are affected by the moon as well. After all, our bodies are largely made up of water. A meta-analysis of nearly 40 studies consistently demonstrated, however, that the relationship between the moon and our behavior does not exist (Rotton & Kelly, 1985). While we may pay more attention to odd behavior during the full phase of the moon, the rates of odd behavior remain constant throughout the lunar cycle.
Why are we so apt to believe in illusory correlations like this? Often we read or hear about them and simply accept the information as valid. Or, we have a hunch about how something works and then look for evidence to support that hunch, ignoring evidence that would tell us our hunch is false; this is known as confirmation bias . Other times, we find illusory correlations based on the information that comes most easily to mind, even if that information is severely limited. And while we may feel confident that we can use these relationships to better understand and predict the world around us, illusory correlations can have significant drawbacks. For example, research suggests that illusory correlations—in which certain behaviors are inaccurately attributed to certain groups—are involved in the formation of prejudicial attitudes that can ultimately lead to discriminatory behavior (Fiedler, 2004).
Causality: Conducting Experiments and Using the Data
As you’ve learned, the only way to establish that there is a cause-and-effect relationship between two variables is to conduct a scientific experiment . Experiment has a different meaning in the scientific context than in everyday life. In everyday conversation, we often use it to describe trying something for the first time, such as experimenting with a new hair style or a new food. However, in the scientific context, an experiment has precise requirements for design and implementation.
The Experimental Hypothesis
In order to conduct an experiment, a researcher must have a specific hypothesis to be tested. As you’ve learned, hypotheses can be formulated either through direct observation of the real world or after careful review of previous research. For example, if you think that the use of technology in the classroom has negative impacts on learning, then you have basically formulated a hypothesis—namely, that the use of technology in the classroom should be limited because it decreases learning. How might you have arrived at this particular hypothesis? You may have noticed that your classmates who take notes on their laptops perform at lower levels on class exams than those who take notes by hand, or those who receive a lesson via a computer program versus via an in-person teacher have different levels of performance when tested ( Figure 2.15 ).
These sorts of personal observations are what often lead us to formulate a specific hypothesis, but we cannot use limited personal observations and anecdotal evidence to rigorously test our hypothesis. Instead, to find out if real-world data supports our hypothesis, we have to conduct an experiment.
Designing an Experiment
The most basic experimental design involves two groups: the experimental group and the control group. The two groups are designed to be the same except for one difference: the experimental manipulation. The experimental group receives the experimental manipulation, that is, the treatment or variable being tested (in this case, the use of technology), and the control group does not. Because the manipulation is the only systematic difference between the experimental and control groups, we can attribute any differences between the two to the manipulation rather than to preexisting differences between the groups.
In our example of whether technology use in the classroom affects learning, we have the experimental group learn algebra using a computer program and then test their learning. We measure the learning in our control group after they are taught algebra by a teacher in a traditional classroom. It is important for the control group to be treated similarly to the experimental group, with the exception that the control group does not receive the experimental manipulation.
We also need to precisely define, or operationalize, how we measure learning of algebra. An operational definition is a precise description of our variables, and it is important in allowing others to understand exactly how and what a researcher measures in a particular experiment. In operationalizing learning, we might choose to look at performance on a test covering the material on which the individuals were taught by the teacher or the computer program. We might also ask our participants to summarize the information that was just presented in some way. Whatever we determine, it is important that we operationalize learning in such a way that anyone who hears about our study for the first time knows exactly what we mean by learning. This aids peoples’ ability to interpret our data as well as their capacity to repeat our experiment should they choose to do so.
Once we have operationalized what is considered use of technology and what is considered learning in our experiment participants, we need to establish how we will run our experiment. In this case, we might have participants spend 45 minutes learning algebra (either through a computer program or with an in-person math teacher) and then give them a test on the material covered during the 45 minutes.
Ideally, the people who score the tests are unaware of who was assigned to the experimental or control group, in order to control for experimenter bias. Experimenter bias refers to the possibility that a researcher’s expectations might skew the results of the study. Remember, conducting an experiment requires a lot of planning, and the people involved in the research project have a vested interest in supporting their hypotheses. If the scorers knew which student was in which group, it might influence how they interpret ambiguous responses, such as sloppy handwriting or minor computational mistakes. By keeping the scorers blind to group assignment, we protect against those biases. This situation is a single-blind study, meaning that one party (the participants) is unaware of which group they are in (experimental or control) while the researcher who developed the experiment knows which participants are in each group.
In a double-blind study , both the researchers and the participants are blind to group assignments. Why would a researcher want to run a study where no one knows who is in which group? Because by doing so, we can control for both experimenter and participant expectations. If you are familiar with the phrase placebo effect , you already have some idea as to why this is an important consideration. The placebo effect occurs when people's expectations or beliefs influence or determine their experience in a given situation. In other words, simply expecting something to happen can actually make it happen.
The placebo effect is commonly described in terms of testing the effectiveness of a new medication. Imagine that you work in a pharmaceutical company, and you think you have a new drug that is effective in treating depression. To demonstrate that your medication is effective, you run an experiment with two groups: The experimental group receives the medication, and the control group does not. But you don’t want participants to know whether they received the drug or not.
Why is that? Imagine that you are a participant in this study, and you have just taken a pill that you think will improve your mood. Because you expect the pill to have an effect, you might feel better simply because you took the pill and not because of any drug actually contained in the pill—this is the placebo effect.
To make sure that any effects on mood are due to the drug and not due to expectations, the control group receives a placebo (in this case a sugar pill). Now everyone gets a pill, and once again neither the researcher nor the experimental participants know who got the drug and who got the sugar pill. Any differences in mood between the experimental and control groups can now be attributed to the drug itself rather than to experimenter bias or participant expectations ( Figure 2.16 ).
Independent and Dependent Variables
In a research experiment, we strive to study whether changes in one thing cause changes in another. To achieve this, we must pay attention to two important variables, or things that can be changed, in any experimental study: the independent variable and the dependent variable. An independent variable is manipulated or controlled by the experimenter. In a well-designed experimental study, the independent variable is the only important difference between the experimental and control groups. In our example of how technology use in the classroom affects learning, the independent variable is the type of learning by participants in the study ( Figure 2.17 ). A dependent variable is what the researcher measures to see how much effect the independent variable had. In our example, the dependent variable is the learning exhibited by our participants.
We expect that the dependent variable will change as a function of the independent variable. In other words, the dependent variable depends on the independent variable. A good way to think about the relationship between the independent and dependent variables is with this question: What effect does the independent variable have on the dependent variable? Returning to our example, what is the effect of being taught a lesson through a computer program versus through an in-person instructor?
Selecting and Assigning Experimental Participants
Now that our study is designed, we need to obtain a sample of individuals to include in our experiment. Our study involves human participants so we need to determine who to include. Participants are the subjects of psychological research, and as the name implies, individuals who are involved in psychological research actively participate in the process. Often, psychological research projects rely on college students to serve as participants. In fact, the vast majority of research in psychology subfields has historically involved students as research participants (Sears, 1986; Arnett, 2008). But are college students truly representative of the general population? College students tend to be younger, more educated, more liberal, and less diverse than the general population. Although using students as test subjects is an accepted practice, relying on such a limited pool of research participants can be problematic because it is difficult to generalize findings to the larger population.
Our hypothetical experiment involves high school students, and we must first generate a sample of students. Samples are used because populations are usually too large to reasonably involve every member in our particular experiment ( Figure 2.18 ). If possible, we should use a random sample (there are other types of samples, but for the purposes of this chapter, we will focus on random samples). A random sample is a subset of a larger population in which every member of the population has an equal chance of being selected. Random samples are preferred because if the sample is large enough we can be reasonably sure that the participating individuals are representative of the larger population. This means that the percentages of characteristics in the sample—sex, ethnicity, socioeconomic level, and any other characteristics that might affect the results—are close to those percentages in the larger population.
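The idea of a simple random sample can be sketched in a few lines of Python. The population size and ID numbers below are invented for illustration; any statistical software would do the same thing:

```python
import random

random.seed(216)  # fixed seed so the illustration is reproducible

# Hypothetical sampling frame: ID numbers for all 5,000 algebra
# students in the city (the population of interest).
population = list(range(1, 5001))

# random.sample draws without replacement, so every student has an
# equal chance of being selected -- a simple random sample.
sample = random.sample(population, k=200)

print(len(sample))  # 200 participants, no student selected twice
```

The key property is that selection depends only on chance, not on which students are easiest to reach.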
In our example, let’s say we decide our population of interest is algebra students. But all algebra students is a very large population, so we need to be more specific; instead we might say our population of interest is all algebra students in a particular city. We should include students from various income brackets, family situations, races, ethnicities, religions, and geographic areas of town. With this more manageable population, we can work with the local schools in selecting a random sample of around 200 algebra students who we want to participate in our experiment.
In summary, because we cannot test all of the algebra students in a city, we want to find a group of about 200 that reflects the composition of that city. With a representative group, we can generalize our findings to the larger population without fear of our sample being biased in some way.
Now that we have a sample, the next step of the experimental process is to split the participants into experimental and control groups through random assignment. With random assignment , all participants have an equal chance of being assigned to either group. There is statistical software that will randomly assign each of the algebra students in the sample to either the experimental or the control group.
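Random assignment itself takes only a few lines of code. A minimal sketch using Python's standard library (the participant IDs are hypothetical):

```python
import random

random.seed(42)  # fixed seed so the illustration is reproducible

# Hypothetical sample of 200 participant IDs.
participants = [f"student_{i:03d}" for i in range(200)]

# Shuffle the sample, then split it in half: every participant has
# an equal chance of landing in either group -- random assignment.
shuffled = participants[:]
random.shuffle(shuffled)
experimental = shuffled[:100]  # taught by the computer program
control = shuffled[100:]       # taught by the in-person teacher

print(len(experimental), len(control))  # 100 100
```

Because the split depends only on the shuffle, no characteristic of the students can systematically influence which group they end up in.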
Random assignment is critical for sound experimental design . With sufficiently large samples, random assignment makes it unlikely that there are systematic differences between the groups. So, for instance, it would be very unlikely that we would get one group composed entirely of males, a given ethnic identity, or a given religious ideology. This is important because if the groups were systematically different before the experiment began, we would not know the origin of any differences we find between the groups: Were the differences preexisting, or were they caused by manipulation of the independent variable? Random assignment allows us to assume that any differences observed between experimental and control groups result from the manipulation of the independent variable.
Issues to Consider
While experiments allow scientists to make cause-and-effect claims, they are not without problems. True experiments require the experimenter to manipulate an independent variable, and that can complicate many questions that psychologists might want to address. For instance, imagine that you want to know what effect sex (the independent variable) has on spatial memory (the dependent variable). Although you can certainly look for differences between males and females on a task that taps into spatial memory, you cannot directly control a person’s sex. We categorize this type of research approach as quasi-experimental and recognize that we cannot make cause-and-effect claims in these circumstances.
Experimenters are also limited by ethical constraints. For instance, you would not be able to conduct an experiment designed to determine if experiencing abuse as a child leads to lower levels of self-esteem among adults. To conduct such an experiment, you would need to randomly assign some experimental participants to a group that receives abuse, and that experiment would be unethical.
Interpreting Experimental Findings
Once data are collected from both the experimental and the control groups, a statistical analysis is conducted to find out whether there are meaningful differences between the two groups. A statistical analysis determines how likely it is that any difference found is due to chance (and thus not meaningful). For example, if an experiment is done on the effectiveness of a nutritional supplement, and those taking a placebo pill (and not the supplement) have the same result as those taking the supplement, then the experiment has shown that the nutritional supplement is not effective. Generally, psychologists consider differences to be statistically significant if there is less than a five percent chance of observing them if the groups did not actually differ from one another. Stated another way, psychologists want to limit the chances of making “false positive” claims to five percent or less.
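The “less than a five percent chance” criterion can be made concrete with a permutation test, one common way of computing such a probability. The exam scores below are invented for illustration:

```python
import random

random.seed(7)  # fixed seed so the illustration is reproducible

# Invented exam scores (out of 100) for ten students per group.
computer = [78, 82, 88, 75, 80, 85, 79, 83, 77, 84]  # computer program
teacher  = [72, 74, 80, 70, 76, 73, 71, 78, 69, 75]  # in-person teacher

mean = lambda xs: sum(xs) / len(xs)
observed = mean(computer) - mean(teacher)  # 7.3 points

# Permutation test: if the group labels were irrelevant, randomly
# reshuffling them should often produce a difference at least as
# large as the one we observed.
pooled = computer + teacher
n_perm = 10_000
extreme = 0
for _ in range(n_perm):
    random.shuffle(pooled)
    if abs(mean(pooled[:10]) - mean(pooled[10:])) >= abs(observed):
        extreme += 1

p_value = extreme / n_perm
print(round(observed, 1), p_value)
```

Here the p-value comes out far below 0.05, so a psychologist would call the 7.3-point difference statistically significant: it is very unlikely to arise by chance alone.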
The greatest strength of experiments is the ability to assert that any significant differences in the findings are caused by the independent variable. This occurs because random selection, random assignment, and a design that limits the effects of both experimenter bias and participant expectancy should create groups that are similar in composition and treatment. Therefore, any difference between the groups is attributable to the independent variable, and now we can finally make a causal statement. If we find that watching a violent television program results in more violent behavior than watching a nonviolent program, we can safely say that watching violent television programs causes an increase in the display of violent behavior.
When psychologists complete a research project, they generally want to share their findings with other scientists. The American Psychological Association (APA) publishes a manual detailing how to write a paper for submission to scientific journals. Unlike an article that might be published in a magazine like Psychology Today, which targets a general audience with an interest in psychology, scientific journals generally publish peer-reviewed journal articles aimed at an audience of professionals and scholars who are actively involved in research themselves.
The Online Writing Lab (OWL) at Purdue University can walk you through the APA writing guidelines.
A peer-reviewed journal article is read by several other scientists (generally anonymously) with expertise in the subject matter. These peer reviewers provide feedback—to both the author and the journal editor—regarding the quality of the draft. Peer reviewers look for a strong rationale for the research being described, a clear description of how the research was conducted, and evidence that the research was conducted in an ethical manner. They also look for flaws in the study's design, methods, and statistical analyses. They check that the conclusions drawn by the authors seem reasonable given the observations made during the research. Peer reviewers also comment on how valuable the research is in advancing the discipline’s knowledge. This helps prevent unnecessary duplication of research findings in the scientific literature and, to some extent, ensures that each research article provides new information. Ultimately, the journal editor will compile all of the peer reviewer feedback and determine whether the article will be published in its current state (a rare occurrence), published with revisions, or not accepted for publication.
Peer review provides some degree of quality control for psychological research. Poorly conceived or executed studies can be weeded out, and even well-designed research can be improved by the revisions suggested. Peer review also ensures that the research is described clearly enough to allow other scientists to replicate it, meaning they can repeat the experiment using different samples to determine reliability. Sometimes replications involve additional measures that expand on the original finding. In any case, each replication serves to provide more evidence to support the original research findings. Successful replications of published research make scientists more apt to adopt those findings, while repeated failures tend to cast doubt on the legitimacy of the original article and lead scientists to look elsewhere. For example, it would be a major advancement in the medical field if a published study indicated that taking a new drug helped individuals achieve a healthy weight without changing their diet. But if other scientists could not replicate the results, the original study’s claims would be questioned.
In recent years, there has been increasing concern about a “replication crisis” that has affected a number of scientific fields, including psychology. Some of the most well-known studies and scientists have produced research that has failed to be replicated by others (as discussed in Shrout & Rodgers, 2018). In fact, even a famous Nobel Prize-winning scientist has recently retracted a published paper because she had difficulty replicating her results (Nobel Prize-winning scientist Frances Arnold retracts paper, 2020 January 3). These kinds of outcomes have prompted some scientists to begin to work together and more openly, and some would argue that the current “crisis” is actually improving the ways in which science is conducted and in how its results are shared with others (Aschwanden, 2018).
The Vaccine-Autism Myth and Retraction of Published Studies
Some scientists have claimed that routine childhood vaccines cause some children to develop autism, and, in fact, several peer-reviewed publications published research making these claims. Since the initial reports, large-scale epidemiological research has indicated that vaccinations are not responsible for causing autism and that it is much safer to have your child vaccinated than not. Furthermore, several of the original studies making this claim have since been retracted.
A published piece of work can be rescinded when data is called into question because of falsification, fabrication, or serious research design problems. Once rescinded, the scientific community is informed that there are serious problems with the original publication. Retractions can be initiated by the researcher who led the study, by research collaborators, by the institution that employed the researcher, or by the editorial board of the journal in which the article was originally published. In the vaccine-autism case, the retraction was made because of a significant conflict of interest in which the leading researcher had a financial interest in establishing a link between childhood vaccines and autism (Offit, 2008). Unfortunately, the initial studies received so much media attention that many parents around the world became hesitant to have their children vaccinated ( Figure 2.19 ). Continued reliance on such debunked studies has significant consequences. For instance, between January and October of 2019, there were 22 measles outbreaks across the United States and more than a thousand cases of individuals contracting measles (Patel et al., 2019). This is likely due to the anti-vaccination movements that have risen from the debunked research. For more information about how the vaccine/autism story unfolded, as well as the repercussions of this story, take a look at Paul Offit’s book, Autism’s False Prophets: Bad Science, Risky Medicine, and the Search for a Cure.
Reliability and Validity
Reliability and validity are two important considerations that must be made with any type of data collection. Reliability refers to the ability to consistently produce a given result. In the context of psychological research, this would mean that any instruments or tools used to collect data do so in consistent, reproducible ways. There are a number of different types of reliability. Some of these include inter-rater reliability (the degree to which two or more different observers agree on what has been observed), internal consistency (the degree to which different items on a survey that measure the same thing correlate with one another), and test-retest reliability (the degree to which the outcomes of a particular measure remain consistent over multiple administrations).
Unfortunately, being consistent in measurement does not necessarily mean that you have measured something correctly. To illustrate this concept, consider a kitchen scale that would be used to measure the weight of cereal that you eat in the morning. If the scale is not properly calibrated, it may consistently under- or overestimate the amount of cereal that’s being measured. While the scale is highly reliable in producing consistent results (e.g., the same amount of cereal poured onto the scale produces the same reading each time), those results are incorrect. This is where validity comes into play. Validity refers to the extent to which a given instrument or tool accurately measures what it’s supposed to measure, and once again, there are a number of ways in which validity can be expressed. Ecological validity (the degree to which research results generalize to real-world applications), construct validity (the degree to which a given variable actually captures or measures what it is intended to measure), and face validity (the degree to which a given variable seems valid on the surface) are just a few types that researchers consider. While any valid measure is by necessity reliable, the reverse is not necessarily true. Researchers strive to use instruments that are both highly reliable and valid.
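Test-retest reliability, for instance, is typically quantified with a correlation coefficient between the two administrations. A sketch with invented questionnaire scores:

```python
# Invented scores from the same eight people on two administrations
# of the same questionnaire, two weeks apart.
test   = [12, 15, 20, 22, 25, 27, 30, 33]
retest = [13, 14, 21, 23, 24, 28, 29, 34]

def pearson(x, y):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

r = pearson(test, retest)
print(round(r, 3))  # close to 1.0: high test-retest reliability
```

A correlation near 1.0 indicates that the measure rank-orders people consistently across administrations; note that, like the miscalibrated scale, a measure can show a high correlation here and still be invalid.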
How Valid Are the SAT and ACT?
Standardized tests like the SAT and ACT are supposed to measure an individual’s aptitude for a college education, but how reliable and valid are such tests? Research conducted by the College Board suggests that scores on the SAT have high predictive validity for first-year college students’ GPA (Kobrin, Patterson, Shaw, Mattern, & Barbuti, 2008). In this context, predictive validity refers to the test’s ability to effectively predict the GPA of college freshmen. Given that many institutions of higher education require the SAT or ACT for admission, this high degree of predictive validity might be comforting.
However, the emphasis placed on SAT or ACT scores in college admissions is changing based on a number of factors. For one, some researchers assert that these tests are biased, and students from historically marginalized populations are at a disadvantage that unfairly reduces the likelihood of being admitted into a college (Santelices & Wilson, 2010). Additionally, some research has suggested that the predictive validity of these tests is grossly exaggerated in how well they are able to predict the GPA of first-year college students. In fact, it has been suggested that the SAT’s predictive validity may be overestimated by as much as 150% (Rothstein, 2004). Many institutions of higher education are beginning to consider de-emphasizing the significance of SAT scores in making admission decisions (Rimer, 2008).
Recent examples of high profile cheating scandals both domestically and abroad have only increased the scrutiny being placed on these types of tests, and as of March 2019, more than 1000 institutions of higher education have either relaxed or eliminated the requirements for SAT or ACT testing for admissions (Strauss, 2019, March 19).
Want to cite, share, or modify this book? This book uses the Creative Commons Attribution License and you must attribute OpenStax.
Access for free at https://openstax.org/books/psychology-2e/pages/1-introduction
- Authors: Rose M. Spielman, William J. Jenkins, Marilyn D. Lovett
- Publisher/website: OpenStax
- Book title: Psychology 2e
- Publication date: Apr 22, 2020
- Location: Houston, Texas
- Book URL: https://openstax.org/books/psychology-2e/pages/1-introduction
- Section URL: https://openstax.org/books/psychology-2e/pages/2-3-analyzing-findings
© Jun 28, 2023 OpenStax. Textbook content produced by OpenStax is licensed under a Creative Commons Attribution License . The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.
3.4 Experimental and Observational Studies
Now that Jaylen can weigh the different sampling strategies, he might want to consider the type of study he is conducting. Students interested in research design should consult STAT 503 for a much more in-depth discussion; for this example, we will simply distinguish between experimental and observational studies.
Now that we know how to collect data, the next step is to determine the type of study. The type of study will determine what type of relationship we can conclude.
There are two predominant types of studies: observational and experimental.
Let's say that there is an option to take quizzes throughout this class. In an observational study , we may find that better students tend to take the quizzes and do better on exams. Consequently, we might conclude that there may be a relationship between quizzes and exam scores.
In an experimental study , we would randomly assign quizzes to specific students to look for improvements. In other words, we would look to see whether taking quizzes causes higher exam scores.
It is very important to distinguish between observational and experimental studies, since one has to be very skeptical about drawing cause-and-effect conclusions from observational studies. The random assignment of treatments (i.e., what distinguishes an experimental study from an observational one) is what permits cause-and-effect conclusions.
Ethics is an important aspect of experimental design to keep in mind. For example, the original evidence linking smoking and lung cancer came from observational studies, not from an experiment, because researchers could not ethically assign people to smoke.
Frequently asked questions
What is random assignment?
In experimental research, random assignment is a way of placing participants from your sample into different groups using randomization. With this method, every member of the sample has an equal chance of being placed in the control group or the experimental group.
Frequently asked questions: Methodology
Attrition refers to participants leaving a study. It always happens to some extent—for example, in randomized controlled trials for medical research.
Differential attrition occurs when attrition or dropout rates differ systematically between the intervention and the control group . As a result, the characteristics of the participants who drop out differ from the characteristics of those who stay in the study. Because of this, study results may be biased .
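The bias introduced by differential attrition can be demonstrated with a small simulation. The wellbeing scores and the dropout rule below are invented: by construction, the “drug” has no effect at all.

```python
import random

random.seed(1)  # fixed seed so the illustration is reproducible

# Invented wellbeing scores (roughly 0-100) for 50 people per group;
# both groups are drawn from the SAME distribution, so the drug
# being tested has no real effect.
control   = [random.gauss(50, 10) for _ in range(50)]
treatment = [random.gauss(50, 10) for _ in range(50)]

# Differential attrition: suppose the ten worst-off participants in
# the treatment group drop out, while the control group stays intact.
treatment_completers = sorted(treatment)[10:]

mean = lambda xs: sum(xs) / len(xs)
print(round(mean(treatment), 1), round(mean(treatment_completers), 1))
# The completers' mean exceeds the full treatment group's mean purely
# because of who dropped out -- attrition bias, not a drug effect.
```

Any apparent advantage of the treatment completers over the control group here reflects who left the study, not anything the intervention did.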
Action research is conducted in order to solve a particular issue immediately, while case studies are often conducted over a longer period of time and focus more on observing and analyzing a particular ongoing phenomenon.
Action research is focused on solving a problem or informing individual and community-based knowledge in a way that impacts teaching, learning, and other related processes. It is less focused on contributing theoretical input, instead producing actionable input.
Action research is particularly popular with educators as a form of systematic inquiry because it prioritizes reflection and bridges the gap between theory and practice. Educators are able to simultaneously investigate an issue as they solve it, and the method is very iterative and flexible.
A cycle of inquiry is another name for action research . It is usually visualized in a spiral shape following a series of steps, such as “planning → acting → observing → reflecting.”
To make quantitative observations , you need to use instruments that are capable of measuring the quantity you want to observe. For example, you might use a ruler to measure the length of an object or a thermometer to measure its temperature.
Criterion validity and construct validity are both types of measurement validity . In other words, they both show you how accurately a method measures something.
While construct validity is the degree to which a test or other measurement method measures what it claims to measure, criterion validity is the degree to which a test can predictively (in the future) or concurrently (in the present) measure something.
Construct validity is often considered the overarching type of measurement validity . You need to have face validity , content validity , and criterion validity in order to achieve construct validity.
Convergent validity and discriminant validity are both subtypes of construct validity . Together, they help you evaluate whether a test measures the concept it was designed to measure.
- Convergent validity indicates whether a test that is designed to measure a particular construct correlates with other tests that assess the same or similar construct.
- Discriminant validity indicates whether two tests that should not be highly related to each other are indeed not related. This type of validity is also called divergent validity .
You need to assess both in order to demonstrate construct validity. Neither one alone is sufficient for establishing construct validity.
Content validity shows you how accurately a test or other measurement method taps into the various aspects of the specific construct you are researching.
In other words, it helps you answer the question: “does the test measure all aspects of the construct I want to measure?” If it does, then the test has high content validity.
The higher the content validity, the more accurate the measurement of the construct.
If the test fails to include parts of the construct, or irrelevant parts are included, the validity of the instrument is threatened, which brings your results into question.
Face validity and content validity are similar in that they both evaluate how suitable the content of a test is. The difference is that face validity is subjective, and assesses content at surface level.
When a test has strong face validity, anyone would agree that the test’s questions appear to measure what they are intended to measure.
For example, looking at a 4th grade math test consisting of problems in which students have to add and multiply, most people would agree that it has strong face validity (i.e., it looks like a math test).
On the other hand, content validity evaluates how well a test represents all the aspects of a topic. Assessing content validity is more systematic and relies on expert evaluation of each question, analyzing whether each one covers the aspects that the test was designed to cover.
A 4th grade math test would have high content validity if it covered all the skills taught in that grade. Experts (in this case, math teachers) would have to evaluate the content validity by comparing the test to the learning objectives.
Snowball sampling is a non-probability sampling method . Unlike probability sampling (which involves some form of random selection ), the initial individuals selected to be studied are the ones who recruit new participants.
Because not every member of the target population has an equal chance of being recruited into the sample, selection in snowball sampling is non-random.
Snowball sampling is a non-probability sampling method , where there is not an equal chance for every member of the population to be included in the sample .
This means that you cannot use inferential statistics and make generalizations —often the goal of quantitative research . As such, a snowball sample is not representative of the target population and is usually a better fit for qualitative research .
Snowball sampling relies on the use of referrals. Here, the researcher recruits one or more initial participants, who then recruit the next ones.
Participants share similar characteristics and/or know each other. Because of this, not every member of the population has an equal chance of being included in the sample, giving rise to sampling bias.
Snowball sampling is best used in the following cases:
- If there is no sampling frame available (e.g., people with a rare disease)
- If the population of interest is hard to access or locate (e.g., people experiencing homelessness)
- If the research focuses on a sensitive topic (e.g., extramarital affairs)
The reproducibility and replicability of a study can be ensured by writing a transparent, detailed method section and using clear, unambiguous language.
Reproducibility and replicability are related terms.
- Reproducing research entails reanalyzing the existing data in the same manner.
- Replicating (or repeating) the research entails reconducting the entire analysis, including the collection of new data.
- A successful reproduction shows that the data analyses were conducted in a fair and honest manner.
- A successful replication shows that the reliability of the results is high.
Stratified sampling and quota sampling both involve dividing the population into subgroups and selecting units from each subgroup. The purpose in both cases is to select a representative sample and/or to allow comparisons between subgroups.
The main difference is that in stratified sampling, you draw a random sample from each subgroup (probability sampling). In quota sampling you select a predetermined number or proportion of units, in a non-random manner (non-probability sampling).
Purposive and convenience sampling are both sampling methods that are typically used in qualitative data collection.
A convenience sample is drawn from a source that is conveniently accessible to the researcher, without regard to the participants’ characteristics. On the other hand, purposive sampling focuses on selecting participants who possess characteristics relevant to the research study.
The findings of studies based on either convenience or purposive sampling can only be generalized to the (sub)population from which the sample is drawn, and not to the entire population.
Random sampling or probability sampling is based on random selection. This means that each unit has an equal chance (i.e., equal probability) of being included in the sample.
On the other hand, convenience sampling involves selecting whoever happens to be available, which means that not everyone has an equal chance of being selected; inclusion depends on the place, time, or day you are collecting your data.
Convenience sampling and quota sampling are both non-probability sampling methods. They both use non-random criteria like availability, geographical proximity, or expert knowledge to recruit study participants.
However, in convenience sampling, you continue to sample units or cases until you reach the required sample size.
In quota sampling, you first need to divide your population of interest into subgroups (strata) and estimate their proportions (quota) in the population. Then you can start your data collection, using convenience sampling to recruit participants, until the proportions in each subgroup coincide with the estimated proportions in the population.
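The recruit-until-quotas-are-met logic can be sketched in a few lines of Python. Everything here is hypothetical: the population, the `get_group` helper, and the quota numbers are invented for illustration, and a shuffled list stands in for “whoever you happen to encounter” during convenience recruitment.

```python
import random

def quota_sample(population, get_group, quotas, seed=0):
    """Recruit units in convenience order until every subgroup's quota is met."""
    rng = random.Random(seed)
    pool = population[:]
    rng.shuffle(pool)  # stand-in for non-random, availability-based recruitment
    sample, counts = [], {g: 0 for g in quotas}
    for unit in pool:
        g = get_group(unit)
        if counts.get(g, 0) < quotas.get(g, 0):
            sample.append(unit)
            counts[g] += 1
        if counts == quotas:  # all quotas filled: stop recruiting
            break
    return sample

# Population with 60% in group "A" and 40% in group "B";
# the quotas mirror those estimated proportions in a sample of 10.
people = [("A", i) for i in range(60)] + [("B", i) for i in range(40)]
sample = quota_sample(people, lambda p: p[0], {"A": 6, "B": 4})
```

Note that the final sample matches the population’s subgroup proportions even though no step involved random selection in the probability-sampling sense.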
A sampling frame is a list of every member in the entire population. It is important that the sampling frame is as complete as possible, so that your sample accurately reflects your population.
Stratified and cluster sampling may look similar, but bear in mind that groups created in cluster sampling are heterogeneous, so the individual characteristics in the cluster vary. In contrast, groups created in stratified sampling are homogeneous, as units share characteristics.
Relatedly, in cluster sampling you randomly select entire groups and include all units of each group in your sample. However, in stratified sampling, you select some units of all groups and include them in your sample. In this way, both methods can ensure that your sample is representative of the target population.
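As a rough sketch (with invented groups of students), the two selection schemes differ in essentially one line: stratified sampling draws some units from within every group, while cluster sampling draws whole groups and keeps all their units.

```python
import random

rng = random.Random(42)

# Ten groups of ten units each (e.g., ten classrooms of ten students).
groups = {g: [f"{g}-{i}" for i in range(10)] for g in "ABCDEFGHIJ"}

# Stratified: draw 2 units from EVERY group (every stratum is represented).
stratified = [u for units in groups.values() for u in rng.sample(units, 2)]

# Cluster: randomly pick 2 entire groups and take ALL of their units.
chosen = rng.sample(sorted(groups), 2)
cluster = [u for g in chosen for u in groups[g]]
```

Both samples contain 20 units, but the stratified sample touches all ten groups while the cluster sample comes from only two.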
A systematic review is secondary research because it uses existing research. You don’t collect new data yourself.
The key difference between observational studies and experimental designs is that a well-done observational study does not influence the responses of participants, while experiments do have some sort of treatment condition applied to at least some participants by random assignment.
An observational study is a great choice if your research question is based purely on observations. If there are ethical, logistical, or practical concerns that prevent you from conducting a traditional experiment, an observational study may be a good choice. In an observational study, there is no interference with or manipulation of the research subjects, and there are no control or treatment groups.
It’s often best to ask a variety of people to review your measurements. You can ask experts, such as other researchers, or laypeople, such as potential participants, to judge the face validity of tests.
While experts have a deep understanding of research methods, the people you’re studying can provide you with valuable insights you may have missed otherwise.
Face validity is important because it’s a simple first step to measuring the overall validity of a test or technique. It’s a relatively intuitive, quick, and easy way to start checking whether a new measure seems useful at first glance.
Good face validity means that anyone who reviews your measure says that it seems to be measuring what it’s supposed to. With poor face validity, someone reviewing your measure may be left confused about what you’re measuring and why you’re using this method.
Face validity is about whether a test appears to measure what it’s supposed to measure. This type of validity is concerned with whether a measure seems relevant and appropriate for what it’s assessing only on the surface.
Statistical analyses are often applied to test validity with data from your measures. You test convergent validity and discriminant validity with correlations to see if results from your test are positively or negatively related to those of other established tests.
You can also use regression analyses to assess whether your measure is actually predictive of outcomes that you expect it to predict theoretically. A regression analysis that supports your expectations strengthens your claim of construct validity .
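For example, convergent validity can be checked with a simple correlation. The data below are entirely hypothetical (scores on a new scale and on an established measure of a related construct), and Pearson’s r is computed directly from its textbook formula.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson's r: covariance divided by the product of the spreads."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical scores: a new anxiety scale vs. an established one,
# for the same eight participants.
new_scale = [12, 15, 9, 20, 17, 11, 14, 18]
established = [30, 38, 25, 47, 41, 28, 33, 44]

r = pearson_r(new_scale, established)  # strongly positive
```

A strongly positive r here would be evidence of convergent validity; for discriminant validity you would instead want a near-zero or negative correlation with a measure of a distinct construct.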
When designing or evaluating a measure, construct validity helps you ensure you’re actually measuring the construct you’re interested in. If you don’t have construct validity, you may inadvertently measure unrelated or distinct constructs and lose precision in your research.
Construct validity is often considered the overarching type of measurement validity, because it covers all of the other types. You need to have face validity, content validity, and criterion validity to achieve construct validity.
Construct validity is about how well a test measures the concept it was designed to evaluate. It’s one of four types of measurement validity, alongside face validity, content validity, and criterion validity.
There are two subtypes of construct validity.
- Convergent validity: The extent to which your measure corresponds to measures of related constructs
- Discriminant validity: The extent to which your measure is unrelated or negatively related to measures of distinct constructs
Naturalistic observation is a valuable tool because of its flexibility, external validity, and suitability for topics that can’t be studied in a lab setting.
The downsides of naturalistic observation include its lack of scientific control, ethical considerations, and potential for bias from observers and subjects.
Naturalistic observation is a qualitative research method where you record the behaviors of your research subjects in real-world settings. You avoid interfering with or influencing anything in a naturalistic observation.
You can think of naturalistic observation as “people watching” with a purpose.
A dependent variable is what changes as a result of the independent variable manipulation in experiments. It’s what you’re interested in measuring, and it “depends” on your independent variable.
In statistics, dependent variables are also called:
- Response variables (they respond to a change in another variable)
- Outcome variables (they represent the outcome you want to measure)
- Left-hand-side variables (they appear on the left-hand side of a regression equation)
An independent variable is the variable you manipulate, control, or vary in an experimental study to explore its effects. It’s called “independent” because it’s not influenced by any other variables in the study.
Independent variables are also called:
- Explanatory variables (they explain an event or outcome)
- Predictor variables (they can be used to predict the value of a dependent variable)
- Right-hand-side variables (they appear on the right-hand side of a regression equation).
As a rule of thumb, questions related to thoughts, beliefs, and feelings work well in focus groups. Take your time formulating strong questions, paying special attention to phrasing. Be careful to avoid leading questions, which can bias your responses.
Overall, your focus group questions should be:
- Open-ended and flexible
- Impossible to answer with “yes” or “no” (questions that start with “why” or “how” are often best)
- Unambiguous, getting straight to the point while still stimulating discussion
- Unbiased and neutral
A structured interview is a data collection method that relies on asking questions in a set order to collect data on a topic. Structured interviews are often quantitative in nature. They are best used when:
- You already have a very clear understanding of your topic. Perhaps significant research has already been conducted, or you have done some prior research yourself, so you already possess a baseline for designing strong structured questions.
- You are constrained in terms of time or resources and need to analyze your data quickly and efficiently.
- Your research question depends on strong parity between participants, with environmental conditions held constant.
More flexible interview options include semi-structured interviews, unstructured interviews, and focus groups.
Social desirability bias is the tendency for interview participants to give responses that will be viewed favorably by the interviewer or other participants. It occurs in all types of interviews and surveys, but is most common in semi-structured interviews, unstructured interviews, and focus groups.
Social desirability bias can be mitigated by ensuring participants feel at ease and comfortable sharing their views. Make sure to pay attention to your own body language and any physical or verbal cues, such as nodding or widening your eyes.
This type of bias can also occur in observations if the participants know they’re being observed. They might alter their behavior accordingly.
The interviewer effect is a type of bias that emerges when a characteristic of an interviewer (race, age, gender identity, etc.) influences the responses given by the interviewee.
There is a risk of an interviewer effect in all types of interviews , but it can be mitigated by writing really high-quality interview questions.
A semi-structured interview is a blend of structured and unstructured types of interviews. Semi-structured interviews are best used when:
- You have prior interview experience. Spontaneous questions are deceptively challenging, and it’s easy to accidentally ask a leading question or make a participant uncomfortable.
- Your research question is exploratory in nature. Participant answers can guide future research questions and help you develop a more robust knowledge base for future research.
An unstructured interview is the most flexible type of interview, but it is not always the best fit for your research topic.
Unstructured interviews are best used when:
- You are an experienced interviewer and have a very strong background in your research topic, since it is challenging to ask spontaneous, colloquial questions.
- Your research question is exploratory in nature. While you may have developed hypotheses, you are open to discovering new or shifting viewpoints through the interview process.
- You are seeking descriptive data, and are ready to ask questions that will deepen and contextualize your initial thoughts and hypotheses.
- Your research depends on forming connections with your participants and making them feel comfortable revealing deeper emotions, lived experiences, or thoughts.
The four most common types of interviews are:
- Structured interviews: The questions are predetermined in both topic and order.
- Semi-structured interviews: A few questions are predetermined, but other questions aren’t planned.
- Unstructured interviews: None of the questions are predetermined.
- Focus group interviews: The questions are presented to a group instead of one individual.
Deductive reasoning is commonly used in scientific research, and it’s especially associated with quantitative research.
In research, you might have come across something called the hypothetico-deductive method. It’s the scientific method of testing hypotheses to check whether your predictions are substantiated by real-world data.
Deductive reasoning is a logical approach where you progress from general ideas to specific conclusions. It’s often contrasted with inductive reasoning, where you start with specific observations and form general conclusions.
Deductive reasoning is also called deductive logic.
There are many different types of inductive reasoning that people use formally or informally.
Here are a few common types:
- Inductive generalization: You use observations about a sample to come to a conclusion about the population it came from.
- Statistical generalization: You use specific numbers about samples to make statements about populations.
- Causal reasoning: You make cause-and-effect links between different things.
- Sign reasoning: You make a conclusion about a correlational relationship between different things.
- Analogical reasoning: You make a conclusion about something based on its similarities to something else.
Inductive reasoning is a bottom-up approach, while deductive reasoning is top-down.
Inductive reasoning takes you from the specific to the general, while in deductive reasoning, you make inferences by going from general premises to specific conclusions.
In inductive research, you start by making observations or gathering data. Then, you take a broad scan of your data and search for patterns. Finally, you make general conclusions that you might incorporate into theories.
Inductive reasoning is a method of drawing conclusions by going from the specific to the general. It’s usually contrasted with deductive reasoning, where you proceed from general information to specific conclusions.
Inductive reasoning is also called inductive logic or bottom-up reasoning.
A hypothesis states your predictions about what your research will find. It is a tentative answer to your research question that has not yet been tested. For some research projects, you might have to write several hypotheses that address different aspects of your research question.
A hypothesis is not just a guess — it should be based on existing theories and knowledge. It also has to be testable, which means you can support or refute it through scientific research methods (such as experiments, observations and statistical analysis of data).
Triangulation can help:
- Reduce research bias that comes from using a single method, theory, or investigator
- Enhance validity by approaching the same topic with different tools
- Establish credibility by giving you a complete picture of the research problem
But triangulation can also pose problems:
- It’s time-consuming and labor-intensive, often involving an interdisciplinary team.
- Your results may be inconsistent or even contradictory.
There are four main types of triangulation:
- Data triangulation: Using data from different times, spaces, and people
- Investigator triangulation: Involving multiple researchers in collecting or analyzing data
- Theory triangulation: Using varying theoretical perspectives in your research
- Methodological triangulation: Using different methodologies to approach the same topic
Many academic fields use peer review, largely to determine whether a manuscript is suitable for publication. Peer review enhances the credibility of the published manuscript.
However, peer review is also common in non-academic settings. The United Nations, the European Union, and many individual nations use peer review to evaluate grant applications. It is also widely used in medical and health-related fields as a teaching or quality-of-care measure.
Peer assessment is often used in the classroom as a pedagogical tool. Both receiving feedback and providing it are thought to enhance the learning process, helping students think critically and collaboratively.
Peer review can stop obviously problematic, falsified, or otherwise untrustworthy research from being published. It also represents an excellent opportunity to get feedback from renowned experts in your field. It acts as a first defense, helping you ensure your argument is clear and that there are no gaps, vague terms, or unanswered questions for readers who weren’t involved in the research process.
Peer-reviewed articles are considered a highly credible source due to the stringent process they go through before publication.
In general, the peer review process follows these steps:
- First, the author submits the manuscript to the editor.
- Next, the editor screens the manuscript and decides whether to:
  - Reject the manuscript and send it back to the author, or
  - Send it onward to the selected peer reviewer(s)
- Then, the peer review process occurs. The reviewer provides feedback, addressing any major or minor issues with the manuscript, and gives their advice regarding what edits should be made.
- Lastly, the edited manuscript is sent back to the author. They input the edits and resubmit it to the editor for publication.
Exploratory research is often used when the issue you’re studying is new or when the data collection process is challenging for some reason.
You can use exploratory research if you have a general idea or a specific question that you want to study but there is no preexisting knowledge or paradigm with which to study it.
Exploratory research is a methodological approach that explores research questions that have not previously been studied in depth. It is often used when the issue you’re studying is new, or when the data collection process is challenging in some way.
Explanatory research is used to investigate how or why a phenomenon occurs. Therefore, this type of research is often one of the first stages in the research process, serving as a jumping-off point for future research.
Exploratory research aims to explore the main aspects of an under-researched problem, while explanatory research aims to explain the causes and consequences of a well-defined problem.
Explanatory research is a research method used to investigate how or why something occurs when only a small amount of information is available pertaining to that topic. It can help you increase your understanding of a given topic.
Clean data are valid, accurate, complete, consistent, unique, and uniform. Dirty data include inconsistencies and errors.
Dirty data can come from any part of the research process, including poor research design, inappropriate measurement materials, or flawed data entry.
Data cleaning takes place between data collection and data analyses. But you can use some methods even before collecting data.
For clean data, you should start by designing measures that collect valid data. Data validation at the time of data entry or collection helps you minimize the amount of data cleaning you’ll need to do.
After data collection, you can use data standardization and data transformation to clean your data. You’ll also deal with any missing values, outliers, and duplicate values.
Every dataset requires different techniques to clean dirty data, but you need to address these issues in a systematic way. You focus on finding and resolving data points that don’t agree or fit with the rest of your dataset.
These data might include missing values, outliers, duplicates, incorrectly formatted entries, or irrelevant values. You’ll start by screening and diagnosing your data. Then, you’ll often standardize and accept or remove data to make your dataset consistent and valid.
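A minimal, purely illustrative cleaning pass in Python (the weight records in kg are made up) shows the screen-then-resolve idea for duplicates, missing values, and an implausible outlier:

```python
# Raw records with an exact duplicate, missing entries (None),
# and one implausible value that no adult could weigh.
raw = [70.2, 70.2, None, 65.1, 980.0, 58.4, 72.9, None, 61.7]

deduped = list(dict.fromkeys(raw))                 # drop exact duplicates, keep order
complete = [w for w in deduped if w is not None]   # resolve missing values by removal
clean = [w for w in complete if 30 <= w <= 250]    # range check screens out 980.0
```

In practice the resolution step depends on your design: a missing value might be imputed rather than dropped, and a flagged outlier might be corrected against the original records rather than removed.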
Data cleaning is necessary for valid and appropriate analyses. Dirty data contain inconsistencies or errors, but cleaning your data helps you minimize or resolve these.
Without data cleaning, you could end up with a Type I or II error in your conclusion. These types of erroneous conclusions can be practically significant with important consequences, because they lead to misplaced investments or missed opportunities.
Data cleaning involves spotting and resolving potential data inconsistencies or errors to improve your data quality. An error is any value (e.g., recorded weight) that doesn’t reflect the true value (e.g., actual weight) of something that’s being measured.
In this process, you review, analyze, detect, modify, or remove “dirty” data to make your dataset “clean.” Data cleaning is also called data cleansing or data scrubbing.
Research misconduct means making up or falsifying data, manipulating data analyses, or misrepresenting results in research reports. It’s a form of academic fraud.
These actions are committed intentionally and can have serious consequences; research misconduct is not a simple mistake or a point of disagreement but a serious ethical failure.
Anonymity means you don’t know who the participants are, while confidentiality means you know who they are but remove identifying information from your research report. Both are important ethical considerations.
You can only guarantee anonymity by not collecting any personally identifying information—for example, names, phone numbers, email addresses, IP addresses, physical characteristics, photos, or videos.
You can keep data confidential by using aggregate information in your research report, so that you only refer to groups of participants rather than individuals.
Research ethics matter for scientific integrity, human rights and dignity, and collaboration between science and society. These principles make sure that participation in studies is voluntary, informed, and safe.
Ethical considerations in research are a set of principles that guide your research designs and practices. These principles include voluntary participation, informed consent, anonymity, confidentiality, potential for harm, and results communication.
Scientists and researchers must always adhere to a certain code of conduct when collecting data from others.
These considerations protect the rights of research participants, enhance research validity, and maintain scientific integrity.
In multistage sampling, you can use probability or non-probability sampling methods.
For a probability sample, you have to conduct probability sampling at every stage.
You can mix it up by using simple random sampling, systematic sampling, or stratified sampling to select units at different stages, depending on what is applicable and relevant to your study.
Multistage sampling can simplify data collection when you have large, geographically spread samples, and you can obtain a probability sample without a complete sampling frame.
But multistage sampling may not lead to a representative sample, and larger samples are needed for multistage samples to achieve the statistical properties of simple random samples.
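The stage-by-stage narrowing can be sketched with a hypothetical hierarchy (states, cities within states, and households within cities, all invented here), using simple random sampling at every stage so the result is a probability sample:

```python
import random

rng = random.Random(7)

# Hypothetical hierarchy: 4 states, each with 5 cities of 20 households.
states = {
    f"state{s}": {f"city{s}-{c}": [f"hh{s}-{c}-{h}" for h in range(20)]
                  for c in range(5)}
    for s in range(4)
}

# Stage 1: randomly select 2 states.
# Stage 2: randomly select 2 cities within each chosen state.
# Stage 3: randomly select 5 households within each chosen city.
sample = []
for st in rng.sample(sorted(states), 2):
    for city in rng.sample(sorted(states[st]), 2):
        sample.extend(rng.sample(states[st][city], 5))
```

A complete sampling frame is only needed one stage at a time: you list households only for the cities you actually selected, which is what makes the method cheaper for geographically spread populations.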
These are four of the most common mixed methods designs:
- Convergent parallel: Quantitative and qualitative data are collected at the same time and analyzed separately. After both analyses are complete, compare your results to draw overall conclusions.
- Embedded: Quantitative and qualitative data are collected at the same time, but within a larger quantitative or qualitative design. One type of data is secondary to the other.
- Explanatory sequential: Quantitative data is collected and analyzed first, followed by qualitative data. You can use this design if you think your qualitative data will explain and contextualize your quantitative findings.
- Exploratory sequential: Qualitative data is collected and analyzed first, followed by quantitative data. You can use this design if you think the quantitative data will confirm or validate your qualitative findings.
Triangulation in research means using multiple datasets, methods, theories, and/or investigators to address a research question. It’s a research strategy that can help you enhance the validity and credibility of your findings.
Triangulation is mainly used in qualitative research, but it’s also commonly applied in quantitative research. Mixed methods research always uses triangulation.
In multistage sampling, or multistage cluster sampling, you draw a sample from a population using smaller and smaller groups at each stage.
This method is often used to collect data from a large, geographically spread group of people in national surveys, for example. You take advantage of hierarchical groupings (e.g., from state to city to neighborhood) to create a sample that’s less expensive and time-consuming to collect data from.
No, the steepness or slope of the line isn’t related to the correlation coefficient value. The correlation coefficient only tells you how closely your data fit on a line, so two datasets with the same correlation coefficient can have very different slopes.
To find the slope of the line, you’ll need to perform a regression analysis.
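A quick numeric illustration (with made-up data) makes the distinction concrete: multiplying every y value by 10 multiplies the regression slope by 10, but leaves Pearson’s r unchanged, because r only measures how tightly the points hug a line.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson's r from its definition."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def ols_slope(x, y):
    """Least-squares regression slope of y on x."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)

x = [1, 2, 3, 4, 5]
y1 = [2.0, 4.1, 5.9, 8.2, 9.8]   # slope near 2
y2 = [v * 10 for v in y1]        # slope near 20, same scatter pattern

r1, r2 = pearson_r(x, y1), pearson_r(x, y2)  # identical correlation
s1, s2 = ols_slope(x, y1), ols_slope(x, y2)  # very different slopes
```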
Correlation coefficients always range between -1 and 1.
The sign of the coefficient tells you the direction of the relationship: a positive value means the variables change together in the same direction, while a negative value means they change together in opposite directions.
The absolute value of a number is equal to the number without its sign. The absolute value of a correlation coefficient tells you the magnitude of the correlation: the greater the absolute value, the stronger the correlation.
These are the assumptions your data must meet if you want to use Pearson’s r:
- Both variables are on an interval or ratio level of measurement
- Data from both variables follow normal distributions
- Your data have no outliers
- Your data are from a random or representative sample
- You expect a linear relationship between the two variables
Quantitative research designs can be divided into two main categories:
- Correlational and descriptive designs are used to investigate characteristics, averages, trends, and associations between variables.
- Experimental and quasi-experimental designs are used to test causal relationships.
Qualitative research designs tend to be more flexible. Common types of qualitative design include case study, ethnography, and grounded theory designs.
A well-planned research design helps ensure that your methods match your research aims, that you collect high-quality data from credible sources, and that you use the right kind of analysis to answer your questions. This allows you to draw valid, trustworthy conclusions.
The priorities of a research design can vary depending on the field, but you usually have to specify:
- Your research questions and/or hypotheses
- Your overall approach (e.g., qualitative or quantitative)
- The type of design you’re using (e.g., a survey, experiment, or case study)
- Your sampling methods or criteria for selecting subjects
- Your data collection methods (e.g., questionnaires, observations)
- Your data collection procedures (e.g., operationalization, timing, and data management)
- Your data analysis methods (e.g., statistical tests or thematic analysis)
A research design is a strategy for answering your research question. It defines your overall approach and determines how you will collect and analyze data.
Questionnaires can be self-administered or researcher-administered.
Self-administered questionnaires can be delivered online or in paper-and-pen formats, in person or through mail. All questions are standardized so that all respondents receive the same questions with identical wording.
Researcher-administered questionnaires are interviews that take place by phone, in-person, or online between researchers and respondents. You can gain deeper insights by clarifying questions for respondents or asking follow-up questions.
You can organize the questions logically, with a clear progression from simple to complex, or randomize the order between respondents. A logical flow helps respondents process the questionnaire more easily and quickly, but it may lead to bias. Randomization can minimize bias from order effects.
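Per-respondent randomization can be implemented by seeding a shuffle with a respondent ID, so each respondent gets their own reproducible question order. The question texts and IDs below are hypothetical.

```python
import random

questions = ["Q1: age", "Q2: income", "Q3: satisfaction", "Q4: likelihood to recommend"]

def randomized_order(questions, respondent_id):
    """Return this respondent's own question order, to spread out order effects."""
    order = questions[:]
    random.Random(respondent_id).shuffle(order)  # seeded so the order is reproducible
    return order

form_a = randomized_order(questions, respondent_id=1)
form_b = randomized_order(questions, respondent_id=2)
```

Every respondent still sees exactly the same set of questions; only the order varies, which is what averages out order effects across the sample.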
Closed-ended, or restricted-choice, questions offer respondents a fixed set of choices to select from. These questions are easier to answer quickly.
Open-ended or long-form questions allow respondents to answer in their own words. Because there are no restrictions on their choices, respondents can answer in ways that researchers may not have otherwise considered.
A questionnaire is a data collection tool or instrument, while a survey is an overarching research method that involves collecting and analyzing data from people using questionnaires.
The third variable and directionality problems are two main reasons why correlation isn’t causation.
The third variable problem means that a confounding variable affects both variables to make them seem causally related when they are not.
The directionality problem is when two variables correlate and might actually have a causal relationship, but it’s impossible to conclude which variable causes changes in the other.
Correlation describes an association between variables : when one variable changes, so does the other. A correlation is a statistical indicator of the relationship between variables.
Causation means that changes in one variable bring about changes in the other (i.e., there is a cause-and-effect relationship between the variables). The two variables are correlated with each other, and there’s also a causal link between them.
While causation and correlation can exist simultaneously, correlation does not imply causation. In other words, correlation is simply a relationship in which A relates to B, but A doesn’t necessarily cause B to happen (or vice versa). Mistaking correlation for causation is a common error and can lead to the false cause fallacy.
Controlled experiments establish causality, whereas correlational studies only show associations between variables.
- In an experimental design, you manipulate an independent variable and measure its effect on a dependent variable. Other variables are controlled so they can’t impact the results.
- In a correlational design, you measure variables without manipulating any of them. You can test whether your variables change together, but you can’t be sure that one variable caused a change in another.
In general, correlational research is high in external validity while experimental research is high in internal validity.
A correlation is usually tested for two variables at a time, but you can test correlations between three or more variables.
A correlation coefficient is a single number that describes the strength and direction of the relationship between your variables.
Different types of correlation coefficients might be appropriate for your data based on their levels of measurement and distributions. The Pearson product-moment correlation coefficient (Pearson’s r) is commonly used to assess a linear relationship between two quantitative variables.
A correlational research design investigates relationships between two variables (or more) without the researcher controlling or manipulating any of them. It’s a non-experimental type of quantitative research.
A correlation reflects the strength and/or direction of the association between two or more variables.
- A positive correlation means that both variables change in the same direction.
- A negative correlation means that the variables change in opposite directions.
- A zero correlation means there’s no relationship between the variables.
Random error is almost always present in scientific studies, even in highly controlled settings. While you can’t eradicate it completely, you can reduce random error by taking repeated measurements, using a large sample, and controlling extraneous variables .
You can avoid systematic error through careful design of your sampling , data collection , and analysis procedures. For example, use triangulation to measure your variables using multiple methods; regularly calibrate instruments or procedures; use random sampling and random assignment ; and apply masking (blinding) where possible.
Systematic error is generally a bigger problem in research.
With random error, multiple measurements will tend to cluster around the true value. When you’re collecting data from a large sample , the errors in different directions will cancel each other out.
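This canceling-out of random error can be simulated directly. The sketch below (with an arbitrary true value and error spread) shows that the mean of many noisy measurements lands close to the true value:

```python
import random

random.seed(1)
true_value = 100.0

# Each measurement is the true value plus random error in either direction.
measurements = [true_value + random.gauss(0, 5) for _ in range(10_000)]

# With many measurements, positive and negative errors largely cancel,
# so the sample mean lands close to the true value.
estimate = sum(measurements) / len(measurements)
print(round(estimate, 1))
```

Note that a systematic error (e.g., adding a constant bias to every measurement) would not cancel out this way, no matter how large the sample.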
Systematic errors are much more problematic because they can skew your data away from the true value. This can lead you to false conclusions ( Type I and II errors ) about the relationship between the variables you’re studying.
Random and systematic error are two types of measurement error.
Random error is a chance difference between the observed and true values of something (e.g., a researcher misreading a weighing scale records an incorrect measurement).
Systematic error is a consistent or proportional difference between the observed and true values of something (e.g., a miscalibrated scale consistently records weights as higher than they actually are).
On graphs, the explanatory variable is conventionally placed on the x-axis, while the response variable is placed on the y-axis.
- If you have quantitative variables , use a scatterplot or a line graph.
- If your response variable is categorical, use a bar graph.
- If your explanatory variable is categorical, use a bar graph.
The term “explanatory variable” is sometimes preferred over “independent variable” because, in real-world contexts, independent variables are often influenced by other variables. This means they aren’t totally independent.
Multiple independent variables may also be correlated with each other, so “explanatory variables” is a more appropriate term.
The difference between explanatory and response variables is simple:
- An explanatory variable is the expected cause, and it explains the results.
- A response variable is the expected effect, and it responds to other variables.
In a controlled experiment , all extraneous variables are held constant so that they can’t influence the results. Controlled experiments require:
- A control group that receives a standard treatment, a fake treatment, or no treatment.
- Random assignment of participants to ensure the groups are equivalent.
Depending on your study topic, there are various other methods of controlling variables .
There are 4 main types of extraneous variables :
- Demand characteristics : environmental cues that encourage participants to conform to researchers’ expectations.
- Experimenter effects : unintentional actions by researchers that influence study outcomes.
- Situational variables : environmental variables that alter participants’ behaviors.
- Participant variables : any characteristic or aspect of a participant’s background that could affect study results.
An extraneous variable is any variable that you’re not investigating that can potentially affect the dependent variable of your research study.
A confounding variable is a type of extraneous variable that not only affects the dependent variable, but is also related to the independent variable.
In a factorial design, multiple independent variables are tested.
If you test two variables, each level of one independent variable is combined with each level of the other independent variable to create different conditions.
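Crossing the levels of two independent variables is a Cartesian product, so the full set of conditions can be generated directly. The variables and levels below are hypothetical examples:

```python
from itertools import product

# Two hypothetical independent variables with their levels.
caffeine = ["none", "low", "high"]   # IV 1: 3 levels
sleep = ["4 hours", "8 hours"]       # IV 2: 2 levels

# Crossing every level of one IV with every level of the other
# yields all conditions of this 3 x 2 factorial design.
conditions = list(product(caffeine, sleep))
print(len(conditions))  # 6 conditions
```

In general, a design with a levels of one factor and b levels of another has a × b conditions.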
Within-subjects designs have many potential threats to internal validity , but they are also very statistically powerful .
Advantages:
- Only requires small samples
- Statistically powerful
- Removes the effects of individual differences on the outcomes
Disadvantages:
- Internal validity threats reduce the likelihood of establishing a direct relationship between variables
- Time-related effects, such as growth, can influence the outcomes
- Carryover effects mean that the specific order of different treatments affects the outcomes
While a between-subjects design has fewer threats to internal validity , it also requires more participants for high statistical power than a within-subjects design .
Advantages:
- Prevents carryover effects of learning and fatigue.
- Shorter study duration.
Disadvantages:
- Needs larger samples for high power.
- Uses more resources to recruit participants, administer sessions, cover costs, etc.
- Individual differences may be an alternative explanation for results.
Yes. Between-subjects and within-subjects designs can be combined in a single study when you have two or more independent variables (a factorial design). In a mixed factorial design, one variable is altered between subjects and another is altered within subjects.
In a between-subjects design , every participant experiences only one condition, and researchers assess group differences between participants in various conditions.
In a within-subjects design , each participant experiences all conditions, and researchers test the same participants repeatedly for differences between conditions.
The word “between” means that you’re comparing different conditions between groups, while the word “within” means you’re comparing different conditions within the same group.
Random assignment is used in experiments with a between-groups or independent measures design. In this research design, there’s usually a control group and one or more experimental groups. Random assignment helps ensure that the groups are comparable.
In general, you should always use random assignment in this type of experimental design when it is ethically possible and makes sense for your study topic.
To implement random assignment , assign a unique number to every member of your study’s sample .
Then, you can use a random number generator or a lottery method to randomly assign each number to a control or experimental group. You can also do so manually, by flipping a coin or rolling a die to randomly assign participants to groups.
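A software stand-in for the lottery method is to shuffle the numbered participants and split the list. The sample size below is an arbitrary example:

```python
import random

random.seed(42)

# A hypothetical numbered sample of 20 participants.
participants = list(range(1, 21))

# Shuffle, then split in half: every participant is equally likely
# to end up in either group.
random.shuffle(participants)
control = participants[:10]
treatment = participants[10:]

print(sorted(control))
print(sorted(treatment))
```

Because assignment is driven purely by chance, the two groups should be probabilistically equivalent on every background characteristic.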
Random selection, or random sampling , is a way of selecting members of a population for your study’s sample.
In contrast, random assignment is a way of sorting the sample into control and experimental groups.
Random sampling enhances the external validity or generalizability of your results, while random assignment improves the internal validity of your study.
“Controlling for a variable” means measuring extraneous variables and accounting for them statistically to remove their effects on other variables.
Researchers often model control variable data along with independent and dependent variable data in regression analyses and ANCOVAs . That way, you can isolate the control variable’s effects from the relationship between the variables of interest.
Control variables help you establish a correlational or causal relationship between variables by enhancing internal validity .
If you don’t control relevant extraneous variables , they may influence the outcomes of your study, and you may not be able to demonstrate that your results are really an effect of your independent variable .
A control variable is any variable that’s held constant in a research study. It’s not a variable of interest in the study, but it’s controlled because it could influence the outcomes.
Including mediators and moderators in your research helps you go beyond studying a simple relationship between two variables for a fuller picture of the real world. They are important to consider when studying complex correlational or causal relationships.
Mediators are part of the causal pathway of an effect, and they tell you how or why an effect takes place. Moderators usually help you judge the external validity of your study by identifying the limitations of when the relationship between variables holds.
If something is a mediating variable :
- It’s caused by the independent variable .
- It influences the dependent variable.
- When it’s taken into account, the statistical correlation between the independent and dependent variables is weaker than when it isn’t considered.
A confounder is a third variable that affects variables of interest and makes them seem related when they are not. In contrast, a mediator is the mechanism of a relationship between two variables: it explains the process by which they are related.
A mediator variable explains the process through which two variables are related, while a moderator variable affects the strength and direction of that relationship.
There are three key steps in systematic sampling :
- Define and list your population , ensuring that it is not arranged in a cyclical or periodic pattern.
- Decide on your sample size and calculate your interval, k , by dividing the population size by your target sample size.
- Choose every kth member of the population as your sample.
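The three steps above can be sketched in a few lines. The population size and target sample size here are arbitrary, and a random starting point within the first interval is used so that every member has a chance of selection:

```python
import random

random.seed(0)

population = [f"person_{i}" for i in range(1, 101)]  # N = 100 (hypothetical)
n = 20                                               # target sample size
k = len(population) // n                             # interval k = 100 / 20 = 5

# Pick a random start within the first interval, then take every kth member.
start = random.randrange(k)
sample = population[start::k]
print(len(sample))  # 20
```

If the list happens to be ordered cyclically with a period related to k, this procedure would produce a biased sample, which is why step 1 matters.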
Systematic sampling is a probability sampling method where researchers select members of the population at a regular interval – for example, by selecting every 15th person on a list of the population. If the population is in a random order, this can imitate the benefits of simple random sampling .
Yes, you can create a stratified sample using multiple characteristics, but you must ensure that every participant in your study belongs to one and only one subgroup. In this case, you multiply the numbers of subgroups for each characteristic to get the total number of groups.
For example, if you were stratifying by location with three subgroups (urban, rural, or suburban) and marital status with five subgroups (single, divorced, widowed, married, or partnered), you would have 3 x 5 = 15 subgroups.
You should use stratified sampling when your sample can be divided into mutually exclusive and exhaustive subgroups that you believe will take on different mean values for the variable that you’re studying.
Using stratified sampling will allow you to obtain more precise (with lower variance ) statistical estimates of whatever you are trying to measure.
For example, say you want to investigate how income differs based on educational attainment, but you know that this relationship can vary based on race. Using stratified sampling, you can ensure you obtain a large enough sample from each racial group, allowing you to draw more precise conclusions.
In stratified sampling , researchers divide subjects into subgroups called strata based on characteristics that they share (e.g., race, gender, educational attainment).
Once divided, each subgroup is randomly sampled using another probability sampling method.
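As a minimal sketch of this two-step procedure, the snippet below groups hypothetical subjects into strata by educational attainment and then draws a proportional (10%) simple random sample from each stratum. All labels and counts are invented:

```python
import random

random.seed(7)

# Hypothetical subjects tagged with a stratum label (educational attainment).
subjects = ([("HS", i) for i in range(60)] +
            [("BA", i) for i in range(30)] +
            [("Grad", i) for i in range(10)])

# Step 1: divide subjects into strata based on the shared characteristic.
strata = {}
for level, subject_id in subjects:
    strata.setdefault(level, []).append(subject_id)

# Step 2: draw a proportional simple random sample within each stratum.
sample = {level: random.sample(ids, k=len(ids) // 10)
          for level, ids in strata.items()}
print({level: len(ids) for level, ids in sample.items()})
```

Sampling within every stratum guarantees that even small subgroups (here, “Grad”) are represented in the final sample.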
Cluster sampling is more time- and cost-efficient than other probability sampling methods , particularly when it comes to large samples spread across a wide geographical area.
However, it provides less statistical certainty than other methods, such as simple random sampling , because it is difficult to ensure that your clusters properly represent the population as a whole.
There are three types of cluster sampling : single-stage, double-stage and multi-stage clustering. In all three types, you first divide the population into clusters, then randomly select clusters for use in your sample.
- In single-stage sampling , you collect data from every unit within the selected clusters.
- In double-stage sampling , you select a random sample of units from within the clusters.
- In multi-stage sampling , you repeat the procedure of randomly sampling elements from within the clusters until you have reached a manageable sample.
Cluster sampling is a probability sampling method in which you divide a population into clusters, such as districts or schools, and then randomly select some of these clusters as your sample.
The clusters should ideally each be mini-representations of the population as a whole.
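Single-stage cluster sampling, the simplest of the three types, can be sketched as follows. The clusters (schools) and their sizes are hypothetical:

```python
import random

random.seed(3)

# A hypothetical population organized into clusters (e.g., schools).
clusters = {f"school_{c}": [f"student_{c}_{i}" for i in range(25)]
            for c in range(12)}

# Single-stage cluster sampling: randomly select whole clusters,
# then collect data from every unit inside the selected clusters.
chosen = random.sample(list(clusters), k=3)
sample = [unit for name in chosen for unit in clusters[name]]
print(len(sample))  # 3 clusters x 25 students = 75
```

Double-stage sampling would add a `random.sample` step over the units within each chosen cluster instead of taking them all.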
If properly implemented, simple random sampling is usually the best sampling method for ensuring both internal and external validity . However, it can sometimes be impractical and expensive to implement, depending on the size of the population to be studied.
If you have a list of every member of the population and the ability to reach whichever members are selected, you can use simple random sampling.
The American Community Survey is an example of simple random sampling . In order to collect detailed data on the population of the US, Census Bureau officials randomly select 3.5 million households per year and use a variety of methods to convince them to fill out the survey.
Simple random sampling is a type of probability sampling in which the researcher randomly selects a subset of participants from a population . Each member of the population has an equal chance of being selected. Data is then collected from as large a percentage as possible of this random subset.
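Given a complete list of the population, simple random sampling is a one-liner in most languages. The population below is hypothetical:

```python
import random

random.seed(11)

# A list of every member of the (hypothetical) population.
population = [f"household_{i}" for i in range(1, 1001)]

# Simple random sampling: each member has an equal chance of being
# selected, and random.sample draws without replacement.
sample = random.sample(population, k=50)
print(len(sample))  # 50 distinct households
```

Note that this requires a full sampling frame (the complete list); when no such list exists, methods like cluster sampling are often used instead.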
Quasi-experimental design is most useful in situations where it would be unethical or impractical to run a true experiment .
Quasi-experiments have lower internal validity than true experiments, but they often have higher external validity as they can use real-world interventions instead of artificial laboratory settings.
A quasi-experiment is a type of research design that attempts to establish a cause-and-effect relationship. The main difference with a true experiment is that the groups are not randomly assigned.
Blinding is important to reduce research bias (e.g., observer bias , demand characteristics ) and ensure a study’s internal validity .
If participants know whether they are in a control or treatment group , they may adjust their behavior in ways that affect the outcome that researchers are trying to measure. If the people administering the treatment are aware of group assignment, they may treat participants differently and thus directly or indirectly influence the final results.
- In a single-blind study , only the participants are blinded.
- In a double-blind study , both participants and experimenters are blinded.
- In a triple-blind study , the assignment is hidden not only from participants and experimenters, but also from the researchers analyzing the data.
Blinding means hiding who is assigned to the treatment group and who is assigned to the control group in an experiment .
A true experiment (a.k.a. a controlled experiment) always includes at least one control group that doesn’t receive the experimental treatment.
However, some experiments use a within-subjects design to test treatments without a control group. In these designs, you usually compare one group’s outcomes before and after a treatment (instead of comparing outcomes between different groups).
For strong internal validity , it’s usually best to include a control group if possible. Without a control group, it’s harder to be certain that the outcome was caused by the experimental treatment and not by other variables.
An experimental group, also known as a treatment group, receives the treatment whose effect researchers wish to study, whereas a control group does not. They should be identical in all other ways.
Individual Likert-type questions are generally considered ordinal data , because the items have clear rank order, but don’t have an even distribution.
Overall Likert scale scores are sometimes treated as interval data. These scores are considered to have directionality and even spacing between them.
The type of data determines what statistical tests you should use to analyze your data.
A Likert scale is a rating scale that quantitatively assesses opinions, attitudes, or behaviors. It is made up of 4 or more questions that measure a single attitude or trait when response scores are combined.
To use a Likert scale in a survey , you present participants with Likert-type questions or statements along with a continuum of response options, usually 5 or 7, to capture their degree of agreement.
In scientific research, concepts are the abstract ideas or phenomena that are being studied (e.g., educational achievement). Variables are properties or characteristics of the concept (e.g., performance at school), while indicators are ways of measuring or quantifying variables (e.g., yearly grade reports).
The process of turning abstract concepts into measurable variables and indicators is called operationalization .
There are various approaches to qualitative data analysis , but they all share five steps in common:
- Prepare and organize your data.
- Review and explore your data.
- Develop a data coding system.
- Assign codes to the data.
- Identify recurring themes.
The specifics of each step depend on the focus of the analysis. Some common approaches include textual analysis , thematic analysis , and discourse analysis .
There are five common approaches to qualitative research :
- Grounded theory involves collecting data in order to develop new theories.
- Ethnography involves immersing yourself in a group or organization to understand its culture.
- Narrative research involves interpreting stories to understand how people make sense of their experiences and perceptions.
- Phenomenological research involves investigating phenomena through people’s lived experiences.
- Action research links theory and practice in several cycles to drive innovative changes.
Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses , by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.
Operationalization means turning abstract conceptual ideas into measurable observations.
For example, the concept of social anxiety isn’t directly observable, but it can be operationally defined in terms of self-rating scores, behavioral avoidance of crowded places, or physical anxiety symptoms in social situations.
Before collecting data , it’s important to consider how you will operationalize the variables that you want to measure.
When conducting research, collecting original data has significant advantages:
- You can tailor data collection to your specific research aims (e.g. understanding the needs of your consumers or user testing your website)
- You can control and standardize the process for high reliability and validity (e.g. choosing appropriate measurements and sampling methods )
However, there are also some drawbacks: data collection can be time-consuming, labor-intensive and expensive. In some cases, it’s more efficient to use secondary data that has already been collected by someone else, but the data might be less reliable.
Data collection is the systematic process by which observations or measurements are gathered in research. It is used in many different contexts by academics, governments, businesses, and other organizations.
There are several methods you can use to decrease the impact of confounding variables on your research: restriction, matching, statistical control and randomization.
In restriction , you restrict your sample by only including certain subjects that have the same values of potential confounding variables.
In matching , you match each of the subjects in your treatment group with a counterpart in the comparison group. The matched subjects have the same values on any potential confounding variables, and only differ in the independent variable .
In statistical control , you include potential confounders as variables in your regression .
In randomization , you randomly assign the treatment (or independent variable) in your study to a sufficiently large number of subjects, which allows you to control for all potential confounding variables.
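Statistical control via regression can be illustrated with simulated data. In the sketch below (all coefficients and distributions are invented), a confounder influences both the independent variable x and the outcome y; including the confounder as a regressor recovers the true effect of x:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

# Simulated confounder that influences both the treatment x and outcome y.
confounder = rng.normal(size=n)
x = 0.8 * confounder + rng.normal(size=n)
y = 2.0 * x + 3.0 * confounder + rng.normal(size=n)

# Statistical control: include the confounder as a regressor so its
# effect is separated from the effect of x. Columns: intercept, x, confounder.
X = np.column_stack([np.ones(n), x, confounder])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(round(coef[1], 2))  # estimated effect of x, close to the true 2.0
```

Regressing y on x alone would mix in the confounder’s effect and overestimate the coefficient, which is exactly the distortion described above.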
A confounding variable is closely related to both the independent and dependent variables in a study. An independent variable represents the supposed cause , while the dependent variable is the supposed effect . A confounding variable is a third variable that influences both the independent and dependent variables.
Failing to account for confounding variables can cause you to wrongly estimate the relationship between your independent and dependent variables.
To ensure the internal validity of your research, you must consider the impact of confounding variables. If you fail to account for them, you might over- or underestimate the causal relationship between your independent and dependent variables , or even find a causal relationship where none exists.
Yes, but including more than one of either type requires multiple research questions .
For example, if you are interested in the effect of a diet on health, you can use multiple measures of health: blood sugar, blood pressure, weight, pulse, and many more. Each of these is its own dependent variable with its own research question.
You could also choose to look at the effect of exercise levels as well as diet, or even the additional effect of the two combined. Each of these is a separate independent variable .
To ensure the internal validity of an experiment , you should only change one independent variable at a time.
No. The value of a dependent variable depends on an independent variable, so a variable cannot be both independent and dependent at the same time. It must be either the cause or the effect, not both!
You want to find out how blood sugar levels are affected by drinking diet soda and regular soda, so you conduct an experiment .
- The type of soda – diet or regular – is the independent variable .
- The level of blood sugar that you measure is the dependent variable – it changes depending on the type of soda.
Determining cause and effect is one of the most important parts of scientific research. It’s essential to know which is the cause – the independent variable – and which is the effect – the dependent variable.
In non-probability sampling , the sample is selected based on non-random criteria, and not every member of the population has a chance of being included.
Common non-probability sampling methods include convenience sampling , voluntary response sampling, purposive sampling , snowball sampling, and quota sampling .
Probability sampling means that every member of the target population has a known chance of being included in the sample.
Probability sampling methods include simple random sampling , systematic sampling , stratified sampling , and cluster sampling .
Using careful research design and sampling procedures can help you avoid sampling bias . Oversampling can be used to correct undercoverage bias .
Some common types of sampling bias include self-selection bias , nonresponse bias , undercoverage bias , survivorship bias , pre-screening or advertising bias, and healthy user bias.
Sampling bias is a threat to external validity – it limits the generalizability of your findings to a broader group of people.
A sampling error is the difference between a population parameter and a sample statistic .
A statistic refers to measures about the sample , while a parameter refers to measures about the population .
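The parameter/statistic distinction and the resulting sampling error can be seen in a quick simulation (the population values are invented):

```python
import random

random.seed(5)

# The full (hypothetical) population and its parameter: the true mean.
population = [random.gauss(50, 10) for _ in range(100_000)]
parameter = sum(population) / len(population)

# A statistic: the mean computed from one random sample of the population.
sample = random.sample(population, k=200)
statistic = sum(sample) / len(sample)

# The sampling error is the gap between the statistic and the parameter.
sampling_error = statistic - parameter
print(round(sampling_error, 2))
```

Rerunning with a different seed gives a different sampling error each time; larger samples make the typical error smaller.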
Populations are used when a research question requires data from every member of the population. This is usually only feasible when the population is small and easily accessible.
Samples are used to make inferences about populations . Samples are easier to collect data from because they are practical, cost-effective, convenient, and manageable.
There are seven threats to external validity : selection bias , history, experimenter effect, Hawthorne effect , testing effect, aptitude-treatment interaction, and situation effect.
The two types of external validity are population validity (whether you can generalize to other groups of people) and ecological validity (whether you can generalize to other situations and settings).
The external validity of a study is the extent to which you can generalize your findings to different groups of people, situations, and measures.
Cross-sectional studies cannot establish a cause-and-effect relationship or analyze behavior over a period of time. To investigate cause and effect, you need to do a longitudinal study or an experimental study .
Cross-sectional studies are less expensive and time-consuming than many other types of study. They can provide useful insights into a population’s characteristics and identify correlations for further research.
Sometimes only cross-sectional data is available for analysis; other times your research question may only require a cross-sectional study to answer it.
Longitudinal studies can last anywhere from weeks to decades, although they tend to be at least a year long.
The 1970 British Cohort Study , which has collected data on the lives of 17,000 Brits since their births in 1970, is one well-known example of a longitudinal study .
Longitudinal studies are better to establish the correct sequence of events, identify changes over time, and provide insight into cause-and-effect relationships, but they also tend to be more expensive and time-consuming than other types of studies.
Longitudinal studies and cross-sectional studies are two different types of research design . In a cross-sectional study you collect data from a population at a specific point in time; in a longitudinal study you repeatedly collect data from the same sample over an extended period of time.
There are eight threats to internal validity : history, maturation, instrumentation, testing, selection bias , regression to the mean, social interaction and attrition .
Internal validity is the extent to which you can be confident that a cause-and-effect relationship established in a study cannot be explained by other factors.
In mixed methods research , you use both qualitative and quantitative data collection and analysis methods to answer your research question .
The research methods you use depend on the type of data you need to answer your research question .
- If you want to measure something or test a hypothesis , use quantitative methods . If you want to explore ideas, thoughts and meanings, use qualitative methods .
- If you want to analyze a large amount of readily-available data, use secondary data. If you want data specific to your purposes with control over how it is generated, collect primary data.
- If you want to establish cause-and-effect relationships between variables , use experimental methods. If you want to understand the characteristics of a research subject, use descriptive methods.
A confounding variable , also called a confounder or confounding factor, is a third variable in a study examining a potential cause-and-effect relationship.
A confounding variable is related to both the supposed cause and the supposed effect of the study. It can be difficult to separate the true effect of the independent variable from the effect of the confounding variable.
In your research design , it’s important to identify potential confounding variables and plan how you will reduce their impact.
Discrete and continuous variables are two types of quantitative variables :
- Discrete variables represent counts (e.g. the number of objects in a collection).
- Continuous variables represent measurable amounts (e.g. water volume or weight).
Quantitative variables are any variables where the data represent amounts (e.g. height, weight, or age).
Categorical variables are any variables where the data represent groups. This includes rankings (e.g. finishing places in a race), classifications (e.g. brands of cereal), and binary outcomes (e.g. coin flips).
You need to know what type of variables you are working with to choose the right statistical test for your data and interpret your results .
You can think of independent and dependent variables in terms of cause and effect: an independent variable is the variable you think is the cause , while a dependent variable is the effect .
In an experiment, you manipulate the independent variable and measure the outcome in the dependent variable. For example, in an experiment about the effect of nutrients on crop growth:
- The independent variable is the amount of nutrients added to the crop field.
- The dependent variable is the biomass of the crops at harvest time.
Defining your variables, and deciding how you will manipulate and measure them, is an important part of experimental design .
Experimental design means planning a set of procedures to investigate a relationship between variables . To design a controlled experiment, you need:
- A testable hypothesis
- At least one independent variable that can be precisely manipulated
- At least one dependent variable that can be precisely measured
When designing the experiment, you decide:
- How you will manipulate the variable(s)
- How you will control for any potential confounding variables
- How many subjects or samples will be included in the study
- How subjects will be assigned to treatment levels
Experimental design is essential to the internal and external validity of your experiment.
Internal validity is the degree of confidence that the causal relationship you are testing is not influenced by other factors or variables .
External validity is the extent to which your results can be generalized to other contexts.
The validity of your experiment depends on your experimental design .
Reliability and validity are both about how well a method measures something:
- Reliability refers to the consistency of a measure (whether the results can be reproduced under the same conditions).
- Validity refers to the accuracy of a measure (whether the results really do represent what they are supposed to measure).
If you are doing experimental research, you also have to consider the internal and external validity of your experiment.
A sample is a subset of individuals from a larger population . Sampling means selecting the group that you will actually collect data from in your research. For example, if you are researching the opinions of students in your university, you could survey a sample of 100 students.
In statistics, sampling allows you to test a hypothesis about the characteristics of a population.
Quantitative research deals with numbers and statistics, while qualitative research deals with words and meanings.
Quantitative methods allow you to systematically measure variables and test hypotheses . Qualitative methods allow you to explore concepts and experiences in more detail.
Methodology refers to the overarching strategy and rationale of your research project . It involves studying the methods used in your field and the theories or principles behind them, in order to develop an approach that matches your objectives.
Methods are the specific tools and procedures you use to collect and analyze data (for example, experiments, surveys , and statistical tests ).
In shorter scientific papers, where the aim is to report the findings of a specific study, you might simply describe what you did in a methods section .
In a longer or more complex research project, such as a thesis or dissertation , you will probably include a methodology section , where you explain your approach to answering the research questions and cite relevant sources to support your choice of methods.
Visit CI Central | Visit Our Continuous Improvement Store
- [email protected]
Last updated by Jeff Hajek on December 22, 2020
An assignable cause is a type of variation in which a specific activity or event can be linked to inconsistency in a system. In effect, it is a special cause that has been identified.
As a refresher, common cause variation is the natural fluctuation within a system. It comes from the inherent randomness in the world. The impact of this form of variation can be predicted by statistical means. Special cause variation, on the other hand, falls outside of statistical expectations; it shows up as outliers in the data.
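The distinction can be sketched with a Shewhart-style 3-sigma control check. The cycle-time data below are invented for illustration, with one injected outlier standing in for a special cause; the in-control baseline used to set the limits is also an assumption, not anything from the original text.

```python
from statistics import mean, stdev

# Hypothetical cycle times (seconds) for a machining step.
# The 9.8 is an injected outlier standing in for a special cause.
times = [5.1, 5.0, 4.9, 5.2, 5.0, 4.8, 5.1, 9.8, 5.0, 4.9, 5.1, 5.0]

# Estimate the common cause variation from a baseline of points
# known (assumed here) to be in control.
baseline = times[:6]
center = mean(baseline)
sigma = stdev(baseline)

# 3-sigma limits: points inside are treated as common cause noise;
# points outside are flagged as special (assignable) cause candidates.
lower, upper = center - 3 * sigma, center + 3 * sigma

flagged = [(i, t) for i, t in enumerate(times) if not lower <= t <= upper]
print(flagged)  # → [(7, 9.8)]
```

Estimating the limits from a clean baseline matters: if the outlier were included in the sigma calculation, it would inflate the limits and partly mask itself.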
Variation is the bane of continuous improvement. It decreases productivity and increases lead time. It makes processes harder to manage.
While we can do something about common cause variation, there is typically far more bang for the buck in attacking special causes. Reducing common cause variation might require, for example, replacing a machine to eliminate a few seconds of variation in cutting time. A special cause on the same machine might be weld spatter from a previous process: the irregularities in a surface might make a part fit into a fixture incorrectly and require time-consuming rework. Common causes tend to be systemic and require large overhauls; special causes tend to be isolated to a single process step.
The first step in removing special causes is identifying them. In effect, you turn them into assignable causes. Once a source of variation is identified, it simply becomes a matter of devoting resources to resolve the problem.
One of the problems with continuous improvement is that the language can be murky at times. You may find that some people use special causes and assignable causes interchangeably. Special cause is a far more common term, though.
I prefer assignable cause, as it creates an important mental distinction. It implies that you…