Exploring Student Sensemaking When Engaging with Anomalous Data
ABSTRACT
In undergraduate research settings, students are likely to encounter anomalous data, that is, data that do not meet their expectations. Most of the research that directly or indirectly captures the role of anomalous data in research settings uses post-hoc reflective interviews or surveys. These data collection approaches focus on recall of past reasoning, rather than analyzing reasoning about anomalous data as it happens. We use the frameworks of sensemaking and epistemological resources to explore in-the-moment how students identify, generate ideas about the cause of, and determine what to do with anomalies. Students participated in think-aloud interviews where they interacted with anomalous data within larger datasets. Interviews were qualitatively analyzed to identify epistemological resources students used when interacting with anomalous data, and how students’ reasoning influenced later choices with the data. Results showed that students use a variety of resources as they sensemake about anomalous data and determine what to do with the anomalies. Furthermore, the explanation that students generate about the cause of an anomaly impacts whether the student chooses to keep, remove, recollect, or mitigate the anomalous data. Findings highlight the need to understand students’ complex reasoning around anomalous data to support students in lab settings.
INTRODUCTION
Inquiry-based undergraduate lab courses can provide students with opportunities to grapple with experimental data. In this setting, students have expectations about the data they collect based on the course structure, their research questions, and hypotheses, among other factors. Consequently, students often encounter anomalous data, which we define as data that do not meet the students’ expectations. Sometimes when this occurs, students begin to wonder about the cause. Perhaps the anomaly is rooted in the experimental procedures, or maybe it reflects something about the underlying conceptual phenomenon. This wondering can generate an explanation of the anomaly's cause. Built into this explanation might be other considerations, such as the experimental goals, students’ beliefs about doing science, and course requirements, all of which influence how students handle anomalous data. Furthermore, throughout this process of working with anomalous data, the students may also use a variety of knowledge resources related to their conceptual understanding of the biological phenomenon, their understanding of the experiment, and other resources related to personal or course learning goals.
The goal of this article is to explore the nuances of students’ interactions with anomalous data, specifically with anomalies that have underlying technical causes. We are interested in the process of identifying anomalous data, reasoning about what caused the anomaly, and how students determine what to do with the anomalous data. We used the frameworks of sensemaking (Odden and Russ, 2019) and epistemological resources (Hammer and Elby, 2002) to explore how students reasoned with anomalous data in think-aloud interviews that simulated the inquiry-lab environment. These complementary frameworks provide insight into what experiences and prior knowledge students bring to their sensemaking, which is then beneficial to understanding what choices students make when encountering anomalies with underlying technical causes.
STUDENT REASONING WITH ANOMALOUS DATA
The field of Biology Education Research (BER) has been prioritizing engaging undergraduate students in research experiences that emphasize scientific practices (AAAS, 2011). Importantly, in these research experiences, such as Course-based Undergraduate Research Experiences (CUREs) and inquiry-based labs, students have significant opportunities to interact with authentic data that they generate themselves (Auchincloss et al., 2014; Kjelvik & Schultheis, 2019). Students are then more likely to encounter anomalous data in these research experiences than when following meticulously designed lab protocols. Anomalous data can be encountered for a variety of reasons, such as inconsistencies between the student's knowledge and the phenomenon they are investigating, along with various technical causes such as equipment malfunction or measurement errors. When this occurs, students may reason about the anomalous data and then figure out the next steps for their analysis or experiment. In so doing, students may engage in a variety of reasoning processes about their data collection process, among other dimensions of the labs. Here we presume that these reasoning processes commonly occur in undergraduate labs, but as will be discussed, reasoning surrounding anomalous data is not often emphasized in either instruction or the BER literature.
To investigate how students interact with anomalous data, we draw from the vast literature on CUREs, which has emphasized students working with data broadly in the context of other scientific practices but has paid less attention to anomalous data. A body of work has focused on evaluating whether students learn various scientific skills in these settings (e.g., Brownell et al., 2015; Hester et al., 2018; Killpack et al., 2020; Callahan et al., 2022). One relevant line of work has evaluated the impact of a particular CURE on students’ scientific thinking and data analysis. Within this work, the tendency has been to focus on assessing students’ abilities to analyze data at the end of courses (Brownell et al., 2015). Another line of work has implemented interviews with students who have engaged with data in CURE settings (Gin et al., 2018), but anomalous data were not the focus of the analysis, even though they may have been embedded in the content of students’ interviews. A different line of work highlights students’ binary judgment of data but overlooks more nuanced interpretations of data (Bolger et al., 2021). Across these studies, data are valued in CUREs, and sometimes students are interacting with anomalous data, but the details of how students engage with anomalies are not the emphasis of the research. Furthermore, across this literature, the research tends to focus on students’ post-hoc recollections of interacting with data, not their in-the-moment reasoning process. Students’ recollections could be limited by not remembering everything that occurred, as they might skip or filter potentially crucial ideas for any number of reasons.
In the broader Science Education literature, there are other instances of empirical studies capturing student reasoning about anomalous data that also often use students’ recollection rather than more authentic in-the-moment reasoning processes. An older article explored the different kinds of responses students had to anomalous data (Chinn and Brewer, 1998), but this research study presented the anomalous data in vignettes involving theories about dinosaurs, rather than numerical data. Extending Chinn and Brewer's work, Lin (2007) examined students handling anomalous data in written chemistry lab reports but overlooked the authentic in-the-moment reasoning with anomalous data behind the reports. Using a different approach, Johansen and Christensen (2020) explicitly asked students to recall what they do with anomalous data in labs via a survey, and they documented that students recall discarding anomalous data or redoing experiments to achieve “correct” data. Closer to our work, Dunbar (1993) looked at students’ reasoning in the moment and found that students’ goals to confirm their hypothesis or explain conflicting data impacted how they incorporated conflicting data. Crujeiras-Pérez and Jiménez-Aleixandre (2019) explored high school students’ in-the-moment reasoning while enacting experiments and documented a modest improvement over time in students noticing anomalies. Across this body of research, there is strong evidence that students encounter anomalous data across science courses. Still, there is less information about the details of students’ reasoning about anomalous data in-the-moment.
Here we aim to explore how students reason about anomalous data in-the-moment, as it can shed insights into the details embedded within these reasoning processes and potentially might be more realistic than a student's recollections. Furthermore, by unpacking students’ reasoning in-the-moment, we might gain insights that can support instructional scaffolds down the road that are tailored to fit how they reason about anomalous data. We draw from two theoretical frameworks in Physics Education Research (PER) that have successfully been used to capture how students engage in reasoning and problem-solving in a variety of educational settings, including lab courses. The frameworks of sensemaking (Odden and Russ, 2019) and epistemological resources (Hammer and Elby, 2002) can be used to highlight multiple steps within the reasoning process, and what knowledge resources students employ during that process. These frameworks allow us to capture how students’ knowledge resources interact in the process of identifying the anomalous data, generating ideas about what caused the anomaly, and how students determine what to do about the anomaly. Thus, our research questions are: what resources do students use throughout this process, and how do they use those resources?
THEORETICAL FRAMEWORKS OF SENSEMAKING AND EPISTEMOLOGICAL RESOURCES
The framework of sensemaking explores how students “figure something out” by reconciling conflicting knowledge or gaps in their prior knowledge to generate a new understanding of a phenomenon (Odden and Russ, 2019). This framework, as synthesized by Odden and Russ, was initially used to explore physics students’ reasoning while problem-solving (Odden and Russ, 2018), but has also been applied to undergraduate physics labs for life science majors (May et al., 2022b) and chemistry education (Hunter et al., 2021). There are several different perspectives for how one can conceptualize sensemaking. Sensemaking can be treated as a frame a student takes, that is, how they situate themselves and expected behavior in the learning context (e.g., Hutchison and Hammer, 2010). Some scholarship on sensemaking focuses on the cognitive processes that occur when students are sensemaking (e.g., Sherin et al., 2012). Other research explores how discourse practices, the ways people interact and communicate with each other, shape the sensemaking process (e.g., Berland and Reiser, 2011). While all these perspectives are valuable, we are mostly focusing on the cognition that is occurring when students are sensemaking about a technical anomaly, inspired by students encountering technical anomalies in our prior work (May et al., 2020). In the context of anomalous data, sensemaking begins with the identification of data as anomalous, often because there is a mismatch between what the student expects to see in their results and the actual data. If the student continues in sensemaking, they then try to figure something out about the cause of the anomalous data. The student may draw on relevant prior knowledge and experiences throughout the entire process to both identify data as anomalous and generate an explanation of what caused the anomalous data. To conceptualize the role of knowledge in this reasoning process, we switch to our complementary framework of epistemological resources.
Throughout the entire reasoning process, students’ prior knowledge, experiences, and ways of gathering information from the world impact their identification of the anomalous data, generating ideas about what caused it, and what they determine to do about it. To capture their knowledge resources, we draw from the complementary framework of resource theory, specifically epistemological resources (Hammer and Elby, 2002). Within the resource theory framework, there are different kinds of resources. Conceptual resources are pieces of knowledge about concepts. In biology education, a body of scholarship has identified conceptual resources about energy transfer (Bhatia et al., 2022), fluid flow (Slominski et al., 2023), reflex circuits (Lira and Gardner, 2020), and other biological phenomena (Southerland et al., 2001; Gouvea and Simon, 2018). A smaller body of research focuses on procedural resources, often in the context of mathematical activities, as these resources often direct moves a student makes during problem solving (Caballero et al., 2015). Epistemological resources include both beliefs about knowledge, such as the belief that knowledge is propagated stuff passed on to individuals, as well as a person's understanding of how to gather and evaluate knowledge, such as the idea that one can accumulate more information or check information to verify it (Hammer and Elby, 2002). Here we focus on epistemological resources, henceforth referred to as resources. In the context of this study, resources should impact how students interact with anomalous data, particularly how they go about evaluating it, and how they justify their subsequent choices with it.
Previously, scholarship in PER has used resources to explore how students gather and evaluate knowledge when working with physics problem sets, such as employing algebraic or analogical reasoning and appealing to authority for assistance (Richards et al., 2020). In chemistry education, some work has explored students’ resources related to characteristics of models (Lazenby et al., 2019). By comparison, minimal work has applied resources to lab settings where students are reasoning with data. Likely, students are using many kinds of knowledge resources in the lab setting. Here, we focus specifically on epistemological resources to explore what drives students’ actions and choices while sensemaking about technical anomalies. For example, we might expect to see students use resources to direct their sensemaking, such as choosing to visually represent their data or consider the context of the experiment while they interact with anomalous data. Within this framework, we assume students can access various resources as they identify the anomalous data, generate ideas about what caused the anomaly, and eventually make choices about what to do about the anomaly, such as keep, discard, recollect, or mitigate it in some way.
These two frameworks shed complementary insights into how students interact with anomalous data in the following three-step process. The sensemaking process begins with identifying data that are anomalous. Sometimes, identifying anomalous data is explicit, as the individual immediately moves to the next step, and sometimes it is implicit, quick, or vague, where the individual may notice something odd or unexpected in the data, but not go on to the next step immediately. The next step involves idea generation to develop explanations about what caused the anomalous data. Many actions may happen in this step, including organizing or reorganizing data to test ideas, drawing on previous lab experiences, trying different analyses, and evaluating procedures or data sources. The final step involves using the previously generated explanation about the cause to then make choices about keeping, removing, or otherwise mitigating the anomalous data. This process can be iterative, as an individual may engage in additional sensemaking after evaluating the effects of keeping, removing, or mitigating anomalous data, or collecting new data. We posit that throughout this process individuals use a variety of resources to gather and evaluate relevant information. Through this process, one might identify any number of different types of anomalies, such as ones rooted in gaps in conceptual understanding, human error, or technical issues related to equipment or software. But, as will be described further in the Materials and Methods section, we focus primarily on anomalies from technical issues.
In an undergraduate biology lab, this could play out in the following way. After collecting some data, the student uses the resource visually represent data by plotting data to identify initial patterns. While looking at the data, they notice an anomaly. It could be data that deviate from the trendline, or it could be a sudden jump in the graph, or something else unexpected. The next step involves determining what caused the anomaly. This may involve considering experimental context, such as reasoning through the physical phenomenon, potential equipment malfunction, and possible human error. Then the student uses the explanation generated about the cause to determine what to do with the anomaly. This may involve additional resources to justify what choice to make, such as the ideas that quantity of data matters and relative impact of anomaly matters. Using these resources, along with the newly generated explanation, the student may then choose to keep, remove, or mitigate the anomalous data, or collect new data. Based on the results of that choice, this process might begin again. Finally, aspects of this process may be explicit, based on vocalized discussions, or might be more subtle and only revealed through the student's actions with data. In this article, we aim to capture this reasoning process at a detailed level through interviews to answer our questions about what resources students use throughout this process, and how they use those resources.
MATERIALS AND METHODS
Participants and Context
Students were recruited from an Introductory Physics for Life Sciences (IPLS) first-semester lab course at a large research institution in the Western U.S. as part of a larger project exploring how students engage in data analysis in this class setting. The IPLS lab is a reformed, algebra-based physics lab that engages students with physics concepts in biological contexts, such as the kinematics of vesicular transport and fluid flow in capillaries across three to four lab modules in each course of a two-semester sequence. These courses were adapted from the NEXUS project (Moore et al., 2014; Redish et al., 2014) to emphasize the inquiry-based nature of the lab (May et al., 2022a). For each lab module, students work in groups of three to four where they generate a research question and data collection plans surrounding the topic of the module, collect and analyze data according to their research questions, and present their findings to their peers. For example, in the module about Brownian motion of particles in fluid, students sometimes generate research questions about the impacts of particle size or fluid viscosity, and then might collect and analyze data on particle movement in different conditions. Often, the students focus on Brownian motion as a model of drug diffusion in cells. Thus, based on their lab experiences, the students in this study were familiar with the open-ended nature of working with data in an inquiry setting without a traditional lab manual or protocols. About 85% of the students who take this course are life science majors or other majors in prehealth tracks, and often in their third or fourth year of study (University of Utah, 2022). All students enrolled in the course were invited to participate in the study.
Recruitment occurred first in class as the first author, who was also the interviewer, visited their classroom. He gave a short overview of the overarching research goal of understanding how students work with data and provided fliers with contact information. He explained that participating in this research would not impact their grades or standing in the course. He also mentioned that, as part of the interviews, students would be encouraged to bring lab data if they so chose. For students who were absent or in lab sections the author could not visit, the first author made a short video announcement that was posted in the course on the school's learning management system. Then, when implementing the interviews, the interviewer began by identifying himself as a Ph.D. student in the Department of Educational Psychology with an interest in improving lab experiences for students. He explained again that the purpose of these interviews was to learn about how undergrads in these labs think about analyzing data with a goal of improving the labs.
Six students consented to participate in this study. Most students who participated identified as female and white, were majoring in biology and related fields, and had completed at least one general chemistry lab course prior to taking the IPLS lab course (Table 1). Most students did not have previous research experience, but two students had some research experience via coursework. Students were encouraged to participate in multiple interview sessions.
Table 1. Participant demographics and academic backgrounds (N = 6)

Race | n | First-generation college student | n
---|---|---|---
Asian and White Non-Hispanic | 1 | Yes | 2
White Hispanic | 1 | No | 4
White Non-Hispanic | 4 | Major |
Self-Identified Gender | | Biology | 4
Female | 5 | Communications | 1
Male | 1 | Kinesiology | 1
Previous Research Experience | | Minor |
Yes (via coursework only) | 2 | Chemistry | 3
No | 4 | Nutrition | 1
Previous Lab Courses | | |
General Chemistry 1 and/or 2 | 6 | Introductory Biology 1 and/or 2 | 2
Organic Chemistry | 2 | Other Biology | 3
Interview Method and Protocol
Our study aimed to elicit student reasoning as students interacted with anomalous data by having them verbalize their thinking aloud as they worked. Our design supported this in two main ways. First, we employed multiple 1-hour interviews with each student to allow students time to become comfortable explaining their thinking aloud to the interviewer. Given the limited number of students who participated in the interviews, having multiple interviews per student also helped ensure we had enough data to reach saturation in our analysis (e.g., codes are seen multiple times across students and across interviews with a student; Miles et al., 2019). Second, we encouraged, but did not require, students to bring their own data from the lab to the interviews if they wanted help working with it, both so the data would be more meaningful to the student and so they might bring anomalies they had encountered and wanted help understanding or fixing. However, most students did not bring their lab data to interviews and instead interacted with designed interview prompts that presented a data source to explore similar to what they encountered in their lab. The first-semester IPLS lab used student-collected video recordings of objects in motion, from which students could generate position data to analyze, so we designed our prompts similarly but provided the video. This ensured that students had multiple opportunities to interact with data that would be similar and relevant to what they had seen in the lab. Across the six participants, we collected 16 hours of interviews. Four students attended at least three interviews each, interacting with at least two interview prompts, each of which included a different data source. The first author served as the interviewer for all data collection.
We employed a teaching interview method, also sometimes called a tutoring interview method, where the interviewer may provide some instruction to support the intended sensemaking in the short interview time. This interview method was chosen because it limits the time a student might spend struggling with technical tasks, like searching for equations or manipulating equations in spreadsheets. The teaching interview method has been used to study student reasoning in mathematics/statistics (e.g., Hershkowitz et al., 2001; Wagner, 2006; Kapon et al., 2015), physics (e.g., Kapon & diSessa, 2012), and chemistry (e.g., Karch & Sevian, 2022). In the interviews, assistance was primarily technical as students worked with ImageJ (Rasband, 2021), the software that they also use in the IPLS lab to gather positional data of objects from videos. The interviewer also provided help as students worked with the resulting spreadsheets of data to plot values and perform calculations to assess their data. Thus, assistance generally involved support for modifying settings in ImageJ and using graphing functions in Excel to generate specific outputs that students desired. This assistance then prioritized time in the interview for discussion around the anomalies.
Broadly, all interviews, regardless of the data source, were introduced to the students as think-aloud style, and students were directed to “think out loud” as they interacted with software and data. If the student brought their data from the IPLS lab, the interviewer asked the student for context about how the data were collected and would ask about any confusion or questions they had about their data to help prompt conversation. The session would then be spent exploring the student's lab data, where the interviewer would prompt them to explain their reasoning or justifications for the actions they took with the data as they worked. If the student did not bring data, then the interviewer presented an interview prompt that provided students with a data source of objects moving in a video, from which students could then collect and analyze data using ImageJ. Students were directed to interact with the prompt data as they would interact with data in the lab. Similarly, the interviewer would prompt the student to explain their reasoning as they worked.
For the interview prompts, we designed several data sources for students to interact with, all of which used videos of moving objects, similar to the lab modules in the first-semester IPLS lab. The key anomalies in all the prompts revolved around technical issues related to the video or the ImageJ positional data output. We chose to focus on technical-based anomalies for this project, as it was inspired by our previous observations of students encountering technical issues while working in the IPLS lab (May et al., 2020; May et al., 2022b). Further, one prompt, known as “Glitch in the Frame,” was inspired by an old lab module that had been removed during course modifications due to COVID-19; in that module, we had noticed students encountering a technical issue in the video and occasionally discussing it in their lab groups. The interviewer introduced the prompt with the context that it used to be the first module in the lab. The students were provided with a video of five zebrafish swimming in a tank (Figure 1). The video contained a frame-ordering glitch that made the fish appear to jump backward and then dart forward as the frames played out of order (e.g., instead of playing sequentially as frame 1, 2, 3, 4…, frames occasionally played as 1, 3, 2, 4… at various points throughout the video). To save time, the interviewer had previously manually tracked all five fish through the video clip so that students could use those data, though students could do their own manual tracking if they wished. After introducing the prompt and its context, students were free to explore the available video clip and data in a spreadsheet, both of which were pulled up on the computer and ready to access. Students generally gravitated toward the video first before looking at position data. The interviewer let the discussion unfold as students described what they were noticing in the video, waiting to see whether the student noticed and began discussing the glitch. If the student did not notice the glitch, then the interviewer would prompt the student to view the video frame-by-frame to cue noticing the glitch. The interviewer aimed to have students explain their thought processes throughout the entire interview by utilizing follow-up questions, with particular attention to when students identified anomalous data, generated ideas about the causes of anomalies, and considered options for what to do with the anomalous data.
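To illustrate how a frame-order glitch of this kind surfaces in positional data, consider the minimal Python sketch below. It is our own hypothetical illustration, not part of the study materials: a steadily swimming fish whose frames play out of order produces a backward displacement bracketed by oversized forward jumps, the “skip” and “dart” students saw in the video.

```python
# Hypothetical illustration: how a swapped pair of frames appears in
# tracked position data. A fish swims steadily right at 1 unit per frame.
positions = [float(i) for i in range(10)]  # true x-position at frames 0..9

glitched = positions.copy()
glitched[2], glitched[3] = glitched[3], glitched[2]  # frames play 1, 3, 2, 4...

displacements = [b - a for a, b in zip(glitched, glitched[1:])]
print(displacements)
# [1.0, 2.0, -1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0]
# The backward "skip" (-1.0) and forward "darts" (2.0) appear in the data
# even though the fish never actually swam backward.
```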
The other two interview prompts also involved the use of ImageJ, included technical-based anomalies, and aligned with other biology lab topics. The second interview prompt was a video of sperm cells swimming on a microscope slide (Barlow, 2018) and utilized an autotracking plugin for ImageJ to generate position data for the approximately 200 visible cells. Some data generated from this video had errors due to limitations of the autotracking when cells swam over or under each other, or when cells entered or left the frame during the video. The third interview prompt mimicked student data from the Brownian motion lab, where students tracked 5 µm silica microspheres in water using a microscope equipped with a camera and explored Brownian motion using mean squared displacement. The silica microspheres served as a model for particle diffusion. Embedded within the third scenario were constraints on data quality from the video, which required cropping to produce usable data; this imposed issues similar to those in the second prompt, with spheres entering and leaving the video frame, along with a calculation error.
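For readers unfamiliar with the analysis targeted by the third prompt, the sketch below shows one minimal way to compute mean squared displacement from tracked (x, y) positions. The function and variable names are our own assumptions and do not reflect the course's actual analysis materials.

```python
# Hypothetical sketch: mean squared displacement (MSD) for one tracked
# particle, averaged over all valid starting frames.
def mean_squared_displacement(xs, ys, lag):
    """MSD at a given frame lag: mean of |r(t + lag) - r(t)|^2 over t."""
    n = len(xs) - lag
    return sum(
        (xs[t + lag] - xs[t]) ** 2 + (ys[t + lag] - ys[t]) ** 2
        for t in range(n)
    ) / n
```

For two-dimensional Brownian motion, MSD grows linearly with the lag time (MSD = 4Dτ), so the slope of MSD plotted against lag estimates the microspheres’ diffusion coefficient D.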
We video recorded the interviews in two ways: recording the student's computer screen throughout the process as they interacted with the data using Snagit (TechSmith, 2022), and recording the student and the interviewer via a video camera to capture their gestures, expressions, and body positions. The interviews were then autotranscribed with otter.ai (Otter.ai, 2021) to provide a rough draft transcript. The first author then refined the transcripts, correcting any initial transcription errors, inserting indicators of students’ actions and tone, and clarifying the meaning of ambiguous words (here, this, it, etc.) where possible.
Data Analysis
We used qualitative analysis and took a fine-grained analysis approach to explore students’ reasoning at the level of phrases to identify resources students were using (Parnafes and diSessa, 2013). This is a common approach for a family of methods used in identifying knowledge resources, called knowledge analysis (diSessa et al., 2015; Barth-Cohen et al., 2023). To ensure we obtained enough interview data for analysis, we implemented multiple interviews with students, as many as each student wanted to participate in, up to four interviews. This served two purposes. First, resources are context dependent, so having multiple interviews with different data contexts provided an opportunity to see shifts in an individual's resource use (Barth-Cohen et al., 2023). Second, it can take time for students to feel comfortable voicing their thinking, so multiple interviews provided extended time to develop rapport with the participant. Thus, with multiple interviews, we can better reach saturation in our analysis by providing enough time for students to use multiple resources and use the same resource multiple times, either within a single student's collective interview sessions, or across all six students we interviewed (Miles et al., 2019).
Analysis centered around identifying knowledge resources (Barth-Cohen et al., 2023). The first step of analysis involved reviewing the interview recordings to flag instances when students were engaging in sensemaking. Guided by summary field notes written after each interview, the first author looked for moments when students were giving explanations or justifications in dialog, paying particular attention to when they were engaged with anomalous data. The second step involved writing descriptive memos of each interview session. These memos provided a detailed summary of what occurred in each session, particularly about the students’ sensemaking and working with anomalous data (Emerson et al., 1995). The third step involved using Atlas.TI (Atlas.TI, 2022) to develop the codebook, focusing on identifying and describing students’ statements when they were interacting with or interpreting anomalous data. In the tradition of knowledge analysis, the first author examined when students justified an action they took when working with the data, when they expressed confusion about anomalous data, and when they generated explanations about the data. In particular, the first author focused on when students were using language comfortable to them rather than technical terms, as resources often surface in these moments (Barth-Cohen et al., 2023). The coding process was iterative, with several rounds of developing codes, refining the codes or definitions, and recoding the flagged sensemaking episodes (Saldaña, 2009). Most of the codes emerged from the data, though existing literature on scientists reasoning with anomalous data inspired a few code labels. Specifically, the concepts of using theory and domain knowledge to explain an anomaly and manipulating data to look at the data differently (e.g., working with different subsets of data, or removing outliers and then evaluating the effects; Trickett and Trafton, 2007) provided actions to look for when students were sensemaking. The final refinement of the codebook produced distinct resources that captured the students’ beliefs about data and knowledge gathering/evaluation that appeared in their explanations and justifications. See Figure 2 for an illustration of the coding process. To ensure the validity of the developed resource codes, we conducted an interrater reliability check (Saldaña, 2009). A colleague coded a 20% subset of the transcripts that contained all the resource codes. After discussion and further refinement of resources and definitions, a Cohen's Kappa of 0.77 was achieved, indicating substantial agreement between the first author and the colleague (Cohen, 1960).
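For reference, Cohen's kappa adjusts raw percent agreement for the agreement two coders would reach by chance. A minimal sketch of the calculation, ours rather than the authors' analysis script, follows.

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders' labels on the same items."""
    n = len(coder_a)
    # Observed agreement: proportion of items where the coders match.
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: probability both coders independently assign each label.
    p_expected = sum(
        (counts_a[label] / n) * (counts_b[label] / n)
        for label in counts_a.keys() | counts_b.keys()
    )
    return (p_observed - p_expected) / (1 - p_expected)
```

On this measure, a kappa of 0.77 means the coders closed about 77% of the gap between chance-level and perfect agreement.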
Case Selection.
In addition to detailing the resources that students used, we selected an instrumental case that is representative of students’ sensemaking and use of resources to highlight the interaction between students’ sensemaking about an anomaly and their choice of what to do with it (Creswell and Poth, 2018). We present the case in a narrative format with key selections of dialog, as the sensemaking occurred over several minutes. This instrumental case highlights how one student, Michelle, used resources as she interacted with anomalous data in the first interview prompt, “Glitch in the Frame.”
Transcription Norms.
In the transcripts and quotations presented, we use punctuation to indicate the length of pause in speech, where commas indicate short pauses, periods indicate longer pauses often at the end of a statement, and ellipses (…) indicate pauses longer than a second, often as the student was thinking. Indicators of tone (e.g., laughing) and actions the student does on the computer (e.g., scrolls down spreadsheet) are included in brackets as they happened with respect to dialog. For the case, we explicitly highlight words and phrases that are key to identifying what resource the student is employing. Relevant phrases in the transcript will be underlined and the identified resource will follow the line of dialog and be italicized in {curly brackets}.
RESULTS
To begin, we provide a summary of the resources students used when identifying anomalous data, when generating ideas about the cause, and when determining what to do about the anomaly. Throughout each of these subsections, we explain how these resources were used with examples across interviews and participants. Then, we present an instrumental case that illustrates how a student used resources as she interacted with the “Glitch in the Frame” prompt during the interview, capturing how this process unfolds. We conclude the results by situating the case within the larger body of data and highlighting patterns in resource use that spanned students and interview prompts.
Resources Used to Identify Anomalous Data
Most students began interpretation by plotting the data in the spreadsheet. The resource visually represent data was used when students plotted numerical data or made other visual representations, such as graphs. Students often mentioned goals for plotting data, though the specificity of the goal varied. For example, several students explained their action to visualize data because of their preference for working with data, such as Julia who wanted to “see” the data (e.g., Julia: “I like plotting it cuz I can visually see it.”). Other students used this resource with the specific intent of evaluating data for patterns and anomalies, such as Joshua, who used conditional formatting in the spreadsheet to color-code a series of averages to see their variance (e.g., Joshua: “my thought is obviously to basically, find a general range of these averages, and almost conditionally format them, to just kind of see whether there's outliers, and see whether those outliers are caused by ImageJ having a hard time tracking, or whether it's like other things going on that are natural.”). While students had different reasons for using the resource to visually represent data, the resource appeared to be a tool students used to understand their data, whether they were specifically seeking outliers or not. Thus, the plots, graphs, and other visualizations generated by the students supported their identification of anomalous data by visual comparison of the anomalous data to the larger dataset.
Resources Used When Generating Ideas about the Cause
Students also used visual methods of evaluating the data to engage in sensemaking by coordinating the video data with the numerical position data. The resource cross-reference with multiple data sources was used when students noticed an anomaly in the numerical data and wanted to see what was happening in the video to generate that anomaly, or when they noticed something in the video and wanted to see whether it also appeared in the numerical data. As such, this resource was often identified when students alternated between data sources while generating ideas. With the resource cross-reference with multiple data sources, there is an assumption that an anomaly identified by one means of representing the data (e.g., graphically) should be present across other representational forms, such as the video. For example, Michelle noticed a somewhat consistent pattern of 0 velocity for two objects around every 5 frames. She then used the video to observe the objects’ movement at these points in time, searching visually for this movement pattern (Michelle: “it's gotta be on here [the video]”). Using the video, Michelle then formed an explanation of the pattern in the data by observing that the two objects briefly touched, which caused ImageJ to move one object's label to the other object (Michelle: “Oh, I think it just switched dots that it was tracking there”). This label switch generated duplicate position data based on a sperm cell that occasionally had minimal movement. Cross-reference with multiple data sources is related to visually represent data, as they are both visual ways of understanding the data. However, cross-reference with multiple data sources was used in the process of generating an explanation after an anomaly had been identified, whereas visually represent data was mostly used to initially interpret data.
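To illustrate the numerical signature Michelle cross-referenced against the video, here is a hypothetical sketch, our own construction rather than software used in the interviews, that flags frames where a tracked object barely moves:

```python
# Hypothetical sketch: flag frames where a tracked object barely moves,
# e.g., the near-zero velocities Michelle noticed about every 5 frames.
def flag_stalled_frames(xs, ys, threshold=0.01):
    """Return frame indices whose displacement from the prior frame is ~0."""
    return [
        t for t in range(1, len(xs))
        if (xs[t] - xs[t - 1]) ** 2 + (ys[t] - ys[t - 1]) ** 2 < threshold ** 2
    ]
```

Clusters of flagged frames can then be checked against the corresponding moments in the video, as Michelle did when she traced the pattern to ImageJ switching labels between two touching cells.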
Students used the resource consider experimental context to explore plausible causes of anomalous data given details related to the biological phenomenon of study, organisms present, technical issues with equipment, or human error. Sometimes this involved considering details related to the phenomenon or organism in the video (Kristen: “I don't really know if sperm cells differ in size that much”). Students also used this resource when considering possible technical or human errors during the experiment (Joshua: “something with the like, image sequence had gone wrong”). All students interviewed used this resource, often several times, indicating a shared assumption that the context of an experiment, whether conceptual or technical, could explain the cause of an anomaly. Importantly, the explanations generated when students employed the consider experimental context resource directly influenced students’ next steps when considering mitigating, removing, or including the anomalous data.
These previously discussed resources were used frequently across interview prompts (Table 2). The resources visually represent data and cross-reference with multiple data sources enabled students to analyze different forms and representations of the data, which supported identifying anomalies and generating ideas about their causes. The resource consider experimental context directly facilitated students’ explanation generation about the cause of an anomaly. These resources directly impacted students’ sensemaking in different ways that ultimately supported students’ explanations about the cause of anomalous data.
Table 2. Resources students used when identifying anomalous data and generating ideas about their causes

Resource | Definition | Frequency of code
---|---|---
Visually represent data | Plotting or otherwise visually organizing data can help with interpretation of the data. | 6
Cross-reference with multiple data sources | When multiple data sources are available, one can use a data source to confirm/diagnose issues in another, with the assumption that multiple data sources should align logically. | 11
Consider experimental context | One should consider features of the experimental environment (i.e., equipment) or relevant phenomenon (i.e., conceptual information) to determine the erroneous status of a data point. | 19
Resources Used When Determining What to Do with an Anomaly
Finally, students used particular resources when determining what to do with an anomaly. These resources facilitated students’ consideration of the details of their experiment or their goals as they were reasoning about what to do with the anomaly. During this reasoning, students used prior experiences of working with data in other settings and other judgments about data to determine whether they wanted to keep or remove the anomalous data. There are few examples of students using these resources in the existing literature, but a small amount of work has highlighted that expert scientists engage in these considerations about their data (Roth, 2013). Here we present five examples of these resources: quantity of data matters, relative impact of anomaly matters, efficiency of action matters, select data to avoid anomalies, and select data to be representative (Table 3). When using these resources, the student's explanation of an anomaly interacts with what they determine to do with the anomaly.
Table 3. Resources students used when determining what to do with an anomaly

Resource | Definition | Frequency of code
---|---|---
Quantity of data matters | If there is “enough” data, one may justify deleting anomalous data. Conversely, one may try to salvage data or decide to include anomalies to have enough data to work with. | 6
Efficiency of action matters | Time and energy (and sometimes mathematics skills/equipment) limit what tasks can be reasonably completed. One may select less data, or choose to not do activities that are time/energy intensive or not directly related to assignment requirements. | 10
Relative impact of anomaly matters | If the anomaly's impact is small (e.g., little difference in calculated values) then it may not require removal. This consideration may prompt further investigation into the anomaly to determine if it is acceptable variation or an error. | 12
Select data to avoid anomalies | One can select “good data” that matches one's hypothesis/assumptions to avoid issues later in analysis or remove data determined to be erroneous. | 17
Select data to be representative | One may include “acceptable” variation to reflect the dataset, sometimes acknowledging that removing extremes may bias the data. | 6
Several students considered how much data were available before determining what to do with the anomalous data that they identified. When students used the resource quantity of data matters, they sometimes referred to prior experiences in the lab class to justify or explain their reasoning (Alice: “usually when we are in lab and we have an error with our data, we don't throw out our data, but we explain like the errors in the data, that's usually just because we don't have a huge amount of data to work with”). In the interview prompts, students had access to large datasets and often used this resource to justify removing anomalous data deemed erroneous because there was “enough” data to do so.
Students also made judgments about the anomalous data's effect on their calculations. Students used the resource relative impact of anomaly matters when they considered how much or little variance the anomalous data added. Students did not establish a threshold value but rather made intuitive judgments about impact, which occasionally caused some uncertainty about what action to take regarding the anomalous data (Joshua: “considering it's not even a centimeter. It's kind of like, well, it's probably fine, to leave it in and work with it. But on the other hand, it is like, clearly like, especially with the difference of pixels so. yeah…I don't know on that one.”). Students’ reasoning using this resource supported both including and removing anomalous data based on its impact.
Time and effort were also factors that students considered when determining what to do with anomalous data. Some of the students’ use of the resource efficiency of action matters was impacted by the hour-long interview sessions (Julia: “If I had a ton of time, I would probably try to include all of them [sperm cells] that didn't have the overlap problem.”). Students also referred to experiences in the lab class or used the lab setting as a context to justify whether an action was worth doing (Kristen: “I think averaging would do better like, hm, make for a more precise number, more accurate. But I think like, personally for me, if I was going through lab, and wanted to save time [laugh], I'd just pick one and hope it's constant for the rest of the trial.”). Generally, this resource was used to justify quicker or easier actions regarding anomalous data (e.g., deleting or including vs. correcting or recollecting data).
The resources select data to avoid anomalies and select data to be representative usually appeared in conjunction with the resources discussed above. Students’ explanations about the cause of anomalies had an important role in students choosing to exclude or include anomalous data. Select data to avoid anomalies was most commonly used by students across data sources when students had generated a technical error explanation, as students either deleted anomalous data or selected subsets of data to use in calculations. However, a few students employed select data to be representative and chose to include anomalous data when they formed explanations that the anomaly may be an accurate feature of the data. For example, Julia pointed out the possible risk of removing data because “you don't know if, the extreme behaviors are more typical, it could cause bias [to remove them].”
As with our selection of exemplar resources used when identifying and generating ideas about what caused anomalous data, we chose these example resources because they captured the epistemological resources students most frequently used when determining what to do with anomalous data. The resources quantity of data matters, relative impact of anomaly matters, and efficiency of action matters supported students’ justification of keeping or excluding the anomalous data through the consideration of various aspects of experimentation after data collection. These justifications were also directly impacted by the students’ sensemaking about the anomaly. Explanations that attributed the cause of anomalous data to errors in data collection generally justified removal, whereas explanations that considered anomalous data as valuable data generally justified inclusion. However, there was some nuance as students also justified keeping error-based anomalous data if it was not too impactful to the results. Identifying these resources highlights the nuance of students’ actions regarding anomalous data.
Case Study
The following case highlights how one student, Michelle, identified an anomaly, generated an explanation about the cause of the anomalous data, and her choice to include the anomalous data. Michelle was working with the interview prompt “Glitch in the Frame.” When Michelle identified that the fish appeared to “skip” backward, she spent some time sensemaking about the cause of the skips. Even when the interviewer presented alternate scenarios that would suggest different causes for the anomaly, Michelle found her explanation to be more satisfactory and chose to include the skips rather than exclude or correct them.
Michelle began watching the video of the zebrafish swimming in the tank. She first let the video play several times and then used the scroll bar at the bottom of the video to manually move through the frames of the video. Initially, Michelle made comments about the general behavior of the fish and did not mention the skips. The interviewer suggested scrolling through the frames more slowly, which made the visual “skipping” back of the fish more apparent. She then identified the anomalous behavior, saying “They're going backwards then forwards.” Immediately following this statement, Michelle verbalized her curiosity about the fishes’ movement.
1. Michelle: My brain is like, how on earth are they doing that? I feel like it would
2. probably affect the, your velocity, just because all of a sudden, like it's going
3. forward then it's like, no, we're going backwards until like, it changes like
4. direction through time…So maybe it like, slows velocity. Or affects the
5. acceleration, distances. [continues scrolling through the video] Yeah. Did it again,
6. that's so weird. {Relative impact of anomaly matters}
Based on Michelle's puzzlement in line 1 and her description of the skips as “weird” in line 6, the skips were anomalous to her, not behavior she expected from the zebrafish. She described the behavior of the fish changing direction in lines 2-3 but had not yet begun to verbalize ideas about the cause of the anomaly. Michelle proposed that the skips could impact possible variables she could calculate with the position data, thereby potentially employing the resource relative impact of anomaly matters. It is also possible Michelle was brainstorming possible variables she could calculate, as she had not yet made any calculations with the data.
As it appeared Michelle was thinking about potential analysis directions, the interviewer asked whether she wanted to do something with those skips. Michelle replied, “Besides ignore them, not the answer, um.” Michelle appeared to assume that the interviewer wanted a specific response, which was not “ignore them.” By voicing aloud that ignoring the data was not an acceptable answer, Michelle may have had an assumption that there was a “right” way to do the interview. However, the interviewer encouraged the idea of ignoring the skips, as Michelle had already begun sensemaking about them in considering that option. When asked what might be a reason she would ignore the skips, Michelle then gave the following explanation.
7. Interviewer: Well, that could be a thing to explore. So if you did ignore them,
8. what would be your reasons for ignoring them?
9. Michelle: My assumption is like the fish isn't moving when it like, skips
10. backwards [moves hand horizontally]. Because I just think like moving water,
11. like you're moving in the water you stop moving forward, [holds hand still] like
12. you're gonna move backward [moves hand horizontally]. And so that was just
13. my assumption. Those were like not moving themselves. That would be why I
14. would ignore them, but that's how this was captured. Which I don't know how
15. this was captured. {consider experimental context}
Michelle used the resource consider experimental context as she described her explanation of the fishes’ movement. Within this portion of her sensemaking, she considered two facets of the context. First, in lines 9-13, Michelle thought about the experience of the fish actively swimming against a current in the water, using her hand as a representation of the fish as she explained how a current could cause a backward movement of the fish. Michelle's use of “you” in her explanation suggests she may have been recalling physical experiences she had with water currents. Michelle then addressed the equipment context in lines 14-15, or more specifically, that she did not know much about the video recording. Michelle's focus on a physical explanation of a water current was unique for this interview, as the other students generally first explained it as a human or technological error, often without considering other possibilities. Michelle discussed multiple components of the experimental context while sensemaking: conceptual understandings of water flow and mechanisms of swimming, and the video collection itself.
Noticing Michelle's statement about not knowing the recording context, the interviewer said, “I do know they were in a tank, so that I know of there was no flow,” to see how Michelle would incorporate new information. Michelle acknowledged the information, rephrasing the interviewer's statement by saying, “there was no current.” The interviewer assumed that information might prompt Michelle toward a technological error explanation. However, the physical explanation appeared more satisfying to her as she did not directly discuss technology as a cause, even if she was still curious about the physical mechanism responsible for the anomaly (lines 16-17). Michelle then began considering ways she could mitigate the possible impact of the skips.
16. Michelle: I think I want to know what makes them do that. First of all, I'm just
17. curious in that way. {consider experimental context} Um but, does it change like
18. to just the body? Or is it like, could you measure from like the specific points and
19. that like have less of an impact? Is what I'm wondering. {consider experimental context}
Michelle considered changing what point on the fish was used for manual tracking. In doing so, Michelle again used consider experimental context, focusing on how the tracking data were collected and how that could have potentially contributed to the skips. It was unclear how Michelle thought changing the point on the fish used for manual tracking would mitigate the anomaly. However, her suggestion to recollect data may indicate that Michelle had started considering a “correctable” technological or human error that she could mitigate. To further explore this reasoning, the interviewer asked if she had specific points on the fish she would use for tracking.
20. Interviewer: Yeah, that's an interesting idea. Do you have a particular maybe
21. place on the body in mind that you're thinking of?
22. Michelle: [scrolling through video] Nah I was just trying to see where it doesn't,
23. yeah, I don't know. Maybe just like, doing like the center [of the fish]? I think I'd
24. [mumbles to self] yeah, I don't know. Cuz, like, I guess you have to include
25. because like, while it's like, error or like messes with your data, like, it's
26. externally accurate {select data to be representative}
Here we see how Michelle's physical explanation impacts her choice to include the data, even if it is “error.” This physical explanation directed Michelle's use of the resource select data to be representative in lines 24-26, based on her statement that the skips are “externally accurate” in line 26. Michelle acknowledged that the skips would introduce error, but they should not be removed because she still considered them valid data. As there was still time left in the interview, the interviewer encouraged Michelle to try out her idea of recollecting data to see whether she might engage in more sensemaking. However, even when she manually tracked frame-by-frame, she still referred to her physical context explanation as she worked, further indicating that it was the most satisfying explanation to her (Michelle: “the only thought I had was like if they were in like a stream or something, or some current going on.”).
Understanding how Michelle was sensemaking about the skips is key to understanding why Michelle did not want to remove them. Michelle used intuitions about movement in water, likely from her own experiences, to reason about what the fish in the video might have experienced in the tank. Even when the interviewer introduced additional context about the tank, Michelle did not change her explanation. As Michelle determined the skips were physically happening to the fish, removing them would impact the accuracy of her data, even if it was “weird.” Michelle's sensemaking highlights the complex interaction of students’ reasoning about anomalous data and their choices about keeping or excluding anomalous data.
Comparing with the Larger Dataset
We chose this case as representative of the larger dataset across participants because of the clear interaction between Michelle's explanation of the anomaly and her choice to include the skips rather than remove or mitigate them. Michelle's case was novel for this interview prompt in that she was the only student who formulated and used a non-technology-related explanation of the skips. However, that novelty prominently highlights the connection between a student's explanation of an anomaly and their subsequent actions with it, a pattern that was present across students. The resource consider experimental context was foundational to the explanation Michelle generated about the skips. We can contrast Michelle's case with students who determined that the skips were technological errors, for whom removing or correcting the skips was the more appropriate choice to improve the accuracy of their data. Across interviews, students leveraged the same resource to consider conceptual and equipment/technological aspects of the experiment when sensemaking about an anomaly.
Interestingly, Michelle did not use efficiency of action matters at all during this interview, likely because she had ample time in the hour-long session to complete what she wanted. In other interviews, including a later session with Michelle, students often used the resource to justify explaining an idea rather than enacting it. Another possible explanation for why Michelle did not use this resource in this session is that she did not consider her proposed data recollection a difficult task. In a few instances in which students used efficiency of action matters, there was ample time left in the interview, but the task the student proposed could be considered mentally taxing or tedious, and students were reluctant to actually do it. The variability in when this resource was used highlights the impact of both time and students' evaluations of how difficult or time-consuming a task would be.
Finally, some students also used hypothetical or actual additional data collection to sensemake about and try to mitigate anomalous data. As exemplified in the case, Michelle considered how she could change data collection to try to mitigate the effects of the skips on the data, and then tested that modified data collection. Working with a different interview prompt, another student, Julia, considered how changing experimental parameters would potentially affect her data, including what kind of anomalous data the modifications could generate. This hypothetical experiment then allowed Julia to consider how data selection could unintentionally bias the data (Julia: “I think it would definitely help, if you were doing that experiment, to plot more values, because obviously some of them aren't moving as much, so if you're only selecting those ones in your data, that would skew it.”). While less frequently used by students, imagining modifications to the experiment appeared to be a useful mental exercise for students to consider how they could mitigate or prevent anomalous data.
DISCUSSION
This study uses the frameworks of epistemological resources and sensemaking to explore how students interact with anomalous data. We found that students use different resources as they work through the process of identifying anomalous data, generating ideas about the cause, and determining what to do with the anomalies. Further, the explanation that students generate about the cause of an anomaly impacts whether the student chooses to keep, remove, recollect, or mitigate the anomalous data. That is, when students attributed the cause to something happening to the organism or object, they tended to suggest keeping the data because they viewed it as capturing what actually happened. When they attributed the cause to a technological error, they tended to suggest removing the data or mitigating the issue to make their data more accurate. The results of this study provide insight into students' reasoning when they interact with anomalous data, which in turn has implications for how we support the design and implementation of labs.
The frameworks of sensemaking and epistemological resources were vital to this study's findings, and they may be useful for others in BER. These frameworks are rooted in Knowledge-in-Pieces and Resource Theory (Hammer, 2000; diSessa, 2014), which are increasingly used in BER. The concept of dynamic knowledge networks has been used to explore students' teleological, anthropocentric, and essentialist thinking (Gouvea and Simon, 2018). Conceptual resources have been used to provide insight into students' mechanistic reasoning about reflexes (Lira and Gardner, 2020), energy and matter transformation in metabolism (Bhatia et al., 2022), and bulk flow (Doherty et al., 2023). The related notion of epistemological frames has shed light on how the context of a fluid flow problem impacted how students solved it (Slominski et al., 2023). Further supporting the use of these frameworks are associated methods for identifying knowledge resources (Barth-Cohen et al., 2023) and for fine-grained analysis of student reasoning (Parnafes and diSessa, 2013). In this study, fine-grained analysis was useful for exploring student reasoning on a moment-to-moment level. We suggest that future work in biology education may find these frameworks and methods helpful for exploring students' conceptual understanding and epistemological approaches to topics across the field in a range of settings, including interviews, labs, and classrooms.
Here, we have argued that CUREs and inquiry-based labs provide students with opportunities to interact with authentic data that they generate themselves, and that in this context they also interact with anomalous data. However, the reasoning processes involved in working with anomalous data in-the-moment have often been overlooked by the field. That is, research has tended to focus on reasoning with anomalous data through post-hoc reflections (e.g., Johansen and Christiansen, 2020; Bolger et al., 2021). What has been missing is a detailed analysis of how students interact with anomalous data in lab settings. Our results capture which knowledge resources impact the process of identifying anomalous data, generating ideas about the cause, and determining what to do about the anomaly, along with how those resources are used throughout the process. Parts of this process have been captured elsewhere in the BER literature. For example, students' interview responses hinted at rich reasoning when students encountered surprising or confusing data in the lab, where they sometimes engaged in model revision and sometimes deemed data erroneous (Bolger et al., 2021). Our results highlight that we can access students' reasoning with data in-the-moment by using think-aloud style interviews as students work, rather than reflective interviews. Further, students' reasoning about anomalous data in-the-moment is complex: students consider multiple factors about the anomaly, the experimental context, and the constraints of the lab, all of which impact what they determine to do with the anomaly. However, we do want to acknowledge that the interview setting has its own affordances and constraints on the data. While interviews can be useful tools to capture in-depth student reasoning, as the interviewer can ask follow-up questions in a way that may not be possible in a course setting, the interview is also a more artificial environment than the context of a lab course. Thus, researchers must balance the goals of their study with the limitations of data collection methods. Further, our results are limited by the kind of anomaly students encountered. In our data, students primarily encountered anomalies due to technical issues, such as glitched video or limitations of ImageJ. Further work is needed to explore how students' resource use or sensemaking process may differ when an anomaly is due to a conceptual inconsistency or to novel data that do not fit accepted models.
Our results also document that interview dynamics were key to collecting these rich data: the interviewer fostered student agency to iteratively problem-solve by validating students' ideas about anomalous data, an approach students have identified as supportive in CURE experiences (Gin et al., 2018). Encouraging Michelle to explore her idea of recollecting data led to an insightful interview exploring her intuitive explanation and the impact of that explanation on her actions with the skips. Validating and exploring student reasoning can be time-consuming, but understanding students' current reasoning and supporting them over time in shifting how they think about an anomaly may better support students' learning (Wagner, 2006). Moving forward, these results suggest the field of BER can benefit from similar interview methods in which students' ideas are validated to capture the reasoning processes associated with anomalous data. Given the types of resources used in reasoning with anomalous data and students' tendencies to filter their explanations and choices for the “right” ones, it becomes especially prudent for researchers and instructors to consider how encouragement and perceived agency impact students' reasoning.
A related body of scholarship has explored how students problematize in lab settings, in physics and elsewhere. In this work, problematizing is the “work of identifying, articulating, and motivating the problem that needs solving” (Phillips et al., 2017). As with sensemaking, a sense of unease or confusion begins the process of problematizing, though problematizing adds an additional focus on how a problem is articulated. Research on students problematizing in labs has identified activities similar to the resources identified in this paper. For example, students in an IPLS lab setting similar to that of the current study engaged in group discussion about physical concepts, proposed new or modified experiments, and evaluated their calculations to identify the problem (Sundstrom et al., 2020). Our work adds a further perspective by using resource theory to consider how students choose to engage in these kinds of activities. Other work highlights that the epistemic frame students take toward the lab setting, such as an inquiry frame versus a confirmatory frame, impacts whether they engage in problematizing, with less problematizing when students consider the lab a confirmatory or procedural activity (Phillips et al., 2021). Phillips and colleagues highlighted that perceived epistemic agency influences whether students engage in activities like problematizing. While the interviews in our study were framed to students as high-agency spaces, in that students could do whatever they liked with the data, some of the resources, such as efficiency of action matters, indicate that students were exercising their agency in different ways. Across these overlapping bodies of research on sensemaking, problematizing, and epistemic framing, there is a complex relationship with the role of students' knowledge resources. Future work may better pinpoint the relationships among these perspectives.
Our work highlights that students engage in reasoning similar to that of expert scientists. Biologists engage in actions similar to those the students took in our study; in particular, they spend time determining the causes of anomalous data and may modify their models if the anomaly is not due to a technical or procedural error (Dunbar, 1999). Dunbar also noted that a series of unexpected results was more likely to lead to model revision. Roth (2013) adds an in-depth perspective on this in practice by documenting how biologists evaluated data in-the-moment as they collected it, shaping their final dataset by what they chose to include or discard. The scientists used their background knowledge and considered the quantity of data available to determine cutoffs for discarding data. Notably, in Roth's study, the scientists' different backgrounds and years of experience shaped the nuance of each scientist's reasoning about the same anomaly. Similarly, our results show that students used their prior experiences when interacting with anomalous data and had specific justifications when determining what to do with it. Even without years of experience, the undergraduate students in this study used a variety of knowledge when evaluating their anomalous data. Thus, this work adds to a body of scholarship documenting similarities between students' and scientists' reasoning.
For biology faculty, understanding how previous science experiences shape students' engagement in lab courses can help instructors better support student learning. Students bring what they learn in other disciplinary contexts, such as chemistry and physics labs, to their biology coursework. The instructional context of this study, the IPLS lab, is becoming more common as physics departments reform their algebra-based introductory physics labs to better serve the predominantly life sciences majors who take these courses. This study can be viewed as part of a trend working to connect BER and PER scholarship in ways that support more communication and connections about student learning across students' STEM coursework (Redish and Cooke, 2013; Thompson et al., 2013).
Our work raises questions about the role of anomalous data in labs. Given that anomalous data are a common occurrence in undergraduate labs and in professional science, how do instructors and teaching assistants approach them? The larger social context of labs likely impacts how students sensemake about anomalous data in ways that differ from a one-on-one interview. Future work can explore how the social and often collaborative context of labs impacts student reasoning, both in how peers can impact sensemaking and in how instructors and teaching assistants can support the process. These questions become particularly pressing as we consider what skill outcomes we want from labs. Would it benefit lab instruction to emphasize the sensemaking process around anomalous data? Should practice working with anomalous data be foregrounded in lab curricula? Exploring these questions could provide valuable insights into how lab instruction can better support students' sensemaking with data, particularly anomalous data.
CONCLUSION
Interacting with anomalous data can be a complex reasoning process in which one uses a variety of knowledge resources to identify anomalies, generate ideas about their causes, and determine what to do with them. Anomalous data are also a foundational part of science across disciplines, as students and professional scientists encounter them in many contexts. In this article, we begin to build knowledge about the importance of anomalous data in students' reasoning, with the goal of developing an understanding of students' reasoning processes within BER and the larger science education field. This study sets the groundwork for future work that might develop more targeted instructional scaffolds and educational materials to support deeper reasoning about anomalous data.
ACKNOWLEDGMENTS
We thank Tamara Young for assisting with interrater reliability checks. We thank Jordan Gerton for helpful feedback on this manuscript. We also thank the students who participated in this research. This material is based upon work supported by the
REFERENCES
American Association for the Advancement of Science (AAAS). (2011). Vision and change in undergraduate biology education: A view for the 21st century. Retrieved February 9, 2023, from www.visionandchange.org
ATLAS.ti Scientific Software Development GmbH. (2022). ATLAS.ti (Version 22.0.6) [Computer software]. Retrieved from https://www.atlasti.com
Auchincloss, L. C., et al. (2014). Assessment of course-based undergraduate research experiences: A meeting report. CBE—Life Sciences Education, 13(1), 29–40.
Barlow, D. (2018). Spermatazoa dark field photography [Video]. David Barlow Archive. Retrieved September 1, 2021, from http://www.davidbarlowarchive.com/categories/microscopy/fertilisation/sperm-dark-field-photograph.html
Barth-Cohen, L. A., et al. (2023). Methods of research design and analysis for identifying knowledge resources. Physical Review Physics Education Research, 19(2), 020119.
Berland, L. K., & Reiser, B. J. (2011). Classroom communities' adaptations of the practice of scientific argumentation. Science Education, 95(2), 191–216.
Bhatia et al. (2022). Putting the pieces together: Student thinking about transformations of energy and matter. CBE—Life Sciences Education, 21(4), ar60.
Bolger, M. S., Osness, J. B., Gouvea, J. S., & Cooper, A. C. (2021). Supporting scientific practice through model-based inquiry: A students'-eye view of grappling with data, uncertainty, and community in a laboratory experience. CBE—Life Sciences Education, 20, 1–22. https://doi.org/10.1187/cbe.21-05-0128
Brownell, S. E., et al. (2015). A high-enrollment course-based undergraduate research experience improves student conceptions of scientific thinking and ability to interpret data. CBE—Life Sciences Education, 14(2), ar21.
Caballero, M. D., et al. (2015). Unpacking students' use of mathematics in upper-division physics: Where do we go from here? European Journal of Physics, 36(6), 065004.
(2022). External collaboration results in student learning gains and positive STEM attitudes in CUREs. CBE—Life Sciences Education, 21(4), ar74.
Chinn, C. A., & Brewer, W. F. (1998). An empirical test of a taxonomy of responses to anomalous data in science. Journal of Research in Science Teaching, 35(6), 623–654.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46.
Creswell, J. W., & Poth, C. N. (2018). Qualitative Inquiry and Research Design: Choosing Among Five Approaches (4th ed.). Thousand Oaks, CA: Sage.
(2019). Students' progression in monitoring anomalous results obtained in inquiry-based laboratory tasks. Research in Science Education, 49(1), 243–264.
diSessa, A. A. (2014). A history of conceptual change research: Threads and fault lines. In R. K. Sawyer (Ed.), The Cambridge Handbook of the Learning Sciences (pp. 88–108). Cambridge, UK: Cambridge University Press.
diSessa, A. A. (2015). Knowledge analysis: An introduction. In A. A. diSessa, M. Levin, & N. J. S. Brown (Eds.), Knowledge and Interaction: A Synthetic Agenda for the Learning Sciences. New York, NY: Routledge.
Doherty, J. H., et al. (2023). Oaks to arteries: The Physiology Core Concept of flow down gradients supports transfer of student reasoning. Advances in Physiology Education, 47(2), 282–295.
Dunbar, K. (1993). Concept discovery in a scientific domain. Cognitive Science, 17(3), 397–434.
Dunbar, K. (1999). How scientists build models: In vivo science as a window on the scientific mind. In Model-Based Reasoning in Scientific Discovery (pp. 85–99). Boston, MA: Springer.
Emerson, R. M., Fretz, R. I., & Shaw, L. L. (1995). Writing Ethnographic Fieldnotes. Chicago, IL: University of Chicago Press.
Gin, L. E., Rowland, A. A., Steinwand, B., Bruno, J., & Corwin, L. A. (2018). Students who fail to achieve predefined research goals may still experience many positive outcomes as a result of CURE participation. CBE—Life Sciences Education, 17, ar57. https://doi.org/10.1187/cbe.18-03-0036
Gouvea, J. S., & Simon, M. R. (2018). Challenging cognitive construals: A dynamic alternative to stable misconceptions. CBE—Life Sciences Education, 17(2), ar34.
Hammer, D. (2000). Student resources for learning introductory physics. American Journal of Physics, 68(S1), S52–S59. https://doi.org/10.1119/1.19520
Hammer, D., & Elby, A. (2002). On the form of a personal epistemology. In B. K. Hofer & P. R. Pintrich (Eds.), Personal Epistemology: The Psychology of Beliefs About Knowledge and Knowing (pp. 169–190). Mahwah, NJ: Erlbaum.
Hershkowitz, R., Schwarz, B. B., & Dreyfus, T. (2001). Abstraction in context: Epistemic actions. Journal for Research in Mathematics Education, 32(2), 195–222.
Hester, S. D., et al. (2018). Authentic Inquiry through Modeling in Biology (AIM-Bio): An introductory laboratory curriculum that increases undergraduates' scientific agency and skills. CBE—Life Sciences Education, 17(4), ar63.
(2021). Making sense of sensemaking: Using the sensemaking epistemic game to investigate student discourse during a collaborative gas law activity. Chemistry Education Research and Practice, 22(2), 328–346.
Hutchison, P., & Hammer, D. (2010). Attending to student epistemological framing in a science classroom. Science Education, 94(3), 506–524.
Johansen, M. W., & Christiansen, F. V. (2020). Handling anomalous data in the lab: Students' perspectives on deleting and discarding. Science and Engineering Ethics, 26, 1107–1128.
Kapon, S., & diSessa, A. A. (2012). Reasoning through instructional analogies. Cognition and Instruction, 30(3), 261–310. https://doi.org/10.1080/07370008.2012.689385
Kapon, S., Ron, G., Hershkowitz, R., & Dreyfus, T. (2015). Perceiving permutations as distinct outcomes: The accommodation of a complex knowledge system. Educational Studies in Mathematics, 88(1), 43–64. https://doi.org/10.1007/s10649-014-9570-2
Karch, J. M., & Sevian, H. (2022). Development of a framework to capture abstraction in physical chemistry problem solving. Chemistry Education Research and Practice, 23, 55–77. https://doi.org/10.1039/d1rp00119a
(2020). Increased scaffolding and inquiry in an introductory biology lab enhance experimental design skills and sense of scientific ability. Journal of Microbiology & Biology Education, 21(2), 21–22.
Kjelvik, M. K., & Schultheis, E. H. (2019). Getting messy with authentic data: Exploring the potential of using data from scientific research to support student data literacy. CBE—Life Sciences Education, 18, es2. https://doi.org/10.1187/cbe.18-02-0023
(2019). Undergraduate chemistry students' epistemic criteria for scientific models. Journal of Chemical Education, 97(1), 16–26.
(2007). Responses to anomalous data obtained from repeatable experiments in the laboratory. Journal of Research in Science Teaching, 44(3), 506–528.
Lira, M., & Gardner, S. M. (2020). Leveraging multiple analytic frameworks to assess the stability of students' knowledge in physiology. CBE—Life Sciences Education, 19(1), ar3.
(2020). Students' dynamic engagement with experimental data in a physics laboratory setting. In Proceedings of the 2020 Physics Education Research Conference, Orlando, FL.
(2022a). Bringing three-dimensional learning to undergraduate physics: Insight from an introductory physics laboratory course. American Journal of Physics, 90(6), 452–461.
(2022b). Student sensemaking about inconsistencies in a reform-based introductory physics lab. Physical Review Physics Education Research, 18(2), 020134.
Miles, M. B., Huberman, A. M., & Saldaña, J. (2019). Fundamentals of qualitative data analysis. In Qualitative Data Analysis: A Methods Sourcebook (4th ed., pp. 61–100). Thousand Oaks, CA: Sage.
Moore, K., Giannini, J., & Losert, W. (2014). Toward better physics labs for future biologists. American Journal of Physics, 82(5), 387–393.
Odden, T. O. B., & Russ, R. S. (2018). Sensemaking epistemic game: A model of student sensemaking processes in introductory physics. Physical Review Physics Education Research, 14(2), 020122.
Odden, T. O. B., & Russ, R. S. (2019). Defining sensemaking: Bringing clarity to a fragmented theoretical construct. Science Education, 103(1), 187–205.
Otter.ai. (2021). Otter.ai [Computer software]. Retrieved from https://otter.ai/
Parnafes, O., & diSessa, A. A. (2013). Microgenetic learning analysis: A methodology for studying knowledge in transition. Human Development, 56(1), 5–37.
Phillips, A. M., et al. (2021). Not engaging with problems in the lab: Students' navigation of conflicting data and models. Physical Review Physics Education Research, 17(2), 020112.
Phillips, A. M., Watkins, J., & Hammer, D. (2017). Problematizing as a scientific endeavor. Physical Review Physics Education Research, 13(2), 020107.
Rasband, W. S. (2021). ImageJ (Version 1.53n) [Computer software]. National Institutes of Health. Retrieved from https://imagej.nih.gov/ij/index.html
Redish, E. F., et al. (2014). NEXUS/Physics: An interdisciplinary repurposing of physics for biologists. American Journal of Physics, 82(5), 368–377.
Redish, E. F., & Cooke, T. J. (2013). Learning each other's ropes: Negotiating interdisciplinary authenticity. CBE—Life Sciences Education, 12(2), 175–186.
(2020). How students combine resources to make conceptual breakthroughs. Research in Science Education, 50(3), 1119–1141.
Roth, W.-M. (2013). Data generation in the discovery sciences—learning from the practices in an advanced research laboratory. Research in Science Education, 43(4), 1617–1644.
Saldaña, J. (2009). The Coding Manual for Qualitative Researchers (1st ed.). Thousand Oaks, CA: Sage.
Sherin, B., Krakowski, M., & Lee, V. R. (2012). Some assembly required: How scientific explanations are constructed during clinical interviews. Journal of Research in Science Teaching, 49(2), 166–198.
Slominski, T., et al. (2023). The impact of context on students' framing and reasoning about fluid dynamics. CBE—Life Sciences Education, 22(2), ar15.
Southerland, S. A., Abrams, E., Cummins, C. L., & Anzelmo, J. (2001). Understanding students' explanations of biological phenomena: Conceptual frameworks or p-prims? Science Education, 85(4), 328–348.
Sundstrom, M., et al. (2020, August). Problematizing in inquiry-based labs: How students respond to unexpected results. In Proceedings of the 2020 Physics Education Research Conference, virtual conference.
TechSmith Corporation. (2022). Snagit (Version 2022) [Computer software]. TechSmith.
Thompson, K. V., et al. (2013). Competency-based reforms of the undergraduate biology curriculum: Integrating the physical and biological sciences. CBE—Life Sciences Education, 12(2), 162–169.
Trickett, S. B., & Trafton, J. G. (2007). “What if…”: The use of conceptual simulations in scientific reasoning. Cognitive Science, 31(5), 843–875.
University of Utah. (2022). University analytics and institutional reporting [Interactive data portal]. Retrieved January 20, 2022, from https://data.utah.edu/data
Wagner, J. F. (2006). Transfer in pieces. Cognition and Instruction, 24(1), 1–71.