
1 Introduction

Problems encountered in the classroom are often well-structured and admit predefined answers, for instance as multiple choice questions. However, the problems that students will later face in their professional careers are likely to be ill-structured problems (ISP) which admit a broad set of context-dependent solutions  [1]. Such solutions are shaped by many variables and may trigger multiple unintended consequences. Solving an ISP thus requires a mental representation of the causal links between the variables that contribute to, or are impacted by, a potential solution.

Many software packages can support students in creating such representations in the form of “maps” or networks, such as Coggle or cMap. There have also been significant efforts in developing instructional systems to perform summative assessment on these maps. In the early 2010s, Ifenthaler and colleagues developed a vast array of systems including HIMATT [2] and SMD Technology [3], together with their successor AKOVIA [4]. The scope of application of such software packages has also broadened over time: while early solutions required students to provide their mental representations as networks, newer packages such as GIKS can “immediately convert students’ writings” [5] into graphs, which are then examined for summative assessment by listing strengths and weaknesses compared to the expert’s model. The latest packages also tend to provide online environments with high usability, such that teachers can intuitively use the tools and form communities of practitioners. For instance, our work at the 2018 Human Computer Interaction conference presented the design and implementation of an online tool that allows teachers to create assignments, upload the students’ maps, and comprehensively compare them to the solution maps [6].

Although new tools and case studies can provide detailed summative feedback  [7, 8], there is a paucity of tools using formative feedback to guide students in improving their work. The lack of tools has long been a barrier to the use of ISPs in the classroom  [9], thus limiting the development of problem-solving skills for students  [10]. One exception was the work of Wu and colleagues, who provided hints regarding changes that should be made by students in the maps to bring them closer to the experts’ maps  [11]. While useful, such feedback must typically be created by instructors, which may be a barrier to use. Unlike Wu’s approach of requiring an additional annotation of the map in order to provide meaningful feedback, we take a systems science approach to automatically identify the most critical aspects that must be changed in the maps. Our contributions are twofold:

  1. We design an adaptive instructional system to tell students (i) what they need to modify in their map and (ii) why such changes are needed.

  2. We implement this system to support communities of practice by (i) using a client/server architecture to allow collaboration and sharing; and (ii) emphasizing usability through an intuitive Graphical User Interface (GUI).

The remainder of this paper is structured as follows. In Sect. 2, we explain why maps are an important tool to externalize the mental models held by students and we briefly survey methods and software to automatically analyze such maps. We then cover how software development efforts have approached the problem of assessing maps. Section 3 builds on this background to present the design of our assessment solution and its software implementation. Considerations for future studies are discussed in Sect. 4.

2 Background

2.1 The Importance of Creating Maps

Students often face questions admitting a small list of valid answers, such as True/False questions or Multiple Choice Questions. While these are useful to test core concepts and can be easily graded, many of the decisions that students will make as professionals require a detailed understanding of causes and consequences. Students may eventually become policymakers and decide whether to recommend quarantine or keep businesses open in the face of a pandemic. Some students may embrace administrative careers and will determine which services are offered in-person while others are provided online. Students working in human resources may need to choose between an internal and an external candidate. Each of these many situations admits several solutions, which are identified based on the preferences of the various actors involved as well as the salient factors and interrelationships in the problem space. Equipping students with tools to handle such complex scenarios is thus necessary to avoid creating a divide between the well-structured problems that they encounter in their formal education and the more open-ended questions that they will face as professionals.

A key tool to support decision-making processes is the creation of a model that lists the relevant factors and interrelationships. This model is initially implicit, as a mental model held internally by students. However, the assumptions found in implicit models may be inconsistent or may not conform to the evidence. To promote evidence-based decision-making, models are externalized and thus become explicit. Seeing a model makes it possible to check for consistency or test hypotheses [12]. The field of research concerned with the externalization of models from human participants is known as participatory modeling (PM). There is a wide variety of PM tools [13], depending on whether the goal is to produce a computational model (e.g., to provide numerical estimates for the implications of a decision) or a conceptual model (i.e., to organize knowledge by listing factors and interrelationships). We focus on conceptual models, which are the foundation upon which any type of computational model may be built (e.g., transforming a causal loop diagram into a System Dynamics model). There are different types of conceptual models, depending on the structure (e.g., a ‘mind map’ puts the problem at the center and all other ideas branch off radially) and participants (e.g., whether the map reflects the knowledge of an individual or a group).

In this paper, we assess concept maps as studies have demonstrated that they help students recall and apply the knowledge gained  [14, 15]. We follow the definition of Voinov et al.: “A concept map results in a network, where concepts (nodes) are connected through directed links (edges). These links are labeled to indicate semantic or otherwise meaningful relationships.”  [13]

2.2 Analysis of Digital Concept Maps

As a concept map is a network, it can be analyzed using methods from network theory. The specific method depends on the purpose:

(Grouping):

As maps get larger, we need a higher unit of analysis than individual factors or interrelationships. Grouping factors into communities helps to understand a map, particularly when a group corresponds to a meaningful theme (e.g., all factors related to psychology in one group and factors related to the environment in another). For instance, the Foresight Obesity Map is among the largest maps for obesity [16], and reducing it to its communities revealed which ones were more strongly connected or which connections were potentially missing in the model [17]. In the large obesity maps of the Provincial Health Services Authority [18], factors were categorized by community to create a hierarchy that helps policymakers interact with the network [19, 20] by ‘closing’ a community (i.e. seeing it as a single high-level node) or ‘opening’ it (i.e. seeing all individual factors and interrelationships within the community). As grouping is one of the most common analyses, several additional examples can be found in the work of Allender, McGlashan and others [21,22,23]. Note that ‘grouping’ as discussed here relies on the use of community detection algorithms, which is different from a thematic analysis in which researchers (rather than algorithms) assign factors to groups [24].
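As a minimal illustration of this grouping step, the sketch below applies an off-the-shelf community detection algorithm (Clauset-Newman-Moore greedy modularity, as implemented in the networkx Python library) to a small invented causal map; the factor names and edges are hypothetical and only serve to show the mechanics.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# A small hypothetical causal map: edges read "cause -> effect".
G = nx.DiGraph([
    ("stress", "emotional eating"), ("emotional eating", "weight gain"),
    ("weight gain", "stress"), ("urban design", "access to fresh food"),
    ("access to fresh food", "diet quality"), ("diet quality", "weight gain"),
])

# Community detection here ignores edge direction, so we group factors
# on the undirected projection of the map.
for i, group in enumerate(greedy_modularity_communities(G.to_undirected()), 1):
    print(f"Community {i}: {sorted(group)}")
```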

(Intervening):

A map often supports decision-making activities. It is a snapshot of a perceived system such that we can examine the motivations and implications of various actions, also known as ‘interventions’ or ‘what-if’ scenarios. Some parts of the system are more important in driving its overall behavior and they are thus a prime target for interventions. Such parts are known as ‘leverage points’  [25]. They exist at several levels, from individual nodes (at a low level) to sets of interrelationships or even the whole ‘mindset’  [25]. The individual nodes most likely to impact the system can be identified by measuring their centrality  [23, 26]. Feedback loops (i.e. a cycle starting at a concept and following causal consequences that get back to this concept) in the system have also been identified in several works  [19, 27, 28] (Fig. 4a).
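To make these notions concrete, here is a minimal sketch assuming the networkx Python library and an invented map loosely inspired by Fig. 4a: it ranks factors by betweenness centrality as candidate leverage points and enumerates the feedback loops. Other centrality measures would work equally well depending on the study.

```python
import networkx as nx

# Hypothetical causal map (names invented for brevity).
G = nx.DiGraph([
    ("obesity", "physical fitness"), ("physical fitness", "obesity"),
    ("obesity", "body image"), ("body image", "depression"),
    ("depression", "antidepressant intake"), ("antidepressant intake", "obesity"),
])

# Factors that broker many causal pathways are candidate leverage points.
centrality = nx.betweenness_centrality(G)
print(sorted(centrality, key=centrality.get, reverse=True)[:3])

# Feedback loops: cycles that lead from a concept back to itself.
for cycle in nx.simple_cycles(G):
    print("Loop:", " -> ".join(cycle))
```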

(Validating):

Studies [29] have revealed that participants commonly omit the loops found in the real world because of cognitive limitations (e.g., it is much simpler to think ‘linearly’ with chains of causes-and-effects). They also frequently ignore the alternative paths that connect two factors (Fig. 4b). In short, the maps show that people “tend to (un)consciously reduce complexity in order to prevent information overload and to reduce mental effort” [30]. Consequently, the process of eliciting a map is often done with a trained facilitator [31,32,33] who can focus on the structure (e.g., ensuring that loops are present, avoiding redundant concepts, keeping the relevance to the problem space) while the participant focuses on the content. In this context, the validation of a map is a means to ensure that it was developed through a rigorous facilitation process (e.g., does it have loops?) and that it provides a plausible depiction of the real-world scenario under study [34]. For instance, a map in which all loops are reinforcing [35] reflects a strong focus on problems rather than solutions [36]; this lack of balance, and hence incomplete map, can later be misleading for decision-making activities. Although there are additional ways to validate a map (e.g., by contrast with the evidence), they require mixed-methods studies, which are beyond the scope of this brief overview of network methods [37,38,39].
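This loop-based validation check can be automated if edges carry a polarity. The sketch below assumes a signed map in which each edge has a 'sign' attribute of +1 ('increases') or -1 ('decreases'), a convention we introduce for this example only: a loop is reinforcing when the product of its edge signs is positive, and balancing otherwise.

```python
import networkx as nx

# Hypothetical signed causal map: sign=+1 for "increases", -1 for "decreases".
G = nx.DiGraph()
G.add_edge("obesity", "physical fitness", sign=-1)
G.add_edge("physical fitness", "obesity", sign=-1)   # forms a reinforcing loop
G.add_edge("obesity", "dieting", sign=+1)
G.add_edge("dieting", "obesity", sign=-1)            # forms a balancing loop

for cycle in nx.simple_cycles(G):
    product = 1
    for u, v in zip(cycle, cycle[1:] + cycle[:1]):
        product *= G.edges[u, v]["sign"]
    kind = "reinforcing" if product > 0 else "balancing"
    print(" -> ".join(cycle), ":", kind)

# A map whose loops are all reinforcing may warrant a validation flag.
```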

(Comparing):

In participatory modeling, it is common to produce several maps for a given problem by externalizing the perspectives of each participant (e.g., through semi-structured interviews) into one map and then building an aggregate [34, 40]. Comparing the maps of individual participants can reveal differences between their mental models, thus prompting focused conversations to explore or bridge these gaps. Comparing the maps of groups makes it possible to explore social differences (e.g., the maps of policymakers or subject-matter experts vs. the maps of community members) [41, 42]. In education, encouraging students to create maps eventually leads instructors to assess maps, which can be done by comparing the students’ maps with an expert’s map [43]. A comparison can consist of identifying a ‘structural core’ of factors shared across participants (e.g., when do students tend to agree with the expert?) [44], contrasting important elements (e.g., does the student agree with the expert on the most central factors?) [45], or quantifying the similarity between maps (e.g., producing a numerical score to facilitate the assessment of students) [7]. Note that throughout these participatory modeling studies, the focus is systematically on examining differences between maps rather than reducing these gaps, which is the focus of the present paper.
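Such comparisons can be operationalized with elementary set operations, as in the hypothetical sketch below: the intersection of node sets approximates a ‘structural core’, while a Jaccard score over edges yields one possible numerical similarity. The scores used in the cited studies are more sophisticated; this only illustrates the principle.

```python
import networkx as nx

# Invented student and expert maps for illustration.
student = nx.DiGraph([("stress", "overeating"), ("overeating", "weight gain")])
expert = nx.DiGraph([("stress", "overeating"), ("overeating", "weight gain"),
                     ("weight gain", "stress")])

core = set(student.nodes) & set(expert.nodes)   # shared 'structural core'
s, e = set(student.edges), set(expert.edges)
jaccard = len(s & e) / len(s | e)               # crude edge-overlap score
print(core, round(jaccard, 2))
```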

2.3 Methods and Software to Evaluate Digital Concept Maps

The field of research concerned with examining and representing the way learners organize their knowledge of a subject is called structural assessment of knowledge (SAK). When a student acquires information, it is incorporated into the existing body of knowledge (e.g., as new factors) and connected to it (e.g., via new edges). These connections may be correct, incorrect, or partially correct at any given time during the learning process. Research in SAK is concerned both with methods of knowledge representation and methods for evaluating the correctness of those representations [15]. The two topics are strongly interdependent since the evaluation of knowledge depends on its representation and, in turn, the representation may be chosen to support the evaluation. For instance, several studies have shown how restricting the available concepts in a map simplifies its evaluation [11, 14, 46].

Concept maps are popular in SAK to support assessment efforts. Trumpower and Vanapalli refer to this as “SAK of Learning” [emphasis added] in contrast to using concept maps to aid the student in contextualizing new information within their existing body of knowledge, which constitutes “SAK for Learning” [15]. In the assessment context, concept maps would be created or completed in place of a more traditional exam or essay at the end of an instructional topic to evaluate how well students had mastered the material [15]. Because concept maps are being used in place of a more traditional assessment instrument, it is critically important that the method of evaluating maps is reliable and accurately indicates the degree of mastery attained [47]. Several barriers have been mentioned in using maps for assessment: the potential for subjectivity, since each map represents an individual view of a particular subject area [46]; the reluctance of instructors to use concept maps as assessment tools because they perceive them to be time-consuming to evaluate [14]; and the concerns of students who may find the maps time-consuming to produce [14]. Experimental studies show that these concerns may not be systematically applicable. For instance, McClure, Sonak, and Suen performed a study on the evaluation of concept maps by hand, using six different methods, both with and without a referent map for comparison, and found that none of the methods required more than 5 min per map on average, which they judged to be similar to the time required to evaluate an essay, based on personal experience [46]. However, experimental studies do not always concur, in part due to the different preferences of the participants. While one study showed that the students’ concern about time could be effectively addressed with computer-based concept mapping instead of pencil-and-paper [47], another study ruled out the use of any technology as participating teachers considered that technology was difficult to acquire [48]. In short, two key concerns remain in grading a map, similar to grading essays: there can be significant variance in grading, and it takes much longer than the easily automated grading of multiple-choice questionnaires.

The need for reliable methods to evaluate concept maps has prompted the development of several frameworks and algorithms [11]. A common approach is to use algorithms from network science to measure an aspect of the student map and compare it with the measure obtained on the expert’s map (i.e. each map is reduced to a number and the two numbers are compared). Ifenthaler and colleagues proposed and named several such structural measures [4]: the number of concepts in each map (named ‘surface matching’), the diameter of their spanning trees as a proxy for the range of knowledge (‘graphical matching’), or the density (‘gamma matching’). As discussed by Krabbe [49], such algorithms partly ignore the topic since they neglect the labels in the maps and only compare their structures. The inclusion of labels in an algorithm leads to semantic measures, which start by identifying the set of concepts shared by the student and the expert. Then, we can examine whether concepts connected in the expert’s map are also connected in the student’s map (i.e. ‘propositional matching’). More advanced semantic measures such as the convergence score, salience score, or balanced matching are detailed by Krabbe [49]. As exemplified in the 2017 review on assessment technologies by Bhagat and Spector [50], two software packages are noteworthy for supporting the assessment of concept maps: HIMATT, introduced in 2010 and equipped with structural algorithms to compare maps; and AKOVIA, released in 2014 and expanding on HIMATT with semantic algorithms. Pathfinder is occasionally mentioned in the literature since it can show how closely the relationships defined by the student match the relationships present in a referent map (by computing both configural and common similarity scores). However, this is limited to Pathfinder networks, which are a very constrained sub-category of concept maps [51].
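As an illustration, the sketch below implements our reading of these measures on networkx digraphs. The function names mirror the terminology of Ifenthaler and colleagues and of Krabbe, but the exact formulas shown are simplified assumptions, not the reference implementations from HIMATT or AKOVIA.

```python
import networkx as nx

def surface_matching(student, expert):
    """Structural: compare the number of concepts in each map."""
    return len(student), len(expert)

def gamma_matching(student, expert):
    """Structural: compare the densities of the two maps."""
    return nx.density(student), nx.density(expert)

def graphical_matching(student, expert):
    """Structural: compare spanning-tree diameters (assumes connected maps)."""
    diam = lambda G: nx.diameter(nx.minimum_spanning_tree(G.to_undirected()))
    return diam(student), diam(expert)

def propositional_matching(student, expert):
    """Semantic: fraction of the expert's propositions (edges between shared
    concepts) that also appear in the student's map."""
    shared = set(student.nodes) & set(expert.nodes)
    props = {(u, v) for u, v in expert.edges if u in shared and v in shared}
    return len(props & set(student.edges)) / len(props) if props else 1.0
```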

Fig. 1. Our proposed software is the fourth version of ITACM, which started in 2016 as a small package to align terms or switch layouts [43] (a; top) and gradually added new functions, such as recommender systems in 2018 [6] (b; bottom) and, more recently, advanced graph algorithms [7].

While all of the aforementioned algorithms perform an assessment by comparing a student map to a referent expert map, several referent-free measures have also been proposed [52, 53] to favor certain structures in a student’s map (e.g., the ‘coherence’ Pathfinder measure, number of branches, total links). An alternative to these purely structural metrics is to use the Structure-Behavior-Function framework pioneered by Hmelo-Silver, who showed that experts are characterized by the ability to integrate all three aspects of the framework whereas novices focus on static properties of the system [54]. We recently proposed a similar framework based on structure, function, leverage points (can students manage the system? can they generate multiple scenarios?) and trade-offs [55].

2.4 The Incremental Thesaurus for Assessing Causal Maps (ITACM) Software

In 2016, our team started the development of the Incremental Thesaurus for Assessing Causal Maps (ITACM) software to address two shortcomings in software support for semantic measures [43]. First, it is well documented that maps can get large and difficult to navigate, as the “abundance of variables hinders their spatial organization” and facilitators spend time manually moving variables until a better layout is obtained [48]. Our software included several graph layout algorithms to automatically re-position the concepts and improve usability (Fig. 1a – top buttons). Most importantly, our software allowed users to compute propositional matching or compare the Structure-Behavior-Function of maps in which the terms could be entirely different. Although it is recognized that maps created without restricting terms are more useful to students [14], the prevailing attitude is that comparing such maps is more difficult [11, 14, 15, 56], and hence there was a lack of software support to resolve variations in language (e.g., ‘heart attack’ in the student map but ‘cardiac arrest’ in the expert map). ITACM uses a subject-area-specific thesaurus database constructed by users to allow terms in one concept map to be aligned with terms used in an expert map. This feature allows concept maps to be effectively analyzed without restricting students to only use words from a pre-determined list.
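The mechanics of such an alignment can be sketched as a simple relabeling pass. In the hypothetical example below, a small thesaurus maps a student’s phrasing onto the expert’s canonical vocabulary before any comparison is run; in ITACM, this database is built incrementally by users rather than hard-coded.

```python
import networkx as nx

# Hypothetical user-built thesaurus: student phrasing -> expert term.
thesaurus = {"heart attack": "cardiac arrest"}

def align(student_map: nx.DiGraph, thesaurus: dict) -> nx.DiGraph:
    """Relabel student concepts onto the expert's vocabulary."""
    mapping = {n: thesaurus.get(n, n) for n in student_map}
    return nx.relabel_nodes(student_map, mapping, copy=True)

student = nx.DiGraph([("smoking", "heart attack")])
print(list(align(student, thesaurus).edges))  # [('smoking', 'cardiac arrest')]
```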

After the 2017 release of ITACM, we rewrote the software to bring two additional improvements [6] (Fig. 1b). First, a client/server architecture supports a community of practice as instructors can create accounts, share assignments (and the students’ maps within), control access permissions, and re-use the thesaurus database developed by others. Second, we introduced a recommender system to suggest how to align terms from the student with the expert, thus reducing the amount of time spent by instructors on the alignment process. The codebase of ITACM v2 forms the basis on which subsequent versions have been built.

The third version of ITACM [7] brought three new approaches to measure the similarity of maps (i.e., graph kernels, graph edit distance, graph embeddings), based on advanced algorithms that have been extensively studied in graph theory yet never utilized for the assessment of digital maps. Approaches such as Graph Edit Distance allow the user to choose between structural or semantic measures, as the distance between two maps is computed based on the minimum number of operations to transform one into the other and the cost associated with such operations. An instructor can thus choose which cost or ‘penalty’ should be applied when the semantics of a student’s concept do not match the expert’s, even after alignment (e.g., ‘poor body image \(\rightarrow \) depressed’ for the student but ‘low body image \(\rightarrow \) low self-esteem’ for the expert).
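The snippet below sketches how such an instructor-chosen penalty could look with networkx’s built-in graph edit distance; the 0.5 substitution cost is an arbitrary example, and ITACM’s actual cost model may differ.

```python
import networkx as nx

def make_map(edges):
    """Build a digraph whose nodes carry their label as an attribute."""
    G = nx.DiGraph()
    for u, v in edges:
        G.add_node(u, label=u)
        G.add_node(v, label=v)
        G.add_edge(u, v)
    return G

student = make_map([("poor body image", "depressed")])
expert = make_map([("low body image", "low self-esteem")])

def node_subst_cost(a, b):
    # Matching concept labels cost nothing; a semantic mismatch costs an
    # instructor-chosen penalty of 0.5 instead of a full delete plus insert.
    return 0.0 if a["label"] == b["label"] else 0.5

print(nx.graph_edit_distance(student, expert, node_subst_cost=node_subst_cost))
```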

The software presented in the next section, ITACM v4, continues in the tradition of the previous releases of ITACM by using a client/server architecture and allowing users to align terms. Its core innovation lies in helping students to address differences instead of only counting them.

3 Principles, Design, and Implementation of the Proposed Technology

3.1 Overarching Goal and Core Principles

While several of the software packages aforementioned can perform a summative assessment, instructors often seek to correct misconceptions rather than merely count them. As discussed by several researchers, feedback is an essential part of the learning process [11, 14, 47]. It is possible to ‘tweak’ some of the summative assessment software to provide feedback by highlighting differences between the student map and the expert map, but this requires a concentrated effort on the part of the student to reflect on and examine the reasoning behind the displayed differences. A more effective method is to provide specific targeted feedback for each student map [15]. Wu et al. concluded that the use of the “evaluation-feedback-modification cycle [...] significantly improved the learning achievement of the students” [11]. Thus far, the main limitation in providing feedback has been that it requires extensive additional work from instructors [11, 15, 47]. The overarching goal of ITACM v4 is thus to automatically generate feedback.

Fig. 2. In the top proposed wireframe (a), the user only sees the student map and the changes to make, summarized in tabular form. The map components involved in a proposed change can be highlighted by clicking the radio button in the table, and the justification appears via a modal window by clicking on the reason (e.g., J1, J2). An alternative wireframe (b) shows both the student’s and the expert’s maps, with missing nodes in red (i.e. present only in the expert’s map) and extraneous nodes in blue (i.e. present only in the student’s map). By hovering over blue or red components, a popover window appears to give contextual information. (Color figure online)

Fig. 3. In the top proposed wireframe (a), missing nodes from the student’s map are shown in red and become green if added. By clicking on the explanation links, a modal window provides explanations. An alternative wireframe (b) emphasizes the order of suggestions, which are only revealed one at a time and summarized in the bottom window. For each suggestion, changes can be made to the map via a series of buttons, text fields, and drop-down menus. (Color figure online)

The translation of this goal into a specific design and ensuing implementation is guided by the following six principles:

  1. The terms used by students should be preserved to the extent possible. Wu et al. point out that new information should be taught in relation to the learner’s original knowledge structures [11]. Consequently, the feedback should not seek to systematically replace the students’ terminology with the expert’s, but only do so when necessary.

  2. Students should be responsible for making the changes and enabled to do so through hints. Wu et al. generated feedback for students in the form of hints such as “There is a missing notion related to Concept A” and “There is a missing connection related to Concept A and some other Concept” [11].

  3. Students should not be overwhelmed with a large list of changes to make. Rather, a small set of hints should focus on the most important changes. In addition to the individual relationships represented in concept maps, causal maps are typically analyzed for the presence of large system-level structures that can cause significant system-wide changes. As explained in Sect. 2.2, two such structures are loops and alternative paths between concepts. These structures can exert a large amount of influence on the behavior of a system, so they are important features for students to identify.

  4. The feedback should bridge the gap between a student’s map and the expert’s using a minimal number of steps. For instance, if the student has a factor that is not found in the expert’s map, then it would be unnecessary to ask the student to first remove causal links involving the extraneous factor before removing the factor itself. Instead, the student should be asked to directly remove the factor (single step), which de facto cuts all links involving this factor. This principle is inspired by the notion of ‘solution path length’ [57], positing that the intricacy of the problem-solution process is driven both by the number of steps (which we emphasize) and by the complexity of each step.

  5. When bridging the gap between a student’s map and the expert’s requires several steps, the successive steps should be organized such that the student advances towards a clear goal instead of fixing seemingly disconnected aspects of a model. For example, a student can take a series of steps to complete a loop that is missing with respect to the expert’s map, then add a missing alternative path, and finally prune unnecessary links. Although these activities take several steps, they can be bundled into three higher-level activities which motivate low-level changes (e.g., adding or removing a link) by higher-level needs (e.g., missing a loop).

  6. The graphical user interface must follow current standards and be intuitive to support software usability. Although Weinerth et al. found that very few papers on the assessment of concept maps mention software usability, either directly or indirectly, the International Test Commission has requested that the usability of assessment instruments be included in all assessment research [47].

3.2 Design Process

Usability issues can arise when the high-level functionalities of a software package are immediately translated into an implementation. To provide a satisfying experience to users and avoid creating barriers through the software interface, it is necessary to first think about the target audience, how they will interact with the software to accomplish various tasks, and hence how to support the sequence of interactions that they may perform. We thus began the design process by identifying the intended users of the software and considering the key implications in interface design for those users. Although students submit maps to be evaluated, the users of the software are instructors. We cannot assume that the instructors have created the expert maps used for comparison, but we assume that they are familiar with concept maps in general. Our design also assumes that instructors decide how to convey the feedback to students, such that we are providing a tool to support instructors. Finally, we emphasize that the goal of the software is to provide automatic feedback; thus, instructors do not need to go through the additional work of annotating the expert maps with justifications explaining the rationale for including specific structures.

With these considerations in mind, we started the process of sketching possible interfaces and the series of interactions that instructors would perform. In our research group, the standard practice is that several research assistants independently generate sketches  [58].

The many ideas generated through this process are then discussed by the group to combine the strengths emerging from the different sketches. Four different visions are shown in Figs. 2(a–b) and 3(a–b). These different visions revealed the need to make design decisions regarding (i) the level of detail included in the feedback (step-by-step modification plan? display of discrepancies using color coding?), and (ii) the style of feedback provided (textual? graphical display?). We decided against providing incremental feedback that forces the user to engage in a prescribed series of actions (Fig. 3b), of which only the most important would be revealed at a time. Although such rigid guidance can satisfy principle #5, our experience with previous software is that users do not like being locked into algorithmically decided choices [59]. We thus chose to organize the information to favor continuity in changes rather than jumping across the map, but instructors ultimately judge which feedback items most effectively reinforce the learning outcomes they are working towards. We also considered that showing differences between the student’s and expert’s maps using colors (Fig. 2b) could provide useful guidance and that users should be able to access justifications in multiple ways, such as by clicking on the links in the table (Fig. 2a) or the links on the map (Fig. 3a).

To refine the design, we created a series of prototypes and gathered feedback on their usability. We began by suggesting changes to a student map based only on differences in the sets of nodes or edges between the student map and the expert map. We then detected edges that were part of loops or alternative paths between concepts present in the expert’s map, but missing from the student’s. Each suggested change was justified with a set of reasons listed in the table, such as “The node ‘concept 1’ was present in the expert map” or “The edge between ‘concept 1’ and ‘concept 2’ would complete a loop present in the expert map.” When multiple reasons were applicable, we listed all of them. The feedback on our prototypes led us to remove the radio buttons from the table and to use a single color to highlight the concepts/relationships involved in the suggested change selected by the user. We also added the ability to export the full list of feedback items to a CSV file and the ability to customize the feedback by controlling the algorithms detecting loops or alternative paths. These algorithms are detailed in the next sub-section.
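A minimal sketch of this first prototype logic, assuming networkx digraphs whose node labels were already aligned, is shown below; the reason strings echo those quoted above, and the helper name is ours, not ITACM’s actual API.

```python
import networkx as nx

def suggest_changes(student: nx.DiGraph, expert: nx.DiGraph):
    """Yield (suggested change, reasons) rows for the feedback table."""
    for n in set(expert.nodes) - set(student.nodes):
        yield (f"Add the node '{n}'",
               [f"The node '{n}' was present in the expert map"])
    for n in set(student.nodes) - set(expert.nodes):
        yield (f"Remove the node '{n}'",
               [f"The node '{n}' is absent from the expert map"])
    # Edges belonging to a loop in the expert's map earn an extra reason.
    expert_loop_edges = {(u, v) for cycle in nx.simple_cycles(expert)
                         for u, v in zip(cycle, cycle[1:] + cycle[:1])}
    for u, v in set(expert.edges) - set(student.edges):
        reasons = [f"The edge '{u}' -> '{v}' was present in the expert map"]
        if (u, v) in expert_loop_edges:
            reasons.append(f"The edge between '{u}' and '{v}' would complete "
                           f"a loop present in the expert map")
        yield (f"Add the edge '{u}' -> '{v}'", reasons)
```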

3.3 Implementation

As the codebase reuses ITACM v2, we refer the reader to the details of that implementation for the client/server architecture and the network and visualization libraries [6]. The two algorithms introduced in the software discussed here serve to find alternative paths and cycles. In graph theory, the former is known as disjoint path detection and the latter as cycle enumeration. Both problems have been studied for decades, resulting in several algorithms. Since concept maps are relatively small networks (unlike, e.g., Twitter or Facebook datasets), we do not need approximation algorithms or solutions that leverage parallel and distributed computing. We thus rely on foundational algorithms for both problems. We implemented disjoint path detection using both an edge-disjoint and a node-disjoint algorithm; the node-disjoint algorithm was based on a transformation scheme outlined in Suurballe’s 1974 paper [60]. To enumerate cycles, we use the method introduced in 1975 by Johnson [61].
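Readers wishing to reproduce these analyses can find close equivalents in networkx: nx.simple_cycles follows Johnson’s enumeration algorithm, and nx.node_disjoint_paths relies on a flow-based construction in which nodes are split, in the spirit of Suurballe’s transformation. The toy example below illustrates the class of algorithms we rely on, not our exact code.

```python
import networkx as nx

# Toy map with two node-disjoint paths from A to C and two cycles through A.
G = nx.DiGraph([("A", "B"), ("B", "C"), ("A", "D"), ("D", "C"), ("C", "A")])

# Enumerate all elementary cycles (Johnson, 1975).
print(list(nx.simple_cycles(G)))

# Node-disjoint paths between two concepts.
print(list(nx.node_disjoint_paths(G, "A", "C")))
```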

Fig. 4. Two potential feedback loops in obesity (a) include: a lowered physical fitness, which further fuels weight gain; a decrease in body image, with a potential for depression and the intake of antidepressants whose potential side effects include weight gain. Two alternative paths connecting the legal recognition of ‘obesity as a disability’ to ‘weight-based discrimination’ (b) include: being able to sue for discrimination, hence making it dissuasive; increasing the popular belief that obesity is undesirable which, together with the belief that it is controllable, results in discrimination.

Fig. 5. Our implementation uses a table to summarize the changes that would transform the student’s map into the expert’s map. The content of the table depends on two parameters (the maximum lengths of loops and alternative paths). Upon selecting a specific change to make (b), the corresponding part of the map is highlighted in red and the reason(s) for the change are also available next to the highlight. Clicking on a reason, either next to the map or in the table, will open a window with an automatically generated explanation. This high-resolution figure can be zoomed in using the digital version of the article. (Color figure online)

Instructors may decide that students ought to represent short loops, while being more lenient about the inclusion of longer loops. For example, it is relatively straightforward to conceptualize that being obese reduces the ability to engage in physical activity, which in turn contributes to obesity (Fig. 4a-left). This loop of length 2 is conceptually more evident, and hence expected of students, than the other loop of length 5 in which obesity reduces body image, thus promoting depression, with the possibility of taking antidepressants which may lower metabolism (Fig. 4a-right). Rather than comparing a student’s map to the expert’s map on the basis of every single loop, we introduce a parameter which lets instructors control the maximum length of the loops that should be detected. Similarly, we introduced a parameter to control the length of disjoint paths, which allows instructors to specify that students should consider alternatives, but only up to a point. In Fig. 4b, the two paths have a length of 2 (from the legal status of obesity as a disability to the prevalence of weight-based discrimination). There may be a third path connecting these concepts with a length of 15, but this alternative may be far-fetched and a lower priority in improving a student’s work. Instructors can conveniently change the values of both parameters on the same screen, rather than having to locate a panel for settings. The effect of a change in parameter values is immediately visible, as the table of suggested changes is refreshed accordingly.
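The effect of the two parameters can be sketched as simple length filters. The function below is a hypothetical distillation of this behavior, assuming networkx; nx.all_simple_paths bounds path length via its cutoff argument (counted in edges).

```python
import networkx as nx

def bounded_structures(expert, source, target, max_loop_len=2, max_path_len=2):
    """Return the expert loops and alternative paths short enough to be
    expected of students, per the instructor-set parameters."""
    loops = [c for c in nx.simple_cycles(expert) if len(c) <= max_loop_len]
    paths = list(nx.all_simple_paths(expert, source, target,
                                     cutoff=max_path_len))
    return loops, paths
```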

Two screenshots from our implementation are provided in Fig. 5. The screenshots are annotated to identify some of the specific design elements discussed above.

4 Discussion

Formative assessment of concept maps is necessary to support educators in teaching problem-solving skills to students. While previous software primarily supports summative assessment, we proposed, designed, and implemented software that leverages systems thinking to automatically generate a step-by-step list of changes that students need to make, together with supporting reasons. The software was designed to provide a satisfying user experience to instructors, who are fully in control of how the guidance is passed on to students. Instructors are further able to customize the automatically generated guidance through two parameters, which affect our search algorithms for loops and alternative paths.

In this section, we focus on three potential avenues for future work. First, our software is the first to generate feedback entirely automatically while giving instructors the possibility to customize the type of sub-structures (e.g., loops, alternative paths) that they seek to promote in their students’ maps. This opens a new line of research in the parametrization of algorithms for formative assessment of concept maps. Several questions within this line of inquiry are as follows. Which sub-structures of a causal map would indicate higher levels of systems thinking or better mastery of domain-related knowledge? For instance, it is possible that certain network motifs  [62] or network structures (e.g., a star in which one factor is influencing/influenced by many non-interacting factors) need to be promoted or flagged as potential issues. Which parameters would allow instructors to specify that they seek such structures along a continuum? In this paper, we used a parameter to control the length of alternative paths, but it is possible that a better control parameter would be how much longer a path should be compared to the shortest one (e.g., if the shortest explanation from a factor to another is of length 2 then should a student be required to think of an alternative of length 6?). Can each parameter be set independently or should a change in one parameter automatically impact the range of possible values in another? As instructors are assumed to be familiar with concept maps without being experts in graph algorithms, how can we continue to support usability while providing the customization of increasingly complex algorithms?

Second, as research in recommender systems has shown, automatically generating advice does not mean that it will be followed, even when following it would be highly likely to benefit the individual. Explanations need to be transparent and convincing. This is less of a concern for low-level explanations such as ‘you should include this factor because the expert has it’, but it does apply to high-level rationales that involve several successive changes, such as ‘you need to add these four links to finish a loop’. Further experimental studies are thus needed to examine the impact of several approaches to explaining the changes and organizing them.

Finally, we note that the historical divide between elicitation tools on the one hand (e.g., MentalModeler [63]) and analysis tools on the other hand (e.g., ActionableSystems [20], Gephi) is starting to disappear as newer software provides both extended analytical and map-building capabilities (e.g., our artificial facilitator [31], STICKE [44]). It would thus be of interest to design, implement, and evaluate a system in which students can create their maps while receiving automated feedback to the extent desired by the instructor. We also envision that instructors should be able to set milestones for evaluation in such a system, such that it can be adopted as part of classroom activities or assignments.

Contributions. This study was jointly initiated by PJG and AAT. The manuscript was written by PJG and the software development was directed by PJG. Feedback on the design and prototypes was provided by AAT.