Author manuscript; available in PMC: 2024 Mar 1.
Published in final edited form as: Int J Med Inform. 2023 Jan 6;171:104985. doi: 10.1016/j.ijmedinf.2023.104985

Clinical Research Staff Perceptions on a Natural Language Processing-driven Tool for Eligibility Prescreening: An Iterative Usability Assessment

Betina Idnay 1,2,3, Yilu Fang 3, Caitlin Dreisbach 4, Karen Marder 2, Chunhua Weng 3,*, Rebecca Schnall 1,5,*
PMCID: PMC9912278  NIHMSID: NIHMS1864348  PMID: 36638583

Abstract

Background:

Participant recruitment is a barrier to successful clinical research. One strategy to improve recruitment is to conduct eligibility prescreening, a resource-intensive process in which clinical research staff manually review electronic health record data to identify potentially eligible patients. Criteria2Query (C2Q) was developed to address this problem by capitalizing on natural language processing to generate queries that identify eligible participants from clinical databases semi-autonomously.

Objective:

We examined the clinical research staff’s perceived usability of C2Q for clinical research eligibility prescreening.

Methods:

Twenty clinical research staff evaluated the usability of C2Q using a cognitive walkthrough with a think-aloud protocol and a Post-Study System Usability Questionnaire. On-screen activity and audio were recorded and transcribed. After every five evaluators completed an evaluation, usability problems were rated by informatics experts and prioritized for system refinement. There were four iterations of system refinement based on the evaluation feedback. Guided by the Organizational Framework for Intuitive Human-computer Interaction, we performed a directed deductive content analysis of the verbatim transcriptions.

Results:

Evaluators were aged 24 to 46 years (mean: 33.8; SD: 7.32), demonstrated high computer literacy (mean: 6.36; SD: 0.17), and were predominantly female (75%), White (35%), and clinical research coordinators (45%). C2Q demonstrated high usability during the final cycle (2.26 out of 7, where lower scores are better; SD: 0.74). The number of unique usability issues decreased after each refinement. Fourteen subthemes emerged from three themes: seeking user goals, performing well-learned tasks, and determining what to do next.

Conclusions:

The cognitive walkthrough with a think-aloud protocol informed iterative system refinement and demonstrated the usability of C2Q by clinical research staff. Key recommendations for system development and implementation include improving system intuitiveness and overall user experience through comprehensive consideration of user needs and requirements for task completion.

Keywords: Eligibility prescreening, cohort identification, clinical research, natural language processing, usability, cognitive walkthrough

1. Introduction

Timely accrual of participants into clinical research studies is a persistent challenge that leads to delays or increased costs in developing new therapies due to timeline extensions or study termination.1 Several strategies (e.g., community outreach, clinician partnership, social media engagement) have shown limited improvement in recruitment and participant diversity.2 A promising strategy is electronic eligibility prescreening.3 It involves manual or automated electronic health records (EHR) scanning to identify study cohorts potentially satisfying clinical trial eligibility criteria.4 Despite its advantages,5 this approach still faces challenges in translating criteria text into accurate and efficient clinical data queries.6 It is a knowledge-intensive task, often carried out by a hard-to-find technician with clinical domain knowledge and database skills. For many organizations, this task remains costly, inefficient, and full of variability.7

Efforts have been made to develop natural language processing (NLP) systems that automate standards-based study cohort query generation to minimize the manual work involved in transforming criteria text into cohort queries.8 For example, Criteria2Query (C2Q) allows clinical researchers to use NLP to efficiently construct semi-automated queries to identify study participants in large clinical databases.9,10 In a recent systematic review of NLP systems used for eligibility screening,11 only one of the eleven evaluation studies included a usability evaluation, and it demonstrated satisfactory usability.12 The Organizational Framework for Intuitive Human-computer Interaction posits that intuitive technology systems allow users in lenient learning environments to use a combination of prior experience and feedforward methods to achieve their goals in using the system to complete a task.13 These systems are embedded in a social-organizational environment – hence adoption can differ across settings and users.14 Successful system adoption is not guaranteed just because an organization enforces it; rather, integrating usability methods during the system development process facilitates technology acceptance and adoption.15 Wang and colleagues found unsatisfactory evaluation of the adoption of five NLP systems,16 highlighting the need to understand the factors influencing adoption behavior at the individual and organizational levels before full-scale adoption.17 Computational challenges and human factors are important considerations for evaluating an NLP system.18

Preliminary evaluation of C2Q showed promising usability from the perspective of informatics experts9 and moderate usability with clinical research coordinators.10 The goal of this study was to iteratively evaluate the usability of C2Q for eligibility prescreening from the clinical research staff’s perspective to inform system refinement.

2. Materials and methods

We iteratively evaluated the usability of C2Q by employing a cognitive walkthrough with a think-aloud protocol, administering the Post-Study System Usability Questionnaire (PSSUQ),19 and obtaining independent expert severity ratings of usability problems. This approach involved simultaneous qualitative and quantitative data collection and analysis that continued cyclically through multiple rounds to increase the likelihood of technology adoption.20 This study was approved by the Columbia University Irving Medical Center (CUIMC) Institutional Review Board.

2.1. Setting and sample

The evaluations were conducted remotely through Zoom. Twenty clinical research staff with experience in Alzheimer's disease and related dementias (ADRD) clinical research were recruited through the CUIMC Irving Institute for Clinical and Translational Research Clinical Research Resource, the research team's professional network, and word of mouth. Evaluators had to be at least 18 years old, able to read and communicate in English, have worked as clinical research staff (e.g., clinical research coordinator, research assistant, research nurse) in the last 12 months, and have worked in ADRD clinical research for at least three months. Demographic variation sampling was used to reduce situational uniqueness and attain a comprehensive understanding of the user experiences.21

2.2. Features of Criteria2Query

C2Q is an NLP-driven tool that facilitates human-computer collaboration by translating free-text eligibility criteria into standards-based cohort definition queries and allowing user-defined modifications through an editable user interface (Fig. 1).10 It enables clinical research staff without database query experience or informatics training to transform eligibility criteria into clinical database queries for cohort identification and to tailor the query to the eligibility criteria of their study.9 C2Q has the following properties: an editable user interface with functions to prioritize or simplify the eligibility criteria text to query for potentially eligible participants; accessible and portable cohort Structured Query Language (SQL) query formulation based on the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) version 5; and real-time cohort query execution with result visualization. For evaluation purposes, we connected C2Q to the publicly available Medicare Claims Synthetic Public Use Files SynPUF_1K containing 1,000 synthetic patient cases.22
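To make the query-formulation step concrete, the following minimal sketch shows roughly what an OMOP CDM v5-style cohort query might look like when executed against a local copy of a synthetic dataset such as SynPUF_1K. The table and column names follow the OMOP CDM, but the concept ID, the age criterion, the SQLite backend, and the file path are illustrative assumptions rather than C2Q's actual output.

```python
import sqlite3  # stand-in engine; a deployed system would use the site's OMOP database

# Illustrative only: an OMOP CDM v5-style cohort query resembling what an NLP-driven
# tool could generate from the criterion "diagnosis of Alzheimer's disease".
AD_CONCEPT_ID = 378419  # assumed standard concept ID for Alzheimer's disease

COHORT_SQL = """
SELECT DISTINCT p.person_id
FROM person p
JOIN condition_occurrence co ON co.person_id = p.person_id
WHERE co.condition_concept_id = ?   -- inclusion: Alzheimer's disease diagnosis
  AND p.year_of_birth <= 1963       -- inclusion: age threshold (illustrative)
"""

def run_cohort_query(db_path: str) -> list[int]:
    """Execute the cohort query against an OMOP CDM-formatted SQLite database."""
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute(COHORT_SQL, (AD_CONCEPT_ID,)).fetchall()
    return [row[0] for row in rows]

if __name__ == "__main__":
    # Hypothetical local export of the SynPUF_1K sample loaded into the OMOP CDM
    print(run_cohort_query("synpuf_1k_omop.sqlite"))
```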

Fig. 1.

C2Q interface and use case example. The Eligibility Criteria Input box shows the free-text eligibility criteria provided by the user. The Initial Eligibility Criteria Parsing Result box shows C2Q's automated concept mapping of the clinical terms. As shown in the Updated Eligibility Criteria Parsing Result box, the user then modifies and prioritizes the criteria to best address their need, such as updating the mapped concept for AD in inclusion criterion #1 from adrenal adenoma to Alzheimer's disease, unchecking inclusion criterion #2, and removing mapped clinical terms in exclusion criterion #1.

2.3. Iterative Usability Evaluation

2.3.1. Cognitive walkthrough with a think-aloud protocol

We conducted a cognitive walkthrough to evaluate how users perform tasks with little to no formal instruction using C2Q.23,24 The evaluators completed four steps:23 1) identifying an end goal; 2) inspecting available actions; 3) selecting one of the correct actions as the next step leading to the end goal; and 4) evaluating system feedback on the progress toward accomplishing the end goal. A think-aloud protocol is a commonly employed method that asks evaluators to verbalize their thoughts as they navigate the system interface.25 Analysis of the verbalizations supports exploring how users perceive the system and why particular interface components are easy to use.26

2.3.2. Post-Study System Usability Questionnaire (PSSUQ)

The PSSUQ assesses users' perceived usability of a computer system in a scenario-based context.19 With a Cronbach's alpha of .94, the 16-item survey has a three-factor structure: system usefulness, information quality, and interface quality.19 Items are rated on a seven-point Likert scale from "strongly agree" (1) to "strongly disagree" (7), with lower overall scores indicating better user satisfaction.27
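As a rough illustration of how PSSUQ subscale and overall scores can be computed from the 16 item responses, the sketch below averages items within each factor. The item-to-subscale grouping follows the commonly cited 16-item PSSUQ structure and should be treated as an assumption rather than the exact scoring procedure used in this study.

```python
from statistics import mean

# Assumed item groupings for the 16-item PSSUQ (responses: 1 = strongly agree ...
# 7 = strongly disagree); lower scores indicate better perceived usability.
SUBSCALE_ITEMS = {
    "system_usefulness": range(0, 6),     # items 1-6
    "information_quality": range(6, 12),  # items 7-12
    "interface_quality": range(12, 15),   # items 13-15
}

def score_pssuq(responses: list[float]) -> dict[str, float]:
    """Return subscale means and the overall mean for one evaluator."""
    if len(responses) != 16:
        raise ValueError("expected 16 item responses")
    scores = {name: mean(responses[i] for i in items) for name, items in SUBSCALE_ITEMS.items()}
    scores["overall"] = mean(responses)
    return scores

# Example: a hypothetical evaluator who mostly agreed with the statements
print(score_pssuq([2, 1, 2, 3, 2, 2, 3, 2, 3, 2, 2, 3, 2, 1, 2, 2]))
```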

2.3.3. Independent usability severity rating

Two health informatics experts on the team (RS, CW) independently rated the severity of each usability violation to determine the priority of C2Q refinement for the next evaluation cycle. Problem severity was assigned based on three factors: the frequency of the problem, the potential impact of the problem on the user, and the persistence of the problem.28 Using these factors, problem severity was rated on Nielsen's five-point scale28 as follows: (0) not a problem at all, (1) cosmetic problem only, (2) minor usability problem, (3) major usability problem, or (4) usability catastrophe.

2.4. Data collection methods and procedures

The study involved one 1- to 1.5-hour online Zoom meeting. Evaluators were granted remote access to navigate the study team's computer screen with a beta version of C2Q. Using a purposively selected Alzheimer's disease (AD) research protocol from ClinicalTrials.gov (NCT04619420; Fig. 2) to assess burden (estimating an average of one minute to modify a criterion4), the evaluators completed ten tasks step-by-step (Supplementary Table 1) without any specific guidance on how to complete the tasks. Evaluators were prompted to describe their thought processes as they attempted to complete a task. Occasionally, we provided prompts when the evaluators were unable to complete a task.

Fig. 2.

Automated criteria parsing result of NCT04619420 from C2Q.

We monitored and logged the evaluators' interface interactions and recorded the audio and on-screen activity. All audio recordings of the usability evaluation sessions were transcribed verbatim. In addition to the PSSUQ, evaluators completed a demographic survey and a computer literacy evaluation (Table 9), a validated 22-item Likert-type survey (Cronbach's α = .92) scored 1-7, with higher scores indicating higher computer literacy.29 Evaluators received a $50 digital Amazon gift code for participation. After every five clinical research staff completed the evaluation, a list of the unique usability problems was distributed for independent usability severity rating. There were four iterations of C2Q system refinement (five evaluations per iteration).

2.5. Data analysis and rigor

Sessions were analyzed using audio and video recordings, verbatim transcriptions, and field notes. Comments, silence, and repetitive actions were reviewed and evaluated to determine potential usability issues.30 Comments were categorized as positive or negative characteristics and recommendations. The severity of each identified unique usability issue was determined by averaging the severity scores assigned to the problem. Usability issues that scored 3.5 or higher on the 0-4 severity scale (with 4 being the most severe) were prioritized and addressed during system refinement. The PSSUQ, demographic survey, and computer literacy evaluation results were analyzed to calculate descriptive statistics. We explored differences in usability based on sex, age, job experience, and position.
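A minimal sketch of the prioritization step described above, assuming each usability issue receives one integer severity rating per expert and using the 3.5 cutoff reported here; the data structures and the example issues are hypothetical, not the authors' actual analysis code.

```python
from statistics import mean

PRIORITY_CUTOFF = 3.5  # averaged severity at or above this value triggers refinement

def prioritize_issues(expert_ratings: dict[str, list[int]]) -> list[tuple[str, float]]:
    """Average the experts' severity ratings per usability issue and return the
    issues that meet the cutoff, most severe first."""
    averaged = {issue: mean(scores) for issue, scores in expert_ratings.items()}
    prioritized = [(issue, sev) for issue, sev in averaged.items() if sev >= PRIORITY_CUTOFF]
    return sorted(prioritized, key=lambda pair: pair[1], reverse=True)

# Hypothetical ratings from two expert raters on Nielsen's 0-4 severity scale
example = {
    "No undo for concept-mapping mistakes": [4, 4],
    "Standard vs. All Concepts distinction unclear": [4, 3],
    "Minor color inconsistency on buttons": [1, 2],
}
print(prioritize_issues(example))
```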

Further, a directed deductive content analysis31 of the transcriptions of the cognitive walkthrough was performed. Transcripts were reviewed for accuracy and uploaded to Dedoose software.32 A thematic codebook was created, drawing on the adapted Organizational Framework for Intuitive Human-computer Interaction constructs: seeking user goals, performing well-learned tasks, and determining what to do next.13 The framework conceptualizes the constructs as pie slices representing the user's required cognitive activities in intuitive human-computer interactions (Fig. 3).

Fig. 3.

Adapted Organizational Framework for Intuitive Human-computer Interaction used in the directed deductive analysis.13

The framework constructs became the main themes into which meaning units would be coded (Table 1). The subthemes were further categorized as metacognition and user knowledge.

Table 1.

Coding framework based on the Organizational Framework for Intuitive Human-computer Interaction13

Code Description
Seeking user goals Meaning units were coded into this theme when statements described how the clinical research staff would use C2Q based on the tasks and their experience and expertise. Comments about the system and its interface concerning the effective use of the system to achieve their goal were also coded in this theme.
Performing well-learned tasks Meaning units were coded into this theme when statements described how the clinical research staff perceived the goal of the system and its impact on how they interact with the interface to complete the tasks and achieve their goal. In addition, statements on factors that affect how the system is implemented were also coded in this theme.
Determining what to do next Meaning units were coded into this theme when statements indicated the system’s attributes that will impact how the users complete the tasks.

Two coders (BI, CD) independently coded the data. Memos related to codes and coded segments were documented to keep track of developments in the data analysis. Discrepancies in the themes or coding were discussed among the team until a consensus was reached. To ensure that the analysis reflected the clinical research staff's experiences in eligibility prescreening, member checking was done throughout data collection by asking participants whether the findings from the previous evaluation cycle matched their experience.33 Maximum variation through our sampling strategies allowed a wide range of views and perspectives to be considered, supporting the data's confirmability and credibility.34 Data saturation was reached once no new themes emerged.

3. Results

3.1. Evaluator Characteristics

Twenty evaluators completed the usability evaluation. The mean age was 33.45 years (range 24-46); 75% were female, 35% White, 50% Hispanic, and 60% educated above the undergraduate level. The largest group of evaluators (45%) worked as clinical research coordinators, 85% had worked as clinical research staff for two years or longer, and 75% had worked in ADRD research for two years or longer. The majority (80%) of the evaluators manually reviewed the EHR to identify potentially eligible participants. Half of the evaluators spent two hours or more per week on eligibility prescreening, and most (65%) spent ten minutes or more prescreening one potentially eligible participant. Evaluators scored 6.36 on average on the computer literacy questionnaire (range 5.58-7), indicating high computer literacy. Table 2 summarizes the evaluators' characteristics by cycle.

Table 2.

Evaluator characteristics by cycle (n per cycle = 5; N = 20)

Characteristics Cycle 1 Cycle 2 Cycle 3 Cycle 4 Overall
Age, years (mean (SD) [range]) 34 (6.04) [27-42] 30.4 (4.83) [24-37] 32.6 (7.13) [24-40] 36.8 (10.38) [25-46] 33.45 (7.32) [24-46]
Gender (n (%))
Female 4 (80) 5 (100) 2 (40) 4 (80) 15 (75)
Male 1 (20) 0 3 (60) 1 (20) 5 (25)
Race (n (%))
American Indian/Alaskan Native 0 0 1 (20) 0 1 (5)
Asian or Asian American 0 1 (20) 1 (20) 1 (20) 3 (15)
Black or African American 2 (40) 1 (20) 0 2 (40) 5 (25)
Multiracial 3 (60) 0 0 0 3 (15)
Something else 0 1 (20) 0 0 1 (5)
White or Caucasian 0 2 (40) 3 (60) 2 (40) 7 (35)
Ethnicity (n (%))
Hispanic 4 (80) 3 (60) 1 (20) 2 (40) 10 (50)
Non-Hispanic 1 (20) 2 (40) 4 (80) 3 (60) 10 (50)
Educational level (n (%))
Bachelor's degree 1 (20) 1 (20) 3 (60) 3 (60) 8 (40)
Master's degree 2 (40) 2 (40) 2 (40) 2 (40) 8 (40)
Doctoral degree 2 (40) 2 (40) 0 0 4 (20)
Job position (n (%))
Clinical research coordinator 3 (60) 3 (60) 3 (60) 0 9 (45)
Nurse practitioner 0 0 0 1 (20) 1 (5)
Research assistant 1 (20) 1 (20) 2 (40) 2 (40) 6 (30)
Research associate 1 (20) 0 0 1 (20) 2 (10)
Research program/project manager 0 1 (20) 0 1 (20) 2 (10)
Months/years as clinical research staff (n (%))
less than 6 months 0 0 1 (20) 0 1 (5)
6 months to < one year 0 0 0 0 0
1 to < two years 0 0 1 (20) 1 (20) 2 (10)
2 to < five years 3 (60) 3 (60) 0 3 (60) 9 (45)
5 to < ten years 1 (20) 2 (40) 2 (40) 0 5 (25)
ten years or over 1 (20) 0 1 (20) 1 (20) 3 (15)
Months/years in ADRD research (n (%))
less than 6 months 0 0 1 (20) 0 1 (5)
6 months to < one year 0 0 0 1 (20) 1 (5)
1 to < two years 0 1 (20) 1 (20) 1 (20) 3 (15)
2 to < five years 4 (80) 4 (80) 0 2 (40) 10 (50)
5 to < ten years 1 (20) 0 1 (20) 1 (20) 3 (15)
ten years or over 0 0 2 (40) 0 2 (10)
Eligibility prescreening methods used * (n)
Clinician referral 1 0 2 0 3
Manual review: EHR 3 5 4 4 16
Manual review: printed medical records 1 2 2 4 9
Telephone survey 4 5 4 3 16
Other study team referral 1 0 1 0 2
Type of research* (n)
Pharmaceutical trials 1 1 3 4 9
Observational studies 6 7 7 4 24
Registries 0 2 0 2 4
Post-mortem (brain donation) 0 0 1 0 1
Weekly time spent in eligibility prescreening (n (%))
30 minutes to < 1 hour 1 (20) 2 (40) 2 (40) 0 5 (25)
1 hour to < 2 hours 0 2 (40) 1 (20) 2 (40) 5 (25)
2 hours to < 3 hours 2 (40) 0 2 (40) 2 (40) 6 (30)
more than 3 hours 2 (40) 1 (20) 0 1 (20) 4 (20)
Time spent prescreening one participant (n (%))
Less than 10 minutes 1 (20) 1 (20) 3 (60) 2 (40) 7 (35)
10 minutes to < 20 minutes 2 (40) 3 (60) 1 (20) 0 6 (30)
20 minutes to < 30 minutes 1 (20) 1 (20) 1 (20) 3 (60) 6 (30)
more than 30 minutes 1 (20) 0 0 0 1 (5)
Database query experience (n (%))
Yes 3 (60) 2 (40) 1 (20) 4 (80) 10 (50)
No 2 (40) 3 (60) 4 (80) 1 (20) 10 (50)
Computer literacy score (mean (SD))
Software 6.93 (0.15) 7.00 (0.00) 6.87 (0.30) 6.67 (0.47) 6.87 (0.29)
Hardware 6.40 (0.63) 5.60 (1.63) 6.40 (0.63) 5.40 (1.35) 5.95 (1.15)
Multimedia 5.85 (0.89) 5.10 (1.01) 5.15 (1.44) 5.85 (0.98) 5.49 (1.08)
Network 7.00 (0.00) 7.00 (0.00) 7.00 (0.00) 6.70 (0.45) 6.93 (0.24)
Information ethics 6.60 (0.65) 6.65 (0.38) 5.40 (1.52) 6.85 (0.22) 6.38 (0.98)
Information security 6.67 (0.33) 6.53 (0.65) 6.27 (0.83) 6.73 (0.28) 6.55 (0.55)
Total score 6.58 (0.26) 6.31 (0.25) 6.18 (0.60) 6.37 (0.39) 6.36 (0.40)
* One evaluator may have multiple answers.

3.2. Usability evaluation

The system's usability was rated poorly in the first cycle (3.18; SD: 1.24) but surpassed the benchmark (overall: 2.82; system usefulness: 2.80; information quality: 3.02; interface quality: 2.49)35 across all subscales and overall usability in the second cycle (Table 3). Evaluators without database query experience rated the usability slightly better (n = 10; mean: 2.45; SD: 0.81) than those with experience (n = 10; mean: 2.50; SD: 1.16).

Table 3.

The iterative usability scores and issues identified by cycle.

System Usability (PSSUQ) Cycle 1 Mean (SD) Cycle 2 Mean (SD) Cycle 3 Mean (SD) Cycle 4 Mean (SD)
System Use* 2.87 (1.19) 2.27 (0.71) 1.77 (0.89) 2.07 (0.58)
Information quality* 3.40 (1.21) 2.70 (0.68) 2.27 (1.19) 2.23 (0.91)
Interface quality* 3.27 (1.52) 2.47 (0.77) 1.80 (1.26) 2.60 (0.89)
Overall * 3.18 (1.24) 2.49 (0.62) 1.98 (1.03) 2.26 (0.74)
Unique usability issues (n) 36 21 17 16
Usability issues prioritized for refinement (n) 22 12 4 6
* A score of 1 indicates the best usability, and 7 indicates the worst usability.

Though the system achieved its highest usability during the third cycle, it continued to demonstrate high usability during the final cycle (2.26; SD: 0.74). Supplementary Table 2 details the ratings of the unique usability issues identified in each cycle. Forty-four usability issues were prioritized and addressed during system refinement (Supplementary Table 3). The number of unique usability issues decreased by 56% from Cycle 1 to Cycle 4 (from 36 to 16).

3.3. Qualitative Analysis

With 1,802 codes, 14 subthemes were identified and classified into three themes: seeking user goals, performing well-learned tasks, and determining what to do next. The number of codes increased from Cycle 1 to 3 and decreased in Cycle 4 (Fig. 4). Performing well-learned tasks had the highest number of codes. Certain terms were removed from the query, such as "memory box score" (because it is not commonly documented in the EHR), "positive tau PET results" (because the result may not be available in the EHR and only a limited number of patients had the procedure), "progressive subjective decline" (because it is unclear how to characterize progressive), and "medications that affect the central nervous system" (because it is too broad). The proportion of each subtheme within its theme, by cycle, is shown in Fig. 5.

Fig. 4.

Code frequency by cycle.

Fig. 5.

Code proportion for subthemes by cycle.

Exemplars are provided in Table 4, and a comprehensive list of exemplars by theme is available in Supplementary Tables 4-6.

Table 4.

Theme exemplars including cycle of the evaluation and evaluators’ characteristics (job position, age, and the number of years in ADRD research).

Themes Exemplars
Theme 1: Seeking user goals
Aligning user and system goals* “I know there has to be a difference between the darker gray versus the white. I can see that some things are bolded, and other things are in different colors” – Cycle 1, RA, 29, 2 to <5 years
“…what’s a concept, and why do I need to pick a concept? Why is this important right now?” - Cycle 3, CRC, 36, 5 to <10 years
Augmenting system intelligence* “… the way the information was put in the system doesn’t recognize the criteria, it looks like it’s giving me opportunity to fix that so it’s just clear criteria that the system can actually process and filter.” - Cycle 4, RA, 25, 2 to <5 years
Optimizing workflow+ “…to easily filter through a bunch of patients saves a lot of time. I also really like about being able to share that filter with other institutions so they can easily check if patients are eligible for your study. I certainly would have loved to use something like this.” - Cycle 4, RA, 25, 2 to <5 years
“…you notice that is a mistake, and you don’t want that. So, you can do it yourself and you don’t have to contact anybody else to do it.” - Cycle 4, Nurse Practitioner, 45, 2 to <5 years
“…if you’re writing a paper, you can copy the code and include it in a publication.” - Cycle 3, CRC, 37, ≥10 years
Casting a wider net of eligibility+ “… I just to try to cast a wider net so I can see for myself. So, when I do my chart review, these more specific things. I’d try to take them out because I would rather not leave someone out that I could potentially bring into the study.” – Cycle 1, CRC, 36, 5 to <10 years
“… when I was looking through potential participants, it’s like anyone who shows up – and this is an exaggeration – but, it felt like anyone who shows up to an emergency room who is over 60, they could be high, could be hallucinating, could be belligerent, could be drunk, and it will say likely dementia or AD by some intake person. And obviously, to us, it’s like we wouldn’t ever try to enroll them if they didn’t have an actual Alzheimer’s diagnosis by a geriatrician or neurologist. Can you tell who diagnosed it? Even with like a family practice doctor versus a neurologist or something.” - Cycle 2, Program manager, 28, 2 to <5 years
Theme 2: Performing well-learned tasks
Carrying out complex actions* “… you should be able to start to write in Alzheim, and the first result should be AD.” - Cycle 2, Program manager, 28, 2 to <5 years
“I wish there was an undo button for my mistakes.” - Cycle 2, RA, 24, 2 to <5 years
Understanding how the system works* “… we want to know is what exactly is the word that this whole system is going to be searching for in the database. And, I think you want to make that clear. What I’d want to know is what is going to go into my search.” - Cycle 2, CRC, 31, 2 to <5 years
“I assumed that Standard concepts was like its recommended list of things that they think I am trying to get at, and All concepts would allow me to filter through more broadly.” - Cycle 4, RA, 25, 2 to <5 years
Learning how to use the system* “I think using it a little bit more does become more intuitive.” - Cycle 4, RA, 25, 2 to <5 years
“I do think after using it once or twice, you get the idea of how to do things. And I think it’s smooth enough.” - Cycle 1, RA, 27, 2 to <5 years
Level of expertise in the clinical domain+ “This level is probably beyond the scope of most coordinators. I am thinking, earlier in my career, I probably wouldn’t have made any kind of alterations, unless it was something really obvious to change.”- Cycle 3, CRC, 37, ≥10 years
“…but maybe if I was more knowledgeable on it [frontotemporal dementia], it would have been more obvious to me which concept I had to choose; I guess I went with the most general one and hoped that it applies.” - Cycle 1, RA, 27, 2 to <5 years
Designing the system after the user workflow+ “I would probably do a couple of test cases. When I am going through that process, I am going to be playing around with the inclusion/exclusion criteria, running it based on different criteria. I would like to see the results. If it comes up with 25 people, I know that’s not enough. So, I’ll go back, refine it, and rerun it. Once I get it to a place where I think it’s good, then I’ll probably want to export the file.” - Cycle 3, CRC, 37, ≥10 years
Theme 3: Determining what to do next
Looking for clear information* “…what if you can hover over the NCT ID and give you a little comment bubble? And it says an NCT ID is the number for the clinical trials website.” - Cycle 3, CRC, 36, 5 to <10 years
“When it has the value and the measurement as different colors, does the system automatically know to associate those two?” - Cycle 2, RA, 24, 2 to <5 years
Needing explicit instructions* “I liked that before I even typed it in, there was an example in the textbox, like in the beta script, where things are there beforehand. I could already see that there was an example in there, so I knew what format to use” - Cycle 3, CRC, 24, 1 to <2 years
Visualization as cognitive support* “… I would generate click cohort again because it’s green. And green is always go.” - Cycle 1, CRC, 36, 5 to <10 years
“The multiple colors do throw you off because you think you’re looking at different things. But I understand after I followed the numerical division that it tells you each color is a condition or diagnosis.” - Cycle 1, CRC, 36, 5 to <10 years
“It makes me visually think I should start there because it’s at the top. Because this is a lighter text, I feel like it’s not editable, but now I know that it is, that I can type in it. Now I can see that, but my eyes go to the box first.” - Cycle 3, CRC, 24, 1 to <2 years
Doing the logical next step* “But there is no little button that says “Save.” You know how in Word, and you want to save your things?” -C3, RA, 40, ≥10 years
Depending on system output+ “I like to start with a huge query just to see how many we’re really working with.” - C2, Program manager, 28, 2 to <5 years
“I think it depends on whether you’re recruiting for a study. Are you trying to recruit 1,000 people or 50 people? Because you’ll approach it differently.” - Cycle 2, Program manager, 28, 2 to <5 years
* Metacognition; + User knowledge; RA: research assistant; CRC: clinical research coordinator

3.3.1. Theme 1: Seeking user goals

The evaluators made sense of what they initially observed when interacting with the system interface, then related it to their overall goal. Four subthemes emerged as evaluators oriented themselves with the interface toward achieving their goals:

  • Aligning user and system goals (metacognition): Evaluators sought to identify congruence between what they saw and how they could accomplish their goals using C2Q. Evaluators needed to find alignment between the system’s function and their goal.

  • Augmenting system intelligence (metacognition): Evaluators acknowledged the capability of the system to automate certain tasks to help them accomplish their goals efficiently; however, they also recognized that their domain expertise and intervention are essential to complete the tasks.

  • Optimizing workflow (user knowledge): All evaluators stated how C2Q could optimize their eligibility prescreening workflow by helping them narrow the pool of patients to those potentially eligible for their research study, reducing the time spent manually sifting through the medical records of ineligible patients. Even though C2Q can reduce their workload, evaluators said they would still need to manually review patient charts to confirm eligibility. Regarding the query’s output (i.e., the list of patients that meet the user-modified eligibility criteria query), the evaluators identified other information C2Q could provide that would be helpful, such as the medical record number linking to the patient’s EHR and the patient’s permission to be contacted for research. The only EHR information identified as necessary specifically for AD research was the Mini-Mental State Examination (MMSE) score. The ability to download both the output and the query and share them with collaborators or as part of dissemination was also valued.

  • Casting a wider net of eligibility (user knowledge): All evaluators preferred their queries to be broad enough to capture all potentially eligible patients, even if the output included ineligible patients whom they had to exclude afterward. The evaluators expressed concerns about how pertinent information was documented in the EHR and how test results were represented in the database. The temporal aspect of eligibility was recognized as a challenge to ensuring the query would capture the appropriate participants. Complex and ambiguous criteria, such as medication use, sperm donation, and alcohol abuse, were excluded from the query because they require further inquiry. Specific to ADRD-focused research, evaluators expressed how the diagnosis of dementia and the level of cognitive impairment can affect how they modify the query.

3.3.2. Theme 2: Performing well-learned tasks

Five subthemes emerged when evaluators attempted to complete the tasks successfully:

  • Carrying out complex tasks (metacognition): In executing complex actions, evaluators sought ways to be efficient. For example, evaluators preferred the concept search bar to autofill terms without their having to type the whole term. Evaluators also recommended ways to be efficient in modifying the criteria parsing results. In Cycles 1 and 2, the criteria parsing edit and delete functions could only be used by pressing a key and clicking the mouse, and evaluators encountered difficulties in completing the task. After the right-click and pop-up box option was added for Cycle 3, no concerns were observed.

  • Understanding how the system works (metacognition): Evaluators had difficulty identifying how the system queries the database based on the mapped concepts and the completeness of the database. The evaluators also had difficulty understanding the difference between unchecking the box to remove a criterion from the query and clicking the “delete all tags” icon. Evaluators recommended automatically unchecking the box when a criterion has no tagged terms. Jargon was one of the biggest hurdles in understanding how C2Q works. Evaluators did not realize they could download the query file for future use because they were unfamiliar with “SQL query” and “JSON query.” When mapping a term to a concept, the evaluators were confused by the Standard Concepts versus All Concepts options. Starting in Cycle 3, bracketed text reading “Recommended” was added after Standard Concepts; it helped the evaluators decide which option to choose but did not clarify the difference.

  • Learning how to use the system (metacognition): Evaluators demonstrated that the repetition of the task helped them learn how to do it efficiently.

  • Level of expertise in the clinical domain (user knowledge): Another factor that the evaluators emphasized is the clinical research staff’s domain expertise and content knowledge. They emphasized the need for training on using the system and on mapping terms comprehensively and correctly. Even within the field of ADRD, evaluators who did not have extensive experience with a particular disease domain did not feel confident mapping specific terms to a concept.

  • Designing the system after the user workflow (user knowledge): Most evaluators stated that they do not use the eligibility criteria listed on ClinicalTrials.gov but instead refer to the study protocol. They also acknowledged the potential benefits of extracting the criteria from ClinicalTrials.gov, but there were concerns about the timeliness of the updates. Evaluators stressed the iterative nature of eligibility prescreening, emphasizing the importance of systems like C2Q for efficiently modifying the cohort list for a more detailed eligibility determination.

3.3.3. Theme 3: Determining what to do next

Five subthemes emerged as evaluators determined the next action to execute to complete the tasks:

  • Looking for clear information (metacognition): The evaluators recommended providing clear information that there are two options for entering the eligibility criteria (i.e., typing the inclusion or exclusion criteria in the corresponding box or entering the National Clinical Trial Identification Number [NCTID] of the study) through context-aware hyperlinks, where information pops up when the mouse hovers over a term or icon. Evaluators also asked for more information concerning the criteria parsing modification so they would know whether they had successfully completed the task.

  • Needing explicit instructions (metacognition): Evaluators tried completing a task without reading the instructions, and when they were unsuccessful, they looked for and read the instructions. Most evaluators in Cycles 1 and 2 said that the intuitive action was to click or right-click on the tagged term. When the right-click option to update and delete a term’s tag was added, the evaluators intuitively right-clicked to complete the actions, even without reading the instructions. Some evaluators also emphasized the benefit of a video tutorial that walks users through the interface, highlighting what users can and cannot do in the criteria parsing output, a case scenario giving context for choosing a concept to map to a term, and the exact steps to modify the criteria.

  • Visualization as cognitive support (metacognition): Visualization is important to help users determine how to complete a task. For example, colors prompted the evaluators to proceed confidently to the following action. Though seeing different colors overwhelmed some evaluators, they perceived it as necessary for organizing how users see the mapped terms. The evaluators in Cycles 1 and 2 expressed confusion when the instructions referenced icons that they could not find when trying to figure out how to complete the task. Evaluators recommended changes to visually help users know what to do and what has been done.

  • Doing the logical next step (metacognition): Evaluators expected the system to have the same standard functionalities found in other interfaces, such as text recommendations in the search bar and a right-click action to see options for the next step. This expectation guided their decisions about the logical next step, even without explicit instructions on what to do next.

  • Depending on system output (user knowledge): The next action also depended on the output the evaluators received from the system. They expressed the importance of running the query at a certain point to determine whether they needed to add or remove criteria, and of considering the target number of participants when executing the query.

4. Discussion

This study focused on an iterative usability evaluation of an NLP-driven tool to enhance electronic eligibility prescreening for clinical research. This evaluation design provides feedback from end-users and experts, facilitating prompt system design, content evaluation, and refinement.36 Findings demonstrate that C2Q is usable from the clinical research staff perspective. The cognitive walkthrough showed qualitatively that C2Q was perceived to be useful for eligibility prescreening and minimized the amount of manual work.10 Findings also demonstrated quantitatively that, with a prioritized system refinement approach, the number of unique usability issues decreased by more than half (56%). We identified actionable recommendations in each evaluation cycle and deployed an improved system version incorporating domain experts’ feedback (Supplementary Table 3). Although the PSSUQ scores improved from Cycle 1 to Cycle 4, Cycle 3 demonstrated the highest usability. This pattern has been seen in other studies37 and may be attributable to performance issues or complex functionality that were not prioritized for system refinement.36 Therefore, further usability evaluation is needed. There are numerous approaches for evaluating the usability of NLP systems, none of which is considered superior, and various evaluation methodologies may be beneficial for system development.38

4.1. Importance of human-computer collaboration

C2Q was developed to combine computer efficiency with human intelligence in transforming free-text clinical research eligibility criteria into executable cohort queries. It integrates the domain expertise of clinical research staff, who may have minimal or no database query experience, into the process of identifying entities from the eligibility criteria text. Vague, subjective, or complex eligibility criteria may need human determination; such criteria are not included in the query but are further assessed once the potentially eligible participant is referred.39 Domain expertise is necessary to recognize and simplify the complexity of the eligibility criteria text, as previously reported.39

Evaluators with more experience in AD modified the automated criteria parsing results, demonstrating that the breadth of content knowledge and background is necessary to maximize the system functionalities. Conversely, less-experienced evaluators typically included more criteria in the query and retained the automatic concept mapping. This may be due to a lack of knowledge of the nuances involved in eligibility prescreening and which criteria are feasible to query in the EHR.6 This is consistent with the previous usability evaluation of C2Q’s editable user interface with clinical research coordinators highlighting domain knowledge as a key differentiating factor that influences user experience.10 Regardless of the user’s modification approach, the system’s ability to allow changes iteratively based on the output supports cohort optimization until the desired cohort definition is attained.

4.2. Considerations for designing NLP-driven systems for eligibility prescreening

The recommendations addressed a broad range of issues, including system usability, interface design, and additional functionalities to improve system intuitiveness and the overall user experience. Understanding the end-user goal will help direct system refinements.40 The recommendations in Table 5 can help avoid the costs of implementing a system that does not meet user needs or standards.

Table 5.

Recommendations on NLP-driven system development for clinical research eligibility prescreening.

Recommendation
Accessible content and interface information Context-aware hyperlinks, or “infobuttons,” where users can hover over a text or help icons, have been found helpful to support users in knowing what next step to take in the system interaction.41

Using jargon might generate misunderstanding, resulting in missed opportunities to maximize the system.42

Designing systems with functionalities that are commonly used in other interfaces improves interface interaction and learnability.43
Allow for system flexibility Concerns were raised on how to map imaging and laboratory test results (e.g., “positive Tau PET scan” versus measurement of tau protein based on the scan), warranting system flexibility to accommodate both categorical and numerical definitions of the measurement.

The undo functionality will allow users to be more confident in doing the next step intuitively, knowing that a mistake can be undone with minimal effort.44 A selective undo feature has been used in code editors and could potentially be adapted to systems like C2Q.45
Prioritize accuracy to support recruitment efficiency The availability of the medical record number, contact information, and the ability to filter and sort the output allow recruitment prioritization, thereby facilitating efficiency. Other recommended information in the output includes the patient’s consent to be contacted for research, living status, and current participation in other studies. Specific to ADRD research, the information on who provided the diagnosis (e.g., primary care provider, neurologist) helps determine eligibility efficiently because of the likelihood of having ADRD-specific workup (e.g., neuropsychological testing, biomarker tests).46

A shareable cohort definition query can also streamline the recruitment process within the team or with other collaborators.
Need for a disease-focused eligibility prescreening system The examples that were pointed out to support the feedback were mainly focused on the disease domain of interest, such as the documentation of AD biomarkers that evaluators more reliably look for within the clinician note rather than laboratory results (especially when the cerebrospinal fluid result indicates “borderline”), the documentation of MMSE scores in the clinician note or scanned form, and prescreening for specific behavioral symptoms (e.g., agitation, depressive episode) that may not necessarily be under the diagnosis list but were documented in the visit note.

The limitations of the EHR (e.g., timeliness and completeness of the information, variability in how the information was documented) can affect how research staff conduct eligibility prescreening;47 hence it is helpful to include if and how the system manages these constraints.
Transparency in handling protected health information It is important to ensure that ethical safeguards are in place when designing electronic eligibility prescreening tools to protect patients’ privacy and confidentiality.48
Include the “potentially eligible” Going beyond the eligible and not eligible classification of patient determination based on the modified query, a third classification may be a relevant addition through criteria prioritization: the potentially eligible.49

The evaluators preferred a more inclusive system for eligibility prescreening. Due to the inherent complexity of eligibility criteria (e.g., underspecified requirements, temporality), a prescreening tool should provide flexibility to modify the eligibility criterion text at any phase of system use.50 Concept mapping was found to be challenging: multiple recommended concept options could be overwhelming and may prompt users to choose the first option or decide not to map any concepts, regardless of their level of experience.

The evaluators highlighted the importance of the accuracy of the query output. While tools like C2Q will not eliminate manual patient chart review, they will reduce the number of patient charts that clinical research staff need to review, allowing staff to focus their thorough chart review efforts on the most likely eligible candidates.51 The evaluators also pointed out that knowing the characteristics of the ineligible patients and what criteria precluded them from the study is helpful for “flagging” criteria that are not definitive grounds for exclusion. The prioritization of potentially eligible participants in eligibility prescreening has been shown to increase screening efficiency.3

The evaluators underscored the importance of unstructured clinical notes in eligibility prescreening, especially the narrative results of imaging tests and clinicians’ visit notes. With up to 80% of EHR data stored as free text (e.g., progress notes, discharge summaries),52 NLP is instrumental in advancing the current state of eligibility prescreening. Evaluators also expressed concerns about how the system identifies abbreviations and misspellings and how their presence in the EHR might affect the query. This echoes the need to advance NLP methods that extract relevant information from the EHR and recognize named entities in free-text eligibility criteria, focusing on domain-specific language to facilitate a more comprehensive electronic eligibility prescreening.53,54 The evaluators also raised concerns about how the system handles protected health information and upholds ethical considerations regarding confidentiality and consent to be contacted. The initial contact with potential participants and the permission for this contact are critical components of study recruitment that vary widely between institutions and have significant ethical implications.55

4.3. Strengths and limitations of the present study

The iterative study design has limitations and carries an acceptance of the need for further system refinement. In real-world applications of electronic eligibility prescreening tools for clinical research, the quality and availability of clinical documentation is an important issue.12,56 Though the evaluation involved a synthetic database that could limit the full use of C2Q, it provided insight into the domain experts' strategies in eligibility prescreening and how clinical documentation affects clinical research staff’s electronic eligibility prescreening. While the usability scores improved, it is important to consider that the system’s accuracy in identifying potentially eligible participants is crucial for the evaluators – and may affect perceived usability57 – but its investigation was outside the scope of this study.

The results may not be generalizable to clinical research in other disease domains, despite the study sample's diversity in domain expertise. Future studies are warranted to further test the generalizability of these findings in diverse clinical domains. The evaluation was conducted in English and with evaluators currently living in the US; hence, the results cannot be generalized to other languages or cultural settings. We acknowledge that interacting with C2Q in a remote virtual environment posed particular challenges due to potential differences in the operating systems used by the evaluator and the researchers and the reliability of the internet connection, which may have affected the evaluators’ perception of the usability of the system.58 Finally, because the data obtained in this mixed-methods study were cross-sectional in design, future longitudinal studies are needed to assess the long-term impact of adopting C2Q on clinical research staff workload efficiency and recruitment outcomes.

4.4. Clinical and research implications

Exploring the perceptions of clinical research staff in other disease domains is needed to comprehensively understand the system’s usability across use cases.58 More research is required to investigate the impact of implementing NLP-driven eligibility prescreening tools by type of study (e.g., pharmaceutical trial, longitudinal study) and other disease domains (e.g., rare diseases, cancer). Further system development that involves human-in-the-loop machine learning models to improve named entity recognition, concept mapping, and criteria simplification could potentially address the concerns identified in this evaluation (e.g., the system-recommended concepts, inaccuracy in the automated parsing result). This underlines the importance of training clinical researchers to develop literacy and competency in this methodology. More research is also needed on the types of training needed by clinical research staff and the efficacy of various training strategies for improving recruitment outcomes. Systems like C2Q provide an opportunity to identify and enroll participants of diverse backgrounds who are eligible for clinical research but not necessarily seen by specialists conducting trials.

5. Conclusion

This study demonstrated that an NLP system that allows user-defined modification to the eligibility criteria query is generally usable and accepted by a group of clinical research staff. This iterative usability evaluation approach may benefit future NLP system evaluations by providing more granular information on usability issues. Findings indicate the importance of incorporating user feedback into the iterative refinement of an NLP-driven tool for eligibility prescreening prior to full-scale adoption.

Supplementary Material

1

Highlights.

  • Resource-intensive eligibility prescreening slows clinical research recruitment.

  • NLP can automate large database queries and improve eligibility prescreening.

  • NLP alone may result in inaccurate cohort definitions for clinical data queries.

  • By combining human and machine intelligence, NLP can optimize eligibility prescreening.

  • Actionable recommendations for NLP system for eligibility prescreening were provided.

Summary Table.

What was already known on this topic:

  • Eligibility prescreening is a bottleneck in clinical research recruitment because it is costly and labor-intensive.

  • We can leverage NLP systems to automate large database queries and optimize clinical research staff efforts in eligibility prescreening.

  • Eligibility criteria are complex, and NLP alone may result in inaccurate cohort definitions for clinical data queries.

What this study adds to our knowledge:

  • By combining human and machine intelligence, NLP systems can be useful in narrowing down the list of potential participants to review for further clinical research study eligibility determination, thereby optimizing the eligibility prescreening process.

  • Due to the inherent complexity of eligibility criteria, an eligibility prescreening tool should provide flexibility in eligibility criterion text content modifications at any phase of the system use.

  • We identified actionable recommendations to avoid developing an NLP system for eligibility prescreening in isolation, only to discover that it does not meet significant user needs or standards.

Funding

This work was supported by the Agency for Healthcare Research and Quality grant R36HS028752; the National Institute of Nursing Research grants T32NR007969, P30NR016587, and K24NR018621; the National Library of Medicine grant R01LM009886; and the National Center for Advancing Translational Sciences grants UL1TR001873 and OT2TR003434. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Statement on conflicts of interest

None declared.

References

1. Rimel BJ. Clinical trial accrual: Obstacles and opportunities. Front Oncol. 2016;6:103. Published 2016 Apr 25. doi: 10.3389/fonc.2016.00103
2. Treweek S, Pitkethly M, Cook J, et al. Strategies to improve recruitment to randomised trials. Cochrane Database Syst Rev. 2018;2(2):MR000013. Published 2018 Feb 22. doi: 10.1002/14651858.MR000013.pub6
3. Thadani SR, Weng C, Bigger JT, Ennever JF, Wajngurt D. Electronic screening improves efficiency in clinical trial recruitment. J Am Med Inform Assoc. 2009;16(6):869–873. doi: 10.1197/jamia.M3119
4. Penberthy LT, Dahman BA, Petkov VI, DeShazo JP. Effort required in eligibility screening for clinical trials. J Oncol Pract. 2012;8(6):365–370. doi: 10.1200/JOP.2012.000646
5. Jain NM, Culley A, Knoop T, Micheel C, Osterman T, Levy M. Conceptual Framework to Support Clinical Trial Optimization and End-to-End Enrollment Workflow. JCO Clin Cancer Inform. 2019;3:1–10. doi: 10.1200/CCI.19.00033
6. Shivade C, Hebert C, Lopetegui M, de Marneffe MC, Fosler-Lussier E, Lai AM. Textual inference for eligibility criteria resolution in clinical trials. J Biomed Inform. 2015;58 Suppl(Suppl):S211–S218. doi: 10.1016/j.jbi.2015.09.008
7. Scott E, McComb B, Trachtman H, et al. Knowledge and use of recruitment support tools among study coordinators at an academic medical center: The Novel Approaches to Recruitment Planning Study. Contemp Clin Trials Commun. 2019;15:100424. Published 2019 Jul 22. doi: 10.1016/j.conctc.2019.100424
8. Lai YS, Afseth JD. A review of the impact of utilising electronic medical records for clinical research recruitment. Clin Trials. 2019;16(2):194–203. doi: 10.1177/1740774519829709
9. Yuan C, Ryan PB, Ta C, et al. Criteria2Query: a natural language interface to clinical databases for cohort definition. J Am Med Inform Assoc. 2019;26(4):294–305. doi: 10.1093/jamia/ocy178
10. Fang Y, Idnay B, Sun Y, et al. Combining human and machine intelligence for clinical trial eligibility querying. J Am Med Inform Assoc. 2022;29(7):1161–1171. doi: 10.1093/jamia/ocac051
11. Idnay B, Dreisbach C, Weng C, Schnall R. A systematic review on natural language processing systems for eligibility prescreening in clinical research. J Am Med Inform Assoc. 2021;29(1):197–206. doi: 10.1093/jamia/ocab228
12. Ni Y, Bermudez M, Kennebeck S, Liddy-Hicks S, Dexheimer J. A Real-Time Automated Patient Screening System for Clinical Trials Eligibility in an Emergency Department: Design and Evaluation. JMIR Med Inform. 2019;7(3):e14185. Published 2019 Jul 24. doi: 10.2196/14185
13. O'Brien MA, Rogers WA, Fisk AD. Developing a Framework for Intuitive Human-Computer Interaction. Proc Hum Factors Ergon Soc Annu Meet. 2008;52(20):1645–1649. doi: 10.1177/154193120805202001
14. Salahshour Rad M, Nilashi M, Mohamed Dahlan H. Information technology adoption: a review of the literature and classification. Univ Access Inf Soc. 2018;17:361–390. doi: 10.1007/s10209-017-0534-z
15. Metzker E. Adoption-centric usability engineering: systematic deployment, evaluation and improvement of usability engineering methods in the software engineering lifecycle. Universität Ulm; 2005.
16. Wang J, Deng H, Liu B, et al. Systematic Evaluation of Research Progress on Natural Language Processing in Medicine Over the Past 20 Years: Bibliometric Study on PubMed. J Med Internet Res. 2020;22(1):e16816. Published 2020 Jan 23. doi: 10.2196/16816
17. England I, Stewart D, Walker S. Information technology adoption in health care: when organisations and technology collide. Aust Health Rev. 2000;23(3):176–185. doi: 10.1071/ah000176
18. Pressler TR, Yen PY, Ding J, Liu J, Embi PJ, Payne PR. Computational challenges and human factors influencing the design and use of clinical research participant eligibility pre-screening tools. BMC Med Inform Decis Mak. 2012;12:47. Published 2012 May 30. doi: 10.1186/1472-6947-12-47
19. Lewis JR. Psychometric Evaluation of the Post-Study System Usability Questionnaire: The PSSUQ. Proc Hum Factors Soc Annu Meet. 1992;36(16):1259–1260. doi: 10.1177/154193129203601617
20. Alwashmi MF, Hawboldt J, Davis E, Fetters MD. The Iterative Convergent Design for Mobile Health Usability Testing: Mixed Methods Approach. JMIR Mhealth Uhealth. 2019;7(4):e11656. Published 2019 Apr 26. doi: 10.2196/11656
21. Hinderer D, Nielsen J. 234 Tips and Tricks for Recruiting Users as Participants in Usability Studies. Nielsen Norman Group; 2003. Accessed June 15, 2021. https://media.nngroup.Com/media/reports/free/How_To_Recruit_Participants_for_Usability_Studies.pdf
22. Centers for Medicare & Medicaid Services. CMS 2008-2010 Data Entrepreneurs' Synthetic Public Use File (DE-SynPUF). LTS Computing LLC; 2020. Updated December 1, 2021. Accessed December 15, 2021. http://www.ltscomputingllc.com/downloads/
23. Polson P, Lewis C, Reiman J. Cognitive walkthroughs: a method for theory-based evaluation of user interfaces. Int J Man-Mach Stud. 1992;36:741–773. doi: 10.1016/0020-7373(92)90039-N
24. Khajouei R, Zahiri Esfahani M, Jahani Y. Comparison of heuristic and cognitive walkthrough usability evaluation methods for evaluating health information systems. J Am Med Inform Assoc. 2017;24(e1):e55–e60. doi: 10.1093/jamia/ocw100
25. Nielsen J. Usability Testing. In: Nielsen J, ed. Usability Engineering. San Diego: Morgan Kaufmann; 1993:165–206.
26. Nielsen J. Usability Assessment Methods beyond Testing. In: Nielsen J, ed. Usability Engineering. San Diego: Morgan Kaufmann; 1993:207–226.
27. Sauro J, Lewis JR. Standardized usability questionnaires. In: Sauro J, Lewis JR, eds. Quantifying the User Experience (Second Edition). Boston: Morgan Kaufmann; 2016:185–248.
28. Nielsen J. Severity Ratings for Usability Problems. Nielsen Norman Group; 1994. Accessed July 29, 2020. https://www.nngroup.com/articles/how-to-rate-the-severity-of-usability-problems
29. Lin TC. A computer literacy scale for newly enrolled nursing college students: development and validation. J Nurs Res. 2011;19(4):305–317. doi: 10.1097/JNR.0b013e318236d03f
30. Jaspers MW. A comparison of usability methods for testing interactive health technologies: methodological aspects and empirical evidence. Int J Med Inform. 2009;78(5):340–353. doi: 10.1016/j.ijmedinf.2008.10.002
31. Hsieh HF, Shannon SE. Three approaches to qualitative content analysis. Qual Health Res. 2005;15(9):1277–1288. doi: 10.1177/1049732305276687
32. Dedoose Version 9.0.17, web application for managing, analyzing, and presenting qualitative and mixed method research data [computer program]. Los Angeles, CA; 2021.
33. Birt L, Scott S, Cavers D, Campbell C, Walter F. Member Checking: A Tool to Enhance Trustworthiness or Merely a Nod to Validation? Qual Health Res. 2016;26(13):1802–1811. doi: 10.1177/1049732316654870
34. Guba EG. Criteria for assessing the trustworthiness of naturalistic inquiries. ECTJ. 1981;29(2):75. doi: 10.1007/BF02766777
35. Lewis JR. Psychometric Evaluation of the PSSUQ Using Data from Five Years of Usability Studies. Int J Hum-Comput Interact. 2002;14(3-4):463–488. doi: 10.1080/10447318.2002.9669130
36. Nelson SD, Del Fiol G, Hanseler H, Crouch BI, Cummins MR. Software Prototyping: A Case Report of Refining User Requirements for a Health Information Exchange Dashboard. Appl Clin Inform. 2016;7(1):22–32. Published 2016 Jan 13. doi: 10.4338/ACI-2015-07-CR-0091
37. Schnall R, Rojas M, Bakken S, et al. A user-centered model for designing consumer mobile health (mHealth) applications (apps). J Biomed Inform. 2016;60:243–251. doi: 10.1016/j.jbi.2016.02.002
38. Kaufman DR, Sheehan B, Stetson P, et al. Natural Language Processing-Enabled and Conventional Data Capture Methods for Input to Electronic Health Records: A Comparative Usability Study. JMIR Med Inform. 2016;4(4):e35. Published 2016 Oct 28. doi: 10.2196/medinform.5544
39. Fang Y, Kim JH, Idnay BR, et al. Participatory Design of a Clinical Trial Eligibility Criteria Simplification Method. Stud Health Technol Inform. 2021;281:984–988. doi: 10.3233/SHTI210325
40. Bannon LJ. From Human Factors to Human Actors: The Role of Psychology and Human–Computer Interaction Studies in System Design. In: Baecker RM, Grudin J, Buxton WAS, Greenberg S, eds. Readings in Human-Computer Interaction. Morgan Kaufmann; 1995:205–214.
41. Kennell T Jr, Dempsey DM, Cimino JJ. i3b3: Infobuttons for i2b2 as a Mechanism for Investigating the Information Needs of Clinical Researchers. AMIA Annu Symp Proc. 2016;2016:696–704.
42. Dumas JS, Molich R, Jeffries R. Describing usability problems: Are we sending the right message? Interactions. 2004;11(4):24–29. doi: 10.1145/1005261.1005274
43. Dorner DG, Curtis A. A comparative review of common user interface products. Library Hi Tech. 2004;22(2):182–197. doi: 10.1108/07378830410543502
44. Terry M, Mynatt ED. Recognizing creative needs in user interface design. Proc of the 4th Conf on Creativity & Cognition. 2002. doi: 10.1145/581710.581718
45. Yoon Y, Myers BA. Supporting selective undo in a code editor. 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering. 2015:223–233. doi: 10.1109/ICSE.2015.43
46. Cummings J, Lee G, Ritter A, Sabbagh M, Zhong K. Alzheimer's disease drug development pipeline: 2020. Alzheimers Dement (N Y). 2020;6(1):e12050. Published 2020 Jul 16. doi: 10.1002/trc2.12050
47. Butler A, Wei W, Yuan C, Kang T, Si Y, Weng C. The Data Gap in the EHR for Clinical Research Eligibility Screening. AMIA Jt Summits Transl Sci Proc. 2018;2017:320–329.
48. Obeid JS, Gerken K, Madathil KC, et al. Development of an electronic research permissions management system to enhance informed consents and capture research authorizations data. AMIA Jt Summits Transl Sci Proc. 2013;2013:189–193.
49. Meystre SM, Heider PM, Kim Y, Aruch DB, Britten CD. Automatic trial eligibility surveillance based on unstructured clinical data. Int J Med Inform. 2019;129:13–19. doi: 10.1016/j.ijmedinf.2019.05.018
50. Stubbs A, Filannino M, Soysal E, Henry S, Uzuner Ö. Cohort selection for clinical trials: n2c2 2018 shared task track 1. J Am Med Inform Assoc. 2019;26(11):1163–1171. doi: 10.1093/jamia/ocz163
51. Goodwin NC. Functionality and usability. Communications of the ACM. 1987;30(3):229–233. doi: 10.1145/214748.214758
52. Meystre SM, Savova GK, Kipper-Schuler KC, Hurdle JF. Extracting information from textual documents in the electronic health record: a review of recent research. Yearb Med Inform. 2008:128–144.
53. Liu H, Chi Y, Butler A, Sun Y, Weng C. A knowledge base of clinical trial eligibility criteria. J Biomed Inform. 2021;117:103771. doi: 10.1016/j.jbi.2021.103771
54. Li X, Liu H, Kury F, et al. A Comparison between Human and NLP-based Annotation of Clinical Trial Eligibility Criteria Text Using The OMOP Common Data Model. AMIA Jt Summits Transl Sci Proc. 2021;2021:394–403.
55. Obeid JS, Beskow LM, Rape M, et al. A survey of practices for the use of electronic health records to support research recruitment. J Clin Transl Sci. 2017;1(4):246–252. doi: 10.1017/cts.2017.301
56. Tissot HC, Shah AD, Brealey D, et al. Natural Language Processing for Mimicking Clinical Trial Recruitment in Critical Care: A Semi-Automated Simulation Based on the LeoPARDS Trial. IEEE J Biomed Health Inform. 2020;24(10):2950–2959. doi: 10.1109/JBHI.2020.2977925
57. Ji M, Genchev GZ, Huang H, Xu T, Lu H, Yu G. Evaluation Framework for Successful Artificial Intelligence-Enabled Clinical Decision Support Systems: Mixed Methods Study. J Med Internet Res. 2021;23(6):e25929. doi: 10.2196/25929
58. Bhutkar G, Konkani A, Katre D, Ray GG. A review: healthcare usability evaluation methods. Biomed Instrum Technol. 2013;Suppl:45–53. doi: 10.2345/0899-8205-47.s2.45
