Description
Task: make a function called generate_crosswalk_table(all_questions, similarity, threshold)
which takes the output of match_instruments
and returns the pairs that match above a threshold.
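A minimal sketch of the requested function, assuming `all_questions` is a list of question objects aligned with the rows and columns of a square `similarity` matrix. The `Question` class and its attribute names, the `"A vs B"` pair-name format, the strict `>` comparison, and the choice to skip pairs within the same instrument are all assumptions for illustration, not Harmony's actual API. The row keys follow the column names in the example structure in this issue.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Question:
    # Hypothetical stand-in for a question returned by match_instruments;
    # the real Harmony question type may use different attribute names.
    instrument_name: str
    question_no: str
    question_text: str


def generate_crosswalk_table(
    all_questions: Sequence[Question],
    similarity: Sequence[Sequence[float]],
    threshold: float,
) -> list:
    """Return one row (dict) per question pair whose similarity exceeds threshold."""
    n = len(all_questions)
    rows = []
    # Visit each unordered pair once (upper triangle, diagonal excluded),
    # because the matrix is assumed symmetric and self-matches are uninformative.
    for i in range(n):
        for j in range(i + 1, n):
            q1, q2 = all_questions[i], all_questions[j]
            # Assumption: a crosswalk links two different surveys, so skip
            # pairs that come from the same instrument.
            if q1.instrument_name == q2.instrument_name:
                continue
            score = float(similarity[i][j])
            if score > threshold:
                rows.append({
                    "pair_name": f"{q1.instrument_name} vs {q2.instrument_name}",
                    "question1_no": q1.question_no,
                    "question1_text": q1.question_text,
                    "question2_no": q2.question_no,
                    "question2_text": q2.question_text,
                    "match_score": score,
                })
    # List the strongest matches first.
    rows.sort(key=lambda r: r["match_score"], reverse=True)
    return rows


# Toy usage; the similarity scores below are made up for illustration.
questions = [
    Question("GAD-7", "1", "Feeling nervous, anxious, or on edge"),
    Question("GAD-7", "2", "Not being able to stop or control worrying"),
    Question("PHQ-9", "1", "Little interest or pleasure in doing things"),
    Question("PHQ-9", "2", "Feeling down, depressed, or hopeless"),
]
similarity = [
    [1.0, 0.6, 0.3, 0.5],
    [0.6, 1.0, 0.2, 0.7],
    [0.3, 0.2, 1.0, 0.4],
    [0.5, 0.7, 0.4, 1.0],
]
for row in generate_crosswalk_table(questions, similarity, threshold=0.45):
    print(row["pair_name"], row["question1_no"], row["question2_no"], row["match_score"])
```

Because each row is a plain dict, passing the result to `pandas.DataFrame(rows)` would give a long-format data frame with one column per key.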
The web UI allows users to see the matching item pairs above a given threshold.
Can we make the Python library also return the matching pairs above a threshold? This is called the crosswalk table.
A crosswalk table carries the same information as the similarity matrix that is currently returned, just in a different format (and limited to the pairs above the threshold).
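To make the "same information, different format" point concrete, here is a sketch assuming the similarity matrix is square and symmetric with 1.0 on the diagonal. With no threshold, the upper triangle written out in long format can be turned back into the full matrix; applying a threshold drops rows, so the crosswalk table is a filtered view of the matrix. The helper names `matrix_to_long` and `long_to_matrix` are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]
LongRow = Tuple[int, int, float]  # (row index, column index, score)


def matrix_to_long(similarity: Matrix, threshold: float = float("-inf")) -> List[LongRow]:
    """Flatten a symmetric matrix into (i, j, score) rows with i < j above threshold."""
    n = len(similarity)
    return [
        (i, j, similarity[i][j])
        for i in range(n)
        for j in range(i + 1, n)
        if similarity[i][j] > threshold
    ]


def long_to_matrix(rows: List[LongRow], n: int, fill: float = 0.0) -> Matrix:
    """Rebuild an n x n symmetric matrix; missing pairs get `fill`, diagonal gets 1.0."""
    m = [[fill] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = 1.0  # assumption: every item matches itself perfectly
    for i, j, score in rows:
        m[i][j] = m[j][i] = score
    return m
```

Without a threshold the round trip recovers the matrix exactly; with one, the filtered-out pairs come back as `fill`.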
It is a long-format data frame that shows each matching pair of questions above a certain threshold, along with their respective IDs, question texts, and match scores. Here's an example structure:
# Example structure of crosswalk table DataFrame:
# tibble [n × 6]
# $ pair_name : chr # Name of the survey pair
# $ question1_no : chr # ID of question from first survey
# $ question1_text : chr # Text of question from first survey
# $ question2_no : chr # ID of question from second survey
# $ question2_text : chr # Text of question from second survey
# $ match_score : num # Similarity score between the questions
See also the equivalent issue for the R package: harmonydata/harmony_r#4