Description
I’m currently using your R package medrxivr to search and download preprint metadata from medRxiv for an academic project. First of all, thank you for developing and maintaining this very helpful tool!
I’ve encountered a problem where a specific article, although publicly available on medRxiv, is not present in the dataset returned by mx_api_content(server = "medrxiv").
Example:
"Empowering Radiologists with ChatGPT-4o: Comparative Evaluation of Large Language Models and Radiologists in Cardiac Cases"
→ Posted: June 25, 2024
→ DOI: 10.1101/2024.06.25.24309247
When I run mx_api_content("medrxiv"), I receive around 65,900 records, and max(preprint_data$date) returns a date from 2025, so at first glance the data appears current. However, this paper and a few others are missing from the dataset, even though they were posted almost a year ago.
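In case it helps with reproducing this, here is a minimal sketch of the kind of check involved. It assumes the data frame returned by mx_api_content() has a doi column; find_missing_dois is a hypothetical helper written for this report, not part of medrxivr, and the DOIs in the toy data frame are made up for illustration.

```r
# Hypothetical helper (not part of medrxivr): returns the entries of
# `expected` that do not appear in the `doi` column of `data`.
find_missing_dois <- function(data, expected) {
  stopifnot(is.data.frame(data), "doi" %in% names(data))
  # Compare case-insensitively and ignore surrounding whitespace.
  have <- tolower(trimws(as.character(data$doi)))
  expected[!(tolower(trimws(expected)) %in% have)]
}

# Offline demonstration with a toy data frame (made-up DOIs).
toy <- data.frame(doi = c("10.1101/aaa", "10.1101/bbb"),
                  stringsAsFactors = FALSE)
toy_missing <- find_missing_dois(toy, c("10.1101/aaa", "10.1101/ccc"))
print(toy_missing)

# Against the real snapshot (needs network access, so left commented out):
# library(medrxivr)
# preprint_data <- mx_api_content(server = "medrxiv")
# find_missing_dois(preprint_data, "10.1101/2024.06.25.24309247")
```

If the last call returns the DOI itself rather than an empty character vector, the record is absent from the snapshot.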
Is it possible that the underlying API snapshot used by mx_api_content() does not always include all articles, even if they were posted long ago? Could this be due to an issue with the API itself, or is there another explanation (e.g. versioning, renaming, etc.)?
I’d really appreciate any insight you could provide!