Title: Data quality navigated machine learning strategy with chemical intuition to improve generalization
We provide the source code for our data processing techniques: data diversity evaluation, data splitting, and data filtering. To make the results reproducible and the techniques easy for other researchers to use, we present them as easy-to-run .ipynb notebooks.
---'data split.ipynb' is the code for data splitting.
---'data filtering.ipynb' is the code for data filtering.
---'data diversity.ipynb' is the code for evaluating data diversity.
'DFE-DL' contains the code for training and testing our ensemble model.
We also provide our high-quality RE dataset to support further research by other scientists.
Before running the code, update the file paths in the code files to match your own directory layout. You can then run them to process data and construct prediction models.
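One way to keep the path changes in a single place is to define them once at the top of each notebook. The following is only a minimal sketch, not part of the released code: the helper name `project_paths`, the environment variable `DFE_DL_ROOT`, and the `data` and `output` folder names are all hypothetical and should be replaced with whatever your own directory uses.

```python
import os
from pathlib import Path


def project_paths(root=None):
    """Build input/output paths under one root directory.

    All names here are hypothetical placeholders, not names used by
    the released notebooks.
    """
    # Use the explicit argument first, then the (hypothetical) DFE_DL_ROOT
    # environment variable, then the current working directory.
    chosen = root or os.environ.get("DFE_DL_ROOT", ".")
    root_dir = Path(chosen).expanduser().resolve()
    return {
        "root": root_dir,
        "data": root_dir / "data",      # assumed folder holding the dataset
        "output": root_dir / "output",  # assumed folder for processed files
    }


paths = project_paths()
print("Reading data from:", paths["data"])
print("Writing results to:", paths["output"])
```

With this in place, only the root directory needs to change when the code is moved to another machine; the remaining cells can refer to `paths["data"]` and `paths["output"]`.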
Other details are described in our paper.