This repository contains a collection of utility scripts for preparing and analyzing training data for machine learning interatomic potentials:
Reads vasprun.xml or OUTCAR files, filters data based on specific criteria, and converts the data into POSCAR files, which are then organized into subfolders.
Splits the train.xyz file into groups, converts each group into POSCAR files, and organizes them into subfolders.
Converts data from a specified xyz file into POSCAR files, screens and splits the structures, and saves the results in subfolders. Supports both VASP and ABACUS.
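
The xyz-to-POSCAR conversion can be illustrated with a minimal sketch. It is not the script's actual implementation. It assumes an extended XYZ frame whose comment line carries a `Lattice="..."` entry with nine numbers and whose atom lines start with the species followed by Cartesian x, y, z. The function name `frame_to_poscar` and the `title` parameter are hypothetical, and only the VASP (POSCAR) output is covered.

```python
import re


def frame_to_poscar(frame_lines, title="converted from xyz"):
    """Build POSCAR text from the lines of one extended XYZ frame.

    Assumes line 0 is the atom count, line 1 the comment line with
    Lattice="ax ay az bx by bz cx cy cz", and each atom line is
    "species x y z [extra columns]" in Cartesian coordinates.
    """
    n_atoms = int(frame_lines[0])
    match = re.search(r'Lattice="([^"]+)"', frame_lines[1], re.IGNORECASE)
    if match is None:
        raise ValueError("comment line has no Lattice entry")
    cell = [float(x) for x in match.group(1).split()]
    if len(cell) != 9:
        raise ValueError("Lattice must contain nine numbers")

    # POSCAR lists atoms grouped by species; a dict keeps first-seen order.
    by_species = {}
    for line in frame_lines[2:2 + n_atoms]:
        fields = line.split()
        position = tuple(float(v) for v in fields[1:4])
        by_species.setdefault(fields[0], []).append(position)

    out = [title, "1.0"]  # comment line, then universal scaling factor
    out += [" ".join(f"{v:.8f}" for v in cell[i:i + 3]) for i in (0, 3, 6)]
    out.append(" ".join(by_species))  # species names
    out.append(" ".join(str(len(p)) for p in by_species.values()))  # counts
    out.append("Cartesian")
    for positions in by_species.values():
        out += [" ".join(f"{v:.8f}" for v in pos) for pos in positions]
    return "\n".join(out) + "\n"
```

Grouping by species reorders the atoms relative to the xyz file, so any per-atom data carried alongside (forces, for example) would need the same reordering.
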
Finds the points with the largest errors in force_train.out, virial_train.out, or energy_train.out, identifies the training-set structures (by ID) to which these points belong, and writes those IDs to a file. Optionally, it can write the remaining structures to another file.
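
Mapping a large per-atom error back to its structure can be sketched as follows. This is an illustration under assumptions, not the script itself: each row of force_train.out is taken to hold six numbers (predicted fx, fy, fz followed by reference fx, fy, fz), rows are taken to follow the atom order of train.xyz, and the name `worst_structures` and its parameters are hypothetical.

```python
import bisect
import math
from itertools import accumulate


def worst_structures(force_rows, atoms_per_structure, n_worst=3):
    """Return 0-based IDs of structures holding the n_worst largest force errors.

    force_rows: one sequence of six floats per atom
                (assumed: predicted xyz, then reference xyz).
    atoms_per_structure: atom count of each structure, in file order.
    """
    # Per-atom error: Euclidean norm of (predicted - reference).
    errors = [math.dist(row[:3], row[3:6]) for row in force_rows]

    # ends[i] is the index one past the last atom of structure i.
    ends = list(accumulate(atoms_per_structure))
    if not ends or ends[-1] != len(errors):
        raise ValueError("atom counts do not match the number of force rows")

    # Atom indices sorted from largest to smallest error.
    ranked = sorted(range(len(errors)), key=errors.__getitem__, reverse=True)

    ids = []
    for atom in ranked[:n_worst]:
        structure_id = bisect.bisect_right(ends, atom)
        if structure_id not in ids:  # several bad atoms may share a structure
            ids.append(structure_id)
    return ids
```

Because several of the worst atoms can belong to the same structure, the returned list may be shorter than `n_worst`.
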
Selects specific lines from the dump.xyz file according to a predefined rule and writes them to the train.xyz file.
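
The selection rule itself is not stated here, so the sketch below uses a hypothetical rule (keep every second frame) purely for illustration. It assumes the standard XYZ layout of an atom-count line, a comment line, and then one line per atom. The names `iter_frames` and `select_frames` are not taken from the script.

```python
import io
from itertools import islice


def iter_frames(handle):
    """Yield each XYZ frame as a list of lines: count, comment, atom lines."""
    while True:
        header = handle.readline()
        if not header.strip():
            return  # end of file or trailing blank line
        n_atoms = int(header)
        frame = [header] + list(islice(handle, n_atoms + 1))
        if len(frame) != n_atoms + 2:
            raise ValueError("truncated frame")
        yield frame


def select_frames(src, dst, keep):
    """Copy frames for which keep(index) is true; return how many were written."""
    written = 0
    for index, frame in enumerate(iter_frames(src)):
        if keep(index):
            dst.writelines(frame)
            written += 1
    return written


# Demo on an in-memory three-frame file; real use would pass open file handles
# for dump.xyz (read) and train.xyz (write).
source = io.StringIO("1\nf0\nH 0 0 0\n1\nf1\nH 0 0 1\n1\nf2\nH 0 0 2\n")
target = io.StringIO()
kept = select_frames(source, target, keep=lambda i: i % 2 == 0)
```

Passing the rule as a callable keeps the frame parsing independent of whichever selection criterion is actually wanted.
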
Creates phonon comparison plots, including visualizations of phonon frequencies and group velocities. It reads a unit cell file, generates input files, and plots phonon diagrams.
Visualizes NEP (neuroevolution potential) training or prediction results, including loss curves, parity (diagonal) plots, charge distribution plots, and descriptor projection plots.
Splits the dataset in NEP-dataset.xyz into training and test sets. Users can choose between random splitting or specifying a structure range.
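
The two splitting modes can be sketched on a list of already-parsed structures. This is a minimal illustration, assuming each list element represents one structure. The function names, the `test_fraction` and `seed` parameters, and the half-open range convention are all assumptions rather than the script's interface.

```python
import random


def split_random(frames, test_fraction=0.1, seed=0):
    """Randomly split frames into (train, test); the seed makes it reproducible."""
    indices = list(range(len(frames)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(frames) * test_fraction)
    test_ids = set(indices[:n_test])
    # Preserve the original file order within each subset.
    train = [f for i, f in enumerate(frames) if i not in test_ids]
    test = [f for i, f in enumerate(frames) if i in test_ids]
    return train, test


def split_range(frames, start, stop):
    """Frames with index in [start, stop) form the test set; the rest train."""
    return frames[:start] + frames[stop:], frames[start:stop]
```

A fixed seed makes the random split repeatable, which helps when comparing models trained on the same partition.
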
Converts data from directories containing OUTCAR files into the NEP-dataset.xyz file, written in the extended XYZ format used for machine-learning potential training.