This repository contains an open-source implementation of the transfer learning model and the corresponding dataset described in our paper. The model can be trained using the Jupyter notebook code and the data in this repository.
Metal-organic frameworks (MOFs) are a class of materials that are promising for gas adsorption because of their highly tunable nanoporous structures and host-guest interactions. Although machine learning (ML) has been used to aid the design and screening of MOFs for different purposes, the large data sets it requires are not always available, which limits the applicability of ML models trained on small data sets. In this work, we introduce an inductive transfer learning technique to improve the accuracy and applicability of ML models trained with small amounts of MOF adsorption data. The technique leverages potentially shareable knowledge from a source task to improve the models on the target tasks. As a demonstration, a deep neural network (DNN) trained on H2 adsorption data for 13,506 MOF structures at 100 bar and 243 K serves as the source task. When knowledge is transferred from the source task to H2 adsorption at 100 bar and 130 K (one target task), the average predictive accuracy on the target tasks improves from 0.960 (direct training) to 0.991 (transfer learning), and transfer learning works in 89.3% of the cases. We also tested transfer learning across gas species (i.e., from H2 to CH4): the average predictive accuracy for CH4 adsorption improves from 0.935 (direct training) to 0.980 (transfer learning), and transfer learning works in 82.0% of the cases. More importantly, transfer learning effectively improves the models on target tasks for which direct training yields low accuracy. However, when knowledge is transferred from the source task to Xe/Kr adsorption, transfer learning does not improve the predictive accuracy significantly and even worsens it in ~50.0% of the cases, which is attributed to the lack of common descriptors that are key to the underlying knowledge.
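To make the idea of transferring knowledge from a source task to a small target task concrete, here is a minimal sketch in plain Python. It is not the notebook code from this repository. The network size, the synthetic data, the learning rate, and the choice to freeze the hidden layer and fine-tune only the output layer are all assumptions made for illustration, and all function names are hypothetical.

```python
import copy
import math
import random

N_IN, N_HIDDEN = 5, 8  # illustrative sizes, not the architecture used in the paper


def init_params(rng):
    """Small random weights for a one-hidden-layer regression network."""
    return {
        "W1": [[rng.uniform(-0.5, 0.5) for _ in range(N_IN)] for _ in range(N_HIDDEN)],
        "b1": [0.0] * N_HIDDEN,
        "w2": [rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)],
        "b2": 0.0,
    }


def hidden(p, x):
    """tanh activations of the hidden layer for one input vector."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(p["W1"], p["b1"])]


def predict(p, x):
    return sum(w * h for w, h in zip(p["w2"], hidden(p, x))) + p["b2"]


def mse(p, X, y):
    return sum((predict(p, x) - t) ** 2 for x, t in zip(X, y)) / len(X)


def train(p, X, y, epochs, lr, freeze_hidden=False):
    """Full-batch gradient descent on the MSE; optionally keep W1 and b1 fixed."""
    n = len(X)
    for _ in range(epochs):
        g_W1 = [[0.0] * N_IN for _ in range(N_HIDDEN)]
        g_b1 = [0.0] * N_HIDDEN
        g_w2 = [0.0] * N_HIDDEN
        g_b2 = 0.0
        for x, t in zip(X, y):
            h = hidden(p, x)
            pred = sum(w * hj for w, hj in zip(p["w2"], h)) + p["b2"]
            err = 2.0 * (pred - t) / n  # d(MSE)/d(pred) for this sample
            g_b2 += err
            for j in range(N_HIDDEN):
                g_w2[j] += err * h[j]
                if not freeze_hidden:
                    dh = err * p["w2"][j] * (1.0 - h[j] ** 2)  # backprop through tanh
                    g_b1[j] += dh
                    for i in range(N_IN):
                        g_W1[j][i] += dh * x[i]
        p["b2"] -= lr * g_b2
        for j in range(N_HIDDEN):
            p["w2"][j] -= lr * g_w2[j]
            if not freeze_hidden:
                p["b1"][j] -= lr * g_b1[j]
                for i in range(N_IN):
                    p["W1"][j][i] -= lr * g_W1[j][i]
    return p


def make_task(n, scale, rng):
    """Synthetic stand-in for (descriptors, uptake) pairs; this is NOT MOF data."""
    X = [[rng.uniform(-1.0, 1.0) for _ in range(N_IN)] for _ in range(n)]
    y = [scale * math.tanh(x[0] + 0.5 * x[1] - x[2]) for x in X]
    return X, y


rng = random.Random(0)
X_src, y_src = make_task(200, 1.0, rng)  # large source task
X_tgt, y_tgt = make_task(20, 0.7, rng)   # small, related target task

# Step 1: train the source model on the large data set.
source = train(init_params(rng), X_src, y_src, epochs=100, lr=0.05)

# Step 2: copy the source weights, freeze the hidden layer,
# and fine-tune only the output layer on the small target set.
transfer = copy.deepcopy(source)
loss_before = mse(transfer, X_tgt, y_tgt)
train(transfer, X_tgt, y_tgt, epochs=100, lr=0.05, freeze_hidden=True)
loss_after = mse(transfer, X_tgt, y_tgt)

print("target MSE before fine-tuning:", loss_before)
print("target MSE after fine-tuning: ", loss_after)
```

Freezing the hidden layer is only one common way to reuse source knowledge; fine-tuning all layers from the source weights is another. Which layers to reuse for real adsorption data should be taken from the notebook and the paper rather than from this sketch.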
The dataset used in our work contains 13,506 MOFs in 41 different topologies, generated with the top-down MOF construction package ToBaCCo. Each MOF structure was energy-minimized using the Forcite module in Materials Studio in two steps: (1) optimization with the unit cell lattice parameters fixed, so that bond strain is released while undue structural deformation is minimized; and (2) re-optimization with the unit cell lattice parameters unfixed. Textural descriptors of the MOFs are then calculated. Surface areas, both volumetric (Vol. S.A.) and gravimetric (Grav. S.A.), are calculated by rolling a nitrogen probe over the framework atoms; void fractions (Void fraction) are calculated using Widom insertions of He; and the pore limiting diameter (PLD) and largest cavity diameter (LCD) are calculated using Zeo++. Finally, the capability of this topologically diverse set of MOFs to adsorb H2, CH4, and Xe/Kr mixtures is calculated using grand canonical Monte Carlo simulations, implemented in the RASPA code.
Tabular data.
If this repository is helpful for your research please cite the following publication:
Ruimin Ma, Yamil J. Colón, Tengfei Luo. Transfer Learning Study of Gas Adsorption in Metal-Organic Frameworks.
The code and data are for academic purposes only.