Paper: SCIMAT: Dataset of Problems in Science and Mathematics, arXiv: 2109.15005, 2021
- You may use this dataset with Char2Char or Word2Word encoding in a Transformer or an LSTM, as shown in the paper above.
- You may also generate different variations of the questions using the generator files.
- This is an open source project, so feel free to help us grow this dataset!
- For any dataset, the generator files can be used to generate the required number of samples.
- Some of the existing value ranges for quantities in the generator files may not make sense; feel free to adapt them.
- Part of the Mathematics dataset is adapted from the DeepMind dataset; for more mathematics data, see: https://github.com/deepmind/mathematics_dataset
- Part of the code for the mathematics datasets was written by Pratik Mandlecha (Microsoft, Hyderabad).
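
To make the difference between the two encodings mentioned above concrete, here is a minimal sketch in plain Python. It is not the preprocessing code from the paper or this repository: the helper names (`char_tokens`, `word_tokens`, `build_vocab`, `encode`), the special tokens, and the sample question are all assumptions chosen for illustration.

```python
# Hypothetical helpers: not part of the SCIMAT repository.

def char_tokens(text):
    """Char2Char-style tokenization: every character (including spaces) is a token."""
    return list(text)

def word_tokens(text):
    """Word2Word-style tokenization: split on whitespace."""
    return text.split()

def build_vocab(token_lists, specials=("<pad>", "<sos>", "<eos>")):
    """Assign an integer id to each distinct token, reserving ids for special tokens."""
    vocab = {tok: i for i, tok in enumerate(specials)}
    for tokens in token_lists:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab

def encode(tokens, vocab):
    """Wrap a token sequence in start/end markers and map it to ids."""
    return [vocab["<sos>"]] + [vocab[t] for t in tokens] + [vocab["<eos>"]]

# Illustrative sample only; real samples come from the dataset or generator files.
question = "What is 3 + 4 ?"
answer = "7"

char_vocab = build_vocab([char_tokens(question), char_tokens(answer)])
word_vocab = build_vocab([word_tokens(question), word_tokens(answer)])

char_ids = encode(char_tokens(question), char_vocab)
word_ids = encode(word_tokens(question), word_vocab)

print(len(char_ids), len(word_ids))
```

Character-level sequences are longer but need only a small vocabulary, while word-level sequences are shorter but require a vocabulary entry for every distinct token, including each number that appears in the data.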
Neeraj Kollepara, Snehith Kumar Chatakonda and Pawan Kumar, SCIMAT: Science and Mathematics Dataset, arXiv: 2109.15005, 2021.
@misc{scimat,
title={SCIMAT: Science and Mathematics Dataset},
author={Neeraj Kollepara and Snehith Kumar Chatakonda and Pawan Kumar},
year={2021},
eprint={2109.15005},
archivePrefix={arXiv},
primaryClass={math.NA}
}