Description
Is your feature/enhancement request related to a problem? Please describe.
ONNX support is a frequently requested feature; several issues mention it (#2625, #2451, #2317, #1936, #1423, #999), so I think there is a strong desire in the community for it.
I suppose the usual ONNX compatibility changes would also make the models compatible with torch.jit (#2528) or AWS Neuron (#2443).
ONNX provides large enhancements in terms of production readiness: it creates a static computational graph that can be quantized and optimized for specific hardware, see https://onnxruntime.ai/docs/performance/tune-performance.html (it claims to be up to 17x faster).
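As a side note on the torch.jit point: a tensor-in/tensor-out `forward` is exactly what `torch.jit.trace` needs as well, so the same split should cover that use case. A minimal sketch with a toy module (not actual Flair code; names and shapes are made up for illustration):

```python
import torch
import torch.nn as nn

# Toy stand-in for a model whose forward only sees tensors,
# so it can be traced (and analogously exported to ONNX).
class ForwardOnly(nn.Module):
    def __init__(self, embedding_dim: int = 8, num_tags: int = 3):
        super().__init__()
        self.linear = nn.Linear(embedding_dim, num_tags)

    def forward(self, sentence_tensor: torch.Tensor, lengths: torch.LongTensor) -> torch.Tensor:
        scores = self.linear(sentence_tensor)
        # zero out padding positions using the lengths tensor
        mask = torch.arange(sentence_tensor.shape[1]).unsqueeze(0) < lengths.unsqueeze(1)
        return scores * mask.unsqueeze(-1)

example = (torch.randn(2, 5, 8), torch.LongTensor([5, 3]))
traced = torch.jit.trace(ForwardOnly(), example)
out = traced(*example)
print(tuple(out.shape))  # (2, 5, 3)
```

Anything that touches DataPoints directly (Python objects, string logic) would make tracing fail, which is why the split below matters.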
Describe the solution you'd like
I'd suggest an iterative progression, as multiple architecture changes are required:
- split the `forward`/`forward_pass` methods, such that all models have a method `_prepare_tensors` which converts all DataPoints to tensors and a `forward` which takes in tensors and outputs tensors (e.g. for the SequenceTagger the forward would have the signature `def forward(self, sentence_tensor: torch.Tensor, lengths: torch.LongTensor)` and return a single tensor `scores`).
  This change allows conversion to ONNX models, however the logic (like decoding CRF scores, filling up sentence results, extracting tensors) won't be part of the converted model. Also, embeddings won't be part of the ONNX model.
- create the same `forward`/`_prepare_tensors` architecture for embeddings, such that those could be converted too.
  This would allow converting embeddings to ONNX models, but again without the logic.
- change the architecture so that for both embeddings and models the logic part (creating inputs, adding outputs to data points) and the PyTorch part are split, such that the PyTorch model part can be replaced by a converted ONNX model.
- create an end-to-end model wrapper, so that the embeddings and the model can be converted to a single ONNX model and used as such.
Notice that this would be 4 different PRs, and probably all of them would be very large and should be tested thoroughly before moving on to the next one.
I would offer to do the first one and then see how much effort this is / how much time I have for this.
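For the last step, the end-to-end wrapper could simply chain an embedding module and a tagger so both land in one exportable graph. Again a toy sketch with made-up names and sizes, not actual Flair code:

```python
import torch
import torch.nn as nn

# Hypothetical end-to-end wrapper: embeddings and tagger combined into
# one tensor-in/tensor-out module, so a single converted ONNX model
# could cover the whole pipeline.
class EndToEnd(nn.Module):
    def __init__(self, vocab_size: int = 100, embedding_dim: int = 8, num_tags: int = 3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.tagger = nn.Linear(embedding_dim, num_tags)

    def forward(self, token_ids: torch.LongTensor, lengths: torch.LongTensor) -> torch.Tensor:
        return self.tagger(self.embedding(token_ids))

wrapper = EndToEnd()
out = wrapper(torch.randint(0, 100, (2, 5)), torch.LongTensor([5, 3]))
print(tuple(out.shape))  # (2, 5, 3)
```

The tokenization that produces `token_ids` would still happen outside the graph, but everything from embedding lookup to scores would be in one model.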