Is there any possibility of optimizing the flair model (like INT8 quantization etc.)? · Issue #2317 · flairNLP/flair
Closed

Description

@abhipn

I have been using Flair in our production environment for some time now and haven't faced any issues so far. The problem is that not every organization uses a GPU for inference, and CPU inference is not ideal when latency becomes important.

I was wondering whether there is a way to convert flair.pt to flair.onnx and, in the process, apply integer quantization; a small trade-off of accuracy for performance is not actually a bad idea. I have gone through the docs but couldn't find any reference to optimization, distillation, etc.
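
For the PyTorch side, this is roughly what I have in mind: dynamic INT8 quantization of the tagger's Linear/LSTM weights with stock PyTorch. This is an untested sketch, and whether Flair's `predict()` still behaves correctly on the module that `quantize_dynamic` returns is exactly what I'm unsure about:

```python
import torch
from flair.data import Sentence
from flair.models import SequenceTagger

# Load the standard pretrained English NER tagger.
tagger = SequenceTagger.load("ner")

# quantize_dynamic swaps Linear/LSTM submodules for int8 versions:
# weights are stored as int8 and dequantized on the fly, which mainly
# helps CPU latency and model size.
quantized_tagger = torch.quantization.quantize_dynamic(
    tagger, {torch.nn.Linear, torch.nn.LSTM}, dtype=torch.qint8
)

# Untested assumption: predict() still works on the quantized copy.
sentence = Sentence("George Washington went to Washington.")
quantized_tagger.predict(sentence)
print(sentence.to_tagged_string())
```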

If someone has managed to do this, I would really appreciate it if you could share the details.
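
For reference, the ONNX-side quantization step itself looks straightforward; the export is the part I can't verify, since Flair's forward pass consumes Sentence objects rather than plain tensors. Assuming a flair.onnx file can be produced at all, ONNX Runtime's quantization tooling should handle the INT8 step (both file names below are placeholders):

```python
from onnxruntime.quantization import quantize_dynamic, QuantType

# Rewrites the exported graph so that weights of supported ops
# are stored as INT8.
quantize_dynamic(
    model_input="flair.onnx",
    model_output="flair-int8.onnx",
    weight_type=QuantType.QInt8,
)
```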

Labels

wontfix: This will not be worked on
