GitHub - lorenzolucchese/multisig: Signature-based multi-modal (image, video and audio) classifier.


MultiSig

MultiSig is a classification architecture for multi-modal data. Each data modality is tokenized via signature methods. A decoder then performs two-task classification, predicting both the class label and the data type of each input. The use of a shared encoder proves especially useful in low-data settings with unbalanced data modalities. The supported data types are currently image (.jpg), video (.mp4) and audio (.wav). The signature tokenizations extend the ideas discussed in ImageSig (https://arxiv.org/abs/2205.06929).
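To make the idea of signature tokenization concrete, here is a minimal NumPy sketch, not the code from this repository. It computes the depth-2 signature of a piecewise-linear path and uses it to turn image patches into tokens. The function names, the patch size, the raster-order path through each patch and the truncation depth are all assumptions made for illustration. The actual tokenizations in MultiSig and ImageSig may differ.

```python
import numpy as np


def signature_depth2(path):
    """Depth-2 signature of a piecewise-linear path given as an (n_points, d) array.

    Returns a vector of length d + d*d: level 1 followed by flattened level 2.
    """
    path = np.asarray(path, dtype=float)
    increments = np.diff(path, axis=0)  # one increment per linear segment
    d = path.shape[1]
    level1 = np.zeros(d)
    level2 = np.zeros((d, d))
    for delta in increments:
        # Chen's identity for appending a single linear segment.
        level2 += np.outer(level1, delta) + 0.5 * np.outer(delta, delta)
        level1 += delta
    return np.concatenate([level1, level2.ravel()])


def tokenize_image(image, patch=4):
    """Hypothetical tokenizer: one signature token per patch of an (H, W, C) image.

    Assumption: the pixels of each patch, read in raster order, form a path in R^C.
    """
    h, w, c = image.shape
    tokens = []
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            block = image[i:i + patch, j:j + patch].reshape(-1, c)
            tokens.append(signature_depth2(block))
    return np.stack(tokens)
```

With this sketch, an 8x8 RGB image and `patch=4` give 4 tokens of dimension 3 + 9 = 12. The same `signature_depth2` function could in principle be applied to audio samples or per-frame video features treated as paths.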

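The two-task classification (label and data type) can be illustrated as a joint loss over two output heads. The sketch below is an assumption about how such an objective might be written, not the training code of this repository. The `type_weight` parameter and both function names are hypothetical.

```python
import numpy as np


def softmax_cross_entropy(logits, targets):
    """Mean cross-entropy for (batch, n_classes) logits and integer class targets."""
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()


def two_task_loss(label_logits, type_logits, labels, types, type_weight=1.0):
    """Sum of the label loss and the (weighted) data-type loss."""
    label_loss = softmax_cross_entropy(label_logits, labels)
    type_loss = softmax_cross_entropy(type_logits, types)
    return label_loss + type_weight * type_loss
```

Here `labels` would index the classes (bird, cat, dog) and `types` the modalities (image, video, audio). Both heads read the same shared representation, so gradients from the data-rich image modality also shape the features used for video and audio.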

The architecture was tested on a (quite unbalanced) dataset with the following structure:

data
├── training_set
│   ├── bird (1000 .jpg / 15 .mp4 / 8 .wav)
│   ├── cat  (5000 .jpg / 65 .mp4 / 5 .wav)
│   └── dog  (5000 .jpg /  2 .mp4 / 4 .wav)
└── test_set
    ├── bird (1000 .jpg /  0 .mp4 / 3 .wav)
    ├── cat  (1000 .jpg /  0 .mp4 / 3 .wav)
    └── dog  (1000 .jpg /  0 .mp4 / 0 .wav)
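A directory layout like the one above can be summarized with a short script that counts files per class and modality. This is a sketch using only the standard library. The function name and the suffix-to-modality mapping are assumptions based on the supported file extensions, not part of the repository.

```python
from collections import Counter
from pathlib import Path

# Assumed mapping from file extension to modality, following the supported types.
MODALITY_BY_SUFFIX = {".jpg": "image", ".mp4": "video", ".wav": "audio"}


def count_modalities(split_dir):
    """Return {class_name: Counter({modality: n_files})} for one split directory."""
    counts = {}
    for class_dir in sorted(Path(split_dir).iterdir()):
        if not class_dir.is_dir():
            continue
        counter = Counter()
        for f in class_dir.iterdir():
            modality = MODALITY_BY_SUFFIX.get(f.suffix.lower())
            if modality is not None:  # skip files of unsupported types
                counter[modality] += 1
        counts[class_dir.name] = counter
    return counts
```

Calling `count_modalities("data/training_set")` on the layout above would return one counter per class, which makes the imbalance between modalities easy to inspect before training.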

This work was produced as part of a two-week industry mini-project in collaboration with DataSig, supervised by Dr Mohamed Ibrahim. Presentation.
