Simone Alberto Peirone*, Gabriele Goletto*, Mirco Planamente, Andrea Bottino, Barbara Caputo, Giuseppe Averta
This is the official PyTorch implementation of our work "Egocentric zone-aware action recognition across environments".
Abstract:
Human activities exhibit a strong correlation between actions and the places where they are performed, such as washing something at a sink. More specifically, in daily living environments we may identify particular locations, hereinafter named activity-centric zones, which may afford a set of homogeneous actions. Knowledge of these zones can serve as a prior to help vision models recognize human activities.
However, the appearance of these zones is scene-specific, limiting the transferability of this prior information to unfamiliar areas and domains. This problem is particularly relevant in egocentric vision, where the environment takes up most of the image, making it even more difficult to separate the action from the context. In this paper, we discuss the importance of decoupling the domain-specific appearance of activity-centric zones from their universal, domain-agnostic representations, and show how the latter can improve the cross-domain transferability of Egocentric Action Recognition (EAR) models. We validate our solution on the EPIC-Kitchens-100 and Argo1M datasets.
Clone this repository and create a Conda environment:
git clone --recursive https://github.com/sapeirone/EgoZAR
cd EgoZAR
conda create --name egozar
conda activate egozar
pip install -r requirements.txt
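To quickly verify the environment (assuming PyTorch is pulled in by requirements.txt):
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"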
The official EK100 UDA annotations and the pre-extracted TBN features can be downloaded as follows.
# Annotations
mkdir -p annotations
git clone https://github.com/epic-kitchens/epic-kitchens-100-annotations.git
mv epic-kitchens-100-annotations/UDA_annotations annotations/
rm -r epic-kitchens-100-annotations
# TBN features
mkdir -p data
wget -O ek100.zip "https://www.dropbox.com/scl/fo/us8zy3r2rufqriig0pbii/ABeUdV83UNmJ5US-oCxAPno?rlkey=yzbuczl198z067pnotx1zxvuo&e=1&dl=0"
unzip ek100.zip -d data/
rm ek100.zip
Optional: for easy prototyping you can download the pre-extracted CLIP ViT-L/14 features for the source train and target validation splits.
Download the EPIC-Kitchens RGB frames under the EPIC-KITCHENS directory, following the official instructions.
The expected data structure for EPIC-KITCHENS videos is:
│
├── EPIC-KITCHENS/
│ ├── <p_id>/
│ │ ├── rgb_frames/
│ │ │ └── <video_id>/
│ │ │ ├── frame_0000000000.jpg
│ │ │ ├── frame_0000000001.jpg
│ │ │ └── ...
│ │
│ └── ...
│
└── ...
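A minimal sketch to sanity-check this layout (it assumes the frames are stored exactly as above and is not part of the repository):
from pathlib import Path

# Count the extracted RGB frames per video (hypothetical check, not an EgoZAR script).
root = Path("EPIC-KITCHENS")
for video_dir in sorted(root.glob("*/rgb_frames/*")):
    n_frames = len(list(video_dir.glob("frame_*.jpg")))
    print(f"{video_dir.relative_to(root)}: {n_frames} frames")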
Extract the CLIP features using the save_CLIP_features.py script for the desired CLIP variant.
mkdir -p clip_features
python save_CLIP_features.py --clip-model=ViT-L/14
This command should generate the files clip_features/ViT-L_14_source_train.pth and clip_features/ViT-L_14_target_val.pth for the source train and target validation splits, respectively.
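For reference, a minimal sketch for inspecting these feature files (the exact structure of the saved object depends on save_CLIP_features.py, so treat this as an assumption):
import torch

# Load one of the generated feature files and print a quick summary
# (hypothetical inspection code, not part of the repository).
features = torch.load("clip_features/ViT-L_14_source_train.pth", map_location="cpu")

if isinstance(features, dict):
    for key, value in list(features.items())[:5]:
        shape = tuple(value.shape) if hasattr(value, "shape") else type(value)
        print(key, shape)
else:
    print(type(features), getattr(features, "shape", None))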
Train the Source Only multimodal baseline with the following command:
python train.py --modality=RGB --modality=Flow --modality=Audio
Train EgoZAR with the following command:
python train.py --modality=RGB --modality=Flow --modality=Audio --ca \
  --use-input-features=N --use-egozar-motion-features=Y --use-egozar-acz-features=Y \
  --disent-loss-weight=1.0 \
  --disent-n-clusters=4
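If you want to sweep the disentanglement loss weight, here is a minimal sketch reusing the flags above (an illustrative script, not part of the official training code):
import subprocess

# Hypothetical sweep over --disent-loss-weight, reusing the EgoZAR flags shown above.
for weight in [0.1, 0.5, 1.0]:
    subprocess.run(
        [
            "python", "train.py",
            "--modality=RGB", "--modality=Flow", "--modality=Audio",
            "--ca",
            "--use-input-features=N",
            "--use-egozar-motion-features=Y",
            "--use-egozar-acz-features=Y",
            f"--disent-loss-weight={weight}",
            "--disent-n-clusters=4",
        ],
        check=True,
    )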
This study was supported in part by the CINI Consortium through the VIDESEC project and was carried out within the FAIR - Future Artificial Intelligence Research and received funding from the European Union Next-GenerationEU (PIANO NAZIONALE DI RIPRESA E RESILIENZA (PNRR) – MISSIONE 4 COMPONENTE 2, INVESTIMENTO 1.3 – D.D. 1555 11/10/2022, PE00000013). This manuscript reflects only the authors' views and opinions; neither the European Union nor the European Commission can be considered responsible for them. G. Goletto is supported by PON “Ricerca e Innovazione” 2014-2020 – DM 1061/2021 funds.
If you use EgoZAR in your research or applications, please cite our paper:
@article{peirone2024egocentric,
title={Egocentric zone-aware action recognition across environments},
author={Peirone, Simone Alberto and Goletto, Gabriele and Planamente, Mirco and Bottino, Andrea and Caputo, Barbara and Averta, Giuseppe},
journal={Pattern Recognition Letters},
year={2024},
publisher={Elsevier}
}
This project is licensed under the MIT License - see the LICENSE file for details.