
Publication details

2023, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Pages 3-20 (volume: 13589)

Unsupervised Pose Estimation by Means of an Innovative Vision Transformer (04b Conference paper in volume)

Brandizzi N., Fanti A., Gallotta R., Russo S., Iocchi L., Nardi D., Napoli C.

Attention-only Transformers [34] have been applied to both Natural Language Processing (NLP) and Computer Vision (CV) tasks. One Transformer architecture developed specifically for CV is the Vision Transformer (ViT) [15], which has been used to solve numerous CV tasks. One such task is human pose estimation. We present Un-TraPEs (UNsupervised TRAnsformer for Pose Estimation), a modified ViT model that reconstructs a subject’s pose from a monocular image and its estimated depth. We compare this model against a ResNet [17] trained from scratch and a ViT fine-tuned for the task, and report promising results.
ISBN: 978-3-031-23479-8; 978-3-031-23480-4