Computer Vision, Computer Graphics, and Perception

The Computer Vision, Computer Graphics and Perception group is a multidisciplinary team of researchers that investigates several knowledge areas and applies them to scientific problems in many contexts. The team works on the following topics related to Computer Vision and Graphics:

Visual Search — Visual search for relevant targets in the environment is a crucial robot skill. Our research group investigates this topic by proposing a number of frameworks for monitoring the execution of an agent's task (described in the next section), addressing the agent's ability to visually search the environment for the targets involved in the task. Visual search is also relevant in the field of Artificial Intelligence for robotics, where one of its best applications is recovering from a failure. Our work exploits deep reinforcement learning to acquire a common-sense scene structure and takes advantage of a deep convolutional network to detect objects and the relevant relations holding between them.

Visual Execution Monitoring — The execution and monitoring of high-level robot actions in a real environment can be concretely improved by addressing the problem with a hybrid deterministic/nondeterministic state machine that streams perceptual information, strengthened by visual search and recognition. Our research line builds on the strong results of deep learning, which make it possible to rely heavily on visual perception both for monitoring the state of the world, in terms of the preconditions and postconditions that hold before and after the execution of an action, and for using a search policy either to guide where to look or to refocus in case of a failure.
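
The precondition and postcondition checks described above can be illustrated with a minimal sketch. It is not the group's framework: the `Action` record, the `monitor_step` function and the string predicates are hypothetical names introduced only for illustration, and the `perceive` callback stands in for the visual perception module, which is not modeled here.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Set


@dataclass(frozen=True)
class Action:
    """Hypothetical high-level action with symbolic pre/postconditions."""
    name: str
    preconditions: FrozenSet[str]
    postconditions: FrozenSet[str]


def monitor_step(
    action: Action,
    perceive: Callable[[], Set[str]],
    execute: Callable[[Action], None],
) -> str:
    """Execute one action and report how the monitoring went.

    `perceive` is an assumed stand-in for visual perception: it returns
    the set of predicates currently observed in the world.
    """
    # Preconditions must all be observed before the action starts.
    if not action.preconditions <= perceive():
        return "precondition_failure"
    execute(action)
    # Postconditions must all be observed once the action has ended.
    if not action.postconditions <= perceive():
        return "postcondition_failure"
    return "success"


def simulated_world_demo() -> str:
    """Toy run on a simulated world in which grasping always succeeds."""
    world = {"cup_visible", "hand_empty"}
    grasp = Action(
        "grasp_cup",
        frozenset({"cup_visible", "hand_empty"}),
        frozenset({"holding_cup"}),
    )

    def execute(_action: Action) -> None:
        world.discard("hand_empty")
        world.add("holding_cup")

    return monitor_step(grasp, lambda: set(world), execute)
```

In this sketch, a failure result marks the point at which a search policy would be invoked to decide where to look next or to refocus.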

Action and Activity Recognition, Anticipation and Forecasting — Several works in the literature address the problem of action and activity recognition, anticipation and prediction in videos. The complexity of the problem requires many aspects to be considered. First of all, the recognized action sequence has to be consistent with the final task of the whole activity. Furthermore, much attention needs to be given to predicting the correct action in those instances where specific sequences are underrepresented in the dataset for reasons other than how likely they are to occur. Finally, several implementation problems, caused by the large size of the data used, need to be addressed. Our research focused on tackling these problems by producing a novel network, the Anticipation and Forecasting Network (AFN).

Memory and next step prediction in Long Short-Term Memory Networks — Following the line of work presented in the section above, we paid particular attention to the behavior of LSTMs in retaining past information across iterations. In the context of action forecasting this is a crucial point, since forecasting is possible only if the relevant information is kept in memory. We also focused on understanding the relation between the features of past sequences and future steps, both mathematically and in practice on the available datasets.

Scene Representation and Interpretation — Dealing with real environments and with complex tasks and problems requires an optimized scene representation. Such a representation needs to be at the same time parsimonious and rich in information. Therefore, our research group investigates possible representations such as Mental Maps, which exploit the semantic, geometrical and ... information kept by a semantic segmentation that includes only the elements that could be useful to the agent in achieving its task.

Object Detection and Instance Segmentation — Object detection is the task of detecting instances of certain object classes (such as humans, buildings or cars) in digital images and videos. Well-researched sub-tasks include face detection and pedestrian detection. Instance segmentation is the task of grouping the parts of an image that belong to the same entity. In the field of research that combines Object Detection and Instance Segmentation, the research community has moved from classical machine learning algorithms to a neural network approach based on several new architectures. Inspired first by the Faster R-CNN network developed by Shaoqing Ren, Kaiming He, Ross Girshick and Jian Sun (2015) and then by Mask R-CNN, developed by Kaiming He, Georgia Gkioxari, Piotr Dollár and Ross Girshick (2017), our research focused on developing new architectures that improve performance, computation time, capacity and multi-tasking properties.

Scene and Context Understanding — The problem of enabling an agent to perceive and understand the surrounding environment is not limited to a correct representation via a semantic segmentation. A set of objects and a number of structural or contextual scene details can define a context. This information is crucial for inference and, even more, for resolving the growing uncertainty that each prediction introduces into the prediction system. Therefore, our research group investigates algorithms, from both classical machine learning and deep learning, that extract contexts from the analysed data and allow large frameworks to operate correctly with an enriched knowledge of the world.

Augmented Reality — Within our research activities, Augmented Reality is becoming a compelling technology, mainly for interactive 3D visualization. It was first used on hand-held devices in the context of archaeological sites and for building complex planning scenarios for robots, eliminating the need to model the dynamics of both the robot and the real environment, as would be required by full simulation environments. Further relevant applications in this field concern the augmentation of real environments with additional elements. Our research on these topics is mainly focused on the use of generative models and, in particular, Generative Adversarial Models.

Dense Image Fusion, Meshing, 3D Surface Reconstruction — In the field of Object Reconstruction, a new approach is proposed for the 3D modeling of articulated objects, specifically animals, using both components and component aspects. A component of an articulated object is defined here as a part of it that is only partially deformable. An aspect is defined as a view of the component from a specific vantage point. Aspects are fixed for an object component. Each aspect is modeled from a single image, using an inflation algorithm and the deformation paradigm. The aspects are then blended and merged together to form the whole component.

Gesture Recognition from 3D data — In our research work, the problem of Human Primitives Recognition is investigated within Motion Capture sequences. In this context, we investigated methods based on Gaussian Process Latent Variable Models and Alignment Kernels. We propose a new discriminative latent variable model with back-constraints induced by the similarity of the original sequences. We compare the proposed method with methods based on Dynamic Time Warping and with V-GPDS models, which are able to model high-dimensional dynamical systems. Another line of work aims to recognize human actions from a 3D input data sequence, independently of the camera point of view and of the physical appearance of the person under examination. To tackle this problem, Kernelized Temporal Cut is used to segment the sequence and find the cut points between different actions. A spatio-temporal manifold model is then used to represent the time series data, and a spatio-temporal alignment algorithm is introduced to find matches between action segments.
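
Dynamic Time Warping, mentioned above as a comparison baseline, can be sketched in a few lines. This is the textbook dynamic-programming formulation and not the group's implementation. For brevity it assumes one-dimensional sequences with an absolute-difference local cost, whereas Motion Capture frames are multi-dimensional; the function name `dtw_distance` is illustrative.

```python
import math
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic DTW cost between two 1-D sequences (assumed scalar frames)."""
    n, m = len(a), len(b)
    # cost[i][j] is the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats its frame
                cost[i][j - 1],      # b advances, a repeats its frame
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

Two sequences that differ only in timing, such as `[1, 2, 3]` and `[1, 2, 2, 3]`, align at zero cost, which is why DTW is a common baseline for comparing motion sequences performed at different speeds.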

Terrain Traversability in Rescue Environments — 3D terrain understanding and structure estimation are crucial for robots navigating rescue scenarios. Unfortunately, large-scale 3D point clouds provide no information about what is ground and what is top, what can and cannot be surmounted, what can be crossed, and what is too deep to be traversed. In this context, our research work has mainly concentrated on providing methods for point cloud structuring that can lead to the definition of traversability cost maps.
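
To make the notion of a traversability cost map concrete, the sketch below bins a point cloud into a 2D grid and scores each cell by the height range of the points it contains. This step-height heuristic is an assumption chosen for illustration and is not the point cloud structuring method developed by the group; the function name and the `cell_size` and `max_step` parameters are hypothetical.

```python
import math
from typing import Dict, Iterable, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in metres, z pointing up
Cell = Tuple[int, int]              # integer grid indices


def traversability_cost_map(
    points: Iterable[Point],
    cell_size: float = 0.5,
    max_step: float = 0.3,
) -> Dict[Cell, float]:
    """Map each occupied grid cell to a cost in [0, 1], or inf if blocked.

    Assumed heuristic: the cost grows with the height range inside the cell,
    and a range above `max_step` is treated as not surmountable.
    """
    z_ranges: Dict[Cell, Tuple[float, float]] = {}
    for x, y, z in points:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        lo, hi = z_ranges.get(cell, (z, z))
        z_ranges[cell] = (min(lo, z), max(hi, z))

    costs: Dict[Cell, float] = {}
    for cell, (lo, hi) in z_ranges.items():
        step = hi - lo
        costs[cell] = step / max_step if step <= max_step else math.inf
    return costs
```

Cells that contain no points are simply absent from the returned dictionary, so a planner using this sketch would have to decide separately how to treat unobserved terrain.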



Research lines

Action and Activity Recognition
Activity Understanding from 3D data
Anticipation and Forecasting
Augmented Reality
Gesture Recognition
Human Motion Analysis
Memory and next step prediction in Long Short-Term Memory (LSTM) Networks
Physics-based methods
Scene Representation
Visual Search and Execution Monitoring

© Università degli Studi di Roma "La Sapienza" - Piazzale Aldo Moro 5, 00185 Roma