BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Date iCal//NONSGML kigkonsult.se iCalcreator 2.20.2//
METHOD:PUBLISH
X-WR-CALNAME;VALUE=TEXT:Eventi DIAG
BEGIN:VTIMEZONE
TZID:Europe/Paris
BEGIN:STANDARD
DTSTART:20191027T030000
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20200329T020000
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:calendar.19542.field_data.0@diag.uniroma1.it
DTSTAMP:20240421T210328Z
CREATED:20200227T165838Z
DESCRIPTION:1. Robust Statistics for Data Reduction (March 9th - 10th 20
 20\, 09:00-13:00\, DIAG\, Aula B203)\nProf. Alessio Farcomeni (Tor Verg
 ata)\n\n2. Dimensionality Reduction in Clustering and Streaming (March 16
 th - 17th 2020\, 09:00-13:00\, DIAG\, Aula B203)\nProf. Chris Schwiegels
 hohn (Sapienza)\n\nAbstracts\n\n1. Robust Statistics for Data Reductio
 n\n\nWe will briefly introduce the main principles and ideas in robust st
 atistics\, focusing on trimming methods. The working example will be tha
 t of estimation of location and sc
atter in multidimensional problems\, together with outlier identification.
We will then discuss some methods for robust clustering based on impartia
l trimming and snipping. A simple robust method for dimensionality reduct
 ion will finally be discussed. Illustrations will be based on the R sof
 tware and some contributed extension packages.\n\nTentative schedule: Int
 roduction to robust inference. Concepts of: masking\, swamping\, breakdo
 wn point\, Tukey-Huber contamination\, entry-wise contamination. Estimat
 ion of location and scatter based on the Minimum Covariance Determinant
 . The fastMCD algorithm. Outlier identification. Robust clustering: trim
 med k-means\, snipped k-means. The tclust and sclust algorithms. Selecti
 ng the trimming level and number of clusters through the classificati
 on trimmed likelihood curves. Plug-in methods for dimension reduction. B
 rief overview of the most recent contributions and avenues for further w
 ork.\n\nThe course will be based on the book: Farcomeni\, A. and Gre
 co\, L. (2015) Robust Methods for Data Reduction\, Chapman & Hall/CRC Pr
 ess.\n\n2. Dimensionality Reduction in Clustering and Streaming\n\nFir
 st day:\nThe curse of dimensionality is a common occ
urrence when working with large data sets. In few dimensions (such as the
Euclidean plane)\, we visualize problems very well and can often find inte
resting properties of a data set just by hand. In more than three dimensio
ns\, our ability to visualize a problem is already severely impacted and o
ur intuition from the Euclidean plane may lead us completely astray. Moreo
 ver\, algorithms often scale poorly:\n- Finding nearest neighbors in 2D c
 an be done in nearly linear time. In high dimensions\, it becomes very d
 ifficult to improve over either n^2.\n- Geometric data structures and de
 compositions become hard to implement. Line sweeps\, Voronoi diagrams\, g
 rids\, and nets usually scale by at least a factor of 2^d\, where d is t
 he dimension. In some cases\, it may be even worse.\n- Many problems tha
 t are easy to solve in 2D\, such as clustering\, become computational
 ly intractable in high dimensions. Often\, exact solutions require runni
 ng times that are exponential in the number of dimensions.\n\nUnfortunate
 ly\, high-dimensional data sets are not t
 he exception\, but rather the norm in modern data analysis. As such\, muc
 h of computational data analysis has been devoted to finding ways to redu
ce the dimension. In this course\, we will study two popular methods\, nam
ely principal component analysis (PCA) and random projections. Principal c
omponent analysis originated in statistics\, but is also known under vario
 us other names\, depending on the field (e.g. eigenvector problem\, low-r
 ank approximation\, etc.). We will illustrate the method\, highlighting t
 he problem that it solves and the underlying assumptions of PCA. Next\, w
 e will see a powerful tool for dimension reduction known as the Johnson-
 Lindenstrauss lemma. The Johnson-Lindenstrauss lemma states that\, giv
 en a point set A in an arbitrarily high dimension\, we can transform A i
 nto a point set A' in dimension O(log |A|) while approximately preservin
 g all pairwise distances. For both of these methods\, we will see applic
 ations\, including k-nearest neighbor classification and k-means.\n\nSeco
 nd day:\nLarge data sets form a sister topic to dimension reduction. Whi
 le the benefits of having a small dimensi
on are immediately understood\, reducing the size of the data is a compara
tively recent paradigm. There are many reasons for data compression. Aside
from data storage and retrieval\, we want to minimize the amount of commu
nication in distributed computing\, enable online and streaming algorithms
\, or simply run an accurate (but expensive) algorithm on a smaller datase
 t. A key concept in large-scale data analysis is the coreset. We view a c
 oreset as a succinct summary of a data set that behaves\, for any candid
 ate so
lution\, like the original data set. The surprising success story of data
 compression is that for many problems\, we can construct coresets whose s
 ize is independent of the number of input points. For example\, linear r
 egression in d dimensions admits coresets of size O(d)\, and k-means h
 as coresets of size O(k)\, irrespe
ctive of the number of data points of the original data set. In our course
\, we will describe the coreset paradigm formally. Moreover\, we will give
an overview of methods to construct coresets for various problems. Exampl
es include constructing coresets from random projections\, by analyzing gr
adients\, or via sampling. We will further highlight a number of applicati
ons.
DTSTART;TZID=Europe/Paris:20200309T090000
DTEND;TZID=Europe/Paris:20200317T130000
LAST-MODIFIED:20200227T170118Z
LOCATION:DIAG - Aula B203
SUMMARY:Computational and Statistical Methods of Data Reduction - Data Scie
nce PhD Course - Prof. Alessio Farcomeni (Univ. of Tor Vergata) - Dr. Chri
 s Schwiegelshohn (DIAG - Sapienza)
URL;TYPE=URI:http://diag.uniroma1.it/node/19542
END:VEVENT
END:VCALENDAR