Workshop on High-dimensional Data Analysis
(27 – 29 Feb 2008)
Jointly organized with the Department of Statistics & Applied Probability
~ Abstracts ~
By slicing the region of the response (Li, 1991) and applying local kernel regression (MAVE; Xia et al., 2002) to each slice, a new dimension reduction method is proposed.
Compared with the traditional inverse regression methods,
e.g. sliced inverse regression (Li, 1991), the new method is
free of the linearity condition (Li, 1991) and enjoys much
improved estimation accuracy. Compared with the direct
estimation methods (e.g., MAVE), the new method is much more
robust against extreme values and can capture the entire
central subspace (CS; Cook, 1998) exhaustively. To determine the
dimension of the CS, a consistent cross-validation (CV) criterion
is developed. Extensive numerical studies including one real
example confirm our theoretical findings.
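To make the slicing-plus-local-smoothing idea concrete, the sketch below is purely illustrative and not the authors' exact algorithm: the response is cut into quantile slices, a weighted local-linear (outer-product-of-gradients style) fit is run on each slice indicator, and the averaged gradient outer products are eigendecomposed to recover a basis of the central subspace. The function name, Gaussian kernel, and bandwidth rule are all assumptions made for the example.

```python
import numpy as np

def sliced_opg(X, y, n_slices=5, d=1, bandwidth=None):
    """Illustrative sketch only: slice the response, estimate local gradients of
    each slice indicator by weighted local-linear fits, and aggregate their
    outer products; leading eigenvectors estimate the central subspace."""
    n, p = X.shape
    Z = (X - X.mean(0)) / X.std(0)                   # standardized predictors
    h = bandwidth or 1.5 * n ** (-1.0 / (p + 4))     # rough rule-of-thumb bandwidth
    edges = np.quantile(y, np.linspace(0, 1, n_slices + 1))
    M = np.zeros((p, p))
    for s in range(n_slices):
        lo = -np.inf if s == 0 else edges[s]
        ind = ((y > lo) & (y <= edges[s + 1])).astype(float)   # slice indicator
        for i in range(n):
            w = np.exp(-0.5 * np.sum((Z - Z[i]) ** 2, axis=1) / h ** 2)
            A = np.hstack([np.ones((n, 1)), Z - Z[i]])
            sw = np.sqrt(w)
            coef, *_ = np.linalg.lstsq(A * sw[:, None], ind * sw, rcond=None)
            M += np.outer(coef[1:], coef[1:])        # local gradient outer product
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, -d:]                              # estimated basis (standardized scale)

# toy usage: y depends on X only through its first coordinate
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 4))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
print(sliced_opg(X, y).ravel())                      # ideally loads mainly on coordinate 1
```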
Slicing estimation is one of the most widely used
methods in the sufficient dimension reduction area. However,
for many inverse regression methods the efficacy of slicing
estimation depends heavily on the choice of the number of
slices when the response variable is continuous. This choice
is similar to, but more difficult than, classical
tuning-parameter selection in nonparametric function
estimation. Thus, how to select the slice number is a
longstanding and still open problem. In this paper, we propose a binary response
transformation-expectation (BRTE) method. It completely
avoids selecting the number of slices while preserving
the integrity of the original central subspace.
This generic method also ensures the root $n$ consistency
and the asymptotic normality of slicing estimators for many
inverse regression methods, and can be applied to
multivariate response cases. Finally, BRTE is compared with
the existing estimators by extensive simulations and an
illustrative real data example.
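As an illustration of how a binary transformation can remove the slice-number choice, here is a minimal sketch of one assumed form of the idea (not necessarily the authors' BRTE estimator): every observed response value t defines a binary response 1{Y <= t}, the SIR-type inverse moments of these indicators are averaged over t, and the leading eigenvectors of the averaged kernel matrix estimate the central subspace.

```python
import numpy as np

def binary_transform_sir(X, y, d=1):
    """Sketch of a slice-free inverse regression estimator: average the
    moments E[Z * 1{Y <= t}] over all observed thresholds t (an assumed
    form of the binary transformation-expectation idea)."""
    n, p = X.shape
    L = np.linalg.cholesky(np.cov(X, rowvar=False))
    Z = (X - X.mean(0)) @ np.linalg.inv(L).T         # whitened predictors
    M = np.zeros((p, p))
    for t in y:                                      # expectation over thresholds
        m = (Z * (y <= t)[:, None]).mean(0)          # estimate of E[Z 1{Y <= t}]
        M += np.outer(m, m)
    M /= n
    vals, vecs = np.linalg.eigh(M)
    B = np.linalg.inv(L).T @ vecs[:, -d:]            # back to the original scale
    return B / np.linalg.norm(B, axis=0)

# toy usage
rng = np.random.default_rng(1)
X = rng.standard_normal((300, 5))
y = X[:, 0] + 0.5 * rng.standard_normal(300)
print(binary_transform_sir(X, y).ravel())            # dominated by coordinate 1
```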
A central limit theorem (CLT) for linear spectral
statistics (LSS) of a product of a large dimensional sample
covariance matrix and a nonnegative definite Hermitian
matrix was established in Bai and Silverstein (2004).
However, their results do not cover the case of the product of
one sample covariance matrix and the inverse of another sample
covariance matrix, independent of the first (the F matrix).
This is because, for the F matrix, their CLT establishes the
asymptotic normality of the difference of two dependent
statistics, defined respectively by the empirical spectral
distribution (ESD) of the F matrix and by the ESD of the inverse
of the second sample covariance matrix. In many applications
of the F matrix, however, one is interested in making statistical
inference for a parameter defined by the limiting spectral
distribution (LSD) of the F matrix, and hence in the asymptotic
distribution of the difference between that parameter and the
estimator defined by the LSS of the F matrix. In this paper, we
shall establish the CLT for the LSS of the F matrix. As a
consequence, we shall also establish the CLT for the LSS of
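For readers unfamiliar with the object of study, the short simulation below (purely illustrative, with arbitrary dimensions) builds an F matrix from two independent sample covariance matrices and evaluates one linear spectral statistic, the sum of log eigenvalues; the fluctuations of such statistics across replications are what a CLT of this kind describes.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n1, n2, reps = 50, 200, 300, 300   # arbitrary illustrative dimensions
stats = []
for _ in range(reps):
    X1 = rng.standard_normal((n1, p))
    X2 = rng.standard_normal((n2, p))
    S1 = X1.T @ X1 / n1               # first sample covariance matrix
    S2 = X2.T @ X2 / n2               # independent second sample covariance matrix
    F = S1 @ np.linalg.inv(S2)        # the F matrix
    lam = np.linalg.eigvals(F).real   # its eigenvalues (real and positive here)
    stats.append(np.log(lam).sum())   # one LSS: f(x) = log x
stats = np.array(stats)
print("LSS mean:", stats.mean().round(3), " sd:", stats.std().round(3))
```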
This study considers a functional clustering method, k-centers functional clustering, for random curves. The k-centers functional clustering approach accounts for both the mean and the modes of variation differentials among clusters, and predicts cluster memberships via projection and reclassification. The distance measures considered include the L2 distance and the functional correlation defined in this study, which are embedded in the clustering criteria. The cluster membership predictions are based on nonparametric random effect models of the truncated Karhunen-Loève expansion, coupled with a nonparametric iterative mean and covariance updating scheme. The properties of the proposed clustering methods shed light on the quality of the resulting clusters. Simulation studies and practical examples illustrate the performance of the proposed methods.
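A stripped-down sketch of the projection-and-reclassification idea, under simplifying assumptions (densely observed curves on a common grid, plain L2 distance, no measurement-error modelling), is given below; it is illustrative only and not the authors' estimation procedure.

```python
import numpy as np

def k_centers_fc(curves, k=2, n_comp=2, n_iter=10, seed=0):
    """Rough sketch of k-centers functional clustering: each cluster keeps its
    own mean curve and leading eigenfunctions; curves are re-assigned to the
    cluster whose truncated Karhunen-Loeve expansion reconstructs them best
    in L2 distance. Discretized curves stand in for functional data."""
    rng = np.random.default_rng(seed)
    n = curves.shape[0]
    labels = rng.integers(0, k, n)                   # random initial memberships
    for _ in range(n_iter):
        means, bases = [], []
        for c in range(k):
            grp = curves[labels == c]
            if len(grp) == 0:
                grp = curves[rng.integers(0, n, 2)]
            mu = grp.mean(0)
            # leading eigenfunctions of the within-cluster covariance
            _, _, Vt = np.linalg.svd(grp - mu, full_matrices=False)
            means.append(mu)
            bases.append(Vt[:min(n_comp, Vt.shape[0])])
        # reclassification step: project and measure reconstruction error
        errs = np.empty((n, k))
        for c in range(k):
            centred = curves - means[c]
            recon = centred @ bases[c].T @ bases[c]
            errs[:, c] = np.sum((centred - recon) ** 2, axis=1)
        labels = errs.argmin(1)
    return labels

# toy usage: two groups of curves with different mean shapes
t = np.linspace(0, 1, 50)
rng = np.random.default_rng(2)
g1 = np.sin(2 * np.pi * t) + 0.3 * rng.standard_normal((40, 50))
g2 = np.cos(2 * np.pi * t) + 0.3 * rng.standard_normal((40, 50))
print(k_centers_fc(np.vstack([g1, g2]), k=2))
```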
Dimension reduction has long been an important technique
for high-dimensional data analysis. Principal component
analysis (PCA), canonical correlation analysis (CCA), and
sliced inverse regression (SIR) are important tools for
linear dimension reduction in classical statistical
analysis. In this talk we will introduce their nonlinear
extensions using kernel methods.
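As one concrete example of such a kernelisation, the sketch below implements plain kernel PCA with a Gaussian kernel (the kernel choice and bandwidth are arbitrary illustrations); kernel CCA and kernel SIR follow the same pattern of replacing inner products by a centred Gram matrix.

```python
import numpy as np

def kernel_pca(X, n_comp=2, gamma=1.0):
    """Minimal kernel PCA sketch with a Gaussian kernel: centre the Gram
    matrix in feature space and eigendecompose it; scaled eigenvectors
    give the nonlinear principal component scores."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))  # Gaussian Gram matrix
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J                                                  # double centring
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:n_comp]
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0))         # component scores

# toy usage: two concentric circles become separable in the first kernel components
rng = np.random.default_rng(5)
theta = rng.uniform(0, 2 * np.pi, 200)
r = np.where(np.arange(200) < 100, 1.0, 3.0)
X = np.c_[r * np.cos(theta), r * np.sin(theta)] + 0.05 * rng.standard_normal((200, 2))
print(kernel_pca(X, n_comp=2, gamma=2.0)[:3])
```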
The penalized least squares method with some appropriately defined penalty is widely used for simultaneous variable selection and coefficient estimation in linear regression.
However, least squares (LS) based methods may be adversely affected by outlying observations and heavy-tailed distributions.
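For concreteness, here is a tiny coordinate-descent sketch of one such penalized least squares estimator, the lasso; it is meant only to fix ideas about simultaneous selection and estimation, and is not the (presumably more robust) method discussed in the talk.

```python
import numpy as np

def lasso_cd(X, y, lam=0.1, n_iter=100):
    """Tiny coordinate-descent sketch of L1-penalised least squares (lasso):
    cycle through coordinates, soft-thresholding each partial-residual fit."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]                  # partial residual
            z = X[:, j] @ r / n
            beta[j] = np.sign(z) * max(abs(z) - lam, 0) / (X[:, j] @ X[:, j] / n)
    return beta

# toy usage: only the first three coefficients are truly nonzero
rng = np.random.default_rng(3)
X = rng.standard_normal((100, 10))
beta_true = np.zeros(10)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.5 * rng.standard_normal(100)
print(np.round(lasso_cd(X, y, lam=0.1), 2))   # mostly zeros outside the first three
```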
Machine learning is often attempted through clustering
and/or classification of multidimensional input data. While
classification and clustering are used in supervised and
unsupervised learning, respectively, there are clustering
problems in the case of partially supervised learning, where
the classes represented in the training data are far from
being exhaustive. In all these cases, the problem of high
dimensionality has to be addressed. We consider dimension
reduction for clustering on the basis of a mixture model,
where observations are normally distributed around a cluster
center, and cluster centers also have a multivariate normal
distribution. We propose an intuitively appealing objective
function for this problem, and work out a solution in the
cases of unsupervised and partially supervised clustering.
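The toy snippet below only illustrates the data-generating model described above (observations normal around their cluster centres, the centres themselves multivariate normal) together with one naive reduction step, projecting onto the leading eigenvectors of the between-centre scatter from a preliminary k-means fit; the objective function proposed in the talk is not reproduced here, and all dimensions and variances are arbitrary choices.

```python
import numpy as np
from sklearn.cluster import KMeans

# simulate from the hierarchical model: centres ~ N(0, 9 I), observations ~ N(centre, I)
rng = np.random.default_rng(1)
p, k, n_per = 20, 3, 100
centers = rng.normal(scale=3.0, size=(k, p))
X = np.vstack([c + rng.standard_normal((n_per, p)) for c in centers])

# naive reduction (not the talk's objective): project onto the leading
# eigenvectors of the between-centre scatter from a preliminary k-means fit
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
B = np.cov(km.cluster_centers_, rowvar=False)
vals, vecs = np.linalg.eigh(B)
X_reduced = (X - X.mean(0)) @ vecs[:, -2:]     # 2-D representation used for clustering
print(X_reduced.shape)
```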
We shall consider (square) matrices with random entries
(real or complex), for example the sample variance-covariance
matrix, the i.i.d. matrix, the Wigner matrix, and the Toeplitz
matrix, where the dimension grows to infinity. Properties of
the eigenvalues of such matrices are of interest.
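As a minimal illustration of the kind of spectral behaviour involved, the snippet below generates one scaled real Wigner matrix and checks that its eigenvalues approximately fill the semicircle support [-2, 2]; the scaling convention is one common choice.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
A = rng.standard_normal((n, n))
W = (A + A.T) / np.sqrt(2 * n)        # symmetric Wigner matrix, off-diagonal variance 1/n
eig = np.linalg.eigvalsh(W)           # its n real eigenvalues
print(round(eig.min(), 2), round(eig.max(), 2))         # close to the semicircle support [-2, 2]
hist, edges = np.histogram(eig, bins=40, density=True)  # empirical spectral density
```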
We discuss inference for two processes in the context
of functional data analysis, including canonical
correlations and regression. The common approach defines
canonical variables or regressors in terms of projections in
a Hilbert space. While this is conceptually
straightforward, it has a number of weaknesses. We describe
an approach that does not require the specification of a
Hilbert space, which leads to theories and more general
Functional Magnetic Resonance Imaging (fMRI) has been used by neuroscientists as a powerful tool to study
Model selection, dimension
reduction and liquid association: a trilogy via Stein’s
In this talk, I will describe how a basic idea from Stein’s
monumental work in decision theory has led to my earlier
research in model selection (generalized cross validation,
honest confidence region), dimension reduction (sliced
inverse regression and principal Hessian directions) and,
more recently, to the development of liquid association for
This talk introduces Functional Mixture Regression (FMR), a
natural and useful extension of the classical functional
linear regression (FLR) model. FMR generalizes FLR
essentially in the same way as linear mixture regression
generalizes linear regression. That is, the observed
predictor random processes are allowed to form sub-groups in
such a way that each sub-group will have its own regression
parameter function. In this talk both theoretical and
empirical properties of FMR will be discussed.
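A rough sketch of the idea, under strong simplifying assumptions (predictor curves reduced to a few FPC scores, a scalar response, Gaussian errors, and a plain EM loop), is given below; it is illustrative only and not the estimation procedure of the talk.

```python
import numpy as np

def fmr_em(curves, y, k=2, n_comp=3, n_iter=50, seed=0):
    """Toy sketch of functional mixture regression: reduce the predictor
    curves to FPC scores, then fit a k-component mixture of linear
    regressions on the scores with a small EM loop."""
    rng = np.random.default_rng(seed)
    n = curves.shape[0]
    mu = curves.mean(0)
    _, _, Vt = np.linalg.svd(curves - mu, full_matrices=False)
    S = np.hstack([np.ones((n, 1)), (curves - mu) @ Vt[:n_comp].T])   # intercept + FPC scores
    resp = rng.dirichlet(np.ones(k), size=n)        # random initial responsibilities
    betas = np.zeros((k, S.shape[1]))
    sig2 = np.ones(k)
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # M-step: weighted least squares per component
        for c in range(k):
            w = resp[:, c]
            A = S * w[:, None]
            betas[c] = np.linalg.solve(S.T @ A + 1e-8 * np.eye(S.shape[1]), A.T @ y)
            r = y - S @ betas[c]
            sig2[c] = max((w * r ** 2).sum() / w.sum(), 1e-6)
            pi[c] = w.mean()
        # E-step: Gaussian responsibilities
        dens = np.stack([pi[c] / np.sqrt(sig2[c])
                         * np.exp(-0.5 * (y - S @ betas[c]) ** 2 / sig2[c])
                         for c in range(k)], axis=1) + 1e-300
        resp = dens / dens.sum(1, keepdims=True)
    return resp.argmax(1), betas

# toy usage: two sub-groups whose regression functions have opposite signs
t = np.linspace(0, 1, 60)
rng = np.random.default_rng(4)
curves = rng.standard_normal((120, 3)) @ np.vstack([np.sin(np.pi * t), np.cos(np.pi * t), t])
score = curves @ np.sin(np.pi * t) / 60            # crude integral of X(t) sin(pi t)
y = np.where(np.arange(120) < 60, 2 * score, -2 * score) + 0.1 * rng.standard_normal(120)
labels, _ = fmr_em(curves, y, k=2)
print(labels[:10], labels[-10:])                   # the two halves should mostly differ
```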