
Workshop on High-dimensional Data Analysis
(27 – 29 Feb 2008)
Jointly organized with the Department of Statistics & Applied Probability
~ Abstracts ~
Sliced regression for dimension
reduction
Hansheng Wang, Peking University, China
By slicing the region of the response (Li, 1991) and
applying local kernel regression (MAVE; Xia et al., 2002) to
each slice, a new dimension reduction method is proposed.
Compared with traditional inverse regression methods,
e.g., sliced inverse regression (Li, 1991), the new method is
free of the linearity condition (Li, 1991) and enjoys much
improved estimation accuracy. Compared with direct
estimation methods (e.g., MAVE), the new method is much more
robust against extreme values and can capture the entire
central subspace (CS; Cook, 1998) exhaustively. To determine the
CS dimension, a consistent cross-validation (CV) criterion
is developed. Extensive numerical studies, including one real
example, confirm our theoretical findings.
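As a rough illustration of the slicing idea (a sketch only, not the authors' estimator: ordinary within-slice least squares stands in for the local kernel regression step, and all function names are our own), one can slice the response, fit a regression within each slice, and aggregate the slope directions:

```python
import numpy as np

def sliced_regression_directions(X, y, n_slices=5, d=1):
    """Slice the response, fit a linear regression within each slice,
    and aggregate the slope directions.  Ordinary least squares stands
    in for the local kernel (MAVE) step of the actual method."""
    n, p = X.shape
    edges = np.quantile(y, np.linspace(0, 1, n_slices + 1))
    M = np.zeros((p, p))            # accumulates slope outer products
    for h in range(n_slices):
        mask = (y >= edges[h]) & (y <= edges[h + 1])
        if mask.sum() <= p:         # skip slices too small to fit
            continue
        Xc = X[mask] - X[mask].mean(axis=0)
        beta, *_ = np.linalg.lstsq(Xc, y[mask] - y[mask].mean(), rcond=None)
        M += np.outer(beta, beta) * mask.mean()
    # leading eigenvectors of M estimate the central subspace
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, ::-1][:, :d]

# toy single-index model: y depends on X only through X @ b
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
b = np.array([1.0, 1.0, 0.0, 0.0, 0.0]) / np.sqrt(2)
y = (X @ b) ** 3 + 0.1 * rng.normal(size=2000)
B = sliced_regression_directions(X, y, n_slices=10, d=1)
```

In this toy setting the leading eigenvector of the aggregated matrix aligns closely with the true index direction b.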
A binary response
transformation-expectation estimation in dimension reduction
Lixing Zhu, The Hong Kong Baptist University, Hong Kong
Slicing estimation is one of the most popular
methods in the sufficient dimension reduction area. However,
the efficacy of slicing estimation for many inverse
regression methods depends heavily on the choice of the slice
number when the response variable is continuous. This choice is
similar to, but more difficult than, classical tuning parameter
selection in nonparametric function estimation. Thus, how to
select the slice number is a long-standing, and still open,
problem. In this paper, we propose a binary response
transformation-expectation (BRTE) method. It completely
avoids selecting the number of slices, and meanwhile
preserves the integrity of the original central subspace.
This generic method also ensures the root-$n$ consistency
and asymptotic normality of slicing estimators for many
inverse regression methods, and can be applied to
multivariate response cases. Finally, BRTE is compared with
existing estimators through extensive simulations and an
illustrative real data example.
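To illustrate the binary-transformation idea in a minimal form (a sketch only, not the authors' BRTE estimator: it dichotomizes the response at each observed value and averages the resulting SIR-type inverse-regression kernels, so no slice number is needed):

```python
import numpy as np

def brte_style_directions(X, y, d=1):
    """For each threshold t, form the binary response I(y <= t),
    compute the two group means of the whitened predictors, and
    average the SIR-type kernels over all thresholds.  Illustration
    of the binary-transformation idea, not the authors' estimator."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    L = np.linalg.cholesky(np.cov(X, rowvar=False))
    Z = np.linalg.solve(L, Xc.T).T           # whitened predictors
    M = np.zeros((p, p))
    for t in y:                              # every observed y is a cut point
        ind = y <= t
        q = ind.mean()
        if q == 1.0:                         # degenerate split, skip
            continue
        m1, m0 = Z[ind].mean(axis=0), Z[~ind].mean(axis=0)
        # kernel Cov(E[Z | I(y <= t)]) = q m1 m1' + (1-q) m0 m0'
        M += (q * np.outer(m1, m1) + (1 - q) * np.outer(m0, m0)) / n
    vals, vecs = np.linalg.eigh(M)
    eta = vecs[:, ::-1][:, :d]
    beta = np.linalg.solve(L.T, eta)         # back to the original scale
    return beta / np.linalg.norm(beta, axis=0)

# toy monotone single-index model
rng = np.random.default_rng(0)
X = rng.normal(size=(1500, 5))
b = np.array([1.0, -1.0, 0.0, 0.0, 0.0]) / np.sqrt(2)
y = np.exp(X @ b) + 0.2 * rng.normal(size=1500)
B = brte_style_directions(X, y)
```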
Central limit theorem for linear
spectral statistics of large dimensional F matrix
Shurong Zheng, Northeast Normal University, China
A central limit theorem (CLT) for linear spectral
statistics (LSS) of the product of a large dimensional sample
covariance matrix and a nonnegative definite Hermitian
matrix was established in Bai and Silverstein (2004).
However, their results do not cover the product of
one sample covariance matrix and the inverse of another
covariance matrix independent of it (the F matrix).
This is because, for the F matrix, their CLT only establishes the
asymptotic normality of the difference of two dependent
statistics, defined by the empirical spectral distribution (ESD)
of the F matrix and by the ESD of the inverse of the second
sample covariance matrix. In many applications
of the F matrix, however, one is interested in making statistical
inference for a parameter defined by the limiting spectral
distribution (LSD) of the F matrix, and hence in
the asymptotic distribution of the difference between this
parameter and the estimator defined by the LSS of the F matrix. In
this paper, we establish the CLT for the LSS of the F matrix.
As a consequence, we also establish the CLT for the LSS of the
beta matrix.
Key words and phrases: Linear spectral statistics, central
limit theorem, large dimensional random matrix, large
dimensional data analysis.
Clustering curves via subspace
projection
Jeng-Min Chiou, Institute of Statistical Science,
Academia Sinica, Taiwan
This study considers a functional clustering method,
k-centers functional clustering, for random curves. The
k-centers functional clustering approach accounts for both
the mean and the modes of variation differentials among
clusters, and predicts cluster memberships via projection
and reclassification. The distance measures considered
include the L_{2} distance and the functional
correlation defined in this study, which are embedded in the
clustering criteria. The cluster membership predictions are
based on nonparametric random effect models of the truncated
Karhunen-Loève expansion, coupled with a nonparametric
iterative mean and covariance updating scheme. The
properties of the proposed clustering methods unravel the
cluster qualities. Simulation studies and practical examples
illustrate the practical performance of the proposed
methods.
Nonlinear dimension reduction
with kernel methods
Su-Yun Huang, Institute of Statistical Science, Academia
Sinica, Taiwan
Dimension reduction has long been an important technique
for high-dimensional data analysis. Principal component
analysis (PCA), canonical correlation analysis (CCA), and
sliced inverse regression (SIR) are important tools in
classical statistical analysis for linear dimension
reduction. In this talk we will introduce their nonlinear
extensions using kernel methods.
The essence of kernel-based nonlinear dimension reduction is
to map the pattern data, originally observed in Euclidean
space, to a high-dimensional Hilbert space, called the feature
space, by an appropriate kernel transformation.
Low-dimensional projections of high-dimensional feature data
are approximately elliptically contoured and approximately
Gaussian distributed. The notions of PCA, CCA and SIR can be
extended to the framework of the kernel-associated feature
Hilbert space, known as a reproducing kernel Hilbert space,
for nonlinear dimension reduction. Computing algorithms,
including large-data handling, and numerical examples will be
presented.
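As a minimal sketch of the kernel extension of PCA described above (assuming a Gaussian kernel; the function name and parameters are our own, not the speaker's implementation):

```python
import numpy as np

def kernel_pca(X, n_components=2, gamma=1.0):
    """Minimal kernel PCA sketch with a Gaussian (RBF) kernel: form the
    kernel matrix, double-center it (centering in feature space), and
    read nonlinear principal component scores off the leading
    eigenvectors scaled by the square roots of their eigenvalues."""
    n = X.shape[0]
    sq = np.sum(X**2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J                           # center in feature space
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:n_components]
    # scores of the training points on the leading feature-space axes
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0))

# toy nonlinear structure: two concentric noisy circles
rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, size=200)
r = np.repeat([1.0, 3.0], 100)
X = np.c_[r * np.cos(t), r * np.sin(t)] + 0.05 * rng.normal(size=(200, 2))
scores = kernel_pca(X, n_components=2, gamma=0.5)
```

Because the kernel matrix is centered in feature space, the resulting component scores are automatically mean-zero.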
Variable selection and coefficient estimation via regularized rank regression
Chenlei Leng, National University of Singapore
The penalized least squares method with some appropriately defined penalty is widely used for simultaneous variable selection and coefficient estimation in linear regression.
However, least squares (LS) based methods may be adversely affected by outlying observations and heavy-tailed distributions.
On the other hand, the least absolute deviation (LAD) estimator is more robust, but may be inefficient for many distributions of interest.
To overcome these issues, we propose a novel method, termed the regularized rank regression estimator, which combines the LAD and penalized LS methods for variable selection. We show that the proposed estimator has attractive theoretical properties and is easy to implement.
Simulations and real data analysis both show that the proposed method performs well in finite-sample cases.
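A minimal sketch of one way a rank-based loss can be combined with an L1 penalty (the Jaeckel/Wilcoxon dispersion rewritten as an LAD fit on pairwise differences, solved by an IRLS/MM loop; an illustration under our own tuning choices, not the authors' proposed estimator or algorithm):

```python
import numpy as np

def rank_lasso(X, y, lam=0.1, n_iter=30, eps=1e-4):
    """Minimise the Jaeckel dispersion sum_{i<j} |r_i - r_j| (an LAD
    problem on pairwise differences) plus an L1 penalty, by iteratively
    reweighted least squares.  Illustration only."""
    i, j = np.triu_indices(X.shape[0], k=1)
    D, dy = X[i] - X[j], y[i] - y[j]               # pairwise differences
    beta = np.linalg.lstsq(D, dy, rcond=None)[0]   # least-squares start
    for _ in range(n_iter):
        w = 1.0 / np.maximum(np.abs(dy - D @ beta), eps)  # LAD weights
        P = lam * np.diag(1.0 / (np.abs(beta) + eps))     # L1 weights
        beta = np.linalg.solve(D.T @ (D * w[:, None]) + P, D.T @ (w * dy))
    return beta

# heavy-tailed noise: a setting where LS-based selection can suffer
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
beta_true = np.array([2.0, 0.0, 0.0, -1.5, 0.0])
y = X @ beta_true + rng.standard_t(df=2, size=100)
beta_hat = rank_lasso(X, y, lam=5.0)
```

The IRLS loop majorizes both absolute-value terms by weighted quadratics, so each iteration reduces to one weighted least-squares solve; unlike an exact L1 solver it shrinks small coefficients toward zero without setting them exactly to zero.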
Dimension reduction for
unsupervised and partially supervised learning
Debasis Sengupta, Indian Statistical Institute, India
Machine learning is often attempted through clustering
and/or classification of multidimensional input data. While
classification and clustering are used in supervised and
unsupervised learning, respectively, there are also clustering
problems in the case of partially supervised learning, where
the classes represented in the training data are far from
being exhaustive. In all these cases, the problem of high
dimensionality has to be addressed. We consider dimension
reduction for clustering on the basis of a mixture model,
where observations are normally distributed around a cluster
center, and cluster centers also have a multivariate normal
distribution. We propose an intuitively appealing objective
function for this problem, and work out a solution in the
cases of unsupervised and partially supervised clustering.
We apply the methods to the problem of pugmark-based
estimation of the total tiger population, and to that of
clustering organisms in terms of the tetranucleotide content
pattern of ribosomal DNA sequences.
Spectra of large
dimensional random matrices (LDRM)
Arup Bose, Indian Statistical Institute, India
We shall consider (square) matrices with random entries
(real or complex), for example the sample variance-covariance
matrix, the IID matrix, the Wigner matrix, the Toeplitz
matrix, etc., where the dimension grows to infinity.
Properties of the eigenvalues of such matrices are of interest.
In this talk we will mostly look at real symmetric matrices
and discuss, in a broad way, the limiting spectral
distribution (LSD) of these matrices under suitable
conditions.
We shall provide some simulations with these matrices, a loose
description of some results on the LSD, and pose some questions
which should be of interest to statisticians and
probabilists.
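A minimal simulation of the kind mentioned above, assuming a real symmetric Wigner matrix with Gaussian entries; after scaling by sqrt(n), the empirical spectral distribution approaches the semicircle law on [-2, 2]:

```python
import numpy as np

# Build an n x n real symmetric (Wigner) matrix with unit-variance
# entries, scale by sqrt(n), and inspect its empirical spectral
# distribution (ESD).
rng = np.random.default_rng(1)
n = 1000
A = rng.normal(size=(n, n))
W = (A + A.T) / np.sqrt(2)                  # symmetric, unit-variance entries
eigs = np.linalg.eigvalsh(W) / np.sqrt(n)   # ESD support approaches [-2, 2]
# semicircle density: f(x) = sqrt(4 - x^2) / (2*pi) on [-2, 2]
hist, bin_edges = np.histogram(eigs, bins=40, range=(-2.2, 2.2), density=True)
```

At n = 1000 the histogram already tracks the limiting density closely; for instance, near zero the semicircle density equals 1/pi.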
RKHS formulations
of some functional data analysis problems
Tailen Hsing, University of Michigan, USA
We discuss the inference of two problems in the context
of functional data analysis, namely canonical
correlations and regression. The common approach defines
canonical variables or regressors in terms of projections in
a Hilbert space. While this is conceptually
straightforward, it has a number of weaknesses. We describe
an approach that does not require the specification of a
Hilbert space, which leads to theories and more general
inference procedures.
Supervised singular value decomposition and its application to independent component analysis for fMRI
Young Truong, The University of North Carolina, USA
Functional magnetic resonance imaging (fMRI) has been used by neuroscientists as a powerful tool to study
brain functions. Independent component analysis (ICA) is an effective method to explore spatiotemporal features in fMRI data.
It has been especially successful in recovering brain-function-related signals from recorded mixtures of unrelated signals. Due to the high
sensitivity of MR scanners, spikes are commonly observed in fMRI data, and they deteriorate the analysis. No particular method
exists yet to address this problem. In this paper, we introduce a supervised singular value decomposition technique into the data
reduction step of ICA. Two major advantages are discussed: first, the proposed method improves the robustness of ICA against spikes;
second, the method uses the particular fMRI experimental design to guide the otherwise fully data-driven ICA, making the computation more
efficient. The advantages are demonstrated using a spatiotemporal simulation study as well as a real data analysis. This is joint work with Bai, P., Shen, H. and Huang, X.
Model selection, dimension
reduction and liquid association: a trilogy via Stein’s
lemma
Ker-Chau Li, Institute of Statistical Science, Academia
Sinica, Taiwan and
University of California, Los Angeles, USA
In this talk, I will describe how a basic idea from Stein’s
monumental work in decision theory has led to my earlier
research in model selection (generalized cross validation,
honest confidence region), dimension reduction (sliced
inverse regression and principal Hessian direction) and more
recently in the development of liquid association for
bioinformatics applications.
References:
Li, K. C. (1985). From Stein's unbiased risk estimates to
the method of generalized cross validation. Ann. Statist. 13,
1352-1377.
Li, K. C. (1992). On principal Hessian directions for data
visualization and dimension reduction: another application
of Stein's lemma. J. Amer. Statist. Assoc. 87, 1025-1039.
Li, K. C., Palotie, A., Yuan, S., Bronnikov, D., Chen, D., Wei, X.,
Choi, O., Saarela, J. and Peltonen, L. (2007). Finding candidate
disease genes by liquid association. Genome Biology, 8,
R205. doi:10.1186/gb-2007-8-10-r205
Functional mixture regression
Thomas Lee, The Chinese University of Hong Kong
This talk introduces Functional Mixture Regression (FMR), a
natural and useful extension of the classical functional
linear regression (FLR) model. FMR generalizes FLR
essentially in the same way as linear mixture regression
generalizes linear regression. That is, the observed
predictor random processes are allowed to form subgroups in
such a way that each subgroup will have its own regression
parameter function. In this talk, both theoretical and
empirical properties of FMR will be discussed.
This is joint work with Yuejiao Fu and Fang Yao.

