Learning Local Feature Relevance for Classification and Clustering (NSF CAREER Award)

Abstract
Pattern classification is a broad research area with numerous applications ranging from science, engineering, target marketing,
medical diagnosis, and electronic commerce to weather forecasting based on satellite imagery. While pattern classification has shown
promise in many areas of practical significance, it faces difficult challenges posed by real-world problems, of which the most
pronounced is the curse of dimensionality. The emphasis of this project is on the design of novel classification and clustering
techniques to mitigate the curse of dimensionality and reduce bias, by estimating feature relevance and selecting features accordingly.
In particular, this project has the following specific and measurable objectives: (1) develop non-linear and flexible metrics for
distance-based classifiers via kernel methods; (2) construct effective ensembles by exploiting local feature relevance to perform
adaptive sampling in feature space, an approach that takes advantage of the high dimensionality of the data; and (3) develop adaptive
metrics for subspace clustering by measuring local correlation of data with respect to different dimensions. Almost all problems
of practical interest are high-dimensional. Thus, our research will have significant impact in fields and applications as diverse as
bioinformatics, security and intrusion detection, and information and image retrieval. Our collaborative effort with biologists on the
analysis of microarray data has the potential to contribute new data mining techniques for the HIV genomic knowledge domain,
which will eventually lead to customized diagnosis and treatment of AIDS.