Research

Summary

Digital data is being generated at unprecedented speed and with an extremely large number of dimensions. Machine learning and data mining can help us keep pace with rapidly advancing data gathering and storage techniques and mine useful patterns from high-dimensional data. Semi-supervised learning can be interpreted either as supervised learning that uses additional information from unlabeled data or as unsupervised learning guided by constraints derived from labeled data. This research addresses two pressing issues with massive data: high dimensionality and a shortage of labeled data. In particular, the project is (1) investigating semi-supervised feature selection to remove irrelevant features; (2) studying the combination of feature extraction and model selection to further reduce dimensionality; and (3) developing a novel framework that integrates feature selection and feature extraction based on sparse learning. The study is an explicit attempt to connect and unify feature selection and extraction for hypothesis space reduction. The project directly facilitates basic machine learning research and practical data mining, and advances innovative research beyond feature selection and extraction. The work engages students in both teaching and research, and the algorithms, tools, and databases will be made publicly available for research purposes and for use as teaching resources.

Funding

National Science Foundation, Division of Information and Intelligent Systems

Timeline

September 2008 — August 2012

Research

Beyond Feature Selection and Extraction – an Integrated Framework for High-Dimensional Data of Small Labeled Samples
