PhD Dissertation Defense Abstract: Shen-Shyang Ho
Learning From Data Streams Using Transductive Inference and Martingale
Streaming data is ubiquitous, e.g., data generated from wireless sensor networks, web
logs and click streams, ATM transactions, phone call records, and multimedia video
data. One seeks to discover knowledge and to extract interesting patterns from the
data stream. Such knowledge can be extremely useful for commercial, military, and
homeland security purposes, among others. To ensure that what is learned is useful,
one seeks to describe the data generating process using the best available model. This
corresponds to model selection. Two questions that affect model selection in an
online data-streaming setting are explored, one of content ("what") and one of
time ("when"):
- What are the data points needed to build a good predictive model?
- When does the data generation model change?
These questions correspond to the active learning problem and change detection
problem, respectively. The active learning problem involves selecting informative,
as yet unlabeled data points to label. The solution to this problem aims to build a
good model while labeling as few data points as possible. The change
detection problem involves the recognition of deviation from the existing data
generation model. The solution to this problem aims to detect the change as fast as
possible.
In this dissertation, an active learning strategy based on transductive inference and
a change detection strategy using martingales are proposed. The contributions
of this dissertation are as follows:
- An active learning strategy based on transductive inference is proposed and
justified. The active learning strategy is empirically shown to be feasible and
compares favorably with other stream-based active learning strategies.
- Change detection in data streams based on testing the exchangeability condition
using a martingale is proposed. The feasibility of the two proposed martingale tests for
change detection is shown empirically on both labeled and unlabeled data points. The
advantages of our novel one-pass incremental martingale change detection method are
that it (i) does not require a sliding window on the data stream, (ii) does not
require explicitly monitoring performance (e.g., classification error) as data points
stream in, and (iii) works well for high-dimensional data streams. The change
detection method is used to implement (i) an online adaptive learning algorithm for
labeled data streams, which compares favorably with a sliding-window method; and (ii) a
video-shot change detector for unlabeled video streams, which compares favorably with
some standard methods.
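
To illustrate how exchangeability can be tested with a martingale in a single pass, the sketch below uses one standard construction, the randomized power martingale. The abstract does not specify the two proposed tests, so everything here is an assumption made for illustration: the strangeness measure (distance to the running mean of a one-dimensional stream), the parameter names `epsilon` and `threshold`, their default values, and the restart after each alarm. It keeps no sliding window and monitors no classification error, but it does store all strangeness scores since the last restart and rescans them at every step.

```python
import math
import random


def martingale_change_detector(stream, epsilon=0.92, threshold=20.0, seed=0):
    """Flag change points in a 1-D stream with a randomized power martingale.

    Illustrative assumptions (not taken from the abstract): strangeness is the
    distance to the running mean, and the martingale is
    M_n = prod_i epsilon * p_i ** (epsilon - 1), with M_0 = 1.
    """
    if not 0.0 < epsilon < 1.0:
        raise ValueError("epsilon must lie strictly between 0 and 1")
    rng = random.Random(seed)
    log_threshold = math.log(threshold)
    alphas = []    # strangeness scores since the last restart
    total = 0.0    # running sum of the points since the last restart
    log_m = 0.0    # log of the martingale value
    alarms = []

    for index, x in enumerate(stream):
        # Hypothetical strangeness: distance of the new point to the running mean.
        total += x
        mean = total / (len(alphas) + 1)
        alpha = abs(x - mean)
        alphas.append(alpha)

        # Randomized p-value: share of scores stranger than the newest one,
        # with ties weighted by theta drawn from (0, 1].
        greater = sum(1 for a in alphas if a > alpha)
        equal = sum(1 for a in alphas if a == alpha)  # includes the new point
        theta = 1.0 - rng.random()
        p_value = (greater + theta * equal) / len(alphas)

        # Multiplicative martingale update, carried out in log space.
        log_m += math.log(epsilon) + (epsilon - 1.0) * math.log(p_value)

        # A large martingale value is evidence against exchangeability.
        if log_m > log_threshold:
            alarms.append(index)
            alphas, total, log_m = [], 0.0, 0.0  # restart after a detection
    return alarms


if __name__ == "__main__":
    data_rng = random.Random(1)
    stream = [data_rng.gauss(0.0, 1.0) for _ in range(500)]
    stream += [data_rng.gauss(10.0, 1.0) for _ in range(500)]
    print(martingale_change_detector(stream))
```

For an exact exchangeability martingale, the probability that its value ever exceeds `threshold` while the data remain exchangeable is at most 1/`threshold`, so a larger `threshold` trades detection delay for fewer false alarms. Because this sketch never recomputes earlier strangeness scores, its p-values are only approximately uniform and that bound holds only approximately here.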