Data visualization is central to data analysis; the human eye has frequently been advocated as the ultimate data-mining tool. However, there has been surprisingly little work on visualizing massive time series datasets. To this end, we developed VizTree, a time series pattern discovery and visualization system based on augmenting suffix trees. VizTree visually summarizes both the global and local structures of time series data at the same time. In addition, it provides novel interactive solutions to many pattern discovery problems, including the discovery of frequently occurring patterns (motif discovery), surprising patterns (anomaly detection), and query by content. This interactive paradigm allows users to visually explore the time series and perform real-time hypothesis testing.

Click here for demo. The executable can be downloaded here (make sure you read the README first). The C# code is available upon request.


Step 1: Subsequence Extraction and Discretization (via SAX)

Using a sliding window of size n (user input), extract subsequences from the time series. Then discretize each subsequence via SAX. The figure shows the discretization of just one subsequence, which is converted to the string "baabccbc" (see the SAX website for more details). Do this for every subsequence (or only for selected subsequences if the numerosity option is turned on).
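The extraction and discretization step can be sketched in Python as follows. This is a minimal illustration, not the VizTree implementation (which is written in C#). The function names and the parameters `word_length` and `alphabet_size` are assumptions made for this example. It also assumes the window size is divisible by the word length, and it does not model the numerosity option.

```python
from statistics import NormalDist, mean, pstdev


def sax_word(subsequence, word_length, alphabet_size):
    """Discretize one subsequence into a SAX word (hypothetical helper)."""
    mu, sigma = mean(subsequence), pstdev(subsequence)
    # z-normalize; a constant subsequence is mapped to all zeros
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in subsequence]
    # Piecewise Aggregate Approximation: mean of each equal-width segment
    # (assumes len(subsequence) is divisible by word_length)
    seg = len(z) // word_length
    paa = [mean(z[i * seg:(i + 1) * seg]) for i in range(word_length)]
    # Breakpoints cut the standard normal curve into equiprobable regions
    cuts = [NormalDist().inv_cdf(k / alphabet_size)
            for k in range(1, alphabet_size)]
    # The letter index is the number of breakpoints the segment mean exceeds
    return "".join(chr(ord("a") + sum(v > c for c in cuts)) for v in paa)


def sliding_sax(series, n, word_length, alphabet_size):
    """Slide a window of size n over the series, one SAX word per position."""
    return [sax_word(series[i:i + n], word_length, alphabet_size)
            for i in range(len(series) - n + 1)]


words = sliding_sax([1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1],
                    n=6, word_length=3, alphabet_size=3)
```

Each entry of `words` corresponds to one window position, so the index of a word is also the start offset of its subsequence in the original series.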

Step 2: Insertion

Push the data into a depth-limited, augmented suffix tree. The frequencies of the strings are encoded as the thickness of branches. The following figure summarizes these steps.
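As a rough sketch of the insertion step, the tree can be approximated by a depth-limited trie in which every node keeps a count of the words passing through it. This is a simplification: the `Node` class, the `max_depth` parameter, and the stored `positions` list are assumptions for illustration, and the suffix-tree augmentation of the actual system is not reproduced here.

```python
class Node:
    """One branch of the tree (hypothetical structure for illustration)."""

    def __init__(self):
        self.count = 0        # number of words passing through this branch
        self.children = {}    # symbol -> child Node
        self.positions = []   # start offsets of subsequences ending here


def insert(root, word, position, max_depth):
    """Insert one SAX word, descending at most max_depth levels."""
    node = root
    for symbol in word[:max_depth]:
        node = node.children.setdefault(symbol, Node())
        node.count += 1
    node.positions.append(position)


def build_tree(words, max_depth):
    """Build the tree from SAX words listed in sliding-window order."""
    root = Node()
    for position, word in enumerate(words):
        insert(root, word, position, max_depth)
    return root


tree = build_tree(["abc", "abc", "abb", "cba"], max_depth=3)
```

The `count` on each node is the quantity that would be drawn as branch thickness, and `positions` lets a click on a branch recover the matching subsequences.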

VizTree Overview

The design of VizTree follows the Visual Information Seeking Mantra: "Overview, zoom & filter, details-on-demand" championed by Dr. Ben Shneiderman.

Since the frequency of each string (pattern) is encoded in the line thickness, thick branches represent frequent patterns, whereas thin branches represent infrequent or potentially anomalous patterns.
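One way to picture this encoding is shown below. The linear mapping from frequency to line width and the `min_width` and `max_width` values are assumptions for this sketch; the source does not say how VizTree scales thickness.

```python
from collections import Counter


def branch_widths(words, min_width=1.0, max_width=10.0):
    """Map each pattern to a line width proportional to its frequency."""
    counts = Counter(words)
    top = max(counts.values())
    return {w: min_width + (max_width - min_width) * c / top
            for w, c in counts.items()}


def rarest(words, k):
    """Return the k least frequent patterns (candidate anomalies)."""
    counts = Counter(words)
    return sorted(counts, key=lambda w: (counts[w], w))[:k]


words = ["abc"] * 8 + ["cba"] * 4 + ["bab"]
widths = branch_widths(words)
candidates = rarest(words, 1)
```

Here the most frequent word gets the thickest line, and the single occurrence of "bab" is returned as the first candidate to inspect.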

The example above shows a power consumption time series. A "normal" weekly pattern has 5 peaks, one for each weekday. If we click on the branch "bab" (a thin branch), we can see that the subsequence mapped to this particular string (shown in the Detail 1 window) indeed has a different pattern: it has 3 peaks instead of 5, the result of a short Christmas week.


If training data is available, we can visualize the distributional differences of patterns between the training data and the testing data. The steps are summarized as follows.

Blue lines: under-represented patterns (more common in the training data, A, than in the testing data, B)

Green lines: over-represented patterns (more common in the testing data, B, than in the training data, A)

Red lines: surprising patterns

For the example below, the blue ECG data is the reference data, and the green ECG data is the testing data. The resulting tree shows the differences in pattern distributions between the two datasets. The surprising patterns are ranked. Clicking on the branch ranked #1 shows the anomalous heartbeat in the green time series (it is also highlighted in the time series window).
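The comparison can be sketched as follows. The surprise score used here, the absolute difference in relative frequency between the two datasets, is an assumption chosen for simplicity; the source does not state how VizTree scores or ranks surprising patterns. The function name is also hypothetical, and both word lists are assumed to be non-empty.

```python
from collections import Counter


def compare_patterns(reference_words, test_words):
    """Rank patterns by how much their relative frequency differs."""
    ref, test = Counter(reference_words), Counter(test_words)
    n_ref, n_test = len(reference_words), len(test_words)
    rows = []
    for word in set(ref) | set(test):
        # Positive: more common in the test data (over-represented)
        # Negative: more common in the reference data (under-represented)
        diff = test[word] / n_test - ref[word] / n_ref
        label = "over" if diff > 0 else "under" if diff < 0 else "equal"
        rows.append((word, diff, label))
    # Largest absolute difference first; ties broken alphabetically
    rows.sort(key=lambda r: (-abs(r[1]), r[0]))
    return rows


reference = ["abc"] * 9 + ["acb"]
testing = ["abc"] * 6 + ["bab"] * 4
ranking = compare_patterns(reference, testing)
```

In this toy input, "bab" never occurs in the reference data, so it ranks first and is labeled over-represented, while "abc" is under-represented in the testing data.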
