Interface DataCuller

All Known Implementing Classes:
MinGapDataCuller

public interface DataCuller
This is meant to accommodate an on-line algorithm for keeping a constant number of data points from an ongoing time series. It only looks at the X values of the time series data points. Implementations can assume the X values arrive in increasing order.

Now, if a series holds its data points in a Bag, removing a data point disturbs the order of the remaining ones, so the next time a data point needs to be removed, its index will be compromised. So the way this works is: I get a bunch of X values and I return the indices of the values that should be dropped. It's then your job to delete the correct items and recompact the remaining data.
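A minimal sketch of the caller's side of this contract. It assumes the series is stored as two parallel `double[]` arrays (not a Bag) and that the indices to drop arrive sorted in increasing order with no duplicates. The names `CompactSeries` and `compact` are hypothetical. A single linear pass shifts the survivors left, so their relative order is preserved and no index is invalidated part-way through:

```java
import java.util.Arrays;

class CompactSeries {
    /**
     * Removes the entries at the given indices from xs and ys in one linear
     * pass, shifting survivors left so their relative order is kept.
     * Assumes dropSorted is strictly increasing (as a hypothetical culler
     * with sorted output would return). Returns the new number of valid
     * entries; array slots at or past that count are stale.
     */
    static int compact(double[] xs, double[] ys, int count, int[] dropSorted) {
        int write = 0; // next slot to fill with a surviving point
        int d = 0;     // current position in dropSorted
        for (int read = 0; read < count; read++) {
            if (d < dropSorted.length && dropSorted[d] == read) {
                d++;   // this index was culled: skip it
                continue;
            }
            xs[write] = xs[read];
            ys[write] = ys[read];
            write++;
        }
        return write;
    }

    public static void main(String[] args) {
        double[] xs = {0, 1, 2, 3, 4, 5};
        double[] ys = {10, 11, 12, 13, 14, 15};
        int n = compact(xs, ys, xs.length, new int[] {1, 3, 4});
        System.out.println(Arrays.toString(Arrays.copyOf(xs, n))); // [0.0, 2.0, 5.0]
        System.out.println(Arrays.toString(Arrays.copyOf(ys, n))); // [10.0, 12.0, 15.0]
    }
}
```

Because the survivors are written to a separate "write" position, removing several points costs one pass over the data, not one pass per removed point.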

Changing the API to include the actual data items, not just their X values, could be useful if one wants to choose based on the similarity of Y values rather than just the closeness of X values. Another example would be an implementation that averages data (e.g. averaging 2 consecutive data points).
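For illustration only, here is what a hypothetical culler that sees Y values as well as X values might do in the averaging case: replace each pair of consecutive points with their mean, in place. `PairAveragingSketch` and `averagePairs` are invented names; nothing like this exists in the current DataCuller API, which only receives X values.

```java
import java.util.Arrays;

class PairAveragingSketch {
    /**
     * Hypothetical culler that needs both X and Y values: replaces each
     * consecutive pair of points with their average. An odd trailing point
     * is kept unchanged. Works in place and returns the new point count.
     */
    static int averagePairs(double[] xs, double[] ys, int count) {
        int write = 0;
        int read = 0;
        for (; read + 1 < count; read += 2) {
            xs[write] = (xs[read] + xs[read + 1]) / 2.0;
            ys[write] = (ys[read] + ys[read + 1]) / 2.0;
            write++;
        }
        if (read < count) { // odd count: carry over the last point as-is
            xs[write] = xs[read];
            ys[write] = ys[read];
            write++;
        }
        return write;
    }

    public static void main(String[] args) {
        double[] xs = {0, 1, 2, 3, 4};
        double[] ys = {10, 20, 30, 40, 50};
        int n = averagePairs(xs, ys, xs.length);
        System.out.println(Arrays.toString(Arrays.copyOf(xs, n))); // [0.5, 2.5, 4.0]
        System.out.println(Arrays.toString(Arrays.copyOf(ys, n))); // [15.0, 35.0, 50.0]
    }
}
```

Note that this kind of culler produces new data points rather than a list of indices to drop, which is why it would not fit the current index-returning interface.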

If a chart has multiple time series, does one cull each data series separately? That is, is this a chart-global property, or does each series have its own? I expect that when multiple series reside in the same chart, they are "synchronized" in the sense that they have data points at the same moments in time. In that case the cullers of those series would all do the same work, so one should be enough.
The user is expected to add the series at the same time. I make no attempt to detect whether series in the same chart have the same set of X values.
The current implementation in TimeSeriesChartGenerator and the corresponding property inspector uses a single DataCuller for all series. In the future I could give each series its own clone of the culling algorithm, if it turns out that stateful algorithms are needed.
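A sketch of sharing one culling decision across synchronized series, assuming each series stores its values in a plain `double[]` aligned with a shared X array. The drop indices are hard-coded where a real caller would obtain them from a single `cull` call on the shared X values; `SharedCullSketch` and `compactWithMask` are hypothetical names. The decision is turned into a boolean mask once and then applied to every array:

```java
import java.util.Arrays;

class SharedCullSketch {
    /**
     * Compacts one array in place, dropping every index marked in drop.
     * Returns the number of surviving values.
     */
    static int compactWithMask(double[] values, int count, boolean[] drop) {
        int write = 0;
        for (int read = 0; read < count; read++) {
            if (!drop[read]) values[write++] = values[read];
        }
        return write;
    }

    public static void main(String[] args) {
        // Two series sampled at the same X values, as the text assumes.
        double[] xs = {0, 1, 2, 3};
        double[] temperature = {20, 21, 22, 23};
        double[] pressure = {1.0, 1.1, 1.2, 1.3};
        int count = xs.length;

        // Stand-in for the result of ONE cull() call on the shared X values.
        int[] toDrop = {1, 2};
        boolean[] drop = new boolean[count];
        for (int i : toDrop) drop[i] = true;

        // The same decision is applied to X and to every series, so all
        // arrays end up with the same surviving count.
        int n = compactWithMask(xs, count, drop);
        compactWithMask(temperature, count, drop);
        compactWithMask(pressure, count, drop);
        System.out.println(Arrays.toString(Arrays.copyOf(xs, n)));          // [0.0, 3.0]
        System.out.println(Arrays.toString(Arrays.copyOf(temperature, n))); // [20.0, 23.0]
        System.out.println(Arrays.toString(Arrays.copyOf(pressure, n)));    // [1.0, 1.3]
    }
}
```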

In order to improve the amortized time complexity, more than one data point should be culled at a time, e.g. as soon as you have 200 points, drop 100. Each such operation is followed by a linear-time data shift, so it pays to delete many points at once. Batching also helps while choosing the points, since one does not have to rescan from the beginning for each data point to be deleted. Heaps might be helpful when deciding which points to drop. The recompacting is still linear, but it should be very fast.
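A minimal batch culler along these lines, written as a self-contained sketch rather than as MASON code. Assumptions: it returns `int[]` instead of `IntBag`, triggers at a configurable `maxPoints`, drops half the points per cull, and treats `sortOutput` as "return the indices in ascending order". The class `BatchGapCuller` is hypothetical and is not the algorithm used by `MinGapDataCuller`. It uses `java.util.PriorityQueue` as the heap, ranking interior points by the gap `x[i+1] - x[i-1]` their removal would leave; scores are not updated as points are chosen, so two neighbouring points may both be dropped.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

class BatchGapCuller {
    private final int maxPoints; // e.g. 200: cull once this many points exist

    BatchGapCuller(int maxPoints) {
        this.maxPoints = maxPoints;
    }

    boolean tooManyPoints(int currentPointCount) {
        return currentPointCount >= maxPoints;
    }

    /**
     * Picks about half of the points to drop, preferring interior points
     * whose removal leaves the smallest gap. First and last points are
     * always kept. Scores are computed once from the input (a heuristic,
     * not an exact min-gap algorithm).
     */
    int[] cull(double[] xValues, boolean sortOutput) {
        int n = xValues.length;
        int toDrop = Math.min(n / 2, Math.max(0, n - 2)); // never exceed interior count
        // Min-heap of interior indices, ordered by the gap their removal leaves.
        PriorityQueue<Integer> heap = new PriorityQueue<>(
            (a, b) -> Double.compare(gap(xValues, a), gap(xValues, b)));
        for (int i = 1; i < n - 1; i++) heap.add(i);
        int[] result = new int[toDrop];
        for (int k = 0; k < toDrop; k++) result[k] = heap.poll();
        if (sortOutput) Arrays.sort(result);
        return result;
    }

    private static double gap(double[] x, int i) {
        return x[i + 1] - x[i - 1];
    }

    public static void main(String[] args) {
        BatchGapCuller culler = new BatchGapCuller(8);
        double[] xs = {0, 10, 20, 21, 22, 23, 30, 40};
        if (culler.tooManyPoints(xs.length)) {
            // Drops the crowded points; survivors are x = 0, 10, 30, 40.
            System.out.println(Arrays.toString(culler.cull(xs, true))); // [2, 3, 4, 5]
        }
    }
}
```

With `maxPoints` set to 200, each cull removes 100 points, so the linear recompaction runs roughly once per 100 insertions instead of once per insertion.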

  • Method Summary

    Modifier and Type   Method                                        Description
    IntBag              cull(double[] xValues, boolean sortOutput)
    boolean             tooManyPoints(int currentPointCount)
  • Method Details

    • tooManyPoints

      boolean tooManyPoints(int currentPointCount)
    • cull

      IntBag cull(double[] xValues, boolean sortOutput)