Evaluation of AD algorithms perfomance¶

import pandas as pd
import numpy as np
import sys

try:
    import tsad
except:
    import sys
    sys.path.insert(1, '../')
    from tsad.evaluating.evaluating import evaluating

import warnings
warnings.filterwarnings('ignore')

Simple example¶

Init true and prediction labelled data

true = pd.Series(0,pd.date_range('2020-01-01','2020-01-20',freq='D'))
true.iloc[[6,14]]=1

prediction = pd.Series(0,pd.date_range('2020-01-01','2020-01-20',freq='D'))
prediction.iloc[[4,10]]=1

pd.concat([true,prediction],1).reset_index()

	index	0	1
0	2020-01-01	0	0
1	2020-01-02	0	0
2	2020-01-03	0	0
3	2020-01-04	0	0
4	2020-01-05	0	1
5	2020-01-06	0	0
6	2020-01-07	1	0
7	2020-01-08	0	0
8	2020-01-09	0	0
9	2020-01-10	0	0
10	2020-01-11	0	1
11	2020-01-12	0	0
12	2020-01-13	0	0
13	2020-01-14	0	0
14	2020-01-15	1	0
15	2020-01-16	0	0
16	2020-01-17	0	0
17	2020-01-18	0	0
18	2020-01-19	0	0
19	2020-01-20	0	0

Evaluating by default NAB metric

results = evaluating(true=true,prediction=prediction)
print(results)

Since you not choose numenta_time and portion, then portion will be 0.1
Standart  -  -5.5
LowFP  -  -11.0
LowFN  -  -3.67
{'Standart': -5.5, 'LowFP': -11.0, 'LowFN': -3.67}

Approaches for evaluating of anomaly detection algorithms for time series data¶

Metirc for evaluating of AD algorithms for time series data in tsad¶

NAB metric: link

Changepoints metrics¶

Variants of input variables¶

A crucial element for the changepoint detection problem is a detection window:

The predicted anomalies inside the detection window are perceived as only one true positive
The absence of predicted anomalies inside the detection window is perceived as only one false negative
the predicted points outside the detection windows as false positives.

Thus we must assign left and right boundaries of a window for any true changepoint if it is available for a dataset. In TSAD, we have three opportunities for this:

true variable as pd.Series and numenta_time variable (or portion*)
true variable as a list of true changepoints of pd.Timestamp format and numenta_time variable (or portion*)
true variable as boundaries itself as a list (we can have more than one change point for one dataset) of a list (length of 2) with left and right pd.Ttimestamp boundary of the window.

* The portion is needed if numenta_time = None. The width of the detection window in this case is equal to a portion of the width of the length of prediction divided by the number of real CPs in this dataset. default 0.1.

prediction is always pd.Series for one dataset

The picture above shows the predicted label values for changepoint problem.The variable numenta_time is actually with of window.

prediction is always have the same format (pd.Series) for one dataset:

prediction = pd.Series(0,pd.date_range('2020-01-01','2020-01-07',freq='D'))
prediction.iloc[3]=1
prediction

2020-01-01    0
2020-01-02    0
2020-01-03    0
2020-01-04    1
2020-01-05    0
2020-01-06    0
2020-01-07    0
Freq: D, dtype: int64

How would the true input variable look in each variant:

Variant 1. True as pd.Series¶

true = pd.Series(0,pd.date_range('2020-01-01','2020-01-07',freq='D'))
true.iloc[5]=1
numenta_time='3D'
true

2020-01-01    0
2020-01-02    0
2020-01-03    0
2020-01-04    0
2020-01-05    0
2020-01-06    1
2020-01-07    0
Freq: D, dtype: int64

results = evaluating(true=true,prediction=prediction,numenta_time=numenta_time,metric='average_time')

Amount of true anomalies 1
A number of missed CPs = 0
A number of FPs = 0
Average time 2 days 00:00:00

From here we can see that we really correctly detect the one true chanepoint by given our detection window (which is equal 3 days before changepoint).

If we not true CPs for specefic dataset:

true = pd.Series(0,pd.date_range('2020-01-01','2020-01-07',freq='D'))
results = evaluating(true=true,prediction=prediction,numenta_time=numenta_time,metric='average_time')

Amount of true anomalies 0
A number of missed CPs = 0
A number of FPs = 1
Average time nan

Variant 2. True as list of pd.Timestamp¶

true = [pd.Timestamp('2020-01-06')]
numenta_time='3D'
true

[Timestamp('2020-01-06 00:00:00')]

results = evaluating(true=true,prediction=prediction,numenta_time=numenta_time,metric='average_time')

Amount of true anomalies 1
A number of missed CPs = 0
A number of FPs = 0
Average time 2 days 00:00:00

The same result

If we not true CPs for specefic dataset:

true = []
results = evaluating(true=true,prediction=prediction,numenta_time=numenta_time,metric='average_time')

Amount of true anomalies 0
A number of missed CPs = 0
A number of FPs = 1
Average time nan

Variant 3.True as a list of a list with left and right pd.Timestamp boundary of the window¶

true = [[pd.Timestamp('2020-01-03'),pd.Timestamp('2020-01-06')]]
numenta_time='3D'
true

[[Timestamp('2020-01-03 00:00:00'), Timestamp('2020-01-06 00:00:00')]]

results = evaluating(true=true,prediction=prediction,numenta_time=numenta_time,metric='average_time')

Amount of true anomalies 1
A number of missed CPs = 0
A number of FPs = 0
Average time 2 days 00:00:00

The same result

If we not true CPs for specefic dataset:

true = [[]]
results = evaluating(true=true,prediction=prediction,numenta_time=numenta_time,metric='average_time')

Amount of true anomalies 0
A number of missed CPs = 0
A number of FPs = 1
Average time nan

Variant 4,5,6. Many datasets¶

# if we have 2 the same datasets
prediction = [prediction,prediction]
numenta_time='3D'

true = pd.Series(0,pd.date_range('2020-01-01','2020-01-07',freq='D'))
true.iloc[5]=1
true = [true,true]
results = evaluating(true=true,prediction=prediction,numenta_time=numenta_time,metric='average_time')

true = [pd.Timestamp('2020-01-06')]
true = [true,true]
results = evaluating(true=true,prediction=prediction,numenta_time=numenta_time,metric='average_time')

true = [[pd.Timestamp('2020-01-03'),pd.Timestamp('2020-01-06')]]
true = [true,true]
results = evaluating(true=true,prediction=prediction,numenta_time=numenta_time,metric='average_time')

Amount of true anomalies 2
A number of missed CPs = 0
A number of FPs = 0
Average time 2 days 00:00:00
Amount of true anomalies 2
A number of missed CPs = 0
A number of FPs = 0
Average time 2 days 00:00:00
Amount of true anomalies 2
A number of missed CPs = 0
A number of FPs = 0
Average time 2 days 00:00:00

Different situations with changepoint detection problem¶

Assigning characteristics of a window as well as selecting one point in the window must differ depend on business tasks, and in TSAD we seem to foreseen every case for this. Examples of cases from technical diagnostic:

We have clear anomalies that have to lead to failure. From history, we have objective information about the times of arising anomalies (true changepoints), and we understand that any predicted anomaly that is earlier than the true changepoint is a false positive.
We have a failure of a system. From history, we have objective information about the time of failure. But we haven’t any information about the anomaly. Thus predicted anomaly that is earlier than the true changepoint to be true positive.
We know approximately the time of the anomaly.
Many other cases.

To meet the business objectives, we make possible to adjust the following parameters in evaluating:

Evaluation of AD algorithms perfomance¶

Simple example¶

Approaches for evaluating of anomaly detection algorithms for time series data¶

Metirc for evaluating of AD algorithms for time series data in tsad¶

Changepoints metrics¶

Variants of input variables¶

Variant 1. True as pd.Series¶

Variant 2. True as list of pd.Timestamp¶

Variant 3.True as a list of a list with left and right pd.Timestamp boundary of the window¶

Variant 4,5,6. Many datasets¶

Different situations with changepoint detection problem¶

`anomaly_window_destenation` for input variant 1,2 of true variable¶

`clear_anomalies_mode`¶

`intersection_mode` for solving a problem of intersection of detection windows¶

Evaluation of AD algorithms perfomance¶

Simple example¶

Approaches for evaluating of anomaly detection algorithms for time series data¶

Metirc for evaluating of AD algorithms for time series data in tsad¶

Changepoints metrics¶

Variants of input variables¶

Variant 1. True as pd.Series¶

Variant 2. True as list of pd.Timestamp¶

Variant 3.True as a list of a list with left and right pd.Timestamp boundary of the window¶

Variant 4,5,6. Many datasets¶

Different situations with changepoint detection problem¶

anomaly_window_destenation for input variant 1,2 of true variable¶

clear_anomalies_mode¶

intersection_mode for solving a problem of intersection of detection windows¶

`anomaly_window_destenation` for input variant 1,2 of true variable¶

`clear_anomalies_mode`¶

`intersection_mode` for solving a problem of intersection of detection windows¶