tsad.utils.evaluating package
Submodules
tsad.utils.evaluating.evaluating module
Evaluating module
- tsad.utils.evaluating.evaluating.evaluating(true, prediction, metric='nab', window_width=None, portion=0.1, anomaly_window_destination='lefter', clear_anomalies_mode=True, intersection_mode='cut right window', table_of_coef=None, scale_func='improved', scale_koef=1, plot_figure=False, verbose=True)
- Parameters:
- true: variants:
or: if one dataset: pd.Series with binary int labels (1 is anomaly, 0 is not anomaly);
or: if one dataset: list of pd.Timestamp of true labels, or [] if there are no labels;
or: if one dataset: list of lists [t1, t2] of pd.Timestamp giving the left and right detection boundaries, or [[]] if there are no labels;
or: if many datasets: list (with length equal to the number of datasets) of pd.Series with binary int labels;
or: if many datasets: list of lists of pd.Timestamp of true labels, with an empty list for any dataset without labels, e.g. true = [ts, []];
or: if many datasets: list of lists of lists [t1, t2] of pd.Timestamp giving the left and right detection boundaries. If a dataset has no true labels, insert an empty list of labels for it: true = [[[]], [[t1, t2], [t1, t2]]].
True labels of anomalies or changepoints. The labels must match the chosen metric: changepoint (CP) labels for changepoint metrics, anomaly labels for anomaly detection metrics (see “metric” below).
- prediction: variants:
or: if one dataset: pd.Series with binary int labels (1 is anomaly, 0 is not anomaly);
or: if many datasets: list (with length equal to the number of datasets) of pd.Series with binary int labels.
Predicted labels of anomalies or changepoints. The labels must match the chosen metric: changepoint (CP) labels for changepoint metrics, anomaly labels for anomaly detection metrics (see “metric” below).
- metric: {‘nab’, ‘binary’, ‘average_time’, ‘confusion_matrix’}. Default=’nab’
Determines the output (see Returns below).
Changepoint problem: {‘nab’, ‘average_time’}. Standard AD problem: {‘binary’, ‘confusion_matrix’}.
‘nab’: Numenta Anomaly Benchmark metric.
‘average_time’: average delay or time to failure, depending on the situation.
‘binary’: FAR, MAR, F1.
‘confusion_matrix’: standard confusion matrix computed point by point.
- window_width: str convertible to pd.Timedelta. Default=None.
Width of the detection window.
- portion: float, default=0.1
Used only if window_width is None. In that case the width of the detection window equals portion times the length of prediction, divided by the number of true CPs in the dataset.
- anomaly_window_destination: {‘lefter’, ‘righter’, ‘center’}. Default=’lefter’
Location of the detection window relative to the anomaly.
‘lefter’: the detection window is on the left side of the anomaly.
‘righter’: the detection window is on the right side of the anomaly.
‘center’: the detection window is positioned around the center of the anomaly.
- clear_anomalies_mode: boolean, default=True.
True: the left value of the scoring function is Atp and the right value is Afp. Only the first value inside the detection window is taken.
False: the right value of the scoring function is Atp and the left value is Afp. Only the last value inside the detection window is taken.
- intersection_mode: {‘cut left window’, ‘cut right window’, ‘both’}. Default=’cut right window’
Used when the detection windows of true changepoints overlap. Such overlap is generally undesirable and calls for a different approach than simply cropping the scoring window with this parameter.
‘cut left window’: cuts the overlapping part of the left window.
‘cut right window’: cuts the overlapping part of the right window.
‘both’: crops the overlapping part of both the left and right windows.
- verbose: boolean, default=True.
If True, prints useful information.
- plot_figure: boolean, default=False.
If True, draws the scoring functions, detection windows and predictions. Useful, for example, for calibrating scale_koef.
- table_of_coef (metric=’nab’): pd.DataFrame of a specific form (see below). Default=None.
Application profiles of the NAB metric. If None, the following table is used:
table_of_coef = pd.DataFrame([[1.0, -0.11, 1.0, -1.0],
                              [1.0, -0.22, 1.0, -1.0],
                              [1.0, -0.11, 1.0, -2.0]])
table_of_coef.index = ['Standard', 'LowFP', 'LowFN']
table_of_coef.index.name = "Metric"
table_of_coef.columns = ['A_tp', 'A_fp', 'A_tn', 'A_fn']
- scale_func (metric=’nab’): {‘default’, ‘improved’}. Default=’improved’
Scoring function in the NAB metric.
‘default’: standard NAB scoring function.
‘improved’: a modified function that addresses the disadvantages of the standard NAB scoring function.
- scale_koef: float > 0. Default=1.0.
Smoothing factor. The smaller it is, the smoother the scoring function is.
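The interaction of portion and anomaly_window_destination can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's code: the helper names window_width_from_portion and detection_window are hypothetical, and "length of prediction" is read here as the time span covered by the prediction series.

```python
from datetime import datetime, timedelta


def window_width_from_portion(start, end, n_changepoints, portion=0.1):
    """Assumed rule: width = portion * (time span of prediction) / (number of true CPs)."""
    if n_changepoints <= 0:
        raise ValueError("at least one true changepoint is required")
    return (end - start) * portion / n_changepoints


def detection_window(t_cp, width, destination="lefter"):
    """Place a window of the given width relative to the changepoint t_cp."""
    if destination == "lefter":
        return t_cp - width, t_cp
    if destination == "righter":
        return t_cp, t_cp + width
    if destination == "center":
        return t_cp - width / 2, t_cp + width / 2
    raise ValueError(f"unknown destination: {destination!r}")


# A 100-hour series with 2 true changepoints and portion=0.1 gives 5-hour windows.
start = datetime(2024, 1, 1)
end = start + timedelta(hours=100)
width = window_width_from_portion(start, end, n_changepoints=2)

t_cp = datetime(2024, 1, 2)
left_window = detection_window(t_cp, width, "lefter")
center_window = detection_window(t_cp, width, "center")
```

Passing window_width explicitly skips the portion rule altogether, which is the simpler choice when the expected detection delay is known in advance.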
- Returns:
- metrics: value of the metrics, depending on metric
- ‘nab’: tuple
Standard profile, float
Low FP profile, float
Low FN profile, float
- ‘average_time’: tuple
Average time (average delay, or time to failure)
Missing changepoints, int
FPs, int
Number of true changepoints, int
- ‘binary’: tuple
F1 metric, float
False Alarm Rate, %, float
Missing Alarm Rate, %, float
- ‘confusion_matrix’: tuple
TPs, int
TNs, int
FPs, int
FNs, int
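To make the ‘binary’ and ‘confusion_matrix’ outputs concrete, here is a minimal point-wise sketch in plain Python. It assumes the usual definitions FAR = FP / (FP + TN) and MAR = FN / (FN + TP), both in percent; whether the library uses exactly these formulas is an assumption, and binary_metrics is a hypothetical helper, not part of tsad.

```python
def binary_metrics(true, prediction):
    """Return ((F1, FAR %, MAR %), (TP, TN, FP, FN)) for two equal-length 0/1 sequences."""
    if len(true) != len(prediction):
        raise ValueError("true and prediction must have the same length")
    pairs = list(zip(true, prediction))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    # Guard every ratio against an empty denominator.
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    far = 100.0 * fp / (fp + tn) if (fp + tn) else 0.0
    mar = 100.0 * fn / (fn + tp) if (fn + tp) else 0.0
    return (f1, far, mar), (tp, tn, fp, fn)


true_labels = [0, 0, 1, 1, 0, 0, 1, 0]
predicted = [0, 1, 1, 0, 0, 0, 1, 0]
(f1, far, mar), (tp, tn, fp, fn) = binary_metrics(true_labels, predicted)
```

The two tuples follow the order listed under Returns: F1, FAR, MAR for ‘binary’ and TPs, TNs, FPs, FNs for ‘confusion_matrix’.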
tsad.utils.evaluating.src module
- tsad.utils.evaluating.src.check_errors(my_list)
Checks the format of the input true data.
- Parameters:
- my_list: true labels in the uniform format (see evaluating.evaluating)
- Returns:
- mx: depth of the list, or variant of processing
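The "depth of the list" idea can be illustrated with a short recursive helper. This is a sketch only: list_depth is a hypothetical name, and the exact value check_errors returns for each input variant is not specified in this reference.

```python
def list_depth(obj):
    """Nesting depth of a list: a non-list is 0, [] is 1, [[]] is 2, and so on."""
    if not isinstance(obj, list):
        return 0
    if not obj:
        return 1
    return 1 + max(list_depth(item) for item in obj)


# Strings stand in for pd.Timestamp values here.
single_timestamps = ["t1", "t2"]                          # one dataset, list of labels
single_boundaries = [["t1", "t2"], ["t3", "t4"]]          # one dataset, list of [t1, t2]
many_boundaries = [[[]], [["t1", "t2"], ["t3", "t4"]]]    # many datasets, first one unlabeled

depths = [list_depth(x) for x in (single_timestamps, single_boundaries, many_boundaries)]
```

Depth alone cannot tell "one dataset with boundaries" from "many datasets with timestamp lists" (both have depth 2), which may be why the return value is described as "depth of the list, or variant of processing".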
- tsad.utils.evaluating.src.extract_cp_confusion_matrix(detecting_boundaries, prediction, point=0, binary=False)
- Parameters:
- prediction: pd.Series
- point: None for the binary case
- Returns:
- dict:
TPs: dict mapping window number to [t1, t_cp, t2]
FPs: list of timestamps
FNs: list of window numbers
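The shape of the returned dict can be illustrated with a simplified stand-in. The sketch below is not the library's implementation: cp_confusion is a hypothetical helper, integers stand in for timestamps, and taking the first detection inside each window as t_cp is an assumption.

```python
def cp_confusion(detecting_boundaries, detections):
    """Classify detections against detection windows.

    detecting_boundaries: list of [t1, t2] windows (empty windows are skipped).
    detections: sorted list of timestamps at which the prediction equals 1.
    """
    tps, fns = {}, []
    matched = set()
    for number, window in enumerate(detecting_boundaries):
        if not window:
            continue  # no true label for this slot, e.g. [[]]
        t1, t2 = window
        inside = [t for t in detections if t1 <= t <= t2]
        if inside:
            tps[number] = [t1, inside[0], t2]  # first detection is used as t_cp
            matched.update(inside)
        else:
            fns.append(number)
    # Any detection outside every window counts as a false positive.
    fps = [t for t in detections if t not in matched]
    return {"TPs": tps, "FPs": fps, "FNs": fns}


boundaries = [[10, 20], [40, 50], [70, 80]]
detections = [5, 12, 15, 45, 90]
result = cp_confusion(boundaries, detections)
```

In this example the third window receives no detection and is reported as a miss, while the detections at 5 and 90 fall outside all windows.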
tsad.utils.evaluating.univariate_funcs module
- tsad.utils.evaluating.univariate_funcs.my_scale(fp_case_window=None, A_tp=1, A_fp=0, koef=1, detalization=1000, clear_anomalies_mode=True, plot_figure=False)
ts: segment on which the window is applied
- tsad.utils.evaluating.univariate_funcs.single_average_delay(detecting_boundaries, prediction, anomaly_window_destination, clear_anomalies_mode)
anomaly_window_destination: {‘lefter’, ‘righter’, ‘center’}.
- tsad.utils.evaluating.univariate_funcs.single_evaluate_nab(detecting_boundaries, prediction, table_of_coef=None, clear_anomalies_mode=True, scale_func='improved', scale_koef=1, plot_figure=True)
- detecting_boundaries: list of list of two float values
List of lists of left and right boundary indices used for scoring the labeling results. May contain empty lists, e.g. [[]] or [[], [t1, t2], []].
- table_of_coef: pd.DataFrame (3x4) of float values
Table of coefficients for the NAB score function. Index: ‘Standard’, ‘LowFP’, ‘LowFN’. Columns: ‘A_tp’, ‘A_fp’, ‘A_tn’, ‘A_fn’.
- scale_func: {‘default’, ‘improved’}
Disadvantages of the ‘default’ scale_func: (1) it depends on the relative step, so if there are too many points in the scoring window, the drop in the middle becomes too sharp; (2) the leftmost point is not equal to Atp and the rightmost point is not equal to Afp (especially when a smoothing multiplier is applied).
- clear_anomalies_mode: boolean
If True, Atp is to the left of the boundary and Afp to the right. Otherwise (fault mode), Afp is to the left of the boundary and Atp to the right.
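The two requirements named above (the curve should hit Atp and Afp exactly at the window edges, and its steepness should be controllable) can be illustrated with a normalized logistic curve. This is a sketch under assumptions: it is neither the library's my_scale nor the original NAB formula, scoring_curve is a hypothetical name, and the factor 10 that turns koef into a steepness is arbitrary.

```python
import math


def scoring_curve(x, a_tp=1.0, a_fp=-0.11, koef=1.0, clear_anomalies_mode=True):
    """Score at relative position x in [0, 1] inside the scoring window."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    if koef <= 0:
        raise ValueError("koef must be positive")
    if not clear_anomalies_mode:
        x = 1.0 - x  # fault mode: Afp on the left edge, Atp on the right edge

    steepness = 10.0 * koef  # assumed scaling: smaller koef gives a smoother curve

    def logistic(z):
        return 1.0 / (1.0 + math.exp(steepness * (z - 0.5)))

    # Normalize so the weight is exactly 1 at x=0 and exactly 0 at x=1.
    low, high = logistic(1.0), logistic(0.0)
    weight = (logistic(x) - low) / (high - low)
    return a_tp * weight + a_fp * (1.0 - weight)


left_edge = scoring_curve(0.0)
right_edge = scoring_curve(1.0)
midpoint = scoring_curve(0.5)
fault_left_edge = scoring_curve(0.0, clear_anomalies_mode=False)
```

Because of the normalization step, the edge values do not depend on koef or on how many points fall inside the window, which is the property the ‘default’ function is said to lack.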