tsad.utils package

Subpackages

Submodules

tsad.utils.eda module

tsad.utils.eda.value_counts_interval(array, itervals)[source]

Input: np.array or list of values. Returns: pd.Series.

tsad.utils.imports module

tsad.utils.imports.import_module_from_GitHub(url, saving_path='./Downloads/')[source]

A function that imports a Python module published on the Internet.

Parameters:
url : str

Link to a module. Can be a GitHub link to the module, a raw.githubusercontent.com link, or any other link that responds correctly to an HTTP GET request.

saving_path : str

Path where the downloaded module is saved.

Returns:
module : module

The imported module. It can then be used as if it had been imported with a regular import statement.

Notes

1. Syntax

It would be nice if from url import func1 worked.

Examples

>>> module = import_module_from_GitHub('https://github.com/Gabriel-p/minenergy/blob/master/minenergy.py')
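
The two steps implied above (resolving a GitHub link to a raw file, then importing the saved file) can be sketched as follows. This is only an illustration under stated assumptions, not the tsad implementation: github_blob_to_raw and load_module_from_file are hypothetical helper names, and the import step is shown on a local temporary file rather than on a real download.

```python
import importlib.util
import os
import tempfile


def github_blob_to_raw(url):
    """Turn a github.com 'blob' link into its raw.githubusercontent.com form."""
    prefix = "https://github.com/"
    if url.startswith(prefix) and "/blob/" in url:
        user_repo, path = url[len(prefix):].split("/blob/", 1)
        return "https://raw.githubusercontent.com/" + user_repo + "/" + path
    return url  # already raw, or some other host: leave untouched


def load_module_from_file(path):
    """Import a .py file from disk and return the module object."""
    name = os.path.splitext(os.path.basename(path))[0]
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


raw_url = github_blob_to_raw(
    "https://github.com/Gabriel-p/minenergy/blob/master/minenergy.py"
)
print(raw_url)

# Demonstrate the import step on a local file instead of a download
# (hypothetical module content, written only for this example).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_mod.py")
    with open(demo_path, "w") as fh:
        fh.write("def func1():\n    return 42\n")
    demo = load_module_from_file(demo_path)
    print(demo.func1())
```

Returning a module object rather than injecting names into the caller's namespace matches the documented return value; it is also why from url import func1 does not work directly.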

tsad.utils.iterators module

Very useful library

class tsad.utils.iterators.Loader(X, y, batch_size, shuffle=True, random_state=None)[source]

Bases: object

X, y are lists
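
The constructor arguments suggest a mini-batch iterator over paired lists. A minimal sketch of that idea, assuming the loader yields (X_batch, y_batch) pairs; iterate_batches is a hypothetical stand-in and does not reproduce the actual Loader class:

```python
import numpy as np


def iterate_batches(X, y, batch_size, shuffle=True, random_state=None):
    """Yield (X_batch, y_batch) lists of at most batch_size items each."""
    order = np.arange(len(X))
    if shuffle:
        # A seeded RandomState makes the shuffled order reproducible.
        rng = np.random.RandomState(random_state)
        rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield [X[i] for i in idx], [y[i] for i in idx]


X = [[0, 1], [1, 2], [2, 3], [3, 4], [4, 5]]
y = [2, 3, 4, 5, 6]
batches = list(iterate_batches(X, y, batch_size=2, shuffle=False))
print(len(batches))
```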

tsad.utils.preproc module

tsad.utils.preproc.df2dfs(df, resample_freq=None, thereshold_gap=None, koef_freq_of_gap=1.2, plot=False, col=None)[source]

Splits df into a list of DataFrames at gaps, so that each resulting DataFrame meets the input requirements: no gaps and a single discretization frequency. The function does not resample, because resampling is expensive. Instead, any interval between consecutive points that is longer than thereshold_gap (by default koef_freq_of_gap * resample_freq) is treated as a gap. The rationale: if points arrive more often than expected, the signal does not drift much between them and no anomalies arise; if points arrive too rarely, anomalies do arise, so such an interval is treated as a gap.

Note: plotting (plot=True) takes a very long time.

Parameters:
df : pd.DataFrame

The original time series for the entire history.

resample_freq : pd.Timedelta (optional, default=None)

The discretization frequency of the time series. If None, the most frequent sampling interval in df is used. If no interval clearly dominates, an error is raised.

thereshold_gap : pd.Timedelta (optional, default=None)

The threshold period. An interval between consecutive points that exceeds it is treated as a gap.

koef_freq_of_gap : float or int (optional, default=1.2)

Used only when thereshold_gap is None, in which case thereshold_gap = koef_freq_of_gap * resample_freq.

plot : bool (optional, default=False)

Whether to plot the cut. If True, the cut is drawn, which takes a very long time. If False, nothing is drawn.

col : int or str (optional, default=None)

The name or number of the column to draw. If None, the first column is used.

Returns:
dfs : list of pd.DataFrame

A list of time series without gaps with a relatively stable frequency of discretization.
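
The gap rule described above can be sketched with plain pandas. This is an illustration under assumptions, not the df2dfs source: split_by_gaps is a hypothetical helper, the sampling step is taken to be the most frequent difference between consecutive timestamps, and plotting and column selection are left out.

```python
import pandas as pd


def split_by_gaps(df, koef_freq_of_gap=1.2):
    """Split df wherever consecutive timestamps are further apart than the threshold."""
    deltas = df.index.to_series().diff()
    resample_freq = deltas.value_counts().idxmax()   # most frequent sampling step
    threshold_gap = koef_freq_of_gap * resample_freq
    # A new piece starts at every row that is too far from the previous row.
    piece_id = (deltas > threshold_gap).cumsum()
    return [piece for _, piece in df.groupby(piece_id)]


index = pd.to_datetime([
    "2024-01-01 00:00", "2024-01-01 00:01", "2024-01-01 00:02",
    "2024-01-01 00:10", "2024-01-01 00:11",
])
df = pd.DataFrame({"value": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=index)
pieces = split_by_gaps(df)
print([len(p) for p in pieces])
```

Here the dominant step is one minute, so the threshold is 72 seconds and the eight-minute jump starts a second piece.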

tsad.utils.preproc.split_by_repeated(series, df=None)[source]

Splits a pandas series into sub-series based on repeated values.

Parameters:

series : pandas.Series

The series to be split.

df : pandas.DataFrame (optional, default=None)

The dataframe to be used to retrieve the original rows.

Returns:

dict: A dictionary where the keys are the unique values in the series and the values are lists of sub-series.
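
One plausible reading of the description is that each sub-series is a run of consecutive equal values. A minimal sketch under that assumption; split_runs is a hypothetical helper, it ignores the optional df argument, and the real function may group values differently:

```python
import pandas as pd


def split_runs(series):
    """Map each unique value to the list of consecutive runs holding that value."""
    # run_id increases by one every time the value differs from the previous row.
    run_id = (series != series.shift()).cumsum()
    result = {}
    for _, run in series.groupby(run_id):
        result.setdefault(run.iloc[0], []).append(run)
    return result


s = pd.Series([0, 0, 1, 1, 1, 0, 2, 2])
runs = split_runs(s)
for value, pieces in runs.items():
    print(value, [list(p.index) for p in pieces])
```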

tsad.utils.preproc.value_counts_interval(array, itervals)[source]

Returns a pandas Series containing the count of values in the input array that fall within each interval.

Parameters:

array : numpy.ndarray or list of values

Input array of values.

itervals : list

List of interval boundaries. The first interval is defined as values less than the first boundary, and the last interval is defined as values greater than or equal to the last boundary.

Returns:

ts : pandas.Series

A Series containing the count of values in the input array that fall within each interval.
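
The counting rule above can be sketched with NumPy. This is not the library code: count_by_intervals is a hypothetical helper, and treating the inner intervals as half-open [lo, hi) is an assumption that extends the documented behavior of the last interval.

```python
import numpy as np
import pandas as pd


def count_by_intervals(values, boundaries):
    """Count values below the first boundary, between boundaries, and from the last one up."""
    boundaries = np.asarray(boundaries)
    # side="right" puts a value equal to a boundary into the interval that starts there.
    bin_index = np.searchsorted(boundaries, np.asarray(values), side="right")
    counts = np.bincount(bin_index, minlength=len(boundaries) + 1)
    labels = (
        [f"< {boundaries[0]}"]
        + [f"[{lo}, {hi})" for lo, hi in zip(boundaries[:-1], boundaries[1:])]
        + [f">= {boundaries[-1]}"]
    )
    return pd.Series(counts, index=labels)


counts = count_by_intervals([1, 5, 10, 12, 20, 25], [5, 10, 20])
print(counts)
```

Three boundaries produce four intervals, so the result always has one more entry than the boundary list.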

tsad.utils.trainTestSplitting module

This module provides tools for splitting time series into train and test sequences.

tsad.utils.trainTestSplitting.ts_train_test_split(df, len_seq, points_ahead=1, gap=0, step=1, intersection=True, test_size=None, train_size=None, random_state=None, what_to_shuffle='train')[source]

A function that splits the time series into train and test sequence subsets.

Parameters:
df : pd.DataFrame

Array of shape (n_samples, n_features) with a pd.Timestamp index.

len_seq : int

Length of the sequence, which is used to predict the next point/points.

points_ahead : int, default=1

How many points ahead we predict, reflected in y

gap : int, default=0

The gap between the last point of the input sequence and the first point of the model output sequence (the prediction). If the last point of the input sequence is t, then the first point of the output sequence is t + gap + 1. The parameter makes it possible to predict a sequence that starts after an additional time interval.

step : int, default=1

Sample generation step. If the first sample (sequence) of the train set starts at point t, the second sample starts at t + step when intersection=True; otherwise the same rule applies, but samples do not share series values.

intersection : bool, default=True

Whether the same point in time may appear in different samples (sequences) within the train set and, separately, within the test set. If True, the train and test sets never have common time points.

test_size : float, int, timestamp for df, or list of timestamps, default=None
The size of the test set.
  • If float, should be between 0.0 and 1.0 and represents the proportion of the dataset to include in the test split.

  • If int, represents the absolute number of test samples.

  • If 0, the X, y values are returned in X_train, y_train.

  • If a timestamp t, X_test is built from df[t:].

  • If a list of timestamps [t1, t2], X_test is built from df[t1:t2].

  • If None, the value is set to the complement of the train size. If train_size is also None, it is set to 0.25.

train_size : float, int, timestamp for df, or list of timestamps, default=None
The size of the train set.
  • If float, should be between 0.0 and 1.0 and represents the proportion of the dataset to include in the train split.

  • If int, represents the absolute number of train samples.

  • If 0, the X, y values are returned in X_test, y_test.

  • If a timestamp t, X_train is built from df[:t].

  • If a list of timestamps [t1, t2], X_train is built from df[t1:t2].

  • If None, the value is automatically set to the complement of the test size.

what_to_shuffle : {'nothing', 'all', 'train'}, str, default='train'
  • If 'train', only X_train and y_train are randomly shuffled. Test samples take no part in the shuffle, so any sample from X_test is later than any sample from X_train. The same holds for y_test and y_train, respectively.

  • If 'all', all samples are shuffled, by analogy with sklearn.model_selection.train_test_split.

  • If 'nothing', no shuffle is performed.

random_state : int, RandomState instance or None, default=None

Controls the shuffling applied to the data before applying the split. Pass an int for reproducible output across multiple function calls. See Glossary.

Returns:
(X_train, X_test, y_train, y_test) : tuple

Tuple containing the train-test split of inputs.
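
How len_seq, points_ahead, gap and step interact can be sketched as a sliding window followed by a chronological cut. This is only an illustration: make_sequences is a hypothetical helper, it handles neither intersection nor shuffling nor timestamp-based sizes, and the 0.75 train fraction is an assumption chosen to mirror the 0.25 test default.

```python
import numpy as np
import pandas as pd


def make_sequences(df, len_seq, points_ahead=1, gap=0, step=1):
    """Cut df into (input window, target window) pairs."""
    X, y = [], []
    last_start = len(df) - len_seq - gap - points_ahead
    for start in range(0, last_start + 1, step):
        end = start + len_seq   # position right after the last input point t
        X.append(df.iloc[start:end])
        # The target starts at t + gap + 1 and covers points_ahead rows.
        y.append(df.iloc[end + gap:end + gap + points_ahead])
    return X, y


index = pd.date_range("2024-01-01", periods=10, freq="D")
df = pd.DataFrame({"value": np.arange(10)}, index=index)
X, y = make_sequences(df, len_seq=3, points_ahead=1, gap=0, step=2)

# Chronological cut: the last quarter of the samples becomes the test set.
split = int(len(X) * 0.75)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]
print(len(X_train), len(X_test))
```

Because the cut is made after the windows are built, every test sample starts later than every train sample, although neighboring windows can still share points in this simplified version.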

tsad.utils.trainTestSplitting.ts_train_test_split_dfs(dfs, len_seq, points_ahead=1, gap=0, step=1, intersection=True, test_size=None, train_size=None, random_state=None, what_to_shuffle='train')[source]

An auxiliary function that eliminates duplication.

Parameters:
params : see ts_train_test_split

tsad.utils.visualization module

Module contents