[1]:

import sys
import pandas as pd

sys.path.insert(1, '../')

Let’s demonstrate how to import the datasets and check some basic information about them. To begin with, we can list all the available datasets:

[2]:

from tsad.base.datasets import list_of_datasets

list_of_datasets()

[2]:

{'Combines state monitoring': 'load_combines()',
 'SKAB (skoltech anomaly benchmark) teaser': 'load_skab_teaser()',
 'SKAB (skoltech anomaly benchmark)': 'load_skab()',
 'NASA Turbofan Jet Engine Data Set': 'load_turbofan_jet_engine()',
 'TEP (Tennessee Eastman process)': 'load_tep()',
 'Pressurized Water Reactor (PWR) Dataset for Fault Detection': 'load_pwr_anomalies()',
 'NPP Power Transformer RUL': 'load_transformer_rul()'}

In this dictionary, the keys are the dataset names and the values are the loader function calls that return the corresponding datasets. Let’s try them out.

The docstring of each loader links to a detailed dataset description, if one exists.
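Because the values are strings such as 'load_skab()', a loader can also be looked up from a dataset name. The sketch below is only an illustration: it copies part of the dictionary shown above as a literal so that it runs without tsad installed, and `loader_name` is a hypothetical helper, not part of the library.

```python
# Stand-in for the dictionary returned by list_of_datasets() above
# (abbreviated to three entries).
datasets = {
    "Combines state monitoring": "load_combines()",
    "SKAB (skoltech anomaly benchmark)": "load_skab()",
    "TEP (Tennessee Eastman process)": "load_tep()",
}


def loader_name(call_string: str) -> str:
    """Turn a value such as 'load_skab()' into the bare function name."""
    if call_string.endswith("()"):
        return call_string[:-2]
    return call_string


for title, call in datasets.items():
    print(f"{title}: {loader_name(call)}")

# With tsad installed, the function itself could then be fetched by name,
# since the loaders are imported from tsad.base.datasets in this notebook:
#   import tsad.base.datasets as tsad_datasets
#   load = getattr(tsad_datasets, loader_name(call))
#   dataset = load()
```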

Combines state monitoring dataset

Importing

[3]:

from tsad.base.datasets import load_combines

dataset = load_combines()

Dataset info

[4]:

print(f"Dataset's name: {dataset.name}\n")
print(f"Dataset's description: {dataset.description}\n")
print(f"Task to solve with dataset: {dataset.task}\n")
print(f"Dataset's features: {dataset.feature_names}\n")
print(f"Dataset's target: {dataset.target_names}")

Dataset's name: Combines state monitoring

Dataset's description:

Task to solve with dataset:

Dataset's features: ['Anker', 'Cut', 'Go', 'Uncert']

Dataset's target: None
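The same five fields are printed for every dataset in this notebook, so the cell can be wrapped in a small function. This is a minimal sketch: `describe` is a hypothetical helper, and the `SimpleNamespace` object only mimics the attribute names used above so that the snippet runs without tsad.

```python
from types import SimpleNamespace


def describe(dataset) -> str:
    """Collect the metadata fields printed above into one string."""
    lines = [
        f"Dataset's name: {dataset.name}",
        f"Dataset's description: {dataset.description}",
        f"Task to solve with dataset: {dataset.task}",
        f"Dataset's features: {dataset.feature_names}",
        f"Dataset's target: {dataset.target_names}",
    ]
    return "\n\n".join(lines)


# Stand-in object with the same attribute names as the loaded dataset;
# with the library installed, pass the result of load_combines() instead.
demo = SimpleNamespace(
    name="Combines state monitoring",
    description="",
    task="",
    feature_names=["Anker", "Cut", "Go", "Uncert"],
    target_names=None,
)
print(describe(demo))
```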

Dataset

[5]:

dataset.frame.head(2)

[5]:

Описание	Anker	Cut	Go	Uncert
Время
2023-04-21 13:32:48.228	0.0	NaN	NaN	NaN
2023-04-21 13:32:48.230	NaN	NaN	0.0	NaN

SKAB (skoltech anomaly benchmark) teaser

Importing

[6]:

from tsad.base.datasets import load_skab_teaser

dataset = load_skab_teaser()

Dataset info

[7]:

print(f"Dataset's name: {dataset.name}\n")
print(f"Dataset's description: {dataset.description}\n")
print(f"Task to solve with dataset: {dataset.task}\n")
print(f"Dataset's features: {dataset.feature_names}\n")
print(f"Dataset's target: {dataset.target_names}")

Dataset's name: SKAB (skoltech anomaly benchmark) teaser

Dataset's description: Dataset for process monitoring (changepoint detection) benchmarking. It is just a short version (teaser) of SKAB

Task to solve with dataset: Process monitoring (changepoint detection)

Dataset's features: ['Accelerometer1RMS', 'Accelerometer2RMS', 'Current', 'Pressure', 'Temperature', 'Thermocouple', 'Voltage', 'Volume Flow RateRMS']

Dataset's target: None

Dataset

The dataset comes with separate markup (labels), so dataset.frame holds two items. The data itself:

[8]:

dataset.frame[0].head(2)

[8]:

id	Accelerometer1RMS	Accelerometer2RMS	Current	Pressure	Temperature	Thermocouple	Voltage	Volume Flow RateRMS
datetime
2019-07-08 17:02:14	0.04283	0.080612	0.000749	-0.273216	28.2099	23.4457	252.743	37.0242
2019-07-08 17:02:32	0.04333	0.084116	0.000849	0.054711	28.3486	23.4492	240.488	37.0000

Labels:

[9]:

dataset.frame[1]

[9]:

[('2019-07-08 18:39:22', '2019-07-08 18:42:32'),
 ('2019-07-08 18:44:36', '2019-07-08 18:46:51'),
 ('2019-07-08 19:06:57', '2019-07-08 19:11:31'),
 ('2019-07-08 19:14:40', '2019-07-08 19:21:16')]
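The labels are stored as (start, end) pairs rather than as a column, so they have to be projected onto the time index before they can be compared with per-row predictions. The sketch below shows one way to do that with pandas. The timestamps in `index` are made up to stand in for dataset.frame[0].index, and treating both interval bounds as inclusive is an assumption.

```python
import pandas as pd

# Two of the labelled intervals from the output above.
intervals = [
    ("2019-07-08 18:39:22", "2019-07-08 18:42:32"),
    ("2019-07-08 18:44:36", "2019-07-08 18:46:51"),
]

# Hypothetical timestamps standing in for dataset.frame[0].index.
index = pd.to_datetime(
    [
        "2019-07-08 18:39:00",
        "2019-07-08 18:40:00",
        "2019-07-08 18:43:00",
        "2019-07-08 18:45:00",
    ]
)

# Mark every timestamp that falls inside any interval (bounds inclusive).
labels = pd.Series(0, index=index, name="in_interval")
for start, end in intervals:
    inside = (index >= pd.Timestamp(start)) & (index <= pd.Timestamp(end))
    labels[inside] = 1

print(labels)
```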

SKAB (skoltech anomaly benchmark)

Importing

[10]:

from tsad.base.datasets import load_skab

dataset = load_skab()

Dataset info

[11]:

print(f"Dataset's name: {dataset.name}\n")
print(f"Dataset's description: {dataset.description}\n")
print(f"Task to solve with dataset: {dataset.task}\n")
print(f"Dataset's features: {dataset.feature_names}\n")
print(f"Dataset's target: {dataset.target_names}")

Dataset's name: SKAB (skoltech anomaly benchmark)

Dataset's description: Dataset for process monitoring (changepoint detection) benchmarking

Task to solve with dataset: Process monitoring (changepoint detection)

Dataset's features: ['Accelerometer1RMS', 'Accelerometer2RMS', 'Current', 'Pressure', 'Temperature', 'Thermocouple', 'Voltage', 'Volume Flow RateRMS']

Dataset's target: ['anomaly', 'changepoint']

Dataset

[12]:

dataset.frame.head(2)

[12]:

		Accelerometer1RMS	Accelerometer2RMS	Current	Pressure	Temperature	Thermocouple	Voltage	Volume Flow RateRMS	anomaly	changepoint
experiment	datetime
valve1/6	2020-03-09 12:14:36	0.027429	0.040353	0.77031	0.382638	71.2129	25.0827	219.789	32.0000	0.0	0.0
valve1/6	2020-03-09 12:14:37	0.027269	0.040226	1.09696	0.710565	71.4284	25.0863	233.117	32.0104	0.0	0.0
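The frame is indexed by two levels, experiment and datetime, so a single run can be pulled out by its experiment name. The following sketch rebuilds a tiny frame with the same index layout. The two "valve1/6" rows reuse values from the output above, while the "valve2/1" row and its values are made up for illustration.

```python
import pandas as pd

# Small stand-in with the same (experiment, datetime) index as dataset.frame.
index = pd.MultiIndex.from_tuples(
    [
        ("valve1/6", pd.Timestamp("2020-03-09 12:14:36")),
        ("valve1/6", pd.Timestamp("2020-03-09 12:14:37")),
        ("valve2/1", pd.Timestamp("2020-03-09 13:00:00")),  # made-up row
    ],
    names=["experiment", "datetime"],
)
frame = pd.DataFrame(
    {"Current": [0.77031, 1.09696, 0.5], "anomaly": [0.0, 0.0, 1.0]},
    index=index,
)

# All rows of one experiment; the result is indexed by datetime only.
one_run = frame.xs("valve1/6", level="experiment")

# Number of rows recorded in each experiment.
rows_per_run = frame.groupby(level="experiment").size()

print(one_run)
print(rows_per_run)
```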

NASA Turbofan Jet Engine Data Set

Importing

[13]:

from tsad.base.datasets import load_turbofan_jet_engine

dataset = load_turbofan_jet_engine()

Dataset info

[14]:

print(f"Dataset's name: {dataset.name}\n")
print(f"Dataset's description: {dataset.description}\n")
print(f"Task to solve with dataset: {dataset.task}\n")
print(f"Dataset's features: {dataset.feature_names}\n")
print(f"Dataset's target: {dataset.target_names}")

Dataset's name: NASA Turbofan Jet Engine Data Set

Dataset's description: Dataset includes Run-to-Failure simulated data from turbo fan jet engines. In this dataset the goal is to predict the remaining useful life (RUL) of each engine in the test dataset. RUL is equivalent of number of flights remained for the engine after the last datapoint in the test dataset.
    - In train dataset there are 100 engines. The last cycle for each engine represents the cycle when failure had happened.
    - In test dataset there are 100 engines as well. But this time, failure cycle was not provided.

Task to solve with dataset: Remaining useful life prediction

Dataset's features: ['id', 'cycle', 'setting1', 'setting2', 'setting3', 's1', 's2', 's3', 's4', 's5', 's6', 's7', 's8', 's9', 's10', 's11', 's12', 's13', 's14', 's15', 's16', 's17', 's18', 's19', 's20', 's21']

Dataset's target: ['ttf']

Dataset

The dataset consists of three separate frames: X_train, X_test and y_test.

X_train:

[15]:

dataset.frame[0].head(2)

[15]:

	id	cycle	setting1	setting2	setting3	s1	s2	s3	s4	s5	...	s12	s13	s14	s15	s16	s17	s18	s19	s20	s21
0	1	1	-0.0007	-0.0004	100.0	518.67	641.82	1589.70	1400.60	14.62	...	521.66	2388.02	8138.62	8.4195	0.03	392	2388	100.0	39.06	23.4190
1	1	2	0.0019	-0.0003	100.0	518.67	642.15	1591.82	1403.14	14.62	...	522.28	2388.07	8131.49	8.4318	0.03	392	2388	100.0	39.00	23.4236

2 rows × 26 columns

X_test:

[16]:

dataset.frame[1].head(2)

[16]:

	id	cycle	setting1	setting2	setting3	s1	s2	s3	s4	s5	...	s12	s13	s14	s15	s16	s17	s18	s19	s20	s21
0	1	1	0.0023	0.0003	100.0	518.67	643.02	1585.29	1398.21	14.62	...	521.72	2388.03	8125.55	8.4052	0.03	392	2388	100.0	38.86	23.3735
1	1	2	-0.0027	-0.0003	100.0	518.67	641.71	1588.45	1395.42	14.62	...	522.16	2388.06	8139.62	8.3803	0.03	393	2388	100.0	39.02	23.3916

2 rows × 26 columns

y_test:

[17]:

dataset.frame[2].head(2)

[17]:

	ttf
0	112
1	98
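No y_train is provided, but the description states that the last cycle of each training engine is its failure cycle, so a training target can be derived from X_train. The sketch below works on a made-up miniature of the frame. Naming the new column `ttf` after the test target and defining it as "cycles left until the last recorded cycle" are assumptions, not something the library confirms.

```python
import pandas as pd

# Made-up miniature of X_train: engine 1 runs 3 cycles, engine 2 runs 2.
x_train = pd.DataFrame({"id": [1, 1, 1, 2, 2], "cycle": [1, 2, 3, 1, 2]})

# Last recorded cycle of each engine, broadcast back to every row.
last_cycle = x_train.groupby("id")["cycle"].transform("max")

# Cycles remaining until failure; 0 on the failure cycle itself.
x_train["ttf"] = last_cycle - x_train["cycle"]

print(x_train)
```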

TEP (Tennessee Eastman process)

Importing

[18]:

from tsad.base.datasets import load_tep

dataset = load_tep()

Dataset info

[19]:

print(f"Dataset's name: {dataset.name}\n")
print(f"Dataset's description: {dataset.description}\n")
print(f"Task to solve with dataset: {dataset.task}\n")
print(f"Dataset's features: {dataset.feature_names}\n")
print(f"Dataset's target: {dataset.target_names}")

Dataset's name: TEP (Tennessee Eastman process)

Dataset's description: Each training data file contains 480 rows and 52 columns and each testing data file contains 960 rows and 52 columns.  An observation vector at a particular time instant is given by x=[XMEAS(1), XMEAS(2), ..., XMEAS(41), XMV(1), ..., XMV(11)]^T where XMEAS(n) is the n-th measured variable and XMV(n) is the n-th manipulated variable.

Task to solve with dataset: Outlier detection

Dataset's features: ['XMEAS(1)', 'XMEAS(2)', 'XMEAS(3)', 'XMEAS(4)', 'XMEAS(5)', 'XMEAS(6)', 'XMEAS(7)', 'XMEAS(8)', 'XMEAS(9)', 'XMEAS(10)', 'XMEAS(11)', 'XMEAS(12)', 'XMEAS(13)', 'XMEAS(14)', 'XMEAS(15)', 'XMEAS(16)', 'XMEAS(17)', 'XMEAS(18)', 'XMEAS(19)', 'XMEAS(20)', 'XMEAS(21)', 'XMEAS(22)', 'XMEAS(23)', 'XMEAS(24)', 'XMEAS(25)', 'XMEAS(26)', 'XMEAS(27)', 'XMEAS(28)', 'XMEAS(29)', 'XMEAS(30)', 'XMEAS(31)', 'XMEAS(32)', 'XMEAS(33)', 'XMEAS(34)', 'XMEAS(35)', 'XMEAS(36)', 'XMEAS(37)', 'XMEAS(38)', 'XMEAS(39)', 'XMEAS(40)', 'XMEAS(41)', 'XMV(1)', 'XMV(2)', 'XMV(3)', 'XMV(4)', 'XMV(5)', 'XMV(6)', 'XMV(7)', 'XMV(8)', 'XMV(9)', 'XMV(10)', 'XMV(11)']

Dataset's target: None

Dataset

[20]:

dataset.frame.head(2)

[20]:

		XMEAS(1)	XMEAS(2)	XMEAS(3)	XMEAS(4)	XMEAS(5)	XMEAS(6)	XMEAS(7)	XMEAS(8)	XMEAS(9)	XMEAS(10)	...	XMV(2)	XMV(3)	XMV(4)	XMV(5)	XMV(6)	XMV(7)	XMV(8)	XMV(9)	XMV(10)	XMV(11)
experiment	index
1	0	0.25025	3657.2	4520.1	9.3965	26.715	42.191	2704.5	74.593	120.42	0.33701	...	53.850	24.670	61.839	22.101	40.078	33.041	48.969	47.459	41.841	18.049
1	1	0.25135	3662.1	4532.3	9.4020	26.644	42.812	2704.9	75.044	120.39	0.33723	...	53.705	24.562	61.348	22.264	40.050	39.154	49.870	47.403	41.188	18.008

2 rows × 52 columns
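The 52 columns follow the layout given in the description: 41 measured variables followed by 11 manipulated variables. As a quick illustration, the column names can be regenerated from that rule and used to check the order of a loaded frame. The commented comparison at the end assumes tsad is installed.

```python
# 41 measured variables followed by 11 manipulated variables.
feature_names = [f"XMEAS({i})" for i in range(1, 42)] + [
    f"XMV({i})" for i in range(1, 12)
]

print(len(feature_names))
print(feature_names[:2], feature_names[-1])

# With the dataset loaded, the generated list could be compared with it:
#   assert feature_names == list(dataset.feature_names)
```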

Pressurized Water Reactor (PWR) Dataset for Fault Detection

Importing

[21]:

from tsad.base.datasets import load_pwr_anomalies

dataset = load_pwr_anomalies()

Dataset info

[22]:

print(f"Dataset's name: {dataset.name}\n")
print(f"Dataset's description: {dataset.description}\n")
print(f"Task to solve with dataset: {dataset.task}\n")
print(f"Dataset's features: {dataset.feature_names}\n")
print(f"Dataset's target: {dataset.target_names}")

Dataset's name: Pressurized Water Reactor (PWR) Dataset for Fault Detection

Dataset's description: Our collected dataset is benchmark data in case of reactor abnormalities detection with labels. There are 267 readings from 14 sensors of three categories: a temperature sensor, pressure sensor, and vibration sensor (including ionization chamber, accelerometer, and relative displacement sensors). This particular dataset can be utilized in the case of unsupervised abnormality detection.

Task to solve with dataset: Anomaly detection

Dataset's features: ['Temperature', 'Pressure', 'Flow1', 'Flow2', 'VRR12', 'VRR22', 'VRR23', 'VRR33', 'VRS01', 'VRS03', 'VRS21', 'VRS31', 'VRS02', 'VRI01', 'VRI02', 'VRI03']

Dataset's target: None

Dataset

[23]:

dataset.frame.head(2)

[23]:

	Temperature	Pressure	Flow1	Flow2	VRR12	VRR22	VRR23	VRR33	VRS01	VRS03	VRS21	VRS31	VRS02	VRI01	VRI02	VRI03
Readings
1	248.852987	9.689813	4462.130014	13302.9265	19.060938	0.059119	0.050589	0.111864	0.033951	0.047812	0.232627	0.253775	0.400726	1.763223	0.003031	0.004995
2	269.315740	1.279532	4480.252595	13784.45225	19.062128	0.059089	0.048788	0.111340	0.034060	0.052611	0.233342	0.315067	0.128517	1.769272	0.003164	0.004999

NPP Power Transformer RUL

Importing

[3]:

from tsad.base.datasets import load_transformer_rul

dataset = load_transformer_rul()

Dataset info

[4]:

print(f"Dataset's name: {dataset.name}\n")
print(f"Dataset's description: {dataset.description}\n")
print(f"Task to solve with dataset: {dataset.task}\n")
print(f"Dataset's features: {dataset.feature_names}\n")
print(f"Dataset's target: {dataset.target_names}")

Dataset's name: NPP Power Transformer RUL

Dataset's description: Dataset for Determining the Remaining Useful Life of Transformers. It is necessary to create a mathematical model that will determine RUL by the final 420 points. The period between time points is 12 hours.

Task to solve with dataset: Remaining useful life prediction

Dataset's features: ['H2', 'CO', 'C2H4', 'C2H2']

Dataset's target: ['predicted']

Dataset

The dataset consists of four separate files holding the X_train, X_test, y_train and y_test sets.

X_train:

[5]:

dataset.frame[0].head(2)

[5]:

		H2	CO	C2H4	C2H2
id	time point
2_trans_497.csv	0	0.001202	0.029565	0.001069	0.000251
2_trans_497.csv	1	0.001202	0.029563	0.001068	0.000251

X_test:

[6]:

dataset.frame[1].head(2)

[6]:

		H2	CO	C2H4	C2H2
id	time point
2_trans_1853.csv	0	0.001664	0.026699	0.003253	0.000104
2_trans_1853.csv	1	0.001664	0.026705	0.003253	0.000104

y_train:

[7]:

dataset.frame[2].head(2)

[7]:

	predicted
id
2_trans_497.csv	550
2_trans_483.csv	1093

y_test:

[8]:

dataset.frame[3].head(2)

[8]:

	predicted
id
2_trans_1853.csv	693
2_trans_1106.csv	1093
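The description mentions a window of 420 points spaced 12 hours apart. The sketch below works out the time span of such a window and shows how the last n points of each transformer could be selected from a frame indexed by (id, time point). The frame here is a made-up miniature, and reading "the final 420 points" as the last 420 rows per id is an assumption.

```python
import pandas as pd

N_POINTS = 420   # window length named in the description
STEP_HOURS = 12  # spacing between time points named in the description

# Time span covered by one window.
window = pd.Timedelta(hours=N_POINTS * STEP_HOURS)
print(window.days)

# Made-up miniature with the same (id, time point) index layout as X_train.
index = pd.MultiIndex.from_product(
    [["a.csv", "b.csv"], range(5)], names=["id", "time point"]
)
x = pd.DataFrame({"H2": range(10)}, index=index)

# Keep only the final n points of each transformer
# (n=2 for this toy frame, N_POINTS for the real data).
tail = x.groupby(level="id").tail(2)
print(tail)
```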
