[1]:
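import sys
import pandas as pd
# Make the local tsad package importable (assumes this notebook sits one level below the repository root)
sys.path.insert(1, '../')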
Let’s demonstrate in this notebook how to load the datasets bundled with tsad and check some basic information about them. To begin with, let’s list all the available datasets:
[2]:
from tsad.base.datasets import list_of_datasets
list_of_datasets()
[2]:
{'Combines state monitoring': 'load_combines()',
'SKAB (skoltech anomaly benchmark) teaser': 'load_skab_teaser()',
'SKAB (skoltech anomaly benchmark)': 'load_skab()',
'NASA Turbofan Jet Engine Data Set': 'load_turbofan_jet_engine()',
'TEP (Tennessee Eastman process)': 'load_tep()',
'Pressurized Water Reactor (PWR) Dataset for Fault Detection': 'load_pwr_anomalies()',
'NPP Power Transformer RUL': 'load_transformer_rul()'}
In this dictionary, the keys are the dataset names and the values are the loader functions (importable from tsad.base.datasets) that return each dataset. Let’s try them out.
The docstring of each loader contains a link to a detailed description of the dataset, where one exists.
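Because the values in this dictionary are strings such as `'load_skab()'` rather than the functions themselves, a small helper can turn them into callables. This is a sketch under assumptions: `loader_name` and `resolve_loader` are hypothetical helpers, not part of tsad, and the module path `tsad.base.datasets` is taken from the imports used in this notebook.

```python
import importlib
from typing import Callable

# Copy of the catalogue printed by list_of_datasets() above.
CATALOG = {
    "Combines state monitoring": "load_combines()",
    "SKAB (skoltech anomaly benchmark) teaser": "load_skab_teaser()",
    "SKAB (skoltech anomaly benchmark)": "load_skab()",
    "NASA Turbofan Jet Engine Data Set": "load_turbofan_jet_engine()",
    "TEP (Tennessee Eastman process)": "load_tep()",
    "Pressurized Water Reactor (PWR) Dataset for Fault Detection": "load_pwr_anomalies()",
    "NPP Power Transformer RUL": "load_transformer_rul()",
}


def loader_name(call: str) -> str:
    """Strip the trailing '()' from a catalogue value, e.g. 'load_skab()' -> 'load_skab'."""
    call = call.strip()
    return call[:-2] if call.endswith("()") else call


def resolve_loader(call: str, module_name: str = "tsad.base.datasets") -> Callable:
    """Hypothetical helper: import the module and fetch the loader by name."""
    module = importlib.import_module(module_name)
    return getattr(module, loader_name(call))
```

With tsad importable, `resolve_loader(CATALOG["SKAB (skoltech anomaly benchmark)"])()` should then behave like calling `load_skab()` directly.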
Combines state monitoring dataset
Importing
[3]:
from tsad.base.datasets import load_combines
dataset = load_combines()
Dataset info
[4]:
print(f"Dataset's name: {dataset.name}\n")
print(f"Dataset's description: {dataset.description}\n")
print(f"Task to solve with dataset: {dataset.task}\n")
print(f"Dataset's features: {dataset.feature_names}\n")
print(f"Dataset's target: {dataset.target_names}")
Dataset's name: Combines state monitoring
Dataset's description:
Task to solve with dataset:
Dataset's features: ['Anker', 'Cut', 'Go', 'Uncert']
Dataset's target: None
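The same five print calls are repeated for every dataset in this notebook, so they could be folded into one function. A minimal sketch, assuming only the attributes used above (`name`, `description`, `task`, `feature_names`, `target_names`); `describe` is a hypothetical helper, and the `SimpleNamespace` stands in for a real dataset object.

```python
from types import SimpleNamespace


def describe(dataset) -> str:
    """Build the summary printed above as a single string, one blank line between fields."""
    lines = [
        f"Dataset's name: {dataset.name}",
        f"Dataset's description: {dataset.description}",
        f"Task to solve with dataset: {dataset.task}",
        f"Dataset's features: {dataset.feature_names}",
        f"Dataset's target: {dataset.target_names}",
    ]
    return "\n\n".join(lines)


# Stand-in with the attribute values reported above; with tsad, pass load_combines() instead.
example = SimpleNamespace(
    name="Combines state monitoring",
    description="",
    task="",
    feature_names=["Anker", "Cut", "Go", "Uncert"],
    target_names=None,
)
print(describe(example))
```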
Dataset
[5]:
dataset.frame.head(2)
[5]:
| Описание (Description) | Anker | Cut | Go | Uncert |
|---|---|---|---|---|
| Время (Time) | | | | |
| 2023-04-21 13:32:48.228 | 0.0 | NaN | NaN | NaN |
| 2023-04-21 13:32:48.230 | NaN | NaN | 0.0 | NaN |
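In the two rows shown, each timestamp carries a value in only one column and the others are NaN. If that pattern holds for the whole frame (an assumption based only on these two rows), the frame reads like an event log, and a forward fill gives the last known value of every signal at each timestamp. A minimal pandas sketch on a hand-built copy of the two rows:

```python
import numpy as np
import pandas as pd

# Hand-built copy of the two rows displayed above.
index = pd.to_datetime(["2023-04-21 13:32:48.228", "2023-04-21 13:32:48.230"])
events = pd.DataFrame(
    {
        "Anker": [0.0, np.nan],
        "Cut": [np.nan, np.nan],
        "Go": [np.nan, 0.0],
        "Uncert": [np.nan, np.nan],
    },
    index=index,
)

# Number of signals with a recorded value at each timestamp.
changes_per_row = events.notna().sum(axis=1)

# Carry the last observed value of each signal forward in time.
state = events.ffill()
print(state)
```

On the real data the same calls would apply to `dataset.frame` directly.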
SKAB (skoltech anomaly benchmark)
Importing
[10]:
from tsad.base.datasets import load_skab
dataset = load_skab()
Dataset info
[11]:
print(f"Dataset's name: {dataset.name}\n")
print(f"Dataset's description: {dataset.description}\n")
print(f"Task to solve with dataset: {dataset.task}\n")
print(f"Dataset's features: {dataset.feature_names}\n")
print(f"Dataset's target: {dataset.target_names}")
Dataset's name: SKAB (skoltech anomaly benchmark)
Dataset's description: Dataset for process monitoring (changepoint detection) benchmarking
Task to solve with dataset: Process monitoring (changepoint detection)
Dataset's features: ['Accelerometer1RMS', 'Accelerometer2RMS', 'Current', 'Pressure', 'Temperature', 'Thermocouple', 'Voltage', 'Volume Flow RateRMS']
Dataset's target: ['anomaly', 'changepoint']
Dataset
[12]:
dataset.frame.head(2)
[12]:
| experiment | datetime | Accelerometer1RMS | Accelerometer2RMS | Current | Pressure | Temperature | Thermocouple | Voltage | Volume Flow RateRMS | anomaly | changepoint |
|---|---|---|---|---|---|---|---|---|---|---|---|
| valve1/6 | 2020-03-09 12:14:36 | 0.027429 | 0.040353 | 0.77031 | 0.382638 | 71.2129 | 25.0827 | 219.789 | 32.0000 | 0.0 | 0.0 |
| valve1/6 | 2020-03-09 12:14:37 | 0.027269 | 0.040226 | 1.09696 | 0.710565 | 71.4284 | 25.0863 | 233.117 | 32.0104 | 0.0 | 0.0 |
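Since the dataset object reports both `feature_names` and `target_names`, and the frame is indexed by experiment and timestamp, one way to prepare it is to split features from labels and handle each experiment as its own time series. The sketch below rebuilds the two rows shown above so it runs without tsad; with the real data, `frame`, `feature_names` and `target_names` would come from `dataset`.

```python
import pandas as pd

# Hand-built copy of the two SKAB rows shown above, with the same two-level index.
index = pd.MultiIndex.from_tuples(
    [
        ("valve1/6", pd.Timestamp("2020-03-09 12:14:36")),
        ("valve1/6", pd.Timestamp("2020-03-09 12:14:37")),
    ],
    names=["experiment", "datetime"],
)
feature_names = ["Accelerometer1RMS", "Accelerometer2RMS", "Current", "Pressure",
                 "Temperature", "Thermocouple", "Voltage", "Volume Flow RateRMS"]
target_names = ["anomaly", "changepoint"]
frame = pd.DataFrame(
    [
        [0.027429, 0.040353, 0.77031, 0.382638, 71.2129, 25.0827, 219.789, 32.0000, 0.0, 0.0],
        [0.027269, 0.040226, 1.09696, 0.710565, 71.4284, 25.0863, 233.117, 32.0104, 0.0, 0.0],
    ],
    index=index,
    columns=feature_names + target_names,
)

# Features and labels, selected with the names reported by the dataset object.
X = frame[feature_names]
y = frame[target_names]

# Each experiment is a separate time series; process them one at a time.
for experiment, x_exp in X.groupby(level="experiment"):
    print(experiment, x_exp.shape)
```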
NASA Turbofan Jet Engine Data Set
Importing
[13]:
from tsad.base.datasets import load_turbofan_jet_engine
dataset = load_turbofan_jet_engine()
Dataset info
[14]:
print(f"Dataset's name: {dataset.name}\n")
print(f"Dataset's description: {dataset.description}\n")
print(f"Task to solve with dataset: {dataset.task}\n")
print(f"Dataset's features: {dataset.feature_names}\n")
print(f"Dataset's target: {dataset.target_names}")
Dataset's name: NASA Turbofan Jet Engine Data Set
Dataset's description: Dataset includes Run-to-Failure simulated data from turbo fan jet engines. In this dataset the goal is to predict the remaining useful life (RUL) of each engine in the test dataset. RUL is equivalent of number of flights remained for the engine after the last datapoint in the test dataset.
- In train dataset there are 100 engines. The last cycle for each engine represents the cycle when failure had happened.
- In test dataset there are 100 engines as well. But this time, failure cycle was not provided.
Task to solve with dataset: Remaining useful life prediction
Dataset's features: ['id', 'cycle', 'setting1', 'setting2', 'setting3', 's1', 's2', 's3', 's4', 's5', 's6', 's7', 's8', 's9', 's10', 's11', 's12', 's13', 's14', 's15', 's16', 's17', 's18', 's19', 's20', 's21']
Dataset's target: ['ttf']
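Dataset
The dataset comes as three separate frames, X_train, X_test and y_test, available as dataset.frame[0], dataset.frame[1] and dataset.frame[2]. There is no separate y_train: as the description above states, the last cycle of each training engine is its failure cycle.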
X_train:
[15]:
dataset.frame[0].head(2)
[15]:
|  | id | cycle | setting1 | setting2 | setting3 | s1 | s2 | s3 | s4 | s5 | ... | s12 | s13 | s14 | s15 | s16 | s17 | s18 | s19 | s20 | s21 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | -0.0007 | -0.0004 | 100.0 | 518.67 | 641.82 | 1589.70 | 1400.60 | 14.62 | ... | 521.66 | 2388.02 | 8138.62 | 8.4195 | 0.03 | 392 | 2388 | 100.0 | 39.06 | 23.4190 |
| 1 | 1 | 2 | 0.0019 | -0.0003 | 100.0 | 518.67 | 642.15 | 1591.82 | 1403.14 | 14.62 | ... | 522.28 | 2388.07 | 8131.49 | 8.4318 | 0.03 | 392 | 2388 | 100.0 | 39.00 | 23.4236 |
2 rows × 26 columns
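Because the last cycle of each training engine is its failure cycle, training labels can be derived by counting the cycles left until that point. A sketch under that reading of the description; naming the derived column `ttf` to mirror the dataset's target name is an assumption, and `add_train_ttf` is a hypothetical helper.

```python
import pandas as pd


def add_train_ttf(x_train: pd.DataFrame) -> pd.DataFrame:
    """Label each training row with the number of cycles remaining before failure.

    Assumes, per the description, that each engine's last cycle is its failure cycle.
    """
    last_cycle = x_train.groupby("id")["cycle"].transform("max")
    labelled = x_train.copy()
    labelled["ttf"] = last_cycle - labelled["cycle"]
    return labelled


# Toy frame with two engines (not real turbofan data).
toy = pd.DataFrame({"id": [1, 1, 1, 2, 2], "cycle": [1, 2, 3, 1, 2]})
print(add_train_ttf(toy))
```

With the real data, the input would be `dataset.frame[0]`.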
X_test:
[16]:
dataset.frame[1].head(2)
[16]:
|  | id | cycle | setting1 | setting2 | setting3 | s1 | s2 | s3 | s4 | s5 | ... | s12 | s13 | s14 | s15 | s16 | s17 | s18 | s19 | s20 | s21 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 0.0023 | 0.0003 | 100.0 | 518.67 | 643.02 | 1585.29 | 1398.21 | 14.62 | ... | 521.72 | 2388.03 | 8125.55 | 8.4052 | 0.03 | 392 | 2388 | 100.0 | 38.86 | 23.3735 |
| 1 | 1 | 2 | -0.0027 | -0.0003 | 100.0 | 518.67 | 641.71 | 1588.45 | 1395.42 | 14.62 | ... | 522.16 | 2388.06 | 8139.62 | 8.3803 | 0.03 | 393 | 2388 | 100.0 | 39.02 | 23.3916 |
2 rows × 26 columns
y_test:
[17]:
dataset.frame[2].head(2)
[17]:
|  | ttf |
|---|---|
| 0 | 112 |
| 1 | 98 |
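The description defines RUL relative to the last datapoint of each test engine, so evaluation typically pairs each engine's final test row with one y_test value. The sketch below assumes that row i of y_test belongs to engine id i + 1 (ids 1 to N in order); the source does not state this ordering explicitly, and `last_cycle_with_target` is a hypothetical helper.

```python
import pandas as pd


def last_cycle_with_target(x_test: pd.DataFrame, y_test: pd.DataFrame) -> pd.DataFrame:
    """One row per engine: its last observed cycle joined with the true ttf.

    Assumption: row i of y_test corresponds to engine id i + 1.
    """
    last = x_test.sort_values(["id", "cycle"]).groupby("id").last()
    ttf = pd.Series(
        y_test["ttf"].to_numpy(),
        index=pd.RangeIndex(1, len(y_test) + 1, name="id"),
        name="ttf",
    )
    return last.join(ttf)


# Toy data (not the real test split): engine 1 has two cycles, engine 2 has one.
x_toy = pd.DataFrame({"id": [1, 1, 2], "cycle": [1, 2, 1], "s2": [0.1, 0.2, 0.3]})
y_toy = pd.DataFrame({"ttf": [112, 98]})
print(last_cycle_with_target(x_toy, y_toy))
```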
TEP (Tennessee Eastman process)
Importing
[18]:
from tsad.base.datasets import load_tep
dataset = load_tep()
Dataset info
[19]:
print(f"Dataset's name: {dataset.name}\n")
print(f"Dataset's description: {dataset.description}\n")
print(f"Task to solve with dataset: {dataset.task}\n")
print(f"Dataset's features: {dataset.feature_names}\n")
print(f"Dataset's target: {dataset.target_names}")
Dataset's name: TEP (Tennessee Eastman process)
Dataset's description: Each training data file contains 480 rows and 52 columns and each testing data file contains 960 rows and 52 columns. An observation vector at a particular time instant is given by x=[XMEAS(1), XMEAS(2), ..., XMEAS(41), XMV(1), ..., XMV(11)]^T where XMEAS(n) is the n-th measured variable and XMV(n) is the n-th manipulated variable.
Task to solve with dataset: Outlier detection
Dataset's features: ['XMEAS(1)', 'XMEAS(2)', 'XMEAS(3)', 'XMEAS(4)', 'XMEAS(5)', 'XMEAS(6)', 'XMEAS(7)', 'XMEAS(8)', 'XMEAS(9)', 'XMEAS(10)', 'XMEAS(11)', 'XMEAS(12)', 'XMEAS(13)', 'XMEAS(14)', 'XMEAS(15)', 'XMEAS(16)', 'XMEAS(17)', 'XMEAS(18)', 'XMEAS(19)', 'XMEAS(20)', 'XMEAS(21)', 'XMEAS(22)', 'XMEAS(23)', 'XMEAS(24)', 'XMEAS(25)', 'XMEAS(26)', 'XMEAS(27)', 'XMEAS(28)', 'XMEAS(29)', 'XMEAS(30)', 'XMEAS(31)', 'XMEAS(32)', 'XMEAS(33)', 'XMEAS(34)', 'XMEAS(35)', 'XMEAS(36)', 'XMEAS(37)', 'XMEAS(38)', 'XMEAS(39)', 'XMEAS(40)', 'XMEAS(41)', 'XMV(1)', 'XMV(2)', 'XMV(3)', 'XMV(4)', 'XMV(5)', 'XMV(6)', 'XMV(7)', 'XMV(8)', 'XMV(9)', 'XMV(10)', 'XMV(11)']
Dataset's target: None
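The observation vector described above is 41 measured variables followed by 11 manipulated variables, and the feature names encode that split in their prefixes. A small sketch that separates the two groups by name; with the real data, `dataset.feature_names` would replace the generated list and `dataset.frame[measured]` would select the measured block.

```python
# Feature names as listed above: XMEAS(1)..XMEAS(41) followed by XMV(1)..XMV(11).
feature_names = [f"XMEAS({i})" for i in range(1, 42)] + [f"XMV({i})" for i in range(1, 12)]

measured = [name for name in feature_names if name.startswith("XMEAS(")]
manipulated = [name for name in feature_names if name.startswith("XMV(")]
print(len(measured), len(manipulated))  # 41 11
```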
Dataset
[20]:
dataset.frame.head(2)
[20]:
| experiment | index | XMEAS(1) | XMEAS(2) | XMEAS(3) | XMEAS(4) | XMEAS(5) | XMEAS(6) | XMEAS(7) | XMEAS(8) | XMEAS(9) | XMEAS(10) | ... | XMV(2) | XMV(3) | XMV(4) | XMV(5) | XMV(6) | XMV(7) | XMV(8) | XMV(9) | XMV(10) | XMV(11) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 0.25025 | 3657.2 | 4520.1 | 9.3965 | 26.715 | 42.191 | 2704.5 | 74.593 | 120.42 | 0.33701 | ... | 53.850 | 24.670 | 61.839 | 22.101 | 40.078 | 33.041 | 48.969 | 47.459 | 41.841 | 18.049 |
| 1 | 1 | 0.25135 | 3662.1 | 4532.3 | 9.4020 | 26.644 | 42.812 | 2704.9 | 75.044 | 120.39 | 0.33723 | ... | 53.705 | 24.562 | 61.348 | 22.264 | 40.050 | 39.154 | 49.870 | 47.403 | 41.188 | 18.008 |
2 rows × 52 columns
Pressurized Water Reactor (PWR) Dataset for Fault Detection
Importing
[21]:
from tsad.base.datasets import load_pwr_anomalies
dataset = load_pwr_anomalies()
Dataset info
[22]:
print(f"Dataset's name: {dataset.name}\n")
print(f"Dataset's description: {dataset.description}\n")
print(f"Task to solve with dataset: {dataset.task}\n")
print(f"Dataset's features: {dataset.feature_names}\n")
print(f"Dataset's target: {dataset.target_names}")
Dataset's name: Pressurized Water Reactor (PWR) Dataset for Fault Detection
Dataset's description: Our collected dataset is benchmark data in case of reactor abnormalities detection with labels. There are 267 readings from 14 sensors of three categories: a temperature sensor, pressure sensor, and vibration sensor (including ionization chamber, accelerometer, and relative displacement sensors). This particular dataset can be utilized in the case of unsupervised abnormality detection.
Task to solve with dataset: Anomaly detection
Dataset's features: ['Temperature', 'Pressure', 'Flow1', 'Flow2', 'VRR12', 'VRR22', 'VRR23', 'VRR33', 'VRS01', 'VRS03', 'VRS21', 'VRS31', 'VRS02', 'VRI01', 'VRI02', 'VRI03']
Dataset's target: None
Dataset
[23]:
dataset.frame.head(2)
[23]:
| Readings | Temperature | Pressure | Flow1 | Flow2 | VRR12 | VRR22 | VRR23 | VRR33 | VRS01 | VRS03 | VRS21 | VRS31 | VRS02 | VRI01 | VRI02 | VRI03 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 248.852987 | 9.689813 | 4462.130014 | 13302.9265 | 19.060938 | 0.059119 | 0.050589 | 0.111864 | 0.033951 | 0.047812 | 0.232627 | 0.253775 | 0.400726 | 1.763223 | 0.003031 | 0.004995 |
| 2 | 269.315740 | 1.279532 | 4480.252595 | 13784.45225 | 19.062128 | 0.059089 | 0.048788 | 0.111340 | 0.034060 | 0.052611 | 0.233342 | 0.315067 | 0.128517 | 1.769272 | 0.003164 | 0.004999 |
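The description says this dataset suits unsupervised abnormality detection. As one very simple baseline (a generic z-score rule chosen for illustration, not a method from the dataset authors or tsad), rows can be flagged when any sensor lies more than a few standard deviations from its column mean. `zscore_flags` is a hypothetical helper; with the real data it would take `dataset.frame`.

```python
import numpy as np
import pandas as pd


def zscore_flags(frame: pd.DataFrame, threshold: float = 3.0) -> pd.Series:
    """Flag rows where any column is more than `threshold` standard deviations from its mean.

    Constant columns (std == 0) produce NaN z-scores and are skipped by the row-wise max.
    """
    std = frame.std().replace(0, np.nan)
    z = (frame - frame.mean()) / std
    return z.abs().max(axis=1) > threshold


# Synthetic data (not PWR readings): 19 ordinary values and one spike in 'Temperature'.
toy = pd.DataFrame({"Temperature": [1.0] * 19 + [100.0], "Pressure": [5.0] * 20})
flags = zscore_flags(toy)
print(flags[flags].index.tolist())  # [19]
```

The threshold of 3 is a common rule of thumb, not a value tuned for this dataset.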
NPP Power Transformer RUL
Importing
[3]:
from tsad.base.datasets import load_transformer_rul
dataset = load_transformer_rul()
Dataset info
[4]:
print(f"Dataset's name: {dataset.name}\n")
print(f"Dataset's description: {dataset.description}\n")
print(f"Task to solve with dataset: {dataset.task}\n")
print(f"Dataset's features: {dataset.feature_names}\n")
print(f"Dataset's target: {dataset.target_names}")
Dataset's name: NPP Power Transformer RUL
Dataset's description: Dataset for Determining the Remaining Useful Life of Transformers. It is necessary to create a mathematical model that will determine RUL by the final 420 points. The period between time points is 12 hours.
Task to solve with dataset: Remaining useful life prediction
Dataset's features: ['H2', 'CO', 'C2H4', 'C2H2']
Dataset's target: ['predicted']
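Per the description, a model should work from the final 420 points of each series, sampled every 12 hours, so the window spans 420 × 12 = 5040 hours, or 210 days. The sketch below keeps only that final window per transformer, assuming the frame is indexed by (`id`, `time point`) as in the tables that follow; `last_window` is a hypothetical helper and the toy frame is made up.

```python
import pandas as pd

POINTS = 420          # window length from the dataset description
HOURS_PER_POINT = 12  # sampling period from the dataset description
print(POINTS * HOURS_PER_POINT / 24)  # window length in days: 210.0


def last_window(x: pd.DataFrame, n_points: int = POINTS) -> pd.DataFrame:
    """Keep the final n_points rows of every transformer (index level 'id')."""
    return x.sort_index().groupby(level="id").tail(n_points)


# Toy frame (not real gas readings): two transformers with five time points each.
index = pd.MultiIndex.from_product([["a.csv", "b.csv"], range(5)], names=["id", "time point"])
toy = pd.DataFrame({"H2": range(10)}, index=index, dtype=float)
window = last_window(toy, n_points=3)
```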
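Dataset
The dataset comes as four separate files with X_train, X_test, y_train and y_test sets, available as dataset.frame[0], dataset.frame[1], dataset.frame[2] and dataset.frame[3] respectively.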
X_train:
[5]:
dataset.frame[0].head(2)
[5]:
| id | time point | H2 | CO | C2H4 | C2H2 |
|---|---|---|---|---|---|
| 2_trans_497.csv | 0 | 0.001202 | 0.029565 | 0.001069 | 0.000251 |
| 2_trans_497.csv | 1 | 0.001202 | 0.029563 | 0.001068 | 0.000251 |
X_test:
[6]:
dataset.frame[1].head(2)
[6]:
| id | time point | H2 | CO | C2H4 | C2H2 |
|---|---|---|---|---|---|
| 2_trans_1853.csv | 0 | 0.001664 | 0.026699 | 0.003253 | 0.000104 |
| 2_trans_1853.csv | 1 | 0.001664 | 0.026705 | 0.003253 | 0.000104 |
y_train:
[7]:
dataset.frame[2].head(2)
[7]:
| id | predicted |
|---|---|
| 2_trans_497.csv | 550 |
| 2_trans_483.csv | 1093 |
y_test:
[8]:
dataset.frame[3].head(2)
[8]:
| id | predicted |
|---|---|
| 2_trans_1853.csv | 693 |
| 2_trans_1106.csv | 1093 |
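The feature frames are indexed by (`id`, `time point`) while the target frames are indexed by `id` alone, so one quick sanity check before training is that both sides cover the same transformers. A minimal sketch with made-up toy frames; `ids_match` is a hypothetical helper, and with the real data it would be called as `ids_match(dataset.frame[0], dataset.frame[2])` for the training pair.

```python
import pandas as pd


def ids_match(x: pd.DataFrame, y: pd.DataFrame) -> bool:
    """True if the ids in x (index level 'id') are exactly the ids indexing y."""
    return set(x.index.get_level_values("id")) == set(y.index)


# Toy frames mirroring the layout above (values are made up).
x_toy = pd.DataFrame(
    {"H2": [0.1, 0.2, 0.3]},
    index=pd.MultiIndex.from_tuples(
        [("a.csv", 0), ("a.csv", 1), ("b.csv", 0)], names=["id", "time point"]
    ),
)
y_toy = pd.DataFrame({"predicted": [550, 1093]}, index=pd.Index(["a.csv", "b.csv"], name="id"))
print(ids_match(x_toy, y_toy))  # True
```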