This project is part of the 24-2 Semester Industrial Engineering Capstone PBL course. The experiment focuses on the topic: Ensemble of Sparsely Connected RNN and Autoencoder for Anomaly Detection in Financial Time Series.
The code for both the base and advanced models implemented using the PyTorch framework is available at https://github.com/abcd-EGH/srnn-ae.
To implement the base model, the following paper and GitHub repository were referenced:
Tung Kieu, Bin Yang, Chenjuan Guo, Christian S. Jensen, Outlier Detection for Time Series with Recurrent Autoencoder Ensembles, IJCAI 2019. https://doi.org/10.24963/ijcai.2019/378
https://github.com/tungk/OED
This implementation uses the following publicly available dataset:
- NAB (Numenta Anomaly Benchmark): For anomaly detection in time series data.
To use the dataset, refer to the original paper or download it directly from the source below, e.g., with the Git command:

```bash
git clone https://github.com/numenta/NAB.git
```

The following files from NAB are used:
- realKnownCause/machine_temperature_system_failure.csv
- realAWSCloudwatch/ec2_cpu_utilization_825cc2.csv
- artificialWithAnomaly/art_daily_jumpsup.csv
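For illustration, here is a minimal sketch of loading one of these files and slicing it into sliding windows (using the window_size and stride listed in the hyperparameters below). The helper name and the scaling step are assumptions, not the repository's exact preprocessing; NAB files provide `timestamp` and `value` columns.

```python
import numpy as np
import pandas as pd

def load_windows(csv_path, window_size=36, stride=1):
    """Hypothetical helper: read a NAB series and slice it into sliding windows."""
    df = pd.read_csv(csv_path, parse_dates=["timestamp"])
    values = df["value"].to_numpy(dtype=np.float32)
    # Min-max scaling to [0, 1] (an assumption; the repository may normalize differently).
    values = (values - values.min()) / (values.max() - values.min() + 1e-8)
    windows = [values[i:i + window_size]
               for i in range(0, len(values) - window_size + 1, stride)]
    return np.stack(windows)[..., None]  # shape: (num_windows, window_size, 1)

windows = load_windows("NAB/data/realKnownCause/machine_temperature_system_failure.csv")
print(windows.shape)
```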
The following hyperparameters are used for all models unless noted otherwise:
- N = 10, Number of autoencoders in the ensemble
- input_size = 1, Input size (e.g., single time series)
- hidden_size = 8, Size of the hidden layer in the RNN
- output_size = 1, Output size
- num_layers = 1, Number of RNN layers (not explicitly mentioned in the paper)
- limit_skip_steps = 10, Maximum skip connection length L (randomly chosen between 1 and 10 for each autoencoder)
- learning_rate = 1e-3, Learning rate for the optimizer
- l1_lambda = 1e-5, Regularization parameter for the L1 penalty (1e-3 for the Residual model as an exception)
- window_size = 36, Window size for the time series; 288 points correspond to one day (not explicitly mentioned in the paper)
- num_epochs = 1000, Number of training epochs (not explicitly mentioned in the paper)
- random_seed = 777, Random seed for reproducibility
- stride = 1, Stride of the sliding window
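As a rough illustration of how l1_lambda might enter training, the sketch below adds an L1 penalty on the model weights to an MSE reconstruction loss. This is an assumption about the loss structure, not the repository's exact implementation.

```python
import torch
import torch.nn.functional as F

def reconstruction_loss(model, x, x_hat, l1_lambda=1e-5):
    """Illustrative loss: MSE reconstruction error plus an L1 penalty on all trainable weights."""
    mse = F.mse_loss(x_hat, x)
    l1 = sum(p.abs().sum() for p in model.parameters())
    return mse + l1_lambda * l1
```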
Details of the following models can be found at https://github.com/abcd-EGH/srnn-ae; a minimal sketch of the shared sparse skip connection is given after the list.
- ESLAE (Ensemble of Sparsely connected LSTM and AutoEncoder)
  - Base Model, Sparse Connections
- ERSLAE (Ensemble of Residual & Sparsely connected LSTM and AutoEncoder)
  - Advanced Model, Residual & Sparse Connections
- ECSLAE (Ensemble of Concatenation-based skip (Encoder-Decoder) & Sparsely connected LSTM and AutoEncoder)
  - Advanced Model, Concatenation-based Skip & Sparse Connections
- EVSLAE (Ensemble of Variable-skip (Encoder-Decoder) & Sparsely connected LSTM and AutoEncoder)
  - Advanced Model, Variable Skip & Sparse Connections
- ESBLAE (Ensemble of Sparsely connected Bi-directional LSTM and AutoEncoder)
  - Advanced Model, Bi-directional LSTM
- EASLAE (Ensemble of Attention & Sparsely connected LSTM and AutoEncoder)
  - Advanced Model, Attention
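For context, here is a minimal sketch of the sparse skip connection these models share, loosely following the recurrent autoencoder ensembles of Kieu et al. (2019): at step t, the hidden state from the previous step is mixed with the state from L steps back. The class name, the equal-weight mixing, and the parameter names are illustrative assumptions, not the repository's code.

```python
import torch
import torch.nn as nn

class SkipLSTMEncoder(nn.Module):
    """Illustrative encoder with a skip connection of length L (not the repository's exact code).

    At step t, the previous hidden state h_{t-1} is mixed with the state from
    L steps back, h_{t-L}, before being fed to an LSTMCell.
    """
    def __init__(self, input_size=1, hidden_size=8, skip=3):
        super().__init__()
        self.cell = nn.LSTMCell(input_size, hidden_size)
        self.skip = skip
        self.hidden_size = hidden_size

    def forward(self, x):                      # x: (batch, seq_len, input_size)
        batch, seq_len, _ = x.shape
        h = x.new_zeros(batch, self.hidden_size)
        c = x.new_zeros(batch, self.hidden_size)
        history = []                           # past hidden states for the skip path
        for t in range(seq_len):
            if t >= self.skip:
                # Equal-weight mixing of the recurrent and skip paths (an assumption;
                # the paper and repository may weight or mask these differently).
                h = 0.5 * (h + history[t - self.skip])
            h, c = self.cell(x[:, t, :], (h, c))
            history.append(h)
        return h                               # final hidden state as the encoding
```

In the ensemble, each member would draw its own skip length from 1 to limit_skip_steps, and a decoder with a mirrored structure would reconstruct the window.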
The tables below report Precision, Recall, and F1-Score for each model across the experiments; the final table compares skip lengths L.

Model | Precision | Recall | F1-Score |
---|---|---|---|
Base | 0.7612 | 0.3810 | 0.5078 |
Residual | 0.7683 | 0.3845 | 0.5125 |
Concatenation | 0.7110 | 0.3558 | 0.4743 |
VariableSkip | 0.8035 | 0.4021 | 0.5360 |
Bi-directional | 0.7410 | 0.3708 | 0.4943 |
Attention | 0.5040 | 0.2522 | 0.3362 |
Model | Precision | Recall | F1-Score |
---|---|---|---|
Base | 0.5347 | 0.3149 | 0.3963 |
Residual | 0.5446 | 0.3207 | 0.4037 |
Concatenation | 0.5495 | 0.3236 | 0.4073 |
VariableSkip | 0.5396 | 0.3178 | 0.4000 |
Bi-directional | 0.5347 | 0.3149 | 0.3963 |
Attention | 0.6238 | 0.3673 | 0.4624 |
Model | Precision | Recall | F1-Score |
---|---|---|---|
Base | 0.5743 | 0.2878 | 0.3835 |
Residual | 0.5545 | 0.2779 | 0.3702 |
Concatenation | 0.5842 | 0.2928 | 0.3901 |
VariableSkip | 0.5594 | 0.2804 | 0.3736 |
Bi-directional | 0.5842 | 0.2928 | 0.3901 |
Attention | 0.5050 | 0.2531 | 0.3372 |
Model | Precision | Recall | F1-Score |
---|---|---|---|
L=1 | 0.5644 | 0.2829 | 0.3769 |
L=5 | 0.5545 | 0.2779 | 0.3702 |
L=9 | 0.1386 | 0.0695 | 0.0926 |
In each hypothesis .ipynb file, you can find qualitative results such as reconstructed data, reconstruction errors, AUC-ROC curves, and so on.
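For a rough idea of how such numbers can be derived from reconstruction errors (illustrative only; the notebooks may threshold and aggregate differently), per-point errors can be scored and compared against the NAB labels:

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support, roc_auc_score

def evaluate(errors, labels, quantile=0.95):
    """Illustrative evaluation: errors and labels are per-point arrays aligned to the series.

    The quantile threshold is an assumption, not the notebooks' exact rule.
    """
    scores = (errors - errors.min()) / (errors.max() - errors.min() + 1e-8)
    preds = (scores > np.quantile(scores, quantile)).astype(int)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="binary", zero_division=0)
    return precision, recall, f1, roc_auc_score(labels, scores)
```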
- Residual
  - Observation: The output closely follows the shape of the input data, achieving high anomaly detection performance (hereafter "performance") even for highly irregular time series.
  - Interpretation: The residual connection alleviated the vanishing gradient problem while enhancing generalization performance.
- Concatenation-based skip
  - Observation: Lower performance for highly irregular time series, but higher performance for time series with consistent amplitude.
  - Interpretation: The decoder could learn not only the encoder's compressed representation but also information from specific time steps, enabling it to capture large, periodic variations effectively.
- Variable skip
  - Observation: Strong performance on both highly irregular time series and time series with large, consistent amplitudes.
  - Interpretation: Distributing the skip length L evenly across the autoencoders (from 1 to n) improved generalization performance.
- Bi-directional & Attention
  - Observation: Neither achieved consistently high performance overall.
  - Bi-directional: Bidirectional learning tended to capture unnecessary information in irregular time series, leading to vanishing gradient issues.
  - Attention: For irregular time series, attention weights were assigned to noise, while in simpler time series the model struggled to identify the "important parts" to learn from.
- Skip connection length L
  - Short L: insufficient learning of time series information.
  - Long L: better learning of long-term information and periodic patterns.
  - Observation: A shorter L resulted in output data closely following the shape of the input data, with better performance overall.
- Limitation: Constructing the ensemble model as a single class (i.e., all autoencoders in one module) led to the following issues:
- Interdependence during Gradient Descent: The weights of each model influenced the learning process of other models.
- Interference during Backpropagation: Overlapping gradients caused interference in the learning paths of individual models.
- Proposed improvement: Train models with different L values independently and use only their averaged outputs for prediction. This avoids interdependence and interference during the learning process (see the sketch below).
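A minimal sketch of that proposal, with illustrative names and a deliberately simple training loop (full-batch, no early stopping), not the repository's code: each autoencoder gets its own optimizer and is trained in isolation, and only the averaged reconstruction is used at inference.

```python
import torch

def train_independently(autoencoders, windows, num_epochs=1000, lr=1e-3):
    """Each ensemble member has its own optimizer, so no gradients flow between members."""
    for model in autoencoders:
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        for _ in range(num_epochs):
            opt.zero_grad()
            loss = torch.nn.functional.mse_loss(model(windows), windows)
            loss.backward()
            opt.step()

@torch.no_grad()
def ensemble_predict(autoencoders, windows):
    # Only the averaged reconstruction of all members is used for anomaly scoring.
    return torch.stack([model(windows) for model in autoencoders]).mean(dim=0)
```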
- Kieu, T., Yang, B., Guo, C., & Jensen, C. S. (2019). Outlier detection for time series with recurrent autoencoder ensembles. Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI), 2725–2732. https://doi.org/10.24963/ijcai.2019/378
- He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770–778. https://doi.org/10.1109/CVPR.2016.90
- Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 234–241. https://doi.org/10.1007/978-3-319-24574-4_28
- Schuster, M., & Paliwal, K. K. (1997). Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45(11), 2673–2681. https://doi.org/10.1109/78.650093
- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems (NeurIPS), 30, 5998–6008. https://arxiv.org/abs/1706.03762