Releases: tugraz-isds/systemds
Test RC
SystemDS 0.2.0 (March 24, 2020)
Release Notes
SystemDS 0.2.0 is the second release under the new name after forking from SystemML.
This release includes a wealth of small fixes to accommodate some of the major
features that extend the functionality of the system.
Changes in this release include
- Initial work on federated operations on matrices and tensors, where special instructions push down as much computation as possible to remote workers.
- Tensor operations have been extended.
- Python bindings bridge the gap to make SystemDS available to a greater audience that already has experience or existing code in Python. Initial functionality is there for matrix operations, federated tensors and lineage traces.
- Lineage support has gained caching and reuse functionality. Lineage can also be traced on Spark now.
- Several methods for data cleaning have been implemented. These include a first version of multiple imputation, implemented as multivariate imputation by chained equations (MICE), and support for outlier detection using standard deviation and interquartile range. Additional methods and builtin functions are detectSchema, typeof, hidden Markov models for missing value imputation, and functional dependency discovery.
- A slice finder helps in model debugging.
- Cloud deployment scripts for AWS and scripts to set up and start federated operations.
- More algorithms/methods/builtins (stable marriage, data augmentation, feature hashing, l2svm, msvm, multiLogReg, set intersection, cross-validation, naïve Bayes, is.na/nan/inf, eval fcall, list-entry-removal, GNMF, PNMF).
- Performance improvements (parallel sort, GPU cumulative aggregates, append cbind)
- New rewrites (agg remove empty, lineage, nary plus element-wise operations, eliminate rmEmpty, tsmm/mm over lists of folds)
- New data reader/writer for JSON frames and support for SQL as a data source
- Miscellaneous improvements: compressed matrices ported from SystemML, more documentation, better testing, run/release scripts, bug fixes
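
To illustrate the two outlier detection criteria mentioned in the data cleaning item above, here is a minimal sketch in plain Python. It does not use the SystemDS builtins or the Python bindings; the function names `outliers_by_sd` and `outliers_by_iqr`, the thresholds `k`, and the sample data are assumptions chosen for illustration only.

```python
import statistics


def outliers_by_sd(values, k=3.0):
    """Flag values that lie more than k sample standard deviations from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [abs(v - mean) > k * sd for v in values]


def outliers_by_iqr(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles() sorts internally; n=4 yields the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v < lower or v > upper for v in values]


# Hypothetical column with one suspicious entry at the end.
data = [10.0, 11.0, 9.5, 10.5, 10.2, 9.8, 55.0]
sd_flags = outliers_by_sd(data, k=2.0)
iqr_flags = outliers_by_iqr(data)
```

On small samples a single extreme value inflates the standard deviation, so the standard deviation rule may need a lower `k` than the common default of 3 to flag it; the interquartile range rule is less sensitive to this effect.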
Acknowledgements
Thanks to Enrique Barba Roque, Sebastian Baunsgaard, Matthias Boehm, Mark Dokter, Lukas Erlbacher, Kevin Innerebner, Florijan Klezin, Valentin Leutgeb, Arnab Phani, Benjamin Rath, Svetlana Sagadeeva, Afan Secic, Shafaq Siddiqi, Thomas Wedenig, and Sebastian Wrede for their support in creating the SystemDS 0.2.0 release.
SystemDS 0.1.0 (August 31, 2019)
Release Notes
SystemDS 0.1.0 is the initial public release of SystemDS after it was forked from Apache SystemML in September 2018. It contains a major refactoring and several experimental features that aim at better support for the end-to-end data science lifecycle.
The major changes (compared to SystemML 1.2) and new features are
- New mechanism for DML-bodied (script-level) builtin functions, and selected new built-in functions for data augmentation, outlier detection, data preprocessing, feature engineering, ML algorithms, and model debugging.
- Various compiler and runtime improvements: new and improved IPA rewrites, new libsvm I/O format, reduced Spark context creation, updated native kernel libraries
- New lineage tracing and reuse (lineage tracing, loop lineage deduplication, full and partial reuse of intermediates, serialization and deserialization of lineage traces) [experimental]
- New tensor data model (basic tensors of different value types, data tensors with schema) [experimental]
- Backported SystemML features on cumulative aggregates, various sparsity estimators, and improved transform.
- Removed baggage: MapReduce compiler and runtime backend, pydml parser and language support, Java-UDF framework, script-level debugger
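
The idea behind lineage-based reuse of intermediates, listed above as an experimental feature, can be sketched as follows. This is a conceptual illustration in plain Python, not the SystemDS implementation: the classes `LineageItem` and `LineageCache`, the opcode strings, and the file name `X.csv` are all hypothetical. Each intermediate is identified by the operation that produced it plus the lineage of its inputs, and a cache keyed by that lineage lets a repeated computation be skipped.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass(frozen=True)
class LineageItem:
    """Hypothetical lineage node: an opcode plus the lineage of its inputs."""
    opcode: str
    inputs: Tuple["LineageItem", ...] = ()
    literal: str = ""  # used by leaf nodes, e.g. a file name


class LineageCache:
    """Hypothetical cache that reuses results whose lineage is identical."""

    def __init__(self) -> None:
        self._cache: Dict[LineageItem, Any] = {}
        self.hits = 0

    def execute(self, item: LineageItem, compute: Callable[[], Any]) -> Any:
        # Full reuse: an equal lineage trace means the same result.
        if item in self._cache:
            self.hits += 1
            return self._cache[item]
        value = compute()
        self._cache[item] = value
        return value


cache = LineageCache()
x = LineageItem("read", literal="X.csv")
gram = LineageItem("tsmm", (x,))

first = cache.execute(gram, lambda: "result of t(X) %*% X")
# A structurally equal trace, built independently, hits the cache.
again = cache.execute(
    LineageItem("tsmm", (LineageItem("read", literal="X.csv"),)),
    lambda: "recomputed",
)
```

Because the frozen dataclass compares and hashes by value, two traces built at different points in a script are treated as the same intermediate whenever their operations and inputs match.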
Acknowledgements
Thanks to Iulian Antonov, Matthias Boehm, Mark Dokter, Kevin Innerebner, Philipp Ortner, Arnab Phani, Benjamin Rath for their contributions to SystemDS 0.1.0 as well as the entire Apache SystemML team for the initial code base, documentation, and other resources.