Location: Köln, North Rhine-Westphalia, Germany
Email: arijitd@gmail.com
Phone: +49 17684376170
LinkedIn: linkedin.com/in/arijitdas1986
GitHub: github.com/das-projects
Data Scientist and Research Engineer with over 10 years of experience in machine learning, computational biology, and statistical data analysis. Proven track record of developing and deploying advanced models for intelligent document processing, automated medical screening, and signal processing. Expertise in Python, PyTorch, and various MLOps tools, with a strong background in managing large-scale datasets and implementing fairness-aware algorithms. Recognized for improving system performance, reducing bias in automated decision-making, and enhancing the accuracy of predictive models. Seeking to leverage my skills and experience to drive innovation and efficiency in a dynamic research or industry setting.
ERGO Group AG
March 2022 - Present
Düsseldorf, North Rhine-Westphalia, Germany
- Developed and deployed 5+ models into production for large-scale Intelligent Document Processing, automating 40% of 100 million documents that were previously manually processed.
- Led the development of 3+ prototypes and MVPs for Retrieval-Augmented Generation (RAG) using AWS and Azure, achieving over 95% retrieval accuracy.
- Designing an MLOps framework to automate the deployment of classification and extraction models, reducing deployment time by 50%.
- Modernized the software stack to Python (PyTorch, PyTorch Lightning, MLflow, FastAPI) and JavaScript (Next.js, Svelte) to enhance pre-processing, training/validation, experiment management, and server deployment, leading to a 25% improvement in system performance.
Institute and Faculty of Actuaries
August 2021 - December 2023
- Spearheaded the development and implementation of fairness-aware algorithms, decreasing bias in automated decision-making processes by 25%, enhancing ethical standards across 10 institutions, and improving decision accuracy by 20% through comprehensive data analysis and cross-disciplinary collaboration.
- Published a comprehensive paper in the British Actuarial Journal titled "From Bias to Black Boxes: Understanding and Managing the Risks of AI," influencing industry standards and practices in managing AI risks.
Uniklinik Köln
April 2019 - December 2021
Köln, North Rhine-Westphalia, Germany
- Secured a Köln Fortune Research Grant of €120,000 for developing Automated Breast Cancer Screening technology, advancing early detection methods by 30%.
- Supervised four master's theses and collaborated with two doctors on their PhD theses, resulting in three published papers and advancements in Statistically Robust Machine Learning.
- Developed anomaly detection methods in multi-parametric MRIs using Deep Convolutional Neural Networks with FDR control, enhancing detection accuracy by 25%.
- Implemented Lie Group covariant representation of 3D data, improving 3D model accuracy by 20%.
- Conducted Non-linear Independent Component Analysis using Random Fourier Features, improving data separation quality by 15%.
Uniklinik Köln
January 2018 - March 2019
Cologne
- Developed a Discrete Compound Process model for single-cell modeling, incorporating a novel cost function with regularization, improving parameter estimation consistency in under-sampled regimes by 20%.
- Automated breast cancer screening using multiparametric MRI and Deep Convolutional Neural Networks, enhancing early detection accuracy by 30%.
- Enhanced interpretability of Deep Bayesian Convolutional Networks, ensuring invariance to rotations in 3D imaging, leading to a 25% increase in diagnostic reliability.
- Implemented model selection techniques in Deep Neural Networks, controlling false discoveries of features and improving predictive model reliability by 15%.
Max Planck Institute: MPIPZ
September 2013 - December 2017
Cologne Area, Germany
- Designed and analyzed algorithms to control false discoveries, developing machine learning techniques to manage generalization errors. Achieved state-of-the-art results in Genome-Wide Association Studies (GWAS) for breast cancer, reducing false discoveries by 25%.
- Developed an efficient sampling algorithm to sparsify a kernel matrix with bounded error in O(n log n) time, improving computational efficiency by 50% over the standard O(n^2) complexity.
- Facilitated efficient implementations of Gaussian process regression and kernel-based hypothesis testing algorithms for large datasets, reducing processing time by 40%.
- Constructed a regularized cost function for Deep Convolutional Networks to classify Diabetic Retinopathy from retinal images, ensuring controlled false discovery rates and improving classification accuracy by 30%.
Trinity College Dublin
January 2011 - August 2012
Electronics Engineering Department
- Developed variational Bayes techniques for signal processing, focusing on turbo-coding algorithms, which improved inference speed and accuracy by 25%.
- Conducted an in-depth study of inference for wireless signals under Rayleigh fading in a noisy environment, resulting in a 20% improvement in the signal processing algorithms.
- Taught Statistical Signal Processing to Master's and PhD students, receiving excellent feedback and improving course engagement by 30%.
INRIA Sud-Ouest
December 2009 - November 2010
Bordeaux, France
- Designed and analyzed unsupervised learning algorithms for time series prediction, improving prediction accuracy by 20%.
- Collaborated with EDF (Électricité de France) on forecasting daily and weekly consumption patterns of millions of customers across Europe, using advanced time series analysis techniques to improve prediction accuracy by 40% from their baseline.
- Managed and processed large-scale datasets, handling hundreds of gigabytes of data, which improved data processing efficiency by 30%.
R Foundation for Statistical Computing
May 2008 - August 2008
- Implemented the EM algorithm in C to exploit multi-core architectures, providing an API for parallel computing, which improved computational speed by 35%.
Doctor of Philosophy (PhD), Machine Learning and Computational Biology
Max Planck Institute, Cologne, Germany, 2018
Magna cum Laude
Master's, Mathematics and Statistics
Indian Institute of Technology, Kanpur, India, 2009
Bachelor's Degree, Statistics
Delhi University, New Delhi, 2007
MLOps Engineering on AWS
Amazon Web Services (AWS), 2022
- Generative AI
- Large Language Models (LLM) fine-tuning
- Reinforcement Learning from Human Feedback (RLHF)
- Direct Preference Optimization (DPO)
- Proximal Policy Optimization (PPO)
- Prompt Engineering
- Semantic Search
- Text-to-Speech
- Speech-to-Text
- Statistical Data Analysis
- Time Series Analysis
- Project Management
- Agile Project Management
- Python: PyTorch, PyTorch Lightning, Hugging Face Transformers, Docker, MLflow, Weights & Biases
- MLOps: Amazon SageMaker, Azure ML
- Web Development: Next.js, Svelte, FastAPI
- Data Processing: Custom OCR, Document Classifier, Named Entity Recognition (NER)
- Machine Learning: Deep Convolutional Neural Networks, Gaussian Process Regression, Kernel-Based Hypothesis Testing
- Cloud Platforms: AWS, Azure
- Software Development: C, Gradio