- About
- Section 01 - Fundamentals
- Section 02 - Mathematics and Statistics Applied in Data and Computing
- Section 03 - Programming for Data Science
- Section 04 - Data Mining
- Section 05 - Databases
- Section 06 - Big Data
- Section 07 - Machine Learning
- Section 08 - Deep Learning
- Section 09 - Data Warehousing
- Section 10 - Cloud Computing
- Extra Bibliography
- Notes and Clarifications
- References
The Self-taught Data Science Curriculum is a learning guide I developed to master data science concepts and skills for free. Upon realizing the vast amount of high-quality, free resources available online, I decided to compile and organize them into a coherent roadmap. This project is not only my personal journey into data science but also a guide for anyone who wishes to follow a similar path.
Initially, this curriculum was designed for my own learning, but you are welcome to clone it and explore the courses if they align with your goals. The material here covers a broad range of topics essential for a successful data science career, from programming to artificial intelligence. The sources I used can be found in the "References" section at the end of the README.
The main objective is to follow a structured learning path inspired by the roadmap from the AI Expert team. The key skills and concepts I aim to master by the end of this curriculum include:
- Python: The primary language for data manipulation, machine learning, and AI model development. Python will be heavily explored due to its versatility and wide adoption in data science.
- R: A powerful language for statistical analysis, data visualization, and in-depth exploration of statistical data.
- Databases: Focus on both relational (SQL) and non-relational (NoSQL) database systems for effective data management and retrieval.
- Data Warehousing: Understanding the design and implementation of data warehouses for efficient storage and management of large datasets.
- Machine Learning: Learn how to build and apply machine learning models for tasks such as predictive analytics, classification, and pattern recognition.
- Deep Learning: Dive into neural networks, with an emphasis on frameworks like TensorFlow and PyTorch, to explore architectures and advanced AI techniques.
This curriculum is broken down into various modules that align with the core areas of data science. You can follow them sequentially or skip to specific areas based on your current knowledge and interests. I encourage you to adapt this guide to your own learning style, pace, and goals.
The "References" section at the end of this repository contains a comprehensive list of resources that I consulted while building this guide, including free online courses, tutorials, and learning platforms.
Course | Offered by | Effort | Certificate, if applicable | Status |
---|---|---|---|---|
Data – What It Is, What We Can Do With It | Johns Hopkins University | ~11h | Certificate of Completion | ✓ |
What is Data Science? | IBM Skills Network | ~11h | Certificate of Completion | ✓ |
The Data Scientist's Toolbox | Johns Hopkins University | ~18h | Certificate of Completion | ✓ |
Course | Offered by | Effort | Certificate, if applicable | Status |
---|---|---|---|---|
Linear Algebra for Machine Learning and Data Science | DeepLearning.AI | ~34h | -- | -- |
Calculus for Machine Learning and Data Science | DeepLearning.AI | ~25h | -- | -- |
Probability and Statistics for Machine Learning and Data Science | DeepLearning.AI | ~33h | -- | -- |
Course | Offered by | Effort | Certificate, if applicable | Status |
---|---|---|---|---|
Introduction to Data Science in Python | University of Michigan | ~34h | -- | -- |
Applied Plotting, Charting & Data Representation in Python | University of Michigan | ~24h | -- | -- |
Applied Machine Learning in Python | University of Michigan | ~31h | -- | -- |
Applied Text Mining in Python | University of Michigan | ~25h | -- | -- |
Applied Social Network Analysis in Python | University of Michigan | ~26h | -- | -- |
Course | Offered by | Effort | Certificate, if applicable | Status |
---|---|---|---|---|
R Programming | Johns Hopkins University | ~27h | -- | -- |
Advanced R Programming | Johns Hopkins University | ~18h | -- | -- |
Building R Packages | Johns Hopkins University | ~20h | -- | -- |
Building Data Visualization Tools | Johns Hopkins University | ~12h | -- | -- |
Mastering Software Development in R | Johns Hopkins University | ~3h | -- | -- |
Course | Offered by | Effort | Certificate, if applicable | Status |
---|---|---|---|---|
Data Visualization | University of Illinois Urbana-Champaign | ~15h | -- | -- |
Text Retrieval and Search Engines | University of Illinois Urbana-Champaign | ~30h | -- | -- |
Text Mining and Analysis | University of Illinois Urbana-Champaign | ~33h | -- | -- |
Pattern Discovery in Data Mining | University of Illinois Urbana-Champaign | ~17h | -- | -- |
Cluster Analysis in Data Mining | University of Illinois Urbana-Champaign | ~16h | -- | -- |
Course | Offered by | Effort | Certificate, if applicable | Status |
---|---|---|---|---|
Relational Database Design | University of Colorado | ~34h | -- | -- |
The Structured Query Language (SQL) | University of Colorado | ~26h | -- | -- |
Advanced Topics and Future Trends in Database Technologies | University of Colorado | ~16h | -- | -- |
Course | Offered by | Effort | Certificate, if applicable | Status |
---|---|---|---|---|
Introduction to Big Data | University of California | ~17h | -- | -- |
Big Data Modeling and Management Systems | University of California | ~13h | -- | -- |
Big Data Integration and Processing | University of California | ~17h | -- | -- |
Machine Learning with Big Data | University of California | ~23h | -- | -- |
Graph Analytics for Big Data | University of California | ~13h | -- | -- |
Course | Offered by | Effort | Certificate, if applicable | Status |
---|---|---|---|---|
Supervised Machine Learning: Regression and Classification | DeepLearning.AI | ~33h | -- | -- |
Advanced Machine Learning Algorithms | DeepLearning.AI | ~34h | -- | -- |
Unsupervised Learning, Recommenders, Reinforcement Learning | DeepLearning.AI | ~37h | -- | -- |
Course | Offered by | Effort | Certificate, if applicable | Status |
---|---|---|---|---|
Neural Networks and Deep Learning | DeepLearning.AI | ~24h | -- | -- |
Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization | DeepLearning.AI | ~23h | -- | -- |
Structuring Machine Learning Projects | DeepLearning.AI | ~6h | -- | -- |
Convolutional Neural Networks | DeepLearning.AI | ~35h | -- | -- |
Sequence Models | DeepLearning.AI | ~37h | -- | -- |
Course | Offered by | Effort | Certificate, if applicable | Status |
---|---|---|---|---|
Database Management Essentials | Colorado Boulder | ~122h | -- | -- |
Data Warehouse Concepts, Design, and Data Integration | Colorado Boulder | ~62h | -- | -- |
Relational Database Support for Data Warehouses | Colorado Boulder | ~71h | -- | -- |
Business Intelligence Concepts, Tools, and Applications | Colorado Boulder | ~21h | -- | -- |
Design and Build a Data Warehouse for Business Intelligence Implementation | Colorado Boulder | ~31h | -- | -- |
Course | Offered by | Effort | Certificate, if applicable | Status |
---|---|---|---|---|
Cloud Concepts 1 | University of Illinois Urbana-Champaign | ~24h | -- | -- |
Cloud Concepts 2 | University of Illinois Urbana-Champaign | ~19h | -- | -- |
Cloud Applications 1 | University of Illinois Urbana-Champaign | ~15h | -- | -- |
Cloud Applications 2 | University of Illinois Urbana-Champaign | ~19h | -- | -- |
Cloud Networks | University of Illinois Urbana-Champaign | ~22h | -- | -- |
Cloud Computing Project | University of Illinois Urbana-Champaign | ~21h | -- | -- |
- Discrete Mathematics: Foundations - David J. Hunter
- Concrete Mathematics: A Foundation for Computer Science - Ronald Graham, Donald Knuth, Oren Patashnik
- Pre-Calculus - Valéria Zuma Medeiros
- Calculus I - James Stewart
- Calculus II - James Stewart
- Numerical Calculus: Theoretical and Computational Aspects - Marcia Gomes
- Elementary Linear Algebra - Howard Anton
- Analytical Geometry: A Vector Treatment - Ivan De Camargo
- Introduction to Statistical Theory - Alexander Mood
- Matrix Algebra Useful for Statistics - Andre I Khuri
- The Elements of Statistical Learning - Trevor Hastie, Robert Tibshirani, Jerome Friedman
- Introduction to Linear Regression Analysis - Douglas C Montgomery
- Bayesian Statistics - Peter M. Lee
- Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference - Dani Gamerman
- Applied Nonparametric Statistical Methods - Nigel C Smeeton
- Interpreting Regression Models Based on Computational Intelligence - János Abonyi
- Regression Models with Computational Support - Gilberto A. Paula
- An Introduction to Statistical Learning with Applications in R - Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani
- SQL for Smarties: Advanced SQL Programming - Joe Celko
- Deep Learning Papers Reading Roadmap - Roadmap of DL Papers
- Artificial Intelligence: A Modern Approach - Stuart J. Russell, Peter Norvig
- The Missing Semester of Your CS Education - MIT
These resources cover a wide range of topics from foundational mathematics and statistical theory to advanced machine learning and artificial intelligence.
- The durations of the courses listed here are estimates provided by the platforms where they are offered.
- I am still working through this curriculum, so the tense of this README is inconsistent: sometimes past, sometimes future. As I progress, I will revise it to better reflect my experience.
- Regarding the books, my university has partnerships with some platforms like O'Reilly, in addition to a very large library where I managed to find almost all of them. But if you don't have access... ahem... try to see if they fall off the truck... ahem... but if you can buy them, please do.
Sources consulted for the construction of this curriculum.
- OSSU Data Science - OSSU offers a free, open-source curriculum in data science, perfect for those looking to study technology in a self-paced and flexible manner. I highly recommend OSSU and any initiative that aims to democratize education.
- AI Expert Roadmap - A detailed roadmap to becoming an AI expert, developed by specialists in the field.
- Python Developer - Roadmap SH provides comprehensive learning paths across various technology areas and tools. This link directs to the Python roadmap, but they offer many other paths.
- PostgreSQL - PostgreSQL Database Administrator roadmap, also from Roadmap SH, outlining a specific learning path for professionals in the field.
- USP Statistics Course - Curriculum for the Bachelor's Degree in Statistics at the University of São Paulo, used to guide the selection of courses and books in this list.