- Kaggle - one of the best-known data science websites. Home to a ton of datasets and data science competitions.
- data.world - a data collaboration platform (kind of a social network for data scientists) with a ton of datasets, organized by topic.
- Google's Dataset Search - "Dataset Search enables users to find datasets stored across thousands of repositories on the Web, making these datasets universally accessible and useful."
- Awesome Public Datasets - a GitHub repository with (awesome!) public datasets, organized by topic/field.
- Awesome Datasets - a curated list of awesome datasets for papers/experiments/validation.
- Another List of Awesome Datasets - from Awesome Data Science.
- FiveThirtyEight publishes a fair amount of data on their GitHub account.
- Data is Plural - a weekly newsletter of datasets by Jeremy Singer-Vine, the data editor of BuzzFeed News.
- Also check out the Data is Plural Archives.
- 20 Free Big Data Sources Everyone Should Know
- Open Data Sources
- "A Plethora of Data Set Repositories" - these 19 'sets of data sets' cover free or public data from various industries, including small and large, structured and unstructured data sets. From Data Science Central.
- "The 50 Best Free Datasets for Machine Learning" - "What are some open datasets for machine learning? We at Gengo decided to create the ultimate cheat sheet for high quality datasets. These range from the vast (looking at you, Kaggle) or the highly specific (data for self-driving cars)."
- "Where can I find large datasets open to the public?" - a Quora question with a lot of answers.
- "18 places to find data sets for data science projects" - from Dataquest.