Datasets – Data Science Hub @ the Claremont Colleges Library

Open data repositories

These repositories hold mostly structured data (including some very large datasets) suitable for secondary analysis, reproducible research, and exploring big data techniques.

Collaborative Research in Computational Neuroscience (CRCNS) hosts experimental datasets of high quality that will be valuable for testing computational models of the brain and new analysis methods. The data include physiological recordings from sensory and memory systems, as well as eye movement data.
figshare contains data deposited by academic researchers in a number of disciplines.
ICPSR is one of the largest and oldest archives of social science data. Its faceted search interface enables you to find datasets based on subject areas, as well as variable types and research methods used.
Kaggle hosts a large variety of datasets, primarily for training and experimentation. Most data is in .csv format, though many datasets are in JSON. There are datasets for “big data” and machine learning work, and Kaggle sponsors a variety of competitions around data analysis. The data comes from a variety of sources and is not peer-reviewed by Kaggle itself, so be cautious about using it for research.
Zenodo is an open-source global repository for research data.
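
Since many datasets in these repositories come as .csv or JSON files, a first look at a downloaded file can be as simple as the sketch below. It uses only the Python standard library; the function name `load_records` and the file name in the usage note are hypothetical, and the code assumes a JSON file holds a list of records, which will not be true for every dataset.

```python
import csv
import json
from pathlib import Path


def load_records(path):
    """Load a .csv or .json file into a list of dictionaries.

    Hypothetical helper for illustration; not part of any repository's tooling.
    """
    path = Path(path)
    suffix = path.suffix.lower()

    if suffix == ".csv":
        # newline="" lets the csv module handle line endings itself.
        with path.open(newline="", encoding="utf-8") as f:
            # Each row becomes a dict keyed by the header row.
            return list(csv.DictReader(f))

    if suffix == ".json":
        with path.open(encoding="utf-8") as f:
            data = json.load(f)
        # Assumption: the file is a JSON array of records.
        if not isinstance(data, list):
            raise ValueError("expected a JSON array of records")
        return data

    raise ValueError(f"unsupported file type: {suffix}")
```

For example, `rows = load_records("my_download.csv")` followed by `print(rows[0])` shows the first record. Values read from a .csv file arrive as strings, so numeric columns need converting before analysis.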

The library has subscription-based access to some data resources that will require you to log in with your Claremont credentials.

These subscription databases primarily include government statistics, and business and industry reports.

Take a look at the Finding Data & Statistics Guide for more information.

New and unusual datasets that might be interesting to work with (a rotating feature — suggestions welcome!)

What’s on the menu? Historic menu data from the city of New York.

Remittances – World Bank data on international remittance payments.