Open data repositories
These repositories hold mostly structured data, including some very large datasets, suited to secondary analysis, reproducible research, and exploring big data techniques.
- Collaborative Research in Computational Neuroscience (CRCNS) hosts high-quality experimental datasets that are valuable for testing computational models of the brain and new analysis methods. The data include physiological recordings from sensory and memory systems, as well as eye movement data.
- figshare contains data deposited by academic researchers in a number of disciplines.
- ICPSR is one of the largest and oldest archives of social science data. Its faceted search interface enables you to find datasets based on subject areas, as well as variable types and research methods used.
- Kaggle hosts a wide variety of datasets, mainly for training and experimentation. Most data is in .csv format, though many datasets are in JSON. There are datasets for “big data” and machine learning work, and Kaggle sponsors competitions around data analysis. The data comes from many sources and is not peer-reviewed by Kaggle itself, so be cautious about using it for research.
- Zenodo is an open, general-purpose repository for research data, operated by CERN.
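Since many downloads from these repositories arrive as .csv or JSON files, and some sources are not peer-reviewed, a quick sanity check before analysis can help. The following is a minimal sketch using only the Python standard library. The function names `load_records` and `missing_counts` are hypothetical, and the code assumes that a JSON file holds a top-level array of records, which will not be true of every dataset.

```python
import csv
import json
from pathlib import Path


def load_records(path):
    """Return a list of dict records from a .csv or .json file."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        # newline="" lets the csv module handle line endings itself
        with path.open(newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))
    if suffix == ".json":
        with path.open(encoding="utf-8") as f:
            data = json.load(f)
        # Assumption: the file is a top-level JSON array of objects
        if not isinstance(data, list):
            raise ValueError("expected a JSON array of records")
        return data
    raise ValueError(f"unsupported file type: {suffix}")


def missing_counts(records):
    """Count empty or absent values per field as a rough quality check."""
    fields = {key for row in records for key in row}
    return {
        field: sum(1 for row in records if row.get(field) in (None, ""))
        for field in sorted(fields, key=str)
    }
```

Calling `missing_counts(load_records("your_file.csv"))` on a downloaded file (the file name here is a placeholder) gives a per-column count of blank values, which is one simple way to spot incomplete fields before relying on a dataset.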
Subscription-based datasets
The library has subscription-based access to some data resources that will require you to log in with your Claremont credentials.
These subscription databases primarily include government statistics, and business and industry reports.
Take a look at the Finding Data & Statistics Guide for more information.
Dataset grab bag
New and unusual datasets that might be interesting to work with (a rotating feature — suggestions welcome!)
- What’s on the menu? – Historic menu data from the city of New York.
- Wikipedia – How often do Wikipedia editors edit?
- Remittances – World Bank data on international remittance payments.