Datasets

Open data repositories

These are mostly structured data (some very large datasets) for secondary analysis, reproducible research, and exploring big data techniques.

  • figshare contains data deposited by academic researchers in a number of disciplines.
  • Kaggle hosts a wide variety of datasets, mainly for training and experimentation. Most data is in .csv format, though many datasets are in JSON. There are datasets for “big data” and machine learning work, and Kaggle sponsors competitions around data analysis. The data comes from many sources and is not peer-reviewed by Kaggle itself, so be cautious about using it for research.
  • Zenodo is an open-source global repository for research data.
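
Because downloads from these repositories usually arrive as .csv or JSON files, a small loader can help with a first look at the data. The following is a minimal sketch using only the Python standard library; the function name `load_records` is hypothetical, and it assumes the file is UTF-8 encoded and that a JSON file holds either a list of objects or a single object.

```python
import csv
import json
from pathlib import Path


def load_records(path):
    """Load a .csv or .json file into a list of dictionaries.

    Hypothetical helper for illustration; not part of any repository's API.
    """
    path = Path(path)
    suffix = path.suffix.lower()

    if suffix == ".csv":
        # newline="" lets the csv module handle line endings itself.
        with path.open(newline="", encoding="utf-8") as handle:
            # Each row becomes a dict keyed by the header row.
            return list(csv.DictReader(handle))

    if suffix == ".json":
        with path.open(encoding="utf-8") as handle:
            data = json.load(handle)
        # Assumption: a top-level list of objects, or one object to wrap.
        return data if isinstance(data, list) else [data]

    raise ValueError(f"Unsupported file type: {suffix}")
```

Note that values read from a .csv file come back as strings, so numeric columns need converting before analysis. For very large datasets, a dedicated data analysis library is likely a better fit than loading everything into a list.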

Subscription-based datasets

The library subscribes to some data resources that require you to log in with your Claremont credentials.

These subscription databases primarily include government statistics, and business and industry reports.

Take a look at the Finding Data & Statistics Guide for more information.

Social media datasets

Library research guide to using Twitter data.

Dataset grab bag

New and unusual datasets that might be interesting to work with (a rotating feature — suggestions welcome!)

What’s on the menu? – Historic menu data from the city of New York.

Wikipedia – How often do Wikipedia editors edit?

Remittances – World Bank data on international remittance payments.