Start by defining your topic
When looking for data, be specific enough about your topic to narrow your search, but flexible enough to adapt your needs to existing sources. Data is not available for every conceivable topic: some data is hidden (behind a paywall, for example), some has never been collected, and some is simply unavailable. Be prepared to try alternative data.
#1. Identify your research topic or question
Think about
#2. Consider the characteristics of data needed:
Unit of analysis: This is the population that you want to study (e.g. individuals, households, companies, crops, arrests, nations).
Geography: This is the place or geographic level you want to study (e.g. nation, state, county, metropolitan statistical area, tract, block group).
Time period: This is the time period you want to study (e.g. point in time, change over time, current information, date range).
Frequency: This is how often the data is updated (e.g. annually, quarterly).
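One way to keep these four characteristics in view while searching is to write them down in a fixed form and compare each candidate dataset against it. The sketch below is only an illustration: the `DataProfile` class, the `mismatches` function, and the sample values are all hypothetical and not part of any library or of the original worksheet.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DataProfile:
    """The four characteristics from step #2 (hypothetical structure)."""
    unit_of_analysis: str
    geography: str
    time_period: str
    frequency: str


def mismatches(needed, candidate):
    """Return the names of characteristics where the candidate differs from the need."""
    return [
        f.name
        for f in fields(DataProfile)
        if getattr(needed, f.name).strip().lower()
        != getattr(candidate, f.name).strip().lower()
    ]


# Illustrative values only: what you want versus what a source offers.
needed = DataProfile("households", "county", "2010-2020", "annually")
candidate = DataProfile("households", "state", "2010-2020", "annually")

print(mismatches(needed, candidate))  # prints ['geography']
```

A mismatch does not have to rule a source out. It marks the place where you may need to adapt your topic, for example by studying states instead of counties.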
#3. Brainstorm likely sources of data
Think about who would be likely to collect data on this topic. Often, the answers are government agencies, organizations such as NGOs or nonprofits, international organizations, or researchers.
#4. Use specialized data search tools
Data are often collected in repositories or reported in compendia and reports, which can be used as starting points.
Compendia, portals, and indexes: When data are likely to be compiled or reported, these tools allow you to search by topic and discover data and data producers. Some likely candidates could be:
Data collections, archives, and repositories: Researchers who share the data they produce are likely to deposit them in repositories. Some likely candidates could be:
#5. Take cues from secondary literature
Track what you find in the literature to discover data sources, understand the data landscape for a particular topic, and place your research into context with related research.
#6. Evaluate potential data
- Find Overview Information: Who is the creator of the data? Why was it collected? What is its scope? What geography and time period are covered?
- Find the Technical Documentation: Look for technical documentation about the dataset, and download it or note where it is. It should cover how the data was created (e.g., survey, administrative reporting, direct measure), variable definitions, and indications of what was included or excluded. Survey instruments are also helpful. Hint: look for a codebook, user guide, ReadMe, or documentation section of the site.
- Identify the Download Options and Access Restrictions: Who gets to use the data? What formats of download are available--CSV, text, Excel?
#7. Document what you find
For extended projects, you will want to keep a research journal. Do your future self a favor and keep good records of what you find so you can retrace your steps.
What to record about a data download in a research journal
- URL or DOI
- Date
- Author, principal investigator, producing agency, etc.
- Exact name of dataset (not just the web site’s name) and the version if appropriate
- What you had to search in order to get the exact download you got
- Where you are storing local copies of the data and all relevant documentation
- Suggested citation information if provided
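If you keep your research journal as a file, the items above can be captured in a small structured record. The following is a minimal sketch under stated assumptions, not a prescribed format: the `DownloadRecord` class, its field names, the `append_record` and `read_journal` helpers, the journal file name, and every example value are hypothetical.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class DownloadRecord:
    """One journal entry for a single data download (hypothetical structure)."""
    url_or_doi: str
    date_accessed: str       # ISO date string, e.g. "2024-01-01"
    creator: str             # author, principal investigator, or producing agency
    dataset_name: str        # exact dataset name, not just the web site's name
    version: str
    search_steps: str        # what you had to search or click to reach this download
    local_path: str          # where local copies and documentation are stored
    suggested_citation: str = ""


def append_record(journal_path, record):
    """Append the record as one JSON line so earlier entries are never overwritten."""
    with open(journal_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record)) + "\n")


def read_journal(journal_path):
    """Read every entry back as a list of dictionaries."""
    with open(journal_path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    example = DownloadRecord(
        url_or_doi="https://example.org/dataset",
        date_accessed="2024-01-01",
        creator="Example Agency",
        dataset_name="Example Household Survey",
        version="v1",
        search_steps="Searched the portal for 'household survey', chose the CSV download",
        local_path="data/raw/example_household_survey/",
    )
    append_record("research_journal.jsonl", example)
    print(read_journal("research_journal.jsonl")[-1]["dataset_name"])
```

Appending one line per download keeps the journal in chronological order, which makes it easier to retrace your steps later.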
Steps adapted from the Data Reference Worksheet, Gould Library, Carleton College, rev. 2016, by Kristin Partlo & Danya Leebaw: https://goo.gl/1z6LBH, CC BY-SA 4.0.