Sampled Data Exercise

We often want to know something about a group of people (such as how voters in a state feel about a political candidate) or about some geographic phenomenon that covers a large area (such as the volume of an underground resource). However, it is often too difficult or expensive to survey each and every member of a group or dig up every square inch of a resource to get full population data.

In cases like this, we take a smaller group (a sample) of the population and use statistical calculations to tell us how much confidence we can have that the data from our sample represents the whole population.

This tutorial describes a class exercise in sampled data that involves taking a class survey on some characteristic or opinion. The class represents a sample that can be used to make an inference about the broader student population.

Note that this particular type of sampling is actually convenience sampling, because we are choosing survey respondents based on whether they are conveniently available.

However, for the purposes of this exercise we will evaluate the results as if this were a random sample of all students in our institution, and we will use that sample to estimate a characteristic of that population.

Survey Question

During class, you will be asked to form teams of two or three people and choose a primary survey question and a backup question.

Example questions from past variants of this exercise have included:

  1. Laptop operating system: Mac OS, Windows, none (multichotomous)
  2. Distance in miles from your home town to school: 0 - x (amount)
  3. Number of pets: 0 - x (count)

On a sheet of paper (for submission at the end of class), note the following:

  1. Your full name
  2. The full names of your team members
  3. Your primary question
  4. Your backup question

Capture

In order to preserve participant privacy, we will anonymize our data, identifying subjects by their home zip code.

During class, at least one member of your team will circulate around the room, gathering the following responses from each class member (including themselves):

  1. Their zip code
  2. Their response to the question for your team

Processing

At the conclusion of the capture class, each member of the class should take a smartphone picture of the survey data (if done on paper) or get an e-mail of that data.

At least one member of your team (and preferably all members of the team) should enter the survey responses into a Google Sheets spreadsheet. Your spreadsheet should have the following columns:

  1. The respondent's zip code
  2. The respondent's answer to your team's question

Example Google Sheets Spreadsheet
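
As an illustration of the two-column layout described above, here is a minimal Python sketch that writes the same structure as CSV text. The column names (zip_code, pets) and all of the rows are invented for illustration and are not from an actual class survey. Zip codes are stored as text so that a leading zero is not lost.

```python
import csv
import io

# Hypothetical survey rows: (zip code, number of pets). All values are invented.
rows = [
    ("11735", 0),
    ("11758", 2),
    ("11801", 1),
]

# Write to an in-memory buffer so the example does not create any files.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["zip_code", "pets"])  # header row: one name per column
writer.writerows(rows)                 # one row per respondent

csv_text = buffer.getvalue()
print(csv_text)
```

Each respondent gets exactly one row, which is the shape that the mapping steps later in this exercise expect when they read the zip code column.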

Analysis: Counts or Amounts

Descriptive Statistics

If your variable is an amount or count, calculate the descriptive statistics for your values:

Calculating Descriptive Statistics for a Quantitative Variable
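
If you would like to check your spreadsheet results another way, the following Python sketch computes a typical set of descriptive statistics. The exercise does not say exactly which statistics are expected, so the selection here (count, mean, median, standard deviation, minimum, maximum) is an assumption, and the responses are hypothetical.

```python
import statistics

# Hypothetical number-of-pets responses; replace with your own survey column.
pets = [0, 1, 0, 2, 1, 0, 3, 1, 0, 1]

count = len(pets)
mean = statistics.mean(pets)
median = statistics.median(pets)
stdev = statistics.stdev(pets)  # sample standard deviation (n - 1 denominator)
minimum, maximum = min(pets), max(pets)

print(f"count={count} mean={mean:.2f} median={median} "
      f"stdev={stdev:.2f} min={minimum} max={maximum}")
```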

Confidence Interval

Because this is a survey, there is a possibility that our sample does not match the overall population. For this number-of-pets example, we may have accidentally gotten a group of respondents that has an unusually high (or low) number of pets.

Accordingly, you always need to present the results from a sample with a margin of error and a level of confidence that you have in that margin of error.

The range of values from your mean minus your margin of error to your mean plus your margin of error is called the confidence interval.

A commonly used estimate for the margin of error for means is:

margin_of_error = 1.96 * sample_mean / √sample_count

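The 1.96 is the z-score for a 95% level of confidence, and the mean divided by the square root of the sample count is a rough estimate of the standard error. The conventional estimate of the standard error uses the sample standard deviation in place of the mean (standard deviation / √sample_count), so the rough version is only reasonable when the standard deviation is about as large as the mean.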

You can use spreadsheet formulas to calculate these values for your survey:

=1.96 * mean / sqrt(count)

Interpreting the values in the example, we can say with a 95% level of confidence that the average number of pets owned by students at this school is between 0.37 and 1.43.

Calculating Confidence Intervals For a Quantitative Variable
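
The margin-of-error calculation for a mean can also be sketched in Python. The responses below are hypothetical. The first margin follows the rough formula given in this section (mean divided by the square root of the count). The second, which uses the sample standard deviation, is the conventional estimate and is included here only for comparison.

```python
import math
import statistics

# Hypothetical number-of-pets responses; replace with your own survey column.
pets = [0, 1, 0, 2, 1, 0, 3, 1, 0, 1]

Z_95 = 1.96  # z-score for a 95% level of confidence
count = len(pets)
mean = statistics.mean(pets)

# Rough estimate from this tutorial: mean / sqrt(count) stands in for standard error.
rough_margin = Z_95 * mean / math.sqrt(count)

# Conventional estimate (added for comparison): sample stdev / sqrt(count).
stdev_margin = Z_95 * statistics.stdev(pets) / math.sqrt(count)

# Confidence interval using the tutorial's rough margin of error.
low, high = mean - rough_margin, mean + rough_margin

print(f"mean={mean:.2f} rough margin={rough_margin:.2f} "
      f"stdev-based margin={stdev_margin:.2f}")
print(f"95% confidence interval: {low:.2f} to {high:.2f}")
```

With a sample this small, either interval is wide, which is a useful reminder of how little a single class can tell us about the whole student population.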

Google Bubble Map

To map your results:

  1. From Google Drive, create a New, Google My Maps
  2. Import your survey results spreadsheet from Google Drive
  3. Use the zip code column to position the placemarks
  4. Use the variable column to title your markers
  5. Style the points by your variable column
  6. Group by ranges so that different ranges of values display with different icons
  7. Optional: Change the icons to bubble icons
  8. Give your map a meaningful name
  9. Share the map publicly
  10. Copy the shared link to give to anyone you want to share the map with

While Google Maps does not directly support graduated bubble maps, bubble icons you can use with a map are given below. Right-click on an icon, copy the icon image URL, and paste that URL as a custom icon.

Mapping a Quantitative Variable

ArcGIS Online Bubble Map

Alternatively, to map your results in ArcGIS Online:

  1. Download a CSV file to your computer with File, Download As, comma-separated values
  2. Create a new map in ArcGIS Online
  3. Add, Add layer from file and select the CSV file
  4. For location, select the zip code column
  5. For Choose an attribute to show, select your quantitative variable column
  6. For Select a drawing style choose Counts and Amounts (Size)
  7. If you don't like the default color or want to adjust the size of the bubbles, select OPTIONS
  8. Save the map with a meaningful name
  9. Share the map with everyone to get a link

Mapping a Quantitative Variable in ArcGIS Online

Analysis: Categorical Variables

Descriptive Statistics

If your variable is categorical, calculate the count and percentage of respondents for each category:

Calculating Descriptive Statistics For a Categorical Variable
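
The count and percentage for each category can also be sketched in Python. The responses below are hypothetical, and the category labels (Mac, PC, none) are assumed from the laptop example question.

```python
from collections import Counter

# Hypothetical laptop responses; replace with your own survey column.
laptops = ["Mac", "PC", "Mac", "none", "PC", "Mac", "Mac", "PC", "Mac", "Mac"]

counts = Counter(laptops)  # number of respondents in each category
total = len(laptops)

# Proportion of respondents in each category (0.3 means 30%).
percents = {category: n / total for category, n in counts.items()}

for category, n in counts.most_common():
    print(f"{category}: {n} ({percents[category]:.0%})")
```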

Confidence Interval

Because this is a survey, there is a possibility that our sample does not match the overall population. For this Mac/PC survey example, we may have accidentally gotten an unusually high number of Mac users or an unusually low number of PC users.

Accordingly, you always need to present the results from a sample with a margin of error and a level of confidence that you have in that margin of error.

The range of values from your percent value minus your margin of error to your percent value plus your margin of error is called the confidence interval.

A commonly used estimate for the margin of error for proportions is:

margin_of_error = 1.96 * √(percent * (1 - percent) / sample_count)

The 1.96 is a z-score for a 95% level of confidence, and the calculations under the radical are a rough estimate for standard error.

You can use spreadsheet formulas to calculate these values for your survey:

=1.96 * sqrt(percent * (1 - percent) / count)

Interpreting the values in the example, we can say with a 95% level of confidence that the percent of students in the broader student population that own PC laptops is between 18% and 51%.

Calculating Confidence Intervals For a Categorical Variable
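
The margin of error for a proportion can be sketched in Python as well. The counts below are hypothetical, and the percent is expressed as a proportion between 0 and 1 (0.3 rather than 30), which is what the formula in this section assumes.

```python
import math

Z_95 = 1.96  # z-score for a 95% level of confidence

# Hypothetical survey results: 3 of 10 respondents own a PC laptop.
sample_count = 10
pc_count = 3

percent = pc_count / sample_count  # proportion, not a 0-100 percentage

# Rough standard error for a proportion: everything under the radical.
standard_error = math.sqrt(percent * (1 - percent) / sample_count)
margin_of_error = Z_95 * standard_error

low, high = percent - margin_of_error, percent + margin_of_error

print(f"PC share: {percent:.0%}, margin of error: {margin_of_error:.0%}")
print(f"95% confidence interval: {low:.0%} to {high:.0%}")
```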

Google Categorical Map

To map your results in Google Maps:

  1. From Google Drive, create a New, Google My Maps
  2. Import your survey results spreadsheet from Google Drive
  3. Style the points by your categorical variable column
  4. Give your map a meaningful name
  5. Share the map publicly
  6. Copy the shared link to give to anyone you want to share the map with

Mapping a Categorical Variable

ArcGIS Online Categorical Map

Alternatively, to map your results in ArcGIS Online:

  1. Download a CSV file to your computer with File, Download As, comma-separated values
  2. Create a new map in ArcGIS Online
  3. Add, Add layer from file and select the CSV file
  4. For location, select the zip code column
  5. For Choose an attribute to show, select your categorical variable column
  6. For Select a drawing style choose Types - Unique Symbols and select options
  7. Click on each symbol to change it
  8. Select Shapes and find appropriate pictograms
  9. If you have a limited number of points, you may want to increase the size of the pictograms
  10. Save the map with a meaningful name
  11. Share the map with everyone to get a link

Mapping a Categorical Variable in ArcGIS Online