Sampled Data Exercise
We often want to know something about a group of people (such as how voters in a state feel about a political candidate) or about some geographic phenomenon that covers a large area (such as the volume of an underground resource). However, it is often too difficult or expensive to survey each and every member of a group or dig up every square inch of a resource to get full population data.
In cases like this, we take a smaller group (a sample) from the population and use statistical calculations to tell us how much confidence we can have that the data from our sample represents the whole population.
This tutorial describes a class exercise in sampled data that involves taking a class survey on some characteristic or opinion. The class represents a sample that can be used to make an inference about the broader student population.
Note that this particular type of sampling is actually convenience sampling, because we are choosing survey respondents based on whether they are conveniently available.
However, for the purposes of this exercise we will evaluate the results as if this were a random sample of all students in our institution, and we will use that sample to estimate a characteristic of that population.
Survey Question
During class, you will be asked to form teams of two or three people:
- As a team, you will brainstorm two questions (a primary and a backup) that you could ask each member of the class
- Your question should yield a categorical (dichotomous or multichotomous), count, or amount variable
- Your question should be something that you would not mind being asked by a stranger with uncertain motives. Accordingly, you will probably want to avoid questions on topics like health history, political perspective, religious affiliation, financial status, sexuality, etc.
- We will go around the room and declare our questions so each question is unique in the class. During group discussion you should have a primary question and a backup, in case your primary is taken by an earlier team
- Despite those restrictions, it would be helpful if you can find a question that is at least remotely related to your occupational area
- While these questions may not be particularly profound, they may yield interesting or amusing results
Example questions from past variants of this exercise have included:
- Laptop operating system: Mac OS, Windows, none (multichotomous)
- Distance in miles from your home town to school: 0 - x (amount)
- Number of pets: 0 - x (count)
On a sheet of paper (for submission at the end of class), note the following:
- Your full name
- The full names of your team members
- Your primary question
- Your backup question
Capture
In order to preserve participant privacy, we will anonymize our data, identifying subjects by their home zip code.
During class, at least one member of your team will circulate around the room, gathering the following responses from each class member (including themselves):
- Their zip code
- Their response to the question for your team
Processing
At the conclusion of the capture class, each member of the class should take a smartphone picture of the survey data (if done on paper) or get an e-mail of that data.
At least one member of your team (and preferably all members of the team) should enter the survey responses into a Google Sheets spreadsheet. Your spreadsheet should have the following columns:
- The respondent's zip code
- The respondent's answer to your team's question

Analysis: Counts or Amounts
Descriptive Statistics
If your variable is an amount or count, calculate the descriptive statistics for your values:
- Add a new sheet to your workbook
- Use the count() function to count the number of responses
- Use the max() function to find the maximum value in your responses
- Use the average() function to find the mean value for your responses
- Use the median() function to find the median value in your responses
- Use the min() function to find the minimum value in your responses
- Remove unnecessary significant digits so the displayed precision reflects the accuracy of the data
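For readers who want to check their spreadsheet results outside of Google Sheets, the same descriptive statistics can be sketched in Python. This is only an illustration: the `responses` list holds made-up number-of-pets values, not data from any actual class survey, and the variable names are assumptions.

```python
import statistics

# Hypothetical responses to a "number of pets" question
# (made-up values for illustration only).
responses = [0, 1, 2, 0, 3, 1, 1, 0, 2, 1]

# Same statistics as the count(), max(), average(), median(),
# and min() spreadsheet functions.
summary = {
    "count": len(responses),
    "max": max(responses),
    "mean": statistics.mean(responses),
    "median": statistics.median(responses),
    "min": min(responses),
}

for name, value in summary.items():
    # Round for display only, so the precision shown
    # does not overstate the accuracy of the data.
    print(f"{name}: {round(value, 1)}")
```

Rounding only at display time mirrors the last step in the list: the underlying values stay unchanged while the displayed precision is reduced.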
Confidence Interval
Because this is a survey, there is a possibility that our sample does not match the overall population. For this number-of-pets example, we may have accidentally gotten a group of respondents that have an unusually high (or low) number of pets.
Accordingly, you always need to present the results from a sample with a margin of error and a level of confidence that you have in that margin of error.
The range of values around your mean plus or minus your margin of error is called the confidence interval.
A commonly-used estimate for the margin of error for means is:
margin_of_error = 1.96 * sample_mean / √sample_count
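The 1.96 is the z-score for a 95% level of confidence, and the mean divided by the square root of the sample count is used here as a rough estimate for standard error. (The conventional estimate of standard error is the sample standard deviation divided by the square root of the sample count, which you can get in a spreadsheet with the stdev() function.)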
You can use spreadsheet formulas to calculate these values for your survey:
=1.96 * mean / sqrt(count)
- Add a Margin of Error column and use the sqrt() function to calculate using the formula given above
- Add a Confidence Interval Low column and subtract the margin of error from the mean to get the low values in the 95% confidence interval
- Add a Confidence Interval High column and add the margin of error to the mean to get the high values in the 95% confidence interval
Interpreting the values in the example, we can say with a 95% level of confidence that the average number of pets owned by students at this school is between 0.69 and 1.43.
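The margin-of-error steps for a mean can also be sketched in Python as a cross-check on the spreadsheet. The first function follows the rough estimate given in this tutorial (mean divided by the square root of the count). The second is not part of the tutorial: it uses the sample standard deviation in place of the mean, which is the conventional standard error, and is included only for comparison. The function names and the `responses` values are made up for illustration.

```python
import math
import statistics


def rough_mean_interval(values, z=1.96):
    """Tutorial's rough estimate: margin = z * mean / sqrt(count)."""
    count = len(values)
    mean = statistics.mean(values)
    margin = z * mean / math.sqrt(count)
    return mean, margin, mean - margin, mean + margin


def stdev_mean_interval(values, z=1.96):
    """Assumed alternative (not from the tutorial):
    margin = z * sample standard deviation / sqrt(count)."""
    count = len(values)
    mean = statistics.mean(values)
    margin = z * statistics.stdev(values) / math.sqrt(count)
    return mean, margin, mean - margin, mean + margin


# Hypothetical number-of-pets responses (illustration only).
responses = [0, 1, 2, 0, 3, 1, 1, 0, 2, 1, 0, 1, 2, 1, 0, 1]

mean, margin, low, high = rough_mean_interval(responses)
print(f"rough estimate: {mean:.2f} +/- {margin:.2f} ({low:.2f} to {high:.2f})")

mean_sd, margin_sd, low_sd, high_sd = stdev_mean_interval(responses)
print(f"stdev estimate: {mean_sd:.2f} +/- {margin_sd:.2f} ({low_sd:.2f} to {high_sd:.2f})")
```

With these sixteen made-up responses the mean is 1.0, so the rough estimate gives a margin of 1.96 * 1.0 / 4 = 0.49 and an interval of 0.51 to 1.49.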
Google Bubble Map
To map your results:
- From Google Drive, create a New, Google My Maps
- Import your survey results spreadsheet from Google Drive
- Use the zip code column to position the placemarks
- Use the variable column to title your markers
- Style the points by your variable column
- Group by ranges so that different ranges of values display with different icons
- Optional: Change the icons to bubble icons
- Give your map a meaningful name
- Share the map publicly
- Copy the shared link to give to anyone you want to share the map with
While Google Maps does not directly support graduated bubble maps, bubble icons you can use with a map are given below. Right click on the icon, copy the icon image URL, and paste that URL as a custom icon.
ArcGIS Online Bubble Map
Alternatively, to map your results in ArcGIS Online:
- Download a CSV file to your computer with File, Download As, comma-separated values
- Create a new map in ArcGIS Online
- Add, Add layer from file and select the CSV file
- For location, select the zip code column
- For Choose an attribute to show, select your quantitative variable column
- For Select a drawing style choose Counts and Amounts (Size)
- If you don't like the default color or want to adjust the size of the bubbles, select OPTIONS
- Save the map with a meaningful name
- Share the map with everyone to get a link
Analysis: Categorical Variables
Descriptive Statistics
If your variable is categorical, calculate the count and percentage of respondents for each category:
- Add a new sheet to your workbook
- Add a Responses column with the different possible responses for your variable
- Add a Count column and use the countif() function to get the count of responses in your data that match each possible response category
- Add a Total row and use the sum() function to count the total number of responses
- Add a Percent column and divide the response counts by the total number of counts to get the percent of responses in each category
- Format the cells to display as percents rather than as decimal ratios
- Remove unnecessary significant digits so the displayed precision reflects the accuracy of the data
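The count-and-percent table described above can be sketched in Python with `collections.Counter`, which plays the role of the countif() column. The `responses` list is made-up laptop operating system data for illustration, and the variable names are assumptions.

```python
from collections import Counter

# Hypothetical responses to a "laptop operating system" question
# (made-up values for illustration only).
responses = [
    "Mac OS", "Windows", "Mac OS", "none",
    "Windows", "Mac OS", "Mac OS", "Windows",
]

# Count of responses in each category (like countif()).
counts = Counter(responses)

# Total number of responses (like sum() on the Count column).
total = sum(counts.values())

# Share of responses in each category, as a decimal ratio.
percents = {category: count / total for category, count in counts.items()}

for category, count in counts.most_common():
    # Format as a whole percent for display only.
    print(f"{category}: {count} ({percents[category]:.0%})")
```

Keeping `percents` as decimal ratios and formatting only when printing matches the spreadsheet approach of storing the ratio and changing the cell format to percent.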
Confidence Interval
Because this is a survey, there is a possibility that our sample does not match the overall population. For this Mac/PC survey example, we may have accidentally gotten an unusually high number of Mac users or an unusually low number of PC users.
Accordingly, you always need to present the results from a sample with a margin of error and a level of confidence that you have in that margin of error.
The range of values around your percent values plus or minus your margin of error is called the confidence interval.
A commonly-used estimate for the margin of error for proportions is:
margin_of_error = 1.96 * √(percent * (1 - percent) / sample_count)
The 1.96 is a z-score for a 95% level of confidence, and the calculations under the radical are a rough estimate for standard error.
You can use spreadsheet formulas to calculate these values for your survey:
- Add a Margin of Error column and use the sqrt() function to calculate using the formula given above
- Add a Confidence Interval Low column and subtract the margin of error from each percent to get the low values in the 95% confidence interval
- Add a Confidence Interval High column and add the margin of error to each percent to get the high values in the 95% confidence interval
Interpreting the values in the example, we can say with a 95% level of confidence that the percent of students in the broader student population that own Macintosh laptops is between 38% and 70%.
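The proportion formula above can be sketched in Python to make the grouping under the radical explicit. The function name and the example counts (20 of 40 respondents) are made up for illustration and are not the survey values behind the 38% to 70% interval.

```python
import math


def proportion_interval(category_count, sample_count, z=1.96):
    """Return (percent, margin, low, high) for one response category.

    Uses margin = z * sqrt(percent * (1 - percent) / sample_count),
    with percent expressed as a decimal ratio between 0 and 1.
    """
    percent = category_count / sample_count
    margin = z * math.sqrt(percent * (1 - percent) / sample_count)
    return percent, margin, percent - margin, percent + margin


# Hypothetical example: 20 of 40 respondents chose one category.
percent, margin, low, high = proportion_interval(20, 40)
print(f"{percent:.0%} +/- {margin:.1%} ({low:.1%} to {high:.1%})")
```

Note that the whole expression percent * (1 - percent) / sample_count sits inside the square root. For very small samples or percents near 0 or 1, the low or high value can fall outside the 0 to 1 range; this sketch does not adjust for that.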
Google Categorical Map
To map your results in Google Maps:
- From Google Drive, create a New, Google My Maps
- Import your survey results spreadsheet from Google Drive
- Style the points by your categorical variable column
- Give your map a meaningful name
- Share the map publicly
- Copy the shared link to give to anyone you want to share the map with
ArcGIS Online Categorical Map
Alternatively, to map your results in ArcGIS Online:
- Download a CSV file to your computer with File, Download As, comma-separated values
- Create a new map in ArcGIS Online
- Add, Add layer from file and select the CSV file
- For location, select the zip code column
- For Choose an attribute to show, select your categorical variable column
- For Select a drawing style choose Types - Unique Symbols and select options
- Click on each symbol to change it
- Select Shapes and find appropriate pictograms
- If you have a limited number of points, you may want to increase the size of the pictograms
- Save the map with a meaningful name
- Share the map with everyone to get a link