# Sampled Data Exercise

We often want to know something about a group of people (such as how voters
in a state feel about a political candidate) or about some geographic
phenomenon that covers a large area (such as the volume of an underground
resource). However, it is often too difficult or expensive to survey each and
every member of a group or dig up every square inch of a resource to get full
**population** data.

In cases like this, what we have to do is take a smaller group, a
**sample**, of the population and use statistical calculations to tell us
how much confidence we can have that the data we get from our sample represents
the whole population.

This tutorial describes a class exercise in sampled data that involves taking a class survey on some characteristic or opinion. The class represents a sample that can be used to make an inference about the broader student population.

Note that this particular type of sampling is actually convenience sampling, because we are choosing survey respondents based on whether they are conveniently available.

However, for the purposes of this exercise we will evaluate the results as if this were a random sample of all students in our institution, and we will use that sample to estimate a characteristic of that population.

## Survey Question

During class, you will be asked to form teams of two or three people:

- As a team, you will brainstorm two questions (a primary and a backup) that you could ask each member of the class
- Your question should yield a categorical (dichotomous or multichotomous), count, or amount variable
- Your question should be something that you would not mind being asked by a stranger with uncertain motives. Accordingly, you will probably want to avoid questions on topics like health history, political perspective, religious affiliation, financial status, sexuality, etc.
- We will go around the room and declare our questions so each question is unique in the class. During group discussion you should have a primary question and a backup, in case your primary is taken by an earlier team
- Despite those restrictions, it would be helpful if you can find a question that is at least remotely related to your occupational area
- While these questions may not be particularly profound, they may yield interesting or amusing results

Example questions from past variants of this exercise have included:

- Laptop operating system: Mac OS, Windows, none (multichotomous)
- Distance in miles from your home town to school: 0 - x (amount)
- Number of pets: 0 - x (count)

On a sheet of paper (for submission at the end of class), note the following:

- Your full name
- The full names of your team members
- Your primary question
- Your backup question

## Capture

In order to preserve participant privacy, we will **anonymize** our data,
identifying subjects by their home zip code.

During class, at least one member of your team will circulate around the room, gathering the following responses from each class member (including themselves):

- Their zip code
- Their response to the question for your team

## Processing

At the conclusion of the capture class, each member of the class should take a smartphone picture of the survey data (if done on paper) or get an e-mail of that data.

At least one member of your team (and preferably all members of the team) should enter the survey responses into a Google Sheets spreadsheet. Your spreadsheet should have the following columns:

- Their zip code
- Their response to the question for your team

## Analysis: Counts or Amounts

### Descriptive Statistics

If your variable is an amount or count, calculate the descriptive statistics for your values:

- Add a new sheet to your workbook
- Use the **count()** function to count the number of responses
- Use the **max()** function to find the maximum value in your responses
- Use the **average()** function to find the mean value for your responses
- Use the **median()** function to find the median value in your responses
- Use the **min()** function to find the minimum value in your responses
- Remove unnecessary significant digits so the displayed precision reflects the accuracy of the data
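As a cross-check outside the spreadsheet, the same summary values can be sketched in Python. The response list below is hypothetical (not class data), and the variable names are illustrative only:

```python
import statistics

# Hypothetical number-of-pets responses, one value per respondent.
responses = [0, 1, 2, 0, 3, 1, 1, 0, 2, 1]

# Same statistics as the count(), max(), average(), median(), and min()
# spreadsheet functions listed above.
summary = {
    "count": len(responses),
    "max": max(responses),
    "mean": statistics.mean(responses),
    "median": statistics.median(responses),
    "min": min(responses),
}

for name, value in summary.items():
    # Round for display so the precision matches whole-number survey answers.
    print(f"{name}: {round(value, 1)}")
```

If the spreadsheet and the script disagree, the usual cause is a blank or mistyped cell in the response column.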

### Confidence Interval

Because this is a survey, there is a possibility that our sample does not match the overall population. For this number-of-pets example, we may have accidentally gotten a group of respondents that have an unusually high (or low) number of pets.

Accordingly, you always need to present the results from a sample with
a **margin of error** and a **level of confidence** that
you have in that margin of error.

The range of values around your mean value plus or minus your
margin of error is called the **confidence interval**.

A commonly-used estimate for the margin of error for means is:

margin_of_error = 1.96 * sample_mean / √sample_count

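The 1.96 is the z-score for a **95% level of confidence**, and the
mean divided by the square root of the sample count is a rough stand-in for the **standard error**. (The conventional standard error of a mean uses the sample standard deviation, available in Google Sheets as **stdev()**, in place of the mean.)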

You can use spreadsheet formulas to calculate these values for your survey:

=1.96 * mean / sqrt(count)

- Add a **Margin of Error** column and use the **sqrt()** function to calculate using the formula given above
- Add a **Confidence Interval Low** column and subtract the margin of error from the mean to get the low values in the 95% confidence interval
- Add a **Confidence Interval High** column and add the margin of error to the mean to get the high values in the 95% confidence interval

Interpreting the values in the example, we can say with a 95% level of confidence that the average number of pets owned by students at this school is between 0.69 and 1.43.
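The arithmetic behind those columns can be sketched in Python. This follows the tutorial's rough estimate (1.96 times the mean divided by the square root of the count); the function names and the sample values are hypothetical and are not the class results quoted above:

```python
import math


def mean_margin_of_error(sample_mean, sample_count, z=1.96):
    """Rough margin of error from this tutorial: z * mean / sqrt(count)."""
    return z * sample_mean / math.sqrt(sample_count)


def confidence_interval(estimate, margin):
    """Return the (low, high) bounds around an estimate."""
    return estimate - margin, estimate + margin


# Hypothetical survey: mean of 1.1 pets across 25 respondents.
sample_mean = 1.1
sample_count = 25

margin = mean_margin_of_error(sample_mean, sample_count)
low, high = confidence_interval(sample_mean, margin)

print(f"margin of error: {margin:.2f}")
print(f"95% confidence interval: {low:.2f} to {high:.2f}")
```

Because the margin shrinks with the square root of the count, quadrupling the number of respondents only halves the margin of error.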

### Google Bubble Map

To map your results:

- From Google Drive, create a **New, Google My Maps**
- Import your survey results spreadsheet from Google Drive
- Use the zip code column to **position the placemarks**
- Use the variable column to **title your markers**
- Style the points by your variable column
- Group by **ranges** so that different ranges of values display with different icons
- Optional: Change the icons to bubble icons
- Give your map a meaningful name
- Share the map publicly
- Copy the shared link to give to anyone you want to share the map with

While Google Maps does not directly support graduated bubble maps, bubble icons you can use with a map are given below. Right-click on the icon, copy the icon image URL, and paste that URL in as a custom icon.

### ArcGIS Online Bubble Map

Alternatively, to map your results in ArcGIS Online:

- Download a CSV file to your computer with *File, Download As, comma-separated values*
- Create a new map in ArcGIS Online
- Choose *Add, Add layer from file* and select the CSV file
- For location, select the zip code column
- For *Choose an attribute to show*, select your quantitative variable column
- For *Select a drawing style* choose *Counts and Amounts (Size)*
- If you don't like the default color or want to adjust the size of the bubbles, select *OPTIONS*
- Share the map with everyone to get a link

## Analysis: Categorical Variables

### Descriptive Statistics

If your variable is categorical, calculate the count and percentage of respondents for each category:

- Add a new sheet to your workbook
- Add a **Responses** column with the different possible responses for your variable
- Add a **Count** column and use the **countif()** function to get the count of responses in your data that match each possible response category
- Add a **Total** row and use the **sum()** function to count the total number of responses
- Add a **Percent** column and divide the response counts by the total number of counts to get the percent of responses in each category
- Format the cells to display as percents rather than as decimal ratios
- Remove unnecessary significant digits so the displayed precision reflects the accuracy of the data
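The count-and-percent table can also be sketched in Python as a check on the spreadsheet. The responses below are hypothetical and only illustrate the laptop operating system example:

```python
from collections import Counter

# Hypothetical laptop operating system responses, one per respondent.
responses = [
    "Mac OS", "Windows", "Mac OS", "none",
    "Windows", "Mac OS", "Mac OS", "Windows",
]

# Equivalent of the countif() column: responses per category.
counts = Counter(responses)

# Equivalent of the Total row.
total = sum(counts.values())

# Equivalent of the Percent column, kept as decimal ratios.
percents = {category: count / total for category, count in counts.items()}

for category, count in counts.most_common():
    # The :.0% format displays the ratio as a whole-number percent.
    print(f"{category}: {count} ({percents[category]:.0%})")
```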

### Confidence Interval

Because this is a survey, there is a possibility that our sample does not match the overall population. For this Mac/PC survey example, we may have accidentally gotten an unusually high number of Mac users or an unusually low number of PC users.

Accordingly, you always need to present the results from a sample with
a **margin of error** and a **level of confidence** that
you have in that margin of error.

The range of values around your percent values plus or minus your
margin of error is called the **confidence interval**.

A commonly-used estimate for the margin of error for proportions is:

margin_of_error = 1.96 * √(percent * (1 - percent) / sample_count)

The 1.96 is a z-score for a **95% level of confidence**, and the
calculations under the radical are a rough estimate for **standard error**.

You can use spreadsheet formulas to calculate these values for your survey:

- Add a **Margin of Error** column and use the **sqrt()** function to calculate using the formula given above
- Add a **Confidence Interval Low** column and subtract the margin of error from each percent to get the low values in the 95% confidence interval
- Add a **Confidence Interval High** column and add the margin of error to each percent to get the high values in the 95% confidence interval

Interpreting the values in the example, we can say with a 95% level of confidence that the percent of students in the broader student population that own Macintosh laptops is between 38% and 70%.
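A minimal Python sketch of the proportion calculation follows, with the square root covering the whole fraction. The function name and the sample values are hypothetical and are not the class results quoted above:

```python
import math


def proportion_margin_of_error(proportion, sample_count, z=1.96):
    """Margin of error for a proportion: z * sqrt(p * (1 - p) / n)."""
    return z * math.sqrt(proportion * (1 - proportion) / sample_count)


# Hypothetical survey: 50% of 25 respondents chose one category.
proportion = 0.5
sample_count = 25

margin = proportion_margin_of_error(proportion, sample_count)
low = proportion - margin
high = proportion + margin

print(f"margin of error: {margin:.1%}")
print(f"95% confidence interval: {low:.1%} to {high:.1%}")
```

With a class-sized sample the interval is wide, which is why results from small surveys should be reported with their margin of error rather than as a single percent.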

### Google Categorical Map

To map your results in Google Maps:

- From Google Drive, create a **New, Google My Maps**
- Import your survey results spreadsheet from Google Drive
- Style the points by your categorical variable column
- Give your map a meaningful name
- Share the map publicly
- Copy the shared link to give to anyone you want to share the map with

### ArcGIS Online Categorical Map

Alternatively, to map your results in ArcGIS Online:

- Download a CSV file to your computer with *File, Download As, comma-separated values*
- Create a new map in ArcGIS Online
- Choose *Add, Add layer from file* and select the CSV file
- For location, select the zip code column
- For *Choose an attribute to show*, select your categorical variable column
- For *Select a drawing style* choose *Types - Unique Symbols* and select options
- Click on each symbol to change it
- Select *Shapes* and find appropriate pictograms
- If you have a limited number of points, you may want to increase the size of the pictograms
- Save the map with a meaningful name
- Share the map with everyone to get a link