Preliminary Data Report: Mapping the Outcomes of Urban Sustainability Policies to Determine Spatial Equity
20 April 2014
Principal Investigator: Prof. Julie Cidell (jcidell [at] illinois.edu)
Research Assistant: Michael Minn (minn2 [at] illinois)
Funded by the University of Illinois Campus Research Board, award RB14025
This is a preliminary report on data acquisition and analysis for the project Mapping the Outcomes of Urban Sustainability Policies to Determine Spatial Equity.
This preliminary data may be viewed on the Harvard WorldMap GIS portal: http://worldmap.harvard.edu/maps/mapping-sustainability
Analysis Areas
The areas of analysis include the primary cities of New York, Los Angeles, Chicago, and Houston as well as cities of secondary interest: Philadelphia, Phoenix, San Antonio, and San Diego.
Counties are used to delimit the boundaries of analysis areas: Chicago (Cook County), Los Angeles (Los Angeles County), New York City (Bronx, Kings, New York, Queens and Richmond Counties), Houston (Harris County), San Diego (San Diego County), San Antonio (Bexar County), Phoenix (Maricopa County), and Philadelphia (Philadelphia County).
County boundaries were chosen over city boundaries in order to include relevant surrounding metropolitan areas. Counties were chosen over Metropolitan Statistical Areas (MSA) to avoid including the large exurban and rural areas found within polycentric metropolitan areas.
ZIP Code Tabulation Areas (ZCTA) are used for fine-grained mapping within counties. ZCTAs were chosen because rain barrel information is only available aggregated at the ZIP code level, and location information for many LEED-certified buildings is only accurate to ZIP code. The ZCTA may also be small enough to permit meaningful analysis of geographic dispersion while being large enough to be clearly mappable at the city level. The ZCTA is also large enough to avoid the performance degradation that would be associated with vector web mapping of more finely grained polygons.
Because most of the data used for analysis is point data, it would be possible to reprocess the data to the census tract level, although this would raise accuracy issues, especially with the rain barrel and green building data. Block group data is not available for the most recent (2012) American Community Survey, so moving to the block group level would require using older 2010 Decennial Census data and introduce significant error, especially with datasets like brownfields that use points to represent often expansive areas with effects that extend across block group boundaries.
Data Overview
Neighborhood Demographic Data
The 2008-2012 American Community Survey (ACS) 5-year Estimates were used for neighborhood demographic, housing and economic characteristics. This is the most recent data available down to the ZCTA and tract level. This data was downloaded from the US Census Bureau American FactFinder website (http://factfinder2.census.gov) and was selected from three separate ACS data sets. Each data series is referenced by the field name used in the census-zip shapefile.
- DP03 - SELECTED ECONOMIC CHARACTERISTICS
  - PCUNEMPLOY (HC01_VC13): EMPLOYMENT STATUS - Percent Unemployed
  - PCDRIVE (HC01_VC29): COMMUTING TO WORK - Car, truck, or van -- drove alone
  - HHINCOME (HC01_VC85): INCOME AND BENEFITS (IN 2012 INFLATION-ADJUSTED DOLLARS) - Median household income (dollars)
- DP04 - SELECTED HOUSING CHARACTERISTICS
  - HHRENTER (HC03_VC64): HOUSING TENURE - Renter-occupied
  - PCNOVEHICL (HC03_VC82): VEHICLES AVAILABLE - 1 vehicle available
- DP05 - ACS DEMOGRAPHIC AND HOUSING ESTIMATES
  - POPTOTAL (HC01_VC03): Total population
  - MEDIANAGE (HC01_VC21): Median age (years)
  - PCWHITE (HC03_VC72): Percent; RACE - White (Race alone or in combination with one or more other races)
  - PCBLACK (HC03_VC73): Percent; RACE - Black or African American (Race alone or in combination with one or more other races)
  - PCHISPANIC (HC03_VC81): Percent; HISPANIC OR LATINO AND RACE - Total population
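The field listing above amounts to a lookup table from ACS column codes to shapefile field names. A minimal sketch of applying it to a downloaded table follows. It assumes a CSV export keyed by a ZCTA identifier column named GEO.id2 (an assumption about the export layout), and the sample row values are invented for illustration. This is not the processing script used for the project.

```python
import csv
import io

# ACS column codes and shapefile field names, as listed in this report.
# Codes are grouped by table because the same code can recur across tables.
ACS_TO_FIELD = {
    "DP03": {"HC01_VC13": "PCUNEMPLOY", "HC01_VC29": "PCDRIVE",
             "HC01_VC85": "HHINCOME"},
    "DP04": {"HC03_VC64": "HHRENTER", "HC03_VC82": "PCNOVEHICL"},
    "DP05": {"HC01_VC03": "POPTOTAL", "HC01_VC21": "MEDIANAGE",
             "HC03_VC72": "PCWHITE", "HC03_VC73": "PCBLACK",
             "HC03_VC81": "PCHISPANIC"},
}


def rename_columns(table, csv_text, key="GEO.id2"):
    """Return {zcta: {field_name: value}} for one ACS table export.

    `key` is the assumed name of the ZCTA identifier column.
    """
    mapping = ACS_TO_FIELD[table]
    result = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        result[row[key]] = {
            field: row[code] for code, field in mapping.items() if code in row
        }
    return result


# Hypothetical one-row export; the values are made up.
sample = "GEO.id2,HC01_VC13,HC01_VC29,HC01_VC85\n60601,5.1,20.3,95000\n"
dp03 = rename_columns("DP03", sample)
```

Values are left as strings so that any non-numeric placeholders in an export can be handled explicitly before conversion.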
Bike Sharing Programs
By far the easiest data to collect for this project were the locations of bike racks in bike sharing programs. These programs are centrally controlled through city franchises, and clear rack location information is provided to patrons through web maps on the program websites.
Chicago (http://divvybikes.com/stations), Houston (https://houston.bcycle.com/default.aspx), New York City (http://citibikenyc.com/stations), and San Antonio (https://sanantonio.bcycle.com/default.aspx) have active programs, and it was possible to find location information in JavaScript or JSON representation within the source code for the program websites.
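As an illustration of how station locations might be read from such a JSON representation, the sketch below parses a station list into latitude, longitude and slot counts. The key names (stationBeanList, latitude, longitude, totalDocks) and the sample record are assumptions for illustration. Each program's site may use different names, so they are exposed as parameters.

```python
import json


def parse_stations(json_text, list_key="stationBeanList", lat_key="latitude",
                   lon_key="longitude", docks_key="totalDocks"):
    """Extract (lat, lon, slots) records from a station JSON document.

    All key names are assumptions and may differ between programs.
    """
    data = json.loads(json_text)
    stations = []
    for item in data[list_key]:
        stations.append({
            "lat": float(item[lat_key]),
            "lon": float(item[lon_key]),
            "slots": int(item[docks_key]),
        })
    return stations


# Hypothetical single-station document.
sample_json = (
    '{"stationBeanList": [{"stationName": "Example St & Sample Ave", '
    '"latitude": 41.9, "longitude": -87.6, "totalDocks": 15}]}'
)
stations = parse_stations(sample_json)
```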
Bike sharing programs are planned for Philadelphia (http://www.phila.gov/bikeshare/Pages/default.aspx), Phoenix (http://gridbikes.com), San Diego (http://www.decobikesandiego.com/), and Los Angeles (http://www.bikenationusa.com/), although no firm bike rack locations have been set or publicized, so data for these programs could not be analyzed.
Rain Barrel Programs
Six of the eight analysis cities have some kind of city-sponsored rainwater harvesting program, with the logical exception of low-rainfall Phoenix and San Antonio. The New York (http://www.nyc.gov/html/dep/html/stormwater/rainbarrel.shtml) and Philadelphia (http://www.phillywatersheds.org/whats_in_it_for_you/residents/rainbarrel) programs are both giveaway (free) programs. Los Angeles (http://socalwatersmart.com/index.php), San Diego (http://www.sandiego.gov/water/conservation/residentialoutdoor/resrainwaterharvesting.shtml), and Chicago (http://www.sustainablebackyards.org) are rebate programs. Houston offers discounts (http://www.greenhoustontx.gov/compostbinssale.html). There are private advocacy groups for rain barrels (such as http://catchtexasrain.com/index.php?page=barrel), but these are not considered for this project.
Rain barrel program participant information is constrained by the limited number and scope of the programs, as well as by confidentiality issues and unresponsiveness of city officials managing these programs. Of the five programs contacted via e-mail requesting information, only representatives of Chicago and San Diego were ultimately responsive with usable data. Despite an initial positive response from New York City, subsequent follow-up was not acknowledged. Houston was responsive, but because the program will not begin until this summer, no data is currently available. Los Angeles was completely unresponsive. Remarkably, the Philadelphia Water Department actually provides a Google Map of rain barrel program participants (http://www.phillywatersheds.org/whats_in_it_for_you/residents/rainbarrel/map2) and it was possible to download the associated KML file of geographic locations.
The result of this search is rain barrel data for Chicago, San Diego and Philadelphia. To preserve anonymity, the count of recipients is aggregated by ZIP code, and those counts are available in the census-zip shapefile in the RAINBARREL field.
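For the Philadelphia KML file mentioned above, point locations could be read along the following lines. This is a sketch only: it assumes the file uses the standard OGC KML 2.2 namespace and stores each participant as a Placemark with a Point, and the sample document is invented. KML lists coordinates in longitude, latitude order.

```python
import xml.etree.ElementTree as ET

# Assumed namespace; older Google exports may use a different URI.
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}


def kml_points(kml_text):
    """Return a list of (lat, lon) tuples for every Placemark Point."""
    root = ET.fromstring(kml_text)
    points = []
    path = ".//kml:Placemark/kml:Point/kml:coordinates"
    for coords in root.findall(path, KML_NS):
        # KML order is lon,lat[,alt]; any altitude value is ignored.
        lon, lat = coords.text.strip().split(",")[:2]
        points.append((float(lat), float(lon)))
    return points


# Hypothetical one-placemark document.
sample_kml = """<kml xmlns="http://www.opengis.net/kml/2.2"><Document>
<Placemark><name>hypothetical participant</name>
<Point><coordinates>-75.16,39.95,0</coordinates></Point></Placemark>
</Document></kml>"""
participants = kml_points(sample_kml)
```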
Because of the paucity of available data, the limited amount of participation in these programs, and the fact that water issues are commonly handled at larger scales (city, region, state, etc.) than the scales being mapped in this research project, the use of rain barrels as a proxy for individual concern about environmental sustainability is problematic. Although no suitable alternative makes itself immediately apparent (other than the expensive and similarly problematic use of surveys), further consideration should be given to replacing or augmenting this data set with information about other localized collective activities such as farmers' markets.
Green Buildings
For the purposes of this research project, "Green" Buildings are defined as buildings certified by the U.S. Green Building Council (USGBC) as part of their Leadership in Energy & Environmental Design (LEED) program. The USGBC provides information on LEED-certified projects on their website: http://www.usgbc.org/projects.
Selection from the project listing within the analysis areas resulted in 6,746 green buildings. The web pages for the individual projects also include information on brownfield redevelopment credits (if any) awarded to projects. This provides an alternative list for brownfield redevelopment (discussed below), although only 331 projects (5% of total) have brownfield credits. The count of projects by ZIP code in the census-zip shapefile is given in the LEED field, while the number of LEED projects with brownfield credits is given in the LEEDBROWN field. The specific locations of LEED-certified projects are provided in the leed-locations shapefile.
The USGBC project page provides the capability to search for individual projects, and each project has a unique project web page, although many pages contain almost no information beyond the name of the project. The main project page provides two links for downloading project data: a Download All button and an Export results (XLS) link. Since the default results display without a search term is all projects, the Export results link is functionally equivalent to Download All. However, Export results produces a large HTML table that contains more information than the Download All data, which is a tab-separated CSV that does not include URLs for the individual project pages or any entries for LEED for Homes projects or other confidential projects. A Python script was used to download and screen-scrape the project pages for locational and brownfield data.
There are three significant issues with the LEED data:
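The scraping script itself is not part of this report. As a rough illustration of the technique, the sketch below pulls a latitude/longitude pair out of a Google Maps link embedded in a page. The markup, the q= query form and the sample coordinates are all assumptions for illustration and may not match the actual USGBC pages.

```python
import re
from html.parser import HTMLParser

# Matches a "q=<lat>,<lon>" query in a link; the query form is an assumption.
LATLON_RE = re.compile(r"[?&]q=(-?\d+(?:\.\d+)?),\s*(-?\d+(?:\.\d+)?)")


class MapLinkParser(HTMLParser):
    """Collect (lat, lon) pairs from anchors that point at Google Maps."""

    def __init__(self):
        super().__init__()
        self.coords = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if "maps.google" not in href:
            return
        match = LATLON_RE.search(href)
        if match:
            self.coords.append((float(match.group(1)), float(match.group(2))))


# Hypothetical page fragment: one coordinate link and one address link.
sample_html = (
    '<a href="http://maps.google.com/maps?q=41.88,-87.63">map</a>'
    '<a href="http://maps.google.com/maps?q=1+Example+St">address</a>'
)
parser = MapLinkParser()
parser.feed(sample_html)
```

Address-based links are skipped because they carry no coordinates and would still need geocoding.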
First, the number of LEED projects is highly dynamic, quickly outdating any analysis based on static downloads. A preliminary download on 21 January 2014 resulted in 63,512 projects, while a review of the website three months later on 19 April 2014 showed 65,734 projects, or a 15% annualized growth rate. The data used in this preliminary report was downloaded on 29 March and contained 65,094 total projects worldwide. While there is no clear solution to alleviate this issue with a research project of this type, this is an important caveat that should be included with reports of final results.
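The annualized figure can be checked from the two project counts and their dates. The sketch below assumes compound growth over a 365-day year, which is one of several reasonable conventions and may not be the one used for the figure above.

```python
from datetime import date


def annualized_growth(count_start, count_end, day_start, day_end):
    """Compound annual growth rate implied by two dated counts."""
    days = (day_end - day_start).days
    return (count_end / count_start) ** (365.0 / days) - 1.0


# Counts and dates as reported above.
rate = annualized_growth(63512, 65734, date(2014, 1, 21), date(2014, 4, 19))
print(f"{rate:.1%}")
```

Under this convention the result is consistent with the roughly 15% stated above.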
Second, there is a definitional question of exactly which projects should be included in a listing of green buildings that are relevant to the scope of this research project. Many projects listed on the project page are prospective and projects have varying levels of LEED certification. Around 9% of projects appear to be LEED for Homes projects where information (including location) is confidential. For the purposes of this report, all listed LEED projects with ZIP code information are considered green buildings, and caveats about this ambiguity should be included with reports of the final results.
The third and most serious issue with the LEED project data is the ambiguity of location information. As mentioned above, information on many projects is confidential or prospective and includes no ZIP code or locational information. Many public project pages include Google Maps links to the sites, although these links are queries based on address that are subject to geocoding errors by Google. Scripted attempts to convert these queries to latitude/longitude were constrained by Google's limit of 2,500 geocoded addresses per day. While OpenStreetMap/Nominatim (OSM) does not have such a limit, the OSM geocoding capability is much more limited than Google's, permitting geocoding of far fewer addresses than would be possible with Google. Geocoding via ArcMap was prevented by campus ArcMap 10.0 version issues with ESRI's new geocoding service, and the use of ArcGIS Online (AGO) is limited by the large number of AGO credits that would be consumed to geocode 60,000+ addresses. The project pages usually include a hidden Google Maps link with a latitude/longitude query, although a cursory review of those values indicates a high level of inaccuracy for many projects that may be the result of crude geocoding techniques.
For the purposes of this project, the point locations (leed-locations shapefile) for individual projects were geocoded using OSM, with a handful of locations at the beginning of the shapefile geocoded with Google Maps. Where OSM could not geocode addresses, the hidden latitude/longitude values in the project pages were used. Around 25% of the 65,054 total projects have no latitude/longitude of any kind.
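The fallback order described above (geocoder first, hidden page coordinates second) can be sketched as follows. The geocoder is passed in as a function so that no particular service or API is implied. The record keys address and hidden_latlon and the stub data are hypothetical.

```python
def locate(project, geocode):
    """Return (lat, lon, source) for a project record, or None.

    `geocode` is any callable mapping an address string to (lat, lon) or
    None. The record keys used here are hypothetical.
    """
    address = project.get("address")
    if address:
        hit = geocode(address)
        if hit is not None:
            return (hit[0], hit[1], "geocoder")
    hidden = project.get("hidden_latlon")
    if hidden is not None:
        # Fall back to the coordinates embedded in the project page.
        return (hidden[0], hidden[1], "page")
    return None


# Stub geocoder standing in for a real service; the data is made up.
known_addresses = {"1 Example St": (41.0, -87.0)}
stub_geocode = known_addresses.get

located = locate({"address": "1 Example St"}, stub_geocode)
fallback = locate({"address": "unknown", "hidden_latlon": (40.5, -88.5)},
                  stub_geocode)
```

Recording the source alongside each coordinate makes it possible to treat the less reliable page-derived locations separately in later analysis.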
For counts of buildings by ZIP code, project ZIP code information was joined and cross-tabulated with ZCTA shapefile data. Around 15% of the 65,054 total projects do not have ZIP code information.
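A minimal sketch of such a cross-tabulation is shown below. It assumes project ZIP codes arrive as strings, possibly in ZIP+4 form or missing, and that a ZIP code can be matched directly to the ZCTA code of the same number. That is a simplification, since ZCTAs only approximate ZIP code delivery areas. The sample codes are illustrative.

```python
from collections import Counter


def counts_by_zcta(project_zips, zcta_codes):
    """Count projects per ZCTA, reporting zero for ZCTAs with no projects.

    Projects with missing ZIP codes, or ZIP codes outside `zcta_codes`,
    are dropped from the result.
    """
    counts = Counter(
        z.strip()[:5] for z in project_zips if z and z.strip()
    )
    return {code: counts.get(code, 0) for code in zcta_codes}


# Illustrative input: one ZIP+4 value, one missing value, one outside area.
leed_counts = counts_by_zcta(
    ["60601", "60601-1234", "", None, "99999", "60602"],
    ["60601", "60602", "60603"],
)
```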
Brownfield Redevelopment
By far the most challenging and time-consuming component of this research was compiling a list of brownfield redevelopment projects. Data issues are only partially resolved with this preliminary report and significant strategic and tactical consideration by the Principal Investigator will be required to create a final data set that can be considered even remotely valid.
The primary administrative and epistemological issue is that there is no central agency at any scale that maintains a definitive list of brownfields. Different states and cities have varying levels of coverage. Although redevelopment projects are often coordinated at the city level, identification of brownfields is commonly left to state and federal authorities. An exception to this is Pennsylvania, which has an extensive legacy of environmental contamination but a state government that appears to make very little brownfields information available on their public website. At the Federal level, brownfields and potential brownfields are listed under a variety of programs (discussed further below), at different scales (project vs city-wide grants) and in differing media formats (including text PDF). Different data sources have both redundant information and incomplete information from other databases that would seem to merit inclusion.
These issues run deeper to fundamental ontological questions about what exactly should be considered a brownfield and what should be considered redevelopment. As referenced in the project proposal, the EPA definition of a brownfield is "real property, the expansion, redevelopment, or reuse of which may be complicated by the presence or potential presence of a hazardous substance, pollutant, or contaminant." That definition is filled with ambiguous and contested terms like "hazardous," "contaminant," and "redevelopment."
Brownfields can be thought of as four-dimensional spaces, although for convenience they are commonly represented as points. As real property they are administratively defined in representational space by two-dimensional polygons. However, contaminants commonly extend in the third dimension below ground level in plumes and above ground level in airborne clouds that extend the influence of these properties well beyond the property boundaries to volumes with indefinite extents. In the fourth dimension of time, these sites are dynamic, with contaminant levels, plumes and clouds changing over time as the result of environmental forces (such as groundwater flow), remediation efforts, and continued industrial activity. Indeed, given the limitations of remediation technology and practices, remediation efforts can generally be said to only redistribute contaminants or mitigate effects as part of a political project to obscure the existence of this toxic industrial legacy and/or limit the liability of responsible parties that may or may not have had any direct involvement with contamination practices that frequently extend over a century through multiple generations of human and corporate control.
The temporal dimension presents a special challenge for this particular research project in that a brownfield generally ceases to be a brownfield after redevelopment. Given the dynamic nature of web data resources, this may make it difficult to identify locations that have been redeveloped and then delisted from administrative databases.
The lumping of contaminated sites into rigidly demarcated areas under a binary categorization of brownfield/not-brownfield also obscures the wide variety of different contaminants and the wide variety of levels of contamination for each different substance. For example, the lumping of a former manufactured gas plant site saturated with tons of carcinogenic polycyclic aromatic hydrocarbons (PAH) from coal tar into the same category as a former gasoline station with modest groundwater contamination from a leaking underground storage tank (LUST) obscures the much more dramatic health effects of the former site.
The subsequent post-remediation binary classification of safe/not-safe further obscures often limited knowledge about the detrimental effects over different time-scales of different levels of these contaminants on different human populations. This hints that the project of brownfield redevelopment is as much or more discursive than material, with the political objective of permitting neoliberal re-exploitation of formerly-productive geographies that is evocative of or associated with revanchist gentrification efforts to reclaim the city for capital.
A clearer definition of "redevelopment" is needed to clarify what stages of brownfield remediation and redevelopment should be mapped. Brownfields go through phases of identification, assessment, cleanup and, commonly, subsequent monitoring/maintenance. Cleanup can involve simple remediation or extensive redevelopment into public spaces (like parks) or private structures.
Given these issues, it appears that a specific list of brownfields that is complete and uncontestable cannot exist in a way that permits positive Cartesian representation. The best that can probably be hoped for are snapshots of contingent spaces that permit heuristic insight into possible effects on some localized populations.
In the preliminary mapping for this report, seven different sets of data sources are provided in separate layers. These layers are provided both in counts aggregated by ZIP code and as point features. These layers are provided only to promote further discussion and they should not be regarded individually or collectively as forming a definitive list of brownfield projects.
The Resource Conservation and Recovery Act (RCRA) is the primary national law governing the disposal of solid and hazardous waste. Congress passed RCRA on October 21, 1976 to address the increasing problems the nation faced from a growing volume of municipal and industrial waste. The Federal government maintains a search page into the Envirofacts database (http://www.epa.gov/enviro/facts/rcrainfo/search.html) which permits download of RCRA site information, including latitude/longitude. This is the most expansive of the data sources and includes numerous active sites that are simply monitored generators of hazardous waste and should not be considered brownfields for the purposes of this research project. Counts of these projects by ZIP code are included in the RCRA field of the census-zip shapefile, with individual site locations in the rcra shapefile.
The Comprehensive Environmental Response, Compensation, and Liability Information System (CERCLIS) is the database of sites included in the Federal Superfund program, which is administered by the EPA to locate, investigate, and clean up the worst hazardous waste sites throughout the United States. The Federal government maintains a search page into the Envirofacts database (http://www.epa.gov/enviro/facts/cerclis/search.html) which permits download of CERCLIS site information, including latitude/longitude. While most Superfund sites can probably be considered brownfields, not all sites that should be considered brownfields for the purposes of this research project will fall into the clear category of Superfund sites. A subset of Superfund sites is the National Priorities List (NPL), which identifies sites where remediation is especially urgent. Counts of these projects by ZIP code are included in the CERCLIS field of the census-zip shapefile, with individual site locations in the cerclis shapefile.
The Cleanups in My Community (CIMC) list from the EPA (http://www.epa.gov/cimc/) is a somewhat more promising list of areas where hazardous waste is being or has been cleaned up throughout the United States. Sites include NPL, RCRA Corrective Actions and Brownfields properties, Federal facilities under EPA's cleanup programs and removals from the EPA On-Scene Coordinator (OSC) listing. The primary issue with CIMC is that it appears to be incomplete, with numerous sites identified on city and state websites not included. Counts of these projects by ZIP code are included in the CIMC field of the census-zip shapefile, with individual site locations in the cimc shapefile.
The Regional EPA Offices for Region 2 (New York: http://www2.epa.gov/aboutepa/epa-region-2), Region 5 (Illinois: http://www.epa.gov/region5/cleanup/index.htm), and Region 9 (Arizona: http://www.epa.gov/region9/cleanup/arizona.html) maintain web page lists of cleanup sites. In addition, the EPA maintains PDF lists of brownfields grant recipients (http://cfpub.epa.gov/bf_factsheets/basic/index.cfm). While these pages often contain extensive narrative information and links to documents associated with these sites, the lists are clearly incomplete from the perspective of this research. They also generally lack explicit locational information, making assignment of latitude/longitude an imprecise and time-consuming manual task. Counts of these projects by ZIP code are included in the BROWNEPA field of the census-zip shapefile, with individual site locations in the epa-grants shapefile.
Most State Departments of Environmental Protection maintain online databases of sites of industrial contamination, although the coverage varies widely from the exhaustive California Department of Toxic Substances Control EnviroStor database (http://www.envirostor.dtsc.ca.gov/public/) to the skeletal Pennsylvania Brownfields Inventory (http://brownfields.pasitesearch.com/). Also included in this listing are the New York Department of Environmental Conservation Environmental Site Remediation Database (http://www.dec.ny.gov/cfmx/extapps/derexternal/index.cfm?pageid=3), the Illinois EPA Site Remediation Program (http://epadata.epa.state.il.us/land/srp/), and the Texas Commission on Environmental Quality Brownfields Site Assessment (http://www.tceq.state.tx.us/remediation/bsa/bsa.html) and Voluntary Cleanup Programs (http://www.tceq.state.tx.us/remediation/vcp/vcp.html/). Counts of these projects by ZIP code are included in the BROWNSTATE field of the census-zip shapefile, with individual site locations in the state-programs shapefile.
Many city agencies maintain lists of success stories on their websites, although these are often very limited in coverage and include sites that have been funded by state and/or federal programs. Counts of these projects by ZIP code are included in the BROWNCITY field of the census-zip shapefile, with individual site locations in the city-programs shapefile.
Finally, as mentioned earlier, many LEED-certified green building projects receive credits for development on brownfield property. Counts of these projects by ZIP code are included in the LEEDBROWN field of the census-zip shapefile.
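Most of the layers above pair point features with counts aggregated by ZIP code. One plausible way to derive such counts from points is a point-in-polygon test against the ZCTA boundaries, sketched below with the ray-casting method. This illustrates the general technique rather than the procedure used to build the census-zip shapefile. The polygons are simple hypothetical rings without holes, and points exactly on a boundary are not handled specially.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test: True if (lon, lat) falls inside the ring.

    `ring` is a list of (lon, lat) vertices for a simple polygon.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def count_points(points, polygons):
    """Count (lon, lat) points per named polygon; unmatched points drop."""
    counts = {name: 0 for name in polygons}
    for lon, lat in points:
        for name, ring in polygons.items():
            if point_in_polygon(lon, lat, ring):
                counts[name] += 1
                break
    return counts


# Two hypothetical unit-square "ZCTAs" side by side.
squares = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
site_counts = count_points([(0.5, 0.5), (1.5, 0.5), (1.6, 0.2), (5, 5)],
                           squares)
```

A point count of this kind inherits the limitation discussed earlier: a single point stands in for a site whose effects may extend across polygon boundaries.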
Preliminary Analysis Results
A matrix of Pearson correlation coefficients (R2) between ZIP code aggregated fields is provided in census-zip-matrix-2014-04-19.pdf. This matrix provides a starting point for further, more-detailed analysis. Across all eight analysis cities, there are only a limited number of variables that have even modestly-high correlation.
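For reference, a correlation matrix of this kind can be computed from the ZIP-code fields as sketched below. The sample values are invented. The function returns the signed Pearson coefficient r. Squaring it gives r squared but discards the direction of the relationship, so any squaring is left to the caller.

```python
import math


def pearson(xs, ys):
    """Signed Pearson correlation coefficient r for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when a field has no variance
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Return {field_a: {field_b: r}} for a dict of field name -> values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names}
            for a in names}


# Invented values for three fields across four ZIP codes.
fields = {
    "PCWHITE": [10.0, 20.0, 30.0, 40.0],
    "PCBLACK": [40.0, 30.0, 20.0, 10.0],
    "HHINCOME": [30000.0, 45000.0, 40000.0, 60000.0],
}
matrix = correlation_matrix(fields)
```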
Demographics: There are expected moderate correlations between:
- unemployment and income (PCUNEMPLOY vs HHINCOME)
- unemployment and African-American population (PCUNEMPLOY vs PCBLACK)
- driving to work alone and vehicle ownership (PCDRIVE vs PCNOVEHICL)
There are strong inverse correlations between:
- renting and vehicle ownership (PCDRIVE vs HHRENTER)
- White and African-American population (PCWHITE vs PCBLACK)
- Hispanic population and income (PCHISPANIC vs HHINCOME)
- Hispanic population and median age (PCHISPANIC vs MEDIANAGE)
Bike Sharing: There is an expected modest inverse correlation between number of bike rack slots (multiple slots per site) and both vehicle ownership and driving to work alone. There is also a modest correlation (R2 = 0.246) with the number of LEED-certified buildings. The surprisingly strong correlation (R2 = 0.612) between number of bike rack slots and RCRA sites is likely an indirect relationship due to the large number of RCRA sites in dense urban areas that also host bike sharing programs.
Rain Barrels: There are no strong correlations between rain barrel program participation and any of the aggregated ZIP code variables. A high correlation between rain barrels and EPA brownfield grant sites is a spurious artifact of limited overlap between these two sparse data sets.
Green Buildings: Aside from the aforementioned modest correlation between number of LEED-certified buildings and bike sharing racks, there is also a modest correlation between number of city brownfield redevelopment projects and number of green buildings, although the limited size of the city brownfield data set likely exaggerates this relationship through a small number of highly publicized city brownfield projects that seek LEED certification.
Brownfields: Aside from the aforementioned indirect or spurious relationships with brownfields, there are no other meaningful correlations identified in this limited analysis between brownfields and other variables. Given the severe weaknesses of the brownfield data sets, this should not be interpreted as an absence of relationship. But given the ubiquity of environmental contamination, exposure to this dark side of industrial development may indeed prove to be surprisingly democratic.
Data Package
In addition to the WorldMap visualization website linked above, the data referenced in this report is available for offline viewing and analysis as zipped shapefiles in data-2014-04-19.zip. Data was processed with R Spatial and visualized with QGIS. Specific source links and download dates will be included once data is finalized in the concluding report.