Michael Minn - 30 April 2015
NDVI: Median tract residential area NDVI
SQMLAND: Tract area in square meters
MEDIANAGE: Median age
MEDHHINC: Median household income
MEANHHSIZE: Mean number of members in each household
PCOWNEROCC: Percent housing units owner-occupied
PCTURNOVER: Percent residents in same house one year ago
PCBORNUSA: Percent of residents born in the USA
PCUNEMPLOYED: Percent 16 years of age or older unemployed (mean?)
Data from 2013 Maricopa County Assessor's Office ST 42030 File
SQMLAWN: Total square meters of PLA (lot_size - (home_size / floors) - pool_size)
MEDCONSTYR: Median construction year
P_MINUS_PET: Precipitation - potential evapotranspiration
RAINFALL60: Running sum of rainfall for the past 60 days
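The SQMLAWN formula above can be sketched per parcel and then summed to a tract total. This is a minimal illustration, not the author's code: the function name, the sample parcels, and the assumption that all inputs are already in square meters are hypothetical.

```python
def lawn_area_sqm(lot_size, home_size, floors, pool_size):
    """Parcel lawn area per the SQMLAWN definition:
    lot_size - (home_size / floors) - pool_size.
    Inputs are assumed to be in square meters already (assumption)."""
    if floors < 1:
        raise ValueError("floors must be at least 1")
    footprint = home_size / floors  # ground area covered by the building
    return lot_size - footprint - pool_size

# Hypothetical parcels: (lot_size, home_size, floors, pool_size)
parcels = [(700.0, 200.0, 1, 40.0), (500.0, 300.0, 2, 0.0)]

# Tract-level SQMLAWN is the total over the tract's parcels.
tract_sqmlawn = sum(lawn_area_sqm(*p) for p in parcels)
print(tract_sqmlawn)
```

Dividing home_size by floors approximates the building footprint, so a two-story house removes only half of its floor area from the lot.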
Of the weather variables, a temporally lagged running sum of rainfall has the highest correlation with tract NDVI:
A five-day lag between P_MINUS_PET and NDVI gives the best correlation for that variable (r = 0.18).
A nine-day lag between the 60-day rainfall sum (RAINFALL60) and NDVI gives the best correlation for that variable (r = 0.423).
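The lag search described above can be sketched as follows: build a trailing rainfall sum, shift it by each candidate lag, and keep the lag with the highest Pearson correlation against NDVI. The function names and data are hypothetical. The series is synthetic, the window is shortened to 3 days so the example stays small (the notes use 60), and the sketch assumes NDVI responds after rainfall rather than before it.

```python
from math import sqrt

def running_sum(values, window):
    """Trailing sum over `window` entries; None until the window is full."""
    out, total = [], 0.0
    for i, v in enumerate(values):
        total += v
        if i >= window:
            total -= values[i - window]  # drop the value leaving the window
        out.append(total if i >= window - 1 else None)
    return out

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def lagged_correlation(driver, response, lag):
    """Correlate driver[t - lag] with response[t], skipping missing values."""
    pairs = [(driver[t - lag], response[t])
             for t in range(lag, len(response))
             if driver[t - lag] is not None and response[t] is not None]
    xs, ys = zip(*pairs)
    return pearson(xs, ys)

def best_lag(driver, response, max_lag):
    """Lag in 0..max_lag with the highest correlation."""
    return max(range(max_lag + 1),
               key=lambda k: lagged_correlation(driver, response, k))

# Synthetic daily rainfall (mm) and an NDVI series built to trail the
# 3-day rainfall sum by exactly two days.
rain = [0, 5, 0, 0, 2, 0, 0, 0, 8, 1, 0, 0, 3, 0, 0, 0, 0, 6, 0, 0]
rsum = running_sum(rain, 3)
ndvi = [0.1 + 0.01 * rsum[t - 2] if t >= 2 and rsum[t - 2] is not None else None
        for t in range(len(rain))]

print(best_lag(rsum, ndvi, 5))
```

With real data, the same search would run with window = 60 over a range of lags that includes nine days.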
Call:
lm(formula = NDVI ~ ., data = regression_data)

Residuals:
    Min      1Q  Median      3Q     Max
-4.2980 -0.5458 -0.1283  0.3942 13.0695

Coefficients:
              Estimate Std. Error  t value Pr(>|t|)
(Intercept)   0.009428   0.001658    5.687  1.3e-08 ***
SQMLAND      -0.017284   0.001708  -10.117  < 2e-16 ***
MEDIANAGE    -0.334194   0.002522 -132.535  < 2e-16 ***
MEDHHINC      0.320085   0.002271  140.941  < 2e-16 ***
MEANHHSIZE   -0.246225   0.002517  -97.832  < 2e-16 ***
PCOWNEROCC    0.018373   0.002167    8.479  < 2e-16 ***
PCTURNOVER    0.113816   0.002150   52.944  < 2e-16 ***
PCBORNUSA     0.095260   0.002555   37.286  < 2e-16 ***
PCUNEMPLOYED -0.070619   0.002257  -31.295  < 2e-16 ***
SQMLAWN       0.081585   0.001991   40.984  < 2e-16 ***
MEDCONSTYR   -0.479292   0.002037 -235.245  < 2e-16 ***
RAINFALL60    0.136666   0.001585   86.234  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.8769 on 293032 degrees of freedom
  (160337 observations deleted due to missingness)
Multiple R-squared:  0.2538,  Adjusted R-squared:  0.2538
F-statistic:  9060 on 11 and 293032 DF,  p-value: < 2.2e-16
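As a consistency check on the linear model summary, the F-statistic and the row counts can be recomputed from the reported figures. This is a small arithmetic sketch using only numbers printed in the output; the variable names are illustrative.

```python
# Values copied from the lm() summary.
r_squared = 0.2538
n_predictors = 11      # numerator degrees of freedom
df_resid = 293032      # residual degrees of freedom
n_dropped = 160337     # rows deleted due to missingness

# F = (R^2 / k) / ((1 - R^2) / residual df)
f_stat = (r_squared / n_predictors) / ((1 - r_squared) / df_resid)

# Rows used in the fit = residual df + predictors + intercept.
n_used = df_resid + n_predictors + 1
n_total = n_used + n_dropped

print(f_stat)          # approximately 9060; R^2 is rounded to 4 digits
print(n_used, n_total)
```

The recomputed F lands within rounding of the reported 9060. With 293044 rows used, roughly a third of the input rows were dropped for missing values.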
> model

Call:
 randomForest(formula = NDVI ~ ., data = database[, rfields], na.action = na.omit)
               Type of random forest: regression
                     Number of trees: 500
No. of variables tried at each split: 4

          Mean of squared residuals: 0.2130862
                    % Var explained: 78.88

> summary(model)
                Length Class  Mode
call                 4 -none- call
type                 1 -none- character
predicted       301211 -none- numeric
mse                500 -none- numeric
rsq                500 -none- numeric
oob.times       301211 -none- numeric
importance          12 -none- numeric
importanceSD         0 -none- NULL
localImportance      0 -none- NULL
proximity            0 -none- NULL
ntree                1 -none- numeric
mtry                 1 -none- numeric
forest              11 -none- list
coefs                0 -none- NULL
y               301211 -none- numeric
test                 0 -none- NULL
inbag                0 -none- NULL
terms                3 terms  call
na.action       152170 omit   numeric

> importance(model)
             IncNodePurity
SQMLAND          22156.939
SQMWATER          8838.317
MEDIANAGE        24688.405
MEDHHINC         26790.211
MEANHHSIZE       23293.283
PCOWNEROCC       13776.804
PCTURNOVER       13744.089
PCBORNUSA        13494.287
PCUNEMPLOYED      9865.677
SQMLAWN          31388.236
MEDCONSTYR       54570.323
PMINPET          12390.099
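The importance table is easier to read when sorted. For regression forests, IncNodePurity is the total decrease in residual sum of squares from splits on a variable, averaged over trees, so only the relative sizes matter. The sketch below ranks the reported values and expresses each as a share of the total. The share column is an illustrative normalization, not something randomForest reports.

```python
# IncNodePurity values copied from importance(model).
inc_node_purity = {
    "SQMLAND": 22156.939, "SQMWATER": 8838.317, "MEDIANAGE": 24688.405,
    "MEDHHINC": 26790.211, "MEANHHSIZE": 23293.283, "PCOWNEROCC": 13776.804,
    "PCTURNOVER": 13744.089, "PCBORNUSA": 13494.287, "PCUNEMPLOYED": 9865.677,
    "SQMLAWN": 31388.236, "MEDCONSTYR": 54570.323, "PMINPET": 12390.099,
}

total = sum(inc_node_purity.values())

# Sort predictors from most to least important.
ranked = sorted(inc_node_purity.items(), key=lambda kv: kv[1], reverse=True)

for name, value in ranked:
    share = 100 * value / total
    print(f"{name:<13}{value:>12.3f}{share:>7.1f}%")
```

MEDCONSTYR and SQMLAWN lead the ranking. The forest was fit with SQMWATER and PMINPET rather than RAINFALL60, so its predictor set differs from the one used in the linear model.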
> summary(gamodel)

Family: gaussian
Link function: identity

Formula:
NDVI ~ s(SQMLAND) + s(MEDIANAGE) + s(MEDHHINC) + s(MEANHHSIZE) +
    s(PCOWNEROCC) + s(PCTURNOVER) + s(PCBORNUSA) + s(PCUNEMPLOYED) +
    s(SQMLAWN) + s(MEDCONSTYR) + s(RAINFALL60)

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.004502   0.001493   3.016  0.00256 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
                  edf Ref.df       F p-value
s(SQMLAND)      8.952  8.999  337.97  <2e-16 ***
s(MEDIANAGE)    8.999  9.000 1375.92  <2e-16 ***
s(MEDHHINC)     8.995  9.000 2272.24  <2e-16 ***
s(MEANHHSIZE)   8.975  9.000 1308.43  <2e-16 ***
s(PCOWNEROCC)   8.990  9.000  473.22  <2e-16 ***
s(PCTURNOVER)   8.911  8.998  246.44  <2e-16 ***
s(PCBORNUSA)    8.982  9.000  334.18  <2e-16 ***
s(PCUNEMPLOYED) 8.958  8.999   97.41  <2e-16 ***
s(SQMLAWN)      8.985  9.000 1296.71  <2e-16 ***
s(MEDCONSTYR)   8.932  8.998 5495.54  <2e-16 ***
s(RAINFALL60)   8.976  9.000 1098.06  <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.366   Deviance explained = 36.7%
GCV = 0.65316  Scale est. = 0.65294   n = 293044
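One detail of the smooth-term table is worth a quick check: every edf sits just under 9. The sketch below sums the reported edf values and counts how many are within 1% of a ceiling of 9. The ceiling is an assumption: the fitting call is not shown, and 9 is what mgcv's default basis dimension (k = 10, minus one for the identifiability constraint) would allow. If that assumption holds, smooths pinned at the ceiling may benefit from a larger k.

```python
# edf values copied from the smooth-term table.
edf = {
    "SQMLAND": 8.952, "MEDIANAGE": 8.999, "MEDHHINC": 8.995,
    "MEANHHSIZE": 8.975, "PCOWNEROCC": 8.990, "PCTURNOVER": 8.911,
    "PCBORNUSA": 8.982, "PCUNEMPLOYED": 8.958, "SQMLAWN": 8.985,
    "MEDCONSTYR": 8.932, "RAINFALL60": 8.976,
}

# Assumed ceiling: default k = 10 per smooth, minus 1 for the constraint.
MAX_EDF = 9

total_edf = sum(edf.values())
near_ceiling = [name for name, e in edf.items() if e > 0.99 * MAX_EDF]

print(f"total smooth edf: {total_edf:.3f}")
print(f"{len(near_ceiling)} of {len(edf)} smooths within 1% of the ceiling")
```

The smooths use about 98.7 effective degrees of freedom in total, compared with 11 slopes in the linear model. That extra flexibility is what the gain in explained variance (36.7% versus 25.4%) buys.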