Given that we can easily generate linear models to fit PurpleAir to AirNow values, we can now explore how these models behave and compare. In this document we will analyze how distance and geography can affect the accuracy of a model’s fit.
For this experiment we will use data from the Methow Valley in Washington. In this area are two AirNow monitors, one in Winthrop and one in Twisp, and spread between and around them are a number of PurpleAir units that are part of the Methow Valley Clean Air Ambassador (MVCAA) group. Here’s a map of all the MVCAA monitors and their distance from the closest AirNow monitor.
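The distances shown on the map can be approximated with a great-circle calculation. The following is a minimal sketch in base R, not necessarily how the distances in this document were computed. The function name `haversine_m`, the data frames, and all coordinates are hypothetical and chosen only for illustration.

```r
# Great-circle (haversine) distance in meters between two lon/lat points.
# Assumes a spherical Earth with a mean radius of 6,371 km.
haversine_m <- function(lon1, lat1, lon2, lat2, radius = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Hypothetical coordinates, for illustration only (not the real sites).
airnow <- data.frame(
  name = c("ref_1", "ref_2"),
  lon = c(-120.12, -120.18),
  lat = c(48.36, 48.47),
  stringsAsFactors = FALSE
)
sensors <- data.frame(
  name = c("sensor_a", "sensor_b", "sensor_c"),
  lon = c(-120.121, -120.10, -120.185),
  lat = c(48.361, 48.37, 48.475),
  stringsAsFactors = FALSE
)

# Distance from every sensor (rows) to every reference monitor (columns).
dist_matrix <- outer(
  seq_len(nrow(sensors)), seq_len(nrow(airnow)),
  function(i, j) {
    haversine_m(sensors$lon[i], sensors$lat[i], airnow$lon[j], airnow$lat[j])
  }
)

# Keep the nearest reference monitor and its distance for each sensor.
sensors$closest_airnow <- airnow$name[apply(dist_matrix, 1, which.min)]
sensors$distance_m <- apply(dist_matrix, 1, min)
print(sensors)
```

At the scale of a single valley, the spherical approximation is typically accurate to well within a few meters, which is enough for comparing monitors that are tens of meters versus kilometers apart.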
Let’s look at Twisp first. Here’s a zoomed-in view of the monitors we’ll be working with, which are at most 2 km away from the AirNow unit at 3rd Avenue and Glover Street.
Let’s get the MVCAA monitors’ time series raw PM 2.5 readings for the past week. Since PurpleAir monitors take readings for each of their two channels, we’ll only use data from channel A.
Download the Twisp Town Hall time series data between 2018-08-29 and 2018-09-01: twisp_mvcaa_townHall <- downloadParseTimeseriesData(pas = pas_enh, name = "MV Clean Air Ambassador @ Twisp Town Hall", startdate = "2018-08-29", enddate = "2018-09-01")
Keep only channel A readings: twisp_mvcaa_townHall$data <- filter(twisp_mvcaa_townHall$data, channel == "A")
PurpleAir monitors take a reading about once every 40 seconds, while our AirNow data has one entry per hour. So let’s time-average the data from our two MVCAA monitors so that it aligns with AirNow:
Time-averaging Balky Hill time series data by the hour using the openair package: twisp_mvcaa_balkyHill$data <- timeAverage(twisp_mvcaa_balkyHill$data, avg.time = "hour")
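To make the averaging step concrete, here is a rough base R sketch of what an hourly mean amounts to. It uses synthetic readings, and the column names `datetime`, `pm25`, and `hour` are assumptions for illustration. It does not claim to reproduce the internals of `timeAverage()`.

```r
# Synthetic sub-hourly readings: one value every 40 seconds for three hours.
set.seed(1)
raw <- data.frame(
  datetime = seq(
    as.POSIXct("2018-08-29 00:00:00", tz = "UTC"),
    by = "40 sec", length.out = 270
  )
)
raw$pm25 <- 20 + rnorm(nrow(raw), sd = 3)

# Truncate each timestamp to the start of its hour,
# then average all readings that fall within the same hour.
raw$hour <- as.POSIXct(trunc(raw$datetime, units = "hours"))
hourly <- aggregate(pm25 ~ hour, data = raw, FUN = mean)
print(hourly)
```

Each hourly value here summarizes 90 raw readings, which is why the averaged PurpleAir series is much smoother than the raw one and can be joined to the AirNow data on the hour.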
Here are the Twisp AirNow readings:
Now that we have baseline PM 2.5 readings for this period and location, we can make our linear fit models to translate from MVCAA PurpleAir monitors to AirNow values:
Using lm() to make linear fit models for both MVCAA monitors: twisp_mvcaa_townHall_model <- lm(data = twisp_combined_services, an_pm2.5 ~ mvcaa_townhall_pm2.5)
twisp_mvcaa_balkyHill_model <- lm(data = twisp_combined_services, an_pm2.5 ~ mvcaa_balkyhill_pm2.5)
Monitor | AirNow Distance (m) | Model | R-Squared |
---|---|---|---|
Twisp Town Hall | 22.92364 | y = 0.5x + 2.64 | 0.9801462 |
Balky Hill | 1790.89668 | y = 0.56x + 8.45 | 0.6384973 |
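The model equation and R-squared columns in a table like this can be read directly off a fitted `lm` object. The sketch below uses synthetic data generated around an assumed true relationship of y = 0.5x + 2.5. The data frame `demo` and its column names are hypothetical and stand in for the combined hourly table.

```r
# Synthetic hourly data: 72 hours of sensor readings and a noisy linear response.
set.seed(42)
demo <- data.frame(sensor_pm2.5 = runif(72, min = 5, max = 120))
demo$reference_pm2.5 <- 2.5 + 0.5 * demo$sensor_pm2.5 + rnorm(72, sd = 3)

# Fit reference values as a linear function of sensor values.
model <- lm(reference_pm2.5 ~ sensor_pm2.5, data = demo)

# Pull out the pieces reported in the table.
intercept <- unname(coef(model)[1])
slope <- unname(coef(model)[2])
r_squared <- summary(model)$r.squared

cat(sprintf("y = %.2fx + %.2f, R-squared = %.4f\n", slope, intercept, r_squared))
```

Here x is the sensor reading and y is the reference value, so the slope is the scale factor applied to the PurpleAir data and the intercept is a constant offset.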
We can see how these models actually translate the MVCAA data to AirNow values by using R’s predict.lm() function: twisp_combined_services$mvcaa_townhall_pm2.5 <- predict.lm(twisp_mvcaa_townHall_model, twisp_combined_services)
twisp_combined_services$mvcaa_balkyhill_pm2.5 <- predict.lm(twisp_mvcaa_balkyHill_model, twisp_combined_services)
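For a one-variable linear model, prediction is just the intercept plus the slope times the sensor reading. The following sketch, on synthetic data with hypothetical names (`train`, `new_readings`, `corrected_pm2.5`), illustrates that equivalence. It also stores the result in a new column, which is one possible design choice rather than what the code above does.

```r
# Synthetic training data around an assumed relationship y = 0.5x + 3.
set.seed(7)
train <- data.frame(sensor_pm2.5 = runif(48, min = 5, max = 100))
train$reference_pm2.5 <- 3 + 0.5 * train$sensor_pm2.5 + rnorm(48, sd = 2)
model <- lm(reference_pm2.5 ~ sensor_pm2.5, data = train)

# New sensor readings to translate into reference-equivalent values.
new_readings <- data.frame(sensor_pm2.5 = c(10, 40, 80))

# predict() dispatches to predict.lm() for lm objects.
predicted <- predict(model, newdata = new_readings)

# The same values computed by hand from the fitted coefficients.
manual <- coef(model)[["(Intercept)"]] +
  coef(model)[["sensor_pm2.5"]] * new_readings$sensor_pm2.5

# Store the corrected values alongside the raw ones.
new_readings$corrected_pm2.5 <- unname(predicted)
print(new_readings)
```

Writing predictions to a separate column keeps the raw readings available, so the model can be refit or compared later without downloading the data again.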
Judging from the R-squared values of the models and the alignment of the fitted plots, the MVCAA monitor closest to the AirNow unit has a much better fit than the one farther away. This in itself isn’t much of a revelation, but lateral distance is not the only factor at play: the valley surrounding Twisp and Winthrop may cause smoke to flow downhill and settle at the bottom. Since the AirNow and MVCAA Twisp Town Hall monitors are located at nearly the same elevation, there is little vertical variation in smoke between them, and it’s understandable that their readings align well. The Balky Hill MVCAA monitor, however, sits on the side of the valley about 500 ft above the bottom. Although it is less than 2 km from downtown, the substantial change in geography and elevation may explain why the Balky Hill R-squared value is only 0.6384973 compared to the Town Hall value of 0.9801462.
With Twisp finished for now, let’s repeat the process with the MVCAA monitors near the AirNow unit in Winthrop:
We’ll download and take the hourly averages of the time series:
Find the Winthrop AirNow readings:
Generate the linear fit models from the MVCAA monitors to the AirNow unit:
Monitor | AirNow Distance (m) | Model | R-Squared |
---|---|---|---|
Winthrop Library | 79.15582 | y = 0.51x + 1.55 | 0.9859980 |
Lower Studhorse | 1498.46546 | y = 0.29x + 6.35 | 0.5771908 |
And finally apply the models to their respective monitor readings:
In the case of Winthrop, the monitor closest to the AirNow unit (<100 m away) once again has the best R-squared value, 0.9859980, while the Lower Studhorse monitor about 1.5 km away fits much less well at 0.5771908. What’s particularly exciting to see here, though, is that the scale factor of the Winthrop Library model, 0.5145, is just about the same as the 0.5 slope of the Twisp Town Hall model; the Lower Studhorse slope, 0.29, is the exception. A constant scale factor would be ideal for applying to PurpleAir monitors throughout the area.
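One hedged way to judge whether two scale factors are really "about the same" is to compare confidence intervals for the fitted slopes. The sketch below uses synthetic data only: a low-noise model with an assumed slope of 0.51 and a noisier model with an assumed slope of 0.30. All object names are hypothetical, and none of the numbers it produces are results from the Methow Valley monitors.

```r
# Two synthetic sensor/reference pairs with different assumed slopes and noise.
set.seed(123)
n <- 72
near <- data.frame(sensor = runif(n, min = 5, max = 120))
near$reference <- 1.5 + 0.51 * near$sensor + rnorm(n, sd = 2)
far <- data.frame(sensor = runif(n, min = 5, max = 120))
far$reference <- 6 + 0.30 * far$sensor + rnorm(n, sd = 8)

near_model <- lm(reference ~ sensor, data = near)
far_model <- lm(reference ~ sensor, data = far)

# 95% confidence intervals for each slope, stacked into one matrix.
slope_ci <- rbind(
  confint(near_model, "sensor", level = 0.95),
  confint(far_model, "sensor", level = 0.95)
)
rownames(slope_ci) <- c("near", "far")
print(slope_ci)

# Overlapping intervals are compatible with a shared scale factor;
# clearly separated intervals suggest the slopes genuinely differ.
intervals_overlap <- slope_ci["near", 1] <= slope_ci["far", 2] &&
  slope_ci["far", 1] <= slope_ci["near", 2]
print(intervals_overlap)
```

Interval overlap is only a rough screen, not a formal test of equal slopes, but it gives a quick sense of how much a single regional scale factor would be stretching the data.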