Data from Purple Air (PA) sensors is often strongly correlated with official measurements of PM2.5 from a federal regulatory monitor (FRM). Unfortunately, PA data has a strong positive bias when compared with co-located FRM data. A scaling factor must be applied to convert raw PA measurements into PM2.5 equivalents and hence AQI values and colors.
This page explores functionality for:
There is still a great deal to learn about the sources of variability in PA sensor data. Tools that make statistical analysis of sensor data quick and easy will help state and federal agencies understand how best to use this data.
Unless otherwise stated, all timeseries analyses use data from a period of high smoke impacts from the Camp Fire: Nov 11-18, 2018.
PA sensor metadata is available as a JSON file on the PA website. This data can be enhanced with important data from other sources. We find it useful to add several columns of metadata relating to FRM data made available through the PWFSLSmoke R package. Below, an interactive map of PA sensors is colored by proximity to a "PWFSL" monitor.
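As a rough illustration of this kind of metadata enhancement, the Python sketch below adds the ID of, and distance to, the nearest regulatory monitor to each sensor record. It is not the code behind this page: the record layout and the field names (`id`, `lat`, `lon`, `nearest_monitor_id`, `nearest_monitor_km`) are hypothetical, and great-circle distance via the haversine formula is an assumed definition of "proximity".

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def add_nearest_monitor(sensors, monitors):
    """Return copies of the sensor records with nearest-monitor columns added.

    Both inputs are lists of dicts with hypothetical keys "id", "lat", "lon".
    """
    enhanced = []
    for sensor in sensors:
        # Pair every monitor with its distance to this sensor, keep the closest.
        distance, monitor = min(
            (
                (haversine_km(sensor["lat"], sensor["lon"], m["lat"], m["lon"]), m)
                for m in monitors
            ),
            key=lambda pair: pair[0],
        )
        enhanced.append(
            {
                **sensor,
                "nearest_monitor_id": monitor["id"],
                "nearest_monitor_km": distance,
            }
        )
    return enhanced
```

A map could then color each sensor by `nearest_monitor_km`, for example by separating sensors within some chosen radius of a monitor from those farther away.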
Software resets and electrical glitches regularly cause spikes in PA data. Other fields have developed robust mathematical techniques for handling such outliers. Here we are using a "Hampel" filter from a seismology package to highlight outliers. A count of the number of outliers detected could be used as a "state-of-health" metric for sensor performance.
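For readers unfamiliar with the technique, here is a minimal Python sketch of a Hampel-style outlier test. It is not the seismology package's implementation: the function name, the window half-width, and the threshold are illustrative assumptions. Each point is compared with the median of its neighborhood and flagged when it deviates by more than a chosen multiple of the scaled median absolute deviation (MAD).

```python
from statistics import median


def hampel_outliers(values, window=5, n_sigmas=3.0):
    """Return the indices of points flagged as outliers.

    window   -- samples on each side of the point (assumed default)
    n_sigmas -- threshold in scaled-MAD units (assumed default)
    """
    k = 1.4826  # scales the MAD to approximate a Gaussian standard deviation
    flagged = []
    for i, x in enumerate(values):
        # The neighborhood is truncated at the ends of the series.
        lo = max(0, i - window)
        hi = min(len(values), i + window + 1)
        neighborhood = values[lo:hi]
        med = median(neighborhood)
        mad = median([abs(v - med) for v in neighborhood])
        if abs(x - med) > n_sigmas * k * mad:
            flagged.append(i)
    return flagged


# Synthetic PM2.5-like series with one spike at index 5.
pm = [10.0, 11.0, 10.5, 11.5, 12.0, 250.0, 12.5, 13.0, 12.0, 11.5, 11.0]
spikes = hampel_outliers(pm, window=3)  # flags index 5 only
outlier_count = len(spikes)  # candidate "state-of-health" metric
```

Note that when a neighborhood is perfectly flat the MAD is zero, so any deviation at all is flagged; a production filter would likely add a minimum threshold.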
Another opportunity for calculating "state-of-health" metrics is to look at the consistency of data reported by the two identical sensors in each PA device. The two channels sample at staggered times so minor deviations from the diagonal are to be expected. But any large deviation from R^2 = 1, slope = 1, intercept = 0 would imply that the sensor is unhealthy.
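The channel comparison can be sketched in Python as an ordinary least-squares fit of one channel against the other, followed by a check of the three statistics. The tolerance values below are hypothetical placeholders, not thresholds recommended by this page, and the function names are illustrative.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a simple linear fit, R^2 is the squared correlation coefficient.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


def channels_agree(channel_a, channel_b,
                   min_r2=0.98, slope_tol=0.1, intercept_tol=5.0):
    """True when the A/B fit is close to R^2 = 1, slope = 1, intercept = 0.

    All three tolerances are assumed values chosen for illustration.
    """
    slope, intercept, r2 = linear_fit(channel_a, channel_b)
    return (
        r2 >= min_r2
        and abs(slope - 1.0) <= slope_tol
        and abs(intercept) <= intercept_tol
    )
```

A device whose channels track each other closely passes the check, while one whose B channel stays nearly flat as the A channel rises yields a slope far from 1 and fails.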
It is wise to visually inspect data before attempting any sort of statistical analysis. The following plot displays raw data from six PA sensors associated with the AMTS_TESTING site (in purple) along with data from the FRM (black line) located 1 meter away. Background shading identifies periods of day and night.
With seemingly well-behaved data, we can proceed and perform a multi-variate linear fit to calculate the scale and offset needed to map PA data onto FRM data. Ideally, we would hope for a result with bias only: R^2 = 1, slope = X, intercept = 0. In practice, any result with R^2 > 0.9 seems reasonably good for PA sensors.
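To make the scale-and-offset idea concrete, the Python sketch below fits FRM values against PA values with a single predictor and then applies the result to new PA readings. Several things here are assumptions rather than the method used for this page: the direction of the fit (FRM as the response), the use of one predictor only, the clamping of negative estimates to zero, and the synthetic numbers, which are not measurements.

```python
def fit_calibration(pa, frm):
    """Least-squares fit of frm ~= offset + scale * pa.

    Returns (scale, offset, r_squared).
    """
    n = len(pa)
    mean_pa = sum(pa) / n
    mean_frm = sum(frm) / n
    var_pa = sum((p - mean_pa) ** 2 for p in pa)
    cov = sum((p - mean_pa) * (f - mean_frm) for p, f in zip(pa, frm))
    scale = cov / var_pa
    offset = mean_frm - scale * mean_pa
    # R^2 as 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((f - (offset + scale * p)) ** 2 for p, f in zip(pa, frm))
    ss_tot = sum((f - mean_frm) ** 2 for f in frm)
    return scale, offset, 1.0 - ss_res / ss_tot


def calibrate(pa, scale, offset):
    """Apply the fitted scale and offset, clamping negative estimates to zero."""
    return [max(0.0, offset + scale * p) for p in pa]


# Synthetic, bias-only example: the PA values are exactly twice the FRM values.
pa_raw = [20.0, 60.0, 100.0, 140.0, 200.0]
frm_ref = [10.0, 30.0, 50.0, 70.0, 100.0]
scale, offset, r2 = fit_calibration(pa_raw, frm_ref)
pm25_equivalent = calibrate([80.0], scale, offset)
```

In this idealized case the fit recovers a pure scaling factor with zero offset and R^2 = 1; real co-located data will scatter around the fitted line and give a lower R^2.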
The plot above implies that PA sensors demonstrate a high R^2 when fit to FRM data during one of the smokiest periods in California history. Wood smoke dominates the particulate makeup and the air is very dry -- ideal conditions for an inexpensive particle counter.
But how well do they perform in a low-smoke period? We can perform the same analysis with the same monitor for a period in mid October. In the plots below we see that, in a period of relatively good air quality, the fit is nowhere near as good -- R^2 = 0.74.
This implies the need for continuous recalibration.
If we keep the original time frame and move to a location outside of the Camp Fire impacts, we can test whether fitting depends on location. The plots below, from a location near San Luis Obispo, demonstrate that there are times and places where PA sensor data simply cannot be meaningfully converted into AQI values.
This implies the need for location-specific calibration.