Increased public interest in low-cost air quality sensors creates a need for software that provides high-quality analysis and visualization of sensor data in a manner that is open, transparent and reproducible. The AirSensor R package addresses part of this need with functionality focused on data from PurpleAir sensors.
Low-cost air quality sensors are providing an increasingly dense network of high-resolution air quality measurements throughout North America. This is particularly true of laser particle counters measuring PM1.0, PM2.5 and PM10, such as those produced by PurpleAir. Much of the careful analysis of PurpleAir data by academic institutions and government agencies is performed with in-house or proprietary tools that are not freely available to members of the general public. Mazama Science has partnered with the Air Quality Sensor Performance Evaluation Center (AQ-SPEC) at California’s South Coast Air Quality Management District to create an open source R package that provides a full suite of data download and analysis capabilities for PurpleAir PM2.5 data.
The R statistical programming language and the RStudio IDE provide powerful tools for air quality researchers and analysts. The AirSensor R package is part of a suite of open source R packages focused on making high-quality analysis and visualization more accessible to anyone working with air quality data. Other R packages of note include openair, PWFSLSmoke and AirMonitorPlots. Taken together, these packages make it straightforward to thoroughly interrogate air quality data. In the example above, PurpleAir data is used to quantify the particulate levels seen in two images of the Bay Bridge before and during the 2018 Camp Fire smoke episode.
The AirSensor R package provides functions to access, regularize and QC data from over 9000 PurpleAir sensors with a simple, readable syntax. With AirSensor, analysts can access up-to-the-hour PurpleAir data as well as historical time series. Accessing PurpleAir synoptic and timeseries data can be achieved with the code below.
# Import the library and set the URL to gather the Synoptic Data list
library(AirSensor)
setArchiveBaseUrl("http://smoke.mazamascience.com/data/PurpleAir")
# Load the Synoptic PurpleAir Sensor data
pa_synoptic <- pas_load()
# Load the timeseries data from a selected PurpleAir Sensor
pa_timeseries <- pat_createNew(
  pas = pa_synoptic,
  label = 'POLK GULCH',
  startdate = 20181001,
  enddate = 20181201
)
The PurpleAir Synoptic data contains recent information on all available sensors. This data includes recent and averaged PM2.5, humidity, temperature, and other information. We can visualize the location of all available PurpleAir sensors using an interactive map.
pa_synoptic %>% pas_leaflet()
The AirSensor R package provides many intuitively named functions and arguments and harnesses R’s pipeline syntax so that analysts can write R code that is easy to understand. Several functions provide high level analysis and visualization in a single line of code. The code below demonstrates how straightforward it is to work with data from a PurpleAir sensor in downtown San Francisco. The plot clearly shows the decline in air quality from smoke that drifted in from Butte County’s catastrophic Camp Fire in 2018.
pa_timeseries %>% pat_multiplot()
There are also interactive options for plotting.
pa_timeseries %>% pat_dygraph()
Mazama Science offers many tools for working with air quality data on an hourly time axis. Other packages that Mazama Science supports can be found on our GitHub. The following demonstrates how AirSensor integrates with two of them, the PWFSLSmoke and AirMonitorPlots R packages.
# Convert to an 'AirSensor' object along a regularized hourly axis.
airsensor <- pa_timeseries %>% pat_createAirSensor()
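To make the idea of a "regularized hourly axis" concrete, here is a minimal base R sketch using synthetic data. It is not how `pat_createAirSensor()` is implemented, and it applies none of the package's QC; the data frame and column names (`raw`, `datetime`, `pm25`) are hypothetical.

```r
# Synthetic, irregularly timed readings (hypothetical values, not real sensor data)
set.seed(42)
start <- as.POSIXct("2018-11-08 00:00:00", tz = "UTC")
raw <- data.frame(
  datetime = start + sort(sample(0:(6 * 3600 - 1), 150)),
  pm25 = round(runif(150, 5, 60), 1)
)

# Drop every reading from hour 03 to simulate a gap in reporting
raw <- raw[format(raw$datetime, "%H") != "03", ]

# Truncate each timestamp to the top of its hour and average within the hour
raw$hour <- as.POSIXct(trunc(raw$datetime, units = "hours"))
hourly <- aggregate(pm25 ~ hour, data = raw, FUN = mean)

# Join onto a complete hourly axis so that missing hours appear as NA
hourly_axis <- data.frame(hour = seq(start, by = "hour", length.out = 6))
regular <- merge(hourly_axis, hourly, by = "hour", all.x = TRUE)
print(regular)
```

The result has exactly one row per hour, and an hour with no readings is kept as `NA` rather than silently dropped, which is what lets downstream tools assume evenly spaced data.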
We can use Mazama Science’s AirMonitorPlots package to expand the capabilities for plotting hourly axis data. Below is an example of a daily average calendar plot.
library(AirMonitorPlots)
monitor_ggCalendar(airsensor)
We can also look at daily averages for any PurpleAir hourly timeseries using the AirMonitorPlots package.
monitor_dailyBarplot(airsensor)
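For readers curious about what a daily average involves, the following base R sketch computes daily means from a synthetic hourly series. It is only an illustration and does not reproduce the AirMonitorPlots calculation; the names `hourly`, `day` and `daily` are hypothetical, and treating UTC midnight as the day boundary is an assumption made here for simplicity.

```r
# Synthetic hourly PM2.5 series covering three full days (hypothetical values)
set.seed(1)
hourly <- data.frame(
  datetime = seq(as.POSIXct("2018-11-08 00:00", tz = "UTC"),
                 by = "hour", length.out = 72),
  pm25 = round(runif(72, 10, 150), 1)
)

# Assign each hour to a calendar day in an explicit time zone, then average
hourly$day <- as.Date(hourly$datetime, tz = "UTC")
daily <- aggregate(pm25 ~ day, data = hourly, FUN = mean)
print(daily)
```

The choice of time zone matters in practice, because it decides which hours are grouped into each day.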
Because Mazama Science R tools share a standard air quality data format, we can expand these capabilities further in a consistent and powerful way.
PurpleAir sensors generate data using laser particle counters. With assumed values of scattering coefficient and particle density, particle counts are converted into mass concentration (µg/m3). Federal reference monitors measure mass concentration more directly and accurately. It is thus important to compare sensor data with federal reference data to validate and scale the sensor data. The AirSensor R package provides functions to compare a PurpleAir sensor with the nearest federal reference monitor. The comparison below shows a sensor performing quite well during the 2018 Camp Fire.
pa_timeseries %>% pat_monitorComparison()
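The idea of validating and scaling sensor data against a reference monitor can be sketched in base R with a simple linear fit. The data here is synthetic: the factor of 1.6 and the offset of 3 are invented for illustration and are not a published PurpleAir correction, and this is not the method used by `pat_monitorComparison()`.

```r
# Synthetic paired hourly values (hypothetical, for illustration only)
set.seed(7)
reference <- runif(200, 5, 150)                      # reference monitor PM2.5
sensor <- 1.6 * reference + 3 + rnorm(200, sd = 5)   # sensor that reads high

# Fit sensor readings against the reference monitor
fit <- lm(sensor ~ reference)
slope <- unname(coef(fit)["reference"])
intercept <- unname(coef(fit)["(Intercept)"])
r_squared <- summary(fit)$r.squared

# Invert the fit to scale sensor readings onto the reference
sensor_scaled <- (sensor - intercept) / slope
```

A high `r_squared` suggests that the sensor tracks the reference well even when its raw values are biased, and the fitted slope and intercept give one simple way to rescale it.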
An important aspect in the analysis of low-cost sensor data is determining whether or not the device itself is “healthy”. The AirSensor package provides functions that calculate “State-of-Health” metrics to help assess the day-to-day reliability of data from a PurpleAir sensor. Below is an example of a San Francisco PurpleAir sensor performing poorly and providing inaccurate results during the Camp Fire smoke intrusion in October-November, 2018.
# Load a bad PurpleAir Sensor
bad_pat <- pat_createNew(
  pas = pa_synoptic,
  label = 'South Beach',
  startdate = 20181001,
  enddate = 20181201
)
# Plot the bad data
bad_pat %>% pat_multiplot()
Visually, this sensor looks ‘bad’. We can further confirm this with an overlaid timeseries plot of both channels and a Pearson correlation, using AirSensor’s built-in tools.
bad_pat %>% pat_internalFit()
Indeed, our correlation coefficient is too low, which means the two sensor channels do not agree. We can conclude that this sensor is performing poorly and should not be used for analysis. We can investigate further using the PurpleAir timeseries State-of-Health functions in the AirSensor package.
bad_pat %>% pat_dailySoHPlot()
This clearly shows that while this sensor is reporting real values, they are likely erroneous due to a pegged DC signal on channel B and low channel-to-channel correlation.
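The two symptoms described here, a flat ("pegged") channel and poor agreement between channels, can be illustrated with a small base R sketch on synthetic data. The metric names (`b_flat`, `ab_cor`) are hypothetical and are not the State-of-Health metrics that AirSensor calculates; the sketch only shows the kind of per-day check involved.

```r
# Synthetic hourly readings for three days (hypothetical values, not real sensor data)
set.seed(11)
day <- rep(as.Date("2018-11-08") + 0:2, each = 24)
channel_a <- runif(72, 10, 150)
channel_b <- channel_a + rnorm(72, sd = 2)

# Simulate a fault: channel B is stuck at one value for the whole third day
channel_b[day == as.Date("2018-11-10")] <- 16.4

# Whole-period Pearson correlation between the two channels
overall_cor <- cor(channel_a, channel_b, method = "pearson")

# Per-day checks: is channel B flat, and do the two channels agree?
daily_soh <- do.call(rbind, lapply(split(seq_along(day), day), function(i) {
  flat <- diff(range(channel_b[i])) == 0
  data.frame(
    day = day[i][1],
    b_flat = flat,
    ab_cor = if (flat) NA_real_ else cor(channel_a[i], channel_b[i])
  )
}))
print(daily_soh)
```

Checking each day separately makes it possible to see when a fault began, whereas a single whole-period correlation only shows that something went wrong at some point.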
The AirSensor package provides tools for working with PurpleAir sensors and analyzing air quality data. Growing public interest in air quality and its effect on safety calls for easy access to this data, and we at Mazama Science believe air quality data should be intuitive and open across all disciplines. Mazama Science provides modern, open source approaches to data management, analysis and visualization. We collaborate with clients across disciplines to make large, high-value datasets more usable with intuitive, interactive tools.
To learn more about our mission, visit our website and GitHub.
Mazama Science