Looking at this PurpleAir time series for PM 2.5, a handful of points clearly stand out as outliers. It seems highly unlikely that the PM 2.5 in the area naturally spiked for just one or two minutes before settling back to its previous trend, so is there something about the sensor itself that is causing this pattern? We can’t tell much from the PM 2.5 time series alone, since these outliers do not occur at regular time intervals, but the cause may have to do with other measurements like temperature, humidity, uptime, PM 1.0, or PM 10.0. Can we find similar outliers among them too?
Let’s plot them all against each other:
That’s a lot to look at! Since our PM 2.5 outliers occur at seemingly random points in time, do any of the other measurements also do something strange at those same instants? Scanning the “Time” column on the far left, temperature and humidity don’t appear to do anything notable when PM 2.5 shows an outlier. Uptime, however, is interesting, since it often resets to zero at uneven intervals.
Maybe PM 2.5 measurements and uptime are somehow related? Let’s remove the other measurements and analyze these further:
Again, looking down the “Time” column on the far left, you can see that uptime resets might actually align with the PM 2.5 outliers. We can cross-check this theory using the right-middle “uptime vs. PM 2.5” plot, where many isolated PM 2.5 values are scattered vertically along the line uptime = 0. Something must be happening to the PM readings when the sensor resets.
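The same check can be done numerically rather than by eye. Here is a minimal sketch, assuming the readings are available as two equal-length lists ordered by time. The names `uptime`, `pm25`, and `find_reset_indices` are hypothetical, and the values are made up for illustration, not taken from the sensor. A reset is treated as any sample whose uptime is lower than the previous one.

```python
from statistics import median

def find_reset_indices(uptime):
    """Return the indices where uptime drops relative to the previous sample."""
    return [i for i in range(1, len(uptime)) if uptime[i] < uptime[i - 1]]

# Illustrative values only (not real PurpleAir data).
uptime = [100, 160, 220, 0, 60, 120, 180, 0, 60]
pm25 = [8.1, 8.3, 8.0, 47.5, 8.2, 8.4, 8.1, 52.0, 8.3]

resets = find_reset_indices(uptime)

# Split PM 2.5 readings into those taken at a reset and all the others.
at_reset = [pm25[i] for i in resets]
elsewhere = [v for i, v in enumerate(pm25) if i not in resets]

print("reset indices:", resets)
print("PM 2.5 at resets:", at_reset)
print("median PM 2.5 elsewhere:", median(elsewhere))
```

If the readings at resets sit far from the median of everything else, that supports the idea that the resets and the outliers go together.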
Let’s finally plot these two measurements on the same axis and see if the reset and outlier times match:
Wow, it sure seems like it! The reset accounts for pretty much every clear outlier we can see in this interval. Let’s try it over a longer time span just to check:
Once again we find that most of the obvious outliers occur during sensor resets. This is quite helpful to know, not only for raising awareness of the issue but also for marking outliers for quality control while the phenomenon is still present. Now that we have found the underlying cause, it may be unnecessary to run outlier detection algorithms to catch these odd measurements, since such algorithms might end up marking natural dips and spikes as sensor errors.
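For quality control, one way to mark these readings is to flag every sample at or near an uptime reset instead of running a general outlier detector. The sketch below is only an illustration under stated assumptions: `flag_near_resets` and its `window` parameter are hypothetical names, the values are made up, and whether the bad reading lands exactly on the reset sample or a sample to either side is something to check against the real data.

```python
def flag_near_resets(uptime, window=0):
    """Flag samples within `window` positions of an uptime reset.

    A reset is any sample whose uptime is lower than the previous one.
    `window` is an assumed tuning knob: 0 flags only the reset sample itself.
    """
    n = len(uptime)
    flags = [False] * n
    for i in range(1, n):
        if uptime[i] < uptime[i - 1]:
            start = max(0, i - window)
            stop = min(n, i + window + 1)
            for j in range(start, stop):
                flags[j] = True
    return flags

# Illustrative values only (not real PurpleAir data).
uptime = [100, 160, 220, 0, 60, 120, 180, 0, 60]
pm25 = [8.1, 8.3, 8.0, 47.5, 8.2, 8.4, 8.1, 52.0, 8.3]

flags = flag_near_resets(uptime, window=0)

# Keep only the readings that were not flagged.
clean_pm25 = [value for value, flagged in zip(pm25, flags) if not flagged]
print(clean_pm25)
```

Because the flag depends only on uptime, a genuine spike in PM 2.5 that happens while the sensor is running normally is left alone.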