Defining protocols for reliable air pollution monitoring.
High levels of air pollution are harmful to health and thus need to be monitored. However, measurement devices are generally expensive and are therefore sparsely distributed. Meanwhile, research has shown that air pollution can vary widely over small spatial scales, so personal exposure cannot be determined at the current station density in cities. In some parts of the world, monitoring is lacking altogether. The hope is that such gaps can be filled by (electrochemical) low-cost gas sensor systems and networks (Fig. 1).
Nonetheless, the low price comes at the expense of lower reliability. Cross-sensitivities (i.e., a lack of selectivity), interferences with environmental factors such as temperature and humidity, high unit-to-unit variability, sensor aging, and concept drift are common deficiencies. Typically, these systems are calibrated using field data and machine learning algorithms (e.g., random forests or neural networks) to overcome some of these shortcomings.
Still, operating a larger sensor network in a city is very challenging because of the amount and frequency of maintenance (i.e., recalibration) it requires. Algorithmic solutions such as blind calibration exist, but they only work if certain assumptions are met, which does not appear to be the case for air pollution monitoring.
The aim of this project was to address the remaining failure modes and to design protocols that increase or track reliability. It also raised the question of the extent to which such algorithms can be trusted at all. I worked on this as a research scientist at METAS, and the resulting peer-reviewed publications eventually became the basis for my dissertation.
First, the relocation problem of field-calibrated systems was revisited and traced back to a lack of representativeness of the calibration data, which is typically followed by concept drift. In other words, data quality decreases when the devices are moved to a different location, but it can also decrease over time even if they remain at the same location. This work replicated field calibrations and examined the resulting models and the underlying data with data science techniques. The main finding was that the degradation is caused by the lack of representativeness of the calibration data combined with strong correlations between the measured variables: if the relationships between the variables change, the calibration model becomes invalid. It was concluded that this problem could be solved by using orthogonal experimental designs.
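The mechanism can be illustrated with a small simulation. This is a minimal sketch, not the procedure used in the publications; the sensor model (a target gas c with a 50 % cross-sensitivity to an interferent x), the noise levels, and the two sites are hypothetical. At site A the interferent tracks the target almost perfectly, so a linear calibration fitted there can use x as a proxy for c. When the relationship between c and x is reversed at site B, that model breaks down, whereas the same model calibrated on a 2x2 full factorial design, in which c and x are varied independently, transfers to site B.

```python
import numpy as np

rng = np.random.default_rng(0)


def sensor(c, x, rng):
    """Hypothetical raw signal: target gas c plus 50 % cross-sensitivity to x, plus noise."""
    return c + 0.5 * x + rng.normal(0.0, 0.2, size=np.shape(c))


def fit(c, x, s):
    """Ordinary least squares for the calibration model c_hat = b0 + b1 * s + b2 * x."""
    A = np.column_stack([np.ones_like(s), s, x])
    coef, *_ = np.linalg.lstsq(A, c, rcond=None)
    return coef


def rmse(coef, c, x, s):
    """Root-mean-square error of the calibrated concentration."""
    pred = coef[0] + coef[1] * s + coef[2] * x
    return float(np.sqrt(np.mean((pred - c) ** 2)))


# Field calibration at site A: the interferent tracks the target almost perfectly
c_a = rng.uniform(0.0, 10.0, 50)
x_a = c_a + rng.normal(0.0, 0.1, 50)
coef_field = fit(c_a, x_a, sensor(c_a, x_a, rng))

# Orthogonal design: 2x2 full factorial in (c, x), each run replicated 12 times
levels = np.array([[0.0, 0.0], [0.0, 10.0], [10.0, 0.0], [10.0, 10.0]])
design = np.tile(levels, (12, 1))
c_o, x_o = design[:, 0], design[:, 1]
coef_orth = fit(c_o, x_o, sensor(c_o, x_o, rng))

# Relocation to site B, where the interferent is anti-correlated with the target
c_b = rng.uniform(0.0, 10.0, 1000)
x_b = 10.0 - c_b
s_b = sensor(c_b, x_b, rng)

rmse_field = rmse(coef_field, c_b, x_b, s_b)
rmse_orth = rmse(coef_orth, c_b, x_b, s_b)
print(f"coefficients (field):      {np.round(coef_field, 2)}")
print(f"coefficients (orthogonal): {np.round(coef_orth, 2)}")
print(f"RMSE at site B: field {rmse_field:.2f}, orthogonal {rmse_orth:.2f}")
```

Under these assumptions, the field-calibrated model leans heavily on x as a proxy for the target rather than on the sensor signal. This is harmless at site A but fails as soon as the relationship between the variables changes.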
Building on these findings, a compact continuous-flow automaton was developed that characterizes cross-sensitivities and interferences with environmental factors and resolves spatial and temporal relocation problems. It efficiently generates orthogonal atmospheres, i.e., gas mixtures at different relative humidities and temperatures, using fractional factorial designs, so that an array of low-cost sensor systems can be calibrated simultaneously in the laboratory. Such devices can then serve as mobile references (e.g., on top of buses or trams) to recalibrate other low-cost devices.
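A two-level fractional factorial design can be generated in a few lines. The sketch below assumes five hypothetical factors (three gases, relative humidity, and temperature) and a 2^(5-1) half fraction with generator E = ABCD; the factor names and set-point ranges are illustrative and are not those of the automaton. Every pair of factor columns is orthogonal, so all main effects can be estimated independently from 16 instead of 32 runs.

```python
import itertools

import numpy as np

# Hypothetical factor set and set-point ranges (units: ppb, ppb, ppb, %RH, degC)
FACTORS = ["NO2", "O3", "CO", "RH", "T"]
LOW = np.array([0.0, 0.0, 0.0, 30.0, 15.0])
HIGH = np.array([100.0, 100.0, 1000.0, 70.0, 30.0])


def half_fraction_2_5():
    """2^(5-1) fractional factorial in coded units (-1/+1), generator E = ABCD."""
    # Full 2^4 factorial for the first four factors
    base = np.array(list(itertools.product([-1, 1], repeat=4)))
    # Fifth factor aliased with the four-factor interaction (I = ABCDE, resolution V)
    e = base.prod(axis=1, keepdims=True)
    return np.hstack([base, e])


def to_set_points(coded, low=LOW, high=HIGH):
    """Map coded levels -1/+1 linearly onto physical low/high set points."""
    return low + (coded + 1) / 2 * (high - low)


design = half_fraction_2_5()
set_points = to_set_points(design)

# Orthogonality check: X^T X is diagonal (16 on the diagonal) for the coded columns
print(design.T @ design)
for row in set_points[:4]:
    print(dict(zip(FACTORS, row)))
```

Resolution V means that main effects and two-factor interactions are not aliased with each other, which is one reason such half fractions are a common compromise between run count and identifiability.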
For field-calibrated systems, which are heavily affected by concept drift, machine learning algorithms that monitor the trustworthiness of incoming measurements were proposed and discussed. This would allow, for instance, dynamic maintenance of the sensor systems. Anomalies are detected by estimating the support of the sensor signal distribution and by assessing where new signals lie with respect to this support. Moreover, it was demonstrated how such algorithms might be evaluated with strategies from software testing, i.e., evaluation in “virtual environments” through numerical simulations based on finite difference equations that represent physical phenomena.
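One simple way to estimate the support of a signal distribution is a k-nearest-neighbour distance: a new signal is treated as untrustworthy if it lies farther from the calibration data than almost all calibration samples lie from their own neighbours. The sketch below uses this technique as an assumed stand-in; it is not necessarily the estimator used in the publications, and the class name, the parameters `k` and `quantile`, and the synthetic two-dimensional signals are hypothetical.

```python
import numpy as np


def _pairwise(a, b):
    """Euclidean distances between every row of a and every row of b."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)


class KnnSupportDetector:
    """Hypothetical detector: a signal is trusted if its distance to the k-th
    nearest calibration sample stays within a threshold learned from the data."""

    def __init__(self, k=5, quantile=0.99):
        self.k = k
        self.quantile = quantile

    def fit(self, X):
        self.X_ = np.asarray(X, dtype=float)
        d = np.sort(_pairwise(self.X_, self.X_), axis=1)
        # Column 0 holds each sample's zero distance to itself, so column k
        # is the distance to its k-th nearest *other* sample
        self.threshold_ = np.quantile(d[:, self.k], self.quantile)
        return self

    def is_anomalous(self, X_new):
        X_new = np.atleast_2d(np.asarray(X_new, dtype=float))
        d = np.sort(_pairwise(X_new, self.X_), axis=1)
        # New points are not in the training set, so column k - 1 is the k-th neighbour
        return d[:, self.k - 1] > self.threshold_


rng = np.random.default_rng(1)
# Hypothetical 2-D raw signals (e.g. two electrode voltages) recorded during calibration
X_cal = rng.normal(size=(500, 2))
detector = KnnSupportDetector(k=5).fit(X_cal)
flags = detector.is_anomalous([[0.0, 0.0], [8.0, 8.0]])
print(flags)
```

Lowering `quantile` makes the detector stricter, at the cost of more false alarms; flagged signals could then trigger maintenance instead of a fixed schedule.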
Finally, a theoretical concept for the stochastic online recalibration of gas sensor networks by means of mobile reference instruments was presented. In essence, measured values are compared whenever a mobile reference encounters a sensor system, and the calibration models are adjusted by stochastic gradient updates. Recently developed update rules such as RMSProp (with and without momentum) were explored, and the algorithms and their design parameters were evaluated with Monte Carlo simulations. The analysis suggested that measurement reliability could be maintained in this manner, since sensor aging and concept drift are continuously compensated for.
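The encounter-based update can be sketched as follows. Everything here is an assumption for illustration: a single sensor whose gain falls from 1.0 to 0.7 and whose offset rises from 0 to 2 over a deployment of T encounters, a linear calibration model with a centred input, and reference values drawn uniformly from 0 to 10. At each encounter, the squared error against the mobile reference is reduced with an RMSProp step (optionally with momentum applied to the scaled gradient, as in common deep learning libraries). A small Monte Carlo loop over random seeds then compares the final error with that of a sensor that is never recalibrated.

```python
import numpy as np

S_CENTER = 5.0  # hypothetical centring constant for the raw signal


def drifting_sensor(c, t, T, rng, noise=0.1):
    """Hypothetical aging sensor: gain 1.0 -> 0.7 and offset 0 -> 2 over T encounters."""
    gain = 1.0 - 0.3 * t / T
    offset = 2.0 * t / T
    return gain * c + offset + rng.normal(0.0, noise, size=np.shape(c))


def predict(w, s):
    # Linear calibration model; centring the signal keeps both weights well conditioned
    return w[0] + w[1] * (s - S_CENTER)


def run(seed, T=2000, lr=0.01, rho=0.9, momentum=0.0, eps=1e-8, recalibrate=True):
    """Simulate one deployment and return the RMSE of the final calibration model."""
    rng = np.random.default_rng(seed)
    w = np.array([S_CENTER, 1.0])  # initial calibration: c_hat = s
    v = np.zeros(2)                # running mean of squared gradients
    m = np.zeros(2)                # momentum buffer (a plain RMSProp step when momentum=0)
    for t in range(T if recalibrate else 0):
        c_ref = rng.uniform(0.0, 10.0)              # value reported by the mobile reference
        s = drifting_sensor(c_ref, t, T, rng)       # raw signal of the stationary sensor
        err = predict(w, s) - c_ref
        grad = err * np.array([1.0, s - S_CENTER])  # gradient of 0.5 * err**2
        v = rho * v + (1.0 - rho) * grad**2
        m = momentum * m + lr * grad / (np.sqrt(v) + eps)
        w = w - m
    # Evaluate on fresh samples at the end of the deployment
    c_test = rng.uniform(0.0, 10.0, size=5000)
    s_test = drifting_sensor(c_test, T, T, rng)
    return float(np.sqrt(np.mean((predict(w, s_test) - c_test) ** 2)))


def monte_carlo(n_runs=10, **kwargs):
    """Average final RMSE over independent random seeds."""
    return float(np.mean([run(seed, **kwargs) for seed in range(n_runs)]))


baseline = monte_carlo(recalibrate=False)
rmsprop = monte_carlo(lr=0.01)
rmsprop_momentum = monte_carlo(lr=0.001, momentum=0.9)
print(f"no recalibration: {baseline:.2f}")
print(f"RMSProp: {rmsprop:.2f}")
print(f"RMSProp with momentum: {rmsprop_momentum:.2f}")
```

The learning rate is lowered for the momentum variant because momentum of 0.9 roughly multiplies the effective step size by ten; in a full study, such design parameters would be exactly what the Monte Carlo runs are used to tune.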