Emulation and uncertainty analysis

        Clive Anderson, Stefano Conti, Marc Kennedy, Tony O'Hagan

          Statistical emulators
          We are using Bayesian statistical methods to create emulators of the various computer models currently being developed within CTCD. Given training data comprising a set of points covering the space of plausible inputs, and the corresponding outputs at those points, the emulator is a probability distribution for the output function. Each emulator can be used for:

          Prediction
          For a set of untried inputs, the emulator can typically produce outputs much faster than the original code, and is therefore a cheap surrogate model. In the example shown below we emulate the SPA model. The root mean squared error of the 150-point emulator is 0.314, compared with an error of 0.726 using ACM, a response surface approximation to SPA built using 6561 points:
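As a separate toy illustration (not the SPA example, and not the GEM-SA implementation), the sketch below builds a very small Gaussian process emulator and measures its root mean squared error on held-out inputs. The `simulator` function, the 5 x 5 grid design, the fixed roughness values and the nugget are all assumptions chosen for illustration, and the prior mean is simply the average training output rather than a fitted regression.

```python
import numpy as np

def simulator(x):
    """Hypothetical stand-in for an expensive code with two inputs in [0, 1]."""
    return np.sin(3.0 * x[:, 0]) + 0.5 * x[:, 1]

def correlation(a, b, roughness):
    """Gaussian correlation exp(-sum_d roughness_d * (a_d - b_d)^2)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.exp(-np.sum(roughness * diff ** 2, axis=-1))

def fit(x_train, y_train, roughness, nugget=1e-8):
    """Condition a constant-mean Gaussian process on the training runs."""
    offset = y_train.mean()                      # assumed constant prior mean
    k = correlation(x_train, x_train, roughness)
    k += nugget * np.eye(len(x_train))           # small jitter for stability
    chol = np.linalg.cholesky(k)
    resid = y_train - offset
    weights = np.linalg.solve(chol.T, np.linalg.solve(chol, resid))
    sigma2 = resid @ weights / len(x_train)      # plug-in variance estimate
    return {"x": x_train, "chol": chol, "weights": weights,
            "offset": offset, "sigma2": sigma2, "roughness": roughness}

def predict(emulator, x_new):
    """Posterior mean and variance of the output at untried inputs."""
    k_star = correlation(x_new, emulator["x"], emulator["roughness"])
    mean = emulator["offset"] + k_star @ emulator["weights"]
    v = np.linalg.solve(emulator["chol"], k_star.T)
    var = emulator["sigma2"] * np.maximum(1.0 - np.sum(v ** 2, axis=0), 0.0)
    return mean, var

# Training design: a 5 x 5 grid (an assumption; other designs are possible).
axis = np.linspace(0.0, 1.0, 5)
x_train = np.array([(a, b) for a in axis for b in axis])
emulator = fit(x_train, simulator(x_train), roughness=np.array([5.0, 5.0]))

# Held-out inputs: compare the emulator mean with the (toy) code output.
rng = np.random.default_rng(0)
x_test = rng.uniform(size=(200, 2))
mean, var = predict(emulator, x_test)
rmse = np.sqrt(np.mean((mean - simulator(x_test)) ** 2))
print(f"RMSE on 200 held-out inputs: {rmse:.4f}")
```

Once `fit` has been run, each call to `predict` costs only a few matrix products, which is what makes the emulator usable as a surrogate when the original code is slow.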


          Sensitivity analysis
          As part of model validation, the effect that any individual input (or group of inputs) has on a code output can quickly be fed back to the model developer. This technique has helped identify flaws within the code. The figure below, for example, shows how sensitive the calculation of NEP was to each parameter in an early version of SDGVM:

          The effect of the leaf life span parameter seen here is questionable, and on further investigation was found to be due to an error in the phenology algorithm. A later version of SDGVM was used to create a series of 9 emulators with soil texture and bulk density as inputs. The remaining inputs were fixed to reflect conditions at 9 test sites. At some of the sites the Gaussian process model did not fit the model output data properly, owing to an error in the code. An example is shown below.

          Here the roughness parameter associated with bulk density was unusually large, resulting in large emulator variances. Closer examination of the code led to the discovery of a severe discontinuity in the output as a function of bulk density. This discovery was passed back to the modellers, who were able to identify and correct the problem. The figure below shows the main effects using the corrected code.
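A main effect of this kind can be read as the expected output with one input held at a given value and the other inputs averaged over their distributions, minus the overall expected output. The sketch below approximates main effects by plain Monte Carlo averaging over a cheap surrogate. The `surrogate` function is hypothetical (it stands in for an emulator's posterior mean and has nothing to do with SDGVM), and independent uniform inputs on [0, 1] are assumed.

```python
import numpy as np

def surrogate(x):
    """Hypothetical cheap surrogate standing in for an emulator's posterior mean."""
    return np.sin(3.0 * x[:, 0]) + 0.5 * x[:, 1] + 0.2 * x[:, 0] * x[:, 2]

def main_effect(predict, index, grid, n_inputs, n_samples=20000, seed=0):
    """Estimate E[f(X) | X_index = v] - E[f(X)] for each v in grid.

    All inputs are assumed independent and uniform on [0, 1].
    """
    rng = np.random.default_rng(seed)
    base = rng.uniform(size=(n_samples, n_inputs))
    overall = predict(base).mean()
    effect = np.empty(len(grid))
    for k, value in enumerate(grid):
        x = base.copy()
        x[:, index] = value          # hold one input fixed, average over the rest
        effect[k] = predict(x).mean() - overall
    return effect

grid = np.linspace(0.0, 1.0, 11)
effects = {i: main_effect(surrogate, i, grid, n_inputs=3) for i in range(3)}
for i, effect in effects.items():
    print(f"input {i}: main effect range {effect.max() - effect.min():.3f}")
```

Plotting each entry of `effects` against `grid` gives one curve per input. An input whose curve is unexpectedly steep, flat or jagged is a natural candidate for closer inspection of the code.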

          Uncertainty analysis
          If input parameters are uncertain, the effect of propagating this uncertainty through to the model output can be calculated analytically, directly from the emulator. This contrasts with a traditional Monte Carlo estimator, which requires sampling from the inputs' probability distributions and could require many thousands of additional code runs.
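One special case makes the analytic route easy to see: with a Gaussian correlation function and a normally distributed input, the expected value of the emulator's posterior mean has a closed form. The sketch below checks that closed form against Monte Carlo sampling of the same emulator mean. It is a one-input toy under stated assumptions: the training outputs come from a hypothetical function `sin(3x)`, the roughness is fixed rather than estimated, the prior mean is zero, and only the expected output is computed, not its full uncertainty distribution.

```python
import numpy as np

rough = 20.0                          # assumed (fixed) roughness parameter
x_train = np.linspace(0.0, 1.0, 8)    # 8 training runs
y_train = np.sin(3.0 * x_train)       # hypothetical code output

def corr(a, b):
    """Gaussian correlation between two sets of one-dimensional inputs."""
    return np.exp(-rough * (a[:, None] - b[None, :]) ** 2)

# Posterior mean of a zero-mean emulator: m(x) = sum_j weights[j] * corr(x, x_j).
weights = np.linalg.solve(corr(x_train, x_train) + 1e-8 * np.eye(8), y_train)

# Uncertain input X ~ N(mu, s^2). For each training point x_j,
#   E[exp(-rough * (X - x_j)^2)]
#     = exp(-rough * (mu - x_j)^2 / (1 + 2*rough*s^2)) / sqrt(1 + 2*rough*s^2).
mu, s = 0.5, 0.1
shrink = 1.0 + 2.0 * rough * s ** 2
integrals = np.exp(-rough * (mu - x_train) ** 2 / shrink) / np.sqrt(shrink)
analytic_mean = weights @ integrals   # no sampling needed

# Monte Carlo on the emulator mean, for comparison only.
rng = np.random.default_rng(0)
samples = rng.normal(mu, s, size=100_000)
mc_mean = (corr(samples, x_train) @ weights).mean()

print(f"analytic: {analytic_mean:.5f}   Monte Carlo: {mc_mean:.5f}")
```

Neither calculation calls the original code again. A Monte Carlo estimate on the code itself, by contrast, would need one code run per sample.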

          The percentage contribution of each input to the total uncertainty in the output can also be calculated. In this way, efforts to reduce uncertainty can be targeted on those inputs which contribute most. In the SDGVM picture shown above, for example, the contributions are: senescence (42%), bud burst (26%), soil sand % (22%), leaf life span (5%) and soil clay % (0%). The remaining 5% is due to interaction effects.
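One common way to define such percentages is as variance shares: the variance of each input's main effect divided by the total output variance, with whatever is left attributed to interactions. The sketch below computes them by midpoint-rule averaging on a grid. The `toy_model` function is hypothetical and unrelated to SDGVM, and independent uniform inputs on [0, 1] are assumed. Its third input has no influence, and its product term creates a small interaction.

```python
import numpy as np

def toy_model(x1, x2, x3):
    """Hypothetical surrogate: two active inputs, one inert input, one interaction."""
    return 4.0 * x1 + 2.0 * x2 + 3.0 * (x1 - 0.5) * (x2 - 0.5) + 0.0 * x3

def variance_shares(f, n_inputs, n_grid=50):
    """Percentage of output variance due to each input's main effect.

    Inputs are assumed independent and uniform on [0, 1]; expectations are
    approximated by averaging over a midpoint grid.
    """
    points = (np.arange(n_grid) + 0.5) / n_grid
    mesh = np.meshgrid(*([points] * n_inputs), indexing="ij")
    values = f(*mesh)
    total_variance = values.var()
    shares = []
    for i in range(n_inputs):
        other_axes = tuple(a for a in range(n_inputs) if a != i)
        conditional_mean = values.mean(axis=other_axes)   # E[f | x_i]
        shares.append(100.0 * conditional_mean.var() / total_variance)
    interaction = 100.0 - sum(shares)                     # remainder
    return shares, interaction

shares, interaction = variance_shares(toy_model, n_inputs=3)
for i, share in enumerate(shares, start=1):
    print(f"input {i}: {share:.1f}%")
print(f"interactions: {interaction:.1f}%")
```

For this toy function the exact shares can be worked out by hand from the variance 1/12 of a uniform input: 16/20.75, 4/20.75 and 0 for the three inputs, with 0.75/20.75 left over for the interaction.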

          GEM-SA software
          Software for MS-Windows, developed within CTCD, is freely available for building emulators and performing prediction, sensitivity analysis and uncertainty analysis as described above. GEM-SA (Gaussian Emulation Machine for Sensitivity Analysis) has a user-friendly interface and includes features to generate suitable training input points, edit/load/save uncertainty projects, and specify uniform or normal distributions for the unknown input parameters.


          Calibration
          When information is available from field observations, the statistical analysis can be extended to jointly model uncertainty in:

                  • code output
                  • discrepancy between code and reality
                  • code input parameters
                  • field measurement errors

          We are developing methods which update knowledge about the input parameters in the light of field data: calibration. Subsequent predictions from the model can take into account the remaining uncertainty in these "tuned parameters". At the same time we can learn about, and automatically correct for, the model inadequacy.
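The idea of updating knowledge about an input parameter while also learning about model inadequacy can be sketched on a grid. The example below is a heavily simplified stand-in for the methods under development, not a description of them: the `simulator` function, the synthetic field data, the priors and the known noise level are all assumptions, and the discrepancy between code and reality is reduced to a single constant bias rather than a function of the inputs.

```python
import numpy as np

def simulator(x, theta):
    """Hypothetical cheap code output; an emulator would normally stand in here."""
    return np.exp(-theta * x)

# Synthetic "field data": code at the true theta + constant bias + measurement noise.
rng = np.random.default_rng(1)
theta_true, bias_true, noise_sd = 1.3, 0.05, 0.01
x_field = np.linspace(0.1, 2.0, 30)
z_field = (simulator(x_field, theta_true) + bias_true
           + rng.normal(0.0, noise_sd, x_field.size))

# Assumed priors: theta uniform over the grid range, bias ~ N(0, 0.1^2).
theta_grid = np.linspace(0.5, 2.5, 201)
bias_grid = np.linspace(-0.2, 0.2, 201)
theta_mesh, bias_mesh = np.meshgrid(theta_grid, bias_grid, indexing="ij")

# Log posterior on the grid: Gaussian likelihood plus the log prior for the bias.
predicted = (simulator(x_field[None, None, :], theta_mesh[..., None])
             + bias_mesh[..., None])
log_post = -0.5 * np.sum(((z_field - predicted) / noise_sd) ** 2, axis=-1)
log_post += -0.5 * (bias_mesh / 0.1) ** 2

post = np.exp(log_post - log_post.max())
post /= post.sum()                       # normalise to probability mass on the grid

# Marginal posteriors for the calibrated parameter and for the bias.
theta_marginal = post.sum(axis=1)
bias_marginal = post.sum(axis=0)
theta_mean = np.sum(theta_grid * theta_marginal)
theta_sd = np.sqrt(np.sum((theta_grid - theta_mean) ** 2 * theta_marginal))
bias_mean = np.sum(bias_grid * bias_marginal)

print(f"theta: {theta_mean:.3f} +/- {theta_sd:.3f}   bias: {bias_mean:.3f}")
```

Later predictions can then average the simulator over `theta_marginal` instead of plugging in a single tuned value, so the remaining parameter uncertainty carries through to the prediction.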

