Calibration and analysis of complex individual-based stochastic models: methodological development and application to explore the impact of HAART on HIV/AIDS in Africa
Complex individual-based stochastic models are commonly used in communicable and non-communicable disease epidemiology, public health and health economics. The utility of models for prediction and planning relies on how well they are calibrated to empirical data and how well they can be analysed to test the robustness of model predictions. However, the development of methods to calibrate and analyse complex models has greatly lagged behind their application. This can be attributed to the fact that most formal calibration methods (including distance- and likelihood-based measures) require that the models are run many times. This poses a considerable problem for complex models, as they may require many minutes or even hours for a single scenario. Methodological developments in Bayesian Emulation and Approximate Bayesian Computation (ABC) have addressed these issues to some extent for deterministic models. The objective of this project is to develop novel methods, based on both Bayesian Emulation and ABC, for the calibration and analysis of complex stochastic models.
The primary phenomenon we will model and use for this analysis is the trend in HIV/AIDS over time in rural South West Uganda, before and after the introduction of HAART (Fig. 1). In this population, HIV prevalence rose rapidly and peaked in the early 1990s, before falling and stabilising between 1997 and 2004, and finally rising after the introduction of HAART. The fall in prevalence occurred after reductions in reported risk behaviour in this cohort.
Using our calibrated model, we will predict the future impact of HAART on HIV prevalence, incidence and mortality. We will explore a range of HAART strategies, including achieving the current WHO recommendations and strategies of earlier treatment (e.g. ‘Test and Treat’), incorporating the latest data on the impact of HAART on behaviour, mortality, transmission and resistance.
Bayesian Emulation based on Gaussian processes has been developed over the past 20 years for the calibration and analysis of complex deterministic models. This approach represents a computer model, however complex, as a function that maps the vector of input parameters to the vector of outputs. An emulator is a stochastic function that represents beliefs about the model for parameter sets that have yet to be evaluated. Although it is still necessary to make a few evaluations of the complex model to train the emulator, the technique has been shown to reduce computational time for calibration and sensitivity analysis by orders of magnitude compared with standard methods, such as Markov chain Monte Carlo.
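The idea of an emulator can be illustrated with a minimal sketch. The code below is not the project's implementation: it assumes a one-dimensional input, a zero-mean Gaussian process with a squared-exponential covariance, fixed (untuned) hyperparameters, and a cheap toy function `toy_model` standing in for an expensive deterministic simulator. All names are hypothetical. The emulator returns both a predicted mean and a variance, the latter expressing uncertainty about the model at inputs that have not been run.

```python
import math

def rbf(a, b, variance=1.0, length=0.5):
    """Squared-exponential covariance between two scalar inputs (assumed kernel)."""
    return variance * math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def train_emulator(xs, ys, nugget=1e-10):
    """Condition a zero-mean GP on the training runs; return a predict function."""
    K = [[rbf(a, b) + (nugget if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    weights = solve(K, ys)  # K^{-1} y, computed once

    def predict(x_new):
        k_star = [rbf(x_new, a) for a in xs]
        mean = sum(k * w for k, w in zip(k_star, weights))
        v = solve(K, k_star)
        var = rbf(x_new, x_new) - sum(k * vi for k, vi in zip(k_star, v))
        return mean, max(var, 0.0)  # clamp tiny negative round-off

    return predict

def toy_model(x):
    """Hypothetical stand-in for an expensive deterministic simulator."""
    return math.sin(3.0 * x) + 0.5 * x

# A handful of "expensive" runs are used to train the emulator.
train_x = [2.0 * i / 7 for i in range(8)]
train_y = [toy_model(x) for x in train_x]
emulate = train_emulator(train_x, train_y)

mean_at_1, var_at_1 = emulate(1.0)  # prediction at an input that was never run
```

Once trained, each call to `emulate` costs a small linear solve rather than a full model run, which is where the saving in computer time comes from. A stochastic simulator would additionally need a term for run-to-run variability, which this sketch does not attempt.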
Emulators can also be combined with the calculation of an ‘implausibility’ measure to rapidly exclude areas of parameter space in which fits are unlikely to be found. A high value of implausibility implies that, even considering all the existing uncertainties, a good match between the model output and observed data is unlikely to be found for a given set of input parameters.
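One form of the implausibility measure that is common in the history-matching literature divides the distance between the observation and the emulator's expected output by the combined standard deviation of all acknowledged uncertainties. The sketch below assumes that form; the function name, the three variance terms, the cutoff of 3 and all numerical values are illustrative assumptions, not quantities taken from this project.

```python
import math

def implausibility(observed, emulator_mean, emulator_var, obs_var, discrepancy_var):
    """Standardised distance between the observation and the emulated output.

    The denominator pools emulator uncertainty, observation error and
    model discrepancy (all supplied as variances).
    """
    total_var = emulator_var + obs_var + discrepancy_var
    return abs(observed - emulator_mean) / math.sqrt(total_var)

# Hypothetical example: observed HIV prevalence of 8%, and the emulator's
# prediction at two candidate parameter sets.
CUTOFF = 3.0  # assumed threshold; a conventional but not universal choice
observed_prevalence = 0.08

far = implausibility(observed_prevalence, 0.20, 0.0004, 0.0001, 0.0004)
near = implausibility(observed_prevalence, 0.10, 0.0004, 0.0001, 0.0004)

rule_out_far = far > CUTOFF    # this candidate would be excluded
rule_out_near = near > CUTOFF  # this candidate would be kept for further study
```

Because the emulator is cheap, this check can be applied to very large numbers of candidate parameter sets, so that the expensive model is only run in regions not yet ruled out.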
To date, emulation has been applied almost exclusively to the calibration and analysis of deterministic models. Significant methodological developments are required for the analysis and calibration of stochastic models. Our initial exploratory work using this methodology has been encouraging. In the example shown above, the emulator was trained using 1000 parameter sets and showed moderately good agreement with the complex model output for the HIV prevalence trend. Computer time was reduced from over 1 hour on a computer cluster to just minutes on a desktop.
Recent developments in Approximate Bayesian Computation methods mean they can be immediately applied to the calibration and analysis of individual-based stochastic models. ABC algorithms seek to approximate the posterior distribution for the parameters by replacing the direct evaluation of the likelihood with model simulations, comparing observed and simulated data. In its simplest form, a candidate parameter vector is sampled from its prior distribution and a simulated dataset is generated from the model. If the simulated data match the observed data exactly, then by definition the candidate parameter vector must be a draw from the posterior. In practice, the requirement to match the simulated to the observed data exactly is relaxed in order to produce manageable acceptance rates: the candidate parameter vector is accepted if the simulated data match the observed data sufficiently closely. ‘Closeness’ is usually defined by requiring that summary statistics, calculated for both the observed and simulated data, lie within a specified tolerance of each other.
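The simplest ABC scheme described above (rejection sampling) can be sketched as follows. This is a toy illustration under stated assumptions, not the calibration of the project's model: the simulator is a hypothetical one-parameter model in which each of `n_people` individuals is independently HIV positive with probability `p_infect`, the prior is Uniform(0, 1), the summary statistic is prevalence, and the observed value and tolerance are invented for the example.

```python
import random

def simulate_prevalence(p_infect, n_people, rng):
    """Toy stochastic simulator: fraction of individuals who are positive."""
    positives = sum(rng.random() < p_infect for _ in range(n_people))
    return positives / n_people

def abc_rejection(observed, n_people, tolerance, n_draws, seed=0):
    """Return prior draws whose simulated summary lies within tolerance of the data."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        candidate = rng.random()  # sample the parameter from its Uniform(0, 1) prior
        simulated = simulate_prevalence(candidate, n_people, rng)
        if abs(simulated - observed) <= tolerance:  # summary statistics close enough
            accepted.append(candidate)
    return accepted

# Hypothetical data: 15% observed prevalence in a survey of 200 people.
posterior_sample = abc_rejection(observed=0.15, n_people=200,
                                 tolerance=0.02, n_draws=5000)
posterior_mean = sum(posterior_sample) / len(posterior_sample)
```

The accepted values approximate draws from the posterior. Tightening `tolerance` improves the approximation but lowers the acceptance rate, so more model runs are needed, which is the practical difficulty when each run takes minutes or hours.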
ABC methods are successful in providing accurate probabilistic statements about goodness of fit. They are, however, expected to struggle with models that have many parameters and consequently span large areas of parameter space. Emulation-based methods, on the other hand, have proven efficient at reducing the size of parameter spaces in models with many inputs and at establishing which parameters are the most influential. This project will attempt to develop hybrid calibration strategies that combine the strengths of both calibration methods.
The model we will use for developing the calibration methods and predicting the impact of HAART is ‘Mukwano’. This is a dynamic, event-driven, individual-based stochastic model, which has been developed over the past five years at LSHTM and MRC/UVRI and has been designed so that it can be easily used at different levels of complexity. Mukwano simulates births and deaths, partnership formation and dissolution, and heterosexual STI/HIV transmission. Its 50+ parameters include HAART coverage, duration of HIV and HIV infectiousness on HAART, HAART failure rate and others.
A key data source is the MRC/UVRI ‘General Population Cohort’ (GPC) of all residents of 25 villages in rural South West Uganda (n ≈ 18,000; annual survey from 1989 onwards to collect demographic, behavioural and HIV sero-data). Other data sources include the ‘Rural Clinical Cohort’ (RCC) of all consenting HIV positives from the GPC and matched HIV-negative controls (n = 548; 1989 onwards) and the ‘Entebbe’ open cohort of HIV positives from a nearby peri-urban setting (n ≈ 1,000; 1995 onwards). Key data for this project include trends in birth, death and migration rates, age at sexual debut, numbers of sexual partners per year, HIV prevalence, HIV incidence, AIDS incidence and mortality rates. Recent data on the sexual behavioural response to HAART in treated and untreated individuals have also become available from the GPC.