
Estimates of the sensitivity and specificity of a new diagnostic tool will be inaccurate when the reference standard used in an evaluation study is not perfect. This is true for many, if not most, diagnostics for NTDs.
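To make the size of this bias concrete, here is a minimal sketch of what an evaluation study would measure when the reference standard itself misclassifies patients. It assumes the two tests are conditionally independent given the true disease status. The function name and all numerical values are hypothetical and chosen only for illustration.

```python
def apparent_accuracy(prev, se_new, sp_new, se_ref, sp_ref):
    """Sensitivity and specificity of a new test as they would appear
    when measured against an imperfect reference standard.

    Assumption (not from the article): the new test and the reference
    are conditionally independent given the true disease status.
    """
    # Probability that the reference is positive, and that both are positive
    ref_pos = prev * se_ref + (1 - prev) * (1 - sp_ref)
    both_pos = prev * se_ref * se_new + (1 - prev) * (1 - sp_ref) * (1 - sp_new)

    # Probability that the reference is negative, and that both are negative
    ref_neg = 1 - ref_pos
    both_neg = prev * (1 - se_ref) * (1 - se_new) + (1 - prev) * sp_ref * sp_new

    return both_pos / ref_pos, both_neg / ref_neg


# Hypothetical scenario: a new test with true sensitivity and specificity
# of 0.95, compared against a reference with 60% sensitivity and perfect
# specificity, in a population with 30% prevalence.
app_se, app_sp = apparent_accuracy(prev=0.30, se_new=0.95, sp_new=0.95,
                                   se_ref=0.60, sp_ref=1.00)
print(f"apparent sensitivity: {app_se:.3f}")
print(f"apparent specificity: {app_sp:.3f}")
```

In this scenario the diseased patients missed by the reference are counted as non-diseased, so correct positive results of the new test are scored as false positives and its apparent specificity falls below the true value.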

Bayesian latent class models (LCMs) are a useful statistical framework and are increasingly used to evaluate diagnostic tests when the available gold standard is imperfect. For an introduction, refer to the FDA’s Guidance for the Use of Bayesian Statistics in Medical Device Clinical Trials, available via

Asst. Prof. Direk Limmathurotsakul, Cherry Lim and colleagues from the Mahidol-Oxford Tropical Medicine Research Unit (MORU) at Mahidol University, Bangkok (Thailand) have developed an interesting open-access application that allows non-experts to apply Bayesian LCMs to their datasets of diagnostics evaluation studies, returning results within minutes.

The MICE (Modelling for Infectious disease CEnter) online application provides two Bayesian LCMs: a two-tests in two-populations model and a three-tests in one-population model. In the simplified version, the statistical settings are fixed at their defaults. The advanced version is more flexible: the correlation among diagnostic test results and the prior distributions for all unknown parameters can be customised.
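For readers who want to see the structure behind such a model, the two-tests in two-populations model is commonly written as follows when the tests are assumed to be conditionally independent given the true disease status. The notation is introduced here for illustration and is not taken from the MICE documentation: \pi_k is the prevalence in population k, Se_j and Sp_j are the sensitivity and specificity of test j, t_j is 1 for a positive and 0 for a negative result, and n_{k,t_1 t_2} is the number of patients in population k with that result pattern. The exact parameterisation and priors used by MICE may differ.

```latex
\begin{aligned}
p_k(t_1, t_2) &= \pi_k \prod_{j=1}^{2} Se_j^{\,t_j}\,(1 - Se_j)^{1 - t_j}
               + (1 - \pi_k) \prod_{j=1}^{2} (1 - Sp_j)^{t_j}\, Sp_j^{\,1 - t_j},
               \qquad k = 1, 2, \\
(n_{k,11},\, n_{k,10},\, n_{k,01},\, n_{k,00}) &\sim
  \mathrm{Multinomial}\bigl(N_k;\; p_k(1,1),\, p_k(1,0),\, p_k(0,1),\, p_k(0,0)\bigr).
\end{aligned}
```

The three-tests in one-population model has the same form, with the product running over three tests and a single prevalence. A Bayesian analysis places prior distributions on the prevalences, sensitivities and specificities and reports their posterior distributions.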

The MICE application is free and available online here:

The architecture of MICE: The data is entered via a web page, processed at a central web server and the results are provided on an output web page (Lim et al. (2013))


Example screens of the data input (A) and parameter (B) web pages (Lim et al. (2013))


We have asked the developers, Asst. Prof. Direk Limmathurotsakul and Cherry Lim, a few questions regarding the practical application, and here are their answers:

How would new users go about using MICE?

New users who have developed a diagnostic test and would like to explore an “imperfect gold standard model”, instead of the classical approach of comparing the new test against a gold standard test, should first read the short introductions under the ‘General description’ and ‘Model description’ tabs on the website:

It is advisable to try the model’s simplified version, with pre-set parameters, before moving on to the advanced version, and to check the fit of the model before interpreting the results. Instructions on how to interpret the results can be found on the output page. Note also that the interpretation of a 95% credible interval in Bayesian statistics differs from that of the 95% confidence interval used in the classical (frequentist) approach. If possible, results should be double-checked with a statistician.
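As a simplified illustration of what a 95% credible interval is, the sketch below computes one for a single proportion by drawing from its posterior distribution. This is not the latent class model used by MICE. It assumes a uniform Beta(1, 1) prior, and the counts are hypothetical. A credible interval is read directly as a probability statement about the parameter, given the data and the prior. A confidence interval instead describes how often the interval-building procedure would cover the true value over repeated studies.

```python
import random


def beta_credible_interval(successes, failures, prior_a=1.0, prior_b=1.0,
                           level=0.95, draws=100_000, seed=1):
    """Equal-tailed credible interval for a proportion.

    With a Beta(prior_a, prior_b) prior and binomial data, the posterior is
    Beta(prior_a + successes, prior_b + failures). The interval is
    approximated from sorted Monte Carlo draws of that posterior.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    samples = sorted(
        rng.betavariate(prior_a + successes, prior_b + failures)
        for _ in range(draws)
    )
    lower_index = int((1 - level) / 2 * draws)
    upper_index = int((1 + level) / 2 * draws) - 1
    return samples[lower_index], samples[upper_index]


# Hypothetical data: 45 positive results among 50 patients.
low, high = beta_credible_interval(successes=45, failures=5)
print(f"95% credible interval: {low:.3f} to {high:.3f}")
```

Given the model and the prior, the parameter lies between the two printed bounds with 95% posterior probability.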

Statisticians can also use the provided models to get a preliminary idea of their dataset. They should check the model fit and the data structure, and may wish to tailor the code to the specific dataset, either by rewriting the model in WinBUGS or R or by using the tools provided on the advanced version web page.

What things will need to be taken into consideration from a practical point of view?

First of all, the model is intended for the prospective evaluation of a diagnostic test, not for a case-control study design. The application is very user-friendly: only 8 numbers from a dataset are needed to use a model. However, the models’ limitations should be kept in mind. For example, the two-tests in two-populations model assumes that the prevalence of the disease differs between the two populations. Moreover, the most essential (and perhaps the most difficult) step is to check the goodness-of-fit of the model.
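The requirement that the two prevalences differ can be illustrated with a short sketch. It assumes conditionally independent tests, and the function name and parameter values are hypothetical. The 8 numbers are the four result patterns (+/+, +/-, -/+, -/-) in each of the two populations. Each 2x2 table contributes three free cell proportions, giving six in total, which matches the six unknowns: two prevalences, two sensitivities and two specificities. If the prevalences were equal, both tables would have the same expected proportions and the second population would add no information.

```python
from itertools import product


def cell_probs(prev, se, sp):
    """Expected proportions of the four result patterns (t1, t2) for two
    tests that are conditionally independent given disease status.

    se and sp are pairs: (test 1, test 2). A result of 1 means positive.
    """
    probs = {}
    for t1, t2 in product((1, 0), repeat=2):
        p_diseased = 1.0
        p_healthy = 1.0
        for result, sens, spec in zip((t1, t2), se, sp):
            p_diseased *= sens if result else 1 - sens
            p_healthy *= (1 - spec) if result else spec
        probs[(t1, t2)] = prev * p_diseased + (1 - prev) * p_healthy
    return probs


# Hypothetical accuracies for the two tests
se, sp = (0.90, 0.80), (0.95, 0.90)

pop_a = cell_probs(0.10, se, sp)          # population with 10% prevalence
pop_b = cell_probs(0.40, se, sp)          # population with 40% prevalence
pop_b_same_prev = cell_probs(0.10, se, sp)  # same prevalence as pop_a

free_cells = 2 * (4 - 1)   # three free proportions per 2x2 table
n_parameters = 2 + 2 + 2   # prevalences + sensitivities + specificities

print("tables differ when prevalences differ:", pop_a != pop_b)
print("tables identical when prevalences are equal:", pop_a == pop_b_same_prev)
print("free cells:", free_cells, "unknown parameters:", n_parameters)
```

With equal prevalences only three free proportions remain to estimate five unknowns (one prevalence, two sensitivities and two specificities), so the estimates would then rest largely on the prior distributions.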

Which applications do you consider your model most suitable for?

The models are especially useful when there is no perfect gold standard, when it is unethical to apply the gold standard test to all patients in routine practice, or when the gold standard is imperfect. Having said that, the model can also be useful when there is a perfect gold standard: the results from the model would support that the gold standard is truly perfect. In general, the larger the sample size, with both diseased and non-diseased patients, the more precise the estimates will be. Users need to ensure that each patient provides an equal amount of information, that is, that all two or three tests are applied to each individual.

Could you perhaps explain how you used the models for diagnostic tests for melioidosis, to give a practical example?

Melioidosis, or Whitmore’s disease, is an infectious disease caused by the Gram-negative bacterium Burkholderia pseudomallei, which is found in soil and water and is widely endemic in tropical countries. It causes severe respiratory tract infections, and the septicemic form of melioidosis has a mortality rate that exceeds 90% without access to appropriate antibiotics.

For the diagnosis of melioidosis, bacterial culture in general has low sensitivity (i.e. not all diseased patients will be culture-positive for the organism); bacterial culture is therefore likely to be an imperfect gold standard against which to evaluate alternative tests. We applied Bayesian latent class models (LCMs) to data from patients with a single Gram-negative bacterial infection to define the true sensitivity of culture, and also examined the impact of misclassification by culture on the reported accuracy of alternative diagnostic tests.

As a result, estimates of the accuracy of four serological tests were significantly different from previously published values in which culture was used as the gold standard. The estimates of accuracy are also supported by external evidence. “Imperfect gold standard” models should be used to support the evaluation of diagnostic tests in this situation. It is likely that the poor sensitivity/specificity of culture is not specific to melioidosis, but rather a generic problem for many bacterial and fungal infections.

This study is published as Limmathurotsakul, D. et al. Defining the True Sensitivity of Culture for the Diagnosis of Melioidosis Using Bayesian Latent Class Models. PLoS ONE 5, (2010).


Thank you!



Lim, C. et al. Using a web-based application to define the accuracy of diagnostic tests when the gold standard is imperfect. PLoS ONE 8, e79489 (2013).

Limmathurotsakul, D. et al. Defining the True Sensitivity of Culture for the Diagnosis of Melioidosis Using Bayesian Latent Class Models. PLoS ONE 5, (2010).

Limmathurotsakul, D. et al. Fool’s gold: Why imperfect reference tests are undermining the evaluation of novel diagnostics: a reevaluation of 5 diagnostic tests for leptospirosis. Clin. Infect. Dis. 55, 322–331 (2012).

Pan-ngum, W. et al. Estimating the true accuracy of diagnostic tests for dengue infection using Bayesian latent class models. PLoS ONE 8, e50765 (2013).