Comparing diagnostic tests: verification bias
An article in this week's Archives of Internal Medicine discusses a limitation of study design and execution that can happen in comparisons of diagnostic testing options, an issue known as verification bias.
You have a new diagnostic test, Exciting Test A, that may be an option for seeing if patients have Awful Disease X.
You also have Old-Standby Test B, the existing "gold standard" test for diagnosing Awful Disease X ("gold standard" means that Old-Standby Test B is the best thing you had going up until now to figure out if someone has Awful Disease X).
You want to set up a study to see if Exciting Test A is an accurate test for diagnosing this disease, in comparison to the Old-Standby.
There are lots of pitfalls in designing this kind of study (the Bandolier site has a really good discussion of the most common potential limitations of diagnostic studies).
An example of one of these pitfalls - verification bias:
The study by Lauer et al. in this week's Archives estimates the impact of verification bias. This kind of bias happens when everyone in the study gets Exciting Test A, but not everyone gets Old-Standby Test B; i.e. the "truth" of the Test A results is not verified in the whole set of patients by Test B, which is supposed to define true disease status.
The reference: Lauer MS, Murthy SC, Blackstone EH, Okereke IC, Rice TW. [18F]Fluorodeoxyglucose Uptake by Positron Emission Tomography for Diagnosis of Suspected Lung Cancer: Impact of Verification Bias. Arch Intern Med. 2007;167:161-165 (abstract).
What this study looked at:
- The patient population: 534 patients with suspected lung cancer (Awful Disease X)
- Exciting Test A: PET scan
- Old-Standby Test B: tissue diagnosis (including mediastinoscopy, transbronchial biopsy, thoracotomy, percutaneous fine needle aspiration, or thoracentesis)
- 419 patients (78%) underwent both PET scan and tissue diagnosis. In this group, sensitivity (people with the disease who test positive) of PET scanning was 95% and specificity (people without the disease who test negative) was 31% (both figures related to the test's ability to detect cancer at any site).
- The authors used two methods to adjust for verification bias (since 115 patients only underwent PET scanning): the Diamond method (relatively simple) and the Begg and Greenes method (a more complex formula).
- Using the Diamond method, the adjusted sensitivity was 87% and the adjusted specificity was 55%. The Begg and Greenes method yielded a sensitivity of 85% and a specificity of 51%. So, with each method of adjustment, sensitivity went down (a lower percentage of people with lung cancer actually tested positive) and specificity went up (a higher percentage of people without lung cancer actually tested negative).
- "Real world" meaning of these estimates: PET scanning probably missed a higher proportion of lung cancers than the unadjusted results suggest. In other words, if you ignore the potential impact of verification bias, PET scanning looks better at picking up lung cancer than it actually is.
- The authors conclude that verification bias in this case has a substantial impact on the measures of diagnostic accuracy for PET in assessing cases of suspected lung cancer, and suggest that clinicians should "lower their threshold for proceeding to definitive tissue diagnosis in the setting of negative PET scan findings."
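To make the direction of the adjustment concrete, here is a minimal sketch of a Begg and Greenes style correction in Python. The counts are invented for illustration and are not the data from Lauer et al.; the function names are hypothetical too. The sketch assumes that whether a patient gets the gold-standard test depends only on the result of the new test, so the disease rate seen among verified patients with a given test result can be applied to the unverified patients with the same result.

```python
def naive_accuracy(tp, fp, fn, tn):
    """Sensitivity and specificity using verified patients only."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


def begg_greenes_accuracy(tp, fp, fn, tn, unverified_pos, unverified_neg):
    """Sensitivity and specificity corrected for partial verification.

    tp, fp, fn, tn: counts among patients who had both tests.
    unverified_pos / unverified_neg: patients who had only the new test,
    split by whether that test was positive or negative.
    Assumption: verification depends only on the new test's result.
    """
    # Everyone who received the new test, split by its result.
    n_pos = tp + fp + unverified_pos
    n_neg = fn + tn + unverified_neg

    # Disease rates observed among verified patients, by test result.
    p_disease_given_pos = tp / (tp + fp)
    p_disease_given_neg = fn / (fn + tn)

    # Expected 2x2 table if every patient had been verified.
    exp_tp = n_pos * p_disease_given_pos
    exp_fp = n_pos - exp_tp
    exp_fn = n_neg * p_disease_given_neg
    exp_tn = n_neg - exp_fn

    sensitivity = exp_tp / (exp_tp + exp_fn)
    specificity = exp_tn / (exp_tn + exp_fp)
    return sensitivity, specificity


# Hypothetical counts: test-negative patients are verified less often.
verified = dict(tp=90, fp=30, fn=10, tn=20)
naive_se, naive_sp = naive_accuracy(**verified)
adj_se, adj_sp = begg_greenes_accuracy(
    **verified, unverified_pos=10, unverified_neg=40
)
print(f"naive:    sensitivity={naive_se:.2f}, specificity={naive_sp:.2f}")
print(f"adjusted: sensitivity={adj_se:.2f}, specificity={adj_sp:.2f}")
```

With these made-up numbers, most of the unverified patients had a negative result on the new test, so the verified group under-represents both missed cases and true negatives. The adjustment therefore lowers sensitivity and raises specificity, the same direction of change reported in the study.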
Another prominent evaluation of verification bias:
Punglia RS, D’Amico AV, Catalona WJ, Roehl KA, Kuntz KM. Effect of verification bias on screening for prostate cancer by measurement of prostate-specific antigen. N Engl J Med. 2003;349:335-342. (full-text)