Mammography


VinDr-Mammo: A large-scale benchmark dataset for computer-aided diagnosis in full-field digital mammography

Dataset Description

This project introduces VinDr-Mammo, a large-scale benchmark dataset of full-field digital mammography consisting of 5,000 four-view exams with breast-level assessments and finding annotations. Each exam was independently double read, and any discordance between the two readers was resolved by arbitration by a third radiologist.
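The double-reading protocol described above can be pictured with a minimal sketch. The function name `resolve_assessment`, the string labels, and the way the arbitrator's decision is passed in are hypothetical and chosen only for illustration; the page does not describe the actual annotation tooling.

```python
from typing import Optional


def resolve_assessment(reader_a: str, reader_b: str,
                       arbitrator: Optional[str] = None) -> str:
    """Return the final label for one breast under double reading.

    Assumption: labels are plain strings such as "BI-RADS 1". If the two
    independent readers agree, their shared label is final; otherwise the
    third radiologist's decision is used.
    """
    if reader_a == reader_b:
        return reader_a
    if arbitrator is None:
        raise ValueError("readers disagree; an arbitrator's decision is required")
    return arbitrator


# Concordant reads need no arbitration.
agreed = resolve_assessment("BI-RADS 1", "BI-RADS 1")
# Discordant reads fall back to the third radiologist.
arbitrated = resolve_assessment("BI-RADS 2", "BI-RADS 4", arbitrator="BI-RADS 4")
```

Raising an error when the readers disagree and no arbitration is supplied keeps an unresolved exam from silently receiving a label.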

We make VinDr-Mammo publicly available as a new imaging resource to promote advances in computer-aided detection and diagnosis (CADe/CADx) tools for breast cancer screening.

Figure 1. A sample mammography exam with the right breast being assessed with BI-RADS 5, density B, and the left breast with BI-RADS 1, density B. CC denotes craniocaudal and MLO denotes mediolateral oblique.

Figure 2. Distribution of patient age. This statistic is calculated over all exams in which the patient’s age is available.

Table 1. Statistics of breast-level BI-RADS assessment.

Table 2. Statistics of breast-density.


Table 3. Findings statistics on the VinDr-Mammo dataset. The number of findings and the rate of findings per 100 images are provided for the training set, test set, and the whole dataset.
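The "rate of findings per 100 images" reported in Table 3 is a simple normalization, sketched below. The helper name `findings_per_100_images` is hypothetical, and the finding count in the example is a made-up number used only to show the arithmetic; it is not taken from the dataset. The image total follows from the description: 5,000 four-view exams give 20,000 images.

```python
def findings_per_100_images(num_findings: int, num_images: int) -> float:
    """Number of findings normalized to a rate per 100 images."""
    if num_images <= 0:
        raise ValueError("num_images must be positive")
    return 100.0 * num_findings / num_images


# 5,000 exams x 4 views (CC and MLO for each breast) = 20,000 images.
total_images = 5_000 * 4

# Illustrative only: 500 is a placeholder count, not a dataset statistic.
example_rate = findings_per_100_images(500, total_images)
```

Normalizing by image count makes the training set, test set, and whole dataset directly comparable even though they differ in size.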

Download

The VinDr-Mammo dataset can be downloaded from PhysioNet.

Citation

Any publication that uses this resource must cite the original paper as follows:

Hieu T. Nguyen et al. “VinDr-Mammo: A large-scale benchmark dataset for computer-aided diagnosis in full-field digital mammography.” A preprint is available on medRxiv.

BibTeX citation:

@article{Nguyen2022.03.07.22272009,
author={Nguyen, Hieu T. and Nguyen, Ha Q. and Pham, Hieu H. and Lam, Khanh and Le, Linh T. and Dao, Minh and Vu, Van},
title={VinDr-Mammo: A large-scale benchmark dataset for computer-aided diagnosis in full-field digital mammography},
year={2022},
doi={10.1101/2022.03.07.22272009},
URL={https://www.medrxiv.org/content/early/2022/03/10/2022.03.07.22272009},
journal={medRxiv}
}

Contact

Correspondence should be addressed to: Ha Nguyen (v.hanq3@vinbigdata.com).
