Dataset CXR


VinDr-CXR: An open dataset of chest X-rays with radiologist’s annotations

Dataset description

To provide the research community with a large dataset of chest X-ray (CXR) images with high-quality labels, we built the VinDr-CXR dataset from more than 100,000 raw images in DICOM format that were retrospectively collected from Hospital 108 and the Hanoi Medical University Hospital, two of the largest hospitals in Vietnam. The published dataset consists of 18,000 postero-anterior (PA) view CXR scans that come with both the localization of critical findings and the classification of common thoracic diseases. These images were annotated by a group of 17 radiologists, each with at least 8 years of experience, for the presence of 22 critical findings (local labels) and 6 diagnoses (global labels); each finding is localized with a bounding box. The local and global labels correspond to the “Findings” and “Impressions” sections, respectively, of a standard radiology report.

We divide the dataset into two parts: a training set of 15,000 scans and a test set of 3,000 scans. Each image in the training set was independently labeled by 3 radiologists, while the annotation of each image in the test set was treated even more carefully and obtained from the consensus of 5 radiologists. The labeling process was performed via our own web-based framework, VinLab, which was built on top of a Picture Archiving and Communication System (PACS). A demonstration of this framework can be found here.

All DICOM images and the labels of the training set are released. We temporarily retain the labels of the test set for the purpose of holding a CXR analysis competition on the Kaggle.com platform, which is expected to launch in December of 2020.   


Examples of CXRs with radiologists’ annotations. Abnormal findings (local labels) marked by radiologists are plotted on the original images for visualization purposes. The global labels are in bold and listed at the bottom of each example. Best viewed on a computer and zoomed in for details.

Dataset Statistics

Note: the numbers of positive labels are reported based on the majority vote of the participating radiologists. (*) The calculations are based only on the CXR scans for which the patient’s sex and age were known. (-) To preserve the integrity of the test set, its labels are not released to the public; the label statistics of the test set are therefore not shown here.
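The majority-vote rule mentioned in the note above can be illustrated with a minimal Python sketch. The source only states that positive labels were counted by majority vote; the strict-majority threshold, the `(image_id, reader_id, label)` record layout, the function name `majority_vote`, and the sample identifiers below are all assumptions made for illustration, not the authors' actual pipeline.

```python
from collections import defaultdict


def majority_vote(annotations, num_readers):
    """Return {image_id: set of labels marked by more than half of the readers}.

    `annotations` is an iterable of (image_id, reader_id, label) tuples.
    This record layout is hypothetical, not the dataset's released format.
    """
    # For each image and label, collect the distinct readers who marked it.
    # Using a set means a reader who draws several boxes for the same
    # finding on one image is still counted once.
    votes = defaultdict(lambda: defaultdict(set))
    for image_id, reader_id, label in annotations:
        votes[image_id][label].add(reader_id)

    # Assumed rule: a label is positive when a strict majority agrees.
    return {
        image_id: {
            label
            for label, readers in by_label.items()
            if 2 * len(readers) > num_readers
        }
        for image_id, by_label in votes.items()
    }


# Illustrative records for two training images read by 3 radiologists each.
annotations = [
    ("img_001", "R1", "Cardiomegaly"),
    ("img_001", "R2", "Cardiomegaly"),
    ("img_001", "R3", "Nodule/Mass"),
    ("img_002", "R1", "Pleural effusion"),
]

positives = majority_vote(annotations, num_readers=3)
```

In this sketch, a finding marked by only one of the three readers is not counted as positive, so an image can end up with an empty set of positive labels.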

Distribution of findings and pathologies on the training set of the VinDr-CXR Dataset.

Download Dataset

Downloading the full version of the VinDr-CXR dataset is not allowed at the moment as it is currently used for the VinBigdata Chest X-ray Abnormalities Detection Competition hosted on the Kaggle platform. A slightly modified version of the dataset can be downloaded via the competition’s webpage.      

Author List and Affiliations

Ha Q. Nguyen [1,2,†], Khanh Lam [3,†], Linh T. Le [4,†], Hieu H. Pham [1,2,*], Dat Q. Tran [1], Dung B. Nguyen [1], Dung D. Le [4,‡], Chi M. Pham [4,‡], Hang T. T. Tong [4,‡], Diep H. Dinh [3,‡], Cuong D. Do [3,‡], Luu T. Doan [4,‡], Cuong N. Nguyen [4,‡], Binh T. Nguyen [4,‡], Que V. Nguyen [4,‡], Au D. Hoang [4,‡], Hien N. Phan [4,‡], Anh T. Nguyen [4,‡], Phuong H. Ho [5,‡], Dat T. Ngo [1], Nghia T. Nguyen [1], Nhan T. Nguyen [1], Minh Dao [1], and Van Vu [1,6]

[1] Vingroup Big Data Institute (VinBigdata), Hanoi, Vietnam

[2] VinUniversity, Hanoi, Vietnam

[3] 108 Military Central Hospital, Department of Radiology, Hanoi, Vietnam

[4] Hanoi Medical University Hospital, Department of Radiology, Hanoi, Vietnam

[5] Tam Anh Hospital, Department of Radiology, Ho Chi Minh City, Vietnam 

[6] Yale University, Department of Mathematics, New Haven, CT 06511, USA

† These authors contributed equally to this work

‡ These authors contributed equally to this work

* Corresponding author: Hieu H. Pham (v.hieuph4@vinbigdata.org)

 

Citation

For any publication that explores this resource, the authors must cite the original paper as follows:

Ha Q. Nguyen et al. “VinDr-CXR: An open dataset of chest X-rays with radiologist’s annotations.” A preprint is available on arXiv (arXiv:2012.15029).

BibTeX citation:

@misc{nguyen2020vindrcxr,
      title={VinDr-CXR: An open dataset of chest X-rays with radiologist's annotations}, 
      author={Ha Q. Nguyen and Khanh Lam and Linh T. Le and Hieu H. Pham and Dat Q. Tran and Dung B. Nguyen and Dung D. Le and Chi M. Pham and Hang T. T. Tong and Diep H. Dinh and Cuong D. Do and Luu T. Doan and Cuong N. Nguyen and Binh T. Nguyen and Que V. Nguyen and Au D. Hoang and Hien N. Phan and Anh T. Nguyen and Phuong H. Ho and Dat T. Ngo and Nghia T. Nguyen and Nhan T. Nguyen and Minh Dao and Van Vu},
      year={2020},
      eprint={2012.15029},
      archivePrefix={arXiv},
      primaryClass={eess.IV}
}

We also encourage such authors to release their code and models, which will help the community reproduce experiments and advance research in medical imaging.

Contact

We welcome any comments, suggestions, or feedback that help improve the dataset. Correspondence should be addressed to Hieu H. Pham (v.hieuph4@vinbigdata.org).