VinDr-CXR: An open dataset of chest X-rays with radiologist’s annotations

Dataset description

To provide the research community with a large dataset of chest X-ray (CXR) images with high-quality labels, we have built the VinDr-CXR dataset from more than 100,000 raw images in DICOM format that were retrospectively collected from Hospital 108 and Hanoi Medical University Hospital, two of the largest hospitals in Vietnam. The published dataset consists of 18,000 postero-anterior (PA) view CXR scans that come with both the localization of critical findings and the classification of common thoracic diseases. These images were annotated by a group of 17 radiologists, each with at least 8 years of experience, for the presence of 22 critical findings (local labels) and 6 diagnoses (global labels); each finding is localized with a bounding box. The local and global labels correspond to the “Findings” and “Impressions” sections, respectively, of a standard radiology report.
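The two label types described above can be pictured as a small data structure. The following is a minimal sketch only: the class names, field names, and label strings are hypothetical illustrations and do not reflect the file format or label list of the released dataset.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BoundingBox:
    """Axis-aligned box in pixel coordinates (field names are assumptions)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def area(self) -> float:
        # Clamp at zero so a degenerate box never yields a negative area.
        width = max(0.0, self.x_max - self.x_min)
        height = max(0.0, self.y_max - self.y_min)
        return width * height


@dataclass
class LocalLabel:
    """One finding marked by one radiologist, localized with a box."""
    finding: str
    box: BoundingBox
    radiologist_id: str


@dataclass
class ImageAnnotation:
    """All labels attached to a single CXR scan."""
    image_id: str
    local_labels: List[LocalLabel] = field(default_factory=list)   # "Findings"
    global_labels: List[str] = field(default_factory=list)         # "Impressions"

    def finding_names(self) -> List[str]:
        # Distinct finding names, sorted for stable output.
        return sorted({label.finding for label in self.local_labels})


# Illustrative values only; not taken from the dataset.
example = ImageAnnotation(
    image_id="example-0001",
    local_labels=[
        LocalLabel("Cardiomegaly", BoundingBox(100, 200, 400, 500), "R1"),
    ],
    global_labels=["Illustrative diagnosis"],
)
print(example.finding_names())
```

Keeping local labels (with boxes) separate from global labels mirrors the “Findings” versus “Impressions” distinction, so localization and classification tasks can each read only the part they need.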

We divide the dataset into two parts: a training set of 15,000 scans and a test set of 3,000 scans. Each image in the training set was independently labeled by 3 radiologists, while the annotation of each image in the test set was handled even more carefully and obtained from the consensus of 5 radiologists. The labeling process was performed via our own web-based framework called VinLab, which was built on top of a Picture Archiving and Communication System (PACS). A demonstration of this framework can be found here.

All DICOM images and the labels of the training set are released. We are temporarily withholding the labels of the test set in order to hold a CXR analysis competition on the Kaggle.com platform; the competition is expected to launch in December 2020.

Examples of CXRs with radiologists’ annotations. Abnormal findings (local labels) marked by radiologists are plotted on the original images for visualization purposes. The global labels are in bold and listed at the bottom of each example. Best viewed on a computer and zoomed in for details.

Dataset Statistics

Note: the numbers of positive labels are reported based on the majority vote of the participating radiologists. (*) The calculations are based only on the CXR scans for which the patient’s sex and age were known. (-) To preserve the integrity of the test set, its labels are not released to the public. Label statistics for the test set are therefore not shown here.

Distribution of findings and pathologies on the training set of the VinDr-CXR Dataset.
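The majority-vote rule mentioned in the note above can be illustrated with a short sketch. This is an assumption-laden example, not the authors' aggregation procedure: it works only at the level of image-wide label presence, ignores bounding boxes, and uses a hypothetical function name and illustrative label strings.

```python
from collections import Counter
from typing import Iterable, Set


def majority_vote(labels_per_reader: Iterable[Set[str]]) -> Set[str]:
    """Return the labels marked by a strict majority of readers.

    Each element of `labels_per_reader` is the set of labels that one
    radiologist assigned to the same image (hypothetical input format).
    """
    readers = list(labels_per_reader)
    # Strict majority: 2 of 3 readers, 3 of 5 readers, and so on.
    threshold = len(readers) // 2 + 1
    # Sets guarantee each reader contributes at most one vote per label.
    counts = Counter(label for reader in readers for label in reader)
    return {label for label, votes in counts.items() if votes >= threshold}


# Three readers for one training image (illustrative labels only).
readers = [
    {"Cardiomegaly", "Pleural effusion"},
    {"Cardiomegaly"},
    {"Pleural effusion", "Nodule/Mass"},
]
print(sorted(majority_vote(readers)))  # ['Cardiomegaly', 'Pleural effusion']
```

With 3 readers per training image, a label counts as positive under this sketch when at least 2 readers marked it.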

Download Dataset

This Data Use Agreement (DUA) applies whenever the VinDr-CXR Dataset is transmitted to or received by any individual or organization. By registering to download this dataset, you agree to the following terms:

  1. Recipients (individuals or organizations) may use or disclose the VinDr-CXR Dataset only as permitted by this DUA. Permission is granted to use the VinDr-CXR Dataset without charge for non-commercial research purposes only. Any other purpose is prohibited;

  2. Recipients (individuals or organizations) may use the VinDr-CXR Dataset for legal purposes only.

  3. Recipients must use appropriate safeguards to prevent any use or disclosure of the VinDr-CXR Dataset not contemplated by this agreement;

  4. Recipients must report to the VinBigdata any use or disclosure of which they become aware that is illegal or not permitted by this agreement;

  5. Re-identification is strictly prohibited. All recipients agree that they will not make any attempt to re-identify any of the individual data subjects from the VinDr-CXR Dataset. The recipients will not publish any information on an individual patient in the case an individual patient can be identified. Any re-identification of any individual data subject shall be immediately reported to the VinBigdata;

  6. Do not distribute, publish, or reproduce a copy of any portion or all of the VinDr-CXR Dataset to a third party without permission from the VinBigdata. Do not share the download link of the VinDr-CXR Dataset with others. If another user wishes to use the VinDr-CXR Dataset, they must register as an individual user and comply with all the terms of this DUA;

  7. Do not modify the original images and label annotations from the VinDr-CXR Dataset. You must not remove or alter any copyright or other proprietary notices in the VinDr-CXR Dataset;

  8. The use of the VinDr-CXR Dataset for clinical research purposes in the diagnosis or provision of patient care should be approved by authorities. The VinBigdata does not assume liability when the dataset is used for clinical use;

  9. Recipients will cite the source of information in all publications that use the VinDr-CXR Dataset;

  10. Any violation of this DUA or other impermissible use shall be grounds for immediate termination of use of the VinDr-CXR Dataset. In this case, the violator is responsible for indemnifying and holding the VinBigdata harmless from any claims, losses or damages, including legal fees, arising out of or resulting from the use of the VinDr-CXR Dataset.

Author List and Affiliations

Ha Q. Nguyen [1,2,†], Khanh Lam [3,†], Linh T. Le [4,†], Hieu H. Pham [1,2,*], Dat Q. Tran [1], Dung B. Nguyen [1], Dung D. Le [4,‡], Chi M. Pham [4,‡], Hang T. T. Tong [4,‡], Diep H. Dinh [3,‡], Cuong D. Do [3,‡], Luu T. Doan [4,‡], Cuong N. Nguyen [4,‡], Binh T. Nguyen [4,‡], Que V. Nguyen [4,‡], Au D. Hoang [4,‡], Hien N. Phan [4,‡], Anh T. Nguyen [4,‡], Phuong H. Ho [5,‡], Dat T. Ngo [1], Nghia T. Nguyen [1], Nhan T. Nguyen [1], Minh Dao [1], and Van Vu [1,6]

[1] Vingroup Big Data Institute (VinBigdata), Hanoi, Vietnam

[2] VinUniversity, Hanoi, Vietnam

[3] 108 Military Central Hospital, Department of Radiology, Hanoi, Vietnam

[4] Hanoi Medical University Hospital, Department of Radiology, Hanoi, Vietnam

[5] Tam Anh Hospital, Department of Radiology, Ho Chi Minh City, Vietnam 

[6] Yale University, Department of Mathematics, New Haven, CT 06511

† These authors contributed equally to this work

‡ These authors contributed equally to this work

* Corresponding author: Hieu H. Pham (v.hieuph4@vinbigdata.org)

 

Citation

For any publication that explores this resource, the authors must cite the original paper as follows:

Ha Q. Nguyen et al. “VinDr-CXR: An open dataset of chest X-rays with radiologist’s annotations” – A preprint

BibTeX citation:

 

@misc{nguyen2020vindrcxr,
      title={VinDr-CXR: An open dataset of chest X-rays with radiologist's annotations}, 
      author={Ha Q. Nguyen and Khanh Lam and Linh T. Le and Hieu H. Pham and Dat Q. Tran and Dung B. Nguyen and Dung D. Le and Chi M. Pham and Hang T. T. Tong and Diep H. Dinh and Cuong D. Do and Luu T. Doan and Cuong N. Nguyen and Binh T. Nguyen and Que V. Nguyen and Au D. Hoang and Hien N. Phan and Anh T. Nguyen and Phuong H. Ho and Dat T. Ngo and Nghia T. Nguyen and Nhan T. Nguyen and Minh Dao and Van Vu},
      year={2020},
      eprint={2012.15029},
      archivePrefix={arXiv},
      primaryClass={eess.IV}
}

We also encourage such authors to release their code and models, which will help the community reproduce experiments and advance research in the field of medical imaging.

Contact

We welcome any comments, suggestions, or feedback that help improve the dataset. Correspondence should be addressed to Hieu H. Pham (v.hieuph4@vinbigdata.org).
