VinDr-PCXR: An open, large-scale chest radiograph dataset for interpretation of common thoracic diseases in children

Dataset Description

Computer-aided diagnosis (CAD) systems for adult chest radiography (CXR) have recently achieved great success thanks to the availability of large-scale annotated datasets and the advent of high-performance supervised learning algorithms. However, the development of models for detecting and diagnosing pediatric diseases in CXR scans remains underexplored due to the lack of high-quality physician-annotated datasets. To overcome this challenge, Vingroup Big Data Institute (VinBigdata) introduces and releases VinDr-PCXR, a new pediatric CXR dataset of 9,125 studies retrospectively collected from a major pediatric hospital in Vietnam between 2020 and 2021. Each scan was manually annotated by a pediatric radiologist with more than ten years of experience. The dataset was labeled for the presence of 36 critical findings and 15 diseases.

To the best of our knowledge, this is the first and largest pediatric CXR dataset containing lesion-level annotations and image-level labels for the detection of multiple findings and diseases.
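The two kinds of label differ in granularity: a lesion-level annotation localizes a finding on an image, while an image-level label only states whether something is present in that image. The following is a minimal sketch of that distinction, not the dataset's actual file format. The `LesionBox` fields, the `image_level_labels` helper, and the finding names are all hypothetical, and the toy records are made up for illustration. It is also not necessarily how the dataset's own image-level labels were produced.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LesionBox:
    """One lesion-level annotation: a labeled bounding box (hypothetical schema)."""
    image_id: str
    finding: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def image_level_labels(boxes, findings):
    """Collapse lesion-level boxes into one binary presence vector per image."""
    index = {name: i for i, name in enumerate(findings)}
    labels = defaultdict(lambda: [0] * len(findings))
    for box in boxes:
        # Mark the finding as present; the box coordinates are discarded.
        labels[box.image_id][index[box.finding]] = 1
    return dict(labels)


# Toy records for illustration only; not taken from the dataset.
findings = ["finding_a", "finding_b", "finding_c"]
boxes = [
    LesionBox("img_001", "finding_a", 10, 20, 110, 140),
    LesionBox("img_001", "finding_c", 200, 50, 260, 120),
    LesionBox("img_002", "finding_b", 30, 30, 90, 95),
]
labels = image_level_labels(boxes, findings)
print(labels)
```

A detection model would train on the boxes themselves, whereas a classification model only needs the per-image vectors, which is why a dataset carrying both supports both tasks.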

Table 1. Overview of existing public datasets for CXR interpretation in pediatric patients.

Figure 1. Several examples of pediatric CXR images with radiologist’s annotations. Local labels marked by radiologists are plotted on the original images for visualization purposes.

Dataset Statistics

Table 2. Dataset characteristics of VinDr-PCXR.


The dataset can be downloaded from PhysioNet. Note that only credentialed users who sign the specified data use agreement (DUA) can access the files.


Any publication that uses this resource must cite the original paper:

Ngoc H. Nguyen, Hieu H. Pham, Thanh T. Tran, Tuan N.M. Nguyen, and Ha Q. Nguyen, “VinDr-PCXR: An open, large-scale chest radiograph dataset for interpretation of common thoracic diseases in children,” medRxiv preprint.



Correspondence should be addressed to: Ha Nguyen (v.hanq3@vinbigdata.com).


[1] Kermany, D. S. et al. Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell 172, 1122–1131.e9 (2018).

[2] Chen, K.-C. et al. Diagnosis of common pulmonary diseases in children by X-ray images and deep learning. Sci. Reports 10, 1–9 (2020).