X-ray imaging, typically stored in Digital Imaging and Communications in Medicine (DICOM) format, is the most commonly used imaging modality in clinical practice, resulting in vast, non-normalized databases. This is an obstacle to deploying artificial intelligence (AI) solutions for medical image analysis, which often require identifying the body part shown before feeding the image into the appropriate AI model. This challenge raises the need for an automated and efficient approach to classifying body parts from X-ray scans. Therefore, the Vingroup Big Data Institute (VinBigData) introduces and releases the VinDr-BodyPartXR dataset, comprising 16,093 X-ray images that were collected and manually annotated.
To the best of our knowledge, VinDr-BodyPartXR is the largest dataset to date that provides annotations for developing supervised-learning classification algorithms. We believe that the dataset will serve as a benchmark for accelerating the development and evaluation of new machine learning models for body part classification in X-ray images.