Towards Fully Automated Segmentation of Proximal Femur MRI with Developmental Dysplasia of the Hip

Mahdis Khodadadi Jokar*, Jasper Kwasny, Cynthia Fantini Pagani, Alexander Zimmerer, Leon Robertz, Kevin Bill, Uwe Kersting, Bhushan Borotikar

*Corresponding author for this work

Publication: Chapter in Book/Report/Conference proceeding › Conference contribution - Published abstract for conference with selection process › Research › peer-review


INTRODUCTION: Three-dimensional (3D) surface models of hip joint bone structures are crucial in the clinical management of developmental
dysplasia of the hip (DDH), both for accurate morphological analysis and for pre-surgical planning. Bone anatomy is typically segmented from
medical images using approaches ranging from fully manual to fully automated (1,2). While manual segmentation is considered the gold standard, it is time- and
resource-intensive. Automatic approaches, on the other hand, are less time-consuming but require substantial data to train the model for anatomical variations
in pathological morphology (2). State-of-the-art automatic segmentation techniques employ deep neural networks that can efficiently segment regions of
interest in medical images (2). Developing a robust, accurate, and fully automated segmentation tool that can handle both healthy and pathological datasets
would make patient-specific 3D bone models more accessible and practical for clinical and research purposes. Additionally, clinicians and data scientists often
only have mixed datasets of CT and MRI. In this study, we propose a novel strategy that uses both CT and MRI datasets from normal and DDH cohorts to
build a deep learning segmentation framework. We evaluated the efficacy of this framework for the automatic segmentation of the proximal femur using MRI
of young adults diagnosed with DDH as well as CT scans from healthy adults.
METHODS: We employed deep learning algorithms using the open-source, PyTorch-based MONAI Label (3) plugin for the medical image computing
platform 3D Slicer (4). Models were built and tested using datasets comprising cadaveric CTs from healthy adults (CTHealthy) (5) and clinical MRIs
from young adults with DDH (MRIDDH), together with manually segmented labels. All images were cropped to a limited field of view (mean size: 460 × 480 × 46 for
MRIDDH, 160 × 170 × 300 for CTHealthy) centred on the proximal femur, then normalized and standardized using a pixel brightness normalization. Three DyUNet
models were customized and trained using three different methods. The first model was trained on 25 CTHealthy scans for 350 epochs and used to segment a
separate test dataset of three CTHealthy scans. The second model was trained on 25 MRIDDH images, also for 350 epochs, and used to segment a separate test
dataset of three MRIDDH images. The third model was trained on a dataset composed of both CTHealthy and MRIDDH images for 450 epochs. This
model was then assessed by segmenting a test dataset of three CTHealthy scans and three MRIDDH images. During training in all approaches, 20% of the training
images were randomly set aside for validation in each epoch. Each model’s performance was evaluated by calculating the Dice score in each
epoch for the training and validation images, and an overall Dice score across all epochs for the corresponding training and validation sets. Additionally, the segmentations
of the test femurs were assessed by calculating the root mean squared error (RMSE) between the manually segmented masks and the model-generated
segmentations, and by a qualitative assessment of the practical relevance of each segmentation. A segmentation was considered to have failed if the
predicted mask had excessive edge bleeding or misidentification of bone to the point where near-complete manual segmentation was required.
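As a minimal sketch of the validation hold-out and Dice evaluation described above (not the MONAI Label implementation used in the study), the following assumes binary masks stored as NumPy arrays. The function names, the fixed random seed, and the toy cube masks are hypothetical, and the split is drawn once here rather than in each epoch.

```python
import numpy as np


def dice_score(pred, ref):
    """Dice overlap between two binary masks: 2|A∩B| / (|A| + |B|)."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    total = pred.sum() + ref.sum()
    if total == 0:
        # Both masks empty: treated as perfect agreement (a convention, assumed here).
        return 1.0
    return float(2.0 * np.logical_and(pred, ref).sum() / total)


def split_train_val(n_images, val_fraction=0.2, seed=0):
    """Randomly hold out a fraction of image indices for validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_images)
    n_val = int(round(n_images * val_fraction))
    return order[n_val:], order[:n_val]


# Toy 3D masks (hypothetical data): two cubes offset by one voxel.
ref = np.zeros((8, 8, 8), dtype=bool)
ref[2:6, 2:6, 2:6] = True
pred = np.zeros_like(ref)
pred[3:7, 2:6, 2:6] = True

score = dice_score(pred, ref)

# 25 training images with 20% held out, as in the single-modality models.
train_idx, val_idx = split_train_val(25)
```

With 25 images and a 20% hold-out, this yields 20 training and 5 validation indices. A Dice score of 1 means perfect overlap and 0 means no overlap.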
RESULTS: Model training took 1:46, 1:23, and 3:15 (h:min) for the first, second, and third models, respectively. The mean Dice score
for the validation set was 0.97 for the CTHealthy model and 0.62 for the MRIDDH model. The mean Dice score of the validation set for the combined model reached
0.72, and the combined model showed a visual improvement in segmenting MRIDDH test images. Test femur segmentation took about 10 seconds per femur. In the combined approach,
all three test CTs had good segmentation with minimal to no boundary bleeding (Figure 1). One of the three MRI segmentations had minimal boundary
bleeding, requiring minimal manual corrections, while the other two failed the qualitative assessment and required manual segmentation. The mean RMSE of the test
femurs was 0.20 mm for the combined approach, and 0.11 mm and 0.29 mm for the first and second approaches, respectively.
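The abstract does not state how the RMSE between masks was computed. One plausible reading, sketched here purely as an assumption, is a symmetric root mean squared nearest-surface distance in millimetres. The function names, the isotropic 0.5 mm voxel spacing, and the toy masks are hypothetical.

```python
import numpy as np


def surface_points(mask, spacing):
    """Physical coordinates (mm) of foreground voxels with a background 6-neighbour."""
    mask = np.asarray(mask, dtype=bool)
    padded = np.pad(mask, 1, constant_values=False)
    inner = (slice(1, -1),) * mask.ndim
    interior = np.ones_like(mask)
    for axis in range(mask.ndim):
        for shift in (-1, 1):
            # A voxel is interior only if every face neighbour is foreground.
            interior &= np.roll(padded, shift, axis=axis)[inner]
    surface = mask & ~interior
    return np.argwhere(surface) * np.asarray(spacing, dtype=float)


def surface_rmse(pred, ref, spacing=(1.0, 1.0, 1.0)):
    """Symmetric RMSE of nearest-surface distances between two binary masks."""
    a = surface_points(pred, spacing)
    b = surface_points(ref, spacing)
    if len(a) == 0 or len(b) == 0:
        raise ValueError("both masks must contain foreground voxels")
    # Brute-force pairwise distances: fine for a toy example, too slow for full scans.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    nearest = np.concatenate([d.min(axis=1), d.min(axis=0)])
    return float(np.sqrt(np.mean(nearest ** 2)))


# Toy masks (hypothetical data): two cubes offset by one voxel.
ref = np.zeros((8, 8, 8), dtype=bool)
ref[2:6, 2:6, 2:6] = True
pred = np.zeros_like(ref)
pred[3:7, 2:6, 2:6] = True

rmse_mm = surface_rmse(pred, ref, spacing=(0.5, 0.5, 0.5))
```

For full-resolution scans, a distance transform or a spatial index would be a more practical way to find nearest surface points than the pairwise matrix used here.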
DISCUSSION: The initial model performance shows mixed yet promising results. While the majority of the predicted labels required only minimal manual
correction, some femurs failed completely, effectively requiring complete manual segmentation. The results are promising because combining datasets from two
completely different imaging domains improved the overall prediction performance on MRIDDH images, which was poor
with the second model. A drop in performance from the CTHealthy model to the combined approach is expected, as we tried to tackle a real-life scenario with
at least five challenges: 1) mixed imaging domains (CT and MRI), 2) data scarcity, 3) anatomical heterogeneity (healthy and DDH morphology), 4)
multi-site, multi-modality variations, and 5) mixed image size, quality, and interpretability. While the tool is not yet fully effective for segmenting a given
femur with no manual corrections, it would greatly reduce the workload of segmenting a large dataset. We will continue to improve the model, with the
ultimate goal of achieving consistent, correction-free automatic segmentation by enlarging the training dataset. Further improvements could be made by
incorporating a more robust pre-processing methodology, fine-tuning the model’s hyperparameters, and restructuring the training hierarchy.
SIGNIFICANCE/CLINICAL RELEVANCE: Diagnostic and surgical planning procedures can be improved by using 3D bone models instead
of 2D images, which do not capture the entire bone morphology. The drawbacks of using 3D bone models can be substantially reduced by optimizing the
segmentation process. The development of user-friendly methods for 3D bone reconstruction enables significant advances in the field of orthopedics.
REFERENCES: 1) Gelaude et al. Comp Meth in Biomech and Biomed Eng (9) 2006:65-77. 2) Zeng et al. MLMI 2017:274-282. 3) Diaz-Pinto et al. arXiv
preprint arXiv:2203.12362 (2022). 4) Fedorov et al. Mag Res Imaging (9) 2012:1323-41. 5) Kistler et al. J Med Internet Res (15) 2013:e245.
ACKNOWLEDGEMENTS: Funding for this project was provided by the German Sport University Cologne.
Original language: English
Title of host publication: ORS 2023 Annual Meeting
Number of pages: 1
Publication date: 2023
Article number: 1688
Publication status: Published - 2023
Event: ORS 2023 Annual Meeting - Hilton Anatole, Dallas, United States of America
Duration: 10.02.2023 to 14.02.2023

