Usformer: A small network for left atrium segmentation of 3D LGE MRI

Hui Lin, Santiago López-Tapia, Florian Schiffers, Y. Wu, S. Gunasekaran, J. Hwang, D. Bishara, E. Kholmovski, M. Elbaz, RS. Passman, Daniel Kim, and Aggelos K. Katsaggelos

Heliyon 2024

Fig. 1. The proposed Usformer is a single-stage 3D method: it captures the inter-slice correlation that 2D methods ignore and avoids the error propagation introduced by two-stage methods.

Abstract: Left atrial (LA) fibrosis significantly influences the progression of atrial fibrillation, and 3D late gadolinium-enhancement (LGE) MRI is a proven method for identifying LA fibrosis. However, manual segmentation of the LA wall from 3D LGE MRI is time-consuming and difficult. Automated segmentation is also challenging due to varying data intensities, limited contrast between the LA and surrounding tissues, and the complex anatomy of the LA. Traditional 3D network approaches are computationally intensive and often resort to two-stage methods. To address these issues, we propose Usformer, a lightweight, transformer-based 3D architecture for precise, single-stage LA segmentation. Usformer’s transposed attention captures global context efficiently, outperforming state-of-the-art methods in both accuracy and speed, with a dice score of 93.1% on the 2018 Atrial Segmentation Challenge dataset and 92.0% on our local dataset. Usformer also significantly reduces parameter count and computational complexity, by 2.8x and 3.8x, respectively, and achieves a 92.1% dice score using only 16 labeled MRI scans. This method may enhance the clinical translation of LA LGE for catheter ablation planning in atrial fibrillation.


Fig. 2. The architecture of Usformer. It is designed for end-to-end left atrium segmentation from 3D LGE MRIs. In the final two stages, the U-Net architecture integrates transformer blocks, represented by the orange boxes. Each transformer block consists of a transposed attention module (shown in Fig. 3) and a feed-forward network made up of fully connected layers. H × W × Z denotes the size of a 3D LGE scan. All feature maps are 3D volumes instead of 2D images. For additional details on Usformer, please refer to Section 2.
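To make the transformer block of Fig. 2 concrete, below is a minimal PyTorch sketch of a transposed transformer block for 3D feature maps: a transposed (channel-wise) attention module followed by a feed-forward network, with 1×1×1 convolutions acting as per-voxel fully connected layers. This is an illustrative sketch rather than the authors' implementation; the class names, the learnable temperature alpha, the GroupNorm layers, and the residual connections are assumptions.

```python
import torch
import torch.nn as nn


class TransposedAttention3D(nn.Module):
    """Channel-wise ("transposed") attention: the attention map is C x C rather
    than n x n, where n = H*W*Z is the number of voxels (see Fig. 3)."""

    def __init__(self, channels: int):
        super().__init__()
        self.to_qkv = nn.Conv3d(channels, channels * 3, kernel_size=1)
        self.proj = nn.Conv3d(channels, channels, kernel_size=1)
        # Learnable temperature for the C x C attention logits (an assumption).
        self.alpha = nn.Parameter(torch.ones(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w, z = x.shape
        n = h * w * z
        q, k, v = self.to_qkv(x).chunk(3, dim=1)            # each (B, C, H, W, Z)
        q, k, v = (t.reshape(b, c, n) for t in (q, k, v))   # each (B, C, n)
        # Q @ K^T is (B, C, C): cost O(C^2 n), versus O(n^2 C) for an n x n map.
        attn = torch.softmax((q @ k.transpose(1, 2)) * self.alpha, dim=-1)
        out = (attn @ v).reshape(b, c, h, w, z)              # back to (B, C, H, W, Z)
        return self.proj(out)


class TransposedTransformerBlock3D(nn.Module):
    """Transposed attention followed by a feed-forward network of per-voxel
    fully connected layers (1x1x1 convolutions), with pre-norm residuals."""

    def __init__(self, channels: int, expansion: int = 2):
        super().__init__()
        self.norm1 = nn.GroupNorm(1, channels)   # LayerNorm-like over channels
        self.attn = TransposedAttention3D(channels)
        self.norm2 = nn.GroupNorm(1, channels)
        self.ffn = nn.Sequential(
            nn.Conv3d(channels, channels * expansion, kernel_size=1),
            nn.GELU(),
            nn.Conv3d(channels * expansion, channels, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.attn(self.norm1(x))
        x = x + self.ffn(self.norm2(x))
        return x


if __name__ == "__main__":
    # A small 3D feature map, as might be produced by a late Usformer stage.
    feat = torch.randn(1, 32, 8, 8, 8)
    print(TransposedTransformerBlock3D(32)(feat).shape)     # torch.Size([1, 32, 8, 8, 8])
```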

Fig. 3. Transposed attention module, where the matrix K is transposed to significantly decrease the computation complexity. The output of the transposed attention is calculated by Equation (1). Ĥ × Ŵ × Ẑ denotes the input feature-map size, and n denotes the total number of voxels in the input, i.e., n = Ĥ × Ŵ × Ẑ, which is much larger than the channel number Ĉ. The computation complexity of the transposed module is O(Ĉ²n), much smaller than the conventional module’s O(n²Ĉ).
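The complexity gap follows directly from the size of the attention map. Below is a worked count under a common transposed-attention formulation (the exact form of the paper's Equation (1) may differ in details), with Q, K, V ∈ ℝ^(n×Ĉ) and n = Ĥ·Ŵ·Ẑ ≫ Ĉ:

```latex
% Operation counts: the attention map is n x n for conventional attention,
% but only \hat{C} x \hat{C} once K is transposed first.
\begin{align*}
\text{conventional: } & \operatorname{Softmax}\!\big(QK^{\top}\big)\,V,
  & QK^{\top} \in \mathbb{R}^{n \times n}
  & \;\Rightarrow\; \mathcal{O}\!\big(n^{2}\hat{C}\big),\\
\text{transposed: }   & V\,\operatorname{Softmax}\!\big(K^{\top}Q\big),
  & K^{\top}Q \in \mathbb{R}^{\hat{C} \times \hat{C}}
  & \;\Rightarrow\; \mathcal{O}\!\big(\hat{C}^{2}n\big).
\end{align*}
```

The saving is therefore a factor of roughly n/Ĉ, which is large for 3D feature maps since every voxel contributes to n.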

Fig. 4. Example 3D LGE MRIs from the challenge and NU datasets with manual segmentations denoted in orange. Each slice of the LGE MRI scans was manually segmented, and the slice-wise segmentations were aggregated to construct a 3D model of the left atrium. Viewing this figure in color is advised for optimal visualization in the printed edition.


Fig. 6. Results of LA segmentation in the axial view by Usformer, nnU-Net [10], UNeXt [36], and TMS-Net [35]. Cases are randomly selected from the challenge and NU datasets, respectively. Each visualization includes the 2D dice score, denoted in the top left corner. Red and green contours delineate the manual and predicted segmentations, respectively. Arrows highlight regions where Usformer performs notably better than the baselines. Viewing this figure in color is advised in the printed edition.

Fig. 7. Three-dimensional renderings of the best, median, and worst left atrium segmentations produced by our method in terms of the 3D dice score. The first and second columns are from the challenge and NU datasets, respectively. The distance from the manual segmentation to the prediction is indicated by the surface color. For improved visualization, the surface distances are rescaled to the range of 0 to 10 mm. Arrows (1) and (2) highlight errors at the mitral valve (MV) and pulmonary veins (PV), respectively. Viewing this figure in color is advised in the printed edition.