Kavli Affiliate: Jing Wang
| First 5 Authors: Jing Wang, Xiaofeng Liu, Fangyun Wang, Lin Zheng, Fengqiao Gao
| Summary:
Congenital heart disease (CHD) is the most common birth defect and the
leading cause of neonatal death in China. Clinical diagnosis can be based on
selected 2D key-frames from five views. Because multi-view data are scarce,
most methods have to rely on single-view analysis, which is insufficient.
This study proposes to automatically analyze the multi-view
echocardiograms with a practical end-to-end framework. We collect five-view
echocardiogram video records of 1308 subjects (including normal controls,
ventricular septal defect (VSD) patients and atrial septal defect (ASD)
patients) with both disease labels and standard-view key-frame labels.
Depthwise separable convolution-based multi-channel networks are adopted to
substantially reduce the number of network parameters. We also address the
class imbalance problem by augmenting the positive training samples. Our 2D
key-frame model distinguishes CHD from negative samples with an accuracy of
95.4%, and classifies samples as negative, VSD or ASD with an accuracy of
92.3%. To further reduce the work of key-frame selection in real-world
implementation, we propose an
adaptive soft attention scheme to directly explore the raw video data. Four
kinds of neural aggregation methods are systematically investigated to fuse the
information of an arbitrary number of frames in a video. Moreover, with a view
detection module, the system can work without the view records. Our video-based
model achieves an accuracy of 93.9% (binary classification) and 92.1%
(3-class classification) on a collected 2D video testing set, without
requiring key-frame selection or view annotation in testing. A detailed
ablation study and an interpretability analysis are provided.
| Search Query: ArXiv Query: search_query=au:"Jing Wang"&id_list=&start=0&max_results=3
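
The summary mentions soft attention as a way to fuse an arbitrary number of video frames into one representation. The following is a minimal sketch of that general idea, not the authors' implementation: the function names `softmax` and `attention_pool`, the dot-product scoring, and the `query` vector (standing in for a learned parameter) are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frame_features, query):
    """Fuse any number of per-frame feature vectors into one vector.

    frame_features: list of equal-length feature vectors, one per frame.
    query: hypothetical scoring vector (a stand-in for a learned parameter).
    Returns (pooled_vector, attention_weights).
    """
    if not frame_features:
        raise ValueError("at least one frame is required")
    # Score each frame by its dot product with the query vector.
    scores = [sum(q * x for q, x in zip(query, feat)) for feat in frame_features]
    # Normalize the scores into weights that sum to 1.
    weights = softmax(scores)
    dim = len(frame_features[0])
    # The pooled feature is the attention-weighted sum of the frame features.
    pooled = [
        sum(w * feat[d] for w, feat in zip(weights, frame_features))
        for d in range(dim)
    ]
    return pooled, weights
```

Because the weights are normalized over however many frames are supplied, the output has the same dimension for any video length. With an all-zero `query`, every frame receives equal weight and the sketch reduces to plain average pooling.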