To address tracking failures caused by target deformation, flipping, and occlusion in visual tracking, a template updating algorithm based on image structural similarity is proposed, which dynamically updates the template to adapt to changes in the target during tracking. A tracking feature enhancement module and a segmentation feature enhancement module are also designed on top of the SiamMask network. The tracking feature enhancement module consists of non-local operations and convolutional downsampling; it establishes contextual correlation, enhances the target features, suppresses background interference, improves tracking robustness, and addresses the feature attenuation problem caused by target occlusion. The segmentation feature enhancement module introduces the convolutional block attention module and deformable convolution to improve the network’s ability to capture channel and spatial features, adaptively learn the shape and contour information of the target, and enhance the segmentation accuracy of the tracked target, which in turn improves the tracking accuracy. Experiments demonstrate that, compared with the baseline SiamMask, the proposed algorithm performs well and stably on the aforementioned problems, improving the expected average overlap (EAO) by 0.052, 0.053, and 0.025 and the robustness by 0.06, 0.079, and 0.156 on the VOT2016, VOT2018, and VOT2019 datasets, respectively. It also achieves a real-time speed of 91 frames per second on average.
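
The idea of a similarity-driven template update can be illustrated with a minimal sketch. The abstract does not give the update rule, so everything specific here is an assumption made for illustration only: the single-window (global) form of SSIM, the function names `global_ssim` and `maybe_update_template`, and the hypothetical thresholds `low` and `high`. The band rule replaces the template only when the candidate patch differs enough to suggest an appearance change, yet remains similar enough that it is unlikely to be an occluder or background.

```python
import numpy as np


def global_ssim(x, y, data_range=255.0):
    """Single-window SSIM between two same-sized grayscale patches.

    Simplification (assumption): statistics are computed over the whole
    patch rather than over local sliding windows.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    if x.shape != y.shape:
        raise ValueError("patches must have the same shape")
    # Standard SSIM stabilising constants (K1 = 0.01, K2 = 0.03).
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def maybe_update_template(template, candidate, low=0.3, high=0.8):
    """Return (template, updated) under a hypothetical band rule.

    - score >= high: target looks unchanged, keep the old template.
    - score <  low : candidate is too dissimilar (possible occlusion), keep.
    - otherwise    : appearance has drifted, adopt the candidate.
    The thresholds are illustrative, not values from the paper.
    """
    score = global_ssim(template, candidate)
    if low <= score < high:
        return np.array(candidate, dtype=np.float64), True
    return template, False
```

In a tracker, `candidate` would be the patch cropped at the predicted box in the current frame, resized to the template size before comparison.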