ViT-ConvGAN: a hybrid model for spatiotemporal action recognition using video transformer and 3D CNN
We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors ...
Human action recognition in close-contact sports is hindered by mutual occlusion, rapid pose changes, and distracting backgrounds. We study freestyle wrestling—a representative close-contact setting ...