Abstract
Attention layers have contributed to state-of-the-art results on vision tasks. Still, they leave room for improvement: they use position information in a fixed manner, and their computational cost is typically high. To mitigate both issues, we propose a convolution-style local attention layer (LA-layer) as a replacement for traditional attention
...
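This excerpt does not describe the LA-layer's internals. For reference only, here is a minimal sketch of generic convolution-style local self-attention. It assumes single-head attention in which each position attends only to its k×k neighborhood, like a convolution kernel, and it adds a learned relative-position bias to the attention logits. This is not the proposed LA-layer. The function `local_attention` and the parameters `w_q`, `w_k`, `w_v`, and `rel_bias` are hypothetical names chosen for illustration.

```python
import numpy as np

def local_attention(x, w_q, w_k, w_v, rel_bias, k=3):
    """Hypothetical single-head k x k local self-attention over an (H, W, C) map.

    Each output position attends only to the k x k window centred on it.
    Out-of-bounds neighbours are skipped (masked), not zero-padded.
    rel_bias is an assumed (k, k) table of learned relative-position biases.
    """
    H, W, _ = x.shape
    r = k // 2
    q = x @ w_q    # queries, shape (H, W, D)
    key = x @ w_k  # keys, shape (H, W, D)
    val = x @ w_v  # values, shape (H, W, Dv)
    scale = np.sqrt(q.shape[-1])
    out = np.zeros((H, W, val.shape[-1]))
    for i in range(H):
        for j in range(W):
            logits, vals = [], []
            # Slide over the local window, as a convolution kernel would.
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < H and 0 <= jj < W:
                        score = q[i, j] @ key[ii, jj] / scale
                        logits.append(score + rel_bias[di + r, dj + r])
                        vals.append(val[ii, jj])
            logits = np.array(logits)
            # Numerically stable softmax over the valid neighbours only.
            weights = np.exp(logits - logits.max())
            weights /= weights.sum()
            out[i, j] = weights @ np.array(vals)
    return out
```

For example, `local_attention(x, w_q, w_k, w_v, np.zeros((3, 3)))` on an `x` of shape `(5, 6, 4)` returns a `(5, 6, Dv)` map. Restricting each query to k² neighbours makes the per-position cost O(k²) rather than the O(H·W) of global attention, which is the usual motivation for local attention layers.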