In everyday communication, semantic information is transmitted both by speech and by the gestures that accompany it. Listeners therefore need to monitor two quite different sources of information more or less simultaneously, yet we know little about the nature or timing of this process. This study used a remote eye tracker to analyse participants’ attentional focus on speech-gesture combinations that differed in both span and viewpoint. Participants spent most of the time fixating the face and just 2.1% of the time looking at gestures, but for certain categories of gesture, up to 26.5% of the stroke phases were successfully fixated. In other words, visual attention moves unconsciously and quickly to these information-rich movements. Low-span Character-Viewpoint gestures attracted the most fixations and were looked at longest. Such gestures are particularly communicative, and the way they attract visual attention may well be a crucial factor in this.