Abstract
This thesis investigates the temporal and semantic alignment between verbal language and hand gestures. The two forms of alignment are treated as fundamental components of the joint interpretation of speech and gesture. Regarding temporal alignment, the main hypothesis is that prosodic cues (together with other linguistic parameters, such as constituency) guide the first step of the joint interpretation of language and gestures, determining the restricted set of verbal items to which the gesture is related. The alignment is assumed to be determined by quasi-simultaneous peaks in the contours of prosody (considered as a combination of pitch, rhythm and amplitude) and hand activity (roughly, the quantity of observable kinetic activity). The hypothesis is tested in two experiments. The first measures subjects' sensitivity to the temporal alignment (or absence thereof) between speech and gesture. The second is designed to isolate the effect of prosodic cues on the successful integration of the two communicative channels. The results support the hypothesis that the prosodic contour is the main factor determining the 'point of attachment' of gestures in the linguistic structures they accompany.
The alignment between the meaning conveyed by verbal language and gestures is reconstructed on the basis of a formal theory of meaning. The analysis is restricted to the class of iconic gestures, and in particular to those used to represent spatio-temporal properties or events. The meaning of a gesture is taken to correspond to a function that monotonically restricts the meaning of the accompanying verbal expression. The restriction takes the form of an intersection between the spatio-temporal extensional reference of the verbal component and the class of spatio-temporal structures that are iconically equivalent to the virtual space described by the gesture. Iconic equivalence is defined as the relation that holds between two spatial structures that share a similar set of spatial features. In the thesis this relation is formalized as a logical language that allows the description of different levels of 'similarity' between spatial structures, corresponding to the different modes of representation observed in gestures.
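The monotonic-restriction analysis can be sketched schematically as follows; the notation here is purely illustrative and is not the thesis's own formalism:

```latex
% Illustrative sketch only; all symbols are assumptions, not the thesis's notation.
\[
\llbracket V \oplus G \rrbracket \;=\;
\llbracket V \rrbracket \;\cap\;
\{\, s \mid s \approx_{\mathrm{ic}} \sigma(G) \,\}
\]
% V            : the verbal expression
% G            : the accompanying iconic gesture
% sigma(G)     : the virtual space described by the gesture
% approx_ic    : iconic equivalence between spatial structures
% The gesture's contribution is a monotone restriction: the joint
% denotation is always a subset of the verbal denotation alone.
```

On this sketch, the gesture never overrides the verbal meaning; it can only narrow the set of spatio-temporal structures the utterance may refer to.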
The thesis describes the use of the theory in the design and implementation of a computational prototype capable of producing verbal and gestural descriptions of simple physical scenes. With the help of the prototype, two experiments were developed to test the validity of the hypothesis regarding the semantic alignment between speech and gestures. The first measures the extent to which the specific form of iconicity proposed in the thesis captures the modes of representation observable in natural gestures. The second tests specific predictions of the theory connected to the combinatorial properties of the two communicative channels. In both cases the experimental results corroborate the proposed analysis.