Browsing by Subject "rst"

  • Haverinen, Jonas (2022)
    Communication, by nature, is multimodal: it uses various forms (modes) of communication, such as spoken language, written language, illustrations, and many others to create meaning. Multimodality research is the study of communicative situations that rely on such modes and their combinations. One form of multimodality very commonly seen in everyday life is the diagram, which can convey very complex concepts by combining visual expressive resources (such as illustrations or photographs), written language, and diagrammatic elements such as lines and arrows. The primary aim of my thesis is to establish whether the linguistic structures of written labels – that is, textual elements – in diagrams can inform the decomposition of visual expressive resources. Put simply, I seek to find out whether these visual elements can be divided more accurately into further, more granular units in accordance with linguistic patterns in their accompanying textual elements. To answer my main research question, I pose three sub-questions: first, whether certain diagram types (macro-structures), such as tables, cycles, or cross-sections, co-occur with specific linguistic patterns; second, whether different rhetorical functions found in diagrams also employ different structures in their written labels; and third, whether these functions are signaled by other means in tandem with written language. Answering these questions can help in designing future multimodal corpora and their annotation schemata, increasing annotation accuracy and possibilities for their processing. The theoretical framework used in this thesis synthesizes theories from multimodality theory, discourse studies, and diagrams research. I approach diagrams from the perspective of multimodality, highlighting them as discursive artefacts.
This is enabled by the diagrammatic mode, which establishes how discourse semantics can function in the context of diagrams and how their interpretation is dynamic; that is, each element or combination of multiple elements can in turn contextualize or be a part of other elements and their combinations on a different scale. I also discuss the discourse-semantic concepts of coherence and cohesion as they relate to multimodal artefacts: different elements, even if not linguistic, can combine to create semantically meaningful connections between constituents in such an artefact. To exemplify this, I also apply Rhetorical Structure Theory (RST), which seeks to formalize how units of discourse are interconnected and work towards a shared communicative goal. RST employs rhetorical relations such as ELABORATION and IDENTIFICATION to describe how units and their combinations relate to other parts of a text (or other communicative whole). The data I use consists of two interrelated and complementary multimodal corpora: AI2D and AI2D-RST. AI2D is a collection of primary-school textbook science diagrams, annotated for blobs (visual expressive resources), labels, and diagrammatic elements, created for question-answering purposes. It also contains the linguistic data in each of the corpus’s diagrams. AI2D-RST contains a subset of the diagrams in AI2D, expanding them with additional annotation layers for information on macro-structures, visual connectivity, and RST, describing each element’s rhetorical relation in the diagram. I computationally find each rhetorical relation containing a label in AI2D-RST, noting its type, the type of the diagram it appears in, and fetching the labels’ linguistic content from AI2D. I then process each label’s contents with spaCy, a library for natural language processing, for linguistic elements such as phrase types, part-of-speech patterns, and average word counts. 
The output contains data on each label’s rhetorical relation, the possible macro-structure it is contained in, and said linguistic structures. The results show that there are indeed some differences in how distinct rhetorical relations and macro-groups use language: for example, cycles contain the most verb phrases and the highest word count, indicating the use of written language to explicate certain processes to viewers. As linguistic patterns differ across these classes and are contextualized by surrounding diagrammatic elements, approaching diagrams from a discursive standpoint may be beneficial for future empirical multimodality research as well as for designing annotation schemata that are more intuitive for annotators. With larger datasets and further research, precise sets of rules containing linguistic structures and layout information may be developed to increase accuracy in probability-based computational analysis of diagrams.
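
The aggregation step described in the abstract (grouping labels by rhetorical relation or macro-structure and comparing word counts) might look roughly like the sketch below. It is not the thesis's code: the records are invented for illustration and are not drawn from AI2D or AI2D-RST, the function name `average_word_count` is hypothetical, and plain whitespace splitting stands in for the spaCy processing the abstract mentions.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (rhetorical relation, macro-structure, label text).
# Invented for illustration; not taken from AI2D or AI2D-RST.
LABELS = [
    ("IDENTIFICATION", "cross-section", "mantle"),
    ("IDENTIFICATION", "cross-section", "outer core"),
    ("ELABORATION", "cycle", "water evaporates from the sea"),
    ("ELABORATION", "cycle", "clouds release rain"),
]


def average_word_count(records, key_index):
    """Group label texts by the field at key_index and average their word counts."""
    counts = defaultdict(list)
    for record in records:
        text = record[2]
        # Assumption: whitespace splitting replaces spaCy tokenization here.
        counts[record[key_index]].append(len(text.split()))
    return {key: mean(values) for key, values in counts.items()}


# Average label length per rhetorical relation and per macro-structure.
by_relation = average_word_count(LABELS, 0)
by_macro = average_word_count(LABELS, 1)
```

Keeping the grouping key as a parameter lets the same function compare labels across rhetorical relations and across macro-structures, which are the two groupings the abstract contrasts.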