Speech-to-Text Alignment

James J. Fiumara

Sample text:
"If I Told Him: A Completed Portrait of Picasso"

Speech-to-text alignment is the process of creating a correspondence between a unit of speech and a unit of text (i.e., a segment of audio is linked to a segment of text). The correspondence is then indexed, allowing a user to listen to the speech that corresponds to a word, an utterance, a sentence, or a unit of some other granularity. A corpus of aligned speech and text can be used to develop and test automatic speech recognition (ASR) systems and multimedia language-learning systems, and to support linguistic research in areas such as phonetics and speech synthesis. The Linguistic Data Consortium (LDC) develops and distributes speech-to-text aligned databases to the research community, often created with the assistance of a tool called Transcriber.
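To make the idea of an indexed correspondence concrete, here is a minimal Python sketch of an alignment index. It is not Transcriber's data model or file format. The class names (Segment, AlignmentIndex) and the timings are invented for illustration; only the two sentences of text come from the opening of Stein's poem.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One aligned unit: a span of audio linked to a span of text."""
    start: float  # seconds from the beginning of the recording
    end: float    # seconds; the segment covers start <= t < end
    text: str


class AlignmentIndex:
    """Hypothetical index over aligned segments (illustrative only)."""

    def __init__(self, segments):
        self.segments = sorted(segments, key=lambda s: s.start)
        self._starts = [s.start for s in self.segments]

    def at_time(self, t):
        """Return the segment being spoken at time t, or None."""
        i = bisect_right(self._starts, t) - 1
        if i >= 0 and t < self.segments[i].end:
            return self.segments[i]
        return None

    def find_text(self, query):
        """Return every segment whose text contains the query."""
        q = query.lower()
        return [s for s in self.segments if q in s.text.lower()]


# Timings below are made up; they are not measured from the recording.
segments = [
    Segment(0.0, 2.4, "If I told him would he like it."),
    Segment(2.4, 4.9, "Would he like it if I told him."),
]
index = AlignmentIndex(segments)
```

With such an index, a player could call index.find_text("told him") to get the start and end times to play back, or index.at_time(3.0) to highlight the text being read at that moment.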

Although not specifically designed for poetic and literary uses, the Transcriber tool was used to align the text of Gertrude Stein’s “If I Told Him: A Completed Portrait of Picasso” with an audio recording of the poet reading her work. The use of language technology as a tool to assist in the analysis or teaching of poetry and prose literature is only beginning to be explored. At minimum, a speech-text aligned poem can provide a more layered experience for a reader-listener, allowing easy access to both the text of a poem and its corresponding spoken rendition. Here poetry as sound takes its proper place alongside the poem as a textual object. An easy-to-use interface providing simultaneous access to the text and audio of a poem could also prove pedagogically useful in the classroom.

The most challenging task in aligning a work of poetry is deciding on the granularity of the alignment. Should the correspondence between sound and text follow textual line breaks or stanzas? Or should it follow the audio performance, using breath pauses? Additionally, how does one segment a modernist or abstract poem like Stein’s, where line and stanza breaks are not obvious, without losing the poetic force of the repetitions and rhythms within the poem? Segmenting and aligning a poem becomes a creative and interpretative act in itself.
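As one illustration of the performance-based option, the Python sketch below groups word-level timings into segments wherever the silence between two words reaches a threshold. It is not the method used for this project. The function name, the threshold, and the word timings are all hypothetical, and choosing the threshold is itself one of the interpretative decisions described above.

```python
def group_by_pause(words, min_pause=0.25):
    """Group (start, end, token) tuples into pause-delimited segments.

    `words` must be sorted by start time. A new segment begins whenever
    the gap between one word's end and the next word's start is at
    least `min_pause` seconds (an assumed, tunable threshold).
    """
    groups = []
    current = []
    for word in words:
        if current and word[0] - current[-1][1] >= min_pause:
            groups.append(current)
            current = []
        current.append(word)
    if current:
        groups.append(current)
    # Collapse each group to (segment start, segment end, joined text).
    return [(g[0][0], g[-1][1], " ".join(w[2] for w in g)) for g in groups]


# Invented timings, with a half-second pause after "him".
words = [
    (0.0, 0.5, "If"), (0.5, 0.75, "I"), (0.75, 1.25, "told"),
    (1.25, 1.5, "him"), (2.0, 2.5, "would"), (2.5, 2.75, "he"),
    (2.75, 3.0, "like"), (3.0, 3.25, "it"),
]
segments_by_pause = group_by_pause(words)
```

A larger min_pause yields fewer, longer segments that keep repetitions together; a smaller one yields short segments that track the reader's breath more closely.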

This project is a work in progress. In time, we plan to update the webpage with such elements as embedded streaming audio clips, metadata, and a more aesthetically pleasing and user-friendly design. This project was created in collaboration with the Linguistic Data Consortium. Special thanks to Chris Cieri and Shawn Medero for their advice and assistance.

Contact information: James J. Fiumara (email: