A new way to display the spoken word in manuscripts, screenplays and speeches

In 1996, while in my third year at Central St Martins college of design, I devised a basic system for displaying the spoken word typographically. Since then I’ve expanded it into a full notation system. I’d really value your feedback and ideas on ways to improve it, as well as any input from speechwriters, screen and stage writers, and radio producers on whether it would be of any use to you.

As we listen to someone speak, subtle variations in intonation, volume, speed and rhythm contribute more to our understanding than words alone. Conventional typography communicates pure, refined content, stripped of most of the emotion. Unless we highlight a word with italic, any information about someone’s tone of voice must be annotated into the text. Such a marked difference between speech and text means that we have one voice for speaking and another for writing. Dictography tries to bridge this divide.

In dictographic notation, conventions of typographic and musical notation are combined and augmented. The four basic properties of speech (pitch, volume, tone and speed) are divided into separate channels. These elements are then encoded according to a set of rules that use relative position, visual weight (boldness and condensed-to-expanded width), background and text colour, and word spacing.
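For anyone thinking about a software version, the channel idea can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the post does not say which channel drives which visual property, so the pairing here (pitch to vertical position, volume to boldness, speed to word spacing, tone to colour) is a guess, and the names `SpokenWord`, `encode` and `TONE_COLOURS`, the 0 to 1 scales and the colour values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SpokenWord:
    """One word with its four speech channels (scales are assumed, not from the post)."""
    text: str
    pitch: float   # 0.0 = lowest, 1.0 = highest
    volume: float  # 0.0 = quietest, 1.0 = loudest
    speed: float   # 0.0 = slowest, 1.0 = fastest
    tone: str      # free-form label such as "warm"


# Hypothetical palette: the real notation's colours are not given in the post.
TONE_COLOURS = {"neutral": "#000000", "warm": "#b03a2e", "cold": "#1f618d"}


def clamp(value: float) -> float:
    """Keep a channel value inside the assumed 0..1 range."""
    return max(0.0, min(1.0, value))


def encode(word: SpokenWord) -> dict:
    """Map each channel to one visual property (the pairing is an assumption)."""
    return {
        "text": word.text,
        # Pitch moves the word above or below the baseline, like a note on a stave.
        "baseline_shift_em": round(clamp(word.pitch) - 0.5, 2),
        # Volume picks a font weight from 100 (thin) to 900 (black).
        "font_weight": 100 + round(clamp(word.volume) * 8) * 100,
        # Faster speech closes up the space before the next word.
        "word_spacing_em": round(1.0 - clamp(word.speed) * 0.75, 2),
        # Tone selects a text colour, falling back to neutral.
        "colour": TONE_COLOURS.get(word.tone, TONE_COLOURS["neutral"]),
    }
```

Keeping each channel in its own field mirrors the idea of separate channels: any one of them can be re-mapped to a different visual property without touching the others.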

That basic framework is further augmented with symbols relating to individual vocal characteristics: a key signature gives information on overall tone, the speakers’ sex, nationality and accent, and the standard pitch of their voices (think bass, baritone, treble and soprano). The ends of phrases or sentences are marked with a large blue dot. Phrases needing exclamation or question marks add Spanish-style inverted marks before the phrase, giving readers advance warning. Any non-specific vocal sound, such as a tut or a click of the tongue, is indicated with an orange star. Likewise, a trembling voice is represented by a trill mark. Finally, en dashes are replaced with a discreet arrowhead because of the risk of conflicts with the stave lines.
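Some of these punctuation conventions are mechanical enough to sketch in Python. The version below is a rough illustration, not the actual notation: plain text cannot show colour, so `●`, `★` and `▸` merely stand in for the blue dot, the orange star and the arrowhead. It also assumes that vocal sounds arrive written in square brackets (for example `[tut]`), that the dot replaces the full stop, and that questions and exclamations keep their closing mark. The function name `mark_phrase` is hypothetical, and the trill mark and key signature are left out.

```python
import re

END_DOT = "●"     # stand-in for the large blue dot
SOUND_STAR = "★"  # stand-in for the orange star
ARROWHEAD = "▸"   # stand-in for the arrowhead that replaces en dashes
INVERTED = {"?": "¿", "!": "¡"}


def mark_phrase(phrase: str) -> str:
    """Apply the punctuation conventions to one phrase (input format is assumed)."""
    phrase = phrase.strip()
    # En dashes would clash with the stave lines, so swap them for arrowheads.
    phrase = phrase.replace("\u2013", ARROWHEAD)
    # Assumed input convention: non-specific sounds are written as [tut], [click].
    phrase = re.sub(r"\[[^\]]*\]", SOUND_STAR, phrase)
    if not phrase:
        return phrase
    last = phrase[-1]
    if last in INVERTED:
        # Spanish-style inverted mark at the start warns the reader early.
        return INVERTED[last] + phrase
    if last == ".":
        # Assumption: the dot replaces the full stop rather than following it.
        return phrase[:-1].rstrip() + " " + END_DOT
    return phrase
```

For example, `mark_phrase("Are you sure?")` gives `¿Are you sure?`, and `mark_phrase("I went home.")` gives `I went home ●`.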

The inside front cover looks, at first glance, entirely abstract, but it is in fact a rendering of a scene from The Archers, used as a sample:

Analysing a five-minute scene from BBC Radio 4’s The Archers took a huge amount of work, codifying each speaker’s pitch, speed, volume and emotional cues. It should be possible to automate this using a software tool, or perhaps one day even to convert recorded speech directly into dictographic notation. That would give a different look to Hansard!

This close-up view shows the level of detail I had to go into in order to produce the transcript:

Obviously, this system still has serious limitations. It cannot truly portray the vast subtlety of vocal dynamics, harmonics and the barrage of other variables inherent in anything as complex as human speech.

But by adding several more layers of data to the text stream, it can offer a richer reproduction than conventional text and annotations can, and I hope that might be of value to professionals whose work depends on conveying meaning through speech.
