Visualising the spoken word with dictographic notation
In 1996, while in my third year at Central St Martins College of Design, I devised a basic notational system for visualising the spoken word. The system, which I named “dictography”, adapted aspects of standard musical notation as well as typographic conventions. Since that first version I have expanded dictography into a more complete notation system.
As we listen to someone speak, subtle variations in intonation, volume, speed and rhythm contribute more to our understanding than words alone. Conventional typography communicates pure, refined content, stripped of most of its emotion. Unless we highlight a word with italics, any information about someone’s tone of voice must be annotated into the text. Such a marked difference between speech and text means that we have one voice for speaking and another for writing. Dictography tries to bridge this divide.
In dictographic notation, conventions of typographic and musical notation are combined and augmented. The four basic properties of speech (pitch, volume, tone and speed) are divided into separate channels. These channels are then encoded according to a set of rules which use relative position, visual weight (boldness and condensed-to-expanded width), background and text colour, and word spacing.
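To make the idea of separate channels concrete, here is a minimal Python sketch of how one annotated word might be represented. It is an illustration only: the pairing of each speech property with a particular visual attribute is my assumption (the text above lists both sets but does not say which maps to which), and the names `CHANNEL_TO_VISUAL`, `SpokenWord` and `describe` are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical pairing of speech channels with visual attributes.
# Both lists come from the description above; the pairing itself is assumed.
CHANNEL_TO_VISUAL = {
    "pitch": "relative position on the stave",
    "volume": "visual weight",
    "tone": "background and text colour",
    "speed": "word spacing",
}


@dataclass
class SpokenWord:
    """One word plus a value for each of the four channels."""
    text: str
    pitch: int   # signed level relative to the speaker's norm (assumed scale)
    volume: int  # signed level relative to the speaker's norm (assumed scale)
    tone: str    # free-text label, e.g. "warm" (assumed representation)
    speed: int   # signed level relative to the speaker's norm (assumed scale)


def describe(word):
    """List how each channel of a word would be rendered, one line per channel."""
    values = {
        "pitch": word.pitch,
        "volume": word.volume,
        "tone": word.tone,
        "speed": word.speed,
    }
    return [
        f"{channel} -> {CHANNEL_TO_VISUAL[channel]}: {values[channel]}"
        for channel in CHANNEL_TO_VISUAL
    ]
```

Calling `describe(SpokenWord("hello", 2, 1, "warm", -1))` returns four lines, one per channel, which is roughly the information a single word carries in the notation.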
That basic framework is further augmented with symbols relating to individual vocal characteristics. A key signature gives information on overall tone, the speaker’s sex, nationality and accent, and the standard pitch of the voice (think bass, baritone, treble and soprano). The ends of phrases or sentences are marked with a large blue dot. Phrases that need exclamation or question marks carry Spanish-style inverted marks before the phrase, giving readers advance warning. Any non-specific vocal sound, such as a tut or a click of the tongue, is indicated with an orange star. Likewise, a trembling voice is represented by a trill mark. Finally, en dashes are replaced with a discrete arrowhead because of the risk of conflicts with the stave lines.
The inside front cover looks entirely abstract at first glance, but it is in fact a sample rendering of an episode of The Archers:
Analysing a five-minute scene from BBC Radio 4’s The Archers took a huge amount of work: I had to codify each speaker’s pitch, speed, volume and emotional cues. It should be possible to automate this using a software tool, or perhaps one day even to convert recorded speech directly into dictographic notation. That would give a different look to Hansard!
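As a rough idea of what a first step towards automation could look like, the Python sketch below turns a raw measurement for each word into a signed level relative to the speaker's usual value. Everything in it is an assumption for illustration: no such tool exists yet, the function names are hypothetical, the pitch values in hertz and the 20 Hz step size are arbitrary, and extracting the measurements from audio is not shown.

```python
def relative_level(value, baseline, step, max_level=3):
    """Quantise a measurement into a signed level around the speaker's baseline.

    The level is the number of whole steps above or below the baseline,
    clamped to the range -max_level..max_level.
    """
    level = round((value - baseline) / step)
    return max(-max_level, min(max_level, level))


def annotate(words, baseline_hz, step_hz=20.0):
    """Pair each word with its relative pitch level.

    `words` is a list of (text, pitch_in_hz) tuples supplied by the caller.
    """
    return [
        (text, relative_level(hz, baseline_hz, step_hz))
        for text, hz in words
    ]
```

For a speaker with an assumed baseline of 180 Hz, `annotate([("hello", 220.0), ("there", 160.0)], 180.0)` gives a level of 2 for the first word and -1 for the second. The same quantising step could be applied to volume and speed, each with its own baseline and step size.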
This close-up view shows the level of detail I had to go into in order to produce the transcript:
Obviously, this system still has serious limitations. It cannot truly portray the vast subtlety of vocal dynamics, harmonics and the barrage of other variables inherent in anything as complex as human speech.
But by adding several more layers of data to the text stream, it can offer a richer reproduction than conventional text and annotations can. I hope that might be of value to professionals whose work depends on conveying meaning through speech.
I would really value feedback and suggestions on any ways I could further improve this. I would especially love to hear from speechwriters, screen and stage writers, and radio producers. Would dictography be useful to you and your colleagues?