When I travel, I often find it difficult to understand the announcements broadcast through the P.A. system. Whether I am on the road or waiting for someone, airports, train stations and bus terminals all use similar technology, digitally assembling the announcement text from pre-recorded fragments.

Whenever an announcement starts on the speakers, I need to stop whatever I am doing and concentrate, memorizing the letters and digits of the flight number and re-assembling them myself.

All this is due to the way in which the audio clips used in the system are recorded, and can be fixed with just a little bit of extra effort.

The problem:

Consider this message:
The flight number TK1764 from London to New York is now boarding.

If you read this announcement aloud, you will have no problem understanding it and clearly communicating the message.

Now try reading this one aloud (pay attention to the punctuation):

The flight number.
T.
K.
1.
7.
6.
4.
From?
London.
To?
New York.
Is now boarding.

If you pronounced all the “sentences” correctly, chances are you ended up perfectly reproducing the way airport announcements sound.

During the recording, the voice talent reads all the letters of the alphabet, the digits and the destination names one by one. These are then stored on the P.A. computer as separate sound bits and re-assembled as each announcement requires.

The system, however, does not take into consideration the way humans process information.

The bits versus the words:

If you ask me for my telephone number, I would tell you it is: 535 – 784 – 0213. Whether you are taking note of it or trying to memorize it, breaking the number into easily processed “words” helps communicate it efficiently. What the brain recognizes are three “words”: fivethreefive, seveneightfour and ohtwoonethree.

If I gave you the entire number at once (5357840213, fivethreefiveseveneightfourohtwoonethree), the amount of information contained in a single stream would be overwhelming, unless you are professionally trained to remember numbers.

Similarly, if I break the number into single digits (5 – 3 – 5 – 7 – 8 – 4 – 0 – 2 – 1 – 3: five, three, five, seven, eight, four, zero, two, one, three), it becomes even more difficult to understand. While each “word” is easy to understand, the fragmentation of the entire “sentence” makes memorizing the number a hard task.

The fragmentation of the message is what makes P.A. announcements so difficult to understand.
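The three ways of delivering the telephone number can be illustrated with a few lines of Python. This is only a sketch; the function name `chunk` and the group sizes are my own assumptions, chosen to match the example above.

```python
def chunk(digits, sizes):
    """Split a digit string into consecutive groups of the given sizes."""
    if sum(sizes) != len(digits):
        raise ValueError("group sizes must add up to the number of digits")
    groups, start = [], 0
    for size in sizes:
        groups.append(digits[start:start + size])
        start += size
    return groups

print(chunk("5357840213", [3, 3, 4]))   # ['535', '784', '0213']
print(chunk("5357840213", [10]))        # one overwhelming stream
print(chunk("5357840213", [1] * 10))    # ten fragmented one-digit "words"
```

Only the first grouping gives the listener a handful of “words” of manageable size.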

The human brain likes patterns. It likes melodies and phrases that “flow naturally”. Let’s look at our flight number (TK1764) again. If you read it aloud several times, you will notice that you break it into two parts: the letters and the number. You most likely read it as: teekay onesevensixfour. You would also apply proper intonation and accents to each of these words. In the starting syllables the tone of your voice would rise, and towards the end it would descend again:

If “/” symbolizes a rising voice tone and “\” a descent, TK1764 would be pronounced as “/\ //\\”. Establishing a rhythm and melody within an abstract phrase aids pattern recognition and helps the brain process the information efficiently. The first “T” is clearly the beginning of the sentence, while “4” is the last word of it.

In the airport P.A. recordings, every letter and digit is pronounced as the last word of a sentence. They are read as if there were a period after each one. The resulting intonation looks like this: “\ \ \ \ \ \”. Without the much-needed melody, your brain needs to “record” each letter and digit and then re-assemble them into a meaningful word. This requires concentration and effort, resulting in a really inefficient mode of communication.
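The two intonation patterns can be reproduced with a small Python sketch. The rule used here (the first half of each “word” rises, the rest falls) is a simplifying assumption of mine that happens to match the notation above; real speech is of course more subtle, and both function names are hypothetical.

```python
def natural_contour(words):
    """Contour for words read as units: rise in the first half, fall in the rest."""
    parts = []
    for word in words:
        rising = len(word) // 2
        parts.append("/" * rising + "\\" * (len(word) - rising))
    return " ".join(parts)

def fragment_contour(words):
    """Contour when every symbol is recorded alone, as if followed by a period."""
    return " ".join("\\" for word in words for _ in word)

print(natural_contour(["TK", "1764"]))   # /\ //\\
print(fragment_contour(["TK", "1764"]))  # \ \ \ \ \ \
```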

The solution:

There are two distinct issues that need to be addressed when creating an automated P.A. system: the recording method and the assembly process.

During the recording, the voice talent should not be presented with a string of numbers such as: 1. 2. 3. 4. 5. 6. 7. 8. 9. 0. Instead, the reading sheet should contain ten four-digit numbers. They should be arranged so that each digit from 0 to 9 appears exactly once in each of the four positions, while the numbers still look completely random. Here is one of the possible arrangements:

1746
9062
7139
5928
8571
4203
0814
3697
6350
2485

It’s a little bit like a game of sudoku. When creating the script for the voice recording, it is important to note the following:

  1. The same digit should not occur in any of the numbers more than once
  2. The adjacent digits should never increase or decrease in value by one (no 23, 56, 98, 21 etc.)
  3. The numbers should look as random as possible

These precautions need to be in place, since the brain of the voice talent works just like everyone else’s: any pattern would be recognized immediately, resulting in an altered pronunciation of these segments (e.g. we pronounce 1234 differently than 7295).

A similar arrangement should be made for the letters, following all the rules that apply to the numbers.
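Writing such a sheet by hand is error-prone, so it may be easier to generate and check it automatically. Below is a minimal Python sketch of one way to do it, using randomized backtracking. The function names (`row_ok`, `sheet_ok`, `make_sheet`) are my own, and the third rule (“look as random as possible”) is only approximated by shuffling the candidates, so the result should still be reviewed by a person.

```python
import random

ROWS, COLS = 10, 4

def row_ok(row):
    """Rules 1 and 2: no repeated digit, no adjacent digits differing by one."""
    if len(set(row)) != len(row):
        return False
    return all(abs(a - b) != 1 for a, b in zip(row, row[1:]))

def sheet_ok(sheet):
    """Every row passes row_ok and every column holds each digit 0-9 once."""
    if len(sheet) != ROWS or any(len(row) != COLS for row in sheet):
        return False
    columns_ok = all(
        sorted(row[c] for row in sheet) == list(range(10)) for c in range(COLS)
    )
    return columns_ok and all(row_ok(row) for row in sheet)

def make_sheet(rng):
    """Fill the sheet cell by cell, backtracking whenever a rule is broken."""
    sheet = [[None] * COLS for _ in range(ROWS)]
    used = [set() for _ in range(COLS)]  # digits already placed in each column

    def fill(cell):
        if cell == ROWS * COLS:
            return True
        r, c = divmod(cell, COLS)
        candidates = [d for d in range(10) if d not in used[c]]
        rng.shuffle(candidates)
        for d in candidates:
            if not row_ok(sheet[r][:c] + [d]):
                continue
            sheet[r][c] = d
            used[c].add(d)
            if fill(cell + 1):
                return True
            used[c].discard(d)
            sheet[r][c] = None
        return False

    if not fill(0):
        raise RuntimeError("no valid sheet found")
    return sheet

for row in make_sheet(random.Random()):
    print("".join(str(d) for d in row))
```

Running `sheet_ok` on any hand-written sheet is a quick way to catch a stray “21” or “89” before the recording session.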

The recorded digits and letters should then be stored in separate libraries, each for a different position in the code:

First letter (contains A-Z)
Second letter (contains A-Z)
First digit (0-9)
Second digit (0-9)
Third digit (0-9)
Fourth digit (0-9)

During playback, each audio sample would be taken from the library that corresponds to the position of that letter or digit in the code.

When re-assembled into a flight code, all the letters and digits would now maintain a correct melody, as if read by a live announcer rather than by an automated system.
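The lookup itself is simple. Here is a rough Python sketch, assuming a two-letter, four-digit code like TK1764; the library names and the `.wav` file layout are purely hypothetical and stand in for however a real system stores its clips.

```python
import string

# Hypothetical library names, one per position in the flight code.
POSITIONS = ["letter1", "letter2", "digit1", "digit2", "digit3", "digit4"]
# Symbols allowed in each position: A-Z for the letters, 0-9 for the digits.
ALLOWED = [string.ascii_uppercase] * 2 + [string.digits] * 4

def clips_for_code(code):
    """Return one clip path per character, chosen by the character's position."""
    if len(code) != len(POSITIONS):
        raise ValueError(f"expected {len(POSITIONS)} characters, got {len(code)}")
    clips = []
    for library, allowed, symbol in zip(POSITIONS, ALLOWED, code):
        if symbol not in allowed:
            raise ValueError(f"{symbol!r} is not valid in position {library}")
        clips.append(f"{library}/{symbol}.wav")
    return clips

print(clips_for_code("TK1764"))
# ['letter1/T.wav', 'letter2/K.wav', 'digit1/1.wav', 'digit2/7.wav', 'digit3/6.wav', 'digit4/4.wav']
```

The same digit therefore maps to a different recording depending on where it stands: a “4” in the last position comes from the library of closing, falling digits, while a “4” in the first position comes from the library of opening ones.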

Finally, entire sentences should be recorded, so that words and phrases like “The flight number”, “from”, “to” etc. do not sound out of place. The names of destinations can be included in the script when the full sentences are recorded, and then used during the assembly.

Within the assembly process, special care should be taken with the timing and spacing between the words. Unnatural pauses between the words, letters and digits should be avoided, and the resulting sentences should resemble the intonation and melody of a live announcer.

Applications:

While the concept for this recording/assembly method was developed based on my experiences with airport announcements, similar techniques could be used in any transit terminal, answering machine, automated telephone system, and many other forms of audible information processing.

The messages created using this system will be easier to understand and memorize. They will also reduce the listeners’ sense of detachment, since they resemble natural speech patterns much more closely than the output of the currently used software.

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.