Poor audio quality often means a customer can’t complete the objective of their call. It has a negative effect on average call duration, customer experience and call abandonment rates for contact centres.
Most large companies with contact centres recognise this, and so often pay their telecoms providers for the highest quality lines to give their customers the best possible audio.
However, in digital networks, carriers can transcode audio from high-quality codecs such as G711 to lower-quality codecs such as G729 or GSM. This may happen consistently, or only intermittently so that the carrier can save bandwidth at certain times of the day.
Common codec scale
|Codec||Audio quality and bandwidth usage|
|G711||Will produce ‘excellent’ audio quality but comes with higher bandwidth usage.|
|G729||Can only ever produce ‘very good’ audio quality but comes at a lower bandwidth usage than that of G711.|
|GSM||Can only produce ‘good’ audio quality but comes with the lowest bandwidth usage out of the commonly used codecs.|
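To give a rough sense of why a carrier might transcode, the sketch below compares total bandwidth for a batch of concurrent calls. The per-call figures are the ones quoted in the PESQ table later in this article; the function names and the call count of 100 are hypothetical, chosen only for illustration.

```python
# Per-call bandwidth in Kbps, taken from the figures quoted in this article.
CODEC_KBPS = {"G711": 80, "G729": 32, "GSM": 28}


def total_kbps(codec: str, concurrent_calls: int) -> int:
    """Total bandwidth needed to carry the given number of calls."""
    return CODEC_KBPS[codec] * concurrent_calls


def saving_pct(from_codec: str, to_codec: str) -> float:
    """Percentage of per-call bandwidth saved by transcoding."""
    before = CODEC_KBPS[from_codec]
    after = CODEC_KBPS[to_codec]
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    calls = 100  # hypothetical number of concurrent calls
    for codec in CODEC_KBPS:
        print(f"{codec}: {total_kbps(codec, calls)} Kbps for {calls} calls")
    print(f"G711 -> G729 saves {saving_pct('G711', 'G729'):.0f}% per call")
```

On these figures, moving a call from G711 to G729 cuts its bandwidth by 60%, which is the saving that makes intermittent transcoding attractive to a carrier.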
Where transcoding is happening, the telecoms team for the contact centre has no visibility of this and, even if they suspect it may be happening, it can be difficult to pinpoint and address with suppliers.
So, what are the different ways of measuring audio quality in telecoms?
MOS (Mean Opinion Score) is the traditional method of measuring audio quality. Originally it involved sitting real people in a room to listen to audio and establish a subjective quality score based on their opinion.
Nowadays, this measure is still in use, but a score is generated by analysing network performance metrics such as packet loss, jitter and latency.
Scores range from 1 (bad) to 5 (excellent).
The main drawback of using MOS to judge audio quality is that it doesn’t look at real audio. The score generated is an assumption of audio quality based on the performance of data on your internal network. MOS will assume the audio quality is as good as your network performance is, yet if you put poor quality audio into an excellent network, it will still be poor audio on the other side.
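The point can be made concrete with a minimal sketch of how a MOS estimate is typically derived from network metrics alone. The R-factor to MOS mapping below follows the ITU-T G.107 E-model conversion formula. The way the R-factor itself is estimated from latency, jitter and packet loss is a widely circulated simplification, not the full E-model, and is an assumption made here for illustration, as are the function names.

```python
def r_factor_to_mos(r: float) -> float:
    """Map an R-factor to an estimated MOS (ITU-T G.107 conversion)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    mos = 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6
    # The polynomial dips slightly below 1 for very small R, so clamp it.
    return max(1.0, mos)


def estimate_r_factor(latency_ms: float, jitter_ms: float, loss_pct: float) -> float:
    """Simplified R-factor heuristic (an assumption, not the full E-model)."""
    effective_latency = latency_ms + 2.0 * jitter_ms + 10.0
    if effective_latency < 160.0:
        r = 93.2 - effective_latency / 40.0
    else:
        r = 93.2 - (effective_latency - 120.0) / 10.0
    r -= 2.5 * loss_pct  # penalise packet loss
    return max(0.0, min(100.0, r))


def estimate_mos(latency_ms: float, jitter_ms: float, loss_pct: float) -> float:
    """Estimated MOS from network metrics only; no audio is examined."""
    return r_factor_to_mos(estimate_r_factor(latency_ms, jitter_ms, loss_pct))


if __name__ == "__main__":
    print(round(estimate_mos(latency_ms=20, jitter_ms=5, loss_pct=0), 2))
    print(round(estimate_mos(latency_ms=150, jitter_ms=30, loss_pct=3), 2))
```

Note that none of these functions takes an audio signal as input. A degraded recording sent over a clean network would receive the same score as a pristine one, which is the limitation described above.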
PESQ (Perceptual Evaluation of Speech Quality) is an ITU (International Telecommunication Union) standard for measuring audio quality.
It’s a ‘full reference’ (FR) algorithm, meaning that it uses a reference signal for comparison against the test (ie a difference analysis). It compares a sample of the reference signal (talker side) to a corresponding sample of the degraded signal (listener side). The comparison produces an objective measure based on how much the signal has been degraded from the original.
This delivers higher accuracy because there are no assumptions being made. The drawback is that the process can only be applied to dedicated tests in live networks (ie you can’t take a measure of quality from a given customer’s call as there is no reference signal for comparison).
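The full-reference idea can be illustrated with a deliberately simple difference measure. The sketch below is not PESQ: PESQ applies time alignment and a perceptual model, whereas this computes a plain signal-to-noise ratio between a reference and a degraded signal. It is included only to show why a reference signal is needed; the function name and the test tone are hypothetical.

```python
import math


def snr_db(reference, degraded):
    """Signal-to-noise ratio in dB of a degraded signal against its reference.

    Both signals must already be time-aligned and of equal length.
    """
    if len(reference) != len(degraded):
        raise ValueError("signals must be the same length and time-aligned")
    signal_energy = sum(r * r for r in reference)
    if signal_energy == 0:
        raise ValueError("reference signal is silent")
    noise_energy = sum((r - d) ** 2 for r, d in zip(reference, degraded))
    if noise_energy == 0:
        return math.inf  # degraded signal is identical to the reference
    return 10.0 * math.log10(signal_energy / noise_energy)


if __name__ == "__main__":
    # One second of a 440 Hz tone sampled at 8 kHz (hypothetical test signal).
    reference = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]
    degraded = [0.5 * s for s in reference]  # simulate a loss in volume
    print(f"{snr_db(reference, degraded):.2f} dB")
```

Without the `reference` argument there is nothing to compare against, which is why a full-reference measure cannot be taken from an arbitrary customer call.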
The audio file being measured is:
- A WAV file
- PCM encoded
- 16-bit mono
PESQ takes into account:
- Audio sharpness
- Call volume
- Background noise
- Variable latency or lag in audio
- Audio interference
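The file-format requirements listed above (WAV, PCM encoded, 16-bit mono) can be checked before a test run. The following is a minimal sketch using Python's standard `wave` module; the function name `check_pesq_input` is hypothetical, and the check covers only those three properties.

```python
import wave


def check_pesq_input(path: str) -> list[str]:
    """Return a list of problems found; an empty list means the file passes."""
    problems = []
    try:
        with wave.open(path, "rb") as wav:
            if wav.getcomptype() != "NONE":  # "NONE" means uncompressed PCM
                problems.append("not PCM encoded")
            if wav.getsampwidth() != 2:  # sample width is reported in bytes
                bits = 8 * wav.getsampwidth()
                problems.append(f"expected 16-bit samples, got {bits}-bit")
            if wav.getnchannels() != 1:
                channels = wav.getnchannels()
                problems.append(f"expected mono, got {channels} channels")
    except (wave.Error, EOFError) as exc:
        # Raised when the file is not a WAV file the module can parse.
        problems.append(f"not a readable PCM WAV file: {exc}")
    return problems
```

For example, `check_pesq_input("recording.wav")` (a hypothetical file name) returns an empty list when the file meets all three requirements.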
|PESQ score||Listening effort scale||Codec required||Corresponding bandwidth|
|3.80 - 4.50||Complete relaxation possible; no effort required||G711 or above||80 Kbps|
|3.30 - 3.79||Attention necessary; no appreciable effort required||G729 or above||32 Kbps|
|2.80 - 3.29||Attention necessary; small amount of effort required||GSM or above||28 Kbps|
|2.40 - 2.79||Moderate effort required||-||-|
|2.00 - 2.39||Considerable effort required||-||-|
|1.00 - 1.99||No meaning understood with any feasible effort||-||-|
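For reporting, the bands in the table above can be turned into a simple lookup. This is a minimal sketch: the thresholds and descriptions are copied from the table, while the function name `listening_effort` is hypothetical.

```python
# (lower bound of band, listening effort description), highest band first.
PESQ_BANDS = [
    (3.80, "Complete relaxation possible; no effort required"),
    (3.30, "Attention necessary; no appreciable effort required"),
    (2.80, "Attention necessary; small amount of effort required"),
    (2.40, "Moderate effort required"),
    (2.00, "Considerable effort required"),
    (1.00, "No meaning understood with any feasible effort"),
]


def listening_effort(score: float) -> str:
    """Return the listening effort description for a PESQ score."""
    if not 1.0 <= score <= 4.5:
        raise ValueError("score must be between 1.00 and 4.50")
    for lower_bound, description in PESQ_BANDS:
        if score >= lower_bound:
            return description
    raise AssertionError("unreachable: every valid score falls in a band")


if __name__ == "__main__":
    for sample in (4.1, 3.5, 2.1):
        print(sample, "->", listening_effort(sample))
```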
Similar to PESQ, POLQA (Perceptual Objective Listening Quality Analysis) is a ‘full reference’ (FR) algorithm that compares the output of a test with an original reference signal to produce an objective measure of the difference between the two.
POLQA is the successor of PESQ - it works in a similar way but can handle higher bandwidth audio signals including super-wideband (HD) and full-band voice signals, as well as the most recent voice coding and VoIP/VoLTE transmission technologies.
So, what measures do we recommend?
To recap, the measures available to you depend on the bandwidth of voice signal you are using:
|Voice signal||Measures available|
|Common telephony band||PESQ, POLQA|
|HD voice/super wideband||POLQA|
|Full-band speech signals||POLQA|
Because of the subjective, assumptive nature of MOS, as well as its inability to take into account what’s happening outside of your internal network, we would never recommend MOS as an accurate measure of audio quality.
As the majority of voice calls are made over PSTN and mobile networks (which use narrowband codecs), we use PESQ in the vast majority of our audio quality testing. We recommend it because it measures a real audio sample, allows for an end-to-end assessment of the full audio path (looking at audio sharpness, background noise, loss in volume and clipping) and produces an objective measure based on this. In most cases, as long as you’re not using wideband or HD voice, PESQ has no significant downside compared to POLQA.
Where you are using wideband or HD voice, POLQA is the best way to measure audio quality. Like PESQ, it measures a real audio sample, allows for end-to-end assessment and produces an objective measure, with the added flexibility to be used for HD and wideband frequencies. It’s a measure we’re beginning to introduce as we find more of our customers using wideband and HD voice, but there’s no significant advantage to using it otherwise. If you're using standard narrowband voice, then PESQ is still your best option.