With respect to voice over IP, a codec is an algorithm used to encode and decode the voice conversation. Since voice and sound as we hear them are analogue, they need to be converted (or encoded) to a digital format suitable for transmission over the Internet. Once at the other end, the signal needs to be decoded again so the other person can hear what you are saying. There are a variety of ways this encoding and decoding can be done, many of which utilise compression in order to reduce the bandwidth required by the conversation. A key thing to remember with VoIP is that encoding, particularly when heavy compression is used, takes time, which adds a delay to the conversation. Thus, the holy grail is a codec which not only maintains good quality under compression, but can also encode and decode in a minimal amount of time.
These pages attempt to demystify codecs and give a brief overview of the different codecs and when they are used. It is important to keep in mind that different VoIP clients support different codecs, and each VoIP provider will only support a subset of the codecs too. Generally, when a VoIP call is established, you will need to use a codec that both parties and the provider support. No need to worry though: this sort of negotiation is handled automatically, but knowing the details will enable you to force or encourage certain codecs to be used. Understanding codecs will also help you understand why some VoIP clients sound better than others, and why voice quality with some providers, or through certain ISPs, is better than with others.
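As a rough illustration of the negotiation described above, the following Python sketch picks the first codec in the caller's preference order that the other party and the provider also support. The function name `pick_codec` and the codec lists are hypothetical, and real SIP clients negotiate through an SDP offer/answer exchange that is more involved than this.

```python
def pick_codec(caller_prefs, callee_codecs, provider_codecs):
    """Return the first codec in the caller's preference order that the
    callee and the provider both support, or None if there is none."""
    callee = set(callee_codecs)
    provider = set(provider_codecs)
    for codec in caller_prefs:
        if codec in callee and codec in provider:
            return codec
    return None


# Hypothetical codec lists, for illustration only.
caller = ["G.729", "iLBC", "G.711"]
callee = ["G.711", "iLBC", "GSM"]
provider = ["G.711", "G.729", "iLBC"]

# G.729 is skipped because the callee does not support it.
print(pick_codec(caller, callee, provider))  # iLBC
```

In this simplified model, reordering the preference list is how you would "encourage" a codec, and trimming the list down to a single entry is how you would force one.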
The following table lists the various codecs used in voice over IP, and in particular SIP. Many codecs come in a few varieties, and we have attempted to list all such versions of each codec. If you would like to voice your opinion about a particular codec, or discuss the merits of one over another, feel free to do so in our voice over IP forums.
| Codec | Sampling Rate (kHz) | Bandwidth (kbps) | Nominal Bandwidth (kbps) | Payload Size (ms) | License | Comments | Pros | Cons |
|---|---|---|---|---|---|---|---|---|
| DVI4 | unknown | unknown | unknown | | | Not a very common codec. | | |
| G.711 | 8 | 64 | 87.2 | 20 | Open Source | G.711u/a are often referred to as u-law/a-law, where a-law is the European version and u-law the US/Japanese version. | Designed to deliver precise transmission of speech. Very low processing overheads. | Including overheads, uses more than 64 kbps, thus at least 128 kbps of bandwidth in each direction is required. |
| G.722 | 16 | 48 | unknown | | Open Source | An ITU standard codec. | | |
| G.722 | 16 | 56 | unknown | 30 | | | | |
| G.722 | 16 | 64 | unknown | | | | | |
| G.723.1 | 8 | 5.3 | 20.8 | 30 | Proprietary | Often used by dialup VoIP users for optimal quality. | Very high compression whilst maintaining high quality audio. | Requires a lot of processor power. |
| G.723.1 | 8 | 6.3 | 21.9 | 30 | | | | |
| G.726 | 8 | 16 | unknown | | Open Source | An improved version of G.721 and G.723 (totally different from G.723.1). | CPU overhead is relatively low for the level of compression obtained. | |
| G.726 | 8 | 24 | 47.2 | 20 | | | | |
| G.726 | 8 | 32 | 55.2 | 20 | | | | |
| G.726 | 8 | 40 | unknown | | | | | |
| G.728 | unknown | 16 | 31.5 | | Open Source | An ITU standard codec. | | |
| G.729 | 8 | 8 | 31.2 | 20 | Patented | An ITU standard codec. | Excellent bandwidth utilisation for toll quality speech. Performs well under random bit errors. | License required for use. |
| GSM | 8 | 13 | unknown | | Proprietary | Same encoding as used in GSM mobile phones (though improved versions are often used nowadays). | Relatively high compression ratio. Royalty free means it is available in many hardware and software platforms. | |
| iLBC | unknown | 13.33 | unknown | 30 | Free to use | | High robustness to packet loss. | |
| iLBC | unknown | 15 | unknown | 20 | | | | |
| Siren | unknown | unknown | unknown | | | Not much known about this codec, and does not appear to be commonly supported. | | |
| Speex | 8 | unknown | unknown | | Open Source | | Uses variable bit rate to minimise bandwidth usage. | |
| Speex | 16 | unknown | unknown | | | | | |
| Speex | 32 | unknown | unknown | | | | | |
Notes
The information provided here is for information purposes only. If you find errors or omissions, please report them in the relevant discussion forum.
The sampling rate is the rate at which the analogue audio signal is sampled. Nyquist's Theorem states that in order to record a certain frequency, sampling must occur at no less than twice that frequency. Thus, the higher the sampling rate, the greater the frequency range in the encoded audio stream. The human ear is capable of hearing from about 20Hz to about 20,000Hz. Typically, speech is around 100-4,000Hz. Thus, a sampling rate of at least 8kHz (twice the 4,000Hz upper limit) is required to accurately encode the human voice. Greater sampling rates will capture higher frequencies (this is useful, for example, if you are playing music down the phone), but will also increase bandwidth as there are more samples to encode and transmit.
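The arithmetic behind this can be sketched in a few lines of Python. The function names are hypothetical, and the 8 bits per sample used below is an assumption chosen to match uncompressed G.711-style audio; other codecs use different sample sizes and then compress further.

```python
def min_sampling_rate_hz(max_frequency_hz):
    """Nyquist: sample at no less than twice the highest frequency."""
    return 2 * max_frequency_hz


def raw_bitrate_kbps(sampling_rate_hz, bits_per_sample):
    """Bit rate of uncompressed audio at a given rate and sample size."""
    return sampling_rate_hz * bits_per_sample / 1000


speech_rate = min_sampling_rate_hz(4000)
print(speech_rate)                        # 8000 (Hz)

# Assumption: 8 bits per sample.
print(raw_bitrate_kbps(speech_rate, 8))   # 64.0 (kbps)
print(raw_bitrate_kbps(16000, 8))         # 128.0 (kbps)
```

Doubling the sampling rate from 8kHz to 16kHz doubles the uncompressed bit rate, which is the bandwidth cost of capturing the higher frequencies.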
The size of the payload of each encoded voice packet influences two things: lag and bandwidth. Every encoded packet that is sent incurs fixed bandwidth overheads (due to IP and other headers added to the data in the network). Larger payloads therefore incur a proportionately smaller overhead, which reduces the nominal bandwidth utilisation. However, with larger payloads more audio (i.e. a longer period of time) is required to construct a single packet. This increases the time it takes for even the beginning of the packet to reach the other end and be decoded, and so increases the lag in the conversation. This is a typical trade-off in VoIP. Most codecs use payload sizes of 10-40ms.
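The trade-off can be made concrete with a small Python sketch. The per-packet overhead of 58 bytes (Ethernet 18 + IP 20 + UDP 8 + RTP 12) is an assumption, not something stated on this page, although it does reproduce the nominal bandwidth figures listed in the table for G.711 and G.729. The function name is hypothetical.

```python
# Assumed per-packet overhead in bytes (not stated in the text):
# Ethernet 18 + IP 20 + UDP 8 + RTP 12.
OVERHEAD_BYTES = 18 + 20 + 8 + 12


def nominal_bandwidth_kbps(codec_kbps, payload_ms, overhead_bytes=OVERHEAD_BYTES):
    """Bandwidth on the wire once fixed per-packet headers are included."""
    payload_bytes = codec_kbps * payload_ms / 8   # kbps * ms gives bits
    packets_per_second = 1000 / payload_ms
    return (payload_bytes + overhead_bytes) * 8 * packets_per_second / 1000


for codec, kbps, ms in [("G.711", 64, 20), ("G.729", 8, 20), ("G.729", 8, 40)]:
    print(codec, ms, "ms:", round(nominal_bandwidth_kbps(kbps, ms), 1), "kbps")
# G.711 20 ms: 87.2 kbps
# G.729 20 ms: 31.2 kbps
# G.729 40 ms: 19.6 kbps
```

Under these assumptions, doubling the G.729 payload from 20ms to 40ms cuts the nominal bandwidth from 31.2 to 19.6 kbps, at the cost of waiting an extra 20ms of audio before each packet can be sent.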