MP3
From AudioLexic
MPEG-1 Audio Layer 3, more commonly referred to as MP3, is a popular digital audio encoding format. It uses a lossy compression algorithm designed to greatly reduce the amount of data required to represent an audio recording while still sounding like a faithful reproduction of the original uncompressed audio to most listeners. It was invented by a team of European engineers at Philips, CCETT (Centre commun d'études de télévision et télécommunications), IRT and the Fraunhofer Society, who worked in the framework of the EUREKA 147 DAB digital radio research program, and it became an ISO/IEC standard in 1991.
Overview
MP3 is an audio-specific compression format. The compression removes certain sounds that cannot be heard by the listener, i.e. outside the normal human hearing range. It provides a representation of pulse-code modulation-encoded audio in much less space than straightforward methods, by using psychoacoustic models to discard components less audible to human hearing, and recording the remaining information in an efficient manner. Similar principles are used by JPEG, a lossy image compression format.
Encoding audio
The MPEG-1 standard does not include a precise specification for an MP3 encoder. The decoding algorithm and file format, by contrast, are well defined. Implementers of the standard were expected to devise their own algorithms for removing parts of the information in the raw audio (or rather its MDCT representation in the frequency domain). During encoding, 576 time-domain samples are taken and transformed to 576 frequency-domain samples. If there is a transient, 192 samples are taken instead of 576. This is done to limit the temporal spread of quantization noise accompanying the transient. (See psychoacoustics.)
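The 576 and 192 figures can be related to the MDCT with a small sketch. It assumes background that is not stated in the text above: that Layer 3 first splits the signal into 32 subbands and then applies an MDCT per subband, with a 36-sample window for long blocks and a 12-sample window for short blocks. The function below is a direct, unwindowed, deliberately slow MDCT meant only to show the sizes involved, not how a real encoder is written.

```python
import math

def mdct(block):
    """Direct MDCT: 2N time-domain inputs -> N frequency coefficients."""
    n_out = len(block) // 2
    coeffs = []
    for k in range(n_out):
        acc = 0.0
        for n, x in enumerate(block):
            acc += x * math.cos(math.pi / n_out * (n + 0.5 + n_out / 2) * (k + 0.5))
        coeffs.append(acc)
    return coeffs

# Assumed Layer 3 filterbank parameters (background, not from the text above).
SUBBANDS = 32      # subbands produced by the polyphase filterbank
LONG_WINDOW = 36   # samples per subband for a long block
SHORT_WINDOW = 12  # samples per subband for a short block (used on transients)

long_lines = SUBBANDS * len(mdct([0.0] * LONG_WINDOW))    # 32 * 18 = 576
short_lines = SUBBANDS * len(mdct([0.0] * SHORT_WINDOW))  # 32 * 6 = 192
```

Under these assumptions, a short block yields a third as many frequency lines as a long block, which is the trade of frequency resolution for time resolution that limits the spread of quantization noise around a transient.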
As a result, there are many different MP3 encoders available, each producing files of differing quality. Comparisons are widely available, so it is easy for a prospective user to research the best choice. Keep in mind that an encoder that is proficient at higher bit rates (such as LAME, which is in widespread use at those rates) is not necessarily as good at lower bit rates.
Decoding audio
Decoding, on the other hand, is carefully defined in the standard. Most decoders are "bitstream compliant", meaning that the decompressed output they produce from a given MP3 file will be the same (within a specified degree of rounding tolerance) as the output specified mathematically in the ISO/IEC standard document. An MP3 file is organized into frames of 384, 576, or 1152 samples (depending on the MPEG version and layer). Each frame carries header information (32 bits) and side information (9, 17, or 32 bytes, depending on the MPEG version and on whether the audio is stereo or mono). The header and side information help the decoder to decode the associated Huffman-encoded data correctly.
Therefore, comparison of decoders is usually based on how computationally efficient they are (i.e., how much memory or CPU time they use in the decoding process).
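The frame sizes quoted above can be written down as a small lookup. This is a sketch for Layer 3 only; the mapping of the 9, 17 and 32 byte side-information sizes to specific version and channel combinations is an assumption based on common descriptions of the format, and the function names are hypothetical.

```python
HEADER_BYTES = 4  # the 32-bit frame header

def side_info_bytes(mpeg1: bool, mono: bool) -> int:
    """Side-information size for a Layer 3 frame (assumed mapping)."""
    if mpeg1:
        return 17 if mono else 32
    return 9 if mono else 17  # MPEG-2 and MPEG-2.5

def samples_per_frame(mpeg1: bool) -> int:
    """Samples carried by one Layer 3 frame."""
    return 1152 if mpeg1 else 576

def frame_duration_ms(mpeg1: bool, sample_rate_hz: int) -> float:
    """Playback time covered by one frame, in milliseconds."""
    return 1000.0 * samples_per_frame(mpeg1) / sample_rate_hz
```

For example, `frame_duration_ms(True, 44100)` gives roughly 26 ms, which is the granularity at which a decoder consumes an MPEG-1 Layer 3 stream.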
Audio quality
When creating an MP3 file, there is a trade-off between the amount of space used and the sound quality of the result. Typically, the user is allowed to set a bit rate, which specifies how many kilobits the file may use per second of audio. The lower the bit rate, the lower the audio quality; the higher the bit rate, the higher the quality of the resulting MP3.
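The space side of this trade-off is simple arithmetic. The sketch below assumes a constant bit rate and ignores tags and per-frame padding; the function name is hypothetical.

```python
def mp3_size_bytes(bitrate_kbps: int, seconds: int) -> int:
    """Approximate size of a constant-bit-rate MP3, audio data only."""
    return bitrate_kbps * 1000 * seconds // 8

four_minutes = 4 * 60
size_128 = mp3_size_bytes(128, four_minutes)  # 3,840,000 bytes
size_320 = mp3_size_bytes(320, four_minutes)  # 9,600,000 bytes
```

A four-minute track therefore takes about 3.8 MB at 128 kbit/s and about 9.6 MB at 320 kbit/s.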
With too low a bit rate, "compression artifacts" (i.e., sounds that were not present in the original recording) may be audible in the reproduction. A good demonstration of compression artifacts is provided by the sound of applause: it is hard to compress because of its randomness and sharp attacks, so compression artifacts can be heard as ringing or pre-echo.
As well as the bit rate of the encoded file, the quality of MP3 files depends on the quality of the encoder and the difficulty of the signal being encoded. Because the MP3 standard allows quite a bit of freedom in encoding algorithms, different encoders can differ considerably in quality, even when targeting similar bit rates, and the choice of encoding parameters matters as well. While quality around 128 kbit/s was somewhere between annoying and acceptable with older encoders, modern MP3 encoders can provide very good quality at that bit rate.
The transparency threshold of MP3 can be estimated at about 128 kbit/s with good encoders on typical music, as evidenced by its strong performance in listening tests; however, some particularly difficult material can require 192 kbit/s or higher. As with all lossy formats, some samples cannot be encoded to be transparent for all users.
For digital stereophonic sound, this transparency threshold can be greatly reduced by using the joint stereo coding mode, which is based on stereo intensity redundancy removal. This feature further reduces the overall bit rate of a stereophonic sound to 96 kbit/s. Unfortunately, in spite of the wide use of this feature in most MP3 files and all standardized encoders, no official results for this transparency level were ever published, due to strong lobbying and opposition from the professional music industry.
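The paragraph above refers to the intensity variant of joint stereo. As a simpler illustration of how redundancy between channels can be exploited, the sketch below shows the other joint stereo tool commonly described for MP3, mid/side (M/S) coding. It is a swapped-in example rather than the intensity method itself, and the function names are hypothetical.

```python
import math

_SCALE = 1 / math.sqrt(2)

def ms_encode(left, right):
    """Convert left/right samples into mid (sum) and side (difference)."""
    mid = [(l + r) * _SCALE for l, r in zip(left, right)]
    side = [(l - r) * _SCALE for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    """Recover left/right samples from mid and side."""
    left = [(m + s) * _SCALE for m, s in zip(mid, side)]
    right = [(m - s) * _SCALE for m, s in zip(mid, side)]
    return left, right
```

When the two channels are nearly identical, the side signal is close to zero and needs very few bits, which is where the saving comes from.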
The simplest type of MP3 file uses one bit rate for the entire file; this is known as constant bit rate (CBR) encoding. Using a constant bit rate makes encoding simpler and faster. However, it is also possible to create files where the bit rate changes throughout the file. These are known as variable bit rate (VBR) files. The idea behind this is that, in any piece of audio, some parts will be much easier to compress, such as silence or music containing only a few instruments, while others will be more difficult to compress. So, the overall quality of the file may be increased by using a lower bit rate for the less complex passages and a higher one for the more complex parts. With some encoders, it is possible to specify a given quality, and the encoder will vary the bit rate accordingly. Users who know a particular "quality setting" that is transparent to their ears can use this value when encoding all of their music, and need not worry about performing personal listening tests on each piece of music to determine the correct settings.
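To make the VBR idea concrete, the following sketch computes the average bit rate of a file whose frames use different bit rates. It assumes every frame covers the same amount of playback time, so the average is a plain mean; the per-frame values are invented for illustration.

```python
def average_bitrate_kbps(frame_bitrates_kbps):
    """Mean bit rate of a VBR stream whose frames all have equal duration."""
    return sum(frame_bitrates_kbps) / len(frame_bitrates_kbps)

# Hypothetical track: mostly moderate passages, some silence, some dense parts.
frames = [32] * 10 + [128] * 60 + [256] * 30
avg = average_bitrate_kbps(frames)  # 156.8 kbit/s
```

The dense passages get 256 kbit/s here while the file as a whole is about the size of a 157 kbit/s CBR file.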
Bit rate
Several bit rates are specified in the MPEG-1 Layer 3 standard: 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256 and 320 kbit/s, and the available sampling frequencies are 32, 44.1 and 48 kHz. A sample rate of 44.1 kHz is almost always used since this is also used for CD audio, the main source used for creating MP3 files. A greater variety of bit rates is used on the internet. 128 kbit/s is the most common since it typically offers very good audio quality in a relatively small space. 192 kbit/s is often used by those who notice artifacts at lower bit rates. By contrast, uncompressed audio as stored on a compact disc has a bit rate of 1411.2 kbit/s (16 bits/sample × 44100 samples/second × 2 channels).
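The CD figure above, and the compression ratio it implies, can be checked with a few lines. The constant and function names are hypothetical.

```python
BITS_PER_SAMPLE = 16
SAMPLE_RATE_HZ = 44_100
CHANNELS = 2

# Uncompressed CD audio bit rate in bits per second.
cd_bitrate_bps = BITS_PER_SAMPLE * SAMPLE_RATE_HZ * CHANNELS  # 1,411,200

def compression_ratio(mp3_kbps: int) -> float:
    """How many times smaller an MP3 stream is than CD audio."""
    return cd_bitrate_bps / (mp3_kbps * 1000)

ratio_128 = compression_ratio(128)  # about 11
```

At 128 kbit/s an MP3 uses roughly one eleventh of the data of the CD original.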
Some additional bit rates and sample rates were made available in the MPEG-2 and the (unofficial) MPEG-2.5 standards: bit rates of 8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144 and 160 kbit/s and sample rates of 8, 11.025, 12, 16, 22.05 and 24 kHz.
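The lists in this section translate directly into lookup tables. The sketch below checks whether a bit rate and sample rate pair appears in them; it groups MPEG-2 and MPEG-2.5 together exactly as the text does, and the function name is hypothetical.

```python
MPEG1_KBPS = {32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320}
MPEG1_HZ = {32000, 44100, 48000}

MPEG2_KBPS = {8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160}
MPEG2_HZ = {8000, 11025, 12000, 16000, 22050, 24000}

def layer3_family(kbps: int, hz: int):
    """Return which listed family allows this combination, or None."""
    if kbps in MPEG1_KBPS and hz in MPEG1_HZ:
        return "MPEG-1"
    if kbps in MPEG2_KBPS and hz in MPEG2_HZ:
        return "MPEG-2/2.5"
    return None
```

For instance, 128 kbit/s at 44.1 kHz is an MPEG-1 combination, 144 kbit/s at 22.05 kHz is only available in the later standards, and 320 kbit/s at 22.05 kHz appears in neither list.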
Non-standard bit rates up to 640 kbit/s can be achieved with the LAME encoder and the --freeformat option, but few MP3 players can play those files. Gabriel Bouvigne, a principal developer of the LAME project, says that the freeformat option is compliant with the standard but that, according to the standard, decoders are only required to be able to decode streams up to 320 kbit/s.
File structure
An MP3 file is made up of multiple MP3 frames which consist of the MP3 header and the MP3 data. This sequence of frames is called an Elementary stream. Frames are independent items: one can cut the frames from a file and an MP3 player would be able to play it. The MP3 data is the actual audio payload. The diagram shows that the MP3 header consists of a sync word which is used to identify the beginning of a valid frame. This is followed by a bit indicating that this is the MPEG standard and two bits that indicate that layer 3 is being used, hence MPEG-1 Audio Layer 3 or MP3. After this, the values will differ depending on the MP3 file. ISO/IEC 11172-3 defines the range of values for each section of the header along with the specification of the header. Most MP3 files today contain ID3 metadata which precedes or follows the MP3 frames; this is also shown in the diagram.
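The header fields described above can be read with a few shifts and masks. This is a minimal sketch for MPEG-1 Layer 3 headers only. The bit positions follow the commonly used reading of the header with an 11-bit sync word and a 2-bit version field, which differs slightly from the single version bit mentioned in the text; that layout, the frame length formula and all names are assumptions for illustration and should be checked against ISO/IEC 11172-3.

```python
from typing import NamedTuple, Optional

# Index 0 means "free format" and index 15 is invalid; both are rejected here.
MPEG1_L3_KBPS = [None, 32, 40, 48, 56, 64, 80, 96,
                 112, 128, 160, 192, 224, 256, 320, None]
MPEG1_HZ = [44100, 48000, 32000, None]

class FrameHeader(NamedTuple):
    bitrate_kbps: int
    sample_rate_hz: int
    padding: int
    channel_mode: int  # 0 stereo, 1 joint stereo, 2 dual channel, 3 mono
    frame_bytes: int

def parse_mpeg1_layer3_header(data: bytes) -> Optional[FrameHeader]:
    """Parse a 4-byte frame header; return None if it is not MPEG-1 Layer 3."""
    if len(data) < 4:
        return None
    h = int.from_bytes(data[:4], "big")
    if ((h >> 21) & 0x7FF) != 0x7FF:   # sync word: eleven set bits
        return None
    if ((h >> 19) & 0x3) != 0b11:      # version field: MPEG-1
        return None
    if ((h >> 17) & 0x3) != 0b01:      # layer field: Layer 3
        return None
    kbps = MPEG1_L3_KBPS[(h >> 12) & 0xF]
    hz = MPEG1_HZ[(h >> 10) & 0x3]
    if kbps is None or hz is None:
        return None
    padding = (h >> 9) & 0x1
    frame_bytes = 144 * kbps * 1000 // hz + padding
    return FrameHeader(kbps, hz, padding, (h >> 6) & 0x3, frame_bytes)

header = parse_mpeg1_layer3_header(bytes.fromhex("fffb9064"))
```

With the example bytes `FF FB 90 64`, the sketch reports 128 kbit/s, 44100 Hz, joint stereo and a frame length of 417 bytes. Because each frame starts with its own header, a player can resynchronize at any frame boundary, which is what makes cutting a file at frame boundaries workable.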
Design limitations
There are several limitations inherent to the MP3 format that cannot be overcome by any MP3 encoder.
Newer audio compression formats such as Vorbis and AAC no longer have these limitations.
In technical terms, MP3 is limited in the following ways:
- Bit rate is limited to a maximum of 320 kbit/s (while some encoders can create higher bit rates, there is little to no support for these higher bit rate MP3 files)
- Time resolution can be too low for highly transient signals, which may cause some smearing of percussive sounds, although this effect is to a great extent limited by the psychoacoustic properties of the Musicam polyphase filterbank (Layer II). Pre-echo is concealed due to the specific time-domain characteristics of the filter.
- Frequency resolution is limited by the small long block window size, decreasing coding efficiency
- No scale factor band for frequencies above 15.5/15.8 kHz
- Joint stereo is done on a frame-to-frame basis
- Encoder/decoder overall delay is not defined, which means lack of official provision for gapless playback. However, some encoders such as LAME can attach additional metadata that will allow players that are aware of it to deliver seamless playback.
Nevertheless, a well-tuned MP3 encoder can perform competitively even with these restrictions.
ID3 and other tags
A "tag" in a compressed audio file is a section of the file that contains metadata such as the title, artist, album, track number or other information about the file's contents.
As of 2006, the most widespread standard tag formats are ID3v1 and ID3v2, and the more recently introduced APEv2.
APEv2 was originally developed for the MPC file format (see the APEv2 specification). APEv2 can coexist with ID3 tags in the same file or it can also be used by itself.
Tag editing functionality is often built into MP3 players and editors, but there are also tag editors dedicated to the purpose.
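As an illustration of what a tag looks like on disk, here is a minimal sketch that reads an ID3v1 tag. The field layout used (a 128-byte block at the end of the file beginning with "TAG", followed by fixed-width title, artist, album, year, comment and genre fields) is the commonly documented ID3v1 layout and is not described in the text above; the function name is hypothetical.

```python
def parse_id3v1(data: bytes):
    """Return ID3v1 fields from the last 128 bytes of a file, or None."""
    if len(data) < 128 or data[-128:-125] != b"TAG":
        return None
    tag = data[-128:]

    def text(raw: bytes) -> str:
        # Fields are fixed width and padded with NUL bytes or spaces.
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    return {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
        "comment": text(tag[97:127]),
        "genre": tag[127],  # numeric genre index
    }
```

In practice one would read the file in binary mode and pass its contents (or just its final 128 bytes) to the function. ID3v2 and APEv2 tags use different, more flexible layouts and are not handled by this sketch.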
Volume normalization
As compact discs and various other sources are recorded and mastered at different volumes, it may be useful to store volume information about a file in the tag so that the volume can be adjusted dynamically at playback time.
A few standards for encoding the gain of an MP3 file have been proposed. The idea is to normalize the average volume (not the volume peaks) of audio files, so that the volume does not change between consecutive tracks. This should not be confused with dynamic range compression (DRC) which is a form of normalization used in audio mastering.
Listeners who prefer to experience music as it was intended to be heard on the original compact disc may prefer to not use volume normalization, because the average volume of each track was set intentionally by a professional mastering engineer.
The most popular and widely used solution for storing replay gain is known simply as "Replay Gain". Typically, the average volume and clipping information about an audio track is stored in the metadata tag.
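How a player might use such stored values can be sketched as follows. The sketch assumes the tag provides a gain in decibels and a peak sample value, and that samples are floats in the range -1.0 to 1.0; it shows only the playback-side scaling, not how the gain is measured, and the function names are hypothetical.

```python
def gain_to_scale(gain_db: float) -> float:
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10 ** (gain_db / 20)

def apply_gain(samples, gain_db: float, peak: float):
    """Scale samples by the stored gain, backing off to avoid clipping."""
    scale = gain_to_scale(gain_db)
    if peak > 0 and peak * scale > 1.0:
        scale = 1.0 / peak  # largest factor that keeps the peak within range
    return [s * scale for s in samples]
```

A gain of -6 dB roughly halves the amplitude. If the stored gain would push the recorded peak past full scale, this sketch reduces the factor, which is one simple way to use the clipping information mentioned above.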
One can download audio converting software to change the formats.
Alternative technologies
Many other lossy and lossless audio codecs exist. Among these, mp3PRO, AAC, and MP2 are all members of the same technological family as MP3 and depend on roughly similar psychoacoustic models. The Fraunhofer Gesellschaft owns many of the basic patents underlying these codecs as well, with others held by Dolby Labs, Sony, Thomson Consumer Electronics, and AT&T.
In a 2005 listening test that compared the performance of the LAME MP3 encoder against more modern compression formats at 128 kbit/s, it was found that there was no statistically significant difference between the results for LAME, Ogg Vorbis, several AAC encoders and WMA. However, a test at a very low bit rate of 32 kbit/s showed that MP3 was significantly worse than the more modern codecs at that bit rate.
It remains that MP3 is by far the most popular audio format on the Internet.
This article was started using a Wikipedia article.