2. Formatting

Unless you've been living under a rock for the past 20 years, chances are you've heard of the ubiquitous, even controversial audio file known as the MP3. The MP3 is a compressed audio format popularized by piracy and peer-to-peer sharing networks in the late '90s and early 2000s, as well as by the advent of the iPod.

One of the main reasons the MP3 was (and continues to be) so successful is its small size—around 3 to 4 MB for a single song, which in the 2000s meant huge savings in digital storage as well as relatively fast download times. Of course, by today's standards, 3 MB is chump change compared to our 500 GB or 1 TB hard drives, but the format is still popular because it's so easily and quickly streamed over the internet. Nowadays we take it for granted that we can listen to Spotify and browse our Facebook profiles simultaneously. And while the algorithm for crafting the perfect MP3 has changed over the years, the basic concept is the same: slash away as much excess audio as possible while keeping the result faithful to the original recording; MP3 conversion cuts away inaudible and barely audible audio in a process that attempts to emulate human hearing patterns.

Of course, when we record audio in a digital format, we typically don't use MP3. The MP3 belongs to a class of audio formats known as "lossy" formats, meaning that some data is lost between the original and the new file. Professionals almost always prefer to record in "lossless" formats: digital formats that come as close as possible to approximating an original analog source (or, in the case of digital sources, replicate it exactly). The WAV file is arguably the most popular lossless format, helped in part by its near-universal compatibility.
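
To make the "lossless" idea concrete, here is a minimal Python sketch using the standard-library wave module. The file name and the choice of one second of silence are just placeholders; the point is that a WAV file stores the raw sample data, so what you write is exactly what you read back.

```python
import wave

# Write one second of 16-bit mono silence to a WAV file,
# then read it back and confirm the bytes are identical.
sample_rate = 44100
pcm_data = bytes(2 * sample_rate)  # 44,100 16-bit samples of silence

with wave.open("example.wav", "wb") as wav_out:
    wav_out.setnchannels(1)            # mono
    wav_out.setsampwidth(2)            # 16 bits (2 bytes) per sample
    wav_out.setframerate(sample_rate)  # 44.1 kHz
    wav_out.writeframes(pcm_data)

with wave.open("example.wav", "rb") as wav_in:
    recovered = wav_in.readframes(wav_in.getnframes())

print(recovered == pcm_data)  # True: a lossless format reproduces the data exactly
```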

To understand why digital sources can only ever be approximations of analog sources, we have to step back and understand the differences between digital and analog signals.

No matter what type of microphone you're using, its function is the same: converting a sound (specifically, changes in air pressure over time) into an analog signal (a continuously varying electrical signal). In this conversion process, the input and output are 1:1; what you hear is what you get. If we think about the raw sound and the analog signal in mathematical terms, they resemble continuous curves, in that between any two points on the curve there are an infinite number of additional points. Put differently, these are "smooth" or "uninterrupted" signals, meaning that between any two changes in pitch there are an infinite number of gradations.

A digital signal, on the other hand, is represented in a very finite fashion by binary code. This is achieved by "sampling" the sound. In mathematical terms, we can think of this as approximating a continuous curve with a series of very tightly spaced points. The rate at which a sound is sampled (the "sample rate") varies, and the basic rule of thumb is that to capture a frequency properly, the sample rate needs to be at least twice that frequency in hertz. The songs you hear on CDs use a sample rate of 44.1 kHz (a number you may be familiar with already). The reason for this is that the highest frequency theoretically audible to humans is around 20 kHz (in practice this number tends to be lower, and lowers even further with age), and a 44.1 kHz sample rate can capture everything up to 22.05 kHz. A more popular sample rate for recording is 48 kHz, and some engineers like to go as high as 96 kHz, although the payoffs for using such a high sample rate are debatable.
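
As a rough illustration of what sampling looks like in practice, the sketch below (plain Python, using an arbitrary 440 Hz test tone and a one-second duration as stand-ins for a real recording) turns a continuous sine function into a discrete list of sample values at 44.1 kHz:

```python
import math

# Sample a continuous 440 Hz sine wave (concert A) at 44.1 kHz,
# producing a discrete list of amplitude values.
sample_rate = 44100   # samples per second (CD quality)
frequency = 440.0     # Hz; well below the Nyquist limit of sample_rate / 2
duration = 1.0        # seconds

samples = [
    math.sin(2 * math.pi * frequency * (n / sample_rate))
    for n in range(int(sample_rate * duration))
]

# The rule of thumb above: frequencies higher than half the sample rate
# cannot be represented without distortion (aliasing).
nyquist = sample_rate / 2
print(f"{len(samples)} samples, Nyquist frequency = {nyquist} Hz")
```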

The sample rate is an important concept, and is one of the two metrics we use to measure the quality of an audio file. The second metric is the "bit rate," which tends to have a more noticeable impact on the perceived quality of an audio file.
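
For a sense of scale, here is a quick back-of-the-envelope calculation comparing the data rate of uncompressed CD-quality audio with that of a typical MP3. The 192 kbps MP3 figure is just one common encoding setting, chosen for illustration.

```python
# Bits per second for uncompressed PCM audio:
# sample rate x bit depth x number of channels.
def pcm_bit_rate(sample_rate, bit_depth, channels):
    return sample_rate * bit_depth * channels

cd_bps = pcm_bit_rate(44100, 16, 2)  # CD audio: 1,411,200 bits/s (~1411 kbps)
mp3_bps = 192_000                    # a common MP3 bit rate: 192 kbps

print(f"CD audio:  {cd_bps / 1000:.0f} kbps")
print(f"MP3 (192): {mp3_bps / 1000:.0f} kbps")
print(f"The MP3 keeps roughly {mp3_bps / cd_bps:.0%} of the original data rate")
```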

To understand how the bit rate impacts an audio file, it's best to just listen to some examples:

Readings and Resources