tag:blogger.com,1999:blog-72256982772118400792019-09-09T02:07:17.405-07:00bjorgBjorn Rochehttp://www.blogger.com/profile/17072425815152893296noreply@blogger.comBlogger73125tag:blogger.com,1999:blog-7225698277211840079.post-9779447856486152052013-11-20T13:41:00.002-08:002013-11-20T13:47:57.493-08:00Solving acoustics problems<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: justify;"><tbody><tr><td style="text-align: center;"><a href="http://3.bp.blogspot.com/-3G3-yfl7B60/Uo0iSBXGaEI/AAAAAAAAAEs/bqVBHHWRIPI/s1600/art_dan2.gif" imageanchor="1" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" height="251" src="http://3.bp.blogspot.com/-3G3-yfl7B60/Uo0iSBXGaEI/AAAAAAAAAEs/bqVBHHWRIPI/s320/art_dan2.gif" width="320" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">A "waterfall plot" like this one is one of many tools used by<br />acousticians to determine the problems with a room.<br />Photo from <a href="http://realtraps.com/">realtraps</a> which provides high quality bass traps,<br />an important type of acoustic treatment.</td></tr></tbody></table><div style="text-align: justify;">I recently received the following letter (edited):</div><div style="text-align: justify;"><span style="font-size: x-small;"><i>&nbsp; &nbsp; &nbsp;&nbsp;</i></span></div><div style="text-align: justify;"><span style="font-size: x-small;"><i><br /></i></span></div><div style="text-align: justify;"><span style="font-size: x-small;"><i>Greetings,</i></span></div><div style="text-align: justify;"><span style="font-size: x-small;"><i><br /></i></span></div><div style="text-align: justify;"><span style="font-size: x-small;"><i><span class="Apple-tab-span" style="white-space: pre;"> </span>The echo in my local church is really bad. &nbsp;I am lucky if I can understand 10% of what’s being said. &nbsp; I have checked with other members of the congregation and without exception they all have the same problem. </i></span></div><div style="text-align: justify;"><span style="font-size: x-small;"><i><br /></i></span></div><div style="text-align: justify;"><span style="font-size: x-small;"><i><span class="Apple-tab-span" style="white-space: pre;"> </span>The church is medium size with high vaulted ceiling, very large windows with pillars spaced throughout. &nbsp;The floor is mostly wood. 
&nbsp; The speakers are flat against the side walls, spaced approx 15 metres apart and approx 10 feet above the floor.</i></span></div><div style="text-align: justify;"><span style="font-size: x-small;"><i><br /></i></span></div><div style="text-align: justify;"><span style="font-size: x-small;"><i><span class="Apple-tab-span" style="white-space: pre;"> </span>The speakers are apparently ‘top of the range’… I just wonder if a graphic equalizer was used between the microphone and speaker, would this ‘clean up’ the sound a little?</i></span></div><div style="text-align: justify;"><span style="font-size: x-small;"><i><br /></i></span></div><div style="text-align: justify;"><span style="font-size: x-small;"><i><span class="Apple-tab-span" style="white-space: pre;"> </span>I know that lining the walls with acoustic tiles and carpeting the floor would lessen the echo, but, we don’t want to do that if we can avoid it.</i></span></div><div style="text-align: justify;"><span style="font-size: x-small;"><i><br /></i></span></div><div style="text-align: justify;"><span style="font-size: x-small;"><i><span class="Apple-tab-span" style="white-space: pre;"> </span>With regard to putting carpet on the floor, my thoughts are that instead of sound being absorbed by the carpet, the congregation present would absorb just as much as the carpet?. &nbsp;One other theory I have is regarding the speakers.</i></span></div><div style="text-align: justify;"><span style="font-size: x-small;"><i><br /></i></span></div><div style="text-align: justify;"><span style="font-size: x-small;"><i><span class="Apple-tab-span" style="white-space: pre;"> </span>If &nbsp;the speakers were moved…</i></span></div><div style="text-align: justify;"><span style="font-size: x-small;"><i><br /></i></span></div><div style="text-align: justify;"><span style="font-size: x-small;"><i>Michael</i></span></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Hey Michael,</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I sympathize with you. Going to service every week and not being able to understand what is being said must be very frustrating. While this is not the kind of thing I do every day, I do have some training &nbsp;in this area and will do my best to give you something helpful.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Most churches are built with little attention to acoustics and old churches were built before there was any understanding of what acoustics is. With all those reflective surfaces and no care taken to prevent the acoustic problems that they create, problems are inevitable, and sometimes, such as in your church, they are simply out of hand. In a situation like that, even a great sound-system won't be able to solve the problem.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I recommend you hire a professional in your area to come look at the space and be able to give some more specific feedback. To have them improve the situation may cost anywhere from hundreds to tens of thousands of dollars (or even more) depending on the cause of problem. However, it's helpful to have some idea of what some of the solutions are so that when you hire that professional you are prepared for what's to come. 
You might be able to do some more research and take a stab at solving these issues yourself.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">For example, it might be useful to listen to the room and conjecture, even without measurements, whether the problem is bound to specific frequencies or whether it's just a problem of too many echoes. If you are a trained listener you might be able to stand in the room in various places, clap loudly and listen to get a sense of this. Although even a trained listener would never substitute such methods for actual measurements, I often find this method useful for developing a hypothesis (e.g., I might listen and say "I believe there is a problem in the low frequencies" before measuring, then use measurements to confirm or reject this hypothesis). Also, look at the room: are there lots of parallel walls? If so, you are likely suffering from problems at specific frequencies and it's possible that a targeted, and probably less expensive, approach will help.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Another thing you can do is find someone with some stage acting experience and have them speak loud and clear at the pulpit. Have them do this both with and without the sound system and listen to the results. If they sound much clearer without the sound system than with it, then that suggests that your sound system may be causing at least some of the problems.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><span style="white-space: pre;">If you can't afford an acoustician, but you are willing to experiment a bit, this kind of testing might lead you to something. For example, maybe you notice some large open parallel walls and you agree that covering one or both of them with some heavy draperies is either acceptable or would look nice. You could try it and see if it helps. It's no guarantee, but it might make a difference. Draperies are, of course, unlikely to make that much difference by themselves, so you might consider putting sound-absorbing material behind them.</span></div><div style="text-align: justify;"><span style="white-space: pre;"><br /></span></div><div style="text-align: justify;"><span style="white-space: pre;">Be warned, however, that acoustic treatments done by amateurs without measurements are often beset with problems. For example, you may reduce the overall reverberation time, but leave lots of long echoes at certain frequencies. This can yield results that are no better than where you started -- possibly even worse (although in your case I think that's unlikely).</span></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Here are the types of things a professional is likely to recommend. You've already alluded to all of them, but I'll repeat them with some more detail. I put them roughly in order of how likely they are to help, but it does depend on your specific situation:</div><div style="text-align: justify;"></div><ul><li><b>Acoustic treatments.</b> Churches like the one you describe are notorious for highly reflective surfaces like stone and glass, and as you surmised, adding absorptive materials to the walls, floors and ceiling will reduce the echo significantly. Also as you surmised, floor covering may be of limited effectiveness since people also absorb and diffuse sound, but, of course, it depends on how much of the floor they cover and where. 
I understand your hesitation to go this route since it may impact the aesthetics of the church, and it may be expensive, but, as I mentioned above, depending on the specific situation, you may be able to achieve a dramatic result in acoustics with relatively little visual impact, and depending on the treatment needed you may be able to keep your costs controlled. You should also be able to collaborate with someone who can create acoustic treatments that are either not noticeable or enhance the aesthetics of your space. (Of course, you'll also need someone familiar with things like local fire codes!)</li><li><b>Adjusting the speakers.</b> It's certainly possible that putting the speakers in another location would help. If they were hung by a contractor or someone who did not take acoustics into account, they are likely to be placed poorly. Location matters more than the quality of the speakers themselves. Also, if the speakers are not in one cluster at the front, adding the appropriate delay to each set of speakers may help to ensure that sound arrives "coherently" from all speakers, which can improve intelligibility significantly. Devices to provide this kind of delay, and lots of other features, are sold under various names such as "speaker processors" and "speaker array controllers."</li><li><b>Electronic tools.</b> Although this is likely to be the least effective option, you can usually achieve some improvement with EQ, as you suggested. For permanent installations, I prefer parametric EQs, but a high-quality graphic EQ will also work. An ad-hoc technique for setting the EQ is to increase the gain until you hear feedback, and then notch out the EQ frequency that causes the feedback. Continue increasing the gain until you are happy with the results. You must be very careful to protect your speakers and your hearing when using this technique, both of which can be easily damaged if you don't know what you are doing. Most speaker processors have built-in parametric EQs and some even come with a calibrated mike that you can use with the device to adjust the settings for you automatically. I've done this, and it works great, especially with a little manual tweaking, but you do have to know what you are doing. But, of course, you can't work miracles in a bad room.</li></ul>Bjorn Rochehttp://www.blogger.com/profile/17072425815152893296noreply@blogger.com0tag:blogger.com,1999:blog-7225698277211840079.post-76700005407901696342013-09-21T08:53:00.000-07:002013-09-21T08:53:21.758-07:00Mapping Parameters<div class="separator" style="clear: both; text-align: center;"><br /></div><table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody><tr><td style="text-align: center;"><a href="http://2.bp.blogspot.com/-6eiXwhp5NGw/Uj26GgtrA_I/AAAAAAAAAEY/LY2W9VlyQg8/s1600/linear+mapping.png" imageanchor="1" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" src="http://2.bp.blogspot.com/-6eiXwhp5NGw/Uj26GgtrA_I/AAAAAAAAAEY/LY2W9VlyQg8/s1600/linear+mapping.png" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Visualizing a Linear Mapping</td></tr></tbody></table>Very often we need to "map" one set of values to another. For example, we might have a slider that ranges from 0 to 1 that we want to use to control the value of a frequency setting. Or perhaps we have the output of a sine wave (which ranges from -1 to 1) and we want to use that to control the intensity of an EQ. 
In these cases and many more, we can use a linear mapping to get from one range of values to another.<br /><br />A linear mapping is simply a linear equation, such as <span style="font-family: Courier New, Courier, monospace;">y = mx + b</span>, that takes an input, your slider value for example, and gives you back an output. The input is <span style="font-family: Courier New, Courier, monospace;">x</span>, and the output is <span style="font-family: Courier New, Courier, monospace;">y</span>. The trick is to find the values of <span style="font-family: Courier New, Courier, monospace;">m</span> and <span style="font-family: Courier New, Courier, monospace;">b</span>.<br /><br />Let's take a concrete example. Let's say you have the output of a sine wave (say from an LFO) that oscillates between -1 and 1. Now we want to use those values to control a frequency setting from 200 to 2000. In this case, <span style="font-family: Courier New, Courier, monospace;">x</span> from the equation above represents the oscillator, and <span style="font-family: Courier New, Courier, monospace;">y</span> represents the frequency setting.<br /><br />We know two things: we want <span style="font-family: Courier New, Courier, monospace;">x=-1</span> to map to <span style="font-family: Courier New, Courier, monospace;">y=200</span>, and <span style="font-family: Courier New, Courier, monospace;">x=1</span> to map to <span style="font-family: Courier New, Courier, monospace;">y=2000</span>. Since our original equation, <span style="font-family: Courier New, Courier, monospace;">y = mx + b</span>, had two unknowns (<span style="font-family: Courier New, Courier, monospace;">m</span> and <span style="font-family: Courier New, Courier, monospace;">b</span>), we can solve it:<br /><br />Original equation with both unknowns:<br /><span style="font-family: Courier New, Courier, monospace;">y = mx + b</span><br /><br />Substituting our known values for <span style="font-family: Courier New, Courier, monospace;">x</span> and <span style="font-family: Courier New, Courier, monospace;">y</span>:<br /><span style="font-family: Courier New, Courier, monospace;">200 = (-1)m + b</span><br /><span style="font-family: Courier New, Courier, monospace;">2000 = (1)m + b</span><br /><br />Solving for <span style="font-family: Courier New, Courier, monospace;">b</span>:<br /><span style="font-family: Courier New, Courier, monospace;">2200 = 2b</span><br /><span style="font-family: Courier New, Courier, monospace;">1100 = b</span><br /><br />Solving for <span style="font-family: Courier New, Courier, monospace;">m</span>:<br /><span style="font-family: Courier New, Courier, monospace;">2000 = m + 1100</span><br /><span style="font-family: Courier New, Courier, monospace;">900 = m</span><br /><br />Final equation:<br /><span style="font-family: Courier New, Courier, monospace;">y = 900x + 1100</span><br /><br />You can check the final equation by substituting -1 and 1 for <span style="font-family: Courier New, Courier, monospace;">x</span> and making sure you get 200 and 2000 respectively for <span style="font-family: Courier New, Courier, monospace;">y</span>.<br /><br />So in our LFO/frequency example, we would take our LFO value, say .75, and use that as x. 
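<br /><br />In code, the whole procedure can be wrapped up in a small function that computes <span style="font-family: Courier New, Courier, monospace;">m</span> and <span style="font-family: Courier New, Courier, monospace;">b</span> from the endpoints of the two ranges. This is a minimal C sketch (the function and parameter names are mine, not anything standard):<br /><br /><span style="font-family: Courier New, Courier, monospace;">// Map x linearly from [in_min, in_max] to [out_min, out_max].<br />float linear_map( float x, float in_min, float in_max,<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; float out_min, float out_max )<br />{<br />&nbsp; &nbsp;const float m = (out_max - out_min) / (in_max - in_min); // slope<br />&nbsp; &nbsp;const float b = out_min - m * in_min; // intercept<br />&nbsp; &nbsp;return m * x + b;<br />}<br /><br />// our LFO-to-frequency example:<br />float freq = linear_map( lfo, -1.0f, 1.0f, 200.0f, 2000.0f );</span><br /><br />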
Then plug that value into the formula (y=900(.75) + 1100=1775) and get our final value for our frequency setting.<br /> Bjorn Rochehttp://www.blogger.com/profile/17072425815152893296noreply@blogger.com0tag:blogger.com,1999:blog-7225698277211840079.post-42447870940913216602013-07-21T07:43:00.001-07:002013-08-07T10:41:12.417-07:00Peak Meters, dBFS and Headroom<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody><tr><td style="text-align: center;"><a href="http://4.bp.blogspot.com/-Hwhn0tZB8I4/Uevn4E5VKLI/AAAAAAAAADY/lYBZKI1EU4Q/s1600/level_meter_horizontal.gif" imageanchor="1" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" src="http://4.bp.blogspot.com/-Hwhn0tZB8I4/Uevn4E5VKLI/AAAAAAAAADY/lYBZKI1EU4Q/s320/level_meter_horizontal.gif" height="129" width="320" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">The level meter from audiofile engineering's<br /><a href="http://www.audiofile-engineering.com/spectre/">spectre</a> program accurately shows peak values<br />in dBFS</td></tr></tbody></table>Level meters are one of the most basic features of digital audio software. In software, they are very often implemented as peak meters, which are designed to track the maximum amplitude of the signal. Other kinds of meters, such as VU meters, are often simulations of analog meters. Loudness meters, which attempt to estimate our perception of volume rather than volume itself, are also becoming increasingly common. You may also come across RMS and average meters. In this post, I'm only going to talk about peak meters.<br /><h3>Peak Meters</h3>Peak meters are useful in digital audio because they show the user information that is closely associated with the limits of the medium and because they are efficient and easy to implement. Under normal circumstances, we can expect peak meters to correspond pretty well with our perception of volume, but not perfectly. The general expectation users have when looking at peak meters is that if a signal goes above a certain level at some point, that level should be indicated on the meters. In other words, if the signal goes as high as, say -2 dBFS, over some time period, then someone watching the peak meter during that time will see the meter hit the -2 dBFS mark (see below for more on dBFS). Many peak meters have features such as "peak hold" specifically designed so that the user does not need to stare at the meter.<br /><br />Beyond that, there are rarely any specifics. Some peak meters show their output linearly, some show their output in dB. Some use virtual LEDs, some a bar graph. In general, if there is a numeric readout or units associated with the meter, the unit should be dBFS.<br /><br />Now that we know the basics of peak meters, let's figure out how to implement them.<br /><h3>Update Time</h3>Peak meters should feel fast and responsive. However, they don't update instantly. In software, it is not uncommon to have audio samples run at 44100 samples per second while the display refreshes at only 75 times per second, so there is absolutely no point in showing the value of each sample (not to mention the fact that our eyes couldn't keep up). Clearly we need to figure out how to represent a large number of samples with only one value. For peak meters, we do this as follows:<br /><br /><ol><li>Figure out how often we want to update. 
For example, every 100 ms (.1s) is a good starting point, and will work well most of the time.</li><li>Figure out how many samples we need to aggregate for each update. If we are sampling at 44100 Hz, a common rate, and want to update every .1s, we need N = 44100 * .1 = 4410 samples per update.</li><li>Loop on blocks of size N. Find the peak in each block and display that peak. If the graphics system does not allow us to display a given peak, the next iteration should display the max of any undisplayed peaks.</li></ol><h3>Finding the Peak</h3><table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody><tr><td style="text-align: center;"><a href="http://1.bp.blogspot.com/-kvyZ2_g1aX4/UevoppbFXVI/AAAAAAAAADk/novw4OjuOpo/s1600/loudspeaker-waveform.gif" imageanchor="1" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" src="http://1.bp.blogspot.com/-kvyZ2_g1aX4/UevoppbFXVI/AAAAAAAAADk/novw4OjuOpo/s320/loudspeaker-waveform.gif" height="150" width="320" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Sound is created by air pressure swinging both above<br />and below the mean pressure.</td></tr></tbody></table><div>Finding the peak of each block of N samples is the core of peak metering. To do so, we can't simply find the maximum value of all samples because sound waves contain not just peaks, but also troughs. If those troughs go further from the mean than the peaks, we will underestimate the peak.</div><div><br /></div><div>The solution to this problem is simply to take the absolute value of each sample, and then find the max of those absolute values. In code, it would look something like this:</div><div><br /></div><div><span style="font-family: Courier New, Courier, monospace;">float max = 0;</span></div><div><span style="font-family: Courier New, Courier, monospace;">for( int i=0; i&lt;buf.size(); ++i ) {</span></div><div><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;const float v = fabsf( buf[i] ); // fabsf, because abs() is integer-only</span></div><div><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;if( v &gt; max )</span></div><div><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp; &nbsp; max = v;</span><br /><span style="font-family: Courier New, Courier, monospace;">}</span></div><div><br /></div><div>At the end of this loop, max is your peak value for that block, and you can display it on the meter, or, optionally, calculate its value in dBFS first.</div><h3>Calculating dBFS or Headroom</h3><div>(For a more complete and less "arm wavy" intro to decibels, try <a href="http://www.animations.physics.unsw.edu.au/jw/dB.htm">here</a> or <a href="http://en.wikipedia.org/wiki/Decibel">here</a>.) The standard unit for measuring audio levels is the decibel or dB. But the dB by itself is something of an incomplete unit, because, loosely speaking, instead of telling you the amplitude of something, dB tells you the amplitude of something relative to something else. Therefore, to say something has an amplitude of 3 dB is meaningless. Even saying it has an amplitude of 0 dB is meaningless. You always need some point of reference. In digital audio, the standard point of reference is "Full Scale", i.e., the maximum value that digital audio can take on without clipping. If you are representing your audio as a float, 0 dB is nominally calibrated to +/- 1.0. We call this scale dBFS. 
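<br /><br />Before getting to the math, here's how the pieces so far might fit together: the block loop from the "Update Time" steps plus the peak scan above. This is only a sketch, not code from the original meter; the 4410-sample block assumes 44100 Hz with updates every 100 ms, and fabsf() comes from math.h:<br /><br /><span style="font-family: Courier New, Courier, monospace;">#define BLOCK_SIZE 4410 // 44100 Hz * .1 s per update<br /><br />// scan one block of samples and return its peak<br />float block_peak( const float *buf )<br />{<br />&nbsp; &nbsp;float max = 0;<br />&nbsp; &nbsp;for( int i=0; i&lt;BLOCK_SIZE; ++i ) {<br />&nbsp; &nbsp; &nbsp; const float v = fabsf( buf[i] );<br />&nbsp; &nbsp; &nbsp; if( v &gt; max )<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;max = v;<br />&nbsp; &nbsp;}<br />&nbsp; &nbsp;return max;<br />}</span><br /><br />The value this returns is what you hand to the meter each update, usually after converting it to dBFS. 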
To convert the above max value (which is always positive because it comes from an absolute value) to dBFS use this formula:</div><div><br /></div><div><span style="font-family: Courier New, Courier, monospace;">dBFS = 20 * log10(max);</span></div><div><span style="font-family: Courier New, Courier, monospace;"><br /></span></div><div><span style="font-family: inherit;">You may find it odd that the loudest a signal can normally be is 0 dBFS, but this is how it is. You may find it useful to think of dBFS as "headroom", ie, answering the question "how many dB can I add to the signal before it reaches the maximum?" (Headroom is actually equal to -dBFS, but I've often seen headroom labeled as dBFS when the context makes it clear.)</span></div>Bjorn Rochehttp://www.blogger.com/profile/17072425815152893296noreply@blogger.com1tag:blogger.com,1999:blog-7225698277211840079.post-62303224087015005322013-05-30T18:16:00.000-07:002013-07-13T07:26:04.923-07:00The ABCs of PCM (Uncompressed) digital audioDigital audio can be <a href="http://en.wikipedia.org/wiki/Audio_file_format">stored in a wide range of formats</a>. If you are a developer interested in doing anything with audio, whether it's changing the volume, editing chunks out, looping, mixing, or adding reverb, you absolutely must understand the format you are working with. That doesn't mean you need to understand all the details of the <i>file</i> format, which is just a container for the audio which can be read by a library. It does mean you need to understand the <i>data</i> format you are working with. This blog post is designed to give you an introduction to working with audio data formats.<br /><h3>Compressed and Uncompressed Audio</h3><div>Generally speaking, audio comes in two flavors: compressed and uncompressed. Compressed audio can further be subdivided into different kinds of compression: lossless, which preserves the original content exactly, and lossy which achieves more compression at the expense of degrading the audio. Of these, lossy is by far the most well known and includes MP3, AAC (used in iTunes), and Ogg Vorbis. Much information can be found online about the various kinds of lossy and lossless formats, so I won't go into more detail about compressed audio here, except to say that there are many kinds of compressed audio, each with many parameters.<br /><br /></div><div>Uncompressed PCM audio, on the other hand, is defined by two parameters: the <i>sample rate</i> and the <i>bit-depth</i>.&nbsp;Loosely&nbsp;speaking, the sample rate limits the maximum frequency that can be represented by the format, and the bit-depth determines the maximum dynamic range that can be represented by the format. You can think of bit-depth as determining how much noise there is compared to signal.</div><div><br />CD audio is uncompressed and uses a 44,100 Hz sample rate and 16 bit samples. What this means is that audio on a CD is represented by 44,100 separate measurements, or samples, taken per second. Each sample is stored as a 16-bit number. Audio recorded in studios often use a bit depth of 24 bits and sometimes a higher sample rate.<br /><br /></div><div><div>WAV and AIFF files support both compressed and uncompressed formats, but are so rarely used with compressed audio that these formats have become synonymous with uncompressed audio. 
The most common WAV files use the same parameters as CD audio: 44,100 Hz and a bit depth of 16 bits, but other sample rates and bit depths are supported.</div></div><h3>Converting From Compressed to Uncompressed Formats</h3><div>As you probably already know, lots of audio in the world is stored in compressed formats like MP3. However, it's difficult to do any kind of meaningful processing on compressed audio. So, in order to change a compressed file, you must uncompress, process, and re-compress it. Every compression step results in degradation, so compressing it twice results in extra degradation. You can use lossless compression to avoid this, but the extra compression and decompression steps are likely to require a lot of CPU time, and the gains from compression will be relatively minor. For this reason, compressed audio is usually used for delivery and uncompressed audio is usually used in intermediate steps.</div><div><br /></div><div>However, the reality is that sometimes we process compressed audio. Audiophiles and music producers may scoff, but sometimes that's life. For example, if you are working on mobile applications with limited storage space, telephony and VOIP applications with limited bandwidth, or web applications with many free users, you might find yourself needing to store intermediate files in a compressed format. Usually the first step in processing compressed audio, like MP3, is to decompress it. This means converting the compressed format to PCM. Doing this involves a detailed understanding of the specific format. I recommend using a library such as <a href="http://www.mega-nerd.com/libsndfile/">libsndfile</a>, <a href="http://www.ffmpeg.org/">ffmpeg</a> or <a href="http://lame.sourceforge.net/">lame</a> for this step.</div><h3>Uncompressed Audio</h3><div>Most stored, uncompressed audio is 16-bit. Other bit depths, like 8 and 24, are also common, and many others exist. Ideally, intermediate audio would be stored in floating point format, as is supported by both WAV and AIFF formats, but the reality is that almost no one does this.<br /><br /></div><div>Because 16-bit is so common, let's use that as an example to understand how the data is formatted. 16-bit audio is usually stored as packed 16-bit signed integers. The integers may be big-endian (most common for AIFF) or little-endian (most common for WAV). If there are multiple channels, the channels are usually interleaved. For example, in stereo audio (which has two channels, left and right), you would have one 16-bit integer representing the left channel, followed by one 16-bit integer representing the right channel. These two samples represent the same time and the two together are sometimes called a sample frame or simply a frame.<br /><br /></div><div><table border="1" cellpadding="0" cellspacing="0" style="width: 100%;"><tbody><tr width="50%"><td>Sample Frame 1:<br /><table border="1" cellpadding="0" cellspacing="0" style="width: 100%;"><tbody><tr><td></td><td width="25%">Left MSB</td><td width="25%">Left LSB</td><td width="25%">Right MSB</td><td width="25%">Right LSB</td></tr></tbody></table></td><td>Sample Frame 2:<br /><table border="1" cellpadding="0" cellspacing="0" style="width: 100%;"><tbody><tr><td></td><td width="25%">Left MSB</td><td width="25%">Left LSB</td><td width="25%">Right MSB</td><td width="25%">Right LSB</td></tr></tbody></table></td></tr></tbody></table></div><div>2 sample frames of big-endian, 16-bit interleaved audio. 
Each box represents one 8-bit byte.<br /><br />The above example shows 2 sample frames of big-endian, 16-bit interleaved audio. You can tell it's big-endian because the most significant byte (MSB) comes first. It's 16-bit because 2 8-bit bytes make up a single sample. It's interleaved because each left sample is followed by a corresponding right sample in the same frame.<br /><br />In Java, and most C environments, a 16-bit signed integer is represented with the <span style="font-family: Courier New, Courier, monospace;">short</span> datatype. Therefore, to read raw 16-bit data, you will usually want to get the data into an array of <span style="font-family: Courier New, Courier, monospace;">short</span>s. If you are only dealing with C, you can do your IO directly with <span style="font-family: Courier New, Courier, monospace;">short</span> arrays, or simply use casting or type punning from a raw <span style="font-family: Courier New, Courier, monospace;">char</span> array. In Java, you can use <span style="font-family: Courier New, Courier, monospace;">readShort()</span> from <a href="http://docs.oracle.com/javase/6/docs/api/java/io/DataInputStream.html"><span style="font-family: Courier New, Courier, monospace;">DataInputStream</span></a>.<br /><br />To store 16-bit stereo interleaved audio in C, you might use a structure like this:<br /><br /><span style="font-family: Courier New, Courier, monospace;">typedef struct {</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;short l;</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;short r;</span><br /><span style="font-family: Courier New, Courier, monospace;">} stereo_sample_frame_t;</span><br /><br />or you might simply have an array of shorts:<br /><br /><span style="font-family: Courier New, Courier, monospace;">short samples[];</span><br /><br />In the latter case, you would just need to be aware that when you index an even number it's the left channel, and when you index an odd number it's the right channel. Iterating through all your data and finding the max on each channel would look something like this (assuming a simple MAX() macro, and abs() from stdlib.h):<br /><br /><span style="font-family: Courier New, Courier, monospace;">int sampleCount = ...//total number of samples = sample frames * channels</span><br /><span style="font-family: Courier New, Courier, monospace;">int frames = sampleCount / 2 ;</span><br /><span style="font-family: Courier New, Courier, monospace;">short samples[]; //filled in elsewhere</span><br /><span style="font-family: Courier New, Courier, monospace;"><br /></span><span style="font-family: Courier New, Courier, monospace;">short maxl = 0;</span><br /><span style="font-family: Courier New, Courier, monospace;">short maxr = 0;</span><br /><span style="font-family: Courier New, Courier, monospace;">for( int i=0; i&lt;frames; ++i ) {</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;maxl = (short) MAX( maxl, abs( samples[2*i] ) ); &nbsp; // even indexes: left</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;maxr = (short) MAX( maxr, abs( samples[2*i+1] ) ); // odd indexes: right</span><br /><span style="font-family: Courier New, Courier, monospace;">}</span><br /><span style="font-family: Courier New, Courier, monospace;">printf( "Max left %d, Max right %d.", maxl, maxr );</span><br /><span style="font-family: Courier New, Courier, monospace;"><br /></span>Note how we find the absolute value of each sample. 
Usually when we are interested in the maximum, we are looking for the maximum deviation from zero, and we don't really care if it's positive or negative -- either way is going to sound equally loud.<br /><h3>Processing Raw Data</h3>You may be able to do all the processing you need to do in the native format of the file. For example, once you have an array of <span style="font-family: Courier New, Courier, monospace;">short</span>s representing the data, you could divide each short by two to cut the volume in half:<br /><br /><span style="font-family: Courier New, Courier, monospace;">int sampleCount; //total number of samples = sample frames * channels</span><br /><span style="font-family: Courier New, Courier, monospace;">short samples[]; //filled in elsewhere</span><br /><br /><span style="font-family: Courier New, Courier, monospace;">for( int i=0; i&lt;sampleCount; ++i ) {</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;samples[i] /= 2 ;</span><br /><span style="font-family: Courier New, Courier, monospace;">}</span></div><br /><br />A few things to watch out for:<br /><br /><ul><li>You must actually use the native format of the file or the proper conversion. You can't simply deal with the data as a stream of bytes. I've seen many questions on Stack Overflow where people make the mistake of dealing with 16-bit audio data byte-by-byte, even though each sample of 16-bit audio is composed of 2 bytes. This is like adding a multi-digit number without the <a href="http://en.wikipedia.org/wiki/Carry_(arithmetic)">carry</a>.</li><li>You must watch out for overflow. For example, when increasing the volume, be aware that some samples may end up out of range. You must ensure that all samples remain in the correct range for their datatype. The simplest way to handle this is with clipping (discussed below), which will result in some distortion, but is better than the "wrap-around" that will happen otherwise. (The example above does not have to watch out for overflow because we are dividing, not multiplying.)</li><li>Round-off error is virtually inevitable. If you are working in an integer format, e.g., 16-bit, it is almost impossible to avoid round-off error. The effects of round-off will be minor but ugly. Eventually these errors will accumulate and become noticeable. The example above will definitely have problems with round-off error.</li></ul>As long as studio quality isn't your goal, however, you can mix, adjust volume and do a variety of other basic operations without needing to worry too much.<br /><h3>Converting and Using Floating Point Samples</h3><div>If you need more powerful or flexible processing, you are probably going to want to convert your samples to floating point. Generally speaking, the nominal range used for audio when audio is represented as floating point numbers is [-1,1].</div><div><br /></div><div>You don't have to abide by this convention. If you like, you can simply convert your raw data to float by casting:</div><div><br /></div><div><span style="font-family: Courier New, Courier, monospace;">short s = ... // raw data</span></div><div><span style="font-family: Courier New, Courier, monospace;">float f = (float) s;</span></div><div><br /></div><div>But if you have some files that are 16-bit and some that are 24-bit or 8-bit, you will end up with unexpected results:</div><div><br /></div><div><span style="font-family: Courier New, Courier, monospace;">char d1 = ... 
//data from 8-bit file</span></div><div><span style="font-family: Courier New, Courier, monospace;">float f1 = (float) d1; </span><span style="font-family: 'Courier New', Courier, monospace;">// now in range [ -128, 127 ]</span></div><div><span style="font-family: Courier New, Courier, monospace;">short d2 = ... //data from 16-bit file</span></div><div><span style="font-family: Courier New, Courier, monospace;">float f2 = (float) d2; </span><span style="font-family: 'Courier New', Courier, monospace;">// now in range [ -32,768, 32,767 ]</span></div><div><br /></div><div>It's hard to know how to use <span style="font-family: Courier New, Courier, monospace;">f1</span> and <span style="font-family: Courier New, Courier, monospace;">f2</span> together since their ranges are so different. For example, if you want to mix the two, you most likely won't be able to hear the 8-bit file. This is why we usually scale audio into the [-1,1] range.</div><div><br /></div><div>There is much <a href="http://blog.bjornroche.com/2009/12/int-float-int-its-jungle-out-there.html">debate</a> about the right constants to use when scaling your integers, but it's hard to go wrong with this:</div><div><br /></div><div><span style="font-family: Courier New, Courier, monospace;">int i = ... //data from n-bit file</span></div><div><span style="font-family: Courier New, Courier, monospace;">float f = (float) i ;</span></div><div><span style="font-family: Courier New, Courier, monospace;">f /= M;</span></div><div><span style="font-family: Courier New, Courier, monospace;"><br /></span></div><div><span style="font-family: Times, Times New Roman, serif;">where </span><span style="font-family: Courier New, Courier, monospace;">M</span><span style="font-family: Times, Times New Roman, serif;"> is </span><span style="font-family: Courier New, Courier, monospace;">2^(n-1)</span><span style="font-family: Times, Times New Roman, serif;">. Now, </span><span style="font-family: Courier New, Courier, monospace;">f</span><span style="font-family: Times, Times New Roman, serif;"> is guaranteed to be in the range [-1,1]. After you've done your processing, you'll usually want to convert back. To do so, use the same constant and check for out-of-range values (note that <span style="font-family: Courier New, Courier, monospace;">-M</span> here means the same as -2^(n-1); remember that in C the ^ operator is XOR, not exponentiation, so don't write the power literally):</span></div><div><span style="font-family: Times, Times New Roman, serif;"><br /></span></div><div><span style="font-family: Courier New, Courier, monospace;">float f = ... // processed data</span></div><div><span style="font-family: Courier New, Courier, monospace;">f *= M;</span></div><div><span style="font-family: Courier New, Courier, monospace;">if( f &lt; -M ) f = -M;</span></div><div><span style="font-family: Courier New, Courier, monospace;">if( f &gt; M-1 ) f = M-1;</span></div><div><span style="font-family: Courier New, Courier, monospace;">i = (int) f;</span></div><h3><span style="font-family: Times, Times New Roman, serif;">Distortion and Noise</span></h3><div><span style="font-family: Times, Times New Roman, serif;">It's hard to avoid distortion and noise when processing audio. In fact, unless what you are doing is trivial or represents a special case, noise and/or distortion are inevitable. The key is to minimize them, but doing so is not easy. </span><span style="font-family: Times, 'Times New Roman', serif;">Broadly speaking, noise happens every time you are forced to round, and distortion happens when you change values nonlinearly. 
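</span><br /><br /><span style="font-family: Times, 'Times New Roman', serif;">To make the conversion code above concrete, here it is instantiated for 16-bit samples, where M = 2^15 = 32768. This is a minimal sketch; the constant and function names are mine:</span><br /><br /><span style="font-family: Courier New, Courier, monospace;">#define M_16BIT 32768.0f<br /><br />float short_to_float( short s ) {<br />&nbsp; &nbsp;return s / M_16BIT; // result is within [-1,1]<br />}<br /><br />short float_to_short( float f ) {<br />&nbsp; &nbsp;f *= M_16BIT;<br />&nbsp; &nbsp;if( f &lt; -M_16BIT ) f = -M_16BIT; // clip low<br />&nbsp; &nbsp;if( f &gt; M_16BIT - 1 ) f = M_16BIT - 1; // clip high<br />&nbsp; &nbsp;return (short) f;<br />}</span><br /><br /><span style="font-family: Times, 'Times New Roman', serif;">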
We potentially created distortion in the code where we converted from a float to an integer with a range check, because any values outside the range boundary would have been treated differently than values inside the range boundary. The more of the signal that is out of range, the more distortion this will introduce. We created noise in the code where we lowered the volume because we introduced round-off error when we divided by two. We also introduce noise when we convert from floating point to integer. In fact, many mathematical operations will introduce noise.</span></div><div><span style="font-family: Times, 'Times New Roman', serif;"><br /></span></div><div><span style="font-family: Times, 'Times New Roman', serif;">Any time you are working with integers, you need to watch out for overflows. For example, the following code will mix two input signals represented as arrays of shorts. We handle overflows in the same way we did above, by clipping (SHRT_MAX and SHRT_MIN come from limits.h):</span></div><div><span style="font-family: Times, 'Times New Roman', serif;"><br /></span></div><div><span style="font-family: Courier New, Courier, monospace;">short input1[] = ...//filled in elsewhere</span></div><div><span style="font-family: Courier New, Courier, monospace;">short input2[] = ...//filled in elsewhere</span></div><div><span style="font-family: Courier New, Courier, monospace;">// we are assuming input1 and input2 have size SIZE or greater</span></div><div><span style="font-family: Courier New, Courier, monospace;">short output[ SIZE ];</span></div><div><span style="font-family: Courier New, Courier, monospace;"><br /></span></div><div><span style="font-family: Courier New, Courier, monospace;">for( int i=0; i&lt;SIZE; ++i ) {</span></div><div><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;int tmp = (int)input1[i] + (int)input2[i];</span></div><div><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;if( tmp &gt; SHRT_MAX ) tmp = SHRT_MAX;</span></div><div><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;if( tmp &lt; SHRT_MIN ) tmp = SHRT_MIN;</span></div><div><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;output[i] = (short) tmp ;</span></div><div><span style="font-family: Courier New, Courier, monospace;">}</span></div><div><span style="font-family: Courier New, Courier, monospace;"><br /></span></div><div><span style="font-family: inherit;">If it so happens that the signal frequently "clips", then we will hear a lot of distortion. If we want to get rid of the distortion altogether, we can divide by 2 instead. 
This will reduce the output volume and introduce some round-off noise, but will solve the distortion problem:</span></div><div><div><span style="font-family: Courier New, Courier, monospace;"><br /></span></div><div><span style="font-family: Courier New, Courier, monospace;">for( int i=0; i&lt;SIZE; ++i ) {</span></div><div><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;int tmp = (int)input1[i] + (int)input2[i];</span></div><div><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;tmp /= 2;</span></div><div><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;output[i] = (short) tmp ;</span></div><div><span style="font-family: Courier New, Courier, monospace;">}</span></div></div><h3>Notes:</h3><div>A few final notes:</div><div><ul><li>For some reason, WAV files don't support a signed 8-bit format, so when reading and writing WAV files, be aware that 8 bits means unsigned, but in virtually all other cases it's safe to assume integers are signed.</li><li>Always remember to swap the bytes if the native endian-ness doesn't match the file endian-ness. You'll have to do this again before writing.</li><li>When reducing the resolution of data (e.g., casting from float to int, multiplying an integer by a non-integer, etc.), you are introducing noise because you are throwing out data. It might seem as though this will not make much difference, but it turns out that for sampled data in a time-series (like audio) it has a surprising impact. This impact is small enough that for simple audio applications you probably don't need to worry, but for anything studio-quality you will want to understand something called <a href="http://en.wikipedia.org/wiki/Dither">dither</a>, which is the only correct way to solve the problem.</li><li>You may have come across <a href="http://www.vttoth.com/CMS/index.php/technical-notes/68">one of these</a> <a href="http://atastypixel.com/blog/how-to-mix-audio-samples-properly-on-ios/" rel="nofollow">unfortunate posts</a>, which claim to have found a better way to mix two audio signals. Here's the thing: there is no secret, magical formula that allows you to mix two audio signals and keep them both at the same original volume, but have the mix still be within the same bounds. The correct formula for mixing two signals is the one I described. If volume is a problem, you can either turn up the master volume control on your computer/phone/amplifier/whatever or use some kind of processing like a <a href="http://en.wikipedia.org/wiki/Dynamic_range_compression">limiter</a>, which will also degrade your signal, but not as badly as the formula in those posts, which produces a terrible kind of distortion (<a href="http://en.wikipedia.org/wiki/Ring_modulation">ring modulation</a>).</li></ul></div>Bjorn Rochehttp://www.blogger.com/profile/17072425815152893296noreply@blogger.com26tag:blogger.com,1999:blog-7225698277211840079.post-70429813416482985482012-11-27T08:51:00.002-08:002012-11-27T08:51:57.548-08:00Audio IIR v FIR EQs<br />Digital filters come in two flavors: IIR (or "Infinite Impulse Response") and FIR (or "Finite Impulse Response"). 
Those complex acronyms may confuse you, so let's shed a little light on the situation by defining both and explaining the differences.<br /><br />Some people are interested in which is better. Unfortunately, as with many things, there is no easy answer to that question, other than "it depends", and sometimes what it depends on is your ears. I won't stray too deep into the field of opinions, but I will try to mention why some people claim one is better than the other and what some of the advantages and disadvantages are in different situations.<br /><h3>How Filters Work</h3>When you design a filter, you start with a set of specifications. To audio engineers, this might be a bit vague, like "boost 1 kHz by 3 dB", but electrical engineers are usually trained to design filters with very specific constraints. However you start, there's usually some long set of equations and rules used to "design" the filter, depending on what type of filter you are designing and what the specific constraints are (to see one way you might design a filter, see this post on <a href="http://blog.bjornroche.com/2012/08/basic-audio-eqs.html">audio eq design</a>). Once the filter is "designed" you can actually process audio samples.<br /><h3>IIR Filters</h3>Once the filter is designed, the filter itself is implemented as a difference equation, like this:<br /><br />&nbsp; &nbsp; y[i] = a0 * x[i] + a1 * x[i-1] + ... + a<i>n</i> * x[i-n] - b1 * y[i-1] - ... - b<i>m</i> * y[i-m]<br /><br />In this case, y is an array storing the output, and x is an array storing the input. Note that each output is a linear function of previous inputs and outputs, as well as the current input.<br /><br />In order to know the current value of y, we need to know the last value of y, and to know that, you must know the value of still earlier values of y, and so on, all the way back until we reach our initial conditions. For this reason, this kind of filter is sometimes called a "recursive" filter. In principle, this filter can be given a finite input, and it will produce output forever. Because its response is infinite, we call this filter an IIR, or "Infinite Impulse Response" filter.<br /><br />(To further confuse the terminology, IIR filters are often designed with certain constraints that make them "minimum phase." While IIR filters are not all minimum phase, many people use the terms "recursive", "IIR" and "minimum phase" interchangeably.)<br /><br />Digital IIR filters are often modeled after analog filters. In many ways, analog-modeled IIR filters sound like analog filters. They are very efficient, too: for audio purposes, they usually only require a few multiplies.<br /><h3>FIR Filters</h3>FIR filters, on the other hand, are usually implemented with a difference equation that looks like this:<br /><br />&nbsp; &nbsp; y[i] = a0 * x[i] + a1 * x[i-1] + a2 * x[i-2] + ... + a<i>n</i> * x[i-n] + a<i>n</i> * x[i-n-1] + ... + a1 * x[i-2n] + a0 * x[i-2n-1]<br /><br />In this case, we don't use previous outputs: in order to calculate the current output, we only need to know a finite number of previous inputs. This may improve the numerical stability of the filter because roundoff errors are not accumulated inside the filter. 
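<br /><br />To make the two structures concrete, here is each difference equation written out as a bare C routine. This is only a sketch — the coefficient values come from the design step and are left undefined here — and it follows this post's naming, with the a's on the inputs and the b's on the outputs:<br /><br /><span style="font-family: Courier New, Courier, monospace;">// coefficients, computed when the filter is "designed" (not shown)<br />static float a0, a1, a2, b1, b2;<br /><br />// second order IIR ("biquad"): five multiplies per sample,<br />// plus memory of past inputs and outputs<br />float iir_biquad( float x0 )<br />{<br />&nbsp; &nbsp;static float x1, x2, y1, y2;<br />&nbsp; &nbsp;const float y0 = a0*x0 + a1*x1 + a2*x2 - b1*y1 - b2*y2;<br />&nbsp; &nbsp;x2 = x1; x1 = x0;<br />&nbsp; &nbsp;y2 = y1; y1 = y0;<br />&nbsp; &nbsp;return y0;<br />}<br /><br />// FIR: one multiply per tap, inputs only. x points at the current<br />// sample, so x[-k] is the sample from k samples ago.<br />float fir( const float *x, const float *a, int taps )<br />{<br />&nbsp; &nbsp;float y = 0;<br />&nbsp; &nbsp;for( int k=0; k&lt;taps; ++k )<br />&nbsp; &nbsp; &nbsp; y += a[k] * x[-k];<br />&nbsp; &nbsp;return y;<br />}</span><br /><br />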
However, generally speaking, FIR filters are much more CPU intensive for a comparable response, and have some other problems, such as high latency and both pass-band and stop-band ripple.<br /><br />If an FIR filter can be implemented using a difference equation that is symmetrical, like the one above, it has a special property called "linear phase." Linear phase filters delay all frequencies in the signal by the same amount, which is not possible with IIR filters.<br /><h3>Which Filter?</h3><div>When deciding which filter to use, there are many things to take into account. Here are some of those things:</div><br /><ul><li>Some people feel that linear phase FIR filters sound more natural and have fewer "artifacts".</li><li>FIR filters are usually much more processor intensive for the same response.</li><li>FIR filters have "ripple" in both the passband and stopband, meaning the response is "jumpy". IIR filters can be designed without any ripple.</li><li>IIR filters can be easily designed to sound like analog filters.</li><li>IIR filters require careful design to ensure stability and good numerical error properties; however, that art is fairly advanced.</li><li>FIR filters generally have higher latency.</li></ul><br />Bjorn Rochehttp://www.blogger.com/profile/17072425815152893296noreply@blogger.com0tag:blogger.com,1999:blog-7225698277211840079.post-4228689315563430952012-09-08T08:57:00.002-07:002012-09-11T09:10:38.982-07:00Compiling libjingle on OS X I recently spent the day (yes, the entire day) compiling libjingle on OS X. I'm still running OS X 10.6.8, so that may have been part of the problem, but there are clearly some deeper issues. I thought I'd document the changes I had to make to the compilation instructions in case anyone else (like me in the future) has to go through this nightmare.<br /><br />First off, the package includes compilation instructions in the README file. This file has some organizational issues (for example, the dependencies expat and srtp are not listed under the "prerequisites" section, but rather the "libjingle" section) and does not account for some bugs I found, but otherwise includes some pretty good detail. Unfortunately, all the "examples" they give are for Windows, so I imagine that's where all the development and testing is done. Still, you need to read it. This post is just an outline and only goes into detail where the README doesn't explain things.<br /><br />Also, there's no longer an active mailing list to go to ask questions, which is sad because that would be a good place to bring these issues up (there are already bugs posted for most of the fixes). It also makes me think maybe libjingle is dead or on critical life-support. (The mailing list linked from <a href="https://developers.google.com/talk/libjingle/">the developer's page</a> is currently non-existent, and the link from <a href="http://googletalk.blogspot.com/">their blog</a> to the "google talk help center" goes to archive.org!) If you need help, your best bet is probably <a href="http://stackoverflow.com/">stackoverflow.com</a>, which is a great place to go for help, but it's no substitute for a mailing list.<br /><h3>Compiling libjingle</h3><ol><li>Download and extract libjingle from the <a href="http://code.google.com/p/libjingle/">google code</a> page. I used 0.6.14 for this.</li><li>Be sure to extract it somewhere without any weird characters in the path (including spaces) or the build will barf.</li><li>Create a makefile (below) at the top level of libjingle. 
This will be especially useful in case you need to run the build over and over again as you tweak things.</li><li>Install the prerequisites (see the README for more details)</li><ol><li>Python should already be installed</li><li>To install scons, I recommend homebrew: <span style="font-family: Courier New, Courier, monospace;">$ brew install scons</span></li><li><span style="font-family: Courier New, Courier, monospace;"><span style="font-family: Times;">download swtoolkit and extract it as </span>talk/third_party/swtoolkit</span></li><li><span style="font-family: Courier New, Courier, monospace;"><span style="font-family: Times;">download gtest. extract it as </span>talk/third_party/gtest</span></li><li><span style="font-family: Courier New, Courier, monospace;"><span style="font-family: Times;">download expat 2.0.1. extract as </span>talk/third_party/expat-2.0.1</span></li><li><span style="font-family: Courier New, Courier, monospace;"><span style="font-family: Times;">download srtp and extract as </span>talk/third_party/srtp</span></li></ol><li>Apply the following fixes:</li><ol><li>Fix talk/third_party/swtoolkit/site_scons/site_init.py as described <a href="http://code.google.com/p/libjingle/issues/detail?can=2&amp;start=0&amp;num=100&amp;q=&amp;colspec=ID%20Type%20Status%20Priority%20Milestone%20Owner%20Summary&amp;groupby=&amp;sort=&amp;id=229">here</a> and <a href="http://stackoverflow.com/questions/5238953/problem-compiling-libjingle">here</a>.</li><li>Fix talk/libjingle.scons as described <a href="http://code.google.com/p/libjingle/issues/detail?id=184#makechanges">here</a>.</li><li>Make the following two changes to talk/main.scons:</li><ol><li>comment out the line that has '-fno-rtti' in it (if you are running a newer version of OS X, and up-to-date dev tools, you may not need to do this.)</li><li>Apply the fix described <a href="http://stackoverflow.com/questions/8039343/link-error-when-build-libjingle-on-mac-os-x-10-7-2">here</a>. A logical place to add the mac_env.Replace(...) is&nbsp;after mac_env.Append( … ).</li></ol></ol><li>Holy crap! You did it! 
It should now build with <span style="font-family: Courier New, Courier, monospace;">$ make</span></li><li><span style="font-family: inherit;">If you get stuck, you may get a hint from </span><span style="font-family: Courier New, Courier, monospace;">$ make verbose</span></li><li><span style="font-family: inherit;">To compile 64-bit binaries, you need to do a few more things:</span></li><ol><li>Comment out "session/phone/carbonvideorenderer.cc" from libjingle.scons, and 'Carbon' from main.scons.</li><li>Change '-arch', 'i386', to '-arch', 'x86_64', in two places in main.scons</li><li>Though the build will terminate with errors, you should at least have the .a files you need.</li></ol></ol><br /><span style="font-family: Courier New, Courier, monospace;">======= Makefile ========</span><br /><span style="font-family: Courier New, Courier, monospace;"><br /></span><span style="font-family: Courier New, Courier, monospace;">SCONS_DIR ?= /usr/local/Cellar/scons/2.2.0/libexec/scons-local/</span><br /><span style="font-family: Courier New, Courier, monospace;">export</span><br /><span style="font-family: Courier New, Courier, monospace;"><br /></span><span style="font-family: Courier New, Courier, monospace;">default: build</span><br /><span style="font-family: Courier New, Courier, monospace;">talk/third_party/expat-2.0.1/Makefile:</span><br /><span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>cd talk/third_party/expat-2.0.1 &amp;&amp; ./configure</span><br /><span style="font-family: Courier New, Courier, monospace;">talk/third_party/srtp/Makefile:</span><br /><span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>cd talk/third_party/srtp &amp;&amp; ./configure</span><br /><span style="font-family: Courier New, Courier, monospace;"><br /></span><span style="font-family: Courier New, Courier, monospace;">build: talk/third_party/expat-2.0.1/Makefile talk/third_party/srtp/Makefile</span><br /><span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>cd talk &amp;&amp; third_party/swtoolkit/hammer.sh</span><br /><span style="font-family: Courier New, Courier, monospace;"><br /></span><span style="font-family: Courier New, Courier, monospace;">verbose: talk/third_party/expat-2.0.1/Makefile talk/third_party/srtp/Makefile</span><br /><span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>cd talk &amp;&amp; third_party/swtoolkit/hammer.sh --verbose</span><br /><span style="font-family: Courier New, Courier, monospace;"><br /></span><span style="font-family: Courier New, Courier, monospace;">help:</span><br /><span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>~/bin/swtoolkit/hammer.sh --help</span><br /><span style="font-family: Courier New, Courier, monospace;"><br /></span><span style="font-family: Courier New, Courier, monospace;">clean:</span><br /><span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>cd talk &amp;&amp; third_party/swtoolkit/hammer.sh --clean</span><br /><div><br />UPDATE: notes on 64-bit build.</div>Bjorn 
Rochehttp://www.blogger.com/profile/17072425815152893296noreply@blogger.com0tag:blogger.com,1999:blog-7225698277211840079.post-13632060351092459672012-08-23T21:12:00.002-07:002012-08-25T21:16:15.564-07:00Basic Audio EQs<a href="http://blog.bjornroche.com/2012/08/why-eq-is-done-in-time-domain.html">In my last post</a>, I looked at why it's usually better to do EQ (or filtering) in the time domain than the frequency domain as far as audio is concerned, but I didn't spend much time explaining how you might implement a time-domain EQ. That's what I'm going to do now.<br /><br />The theory behind time-domain filters could fill a book. Instead of trying to cram you full of theory, we'll just skip ahead to what you need to know to do it. I'll assume you already have some idea of what a filter is.<br /><h3>Audio EQ Cookbook</h3>The <a href="http://www.musicdsp.org/files/Audio-EQ-Cookbook.txt">Audio EQ Cookbook</a> by Robert Bristow-Johnson is a great, albeit very terse, description of how to build basic audio EQs. These EQs can be described as second order digital filters, sometimes called "<a href="http://en.wikipedia.org/wiki/Digital_biquad_filter">biquads</a>" because the equation that describes them contains two quadratics. In audio, we sometimes use other kinds of filters, but second order filters are a real workhorse. First order filters don't do much: they generally just allow us to adjust the overall balance of high and low frequencies. This can be useful in "tone control" circuits, like you might find on some stereos and guitars, but not much else. Second order filters give us more control -- we can "dial in" a specific frequency, or increase or decrease frequencies above and below a certain threshold, with a fair degree of accuracy, for example. If we need even more control than a second order filter offers, we can often simply take several second order filters and place them in series to simulate the effect of a single higher order filter.<br /><br />Notice I said series, though. Don't try putting these filters in parallel: they not only alter the frequency response, but also the phase response, so when you put them in parallel you might get unexpected results. For example, if you take a so-called all-pass filter and put it in parallel with no filter, the result will not be a flat frequency response, even though you've combined the output of two signals that have the same frequency response as the original signal.<br /><br />Using the Audio EQ Cookbook, we can design a peaking, high-pass, low-pass, band-pass, notch (or band-stop), or shelving filter. These are the basic filters used in audio. We can even design that crazy all-pass filter I mentioned, which actually does come in handy if you are building a phaser. (It has other uses, too, but that's for another post.)<br /><h3>Bell Filter</h3>Let's design a "bell", or "peaking" filter using RBJ's cookbook. Most other filters in the cookbook are either similar to the bell or simpler, so once you understand the bell, you're golden. To start with, you will need to know the sample rate of the audio going into and coming out of your filter, and the center frequency of your filter.
The center frequency, in the case of the bell filter, is the frequency that is "most affected" by your filter. You will also want to define the width of the filter, which can be done in a number of ways, usually with some variation on "Q" or "quality factor" and "bandwidth". RBJ's filters define bandwidth in octaves, and you want to be careful that you don't extend the top of the bandwidth above the Nyquist frequency (or 1/2 the sample rate), or your filter won't work. We also need to know how much gain to add at our center frequency, in dB (to remove, we just use a negative value, and for no change, we use 0).<br /><br /><span style="font-family: Courier New, Courier, monospace;">Fs = Sample Rate</span><br /><span style="font-family: Courier New, Courier, monospace;">f0 = Center Frequency (always less than Fs/2)</span><br /><span style="font-family: Courier New, Courier, monospace;">BW = Bandwidth in octaves</span><br /><span style="font-family: Courier New, Courier, monospace;">g = gain in dB</span><br /><br />Great! Now we are ready to begin our calculations. First, RBJ suggests calculating some intermediate values:<br /><br /><span style="font-family: Courier New, Courier, monospace;">A = 10^(g/40)</span><br /><span style="font-family: Courier New, Courier, monospace;">w0 = 2*pi*f0/Fs</span><br /><span style="font-family: Courier New, Courier, monospace;">c = cos(w0)</span><br /><span style="font-family: Courier New, Courier, monospace;">s = sin(w0)</span><br /><span style="font-family: Courier New, Courier, monospace;">alpha = s*sinh( ln(2)/2 * BW * w0/s )</span><br /><br />This is a great chance to use that hyperbolic sin button on your scientific calculator that, until now, has only been collecting dust. Now that we've done that, we can finally calculate the filter coefficients, which we use when actually processing data:<br /><br /><span style="font-family: Courier New, Courier, monospace;">b0 = 1 + alpha*A</span><br /><span style="font-family: Courier New, Courier, monospace;">b1 = -2*c</span><br /><span style="font-family: Courier New, Courier, monospace;">b2 = 1 - alpha*A</span><br /><span style="font-family: Courier New, Courier, monospace;">a0 = 1 + alpha/A</span><br /><span style="font-family: Courier New, Courier, monospace;">a1 = -2*c</span><br /><span style="font-family: Courier New, Courier, monospace;">a2 = 1 - alpha/A</span><br /><br />Generally speaking, we want to "normalize" these coefficients, so that <span style="font-family: Courier New, Courier, monospace;">a0 = 1</span>.
We can do this by dividing each coefficient by <span style="font-family: Courier New, Courier, monospace;">a0</span>. Do this in advance or the electrical engineers will laugh at you:<br /><br /><span style="font-family: Courier New, Courier, monospace;">b0 /= a0</span><br /><span style="font-family: Courier New, Courier, monospace;">b1 /= a0</span><br /><span style="font-family: Courier New, Courier, monospace;">b2 /= a0</span><br /><span style="font-family: Courier New, Courier, monospace;">a1 /= a0</span><br /><span style="font-family: Courier New, Courier, monospace;">a2 /= a0</span><br /><br />Now, in pseudocode, here's how we process our data, one sample at a time, using a "process" function that looks something like this:<br /><br /><span style="font-family: Courier New, Courier, monospace;">number xmem1, xmem2, ymem1, ymem2;</span><br /><span style="font-family: Courier New, Courier, monospace;"><br /></span><span style="font-family: Courier New, Courier, monospace;">void reset() {</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;xmem1 = xmem2 = ymem1 = ymem2 = 0;</span><br /><span style="font-family: Courier New, Courier, monospace;">}</span><br /><span style="font-family: Courier New, Courier, monospace;">number process( number x ) {</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;number y = b0*x + b1*xmem1 + b2*xmem2 - a1*ymem1 - a2*ymem2;</span><br /><span style="font-family: Courier New, Courier, monospace;"><br /></span><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;xmem2 = xmem1;</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;xmem1 = x;</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;ymem2 = ymem1;</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;ymem1 = y;</span><br /><span style="font-family: Courier New, Courier, monospace;"><br /></span><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;return y;</span><br /><span style="font-family: Courier New, Courier, monospace;">}</span><br /><br />You'll probably have some kind of loop that your process function goes in, since it will get called once for each audio sample.<br /><br />There's actually more than one way to implement the process function given that particular set of coefficients. This implementation is called "Direct Form I" and happens to work pretty darn well most of the time.
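<br /><br />If it helps to see the whole recipe in one place, here is a minimal C sketch consolidating the formulas above. It's a sketch only -- the function name and the use of globals are mine, not RBJ's:<br /><br /><span style="font-family: Courier New, Courier, monospace;">#include &lt;math.h&gt; /* pow, cos, sin, sinh, log (natural log), M_PI */</span><br /><span style="font-family: Courier New, Courier, monospace;"><br /></span><span style="font-family: Courier New, Courier, monospace;">/* A sketch consolidating RBJ's bell-filter recipe above; the</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;process() pseudocode earlier reads these globals. */</span><br /><span style="font-family: Courier New, Courier, monospace;">double b0, b1, b2, a1, a2;</span><br /><span style="font-family: Courier New, Courier, monospace;"><br /></span><span style="font-family: Courier New, Courier, monospace;">void computeBellCoeffs( double Fs, double f0, double BW, double g )</span><br /><span style="font-family: Courier New, Courier, monospace;">{</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;double A&nbsp; &nbsp; &nbsp;= pow( 10, g / 40 );&nbsp; /* A = 10^(g/40) */</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;double w0&nbsp; &nbsp; = 2 * M_PI * f0 / Fs;</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;double c&nbsp; &nbsp; &nbsp;= cos( w0 );</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;double s&nbsp; &nbsp; &nbsp;= sin( w0 );</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;double alpha = s * sinh( log(2)/2 * BW * w0 / s );</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;double a0&nbsp; &nbsp; = 1 + alpha / A;&nbsp; /* kept only for normalization */</span><br /><span style="font-family: Courier New, Courier, monospace;"><br /></span><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;/* the coefficients, pre-normalized so a0 = 1 */</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;b0 = ( 1 + alpha * A ) / a0;</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;b1 = ( -2 * c ) / a0;</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;b2 = ( 1 - alpha * A ) / a0;</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;a1 = ( -2 * c ) / a0;</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;a2 = ( 1 - alpha / A ) / a0;</span><br /><span style="font-family: Courier New, Courier, monospace;">}</span><br /><br />You would call something like this once whenever the filter settings change, then push each incoming sample through the process function above.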
"Direct form II" has some admirers, but those people are either suffering from graduate-school-induced trauma or actually have some very good reason for doing what they are doing that in all </span></span><span style="white-space: pre-wrap;">likelihood</span><span style="font-family: inherit;"><span style="white-space: pre-wrap;"> does not apply to you. There are of course other implementations, but DFI is a good place to start.</span></span><br /><span style="white-space: pre-wrap;"><span style="font-family: inherit;"><br /></span></span><span style="white-space: pre-wrap;"><span style="font-family: inherit;">You may have noticed that the output of the filter, </span><span style="font-family: Courier New, Courier, monospace;">y</span><span style="font-family: inherit;">, is stored and used as an input to future iterations. The filter is therefore "recursive". This has several implications:</span></span><br /><br /><ul><li><span style="white-space: pre-wrap;"><span style="font-family: inherit;">The filter is fairly sensitive to errors in the recursive values and coefficients</span></span><span style="white-space: pre-wrap;"><span style="font-family: inherit;">. Because of this, we need to take care of what happens with the error in our </span><span style="font-family: Courier New, Courier, monospace;">y</span><span style="font-family: inherit;"> values. In practice, on computers, we usually just need to use a high resolution floating point value (ie double precision) to store these (on fixed point hardware, it is often another matter).</span></span></li><li><span style="white-space: pre-wrap;"><span style="font-family: inherit;">Another issue is that you can't just blindly set the values of your </span></span><span style="white-space: pre-wrap;">coefficients</span><span style="font-family: inherit;"><span style="white-space: pre-wrap;">, or your filter may become unstable. Fortunately, the coefficients that come out of RJB's equations always result in stable filters, but don't go messing around. For example, you might be tempted to interpolate coefficients from one set of values to another to simulate a filter sweep. Resist this temptation or you will unleash the numerical fury of hell! The values in between will be "unstable" meaning that your output will run off to infinity. Madness, </span></span><span style="white-space: pre-wrap;">delirium</span><span style="font-family: inherit;"><span style="white-space: pre-wrap;">, vomiting and broken speakers are often the unfortunate </span></span><span style="white-space: pre-wrap;">casualties</span><span style="font-family: inherit;"><span style="white-space: pre-wrap;">.</span></span></li><li><span style="font-family: inherit;"><span style="white-space: pre-wrap;">On some platforms you will have to deal with something called "denormal" numbers. This is a major <a href="http://musicdsp.org/files/denormal.pdf">pain in the ass</a>, I'm sorry to say. Basically it means our performance will be between 10 and 100 times worse than it should be because the CPU is busy calculating tiny numbers you don't care about. This is one of the rare cases where I would advocate optimizing before you measure a problem because sometimes your code moves around and it comes up and it's very hard to trace this issue. 
In this case, the easiest solution is probably to do something like this (imagine we are in C for a moment):</li></ul><br /><span style="font-family: Courier New, Courier, monospace;">#define IS_DENORMAL(f) (((*(unsigned int *)&amp;(f))&amp;0x7f800000) == 0)</span><br /><span style="font-family: Courier New, Courier, monospace;">float xmem1, xmem2, ymem1, ymem2;</span><br /><span style="font-family: Courier New, Courier, monospace;"><br /></span><span style="font-family: Courier New, Courier, monospace;">void reset() {</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;xmem1 = xmem2 = ymem1 = ymem2 = 0;</span><br /><span style="font-family: Courier New, Courier, monospace;">}</span><br /><span style="font-family: Courier New, Courier, monospace;">float process( float x ) {</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;float y = b0*x + b1*xmem1 + b2*xmem2 - a1*ymem1 - a2*ymem2;</span><br /><span style="font-family: Courier New, Courier, monospace;"><br /></span><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;if( IS_DENORMAL( y ) )</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp; &nbsp; y = 0;</span><br /><span style="font-family: Courier New, Courier, monospace;"><br /></span><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;xmem2 = xmem1;</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;xmem1 = x;</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;ymem2 = ymem1;</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;ymem1 = y;</span><br /><span style="font-family: Courier New, Courier, monospace;"><br /></span><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;return y;</span><br /><span style="font-family: Courier New, Courier, monospace;">}</span><br /><br />Okay, happy filtering!Bjorn Rochehttp://www.blogger.com/profile/17072425815152893296noreply@blogger.com21tag:blogger.com,1999:blog-7225698277211840079.post-47807127501467926262012-08-08T14:58:00.000-07:002012-09-28T10:21:29.455-07:00Why EQ Is Done In the Time Domain<a href="http://blog.bjornroche.com/2012/08/when-to-not-use-fft.html">In my last post</a>, I discussed how various audio processing may be best done in the frequency or time domain.
Specifically, I suggested that EQ, which is a filter that alters the frequency balance of a signal, is best done in the time domain, not the frequency domain. (See my next post if you want to learn <a href="http://blog.bjornroche.com/2012/08/basic-audio-eqs.html">how to implement a time-domain filter</a>.)<br /><br />If this seems counterintuitive to you, rest assured you are not alone. I've been following the "audio" and "FFT" tags (among others) on <a href="http://www.stackoverflow.com/">stack overflow</a>&nbsp;and it's clear that many people attempt to implement EQs in the frequency domain, only to find that they run into a variety of problems.<br /><h2>Frequency Domain Filters</h2>Let's say you want to eliminate or reduce high frequencies from your signal. This is called a "low-pass" filter, or, less&nbsp;commonly, a "high-cut" filter. In the frequency domain, high frequencies get "sorted" into designated "bins", where you can manipulate them or even set them to zero. This seems like an ideal way to do low-pass filtering, but let's explore the process to see why it might not work out so well.<br /><br />Our first attempt at a low-pass filter, implemented with the FFT, might look something like this:<br /><ul><li>loop on audio input</li><li>if enough audio is received, perform FFT, which gives us audio in the frequency domain</li><ul><li>in frequency domain, perform manipulations we want. In the case of&nbsp;eliminating&nbsp;high frequencies, we set the bins representing high frequencies to 0.</li><li>perform inverse FFT, to get audio back in time domain</li><li>output that chunk of audio</li></ul></ul><br />But there are quite a few problems with that approach:<br /><ul><li>We must wait for a chunk of audio before we can even begin processing, which means that we will incur latency in our processing. The higher quality filter we want, the more audio we need to wait for. If the input buffer size does not match the FFT size, extra buffering needs to be done.</li><li>The FFT, though efficient compared to the DFT (which is the FFT without the "fast" part), performs worse than linear time, and we need to do both the FFT and its inverse, which is computationally similar. EQing with the FFT is therefore generally very inefficient compared to comparable time-domain filters.</li><li>Because our output chunk has been processed in the frequency domain independent of samples in neighboring chunks, the audio in neighboring chunks may not be continuous. One solution is to process the entire file as one chunk (which only works for offline, rather than real-time processing, and is computationally expensive). The better solution is the <a href="http://en.wikipedia.org/wiki/Overlap%E2%80%93add_method">OLA or Overlap Add method</a>, but this involves complexity that many people miss when implementing a filter this way.</li><li>Filters implemented via FFT, as well as time-domain filters designed via the inverse FFT, often do not perform the way people expect. For example, many people expect that if they set all values in bins above a certain frequency to 0, then all frequencies above the given frequency will be eliminated. This is not the case. Instead, frequency responses <i>at</i> the bin values will be 0, but the frequency response <i>between</i> those values is free to fluctuate -- and it does fluctuate, often greatly. This fluctuation is called "ripple."
There are techniques for reducing ripple but they are complex, and they don't eliminate ripple. Note that, in general, frequencies across the entire spectrum are subject to ripple, so even just manipulating a small frequency band may create ripple across the entire frequency spectrum.</li><li>FFT filters suffer from so-called "pre-echo", where the sounds can be heard before the main sound hits. In and of itself, this isn't really a problem, but sounds are "smeared" so badly by many designs that many in the audio world feel that these filters can affect the impact of transients and stereo imaging if not implemented and used correctly.</li></ul>So it's clear that FFT filters may not be right, or if they are, they involve much more complexity than many people first realize.<br /><br />As a side note, one case where it might be worth all that work is a special case of so-called FIR filters (specifically, the linear-phase variety). These are used sometimes in audio production and in other cases. In audio, they are usually used only in mastering because of their high latency and computational cost, but even then, many engineers don't like them (while others swear by them). FIR filters are best implemented in the time domain, as well, until the number of "taps" in the filter becomes enormous, which it sometimes does, and it actually becomes more efficient to implement using an FFT with OLA. FIR filters suffer from many of the problems mentioned above, including pre-echo, high computational cost and latency, but they do have some acoustical properties that make them desirable in some applications.<br /><h2>Time Domain Filters</h2>Let's try removing high frequencies in the time domain instead. In the time domain, high frequencies are represented by the parts of the signal that change quickly, and low frequencies are represented as the parts that change slowly. One simple way to remove high frequencies, then, would be to use a moving average filter:<br /><br /><span style="font-family: Courier New, Courier, monospace;">y(n) = { x(n) + x(n-1) + .... + x(n-M) } / (M+1)</span><br /><br /><span style="font-family: inherit;">where&nbsp;</span><span style="font-family: Courier New, Courier, monospace;">x(i)</span><span style="font-family: inherit;">&nbsp;is your input sample at time&nbsp;</span><span style="font-family: Courier New, Courier, monospace;">i</span><span style="font-family: inherit;">, and&nbsp;</span><span style="font-family: Courier New, Courier, monospace;">y(i)</span><span style="font-family: inherit;">&nbsp;is your output sample at time&nbsp;</span><span style="font-family: Courier New, Courier, monospace;">i</span><span style="font-family: inherit;">. No FFT required for that. (This is not the best filter for removing high frequencies -- in fact we can do WAY better -- but it is my favorite way to illustrate the point, and the moving average filter is not uncommon in economics, image processing and other fields partly for this reason. A small C sketch follows the list below.) Several advantages are immediately obvious, and some are not so obvious:</span><br /><ul><li><span style="font-family: inherit;">Each input sample can be processed one at a time to produce one output sample without having to chunk or wait for more audio. Therefore, there are also no continuity issues and minimal latency.</span></li><li><span style="font-family: inherit;">It is extremely efficient, with only a few multiplies, adds and memory stores/retrievals required per sample.</span></li><li><span style="font-family: inherit;">These filters can be designed to closely mimic analog filters.</span></li></ul>
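Here is the C sketch promised above -- my own illustration of the moving average formula, with M = 3, not code from any particular library:<br /><br /><span style="font-family: Courier New, Courier, monospace;">/* My illustration of the moving average above, with M = 3. */</span><br /><span style="font-family: Courier New, Courier, monospace;">static float history[3]; /* the last M inputs: x(n-1), x(n-2), x(n-3) */</span><br /><span style="font-family: Courier New, Courier, monospace;"><br /></span><span style="font-family: Courier New, Courier, monospace;">float movingAverage( float x )</span><br /><span style="font-family: Courier New, Courier, monospace;">{</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;float y = ( x + history[0] + history[1] + history[2] ) / 4;</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;/* shift the history so the newest input is first */</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;history[2] = history[1];</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;history[1] = history[0];</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;history[0] = x;</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;return y;</span><br /><span style="font-family: Courier New, Courier, monospace;">}</span><br /><br />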
A major disadvantage is that it is not immediately obvious how to design a high-quality filter in the time domain. In fact, it can take some serious math to do so. It's also worth noting that many time-domain filters, like frequency domain filters, also suffer from ripple, but for many design methods, this ripple is well defined and can be limited in various ways.<br /><br />In the end, the general rule is that for a given amount of processing power, you can get much better results in the time domain than in the frequency domain.Bjorn Rochehttp://www.blogger.com/profile/17072425815152893296noreply@blogger.com8tag:blogger.com,1999:blog-7225698277211840079.post-17639118610607295922012-08-04T09:15:00.003-07:002012-08-04T09:15:47.475-07:00When to (not) use the FFT<a href="http://blog.bjornroche.com/2012/07/frequency-detection-using-fft-aka-pitch.html">In the last post</a> I discussed one use for the FFT: pitch tracking. I also mentioned that there were better ways to do pitch tracking. Indeed, aside from improvements on that method, you could also use entirely different methods that don't rely on the FFT at all.
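<br /><br />To give one concrete example of such a method (my illustration, not something from the original post): for a clean, simple waveform, you can estimate pitch entirely in the time domain just by counting rising zero crossings:<br /><br /><span style="font-family: Courier New, Courier, monospace;">/* My illustration, not from the post: a naive non-FFT pitch</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;estimate. Only works for clean, simple signals. */</span><br /><span style="font-family: Courier New, Courier, monospace;">float zeroCrossingPitch( const float *x, int n, float sampleRate )</span><br /><span style="font-family: Courier New, Courier, monospace;">{</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;int crossings = 0;</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;for( int i = 1; i &lt; n; ++i )</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp; &nbsp; if( x[i-1] &lt;= 0 &amp;&amp; x[i] &gt; 0 )</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;++crossings;</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;/* one rising crossing per cycle, over n/sampleRate seconds */</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;return crossings * sampleRate / n;</span><br /><span style="font-family: Courier New, Courier, monospace;">}</span><br /><br />Real-world signals are rarely that clean, so real trackers are more sophisticated, but it makes the point: no FFT in sight.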
<br /><br />The FFT transforms data into the "frequency domain", or, if your data is broken down into chunks, the FFT transforms it into the "time-frequency domain," which we often still think of as the frequency domain. However, the most basic "domain" you can work in is usually the "time domain." In the time domain, audio is represented as a sequence of amplitude values. You may know this as "PCM" audio. This is what's usually stored in WAVs and AIFs, and when we access audio devices like soundcards, this is the most natural way to transfer data. It turns out we can also do a whole lot of processing and analysis in the time domain as well.<br /><br /><table bgcolor="#dddddd" border="0" cellpadding="4" cellspacing="0" frame="hsides" width="100%"><tbody><tr><th>Process</th><th>Time Domain</th><th>Frequency Domain</th></tr><tr><td>Filtering/<br />EQ</td><td bgcolor="#ddffdd">Yes!</td><td bgcolor="#ffdddd">No!</td></tr><tr><td>Pitch Shifting</td><td bgcolor="#ffffdd">Okay</td><td bgcolor="#ffffdd">Okay</td></tr><tr><td>Pitch Tracking</td><td bgcolor="#ffffdd">Okay</td><td bgcolor="#ffffdd">Okay</td></tr><tr><td>Reverb<br />(Simulated)</td><td bgcolor="#ddffdd">Yes!</td><td bgcolor="#ffdddd">No!</td></tr><tr><td>Reverb<br />(Impulse)</td><td bgcolor="#ffdddd">No!</td><td bgcolor="#ddffdd">Yes!</td></tr><tr><td>Guitar effects<br />Chorus/flanger/distortion/etc</td><td bgcolor="#ddffdd">Yes!</td><td bgcolor="#ffdddd">No!</td></tr><tr><td>SR Conversion</td><td bgcolor="#ddffdd">Yes!</td><td bgcolor="#ffdddd">No!</td></tr><tr><td>Compression</td><td bgcolor="#ddffdd">Yes!</td><td bgcolor="#ffdddd">No!</td></tr><tr><td>Panning, Mixing, etc</td><td bgcolor="#ddffdd">Yes!</td><td bgcolor="#ffdddd">No!</td></tr></tbody><caption>Table 1: Recommendations for Audio Processing in the Time Domain vs. the Frequency Domain</caption></table><br /><br />Wow, so impulse reverb is really the only thing on that list you need an FFT for? Actually, even that can be done in the time domain; it's just much more efficient in the frequency domain (so much more efficient that doing it in the time domain might be considered impractical).<br /><br />You might wonder how to adjust the frequency balance of a signal, which is what an EQ does, in the time domain rather than the frequency domain. Well, you <i>can</i> do it in the frequency domain, but you are asking for trouble. I'll talk about this in my next post.Bjorn Rochehttp://www.blogger.com/profile/17072425815152893296noreply@blogger.com1tag:blogger.com,1999:blog-7225698277211840079.post-61514722129569357072012-07-22T18:45:00.001-07:002017-02-23T11:31:45.831-08:00Frequency detection using the FFT (aka pitch tracking) With Source Code<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody><tr><td style="text-align: center;"><a href="http://1.bp.blogspot.com/-4hK5HJz3Z3k/UelNne2ktPI/AAAAAAAAADI/k9WFYnx6KHY/s1600/new_fft2.gif" imageanchor="1" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" height="240" src="https://1.bp.blogspot.com/-4hK5HJz3Z3k/UelNne2ktPI/AAAAAAAAADI/k9WFYnx6KHY/s320/new_fft2.gif" width="320" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">It's not necessarily as simple as it seems to find the pitch<br />from an FFT. Some pre-processing is required as well<br />as some knowledge of how the data is organized.</td></tr></tbody></table>How to track pitch with the FFT seems to be a very commonly asked question on <a href="http://stackoverflow.com/">stack overflow</a>. Many people seem to think tracking pitch is as simple as putting your data into an FFT, and looking at the result. Unfortunately, this is not the case. Simply applying an FFT to your input, even if you know what size FFT to use, is not going to give you optimal results, although it might work in some cases.<br /><br />At the end of the day, the FFT is not actually the best method available for tracking or detecting the pitch of an audio signal. While it is possible to make a good pitch tracker using the FFT, doing it right requires a tremendous amount of work. The algorithm shown here works, and works pretty well, but if you need something that converges on the correct pitch really quickly, is very accurate, or tracks multiple notes simultaneously, you need something else.<br /><br />Still, you can create a decent pitch tracking algorithm that's reasonably easy to understand using the FFT. It doesn't require too much work, and I've explained it and provided code, in the form of a command-line C <a href="https://github.com/bejayoharen/guitartuner">guitar tuner</a> app which you can get from github. It compiles and runs on Mac OS X and you should be able to get it to run on other platforms without much trouble. If you want to port to other languages, that shouldn't be too hard either. It's worth noting that I specifically designed this app to be similar to the tuner described by Craig A.
Lindley in <a href="http://www.amazon.com/Digital-Audio-Java-Craig-Lindley/dp/0130876763">Digital Audio with Java</a>, so if you are looking for Java source code, you can check out his code (although there are differences between his code and mine).<br /><h2> The Big Picture</h2><div>To do our pitch detection, we basically loop on the following steps:</div><br /><ol><li>Read enough data to fill the FFT</li><li>Low-pass the data</li><li>Apply a window to the data</li><li>Transform the data using the FFT</li><li>Find the peak value in the transformed data</li><li>Compute the peak frequency from the index of the peak value in the transformed data</li></ol><br />This is the main processing loop for the tuner, with some stuff left out:<br /><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;while( running )</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;{</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp; &nbsp; // read some data</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp; &nbsp; err = Pa_ReadStream( stream, data, FFT_SIZE );</span><br /><span style="font-family: Courier New, Courier, monospace;"><br /></span><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp; &nbsp; // low-pass</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp; &nbsp; for( int j=0; j&lt;FFT_SIZE; ++j ) {</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;data[j] = processSecondOrderFilter( data[j], mem1, a, b );</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;data[j] = processSecondOrderFilter( data[j], mem2, a, b );</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp; &nbsp; }</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp; &nbsp; // window</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp; &nbsp; applyWindow( window, data, FFT_SIZE );</span><br /><span style="font-family: Courier New, Courier, monospace;"><br /></span><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp; &nbsp; // do the fft</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp; &nbsp; for( int j=0; j&lt;FFT_SIZE; ++j )</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;datai[j] = 0;</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp; &nbsp; applyfft( fft, data, datai, false );</span><br /><span style="font-family: Courier New, Courier, monospace;"><br /></span>
monospace;">&nbsp; &nbsp; &nbsp; //find the peak</span><br /><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;">&nbsp; &nbsp; &nbsp; float maxVal = -1;</span><br /><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;">&nbsp; &nbsp; &nbsp; int maxIndex = -1;</span><br /><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;">&nbsp; &nbsp; &nbsp; for( int j=0; j&lt;</span><span style="font-family: &quot;\22 courier new\22 &quot; , &quot;\22 courier\22 &quot; , monospace;">FFT_SIZE; ++j ) {</span><br /><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;float v = data[j] * data[j] + datai[j] * datai[j] ;</span><br /><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;if( v &gt; maxVal ) {</span><br /><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; maxVal = v;</span><br /><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; maxIndex = j;</span><br /><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;}</span><br /><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;">&nbsp; &nbsp; &nbsp; }</span><br /><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;">&nbsp; &nbsp; &nbsp; float freq = freqTable[maxIndex];</span><br /><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;">&nbsp; &nbsp; &nbsp; //...</span><br /><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;">&nbsp; }</span><br /><br />Let's go over each of the steps and see how they work.<br /><h2> Audio Data</h2>We always need to start with a sequence of numbers representing the amplitude of audio over time (sometimes called "Linear, PCM audio"). This is what we get from most uncompressed audio formats like AIFF and WAV. Its also what you get from audio APIs like ASIO, CoreAudio and ALSA. In this case, we are using <a href="http://portaudio.com/">PortAudio</a>, which acts like a portable wrapper around these and other APIs. If you have a compressed format such as MP3 or OGG, you will have to convert it to uncompressed audio first.<br /><br />Your data might be 16-bit integer, 8-bit integer, 32-bit floating point or any number of other formats. We'll assume you know how to get your data to floating point representation in the range from -1 to 1. 
PortAudio takes care of this for us when we specify these input parameters:<br /><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;inputParameters.device = Pa_GetDefaultInputDevice();</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;inputParameters.channelCount = 1;</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;inputParameters.sampleFormat = paFloat32;</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;inputParameters.suggestedLatency = Pa_GetDeviceInfo( inputParameters.device )-&gt;defaultHighInputLatency ;</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;inputParameters.hostApiSpecificStreamInfo = NULL;</span><br /><br />You'll also need to know how often your audio is sampled. For a tuner, less is more, so we'll use a sample rate of 8 kHz, which is available on most hardware. This is extremely low for most audio applications (44.1 kHz is considered standard for audio and 48 kHz is standard for video), but for a tuner, 8 kHz is plenty.<br /><br /><span style="font-family: Courier New, Courier, monospace;">#define SAMPLE_RATE (8000)</span><br /><br /><h2> Low-Pass Filtering</h2>There's no hard and fast rule about low-pass filtering (or simply "low-passing") your audio data. In fact, it's not even strictly necessary, but doing so can get rid of unwanted noise and the higher frequencies that sometimes masquerade as the fundamental frequency. This is important because some instruments have component frequencies called harmonics that are more powerful than the "fundamental" frequencies, and usually we are interested in the fundamental frequencies. Filtering, therefore, can improve the reliability of the rest of the pitch tracker significantly. Without filtering, some noise might appear to be the dominant pitch, or, more likely, the dominant pitch might appear to be a harmonic of the actual fundamental frequency.<br /><br />A good choice for the filter is a low-pass filter with a center frequency around or a little above the highest pitch you expect to detect. For a guitar tuner, this might be the high E string, or about 330 Hz. So that's what we'll use -- in fact, we low-pass it twice. If you are modifying the code for another purpose, you can set the center frequency to something that makes sense for your application.<br /><br />If you aren't sure, or want something less aggressive, you could try a moving average filter, which simply outputs the average of the current input and some number of previous inputs.
Intuitively, we can understand that this filter reduces high frequencies because signals that change quickly get "smoothed" out.<br /><br /><span style="font-family: Courier New, Courier, monospace;">// Process every sample of your input with this function</span><br /><span style="font-family: Courier New, Courier, monospace;">// (this is not used in our guitar tuner)</span><br /><span style="font-family: Courier New, Courier, monospace;">float twoPointMovingAverageFilter( float input ) {</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;static float lastInput = 0;</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;float output = ( input + lastInput ) / 2 ;</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;lastInput = input;</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;return output;</span><br /><span style="font-family: Courier New, Courier, monospace;">}</span><br /><br />The moving average filter won't make a huge difference, but if the low pass filter I used in my code doesn't suit you and you don't have the degree in electrical engineering required to design the right digital filter (or don't know what the right filter is), it might be better than nothing. I haven't tested the moving average filter myself.<br /><h2> Windowing</h2>Generally speaking, FFTs work in chunks of data, but your input is a long or even continuous stream. To fit this round peg into this square hole, you need to break off chunks of your input, and process the chunks. However, doing so without proper treatment may prove detrimental to your results. In rough terms, the problem is that the edges get lopped off very sloppily, creating artifacts at frequencies that aren't actually present in your signal. These artifacts, called "sidelobes", cause problems for many applications. I know that some tuners are designed without special treatment, so you can skip this step, but I strongly recommend you keep reading because it's easy to deal with this problem.<br /><br />To reduce the sidelobes, we premultiply each chunk of audio with another signal called a window, or window function. Two simple and popular choices for window functions are the <a href="http://en.wikipedia.org/wiki/Window_function#Hamming_window">Hamming window</a> and the <a href="http://en.wikipedia.org/wiki/Hann_function">Hann window</a>.
I put code for both in the tuner, but I used the Hann window.<br /><br /><span style="font-family: Courier New, Courier, monospace;">void buildHanWindow( float *window, int size )</span><br /><span style="font-family: Courier New, Courier, monospace;">{</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;for( int i = 0; i &lt; size; ++i )</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp; &nbsp; window[i] = .5 * ( 1 - cos( 2 * M_PI * i / (size-1.0) ) );</span><br /><span style="font-family: Courier New, Courier, monospace;">}</span><br /><span style="font-family: Courier New, Courier, monospace;">void applyWindow( float *window, float *data, int size )</span><br /><span style="font-family: Courier New, Courier, monospace;">{</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;for( int i = 0; i &lt; size; ++i )</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp; &nbsp; data[i] *= window[i] ;</span><br /><span style="font-family: Courier New, Courier, monospace;">}</span><br /><br />For a tuning app, the windows may overlap, or there may be gaps in between them, depending on your needs and your available processing power. For example, by overlapping and performing more FFTs, and then averaging the results, you may get more accurate results more quickly, at the cost of more CPU time. <b>I strongly recommend doing this in real apps. I did not do this in my app to make the code easier to follow, and you'll see that the values sometimes jump around and don't respond smoothly.</b><br /><h2> FFT</h2>The FFT, or Fast Fourier Transform, is an algorithm for quickly computing the frequencies that comprise a given signal. By quickly, we mean O( N log N ). This is way faster than the O( N<sup>2</sup> ) that computing the Fourier transform took before the "fast" algorithm was worked out, but still not linear, so you are going to have to be mindful of performance when you use it. Because the FFT is now the standard way to compute the Fourier transform, many people often use the terms interchangeably, even though this is not strictly correct.<br /><br />The FFT works on a chunk of samples at a time. You don't get more or less data out of a Fourier Transform than you put into it, you just get it in another form. That means that if you put ten audio samples in you get ten data-points out. The difference is that these ten data points now represent energy at different frequencies instead of energy at different times, and since our data uses real numbers, and not complex, the FFT will contain some redundancies -- specifically, only the first half of the spectrum contains relevant data. That means that for ten samples in, we really only get five relevant data-points out.<br /><br />Clearly, the more frequency resolution you need, the more time data you need to give it. However, at some point you will run into the problem of not being able to return results quickly enough, either because you are waiting for more input, or because it takes too long to process.
Choosing the right size FFT is critical: too big and you consume lots of CPU and delay getting a response, too small and your results lack resolution.<br /><br />How do we know how big our FFT should be? You can determine the accuracy of your FFT with this simple formula:<br /><br /><span style="font-family: Courier New, Courier, monospace;">binSize = sampleRate/N ;</span><br /><br />For example, with an FFT size (N) of 8192 (most implementations of the FFT work best with powers of 2), and a sample rate of 44100, you can expect to get results that are accurate to within about 5.38 Hz. Not great for a tuner, but, hey, that's why we are sampling at 8000 Hz, which gives us an accuracy of better than 1 Hz. Still not perfect for, say, a 5-string bass, but you can always use a larger N if you need to. Keep in mind that getting enough samples to get that much accuracy takes longer than a second, so our display only updates about once a second. That's yet another reason you might want to overlap your windows.<br /><br />The output of the FFT is an array of N complex numbers. It is possible to use both the real and imaginary part to get very accurate frequency information, but for now we'll settle for something simpler and much easier to understand: we simply look at the magnitude. To find the magnitude of each frequency component, we use the distance formula:<br /><br /><span style="font-family: Courier New, Courier, monospace;">for( int i = 0; i &lt; N/2; ++i )</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;magnitude[i] = sqrt( real[i]*real[i] + cmpx[i]*cmpx[i] );</span><br /><br />Now that we know the magnitude of each FFT bin, finding the frequency is simply a matter of finding the bin with the maximum magnitude. The frequency will then be the bin number times the bin size, which we computed earlier. Note that we don't actually need to compute the square root to find the maximum magnitude, so our actual code skips that step.
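<br /><br />In case you're wondering where the <span style="font-family: Courier New, Courier, monospace;">freqTable</span> used in the main loop comes from: it can be built once, up front, from that same bin-number-times-bin-size rule. Here's my reconstruction -- a sketch, not a copy-paste from the tuner source:<br /><br /><span style="font-family: Courier New, Courier, monospace;">/* My reconstruction, not copied from the tuner source: build the</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;table that maps each FFT bin number to its frequency. */</span><br /><span style="font-family: Courier New, Courier, monospace;">float freqTable[FFT_SIZE];</span><br /><span style="font-family: Courier New, Courier, monospace;"><br /></span><span style="font-family: Courier New, Courier, monospace;">void buildFreqTable( void )</span><br /><span style="font-family: Courier New, Courier, monospace;">{</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp;for( int i = 0; i &lt; FFT_SIZE; ++i )</span><br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp; &nbsp; freqTable[i] = i * SAMPLE_RATE / (float) FFT_SIZE;</span><br /><span style="font-family: Courier New, Courier, monospace;">}</span>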
<h2> More</h2>We do a bit more in our code, like identify the nearest semi-tone and find the difference between that semi-tone and the identified frequency, but for stuff like that we'll leave the code to speak for itself.Bjorn Rochehttp://www.blogger.com/profile/17072425815152893296noreply@blogger.com19tag:blogger.com,1999:blog-7225698277211840079.post-62184833817313433832012-06-29T08:48:00.004-07:002012-07-22T19:24:15.754-07:00Freeverb: original public domain code by Jezar at DreampointI recently had&nbsp;occasion&nbsp;to use the original Freeverb code by Jezar at Dreampoint. There are several variations on this, including <a href="http://freeverb3.sourceforge.net/">Freeverb 3</a>, a complex GPL library, and a bunch of <a href="http://ccrma.stanford.edu/planetccrma/software/">packages from CCRMA</a>, but these are bloated things, not conducive to my needs for a variety of reasons. It took some digging to find the original, and when I did it was&nbsp;<a href="http://music.columbia.edu/pipermail/music-dsp/2001-October/045433.html">buried&nbsp;in a mailing-list archive</a> with the wrong file extension, so I thought I'd post it here to make it easier for anyone else.<br /><br /><a href="http://stuff.bjornroche.com/freeverb.zip">Original Public Domain Freeverb by Jezar</a>Bjorn Rochehttp://www.blogger.com/profile/17072425815152893296noreply@blogger.com0tag:blogger.com,1999:blog-7225698277211840079.post-19610795150543397952012-04-30T11:33:00.003-07:002012-05-05T09:29:04.088-07:00Audio Misconceptions around "Mastered for iTunes"Ars Technica, among others, has been <a href="http://arstechnica.com/apple/news/2012/04/does-mastered-for-itunes-matter-to-music-ars-puts-it-to-the-test.ars">talking about</a>&nbsp;Apple's new "Mastered for iTunes" product campaign. They talked to some real mastering engineers and got some real information about audio compression and how carefully tweaking the master before compression might make a difference to sound quality after compression.<br /><br />It's an interesting article and worth a read. Mostly, I think the conclusions are probably correct, although I think "Mastered for iTunes" fails to address the <a href="http://en.wikipedia.org/wiki/Loudness_war">real problem of poor audio quality</a> in most of the music we listen to today, which has absolutely nothing to do with the delivery format.<br /><br />Unfortunately, they also managed to let loose some audio myths. Here are some corrections:<br /><br /><b>Confusing</b><br /><blockquote class="tr_bq"><span style="background-color: white; color: #333333; font-family: Arial, Helvetica, sans-serif; font-size: 13px; line-height: 17px;">Using 16 bits for each sample allows a maximum dynamic range of 96dB. (It's even possible with modern signal processing to accurately record and playback as much as 120dB of dynamic range.) Since the most dynamic modern recording doesn't have a dynamic range beyond 60dB, 16-bit audio accurately captures the full dynamic range of nearly any audio source.</span></blockquote>This is basically correct, but it sure is confusing. If&nbsp;you want to learn more, you can read all the gory details about the process, called dithering,&nbsp;<a href="http://www.digido.com/dither.html">at Bob Katz's website</a>.&nbsp;(I am not sure where they got 60 dB from. That's HUGE even for orchestral music. If they are citing <a href="http://www.aes.org/e-lib/browse.cfm?elib=1209">this source</a>, they are confusing dB dynamic range with dB absolute volume. I am also not sure where the&nbsp;120dB figure comes from -- that seems like a very contrived&nbsp;laboratory&nbsp;condition.)
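<br /><br />For the curious, the 96dB figure itself is easy to derive (my arithmetic, not the article's): each bit doubles the number of amplitude levels a sample can represent, and in decibels each doubling of amplitude is worth about 6 dB, so <span style="font-family: Courier New, Courier, monospace;">20 * log10( 2^16 ) &#8776; 96.3 dB</span>.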
<br /><br /><b>Reality vs Theory</b><br /><blockquote class="tr_bq"><span style="background-color: white; color: #333333; font-family: Arial, Helvetica, sans-serif; font-size: 13px; line-height: 17px;">The maximum frequency that can be captured in a digital recording is exactly one-half of the sampling rate. This fact of digital signal processing life is brought to us by the&nbsp;</span><a href="http://en.wikipedia.org/wiki/Nyquist-Shannon_sampling_theorem" style="background-color: white; color: #ffae00; font-family: Arial, Helvetica, sans-serif; font-size: 13px; line-height: 17px; text-decoration: none;">Nyquist-Shannon sampling theorem</a><span style="background-color: white; color: #333333; font-family: Arial, Helvetica, sans-serif; font-size: 13px; line-height: 17px;">, and is an incontrovertible mathematical truth. Audio sampled at 44.1kHz can reproduce frequencies up to 22.05kHz. Audio sampled at 96kHz can reproduce frequencies up to 48kHz. And audio sampled at 192kHz—some studios are using equipment and software capable of such high rates—can reproduce frequencies up to 96kHz.</span></blockquote>Unfortunately, there's a big difference between "incontrovertible&nbsp;mathematical truth" and what can actually be implemented in hardware and software. In the real world, we need to filter out all frequencies above the so-called Nyquist limit (one half the sample rate), or we get nasty artifacts called "aliasing". And, in the real world, there is no filter that lets us keep everything below the limit and reject everything above the limit, so if we want this to work, we need a buffer between what we can hear and the Nyquist limit. That's why 44.1 kHz and not 40 kHz was chosen for CDs to reproduce up to 20 kHz audio. (<a href="http://en.wikipedia.org/wiki/Sinc_filter">Ideal filters</a> could be designed if we relaxed certain constraints, such as one known formally as "<a href="http://en.wikipedia.org/wiki/Causal_system">causality</a>", and if we had an infinite amount of data to work with.)<br /><br /><b>Typical Hearing</b><br /><blockquote class="tr_bq"><span style="background-color: white; color: #333333; font-family: Arial, Helvetica, sans-serif; font-size: 13px; line-height: 17px;">However, human ears have a typical frequency range of about 20Hz to 20kHz. This range varies from person to person—some people can hear frequencies as high as 24kHz—and the frequency response of our ears also diminishes with age. For the vast majority of listeners, a 44.1kHz sampling rate is capable of producing all frequencies that they can hear.</span></blockquote>Haha. Sure, maybe my 9-week-old son can hear 24kHz, but I doubt it. The range of human hearing, which is so often cited as 20Hz to 20kHz, does vary from person to person (last time I checked, a few years ago, my hearing went up to about 17kHz), but the&nbsp;20Hz to 20kHz range is anything but typical. An <a href="http://www.amazon.com/Science-Sound-The-3rd-Edition/dp/0805385657">acoustics textbook</a> puts this more accurately: "a person who can hear over the entire audible range of 20-20000 Hz is unusual." I would go further and say such a person is not living in the modern world, reading ars technica and buying pop or rock albums. Modern life and aging destroy the tiny hairs in our ears that are sensitive to those frequencies and that's all there is to it. Some people think they have better hearing because they are audiophiles. In fact, they may have superior hearing, but that has nothing to do with how well their ears work: exposure and critical listening improve our ability to hear.
We&nbsp;exercise&nbsp;the appropriate parts of our brain and our hearing improves ("<a href="http://www.moultonlabs.com/full/product01">Golden Ears</a>" is an example of a product designed for just that purpose).<br /><br />Some people are reportedly sensitive to "supersonic" frequencies (it may give them headaches, for example). This is not the same as hearing.<br /><br /><b>Ultrasonics in Analog</b><br /><br /><blockquote><div style="background-color: white; color: #333333; font-family: Arial, Helvetica, sans-serif; font-size: 13px; line-height: 17px; margin-bottom: 1.308em;">Furthermore, attempting to force high-frequency, ultrasonic audio files through typical playback equipment actually results in&nbsp;<em>more</em>&nbsp;distortion, not less.</div><div style="background-color: white; color: #333333; font-family: Arial, Helvetica, sans-serif; font-size: 13px; line-height: 17px; margin-bottom: 1.308em;">"Neither audio transducers nor power amplifiers are free of distortion, and distortion tends to increase rapidly at the lowest and highest frequencies," according to Xiph Foundation founder Chris Montgomery, who created the Ogg Vorbis audio format. "If the same transducer reproduces ultrasonics along with audible content, any nonlinearity will shift some of the ultrasonic content down into the audible range as an uncontrolled spray of intermodulation distortion products covering the entire audible spectrum. Nonlinearity in a power amplifier will produce the same effect."</div></blockquote>Chris Montgomery is surely a&nbsp;genius, but I don't think he should be considered the authority on analog electronics. I think many analog engineers will tell a different story: when ultrasonics are pushed through most analog equipment they are steeply&nbsp;attenuated. Their phase might be altered, and they may produce some IM distortion, but at a very low level. For the most part, supersonics might as well not be there. On the other hand, a high sample rate gives the benefit of allowing less stringent Nyquist filters, which reduces the amount of distortion in the DAC. I think compelling arguments could be made either way, although I'm not a proponent of 96 kHz consumer formats. Even in the studio, well designed DSP mitigates the need for high sample rates, though frequent ADA conversion may sound better at a high sample rate.<br /><br /><b>What Mastering is</b><br /><blockquote class="tr_bq"><span style="background-color: white; color: #333333; font-family: Arial, Helvetica, sans-serif; font-size: 13px; line-height: 17px;">When mastering engineers create a master file for CD reproduction, they downsample the 24/96 file submitted by the recording studio to 16/44.1. During this process, the mastering engineer typically adjusts levels, dynamic compression, and equalization to extract as much "good" audio from the source while eliminating as much "bad" audio, or noise, as possible.</span></blockquote><blockquote class="tr_bq">...</blockquote><blockquote class="tr_bq"><span style="background-color: white; color: #333333; font-family: Arial, Helvetica, sans-serif; font-size: 13px; line-height: 17px;">Filtering as much useful dynamic range from 24/96 studio files into 16/44.1 CD master files is, in a nutshell, the mastering process.</span></blockquote>This is a pretty poor representation of what mastering is, and it's sad that an article on mastering doesn't really bother to explain mastering. I've known top mastering engineers (even ones who have worked at Masterdisk) who do all their work at 16/44.1.
<br /><br /><b>What Mastering is</b><br /><blockquote class="tr_bq"><span style="background-color: white; color: #333333; font-family: Arial, Helvetica, sans-serif; font-size: 13px; line-height: 17px;">When mastering engineers create a master file for CD reproduction, they downsample the 24/96 file submitted by the recording studio to 16/44.1. During this process, the mastering engineer typically adjusts levels, dynamic compression, and equalization to extract as much "good" audio from the source while eliminating as much "bad" audio, or noise, as possible.</span></blockquote><blockquote class="tr_bq">...</blockquote><blockquote class="tr_bq"><span style="background-color: white; color: #333333; font-family: Arial, Helvetica, sans-serif; font-size: 13px; line-height: 17px;">Filtering as much useful dynamic range from 24/96 studio files into 16/44.1 CD master files is, in a nutshell, the mastering process.</span></blockquote>This is a pretty poor representation of what mastering is, and it's sad that an article on mastering doesn't really bother to explain mastering. I've known top mastering engineers (even ones who have worked at Masterdisk) who do all their work at 16/44.1. Many still prefer to work with analog as much as possible, where bit depth and sample rate don't mean much. Mastering engineers are all happy to deliver a wide variety of formats as the end product. Moreover, equating "bad" audio with noise, and talking about level changes, dynamics, and EQ as if they have something to do with "extraction", is all wrong, and none of it has anything to do with format. Fundamentally, mastering is about balancing the levels, dynamics, and frequencies of a finished mix.<br /><br /><b>Huh?</b><br /><blockquote class="tr_bq"><span style="background-color: white; color: #333333; font-family: Arial, Helvetica, sans-serif; font-size: 13px; line-height: 17px;">...since iTunes Plus tracks are also 16/44.1, it seems logical to use the files created for CD mastering to make the compressed AAC files sold via iTunes.</span></blockquote>iTunes Plus tracks, if sourced from 24/96, never become 16/44.1. As you explain in the next paragraph, they go from 24/96 to float/44.1 to AAC/44.1. (They are usually played at 16/44.1, but with the volume control in between, so the effective bit depth is usually lower.)<br /><br /><b>Null Test</b><br /><blockquote class="tr_bq"><span style="background-color: white; color: #333333; font-family: Arial, Helvetica, sans-serif; font-size: 13px; line-height: 17px;">Shepard performed what is known as a "null test" to prove his theory that specially mastering songs for iTunes to sound more like the CD version is "BS."</span></blockquote>About the only thing a "null test" is good for is determining if two files are identical. It's sort of the audio engineer's equivalent of the "diff" command-line tool. The Ars Technica article quotes Scott Hull arguing against the null test on artistic and perceptual grounds: "...objective tests give us some guide, but they don't account for the fact that our hearing still has an emotional element. We hear emotionally, and you can't measure that." But there are also very sound technical reasons why the null test is simply inappropriate here. When comparing perceptual coding, or even basic EQ or other effects, the null test becomes useless, because it is nothing more than subtracting two files sample by sample and seeing what's left. Unfortunately, one of the basic operations you can perform on audio is to shift it in time, which means that the data no longer corresponds sample by sample. Minute shifts in time are the only way to achieve EQ and other frequency-domain changes ("Aha," you say, "but FIR filters don't shift in time," but actually they do, they just don't do so recursively). Most other effects, including most dynamics changes and perceptual coding, make drastic changes in time as well (although it's possible to do these kinds of changes without time shifts), so nulling anything that changes in time is really comparing apples to oranges (apples to televisions?).
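<br /><br />For the curious, a null test really is just subtraction. Here's a minimal sketch (mine, and deliberately naive) showing both how it works and how a one-sample shift defeats it:<br /><pre style="font-family: 'Courier New', Courier, monospace;">
#include &lt;math.h&gt;
#include &lt;stdio.h&gt;

#define PI 3.14159265358979323846

/* A "null test": subtract two signals sample by sample and report
   the peak of what's left. Zero means the files are identical. */
double null_test( const double *a, const double *b, int n )
{
   double peak = 0;
   for( int i = 0; i &lt; n; ++i ) {
      double residual = fabs( a[i] - b[i] );
      if( residual &gt; peak )
         peak = residual;
   }
   return peak;
}

int main( void )
{
   double a[1000], b[1000];
   for( int i = 0; i &lt; 1000; ++i ) {
      a[i] = sin( 2 * PI * 1000 * i / 44100 );
      b[i] = sin( 2 * PI * 1000 * ( i + 1 ) / 44100 ); /* same tone, one sample late */
   }
   printf( "a vs a:         %f\n", null_test( a, a, 1000 ) ); /* nulls perfectly: 0 */
   printf( "a vs shifted a: %f\n", null_test( a, b, 1000 ) ); /* large residual, yet
                                                                 the two sound identical */
   return 0;
}
</pre>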
<br /><br /><b>More?</b><br /><br />Phew, that's enough for now. I think I got the big ones. Like I said, the article's conclusions are mostly correct, even where the reasoning above is wrong, but the whole "Mastered for iTunes" thing does seem to miss the point. (Unless the point is marketing, in which case, cheers!)<br /><br />Updated 5/5/2012: fixed typo and included Scott Hull quote on null test along with some clarifications to that section.Bjorn Rochehttp://www.blogger.com/profile/17072425815152893296noreply@blogger.com5tag:blogger.com,1999:blog-7225698277211840079.post-60265372187802202812011-12-15T07:29:00.000-08:002011-12-15T07:29:14.353-08:00FCC calls for quieter commercials, but how?<a href="http://perezhilton.com/tag/calm_act/#.TuoQ7Zifu04">In</a> <a href="http://online.wsj.com/article/SB10001424052970203430404577092530932474076.html">the</a> <a href="http://www.cbsnews.com/8301-503544_162-57342486-503544/fcc-passes-rules-banning-extra-loud-commercials/">news</a> recently is the so-called "CALM Act" (<a href="http://en.wikipedia.org/wiki/Commercial_Advertisement_Loudness_Mitigation_Act">Commercial Advertisement Loudness Mitigation Act</a>), which will force TV and cable broadcasters (specifically, multichannel broadcasters) to make advertisements and content the same volume.<br /><br />The problem of blaring commercials, the TV equivalent of the <a href="http://en.wikipedia.org/wiki/Loudness_wars">loudness wars</a>, has been going on for some time, but with newer technologies, including digital broadcasting, it has gotten worse. The fundamental issue is that advertisers want to be heard, so they want to be louder than their competition (the program material). However, it's not just a matter of submitting content with higher volume -- broadcasters, whether analog or digital, have limits to the maximum volume they transmit. Instead, they use recording tricks called compression and limiting to boost the average levels of their recordings while keeping the maximum just within limits. The result is a commercial that sounds louder than the program.<br /><br />While digital technology has made it possible to take this loudness to an extreme, digital distribution has also provided one part of the solution: each piece of program material can be pre-marked with loudness information using a standard called <a href="http://www.atsc.org/cms/index.php/standards/recommended-practices/185-a85-techniques-for-establishing-and-maintaining-audio-loudness-for-digital-television">A/85 rp</a>, which is used by the consumer's television to determine playback volume.<br /><br />The trick is to accurately determine the loudness of the material, so that the A/85 tags can be correctly applied. As it turns out, this is no simple task. The ear is more sensitive to some frequencies than others, and you don't want to use simple averaging, because then long periods of silence would allow commercials to get away with short segments that were disproportionately loud. To get around these issues, A/85 rp recommends the use of a well-researched standard called ITU-R BS.1770 (which may be more familiar from the EBU metering and normalizing standard which uses it, EBU R 128). The ITU standard allows the measurement of loudness in a way that very closely matches human perception of loudness, and offers recommendations for use in live, short and long form content.
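<br /><br />To give a flavor of what BS.1770-style measurement involves, here's a heavily simplified sketch of my own (it omits the standard's K-weighting prefilter and gating, which are essential in the real thing): loudness is derived from the mean square of the signal over the measurement window, expressed on a log scale:<br /><pre style="font-family: 'Courier New', Courier, monospace;">
#include &lt;math.h&gt;
#include &lt;stdio.h&gt;

/* Very simplified BS.1770-flavored loudness for one mono channel.
   The real standard first applies a "K-weighting" filter, to account
   for the ear's frequency sensitivity, and gates out quiet passages
   so silence can't drag the average down; both are omitted here. */
double loudness_lkfs( const float *x, int n )
{
   double sumsq = 0;
   for( int i = 0; i &lt; n; ++i )
      sumsq += (double)x[i] * x[i];
   return -0.691 + 10 * log10( sumsq / n ); /* -0.691 is the BS.1770 constant */
}

int main( void )
{
   static float loud[48000], quiet[48000];
   for( int i = 0; i &lt; 48000; ++i ) {
      loud[i]  = ( i % 2 ) ? 1.0f : -1.0f; /* full scale: mean square of 1.0 */
      quiet[i] = loud[i] * 0.1f;           /* 20 dB lower */
   }
   printf( "loud:  %.1f LKFS\n", loudness_lkfs( loud,  48000 ) ); /* about  -0.7 */
   printf( "quiet: %.1f LKFS\n", loudness_lkfs( quiet, 48000 ) ); /* about -20.7 */
   return 0;
}
</pre>The interesting engineering is in the parts omitted here: the weighting filter makes the number track perception, and the gating prevents long silences from letting short loud bursts slip through -- exactly the averaging problem described above.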
<br /><br />Will the system be gamed? Perhaps content creators will find some way to trick the ITU measurement system into making their content appear less loud than it really is, but even if they do, it seems unlikely that they will be able to game the system anywhere near as well as they currently do.<br /><br />How will the FCC know if the system is working, and which broadcasters are using the system? They will rely on the public to call in complaints. Of course, since this has, for years, been the number-one complaint they have received, I don't anticipate too much difficulty there.Bjorn Rochehttp://www.blogger.com/profile/17072425815152893296noreply@blogger.com1tag:blogger.com,1999:blog-7225698277211840079.post-10343077001884568912011-11-20T13:45:00.000-08:002011-11-20T13:45:43.790-08:00Let's Get Digital (Talk Slides)Here are slides from my recent talk at the <a href="http://www.meetup.com/composer/">NYC Composer's Meetup</a>. I started with a discussion of analog vs. digital processing, went on to discuss basic processing types (focusing on EQ, Reverb, and Compression) and gave some tips about usage. Finally, I explained how a simple mix might come together using these tools.<br /><br /><a href="http://stuff.bjornroche.com/lets-get-digital.pdf">Talk Slides</a><br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-QGGOkBP6fKM/Tslz3qklmjI/AAAAAAAAACM/0Khr65lSwTA/s1600/let%2527s+get+digital.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="229" src="http://1.bp.blogspot.com/-QGGOkBP6fKM/Tslz3qklmjI/AAAAAAAAACM/0Khr65lSwTA/s320/let%2527s+get+digital.png" width="320" /></a></div>Bjorn Rochehttp://www.blogger.com/profile/17072425815152893296noreply@blogger.com0tag:blogger.com,1999:blog-7225698277211840079.post-7737980477275831622011-11-10T06:27:00.000-08:002012-07-22T19:26:13.362-07:00Slides from Fundamentals of Audio Programming<div class="separator" style="clear: both; text-align: left;">Slides from my talk on the&nbsp;<i>Fundamentals of Audio Programming</i>&nbsp;are available for&nbsp;<a href="http://stuff.bjornroche.com/fundamental-of-audio-programming-slides.pdf">download</a>. They include the full slides and, for better or worse, my notes. Enjoy!</div><div class="separator" style="clear: both; text-align: center;"><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-91NEIffFSQs/TrvfA30BM4I/AAAAAAAAACE/vbu_lPDjOOc/s1600/Preview+of+%25E2%2580%259Cslides%25E2%2580%259D.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="239" src="http://3.bp.blogspot.com/-91NEIffFSQs/TrvfA30BM4I/AAAAAAAAACE/vbu_lPDjOOc/s320/Preview+of+%25E2%2580%259Cslides%25E2%2580%259D.png" width="320" /></a></div><div class="separator" style="clear: both; text-align: center;"><br /></div><div class="separator" style="clear: both; text-align: left;">Also, someone asked about file formats; I said I would get to it, and I never did. If you are that person, feel free to contact me or leave a comment here and I'll try to send you some pointers.</div>Bjorn Rochehttp://www.blogger.com/profile/17072425815152893296noreply@blogger.com0tag:blogger.com,1999:blog-7225698277211840079.post-74077962504459025802011-11-08T13:31:00.000-08:002011-11-08T13:35:01.601-08:00Annoying SoundsTo an audiologist or acoustician it is not news that the ear canal has a resonance in the range of human speech, resulting in extra sensitivity in that range. I even remember my acoustics teacher musing about which came first in human evolution: speech with lots of content roughly around the 1kHz range, or ears with extra sensitivity in that range (presumably with the advantage that an eardrum embedded in the skull is more protected from the elements).<br /><br />To an audio engineer, the "midrange" frequencies are the aggressive frequencies.
They're the ones you emphasize when your guitar or snare drum is wimpy, and the ones you take out if the mix is too harsh. Bob Katz, in his book <i>Mastering Audio: The Art and the Science</i>, assigns the following negative subjective terms to describe excesses in this frequency range: boxy (400-900 Hz), nasal (700-1.2 kHz), harsh (2-10 kHz). <a href="http://www.amazon.com/Science-Sound-2nd-Thomas-Rossing/dp/0201157276">My 21-year-old acoustics textbook</a> states, "Noise with appreciable strength around 1000 to 2000 Hz is more disruptive than is low frequency noise."<br /><br />Because the ear has different sensitivities to different frequencies, different "weightings" have been developed which allow measurements of sound level that take frequency into account. Some municipalities even take these weightings into account for noise complaints. <a href="http://en.wikipedia.org/wiki/A-weighting#History_of_A-weighting">These weightings date back as far as 1936</a>.
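<br /><br />As an aside for the programmers: the most common of these, A-weighting, is just a fixed frequency-dependent gain curve, and it's simple to compute. Here's a sketch of the standard formula (the constants are from the IEC 61672 definition, to the best of my knowledge):<br /><pre style="font-family: 'Courier New', Courier, monospace;">
#include &lt;math.h&gt;
#include &lt;stdio.h&gt;

/* A-weighting gain, in dB, at frequency f (Hz). The +2.0 term
   normalizes the curve to 0 dB at 1 kHz. */
double a_weighting_db( double f )
{
   double f2 = f * f;
   double ra = ( 12194.0 * 12194.0 * f2 * f2 ) /
               ( ( f2 + 20.6 * 20.6 ) *
                 sqrt( ( f2 + 107.7 * 107.7 ) * ( f2 + 737.9 * 737.9 ) ) *
                 ( f2 + 12194.0 * 12194.0 ) );
   return 20 * log10( ra ) + 2.0;
}

int main( void )
{
   /* the ear's insensitivity to lows shows up immediately: */
   printf( "100 Hz:  %6.1f dB\n", a_weighting_db( 100 ) );  /* about -19 dB */
   printf( "1 kHz:   %6.1f dB\n", a_weighting_db( 1000 ) ); /* 0 dB by definition */
   printf( "2.5 kHz: %6.1f dB\n", a_weighting_db( 2500 ) ); /* slight boost, ~+1.3 dB */
   return 0;
}
</pre>Note the slight boost right in the sensitive midrange band discussed above.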
<br /><br />Recently, musicologists studied <a href="http://news.sciencemag.org/sciencenow/2011/10/cover-your-ears.html">what sounds are annoying</a>. Their findings reinforce this old tale: sounds in the range from 2-4 kHz can be offensive. Even quiet sounds, like the classic fingernails across the chalkboard, can be explained by a dominance of sound at these frequencies. What is interesting about this research is that it shows, for the first time as far as I know, that sounds can be made less annoying simply because of context. For example, if the listener is told the source of the sound is fingernails on chalk, they are more likely to say they found it offensive than if they are told it is part of a composition. Of course, it's possible they are just being polite, since either way, they have the same biological reaction.Bjorn Rochehttp://www.blogger.com/profile/17072425815152893296noreply@blogger.com0tag:blogger.com,1999:blog-7225698277211840079.post-38298411044125982292011-11-07T07:48:00.000-08:002011-11-07T07:48:12.929-08:00Audio Programming ClassI'll be teaching a class on the Fundamentals of Audio Programming. This crash course (90 mins!) explains the basics of audio to software developers, with a focus on getting audio in and out of your computer.<br /><br />Signup and more info at&nbsp;<a href="http://audioprogramming.eventbrite.com/">eventbrite</a>.<br /><br />Excitingly, we also got picked up by <a href="http://www.sonicscoop.com/2011/11/04/event-alert-fundamentals-of-audio-programming-workshop-with-bjorn-roche-wed-119/">SonicScoop</a>!Bjorn Rochehttp://www.blogger.com/profile/17072425815152893296noreply@blogger.com0tag:blogger.com,1999:blog-7225698277211840079.post-86016802933645421392010-10-26T14:02:00.000-07:002010-10-26T14:02:33.595-07:00Linear Interpolation for Audio in C, C++, Java, etc.Linear interpolation in digital audio came up recently, so I'm posting it here. I hope it's useful for other folks.<br /><br />Technically, linear interpolation is the act of fitting a line through existing points and computing new data from that line. This might sound complex, but it turns out to be pretty easy, and we can do it with a few lines of code.<br /><br />Visually, we can think about drawing a line between two points, and then being able to find the y value for any given x. However, I actually think it's easier to think of it non-graphically, because linear interpolation is really just a kind of weighted average.<br /><br />For audio, we frequently want to use linear interpolation because it's easy to implement, computationally efficient, and "smooth" in some sense that I won't get into, but I will say that it generally does not create clicks and pops when you don't want it to. Linear interpolation is useful for handling fader changes and button-push "de-bouncing" and so on, and it's often great for simple cross-fades and the like.<br /><br />The formula for linear interpolation is derived from the formula for the line between two points. You can see <a href="http://en.wikipedia.org/wiki/Linear_interpolation">wikipedia</a> for the details. I am omitting it here and jumping straight to an example. To perform a linear interpolation of 100 samples where y[0] = 7, and y[100] = 20, our code would look something like this:<br /><br /><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">double start = 7;</span><br /><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">double end = 20;</span><br /><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"><br /></span><br /><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">for( int i=0; i&lt;100; ++i ) {</span><br /><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">&nbsp;&nbsp; double ratio = i/100.0;</span><br /><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">&nbsp;&nbsp; y[i] = start * (1-ratio) + end * ratio;</span><br /><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">}</span><br /><br />You can think of <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">ratio</span> as the weight given to the <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">end</span> variable, and <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">1-ratio</span>&nbsp;as the weight given to the <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">start</span> variable. As we slide through the samples, we slowly transition from the start value to the end value.<br /><br />Notice I've been very careful to make sure <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">y[0]</span> is actually 7, and <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">y[99]</span> is not quite 20, so that <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">y[100]</span> will smoothly transition to 20 as required. Off-by-one errors can screw this up, and while you might not hear the difference, you want to get that right or you could end up with pops, weird overs, or other subtle problems.<br /><br />Now you might say that the above code is not very efficient. You can improve on it somewhat using the code below, but be aware that if you are interpolating over a large number of samples, especially if you are using single-precision floats, you might not quite end up where you expect.
The performance gain for this more complex code is likely to be minimal on modern computer hardware, but may be substantial on DSP hardware, where operations like floating point adds take much less time than floating point divides. A clever compiler could theoretically make the same object code out of these two code snippets if it can determine that precision won't be an issue.<br /><br /><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">double start = 7;</span><br /><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">double end = 20;</span><br /><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">int length = 100;</span><br /><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"><br /></span><br /><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">double interval = ( end - start ) / length;</span><br /><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">y[0] = start;</span><br /><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"><br /></span><br /><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">for( int i=1; i&lt;length; ++i )</span><br /><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">&nbsp;&nbsp; y[i] = y[i-1] + interval ;</span><br /><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"><br /></span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">By the time we get to the end, </span><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">y[length-1]</span><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"> should be&nbsp;</span><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">( end - start ) / length * ( length - 1 )</span><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"> larger than the </span><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">start</span><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">, which is one interval shy of </span><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">end</span><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"> -- exactly right, just like the first snippet, so the value after the ramp can be </span><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">end</span><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"> itself.</span><br /><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"><br /></span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"><br /></span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">That's all there is to linear interpolation, so let's go to an audio example:&nbsp;</span><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">Say we want to go from off, or muted, to on, or unmuted, without a click. Instead of 7 and 20, we'd use 0.0 for off, and 1.0 for on. Also, instead of setting the values in the array, we are going to be multiplying the values in the array, because that's how we do gain changes. Now, let's say we don't know what a good length of time is for unmuting, so let's just make that a variable. Below is a function that takes an array of mono samples and transitions them from off to on at a given time, over a given transition length. 
I haven't tested this exact code, but it should be good enough for illustrative purposes:</span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"><br /></span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"><br /></span><br /><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">void unmute( float data[],</span><br /><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; int totalSamples, //how many samples in our array</span><br /><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; int startUnmute, &nbsp;//when do we start unmuting?</span><br /><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; int transitionLength ) &nbsp;//how long is our transition?</span><br /><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">{</span><br /><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">&nbsp;&nbsp; //basic sanity check:</span><br /><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">&nbsp;&nbsp; if( startUnmute + transitionLength &gt; totalSamples )</span><br /><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">&nbsp;&nbsp; &nbsp; &nbsp;exit( -1 ); //or throw an exception if this were java</span><br /><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">&nbsp;&nbsp; //process the muted samples:</span><br /><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">&nbsp;&nbsp; for( int i=0; i&lt;startUnmute; ++i )</span><br /><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">&nbsp;&nbsp; &nbsp; &nbsp;data[i] = 0; //effectively multiplied by zero</span><br /><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"><br /></span><br /><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">&nbsp;&nbsp; //process the transition samples.</span><br /><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">&nbsp;&nbsp; // this is where the linear interpolation</span><br /><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">&nbsp;&nbsp; // happens. 
We are interpolating between 0 and 1,</span><br /><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">&nbsp;&nbsp; // and multiplying the samples by that value:</span><br /><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">&nbsp;&nbsp; for( int i=0; i&lt;transitionLength; ++i ) {</span><br /><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">&nbsp;&nbsp; &nbsp; &nbsp;double ratio = i/(double)transitionLength;</span></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">&nbsp;&nbsp; &nbsp; &nbsp;data[i+startUnmute] *= ratio; //multiply by the ratio, which is transitioning from 0 to 1</span></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">&nbsp;&nbsp; }</span></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">&nbsp;&nbsp; // the rest of the samples don't need to be processed:</span></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">&nbsp;&nbsp; // &nbsp;they are effectively multiplied by 1 already.</span></div><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">}</span>
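<br /><br />And since I mentioned cross-fades above: the same weighted-average idea, applied to two signals at once, gives you a simple crossfade. A quick sketch, with the same "untested but illustrative" caveat as the function above:<br /><pre style="font-family: 'Courier New', Courier, monospace;">
/* Linear crossfade: blend from signal a to signal b over n samples,
   writing the result into out. At i=0 we hear only a; by the end,
   almost only b. This is an equal-gain fade, which is the right
   choice when a and b are correlated (e.g., overlapping takes). */
void crossfade( const float a[], const float b[], float out[], int n )
{
   for( int i = 0; i &lt; n; ++i ) {
      float ratio = i / (float)n;  /* runs from 0 toward 1, as before */
      out[i] = a[i] * ( 1 - ratio ) + b[i] * ratio;
   }
}
</pre>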
Bjorn Rochehttp://www.blogger.com/profile/17072425815152893296noreply@blogger.com2tag:blogger.com,1999:blog-7225698277211840079.post-32624456620112180712010-07-15T09:10:00.000-07:002010-07-15T09:23:22.328-07:00I signed a deal with the devil and now he wants my soul! Tales of Record Industry Woe...Yesterday I visited <a href="http://lizphair.com/">Liz Phair's web site</a> hoping to hear this new song where she raps. Instead I was subjected to '90s-quality web design and a short note about how rebellious her new songs are because they cost her her management and record label. "You weren't supposed to hear them," it read. It reminded me of one of those conspiracy theory web sites.<br /><br />Today I read <a href="http://www.techdirt.com/articles/20100712/23482610186.shtml">an article about business practices at a major label</a> (mis-titled "RIAA Accounting", as if the RIAA does the accounting at major labels).<br /><br />Every few weeks or so, I hear a story about how evil record labels are for not paying musicians or not understanding them, or something along those lines. Search the internet and you will find lots of stories about how record labels do the accounting in order to avoid paying bands what they rightfully "deserve" or would have made under different conditions, like with a smaller label, or if they had done it themselves, or if the label had done the accounting "fairly" or something like that.<br /><br />Well, most of it's true, more or less. Record labels are out to make money. Plain and simple. The record business is, er, a business (newsflash!). What's most surprising to me about this is how surprising it seems to be to so many people.<br /><br /><div class="separator" style="clear: both; float: right; text-align: center; width: 250px;"><img border="0" src="http://1.bp.blogspot.com/_-pZI7Bl38cw/TD8xu_tMxFI/AAAAAAAAABs/fM-PoyLa8w0/s320/robot-devil.png" title="robot devil" /><br />Let me see if I understand correctly. You say a deal with the <em>devil</em> didn't work out the way you expected?<br /></div><br />Musicians have many goals, one of which, like the labels, may be to make money. Many of the goals of musicians are at odds with the goals of the labels. When it comes to signing the contract, though, the labels have the upper hand: musicians are <i>desperate</i> to get signed, and the labels have been doing this for years and they know how to make bands sign a contract that works better for them than for the band.<br /><br />But let's get something straight: investing in bands is hard. Seriously, you may think you can pick the next hit band when you hear them at the bar, but there's a lot more to it than writing a catchy song or two. Can the band deal with management? Can they work in the studio? How do test audiences respond to them (this is more important than how real audiences respond to them)? Have they sought legal counsel? There are a million questions, many of which have nothing to do with the quality of their music, and it's still a crapshoot.<br /><br />And yet, record labels are doing it: they are investing in bands. In order for it to work at all, some really ugly stuff needs to happen. Crappy pop bands need to get signed. Lousy deals need to be made, and, here comes the horrible truth you don't want to hear, most bands fail by <b>any</b> measure. In my opinion, major labels could probably make some more money off smaller artists if they invested more, but instead they focus on the artists they think are going to be huge, because they are big organizations structured for big payoffs. The mid-sized or potentially mid-sized artist is best served by signing to a smaller label or hiring a private publicist, and either paying out of pocket or getting money for that in the record contract. Of course, no one thinks about being a mid-sized artist when they sign a record contract.<br /><br />It's not pretty, but that's music industry sausage. But, contrary to the complaining, the label doesn't want the band it's invested in to starve, either: a savvy band with good management has plenty of opportunities besides record sales to make money, and the record label has no interest in cutting into that. Usually, these sorts of proceeds are not included in these "record labels are evil" calculations, but that's probably fair, because the point of those calculations is usually to show that labels aren't paying musicians, which, by and large, they aren't. A savvy band should realize that album sales are promotion for their other items, and the label knows that other items such as touring (which, as I understand it, is increasingly becoming part of label income, too) are promotion for the records.<br /><br /><br />A smart band will negotiate their record contract rather than just signing it. As I mentioned earlier, it's still pretty one-sided, but small things can be huge wins for the band. To use an old example I happen to remember, Primus kept the rights to their demos and made a lot of money off the sales of them when their album sold well. 
The fact is, contracts wouldn't suck so bad for bands if bands didn't want to be signed so badly. But bands are desperate. They want contracts, so they sign them. Bands can and sometimes do walk away from the table, but that's rare, and the record contract is the only deal on the planet where someone is going to <b>give you significant amounts of money</b> to record and play your music. Plus, it's cool to bitch about record labels being "evil" after you are signed. Like you had nothing to do with it. "I made a deal with the devil and now he wants my soul! Can you believe how evil he is?"<br /><br />Maybe that's unfair of me, but it's worth emphasizing the flip side of the argument when the internet is full of "record labels are evil and screw artists over" talk. I just read one article where the author compares a record deal to a loan. It's not a loan. Loans come with a promissory note, which is a promise to pay the money back. Banks only give you a loan when they think they are going to get the money back, and if they can't get you to pay, they take your collateral, which is usually something like your house. The record label knows that there's a good chance that they are not going to get the money back. Would a bank give a band a loan to make a CD? Maybe before the housing market collapse, but even then you'd need some collateral. Like the aforementioned house. But the label isn't taking collateral. While there are many crappy things about all this, we have to appreciate that one fact. It's kinda magical.<br /><br />So if it's not a loan, what is it? It's an investment contract, like venture capital. And as with venture capital, when the original investment goes big, the venture capitalist, i.e., the record label, gets a big cut. They also want a fair bit of control, ownership, and so on. And yeah, that sucks. Talk to anybody who's started a small company with venture capital about how their investors "don't get it". The best position to be in is to have a good contract from the get-go, and to do that you need as much negotiating power as you can get. That might mean turning down the first record contract that comes along, or spending some cash on a real shark to read and help you understand the contract, or maybe you need to shop your demos to the labels who are a better match for your needs, rather than just every label in town. At the end of the day it means research, hard work, and compromises.<br /><br />But if that still seems unfair -- and I'm not arguing it's totally fair -- I ask again: where else are you going to get money to make music and tour with your band? The fact is that labels are not signing the contract to be nice to you, because they think your music is awesome and they believe in awesome music. They are signing you because they want to make money. As much money as possible. I wonder why bands forget, when they sign these contracts, that they are making a business deal and that the other party wants something, too. 
It's not like it's free money: it's an investment in them as a business, and if they can't think of it that way, they shouldn't sign the contract.Bjorn Rochehttp://www.blogger.com/profile/17072425815152893296noreply@blogger.com3tag:blogger.com,1999:blog-7225698277211840079.post-34185874715274869712010-04-09T08:17:00.000-07:002010-04-09T08:17:43.688-07:00Comments on ConversionsIt didn't surprise me how impassioned a response I got to my <a href="http://blog.bjornroche.com/2009/12/linearity-and-dynamic-range-in-int.html">post about converting audio data from integer to floating point</a>. I posted a link to it on one mailing list and got some heated responses. Despite the extremely geeky nature of it, the fact is that I've seen this discussion on mailing lists before and it always seems to turn into a flame war. People put a lot of thought into implementing the simple conversion of audio from float to int and back, and no matter what choice they make, they are invariably criticized for it, so it's only natural to be on the defensive.<br /><br />While I contend that my post represents more thought and analysis (and better thought and analysis) than is available anywhere else publicly (certainly than I know of), I did not intend for it to be the be-all and end-all of the discussion, even if I implied otherwise. Some of the criticisms I received bordered on the absurd (it's true that my <i>blog entry</i> is not peer reviewed), while other criticisms were face-valid but irrelevant (whether one solution is more pleasing mathematically is irrelevant if it is going to produce worse-sounding results). However, digging through the criticisms, it's apparent that some things from my analysis can be improved.<br /><br />To that end, I'm going to use this entry to accumulate comments and thoughts on the subject as they come up. So this is a living blog post that will be updated and revised from time to time.<br /><br /><span class="Apple-style-span" style="font-size: x-large;">April 9th 2010</span><br /><br />- I claimed that looking at the no-DSP case was a "best case" situation, and that any DSP would only make whatever distortion occurred worse. Therefore, I argued, this was the only case that needed to be considered. Not everyone agrees with this, but it's also hard to generalize DSP. It might be worth analyzing some simple DSP like volume ramping.<br /><br />- I contrasted the distortion produced by using the wrong conversion method to the distortion created by not using dither. However, error produced from truncation is most objectionable with low-level signals, while only high-level signals were tested, so this is not a fair comparison.<br /><br />- It would be worthwhile to test conversions from 24-bit to 16-bit of several different audio source types to determine if the harmonic distortion of the (2^n) model is relevant in that case.Bjorn Rochehttp://www.blogger.com/profile/17072425815152893296noreply@blogger.com0tag:blogger.com,1999:blog-7225698277211840079.post-80236536071639759702010-03-26T14:26:00.000-07:002010-03-26T14:26:36.293-07:00Java (No)FX - why one project dropped JavaFX for JavaThere's a lot of FUD out there. For some reason, it seems like Java and JavaFX take a hard hit. I've heard nonsense like "no one uses Java anymore," and other such stuff. 
Unfortunately, some of the JavaFX FUD might be true: I recently completed a project which was little more than replacing a buggy, slow JavaFX UI for a Java applet with a fast, pure Java UI of the same applet. Thank God we did, because the applet is way better now.<br /><br />The JavaFX portion was, at one point, the largest JavaFX codebase in existence, and, indeed, the first serious large JavaFX project not funded by Sun. And the company I was working with dropped JavaFX for plain old Java.<br /><br />Although I can't say what the project was, I can say that this is a project that most companies would have tried to use Flash for, but, in the end, Java provided features that competitors using Flash simply cannot offer. As far as I know, this company is now the only company with these features in a browser, because they went with a Java foundation.<br /><br /><br />Now keep in mind that a large part of the problem here is that this was really the first major JavaFX project, so it was bound to have difficulties. We had support from Sun, but I think Sun did not realize how far they had to go and how many bugs they still had in their runtime. In my opinion, nobody, neither we nor Sun, was aware of the problems of JavaFX. Despite its version number (1.1 when we started and 1.2 by the time we abandoned it), we found it to be buggy. Had the version number been 0.7, I would right now be saying JavaFX is the coolest thing ever, but the truth is that it's still a nascent technology that has yet to prove itself.<br /><br />Lack of Competent Developers<br /><br />Symptomatic of a new technology, we had a hard time finding quality developers. Sun gave us some leads, but these developers were simply not up to the task of building our complex app. JavaFX is not a difficult programming language, so having someone in-house learn it would have been preferable, but at the time we did not have the resources.<br /><br />Performance<br /><br />The performance of the JavaFX portion of our app was extremely poor. The extremely performance-critical portions of the app were written in Java and performed well, but almost all the graphics were written in JavaFX. Basic graphics like buttons were okay, but complex drawing was very slow. I don't know for sure if this was a result of poor design or JavaFX itself, but from the code I perused it looked like a little of both. We know JavaFX was at least partly to blame, because we had performance issues even when we commented out the complex calculations.<br /><br />Unstable API<br /><br />Transitioning from JavaFX 1.1 to JavaFX 1.2 turned out to be very difficult because of backwards compatibility issues. Unfortunately, at a certain point in our project, our app was performing so badly with even rudimentary graphics tasks that Sun and our JavaFX developers insisted on upgrading.<br /><br />Lack of Quality Control<br /><br />At one point, after we released a baseline product with minimal features, Sun released a new upgrade to JavaFX. Unfortunately, this meant that JavaFX libraries would be updated on all machines that ran our app. Since there was a bug in the library, our app stopped working.<br /><br />Poor Developer Communication<br /><br />In an effort to work around the bug, we asked Sun for the developer version of the new library. Unfortunately, they were not forthcoming. At that point, the decision was made to stop development on the JavaFX solution and seek an alternative. 
After I convinced the team that Swing was capable of looking great, we developed a pure-Java alternative and have released the new product. The graphics performance is orders of magnitude better, and the look is similar to Flash. People who have seen it so far have gone out of their way to compliment the appearance of the app. The appearance matches the design spec almost perfectly except for a few things we have not yet had a chance to attend to. (I am belaboring this point because Swing has a reputation for being ugly, simply because most of the included Look and Feels are ugly.)<br /><br />Future Seems to Be Too Business Oriented<br /><br />When I asked Sun about the future of JavaFX, they said they plan to offer certification levels. It's bad enough that Java certification is based on memorization rather than problem-solving ability, but I can sort of understand that given the business orientation of Java; maybe businesses look for that sort of thing.<br /><br />If hip kids don't learn Java, that's okay. There are more than enough Java programmers to keep Amazon, eBay, Oracle, and all those other Java giants going. But JavaFX is not like that. JavaFX is a creative tool. It's not a direct competitor to Flash, but it's in the same vein, and those people aren't going to take a certification exam. Sun needs to think about what is going to make people think JavaFX is awesome, and certification is not it. Neither is having Gosling slinging T-Shirts at them. JavaFX really really has the potential to be awesome, and I really really want it to be, so here's what I think you need to do (Sun, I'm talking to you):<br /><br /><ul><li>Show people that you are serious about making it awesome. Hire some awesome programmers and have them blog and tweet or whatever the cool kids are doing these days about what they are doing. Really. Hire the best. Hire some young folks. Hire some experienced old guns. Mix and match. Make sure you let them be honest on those blogs.</li><li>Talk to some cool startups that do mobile stuff, like Venmo, about what they are doing in the mobile sphere and work with them proactively.</li><li>Suck it up: work with the Android people. Sure they stabbed you in the back, but you need them.</li><li>Help out Apple: really, their JVM sucks. It seems like I am filing bug reports every week. Make sure they get it right, because cool geeks use Apple.</li><li>While you're at it, make sure your JavaFX staff has Macs and other cool toys. Remember, you need to treat those folks as creatives, not code monkeys.</li></ul>Bjorn Rochehttp://www.blogger.com/profile/17072425815152893296noreply@blogger.com3tag:blogger.com,1999:blog-7225698277211840079.post-23536096786419310112009-12-09T10:59:00.000-08:002010-04-09T08:19:18.636-07:00Linearity and dynamic range in Int->Float->Int<i>Update: </i><a href="http://blog.bjornroche.com/2010/04/comments-on-conversions.html"><i>some comments</i></a><i>.</i><br /><br />In my <a href="http://blog.bjornroche.com/2009/12/int-float-int-its-jungle-out-there.html">last blog post</a>, I discussed converting audio from integer to floating point and back to integer, mostly from a programming perspective. I showed how there are a lot of ways to do the conversion. Most audio folks would say, "huh, I thought there were only two ways to convert floating point numbers to integers." And they'd be right: with and without dither. So what's all the fuss about?<br /><br />Indeed, that's a good question. 
Most audio folks have this expectation:<br /><ol><li>When I have dither off and no effects (including volume, etc.) I expect to be able to get out exactly what I put in.</li><li>When I have dither on, I expect it to sound good.</li></ol>Point 1 is what we referred to as bit transparency in the previous post, and we found lots of ways to do that. Point 2 is a bit more subtle. How do you make something sound good? In this case, we mean transparent, and what's especially critical is that we eliminate truncation and IM distortion, which are the hallmarks of <a href="http://www.digido.com/more-bits-please.html">cold, harsh digital audio</a>.<br /><div class="" style="clear: both; float: right; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0.5em; padding-left: 0.5em; padding-right: 0.5em; padding-top: 0.5em; text-align: left; width: 320px;"><div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/_-pZI7Bl38cw/Sx_qDM_VnsI/AAAAAAAAABA/ZYN_DBDChlo/s1600-h/matched-v-unmatched.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://3.bp.blogspot.com/_-pZI7Bl38cw/Sx_qDM_VnsI/AAAAAAAAABA/ZYN_DBDChlo/s320/matched-v-unmatched.png" /></a></div><small>Figure 1. Comparison of 16-bit conversion using the same scaling factor (matched) vs. different scaling factors (mismatched). Mismatched scaling factors come from Method 3 from the previous post, and matched are Method 2.</small></div><br />What we need when it comes to transparency and avoiding that cold, harsh sound is linearity. In this regard, the methods discussed in my last post, transparent or not, don't stack up equally. You might think you could judge them by inspection, but the mathematics are a bit more complex. Let's be clear about what we need to test: what we <i>don't</i> care about is how accurately a given conversion method responds to a DC signal: we aren't measuring the temperature or the amount of fuel in a tank. Rather, when we talk about linearity in audio we are referring to the ability to accurately translate dynamic information. Think about it: when you buy an analog-to-digital converter, you aren't concerned about its ability to accurately measure a certain input voltage, are you? No, you care about its frequency response and dynamic range. In the same way, we must ensure maximum signal-to-noise ratio and dynamic range in our conversions. It turns out not all the conversions from my last post have good dynamic performance.<br /><br /><span style="font-size: x-large;">Tests</span><br /><br />It is sometimes claimed that the percent error introduced by "mismatched" conversion (i.e., Method 3 from the previous post) is small, and therefore of little concern, but percent error is not what matters in a dynamic system such as audio, so we will not concern ourselves with that and investigate the dynamic performance instead. In Figure 1 we show the results of "mismatched" conversion. In this case we are converting from a source signal of 2 sine waves in double precision to 16-bit integer (to simulate A/D conversion), then to single-precision floating point and back to 16-bit integer (to simulate a standard editing workflow), and finally back to double precision (to simulate D/A conversion). This is more or less the minimum error we can expect with the mismatched method if we use audio editing software but do not use DSP, and therefore represents a best-case scenario. 
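<br /><br />In code, the round trip looks roughly like this (a sketch of the idea, not my actual test harness; dither is omitted, and the published figures examine the error spectrum rather than just its peak):<br /><pre style="font-family: 'Courier New', Courier, monospace;">
#include &lt;math.h&gt;
#include &lt;stdio.h&gt;

#define PI 3.14159265358979323846

int main( void )
{
   double err_matched = 0, err_mismatched = 0; /* worst cases, in LSBs */
   for( int i = 0; i &lt; 4096; ++i ) {
      /* source signal: two sines in double precision */
      double x = 0.45 * sin( 2 * PI * 441.0  * i / 44100 )
               + 0.45 * sin( 2 * PI * 1103.0 * i / 44100 );
      /* "A/D": double to 16-bit int */
      short s = (short)lrint( x * 0x8000 );
      /* editing workflow: int to float and back, two ways */
      float f = s / (float)0x8000;
      double e1 = fabs( (double)( lrint( f * 0x8000 ) - s ) ); /* matched scale factors */
      double e2 = fabs( (double)( lrint( f * 0x7FFF ) - s ) ); /* mismatched (Method 3) */
      if( e1 &gt; err_matched )    err_matched    = e1;
      if( e2 &gt; err_mismatched ) err_mismatched = e2;
   }
   printf( "matched:    %g LSB\n", err_matched );    /* 0: bit transparent */
   printf( "mismatched: %g LSB\n", err_mismatched ); /* 1 LSB of signal-correlated error */
   return 0;
}
</pre>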
In the dynamic analysis, it becomes clear that using different scaling factors produces more noise whether dither is used or not. In fact, the difference made by dither is dwarfed by the difference in techniques. Just as importantly, the quality of the noise is bad: rather than shifting the noise floor up, we see spikes, indicating that the noise is likely to be audible even at low levels. These results also suggest that it is important to use the same scaling factors throughout the processing chain.<br /><br /><div class="" style="clear: both; float: right; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0.5em; padding-left: 0.5em; padding-right: 0.5em; padding-top: 0.5em; text-align: left; width: 400px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/_-pZI7Bl38cw/SxgUOMM4ufI/AAAAAAAAAAw/tH7GqBH7BFw/s1600-h/figure.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://3.bp.blogspot.com/_-pZI7Bl38cw/SxgUOMM4ufI/AAAAAAAAAAw/tH7GqBH7BFw/s400/figure.png" /></a></div><small>Figure 2. Quantization and dithering from float to int and back to float is tested at 16 bits (a,b) and 24 bits (c,d) using a full-scale sine (a,c) and the sum of two sines (b,d). Notes: the sum of two sines does not clip; clipped signal and raw quantized signal are not shown in a.</small></div></div>Figure 2 shows the dynamic performance of conversion using 2^n, (2^n)-1 and "asymmetrical" conversion (i.e., Method 4 from my previous post). We will discuss below why "asymmetrical" is a misnomer. We also looked at dithered and non-dithered versions.<br /><br />Two types of tests were run: first, a full-scale sine wave was generated, converted to int, and back to float for FFT analysis. The second test was the same, except that two sines, each at 1/2 full scale, were summed together. Each test was run at 16 and 24 bits. Note that the full-scale sine wave cannot be accurately represented in some of these conversion methods, resulting in some clipping.<br /><br />As you can see, all dithered converters performed fine at 16-bit as long as nothing was out of scale. At 24-bit, the weakness of the (2^n)-1 converter becomes clear: it actually performs worse than rounding (i.e., no dithering). Clearly (2^n)-1 is not an acceptable transformation between 24-bit integers and single-precision floating point numbers. The 2^n converter performed admirably on all tests except the 16-bit full-scale test (Figure 2a). Those small spikes line up perfectly with the spikes caused by clipping, as expected (results not shown), meaning that it is harmonic distortion -- not the worst thing that could happen, but, still, the asymmetric converter does outperform it in this regard.<br /><br />As mentioned, I'm calling Method 4 from my previous post the "asymmetric" method, but it is only asymmetric in the sense that you apply different math to positive and negative numbers. As these results show, it <i>is</i> linear. Moreover, it is symmetric with respect to dither amplitude, which is what ensures its linear behavior.<br /><br /><span style="font-size: x-large;">Conclusions</span><br /><br />Clearly the two winners here are the so-called asymmetric method and the (2^n) method. Both methods excel in the critical areas of bit transparency and linearity. 
Even their un-dithered performance is quite good, and they are obviously superior to the other methods.<br /><br />The one area in which the asymmetric model outperforms the (2^n) model is in terms of clipping signals that originated from higher resolution. Even with dither, we still see incorrect behavior with the (2^n) model, because dither only finds its way to 1/2 LSB, whereas +1 clips by going 1 LSB over. The question is whether or not this matters. Indeed, <a href="http://lists.apple.com/archives/coreaudio-api/2009/Dec/msg00046.html">there is some debate about the importance of +1</a>. My opinion? +1 is a value that occurs in the real world, and it's not always possible for the code that's producing the +1 to know what the output resolution is going to be. For example, a VST synth plugin has no way of knowing what the output resolution is going to be, so it can't be expected to know what to scale its output to. When converting from 24-bit to 16-bit and using float as an intermediary, there is no simple way to solve this problem.<br /><br />On the other hand, non-pro A/D converters frequently clip around -.5 dBFS, which is below +1 - 1 LSB anyway. Conceivably, you could also correct for this by introducing a level shift at the output equal to 1/2 LSB, but that's equivalent to turning your converter into a (2^n)-.5 converter -- it solves one problem, but introduces another. All that said, there is no reason not to develop software, especially libraries, drivers, and other software intended for use by multiple types of users, including audiophiles and pro audio engineers, that is convenient to use while meeting the highest audio standards: just use the asymmetric converters.<br /><br />Given the potential hazards found in mixing and matching conversion methods, I recommend that all libraries (and drivers, if possible) offer options for various conversion settings, both to minimize bit transparency problems and unnecessary quantization noise, until all libraries and drivers can standardize on the asymmetric conversion method. This is the only way to guarantee transparency and maximize linearity. As these results show, this issue may be more important than dither.Bjorn Rochehttp://www.blogger.com/profile/17072425815152893296noreply@blogger.com0tag:blogger.com,1999:blog-7225698277211840079.post-30545219449577128342009-12-02T13:51:00.000-08:002009-12-06T20:07:46.486-08:00Int->Float->Int: It's a jungle out there!It turns out that the simple operation of converting from float to integer and back is not so simple. When it comes to audio, this operation should be done with care, and most programmers do, in fact, put a lot of thought into it. The problem most programmers observe is that audio, when stored (or processed) as an integer, is usually stored in what's called "two's complement" notation, which always gives us 1 more negative number than positive. When we process or store floating point numbers, we use a nominal range of -1 to +1.<br /><br />The fact that there are more negative numbers than positive numbers has caused some confusion amongst programmers, and a number of different conversion methods have been proposed. Here is my survey of how a number of existing software and hardware packages handle this conversion. In these examples, I show conversions for 16-bit integers, but they all extend in the obvious way to other bit depths. 
It is important to consider how these methods extend to larger integers, especially how they extend to 24-bit integers, so I've tested bit transparency for these methods up to 24-bit using single-precision floating point intermediaries, correcting for the fact that IEEE allows extended precisions to be used in computations. Endianness is irrelevant here: everything works the same on big- and little-endian systems.<br /><br />Transparency is only required or possible when the data has not been created synthetically or altered via DSP (including such simple operations as volume changes, mixing, etc.). In cases where transparency is not possible, dither must be applied when converting to integer or reducing the resolution. In many software packages it is up to the end-user to make this determination and manually switch dither on or off. In my next post I will discuss dithering and linearity.<br /><br /><table border="0" bordercolor="#000000" cellpadding="2" cellspacing="0" class="" style="width: 100%;"><tbody><tr><th style="background-color: #104386; color: white; text-align: left;"><br /></th><th style="background-color: #104386; color: white; text-align: left;" width="25%">Int to Float<br /></th><th style="background-color: #104386; color: white; text-align: left;" width="25%">Float to Int*<br /></th><th style="background-color: #104386; color: white; text-align: left;" width="25%">Transparency<br /></th><th style="background-color: #104386; color: white; text-align: left;" width="25%">Used By<br /></th></tr><tr><td>0)<br /></td><td width="25%">(integer + .5)/(0x7FFF+.5)<br /></td><td width="25%">float*(0x7FFF+.5)-.5<br /></td><td width="25%">Up to at least 24-bit<br /></td><td width="25%"><span style="font-size: small;">DC DAC Modeled</span><br /></td></tr><tr><td>1)<br /></td><td width="25%">(integer / 0x8000)<br /></td><td width="25%">float * 0x8000<br /></td><td width="25%">Up to at least 24-bit<br /></td><td width="25%"><span style="font-size: small;">Apple (Core Audio)</span><sup><span style="font-size: small;">1</span></sup><span style="font-size: small;">, ALSA</span><sup><span style="font-size: small;">2</span></sup><span style="font-size: small;">, MatLab</span><sup><span style="font-size: small;">2</span></sup><span style="font-size: small;">, sndlib</span><sup><span style="font-size: small;">2</span></sup><br /></td></tr><tr><td>2)<br /></td><td width="25%">(integer / 0x7FFF)<br /></td><td width="25%">float * 0x7FFF<br /></td><td width="25%">Up to at least 24-bit<br /></td><td width="25%"><span style="font-size: small;">Pulse Audio</span><sup><span style="font-size: small;">2</span></sup><br /></td></tr><tr><td>3)<br /></td><td width="25%">(integer / 0x8000)<br /></td><td width="25%">float * 0x7FFF<br /></td><td width="25%">Non-transparent<br /></td><td width="25%"><span style="font-size: small;">PortAudio</span><sup><span style="font-size: small;">1,2</span></sup><span style="font-size: small;">, Jack</span><sup><span style="font-size: small;">2</span></sup><span style="font-size: small;">, libsndfile</span><sup><span style="font-size: small;">1,3</span></sup><br /></td></tr><tr><td>4)<br /></td><td width="25%">(integer&gt;0?integer/0x7FFF:integer/0x8000)<br /></td><td width="25%">float&gt;0?float*0x7FFF:float*0x8000<br /></td><td width="25%">Up to at least 24-bit<br /></td><td width="25%"><span style="font-size: small;">At least one high end 
DSP and A/D/A manufacturer.</span><sup><span style="font-size: small;">2,4</span></sup><span style="font-size: small;"> XO Wave 1.0.3.</span><br /></td></tr><tr><td>5)<br /></td><td width="25%">Unknown<br /></td><td width="25%">float*(0x7FFF+.49999)<br /></td><td width="25%">Unknown<br /></td><td width="25%"><span style="font-size: small;">ASIO</span><sup><span style="font-size: small;">2</span></sup><br /></td></tr><tr><td colspan="5" style="background-color: #104386; color: white; text-align: left;" width="100%"><small>*Obviously, rounding or dithering may be required here.</small><br /><span style="font-size: small;">Note that in the case of I/O APIs, drivers are often responsible for conversions. The conversions listed here are the ones provided by the API itself.</span><br /></td></tr></tbody></table><br />Method 0 is one possible method for preserving the DC accuracy of a DAC, and is included here for reference.<br /><br />
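If you want to check a conversion pair yourself, a brute-force test is trivial at 16-bit: round-trip every value and count the mismatches. This sketch is my own code, not from any of the packages above; it uses the method 1 helpers sketched earlier, and the volatile store forces the intermediate down to a true single-precision float, which is the extended-precision correction mentioned above.<br /><br /><pre>#include &lt;stdio.h&gt;
#include &lt;stdint.h&gt;

/* Round-trip every 16-bit value and count the ones that don't
 * survive; paste the conversion helpers from above before this. */
int main( void )
{
    long failures = 0;
    long i;
    for( i = -32768; i &lt;= 32767; ++i ) {
        volatile float f = int16_to_float_2n( (int16_t) i );
        if( float_to_int16_2n( f ) != (int16_t) i )
            ++failures;
    }
    printf( "non-transparent values: %ld\n", failures );
    return 0;
}</pre>The 24-bit version is only 2^24 iterations, so an exhaustive test is still fast.<br /><br />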
Edited December 6, 2009: Fixed Method 3. (0x8000 and 0x7FFF were backwards.)<br /><br /><span style="font-size: x-small;">Sources:</span><br /><sup><span style="font-size: x-small;">1</span></sup><span style="font-size: x-small;"> Mailing list</span><br /><sup><span style="font-size: x-small;">2</span></sup><span style="font-size: x-small;"> Perusing the source code (this, of course, is subject to mistakes due to following old, conditional, or optional code)</span><br /><sup><span style="font-size: x-small;">3</span></sup><span style="font-size: x-small;"> The libsndfile FAQ goes into detail about this.</span><br /><sup><span style="font-size: x-small;">4</span></sup><span style="font-size: x-small;"> Personal communication.</span>Bjorn Rochehttp://www.blogger.com/profile/17072425815152893296noreply@blogger.com4tag:blogger.com,1999:blog-7225698277211840079.post-67202494704884617452009-11-11T10:46:00.000-08:002009-11-11T18:53:38.778-08:00WAVE64 vs RF64 vs CAFRight now I am choosing a new default internal audio file format for XO Wave, and I'd like to choose a format that offers large file sizes and high resolution. I'd like to use an existing popular standard rather than inventing my own or using RAW audio.&nbsp;The pro audio industry is finally moving towards 64-bit file formats, and the three options supported by most pro software are:<br /><br /><ul><li>Wave64, aka Sony Wave64, originally developed by Sonic Foundry before 2003, is an open standard and a true 64-bit format: all 32-bit fields are replaced with 64-bit fields, and all chunks are 8-byte word aligned. Instead of the dreaded FourCC it uses GUIDs. Other than that, it is pretty much the same as WAV, so the spec is barely 4 pages long, although in my opinion it could stand to be a bit longer, as many aspects of WAV are so poorly devised it really wouldn't hurt for someone to put it all in one place.&nbsp;<a href="http://www.hydrogenaudio.org/forums/lofiversion/index.php/t35550.html%3Cbr%20/t32422-50.html">Some people have criticized</a> the use of GUIDs on the grounds that there will never be that many chunks, but this misses the point: GUIDs let anyone define their own chunk without having to check with Sony or register a chunk ID. It's actually rather clever.</li><li>RF64 was proposed in 2005 by the EBU&nbsp;<a href="http://www.sr.se/utveckling/tu/bwf/prog/RF_64v1_4.pdf">with full knowledge of Wave64</a>. Although the proposal stated basic requirements that could easily have been met by a few minor extensions to Wave64, and although it expressed a desire to "join forces" with the developers of Wave64, the EBU made no effort to do so beyond saying they hoped to be involved. Moreover, the same document proposes RF64 as an alternative, incompatible 64-bit extension to the WAV format. Unlike Wave64, RF64 is not a true 64-bit format: all existing "chunks" remain 32-bit, so, for example, markers, regions and loops will no longer work past a certain number of samples. Even the EBU's own levl chunk will not work with RF64, because it uses a 32-bit address to point to the "peak-of-peaks" in the raw data. RF64 offers the much-touted promise of backwards compatibility via a "junk chunk", but, of course, this is possible with Wave64 as well, as pointed out in the Wave64 spec.</li><li>CAF, or Core Audio Format, was Apple's entry into the ring. Apple didn't want to be left out of the 64-bit game, after all, and around the same time in 2005 they released CAF.&nbsp;Since they are Apple, they figured people would adopt it (Logic would, if no one else), even if there were competing specs.&nbsp;Their approach, however, was to start from scratch, and it's pretty refreshing. Indeed, the spec addresses practical issues to ensure that important features are implemented, and it even makes that tiny little bit of extra effort required to avoid file corruption by not requiring a header rewrite to finalize a recording of unknown length. (Anyone who's ever recorded using software knows that once in a while something goes wrong and a file ends up corrupted. It's so nice that someone finally addressed this in a spec.)</li></ul>The WAVE format is problematic in many, many ways. For example, in some places it uses zero-based indexing, in others one-based indexing. Sometimes it uses signed integers for raw audio data, other times unsigned. That may not seem so bad considering how simple the data it's trying to carry is, but when you add the fact that Microsoft had to use format extensions just to clear up ambiguous documentation (and they've still got an ambiguously documented "fact" chunk), it's really not good territory. It is a shame that both Sonic Foundry/Sony and the EBU chose WAVE as the format to extend. Moreover, it's annoying that the EBU designed their own, incompatible 64-bit extension to WAVE when a superior one already existed.<br /><div><br /></div><div>Some people think the whole "backwards compatibility" thing is a bunch of hooey because it puts an undue burden on the people writing the libraries. Erik de Castro Lopo, author of the popular LGPL'ed libsndfile, says:<br /></div><div><br /></div><div><blockquote>Quite honestly, its stuff like this that makes me think the people who&nbsp;write these specs smoke crack!<br /></blockquote><blockquote>If I were to follow the ...
insane advice [about retaining backwards compatibility], the test suite would have&nbsp;to write &gt; 4Gig files in order to write a real RF64 file instead of&nbsp;just a normal WAV file.<br /></blockquote><blockquote>In order to avoid this insanity, libsndfile, when told to write an RF64&nbsp;file does exactly as its told.<br /></blockquote><div>I would add that the backwards compatibility adds another point of failure in the recording process, in the same way that header rewrites are a point of failure in most current formats (except for CAF and "chunkless" formats like RAW and AU).<br /></div><div><br /></div><div>All that aside, RF64 is gaining some popularity and support -- probably more than Wave64. As for CAF, it's less popular, but since it's an Apple standard it's probably not going anywhere, even if it's not going to be the "next big thing." It could be a fine place to work from, but even a quick scan of the docs turned up a few issues that worried me. For example:<br /></div><div><ul><li>The CAFMarker data type has three design flaws I noticed. One is that the frame position is a floating point number. I might be missing something here, but in a format where everything else counts frames and bytes as 64-bit integers, why are we suddenly using floats? Sure, a 64-bit float is exact for integers up to 2^53, but it's still a float. I didn't choose a format like this to get pretty accurate big numbers when I could get completely accurate big numbers! Internally, most apps are going to be converting 64-bit integers to 64-bit floats, which is insane. Another problem is mChannel, which is the channel (starting at 1) that the marker refers to, or zero if the marker refers to all channels. Okay, seems reasonable, except that the spec also defines a channel mapping with a 32-bit channel-layout bitmask. Why not use that? Granted, you might have more than 32 channels, but that's not going to be the most common case, and you could give your users a choice. Consistency is important in APIs. Also, let's face it, the CAFMarker, if not all the basic chunks, should be versioned and extensible. Sure, all that takes a few more bits (well, not the float/integer thing), but it's really nothing compared to the sea of data in most audio files.</li><li>In the SMPTE timecode types they define&nbsp;<span style="font-family: Monaco, Courier, Consolas, monospace; font-size: 11px;">kCAF_SMPTE_TimeType30Drop</span>. Now, the fact is that there's really no such thing as 30 Drop, but I can see an argument for including it for completeness. However, the documentation states that it means "30 video frames per second, with video-frame-number counts adjusted to ensure that the timecode matches elapsed clock time." Which is wrong. If you actually had 30 Drop, it would run ahead of elapsed, or "wall-clock", time. "Aha!" you say, "they really mean 29 Drop, which is often just called 30 Drop because <i>everyone</i> knows there's no such thing as 30 Drop." But I'm afraid you are wrong, because there's another constant for that,&nbsp;<span style="font-family: Monaco, Courier, Consolas, monospace; font-size: 11px;">kCAF_SMPTE_TimeType2997Drop</span>, with pretty much the same documentation, only in this case it's correct to say that the timecode matches elapsed time (well, it's very close, anyway; the sketch below shows why).</li></ul>
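For the curious, here is what drop-frame actually does, sketched in C (the code and function name are mine, not Apple's or SMPTE's). Frame <i>numbers</i> 00 and 01 are skipped at the start of every minute except every tenth minute; no actual frames are discarded.<br /><br /><pre>#include &lt;stdio.h&gt;

/* Convert a zero-based frame count to 29.97 drop-frame timecode.
 * Dropping 2 frame numbers per minute, 9 minutes out of 10, leaves
 * 17982 labels per 600 seconds, vs. 600 * 30000/1001 = 17982.02
 * actual frames -- which is why drop-frame timecode matches
 * wall-clock time very closely, but not exactly. */
void frames_to_dropframe( long long frame, char *buf, size_t len )
{
    const long long perMin    = 30 * 60 - 2;     /*  1798 */
    const long long perTenMin = perMin * 10 + 2; /* 17982 */
    long long d = frame / perTenMin;
    long long m = frame % perTenMin;

    /* re-insert the skipped numbers, then divide as if full 30 fps */
    frame += 18 * d;
    if( m &gt; 2 )
        frame += 2 * ( ( m - 2 ) / perMin );

    snprintf( buf, len, "%02lld:%02lld:%02lld;%02lld",
              frame / ( 30LL * 3600 ),
              ( frame / ( 30 * 60 ) ) % 60,
              ( frame / 30 ) % 60,
              frame % 30 );
}</pre>For example, a frame count of 1800 (one true minute at a nominal 30 fps) comes back as "00:01:00;02", because labels 00 and 01 of minute one don't exist.<br /><br />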
<div>So CAF might be flawed, but probably no more so than WAVE and anything built on it. The reliability factor is sweet. Really. The fact that many people, especially in broadcast, seem to want RF64 support is a drawback, though.<br /></div><div><br /></div>Of course, I might just be over-engineering it. The AU format has been around forever, is super simple, and provides high-resolution, uncompressed audio of ANY length (it's not even limited to 64-bit). Of course, it lacks metadata, which might be useful for BWF-style info as well as region data, but hey, it's wicked simple.<br /></div><div><br /><br />An interesting side note: by choosing an appropriately sized junk/empty chunk in the header, Wave64, RF64, and CAF files can actually be converted from one format to another in-place.<br /></div></div>Bjorn Rochehttp://www.blogger.com/profile/17072425815152893296noreply@blogger.com1tag:blogger.com,1999:blog-7225698277211840079.post-12017992356750137972009-07-31T23:41:00.000-07:002009-11-11T08:27:26.790-08:00Reproducing the THX audio logo in SuperColliderHere's an interesting <a href="http://www.batuhanbozkurt.com/instruction/recreating-the-thx-deep-note">article</a> on reproducing the THX audio logo in SuperCollider. I definitely find the original a bit more pleasing, though maybe that's just because I'm more familiar with it. It's amazing how much simpler it is to do this sort of thing today than it was back then. The author of the post even includes a 140-character version, which doesn't sound half bad.Bjorn Rochehttp://www.blogger.com/profile/17072425815152893296noreply@blogger.com0