I'm pretty glad I decided to try using xmms2 to implement my player. I already have my player and library/querybuilder built, but I'd rather just focus on my UI and get this thing completed. The fact that web-based frontends already exist is a big plus. As far as HQCT goes, the interface doesn't require a Harvard grad. It is just a bunch of get/set functions and a couple of control functions such as tune and seek.

I just started looking into XMMS2 and am beginning to be swayed. I've already finished the "heavy lifting" of decoding an MP3 but haven't implemented outputting the PCM data to the hardware yet which is why I'm looking into options. If I decide to go with XMMS2 it will be similar to what I did with SW1 using a socket to control an external player. Otherwise I'm thinking of using a small go-between app that receives UDP packets of PCM data over a unix socket and passes the data to the user's player of choice via stdout of the go-between to stdin of the given player. I'm going to compile xmms2 and do some more reading before I go any further.

After reading further I'm not really gaining anything that SW2 doesn't already do natively other than support for things like OGG, etc. but I can add that stuff fairly easily later too with libogg and the like. I just got the hardware output working this evening so now it's on to coding the FFT and a few visualizations based on the data (the first of course being the venerable spectrum analyzer). After that it will be dangerously close to release.

Okay, the FFT and spectrum analyzer code is working tickety-boo now, thanks to libfftw. My own FFT code wasn't as fast, so I used libfftw instead.

If you decide to use libfftw, there are a few things you'll need to keep in mind:

Use a reasonably large sample size, or the FFT will not be very indicative of the spectrum. A sample size of about 5k seems to work fairly well but may be slight overkill. At 44.1kHz, a sample size of, say, 10k yields a much less snappy response and increases CPU usage a fair bit.
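To put rough numbers on that trade-off: each FFT bin is sample_rate / N Hz wide, and each window spans N / sample_rate seconds of audio, so a larger window gives finer frequency detail but reacts more slowly. At 44.1kHz a 5,000-sample window covers about 113 ms and a 10,000-sample window about 227 ms. The sketch below is only illustrative; the function names and the sizes in print_tradeoff are made up for the example and are not from SW2.

```cpp
#include <cassert>
#include <cstdio>

// Width of one FFT bin in Hz for a window of n samples.
double bin_width_hz(double sample_rate, int n) {
    assert(n > 0);
    return sample_rate / n;
}

// Time span covered by one window of n samples, in milliseconds.
double window_ms(double sample_rate, int n) {
    assert(sample_rate > 0.0);
    return 1000.0 * n / sample_rate;
}

// Print the resolution/latency trade-off for a few illustrative sizes.
void print_tradeoff(double sample_rate) {
    const int sizes[] = {1024, 4096, 8192};
    for (int n : sizes) {
        std::printf("N=%5d  bin=%6.2f Hz  window=%7.2f ms\n",
                    n, bin_width_hz(sample_rate, n), window_ms(sample_rate, n));
    }
}
```

Calling print_tradeoff(44100.0) lists the bin width and window length side by side, which makes it easier to pick a size that still feels responsive.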

The second is that you'll want to use a one-dimensional real-to-complex DFT on the data, since the originating audio data doesn't have an imaginary part (or you can use complex-to-complex with a null imaginary part as the input, but what would be the point?). The last thing is not to scale the input data, as long as you keep it as a signed type. This is convenient as well, since you can use the same signed data you're passing to the hardware (assuming you're doing that). Since libmad uses fixed-point integers, I'm using the same scaling to signed short integers for feeding the hardware as I am for the input of the FFT. The only difference is that the data sent to the hardware is interleaved and the input to the FFT is not.
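The de-interleaving step might look something like the following. This is a minimal sketch, not the SW2 code: it assumes 16-bit signed PCM, and extract_channel, r2c_output_bins, and their parameters are hypothetical names. The samples are copied to doubles without rescaling, and the helper at the end reflects the fact that a real-to-complex transform of n samples returns only the n/2 + 1 non-redundant bins.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Copy one channel out of interleaved 16-bit PCM into a contiguous
// buffer of doubles, with no rescaling, ready to use as real FFT input.
// `channels` is the channel count, `channel` the zero-based channel to keep.
std::vector<double> extract_channel(const std::vector<std::int16_t>& interleaved,
                                    std::size_t channels, std::size_t channel) {
    assert(channels > 0 && channel < channels);
    const std::size_t frames = interleaved.size() / channels;
    std::vector<double> out(frames);
    for (std::size_t i = 0; i < frames; ++i) {
        out[i] = static_cast<double>(interleaved[i * channels + channel]);
    }
    return out;
}

// Number of complex bins produced by a real-to-complex DFT of n samples.
std::size_t r2c_output_bins(std::size_t n) { return n / 2 + 1; }
```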

Other than some simple linear scaling of the resulting spectrum, the dB conversion I'm doing on the resultant FFT (real and imaginary) is quite accurate. You can see the same result in Xine's simple spectrum analyzer visualization, where they aren't doing any scaling and the data is skewed too high on the low frequencies. The linear scaling I'm doing in the visualization simply starts fractional, reaches unity at the Nyquist frequency, and becomes multiplicative at the high end.
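For anyone following along, a generic dB conversion of one complex bin can be sketched as below. This is the textbook 20 * log10(magnitude) form, not a copy of what SW2 does, and it leaves out the per-frequency linear scaling described above. The floor_db clamp is an assumption added so that a silent bin does not come out as negative infinity.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

// Convert one complex FFT bin to decibels: 20 * log10(sqrt(re^2 + im^2)).
// `floor_db` is a hypothetical lower clamp for silent or near-silent bins.
double bin_to_db(std::complex<double> bin, double floor_db = -120.0) {
    const double magnitude = std::abs(bin);  // sqrt(re^2 + im^2)
    if (magnitude <= 0.0) {
        return floor_db;
    }
    const double db = 20.0 * std::log10(magnitude);
    return db < floor_db ? floor_db : db;
}
```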

Below are snippets directly from SW2's fft.cpp (the visualization code is in another class obviously):

Couple of points (I'm not trying to be nit-picky, just trying to be helpful):

I don't think you should be dividing the sum of the bands by the number of values that went into that band. A spectrum analyzer shows total power for a band, not average power. You want to just sum the values. (Something I am not entirely sure about is that you're summing dB values, whereas I sum raw values and convert the result to dB. Seems like the same still applies.)
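To make the difference concrete, here is a small sketch of both options on raw power values (re^2 + im^2 per bin), converting to dB only at the end. The function names and the tiny 1e-12 floor are assumptions for the example. For a band of n bins the two results differ by exactly 10 * log10(n) dB, so averaging pulls wide bands down relative to narrow ones.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Raw power summed over bins [first, last).
double band_power_sum(const std::vector<double>& power,
                      std::size_t first, std::size_t last) {
    assert(first < last && last <= power.size());
    double sum = 0.0;
    for (std::size_t i = first; i < last; ++i) {
        sum += power[i];
    }
    return sum;
}

// Total band power in dB. The 1e-12 floor avoids log10(0) on silence.
double band_total_db(const std::vector<double>& power,
                     std::size_t first, std::size_t last) {
    return 10.0 * std::log10(std::max(band_power_sum(power, first, last), 1e-12));
}

// Average band power in dB: the same sum divided by the bin count first.
double band_mean_db(const std::vector<double>& power,
                    std::size_t first, std::size_t last) {
    const double mean = band_power_sum(power, first, last) / (last - first);
    return 10.0 * std::log10(std::max(mean, 1e-12));
}
```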

Secondly, you may want to look into using a logarithmic scale to generate bands. The higher a frequency becomes, the less our ears can separate close frequencies. Part of the reason you get abnormally large values in the low end is that you are summing a lot of power in the low frequencies. A quick and dirty method I came up with to get start/stop pairs to group frequencies is to generate a linear array (what you are doing now, essentially), then take the log of those index values. Now you have a logarithmic scale (it will be reversed, so you need to reverse the array). Some fudging of constants will change the curve, but the end result is that fewer bins are summed per band in the low end, and more in the high end.
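One way to get such start/stop indices is sketched below. Note that this uses plain geometric spacing of the band edges rather than the log-of-a-linear-array trick described above, so treat it as a different route to the same kind of curve. log_band_edges and its parameters are hypothetical names, and bin 0 (DC) is skipped because a geometric series cannot start at zero.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Return bands + 1 bin indices; band k covers bins [edges[k], edges[k + 1]).
// Edges grow geometrically from first_bin to last_bin, so low bands hold
// few bins and high bands hold many.
std::vector<std::size_t> log_band_edges(std::size_t first_bin,
                                        std::size_t last_bin,
                                        std::size_t bands) {
    assert(first_bin >= 1 && bands >= 1 && last_bin >= first_bin + bands);
    std::vector<std::size_t> edges(bands + 1);
    const double ratio =
        static_cast<double>(last_bin) / static_cast<double>(first_bin);
    for (std::size_t k = 0; k <= bands; ++k) {
        const double exponent =
            static_cast<double>(k) / static_cast<double>(bands);
        const double pos =
            static_cast<double>(first_bin) * std::pow(ratio, exponent);
        std::size_t edge = static_cast<std::size_t>(std::llround(pos));
        // Force every band to contain at least one bin.
        if (k > 0 && edge <= edges[k - 1]) {
            edge = edges[k - 1] + 1;
        }
        edges[k] = edge;
    }
    return edges;
}
```

For example, log_band_edges(1, 1024, 10) gives edges at successive powers of two, so each band is twice as wide as the one before it.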

Also, it looks like you traverse your entire output array for each band. I'm hoping this is proof of concept code that will get optimized at a later date :P
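A single-pass version could look roughly like this. It is a sketch that assumes the band boundaries are already available as an ascending list of bin indices (the edges parameter is hypothetical, as is the function name). Each bin is read at most once, so the cost is proportional to the spectrum length rather than to bands times spectrum length.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum power per band in one pass over the spectrum. `edges` holds
// bands + 1 ascending bin indices; band k covers [edges[k], edges[k + 1]).
std::vector<double> sum_bands(const std::vector<double>& power,
                              const std::vector<std::size_t>& edges) {
    assert(edges.size() >= 2 && edges.back() <= power.size());
    std::vector<double> bands(edges.size() - 1, 0.0);
    for (std::size_t k = 0; k + 1 < edges.size(); ++k) {
        assert(edges[k] <= edges[k + 1]);
        // Only the bins that belong to band k are visited here.
        for (std::size_t i = edges[k]; i < edges[k + 1]; ++i) {
            bands[k] += power[i];
        }
    }
    return bands;
}
```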

You were right, except that the spectrum still "seems" more accurate (by "seems" I mean that, eyeballing it, it appears to be roughly what it should be given the source material) if you divide the total full power sums by the samples per band before converting to dB.

Low frequency sounds require a lot of energy to produce, vs higher frequencies, which require very little. If you are grouping a large part of the low end of the spectrum (linear scale), then you will see very large values for that band in the lower end. With a logarithmic scale it should seem more correct.

--Zims
