This project will develop an easy-to-use interface for mixing audio data for playback through a single output data line using Java Sound. This is essential for systems where the Java Sound Audio Engine or an alternative software mixer is not available, where the existing hardware mixers only support a single output line, or where a pure-Java option is needed. It will be for 2D audio only, and will not support panning or pitch changes. It will implement per-line gain control if that doesn't introduce too much latency.

We love death. The US loves life. That is the difference between us. -Osama bin Laden, mass murderer

Pitch is another word for frequency (a bass drum is low-pitched, and a bird chirping is high-pitched). Pitch changes are typically used in 3D (spatial) sound to simulate changes in velocity and the Doppler effect.

Pan refers to the volume difference between the left and right speakers. Pan changes are typically used in 3D sound to simulate a sound source coming from a particular position in 3D space.

Gain refers to the overall volume (both speakers).
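For clarity, gain and pan as defined above both reduce to simple per-channel multipliers. A minimal sketch, assuming a linear pan law and float samples (the class and method names here are illustrative, not part of the project):

```java
// Sketch: gain and pan as per-channel multipliers on one stereo frame.
// All names are illustrative, not from the project's API.
public class GainPan {
    /** gain in [0,1]; pan in [-1 (full left), +1 (full right)]; linear pan law. */
    public static float[] apply(float left, float right, float gain, float pan) {
        float leftScale  = gain * Math.min(1f, 1f - pan); // attenuate left as pan moves right
        float rightScale = gain * Math.min(1f, 1f + pan); // attenuate right as pan moves left
        return new float[] { left * leftScale, right * rightScale };
    }
}
```

With pan at 0 both channels are simply scaled by the gain; at -1 the right channel is silenced entirely.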

This project will not have pitch or pan capability (because adding the ability to change those introduces significant latency). It will be designed for use in applications that do not require 3D audio (for folks who just want an easy-to-use, reliable interface for playing sounds, regardless of Java version or target OS).

I am continuing work on my 3D sound mixer as well (as a separate project), and will attempt to further optimize it to reduce the latency problems I'm experiencing with it. I'll post any progress related to it on my 3D Sound System thread.


This project will not have pitch or pan capability (because adding the ability to change those introduces significant latency).

While I understand this statement for pitch change using some FFT based algorithm, I don't see latency introduced by panning, since it is easily achieved by changing the volume of the left or right channel.

Also you can simulate pitch by resampling the source data for playback "on the fly" without needing a buffer.

Having said that, this will of course only work for uncompressed samples, so maybe you are targeting a wider variety of source material...
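The "resampling on the fly" idea above can be sketched with simple linear interpolation over uncompressed PCM samples. This is my own minimal version, not code from either project; a production resampler would also low-pass filter to limit aliasing:

```java
// Sketch: pitch change by stepping through the source at a fractional rate.
// Linear interpolation only; real code would filter to limit aliasing.
public class Resampler {
    /** pitch > 1 raises pitch (plays faster); pitch < 1 lowers it. */
    public static float[] resample(float[] src, double pitch) {
        int outLen = (int) (src.length / pitch);
        float[] out = new float[outLen];
        double pos = 0.0;
        for (int i = 0; i < outLen; i++) {
            int idx = (int) pos;
            double frac = pos - idx;
            float a = src[idx];
            float b = (idx + 1 < src.length) ? src[idx + 1] : a;
            out[i] = (float) (a + (b - a) * frac); // interpolate between neighbours
            pos += pitch;                          // fractional read position
        }
        return out;
    }
}
```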

While I understand this statement for pitch change using some FFT based algorithm, I don't see latency introduced by panning, since it is easily achieved by changing the volume of the left or right channel.

Also you can simulate pitch by resampling the source data for playback "on the fly" without needing a buffer.

Right, but these are achieved by applying additional math to the data from each line being mixed. Each feature by itself isn't much, but the more you do, and the more lines you mix, the more resources it requires, adding up to latency (which is why I also mentioned I won't even implement per-line gain if it takes too many resources). This is meant to be quick and efficient, at the cost of features. My 3D mixer takes the other route (less efficient in favor of more features).


I still don't get why this adds to latency. It might use CPU cycles, but as long as you don't need buffers in your algorithms, latency will always be zero.

For example, if it takes 20 ms to mix all the lines, then any line that was queued up to play 20 ms ago has experienced 20 ms of latency. (And any external synchronization that required a millisecond position for playback is now off.)

I agree I'm being a bit of a [German fascist party controlling Germany from 1933 to 1945] with this, but do you understand the reasoning for wanting an efficient and precise method for mixing and playback of multiple lines? (Like I said, my other mixer has all the features - that's not the goal with this project.)


No computer operating system can do everything at once, so a multitasking operating system such as Windows or Mac OS works by running lots of separate programs or tasks in turns, each one consuming a share of the available CPU (processor) and I/O (Input/Output) cycles. To maintain a continuous audio stream, small amounts of system RAM (buffers) are used to temporarily store a chunk of audio at a time.

[Figure caption: Some plug-ins add latency to the audio path, as revealed by the Plug-In Information window in Cubase SX. The window shows which plug-ins exhibit additional latency when used, and whether or not to automatically compensate for it.]

For playback, the soundcard continues accessing the data within these buffers while Windows goes off to perform its other tasks, and hopefully Windows will get back soon enough to drop the next chunk of audio data into the buffers before the existing data has been used up. Similarly, during audio recording the incoming data slowly fills up a second set of buffers, and Windows comes back every so often to grab a chunk of this and save it to your hard drive.

If the buffers are too small and the data runs out before Windows can get back to top them up (playback) or empty them (recording) you'll get a gap in the audio stream that sounds like a click or pop in the waveform and is often referred to as a 'glitch'. If the buffers are far too small, these glitches occur more often, firstly giving rise to occasional crackles and eventually to almost continuous interruptions that sound like distortion as the audio starts to break up regularly.

Making the buffers a lot bigger immediately solves the vast majority of problems with clicks and pops, but has an unfortunate side effect: any change that you make to the audio from your audio software doesn't take effect until the next buffer is accessed. This is latency, and is most obvious in two situations: when playing a soft synth or soft sampler in 'real time', or when recording a performance. In the first case you may be pressing notes on your MIDI keyboard in real time, but the generated waveforms won't be heard until the next buffer is passed to the soundcard. You may not even be aware of a slight time lag at all (see 'Acceptable Latency Values' box), but as it gets longer it will eventually become noticeable, then annoying, and finally unmanageable.
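The buffer-size/latency trade-off described above is simple arithmetic: a buffer's latency is its length in frames divided by the sample rate. A quick sketch (the names are mine, not from any library):

```java
// Sketch: latency contributed by one audio buffer.
public class BufferLatency {
    /** e.g. 4096 bytes with 4-byte frames (16-bit stereo) at 44100 Hz -> ~23 ms. */
    public static double latencyMs(int bufferBytes, int frameBytes, double sampleRate) {
        int frames = bufferBytes / frameBytes; // frames the buffer holds
        return frames * 1000.0 / sampleRate;   // time needed to play them out
    }
}
```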

Scenario: You want to mix multiple input lines. They each have byte data that needs to be combined into a single output stream. The mixer has an interface which lets the user attach the lines and signal when they should start or stop. A thread running within the mixer does all the mixing of the byte data from each line.

Inner workings: The thread doing the mixing cannot process every line at the same time (obviously), so it must process them in a linear fashion. It starts with the first line, grabs a block of data [i.e. a buffer] to mix, applies any transformations to that data (gain changes, pan changes, sample-rate changes, etc.), and moves on to the next line. Once it reaches the end, after combining the data from all the lines, it pushes the result out to the hardware to be played.
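That line-at-a-time pass might look roughly like this. This is only a sketch over float buffers with the per-line transforms elided; all names are hypothetical:

```java
import java.util.List;

// Sketch: linear mixing pass - one line at a time, summed into one output block.
public class LinearMixPass {
    public static float[] mixBlock(List<float[]> lineBuffers, int blockSize) {
        float[] out = new float[blockSize];
        for (float[] buf : lineBuffers) {             // each attached line in turn
            // ...per-line transforms (gain, pan, sample-rate) would go here...
            for (int i = 0; i < blockSize && i < buf.length; i++) {
                out[i] += buf[i];                     // combine into the output block
            }
        }
        return out;                                   // then pushed to the hardware
    }
}
```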

Where latency originates: The calculation for each line takes some amount of time, however minuscule that might be. Let's say for the sake of argument that to grab the data, do a gain calculation, do a pan calculation, do a sample-rate change, and move on to the next line takes a total of 100 µs (a totally made-up number). The sum of that unit time over every line equals the amount of latency experienced between the moment the user told the mixer "hey, start playing!" and the moment the audio data was pushed to the speaker. It is also important to point out that the [average] unit time will increase depending on how heavily the user is loading the CPU with other processes.

How much latency is there? Well, it depends on the number of lines you are mixing and how much time you are spending per line. In the above scenario with my totally arbitrary number, the latency isn't too bad - for 250 lines it would be 25 ms. For 1,000 lines, it would be 0.1 seconds. But it is there. As I've mentioned, the point of this project is efficiency, not features. The more processing I can remove from the equation, the better. I assume the main concern is not having a per-line gain control? That may be an important enough feature that it should be included in the mixer, rather than removed simply for the sake of efficiency. Pan and pitch, on the other hand, are not important features for most people, and therefore should be eliminated. I've written a lot of "bloatware" in my time, but that doesn't mean I can't be a [German fascist party... yeh, you get the picture] once in a while when it suits my goals.
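The summation itself is trivial arithmetic; taking a per-line cost consistent with the numbers in this post (100 µs, a made-up figure), the total is just lines × per-line time:

```java
// Sketch: summed per-line processing time, as described in the post above.
public class MixLatency {
    /** perLineMicros: assumed time to process one line, in microseconds. */
    public static double totalLatencyMs(int lines, double perLineMicros) {
        return lines * perLineMicros / 1000.0; // microseconds -> milliseconds
    }
}
```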


I totally get your points, but when I set my mind to efficiency I stick to it. All the "bloat" of this project will be in the surrounding easy-to-use interface. The core of the mixer where all the heavy lifting is being done is going to be as efficient as I can make it while still accomplishing its purpose: mixing of inputs into a single output.


I've got the basic project put together. Attaching lines, starting, stopping, and mixing works nicely. I've been borrowing most of the code from my other mixer project, so nothing much new as of yet. There is a little more to complete on this part, such as drain and flush, as well as gain changes.

The next big component is the format conversion code. Currently, the mixer is assuming a particular input (and output) audio format, which of course is not all that adaptable. I need to be able to let the user specify the output format, and to mix a variety of input formats. Obviously format conversion impacts performance, but this is certainly a feature that people will need. Folks who need ultra optimization just have to make sure all their input data is in the same audio format, and use that same format for the output as well (thus bypassing all the format conversion steps).


I'm looking forward to seeing and hearing what you come up with and applaud your efforts!

I have some apprehensions about the algorithm you describe here:

Quote

Inner workings: The thread doing the mixing cannot process every line at the same time (obviously), so it must process them in a linear fashion. It starts with the first line, grabs a block of data [i.e. a buffer] to mix, applies any transformations to that data (gain changes, pan changes, sample-rate changes, etc.), and moves on to the next line. Once it reaches the end, after combining the data from all the lines, it pushes the result out to the hardware to be played.

It would seem to me you would want to bring in some defined amount of data into the inner loop from each line, at the "same time," then do a single operation that includes each buffer's i-th element. In other words, progress by frames, not by tracks or channels.

Maybe that is what you are doing?

Or maybe it isn't. I noticed for example that a free, tutorial OggVorbis Player iterates once for each stereo track of the innermost byte buffer in turn, which is a lot less efficient than processing both channels at the same time, especially if you are going to apply any sort of processing to the individual frames.

"We all secretly believe we are right about everything and, by extension, we are all wrong." W. Storr, The Unpersuadables

I was going to suggest the same thing, but then realized that I have no idea whether it really is more efficient today (it was in the past), given all those fancy cache architectures. Maybe cache prefetching today makes chunked processing more efficient than the per-frame/byte approach...

I am using similar algorithms to what's in that library (also working with float buffers - this is fairly straightforward as long as you are dealing only with PCM data). I just realized my previous post sounded like I was asking how to do format conversions - I actually have this part coded from my other mixer project. That being said, I'll definitely take a look through the link you posted to see if there is anything useful I could learn from it for optimizing my own code (although I personally shy away from anything GPL, even with those convoluted lawyer-speak "classpath exception" or "lesser" clauses - I don't like the controls that the designers of these licenses try to place on the developer. I much prefer to give people a "do whatever you want, just don't come crying to me if you screw up your sh**" type of license). Anyway, what I meant to say is that I need to work format conversions into this system, decide how many formats I want to support, and spend some time testing and debugging.

It would seem to me you would want to bring in some defined amount of data into the inner loop from each line, at the "same time," then do a single operation that includes each buffer's i-th element. In other words, progress by frames, not by tracks or channels.

How would that work, though, since each line is going to have its own unique set of transformations to be applied? Even if you do it at the line level to present a pre-defined amount of transformed data to the mixer thread, you are just pushing the same operation to another class - in the end you are still performing transformations on each line in a linear fashion (that's the only way a CPU can operate). So whether it is done by the mixer after it draws the data from the line, or done by the line before it is drawn by the mixer, it is the same performance-wise, right? Perhaps I'm not understanding the process you are describing (I sometimes need unfamiliar concepts to be "dumbed down" for me). Since cylab mentioned that is what he was suggesting as well, I'm probably just on the wrong wavelength, so to speak.


I guess I'm thinking of constructing each line using a Factory pattern that implements the appropriate Strategy pattern, such that in the innermost loop of the mixer, all you have to worry about is adding the signals and dividing by the number of signals (or whatever that algorithm is) and applying a global mixer volume to that.

Suppose you had a wrapper for each AudioInputStream that contained its own buffer, and a collection of these wrappers. The following I am writing as I sit here, for the inner loop of the Mixer, to illustrate the idea.
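The code from that post did not survive here, so the following is only my own sketch of the idea being described: each wrapped line keeps its own filled byte buffer, and the mixer's inner loop walks frame positions, summing the i-th sample from every buffer. It assumes 16-bit little-endian PCM buffers of equal length, and all names are hypothetical:

```java
// Sketch: frame-oriented inner loop - outer loop over sample positions,
// inner loop over lines. Assumes 16-bit little-endian PCM, equal lengths.
public class FrameMixLoop {
    public static void mixInto(byte[][] lineBuffers, byte[] out) {
        for (int i = 0; i + 1 < out.length; i += 2) {  // one sample position at a time
            int sum = 0;
            for (byte[] buf : lineBuffers) {
                sum += (short) ((buf[i] & 0xFF) | (buf[i + 1] << 8)); // decode LE sample
            }
            // Clip to the 16-bit range so the sum can't wrap around.
            sum = Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, sum));
            out[i]     = (byte) sum;                   // re-encode little-endian
            out[i + 1] = (byte) (sum >> 8);
        }
    }
}
```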

Have you been looking into multicore processing? I'm trying to get a grip on it. This could affect whether there is a difference between the inner loop of the mixer progressing via frames vs tracks.

I'm sorry if I'm not doing a good job of getting my head around the issues in a way to clearly answer the points you raised.

For my JTheremin, I used the code presented here to feed output from Swing components to the audio processing loop on a per frame basis. I don't know if you saw it or not, or if the idea or code is useful to you. (Probably lots of room for improvements here, too.) This tool is geared to a per frame processing order.

Thanks, I'll take a look at the code you linked to. I like the basic infrastructure as I understand it, because whether or not it improves performance, it does clean up the code in the mixing loop.

Also, what I'm calling a "buffer" is any arbitrary size divisible by the sample size. I may very well use the sample size if a reasonable number of lines can be mixed on my crappy Netbook without having any skipping in the output. I do still need to determine the most efficient way to handle format differences between input and output that result in different sample sizes between the two, but that's just a matter of trying things out to see what works the fastest (I currently have the mixer reading until it has enough data, and telling the lines to store any "leftovers" until the next iteration).


Great to see this project starting - would seem to be highly needed given recent conversations!

Sometime in October I'll be releasing the rest of the audio libraries from Praxis as GPL w/CPE, which does most of this and a whole lot more. However, I also understand the need for a lightweight and more liberally licensed library. I'm happy to donate any fragments of the Praxis stuff that's useful under BSD if either it's code I've written or is 3rd-party based but license compatible.

The AudioServer code (minus the conversion code from Gervill) I've posted elsewhere might be useful (just the general concept of not blocking on the SDL write()).

... which is why I also mentioned I won't even implement per-line gain if it takes too many resources ...

Java is not that slow...

This made me laugh - perhaps the biggest understatement of the year!

As I mentioned in the other JavaSound thread, I've been programming audio stuff with pure-Java DSP for about 7 years now. All that you've mentioned (gain / pan / pitch) is easily achievable at low latency - I've done things involving far more than that! I also don't follow the concern you've expressed about latency - these operations do not add latency as such; that is set in the soundcard driver. Each will add some (minimal) CPU usage, which can eventually cause audio break-up if the CPU gets saturated - you can lower CPU usage by increasing latency, but that isn't the same as saying every operation increases latency.

If it's useful to know, I can get sub-5ms latency doing all this and more. Once your code is up I'm happy to have a look through and see if there's any optimisations I can offer.

I guess I'm thinking of constructing each line using a Factory pattern that implements the appropriate Strategy pattern, such that in the innermost loop of the mixer, all you have to worry about is adding the signals and dividing by the number of signals (or whatever that algorithm is) and applying a global mixer volume to that.

You don't want to be dividing by the number of signals, or every time you add a signal all the others will get quieter. You do however need to limit the signal so it doesn't go out of range. There are various approaches to that. An auto-gain control (measuring the ongoing absolute maximum signal and dividing all values by that) is simple and possibly suitable in this case.
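The auto-gain idea might be sketched like this. This is my own minimal version of the approach described (track the ongoing absolute maximum and divide by it, never boosting quiet signals), not code from any of the projects discussed:

```java
// Sketch: auto-gain control - divide by the running absolute maximum,
// so the mixed signal stays within [-1, 1] without hard clipping.
public class AutoGain {
    private float peak = 1f; // running peak; floor of 1 leaves in-range signals untouched

    public float[] process(float[] mixed) {
        for (float v : mixed) peak = Math.max(peak, Math.abs(v)); // update running max
        float[] out = new float[mixed.length];
        for (int i = 0; i < mixed.length; i++) out[i] = mixed[i] / peak;
        return out;
    }
}
```

One design note: because the peak only ever grows, a single loud transient permanently lowers the level; a real implementation would usually let the peak decay back over time.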

This probably isn't worth the effort unless you're doing a lot of DSP work. Programming multicore audio isn't easy - you need to use lock-free structures throughout (synchronized or other locks can really mess with your latency).

I've actually learned quite a bit by starting over from scratch with a new project and a new infrastructure. I'm really not experiencing the latency issues that I'm getting with my other mixer project (even with as low as 32 total lines, it is complete garbage for anything I'd use it for).

Anyway, I'm going to just scrap that project and turn this one into the base that I'll use in my 3D Sound System. I can add per-line pitch/pan/gain capabilities without damaging the optimized pipeline, simply by having the mixer check whether those features are requested, and if not, skip the calculations. That way, there is no performance hit as long as the user uses the same format for the output and all inputs, and doesn't change the pitch, pan, or gain. This should give the capability to mix hundreds of lines at a decent speed.


Anyway, I'm going to just scrap that project and turn this one into the base that I'll use in my 3D Sound System. I can add per-line pitch/pan/gain capabilities without damaging the optimized pipeline, simply by having the mixer check whether those features are requested, and if not, skip the calculations. That way, there is no performance hit as long as the user uses the same format for the output and all inputs, and doesn't change the pitch, pan, or gain. This should give the capability to mix hundreds of lines at a decent speed.

Does it change anything for people not using JavaSound but rather JOAL?

The mixer itself is independent of Java Sound, and could be used with JOAL. There is a class that provides the linkage with Java Sound (chooses a device, opens a line, etc.), and it can be removed easily enough. As for how this mixer will fit into my SoundSystem library, it will be part of a new library plug-in (something along the lines of LibrarySimpleMixer or the like), and will link with Java Sound. The LibraryJOAL plug-in will not be affected.


java-gaming.org is not responsible for the content posted by its members, including references to external websites and other references that may or may not have a relation with our primarily gaming and game production oriented community. Inquiries and complaints can be sent via email to the info-account of the company managing the website of java-gaming.org.