I am programming a basic MIDI sequencing tool, and am starting off with the core program and internal music representation, then moving to UI, and finally to interfacing with the audio output API (I don't even know what API I'm going to use, but something MIDI-related). I have written a bit of something for my internal representation of a musical note and of its channel. I am asking for advice on both my note and what API to use for MIDI audio development (real time efficiency must also be considered). Here is my note structure as of now:
[source lang="cpp"]struct note { char midiVal; /* MIDI value, or note value */ char timeStmp [3]; /* Time value, sometimes called timestamp, basically when the note will play on the timeline. It is organized by mm:ss:ms, or minutes, seconds, milliseconds, respectively. As you guess, there is an inherent limit of 99 minutes per song. */ char channel; /* The software channel. There are 0-255 channels. Channels organize groups of similar notes, like a "nation of notes" if you will. Channels can be manipulated as one whole unit or as individual notes. */ float physChannel; /* The actual physical channel, of no relation to the previous data. It is, of course, left and right audio streams, and each note will contain data for how much it will blend on either side of the audio stream. A positive number is right stream, negative is left. */ char duration [3]; /* The duration of the note based on the same timing conventions used earlier for timestamp data. Used to specify the duration, and yet again, the limit for a single note's duration is 99 minutes. pah. */ char instrument; /* Specifies, out of a bank of a mere possible 256, which instrument a not belongs to. */ char volume; /* Dictates what volume a note will be. based off of a scale of 0-255, zero being mute, and 255 being the mortal enemy of your grandmother. */ ULINT name; /* Not really a name, more like a serial number. Anyway, this puts a limit on possible notes as well (do you like my limits?). Now, there is a ceiling of only 4,294,967,296 notes in a piece, and that piece can be 99 minutes long. */ };[/source]
The code comments can be a bit basic at times, but they were written so that someone with no clue what I am doing could pick up on at least a little of what I was talking about and recognize similarities with other APIs. I didn't mean to offend anyone with dumb humor or elementary explanations of basic subjects.

EDIT: "ULINT" is an unsigned long integer, fyi.

Edited by MrJoshL, 14 December 2012 - 08:34 PM.


For starters, things will be much easier if you make your timestamps a more reasonable type. Say, a float expressing seconds from the beginning of the song, or perhaps an integer number of milliseconds. Similarly for the duration.

I don't know if there is any reason why a note should know what channel it belongs to. Presumably there will be a struct "channel" which will contain the notes in that channel, and you don't need to repeat the information of what the channel is inside each note. It is possible that you have a good reason to do that, but I don't know what it is.

`physChannel' is not very descriptive. It is known as "pan" in MIDI, so perhaps you should consider changing its name to something like that.

I would probably have used a 16-bit integer for the instrument. I know there are synthesizers with more than 256 instruments, and it's probably not worth saving the extra byte, given that people have already bumped into this limit at some point.

Similarly to the channel, I don't think I would make the identifier a part of the note. If some container of notes wants to locate them using an identifier, that's great, but I think that should be part of the container, not the contained type. (In C++ I would perhaps store the notes in an object of type std::map<unsigned, note>, where the unsigned integers are the identifiers.)
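A minimal sketch of that container idea, with the note fields pared down for illustration (the field types and the `addNote` helper are my own assumptions, not the OP's design):

```cpp
#include <cstdint>
#include <map>

// Simplified note: no "name" field -- the container owns the identifiers.
struct note {
    std::uint8_t  midiVal;     // MIDI note number, 0-127
    std::uint32_t startUs;     // start time in microseconds
    std::uint32_t durationUs;  // duration in microseconds
    std::uint8_t  velocity;    // 0-127
};

// The identifier lives in the map key, not in the note itself.
using NoteMap = std::map<unsigned, note>;

// Hypothetical helper: insert a note and hand back its identifier.
inline unsigned addNote(NoteMap& notes, unsigned& nextId, const note& n) {
    notes[nextId] = n;
    return nextId++;
}
```

Lookup by identifier is then just `notes.at(id)`, and deleting a note never invalidates the other identifiers.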

Oh, one more thing. You should reorder the elements in your structs from largest to smallest, or you may end up with a bunch of padding (unused bytes to guarantee proper alignment of some types) that might make your struct larger than it needs to be.
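A quick illustration of the padding point; the exact sizes are typical for mainstream ABIs, not guaranteed by the standard:

```cpp
#include <cstdint>

// Mixed ordering: the compiler inserts padding after the 1-byte members
// so the 4-byte member stays 4-byte aligned.
struct Padded {
    std::uint8_t  a;   // 1 byte, then (typically) 3 bytes of padding
    std::uint32_t b;   // 4 bytes
    std::uint8_t  c;   // 1 byte, then (typically) 3 bytes of tail padding
};                     // typically sizeof == 12

// Largest-first ordering packs the same fields tighter.
struct Ordered {
    std::uint32_t b;   // 4 bytes
    std::uint8_t  a;   // 1 byte
    std::uint8_t  c;   // 1 byte, then (typically) 2 bytes of tail padding
};                     // typically sizeof == 8
```

Same three fields, one third less memory per note, just from the ordering.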

I have worked with MIDI a ton and long ago I made a tool that generated valid MIDI files (the tool’s goal was to algorithmically generate music—it worked but the music it generated sucked ass).

Álvaro is correct about everything but the time. The times/durations must be in a resolution no shorter than microseconds, and they should be stored in a ULINT. Today's software has resolutions of up to around 960 PPQN (possibly more) at 250 BPM, giving you as little as 250 microseconds between ticks.

The variable-length time stamps inside the MIDI files store the number of ticks between each successive event. For efficient run-time performance you should convert all of these event time stamps into literal times, which is why you need to store raw microsecond values. Internally you will still need to maintain this tick-style format so that you can add/remove events reliably and change the tempo, etc., without losing precision, but before playing a song you should make a quick prepass to convert all those ticks into absolute times.
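A sketch of that prepass, assuming a single fixed tempo (a real song would walk the tempo map as it goes); the event type and parameter names are made up for illustration:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stored event: delta ticks since the previous event.
struct TickEvent {
    std::uint32_t deltaTicks;
    // ... note/controller payload would go here ...
};

// Convert delta-tick events to absolute microsecond timestamps.
// usPerQuarter is the tempo (500000 us/quarter == 120 BPM);
// ppqn is the sequence's pulses-per-quarter-note resolution.
std::vector<std::uint64_t> toAbsoluteUs(const std::vector<TickEvent>& evts,
                                        std::uint32_t usPerQuarter,
                                        std::uint32_t ppqn) {
    std::vector<std::uint64_t> out;
    out.reserve(evts.size());
    std::uint64_t ticks = 0;
    for (const TickEvent& e : evts) {
        ticks += e.deltaTicks;                       // accumulate in ticks...
        out.push_back(ticks * usPerQuarter / ppqn);  // ...convert once, at the end,
    }                                                // so rounding never compounds
    return out;
}
```

Accumulating in ticks and converting each absolute value once is what keeps the prepass from drifting over a long song.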

Volume only ranges from 0 to 127, by the way. Most MIDI events do.

For anything you think will be in the range from 0-255, use an unsigned char, not a char. No reason to bite yourself in the ass with a useless sign bit, especially when shifting things.
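A small demonstration of why the signedness matters when pulling the high nibble out of a status byte (function names are made up):

```cpp
#include <cstdint>

// With unsigned char, a note-on status byte like 0x90 keeps its value
// and the high nibble shifts out cleanly.
inline bool isNoteOnUnsigned(unsigned char status) {
    return (status >> 4) == 0x9;
}

inline bool isNoteOnPlainChar(char status) {
    // If char is signed on your platform, 0x90 is stored as -112, and the
    // (typically arithmetic) right shift drags the sign bit in:
    // (status >> 4) is then -7, not 0x9, so this check quietly fails.
    return (status >> 4) == 0x9;
}
```

The plain-char version may even work on platforms where `char` is unsigned (some ARM ABIs), which makes the bug nastier, not better.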

For the instrument patch you are storing far too little information. My Yamaha MOTIF XF8 has 1,353 voices and this is fairly common these days. You need to look into the MSB/LSB system for selecting banks and patches.
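For reference, bank selection in MIDI is two controller messages (CC#0 = Bank Select MSB, CC#32 = Bank Select LSB) followed by a Program Change. A sketch of building that byte sequence (the function name is mine):

```cpp
#include <cstdint>
#include <vector>

// Build the three-message sequence that reaches patches beyond the basic
// 128 programs. Byte layout follows the MIDI 1.0 spec.
std::vector<std::uint8_t> selectPatch(std::uint8_t channel,   // 0-15
                                      std::uint8_t bankMsb,   // 0-127
                                      std::uint8_t bankLsb,   // 0-127
                                      std::uint8_t program) { // 0-127
    return {
        static_cast<std::uint8_t>(0xB0 | channel), 0x00, bankMsb,  // CC#0
        static_cast<std::uint8_t>(0xB0 | channel), 0x20, bankLsb,  // CC#32
        static_cast<std::uint8_t>(0xC0 | channel), program         // program change
    };
}
```

So internally you want a (bank MSB, bank LSB, program) triple per instrument slot, not a single byte.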

[quote name='L. Spiro']Álvaro is correct about everything but the time. The times/durations must be in a resolution no shorter than microseconds.[/quote]

That makes sense. I picked milliseconds because that's what he was using, according to his comments (although he was trying to fit the milliseconds field in a single byte...).

[quote name='L. Spiro']For anything you think will be in the range from 0-255, use an unsigned char, not a char.[/quote]

Oh, yes. This is important, and using `char' by itself is even worse than that: Whether `char' is signed or not depends on the compiler, so you should definitely say explicitly which one you want. Sorry I missed that.

As L. Spiro says, don't store your times as floats or anything like that; store them as ticks.

Sequencers commonly work on a scale of PPQN (pulses per quarter note), so if you adjust tempo (as a one-off or gradually throughout a song) it just *works*. The PPQN values are usually things like 48, 96, 192, etc.

Bear in mind that if you are doing 4/4 music that's all good, but if you are using triplets, or groove, you'll want the PPQN divisible by 3, and with enough precision for your 'groove'.

You'll also probably want to store your note timings as offsets from the start of a pattern, rather than the start of the song. This way you can have several instances of the same pattern at different parts of the song.

Also, instead of storing things as e.g. char[3] to save space, it's probably more sensible to just make them 4-byte unsigned ints and keep your structures 4-byte aligned so you (or the processor) aren't faffing about for no reason. You can always compress them on import / export, if you really need to.

Another reason for PPQN is so you easily change the output sample rate (assuming you are going to do some audio instead of purely MIDI).

I've done several audio / sequencing apps and don't think I stored anything as floats. PPQN can be used to calculate the exact sample for an instrument to start / end (and you might precache this kind of info). You could possibly use something more accurate to get within sample accuracy for timing, but I've never bothered myself.
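A sketch of that tick-to-sample calculation (parameter names are hypothetical, and a fixed tempo is assumed for simplicity):

```cpp
#include <cstdint>

// Map a PPQN tick position to an exact output sample index.
// bpm is the tempo; sampleRate is the output rate in Hz.
std::uint64_t tickToSample(std::uint64_t tick,
                           std::uint32_t ppqn,
                           double bpm,
                           std::uint32_t sampleRate) {
    // samples per quarter note = sampleRate * 60 / bpm
    double samplesPerTick = (sampleRate * 60.0 / bpm) / ppqn;
    return static_cast<std::uint64_t>(tick * samplesPerTick + 0.5);  // round
}
```

Change `sampleRate` and every note still lands on the right beat, which is the point being made about PPQN above.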

It's really worth using a plugin architecture for different components of a sequencing / audio app, I'd highly recommend it. You can make effects (reverb, delay, chorus etc) plugins, and instruments plugins. You could potentially also use VST plugins or similar if you can work out their interface (you may find some open source apps that have managed this).
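A bare-bones sketch of what such a plugin interface might look like; this is an assumed design for illustration, not any real plugin standard like VST:

```cpp
#include <cstddef>

// Hypothetical minimal effect-plugin interface: the host hands each
// plugin an interleaved stereo buffer to process in place.
class EffectPlugin {
public:
    virtual ~EffectPlugin() = default;
    virtual void process(float* samples, std::size_t frames) = 0;
};

// Example plugin: a simple gain stage.
class Gain : public EffectPlugin {
public:
    explicit Gain(float g) : gain_(g) {}
    void process(float* samples, std::size_t frames) override {
        for (std::size_t i = 0; i < frames * 2; ++i)  // 2 = stereo interleave
            samples[i] *= gain_;
    }
private:
    float gain_;
};
```

The host only ever sees `EffectPlugin*`, so reverb, delay, chorus, and instruments all slot into the same processing chain.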

I'm currently rewriting a sequencer / audio app I wrote a few years ago, and have actually moved to using plugins for things like quantization / legato / groove / arpeggios. Have a think about whether you want to be able to do stuff like 'undo' quantization, keep original values, or have a modification 'stack' applied to notes.

I don't think you'll get the exact structures bang on first time, it's the kind of thing you write a first version, then realise there's a better way of doing it, redo it, etc etc. But it is fairly easy to get something usable. You may also spend as much time on user interface / editing features as the stuff 'under the hood'.

As for APIs, I have so far cheated and don't actually use MIDI input or output (although I have done that in the distant past and it wasn't that difficult, I don't think). I have just been writing a MIDI file importer though, refreshing my memory lol.

If you want realtime MIDI input you'll have to pay much more attention to latency and the APIs you use. I was just getting by with the old Win32 audio functions for primary / secondary buffers, but the latency is awful, so using DirectSound (or I think there may be a newer API in Windows 7) would be better. Sorry, I can't help yet with that as I haven't researched it myself.

Also I'd add, consider using Direct3D or (in my case) OpenGL to accelerate the graphics side. This way you can easily show the position within a song without overloading the CPU and causing stalls and having your audio stutter.

Once you start doing the audio side a bit of SSE / SIMD stuff helps. And you have to think carefully about how you'll structure your tracks / sends to effects, to make it efficient but also customizable.

Note pitch: I'd stick with just a note number like MIDI for now, and the 12-note western scale. 99% of music is written like this, and handling other systems is a bit more advanced and something you can tack on later. I wouldn't recommend storing notes as float frequencies, for several reasons: accuracy (say you transpose down, then up later), and the fact that wavelengths don't have a linear relationship with note number. You might want to do operations based on the relative pitches of notes, or detect chords, etc. All of this would be stupidly difficult if you just store wavelengths / frequencies. Besides, your source instruments may have different base frequencies anyway, and these would need to be compensated for.
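For the record, the standard equal-temperament mapping between MIDI note numbers and frequency (note 69 = A4 = 440 Hz) is simple in either direction, so you lose nothing by storing note numbers:

```cpp
#include <cmath>

// Equal temperament: each semitone multiplies frequency by 2^(1/12).
double noteToFrequency(int note) {
    return 440.0 * std::pow(2.0, (note - 69) / 12.0);
}

// Inverse, e.g. for display or for importing frequency-based data.
double frequencyToNote(double hz) {
    return 69.0 + 12.0 * std::log2(hz / 440.0);
}
```

Transposition, intervals, and chord detection are then plain integer arithmetic on note numbers, and the frequency is computed only at synthesis time.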

Pan: Why limit yourself to stereo pan? What about surround sound?

Channels / instrument info on a note: Would you want the note to determine this, or the track and / or pattern? Having a 'grouping' feature for notes can be useful though. Remember you are going to want to be able to do stuff like edit the instruments you are using quickly and easily, and not change this for every note.

What happens when, by accident, you set two bunches of notes to the same instrument ID (if storing it on the notes)? You have then lost their 'individuality'. Better to store something else that then maps to the instrument.

Volume: This is usually key velocity rather than volume (there is midi volume as well, but you wouldn't store this per note, but as a separate event), which in midi is 0-127. There is also release velocity, which may or may not be used by the instrument.

There's also other stuff like pitch bend, aftertouch etc, which you can store as a separate event.

Note name / ID: Why try and store this on the note? If your pattern has e.g. an array or vector of 35 notes, then you know its ID as you access it.

Once you have a simple system working then it will become more obvious where to add things.

To reiterate on the notes side of things, don't worry so much about space saving, just concentrate on simplicity. Note data doesn't tend to be that large. It's more when you get to the audio side you need to pay attention to the data structures / bottlenecks.

And rather than just having a struct-like class you can use accessor functions so the actual data underneath can be anything you want.

[quote]Note pitch: I'd stick with just a note number like MIDI for now, and the 12 note western scale. ... Storing notes as float frequencies I wouldn't recommend for several reasons: accuracy (say you transpose down, then up later)...[/quote]

I still think I would use a float for this, but instead of the frequency it would be a number of semitones (which is 12*log2(frequency)+some_constant). You don't magically lose precision if you add and subtract integer values to integer values, even if those integers are stored in floating-point variables.
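A sketch of that point: transposing a float semitone value by whole semitones is exact integer arithmetic, so down-then-up round-trips perfectly, while fractional values stay available for microtonal offsets (names are mine):

```cpp
// Pitch stored as a float number of semitones (MIDI convention: 69 = A4).
// Small integers are represented exactly in a float, so adding and
// subtracting whole semitones loses no precision.
float transpose(float semitones, int bySemitones) {
    return semitones + static_cast<float>(bySemitones);
}
```

A quarter-tone is then just `pitch + 0.5f`, with no change to the representation.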