I am quite confident from viewing the die shots, and it seems to be sufficiently confirmed from testing, that whenever you write a data value, BUSY gets set, and gets cleared exactly 32 cycles later. On the die shot it's just a 5-stage counter, not connected to anything else.

Of course the chip actually "consumes" the written data value at some point within the first 24 (or possibly a couple more) cycles of it being written, when its value comes around in whatever circular shift register. (They're all 24 stages or less.)
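To make the BUSY behaviour described above concrete, here is a minimal Python sketch of a free-standing 32-cycle timer. The class and method names are made up for illustration, and the choice that a second write while BUSY is set restarts the count is an assumption, not something read off the die shot.

```python
class BusyTimer:
    """Hypothetical model of BUSY: a 5-bit counter started by a data write."""

    PERIOD = 32  # 2**5 internal cycles

    def __init__(self):
        self.remaining = 0

    def write_data(self):
        # Assumption: any data write (re)starts the full count.
        self.remaining = self.PERIOD

    def clock(self):
        # One internal clock cycle; the timer has no other inputs.
        if self.remaining:
            self.remaining -= 1

    @property
    def busy(self):
        return self.remaining > 0


timer = BusyTimer()
timer.write_data()
cycles = 0
while timer.busy:
    timer.clock()
    cycles += 1
print(cycles)
```

Nothing in this model looks at which register was written, which is the point being argued: BUSY is independent of when the data is actually consumed.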

I see that what you're saying is that, as far as you can see, if you write the key on register just after op 1's key on bit came around, it waits until it comes around again so that the key on bits are always updated in the order 1, 3, 2, 4. This is possible--I haven't worked out any of the control units in detail--but this is easily testable.

Write all four operators key-on on one cycle. Exactly 24 cycles later, write all of them key-off. On the naive implementation, you will get each operator to output a nonzero value exactly one sample (possibly not all on the same sample, but one each). On your implementation, one of two things will happen: either the new key-on value will overwrite the old, and some of the operators will have never received a key-on; or the new value will be discarded, and some of the operators will remain keyed on afterwards. Both of these conditions should be observable even on one sample (if release is set to a long value, you can notice any operator which was keyed on for only one sample; and as long as decay level is not zero, you can also tell it apart if the operator was never keyed off).

The thing is that, if your implementation is correct, it would take a maximum of 18+23=41 cycles to update the keyon states of all four operators. Which means, you could write a certain keyon value, wait the 32 cycles the dumb BUSY timer (and the datasheet) told you to, and then write a different keyon value, and have the output be blatantly incorrect (compared to what the datasheet specified), i.e. have some operators still on or never turned on. I am a bit doubtful that Yamaha would have added the additional logic to make sure the operators got keyed on in the correct order, when this could lead to strictly incorrect performance, while the only penalty for not adding this logic would be modulation being wrong for a single sample.
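The 18+23=41 figure can be reproduced with a small Python sketch. It assumes the "ordered update" hypothesis as described in this thread: the update only starts when op 1's slot comes around, and the four operators of a channel sit 6 cycles apart in the order 1, 3, 2, 4. The function name and constants are illustrative only.

```python
CYCLES_PER_SAMPLE = 24
OP_SPACING = 6      # cycles between operators of the same channel
BUSY_CYCLES = 32    # length of the BUSY timer


def ordered_update_latency(cycles_since_op1_slot):
    """Cycles from the write until op 4's key-on bit is updated (hypothesis)."""
    # Wait for op 1's slot to come around again (0 if we hit it exactly).
    wait_for_op1 = (-cycles_since_op1_slot) % CYCLES_PER_SAMPLE
    # Ops 3, 2, 4 follow at +6, +12, +18 cycles.
    return wait_for_op1 + 3 * OP_SPACING


latencies = [ordered_update_latency(d) for d in range(CYCLES_PER_SAMPLE)]
worst = max(latencies)
too_slow = [d for d, lat in enumerate(latencies) if lat > BUSY_CYCLES]
print(worst, len(too_slow))
```

Under these assumptions the worst case is 41 cycles, and 9 of the 24 possible write phases finish later than the 32-cycle BUSY window.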

Sauraen wrote:
The thing is that, if your implementation is correct, it would take a maximum of 18+23=41 cycles to update the keyon states of all four operators. Which means, you could write a certain keyon value, wait the 32 cycles the dumb BUSY timer (and the datasheet) told you to, and then write a different keyon value, and have the output be blatantly incorrect (compared to what the datasheet specified), i.e. have some operators still on or never turned on.

The chip could have a different logic to generate the BUSY for the keyon. In JT51 v1.0 I had a fixed 32 bit counter for busy_reg and then an independent busy_keyon signal. And the final busy was the OR of both of them. This means a bit more of logic, yes and:

Sauraen wrote:
I am a bit doubtful that Yamaha would have added the additional logic to make sure the operators got keyed on in the correct order, when this could lead to strictly incorrect performance, while the only penalty for not adding this logic would be modulation being wrong for a single sample.

So I think they added the extra logic to have a longer BUSY but I do not have a PCB setup to verify it at the moment. The alternative to update the keyon within a 32 bit count and still get the operator order right doesn't sound easy either. And I insist, I took literally hundreds of measurements and it is not possible that all of them just by mere chance got the same operator order for all channels... is it? I do not think my testbench was so deterministic as the PC was driving it and relied on an UART... so there are plenty of chances to get information delayed by several clock cycles even when running the same test bench.

You get clicks and whatnot on note ons, they didn't try to do anything about it whatsoever (easiest thing would have been allowing you to select whether the phase generator is reset by note on or not) and having a wrong sample or two at a note on wasn't gonna be a problem because it is gonna be masked by the click or even be the source of it. I don't see any reason they tried to handle things specially here.

jotego wrote:The chip could have a different logic to generate the BUSY for the keyon

I looked again, and as far as I can tell this is not the case. The BUSY timer's only inputs are chip reset and a signal indicating a data address was written, and it has one output. This output goes to a unit which puts values on the output bus. The connections to this unit are: the two timer overflow signals, the BUSY signal, test register 21:6 (which disables this unit and enables another unit to put test data on the output bus), a signal to the actual /IRQ pin, a Bus Write timing control signal, and bits 7, 0, and 1 of the output bus. There's no other "mystery control signal" from the keyon logic.

Very interesting, thanks a lot! So it is very unlikely that they could manage to have a specific sort order for operator keyon. I will try to verify it on the actual part, using the test output. I have a question about parallel data:

Sauraen wrote:
Turn on read data mode (21:D6 = 1).
Select MSB or LSB of read data (21:D7 = 0 for MSB, 1 for LSB--this may be new information).
Select the signal that appears on bit 14 of the output (1:0, and it does seem to make a difference, but I still don't know what the two signals are).
Select whether to read operator or channel outputs (2C:D4 = 0 for operators, 1 for channels).
Set up to read data from the OPN2 (as if checking whether it's busy).
Every OPN2 internal clock cycle, capture a new byte from the bus, for 24 cycles.
Print the 24 bytes from the synth to a terminal.

I do not understand what the interaction with the clock is. If I set to read LSB data and then read for 24 cycles, I am getting the LSB of all the 24 operators... but I am losing the MSB! If then I set to read MSB the operators are already at a different value (different clock) so I cannot get to read both the MSB and the LSB of an operator at a specific clock cycle. Is that correct? It seems odd that MSB+LSB cannot be read.

Then about the test pin, when it is set to input, (2C:D7 = 1), I wonder if that input is actually a clock. Maybe when the PG and EG clocks are stopped using test modes (21:D5=1, stops EG clock and 21:D3, stops PG clock) the clock is actually handed to the test pin.

However, note that even stopping the PG clock is not enough to read MSB+LSB of the same operator. Because if we have to read MSB for 24 cycles and then set to read LSB for another 24 cycles, even if PG and EG are stopped, the internal modulation shift registers and signals will make operator values shift away. So there must be another bit that stops the modulation control signals (the ones you called op_algorithm_ctl, voice_fb and op_fb_enable). If the modulation, PG and EG are stopped then we can alternate between MSB and LSB read and get all the information. Then we can advance one clock cycle using the test pin or one of the other unknown test pins:

Sauraen wrote:
21:D1 Goes to unknown control unit (large block above PG).
21:D0 Goes to parallel data output unit. Function unknown—maybe selects which channel to read output of?
2C:6 - Goes to the same unit as the TEST pin's input wire, which is a small control unit at the upper right of the EG. I will try to figure out whether this unit actually has to do with the EG or not. (It's also worth mentioning that it looks like BUSY is permanently wired to the TEST pin output--you can't switch it to output some other signal on this pin.)

jotego wrote:I do not understand what the interaction with the clock is. If I set to read LSB data and then read for 24 cycles, I am getting the LSB of all the 24 operators... but I am losing the MSB! If then I set to read MSB the operators are already at a different value (different clock) so I cannot get to read both the MSB and the LSB of an operator at a specific clock cycle. Is that correct? It seems odd that MSB+LSB cannot be read.

You are correct, MSB+LSB cannot be read. The info you quoted from me about the unknown test bits is obsolete. The most up-to-date information is as follows:

Test Bit Functions
$21:0: Select which of two unknown signals is read as bit 14 of the test read output.
$21:1: Some LFO control, unknown function.
$21:2: Timers increment once every internal clock rather than once every sample. (Untested by me)
$21:3: Freezes PG. Presumably disables writebacks to the phase register.
$21:4: Ugly bit. Inverts MSB of operators.
$21:5: Freezes EG. Presumably disables writebacks to the envelope counter register. Unknown whether this affects the other EG state bits.
$21:6: Enable reading test data from OPN2 rather than status flags.
$21:7: Select LSB (1) or MSB (0) of read test data. (Yes, it's backwards.)
$2C:2 downto 0: Ignored by OPN2, confirmed by die shot.
$2C:3: Bit 0 of Channel 6 DAC value
$2C:4: Read 9-bit channel output (1) instead of 14-bit operator output (0)
$2C:5: Play DAC output over all channels (possibly except for Channel 5--in my testing the DAC is the only thing you hear and it's much louder, you do not get any output from Channel 5; but someone else supposedly found that the pan flags for Channel 5 don't affect the panning of this sound, which is only possible if it's not being output during that time slot for some reason. I don't have any other reason to believe this is true though).
$2C:6: Select function of TEST pin input--both unknown functions.
$2C:7: Set the TEST pin to be an output (1) instead of input (0).
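For anyone scripting these tests, the bit list above can be written down as masks. This is only a convenience sketch in Python; the constant names and the helper `read_setup_values` are invented here, and the bit meanings are exactly as uncertain as the list says they are.

```python
# Test register $21 (bit positions from the list above)
T21_BIT14_SELECT = 1 << 0    # which unknown signal appears on bit 14
T21_LFO_UNKNOWN = 1 << 1     # some LFO control, function unknown
T21_FAST_TIMERS = 1 << 2     # timers tick every internal clock (untested)
T21_FREEZE_PG = 1 << 3
T21_INVERT_OP_MSB = 1 << 4   # the "ugly bit"
T21_FREEZE_EG = 1 << 5
T21_READ_TEST_DATA = 1 << 6
T21_READ_LSB = 1 << 7        # 1 = LSB, 0 = MSB

# Test register $2C (bits 2..0 are ignored by the chip)
T2C_DAC_BIT0 = 1 << 3
T2C_READ_CHANNEL = 1 << 4    # 1 = 9-bit channel, 0 = 14-bit operator
T2C_DAC_ALL_CHANNELS = 1 << 5
T2C_TEST_IN_FUNCTION = 1 << 6
T2C_TEST_PIN_OUTPUT = 1 << 7


def read_setup_values(lsb, channel_output):
    """Return ($21, $2C) values for a test-data read with TEST as an output."""
    reg21 = T21_READ_TEST_DATA | (T21_READ_LSB if lsb else 0)
    reg2c = T2C_TEST_PIN_OUTPUT | (T2C_READ_CHANNEL if channel_output else 0)
    return reg21, reg2c


msb_setup = read_setup_values(lsb=False, channel_output=False)
lsb_setup = read_setup_values(lsb=True, channel_output=False)
print([hex(v) for v in msb_setup], [hex(v) for v in lsb_setup])
```

Switching between MSB and LSB only changes bit 7 of $21, which matches the point made elsewhere in the thread that you have to rewrite $21 between the two reads.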

When the TEST pin is configured as an output ($2C:7 is 1), it outputs the SYNC signal (NOT BUSY as stated in old posts of mine). This signal goes high for one internal cycle every sample (24 cycles). Its falling edge occurs just after an internal clock pulse, so wait half an internal cycle before sampling the OPN2's output. Then, four cycles after that, the data on the output will be for Ch 1 Op 1. (Presumably, the SYNC signal is when it begins calculation of Ch 1 Op 1, and the output is available after four cycles.) Sample on each internal cycle to receive the data for channels 1-6 operator 1, then 1-6 op 3, then 1-6 op 2, then 1-6 op 4. I implemented this in my synth MIDIbox Quad Genesis to read back operator states (just the MSB) at about 500 Hz and create a VU meter display on the front panel; the code for the routine is in this file.
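The capture order described above can be turned into a lookup. This is a minimal Python sketch, assuming capture index 0 is the byte sampled four cycles after SYNC (after the half-cycle wait), i.e. the Ch 1 Op 1 slot. The names `slot_to_source` and `label_capture` are hypothetical and are not taken from the MIDIbox Quad Genesis code.

```python
OPERATOR_ORDER = (1, 3, 2, 4)  # order in which operators appear on the bus
CHANNELS = 6
SLOTS = 24                     # internal cycles per sample


def slot_to_source(capture_index):
    """Map a capture index to (channel, operator), both 1-based."""
    slot = capture_index % SLOTS
    channel = slot % CHANNELS + 1
    operator = OPERATOR_ORDER[slot // CHANNELS]
    return channel, operator


def label_capture(samples):
    """Attach (channel, operator) labels to the first 24 captured bytes."""
    return {slot_to_source(i): value for i, value in enumerate(samples[:SLOTS])}


labelled = label_capture(list(range(SLOTS)))
print(slot_to_source(0), slot_to_source(6), slot_to_source(23))
```

With the test data above, the value captured at index 6 ends up under key `(1, 3)`, that is Ch 1 Op 3.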

When the TEST pin is configured as an input, I have no idea what it does.

So to answer your question @jotego, there is no bit which stops calculation of the operators. When I transcribed the operator unit I found no such functionality. The operator pipeline does calculation as part of the pipeline, and there's no way to skip it. Unlike the PG and EG which can essentially pop out a value, modify it, and put it back in the circular shift registers, in the OP the shift registers are throughout the pipeline. There's also no "mystery" test bit (input) which is unaccounted for which could do this, and yes I'm sure I saw them all.

As I don't know what the TEST pin input does, I can't rule out that it would behave like you're saying; but there's two things which lead me to believe it's not. First, the two functions the test bit has don't seem to be linked to logic related to the PG and EG writeback disable signals, so I think the TEST pin input is active regardless of what kind of test mode the chip is in. Second, you need the TEST pin configured as an output in order to synchronize your data capture to it; and while you might be able to synchronize and then not need to look at the TEST pin anymore, to change it back to an input would require writing to $2C, which might interrupt your data collection. I guess you could keep careful timing throughout the process and still get good results--after all you have to write to $21 anyway to switch between LSB and MSB.

Very informative. I have updated the wiki on the JT12 repository with the information. I am setting up a board to take direct measurements so expect news in the coming weeks about the latest topics we have discussed.

Some more news about YM2612 DAC "ladder" effect: a guy named Nuked who analyzed YM3438 die shots recently wrote a cycle-accurate implementation in C and figured the effect was actually caused by two things:

1 ) the fact that the resistor array would not exactly reach Vcc/2 on either negative or positive side (already explained by Sauraen in a previous post) which causes a gap 2x larger than expected between 0 and -1 steps: 0 and -1 actually corresponds to same amount of resistance in DAC array i.e they are like opposed values symmetrical on each side of Vcc/2
=> a way to emulate this is to add a +1 offset on positive or zero channel output (so 0 corresponds to +1 and so on).

2 ) the fact that the sign bit also has an impact when the DAC output is silenced, which can happen on one of the stereo analog output when muted but also between channels output as seen in early Nemesis captures (channel output is active for one chip internal cycle then kept "low" for 3 internal cycles) : when "silence" is needed, the output is actually forced to either +1 or -1 (max resistance output of DAC array)
=> a way to emulate this is to add a +3/-3 offset (+4/-4 when channel is muted) on channel output depending on sign bit
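A minimal Python sketch of the two emulation hints above, kept as two separate functions because the text does not say how they combine. The function names are invented, treating 0 as "positive" follows point 1, and what happens to the sample value itself while muted is deliberately left out since it is not specified here. This is not Nuked's code.

```python
def ladder_simple(sample):
    """Point 1: shift zero and positive channel outputs up by one step."""
    return sample + 1 if sample >= 0 else sample


def ladder_offset(sample, muted=False):
    """Point 2: offset of 3 steps (4 when muted), signed by the sign bit."""
    magnitude = 4 if muted else 3
    return magnitude if sample >= 0 else -magnitude


# The step between -1 and 0 becomes twice as large as any other step.
gap_at_zero = ladder_simple(0) - ladder_simple(-1)
gap_elsewhere = ladder_simple(5) - ladder_simple(4)
print(gap_at_zero, gap_elsewhere)
```

A caller would add `ladder_offset(sample, muted)` to whatever it outputs for that channel slot.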

Also, further discontinuities exist, especially for test bit $2C:3=0, when DAC register bits 0 to 3 are 0 and the register value is > 128. I tested multiple MD1s, and all of them had this odd behaviour. I did not spot this on MD2; too bad that the seemingly better DAC performance is crippled by a poor amplifier circuit.

Regarding $2C:5: I noticed that on a MD1 I tested this resulted in a roughly 30-fold increase of DAC levels when all channels are on. This causes distortions as the amplifier does not expect to see anything louder than all 6 channels played at once at maximum levels (which is unlikely since phases would need to match as well so there is a bit of head room left for PSG). Maybe this does not only output DAC on multiple channels but also output DAC for much longer than it does normally.

My understanding with the DAC "loud" test bit is that normally, channel values are output in an "impulse-like" fashion, for a fraction of their time slot (I think it should be 1/4, since there are 24 clock cycles per complete computation cycle and there are 6 channel values to output during that time, so output for 1 clock cycle would be 1/4 of a slot). Whereas, with the test bit set, the output is just constantly the DAC value. So it would be 24x as loud.
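The arithmetic behind the 1/4 and 24x figures, written out as a small Python check. It only restates the assumptions in the paragraph above (24 cycles per sample, 6 channel slots, one active cycle per slot); the variable names are illustrative.

```python
from fractions import Fraction

CYCLES_PER_SAMPLE = 24
CHANNELS = 6

slot_cycles = CYCLES_PER_SAMPLE // CHANNELS     # cycles per channel slot
slot_fraction = Fraction(1, slot_cycles)        # active part of a slot
normal_duty = Fraction(1, CYCLES_PER_SAMPLE)    # one channel, 1 cycle per sample
test_duty = Fraction(1)                         # DAC value held all the time

gain = test_duty / normal_duty
print(slot_cycles, slot_fraction, gain)
```

A factor of 24 is in the same range as the roughly 30-fold increase measured on an MD1.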

I am quite confident from viewing the die shots, and it seems to be sufficiently confirmed from testing, that whenever you write a data value, BUSY gets set, and gets cleared exactly 32 cycles later. On the die shot it's just a 5-stage counter, not connected to anything else.

Of course the chip actually "consumes" the written data value at some point within the first 24 (or possibly a couple more) cycles of it being written, when its value comes around in whatever circular shift register. (They're all 24 stages or less.)

Kabuto's Titan 2 Overdrive demo does not work well with a 24-stage circular shift register. His experimentation tells him that 12 FM cycles are enough to get a good write. He also reported that Sik, the author of echo, found that it was possible to get good writes while waiting less than 24 FM cycles.

After thinking about it, I do not think that there is any other thing but a 24-stage circular shift register. But, maybe it was built out of the original 12-stage circular shift register from YM2203. We know, thanks to Sauraen, that the operator unit is pretty much a copy-paste exercise (with a difference in the number of stages) from that of YM2203.

So I wonder if the original YM2203 structure of an input multiplexer, fed by the external data input on one side and the output of the shift register, followed by a 12-stage shift register was actually used twice in series. Thus having a total of 24 stages, where feedback goes from stage 24 to stage 1 but there are two multiplexers, one in front of stage 1 and another one in front of stage 13. This means that you can update operators 0 and 2 using the first multiplexer and operators 1 and 3 using the second one, within a 12-FM clock cycle period.

This double-mux structure is compatible with both the structural need for 24 stages and the observation from the demo scene that 12 FM clock cycles should work. It also makes sense historically and would have saved the YM2612 designers some layout work.
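To illustrate why a second multiplexer would halve the wait, here is a minimal Python sketch of a 24-stage circular shift register with configurable write points. It models only the timing, assuming an entry moves one stage per FM cycle and can be overwritten only when it sits at a write point. The function names and the stage numbering (0 and 12 for the two muxes) are choices made for the sketch.

```python
STAGES = 24


def cycles_until_writable(position, write_points):
    """Cycles until the entry at `position` reaches the nearest write point."""
    return min((p - position) % STAGES for p in write_points)


def worst_case_wait(write_points):
    """Longest wait over all possible positions of the target entry."""
    return max(cycles_until_writable(s, write_points) for s in range(STAGES))


single_mux = worst_case_wait([0])       # one update point
double_mux = worst_case_wait([0, 12])   # update points half a ring apart
print(single_mux, double_mux)
```

With one write point the worst-case wait is 23 cycles, and with two it drops to 11, so a write always lands within 12 FM cycles under this hypothesis.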

I wonder if Sauraen would still care to have a look at the die shot to confirm this, please.

The pipeline for YM2612 is indeed 12 stages long: 6 of these are the same as YM2203, and then YM2612 has an additional 6-layer shift register stuck in the middle. Unlike PG and EG which have an important state variable to store (and so have logic to compute its new value from the old one and then enough shift registers to delay it for the remainder of 24 cycles), OP doesn't *normally* (leaving aside feedback for the moment) use the old value of an operator to compute the new one. What it does have is three additional buffers for modulation that store the last value of op 0, the last value of op 1, and the last-last value of op 0. (These are only 6 layers deep, because there are 6 voices.) At any given point in the pipeline, if you're looking at the data there, it's for (say) op 0; six cycles later it will be the same voice op 2; six cycles later the same voice op 1; and finally six cycles later the same voice op 3. In the intermediate six cycles, the other six voices are being computed.

Please take a look at my VHDL model of the operator unit. I do not claim that every single logical element in this is correct, but I am confident that the overall structure of this in terms of what registers are where and how the pipeline works is correct. https://github.com/sauraen/YM2612/blob/ ... erator.vhd

The operator parameters (for all units: EG, PG, OP) are stored in 24-deep circular shift registers whose outputs are basically just the control signals for all the units in the chip. There's no way to write to the middle of these--they can only be written to when that particular operator's data comes around. [EDIT: this is wrong] So, there is control logic (which I haven't analyzed in detail) which, based on which voice and operator you're writing to, waits to write your new data to these until the proper cycle. If you write one address and data at a given time and then 12 cycles later write another address and data, you might get lucky and both get captured, or the first one might get lost if it didn't come around in time. I can't see how it would be possible to make a sound engine that wrote new data every 12 clocks, consistently, without problems.

Last edited by Sauraen on Thu Nov 22, 2018 5:48 pm, edited 1 time in total.

Reminds me, the other day I got reports that it's possible for a write to register $28 to go missing and you need to do key-on twice as a workaround. Now I wonder if that particular register does indeed need double the delay, seeing as it messes with the ADSR mechanism (・～・)

Thanks for your detailed account. I totally agree with it and JT12 implementation follows those lines as you know. However, in view of this new evidence from Kabuto, I remembered having seen some actual wait times in some Yamaha document, so I dug for it. It is the application notes for YM3438, a CMOS version of YM2612 -which is NMOS logic. It is in Japanese so you may have to take my word for it. On page 10 it shows these wait times:

There is a delay through the input/output latches equivalent to 2 master clock cycles. This is fixed. It might be because these are actually flip flops clocked with the master clock or just an equivalent delay through an asynchronous latch. Once that delay is taken away we are left with very interesting numbers: 12 FM clock cycles for global (0x2? registers) and operator registers, and 6 FM clock cycles for channel-wide registers. Remember that one FM clock cycle is six master clock cycles because of the internal divider. It takes 24 FM ticks for the operator pipeline to process everything, and data must be poured into the pipeline one operator at a time, as Sauraen also explained.

My implementation -so far as v0.61 of JT12- was assuming that a 24-stage CSR (circular shift register) only had one update point, so it would take 24 FM ticks to get new data in. That also made sense in view of the length of the BUSY counter, which Sauraen had reported to count for 32 FM ticks. However, the evidence from Kabuto and from the YM3438 document tells us that the CSR can be updated in just 12 FM ticks. Thinking of different ways to accomplish this on silicon, I think that an update point just in the middle of it is economical and makes sense if they were reusing layout work from YM2203, as I explained before. Now, if someone looks at the die shots (YM3438 or YM2612) and finds that the outputs from each flip flop in the CSR just go straight to the next FF's input without any mux anywhere, then we will have to consider other implementation options.

That makes sense. Keyon register $28 is an operator-level register because the key-on signal has to go through the pipeline. In my implementation I use another CSR to hold the key-on values and feed them to the pipeline. Original implementation must be like this too. I think I should change this CSR to have an update point in the middle too so it can get updated in 12 FM ticks rather than 24.

More importantly, PCM data gets taken into account in the accumulator and the accumulator will exactly add each operator once every 24 FM ticks. So even if YM3438 says that 12 FM ticks is enough for $2? registers, I don't think you can get PCM data out of the chip at the suggested time of 13.8 FM ticks because the accumulator is only processing data at 24 FM ticks. You can write the data in the PCM register, and it will get in, but you will not get it out to the DAC.

In summary, in the $2? space we have some registers that take rather immediate effect -like the frequency of the LFOSC- and others that go through the pipeline and have either a 12 or 24 FM tick time frame to become effective.

PCM data straight up can't be faster than ～26KHz (i.e. half the sample rate at which FM updates), writing faster than that outright results in missed samples. It may be slower than you think.

In fact it'd be probably a good idea seeing how fast each of the $2x registers is. They by-pass the shifter (as they're global) which at first may make it look like they may work faster, but after some issues I've had some time ago and the key-on thing (not to mention the well known slowness of the PCM output) I'm starting to think they may be slower in general (probably depends on each particular register, so that's gonna need a lot more of testing).
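As a sanity check on the ~26 kHz PCM figure, here is the clock arithmetic in Python. The divide-by-6 and the 24 FM ticks per sample come from this thread; the master clock value is an assumption (the usual NTSC Mega Drive figure) and is not stated anywhere above.

```python
MASTER_CLOCK_HZ = 7_670_453  # assumption: NTSC Mega Drive clock fed to the YM2612
MASTER_PER_FM_TICK = 6       # internal divider
FM_TICKS_PER_SAMPLE = 24     # one pass over all operators

fm_clock_hz = MASTER_CLOCK_HZ / MASTER_PER_FM_TICK
sample_rate_hz = fm_clock_hz / FM_TICKS_PER_SAMPLE
pcm_limit_hz = sample_rate_hz / 2  # "half the sample rate at which FM updates"

print(round(sample_rate_hz), round(pcm_limit_hz))
```

With that clock the FM sample rate comes out a little above 53 kHz and half of it a little above 26 kHz, in line with the limit quoted above.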