peterxtian wrote:Footswitch: definitely not 100%. It takes a few presses for the MIDI receive light to start registering. Then it's fairly consistent, and then it will go dark for a stretch, then back again. I'd say about 70% of presses cause it to light up.

I hope I did a good job of describing that!

Yes, you did. You're easy to help because you're (refreshingly) diligent and articulate.

Man, this is a tricky one -- elusive and subtle!! I think we're in the ballpark, but finding the exact row and seating may be a challenge (Who's got the ticket stub? Ha.) Part of the problem is that it appears to be intermittent. The other problem is that it's causing a wide array of symptoms. And one thing in particular doesn't seem to add up: MIDI out doesn't seem to work AT ALL.

Those weren't the exact results I was expecting, but the footswitch is very interesting.

Putting on my thinking cap and will edit this post when I have honed the theory more.

EDIT

I think we need to test mostly by booting without the OS disk because then we don't have to worry about the OS potentially being corrupt due to floppy loading problems.

What do we know?

Since the system seems to be generally stable, this seems to indicate the core of the S900 (CPU/RAM/DMA) is working properly -- the very fact it passes the RAM test points in this direction, as well.

The problem seems to extend to multiple points, but it is also limited to certain sub-systems (floppy, MIDI, serial, footswitch, etc.). The common thread for a multi-point -- yet not system-critical -- failure leans toward features using data bus B.

I also believe our problem is limited to data bus B because if data bus A had corrupt bits, etc. then we should see wide-spread panic in the S900 and it should barely be functional, if at all.

Assuming the footswitch is physically reliable (works dependably with other equipment), if we're seeing flaky results -- as you've observed -- then that could be because something is occasionally happening to (at least) the lowest 4 bits of data bus B. This would also explain MIDI note corruption because the note to play is encoded in the lowest 7 bits. In other words, the corruption on data bus B cannot be limited to only the highest 4 bits -- something has to be affecting bits 0~3, at minimum.
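To make the bit reasoning concrete, here is a minimal sketch of what a single bad bit in the low nibble does to a MIDI note number (which runs from 0 to 127). It assumes the simplest possible fault model, one bit flipped on the way through; the real fault could just as well be a stuck or floating line, and the helper names are made up for illustration.

```python
LOW_NIBBLE = 0x0F  # bits 0~3 of an 8-bit bus value


def flip_bits(value: int, mask: int) -> int:
    """Return the byte as it would arrive if the bits in `mask` were inverted."""
    return (value ^ mask) & 0xFF


def touches_low_nibble(sent: int, received: int) -> bool:
    """True if any of bits 0~3 differ between what was sent and what arrived."""
    return bool((sent ^ received) & LOW_NIBBLE)


if __name__ == "__main__":
    middle_c = 60  # note number used purely as an example
    for bit in range(4):
        received = flip_bits(middle_c, 1 << bit)
        print(f"bit {bit} flipped: note {middle_c} arrives as {received}")
```

A flipped bit 0 turns note 60 into 61, while a flipped bit 3 turns it into 52, so even a single low bit going bad occasionally would be heard as wrong notes.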

If both MIDI and footswitch data is being corrupted then the common point of failure should be the bus transceiver. However, if there is a problem with the bus transceiver then we MUST also see problems with the floppy (amongst other things), even if they are only intermittent.

Here's the key idea:

If we test all functions by booting without the floppy then we should have a stable system except for any sub-system using data bus B (again: floppy, MIDI, serial, footswitch, etc.)

To test the floppy sub-system when booting without a floppy we can first sample in something clean sounding. Then we save to a floppy and reload it. By repeatedly loading and saving the same (intentionally unaltered) samples back to the same disk, we should see (or hear?) data corruption. Conduct this test maybe ten times or so?

If you have access to a desktop PC with an internal floppy drive (Pentium 4 Windows XP desktop, perhaps?) then we can run Omniflop and test this theory 100% by saving the disk once, making an image (our control disk), and then running the above test repeatedly on the same disk. Image the disk again (our experimental disk) and then compare our control to our experimental disk. If they are different then we're getting floppy corruption. We might even be able to pin it down to a particular bit.
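The control vs. experimental comparison could be done with a short script instead of (or alongside) a hex editor. This is only a sketch: it assumes both images have already been saved as ordinary files by the imaging tool, the function names are hypothetical, and the two file paths are whatever you pass on the command line. Besides listing where the images differ, it tallies which bit positions changed, which is what would let us pin the fault to a particular data line.

```python
import sys
from collections import Counter


def diff_images(control: bytes, experimental: bytes):
    """Return (offsets of differing bytes, Counter of changed bit positions)."""
    offsets = []
    bit_counts = Counter()
    for offset, (a, b) in enumerate(zip(control, experimental)):
        changed = a ^ b  # set bits mark the positions that differ
        if changed:
            offsets.append(offset)
            for bit in range(8):
                if changed & (1 << bit):
                    bit_counts[bit] += 1
    return offsets, bit_counts


def compare_files(control_path: str, experimental_path: str) -> None:
    with open(control_path, "rb") as f:
        control = f.read()
    with open(experimental_path, "rb") as f:
        experimental = f.read()
    if len(control) != len(experimental):
        print(f"Sizes differ: {len(control)} vs {len(experimental)} bytes")
    offsets, bit_counts = diff_images(control, experimental)
    if not offsets:
        print("No differing bytes in the compared range")
        return
    print(f"{len(offsets)} differing bytes, first at offset {offsets[0]:#x}")
    for bit in sorted(bit_counts):
        print(f"  bit {bit}: changed {bit_counts[bit]} times")


if __name__ == "__main__" and len(sys.argv) == 3:
    compare_files(sys.argv[1], sys.argv[2])
```

If the tally came back lopsided (say, nearly all changes on one bit) that would point at a single data line rather than random noise.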

If we test all functions after booting with an OS floppy then we could see lots more problems as the OS could be partially corrupt. All bets are off on this one. One boot could be perfect, the next could be wrong, the next one after could be wrong in a different way.

So, I tried the sample save/load test with a clean sample. I repeated it at least a dozen times (saving to disk, loading from disk) etc, and there never seemed to be any change or corruption in the sample. Sample was 1 second long. Not sure what the implication is there.

In other news, I'll be in the possession of an XP-running PC with a floppy drive shortly. That'll mean I can also test the serial more definitively.

Before we get there, I'm not totally clear on what the Omniflop test entails. What disk exactly are we burning? Is it OS4? Is the s900 capable of loading and saving an OS to disk on its own? Or is it just any disk image, like a few programs and samples?

I follow the control vs. experimental disk concept – comparing the unadulterated Omniflop version vs. the one put through the S900 wringer.

peterxtian wrote:So, I tried the sample save/load test with a clean sample. I repeated it at least a dozen times (saving to disk, loading from disk) etc, and there never seemed to be any change or corruption in the sample. Sample was 1 second long. Not sure what the implication is there.

Before we get there, I'm not totally clear on what the Omniflop test entails. What disk exactly are we burning? Is it OS4? Is the s900 capable of loading and saving an OS to disk on its own? Or is it just any disk image, like a few programs and samples?

If there's no corruption then traffic to/from the floppy drive is not a problem. If floppy drive traffic is not a problem then our main problem cannot be data bus B because the floppy drive lies on data bus B.

The floppy test would be exactly the test that you did, except that it wouldn't be strictly a listening test. We'd image the first round, keep doing the test a dozen+ times, then image the final result. If the first round and final round disk images were 1:1 (as compared by a hex editor, etc.) then we can definitely prove that no corruption is occurring, rather than just trying to listen to sample distortion. So it doesn't involve the OS, just the samples**.

It should be noted that listening for sample distortion may not be effective -- especially if the bits aren't always being lost and if they are the least significant bits. As evidenced by people with multiple chips of bad RAM in their S900, you can have data corruption and not necessarily hear it.

The longer the sample you can record, though, the better. We have a better chance of catching data corruption if we push around as much data as we can, as many times as we can. So I would record a big long chunk at the lowest sample rate.

**Incidentally, since we'd be booting from the built-in OS 1.2, the samples will not be data compressed (as opposed to the option in OS4), so we could easily load our saved sample disk into a wave editor and also save it back, should it be advantageous to do that experimentally -- totally pure silence, some special test wave, etc.

Hey, finally got Windows XP on an older PC up and running. (Had to convert from Ubuntu which was its own PITA). Omniflop and all the floppy business works fine.

Did the floppy test again. Recorded a sine tone, for 7 seconds long at 3000Hz (around the maximum before they start glitching)... Saved that onto Floppy Disk A and ejected. Repeated the test you described: save sample to disk, load sample to Akai, back etc etc, did this 12 times on Floppy Disk B.

Read these into Omniflop, saved them out as .AKAIs, loaded them into a hex editor, ran a binary difference aaaannd..... "Documents are identical"

So, not sure what this means for Data Bus B.

Hope that all narrows it down. Let me know if there are any more tests I should do.

peterxtian wrote:Did the floppy test again. Recorded a sine tone, for 7 seconds long at 3000Hz (around the maximum before they start glitching)

What do you mean "before they start glitching"?

I think we have to make some definitive observations about things that NEVER fail, and things that fail eventually or always.

What I mean is--booting without the floppy--what can you do over-and-over and never have anything particular happen? The display never becomes corrupt (aside from the RAM test mode thing)? And the front panel buttons always seem to work, regardless of anything else?

If there's some type of sample failure which frequently happens when recorded past a certain length and never happens when recorded under that length then that could be highly useful information.

Try to determine the maximum length of time before sample corruption that occurs at the highest sample rate, the lowest sample rate, and then the rate halfway between. If the time before corruption is occurring at predictable places as the sample rate increases/decreases then that could tell us something is going wrong with addressing.
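One way to read those measurements: multiply each sample rate by the longest clean recording time at that rate. If the product (the number of samples stored before things go wrong) comes out roughly the same at every rate, the failure is tied to a fixed position in memory rather than to elapsed time. A rough sketch, where the function names, the 10% tolerance, and the example figures are all placeholders and not real S900 numbers:

```python
def samples_before_corruption(rate_hz: float, clean_seconds: float) -> int:
    """Number of samples recorded in the longest clean take at this rate."""
    return round(rate_hz * clean_seconds)


def looks_like_fixed_address(measurements, tolerance: float = 0.10):
    """measurements: list of (rate_hz, longest_clean_seconds) pairs.

    Returns (sample counts, True if the counts agree within `tolerance`
    of their mean, which would suggest a fixed memory boundary).
    """
    counts = [samples_before_corruption(r, s) for r, s in measurements]
    mean = sum(counts) / len(counts)
    spread = (max(counts) - min(counts)) / mean
    return counts, spread <= tolerance


if __name__ == "__main__":
    # Placeholder readings: lowest rate, halfway rate, highest rate.
    readings = [(7500, 4.0), (15000, 2.0), (30000, 1.0)]
    counts, fixed = looks_like_fixed_address(readings)
    print(counts, "-> fixed boundary" if fixed else "-> no fixed boundary")
```

If the clean time stayed about the same regardless of rate, the counts would spread widely and addressing would look less likely as the culprit.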

I can't think of how the RAM test is passing if the controller can't reliably address a certain segment of memory, etc., but it could still be happening. This S900 seems to be a case where the facts aren't adding up, so something is being missed or a false assumption is being made. But where?

Yeah, the sample corruption was a later discovery. I guess I didn't make that clear, but yes, it always happens when the memory is filled up to a certain degree. I can test it more definitively, but I found 7 seconds at 3000Hz didn't corrupt samples, but 10 seconds at 3000Hz did. This is without loading OS4.

Other stray observations: the disk functions work in OS4 when no memory is used up, but pressing disk causes it to freeze when some memory has been used. I haven't noticed any display problems in 1.2.

Due to the speed with which the memory test gives its result, I don't think the RAM test is very comprehensive. However, I don't know exactly what the RAM test actually does procedurally, so I don't know how error prone it may be. Definitely sounds like something memory related though.

My current theory among ever-changing theories is that the CPU tests RAM directly during diagnostics, but that there's something wrong with the DMA controller so that when the S900 is actually in use, there's some segment of RAM that goes missing or bad.

No idea how to test that theory at present though.

EDIT:

As a bit of a Hail Mary idea... IC3, IC4, and IC5 are all NEC 71071C chips. Fortunately, they are off-the-shelf parts and not entirely impossible to find even in 2018. Also, they should be in sockets, so that means we can play the "shell game".

Using masking tape or a silver Sharpie or some other labeling idea, mark each chip with its original location; write 3 on the one in the IC3 position, or whatever makes sense to you. Now remove all three and rotate them to new positions -- 3 goes to 4, 4 goes to 5, 5 goes to 3. Test the S900 and see if its behavior is different, such as the length of time that can be sampled before freaking out, or if OS4 behaves differently now, etc.

Repeat test with 3 in 5, 4 in 3, and 5 in 4. Test the S900 for anomalies. There are other permutations for chip position, but hopefully you get the idea and intent.
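For keeping track of the shell game, the full set of arrangements can be listed out so none gets tested twice or skipped. A small sketch; the labels simply follow the "write 3 on the chip from IC3" scheme above, and the function name is made up:

```python
from itertools import permutations

SOCKETS = ("IC3", "IC4", "IC5")
CHIPS = ("3", "4", "5")  # label on each chip = the socket it started in


def arrangements():
    """Every socket -> chip layout except the original one."""
    layouts = []
    for order in permutations(CHIPS):
        if order != CHIPS:  # skip the untouched starting layout
            layouts.append(dict(zip(SOCKETS, order)))
    return layouts


if __name__ == "__main__":
    for number, layout in enumerate(arrangements(), start=1):
        placed = ", ".join(f"chip {chip} in {sock}" for sock, chip in layout.items())
        print(f"Test {number}: {placed}")
```

That gives five layouts besides the original: the two rotations described above plus three simple pair swaps.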

peterxtian wrote:On mine, IC3, 4 and 5 are all soldered in unfortunately :/. That's a lot of pins. If need be I can try and borrow a pro desoldering gun.

Yeah, that's rough.

Unfortunately, we've reached murky waters with this, and I'm having trouble coming up with other diagnostics that don't involve chip-swapping or low-level testing. Plus, I'm kind of at my limit of knowledge with the S900 theory of operation in this particular case. In other words, if it's a DMA problem then I'm not totally sure what we should expect to be seeing where and when.

On the upside... I have full schematics for every board, and also almost every IC is available without having to cannibalize another S900. So there's always hope, it just depends on how far you're willing to take it.