Change History
(52)

Can you please add the output of "listusb -v /dev/bus/usb/0/0" here. The device stalls the request to set the alternate interface, which is possibly ok if there aren't any alternates. I don't know the USB midi protocol, it might also be required to set the alternate interface to enable device functions (like in other USB protocols).

Update: I switched to the Midisport 2x2 that is normally attached to my Ubuntu machine, and find that this does not stall on the set-alt_interface call, and the '/dev/midi/usb' entry gets created.
(Both this and the Uno work equally well under Ubuntu.)

PatchBay, however, still did not see any ports when I plugged the Midisport in. Then I finally read the midi_server docs, and saw that the server only looks for /dev/midi devices at boot-up!!(!) I guess someone wasn't thinking of USB and hotplugging when that decision was made... This needs fixing at some point, otherwise users will be very confused (like me!).

When the device was connected before booting, PatchBay finally saw the ports, but unfortunately there is still no actual MIDI input available when I attach my keyboard. Looking at syslog, I see a continual loop:

I've reworked DeviceWatcher in hrev33118 to now dynamically detect device , not only at start up. Removing a device is not yet implemented, though, but I plan to do it soon.
Meanwhile, could you please check if now your USB adapter is detected:

dynamically, when connected while midi_server run

during midi_server startup (already connected at boot time, or when killing/restarting midi_server), as before.

As I don't plan to buy another USB midi adapter soon, if ever, and until I enhanced our usb_midi driver to support my Roland UM2 too, I can't test these changes with actual hardware myself. I've done my tests by faking /dev/midi/* entries, which is not a complete one.

I've reworked DeviceWatcher in hrev33118 to now dynamically detect device , not only at start up. Removing a device is not yet implemented, though, but I plan to do it soon.
Meanwhile, could you please check if now your USB adapter is detected:

dynamically, when connected while midi_server run

during midi_server startup (already connected at boot time, or when killing/restarting midi_server), as before.

Unfortunately, it doesn't seem to work. (:-()

Plugging the USB in after boot-up results in two inputs and two outputs showing up in PatchBay! Neither of them actually seem to
pass MIDI events.

If the USB is plugged in before boot, I get a lockup (in both PatchBay and MusicWeaver) when I try to start the app. (Or in the case of MW when I try to load a configuration that connects to the ports.) (By 'lockup' I mean that the program freezes -- the 3-finger remedy will kill it and Haiku continues.)

I'll try to do some debugging sessions and see if I can figure something out.

I should also note something that I observed since posting the original bug. If the USB is unplugged before a shutdown/restart, I invariably get a KDL page-fault panic from 'MidiPortProducer' (presumably because the disconnection is not being "watched" either!).

Preliminary findings:
I took DeviceWatcher.cpp, stripped all the Add/RemoveDevice stuff
out and turned it into a little standalone test app (just printing
out the message contents).

If I start the test up, and plug in the USB MIDI (for the first time) I get two reports! They're identical as far as I can see.

Ahah! I didn't look quite carefully enough at the contents of the two messages (via PrintToStream). The 'name' field of one is "0", and of the other is "usb". So obviously one (the second!) is from directory creation, the other is from the device node itself. As the directory is never deleted, one obviously only gets one message from then on. [But why does one see the directory creation when B_WATCH_FILES_ONLY is set? (:-/] Interestingly the 'path' field is "/dev/midi/usb/0" in both messages.

The temporary fix is obviously to check the name field against the last item in the path, but presumably this is a bug in PathMonitor.

In hrev33202 I fix two major DeviceWatcher.cpp issues. I add an extra check to enforce that a valid path for a device should be a file, not a directory.
I don't understand how a duplicate path could now be published twice, as I check against the known path -> device's endpoints map.

This doesn't work, because '/dev/midi/usb/0' is not recognized as a file!

Damn. I wonder why, though. Maybe a devfs entry is not a regular file, and BStatable don't report them as file. Okay, will reverse the check then, calling IsDirectory() instead.

This probably avoids the dual entry anyway
(you weren't doing the check in hrev33118 which is what I was looking at).
I haven't got to check that yet.

Oh. Hope it explains the dual entry indeed.

Once I took out the 'IsFile' check, which prevented any port being found,
_AddDevice seems to work. (The extra thread has prevented the lockups.)

Nice to hear it.

However, _RemoveDevice just hangs, because the destructor for MidiPortProducer
just hangs at the wait_for_thread().

Hum, yes, and I see why: I'm stupid. See below how much.

As a result I still get to KDL when I quit midi_server.

I now have to find out why the thread isn't quitting, unless you have an immediate solution.

MidiPortProducer reader thread is blocking on read(), and I do nothing to wake him. Before the wait_for_thread(), a send_signal(fReaderThread, SIGINT) should do it.
If not, let's try the brutal kill_thread(fReaderThread) instead, but it will leak SysEx buffer eventually.

I just checked, and indeed BEntry.IsFile() only check for regular file, so return False on char/block device entries. I dunno if it's the same under BeOS, BTW.
That's what happend when one test his code with regular files he create/remove because he can't play with actual /dev/midi/* entries :-\

Anyway, the path should not be a directory. Symlinks are not supported by devfs currently, but maybe in order to support them in the future we should also tranverse them before checking the target is not a directory.

MidiPortProducer reader thread is blocking on read(), and I do nothing to wake him. Before the wait_for_thread(), a send_signal(fReaderThread, SIGINT) should do it.
If not, let's try the brutal kill_thread(fReaderThread) instead, but it will leak SysEx buffer eventually.

Or you could close the FD that the reader thread is read()ing from. That should also wake up the thread and seems like the cleanest solution.

Or you could close the FD that the reader thread is read()ing from. That should also wake up the thread and seems like the cleanest solution.

The file descriptor is open once (twice is not available, as /dev/midi/* devices will allow only one) and shared between MidiPortProducer and MidiPortConsumer. I don't like the idea that MidiPortProducer could close it behind MidiPortConsumer's back.

Anyway, on device remove, we already close the FD in DeviceWatcher::_RemoveDevice().
Seems it's not enough, thought, to wake up MidiPortProducer reader.
Dunno why, but I'm getting old and have a flu.
Yes, that's shame excuses ;-)

Or you could close the FD that the reader thread is read()ing from. That should also wake up the thread and seems like the cleanest solution.

The file descriptor is open once (twice is not available, as /dev/midi/* devices will allow only one) and shared between MidiPortProducer and MidiPortConsumer. I don't like the idea that MidiPortProducer could close it behind MidiPortConsumer's back.

Anyway, on device remove, we already close the FD in DeviceWatcher::_RemoveDevice().
Seems it's not enough, thought, to wake up MidiPortProducer reader.
Dunno why, but I'm getting old and have a flu.
Yes, that's shame excuses ;-)

[Darn -- all these comments while I'm having a lie in... (:-/)]

I too quickly realized last night that the read was hanging, and this is even though the FD is "closed" before the destructor is called. The read actually hangs on a semaphore that's not affected by the close (all in usb_midi), so I tried putting a delete_sem in the close function. Had no effect (!) so I went to bed...

I'll get back to digging into causes in a moment.

(I don't think there's any harm in closing both the Producer and Consumer at once. Events might be lost, but they're hardly important at that point!)

I just checked, and indeed BEntry.IsFile() only check for regular file, so return False on char/block device entries. I dunno if it's the same under BeOS, BTW.

Anyway, the path should not be a directory. Symlinks are not supported by devfs currently, but maybe in order to support them in the future we should also tranverse them before checking the target is not a directory.

The problem is that this is not the problem... (:-/)
The two messages that PathMonitor [buggily, I believe] sends when the device -- and directory -- are first created both contain the same 'path' field: "/dev/midi/usb/0". So what was happening before you put in the map check was that they were both invoking _AddDevice the same way and we got two entries.

What I did was to compare the Leaf of a BPath created from the path with the 'name' field of the message, which differed ("0" and "usb" respectively) in the two messages, and only called _AddDevice if they matched. This works, but is probably redundant as your map check now prevents duplicates anyway (checked).

I too quickly realized last night that the read was hanging, and this is even though the FD is "closed" before the destructor is called. The read actually hangs on a semaphore that's not affected by the close (all in usb_midi), so I tried putting a delete_sem in the close function. Had no effect (!)...

OK, I'm foxed... The usb_midi_close function in usb_midi.c is just not getting called! I have a print showing the close is being invoked on the FD that was opened, and printouts afterward showing it returned, but a (syslog) printout at the beginning of usb_midi_close never appears! What can possibly intervene? The read is hung on the semaphore, but there's no single threading here, is there?

Guys, you definitely have a bug there. Maybe the MIDI driver is blocking on the next interrupt transfer? I have not looked at the code yet, but I assume you schedule a transfer, and in the transfer done hook, you release a semaphore. In your read hook, you probably block on that semaphore, but this should be done with a timeout. That way, you can unblock the read thread from time to time and bail out if the device has been closed meanwhile. Have a look at the USB HID driver, it should pose a very clean and good example of how to do this.

Let's not open a new ticket quite yet. I think the reason the close is not working is that the node is getting removed when the device is unplugged, before the close actually gets called. I'll see if I can sort it out later today.

Sorry, folks -- I'm going to have to give this a rest for a while. See if I can clear my head after spending most of four days on it.

Obviously using a 'close' triggered by DeviceWatcher won't work, but I'd have thought I could make the waiting read quit gracefully at least. (I've even tried putting midi_server aside, and just doing
"cat /dev/midi/usb/0 >/dev/null. Little difference.)

However everything I try just KDLs. It looks as if the page fault is occurring in devfs_read, and not at the usb_midi level at all, so I'm completely lost. It's probably happening when I end the read-wait by deleting the semaphore or timing out (I've tried both, with essentially the same outcome).

OK, I'm foxed... The usb_midi_close function in usb_midi.c is just not getting called!

Eventually had the brilliant idea (:-/) of checking the return value of the close! Of course it is an error -- "Bad File Descriptor".

Now the question is, why?

There are two obvious reasons: It is really the wrong FD index, or it has already been closed before (via close() or dup2()).

I can only wholeheartedly recommend getting familiar with the kernel debugger and kernel tracing. E.g. the "iocontext" command can tell you what file descriptors are currently open in a team. Syscall tracing (the kernel tracing option) will allow you to examine what happened to the FD (open()/close()/dup[2]() calls). There are even (hardware) breakpoints available in the kernel debugger.

Could we have an issue related to USB remove hook & node deletion while it's still opened!?

We actually have something of a logical gotcha here! We are watching for the '/dev/midi/usb/0' node to be removed, and then we go to close it! Ouch...

There's nothing wrong with first removing a file/unpublishing a device and then closing still open FDs to that file/device. File systems and drivers must handle this situation gracefully. If they don't, they are buggy.

If that cycle takes 20 minutes you either have a very slow machine or you're doing something suboptimally. Rebuilding the kernel and/or a driver should take only a minute or so (the most time being spent for the initial jam startup and header scanning).

Regarding the "waiting on a semaphore and still being interruptable by a close()" issue: If something like deleting the semaphore is not an option, the experimental wait_for_objects() API (<OS.h>) introduced in Haiku might be a good alternative to timeout polling. It allows for waiting for events on any number of FDs, semaphores, ports, or threads. I.e. one could monitor the semaphore one is interested in and a second semaphore indicating a close.

[Augghh!! I just spent half an hour composing an entry, but added a couple of attachments before I previewed it -- and lost it all!! I will try to recreate it....]

After a few more days of hell [and a pleasant interlude or two -- like a weekend of music in Golden Gate Park (:-)] I think I have the page fault problem licked.

As far as I can tell, the faults seem to have been caused by the driver and its objects going away before the system was finished accessing them. The crash was never in the actual usb_midi code at all. A fragment of serial output and stack trace is attached that shows something of what was happening. In this case it seems that legacy_drivers::UninitDevice was trying to reach now-absent data.

To prevent this, I've moved object deletion as late as possible. The device removal itself is now in usb_midi_free. Even this was apparently not enough if too many dprintf statements were also enabled, so I think there must still be a hazard somewhere. (Part of the reason I was having so much trouble tracking things down.) With printout suppressed, though, it seems to be stable.

The slightly unfortunate result (at least I think it's a result -- the crashes were preventing me seeing exactly what was happening before) is that each time one unplugs and replugs the usb cable, a new device is created in /dev/midi/usb. Originally /dev/midi/usb/0, it becomes /dev/midi/usb/1, and so on. The device nodes don't accumulate, but each is not deleted until the cable is plugged in again, and of course they become safely inactive when it's unplugged. I think this is because publish_devices is not invoked when the device is finally removed; when it is called in response to a new device things get corrected.

I think I can live with this, but the user will need to be aware of the need to adjust things if they replug the cable.

I've attached a zip of the sources (not a diff) that have changed (since hrev33202).

To prevent this, I've moved object deletion as late as possible. The device removal itself is now in usb_midi_free. Even this was apparently not enough if too many dprintf statements were also enabled, so I think there must still be a hazard somewhere. (Part of the reason I was having so much trouble tracking things down.) With printout suppressed, though, it seems to be stable.

The free hook is supposed to be called only after all open handles have been close()d and all other calls like read/write/ioctl have completed. So if there is still an open count in the free hook or access to the device after the free hook returned that'd be a bug in the devfs code for legacy drivers.

I think this is because publish_devices is not invoked when the device is finally removed; when it is called in response to a new device things get corrected.

Any add/remove of a device that matches the support descriptors of your driver will also cause a rescan of the driver to happen. It should in that case call publish_devices also after the unplug. That requires that you have the list updated before returning from the device_removed hook you provide. Again, if this doesn't work there might be a bug in the devfs code.

I think I can live with this, but the user will need to be aware of the need to adjust things if they replug the cable.

Um, they are bugs, so living with them isn't really the way to go ;-)

I've attached a zip of the sources (not a diff) that have changed (since hrev33202).

It's generally easier if you avoid attaching zips. That is because the zips can't be viewed inline in the bug tracker.

Replying to Pete:
The free hook is supposed to be called only after all open handles have been close()d and all other calls like read/write/ioctl have completed. So if there is still an open count in the free hook or access to the device after the free hook returned that'd be a bug in the devfs code for legacy drivers.

I'm afraid I got fairly confused trying to track the order of occurrences, especially as the dprintfs often seemed to cause the page faults. I'm checking the open count in the free hook (as well as checking the active flag) and with printouts enabled these seemed to be correct.

I'm not sure how to dig deeper. I tried to recompile just the kernel (from the slightly older sources I currently have) with some dprintfs in legacy_drivers.cpp, but I got a (repeating!) page fault at boot. I assume I should be able to do that if everything matches (?) so I'll try again when I get the alpha release and matching sources installed.

I think this is because publish_devices is not invoked when the device is finally removed; when it is called in response to a new device things get corrected.

Any add/remove of a device that matches the support descriptors of your driver will also cause a rescan of the driver to happen. It should in that case call publish_devices also after the unplug. That requires that you have the list updated before returning from the device_removed hook you provide. Again, if this doesn't work there might be a bug in the devfs code.

Again, I'm not sure of the exact sequence, or what actually triggers the call to publish_devices. And I'm not sure what you mean by the 'device_removed hook' -- I don't find that in the set of hooks. There's a 'close', a 'free', and an 'uninit_driver'.

'remove_device_info' (in devlist.c) gets called (now) in 'usb_midi_close', and adjusts the driver's own list of names (ready to be published), but doesn't trigger any external action. Then 'remove_device' gets called in 'usb_midi_free'. So it does rather sound to me as if 'publish_devices is not getting invoked when it should be.

I think I can live with this, but the user will need to be aware of the need to adjust things if they replug the cable.

Um, they are bugs, so living with them isn't really the way to go ;-)

Well, I suppose... (:-)) but I just checked, and Linux does pretty much the same thing -- except that the device does disappear when unplugged. Here's what I got:

The other difference is that alsa itself manages to keep the same 'client number', and also lets you address the device by name.

I've attached a zip of the sources (not a diff) that have changed (since hrev33202).

It's generally easier if you avoid attaching zips. That is because the zips can't be viewed inline in the bug tracker.

Ah. OK, sorry. (I was really intending it for Philippe's use, thinking it might be easier for him to grab and look at.) I'll attach the source of usb_midi.c, as I think that's the only one that's been affecting the crash.

I'm not sure I made it clear that, if the device is not opened by any user (e.g. midi_server), there is no problem. Plugging in the USB MIDI creates /dev/midi/usb/0, unplugging removes it again. It's only if there is a connection when the cable is unplugged that things go wrong.

Specifically, if there are no opens, the call of usb_midi_removed() (by the usb manager I guess) calls remove_device_info() and remove_device() before publish_devices() gets invoked, and all is fine.

However, if there is a connection, with a waiting read(), this sequence will cause a page fault in devfs_read() -- see attached trace. So if there are open connections both the above calls are now deferred: remove_device_info() seems to be safe in usb_midi_close(), but it seems that remove_device() can't safely be called before usb_midi_free(). In any case, remove_device_info() then happens after publish_devices(), causing the successive device stepping mentioned previously.

In addition, if debug printouts are enabled, there is invariably a page fault in legacy_drivers::UninitDevice().

I've taken the serial output with debug output enabled -- first plugging and unplugging (twice) with no open call, and then with a simple 'cat /dev/midi/usb/0 >/dev/null' -- and edited it down to just the printout, with interspersed comments in '[...]', in hopes it will show more clearly what's happening:

My suspicion is strongly that the way devfs currently handles things isn't adequate for this situation. I don't think my knowledge of that area is going to be sufficient to track things down, so I'd be really happy to cooperate with someone with more detailed expertise.

Finally...(!)
A slight reorganization of the objects used by usb_midi seems to have at last given stable handling of USB MIDI plugging and unplugging. It no longer steps the device number when replugged, either. I've attached a patch against (approximately) the alpha1 sources. [The reference sources are the berlios.de source snapshot of 2009/9/28,
but nobody has yet been able to tell me how to find the revision number of this (:-/)]

The main fix was to make the usb device info structure and the driver cookie a little more independent of each other, and to carefully disconnect them before either is deleted.

I also (eventually) realized that one thing bedevilling me was a bug in legacy_drivers that had been fixed independently (ticket #3856), so things work a bit better under alpha1!

Patch applied in hrev33782.
I did some minor changes in midi_server part: I've kept the directory/symlink checks in !_AddDevice() and to not check path/name consistency in B_ENTRY_CREATED notification, as if it's a necessary hack we must fix it in BPathMonitor, not here.

Would you mind emailing me (or something) the violations you found?
I thought I'd been careful. (:-/)

Patch applied in hrev33782.
I did some minor changes in midi_server part: I've kept the directory/symlink checks in !_AddDevice() and to not check path/name consistency in B_ENTRY_CREATED notification, as if it's a necessary hack we must fix it in BPathMonitor, not here.

Yes, I slightly stand corrected... On reflection, the check for a directory will be needed when BPathMonitor is fixed (as it should be) because it will then correctly announce the creation of the directory itself, rather than the device entry twice.

And somehow your addition of the 'ContainsKey' check escaped my notice. This of course fixes the multiple Endpoint problem without the need for my path/name protection.

So I've checked it and everything seems to work. (There now seems to be a separate problem of interference between USB devices -- plugging in my Wacom seems to kill the MIDI stream rather quickly! That's another ticket, though, when I have better data.)

The other few minor changes were coding style:

no space between cast and variable

no space between type and asterisk

no space around expression

expand if else statement to 4 lines at least.

I'm glad to note that all these -- aside from the if-else -- were in the code I inherited from you...! (:-))

Wasn't the case in my code. That's pure luck maybe, as I'm not known
for respecting well our coding style, but at least that was't there before.

Sorry to persistently disagree about this (:-/), but it was! All those sections of code were exactly what I downloaded in changeset hrev33202 (source zip form, but even looking at the diff in Trac I see one of them!). I edited nothing in that area, so however it got there, it wasn't me...!

Sorry to persistently disagree about this (:-/), but it was!
All those sections of code were exactly what I downloaded in
changeset hrev33202 (source zip form,

You're right, I just forgot your patch was made against hrev33202.
I fixed these code style issues of mine (indeed) were fixed in hrev33265,
and I though your patch was against a more recent revision.
Sorry about that.