I'm suffering from exhaustion at this point, but I'll try to respond quickly:

As to the number of threads versus cores, you could distinguish between "scheduled" and "active" threads. To minimize thrashing, a scheduled thread should not start running until a core is free. Unless, of course, a deadline/priority scheduler has a pending deadline on a higher-priority thread.

Hmm, I think you've got it figured out already.

"The thing is, not all interrupts are equal from that point of view. As an example, processing keyboard or mouse interrupts is nearly pure I/O."

I don't think the keyboard will be a problem.

"But processing a stream of multimedia data can require significantly more number crunching, to the point where the amount of I/O is negligible, especially if DMA is being used."

A surprising amount of I/O can be done with very little CPU effort: copying files between disks, or sending files over the network (with adapters that support multi-address DMA and checksum offload). The CPU gets away with communicating memory addresses instead of actual bytes.

However, in cases where the driver has to do real work on the data (thousands of cycles), there is no doubt a thread is necessary, to avoid adding latency to other pending requests.

Interestingly, the x86 interrupt hardware already has a form of prioritization without threads. Low-numbered IRQs (which are higher priority on the PIC) can interrupt handlers of higher-numbered IRQs even before those have been acked; the inverse is not true. So even long-running handlers at low priority are theoretically OK, since they'll never delay high-priority IRQs.

...but the problem is that the driver code is no longer atomic, and therefore must be re-entrant. It becomes impossible to use a shared critical resource from two handlers: the second interrupt obviously cannot block, since doing so would also block the first interrupt, which holds the critical resource. So while this has its place in extremely low-latency applications, I doubt it's helpful for you.

"Sorry, did not understand this one. Can you please elaborate ?"

Consider an async model on a single processor. Now multiply that by X processors running X async handlers.
Now map each device to raise its interrupts on a specific CPU, and give its driver affinity to that same CPU. When an interrupt for the device occurs, the driver can safely assume an async model for all of its internal resources, without any synchronization.
Assuming the devices are well distributed (statically) across CPUs, they'll be serviced in parallel.

The requirement to route the IRQs to a specific CPU may not be strictly necessary, but it simplifies the example and avoids the latency of transferring the request between CPUs. The same could even be true for a threaded model?

"who carries the burden of creating and dispatching threads ? In one case, it's a third-party developer, in the other case it's the operating system. I'd spontaneously believe that the second solution would result in more people considering the threaded option, since it costs them less."

Well, there's no denying that last point. Unless you allow userspace threads, which don't require a syscall?

"I've a bit went the other way around : threads sounded like the cleanest multicore-friendly way to implement the model of processes providing services to each other which I aim at, and then I've tried to apply this model to interrupt processing and drivers."

I think a clean and consistent design is a good reason to use light threads. However, part of the difficulty in providing an async userspace interface under Linux (which is still a mess) was that the kernel needed threads internally to handle blocking calls, even though there was no associated userspace thread to block. It's doable, but not clean.

You hit some good points further down, but I'm so tired...I'd better not tackle them today.