Reimplement workqueue flushing using color coded works. wq has the current work color which is painted on the works being issued via cwqs. Flushing a workqueue is achieved by advancing the current work colors of cwqs and waiting for all the works which have any of the previous colors to drain.
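
As a rough illustration of the idea above, here is a minimal userspace sketch. It is not the kernel code: struct toy_cwq and all toy_* helpers are hypothetical names made up for this example, locking is omitted, and each flush tracks only the single color it was started on.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_COLORS 15	/* usable colors, assumed from the description */

/* Hypothetical stand-in for a per-cpu workqueue (cwq). */
struct toy_cwq {
	int work_color;				/* color painted on newly queued works */
	int nr_in_flight[TOY_NR_COLORS];	/* unfinished works, counted per color */
};

static int toy_next_color(int color)
{
	return (color + 1) % TOY_NR_COLORS;
}

/* Queue one work: it takes the cwq's current color. */
static void toy_queue_work(struct toy_cwq *cwq)
{
	cwq->nr_in_flight[cwq->work_color]++;
}

/* A work of the given color has finished executing. */
static void toy_finish_work(struct toy_cwq *cwq, int color)
{
	assert(cwq->nr_in_flight[color] > 0);
	cwq->nr_in_flight[color]--;
}

/*
 * Start a flush: remember the color to wait on and advance the
 * current color so that works queued later are not waited for.
 */
static int toy_start_flush(struct toy_cwq *cwq)
{
	int flush_color = cwq->work_color;

	cwq->work_color = toy_next_color(cwq->work_color);
	return flush_color;
}

/* The flush is complete once no work of its color is left. */
static bool toy_flush_done(const struct toy_cwq *cwq, int flush_color)
{
	return cwq->nr_in_flight[flush_color] == 0;
}
```

In this sketch a work queued after toy_start_flush() gets the next color and therefore does not delay the flush that is already in progress.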

Currently there are 16 possible colors: one is reserved for no color and 15 colors are usable, allowing 14 concurrent flushes. When the color space gets full, flush attempts are batched up and processed together when a color frees up, so even with many concurrent flushers, the new implementation won't build up a huge queue of flushers which have to be processed one after another.
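
The color arithmetic can be sketched as follows. This is an illustration only, under the assumption that the 16 values come from a 4-bit field and that the color space counts as full when advancing the work color would run into the oldest color still being flushed. The DEMO_* constants, struct demo_wq and the demo_* helpers are hypothetical names, not identifiers from the patch.

```c
#include <stdbool.h>

#define DEMO_COLOR_BITS	4
#define DEMO_NR_COLORS	((1 << DEMO_COLOR_BITS) - 1)	/* 15 usable colors */
#define DEMO_NO_COLOR	DEMO_NR_COLORS			/* 16th value, reserved */

/* Hypothetical workqueue-wide color state. */
struct demo_wq {
	int work_color;		/* color given to newly queued works */
	int flush_color;	/* oldest color still being flushed */
};

static int demo_next_color(int color)
{
	return (color + 1) % DEMO_NR_COLORS;
}

/*
 * Try to start a flush by advancing the work color.  Returns false
 * when the color space is full; such a flusher would be batched up
 * and started later, once a color frees up.
 */
static bool demo_try_start_flush(struct demo_wq *wq)
{
	int next = demo_next_color(wq->work_color);

	if (next == wq->flush_color)
		return false;
	wq->work_color = next;
	return true;
}

/* The oldest flush has drained: its color becomes available again. */
static void demo_complete_oldest_flush(struct demo_wq *wq)
{
	wq->flush_color = demo_next_color(wq->flush_color);
}
```

Starting from work_color == flush_color == 0, demo_try_start_flush() succeeds 14 times and then fails: one of the 15 usable colors is always taken by the works currently being issued, which leaves 14 for flushes in progress.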

Only works which are queued via __queue_work() are colored. Works which are directly put on queue using insert_work() use NO_COLOR and don't participate in workqueue flushing. Currently only works used for work-specific flush fall in this category.

This new implementation leaves only cleanup_workqueue_thread() as the user of flush_cpu_workqueue(). Just make its users use flush_workqueue() and kthread_stop() directly and kill cleanup_workqueue_thread(). As workqueue flushing doesn't use barrier request anymore, the comment describing the complex synchronization around it in cleanup_workqueue_thread() is removed together with the function.

This new implementation is to allow having and sharing multiple workers per cpu.

Please note that one more bit is reserved for a future work flag by this patch. This is to avoid shifting bits and updating comments later.