The system is optimized for short execution time
and fast switching between threads.
It provides the building blocks (semaphores, mutexes, timers, queues, etc.) needed for multi-threaded firmware.

More information about the chip can be found on the Internet. The research resembles
an archaeological dig: piecing together the fragments of a broken ancient vase.
Nevertheless, it is an opportunity to learn more about the chip.
Without any doubt, the best site on the subject was created by a programmer who uses the nickname Herman H. Hermitage. He was very determined and did a lot to
make information about the chip available.
The profile page of his GitHub site shows a genuine photo of Konrad Zuse, a real person: a German inventor and computer scientist active during World War II.
Interestingly, the programmer also created "his" profile on LinkedIn.

This site was compiled from information available on the Internet.
The information and links provided here may be useful for implementing custom
boot tasks (e.g. when fast actions must be executed before Linux has started).
Another area of application is GPGPU, i.e. using the graphics processor to accelerate calculations.
It is also possible to start another microkernel on the VPU for a specific purpose (this has
already been done: the VPU runs a ThreadX application that controls the graphics part).

Besides its main role, 3D graphics processing, this part of the chip can be
used for extra parallel calculations that run independently of the CPU and can support it.
More information can be found on the following sites:

FFT calculation - full source code that illustrates advanced techniques of QPU programming:
how to start a QPU program from C, pass data between the CPU and the QPUs,
synchronise cores, write a multicore program, call subroutines, account for pipeline effects,
and use the macroassembler.
The sources are present in Raspbian in /opt/vc/src/hello_pi/hello_fft and are part
of the firmware repository.
The shaders (QPU programs) are provided in both binary and source form.
To build them, use VC4ASM. The binary code shipped with the firmware and the code produced by
the 0.2.3 macroassembler differ slightly, but this does not affect execution:
registers are cleared by a different opcode.

VC4ASM (by Marcel Muller) - the best macroassembler for the QPU (GitHub project). Building it under Raspbian on a Raspberry Pi is trivial; under Windows it requires some tuning of the source and configuration (it can be built under MinGW or Cygwin). Building on a Linux distribution other than Raspbian
requires a C++ compiler that supports at least C++11.

QPULib - an interesting definition of
a QPU language (by analogy to CUDA) together with its compiler. Everything is based
on new data types, C++ templates and compilation at run time. Compilation works by executing the kernel (the QPU kernel, not the Linux kernel) on the CPU with overloaded operators, so that
the abstract syntax tree built by the CPU compiler is replicated into local data structures and then compiled to QPU code.
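
The idea of obtaining a syntax tree by simply running the kernel with overloaded operators can be shown in a few lines. The sketch below is written in Python rather than C++ and is not QPULib's API: the names Expr, Var, Const, BinOp, wrap and emit are hypothetical, and the "instructions" it produces are made-up text, not real QPU opcodes.

```python
from dataclasses import dataclass


class Expr:
    """Base class: arithmetic on Expr objects records tree nodes instead of computing."""

    def __add__(self, other):
        return BinOp("add", self, wrap(other))

    def __radd__(self, other):
        return BinOp("add", wrap(other), self)

    def __mul__(self, other):
        return BinOp("mul", self, wrap(other))

    def __rmul__(self, other):
        return BinOp("mul", wrap(other), self)


@dataclass
class Var(Expr):
    name: str


@dataclass
class Const(Expr):
    value: int


@dataclass
class BinOp(Expr):
    op: str
    left: Expr
    right: Expr


def wrap(x):
    """Turn plain Python numbers into Const nodes; leave Expr nodes untouched."""
    return x if isinstance(x, Expr) else Const(x)


def emit(expr, code):
    """Post-order walk: append pseudo-instructions to code, return the operand name."""
    if isinstance(expr, Var):
        return expr.name
    if isinstance(expr, Const):
        return str(expr.value)
    a = emit(expr.left, code)
    b = emit(expr.right, code)
    dst = f"t{len(code)}"  # fresh temporary per emitted instruction
    code.append(f"{expr.op} {dst}, {a}, {b}")
    return dst


def kernel(x, y):
    # Looks like ordinary arithmetic, but with Var arguments it builds a tree.
    return x * 2 + y


tree = kernel(Var("x"), Var("y"))  # "executing" the kernel on the CPU
program = []
result = emit(tree, program)       # second stage: tree -> pseudo-code
print("\n".join(program))
```

Running kernel on Var objects performs no arithmetic; it only records a multiplication node and an addition node, which emit then turns into two pseudo-instructions. A real compiler would add register allocation and instruction scheduling after this step.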

A description of these peripherals can be found in the BCM2835 Peripherals document. It also covers
the interrupt controller, the corresponding address spaces of the VPU and CPU (including physical and virtual ones)
and DMA.
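
The relation between the address spaces can be illustrated with a small sketch. It assumes the BCM2835 layout described in that document: peripherals seen at bus address 0x7E000000 appear at ARM physical address 0x20000000, and the top two bits of an SDRAM bus address select a cache alias. The function and constant names are hypothetical, and later chips use a different physical peripheral base.

```python
PERIPHERAL_BUS_BASE = 0x7E000000   # peripheral base as seen on the VideoCore bus
PERIPHERAL_PHYS_BASE = 0x20000000  # ARM physical base, assumed BCM2835 only
PERIPHERAL_SIZE = 0x01000000       # 16 MB peripheral window
SDRAM_ALIAS_MASK = 0xC0000000      # top two bits: cache alias of an SDRAM bus address


def peripheral_bus_to_phys(bus_addr):
    """Translate a peripheral bus address (0x7Ennnnnn) to an ARM physical address."""
    offset = bus_addr - PERIPHERAL_BUS_BASE
    if not 0 <= offset < PERIPHERAL_SIZE:
        raise ValueError(f"not a peripheral bus address: {bus_addr:#010x}")
    return PERIPHERAL_PHYS_BASE + offset


def sdram_bus_to_phys(bus_addr):
    """Drop the alias bits of an SDRAM bus address to get the ARM physical address."""
    return bus_addr & ~SDRAM_ALIAS_MASK & 0xFFFFFFFF


def sdram_phys_to_bus(phys_addr, alias=0xC0000000):
    """Build a bus address (e.g. for DMA) from an ARM physical SDRAM address."""
    if phys_addr & SDRAM_ALIAS_MASK:
        raise ValueError(f"physical address too large: {phys_addr:#010x}")
    return phys_addr | alias


# A register at bus address 0x7E200000 as seen from the ARM side.
print(hex(peripheral_bus_to_phys(0x7E200000)))
# A buffer at physical address 0x1000 as a DMA engine would address it.
print(hex(sdram_phys_to_bus(0x1000)))
```

The 0xC0000000 alias is used here as the default because it is the uncached view; which alias is appropriate depends on the cache configuration of the particular system.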