ECE Review

How could I improve ...?
How could I use ... differently?
What could I do instead of ... to achieve the same
goals?
How would I do things differently if I could start over?
How could I take the benefits of X and combine them with the benefits of Y?

Saturday, January 22, 2011

Quoting one email thread from the Intel IPP forum, for understanding the DPB operation/decoding-delay idea in H.264:

Well, I think you're incorrect in assuming that the buffer (DPB) described in the H.264 specification is merely a suggestion. The buffering mechanism (however it is implemented) will have to adhere to the specification for the decoder to claim conformance (see Annex C). Note that there are two types of conformance: output timing conformance and output order conformance.

To my knowledge, the Intel implementation in the IPP samples can only deliver in the correct reordered output order, whereas some other codecs also allow decoding-order (immediate) output. When providing reordered output, buffering in the decoder needs to take place to take care of the reordered frames. Usually these would be B-frames, but in H.264 they can also be P-frames. Therefore, for the GOP pattern described, we cannot really know - but we assume that no reordering is taking place (as it just adds to the delay). In general, the decoder cannot know in advance whether or not a reordered picture may appear at some point in the stream. Therefore, it seems that Intel has chosen a "safe path" in that the decoder uses the "worst" possible buffering (delaying) that could be necessary to deliver the stream in a fluent manner. Elaborating on that: if the decoder did not buffer (delay) and an out-of-order picture suddenly appeared, the flow out of the decoder would contain a gap, as the out-of-order picture would need to be buffered before output. In other words, the decoder buffers up to the maximum number of pictures allowed for a given stream (I'll come to that later) to be able to deliver the frames in a fluent (one-by-one) flow.

The maximum buffering required is determined by the 'max_dec_frame_buffering' parameter as described in the H.264 specification in Annex E. This is part of the bitstream restrictions in the VUI parameters of the SPS. As it is optional, when absent the parameter is derived from 'MaxDpbSize', which in turn is derived from the profile, level and coded picture resolution as defined in Annex A. Note that the 'max_dec_frame_buffering' parameter is constrained at the low end to be >= the 'num_ref_frames' parameter of the SPS. The Intel decoder uses the 'max_dec_frame_buffering' parameter to set the "worst-case" buffering, and thus, with the right encoding parameters and the proper addition of the VUI parameters, you can set this as low as possible to obtain the smallest possible buffering.

The 'max_dec_frame_buffering' parameter defines the maximum for the 'num_reorder_frames' parameter, which is also given in the VUI set. This sets a limit on the amount of reordering that can occur in a stream, and it is actually the only information about reordering a decoder can derive directly from the H.264 stream. The SPS does not explicitly state whether or not there will be B-pictures in a stream (and P-pictures may also be reordered), nor whether reordered pictures actually appear, i.e. even if the 'num_reorder_frames' parameter is >0, the stream is not required to actually use it.

Anyway, this does not mean you cannot handle it differently; especially if you have a closed-circuit system with control over both the encoder and decoder side, as seems to be the case here. In that situation, it is essential to choose the right encoding parameters and provide the right information in the stream, and/or adapt the decoder to use as little buffering as possible.

Sunday, December 7, 2008

Utilities for resource sync provided in the Linux kernel:
- Semaphores and mutexes: the process can be blocked and sleep, so they must not be used in an ISR. A semaphore is not preferred as an event-sync utility, and it is usually optimized for the "available" case; completions can be used for event sync instead.
- RW semaphores: best used when write access is required only rarely and writer access is held for a short period of time.
- Spinlocks: the process can NOT be blocked. Higher performance than semaphores, but with many constraints:
-- better used on SMP systems;
-- while holding a lock, the code must be atomic, non-blocking and non-sleeping, so preemption is disabled on the local core while the lock is held, and interrupts may be disabled on that core as well;
-- hold the lock for as short a time as possible.
- RW spinlocks

Alternatives to locking:
- Lock-free algorithms: e.g. the circular buffer
- Atomic variables
- Bit operations
- seqlocks: for data that is small, simple and frequently accessed, where write access is rare but must be fast; no pointers in the protected data.
- Read-Copy-Update (RCU): reads are common and writes are rare; the resources must be accessed via pointers, and all references to those resources must be held only by atomic code.

* User space and kernel space
- Transfer from user space to kernel space happens via a system call (exception) or a hardware interrupt. Note the difference between these two is the execution context: the system call runs in the context of a process (and therefore can access data in the process's address space), while the ISR runs in its own context and is not related to any process.

- Concurrency in the kernel
-- multiple processes might use the driver at the same time
-- HW interrupts and softirqs such as timers, tasklets and workqueues, etc.
-- the kernel is preemptive since 2.6
All of these mean the kernel and drivers must be reentrant.

- Multithreading mapping (user space to kernel space)
-- Many to one: all threads are mapped to a single schedule unit; all are blocked if one thread blocks.
-- One to one: each thread is mapped to its own schedule unit; too complicated for communication.
-- Many to many: lightweight processes (LWPs) which share data among themselves and form a process group. Linux supports the one-to-one mapping between threads and LWPs.

* Major and minor numbers
- The major number identifies the driver associated with the device; usually one major number maps to one driver.
- The minor number identifies exactly which device is being referred to.

* Some important data structs related to the device driver
Most of the fundamental driver operations involve three important kernel data structs: file_operations, file and inode.
- file_operations: implements the various system-call interfaces
- file: represents an open file
- inode: represents a file internally in the kernel. One file can have multiple file structs but only one inode struct.

Sunday, June 22, 2008

How do we solve the problems of resource synchronization, priority inversion and deadlock?

* Critical section pattern
pros: avoids both problems
cons: high cost

* Priority inheritance
pros: avoids priority inversion and is simple
cons: deadlock and chain blocking can still happen
Chain blocking: J1 needs S1 and S2, but S1 is held by J2 and S2 by J3, with priorities P1 > P2 > P3. Therefore, J1 must wait for both of them to finish.

* Highest locker pattern
One priority ceiling is defined for each resource at system design time. The basic idea is that the task owning a resource runs at the highest priority ceiling of all the resources it currently owns, provided that it is blocking one or more higher-priority tasks. In this way, chain blocking is avoided.

pros: avoids chain blocking
cons: deadlock still possible

* Priority ceiling pattern
The idea is to ensure that when a job J preempts the critical section of another job and executes its own critical section, the priority at which this new critical section executes is guaranteed to be higher than both the inherited priorities (so it can run) AND the ceiling priorities (so it can continue) of all the preempted critical sections in the system. The difference from the highest locker pattern is that the job is not immediately assigned the priority ceiling of the locked resource.
pros: avoids both problems
cons: high cost

* Pool allocation pattern
pros: more flexible than static allocation for satisfying dynamic requirements; near-deterministic allocation/deallocation time (unless the pool runs out) and no memory fragmentation.
cons: the number of objects in the pool needs to be tuned for the best performance on different systems.

* Fixed sized buffer pattern
pros: no memory fragmentation, since it always allocates the worst-case memory requirement.
cons: wastes memory on average. This can be improved by managing heaps of several fixed sizes.

Wednesday, March 12, 2008

The following concepts are not equal:
1. Device: the physical chip
2. Device driver: code that controls the chip
3. Interface: routines provided to users that hide the chip details underneath
4. ISR: interrupt service routine, invoked in response to a hardware interrupt
5. Kernel module: object file that can be loaded at run time to extend the kernel's functionality