* Memory allocated by kmalloc(GFP_DMA) is still a cacheable buffer, so it requires cache flush/invalidate around DMA transfers. (by experience)

* It seems that a cache invalidate will NOT clean pending cached writes. If the CPU writes to the buffer without a cache flush, then even when the cache is invalidated after the device DMA completes, a read may still return the earlier CPU write instead of the DMA data. So make sure the cache is flushed properly before the device starts DMA. (by experience, ARM11 MPCore)
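The ordering problem in the note above can be illustrated with a toy user-space model. This is only a sketch under stated assumptions: the `toy_mem` structure, the single write-buffer slot, and every function name here are hypothetical, and the model does not reproduce real ARM11 MPCore cache behaviour. It only shows why a pending CPU write that reaches RAM after the device DMA can hide the DMA data, even though the cache was invalidated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model (NOT real hardware): one word of RAM, one cache line, and one
 * write-buffer slot that the "hardware" drains to RAM at some later time. */
struct toy_mem {
    uint32_t ram;          /* what the device reads and writes */
    uint32_t line;         /* CPU's cached copy of ram */
    bool     line_valid;
    uint32_t wbuf;         /* CPU write that has not reached RAM yet */
    bool     wbuf_pending;
};

/* CPU store: lands in the cache line and the write buffer, not in RAM. */
void cpu_write(struct toy_mem *m, uint32_t v)
{
    m->line = v;
    m->line_valid = true;
    m->wbuf = v;
    m->wbuf_pending = true;
}

/* The pending write reaches RAM, either on demand or "whenever". */
void drain_write_buffer(struct toy_mem *m)
{
    if (m->wbuf_pending) {
        m->ram = m->wbuf;
        m->wbuf_pending = false;
    }
}

/* In this model a flush simply forces pending CPU writes out to RAM. */
void cache_flush(struct toy_mem *m)
{
    drain_write_buffer(m);
}

/* Invalidate only drops the cached copy; the pending write is untouched. */
void cache_invalidate(struct toy_mem *m)
{
    m->line_valid = false;
}

/* The device writes RAM directly, bypassing cache and write buffer. */
void device_dma_write(struct toy_mem *m, uint32_t v)
{
    m->ram = v;
}

/* CPU load: refill the line from RAM on a miss. */
uint32_t cpu_read(struct toy_mem *m)
{
    if (!m->line_valid) {
        m->line = m->ram;
        m->line_valid = true;
    }
    return m->line;
}

/* Returns what the CPU reads back after the device DMA-writes dma_value. */
uint32_t run_scenario(bool flush_before_dma, uint32_t cpu_value,
                      uint32_t dma_value)
{
    struct toy_mem m = {0};

    assert(cpu_value != dma_value);  /* otherwise the result is ambiguous */

    cpu_write(&m, cpu_value);
    if (flush_before_dma)
        cache_flush(&m);
    device_dma_write(&m, dma_value);
    cache_invalidate(&m);
    drain_write_buffer(&m);  /* a late drain clobbers the DMA data */
    return cpu_read(&m);
}
```

In this model, `run_scenario(false, ...)` returns the stale CPU value and `run_scenario(true, ...)` returns the DMA value, which matches the advice to flush before the device starts DMA.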

A buffer allocated by dma_alloc_coherent cannot be invalidated/cleaned/flushed with dma_cache_maint, or you will hit a BUG().

A buffer allocated by dma_alloc_coherent will have an address of the form 0xfXXX_XXXX.

arch/arm/mm/consistent.c

    /*
     * Make an area consistent for devices.
     * Note: Drivers should NOT use this function directly, as it will break
     * platforms with CONFIG_DMABOUNCE.
     * Use the driver DMA support - see dma-mapping.h (dma_sync_*)
     */
    void dma_cache_maint(const void *start, size_t size, int direction)
    {
        const void *end = start + size;

Wednesday, June 10, 2009

Integrated ESL flows can be extremely effective for algorithm-based designs:

1. Write untimed C code to model the system and develop algorithms.

Develop a bit-accurate C model partitioned into functions that are roughly equivalent to the major hardware blocks in the system. The conformance tests are then run on the C code to verify correctness.

2. The C code is then refined for hardware implementation using C synthesis, which automatically generates the RTL implementation.

Optimize the blocks in the system for hardware implementation. This is an iterative process of running high-level synthesis and checking the performance, area and power of each block in the system. The C code is then modified as needed to improve the block.

Sequential equivalence checking is used to verify that the optimized C code is functionally the same as the "golden" C code.

A subset of the system-level simulation is run to verify that all the blocks, including changed blocks, continue to work together as expected. This process continues until all blocks have been optimized for hardware implementation.

Formal equivalence checking is used throughout the process to keep C code refinements in sync with the original untimed model and to comprehensively verify that the RTL functionality matches the C code.

Experienced designers write the initial C code so that much of it can be reused in later refinements. This requires an understanding of how to express hardware detail in C code while maintaining fast simulations.

3. Once all blocks are optimized and the design meets the quality targets, the entire system is synthesized from C to RTL. Sequential equivalence-checking is run once again to verify that the RTL for the ASIC has the exact functionality of the "golden" C code. At this point, a final sign-off verification is run on the RTL before moving on to RTL synthesis.
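As a small illustration of refining a "golden" C model toward hardware, here is a sketch. The `blend` function, its weights, and the 11-bit datapath width are hypothetical, chosen only for the example. The exhaustive comparison loop is a simulation-based stand-in, not formal sequential equivalence checking, which proves equivalence without enumerating inputs.

```c
#include <assert.h>
#include <stdint.h>

/* "Golden" untimed model: rounded weighted blend of two 8-bit samples,
 * written the way an algorithm developer would naturally express it. */
uint8_t blend_golden(uint8_t a, uint8_t b)
{
    return (uint8_t)((5 * a + 3 * b + 4) / 8);
}

/* Refined model: the same function expressed with shifts and adds and an
 * explicit accumulator width, closer to what would be synthesized. */
uint8_t blend_refined(uint8_t a, uint8_t b)
{
    unsigned acc;

    acc = ((unsigned)a << 2) + a;        /* 5 * a */
    acc += ((unsigned)b << 1) + b + 4;   /* + 3 * b + rounding constant */
    assert(acc < (1u << 11));            /* fits the assumed 11-bit datapath */
    return (uint8_t)(acc >> 3);          /* divide by 8 */
}

/* Compare both models over the whole 16-bit input space.
 * Returns the number of mismatching input pairs (0 means they agree). */
unsigned count_mismatches(void)
{
    unsigned mismatches = 0;

    for (unsigned a = 0; a < 256; a++) {
        for (unsigned b = 0; b < 256; b++) {
            if (blend_golden((uint8_t)a, (uint8_t)b) !=
                blend_refined((uint8_t)a, (uint8_t)b))
                mismatches++;
        }
    }
    return mismatches;
}
```

Exhaustive comparison is only practical here because the input space is 65,536 pairs. For realistic block sizes it does not scale, which is where a formal equivalence checker earns its place in the flow.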

This integrated ESL flow differs from a traditional flow that uses handwritten RTL in three major ways:

1. By creating C code, designers can focus on algorithm development, because changes are easier and quicker to make.

2. The majority of simulation is done in C code, which typically simulates about 10,000 times faster than RTL, allowing more vectors to run in less time.

Designers have greater opportunity to improve their algorithms and design quality.

3. RTL is formally verified. Having an automated, formally verified path from system to RTL guarantees the correctness of the RTL and requires only one RTL simulation.

FPGA in ESL

There are times when an FPGA prototype, instead of a virtual one, is more appropriate:

1. The performance of the hardware is subjective.

2. The number of vectors required overwhelms even C simulation.

1. Write untimed, bit-accurate C code.

There is tight integration of both hardware and software, which directly influences the verification effort.

Partitioning needs to consider hardware/software partitioning along with the major hardware blocks in the subsystem.

2. The C code is modified to be synthesizable, and high-level synthesis is used to construct RTL for the prototype. This RTL is rarely simulated, because it is immediately put on an FPGA for exhaustive testing

The RTL for the FPGA is now the "golden" simulation model.

3. Because many of the architectural decisions needed for the FPGA may not apply to the ASIC implementation, a new version of the C code targeted for the ASIC hardware must be developed.

This flow eliminates virtually all RTL simulation. C simulation is only used for initial testing and debugging, and the only RTL simulation is for final RTL signoff. By eliminating the traditional long simulation times, this ESL flow delivers massive reductions in development time and significantly increases the quality of the product.