IEE Proceedings - Computers and Digital Techniques

Online ISSN 1359-7027
Print ISSN 1350-2387

Published from 1994 to 2006, IEE Proceedings - Computers and Digital Techniques contained significant and original contributions on computers, computing and digital techniques. It contained technical papers describing research and development work in all aspects of digital system-on-chip design and the testing of electronic and embedded systems, including the development of design automation tools. It was aimed at researchers, engineers and educators in the fields of computer and digital systems design and testing.

Latest content

p. 373–380 (8)
PC clusters have recently received considerable interest as cost-effective parallel platforms for CPU-intensive applications. A cluster of PCs generally comprises a collection of heterogeneous processing elements (PEs). To make effective use of a PC cluster, a parallel program, which is characterised by a node- and edge-weighted directed acyclic graph (DAG), can usually be decomposed into a set of precedence-constrained atomic tasks such that PEs are able to accommodate these tasks and minimise the overall program-completion time. Consequently, techniques for task matching and scheduling become extremely important for effectively harnessing the computing power of the target cluster-based system. This work presents a constructive algorithm based on ant colony optimisation (ACO). The proposed algorithm, namely ACO-TMS, adopts a new state transition rule that reduces the time required to find satisfactory scheduling results. The proposed algorithm also integrates a local search procedure to help improve the scheduling results. The performance of this algorithm is demonstrated by comparing it against other existing algorithms, such as the genetic-algorithm-based scheduling method and the dynamic priority scheduling (DPS) heuristic, in terms of the overall schedule length of randomly generated DAGs. Experimental results indicate that the proposed algorithm outperforms the genetic algorithm and the DPS heuristic for high communication-to-computation ratios and heterogeneous computing environments.

p. 381–388 (8)
Hardware prefetching schemes that divide misses into streams are generally preferred to other hardware-based schemes. However, because they do not know when the next miss of a stream will occur, they cannot prefetch a block at the appropriate time. Some of them use a substantial amount of hardware storage to hold the predicted miss blocks from all streams. Other approaches follow the program flow and prefetch all target addresses, including blocks that already exist in the L1 data cache. The approach presented predicts the stream of the next miss and then prefetches only the next miss address of that stream. It offers a general prefetching framework, the two-phase prediction (TPP) algorithm, that lets each stream have its own address predictor. Comparing the TPP algorithm with the latest variant of stream buffers and the Markov predictor using SPEC CPU 2000 benchmarks shows that, on average, (1) the TPP approach achieves an 18% speedup, compared with 1% for Markov and 0.05% for stream buffers, and (2) 78% of the TPP prefetches are useful, whereas only 18% and 24% of those issued by stream buffers and Markov, respectively, are useful.
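
The two-phase idea in this abstract (first predict which stream will miss next, then ask that stream's own predictor for an address) can be illustrated with a small sketch. The abstract does not say how streams are identified or which predictors are used, so everything concrete here is an assumption made for illustration: the stream identifier is an opaque key (for example the program counter of the missing load), phase one is a simple "which stream missed after this one last time" table, and the per-stream predictor is a plain stride predictor. The class names `StridePredictor` and `TwoPhasePrefetcher` are hypothetical.

```python
from typing import Dict, Hashable, Optional


class StridePredictor:
    """Per-stream address predictor (assumed here: last address plus last stride)."""

    def __init__(self) -> None:
        self.last: Optional[int] = None
        self.stride: Optional[int] = None

    def update(self, address: int) -> None:
        # Learn the stride from the two most recent misses of this stream.
        if self.last is not None:
            self.stride = address - self.last
        self.last = address

    def predict(self) -> Optional[int]:
        # No prediction until at least two misses have been observed.
        if self.last is None or self.stride is None:
            return None
        return self.last + self.stride


class TwoPhasePrefetcher:
    """Illustrative two-phase predictor; not the scheme evaluated in the paper."""

    def __init__(self) -> None:
        self.next_stream: Dict[Hashable, Hashable] = {}  # stream -> stream that missed next
        self.predictors: Dict[Hashable, StridePredictor] = {}
        self.prev_stream: Optional[Hashable] = None

    def on_miss(self, stream_id: Hashable, address: int) -> Optional[int]:
        """Record a miss and return one address to prefetch, or None."""
        # Train phase one: remember which stream followed the previous one.
        if self.prev_stream is not None:
            self.next_stream[self.prev_stream] = stream_id
        self.prev_stream = stream_id
        # Train phase two: update this stream's own address predictor.
        self.predictors.setdefault(stream_id, StridePredictor()).update(address)
        # Phase one: predict the stream of the next miss.
        predicted_stream = self.next_stream.get(stream_id)
        if predicted_stream is None:
            return None
        # Phase two: prefetch only that stream's next predicted miss address.
        return self.predictors[predicted_stream].predict()
```

With two interleaved streams, one striding by 0x40 and the other by 0x10, the sketch starts issuing a single prefetch per miss once both the stream order and the strides have been observed. No storage is spent on predicted blocks for streams that are not expected to miss next.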

p. 389–398 (10)
The load–store queue (LSQ) of modern superscalar processors is a critical and non-scalable component responsible for keeping the order of memory operations. As new architectures become more aggressive, the number of in-flight memory instructions increases, and the LSQ must satisfy higher capacity requirements. An efficient LSQ state filtering mechanism based on Bloom filtering is proposed, which, in conjunction with a dynamic or profiling-based predictor, provides significant energy reduction (up to 55% in the LSQ and 4% in the whole processor), and only incurs a small performance loss.
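
The abstract does not spell out the filtering mechanism, so the following is only a sketch of one common way Bloom filtering can be applied to an LSQ: a small counting Bloom filter tracks the addresses of in-flight stores, and a load performs the expensive associative LSQ search only when the filter reports a possible match. The filter size, the two hash functions, the 8-byte word granularity and the class name `CountingBloomFilter` are all assumptions chosen for illustration, and the predictor mentioned in the abstract is not modelled.

```python
class CountingBloomFilter:
    """Counting Bloom filter over in-flight store addresses (illustrative only)."""

    def __init__(self, num_counters: int = 64) -> None:
        self.counters = [0] * num_counters

    def _indexes(self, address: int):
        # Assumption: addresses are tracked at 8-byte word granularity.
        word = address >> 3
        size = len(self.counters)
        # Two simple hash functions: low bits and a multiplicative hash.
        return word % size, (word * 2654435761 >> 7) % size

    def insert(self, address: int) -> None:
        # Called when a store enters the LSQ.
        for i in self._indexes(address):
            self.counters[i] += 1

    def remove(self, address: int) -> None:
        # Called when the same store leaves the LSQ.
        for i in self._indexes(address):
            self.counters[i] -= 1

    def may_contain(self, address: int) -> bool:
        # False means definitely absent; True means possibly present.
        return all(self.counters[i] > 0 for i in self._indexes(address))


def load_needs_lsq_search(store_filter: CountingBloomFilter, load_address: int) -> bool:
    """A load can skip the associative LSQ search when the filter rules out a match."""
    return store_filter.may_contain(load_address)
```

Because a Bloom filter never gives a false negative, skipping the search on a negative answer is safe. A false positive only costs an unnecessary search, which is the usual trade-off between filter size and the energy saved.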

p. 399–405 (7)
A new modulo 2^n−1 addition algorithm is presented, which is applicable in the residue number system. In contrast to previous work, the input carry in the first stage of the addition is set to one. The associated output carry is then used to conditionally modify the sum to produce the correct modulo 2^n−1 result. Moreover, unlike recent adders in the literature, the result never exceeds the dynamic range of the modulus. Actual VLSI implementations using 130 nm standard-cell technology show that the corresponding architectures provide improved trade-offs in the power–delay–area space when compared against existing designs.
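
The arithmetic behind this scheme can be checked with a short behavioural sketch. The abstract does not give the exact form of the conditional correction, so the version below is one interpretation that is consistent with it: add the operands with the input carry forced to one, keep the n-bit sum when the output carry is set, and subtract the injected one otherwise. The function name `add_mod_2n_minus_1` is hypothetical, and the operands are assumed to lie in the range 0 to 2^n−2. This models the arithmetic only, not the VLSI architecture.

```python
def add_mod_2n_minus_1(a: int, b: int, n: int) -> int:
    """Return (a + b) mod (2**n - 1) for operands in [0, 2**n - 2]."""
    modulus = (1 << n) - 1
    if not (0 <= a < modulus and 0 <= b < modulus):
        raise ValueError("operands must lie in [0, 2**n - 2]")
    # First stage: n-bit addition with the input carry set to one.
    total = a + b + 1
    carry_out = total >> n       # 1 exactly when a + b >= 2**n - 1
    low_bits = total & modulus   # the n-bit sum output
    # Carry set: a + b + 1 - 2**n equals a + b - (2**n - 1), already reduced.
    # Carry clear: remove the injected one to recover a + b.
    return low_bits if carry_out else low_bits - 1
```

When a + b equals 2^n−1, the forced carry makes the n-bit sum wrap to zero, so the all-ones pattern is never produced and the result stays within the dynamic range of the modulus, as the abstract states.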