Shrinking of device dimensions has undoubtedly enabled the very large scale integration of transistors on electronic chips. However, it has also brought to surface time-zero and time-dependent variation phenomena that degrade system's performance and threaten functional operation. Hence, the need to capture and describe these mechanisms, as well as effectively model their impact is crucial. To this extent, we follow existing models and propose a complete framework that evaluates failure probability of electronic components. To assess our framework, a case-study of packet-switched Network on Chip (NoC) routers is presented, studying the failure probability of its SRAM buffers.

Approximate computing is a paradigm for high performance and low power design by compromising computational accuracy. In this paper, the structure of an approximate modified radix-4 Booth multiplier is analyzed. A probabilistic error model is proposed to facilitate the evaluation of the approximate multiplier for errors from the approximate radix-4 Booth encoding, the approximate regular partial product array, and the approximate 4--2 compressor. The normalized mean error distances (NMEDs) of 8-bit and 16-bit approximate designs are found by utilizing the proposed model. The results from the error model and the corresponding analytical framework are close to those found by simulation, thus confirming the validity of the proposed approach.

The advent of the first TiO2-based memristor in 2008 revived the scientific interest both from academia and industry for this device technology, with several emerging applications including that of logic circuits. Several memristive logic families have been proposed, each with different attributes, in the current quest for energy-efficient computing systems of the future. However, limited endurance of memristor devices and variations (both cycle-to-cycle and device-to-device) are important parameters to be considered in the evaluation of such logic families. In this work we build upon an accurate physics-based model of a bipolar metal-oxide resistive RAM device (supporting parasitics of the device structure and variability of switching voltages and resistance states) and use it to show how performance of memristor-based logic circuits can de degraded owing to both variability and state-drift impact. Based on previous work on CMOS-like memristive logic circuits, we propose a memristive ratioed logic scheme, which is crossbar-compatible, i.e. suitable for in-/near-memory computing, and tolerant to device variability, while also it does not affect the device endurance since computations do not involve switching the memristor states. As a figure of merit, we compare such new logic scheme with MAGIC, focusing on the universal NOR logic gate.

ReRAM technologies feature desired properties, e.g. fast switching and high read margin, that make them attractive candidates to be used in non-volatile flip-flops (NVFFs). However, they suffer from limited endurance. Therefore, cell degradation considerations are a necessity for practical deployment in non-volatile processors (NVPs). In this paper, we present two bipolar ReRAM-based NVFFs, Hypnos and Morpheus, with enhanced endurance and energy efficiency. Hypnos reduces the ReRAM electrical stress during set operation while keeping the imposed NVFF area overhead at a minimum. In Morpheus, a write-termination circuit is used to further enhance the ReRAM endurance and energy efficiency at the cost of an affordable area overhead. Moreover, both NVFFs feature run-time tunable resistive states to enable online adjustment of the tradeoff among endurance, retention, energy consumption, and restore success rate (in case of approximate computing). Experimental results demonstrate that Hypnos reduces the ReRAM set degradation by 91%, on average. Moreover, the write-termination mechanism in Morpheus further reduces the remaining degradation by 93%/97% in set/reset operation, on average. The results also demonstrate enhanced energy efficiency in both NVFFs.

Recent artificial neural network architectures use memristors to store synaptic weights. The crossbar structure of memristors is used because of its dense structure and extreme parallelism. Transistor aging impacts their computational accuracy. An enhancement of the memristor-based neural network architecture is introduced using built-in current-based calibration circuit. It is shown experimentally that the proposed approach alleviates the cell aging effect.

The crossbar nonidealaties may considerably degrade the accuracy of matrix multiplication operation, which is the cornerstone of hardware accelerated neural networks. In this paper, we show that the crossbar nonidealities especially the wire resistance should be taken into consideration for accurate evaluation. We also present a simple yet highly effective way to capture the wire resistance effect for the inference and training of deep neural networks without extensive SPICE simulations. Different scenarios have been studied and used to show the efficacy of our proposed method.

Data converters are ubiquitous in data-abundant systems, where they are heterogeneously distributed across the analog-digital interface. Unfortunately, conventional data converters trade off speed, power, and accuracy. Furthermore, intrinsic real-time and post-silicon variations dramatically degrade their performance. In this paper, we employ novel neuro-inspired approaches to design smart data converters that could be trained in real-time for general purpose applications, using machine learning algorithms and artificial neural network architectures. Our approach integrates emerging memristor technology with CMOS. This concept will pave the way towards adaptive interfaces with the continuous varying conditions of data driven applications.

We propose and investigate a nanoscale multi-junction network architecture that can be configured on-flight to perform Boolean logic functions at room temperature. The device exploits the electronic properties of randomly deposited molecule-interconnected metal nanoparticles, which act collectively as strongly nonlinear single-electron transistors. Disorder is being incorporated in the modeling of their electrical behavior and the collective response of interacting nano-components is being rationalized. The non-optimized energy consumption of the synaptic grid for a "then-if" logical computation is in the range of few aJ.

Graphene quantum point contacts (G-QPC) combine switching operations with quantized conductance, which can be modulated by top and back gates. Here we use the conductance quantization to design and simulate multi-valued logic (MVL) circuits and, more specifically an adder. The adder comprises two G-QPCs connected in parallel. We compute the conductance of the adder for various inputs and show that Graphene MVL circuits are feasible.

Novel computing paradigms like the fully cascadable InSb bilayer avalanche spin-diode logic (BASDL) are capable of performing complex logic operations. Although the original work provides a comprehensive explanation for the device structure, the fundamental logic set and basic combinational circuits, it lacks the inclusion of sequential circuit design. This paper addresses the void by demonstrating the structural design of SR and D-type latches with BASDL. Novel latch topologies are proposed that take full advantage of the BASDL-based logic set while maintaining conventional latch functionality. The effective operation of these latches is verified through a complete logic-level analysis and a briefinsight into their physical implementation.

With CMOS feature size heading towards atomic dimensions, unjustifiable static power, reliability, and economic implications are exacerbating, prompting for research on new materials, devices, and/or computation paradigms. Within this context, Graphene Nanoribbons (GNRs), owing to graphene's excellent electronic properties, may serve as basic blocks for carbon-based nanoelectronics. In this paper we build upon the fact that GNR behaviour can be controlled according to some desired functionality via top/back gate contacts and propose to combine GNRs with complementary functionalities to construct Boolean gates. To this end, we introduce a generic GNR-based Boolean gate structure, composed of two GNRs, i.e., a pull-up GNR performing the gate Boolean function and a pull-down GNR performing the inverted Boolean function. Subsequently, by properly adjusting GNRs' dimensions and topology, we design 2-input AND, NAND, and XOR graphene-based Boolean gates, as well as 1-input gates, i.e., inverter and buffer. Our SPICE simulations indicate that the proposed gates exhibit a smaller propagation delay, from 23% for the XOR gate to 6x for the AND gate, and 2 orders of magnitude smaller power consumption, when compared with 7nm CMOS based counterparts, while requiring a 1 to 2 orders of magnitude smaller active area footprint. These results clearly indicate that GNR-based gates have great potential as basic building blocks for future beyond CMOS energy effective nanoscale circuits.

In this paper we present Combined Cache for Endurance (CCE), a scheme to enable the use of next generation high density multilevel non volatile memories in embedded systems. These memories are attractive as they can reduce the static power consumption dramatically and a single memory can be potentially used avoiding having both flash and SRAM or DRAM in a system. However, a common drawback of the new multilevel non volatile memories is that they support a limited number of write operations and thus its endurance needs to be improved to make them a viable alternative for the main memory of embedded systems. The proposed CCE relies on the fact that most writes are concentrated on a few addresses. Therefore, a small SRAM cache can be used to store positions that are frequently written. However, this would not preserve the non volatile nature of the memory. To do so, in the proposed CCE, the cache cell has an SRAM part and a non volatile part. At power up the contents of the non volatile part are copied to the SRAM and the other way around at power down. As many embedded systems execute predictable workloads, this cache is statically set to cover the most frequently written addresses. The evaluation shows that CCE can increase the endurance of the memory by several orders of magnitude. At the same time the overheads required to implement the cache are small relative to the main memory. Therefore, CCE can be an interesting option to improve the endurance of next generation high density multilevel non volatile memories.

We propose using memristor-based TCAMs (Ternary Content Addressable Memory) to accelerate Regular Expression (RegEx) matching. RegEx matching is a key function in network security, where deep packet inspection finds and filters out malicious actors. However, RegEx matching latency and power can be incredibly high and current proposals are challenged to perform wire-speed matching for large scale rulesets. Our approach dramatically decreases RegEx matching operating power, provides high throughput, and the use of mTCAMs enables novel compression techniques to expand ruleset sizes and allows future exploitation of the multi-state (analog) capabilities of memristors. We fabricated and demonstrated nanoscale memristor TCAM cells. SPICE simulations investigate mTCAM performance at scale and a mTCAM power model at 22nm demonstrates 0.2 fJ/bit/search energy for a 36x400 mTCAM. We further propose a tiled architecture which implements a Snort ruleset and assess the application performance. Compared to a state-of-the-art FPGA approach (2 Gbps,~1W), we show x4 throughput (8 Gbps) at 60% the power (0.62W) before applying standard TCAM power-saving techniques. Our performance comparison improves further when striding (searching multiple characters) is considered, resulting in 47.2 Gbps at 1.3W for our approach compared to 3.9 Gbps at 630mW for the strided FPGA NFA, demonstrating a promising path to wire-speed RegEx matching on large scale rulesets.

In-Memory Computation (IMC), which is capable of reducing the power consumption and bandwidth requirement resulting from the data transfer between the processing and memory units, has been considered as a promising technology to break the von-Neumann bottleneck. In order to develop an effective and efficient IMC platform, the performance, such as density, operation speed and power consumption, of the memory itself is one of the most important keys. In this work, we report a cross-point magnetic random access memory (MRAM) with diode selector for IMC implementation. The memory cell consists of a magnetic tunnel junction (MTJ) device and a diode connected in series. The memory cells are arranged in a cross-point array structure, providing high storage density. The MTJ can be switched through the unipolar precessional voltage-controlled magnetic anisotropy (VCMA) effect, thus enabling high speed and low power. Further, Boolean logic functions can be realized via regular memory-like write & read operations. The feasibility and performance of the proposed IMC in the crosspoint MRAM are successfully demonstrated with hybrid VCMA-MTJ/CMOS circuit simulations under the 40 nm technology node.

In this paper, we explore the possibility of hardware acceleration implementation of sparse coding algorithm with spintronic devices by a series of design optimizations across the architecture, circuit and device. Firstly, a domain wall motion (DWM) based compound spintronic device (CSD) is engineered and modelled, which is envisioned to achieve multiple conductance states. Sequentially, a parallel architecture is presented based on a dense cross-point array of the proposed DWM based CSD, where each dictionary (D) value can be mapped into the conductance of the proposed DWM based CSD at the corresponding cross-point. Benefitting from its massively parallel read and write operation, such proposed parallel architecture can accelerate the selected sparse coding algorithm using a designed dedicated periphery read and write circuit. Experimental results show that the selected sparse coding algorithm can be accelerated by 1400x with the proposed parallel architecture in comparison with software implementation. Moreover, its energy dissipation is 8 orders of magnitude smaller than that with software implementation.

In this paper, a new approach of RAM circuits, using Quantum-dot Cellular Automata (QCA), based on programmable crossbar architecture, is presented. In addition, a methodology for 2n bits RAMs is presented. Using the aforementioned methodology any designer can design a RAM regardless of its size. The proposed designs utilize the benefits of QCA programmable crossbar architecture. Namely, the RAM circuit is characterized by regularity and the ability of customization. The features that the proposed RAM design methodology has, allow the designers to use the available circuit area efficiently.

Nano-crossbar arrays have emerged as area and power efficient structures with an aim of achieving high performance computing beyond the limits of current CMOS. Due to the stochastic nature of nano-fabrication, nano arrays show different properties both in structural and physical device levels compared to conventional technologies. Mentioned factors introduce random characteristics that need to be carefully considered by synthesis process. For instance, a competent synthesis methodology must consider basic technology preference for switching elements, defect or fault rates of the given nano switching array and the variation values as well as their effects on performance metrics including power, delay, and area. Presented synthesis methodology in this study comprehensively covers the all specified factors and provides optimization algorithms for each step of the process.

Resistive memories are promising candidates for non-volatile memories. Write disturb is one of problems that facing this kind of memories. In this paper, the write disturb problem is mathematically formulated in terms of the bias parameters and optimized analytically. A closed form solution for the optimal bias parameters is calculated. Results are compared with the 1/2 and 1/3 bias schemes showing a significant improvement.

The huge amounts of physical possibilities offered by the emerging nanotechnologies must be accompanied, beyond the uniform growing mechanisms supposed by the current serial and/or parallel extensions, by an appropriate structuring mechanism able to support efficiently the increasing functional demands. A recursive growing mechanism is proposed for the upcoming Nano-Era. The current growing mechanism involves only pure quantitative aspects. We consider as mandatory, for the very big sized systems, another mechanism which interleaves the quantitative aspects with the functional ones. Because the computational parallelism is implicit for the big sized systems, the growing mechanism must be supported also by an appropriate computational model. For the current systems we started from gates. For Nano-Era structuring mechanism we will start from cellular automata. The main difference is that for nanoarchitectures the growing mechanism and the featuring mechanism are unified in an unique recursive mechanism.

The demise of Moore's law, breakdown of Dennard Scaling, dark silicon phenomenon, process variation, leakage currents and quantum tunneling are some of the hurdles faced in the further advancement of computing systems today. As a result, there is a renewed interest in alternate computing paradigms using emerging nanoelectronic devices. This work uses free binary decision diagrams (FBDDs) for computer-aided design (CAD) of compact memristive crossbars for sneak-path based in-memory computing. The absence of a fixed variable ordering makes FBDDs more compact than their ordered counterpart called reduced ordered binary decision diagrams (ROBDDs). Our design has used the size of the circuit-representation of Boolean functions for selecting different variable orderings along different paths which results in compact FBDDs. We have demonstrated our approach by designing compact crossbars for a four-bit multiplier and other RevLib benchmarks. Our synthesis process yields a 50.1% reduction in area over the previous FBDD-based synthesis for the fourth-output-bit of the multiplier. Overall, our approach has reduced the multiplier area by 20.1%.

Truly polymorphic circuits, whose functionality/circuit behavior can be altered using a control variable, can provide tremendous benefits in multi-functional system design and resource sharing. For secure and fault tolerant hardware designs these can be crucial as well. Polymorphic circuits work in literature so far either rely on environmental parameters such as temperature, variation etc. or on special devices such as ambipolar FET, configurable magnetic devices, etc., that often result in inefficiencies in performance and/or realization. In this paper, we introduce a novel polymorphic circuit design approach where deterministic interference between nano-metal lines is leveraged for logic computing and configuration. For computing, the proposed approach relies on nano-metal lines, their interference and commonly used FETs. For polymorphism, it requires only an extra metal line that carries the control signal. In this paper, we show a wide range of crosstalk polymorphic logic gates and their evaluation results. We also show an example of a large circuit that performs both the functionalities of multiplier and sorter depending on the configuration signal. A comparison is made with respect to other existing approaches in literature, and transistor count is benchmarked. For crosstalk-polymorphic circuits, the transistor count reduction range from 25% to 83% with respect to various other approaches. For example, polymorphic AOI21-OA21 cell show 83%, 85% and 50% transistor count reduction, and Multiplier-Sorter circuit show 40%, 36% and 28% transistor count reduction with respect to CMOS, genetically evolved, and ambipolar transistor based polymorphic circuits, respectively.

Analog to Digital Converters (ADCs) is the core component of computing systems forming a link between the external stimuli and digital microprocessor operations. Current CMOS based fast ADCs are difficult to scale due to the reliance on transistor sizing and high voltage operations. They also suffer from high power consumption. In this paper, we introduce a novel ADC design which uses the deterministic signal interference between metal lines as a mechanism for signal conversion. In contrast to CMOS ADCs, our approach uses a simple crosstalk tree network of metal lines to convert sampled analog levels to digital code. Here, the sampled analog signal is passed through an input metal line which is capacitively coupled to a series of metal lines in a tree-like layout, and the coupled voltages on the edge of the tree (the leaves) determine the output. The resolution is dependent on the number of branches. We show 2-bit and 3-bit ADC implemented through this mechanism at 16n technology node. Our results indicate the possibility of huge power savings with Crosstalk ADCs in comparison to CMOS; for 2-bit and 3-bit ADCs the power consumption was found to be 43.51μW and 96.74μW respectively at 50M Hz sampling frequency.

Interconnects often constitute the major bottleneck in the design process of low power integrated circuits (IC). Although 2.5-D integration technologies support physical proximity, the dissipated power in the communication links remains high. In this work, the additional power savings for interposer-based interconnects enabled by low swing signaling is investigated. The energy consumed by a low swing scheme is, therefore, compared with a full swing solution and the critical length of the interconnect, above which the low swing solution starts to pay off, is determined for diverse interposer technologies. The energy consumption is compared for three different substrate materials, silicon, glass, and organic. Results indicate that the higher the load capacitance of the communication medium is, the greater the energy savings of the low swing circuit are. Specifically, in cases that electrostatic discharge (ESD) protection is required, the low swing circuit is always superior in terms of energy consumption due to the high capacitive load of the ESD circuit, regardless the substrate material and the link length. Without ESD protection, the highest critical length is about 380 μm for glass and organic interposers. To further explore the limits of power reduction from low swing signaling for 2.5-D ICs, the effect of typical interconnect parameters such as width and space on the energy efficiency of low swing communication is evaluated.

This paper presents a 4T-based SRAM bitcell optimized both for write and read operations at ultra-low voltage (ULV). The proposed bitcell is designed to respond to the requirements of energy constrained systems, as in the case of most IoT-oriented circuits and applications. The use of 3D CoolCubeTM technology enables the design of a stable 4T SRAM bitcell by using data-dependent back biasing. The proposed bitcell architecture provides a major reduction of the write operation energy consumption compared to a conventional 6T bitcell. A dedicated read port coupled to a virtual GND (VGND) ensures a full functionality at ULV of read operations. Simulation results show reliable operations down to 0.35 V close to six sigma (6 σ) without any assist techniques (e.g. negative bitlines), achieving in worst case corner 300 ns and 125 ns in write and read access time, respectively. A 6x energy consumption reduction compared to a ULV ultra-low-leakage (ULL) 6T bitcell is demonstrated.

This paper proposes a novel processing architecture to extract Mel-Frequency Cepstrum Coefficients (MFCC) for automatic speech recognition. Inspired by the human ear, the energy-efficient analog-domain information processing is adopted to replace the energy-intensive Fourier Transform in conventional digital-domain. Moreover, the proposed architecture extracts the acoustic features in the mixed-signal domain, which significantly reduces the cost of Analog-to-Digital Converter (ADC) and the computational complexity. We carry out the circuit-level simulation based on 180nm CMOS technology, which shows an energy consumption of 2.4 nJ/frame, and a processing speed of 45.79 μs/frame. The proposed architecture achieves 97.2% energy saving and about 6.4x speedup than state of the art. Speech recognition simulation reaches the classification accuracy of 99% using the proposed MFCC features.

Energy is the heart to drive any device, such as any machine. As researchers have been trying to perform low energy operations more and more, energy requirements are turning out to be one of the key features in measuring the performance of a device. On the other hand, as conventional silicon-based computing is approaching a barrier, needs of non-conventional computing is increasing. Though several such computing platforms have arisen to prove itself as a suitable alternative to silicon-based computing, less energy requirement is certainly one of the most sought features in the competition among the new platforms. Moreover, there are certain scenarios where performing calculations in pure bio-molecular ways are highly desired. Although DNA computing has already flagged the success of bio-molecular computing in terms of energy/power requirements, its manual nature keeps it behind from other computing techniques. Another new bio-molecular computing technique Ribosomal Computing, though still in infancy, has shown real promises due to its inherent automation. This work performs an analysis of the energy/power requirements of this computing technique. With the promising result obtained, ribosomal computing can claim itself as a promising computing technique, if combined with its inherent automation.

The failure susceptibility of the quantum hardware will force quantum computers to execute fault-tolerant quantum circuits. These circuits are based on quantum error correcting codes, and there is increasing evidence that one of the most practical choices is the surface code. Design methodologies of surface code based quantum circuits were focused on the layout of such circuits without emphasizing the reduced availability of hardware and its effect on the execution time. Circuit layout has not been investigated for practical scenarios, and the problem presented herein was neglected until now. For achieving fault-tolerance and implementing surface code based computations, a significant amount of computing resources (hardware and time) are necessary for preparing special quantum states in a procedure called distillation. This work introduces the problem of how distilleries (circuit portions responsible for state distillation) influence the layout of surface code protected quantum circuits, and analyses the tradeoffs for reducing the resources necessary for executing the circuits. A first algorithmic solution is presented, implemented and evaluated for addition quantum circuits.

Quantum-dot fabrication is a well-established nanotechnology, which have many applications in many different scientific fields. By placing four quantum-dots on the corners of a square, a cell is formed, in which the digital information can be stored. This cell serves as the structural device of Quantum-dot Cellular Automata (QCA) circuits. After QCA presentation, several digital circuits and systems have been designed and proposed in the literature. However, one of the biggest problems QCA designers have to face to pave the successful design of functional and large scale QCA circuits is signal synchronization. In this paper, a novel approach of the aforementioned problem is presented. This approach is inspired by the well known computational problem of Firing Squad Synchronization (FSS). FSS problem has many similarities with large scale QCA circuits synchronization problem. In addition, FSS problem has been studied by many researchers and many efficient solutions have been proposed in the literature.

Majority-inverter graphs (MIGs) are a logic representation with remarkable algebraic and Boolean properties that enable efficient logic optimizations beyond the capabilities of traditional logic representations. Further, since many nano-emerging technologies, such as quantum-dot cellular automata (QCA) or spin torque majority gates (STMG), are inherently majority-based, MIGs serve as a natural logic representation to map into these technologies. So far, MIG optimization methods predominantly target to reduce the depth of the logic networks, corresponding to low delay implementations in the respective technologies. In this paper, we introduce several methods to optimize the size of MIGs. They can be applied such that the depth of the logic network is preserved; therefore our methods have a direct effect on the physical area, without worsening the delay. Some methods are inspired by existing size optimization algorithms for non-majority-based logic networks, others make explicit use of the majority function and its properties. All methods are Boolean---in contrast to algebraic optimization methods---which has a positive effect on the quality but challenges their implementation. Our experiments show that using our methods the size of MIGs in the EPFL combinational benchmark suite can be reduced by up to 7.12%. When mapped to QCA and STMG technologies we reduce the average area-delay-energy product by 2.31% and 2.07%, respectively.

Development of quantum simulators is a major step towards the universal quantum computer. Quantum simulators are quantum systems that can perform specific quantum computations, or software packages that can reproduce most of the aspects of a general universal quantum computer on a general purpose classical computer. Development of quantum simulators using digital circuits, such as FPGAs is very difficult, mainly because the unit of quantum information, the qubit, has an infinite number of states, whereas the classical bit has only two. On the other hand, analog circuits comprising R, L and C elements have no internal state variables that can be used to reproduce and store qubit states. Here we take the first step towards the development of a new quantum simulator using memristors. The qubit state is mapped to a 3D space spanned by the memristances of three identical memristors. The qubit state evolution is reproduced by the input voltages applied to the memristors. We define the correspondence between the general qubit state rotation, i.e. the one-qubit quantum gates, and memristor input voltage variations and reproduce the rotations imposed by the action of quantum gates in the 3D memristance space. Our results show that, at least in principle, qubits and one-qubit quantum gates can be simulated by memristors.