Gilbert Wolrich, Framingham US

Gilbert Wolrich, Framingham, MA US

Patent application number

Description

Published

20080240421

Method and apparatus for advanced encryption standard (AES) block cipher - The speed at which encrypt and decrypt operations may be performed in a general purpose processor is increased by providing a separate encrypt data path and decrypt data path. With separate data paths, each of the data paths may be individually optimized in order to reduce delays in a critical path. In addition, delays may be hidden in a non-critical last round.

10-02-2008

20080240422

Efficient advanced encryption standard (AES) Datapath using hybrid rijndael S-Box - The speed at which an AES decrypt operation may be performed in a general purpose processor is increased by providing a separate decrypt data path. The critical path delay of the aes decrypt path is reduced by combining multiply and inverse operations in the Inverse SubBytes transformation. A further decrease in critical path delay in the aes decrypt data path is provided by merging appropriate constants of the inverse mix-column transform into a map function.

10-02-2008

20080304659

METHOD AND APPARATUS FOR EXPANSION KEY GENERATION FOR BLOCK CIPHERS - A key scheduler performs a key-expansion to generate round keys for AES encryption and decryption just-in-time for each AES round. The key scheduler pre-computes slow operations in a current clock cycle to reduce the critical delay path for computing the round key for a next AES round.

12-11-2008

20090003593

UNIFIED SYSTEM ARCHITECTURE FOR ELLIPTIC-CURVE CRYTPOGRAPHY - A system for performing public key encryption is provided. The system supports mathematical operations for a plurality of public key encryption algorithms such as Rivert, Shamir, Aldeman (RSA) and Diffie-Hellman key exchange (DH) and Elliptic Curve Cryptosystem (ECC). The system supports both prime fields and different composite binary fields.

01-01-2009

20090003594

MODULUS SCALING FOR ELLIPTIC-CURVE CRYPTOGRAPHY - Modulus scaling applied a reduction techniques decreases time to perform modular arithmetic operations by avoiding shifting and multiplication operations. Modulus scaling may be applied to both integer and binary fields and the scaling multiplier factor is chosen based on a selected reduction technique for the modular arithmetic operation.

01-01-2009

20090003595

SCALE-INVARIANT BARRETT REDUCTION FOR ELLIPTIC-CURVE CYRPTOGRAPHY - The computation time to perform scalar point multiplication in an Elliptic Curve Group is reduced by modifying the Barrett Reduction technique. Computations are performed using an N-bit scaled modulus based a modulus m having k-bits to provide a scaled result, with N being greater than k. The N-bit scaled result is reduced to a k-bit result using a pre-computed N-bit scaled reduction parameter in an optimal manner avoiding shifting/aligning operations for any arbitrary values of k, N.

POLYNOMIAL-BASIS TO NORMAL-BASIS TRANSFORMATION FOR BINARY GALOIS-FIELDS GF(2m) - Basis conversion from polynomial-basis form to normal-basis form is provided for both generic polynomials and special irreducible polynomials in the form of “all ones”, referred to as “all-ones-polynomials” (AOP). Generation and storing of large matrices is minimized by creating matrices on the fly, or by providing an alternate means of computing a result with minimal hardware extensions.

01-01-2009

20090006512

NORMAL-BASIS TO CANONICAL-BASIS TRANSFORMATION FOR BINARY GALOIS-FIELDS GF(2m) - Basis conversion from normal form to canonical form is provided for both generic polynomials and special irreducible polynomials in the form of “all ones”, referred to as “all-ones-polynomials” (AOP). Generation and storing of large matrices is minimized by creating matrices on the fly, or by providing an alternate means of computing a result with minimal hardware extensions.

MEMORY CONTROLLERS FOR PROCESSOR HAVING MULTIPLE PROGRAMMABLE UNITS - A parallel hardware-based multithreaded processor is described. The processor includes a general purpose processor that coordinates system functions and a plurality of microengines that support multiple hardware threads. The processor also includes a memory control system that has a first memory controller that sorts memory references based on whether the memory references are directed to an even bank or an odd bank of memory and a second memory controller that optimizes memory references based upon whether the memory references are read references or write references.

01-22-2009

20090119671

REGISTERS FOR DATA TRANSFERS - A system and method for employing registers for data transfer in multiple hardware contexts and programming engines to facilitate high performance data processing. The system and method includes a processor that includes programming engines with registers for transferring data from one of the registers residing in an executing programming engine to a subsequent one of the registers residing in an adjacent programming engine.

05-07-2009

20090157784

DETERMINING A MESSAGE RESIDUE - A description of techniques of determining a modular remainder with respect to a polynomial of a message comprised of a series of segments. An implementation can include repeatedly accessing a strict subset of the segments and transforming the strict subset of segments to into a smaller set of segments that are equivalent to the strict subset of the segments with respect to the modular remainder. The implementation can also include determining the modular remainder based on a set of segments output by the repeatedly accessing and transforming and storing the determined modular remainder.

06-18-2009

20090158132

Determining a message residue - In one aspect, circuitry to determine a modular remainder with respect to a polynomial of a message comprised of a series of segment. In another aspect, circuitry to access at least a portion of a first number having a first endian format, determine a second number based on a bit reflection and shift of a third number having an endian format opposite to that of the first endian format, and perform a polynomial multiplication of the first number and the at least a portion of the first number.

06-18-2009

20090164546

METHOD AND APPARATUS FOR EFFICIENT PROGRAMMABLE CYCLIC REDUNDANCY CHECK (CRC) - A method and apparatus to optimize each of the plurality of reduction stages in a Cyclic Redundancy Check (CRC) circuit to produce a residue for a block of data decreases area used to perform the reduction while maintaining the same delay through the plurality of stages of the reduction logic. A hybrid mix of Karatsuba algorithm, classical multiplications and serial division in various stages in the CRC reduction circuit results in about a twenty percent reduction in area on the average with no decrease in critical path delay.

06-25-2009

20090168999

Method and apparatus for performing cryptographic operations - In one embodiment, the present invention includes a processor having logic to perform a round of a cryptographic algorithm responsive to first and second round micro-operations to perform the round on first and second pairs of columns, where the logic includes dual datapaths that are half the width of the cryptographic algorithm width (or smaller). Additional logic may be used to combine the results of the first and second round micro-operations to obtain a round result. Other embodiments are described and claimed.

REGISTER SET USED IN MULTITHREADED PARALLEL PROCESSOR ARCHITECTURE - A parallel hardware-based multithreaded processor is described. The processor includes a general purpose processor that coordinates system functions and a plurality of microengines that support multiple hardware threads or contexts. The processor maintains execution threads. The execution threads access a register set organized into a plurality of relatively addressable windows of registers that are relatively addressable per thread.

12-10-2009

20100141488

ACCELERATED DECOMPRESSION - Techniques for decompressing a compressed input by determining, according to an ordering of allowable codewords, an offset for a variable length codeword detected in the input; accessing a record at the determined offset in a data structure having one record for each of the allowable codewords, each record including a portion for at least one of a literal value and a length value and a portion for a type value indicative of whether the record is for a literal or a length; and determining a decompressed output based at least in part on the accessed record.

06-10-2010

20100153829

RESIDUE GENERATION - In one embodiment, circuitry is provided to generate a residue based at least in part upon operations and a data stream generated based at least in part upon a packet. The operations may include at least one iteration of at least one reduction operation including (a) multiplying a first value with at least one portion of the data stream, and (b) producing a reduction by adding at least one other portion of the data stream to a result of the multiplying. The operations may include at least one other reduction operation including (c) producing another result by multiplying with a second value at least one portion of another stream based at least in part upon the reduction, (d) producing a third value by adding at least one other portion of the another stream to the another result, and (e) producing the residue by performing a Barrett reduction based at least in part upon the third value.

06-17-2010

20100205455

DIFFUSION AND CRYPTOGRAPHIC-RELATED OPERATIONS - An embodiment includes at least one processing unit to perform at least first and second sets of diffusion-related operations to produce a resulting block from a data block, and that includes at least one stage and at least one other stage. The at least one stage is to select one of first operands and second operands input to the at least one other stage. The first and second operands are respectively associated with the first and second sets of operations, respectively. The at least one other stage involves arithmetic and logical operations common to both the first and second sets of operations. At least one other processing unit is to perform at least one set of cryptographic-related operations (different, at least in part, from the first and second sets of operations) on at least one of (1) another block to produce the data block and (2) the resulting block.

QUEUE ARRAYS IN NETWORK DEVICES - A queue descriptor including a head pointer pointing to the first element in a queue and a tail pointer pointing to the last element in the queue is stored in memory. In response to a command to perform an enqueue or dequeue operation with respect to the queue, fetching from the memory to a cache only one of either the head pointer or tail pointer and returning to the memory from the cache portions of the queue descriptor modified by the operation.

05-12-2011

20130191699

INSTRUCTION-SET ARCHITECTURE FOR PROGRAMMABLE CYCLIC REDUNDANCY CHECK (CRC) COMPUTATIONS - A method and apparatus to perform Cyclic Redundancy Check (CRC) operations on a data block using a plurality of different n-bit polynomials is provided. A flexible CRC instruction performs a CRC operation using a programmable n-bit polynomial. The n-bit polynomial is provided to the CRC instruction by storing the n-bit polynomial in one of two operands.

07-25-2013

20140019694

PARALLELL PROCESSING OF A SINGLE DATA BUFFER - Technologies for executing a serial data processing algorithm on a single variable-length data buffer includes padding data segments of the buffer, streaming the data segments into a data register and executing the serial data processing algorithm on each of the segments in parallel.