
Scalable Performance Optimized RDMA Detailed Statistics

Publishing Venue

The IP.com Prior Art Database

Abstract

Statistics acquisition in highly concurrent systems is typically addressed by a best effort software design pattern. Speed takes precedence over accuracy. Said approach suffices for a macroscopic view of the product operation, however it does not address the microscopic view necessary for SW service/maintainability. We detail a mechanism whereby both speed and accuracy are achieved in a massively parallel application using RDMA protocols.

Country

Undisclosed

Language

English (United States)

This text was extracted from a PDF file.

This is the abbreviated version, containing approximately 49% of the total text.



For massively parallel, high-performance concurrent applications and use cases where statistics accuracy is required, we know of no alternative method, beyond the invention described herein, that achieves these goals.

Statistics acquisition in highly concurrent systems is typically addressed by a best-effort software design pattern: speed takes precedence over accuracy. This approach suffices for a macroscopic view of product operation; however, it does not address the microscopic view necessary for software serviceability and maintainability. A few examples are:

• Potential lack of accuracy in micro-benchmarks.

• Complications in debugging low-level drivers using statistics.
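The speed-versus-accuracy trade-off above can be sketched as a pair of counter-update paths. This is an illustrative example, not code from the disclosure: a plain increment races under concurrency and can silently lose updates, while an atomic increment is exact but contends for the counter's cache line on every update.

```c
/* Illustrative sketch (names are hypothetical, not from the disclosure). */
#include <stdatomic.h>
#include <stdint.h>

/* Best effort: a plain read-modify-write. Concurrent callers can race
 * and drop increments, so the counter drifts low -- fine for a
 * macroscopic view, unusable for micro-benchmarks or driver debug. */
static uint64_t rx_pkts_best_effort;

static void count_rx_best_effort(void)
{
    rx_pkts_best_effort++;   /* may lose updates under contention */
}

/* Accurate: an atomic increment. No updates are lost, but every CPU
 * now contends for the same cache line holding the counter. */
static _Atomic uint64_t rx_pkts_accurate;

static void count_rx_accurate(void)
{
    atomic_fetch_add_explicit(&rx_pkts_accurate, 1, memory_order_relaxed);
}
```

Single-threaded, both paths count identically; the divergence only appears under concurrent callers, which is exactly the condition the disclosure targets.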

Known solutions to this problem

Statistics acquisition models for NIC and FC adapters typically use a single statistics structure encompassing all relevant resources. At the device-driver level, this structure tracks both hardware- and software-generated statistics.

At 1 Gbps and 10 Gbps NIC/FC speeds, this single structure is not a contention point, as the number of resources (i.e., parallel queues and engines) is quite small, typically 2-4 queues (Fig. 1). As line speeds increase to 40 Gbps+ and 100 Gbps+, the number of resources concurrently operating on the same statistics structure becomes a contention point, typically >=16 queues/engines (Fig. 2).
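The traditional single-structure model can be sketched as follows (a hypothetical layout, not the actual driver code): one per-device structure holds every counter, and each queue's completion path updates it. With 2-4 queues the shared updates are cheap; at >=16 queues every CPU is hammering the same few cache lines.

```c
/* Hypothetical sketch of the single shared statistics structure
 * described above; field names are illustrative. */
#include <stdatomic.h>
#include <stdint.h>

struct dev_stats {                /* one instance per adapter */
    _Atomic uint64_t tx_pkts;
    _Atomic uint64_t rx_pkts;
    _Atomic uint64_t tx_errs;
    _Atomic uint64_t rx_errs;
};

static struct dev_stats stats;    /* persists for the driver lifetime */

/* Called from every queue's completion path. All queues, on all CPUs,
 * update the same structure -- the contention point at high queue
 * counts. */
static void queue_complete_rx(void)
{
    atomic_fetch_add_explicit(&stats.rx_pkts, 1, memory_order_relaxed);
}
```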

Most importantly, for both the NIC and FC models, the statistics reside in persistent memory: the structure containing the statistics persists for the lifetime of the driver, which also matches the lifetime of the resources.

If the adapter is RDMA capable, the number of concurrent queues is typically >2000 (Fig. 3). Clearly, if the previously discussed models cause contention at 16 queues, then the scale of RDMA queues requires rethinking the software design patterns to achieve both speed and accuracy in statistics counters. Furthermore, RDMA resources are backed by dynamic memory. The resources are volatile, meaning they are constantly being created and destroyed. The protocol is largely analogous to sockets, where each instance comes and goes independently of the associated statistics.
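One way to reconcile volatile resources with driver-lifetime statistics, sketched here under assumed names (this is an illustration of the lifetime mismatch, not the disclosed mechanism), is to allocate counters with the queue pair and fold them into a persistent aggregate when the resource is destroyed:

```c
/* Hypothetical sketch: per-QP statistics live and die with the QP,
 * but their totals are folded into a persistent driver-lifetime
 * aggregate at destroy time. All names are illustrative. */
#include <stdint.h>
#include <stdlib.h>

struct qp_stats {
    uint64_t completions;         /* updated only by this QP's queue */
};

struct qp {
    struct qp_stats stats;
    /* ... other QP state ... */
};

static uint64_t total_completions;  /* persistent, driver lifetime */

struct qp *qp_create(void)
{
    return calloc(1, sizeof(struct qp));
}

void qp_destroy(struct qp *qp)
{
    total_completions += qp->stats.completions;  /* fold before free */
    free(qp);
}
```

This keeps the hot path private to one queue, at the cost of an aggregation step whenever a resource is torn down or the totals are read.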

The drawback of the aforementioned statistics solution is that a shared statistics structure for a chatty protocol such as RDMA will cause cache bouncing across the CPU complex as the number of cores and threads scales. In a typical RDMA workload {HPC, DB, HFT}, if threads pseudo-concurrently access a statistics set within the same cache line (i.e., 128 B), a dirty-cache condition is exhibited. The resulting cache thrashing can cause system-wide performance impacts, especially on POWER* systems due to their NUMA-type architecture, in which cache updates across the CPU complex are expensive; a sample cost ordering from min to max is {Node, CEC, System}. This invention details a mechanism whereby highly concurrent IO devices such as RDMA...
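The false-sharing condition described above, and the standard countermeasure, can be sketched as follows (illustrative only; the disclosure's actual mechanism is not reproduced here): padding each slot of a per-thread counter array to the 128 B cache-line size keeps concurrent writers in distinct lines, so updates never dirty a neighbor's line.

```c
/* Hypothetical sketch: cache-line-aligned per-thread counters avoid
 * the dirty-cache bouncing of a packed shared structure. 128 B is the
 * POWER cache-line size mentioned above; slot count is arbitrary. */
#include <stdint.h>

#define CACHE_LINE 128
#define NSLOTS     64

struct per_thread_stats {
    uint64_t ops;                 /* written only by its owning thread */
} __attribute__((aligned(CACHE_LINE)));   /* pads each slot to 128 B */

static struct per_thread_stats percpu[NSLOTS];  /* one slot per thread */

/* Reads walk all slots and sum; writers never share a cache line. */
uint64_t total_ops(void)
{
    uint64_t sum = 0;
    for (int i = 0; i < NSLOTS; i++)
        sum += percpu[i].ops;
    return sum;
}
```

The trade-off is memory: 64 counters now occupy 8 KB instead of 512 B, which is why a dynamic, >2000-queue RDMA environment needs a more considered design than simple padding.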