NVIDIA
Director, Computational Imaging Group, Tegra Product Team 2013-Present
Leading both the architecture and RTL teams for NVIDIA's computational imaging pipeline (camera), which is responsible for
the processing and enhancements of raw sensor images in the Tegra camera pipeline. Developing a hardware pipeline (Image
Signal Processor) to convert raw Bayer sensor data to high-quality images and video. Focusing on developing new
imaging features and supporting SOC system architectures and designs. Working to improve the 3As (auto-focus, auto-white
balance, auto-exposure), noise reduction and image quality.

Senior Research Manager, NVIDIA Research 2011-2013
Responsible for a cross-functional research team designing GPU-based SOC architectures for future systems ranging from
high-performance computing to graphics to mobile devices. Focusing on a broad application space - graphics, bioinformatics,
web servers, computer vision, machine learning, databases, simulations, etc. Built a cross-functional, matrixed team to focus
on all aspects of the SOC system design - applications, architecture, micro-architecture, implementation, circuit technology,
compilers, system software, and more. Successfully designed and executed a product transition plan in which both NV
Research ideas and staff were transferred to the product groups.

Senior Architecture Manager, GPU System Architecture 2008-2011
Responsible for the architectural development of a significant portion of NVIDIA's Kepler generation of GPU System-On-Chip
(SOC) system architecture focusing on the memory system. This work included the on-chip interconnect (network-on-chip),
memory controller, memory caches, virtual memory, memory access protocols, system interface, and multi-chip interconnects.
Explored CPU/GPU interactions and supporting memory structures including hardware-based coherent cache architectures.
Responsible for the architecture, micro-architecture, performance modeling and validation, functional modeling and validation
of the GPU memory system. Responsibilities included current GPU development, next-generation GPU development and
roadmap development for future GPU memory systems. Managed multiple GPUs in development, each with multiple teams
spread across multiple sites. Worked closely with hardware and physical design teams.

Architecture Unit Lead, GPU Architecture 2004-2008
Architecture unit lead responsible for the development of the memory system including the on-chip interconnect (NoC), memory
protocols, memory cache and virtual memory system of NVIDIA's Fermi generation of GPU SOCs. Responsibilities included the
development of the architecture, micro-architecture, performance modeling and validation, and functional modeling and
validation. Drove multiple performance analysis efforts that impacted full-chip performance. Also responsible for the unit
management - plans, schedule, staffing, assignments, etc.

Architect, GPU Architecture 2003 - 2004
Responsible for the development of the virtual memory architecture for NVIDIA's GPU in support of Microsoft's Longhorn
Device Driver Model (LDDM). Contributed to many other aspects of architecture work on NVIDIA GPUs. Contributed to and
later drove NVIDIA's involvement in Microsoft's Virtualized Graphics effort.

Newisys, Inc. Austin, TX
Chief Architect, Silicon Development 2000 - 2003
Responsible for the overall architectural and micro-architectural development of Newisys' CC-NUMA cache coherence
controller and low-latency packet switch (0.13 um ASIC technology) for scalable multiprocessor systems based on AMD's
Opteron x86-64 processors (Horus). Architected the overall system design, coherence controller design and coherence
protocol. Co-developed the design's micro-architecture. Responsible for the extended HyperTransport protocol development,
which included coherence directory and remote data cache functionality, and subsequent behavioral modeling. Drove the
development of a cycle-accurate performance model and subsequent analysis. Worked closely with the BIOS and service
processor development teams.

Drove the development of an advanced prototyping environment for analysis and validation of the multiprocessor system. Built
the prototype on top of existing 2-processor Opteron systems. Effort included cross-functional technical leadership, system
design, board design, software development, logic design, logic synthesis and place & route. Utilized high-end Xilinx FPGAs to
prototype the controller design.

Developed and drove the Newisys intellectual property development for the scalable multiprocessor systems. Built a strong
patent portfolio (40+ patents). Developed a blocking IP strategy and reviewed it with two external IP law firms.

Technical Manager, Architecture Group/Chip Design Group 2000 - 2003
Built and managed two groups within Newisys: initially the chip design group and finally the architecture group. Chip design
group responsibilities included project plan development, interviewing and hiring, culture development, micro-architecture and
initial RTL development. Built an advanced architecture group once the design group was stable. Architecture group
responsibilities included performance modeling and analysis; prototype development, validation and behavior analysis; cache
coherence protocol development and analysis; and future product development.

Intel/Texas Development Center, Desktop Products Group Austin, TX
Architect/Technical Manager, CPU System Cluster 1999 - 2000
Technical manager responsible for building and leading an engineering team that was responsible for the system components
of a high performance, IA-32 processor with integrated memory controller and micro-architectural support for multiple,
heterogeneous on-die cores. Provided technical leadership for the team's efforts, which included multiprocessor cache
coherence protocol architecture and development; protocol engine micro-architecture and implementation; memory controller
micro-architecture and implementation; protocol formal verification; and system level performance modeling and analysis.

IBM Research/Austin Research Laboratory Austin, TX
Architect/Research Staff Member 1996 - 1999
Provided technical leadership for a small research team that successfully implemented and demonstrated IBM's first
Intel-based CC-NUMA hardware prototype. The team architected, designed and implemented the system using a combination
of off-the-shelf components and programmable logic. Functionality included a patented hardware performance monitor to
understand system performance. Worked closely with the CC-NUMA software team to understand system performance and
drive performance monitoring and enhancements into Windows NT and SCO UnixWare. See "Experience with building a
commodity Intel-based ccNUMA system" and "Windows NT in a CC-NUMA System."

Architected, designed and implemented the cache coherence mechanism for IBM's first functional PowerPC-based CC-NUMA
hardware prototype. Work included the development of the cache coherence protocol and the micro-architecture &
implementation of the coherence directory and pending request mechanism. Co-architected the overall CC-NUMA adapter.
Demonstrated hardware functionality of a three-node CC-NUMA system implemented using high-speed programmable logic
(FPGA). Developed several innovative and patented protocol features to overcome deficiencies in the PowerPC bus
architecture.

HaL Computer Systems Campbell, CA
Architect/Verification Engineer 1994 - 1995
Developed a verification strategy, which was based on high-level modeling (HLM), for a CC-NUMA system. The strategy
included formal verification (FV) of the cache coherence protocol and a verification tool that was able to compare the results of
cycle and non-cycle accurate models. Implemented portions of the HLM using Verilog and developed early FV models of the
protocol. Worked with Prof. Dill of Stanford to improve the FV tool and methodology - funding one graduate student.

MIT Lincoln Laboratory Boston, MA
Micro-Architect & Logic Design Engineer 1988 - 1990
Architected a radar adaptive nulling hardware prototype designed around a systolic array, which was constructed from an array
of custom CORDIC data processors. Effort included the design of a high-speed dual banked memory system and data path
control logic for the systolic array. Additionally, the effort included system design, board design, and discrete and
programmable logic. System included a micro-controller that required extensive programming. Significant software was also
developed on the host systems to feed data to the systolic array, analyze output data, and present results. Successfully
demonstrated the system in both an IBM PC and Sun workstation environment.

Artisoft, Inc Tucson, AZ
Logic Design Engineer 1985 - 1987
While attending U of Arizona, worked as a part-time engineer and developed several products for the IBM PC including a
hardware access control card, laptop-to-desktop networking system software and portions of a local area network card and
software. Involved in all aspects of product development including product conception, logic design, implementation, board
design, debug, verification, manufacturing and marketing.

Academic Experience

University of Texas at Austin Austin, TX
Ph.D. Committee, EE Department 2001-2003
Participated in the orals committee for a Ph.D. student in the Electrical Engineering department. The student's work focused on
high-end processor design with an emphasis on power-wise design.

Stanford University Stanford, CA
Consulting Assistant Professor, EE Department 1995 - 1999
Consulting Professor working with Professor Michael Flynn. Developed and taught a graduate level course on shared-memory
multiprocessors. Obtained an industrial grant to fund research in fault-tolerant multiprocessors. Actively participated in the
research and advised a graduate student funded by this grant. Graduated one Ph.D. student.

Stanford University Stanford, CA
Research Assistant, Ph.D. Degree Program, EE Department 1990 - 1994
Designed update-based cache coherence protocols for scalable shared-memory multiprocessors. Designed protocols for both
distributed and centralized directory structures. Developed a set of architectural models for shared-memory multiprocessors
and several shared-memory applications. Analyzed the performance of the update-based protocols with respect to common
invalidate-based protocols through full system simulations. Identified protocol limitations and evaluated possible protocol
enhancements to overcome these limitations. Formally verified the update-based protocols using the Murphi model
checking tool from Stanford.