Jim Gray
Summary Home Page

We (his colleagues in
Microsoft Research) have heard from many of his collaborators who had projects and
collaborations underway with him and are unsure how to proceed.
If you find yourself in this situation, please email grayproj@microsoft.com and we will follow up with you to find the best way
forward.

Jim
Gray is a researcher and manager of Microsoft Research's
eScience Group. His primary research
interests are in databases and transaction processing systems -- with
particular focus on using computers to make scientists more productive. He and
his group are working in the areas of astronomy, geography, hydrology,
oceanography, biology, and health care. He continues a long-standing interest
in building supercomputers with commodity components, thereby reducing the cost
of storage, processing, and networking by factors of 10x to 1000x over
low-volume solutions. This includes work on building fast networks, building
huge web servers with CyberBricks, and building very inexpensive,
very high-performance storage servers.

Jim is also working with the astronomy community to build the
world-wide
telescope and has been active in building online databases like http://terraService.Net and http://skyserver.sdss.org. When the
entire world's astronomy data is on the Internet and is accessible as a single
distributed database, the Internet will be the world's best telescope. This is
part of the larger agenda of getting all information online and easily
accessible (digital libraries, digital government, online science ...). More
generally, he is working with the science community (oceanography, hydrology,
environmental monitoring, ...) to build the world-wide digital library that
integrates all the world's scientific literature and data in one
easily-accessible collection. He is active in the research community; is an
ACM, NAE, NAS, and AAAS Fellow; and received the ACM Turing Award for his work
on transaction processing. He also edits a series of books on data
management.

“SkyServer Traffic Report – The First Five Years” is a
study of the traffic on Skyserver.sdss.org, an eScience website. Done jointly
with Vik Singh, Alex Szalay, Ani Thakar, Jordan Raddick, Bill Boroski,
Svetlana Lebedeva, and Brian Yanny, it analyzes the traffic to see how
people and programs use the site, the data, and the batch job system.

“Life Under Your Feet: An End-to-End Soil Ecology Sensor
Network, Database, Web Server, and Analysis Service,” with Katalin
Szlavecz, Andreas Terzis, Razvan Musaloiu-E, Joshua Cogan, Sam Small, Stuart
Ozer, Randal Burns, and Alex Szalay of JHU, describes an end-to-end soil monitoring
system we built and deployed in a Baltimore urban forest. Soil moisture and temperature
readings from the sensors are stored and calibrated in a database. The measurement database is
published through Web Services interfaces. In addition, analysis tools let
scientists analyze current and historical data and help manage the sensor
network.
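The data path in that system is straightforward to sketch. The following is a minimal illustration, not the Life Under Your Feet code (the schema and names are invented for the example): raw sensor reports land in a table, per-sensor calibration is applied, and a query function stands in for the Web Services layer.

```python
import sqlite3

# Minimal store-calibrate-publish sketch (hypothetical schema, not the
# actual Life Under Your Feet database).
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE raw_reading (
                 sensor_id INTEGER, ts TEXT, raw_moisture REAL, raw_temp REAL)""")
db.execute("""CREATE TABLE calibration (
                 sensor_id INTEGER PRIMARY KEY, gain REAL, bias REAL)""")

# A few raw reports from two sensors, plus per-sensor calibration constants.
db.executemany("INSERT INTO raw_reading VALUES (?,?,?,?)",
               [(1, "2006-07-01T12:00", 512, 21.3),
                (2, "2006-07-01T12:00", 498, 20.9)])
db.executemany("INSERT INTO calibration VALUES (?,?,?)",
               [(1, 0.05, -2.0), (2, 0.05, -1.5)])

def calibrated_readings(sensor_id):
    """Stand-in for the web-service query: calibrated moisture and temperature."""
    return db.execute("""SELECT r.ts,
                                r.raw_moisture * c.gain + c.bias AS moisture,
                                r.raw_temp AS temperature
                         FROM raw_reading r JOIN calibration c USING (sensor_id)
                         WHERE r.sensor_id = ?""", (sensor_id,)).fetchall()

print(calibrated_readings(1))
```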

“GPUTeraSort: High Performance Graphics Coprocessor Sorting for
Large Database Management,” with Naga K. Govindaraju, Ritesh Kumar, and Dinesh
Manocha of UNC, describes a sorter that uses the Graphics Processing Unit (GPU) to
sort very fast. I helped with the IO and with writing this report so that I
could read it :). GPUs have 10x the memory bandwidth and processing power
of the CPU, and the gap is widening, so we have to learn how to use them. This
is my first experience in this new world -- it’s a vector coprocessor, it’s a
SIMD machine, it’s really different -- and so a lot of fun. You get to rethink
all your assumptions.
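The bandwidth argument is easy to demonstrate even without the GPUTeraSort machinery itself. A minimal sketch (assuming a CUDA-capable GPU and the CuPy library, neither of which appears in the paper) simply times sorting the same keys on the CPU and on the GPU:

```python
import time
import numpy as np
import cupy as cp  # assumes a CUDA-capable GPU with CuPy installed

# 32M random 32-bit keys.
keys = np.random.randint(0, 2**31, size=32_000_000, dtype=np.int32)

t0 = time.time()
np.sort(keys)                      # CPU sort
cpu_s = time.time() - t0

gpu_keys = cp.asarray(keys)        # copy the keys into GPU memory
cp.cuda.Stream.null.synchronize()
t0 = time.time()
cp.sort(gpu_keys)                  # GPU sort
cp.cuda.Stream.null.synchronize()  # wait for the kernel before stopping the clock
gpu_s = time.time() - t0

print(f"CPU: {cpu_s:.2f}s  GPU: {gpu_s:.2f}s")
```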

20
years ago today the Datamation article “A Measure of Transaction Processing”
appeared. Charles Levine and I thought it was time to benchmark a PC -- my
2-year-old TabletPC to be exact. We ran TPC-B (DebitCredit without the message
handling) and got about 8 ktps (!). The 4-page report and the 6-page script that
builds the database and runs the benchmark in a half hour is “Thousands of DebitCredit Transactions-Per-Second: Easy and
Inexpensive.” Abstract: A $2k computer can execute about 8k transactions per
second. This is 80x more than one of the largest US banks’ 1970s traffic – it
approximates the total US 1970s financial transaction volume. Very modest
modern computers can easily solve yesterday’s problems. A second paper with a
broader perspective, “A Measure of Transaction Processing 20 Years Later,”
appeared as MSR-TR-2005-57 and in the June 2005 IEEE Data
Engineering Bulletin.
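For reference, the DebitCredit (TPC-B) transaction profile is tiny: debit or credit an account, update its teller and branch balances, and append a history record. A minimal sketch of that profile (SQLite here, not the SQL Server script from the report) looks like this:

```python
import random
import sqlite3

# DebitCredit (TPC-B style) transaction profile: update account, teller,
# and branch balances, then append a history record.  A sketch with SQLite,
# not the SQL Server script from the report.
db = sqlite3.connect(":memory:")
db.executescript("""
  CREATE TABLE account (id INTEGER PRIMARY KEY, branch_id INTEGER, balance INTEGER);
  CREATE TABLE teller  (id INTEGER PRIMARY KEY, branch_id INTEGER, balance INTEGER);
  CREATE TABLE branch  (id INTEGER PRIMARY KEY, balance INTEGER);
  CREATE TABLE history (account_id INTEGER, teller_id INTEGER,
                        branch_id INTEGER, delta INTEGER, ts TEXT);
""")
db.execute("INSERT INTO branch VALUES (1, 0)")
db.executemany("INSERT INTO teller VALUES (?, 1, 0)", [(t,) for t in range(1, 11)])
db.executemany("INSERT INTO account VALUES (?, 1, 0)", [(a,) for a in range(1, 1001)])

def debit_credit(account_id, teller_id, branch_id, delta):
    with db:  # one ACID transaction
        db.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (delta, account_id))
        db.execute("UPDATE teller  SET balance = balance + ? WHERE id = ?", (delta, teller_id))
        db.execute("UPDATE branch  SET balance = balance + ? WHERE id = ?", (delta, branch_id))
        db.execute("INSERT INTO history VALUES (?, ?, ?, ?, datetime('now'))",
                   (account_id, teller_id, branch_id, delta))

for _ in range(1000):
    debit_credit(random.randint(1, 1000), random.randint(1, 10), 1,
                 random.randint(-100, 100))
```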

Storage
Architecture: Peter Kukol and
others have been working on moving bulk data: the goal is to move 1 gigabyte
per second from CERN (Geneva, Switzerland) to Pasadena, California, so that the
physicists in California can see the data as it comes out of the Large Hadron
Collider (LHC), which will come online in 2008. Many other science disciplines
need this as well. The following paper shows how to do local IO fast. “Sequential File Programming Patterns and Performance with
.NET,” Peter Kukol and Jim Gray, describes and measures programming patterns
for sequential file access in the .NET Framework. The default behavior provides
excellent performance on a single disk – 50 MBps both reading and writing.
Using large request sizes and file pre-allocation has quantifiable benefits.
.NET unbuffered IO delivers 800 MBps on a 16-disk array, but buffered IO
delivers about 12% of that performance. Consequently, high-performance file and
database utilities are still forced to use unbuffered IO for maximum sequential
performance. The report is accompanied by downloadable
source code that demonstrates the concepts and that was used to obtain
these measurements.
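As a rough illustration of the pattern (Python here rather than the .NET code from the report, so the numbers will differ), sequential throughput is measured by streaming the file with large requests and timing the result; the request size is the main knob:

```python
import time

# Sequential-read sketch: stream a file with large requests and report MBps.
def sequential_read_mbps(path, request_bytes=4 * 1024 * 1024):
    total = 0
    t0 = time.time()
    with open(path, "rb", buffering=0) as f:   # unbuffered: one read() per request
        while True:
            chunk = f.read(request_bytes)
            if not chunk:
                break
            total += len(chunk)
    seconds = time.time() - t0
    return total / (1024 * 1024) / seconds

# Example: compare small vs. large request sizes on some big file.
# for size in (64 * 1024, 1024 * 1024, 4 * 1024 * 1024):
#     print(size, sequential_read_mbps("bigfile.dat", size), "MBps")
```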
With Caltech (Harvey Newman et al.), CERN, AMD, Newisys, and
the Windows™ networking group, we have been working to move data from CERN to
Caltech (11,000 km) at 1 GBps (one gigabyte per second). We have not succeeded
yet. Our progress is reported in Gigabyte
Bandwidth Enables Global Co-Laboratories (4.2 MB MS Word) (pdf of slides + transcript, 2.4 MB),
a presentation Harvey Newman and I gave at the Windows Hardware
Engineering Conference, Seattle, WA, 3 May 2004. Peter Kukol’s
“Sequential Disk IO Tests for GBps Land Speed Record” tells how we move
data the first and last meter at about 2 GBps.

TerraServer:
Our investigation of CyberBricks
continues with various whitepapers about our experiences. “TerraServer Bricks – A High Availability Cluster Alternative,”
Tom Barclay, Wyman Chong, and Jim Gray, describes the migration of the TerraServer
to a brick hardware design and our experience operating it over the last
year. It makes an interesting contrast to “TerraServer Cluster and SAN Experience,” Tom Barclay and Jim
Gray, which describes our experience operating the TerraServer SAN cluster as a
“classic” enterprise configuration for the last three years. “TerraService.NET:
An Introduction to Web Services” tells how Tom Barclay converted the
TerraServer to a web service, and how the USDA uses that web service. “A Quick
Look at SATA Disk Performance,” Tom Barclay, Wyman Chong, and Jim Gray,
investigates the use of storage bricks: low-cost, commodity components
for multi-terabyte SQL Server databases. One issue has been the shortcomings of
Parallel ATA (PATA) disks. Serial ATA (SATA) drives address many of these
problems. This article evaluates SATA drive performance and reliability. Each
disk delivers about 50 MBps sequential, and about 75 read IOps and 130 write
IOps on random IO. It is the sequel to “TeraScale
SneakerNet: Using Inexpensive Disks for Backup, Archiving, and Data
Exchange,” which describes the storage bricks we use for data interchange,
archiving, and backup/restore, and gives price, performance, and some rationale.
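The random-IO figures come from a simple measurement pattern: issue many small reads at random offsets for a fixed interval and count completions per second. A minimal sketch of that idea (Python, not the tool used for the report; note the OS file cache will inflate the numbers unless the file is much larger than RAM):

```python
import os
import random
import time

# Random-read IOps sketch: small reads at random offsets for a fixed interval.
def random_read_iops(path, request_bytes=8 * 1024, seconds=10):
    size = os.path.getsize(path)
    ops = 0
    deadline = time.time() + seconds
    with open(path, "rb", buffering=0) as f:
        while time.time() < deadline:
            f.seek(random.randrange(0, size - request_bytes))
            f.read(request_bytes)
            ops += 1
    return ops / seconds

# print(random_read_iops("bigfile.dat"))
```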

Deep
Thought :) : “The
Revolution in Database Architecture,” an extended abstract of the keynote talk
at ACM SIGMOD 2004, Paris, France, enumerates the enormous changes
happening to database system architecture. “Consensus on Transaction Commit,” Jim Gray and Leslie
Lamport, MSR-TR-2003-96, 32 p. The distributed transaction commit problem
requires reaching agreement on whether a transaction is committed or aborted. The
classic Two-Phase Commit protocol blocks if the coordinator fails.
Fault-tolerant consensus algorithms also reach agreement, but do not block
whenever any majority of the processes is working. Running a Paxos consensus
algorithm on the commit/abort decision of each participant yields a transaction
commit protocol that uses 2F+1 coordinators and makes progress if at least
F+1 of them are working. In the fault-free case, this algorithm requires one
extra message delay but has the same stable-storage write delay as Two-Phase
Commit. The classic Two-Phase Commit algorithm is obtained as the special F = 0
case of the general Paxos Commit algorithm.
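The decision rule at the heart of Paxos Commit is easy to sketch. The following is a toy illustration of that rule only, not Lamport and Gray's full protocol (the function names are made up): each participant's Prepared/Aborted vote is the value chosen by its own consensus instance over the 2F+1 acceptors, a value is chosen once a majority (F+1) accepts it, and the transaction commits only if every participant's instance chooses Prepared.

```python
from collections import Counter

F = 1
ACCEPTORS = 2 * F + 1   # 3 acceptors tolerate 1 failure

def instance_outcome(accepted_votes):
    """accepted_votes: per-acceptor accepted value ('Prepared'/'Aborted'),
    or None for a crashed/silent acceptor.  Returns the chosen value, or
    None if no value yet has a majority (the instance is still undecided)."""
    counts = Counter(v for v in accepted_votes if v is not None)
    for value, n in counts.items():
        if n >= F + 1:          # any F+1 of the 2F+1 acceptors decide
            return value
    return None

def transaction_outcome(per_participant_votes):
    """Commit only if every participant's instance chose Prepared."""
    outcomes = [instance_outcome(v) for v in per_participant_votes]
    if any(o == "Aborted" for o in outcomes):
        return "Abort"
    if all(o == "Prepared" for o in outcomes):
        return "Commit"
    return "Undecided"          # wait for more acceptors; F+1 working suffice

# Two participants, three acceptors, one acceptor down: still decides Commit.
print(transaction_outcome([["Prepared", "Prepared", None],
                           ["Prepared", None, "Prepared"]]))
```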