This episode features Jim Gray. He is a "Technical Fellow" in the Scalable Servers Research Group (SkyServer, TerraServer) and manager of Microsoft's Bay Area Research Center (BARC). Jim has been called a "giant" in the fields of database and transaction processing computer systems. In 1998, Jim was awarded the ACM's prestigious A.M. Turing Award.

Before joining Microsoft, Jim worked at Digital Equipment Corp. (DEC), Tandem Computers Inc., IBM Corp., and AT&T. He is the editor of the "Performance Handbook for Database and Transaction Processing Systems" and co-author of "Transaction Processing: Concepts and Techniques." In this interview, Jim is joined by researcher Tom Barclay, a former colleague from DEC and his partner on the TerraServer project.

This episode of "Behind the Code" is hosted by Barbara Fox, former senior security architect of cryptography and digital rights management for Microsoft.

That is too cool. I am working on a pipelined server like this. It is a nice way to go. There are some potential issues, however. Each "stage" has a thread or a thread pool with some maximum. If one of the sync threads gets blocked for extra long (e.g., network delay, error, hack, etc.), more worker threads spin up for the stage. So far so good. But on a busy server with a lot of connections, it is possible to max out the workers for a stage and block the whole server. Naturally, this could happen even if the stage was totally async as well. Eventually memory/resources would run out posting callbacks. Still, I think I like this sync pipeline better. Here are some interesting works on related designs:
http://www.eecs.harvard.edu/~mdw/papers/seda-sosp01.pdf
http://apache.hpi.uni-potsdam.de/document/4_3Multitasking_server.html
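The staged design described above (essentially the SEDA idea from the first link) can be sketched with bounded queues between stages. This is only a toy illustration; the stage names, pool sizes, and queue bounds are all made up. The bounded input queue is what gives back-pressure: a slow stage blocks its producers instead of letting work pile up without limit.

```python
import queue
import threading

def stage(name, in_q, out_q, work, pool_size):
    """One pipeline stage: `pool_size` workers pull from in_q, apply
    `work`, and push results to out_q. A None item is a poison pill
    that shuts the stage's workers down."""
    def worker():
        while True:
            item = in_q.get()
            if item is None:        # poison pill: shut down
                in_q.put(None)      # pass the pill to sibling workers
                break
            out_q.put(work(item))
    threads = [threading.Thread(target=worker, name=f"{name}-{i}")
               for i in range(pool_size)]
    for t in threads:
        t.start()
    return threads

# Two-stage pipeline (parse -> handle), each stage with its own small pool.
q_in, q_mid, q_out = queue.Queue(maxsize=8), queue.Queue(maxsize=8), queue.Queue()
parsers  = stage("parse",  q_in,  q_mid, lambda r: r.upper(),      pool_size=2)
handlers = stage("handle", q_mid, q_out, lambda r: "handled:" + r, pool_size=2)

for req in ["get /a", "get /b", "post /c"]:
    q_in.put(req)
q_in.put(None)                      # begin shutdown of stage 1
for t in parsers:
    t.join()
q_mid.put(None)                     # then shut down stage 2
for t in handlers:
    t.join()

results = sorted(q_out.get() for _ in range(q_out.qsize()))
```

Note the failure mode from the post still applies: if `work` blocks forever, the bounded queues eventually stall the whole pipeline, which is arguably better than unbounded memory growth but still needs timeouts in a real server.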

Jim Gray is really amazing....however I did a little bit of digging on the host of "Behind the Code",
Barbara Fox.

She's pretty amazing too.

Barbara Fox
Barbara Fox is a Senior Software Architect, Cryptography and Digital Rights Management for
Microsoft Corporation. She is also currently a Senior Fellow at the
Kennedy School of Government at Harvard. She serves on the National Academies of Science Committee on "Authentication Technologies and Their Implications for Privacy," the Technical Advisory Board of "The Creative Commons,"
and the Board of Directors of the International Financial Cryptography Association. Ms. Fox joined Microsoft in 1993 as Director of Advanced Product Development and led the company's electronic commerce technology development group. She has co-authored Internet
standards in the areas of Public Key Infrastructure and XML security. Her research at Harvard focuses on digital copyright law, public policy, and privacy.

Immediately prior to Microsoft, Ms. Fox was President of SystemSoft America, a Macintosh software development company in Palo Alto, California, and in addition she was a consultant to Visa International. Between 1981 and 1984, she was Engineering Development
Manager for AppleTalk at Apple Computer.

Isn't it really disappointing and ironic that you have been running TerraServer for almost 10 years without any customer product coming out of it, and that if it were not for Google and its maps, it might perhaps have stayed that way? Where was MSN all these years?

So ... when can we expect SkyServer to be integrated into Local Live / Virtual Earth?

BTW, I love the font (Franklin Gothic Book) used in the OP. It looks really crisp and clean. Channel9 should take care to use these kinds of nice default fonts in the next UI refresh.

...While I'm on the subject, that'd be a nice priority for all Microsoft's online properties. It's a bit baffling how Microsoft spent so much money to create fonts that render well on screen, such as Georgia, yet most sites seem to default to Arial. Many BlogSpot
themes, in contrast, use Trebuchet MS. Innovation is great and all, but don't forget to leverage existing investments!

PerfectPhase
"This is not war, this is pest control!" - Dalek to Cyberman

Isn't it really disappointing and ironic that you have been running TerraServer for almost 10 years without any customer product coming out of it, and that if it were not for Google and its maps, it might perhaps have stayed that way? Where was MSN all these years?

These are more blue-sky projects; they are meant to feed technology into the product groups rather than being products in their own right. If you tie something like this to a commercial offering, you lose the ability to make experimental breaking changes.

To say that nothing has come from it is wrong; a lot of the scalability improvements in products like SQL Server have come from projects like this.

The stuff on pipelines has quite a literature: "Loading databases using dataflow parallelism" is 10 years old now:
http://research.microsoft.com/~gray/papers/Parallel_DB_Load.doc
A more recent (6-year-old) effort is at
http://research.microsoft.com/~gray/river/
But the real action is happening now with things like Google's MapReduce, Sawzall, and such (you can search for them on Google). Those guys are working with thousands of machines and so are "really" doing it, rather than just talking about it. I think SQL Server 2005 Integration Services is a good way of thinking about dataflow, and of course BizTalk is a dataflow system, but they are not doing the incredible partition-parallelism scale-out (yet) that we need to deal with thousands of machines.
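For readers who haven't met it, the MapReduce idea mentioned above fits in a few lines. This is only a toy single-machine sketch of the programming model; the real systems distribute the grouping ("shuffle") step across thousands of machines, which is the whole point.

```python
from collections import defaultdict
from itertools import chain

def map_reduce(records, mapper, reducer):
    """Toy MapReduce: run mapper over every record, group the emitted
    (key, value) pairs by key, then reduce each group independently.
    Each group's reduce is independent, so a real system can run them
    on different machines."""
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapper(r) for r in records):
        groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}

# Classic word count: the mapper emits (word, 1), the reducer sums the ones.
docs = ["the quick fox", "the lazy dog", "the fox"]
counts = map_reduce(
    docs,
    mapper=lambda doc: [(w, 1) for w in doc.split()],
    reducer=lambda word, ones: sum(ones),
)
```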

The Kilroy thing is subtle -- which is the point. You are right to be lost. Sorry. It's subtle. Concurrency is subtle. Avoid it if you can. OK, fair warning. If you can't resist, there is a longer writeup of it in the book co-authored with Andreas Reuter ("Transaction Processing: Concepts and Techniques"). As Barbara Fox pointed out, it is massive and massively expensive (sorry -- no one is getting rich on it, it's just a very small market). Anyway, you can peek at it by going to Amazon and doing a "search inside the book" under Kilroy. Something like:
http://www.amazon.com/gp/reader/1558601902/104-0609859-0703169?v=search-inside&keywords=Kilroy

You are right, Barbara Fox is indeed AMAZING. Jennifer Sisti also deserves HUGE credit for all the research she and Barb put into this event. I had no idea when I got into it that they would make it such a production -- they did. I felt kind of embarrassed to be so off-hand about it when they were so professional. But... they wanted spontaneity, and they got it.

As for TerraServer, it was part of Encarta, part of Home Advisor, part of MapPoint, and also a poster child for web services. It was also a great laboratory for us to try out our scalability and availability ideas. We got a LOT of mileage out of it. But... now it is part of local.live.com (part of MSN). Every research guy's dream: the product guys took our research toy away from us. Now we have to think up something new for them to "steal" in 10 years. My one regret is that we had all the AJAX stuff to make the maps very interactive back in 2000, but we did not deploy it because it was IE5+ only. We wanted "reach" to all platforms and so lost the high end. Now all the other browsers have caught up, and we got leapfrogged. It's a good lesson. But the Virtual Earth (aka local.live.com) folks are working hard to leapfrog the current leaders. It is fun to watch the innovation in this space. Competition is GREAT!

-------------------------------------------
That's it for now. I will try to answer the next batch of questions in a few weeks.

Good heavens, never thought I'd be reminded by the man himself to clean the dust off that timeless piece... let me fix my face; the jaw has dropped pretty low here.

Dear Dr. Gray,

I could never understand how deep all the transactional science can get (until I picked up that incredibly detailed work and lost myself pretty soon), all while it is so abstracted that we never see much of it in day-to-day work, or in different, simplified models... which is greatness, no doubt. So now I have given myself the task of getting that data structure implemented and utilised, as well as looking for some good testing of a few SQL Server batteries...

While my lousy opinion is that AJAX is not going anywhere fast or too successfully (before it is replaced with another name and method, at least ;-), I believe I can see where the 'regret' hint is coming from, looking at how far ahead MS was back then (and how long it takes for things out of research to resurface in the commercial world)... The issue seems to be that VML and RDS (if that was the correct name) and far more were just too much for web pages back in those days. Broadband wasn't taken up as widely, machines were far slower, and storage was still expensive... Anyway, I just believe that HTML is still slow for highly interactive apps, and rendering engines just don't seem to scale with the number of visible or out-of-viewport elements... enter Java hacks etc... I don't see devices coping with much of it either, but sure, things are getting better slowly.

My favourite comment on the show was on the heat problem, as my own teacher always insisted it would have to be hit, and pretty soon (his estimates were something like c. 2010, back in 1997). He always said that's exactly when the algorithm guys (and researchers, as he led that department) will finally 'take over' and see great satisfaction and demand for the work they did/do... Just thought of mentioning it in the context of something most of us mere mortals will never experience or see.

To not bore anyone any more: I guess it is a common requirement today to process huge amounts of data (and a need for it to be compacted too, but that's another topic). I want to stick to SQL Server (if for nothing else than for the many things said and shown here, and the awesome interview)... Sure, and for performance, tools, and more. And now I'm hitting SQL as hard as I can; transaction logs grow in 1GB increments in the space of a minute, and I'm expecting the real-world scenario to push that far higher... therefore anything bound to a single machine, a single point, is out of the question... easy to say, hard to implement, especially as low-latency query is a major requirement for the project. I avoid DTC as much as I can and almost always get away with it (I think ;), all specific to a problem, etc. So OK, I now get that it has to scale out -- it had better -- and it has to be distributed because of the locking nature of loading large datasets efficiently...
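One common way to get off a single machine without reaching for DTC is to hash-partition writes so that each shard commits in its own local transaction. The sketch below is under my own assumptions -- the shard count and the idea of routing by customer ID are hypothetical, not anything from the interview:

```python
import hashlib

def shard_for(key, num_shards):
    """Route a row to a shard by a stable hash of its key, so the same
    key always lands on the same shard. Each shard would be its own
    SQL Server with its own transaction log."""
    digest = hashlib.md5(str(key).encode()).hexdigest()
    return int(digest, 16) % num_shards

NUM_SHARDS = 4
batches = {i: [] for i in range(NUM_SHARDS)}
for customer_id in range(10):
    batches[shard_for(customer_id, NUM_SHARDS)].append(customer_id)

# Each batch can now be bulk-loaded in its own local transaction;
# no distributed transaction spans the shards.
```

The trade-off, of course, is that queries touching keys on multiple shards now need application-level fan-out, which is exactly the "easy to say, hard to implement" part of the post.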

I was always thinking replication and versioning approaches were the way forward for such scenarios... the cache approach I don't have much desire for; IMDB was dumped before, addressable space will keep growing, etc... hence looking for insight from, let's face it...

OK, gotta do this -- where else in the world could you publicly ask for advice from a Turing Award legend...

I would first like to extend my gratitude to you for being such a forward thinker -- you know, forging ahead and being really creative with the things that you do. I would also like to say thanks for making the interview an experience. Even through video, I could get the sense that you are great at what you do.

All that aside, I have a few questions:

1. How has problem solving made you a better manager of projects?

2a. Is set theory an innovation to object oriented programming?

2b. In your experience what makes the set theory so effective?

3. How can a developer become more effective or efficient?

4. What is one of the things that excite you about technology?

It would be an honor to get your feedback; I could perhaps hope to implement, in some fashion, the framework that you have created by being such an innovator.

Yes, the transactional stuff makes your head hurt, and we are still exploring that space and learning new things. Vista comes with 3 TMs (Kernel, Light-Weight, and Distributed). Making them play together and making it all transparent has been a REAL challenge. It's a LONG story why there are 3, but each one has a good reason for existence.

The AJAX regret is that the product guys invented it (for Outlook Web Access) and we research guys were the reactionaries. The regret is that I was retro -- shame on me. I had good excuses at the time (reach to all platforms) but I was wrong.

Yes, Moore's wall (the heat barrier) is going to force us to go parallel. Frankly, we are all stumped as to how "normals" will program in parallel. My best hope is something like dataflow (Excel recalc, SQL parallel query, Google MapReduce, ...). But, at the moment, they all seem like one-trick ponies. The algorithms guys have been building us libraries, but we need environments, not libraries. The parallelism has to be in the outer loop, not the inner loop.
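One concrete reading of "parallelism in the outer loop": keep the per-record inner loop plain sequential code, and parallelise over independent partitions of the data. A minimal sketch, with function and data names of my own invention:

```python
from concurrent.futures import ThreadPoolExecutor

def score(record):
    # Inner loop: ordinary sequential code -- no locks, no shared state.
    return sum(ord(c) for c in record)

def score_all(partitions):
    """Outer-loop parallelism: each partition is processed independently,
    so partitions are the unit of parallel work. For CPU-bound inner
    loops in Python you would swap in ProcessPoolExecutor."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        per_partition = pool.map(lambda part: [score(r) for r in part],
                                 partitions)
        return [s for chunk in per_partition for s in chunk]

partitions = [["ab", "cd"], ["ef"], ["gh", "ij"]]
scores = score_all(partitions)
```

The appeal for "normals" is that only `score_all` knows about concurrency; `score` stays testable, single-threaded code, which is the same property that makes SQL parallel query and MapReduce approachable.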

As for the Turing thing and the Legend thing, I confess great embarrassment. I know how little I know and struggle with Visual Studio and SQL and Win32 just like everyone else. As you know, programming is really humbling. I get reminded most every day how
really stupid I am. So, I am glad to chat with a fellow programmer.

1. Better manager? Let the record show that my management plan is to hire over-achievers and then ask them to produce monthly reports. My job is then to keep them from killing themselves next month. Mostly by giving them pats on the back and telling them
that they are accomplishing a LOT. That's standard Management by Objectives -- and I got it early from my first job selling encyclopedias door to door (we had quotas). But, I think it is fair to say that I am NOT a good manager -- I do not enjoy it and
I tell everyone that. But I want to work with good people and I want to walk to work most days and so I have to be the manager.

2a. I am not sure I understand the question. Set theory predates OO by about 100 years.

2b. Set theory is successful because it is simple; it is a way to talk about numbers (including transcendental numbers) and forms the basis for discrete and continuous math, and also logic. The book Gödel, Escher, Bach makes this point quite clearly and is a good read if you have the time.

3. How can a developer be more effective and efficient? I think the simple answer is: think more, write less. If you are like me, you are lazy and just sit down and write the code. I really have to force myself to think. Then the phone rings, or an email arrives, or some other distraction comes up. So, thinking is both very hard and, at least for me, requires some quiet time -- a scarce commodity these days.

4. What's exciting? My problem is that almost everything interests me. The challenge is focusing on a FEW things and making a contribution there. Long term, I think we are on a path to make intelligent life and extend human life indefinitely, and completely change the human condition (Kurzweil's "The Singularity Is Near"). That's pretty exciting.

Jim

Do you think Microsoft as a company is reaching a Digital Equipment Corporation moment in its history? How would you say Microsoft now compares to DEC when you arrived there?

I note warmly in your posts that you say "competition is great." As a consumer of various software companies' products, I could not agree more. Perhaps the competition will spur Microsoft to save itself.

Finally, I cannot stop myself from asking -- though I know full well that you may not feel comfortable answering, and in that sense I am imposing on you just by voicing the question -- how do you feel about Steve Ballmer? I think he may have literally gone
over the edge and become insane. Which both saddens and scares me, but I guess "it is what it is" as they say.

MS == DEC? Microsoft is huge (65k people), and so it is different from the 10-person, 100-person, 1,000-person, and 10,000-person groups that it grew from. Organizations that large have dysfunctional parts, just like people have dysfunctional parts. It comes with complexity. Is it like DEC (where I worked for 4 years in the early '90s) or IBM (where I worked in the '70s)? No! It is very different. Part of the difference is that it is still growing fast; that covers a multitude of sins and engenders optimism. Part of the difference is that upper management is still in touch with the technology and the business (Ken Olsen and John Akers were not). Microsoft has had many near-death experiences (OS/2, NetWare, WordStar, Lotus, Mac, Netscape, AOL, Linux, Google, ...). It lives on paranoia -- most of the folks I work with know that if we do not innovate, we will not be working together in a few years. Those are big differences.

Ballmer crazy? Steve Ballmer, is he insane? I think not. First, appreciate that Steve graduated from Harvard in mathematics. So his IQ is probably higher than yours or mine (I am told he played poker all through school and won). I couldn't even get into Harvard. I majored in math and did OK at Berkeley (but I had to study -- no time for poker). Harvard math is HARD. OK, so Steve was smart once. Next fact: Steve is very involved with his family -- his kids, his wife, and his friends. He is a billionaire, but he is very earthy and personable. This is not an act -- he genuinely cares about people. You would love to have him as a next-door neighbor or as a pal. OK, so how can one possibly explain his strange behavior at Microsoft marketing events (e.g., the jumping monkey and such)? Well, remember the part about playing poker? Steve can bluff, Steve can act, and Steve LOVES to win -- he is a competitive animal ("our fair share of the OS business is 100%, and it is up to our competitors to deny us our fair share, and it is up to us to build products that merit that fair share"). So, suppose you are going to a marketing event and you want to get your audience's attention -- you want to energize them. How are you going to do it? Doing the monkey dance is one way. Have you got a better idea? On a related story, suppose you are Bill Gates and one of your senior techies comes to you with a not-very-well-thought-out idea. If that person is a master-of-the-universe, arrogant, testosterone Microsoft techie who is a big wheel in his organization and takes no guff from his underlings and peers, how are you going to get his attention? Sad to say, polite comments will not penetrate -- unfortunately, you have to be incredibly loud, rude, and blunt just to get the message through. I bet you have heard such stories. But, with "normals," Bill is a real gentleman.

This will be a nonsense post from me as usual, but perhaps something useful for my record (I keep this list close by) of 'Top 10 pointless, time-wasting failures' while developing software...

It took a while -- I can only imagine how busy everyone over there is -- but reading all the Qs and replies made my day and presents an interesting flow of reasoning, perhaps. And at least I can show off now and point to this site as evidence when some 'silly' argument develops in the pointillistic-culture-friendly company all my colleagues work in.

Thanks very much for your time Dr. Gray and for those little hints that keep the mind hungry and, how to put it, just sweet enough to question everything, try something different all the time and hopefully invoke change and more.

On parallelism, it seems a lot of new and old MSFT VS guys have been occupied enough with it for us to anticipate that is where the next edge will be, for quite some time now (apart from other VM work I see on this site but have no time to view videos of or read about). It might seem a no-brainer to many (they say ignorance is bliss, but competitiveness on millisecond timescales isn't one), and writing code (or having an intuitive-enough platform) for such an environment is surely a couple of orders of magnitude harder than just looking to avoid deadlocks or selecting locking granularity that might be optimal for some application. It's like, err, having a terrifyingly humble computing legend around -- not many people about who can be and remain that way. Thus quite likely not many people will program in parallel (including myself) for a long time either.

So what's left... personally, I don't believe much in generalisations, or in sticking to a single environment for that matter. The reason is probably that our ideas change all the time, or that everyone is fed up with everyone else; it is only natural to be moody and try 'your own thing' (TM). What strikes me in this day and age -- or the wising-up process, I wish/hope -- is to make an advance no matter what the method, utilise it, and make the money before someone else does the same. This in turn ideally invokes change for the greater good, like health, like eliminating poverty, and more (enter the notes on Ballmer and Bill, which were all enlightening and a beautiful read at the very least).

We perhaps don't need an environment initially, just constructs to show the results, and the rest (any generalisation, if there is a need for one) should appear from it. The good old 'responsibility pushed to the user'. What I am aiming at (after some layman-type ramblings with myself) is the thought that for parallelism to really work -- to get the benefit from it, timely and more accurate data -- all the software and platforms in the chain, all the interaction, must ideally be designed with it in mind. Thus I find it no surprise to see very little commercial stuff available, especially where money is milliseconds, and even when some is found (like the Cambridge STL bits, I believe), it is useless because it is integrated with a system unable to benefit from it.

Enter queues. But also enter the fact that no generalisation can be found for multi-writer, multi-reader ('thread' is the wrong term; 'parallel' suits better, I guess) scenarios. That, plus the logic to assemble it all in such a fashion that the 'serial' workhorse (historical term: Viper extension) might get its logic (our apps') work done on just the dataset that is relevant, i.e. the latest 'snapshot', is nowhere around.
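One way to make the "latest snapshot" filtering described above concrete: a keyed multi-writer buffer where stale values are overwritten rather than queued, so readers only ever see the newest value per key. This is a sketch of that one idea, with hypothetical ticker data:

```python
import threading

class LatestSnapshot:
    """Keyed, thread-safe 'latest value wins' buffer: writers overwrite
    stale entries instead of queueing them, and readers take a copy of
    the current newest value per key."""
    def __init__(self):
        self._lock = threading.Lock()
        self._latest = {}

    def publish(self, key, value):
        with self._lock:
            self._latest[key] = value      # old value is dropped, not queued

    def snapshot(self):
        with self._lock:
            return dict(self._latest)      # copy, so readers don't hold the lock

book = LatestSnapshot()
# Hypothetical ticker updates: only the last price per symbol survives.
for px in (19.10, 19.12, 19.08):
    book.publish("INTC", px)
book.publish("MSFT", 27.50)

snap = book.snapshot()
```

Unlike a plain multi-producer queue, the reader's cost here is bounded by the number of keys, not the update rate -- which is the point when the input arrives faster than the "workhorse" can chew it.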

I like to draw parallels to real life (i.e., as if no machines existed) all the time; information that is old is no longer useful for a good number of new ideologies or applications. It is about time someone dealt with that, yet it is almost a revelation in 2006 -- say, Windows Mobile 2005 devices picking up email. As an example, at this time I am watching Intel's stock break out after a terrifying blow lasting almost 2 years now. What happened there is anyone's guess, but my snapshot is only interested in whatever the arbitration logic decides is relevant at the point in time it chews the input, not necessarily the past (i.e., almost like queue filtering, with some context data to help keep the search/sort operations over parallel input down, or 'temporalised'). Sure, the workhorse (inner loop) can utilise further tool- or software-assisted parallel processing if it is built that way.

The rest, in my mind, has little benefit from parallel execution, whether we like it or not. And sure, algorithms and software tools can identify such things, but will it work all the time -- i.e., are these bits inner loops? -- and what headache will it give the software developer? Your comment on one-trick ponies seems to be along similar lines (unless I am terribly confused, which isn't an issue here).

Much the same as the hardware guys like DEC kicked off, and those bits were sold to (was it?) Compaq, then Intel, etc. -- i.e., the Alpha architecture, or at least ideas from it. Thus, I believe all we need is some good hardware abstractions in the form of MWMR queues, extension points for our state/temporal/filtering logic, and (I hate the term, btw) 'pushed' and 'high-res versioned' data to help us build those. Sounds easy in this quick nonsense write-up of mine, but probably hard enough to even begin with for most. Form-type, history, and other 'audit-friendly' apps can wait; they had priority for far too long and they caused no revolution apart from the WWW, blogs, and Google -- which is not small, but no Holy Grail either.

In any case, I gave up on the big data storage (and compression of such) idea as the reliability bits came into play (off go Morse & Isaac, Snodgrass, etc., out of my short lifespan :). Life gives such incredible reasons not to be obsessed with detail or clutter, yet all programmers fall for it and fall in love with it.

