This means that callers will need to specify the message template: hotel displays will pass in the hotel-specific template, whereas other segment types will pass in the "vendor" message.

The developer on the team working on this asked me why I wanted to see this change, commenting that this approach was easy and kept the calling code terse and even allowed the segment type argument to be optional. I'll be honest, I had a hard time explaining all of the reasons why I wanted to see this change made.

I had the privilege of attending Øredev again this year; it was the first week of November. At the conference, I presented two sessions that were both related to my experience this year using Node and React for building Line-of-Business applications at Concur.

Learning Node After a Career on Microsoft

In my first session, I presented my story of how I started using Node and React after previously dedicating my entire career to the Microsoft stack.

This was not a technical session, but rather one that covered the mental blockers we have in making platform changes, how I overcame those blockers, and how I realized that building applications on the Microsoft platform for 15+ years actually trained me to make bold platform changes. I then covered the tools and processes I used to learn Node as a new platform, the mistakes I made and saw others making, and highlighted some key take-aways. Attendees of the session told me the guidance of "learn concepts, not libraries" was the most impactful lesson for them.

Session Abstract

After 15 years building web applications on the Microsoft stack, and several years working at Microsoft on ASP.NET and NuGet, I found myself using Node.js and React. Come hear how I survived the boldest change of my career and how you too can overcome the challenge of a platform change.

Building Line of Business Apps with Isomorphic React/Node

My second session was a technical one, illustrating the details about the Flux architecture pattern that works well with Node and React. But before I got to that point, I went through a historical recap of how I've seen LOB apps designed over the last 15 years, going all the way back to Classic ASP and showing the progression toward MVC and what problems arose when we started building rich applications with lots of client-side interactions.

By telling this tale of how LOB web apps have evolved, we can see that it's time for history to repeat itself and how Node/React/Flux can help us work through another evolution.

Session Abstract

Did you know that React and Node can be used to build good old-fashioned line of business applications? We'll look at how!

You see, we all grew up building web applications with server-side rendering. Then we were convinced that we should render in the browser--but that proved to be a maintenance nightmare for LOB applications. With React, Node, and Fluxible, we can build apps that initially render on the server and have the client take over from there. Best of all, we can do this by using a single programming model that you'll realize you already know.

While working through sample after sample for Node.js and React.js, I experienced a pattern that wasn’t very helpful. Instead of truly starting from scratch, the samples kept walking step by step through cloning a working solution. They’d start with “Step 1: paste this fully-working code into this file” and “Step 2: paste this fully-working code into this other file.” I was having a hard time finding a breakdown of the concepts being applied.

I wanted to learn by starting truly from scratch and building the app up in logical, incremental steps. To accomplish the goal of learning this new material one concept at a time, I created a new project and then documented each new concept that was introduced in a giant README.md file. I then transformed the giant README into a 19-step tutorial web site using GitHub Pages.

If you are feeling overwhelmed trying to learn Node and React, you might benefit from this QuickReactions tutorial.

RIA

My family and I moved to Redmond almost 7 years ago so that I could join Microsoft. After 13 years in the industry, it was my dream job: Creating a UI Framework that enterprise application developers would use for their web applications. The project was Alexandria, which became WCF RIA Services, and it helped developers use Silverlight for Line of Business applications.

As I blogged about when I moved out here, I had put together a 5-year plan for how to get a job at Microsoft building UI frameworks, but I ended up getting that job within just a few months. I was thrilled to work on RIA and have the opportunity to create software that became part of the .NET Framework and shipped to my mom’s computer. It was an honor to take what I learned building user interfaces for dozens of enterprise applications and create a framework that countless developers could benefit from.

With my experience on RIA, I got a taste of delivering frameworks and tools to large developer audiences, and that became a new direction for me.

Growing Scope

While working on RIA Services, I watched NuGet come into existence and I was immediately sold. While NuGet was still quite nascent, I started pitching to the RIA team that we should abandon our MSI and instead ship RIA as a collection of NuGet packages. When I became the dev lead for the project, it was one of the first new efforts I invested time into. I even blogged about my excitement around NuGet. NuGet became what I wanted to work on.

By chance, shortly after that blog post, a re-org happened. NuGet was going to become part of my group—and it didn’t have a dev lead! I jumped at the opportunity to become the project’s dev lead; in order to pick up ownership of NuGet I also needed to take ASP.NET Web Pages and Razor as well. Sold! Suddenly, I was the dev lead for WCF RIA Services, WCF for Silverlight, NuGet, www.nuget.org, ASP.NET Web Pages, Razor, and a couple of other small projects. And there were 6 developers on my team. I immediately started working on how the team could become dedicated to NuGet.

NuGet

When I became the dev lead for NuGet, version 1.5 had just shipped. We then shipped 1.6, 1.7, 1.8, 2.0, and several more releases leading up to NuGet 2.8. For over 2 years, we averaged 11 weeks between RTM releases, with an average of 85 issues addressed in each release. At the same time, we completely redesigned the www.nuget.org gallery, re-implemented it from the ground up to run in Azure on the latest ASP.NET MVC bits, and we did the work in the open on GitHub.

NuGet grew and grew. Our usage was doubling time and time again. The project matured from being a “toy” that was used only for ASP.NET projects into something that almost every project system in Visual Studio was benefiting from. I spent a great deal of my time selling NuGet to teams and groups around the company, gaining broader and deeper adoption. It was exciting to watch the tables turn as we gained more acceptance. Over time, teams were coming to us instead of the other way around. Visual Studio started fixing bugs that made NuGet better. NuGet had arrived.

Integration

It was inevitable—its users wanted NuGet to become more natural. They wanted deep integration with the project systems—not just macros over top of VS actions. They wanted integration with the project templates. And with the build system. With every aspect of the development lifecycle, NuGet should be there and be supported. NuGet needed to become part of the platform.

This is where we are today. NuGet is no longer a toy—it’s truly become a first-class aspect of how developers work on the Microsoft platform. There is still a lot of work to get done to accomplish the goals we’ve set, but I believe the direction is right and the project is on path to get there.

Rewarding Projects

When I recognized that NuGet was on path to become part of the platform, I started thinking about what would be next. What would be the next round of goals for the project? And secondarily, what would be the next round of goals for me? Don’t get me wrong, there is still a lot of work to be done for NuGet to succeed in these goals—the team and the project have plenty of room for improvement, but I started assuming we’d succeed in execution on those items. So I sought out what passions I wanted to follow as I reached my 20th anniversary in the field.

At Øredev 2014’s speakers’ dinner at City Hall in Malmö, I was talking with someone from Jayway about passions and what makes a project rewarding. She asked me what the most rewarding project was that I’d ever worked on. My knee-jerk reaction was to name NuGet. But I held back and really thought about the question. Was NuGet really it? Was it RIA? Was it the web-based replacement for Ohio’s student information system mainframe? That project actually was more rewarding than NuGet! Was it Statsworld, the web-based fantasy football app that competed directly with CBS Sportsline? What about when I created a web-based system to run a cooking school for Procter & Gamble? Those were great too! And then I kept going back through my career until I decided what my most rewarding project really was, and it is surprising.

Impact on Individuals

My very first professional software project was in high school. I created a DOS-based CRM system for a math teacher’s husband’s lawn care company. Imagine QuickBooks, but running in DOS. I sold it to him with a bound user manual and a custom printer driver for his dot-matrix printer—for $100. It even had mouse support using a library I created in QuickBasic. I think I made about $0.50/hour on that project and built it on my mom’s computer at her office, working nights after the office had closed.

When we first met, he asked if I could create something to print invoices so that he didn’t have to type each one by hand. I most certainly could. But I started asking him questions about what other routine tasks he had and I asserted that I could automate a great deal of his routine administrative work. When I delivered this software to him and trained him on it, his eyes lit up. A few weeks later when I was delivering a new round of floppy disks with some bug fixes, he told me I saved him about 40 hours per week.

That $100 DOS-based invoicing system for a self-employed lawn care professional is the most rewarding project of my career. That was my answer at Øredev and I knew then I needed to think more seriously about what was next for me.

Following Passions

Looking back on my career, I’ve always had passion for interviewing business owners and employees and finding ways to simplify and automate their administrative tasks. In fact, after I completed that lawn care project, I dreamed of owning my own software company—it even had a (horrible) name: HANDLinc. Computer Programming. My junior year in High School, I was telling people that when I grew up I wanted to create software to help other people run their businesses. In 2000, I co-founded WeDoWebStuff.com and did just that. But somewhere between then and now, I lost sight of those objectives and found myself working on frameworks and tools for developers.

I have decided I want to return to building software for business owners and employees. I want to concentrate on user interfaces that simplify administrative tasks that cannot (yet) be automated. I want to work with non-developers and make their lives better and less frustrating—to make computers work for them instead of the other way around.

Concur

Friday, March 20, 2015 is my last day at Microsoft and my last day working on NuGet.

I start at Concur on Monday, March 23, 2015. I will be following my passions and I am very excited!

FAQ

Are you going to stay involved in NuGet?

I don’t think so. I’m going to be focused on returning to a different kind of work—I have a lot to learn and remember.

Who is taking over NuGet?

Yishai Galatzer is the new Engineering Manager for the NuGet team at Microsoft.

Most business systems include some form of backend processing. This could be report generation, data transformations, credit card processing, payment auditing, or countless other scenarios. It’s typical for these systems to pull records out of a queue, perform the necessary processing, and then move on to the next record. When possible, these systems are engineered to process more than one record at a time, reducing overhead and increasing efficiency. Each time a batch processing system is created though, we face a difficult question.

What is the best batch size?

This question is always hard to answer because we know that our development environment will differ from the production environment. To combat this problem, most developers define an environment variable or configuration setting that will control the batch size, and then hard-code a default value if the setting is not supplied. This provides a feeling of comfort that we can change the setting in production without having to update the code. But this approach falls short in many ways.
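The conventional approach described above might look something like this minimal sketch. It's written in Python purely for illustration; the setting name `STATS_BATCH_SIZE` and the default of 1000 are hypothetical and not taken from the NuGet.org code.

```python
import os

# Hypothetical fallback used when the setting is not supplied.
DEFAULT_BATCH_SIZE = 1000


def configured_batch_size(env=None):
    """Read a fixed batch size from configuration, falling back to a default."""
    env = os.environ if env is None else env
    raw = env.get("STATS_BATCH_SIZE")  # hypothetical setting name
    return int(raw) if raw else DEFAULT_BATCH_SIZE
```

Whatever value this returns stays fixed until someone changes the setting by hand, no matter how the environment shifts underneath it.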

NuGet Package Statistics

NuGet.org creates a record every time a package is downloaded, which happens about 750,000 times per day, or roughly 8.7 times per second. The records are stored in the production database in a denormalized table where the raw values can easily be inserted at that pace. Then twice each day, we produce updated package download reports for every package with download activity since the last time its report was generated.

To generate these package download reports, we have backend processes that aggregate total download numbers, replicate the records into a warehouse database, and then purge records that are at least 7 days old and that have already been replicated. Each of these processes works against batches of records; choosing batch sizes for each of them was difficult.

Throughput Factors

When trying to select a batch size for each of these processes, we realized that there are lots of factors that come into play. Here are the variables that we found to have significant impact on throughput:

Current load on the backend processing server (it performs lots of other backend jobs at the same time)

Index fragmentation in the production database

Index fragmentation in the warehouse database

Number of records in the queue

Network latency

Each time any of these factors changed, the previous choices we’d made for our batch sizes became stale. Every once in a while, a batch would fail, cause an error, and raise an operations alert. We would then file a bug: “Stats Replicator cannot process the current batch size without timing out.” There are two obvious fixes for the bug:

Increase the timeout

Reduce the batch size

Either of these “fixes” would get the job unstuck, but then it’s just a matter of time before the change is stale.

The Edge of Failure

Batch processing can be more efficient because it reduces overhead. There’s startup/shutdown time required for each iteration of the process. When you only pay the startup/shutdown cost once but process thousands of records, the savings can be significant. The bigger the batch, the more we save on overhead. But there’s usually a breaking point where giant batch sizes lead to failure. Finding the largest batch size that can be successfully processed often yields the best performance.
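As a rough illustration of why bigger batches help, assume each batch pays a fixed startup/shutdown overhead plus a constant cost per record. Both numbers in this Python sketch are made up for the example; they are not measurements from NuGet.org.

```python
def throughput(batch_size, overhead_seconds=1.0, seconds_per_record=0.005):
    """Records per second for one batch under a fixed-overhead cost model."""
    elapsed = overhead_seconds + batch_size * seconds_per_record
    return batch_size / elapsed


# Larger batches spread the same overhead across more records.
for size in (100, 1000, 4000):
    print(size, round(throughput(size), 1))
```

This simple model leaves out the breaking point: in practice a timeout caps how far up the curve a batch can go, which is why the largest batch that still succeeds tends to be the interesting one.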

To make the backend processes for NuGet.org as efficient as possible at all times, I created an approach that discovers this breaking point and then automatically adapts batch sizes to achieve the best throughput attainable within the current environment.

Defining Batch Size Ranges

Instead of defining a single batch size setting to be used, the new approach uses a pair of parameters to specify the minimum and maximum batch sizes. These batch sizes aren’t guesses; they are objective numbers with meaning.

Minimum Batch Size

The minimum batch size is truly a minimum. If the system fails to process a batch of this size, it is considered an error and the process will crash. This will lead to an operations alert to inform the team that something is wrong.

Maximum Batch Size

The maximum batch size is the largest batch we would ever want to process at one time. This number can be selected based on the scenario, and it should take into account concerns like how hard a batch of that size would be to debug when it hits a bug. But this number should be as large as you’re comfortable with. Don’t worry about what the system will be “capable of” handling, because all of the throughput factors above affect that capability. If you scale your server up significantly, a previously unfathomable batch size may become not only possible, but preferable.
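One way to picture the pair of settings is a small range type where a timeout at the minimum is treated as a real error, while a timeout anywhere else is just information. This is a Python sketch under my own naming; `BatchSizeRange`, `MinimumBatchFailed`, and `check_timeout` are hypothetical and do not come from the production code.

```python
from dataclasses import dataclass


class MinimumBatchFailed(Exception):
    """Raised when even the minimum batch size cannot be processed."""


@dataclass
class BatchSizeRange:
    minimum: int
    maximum: int

    def __post_init__(self):
        if not 0 < self.minimum <= self.maximum:
            raise ValueError("expected 0 < minimum <= maximum")

    def check_timeout(self, failed_size):
        """React to a batch of failed_size records timing out."""
        if failed_size <= self.minimum:
            # Failing at the minimum is an error worth an operations alert.
            raise MinimumBatchFailed(f"batch of {failed_size} records timed out")
        # Above the minimum, a timeout only tells us where the current limit is.
        return None
```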

Sampling and Adapting

With a batch size range provided, we can now take samples of different batch sizes. This sampling will produce two important pieces of data:

The edge of failure, where the batch succeeds but larger batch sizes fail (generally by exceeding a timeout period)

The throughput measured for each sampled batch size, in terms of records per second

To accomplish the sampling, we take the following approach:

Process the minimum batch size and record the throughput (records/second)

If a batch size times out, record its throughput as Int32.MaxValue and decrease the maximum batch size by 33%

maxBatchSize = maxBatchSize * 2 / 3;
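Here's a hedged Python sketch of a sampling pass along these lines. Several details are my assumptions rather than the production implementation: the sample sizes are spread evenly between the minimum and the current maximum, `process_batch` is a hypothetical callable that raises `TimeoutError` when a batch times out, and a `timed_out` flag stands in for recording `Int32.MaxValue`.

```python
import time
from dataclasses import dataclass

SAMPLE_COUNT = 11  # 10 equal steps across the range need 11 fenceposts


@dataclass
class Sample:
    batch_size: int
    records_per_second: float
    timed_out: bool = False


def take_samples(process_batch, min_size, max_size, clock=time.monotonic):
    """Sample batch sizes from min_size toward max_size.

    process_batch(size) is a hypothetical callable that processes one batch
    and raises TimeoutError when the batch exceeds its timeout.
    Returns the samples and the (possibly reduced) maximum batch size.
    """
    samples = []
    for i in range(SAMPLE_COUNT):
        # Assumption: spread the sizes evenly over the current range.
        size = min_size + (max_size - min_size) * i // (SAMPLE_COUNT - 1)
        started = clock()
        try:
            process_batch(size)
        except TimeoutError:
            if size <= min_size:
                raise  # the minimum batch size must always succeed
            samples.append(Sample(size, 0.0, timed_out=True))
            # Same arithmetic as maxBatchSize * 2 / 3 with integer division.
            max_size = max(min_size, max_size * 2 // 3)
            continue
        elapsed = max(clock() - started, 1e-9)  # guard against a zero interval
        samples.append(Sample(size, size / elapsed))
    return samples, max_size
```

The `clock` parameter exists only so the timing can be faked in tests; a real job would use the default.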

Once we’ve finished taking our 11 samples (yes, 11, because fenceposts), we then use the sampling data to begin adapting our batch sizes. Each time we’re ready to process another batch, we calculate the next batch size to use. This calculation aims to find the best possible batch size, but we don’t simply want to choose the best batch size we’ve seen so far because there’s usually a batch size better than what we’ve already seen. Instead, we select the best 25% of our batches and then use the average batch size across them.

We will then use this size to process the next batch. We’ll record its throughput and add it into our samples. As we continue to process more batches, we’ll have a larger pool of sample values to select our 25% best batches from, and we’ll be averaging out more batch sizes. But because previous batch sizes were selected based on the averages in the first place, the result is zeroing in on the batch size that yields the best throughput.

Adapting

After taking these 11 samples, we’ve learned that we can’t seem to get past ~5000 records in a batch without timing out; the maximum successful batch was 4699 at 29 seconds (162/sec). But we also see that within the timeout period, larger batches are providing better throughput than smaller batches. The system will now automatically adapt to use this data.

The samples we've taken can be ordered like this:

4445 (165/sec)

4699 (162/sec)

3070 (162/sec)

4042 (161/sec)

4060 (156/sec)

4015 (154/sec)

2080 (149/sec)

1090 (121/sec)

100 (100/sec)

5040 (Int32.MaxValue/sec)

5356 (Int32.MaxValue/sec)

Considering the best 25% of these values (that will be the top 3), we calculate the average of the batch sizes to be 4071. That will be the next batch size. We’ll time that batch as well, and put its data into the sample set.
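The arithmetic can be checked with a few lines of Python. The sample data comes straight from the list above; the function name, the use of `None` to mark timed-out batches, rounding the 25% up to a whole number of samples, and truncating the average are my assumptions, chosen because they reproduce the numbers quoted here.

```python
import math

# (batch size, records/second) from the ordered list above;
# the two timed-out sizes are marked with None.
samples = [
    (4445, 165), (4699, 162), (3070, 162), (4042, 161), (4060, 156),
    (4015, 154), (2080, 149), (1090, 121), (100, 100),
    (5040, None), (5356, None),
]


def next_batch_size(samples, best_fraction=0.25):
    """Average the batch sizes of the best-performing fraction of samples."""
    # Timed-out samples get a key of -1 so they always rank last.
    ranked = sorted(
        samples,
        key=lambda s: -1 if s[1] is None else s[1],
        reverse=True,
    )
    keep = math.ceil(len(ranked) * best_fraction)  # 25% of 11 rounds up to 3
    best = ranked[:keep]
    return sum(size for size, _ in best) // len(best)


print(next_batch_size(samples))  # 4071
```

The top three sizes are 4445, 4699, and 3070, which sum to 12214; dividing by 3 and dropping the fraction gives 4071.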

As more batches are executed, we’ll see performance fluctuate and batch sizes vary a bit, but they ultimately narrow down to a small deviation. After around 100 iterations, the value becomes relatively static. So the next step is to guard against circumstances changing and our data becoming stale.

Periodic Resets

After around 100 iterations, we lose some of our ability to adapt. Even if the times start to get very bad for the batch size we’re zeroing in on, there’s too much data indicating that batch size should be efficient. The easiest way to combat this problem is to perform periodic resets. After 100 iterations, simply reset all sample data and start fresh—take 11 new samples and then run 89 more iterations afterward, adapting anew.

While this reset can lead to a few inefficient batches, it’s an important part of what makes the system fully reliable. If load on the production system or any of the other throughput factors changes, it won’t be long before we reset and discover that we need to change our target range.
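A periodic reset needs very little bookkeeping. This Python sketch uses the 100-iteration cycle and 11 samples described above; the `SampleWindow` class and its members are hypothetical names for illustration only.

```python
RESET_AFTER = 100   # iterations per cycle, as described above
SAMPLE_COUNT = 11   # sampling batches at the start of each cycle


class SampleWindow:
    """Holds throughput samples and discards them every RESET_AFTER batches."""

    def __init__(self):
        self.samples = []
        self.iterations = 0

    @property
    def is_sampling(self):
        """True while the current cycle is still collecting its first samples."""
        return len(self.samples) < SAMPLE_COUNT

    def record(self, batch_size, records_per_second):
        """Store one batch result and reset once the cycle is complete."""
        self.samples.append((batch_size, records_per_second))
        self.iterations += 1
        if self.iterations >= RESET_AFTER:
            # Start fresh so stale measurements cannot pin the batch size.
            self.samples.clear()
            self.iterations = 0
```

A job loop would check `is_sampling` to decide whether the next batch is one of the 11 samples or one of the 89 adapted batches that follow.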

The Code

This approach is in use within a few of our backend processes around package statistics. The most straightforward example is the job that finds package statistics from the production database that have already been replicated over to the warehouse and can now be purged from the production database.

Benefits

The biggest benefit I've seen from this approach is that our production system stays alive and efficient all the time. We used to have to tweak the batch sizes pretty regularly. And when our statistics processing fell behind, it could take a long time to catch up because our batch sizes were conservative. Now, the batch sizes can get more aggressive automatically, while ensuring we avoid timeouts.

Overall, these processes are now much more hands-off. If we need to increase throughput, we can scale a server up and the process will automatically take advantage of the improvement and use bigger batch sizes if that yields better results. And if the system is under load, the process will automatically back off to smaller batch sizes when those are the ones that run at a steady pace.