Where machine learning meets spring planting and big data intersects with farming big and small, two Washington University in St. Louis researchers at Olin Business School have devised a computational model that lets farmers and seedmakers take the
guesswork out of choosing which variety of, say, soybean to plant each year.

It's simple enough that a farmer could receive a recommendation, in order of simulated success rates, containing the five best seed types to grow given the average yields, weather conditions and soil composition of his or her region.

In other words, optimization comes to agriculture, via tools as simple as a laptop, algorithms and decision-making frameworks. The researchers created a web tool they called SimSoy.

Because of its application to business -- in this instance, the agribusiness that drives America's heartland and helps to feed the globe -- this forthcoming paper was selected as the 2018 Olin Award winner for research that transforms business.

"This could be done for any farm, and a wide variety of crops," said Durai Sundaramoorthi, senior lecturer in management at Olin Business School. "You could do it for corn. You could do it for peanuts," he said.

Sundaramoorthi is a co-author on the paper "Machine Learning Based Simulation and Optimization of Soybean Variety Selection" with Lingxiu Dong, professor of operations and manufacturing management at Olin.

"This year's winners of the Olin Award developed a novel approach to a major farming issue," said Richard J. Mahoney, former CEO of Monsanto and a distinguished executive-in-residence at Olin, where he initiated the $10,000 prize. "Today, farmers are
inundated with seed choices for annual planting -- they must choose from hundreds of types which have been optimized by the seed companies for a variety of outcomes. For example, one seed might be very good for early planting but not as high yielding if rain
delays planting. Others offer improved yield with certain insect pressures.

"The method developed by our authors allows the farmers to make tradeoffs based on best judgment of planting conditions, their risk/reward appetite and many other choices with a user-friendly model," Mahoney said. "Truly an innovation in aid to farmers, but
also having potential in retail and medical applications, allowing quantifiable tradeoffs among product offerings."

The idea for the research started as the two researchers sat in a plane on the tarmac after an annual Institute for Operations Research and the Management Sciences (INFORMS) conference in Philadelphia in 2015. Not long after those initial discussions about
how to frame a meaningful study to enter into INFORMS' Analytics Crops Challenge, they produced this study -- the winner of the INFORMS Data Mining Paper award late last fall. Sundaramoorthi is a judge in INFORMS' upcoming corn competition.

"Significant progress has been made in agricultural science in developing seed varieties with genetic traits desirable in different planting environments, and the yield performance of those seed varieties in many test fields are documented in large datasets," Dong
said. "We are intrigued by the opportunity to help farmers around the world, who often have limited access to, and the knowledge of, processing the big data. This way, we can make the best use of what agricultural science offers."

Sundaramoorthi, who in a similar vein previously optimized the assignments of nurses for a north Texas hospital, and Dong launched this study with several PhD students by leveraging big data and predictive models around 182 different varieties of soybean
seeds.

Using 2008-14 data from Syngenta, a biotechnology company that produces seeds and agrochemicals, the researchers were able to plug in seed, weather and soil/farm data from a swath of the Midwest that stretched from western Ontario through Michigan,
Indiana, Illinois, Wisconsin and into South Dakota and Nebraska. That gave the researchers tens of thousands of data points, which, when multiplied by the 182 seed varieties and the 1,000 simulations of weather predictions at each target site, produced an
enormous volume of information on the yield and risk of each variety.

"The machine-learning-based simulation allows us to make predictions of how each of the 182 seed varieties would perform under various weather conditions in a target farm -- the place where those seeds have not been grown before," Dong said.


They used descriptive analytics (such as distributions), predictive analytics (machine learning models, kernel density estimation, kernel choice) and prescriptive analytics (yield simulations and portfolio optimization). The machine-learning models make the
predictions, the simulation draws on those predictions, and the optimization component of the framework accounts for risk based on the simulations. That difficult computational work sits behind the web application, which makes the process easy for the
person using it.
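
To make the predict-then-simulate step concrete, here is a minimal Python sketch. It is not the researchers' code: the rainfall figures, the `predict_yield` function standing in for a trained machine-learning model, and the bandwidth rule are all assumptions chosen for illustration. It draws 1,000 weather scenarios from a Gaussian kernel density estimate of past observations and runs each one through the stand-in model.

```python
import numpy as np

def sample_gaussian_kde(observations, n_samples, bandwidth, rng):
    """Draw samples from a Gaussian kernel density estimate of 1-D data.

    Sampling from a Gaussian KDE is equivalent to picking an observed
    point at random and adding Gaussian noise with std = bandwidth.
    """
    observations = np.asarray(observations, dtype=float)
    centers = rng.choice(observations, size=n_samples, replace=True)
    return centers + rng.normal(0.0, bandwidth, size=n_samples)

def predict_yield(rainfall_mm):
    """Hypothetical stand-in for a trained machine-learning model.

    Yield (bushels per acre) peaks at 500 mm of seasonal rainfall and
    falls off on either side. The numbers are invented for illustration.
    """
    return 66.0 - 0.0004 * (rainfall_mm - 500.0) ** 2

rng = np.random.default_rng(seed=0)

# Made-up historical seasonal rainfall (mm) at one target site.
data = np.array([420, 455, 480, 510, 530, 560, 600], dtype=float)

# Normal-reference rule-of-thumb bandwidth for a Gaussian kernel
# (an assumption; the article does not say how the kernel was tuned).
bandwidth = 1.06 * data.std(ddof=1) * len(data) ** (-1 / 5)

# 1,000 simulated weather scenarios, matching the count in the article.
simulated_rainfall = sample_gaussian_kde(data, 1000, bandwidth, rng)
simulated_yields = predict_yield(simulated_rainfall)

print(f"mean simulated yield: {simulated_yields.mean():.2f} bu/acre")
print(f"std of simulated yield: {simulated_yields.std(ddof=1):.2f} bu/acre")
```

The mean and spread of `simulated_yields` are the kind of per-variety summary that a later optimization step could weigh against each other.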

In the end, the researchers were able to boil the tool down to essentially a 27-question form that asks for latitude, longitude, area, soybean varieties, irrigation, soil types and depths, acreage and yields, among other things. SimSoy then returns a
five-line answer for each variety it simulates for that farmer. An example from the study:

You have simulated the yield of variety V3.

Following are the results from the simulation:

- The average yield is 65.93 bushels per acre.

- The risk involved in growing V3 is 3.01 bushels per acre.

- We are 95 percent confident that the yield is going to be between 65.74 and 66.12 bushels per acre.
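
The article does not say how these three figures relate to one another. One reading that is consistent with the numbers, offered here only as an assumption, is that "risk" is the standard deviation of the simulated yields and the interval is a normal-approximation 95 percent confidence interval on the mean across the 1,000 simulations. The short Python check below uses a hypothetical helper, `mean_confidence_interval`, to show the arithmetic.

```python
import math

def mean_confidence_interval(mean, std, n, z=1.96):
    """Normal-approximation confidence interval for a sample mean.

    mean, std: sample mean and standard deviation of the simulated yields
    n: number of simulations
    z: critical value (1.96 for a 95 percent interval)
    """
    half_width = z * std / math.sqrt(n)
    return mean - half_width, mean + half_width

# Figures reported for variety V3, with n = 1,000 simulations assumed.
low, high = mean_confidence_interval(mean=65.93, std=3.01, n=1000)
print((round(low, 2), round(high, 2)))  # prints (65.74, 66.12)
```

Under this reading, the narrow interval describes how precisely the average simulated yield is pinned down, while the 3.01 bushels per acre figure describes how much yield varies from one simulated season to the next.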

"You do a portfolio optimization just like in finance," Sundaramoorthi said. "Here, the soybean varieties are like different stocks. Financial portfolios are deeply researched so you can get more for every dollar." If the investor doesn't lessen risk and improve
yield, "that's money left on the table."

They also surveyed a group of Illinois and Iowa farmers. The survey showed that the farmers want to plant several varieties, chosen on the basis of historical data, Sundaramoorthi said.

"The reason they want to plant several varieties is they want to spread the risk, so they aren't relying on just one soybean," he said. "Their concerns are very similar to what we want to solve. But the biggest difference is, they're using intuition and some simple
rules to solve their issues -- they might choose one variety that has done well for them before, and use it on 50 percent of their farm, and two other varieties on 25 percent each.

"Whereas, we are using analytics telling us what should be the right proportion of these different varieties," Sundaramoorthi said.
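
The contrast between the 50/25/25 rule of thumb and an analytically chosen mix can be sketched in Python. The article does not describe the researchers' optimization formulation, so this uses a generic mean-variance objective as a stand-in; the three varieties, their simulated yields and the `risk_aversion` value are all invented for illustration.

```python
import itertools
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical simulated yields (bushels per acre): 1,000 weather
# scenarios (rows) for three made-up varieties (columns).
n_scenarios = 1000
weather = rng.normal(0.0, 1.0, size=n_scenarios)  # shared weather shock
yields = np.column_stack([
    66.0 + 3.0 * weather + rng.normal(0.0, 1.0, n_scenarios),  # high but volatile
    64.0 + 1.0 * weather + rng.normal(0.0, 1.0, n_scenarios),  # steadier
    63.0 - 1.5 * weather + rng.normal(0.0, 1.0, n_scenarios),  # better in bad years
])

def score(weights, yields, risk_aversion):
    """Mean-variance objective: average farm yield minus a risk penalty."""
    farm_yield = yields @ weights
    return farm_yield.mean() - risk_aversion * farm_yield.var(ddof=1)

def best_mix(yields, risk_aversion, n_steps=20):
    """Grid-search planting shares in steps of 1/n_steps that sum to 1."""
    best_weights, best_score = None, -np.inf
    for combo in itertools.product(range(n_steps + 1), repeat=yields.shape[1]):
        if sum(combo) != n_steps:
            continue  # shares must cover the whole farm
        weights = np.array(combo) / n_steps
        candidate = score(weights, yields, risk_aversion)
        if candidate > best_score:
            best_weights, best_score = weights, candidate
    return best_weights, best_score

risk_aversion = 0.5  # assumed trade-off between yield and risk
heuristic = np.array([0.50, 0.25, 0.25])  # the rule of thumb from the quote
heuristic_score = score(heuristic, yields, risk_aversion)
best_weights, best_score = best_mix(yields, risk_aversion)

print("heuristic shares:", heuristic, "score:", round(heuristic_score, 3))
print("optimized shares:", best_weights, "score:", round(best_score, 3))
```

Because the 50/25/25 split is one of the candidates on the grid, the optimized mix can never score worse than it under this objective. How much better it does depends entirely on the made-up yields and the assumed risk penalty.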