Sunday, February 26, 2012

Side Project: Crop Planning Software

Elsewhere, I wrote about the beginning of growing season and some software
I've written to help us out this year.

The software I was talking about is very spartan right now. It tries to
serve exactly our needs, with just enough user interface so that we can get at
the information we need. If you notice where exactly on Launchpad I'm
currently hosting it, you'll get some idea about how much effort I've put into
making this a real, distributable, useful-to-anyone-else project so far.

What the software does at this point is this:

Load data from a (semi-)structured file (csv, because it's easy to create
and export data in this format using Open Office). The data it can load
describes certain crops and certain varieties of those crops, including
information about start and end of season, required growing days,
anticipated yields, etc.
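
As a rough illustration of that loading step (not the project's actual code), here is a minimal Python 3 sketch using the standard library's csv module. The column names and sample rows are hypothetical, since the real file layout isn't shown here.

```python
import csv
import io
from datetime import date

# Hypothetical columns and rows; the real input file's layout is an assumption.
SAMPLE = """\
crop,variety,season_start,season_end,growing_days,yield_lbs_per_bed_foot
Tomato,Example Red,2012-05-15,2012-09-30,75,2.5
Lettuce,Example Green,2012-04-01,2012-10-15,45,0.5
"""


def parse_date(text):
    """Turn 'YYYY-MM-DD' into a datetime.date."""
    year, month, day = (int(part) for part in text.split("-"))
    return date(year, month, day)


def load_varieties(csv_file):
    """Read one dict per variety, converting the typed columns."""
    varieties = []
    for row in csv.DictReader(csv_file):
        varieties.append({
            "crop": row["crop"],
            "variety": row["variety"],
            "season_start": parse_date(row["season_start"]),
            "season_end": parse_date(row["season_end"]),
            "growing_days": int(row["growing_days"]),
            "yield_lbs_per_bed_foot": float(row["yield_lbs_per_bed_foot"]),
        })
    return varieties


varieties = load_varieties(io.StringIO(SAMPLE))
```

In practice the file object would come from open() on the spreadsheet export rather than from an in-memory string.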

Plan out a seed order, based on that yield data and additional product data
(also in the input file). Doing this without wasting a ton of money ends up
being something like a solution to the covering problem, due to discounts
for buying greater quantities (sometimes unbelievable discounts, with
marginal costs for additional seed ranging as low as 5% of the base cost).
This is also a very tedious part of the program, as common suppliers offer
seed in well over a dozen different package sizes (with "packages"
with the same name containing different amounts of seed for different kinds
of vegetables, and of course different vegetables requiring different
amounts of seed to produce a particular yield).
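
To make the package-size problem concrete, here is one simple way to attack it: a dynamic-programming sketch that finds the cheapest combination of packages covering at least a required number of seeds. This is an illustration under assumptions, not necessarily how the planner does it. The function name, package sizes, and prices (in cents) are all made up.

```python
def cheapest_order(needed, packages):
    """Return (total_cost, {package_size: count}) covering at least `needed` seeds.

    `packages` is a list of (seed_count, price_in_cents) pairs; any package
    may be bought any number of times.
    """
    # best[n] = (cost, chosen package) for the cheapest way to cover n seeds.
    best = [(0, None)] + [None] * needed
    for n in range(1, needed + 1):
        options = []
        for size, price in packages:
            remaining = max(0, n - size)
            options.append((best[remaining][0] + price, (size, price)))
        best[n] = min(options)
    # Walk back through the table to recover which packages were chosen.
    order = {}
    n = needed
    while n > 0:
        size, _ = best[n][1]
        order[size] = order.get(size, 0) + 1
        n = max(0, n - size)
    return best[needed][0], order


# Hypothetical price list: bigger packages are far cheaper per seed.
PACKAGES = [(100, 395), (250, 595), (1000, 1195), (5000, 2995)]

small_cost, small_order = cheapest_order(1200, PACKAGES)
large_cost, large_order = cheapest_order(4000, PACKAGES)
```

With this made-up price list, needing 4000 seeds is cheapest as a single 5000-seed package, which is the kind of counterintuitive result the quantity discounts produce.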

Predict various kinds of resource usage at each point in the season.
Resources include things like bed feet (e.g., we have 22 beds, each 100 feet
long, so we have 2200 bed feet; our crop plan cannot exceed this, or we'll
have plants that have nowhere to be planted), plug flat usage (where seeds
are started and grow until they're hardy enough to be transplanted outside),
and man hours (there are two of us, and we don't want to plant so much that
we would need to hire help to deal with it).
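
The bed feet check can be sketched as follows. This is a minimal illustration, assuming each planting is described by the first and last day it occupies a bed and how many bed feet it needs; the function names and the sample plantings are hypothetical. Only the 22 beds of 100 feet come from the description above.

```python
from datetime import date, timedelta

TOTAL_BED_FEET = 22 * 100  # 22 beds, each 100 feet long


def bed_feet_in_use(plantings, day):
    """Sum the bed feet occupied on `day`.

    Each planting is (first_day_in_bed, last_day_in_bed, bed_feet).
    """
    return sum(feet for start, end, feet in plantings if start <= day <= end)


def overbooked_days(plantings, season_start, season_end, capacity=TOTAL_BED_FEET):
    """List every day in the season on which usage exceeds capacity."""
    days = []
    day = season_start
    while day <= season_end:
        if bed_feet_in_use(plantings, day) > capacity:
            days.append(day)
        day += timedelta(days=1)
    return days


# Hypothetical plantings that overlap in early June.
plantings = [
    (date(2012, 4, 1), date(2012, 6, 15), 1200),
    (date(2012, 5, 15), date(2012, 9, 30), 800),
    (date(2012, 6, 1), date(2012, 8, 1), 400),
]
over = overbooked_days(plantings, date(2012, 4, 1), date(2012, 9, 30))
```

The same day-by-day sum would work for plug flats or hours of labor by swapping in a different quantity and capacity.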

Generate a schedule of when to seed each variety, when to expect to
transplant them outdoors, and when to harvest them. The schedule can be
displayed as a list or it can be generated as an iCalendar file and loaded
into something like Google Calendar or Apple's iCal.
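
The date arithmetic behind such a schedule can be sketched like this. It is a simplified illustration that assumes each variety is described by a target transplant date, a number of days spent in the flat, and a number of days from transplant to harvest. Those parameters, the function name, and the sample varieties are assumptions, and the real scheduling logic is subtler than this.

```python
from datetime import date, timedelta


def schedule_variety(name, transplant_date, days_in_flat, days_to_harvest):
    """Return (date, description) events for one planting."""
    seed_date = transplant_date - timedelta(days=days_in_flat)
    harvest_date = transplant_date + timedelta(days=days_to_harvest)
    return [
        (seed_date, "Seed " + name),
        (transplant_date, "Transplant " + name),
        (harvest_date, "Harvest " + name),
    ]


# Hypothetical varieties; sorting the tuples orders the events by date.
events = sorted(
    schedule_variety("Example Tomato", date(2012, 5, 15), 42, 75)
    + schedule_variety("Example Lettuce", date(2012, 4, 10), 28, 45)
)

# The "displayed as a list" form of the schedule.
for when, what in events:
    print(when.isoformat(), what)
```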

These are all pretty basic pieces of information that someone growing
vegetables would want to know. On a small scale, they're the kinds of things
you can plan out in your head, or keep track of on paper. As you want to do
more, though, it can be overwhelming. For example, our schedule for this
season has 376 events on it. I wouldn't have wanted to generate that
manually.

There is also some rudimentary graphing functionality. This is for
visualizing some of the pieces of information I mentioned above (e.g., plug flat
usage). So far this part has been mostly for fun, as it's hard to make any
additional specific decisions based on the graphs, as opposed to the textual,
numerical output also generated. One thing it has been useful for, though, is
sanity checking the output. It's easier to see a crazy spike or a mysterious
plateau on a graph than in numerical data.

As far as the implementation goes, there's nothing really fancy going on here.
I've added a lot of features that I hadn't originally planned on (or realized
would be useful). As I mentioned, this is a new domain for me to be working
in. There is some unit test coverage now, but I didn't start out doing
test-driven development. This has bitten me a few times already, as some of
the scheduling logic is subtle enough that I can't change it without
introducing bugs. Fortunately that part of the code is somewhat well tested
now. Well, not completely untested, at least. Development has been
test-driven for a month or two now, so I expect things to get easier going
forward.

Everything is written in Python, of course. I used vobject to generate the
iCalendar output, with pytz to help with the timezone math (oh, timezones, how
I loathe you). A pleasantly small amount of code suffices for that.
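
For a sense of what the output looks like, here is a hand-rolled sketch that writes all-day events as iCalendar text using only the standard library. It deliberately does not use vobject or pytz, so it is not the code described above. The function name, PRODID, and UID scheme are made up, and it skips details a real library handles, such as escaping special characters in summaries and folding long lines.

```python
from datetime import date, datetime, timezone


def to_icalendar(events, now=None):
    """Serialize (date, summary) pairs as all-day events in iCalendar text."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//crop-plan-sketch//EN",  # hypothetical identifier
    ]
    for index, (when, summary) in enumerate(events):
        lines += [
            "BEGIN:VEVENT",
            # Hypothetical UID scheme: the event date plus its position.
            "UID:%s-%d@crop-plan.example" % (when.strftime("%Y%m%d"), index),
            "DTSTAMP:" + stamp,
            # VALUE=DATE marks an all-day event, which sidesteps timezones.
            "DTSTART;VALUE=DATE:" + when.strftime("%Y%m%d"),
            "SUMMARY:" + summary,
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    # RFC 5545 requires CRLF line endings.
    return "\r\n".join(lines) + "\r\n"


text = to_icalendar(
    [(date(2012, 4, 3), "Seed Example Tomato")],
    now=datetime(2012, 2, 26, tzinfo=timezone.utc),
)
```

Events with a specific time of day are where the timezone math comes in, and where a library like pytz earns its keep.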

I used matplotlib and dateutil to generate the graphs. I have a
tolerate/hate relationship with matplotlib. It clearly
does a lot of stuff, and I've seen people use it to good effect. Most of its
functionality escapes me, though, and I can hardly learn about a new API
without observing that it is completely terrible. Still, I used it because it
can do the job, and better than the other options, in my experience.

For the highly tedious structure definition, I used a class from Epsilon.
epsilon.structlike.record is a lot like the Python standard library
collections.namedtuple. Any time I use the latter, though, I remember how it
is implemented and I feel bad. So I stick to the former.
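
For readers who haven't met either one, this is the kind of lightweight record being described, shown with the standard library's collections.namedtuple because it needs no extra install. The field names are hypothetical, and Epsilon's record is the class actually used in the project.

```python
from collections import namedtuple

# Hypothetical fields; the project's real structure definitions differ.
Variety = namedtuple("Variety", "crop name growing_days")

tomato = Variety(crop="Tomato", name="Example Red", growing_days=75)

# Instances are immutable; _replace returns a modified copy.
later = tomato._replace(growing_days=80)
```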

I also used Twisted and html5lib to write
a simple web scraper to turn variety names into Johnny's product
identifiers. Even if ordering seeds this way ends up being a one-off task,
writing the scraper to get this information was definitely easier than chasing
down product identifiers in a Johnny's catalog or from the Johnny's website,
which each have their own... unique approach to organization. I
asked Johnny's if they could make this information available in any sort of
structured format and they told me they couldn't. Maybe I should sell it back
to them?

Many features are still missing from the planning software. Some of them are
simple, like reporting how many flats to seed in the iCalendar event it
generates, instead of just reporting how many bed feet will be used after the
seeds germinate and are transplanted out into the field. Others are a bit
bigger, like having a more coherent model for the underlying data. I might
want to put this off until the end of the season, when I might have a better
idea if I've fully understood the underlying data myself.

I don't expect this to be useful to a lot of people. In case this sort of
tool does appeal to you, though, I'd love feedback (particularly from people
more experienced with planning and executing these kinds of agricultural
tasks) - but no feature requests, please :)

Weird! I'm using Python 2.7, but I don't think that makes any difference in this case. This looks like a bug in dateutil. I'm using dateutil 1.4.1 (the version packaged for Ubuntu 11.10). It looks like you might have dateutil 2.0, which only works with Python 3.x (not Python 2.6 or 2.7). If that's the case, can you try going back to dateutil 1.4 or 1.5?
