A Critical Systems Blog
Thoughts on safe and secure embedded computer systems
https://leepike.wordpress.com
15 Minutes (February 9, 2015)

Some of the work I lead at Galois was highlighted in the initial story on 60 Minutes last night, a spot interviewing Dan Kaufman at DARPA. I'm Galois's principal investigator for the HACMS program, which focuses on building more reliable software for automobiles, aircraft, and other embedded systems. The piece provides a nice overview for the general public of why software security matters and what DARPA is doing about it; HACMS is one piece of that story.

I was busy getting married when filming was scheduled, but two of my colleagues (Dylan McNamee and Pat Hickey) appear in brief cameos in the segment (don’t blink!). Good work, folks! I’m proud of my team and the work we’ve accomplished so far.

You can see more details about how we have been building better programming languages for embedded systems and using them to build unpiloted air vehicle software here.

All Hammers Are Terrible (December 29, 2014)

"All computers are terrible." On social media, this turn of phrase is almost a meme. The sentiment is reasonable at some level—computers are complicated, buggy, and insecure. But the general public is largely indifferent, or at least accepts the situation for what it is: Computer freezes? Turn it off and on again. Smartphone insecure? Well, the convenience of digital wallets outweighs the risk. My parents, who are not computer scientists, never say that all computers are terrible.

Rather, this is a complaint from within, by computer scientists themselves. What drives this thinking? (While not specifically addressing the cause of the sentiment, this blog post provides a nice counterpoint.) No immediately obvious hypotheses seem particularly explanatory:

Ludditism: perhaps despite (or because of) our work in technology, computer scientists abhor it. But I find this explanation wanting, particularly because the focus of contempt is specifically about computers, and not about technological advancement generally.

Pessimism: perhaps there is a correlation between computer scientists and pessimists. Faced with the challenge of building good systems, they focus on the flaws rather than the progress. But it does not adequately explain the focus of the pessimism on computers rather than life in general—I don’t see posts stating that “all relationships are terrible”.

Realism: perhaps computers really are terrible. Computer science history is measured in decades, and our approach to building systems is immature compared to other engineering disciplines. I tore down my old garage this summer. The city required a blueprint, an erosion plan, a drainage plan, and a few hundred dollars for me to do so. They inspected the work afterwards. Contrast this with software: how much oversight does writing security-critical code get? Just about anyone can throw up code that purports to solve a problem. Still, given that programs are among the most complex artifacts built by humans, we're doing pretty well.

Narcissism: finally, perhaps a tinge of narcissism drives these comments, something along the lines of, "where others have failed, I will succeed." It is given as a tongue-in-cheek put-down of the work of fellow professionals. The motivation is there: if we thought that the tools of computer science were "good enough", then the world wouldn't need new programming languages, tools, or libraries. On the other hand, these kinds of complaints are not just about directly competing work, and there will always be new inventions to make.

I tend not to hear the same kind of complaints from members of other professions. Yes, we all complain about our jobs, but not necessarily about our tools. Carpenters don’t say, “All hammers are terrible,” even if they complain about their company, their coworkers, the architect, the inspectors. Doctors don’t complain about X-ray machines, MRIs, or scalpels (but they might complain about their patients, insurance, and administration). Farmers don’t complain about the design of combines and tractors. Now, they might complain about a specific instance—“My tractor is terrible; it breaks down all the time.” But not about tractors, in general.

Other professions largely take their tools as a given. Indeed, most practitioners know so little about how the tools they use are built that it is difficult for them to critique those tools. A carpenter does not know how a skill saw is built, and a doctor does not know how an X-ray is taken; they know just whether the tools work and whether they are useful.

Computer science is unique insofar as the tool makers are so closely connected with the tool users. There's another glib quip in computer science: "patches welcome", meaning that if you (the user) think a tool should be improved, then it is your responsibility to make those improvements yourself. In other fields, the saying does not exist; it would be laughable! A taxi driver can't be expected to invent a more fuel-efficient automobile; an architect can't be expected to build better modeling software.

Computer science is unique in this respect, and it has philosophical, psychological, and sociological implications. Imagine, if you will, you are a carpenter, and you have some understanding of how the tools that you have are designed and built. Your skill saw is right-handed, but you are left-handed; with a few modifications, you realize its design can be abstracted, to accommodate either handedness. When your drill dies, instead of simply replacing it, you open it up, find the faulty circuit, and realize there’s a weakness in the wiring that allows it to overheat. When each of your tools fails you in some way, you uncover some flaw and realize they could be improved. Finally, you come to expect failure from every tool you might ever use: “All hammers are terrible.”

We all know that technology—whether we understand how it is built or not—has flaws. We know tradeoffs are made between cost, generality, performance, and reliability. There are more and less reliable, performant, and costly cars, saws, X-ray machines.

The general public certainly has opinions and a sense of the tradeoffs between the tools they use, e.g., Windows vs. Mac OS, Android vs. iOS. But the sentiment is that these are tools, with flaws and lifespans, just like cars, appliances, and power tools.

I don’t have a strong opinion about whether computer scientists should complain or not about their tools, except, obviously, that general complaints don’t effect much change. My point is more philosophical: we find ourselves in an interesting profession, where we can open our very tools up, peer inside, and change them.

It’s a fascinating prospect.

Using GHC HEAD (October 10, 2014)

I wanted to try out a recent feature in GHC this week, so I was using HEAD. When I say using, I mean it: I wasn't developing with it, but using it to build Ivory, our safe C EDSL, as well as some libraries written in Ivory. I want to point out a few dragons that lie ahead for others using HEAD (7.9.xxx) today.

The Functor/Applicative/Monad hierarchy (as well as the Alternative/MonadPlus hierarchy) is no longer a warning but an error: a Monad instance without corresponding Functor and Applicative instances won't compile. You'll be adding a lot of instances to library code.
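The fix is mechanical but pervasive. Here's a minimal sketch of what the new obligatory instances look like, using a hypothetical Box type standing in for a library's monad (the names are mine, not from Ivory):

```haskell
-- Hypothetical library monad: pre-AMP it shipped only a Monad
-- instance; under GHC HEAD the Functor and Applicative instances
-- below are mandatory, not just a warning.
import Control.Monad (ap, liftM)

newtype Box a = Box { unBox :: a }

instance Functor Box where
  fmap = liftM            -- recovered from the Monad instance

instance Applicative Box where
  pure  = Box
  (<*>) = ap              -- recovered from the Monad instance

instance Monad Box where
  return = pure
  Box a >>= f = f a
```

Defining fmap and (<*>) via liftM and ap is the standard low-effort migration; the instances can always be tightened up later.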

You’ll need to update Cabal, which comes as a submodule with the GHC sources. Unfortunately, building Cabal was a pain because of the bootstrapping problem. The bootstrap.sh script in cabal-install is broken (e.g., outdated dependency versions). Getting it to work involved using runhaskell directly, modifying a bunch of Cabal’s dependencies, and getting a little help from Iavor Diatchki.

Some of the datatypes in Template Haskell have changed; notably, the datatypes for creating class constraints have been folded into the datatype for constructing types (constraint kinds!). Many libraries that depend on Template Haskell break as a result.

I won’t mention the issues with package dependency upper bounds. Oops, I just did.

Fortunately, once Cabal is updated, Cabal sandboxes work just fine. I wrote a simple shell script to swap out my sandboxes to switch between GHC versions. (It would be a nice feature if Cabal recognized you are using a different version of GHC than the one the cabal sandbox was originally made and created a new sandbox automatically.)
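The post doesn't show the script, but the idea only takes a couple of lines; a hypothetical sketch (the function name and layout are mine):

```shell
# swap_sandbox: keep one cabal sandbox directory per GHC version and
# point the .cabal-sandbox symlink at the one matching the compiler
# you're currently using.  (Sketch only; in practice cabal still needs
# to initialize each sandbox the first time you use it.)
swap_sandbox() {
  ver="$1"                        # e.g. 7.8 or 7.9
  sandbox=".cabal-sandbox-$ver"
  mkdir -p "$sandbox"             # placeholder for `cabal sandbox init`
  ln -sfn "$sandbox" .cabal-sandbox
}
```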

Because I’m on OSX and use Homebrew, I tried using it to manage my GHC versions, including those installed with Brew and those built and installed manually. It works great. When building GHC, configure it to install into your Cellar, or wherever you have Brew install packages. So I ran

> ./configure --prefix=/usr/local/Cellar/ghc/7.9

which makes Brew aware of the new version of GHC, despite being manually installed. After it’s installed,

> brew switch ghc 7.9

takes care of updating all your symbolic links. No more making "my-ghc-7.9" symbolic links or writing custom shell scripts.

SmartChecking Matt Might's Red-Black Trees (August 20, 2014)

Matt Might wrote a blog post on QuickChecking red-black trees. SmartCheck focuses on efficiently shrinking and generalizing large counterexamples, so I thought it'd be fun to try some of Matt's examples with SmartCheck.

The kinds of properties Matt checked really aren't in SmartCheck's sweet spot, since the counterexamples are so small (Matt didn't even have to define instances for shrink!).

SmartCheck is only interesting for failed properties, so let’s look at an early example in Matt’s blog post where something goes wrong. A lot of the blog post focuses on generating sufficiently constrained arbitrary red-black trees. In the section entitled, “A property for balanced black depth”, a property is given to check that the path from the root of a tree to every leaf passes through the same number of black nodes. An early generator for trees fails to satisfy the property.

To get the code to work with SmartCheck, we derive Typeable and Generic instances for the datatypes, and use GHC Generics to automatically derive instances for SmartCheck’s typeclass. The only other main issue is that SmartCheck doesn’t support a `forall` function like in QuickCheck. So instead of a call to QuickCheck such as

> quickCheck (forAll nrrTree prop_BlackBalanced)

we change the arbitrary instance to be the nrrTree generator.
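Putting the pieces together, the setup looks roughly like the following sketch (the datatype and property approximate Matt's definitions; the SmartCheck instance declarations themselves are elided):

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
{-# LANGUAGE DeriveGeneric      #-}
-- Approximation of the red-black tree setup; SmartCheck's generic
-- instances need Generic and Typeable on each datatype.
import Data.List (nub)
import Data.Typeable (Typeable)
import GHC.Generics (Generic)

data Color = R | B
  deriving (Show, Eq, Generic, Typeable)

data Tree = E | T Color Tree Int Tree
  deriving (Show, Eq, Generic, Typeable)

-- Number of black nodes along each root-to-leaf path.
blackDepths :: Tree -> [Int]
blackDepths E = [0]
blackDepths (T c l _ r) = map (+ inc c) (blackDepths l ++ blackDepths r)
  where inc B = 1
        inc R = 0

-- Every path must pass through the same number of black nodes.
prop_BlackBalanced :: Tree -> Bool
prop_BlackBalanced t = length (nub (blackDepths t)) == 1

-- The minimal counterexample SmartCheck reported, as a value:
counterexample :: Tree
counterexample = T R E 2 (T B E 5 E)
```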

Because it is so easy to find a small counterexample, SmartCheck’s reduction algorithm does a little bit of automatic shrinking, but not too much. For example, a typical minimal counterexample returned by SmartCheck looks like

T R E 2 (T B E 5 E)

which is about as small as possible. Now onto generalization!

There are three generalization phases in SmartCheck, but we’ll look at just one, in which a formula is returned that is universally quantified if every test case fails. For the test case above, SmartCheck returns the following formula:

forall values x0 x1:
T R E 2 (T B x1 5 x0)

Intuitively, this means that for any well-typed trees chosen that could replace the variables x0 and x1, the resulting formula is still a counterexample.

The benefit to developers is seeing instantly that those subterms in the counterexample probably don’t matter. The real issue is that E on the left is unbalanced with (T B E 5 E) on the right.

One of the early design decisions in SmartCheck was to focus on structurally shrinking datatypes and to essentially ignore "base types" like Int, Char, etc. The motivation was to improve efficiency when shrinking large counterexamples.

But for a case like this, generalizing base types would be interesting: we'd hypothetically get a formula in which the integer values are universally quantified as well, further generalizing the counterexample. It may be worth adding this behavior to SmartCheck.

SmartCheck's generalization begins to bridge the gap from specific counterexamples to formulas characterizing counterexamples. The idea is related to QuickSpec, another cool tool developed by Claessen and Hughes (and Smallbone). Moreover, it's a bridge between testing and verification, or as Matt puts it, from the 80% to the 20%.

SmartCheck: Redux (June 16, 2014)

A few years ago, I started playing with the idea of making counterexamples from failed tests easier to understand. (If you're like me, you spend a lot of time debugging.) Specifically, I started from QuickCheck, a test framework for Haskell that has since been ported to many other languages. From this, a tool called SmartCheck was born.

In this post, I’m not going to describe the technical details of SmartCheck. There’s a paper I wrote for that, and it was just accepted for publication at the 2014 Haskell Symposium (warning: the paper linked to will change slightly as I prepare the final version).

Rather, I want to give a bit of perspective on the project.

As one reviewer for the paper noted, the paper is not fundamentally about functional programming but about testing. This is exactly right. From the perspective of software testing in general, I believe there are two novel contributions (correct me ASAP if I’m wrong!):

It combines counterexample generation with counterexample reduction, as opposed to delta-debugging-based approaches, which perform deterministic reduction given a specific counterexample. One possible benefit is that SmartCheck can help avoid stopping at local minima during reduction, since new random values are generated while shrinking. Update: as John Regehr points out in the comments below, his group has already done this in the domain of C programs. See the paper.

Perhaps the coolest contribution is generalizing sets of counterexamples into formulas that characterize them.

I’d be interested to see how the work would be received in the software testing world, but I suppose first it’d need to be ported to a more mainstream language, like C/C++/Java.

Compared to QuickCheck specifically: QuickCheck didn't always have generics for implementing shrinking functions; recent versions include them, and they're quite good. In many cases, SmartCheck outperforms QuickCheck, generating smaller counterexamples faster. Features like counterexample generalization are unique to SmartCheck, but being able to test functions is unique to QuickCheck. Moreover, I should say that SmartCheck uses QuickCheck in its implementation to do some of the work of finding a counterexample, so thanks, QuickCheck developers!

When you have a tool that purports to improve on QuickCheck in some respects, it’s natural to look for programs/libraries that use QuickCheck to test it. I found that surprisingly few Hackage packages have QuickCheck test suites, particularly given how well known QuickCheck is. The one quintessential program that does contain QuickCheck tests is Xmonad, but I challenge you to think of a few more off the top of your head! This really is a shame.

The lack of (public) real-world examples is a challenge when developing new testing tools, especially when you want to compare your tool against previous approaches. More generally, in testing, there seems to be a lack of standardized benchmarks. What we really need is an analogue of the SPEC CPU2000 performance benchmarks for property-based testing, in Haskell or any other language for that matter. I think this would be a great contribution to the community.

In 1980, Boyer and Moore developed a linear-time majority vote algorithm and verified an implementation of it. It took until 1991 to publish it after many failed tries. Indeed, in the decade between invention and publication, others had generalized their work, and it being superseded was one of the reasons reviewers gave for rejecting it! (The full story can be found here.)

To a small extent, I can empathize. I submitted a (slightly rushed) version of the paper a couple years ago to ICFP in what was a year of record submissions. One reviewer was positive, one lukewarm, and one negative. I didn't think about the paper much over the following year or two, but I got a couple of requests to put the project on Hackage, a couple of reports on usage, and a couple of inquiries about how to cite the draft paper. So after making a few improvements to the paper and implementation, I decided to try again to publish it, and it finally will be.

As I noted above, this is not particularly a Haskell paper. However, an exciting aspect of the Haskell community, and more generally, the functional programming community, is that it is often exploring the fringes of what is considered to be core programming language research at the time. I’m happy to be part of the fringe.

Embedded Security in the Mainstream (April 1, 2014)

Science Friday had an interesting podcast with Bruce Schneier worth a listen. It discussed security in embedded systems (or, as is hip to say now, the Internet of Things). The idea of security in embedded systems seemed pretty obscure just a few years ago, so it's nice to see it being featured on a general-audience radio show.

So who’s going to listen to it from their phone, connected to their car’s audio over Bluetooth, on the way to the store to buy a Nest smoke alarm?

Stop Regression Testing (December 15, 2013)

Some of my colleagues and I have been building a large software application, the open-source SMACCMPilot autopilot system, using Haskell-based embedded domain-specific languages (EDSLs) to generate embedded C code. Our thesis is that EDSLs both increase our productivity and increase the assurance of our application; so far, we believe our thesis is holding up: we've created a complex 50k lines-of-code application in under two engineer-years that is (mostly) free of undefined and memory-unsafe behavior.

Of course, there are always bugs. Even with a fully formally-verified system, there are likely bugs in the specification/requirements, bugs in the hardware, or bugs in the users. But our hope is that most of our bugs now are logical bugs rather than, say, arithmetic bugs (e.g., underflow/overflow, division by zero) or resource bugs (e.g., null-pointer dereference or buffer overflows).

However, since the EDSL we are using, Ivory, is relatively young—we’ve been writing SMACCMPilot in the language while developing the language—there have been cases where we have not checked for unsafe programs as stringently as possible. For example, while we insert arithmetic underflow/overflow checks in general, we (ahem, I) neglected to do so for initializers for local variables, so an initializer might overflow with an unexpected result.

If we were writing our application in a typical compiled language, even a high-level one, and found such a bug in the compiler, we would perhaps file a bug report with the developers. If it were an in-house language, things would be simpler, but modifying a large stand-alone compiler can be difficult (and introduce new bugs). Most likely, the compiler would not change, and we’d either make some ad-hoc work-around or introduce regression tests to make sure that the specific bug found isn’t hit again.

But with an EDSL, the situation is different. With a small AST, it’s easy to write new passes or inspect passes for errors. Rebuilding the compiler takes seconds. Fixing the bug with initializers took me a few minutes.

So introducing regression tests about the behavior of initializers isn’t necessary; we’ve changed the compiler to eliminate that class of bugs in the future.

More generally, we have a different mindset programming in an EDSL: if a kind of bug occurs a few times, we change the language/compiler to eliminate it (or at least to automatically insert verification conditions to check for it). Instead of a growing test-suite, we have a growing set of checks in the compiler, to help eliminate both our bugs and the bugs in all future Ivory programs.
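As a toy illustration of that mindset (this is not Ivory's actual AST, just a sketch): once a bug class like unchecked initializers is identified, one small compiler pass removes it for every program, present and future.

```haskell
-- A miniature expression/statement language standing in for the EDSL's AST.
data Expr = Lit Int
          | Add Expr Expr
          | Checked Expr        -- marks an inserted overflow-check guard
  deriving (Show, Eq)

data Stmt = Init String Expr    -- local-variable initializer
          | Assign String Expr
  deriving (Show, Eq)

-- The compiler fix: a single pass that wraps every initializer in a
-- check, so this bug class can't recur in any program the EDSL emits.
guardInits :: [Stmt] -> [Stmt]
guardInits = map go
  where
    go (Init v e) = Init v (Checked e)
    go s          = s
```

Contrast this with a regression test, which would only cover the one initializer where the bug was first observed.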

Edit (16 Dec. 2013): a few folks have noted that the title is a bit strong: it absolutely is, that’s probably why you read it. :) Yes, you should (regression) test your applications. You might consider testing your compiler, too. Indeed, the point is that modifying the compiler can sometimes eliminate testing for a class of bugs and sometimes simply supports testing by inserting checks automatically into your code.

Printf No More (October 23, 2013)

I wondered to a colleague if there is a way to get GDB to print the value of a variable when it changes, or when a certain program location is reached, without breaking; in effect, placing printfs in my program automatically.

His Google-fu was better than mine, and he found a Stack Overflow solution. For example, something like

set pagination off
break foo.c:42
commands
silent
print x
cont
end

prints the value of x whenever line 42 in file foo.c is reached, but just keeps running the program. You can stick a little script like this in your .gdbinit file, which you can reload with

source .gdbinit

This is particularly useful if you’re debugging a remote target that doesn’t support print statements. It’s a cool trick for embedded systems debugging.

SMACCMPilot (October 7, 2013)

Over on the Galois blog is a post about my current project, building a secure high-assurance autopilot called SMACCMPilot. SMACCMPilot is open-source; http://smaccmpilot.org/ is the project's landing page, describing the technologies going into the project with links to the software repositories. Check it out!

Galois's main approach to building SMACCMPilot is to use embedded domain-specific languages (EDSLs), embedded in Haskell, that compile down to restricted versions of C. (If you're not familiar with the "EDSL approach", and particularly how it might improve the assurance of your systems, check out this paper we wrote about our experiences on a separate NASA-sponsored project.) The project is quickly becoming one of the larger EDSL projects I know about, and it's still relatively early in its development. The EDSLs we're developing for SMACCMPilot, Ivory (for embedded C code generation) and Tower (for OS task setup and communication), are suitable for other embedded systems projects as well.

Like many young and active open-source projects, the code is hot off the presses and under flux, and the documentation always lags the implementation, so let me know if you try using any of the artifacts and have problems. We’re happy to have new users and contributors!

Book Review: Automate This (May 27, 2013)

A family member gave me a copy of Automate This by Christopher Steiner as a gift a few months ago. The subtitle of the book is "How algorithms came to rule our world." The book is a non-technical, fast, and easy read. I did read a few interesting stories, such as NASA's use of personality-categorization algorithms in the 70s to predict which potential astronauts would work well together, or a math professor's algorithmic dissection of Beatles songs. The book particularly emphasizes algorithmic trading on Wall Street. The somewhat non sequitur conclusion is that we need more science, engineering, and math graduates.

The main point of the book is that algorithms are pervasive and will become more so. Toward this point, I think the author could have cast an even wider net, mentioning that algorithms are implemented in everything from elevator controllers to autopilots. There is a cursory pre-computing history of algorithms at the beginning of the book that is (tenuously) tied to Wall Street trading.

Rather than focusing on algorithms broadly, the book's coverage is patchy, hitting on machine learning, high-speed trading, and game theory. Some mention of algorithms as they might be taught in an "Analysis of Algorithms" course, covering basic topics like time-space complexity and decidability, would help prevent the general public from having a narrow interpretation of algorithms. Computer scientists invent, analyze, and implement algorithms every day, but much of that work is perhaps not sexy enough for a popular science book.

Incidentally, for the topics Steiner does cover, an excellent companion treating similar topics, but with a different focus, is Nate Silver’s (of 538 fame) book, The Signal and the Noise.

A Dane Goes to NASA (November 3, 2012)

Last year, Nis Wegmann came from the University of Copenhagen to work with me on a NASA-sponsored project. He was recently interviewed about his experiences.

Nis worked at the National Institute of Aerospace in Virginia, just outside the NASA Langley gates. Since Galois is on the other side of the U.S., we mostly collaborated remotely. (For a taste of what it's like to intern at Galois itself, check out Ed Yang's post.) Nis is an outstanding programmer and a great writer, so working together was effortless, despite the distance.

Good luck in your future endeavors, Nis!

The Worst Offenders (October 13, 2012)

I listened to a Freakonomics podcast the other day that told a story about hand-washing at Cedars-Sinai Medical Center. It turns out that of all the staff, the doctors had the lowest rate of hand-washing (around 65%, if I recall correctly) in the hospital. This was surprising, since the doctors supposedly have the best education and so should know the most about the dangers of bacterial infection.

The hospital's chief of staff tried different means of increasing hygiene. Incentives like gift cards were tried. Then he tried another approach: shame. Doctors had their hands cultured so that they could see the bacteria on them. Images of the cultures were even used as a hospital-wide screensaver on the computers.

I listened to this wondering if there are any parallels to formal verification engineers and researchers. Ostensibly, we are the experts on the perils of software bugs. Academic papers often begin with some story about bugs in the wild, motivating the new formal verification technique described therein.

But to what extent do we practice what we preach? How many of us might write a Python script with no unit tests? C/C++ programs without running Lint/CppCheck? Compile without fixing all the -Wall warnings the compiler emits?

These kinds of activities represent the lowest rung of software assurance; they're the "hand-washing of software assurance", if you will. I'm certainly guilty myself of not always practicing good "software hygiene". The justification you might give for failing to do so is that you're just writing a program for personal use, or that you're writing prototype software, "just for research". I could imagine doctors telling themselves something similar: "I always scrub thoroughly before a major surgery, but it's not such a big deal for a simple office visit." But this justification can become a slippery slope.

It could be an interesting empirical study to analyze the software quality of open-source formal verification tools to see just how buggy they are (spoiler alert: they are buggy). What I’m interested in are not deep flaws that might be caught via theorem-proving, but simple ones that could be caught with a little better test coverage or lightweight static analysis.

Lowering the Bar (October 2, 2012)

I gave a talk (video, slides, and paper) at ICFP last month, arguing that it can be easy to build a high-assurance compiler. I gave a similar talk as a keynote a couple weeks later at the very enjoyable Midwest Verification Day, hosted by Kansas University this year (thanks Andy Gill and Perry Alexander for inviting me!). This paper wraps up the Copilot project. I had a great time (I mean, how often do formal methods engineers get to be around NASA subscale jet aircraft?!?).

SmartCheck (July 26, 2012)

I've been working on a Haskell library for testing Haskell programs that I call SmartCheck. SmartCheck is focused on testing algebraic data and generalizing the counterexamples found. Below is the README for SmartCheck, which is located on GitHub (I haven't put it on Hackage yet). The following is a high-level explanation that doesn't go into details about the algorithms or implementation (that's another post!).

I’d be interested in feedback on

Real-world examples to try SmartCheck on.

Whether there are other interesting ways to generalize counterexamples.

Whether there's similar work out there I should know about (in addition to QuickCheck and SmallCheck).

Your experiences, if you try the library out.

Thanks!

Synopsis

SmartCheck is a smarter QuickCheck, a powerful testing library for Haskell. The purpose of SmartCheck is to help you more quickly get to the heart of a bug and to quickly discover each possible way that a property may fail.

SmartCheck is useful for debugging programs operating on algebraic datatypes. When a property is true, SmartCheck is just like QuickCheck (SmartCheck uses QuickCheck as a backend). When a property fails, SmartCheck kicks into gear. First, it attempts to find a minimal counterexample to the property in a robust, systematic way. (You do not need to define any custom shrink instances, as with QuickCheck, but if you do, those are used. SmartCheck can usually do much better than even custom shrink instances.) Second, once a minimal counterexample is found, SmartCheck attempts to generalize the failed value d by replacing d's substructures with new values to make d', and QuickChecking each new d'. If the property also fails for each new d' generated, we claim the property fails for any substructure replaced here (of course, this is true modulo the coverage of the tests).

SmartCheck executes in a read-eval-print loop. In each iteration, all values that have the “same shape” as the generalized value are removed from the set of possible tests created. The loop can be iterated until a fixed point is reached, at which point SmartCheck is not able to create any new values that fail the property.

A typical example

In the package there is an examples directory containing a number of examples. Let’s look at the simplest, Div0.hs.

Because SmartCheck performs data-generic operations using GHC.Generics, we have to derive Typeable and Generic for our datatype, which requires the DeriveDataTypeable and DeriveGeneric pragmas; the single Generic instance itself is automatically derived.

In this example, we won’t redefine any of QuickCheck’s standard arguments, but it’s certainly possible. The treeShow field tells SmartCheck whether you want generalized counterexamples shown in a tree format or printed as a long string (the default is the tree format).
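Since the README’s code listings did not survive here, a minimal sketch of the Div0 example may help. This is a hedged reconstruction: only divSubTerms and the file name Div0.hs appear in the text; the datatype and the names Exp, eval, and prop_div are my approximations, and the Typeable/Generic deriving SmartCheck needs is omitted so the sketch runs with base alone.

```haskell
import Data.Maybe (isJust)

-- Arithmetic expressions: constants, addition, division.
data Exp = C Int | Add Exp Exp | Div Exp Exp
  deriving (Eq, Show)

-- Evaluation returns Nothing on division by zero.
eval :: Exp -> Maybe Int
eval (C i)     = Just i
eval (Add a b) = (+) <$> eval a <*> eval b
eval (Div a b) = case eval b of
  Just 0 -> Nothing
  _      -> div <$> eval a <*> eval b

-- The buggy precondition from the post: it rules out division by a
-- *constant* zero only, pattern-matching on constants.
divSubTerms :: Exp -> Bool
divSubTerms (C _)         = True
divSubTerms (Div _ (C 0)) = False
divSubTerms (Add a b)     = divSubTerms a && divSubTerms b
divSubTerms (Div a b)     = divSubTerms a && divSubTerms b

-- The property: expressions meeting the precondition evaluate successfully.
prop_div :: Exp -> Bool
prop_div e = not (divSubTerms e) || isJust (eval e)

main :: IO ()
main = do
  -- A counterexample of the generalized shape SmartCheck reports:
  -- the divisor is not the constant 0, yet it evaluates to 0.
  let bad = Div (C 1) (Add (C 0) (C 0))
  print (divSubTerms bad) -- True: the precondition holds ...
  print (eval bad)        -- Nothing: ... but evaluation fails
  print (prop_div bad)    -- False: the property fails
```

Running SmartCheck against a property like prop_div is what drives the shrinking and generalization steps walked through below.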

The counterexample QuickCheck first returns is confusing, and for such a simple property and small datatype! SmartCheck takes the output from QuickCheck and tries systematic shrinking on the one failed test case, somewhat like SmallCheck might. We get the following reduced counterexample:

OK, that’s some progress! Now SmartCheck attempts to generalize this (local) minimal counterexample. SmartCheck has two generalization steps, which we’ll explain separately, although SmartCheck combines their results in practice (you can turn off each kind of generalization via flags). First, SmartCheck tries to generalize values in the shrunk counterexample. SmartCheck returns

Aha! We see that for any possible subvalue x0, the above value fails. Our precondition divSubTerms did not account for the possibility of a non-terminal divisor evaluating to 0; we only pattern-matched on constants.

In addition, SmartCheck tries to do something I call constructor generalization. For a datatype with a finite number of constructors, the idea is to see whether, for each subvalue in the counterexample, there are subvalues that also fail the property, using every possible constructor in the datatype. So for example, for our counterexample above

We find another counterexample; this time, the main constructor is addition.

We might ask SmartCheck to find another counterexample:

...
*** Extrapolating ...
*** Could not extrapolate a new value; done.

At this point, SmartCheck can’t find a newly-shaped counterexample. (This doesn’t mean there aren’t more; you can increase the standard arguments to QuickCheck to allow more failed tests before giving up (maxDiscard) or increase the size of tests (maxSize). Or you could simply keep running the read-eval-print loop.)

A Fistful of Dollars
https://leepike.wordpress.com/2012/03/07/a-fistful-of-dollars/
Thu, 08 Mar 2012 05:13:26 +0000

A while back I attended an excellent DARPA Proposers’ Day. It got me thinking, “If I had a fistful of (a few million) dollars to provide researchers, what sort of program might I start?” Two ideas struck me:

Getting software right, fast

Software is usually developed in a world with incomplete requirements and with users (and attackers) who try to break your assumptions. Furthermore, more often than not, developers themselves (perhaps with a QA team, if they’re lucky) largely decide when they’re “done”: when a version is bug-free enough, secure enough, and meets the specifications well enough to release.

A major portion of the challenge would be funding experiments that are large enough to draw statistical conclusions about different approaches. Teams would get points for finishing faster but lose points for bugs, security vulnerabilities, unimplemented features, etc. uncovered by a separate “red team”. To make the experience more realistic, I’d have career programmers (not students in an undergrad course) participate in the experiments. Much of the program would be just designing the experiments to ensure statistical conclusions could be drawn from them, but I imagine the results might be enlightening.

A Belief Search Engine

Imagine a “search” engine in which you could search not for facts but for beliefs. For example, I might query the likelihood that someone in Russia in 1600 believed that the Earth revolves around the Sun. Or I might ask how likely someone in Japan in 2012 is to believe in global warming. Or whatever else. The engine would return some probability, perhaps with a confidence interval. I’d imagine being able to query a belief based on geography, time period, religion, nationality, sex, and so forth. Of course this is a hard, hard problem, and would likely involve search as well as technology akin to IBM’s Watson. I’d be curious to know if it were feasible at all.

When Formal Systems Kill
https://leepike.wordpress.com/2012/02/18/when-formal-systems-kill/
Sun, 19 Feb 2012 00:54:17 +0000

What happens when a philosophy professor (with an M.S. in Computer Science) and a formal methods engineer (like me) collaborate? The outcome in my case is a paper entitled “When Formal Systems Kill: Computer Ethics and Formal Methods” (pdf). In the paper, we give an ethical treatment of the use of formal methods. Our goal in the paper is not so much to make definitive claims on the subject as to raise awareness of the issue for further philosophical treatment. The subject has been neglected by ethics philosophers.

Darren (my coauthor) and I had a hard time deciding where to try to publish this. We ended up submitting it to the American Philosophical Association’s (APA) newsletter on Philosophy and Computers, where it was accepted for the Fall 2011 edition. (The APA is the “ACM of Philosophy” within the United States.)

The paper was a number of years in the making, being a side-project for me. I can’t say I agree with everything we wrote, but I hope it serves as a useful starting point for the dialogue between ethics and formal methods.

Who’s Afraid of Software?
https://leepike.wordpress.com/2012/01/19/whos-afraid-of-software/
Fri, 20 Jan 2012 06:46:03 +0000

Who’s afraid of software? I mean viscerally, stomach-knotting afraid. Afraid like you might be when you come across a snake or a bear, or when you are mugged. Do you obsess about a phishing attack each time you open your email? Do you worry there’s an eavesdropper when you join the open wifi access point in a coffee shop? Do you worry the software in your modern automobile or aircraft will fail?

I listened to a Freakonomics podcast about risk, uncertainty, and beliefs. One point made during the show was that our fears are shaped by evolution—to our ancestors, it made sense to be afraid of threatening animals. In modern life, however, our fears don’t match risks—we’d be much better off being afraid of cheeseburgers, as pointed out in the show. Some people are afraid of modern risks. I know people afraid of cancer, for example.

That got me thinking about fearing software. Software is certainly among the most complex artifacts created by humans. Modern cars contain 100+ million lines of code. Nearly every day there is a story about a large corporation being hacked and of cyber-warfare between nations.

My question is serious; I really do wonder if people are genuinely afraid of software. I work in the area of software assurance, and while I take precautions against viruses, phishing attacks, etc., I don’t particularly worry about software failures, even when my life might depend on it. This is despite issues just last year like this and this in automotive software. I get to see somewhat how the sausage is made, and in general, we exercise only a small fraction of the state space of deployed software in validation and in actual usage. There are legitimate risks, but there seems to be very little fear.

Perhaps, like medical doctors stereotypically neglecting their own health, I don’t worry day-to-day about software assurance despite working in the field. But it seems nobody else really fears software, either.

In the podcast, the topic of polarizing claims, like global warming, is discussed. Outside of academic circles, one’s views on the risks of software are not so polarizing: your views on the topic won’t cause your friends or colleagues to disparage you (indeed, if anything, the main risk is likely boring others by discussing the topic!). I wonder just what the “global warming” of software might be in the future.

Copilot featured in Aerospace America
https://leepike.wordpress.com/2011/12/10/copilot-featured-in-aerospace-america/
Sat, 10 Dec 2011 19:03:36 +0000

My Copilot project, sponsored by NASA, was featured in the Aerospace America magazine “Year in Review” edition (December 2011). We recently finished up the project, which focused on the problem of failure in complex embedded systems (like flight control systems in commercial aircraft). Our approach was a language called Copilot for runtime monitoring of embedded systems, corroborated with experiments on airspeed sensor systems and an autopilot communication link.

Here’s a link to the page of the magazine that mentions us, via the author’s page. (The relevant paragraph is in the middle of the right column.)

Thanks to Galois and the National Institute of Aerospace where the work was done, the NASA folks we collaborated with, and especially Alwyn Goodloe, Nis Wegmann, Sebastian Niller, and Robin Morisset who worked on the project with me.

Stable Names in Haskell
https://leepike.wordpress.com/2011/11/26/stable-names-in-haskell/
Sat, 26 Nov 2011 23:37:39 +0000

Stable names in GHC “are a way of performing fast (O(1)), not-quite-exact comparison between objects.” Andy Gill showed how to use them to extract the explicit graph from writing recursive functions in his Data.Reify package (and associated paper). It’s a great idea and very practical for embedded domain-specific languages; we’ve used the idea in Copilot to recover sharing.

For a function with type constraints, stable names fail to “realize” that we are pointing to the same object. As a couple of my colleagues pointed out, the cause is the dictionary being passed around, which creates new closures. Simon Marlow noted that if you compile with -O, the explicit dictionaries get optimized away.
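A small experiment illustrates the issue. This is a sketch, since the post’s original example did not survive extraction; monoVal and polyVal are illustrative names of my own, not from the post.

```haskell
import System.Mem.StableName (makeStableName)

-- A monomorphic top-level value: a single heap object.
monoVal :: [Int]
monoVal = [1, 2, 3]

-- A polymorphic value: each use at a concrete type applies a hidden
-- Num dictionary, which can build a fresh closure per occurrence.
polyVal :: Num a => [a]
polyVal = [1, 2, 3]

main :: IO ()
main = do
  m1 <- makeStableName monoVal
  m2 <- makeStableName monoVal
  print (m1 == m2) -- True: both names point at the same object

  p1 <- makeStableName (polyVal :: [Int])
  p2 <- makeStableName (polyVal :: [Int])
  -- Typically False without -O (a fresh dictionary-application thunk
  -- per occurrence); with -O, GHC may share the closure.
  print (p1 == p2)
```

The second comparison is exactly the situation a DSL reifier hits when users write polymorphic recursive expressions, which motivates the workarounds below.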

Here are the solutions I have to “fixing” the problem, in the context of a DSL:

Tell your users that recursive expressions must be monomorphic—only “pure functions” over the expressions of your DSL can be polymorphic.

Implement a check in your reifier on how many stable names have been created: if some upper bound is exceeded, then either the user has created an infinite expression, the expression is extremely large (in which case the user should try a sharing mechanism, such as let-expressions inside the language), or we’ve hit a stable-names issue.

Ensure your DSL programs are always compiled.

Of course, you can always take another approach, like Template Haskell or not using recursion at the Haskell level; also check out Andy Gill’s paper for other solutions to the observable sharing problem.

I don’t see how to use (deep)seq to fix the problem, at least as it’s presented in the example above, but I’d be keen to know if there are other solutions.

FM@Galois Talks
https://leepike.wordpress.com/2011/11/10/fmgalois-talks/
Fri, 11 Nov 2011 06:33:35 +0000

As I posted over on the Galois blog, I recently gave a talk (a couple of times) on formal methods at Galois over the years (pdf). It’s been fun putting theory into practice!