Sunday, July 1, 2007

Breaking down the thought process of computer science

Last month, Language Log had a post on how people learn to think like their professions. The article was written by a linguist, reporting on recent books by a doctor and a lawyer. According to the article, doctors and lawyers learn to think like their professions by comparing information obtained from patients/clients against the treatments and case law that they studied in school (I haven't read either book yet). From a computer science perspective, the description resembles searching a mental database for facts that match the situation at hand. The significant challenge here is presumably in figuring out which queries to pose against that mental database, as the queries reflect which components of a situation are likely to be relevant, which need to be explored in tandem, which need to be generalized, etc. Having a good mental database is obviously also important, but the query construction process seems more reflective of how professionals think.

How does this compare to thinking like a computer scientist? As with most disciplines, we draw on experience and compare situations to figure out how to solve problems. Query formation remains a significant component of how we think as professionals: a computer scientist needs to know what questions to ask about performance, security, reliability, usability, and a host of other system-related -ilities. But our mental database construction problem seems more substantial as well because of the volatile, unregulated, and still mysterious nature of computational systems.

Both law and medicine build heavily on precedence and legal bounds on practice; this shapes the space in which they search for problem solutions. Computing lacks the legal regulation of medicine and law (recall Parnas' oft-cited call to replace disclaimers with warranties in software). Many doctors and lawyers deal mostly with cases that fit existing precedents (the challenge becomes which ones to apply, but the diseases or situations themselves don't change as fast as computing technologies). Law seems to deal with fewer interacting agents than medical or computing problems; medicine seems to have a richer set of diagnostics for exploring how treatments behaves than we often have for computing systems. Living organisms also seem more fault-tolerant, on the whole, than computing systems, which are still very brittle. On the flip side, computing systems lack the complexity of the human brain or body, but I suspect more average computing professionals have to confront our limits of complexity on a daily basis than do the average doctors (who can refer patients to specialists for complex cases).

When we train students to be computer scientists, we really need to train them in the science of how discrete (as opposed to continuous) systems break. They need to think about how someone might attack the system, circumvent it, or use it for harm. They need to think about how to keep the system maintainable in light of new features and new technologies. Our mental databases really need as many facts about which decisions lead to what problems as well as which lead to what solutions. This is somewhat true of medicine, but I again suspect that average programmers deal with this more often than average doctors (beyond drug interactions, which are fairly well documented).

Not many computing programs really take the study of breakage seriously. We spend a lot of time focusing on systems that do work, that are correct, that perform well, etc. These are all necessary and valuable. But when your goal is to make something break, you ask yourself different questions than when your goal is to make it work: there's a continuum from "working" to "broken" and the missing questions lie in the middle. How many students really learn to stress-test their code, to inject faults and study code behavior, to put their work in the hands of novice users, to attack others' systems so they can think about how someone might attack their own? We have the luxury of working in a science of the artificial in which we can try to break something without compromising an organism's health. How could we best exploit that opportunity within the time constraints of a university education?

2 comments:

"How many students really learn to stress-test their code, to inject faults and study code behavior, to put their work in the hands of novice users, to attack others' systems so they can think about how someone might attack their own?"

From an academic perspective, I do think this is a weak point. In most CS classes at WPI, students are told that their code has to compile and pass a handful of test cases (which are often known at the start of the assignment). Essentially, it just has to be "good enough" and that's the lesson students take away from their education.

For me, the revelation that I probably should do more to test my code came when my MS thesis adviser made the off-hand remark that unchecked errors and edge-cases are how you can tell student code from professional code. At first I was a little hurt, but now, having worked with professional code and with a thorough QA process, I see that he was right.

So, how to fit the study of code quality and stability into a 4-year curriculum? The best thing to do is focus on it throughout all classes. Make it necessary in order to receive an A, especially in the higher-level classes. Students should be taught that "good enough" really isn't.