Molevolve: Library for modeling peptide structures using GAs

Molevolve is a Java library for running a Genetic Algorithm (GA) to model the 3-dimensional structures of peptide chains from amino-acid sequences. Client code can specify its own peptide chain model, fitness functions and GA operations.

Molevolve is released under the open-source GNU Lesser General Public License (LGPL). This means you can download and run Molevolve for free for personal or commercial use. Furthermore, you can incorporate or bundle Molevolve into your own products as long as you abide by the conditions of the LGPL.

I know some of our bloggers work in this area. Any thoughts or experiences?

Oei thesis available again

Chris Oei's fine MS thesis (C. K. Oei (1992). Walsh function analysis of genetic algorithms of non-binary strings. Master's thesis, University of Illinois at Urbana-Champaign, Department of Computer Science, Urbana, IL) is available again here. Although it is 13 years old, it still deserves a close reading, if you haven't done so already.

Matlab toolbox list available

GAs used in sequence analysis

A paper published by a team at the University of Southampton reports the use of genetic algorithms to create biologically interpretable blocks within a hidden Markov model for genetic sequence analysis:

The Hidden Markov Models (HMMs) are widely used for biological sequence analysis because of their ability to incorporate biological information in their structure. An automatic means of optimising the structure of HMMs would be highly desirable. To maintain biologically interpretable blocks inside the HMM, we used a Genetic Algorithm (GA) that has HMM blocks in its coding representation. We developed special genetics operations that maintain the useful HMM blocks. To prevent over-fitting a separate data set is used for comparing the performance of the HMMs to that used for the Baum-Welch training. The performance of this algorithm is applied to finding HMM structures for the promoter and coding region of C. jejuni. The GA-HMM was capable of finding a superior HMM to a hand-coded HMM designed for the same task which has been published in the literature.

Company uses GAs in marketing applications

In an article on utility computing (time-sharing on steroids?) I noticed a mention of a company called Genalytics that uses genetic algorithms in a variety of marketing applications. A list of typical services is given here, and the various product offerings are given here. The chief architect behind the evolutionary computing system in Genalytics packages is Whit Stockwell. The company web site is spotty on GA details, but EvoWeb has a little more talk of the GA guts here.

Wednesday, March 30, 2005

Graph my scenario

Rob Smith blogged about IBM tools for document evolution visualization here. Visualization is taking a more prominent place on the path to document and communication analysis. In DISCUS, as mentioned in previous posts here and here, visualization of communications plays a central role in supporting creativity and innovation.

One tool that we have been using extensively for KeyGraph visualization is JUNG--an open source project.

JUNG--the Java Universal Network/Graph Framework--is a software library that provides a common and extendible language for the modeling, analysis, and visualization of data that can be represented as a graph or network. It is written in Java, which allows JUNG-based applications to make use of the extensive built-in capabilities of the Java API, as well as those of other existing third-party Java libraries.

Tuesday, March 29, 2005

Learning Classifier Systems and Model Building

I'd like to point out two recent IlliGAL reports written by Martin Butz et al., namely the IlliGAL Reports No. 2005010 and 2005011, which discuss the use of Bayesian networks of BOA and marginal product models of ECGA in the well-known XCS learning classifier system of Stewart W. Wilson. This work shows that just as machine learning helped genetic algorithms to automatically identify and process important building blocks, it can help learning classifier systems to automatically identify and process more complex features (making them more broadly applicable as a result). These are just first steps, but I think that they are important ones. I hope to see more papers related to this topic in future.

This topic also reminds me of an analogy I thought about when writing my dissertation a few years back. I think that feature extraction in machine learning and effective processing of building blocks (especially the mixing) are very closely related. While in machine learning we want to identify features that allow us to learn good models of the data, in optimization we want to identify features that make good solutions good. That's one of the reasons why incorporating machine learning techniques in genetic algorithms has led to so many powerful methods (such as COMIT, ECGA, FDA, BOA, hBOA, IDEAs, etc.).

The two technical reports are based on Martin Butz's thesis, which talks about these topics in greater detail. Of course, as I am one of the coauthors of the two papers, my views may be slightly biased :-)

Take an EC survey

Greg Hornby and Tina Yu of the Evolutionary Computation in Practice track at GECCO-2005 are taking a survey of evolutionary computation educational background and employment here. Why not take a few minutes and fill it out? Also check out the ECP track here. The ECP track is devoted to practicing computational evolutionaries and their work, featuring real-world applications and techniques as opposed to advances in theory or technique.

GAs: The real gay science?

Nosophorus reports at the Genetic Argonaut that his prof in a performance analysis course said that soft computing approaches are for homosexuals (Brazilian political correctness is somewhat different than the North American kind, I guess). I love the smell of a reasoned argument in the morning (see related post on toaster convergence).

Monday, March 28, 2005

They're echoing an item from Hackaday on Mindstorm robots that are designed to meet up and exchange code in a sex-like fashion:

" when i released my robots for the first time, the software that i initialized them with was not ideally suited for the environment i placed them in. occasionally they got hung up in corners, or moved around in continuous circles that would never intersect and allow them to see each other. with a small starter community, you could just turn them loose in their environment and let them run for a while. they would genetically optimize their routine to fit their environment. those that couldn’t find each other would essentially become extinct. the rest would adapt to maneuvering in their environment while maintaining communication."

Slashdot is blogging on a document evolution visualization tool, and its use to view the evolution of Wikis. Given recent IlliGAL blog conversations, this seems like something people might like to check out. The original Slashdot article contains links to downloads and screenshots, so click through.

Sunday, March 27, 2005

Educating a penguin: Paying more for less

I'm back from the almost all-Ivy tour of colleges in the Northeast with my son Max. We started in Boston and ended up in Philadelphia, visiting BU, MIT, Harvard, Yale, NYU, Columbia, Princeton, and Penn along the way.

It was interesting being on the consumer end--as opposed to the professor end--of the university business, but it wasn't a pretty picture. All of the tours and admission sessions were impersonal, with crowds of 100 or more people in each session. Only Penn provided an adequate number of tour guides for the size of the group, marching out an impressive army of almost 20 tour guides. MIT tried to cover a group of 100 or so people with a single student. Most of the tours went to 4 or 5 sites and filled up the time with material that repeated the repetitious info sessions (Penn was a notable exception with a thoughtful tour that covered the length and breadth of campus). The information sessions were uniformly tedious, providing essentially the same information as all the other info sessions (need-blind admissions, need-based financial aid, faculty have office hours, most classes taught by faculty, etc.).

One interesting thing I learned was that the Ivies use a system of credit "units" or "half courses" in place of semester or quarter hours, and this system appears to mask a significant amount of degree duration deflation compared to degree duration at most public universities. For example, Harvard's system requires 32 half-courses for graduation. Math half-courses meet three times a week for an hour over the course of a 13-week semester. English and history courses meet between two and three times a week for an hour. Doing the arithmetic, a Harvard undergrad in the liberal arts takes somewhere between 832 and 1248 contact hours of instruction over the course of his or her four years. Yale boasts that its students take 36 units to graduate, but Yale's academic calendar is somewhat shorter; moreover, a quick sampling reveals that many Yale courses meet only two hours a week.

Comparing these figures to a public university such as the University of Illinois is instructive. A liberal arts major at UIUC is required to take 120 credit hours, each of which roughly corresponds to one contact hour per week over the course of a 14.5-week semester. This totals 1740 contact hours. In other words, the Harvard grad, if my calculation is correct, is supposed to be in class only 48-72% as long as the UIUC grad. Perhaps the Harvard student is so much better that he or she can learn roughly 40 to 110 percent faster than the UIUC student to make up for the difference.

Harvard charges tuition and student fees totaling roughly $30k per year (the other Ivies are comparable), and let's assume that the average student has 1000 contact hours of coursework over four years. The average Harvard student is then paying $120 per contact hour of instruction. At that rate, a Harvard student could hire a full-time personal tutor at an equivalent annual salary of nearly $250k/year.
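For readers who want to check the arithmetic, here is a minimal sketch of the calculation in Python. The figures come from the post; the variable names are made up, and two steps are my reading rather than anything stated outright: that the $30k is an annual charge paid over four years, and that the tutor's salary assumes a 40-hour week for 52 weeks.

```python
def contact_hours(courses, hours_per_week, weeks):
    """Total hours in class: courses x weekly meeting hours x weeks per term."""
    return courses * hours_per_week * weeks

# Harvard: 32 half-courses, 2 to 3 hours a week, 13-week semester.
harvard_low = contact_hours(32, 2, 13)
harvard_high = contact_hours(32, 3, 13)

# UIUC: 120 credit hours, each about 1 hour a week, 14.5-week semester.
uiuc = contact_hours(120, 1, 14.5)

# Harvard class time as a fraction of UIUC class time.
ratio_low = harvard_low / uiuc
ratio_high = harvard_high / uiuc

# Cost per contact hour (ASSUMPTION: $30k per year for 4 years).
tuition_per_year = 30_000
years = 4
assumed_hours = 1000
cost_per_hour = tuition_per_year * years / assumed_hours

# Equivalent tutor salary (ASSUMPTION: 40 hours a week, 52 weeks a year).
tutor_salary = cost_per_hour * 40 * 52

print(harvard_low, harvard_high, uiuc)
print(f"{ratio_low:.0%} to {ratio_high:.0%} of UIUC class time")
print(f"${cost_per_hour:.0f} per contact hour, ${tutor_salary:,.0f} per year")
```

Changing `assumed_hours` to either end of the 832-1248 range shows how sensitive the per-hour figure is to that one assumption.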

Nonetheless, these elite institutions become more and more popular and selective each year, although if the duration deflation continues, it would be more appropriate to call them labeling clubs than schools. And which schools top Max's current list? Yale and Penn.

Friday, March 25, 2005

Google Scholar

Recently Shigeyoshi Tsutsui told me about a beta version of Google Scholar, which seems to be a great engine for finding research papers as well as the papers that cite them. For a long time I used CiteSeer Research Index, but Google Scholar seems to be much more complete. Check it out.

History of portable computing

This is via /. and not related to GAs, but it's interesting nevertheless :). Mobile PC has a nice article on the history of portable computing. Check out IBM's 1975 mobile machine, weighing a whopping 50 lbs, and the 2004 OQO Model 01, the first fully functional Windows XP machine to weigh less than a pound!

Shape-shifting robots

In its newest issue, Discover has an article about shape-shifting robots, called M-TRANs, that rely on genetic algorithms to adapt to the environment. The robot is composed of about a dozen small modules, each composed of 2 rotating blocks linked together. Modules can connect to other modules via switching magnets. The M-TRAN design mixes lattice-based designs (good for self-reconfiguration) and chain system designs (for mobility).

The designers (Japanese Distributed Systems Design Research Group) foresee such robots being used for ocean exploration, space station assembly, and rescue operations, where the changing environment makes otherwise special-purpose robots inferior in handling unpredicted damage or obstacles.

Prince Charles may use GAs for table planning at his wedding

Apparently, according to a recent survey by YouGov, the second most difficult part of planning a wedding is doing the table plan. There are about 3.6 million ways (10! = 3,628,800) to seat just 10 guests in 10 seats. Add family politics and dynamics to it and you get a daunting and time-consuming task. According to eMediaWire, there is now software that automates wedding table planning:

Andrew Brice of Oryx Digital—who struggled to produce the table plan for his own wedding last year—has developed PerfectTablePlan, a Windows software package that aims to make table planning using cut-up bits of paper a thing of the past. The user decides which guests should sit near (or not near) to each other and the shapes and sizes of tables. Guests are then automatically assigned to seats with a single mouse click, saving hours of effort. Elegant table plans can be printed and last minute changes can be accommodated with minimal stress. The automatic seat assignment is achieved using a "genetic algorithm". It is believed to be the first time a genetic algorithm has been used to produce table plans.

Oryx Digital has sent a complimentary copy of PerfectTablePlan to Prince Charles. "Given the endless press reports about tensions in the Windsor family, we thought Prince Charles’ staff could use some help to create a harmonious table plan for his wedding." said Andrew Brice.
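To give a feel for how a GA might attack a table plan, here is a small sketch in Python. It is emphatically not PerfectTablePlan's algorithm, which the press release does not describe; everything in it (one round table, a symmetric preference matrix `pref`, tournament selection, order crossover, swap mutation, and all function names) is an assumption chosen for illustration.

```python
import math
import random

def seating_score(order, pref):
    """Sum the preference scores of neighbours around one round table."""
    n = len(order)
    return sum(pref[order[i]][order[(i + 1) % n]] for i in range(n))

def order_crossover(a, b, rng):
    """Keep a slice of parent a; fill the other seats in parent b's order."""
    n = len(a)
    i, j = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[i:j + 1] = a[i:j + 1]
    kept = set(child[i:j + 1])
    fill = [g for g in b if g not in kept]
    for k in range(n):
        if child[k] is None:
            child[k] = fill.pop(0)
    return child

def swap_mutation(order, rng, rate=0.2):
    """With probability `rate`, swap two guests."""
    order = order[:]
    if rng.random() < rate:
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
    return order

def evolve_seating(pref, pop_size=40, generations=100, seed=0):
    """Evolve a seating order (a permutation of guest indices)."""
    rng = random.Random(seed)
    n = len(pref)
    fitness = lambda o: seating_score(o, pref)
    pop = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        nxt = [max(pop, key=fitness)]  # elitism: keep the best plan so far
        while len(nxt) < pop_size:
            p1 = max(rng.sample(pop, 3), key=fitness)  # tournament of 3
            p2 = max(rng.sample(pop, 3), key=fitness)
            nxt.append(swap_mutation(order_crossover(p1, p2, rng), rng))
        pop = nxt
    return max(pop, key=fitness)

# Hypothetical guest list: 0 and 1 want to sit together, 2 and 3 do not.
n_guests = 6
pref = [[0] * n_guests for _ in range(n_guests)]
pref[0][1] = pref[1][0] = 5
pref[2][3] = pref[3][2] = -5

best = evolve_seating(pref)
print(best, seating_score(best, pref))
print(math.factorial(10))  # number of ways to seat 10 guests in 10 seats
```

A permutation encoding keeps every candidate plan valid (each guest seated exactly once), which is why the sketch uses order crossover rather than the usual one-point crossover.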

Thursday, March 24, 2005

GA-Walk

I happened upon this site the other day. It's not new, but it's pretty fun to play with. It uses a GA to evolve walking styles for some simple artificial skeleton models (a normal joint-constrained human skeleton, a caterpillar with no legs, and a drunk skeleton). The Java applet has a fairly nice interface to it - letting you control the population size, mutation rate, and animation and fitness-plotting settings. I wasn't too impressed with the walking coordination but it was nonetheless quite amusing. (It could be that I wasn't patient enough to let it evolve long enough or find the optimal settings.) The project was developed by Andrew Dick while at Middlesex University UK.

It makes me wonder if genetic algorithms have been applied to physical therapy patients trying to regain muscle strength and control. MIT has developed a programmable robot to assist in this goal, which is especially suited for stroke victims. The robot speeds the recovery by applying greater resistance as a patient's muscles develop.

Competitions at GECCO

Developmental systems: The challenge is to provide a novel demonstration of an evolved developmental system. The demonstration should consist of a 2 minute video or software run showing your system grow an interesting artefact.

Physical travelling salesman problem (PTSP): This is a new optimization problem that offers a novel twist on the classic TSP: your sales agent now has mass, and obeys simple physical laws: the challenge is to find the sequence of force vectors that propels the agent to visit all cities in the minimum time.

The closing date for both competitions is June 18, 2005 and the winner(s) will be presented a certificate and a small cash prize.
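To make the PTSP idea concrete, here is a rough Python sketch of how a candidate sequence of force vectors might be scored. The competition's actual rules are not given above, so the time step `dt`, the visiting `radius`, the unit `mass`, the start at the origin, and the simple Euler update are all assumptions, as is the function name.

```python
import math

def simulate_ptsp(cities, forces, mass=1.0, dt=1.0, radius=0.1):
    """Return the number of steps needed to visit every city, or None.

    ASSUMPTIONS: the agent starts at rest at the origin, one force vector
    is applied per time step, and a city counts as visited once the agent
    is within `radius` of it at the end of a step.
    """
    x = y = vx = vy = 0.0
    remaining = set(range(len(cities)))
    for step, (fx, fy) in enumerate(forces, start=1):
        # Newton's second law: the force changes velocity, not position.
        vx += fx / mass * dt
        vy += fy / mass * dt
        x += vx * dt
        y += vy * dt
        remaining = {i for i in remaining
                     if math.hypot(cities[i][0] - x, cities[i][1] - y) > radius}
        if not remaining:
            return step
    return None

cities = [(1.0, 0.0), (3.0, 0.0)]
print(simulate_ptsp(cities, [(1.0, 0.0), (1.0, 0.0)]))
print(simulate_ptsp(cities, [(1.0, 0.0)]))
```

A GA or other optimizer would then search over `forces`, using the returned step count as the quantity to minimize. Because the agent carries momentum, the best city order need not match the best order for the classic TSP.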

JGAP 2.2 released

JGAP is a genetic algorithms package written in Java. It is designed to require minimum effort to use "out of the box", but is also designed to be highly modular to allow for custom components to be easily plugged in by the more adventurous. JGAP version 2.2 represents a new release bringing in many more test cases, bug fixes, and a new complex example ("function finder").

Monday, March 21, 2005

On the road with a penguin

It's spring break at the University of Illinois and I'm on the road with my 16-year-old son, Max (aka the conservativepenguin). Max is a junior in high school, and we're looking at prospective colleges (along with half the other parents in the US with a junior in H.S.). I only have spotty internet access, so I'm depending on the other bloggers at IB to keep posting and keep you posted on the latest in GAs. I'll be back blogging regularly starting next Monday.

Sunday, March 20, 2005

Lucille and the joy of text

When I started working on the DISCUS project, I was not fully conscious of how prominent text mining would become. Two weeks ago we blogged about one of the first big marketing tests based on DISCUS. Most of our analysis of the focus groups' communications relies on chance discovery tools. However, the more I work with it, the more I run into problems shared with the text mining field.

Lucille Redmond, in the Joy of Text, presents a nice and comprehensible look at the challenges of text mining. Go and take a look at it; you may start to recognize needs shared between text mining and the analysis of communications using chance discovery tools.

Saturday, March 19, 2005

British police use GAs to "evolve" suspects' faces

British police are expected to begin trials of a new computerised system, called EvoFIT, developed by Drs. Charlie Frowd and Peter Hancock at the University of Stirling. It uses an evolutionary approach, known as a genetic algorithm, to "evolve" faces rather than piece them together.

In one of its early versions, EvoFIT was used by police in Northamptonshire who were trying to catch a violent attacker. The attacker was never caught, but the senior investigating officer, Superintendent Paul Spick, says the witness involved found the new software much quicker and easier to use than traditional E-Fit systems. It was also more accurate: the final image caused the witness visible distress when displayed.

The report then goes on to say that the researchers have since made a number of improvements to their system, which have yielded a six-fold increase in performance.

It should be pointed out that one of the first GA-based "photofit" tools is the Faceprints system of Caldwell and Johnston, which dates back almost 15 years.

Many regard the prospecting function of the market campaign development cycle to be more of an art than an exact science.

Officials at Newburyport, Massachusetts-based Genalytics, which specializes in risk assessment software for credit card issuers, insurers and mortgage underwriters, claim the Prospect modeling system will take much of the guesswork out of the equation.

Given data of the Gold-Rush era, I wonder if the prospecting tool would suggest: "Go West, young man, Go West" :)

Probabilistic model-building genetic algorithms (PMBGAs) replace traditional crossover and mutation operators by building and sampling a probabilistic model of promising solutions, and thus enable the use of machine learning techniques for automatic discovery of problem regularities and exploitation of these regularities for effective exploration of the search space.
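The build-and-sample loop described above can be sketched in a few lines of Python. The example below uses the univariate marginal distribution algorithm (UMDA), about the simplest PMBGA there is: its "model" is just one probability per bit, with no interactions between bits. Methods such as ECGA and BOA replace that model with marginal product models or Bayesian networks. The test problem (OneMax), the parameter values, and the function names are all assumptions made for illustration.

```python
import random

def onemax(bits):
    """Toy fitness function: the number of ones in the string."""
    return sum(bits)

def umda(n_bits=20, pop_size=100, n_select=50, generations=30, seed=1):
    """Univariate PMBGA: sample, select, re-estimate marginals, repeat."""
    rng = random.Random(seed)
    probs = [0.5] * n_bits          # the model: P(bit i = 1) for each i
    floor = 1.0 / n_bits            # keep probabilities away from 0 and 1
    best = None
    for _ in range(generations):
        # 1. Sample a new population from the current model.
        pop = [[1 if rng.random() < p else 0 for p in probs]
               for _ in range(pop_size)]
        # 2. Select the promising solutions (truncation selection).
        pop.sort(key=onemax, reverse=True)
        selected = pop[:n_select]
        if best is None or onemax(pop[0]) > onemax(best):
            best = pop[0]
        # 3. Build the model: estimate each bit's marginal frequency.
        probs = [sum(ind[i] for ind in selected) / n_select
                 for i in range(n_bits)]
        probs = [min(max(p, floor), 1.0 - floor) for p in probs]
    return best, probs

best, probs = umda()
print(onemax(best), [round(p, 2) for p in probs])
```

Note that no crossover or mutation operator appears anywhere: all variation comes from sampling the model. A univariate model works on OneMax because the bits are independent; problems with strongly interacting variables are where the richer models earn their keep.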

The tutorial Probabilistic Model-Building GAs will provide a gentle introduction to PMBGAs with an overview of major research directions in this area. Strengths and weaknesses of different PMBGAs will be discussed and suggestions will be provided to help practitioners to choose the best PMBGA for their problem.

Martin's tutorial was one of the most popular tutorials at the last GECCO and promises to be equally successful this time around. I highly recommend that IlliGAL Blogging readers attend the tutorial and get hands-on information from one of the leading experts in EDAs.

Friday, March 18, 2005

An economy of Powerpoint?

I was reading the blogs on blogs on blogs regarding Atkinson and Mayer's work on reducing ppt overload (see here). My first reaction was that complaining about PowerPoint is like the complaining about lecturing that goes on by adherents of "active learning" methods, whereby the old-fashioned, outmoded, boring, tedious lecture must go the way of the Dodo bird or Western civilization will come to an end.

I usually find such arguments tedious. Lecturing goes on (and on and on) because it is an economical way for a single person to deliver a lot of stuff to a bunch of people. Active learning is only done in small measures in performance-critical roles, because it is expensive (see this argument). In other words, active learning techniques are not used more often, because the marginal benefit of the activity (over lecturing) does not usually exceed its marginal cost.

With this as background, I thought Atkinson and Mayer's work was going to call for some complete overhaul of ppt and presentations. After all, we use Powerpoint because it is convenient and conveys a bunch of stuff fairly quickly; however, a closer read of the article gives a number of sensible suggestions for:

1. Clear headlines

2. Digestible bites

3. Offslide elaboration

4. Visuals & words

5. Removal of unneeded elements

In many cases, these steps can be taken without burdening the preparation or presentation process a bit. Elaborate visuals (number 4) can add marginally to the preparation time, but a well chosen schematic, block diagram, or picture can often benefit a presentation enough to make the added prep time worthwhile.

So I retract my first impression, but it is important to understand that Powerpoint is used as often as it is, because it is an economic alternative to other modes of presentation. Improving presentations is a laudable goal, but efficiency dictates that we balance the economy of Powerpoint in so doing.

Wednesday, March 16, 2005

Genetic algorithm utility library (GAUL)

You may already know about this, but for those of you who are looking for a customizable GA library, here's another choice. GAUL is an open source programming library, released under the GNU General Public License (GPL). It comes with a bunch of different search algorithms, including hill climbing, tabu search, simulated annealing, simplex search, and steepest ascent. So you can use it to compare the performance of different algorithms or try some hybridization.

Tuesday, March 15, 2005

Cool Hunting is reporting that Will Wright, creator of The Sims, is planning a new PC game called Spore:

"You start off as this insignificant bit of bacteria and you grow and evolve through advantageous mutation that the user determines through an engine that Wright has designed that has different parts (depending on stage of development) that you can add on and manipulate, like size for example. As the slide states, you go from being bacteria to a galactic god."

GAs part of BCS Grand Challenge

An article in Computer Weekly reports the results of the British Computer Society Grand Challenge conference, and bio-inspired computing, including genetic algorithms, figures prominently under the category Non-Classical Computation.

But quantum computing is not the only area of interest to non-classical computing researchers. Biological systems will provide it with much inspiration, says the BCS report, because living organisms have much to teach us about non-sequential, autonomous processing (consider, for example, how individual cells know what they should be doing without any central control).

Genetic algorithms and neurology will be an important part of this challenge, as will artificial immune systems. The Royal Mail has already trialed the latter as a means of automatically detecting fraud at its branches.

These systems require a new type of training and worker:

We are looking at new degrees where it is not just computer science-inspired biology or biology-inspired computing, it is a new type of person who understands how to build complex systems. To do that, they have to be both a computer scientist and a biologist

The grand challenge in this area is called in vivo-in silico (iViS), the creation of life on a computer.

Heuristics vs. algorithms: A harmful distinction

On page one of many computer science theory texts, the author makes the distinction between algorithms--procedures with formal proofs of convergence--and heuristics--those without. A discussion elsewhere picks up the theme in an otherwise interesting post:

It's the human input that seems to contain all the "knowledge" an AI system has - something the designers and programmers already knew, and are trying to use the computer's speed and memory to use the concepts well. We're just not there yet with "self-learning" systems. Of course, there are randomness-based techniques to "learn" things (like genetic algorithms and genetic programming) that seem to fly in the face of all of this, but they're really just certain types of heuristics.

The author seeks a more human-like AI and suggests that GA and GP are qualitatively close in some sense, but then toward the end of the quotation we see how he is brought up short by the algorithm-heuristic categorization of his CS theory prof. Saying that GAs and GP are "really just certain types of heuristics" suggests that they are inferior to full-fledged procedures accompanied by proof.

Elsewhere I have blogged on this topic. My main point was and is that in the realm of material machines (airplanes, toasters, automobiles), no such distinction is made, because proofs of convergence do not exist for the mass of things we use in our day to day lives. That is not to say that we don't understand the principles of operation of toasters, airplanes, automobiles and the like. We do, and we study the physics of different facets of their operation more closely as we need to improve their function (see here and here).

Let's bury the heuristic-algorithm distinction, or at the very least, let's acknowledge that the heuristic-algorithm axis is a continuum of mathematical understanding. Heuristics of differing stripes can function quite well, thank you very much, and many of them are backed with a good deal of mathematical understanding if not mathematical proof. Continuing to preface the term "heuristic" with the terms "merely," "just a," "only a," and the like is harmful, especially when it prevents us from grabbing the procedure we need to get the job done.

In-Kernel genetic library version 0.2 released

There were numerous changes from the first release, but the major change in this version is the introduction of phenotypes. A phenotype is a set of genes that affect an observable property. In genetic-library terms, it is a set of genes that will affect a particular fitness measurement. Each phenotype will have a set of children that contain genes that affect a fitness measure.

Now multiple fitness routines can be run for each genetic library user. Then, depending on the results of a particular fitness measure, the specific genes that directly affect that fitness measure can be modified. This introduces a finer granularity that was missing in the first release of the genetic-library.

More details regarding the project and the downloads are available here.

Wednesday, March 09, 2005

DISCUS & the 4-quad chart

The DISCUS (distributed innovation and scalable collaboration in uncertain settings) project has been the subject of experimentation and a series of posts over the last few weeks, but IlliGAL Blogging has been somewhat remiss by not discussing the theoretical and computational underpinnings of that project more fully. The project dates to some work with Alex Kosorukoff (see Alex's Free Knowledge Exchange work that goes back to 1997) published at the 2002 GECCO conference (see paper here).

Who creates and who chooses. A 4-quad chart clarifies different types of GA.

The universe of concern is captured reasonably fairly, if simply, in the diagram above. The y-axis considers whether a human or computer is the innovating (recombinative in the diagram) agent, and the x-axis considers whether a human or computer is the choosing (selection) agent.

This decomposition of the problem yields 4 quadrants worthy of our attention. For example, C-C (computer as innovator and chooser) is a regular run-of-the-mill GA with computational fitness function and genetic operators. The H-C quadrant (human chooser, computerized innovator) is an interactive GA. DISCUS is a hybrid of all four quads arranged systematically to organize the thoughts of a core innovation team (of humans), computational agents, and human stakeholders affected by and affecting a design effort. More DISCUS publications and presentations are available here.

GECCO snags Endy as keynoter

MIT's Drew Endy will be giving one of GECCO's keynote talks on Engineering Biological Systems. Here's the abstract.

Biology is going through a fundamental transition – from preexisting, natural, and evolving systems, to synthetic, engineered, and disposable systems. Here, I will discuss (i) our ‘refactoring’ of a natural biological system, bacteriophage T7, (ii) the adaptation and application of three past lessons – standardization, abstraction, and decoupling – that seem relevant to the engineering of biology today, (iii) how solving the problems of error detection and correction in reproducing machines might lead to interesting compromises in system architecture, and (iv) some of the social, political, and risk opportunities and pitfalls worth considering as we begin to systematically engineer the living world.

Why do academics blog?

Here's a 16 March 2004 post from Crooked Timber, a multi-person academic blog, on Academics and Blogging. By analogy to Jaume Bacardit's post on Is Blogging Journalism? the Crooked Timber post might be viewed as asking Is blogging scholarship? In actuality, the post specifically asked for and received comments on the following questions:

If you’re an academic who blogs, what prompted you to start blogging?

And what keeps you going? What do you try to do in your blog?

Does your blog have any relationship to your scholarship?

If you’re an academic who just reads blogs, do you intend to start your own blog sometime?

If yes, what are the reasons that you haven’t done so at this point in time?

If no, why not? Either way, what do you get from reading blogs?

The responses are diverse, interesting, and well worth reading. One comment said that academic blogging could be viewed as "pre-scholarship," that blogging can be a research organizer and filing system as well as inspiration for future talks and writing. Another comment resonated strongly with my recent experiences with blogging, discourse, and social networking:

Academic blogs make me feel like I’m not alone in this enterprise.

Being an academic can be isolating. Working in a room, on some obscure topic, with mainly graduate students to talk to, blogging offers a way to have an interchange of ideas with a self-selecting group of people who find your writing interesting, challenging, or simply maddening. With its emergent topology of interconnection, serendipitous collection of correspondents, and adaptive fitness function of attention, blogging helps return academic process to its discursive ideal.

None of this addresses our earlier attention to the dearth of bloggers among academic leaders, but it is clear that, at the very least, a good and growing group of academics in the trenches are making peace with blogging, even finding solace in it.

Darwinia: GA as cultural artifact

Genetic algorithms have been around long enough and have been influential enough that they are weaving their way into our culture. One example is discussed in a post at Self Deprecating Journalism that reviews a game called Darwinia. The game does not contain genetic algorithms; rather, its story line is partially inspired by the existence of genetic algorithms:

Darwinia opens with an amazing sequence that immersed me right into the feel of the game. Dr Sepulveda welcomes you to Darwinia and explains that a nasty virus has infected this virtual world and is killing his life's work, the Darwinians. The supposed result of genetic algorithms.

It is your job to create units, and retake Darwinia one area at a time (tis a ye-olde-strategy game). The graphics are nice, the sounds are original (inspired by 80's gaming soundtracks) and the interface is workable. I say workable because to create units you must use a simple mouse gesture system.

Many, if not most, of the Japanese card games/cartoons that are now popular in boydom around the planet, from Pikachu to Yu-Gi-Oh, draw inspiration from artificial evolution, and this is not surprising given that genetic algorithms were a fairly popular subject on TV and in the newspapers in Japan during the 1990s. Michael Crichton's bestseller Prey is another example of GA-as-cultural-artifact.

What other examples can readers of this blog come up with? Why not take a moment and post a comment on your favorite GA cultural artifact?

I'll return to corporate vs. academic blogging later, but I hope my notion of a model is big enough to capture what the Slurrier has in mind. Let me draw a picture:

The modeling spectrum goes from unarticulated knowledge to equations of motion.

On the right we have the usual sort of post-Baconian model, an equation of motion such as Newton's second law or a Markov chain, something with some red mathematical meat on it. What we 21st century modeling brats take for granted as a "model." On the left, we have a thought, a feeling, an intuition, a knowing that we have some trouble talking about. Polanyi called this sort of thing tacit knowledge, and Sowell called it unarticulated knowledge, and the error-cost plane applies to the whole lot.

I'm particularly interested in the modeling middle, especially the transition point when we move from articulated qualitative models (verbal or graphical representations) and we cross over to those models that are barely quantitative (simple measurements and dimensional reasoning), and I'll talk more about those in other posts, but I wanted to clarify that the project of The Design of Innovation is deceptively larger than might be inferred from the term "model."

Evolver used in portfolio management

E&R Blog has a short post on the comparative value of Evolver as a financial portfolio management tool. Evolver is a genetic algorithm written as an Excel add-in that allows the user to construct and optimize a fitness function within a spreadsheet.

Monday, March 07, 2005

The academy vs. blogging?

Reader Eric Anderson has an interesting take on the dearth of academic bloggers (see earlier post and his comment here):

Mr Tozier links to a Left2Right post describing emergent intelligence in the human academic ecosystem. (See also Susanne Lohmann's forthcoming book on the American university, How Universities Think.) Really, blogs only stand to make that ecosystem more efficient with wider, faster distribution, wherein readers more easily find better niche content.

Oooh, that's an interesting thought, and Anderson continues by suggesting that academics not only will not embrace the blogosphere, they will fight it, because it challenges their presumed intellectual authority. Put another way, the suggestion is that academics might fight the blogosphere in the same way and for some of the same reasons the mainstream media has. Silly academic ideas and silly research can not stand the scrutiny of open source fact checking and smell testing any more than Dan Rather or Trent Lott could.

This is an interesting line of reasoning, and I agree that some academics will eschew pajamahadeen (pajamahadean, perhaps?) status for fear of unwanted attention; however, just as some MSM journalists started blogs and otherwise paid attention to the blogosphere, some academicians will embrace the blogosphere and face the relentless scrutiny of an open source world. They, their research, and the state of knowledge will probably be better off for it.

GAs, baseball & the Hall of Fame

AndrewKoch.com has a terrific post about a paper by one David Cohen (I have a cousin named David Cohen, but he doesn't do GAs) that uses genetic algorithms to induce a rule that helps determine whether or not a player is elected into the Baseball Hall of Fame.

Apparently the rule is as follows:

if Games >2794 and AB >3967 and Runs >1145 and Hits >1297 and TotalBases >4432 and Doubles >252 and Triples >14 and HR >355 and RBI >325 and StolenBases >341 and BB >658 and HBP >9 and Strikeouts <2087>9 and Psbb >1 and Psstrikeouts <42>0.269 and PSOBP >0.010 then it's Cooperstown Time
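A rule of this form is just a conjunction of threshold tests, so it is easy to try out. The following is a minimal Python sketch, not code from the paper: it encodes only the clauses that are legible in the rule as quoted above, and it leaves out the two clauses whose statistic names did not survive (the ones ending in ">9" and ">0.269"). The function name `cooperstown_time` and the dictionary layout are assumptions made for illustration.

```python
# Thresholds copied from the rule as quoted above. The two clauses whose
# statistic names were lost in the quoted text are deliberately omitted.
GREATER_THAN = {
    "Games": 2794, "AB": 3967, "Runs": 1145, "Hits": 1297,
    "TotalBases": 4432, "Doubles": 252, "Triples": 14, "HR": 355,
    "RBI": 325, "StolenBases": 341, "BB": 658, "HBP": 9,
    "Psbb": 1, "PSOBP": 0.010,
}
LESS_THAN = {"Strikeouts": 2087, "Psstrikeouts": 42}


def cooperstown_time(stats):
    """Return True when every legible clause of the conjunctive rule holds.

    `stats` is a hypothetical dict mapping statistic names to career totals.
    """
    above = all(stats[name] > bound for name, bound in GREATER_THAN.items())
    below = all(stats[name] < bound for name, bound in LESS_THAN.items())
    return above and below
```

Because every clause must hold at once, a single statistic that sits exactly on its threshold is enough to make the rule fail.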

Read the whole paper here. Now, if only someone would use a GA to determine what it takes to get tenure at a major research university.

Models live in the error-cost plane

In The Design of Innovation, I spend a fair amount of time talking about the economy of modeling, arguing that models live in an error-cost plane:

When scientists or mathematicians build models, they are largely interested in developing more accurate models of the world whatever the costs of so doing. When engineers or inventors use models, they are generally interested in designing better gizmos. As a result, an implicit cost-benefit calculation comes into play.

Specifically, engineers tend to weigh the marginal costs of modeling against the marginal benefits of the modeling to the technology being developed. As a result, engineers generally build and use models that are less sophisticated than the most sophisticated models available, but that choice is guided, not by simple-mindedness, but by a desire to be efficient in the race to build better technology.

Economical thinking or modeling in this way has been extraordinarily helpful in moving quickly to build little models that maximally advance the state of genetic algorithm art. The approach may not be as elegant as some, but it does give a terrific amount of insight for the amount of effort expended (see my 1 Feb post here on a similar subject).
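The marginal reasoning can be made concrete with a toy calculation. The sketch below is only an illustration of the idea, not anything taken from The Design of Innovation: the `Model` tuple, the `choose_model` function, the single `value_per_unit_error` conversion factor, and all the numbers are hypothetical.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    error: float  # modeling error, in arbitrary units
    cost: float   # effort to build and use the model, in arbitrary units


def choose_model(models, value_per_unit_error):
    """Walk from cheap to costly models, upgrading whenever the marginal
    benefit of the error removed exceeds the marginal cost of the upgrade."""
    ordered = sorted(models, key=lambda m: m.cost)
    chosen = ordered[0]
    for candidate in ordered[1:]:
        marginal_benefit = (chosen.error - candidate.error) * value_per_unit_error
        marginal_cost = candidate.cost - chosen.cost
        if marginal_benefit > marginal_cost:
            chosen = candidate
    return chosen


# Hypothetical points in the error-cost plane.
MODELS = [
    Model("dimensional argument", error=0.30, cost=1.0),
    Model("little model", error=0.10, cost=5.0),
    Model("full simulation", error=0.08, cost=100.0),
]
```

With these made-up numbers, `choose_model(MODELS, 50)` settles on the little model, since the last 0.02 of error is not worth 95 more units of effort. Only when error is valued very highly, as in `choose_model(MODELS, 10000)`, does the full simulation pay for itself.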

Inheritance pattern of death

While eating my frosted mini wheats this morning, I browsed through a 1982 volume of the JIR and happened upon an amusing article describing a forgotten genetic breakthrough of the 1980s. J. Eastern, C. Drucker, and JE Wolf from the Department of Unclear Medicine and Biosciences (DUMB) at Baylor College of Medicine stumbled upon a classic Mendelian autosomal recessive pattern for the phenotype of death. The researchers admit to no immediate solution in their demonstration of such a genetically-controlled disease, but do offer two practical suggestions:

- screening families and ancestry of a potential spouse for any incidence of death
- ensuring that the spouse has not already expressed the gene

As fitting to the field of genetics, their discovery came about as a cross product of their original goals with which they were occupied at the time, and in a similar fashion to how Mendel is speculated to have conceived his first experiments (out of insufferable boredom). David Goldberg expounds upon the cross-fertilization of innovation in DOI. Although, perhaps the idea of an inheritance pattern of death should have never left the chalkboard!

Sunday, March 06, 2005

DISCUS series makes blog & focus-group history

Over the last few weeks Xavier Llora has been posting on an ongoing experiment in consumer marketing using the DISCUS (Distributed Innovation and Scalable Collaboration in Uncertain Settings) system (see posts and pictures here, here, here, here, here, here, here, here and here). The DISCUS project is a collaborative system that uses genetic algorithms and chance discovery (see my 8 Feb post here on chance discovery) to bring together individuals in disparate locations with differing abilities to articulate a problem or problem solution.

The series is noteworthy from a blogging perspective, because it represents the first time IlliGAL Blogging has covered ongoing lab research in real time.

The experiments were noteworthy, because they helped demonstrate the effectiveness of DISCUS technology in augmenting marketing focus groups. As we analyze the data, tech reports and papers will be written and published, but we already know that the results were important enough to keep working toward testing in the marketplace.

King comments on academic blogging

I think the main reason we aren't seeing more academic blogs is cultural. In academic settings people are rewarded for deep and thorough analysis, and academic writings tend to be scrutinized by their peers. Because of this, academics tend to be very careful about what they "publish".

This has the ring of truth. Many academics are extraordinarily conservative, and working on an edgy topic such as genetic algorithms for the better part of 25 years has been liberating, and a load of fun. But occasionally I forget the rules of the academic game. Thanks, Steve, for reminding me what planet I'm on, and for the sake of my reputation, I should probably shut IlliGAL Blogging down. Naaaaaahhhhh!! Keep on bloggin' momma, bloggin' your blues away.

Link like crazy. One thing that distinguishes blog posts from dead-tree journalism is that bloggers link prodigiously. Link to any other blog or Web site you mention. Link to articles, books, products, bios, explanatory materials on other sites and anything you mention in your blog. Always link to information that clarifies or gives background on information and opinions in your post.

Write less. Omit all unnecessary words. The best advice I ever got about writing was from my first boss, the late "press agent" Leo Miller, who taught me a game to play with sentences. He'd keep taking out words until removing one more word destroyed the meaning of the sentence. For example: He'd take out words until removing another destroyed the sentence meaning. Aim at keeping your posts at about 250 words.

Write good headlines. Most people use news feeders like News Gator to scan blog headlines. They decide after seeing the headline to click into the post. Tell as much of the story as you can in the headline.

Additionally, the article has useful hints about layout, editing, and writing blog comments. Go read the whole thing. It's well worth the short time it takes to register.

Corporate vs. academic blogging

Steve King's post at IFTF Future Now on IBM's internal use of blogs reminds me how fascinated I've become with organizational blogging, and the contrast between the rapid rise in corporate blogging and the relative dearth of academic blogging. When even fairly stodgy corporations like GM are blogging, it's pretty clear that corporations are way ahead of their academic counterparts in the blogging business.

MANE IT Network has a post on academic blogging that links to a nice article (registration required) in the Guardian that lays out the landscape of academic blogging. At this point, it appears largely to be a grassroots affair by faculty and students. The Guardian article cites a number of uses of academic blogs, none of them administrative:

Blogging to organize research

Blogging as knowledge management

Blogging as social networking

Blogging as dissemination tool

Blogging as teaching tool

Blogging as device to increase research accountability

Why aren't more academic leaders blogging their brains out? Blogs are bully pulpits for expressing a point of view, for highlighting your work or that of your organization. If I were the head of an academic unit (a dean, a department head, a president) I'd be blogging til the cows (or donors) come home. Maybe this says less about blogging as an idea and more about the vision and creativity of those who lead our academic institutions.

Many of the emerging rules and guidelines established for corporate bloggers are appropriate for blogging within and about academic institutions. An April 2004 MarketingProfs article lists and elaborates on 10 Rules for Corporate Blogs and Wikis here (signup required):

Be authentic

Be an unmatched resource

Once you start, don’t stop

Keep it relevant

Measure your effectiveness

Monitor other blogs

Trust your employees

Use blogs for knowledge management

Use wikis for employee and customer collaboration

Develop an organizational content strategy now

Sounds good to me. Now let's see if all you profs, deans, department heads, and university presidents can keep up with Bob Lutz.

A number of the tutorials are being given by IlliGAL Blogging bloggers (Butz, Rothlauf, Sastry, and Takagi). Many conferences charge hundreds of dollars more for tutorial registration, but GECCO tutorials (and workshops) are included in the price of admission. If you're new to the genetic algorithms game or if you're an old hand wanting to brush up on the latest tricks, techniques, and applications, GECCO has tutorials to get you up to speed. More information is available on the tutorial page here.

Friday, March 04, 2005

Clean the table my friend!

It is over. We made it. This morning we conducted the last DISCUS session. Afterward, as we walked to lunch, I was talking to Yuichi. The whole week, the Chance Discovery Consortium researchers have been extremely excited about using DISCUS in real-world focus groups. Talking to him while walking down Green Street toward the Illini Union Ballroom, I suddenly realized why.

The six DISCUS sessions covered a total of eight different focus groups in three days. In a conventional setting without DISCUS assistance, it takes (quoting Yuichi’s words) three days to complete one focus group. As we stepped into the Union, the reasons for their excitement became crystal clear to me :)

We wrapped up an exciting week sitting at the Espresso Royale Café in Urbana. Sitting around coffees and chais, we reviewed the main results of the experiment. But looking into the past was not all we did. We have already started exploring the exciting new possibilities for improvement and innovation in DISCUS. Today is not an end. Today is the beginning of a whole new, reborn DISCUS. A DISCUS that has celebrated its second anniversary by proving that the vision could become a reality, opening the door to new and even more exciting things to come.

However, there was only one thing left to do today: get together and celebrate that we made it! What could be better than a nice Thai dinner and relaxing over a game of pool? So, let’s clean the table my friend, let’s clean the table all together.

The milestone of the day: We wrapped up. Our colleagues already have all the data and completed paperwork in their hands. We wish them a pleasant flight back. I guess I have already said this several times these days, but this was no one-man job. Chen-Ju, Abhimanyu, Mohit, and Davina, you did a great job. I am proud of you and your work. Nothing would have been the same without your unconditional effort to get this straight. By the way, Yukio, Davina, we missed you tonight.

Thursday, March 03, 2005

Last lap, tomorrow

It is official now. We have completed five of the six planned DISCUS sessions. Today we again had an interesting volume of participants coming in. The morning was quite relaxed, with only four participants. However, in the afternoon we had ten, all joining a common discussion. In less than 30 minutes, the afternoon participants generated more than 120 communications that needed to be analyzed by Chance Discovery Consortium researchers. The photo below shows how DISCUS assisted them in the process of new scenario creation.

Now that we have only one more session to go tomorrow morning, we need to admit that we are happy with the new input and the possibilities unfolding in front of us. This experiment is becoming a major milestone for DISCUS. Besides the usefulness of the participants' discussions for our colleagues in marketing research, we are also collecting valuable feedback about the usability, interpretability, and potential new ways to support innovation and creativity.

The milestone of the day: We tested a new approach to the creation of focus groups. DISCUS was able to support large heterogeneous groups with no special arrangements.

GAs, DNA & bioinformatics

GANN is a machine learning method designed with the complexities of transcriptional regulation in mind. The key principle is that regulatory regions are composed of features such as consensus strings, characterized binding sites, and DNA structural properties. GANN identifies these features in a set of sequences, and then identifies combinations of features that can differentiate between the positive set (sequences with known or putative regulatory function) and the negative set (sequences with no regulatory function). Once these features have been identified, they can be used to classify new sequences of unknown function.

Artificial Neural Networks are used for pattern detection, because they can model complex interactions between input variables (i.e., the features). This can be potentially very important if the positive set contains different types of regulatory regions that must all be classified.

The number of sequence encodings that can be generated is practically infinite, and even a reasonable number (a few hundred) are too much to present to the neural network at once. The Outer Genetic Algorithm (OGA) was designed to test different subsets from the pool of available representations, and generate new subsets using evolutionary operations.
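The idea of an outer GA that searches over subsets of a pool can be sketched in a few lines of Python. This is an illustration under stated assumptions, not GANN's implementation: the bit-mask encoding, truncation selection, one-point crossover, bit-flip mutation, elitism, and the name `evolve_subsets` are all choices made here for the sketch. In GANN the fitness of a subset would come from the neural network it feeds; in this sketch the caller supplies any deterministic `fitness` function over a mask.

```python
import random


def evolve_subsets(pool_size, fitness, generations=30, pop_size=20,
                   mutation_rate=0.05, seed=0):
    """Evolve bit masks over a pool of representations (bit i = 1 keeps item i).

    Returns the best mask found and the best fitness seen at each step.
    Requires pool_size >= 2 and pop_size >= 4.
    """
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(pool_size)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[:pop_size // 2]       # truncation selection
        children = [population[0][:]]              # elitism: keep the best mask
        while len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, pool_size)      # one-point crossover
            child = mother[:cut] + father[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]             # bit-flip mutation
            children.append(child)
        population = children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


# Hypothetical stand-in fitness: reward three "useful" representations and
# charge a small price for every representation kept.
USEFUL = (1, 4, 7)


def toy_fitness(mask):
    return sum(2.0 for i in USEFUL if mask[i]) - 0.5 * sum(mask)
```

Calling `evolve_subsets(10, toy_fitness)` returns a mask of length 10 together with the best-fitness history. Because the best mask is copied unchanged into each new generation, that history never decreases.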

Biological applications of genetic algorithms are deliciously circular (procedures inspired by nature used to understand natural procedures), and the BioGEC workshop at GECCO is a good place to meet key players and learn about recent work.

Wednesday, March 02, 2005

Solid rocket booster

Since the inception of DISCUS, we have envisioned how, with the proper innovation infrastructure, on-line communication could be an excellent medium for creativity- and innovation-based activities. Providing real-time feedback to the discussion participants about the current topics and possible chances is proving to be a key element in increasing the productivity of creative sessions. Moreover, providing the bare-bones building blocks of a scenario in a communication, and the relations among them, is helping participants to rapidly cross-fertilize scenarios and generate new, innovative ones.

Today we had a first-hand validating result with ordinary users (mostly UIUC students) in a real-world setting (marketing scenarios for cell phones). The previous photos show one of the biggest experiments conducted this week using DISCUS. Fourteen students used only DISCUS to communicate with one another in the focus groups on cell phone usage. In less than 30 minutes, more than 100 messages were posted and analyzed. Right now, we are collecting a large volume of information about how the users interacted with one another using DISCUS. We are really thrilled about the possibilities that DISCUS and this week of experiments are opening.

The milestone of the day: Chance Discovery Consortium and DISCUS researchers analyzing and creating new cell phone scenarios in the largest session held till now using DISCUS. Yes, the picture below does not lie; Nao is back in town for a few days :)

GE alum wins Oscar

The Illinois Genetic Algorithms Laboratory (IlliGAL) is located in the Department of General Engineering (GE) of the University of Illinois at Urbana-Champaign (UIUC). 1985 GE alum Chris Landreth received an Oscar on Sunday for his digitally animated short Ryan. Landreth's earlier film The End received an Oscar nomination in 1995.

GAs in search personalization?

A vague post at Search Engine News reports that genetic algorithms are now being used in search personalization:

Future? "personalization. It’s misunderstood, personalization. It’s not giving you a search just for you. It’s about returning results for your peer group. They can start to tailor the search specifically to you. There is data now using genetic algorithms and others set that are using these to create search engines. Mike concludes the more information we give the search engines, the better our experience will be."

Tuesday, March 01, 2005

Rock & Roll

Yes! We survived the first day of focus groups using DISCUS. The summary: everything went smoothly. No software problems. No logistics problems with the students. The day started with getting together again with our Chance Discovery Consortium colleagues. After an intense final discussion we came up with the final step-by-step procedure. Here is a brief summary as a step-by-step list: (1) welcome, objectives, and instructions, (2) computer-mediated discussion using DISCUS, (3) break, (4) computer-mediated discussion using DISCUS, (5) break, (6) evaluation of feasible scenarios, and (7) filling in a feedback questionnaire.

The photo shown above was taken while participants were instructed on how to proceed during the experiment. Researchers and participants gathered together to minimize uncontrolled elements during the experiment. For instance, participants were told not to communicate with each other using any channel other than the computer-mediated DISCUS. Tomorrow we are going to have two rounds of experiments involving 20 participants in total (instead of the 5 involved today).

The milestone of the day: Abhimanyu Gupta and Mohit Jolly (below) polishing DISCUS to get it bright and shiny and, most important, purring like a kitty.

Understanding the evolution of biological sequences is one of the fundamental problems in biology. The goal is to construct an evolutionary tree or phylogeny whose leaves are the given sequences and whose internal nodes represent ancestral species. A variety of mathematical models have been proposed for this purpose. We consider a simple but widely-used Markov model where sequences evolve through random mutation. The objective is to find the evolutionary tree that offers the most probable explanation for the input data. A related problem is ancestral reconstruction, where the tree is given and the most likely ancestors must be inferred.

Implicit in Markov models is the notion of evolutionary distance: Intuitively, the larger the distance between two sequences, the larger the mutation probabilities. These parameters, which are non-linear functions of distance, have a strong influence on the solution to phylogeny construction problems. We present an algorithm that computes the optimum solution to the ancestral reconstruction problem for all possible values of the evolutionary distance. The algorithm incurs only a slight overhead relative to the effort needed to compute the answer for any fixed value. Sensitivity analysis of this sort allows the exploration of a range of evolutionary hypotheses, enabling one to identify possibilities that are overlooked when only fixed distances are considered. The procedure also helps in assessing the robustness of the inferences made from the model. Our algorithm relies on the geometric properties of the parameter space decomposition induced by a linear version of the problem and uses divide-and-conquer. We show that similar techniques yield a fast procedure for a pair-wise sequence comparison problem where the goal is to identify contiguous regions of high similarity.

This is joint work with Balaji Venkatachalam and was partially supported by National Science Foundation grants CCR-9988348 and EF-0334832.

Bio: David Fernández-Baca is professor of Computer Science at Iowa State University. His research interests lie in sensitivity analysis in combinatorial optimization, the construction of evolutionary trees, and the problems arising at the boundaries of these two areas. He is part of the Phylota project (www.phylota.org), whose aim is to develop new methods and software tools to help construct the genealogical "tree of life" of all biological species. Phylota is funded by the National Science Foundation’s ATOL (Assembling the Tree of Life) Program.

Dr. Fernández-Baca received a Bachelor's degree in Computer Engineering from the National University of Mexico in 1980, and a Master's degree in Computer Engineering and a PhD in Computer Science in 1983 and 1986, respectively.
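For readers who want a feel for the ancestral reconstruction problem in the abstract above, here is a minimal Python sketch of the fixed-distance case only. It is not the parametric algorithm of the talk. It assumes a single site, the Jukes-Cantor substitution model (the abstract does not name its Markov model), the same distance `d` on every edge, a uniform prior at the root, and a tree written as nested tuples with one-letter strings at the leaves. The names `transition`, `best_below`, and `reconstruct` are made up for the sketch.

```python
import math

ALPHABET = "ACGT"


def transition(d):
    """Jukes-Cantor probabilities for an edge of evolutionary distance d."""
    same = 0.25 + 0.75 * math.exp(-4.0 * d / 3.0)
    diff = 0.25 - 0.25 * math.exp(-4.0 * d / 3.0)
    return lambda a, b: same if a == b else diff


def best_below(node, d):
    """For each state s of `node`, return (probability of the best labelling
    of the subtree given s, that labelled subtree)."""
    if isinstance(node, str):                      # leaf: the observed base
        return {s: ((1.0 if s == node else 0.0), node) for s in ALPHABET}
    p = transition(d)
    tables = [best_below(child, d) for child in node]
    result = {}
    for s in ALPHABET:
        prob, labelled = 1.0, []
        for table in tables:
            # pick the child state maximising edge prob * child subtree prob
            c = max(ALPHABET, key=lambda x: p(s, x) * table[x][0])
            prob *= p(s, c) * table[c][0]
            labelled.append(table[c][1])
        result[s] = (prob, (s, tuple(labelled)))
    return result


def reconstruct(tree, d):
    """Most likely joint assignment of ancestors for one fixed distance d."""
    table = best_below(tree, d)
    root = max(ALPHABET, key=lambda s: table[s][0])
    prob, labelled = table[root]
    return 0.25 * prob, labelled                   # uniform prior on the root
```

For the tree `(("A", "A"), ("A", "G"))` and `d = 0.1`, `reconstruct` labels all three internal nodes `A`. The mutation probabilities are non-linear in `d`, as the abstract notes, so rerunning with other values of `d` is the brute-force version of the sensitivity analysis that the talk's algorithm performs with only a slight overhead.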

Seeing chemical Daylight and GAs

Daylight Chemical Information Systems of Mission Viejo, CA has a sophisticated toolkit for simulating and storing chemicals. According to the company web page:

The Daylight Toolkit enables companies to build applications to add a broad range of cheminformatics capabilities to their environment. These tools empower our customers to easily assemble customized systems which give them total control over corporate chemistry.

Although the basic Toolkit does not apparently contain genetic algorithms, a search of the web site reveals a number of user group papers using GAs (type "genetic algorithms" into the search box). An article on www.nature.com (subscription required) discusses use of Daylight and GAs to infer molecule structure from empirical data.