This copy is for your personal, non-commercial use only. To order presentation-ready copies for distribution to your colleagues, clients or customers visit http://www.djreprints.com.

https://www.wsj.com/articles/SB1005962163718064080

Plugged In

Can Computerized Language Analysis Predict the Market?

By Bill Alpert

Updated Nov. 19, 2001 12:01 a.m. ET


You've heard the rumor that warnings of the Sept. 11 terror attack sat unexamined in a government inbox. True or not, such risks have kept Victor Lavrenko working for six years to vanquish information overload. In a computer lab at the University of Massachusetts at Amherst, the 24-year-old is trying to build software that could listen to the news. Lavrenko's systems would then signal the human analysts who work at the government agencies that underwrite his lab's research. "The government has the ability to receive every radio and television channel in the world," says the computer-science grad student. "But they don't have enough manpower to listen to all those channels."

Techniques for automated topic detection -- Lavrenko's specialty -- could find use in many domains. One system built by the UMass lab already helps the government track outbreaks of infectious disease around the world, by sifting through Internet messages. For a class project, Lavrenko decided to try his methods on the stock market.

With several classmates, Lavrenko built a system called AEnalyst. They fed the software several months' worth of news stories on 127 stocks, and had the computer find word patterns that preceded sharp moves in a stock price. Lavrenko tested the resulting computer models on a fresh batch of news stories. At a programming conference last November, Lavrenko reported on AEnalyst's trading success. The software eked out a slight, but consistent, profit -- on paper, of course.

Academics and traders have already sicced computers on stock-price histories and financial statements, hoping to drag out signals of future stock moves. Lavrenko can find no one who's used state-of-the-art language analysis in the hunt for investment signals. "This experiment was just a lower bound on what's possible," says Lavrenko. "It was not much better than random trading, but it was consistently better. You could combine it with other kinds of indicators that people use."

Before cooking up AEnalyst, the Russian-born Lavrenko's investment experience was limited to the trading of some shares of Oracle and Sybase. He recalls looking at a few news stories to learn about product releases or pending lawsuits. His computer studies apparently left him little time to notice the Chicago economists who said there's no advantage to picking stocks -- or to studying financial news.

When his need for part-time work landed him at the UMass Center for Intelligent Information Retrieval, Lavrenko developed expertise in modeling the word patterns of texts that discuss a topic -- such as terrorist attacks. Once trained on a topic, the computer models did a fairly good job of flagging new texts that concerned the same topic. Seeing these techniques applied to news reports on political topics, Lavrenko wondered if similar techniques might work with financial reports. He got a chance to satisfy his curiosity when a seminar required he come up with a programming project. He convinced several classmates to join in.

Lacking money to purchase data sets, the UMass team spent a couple of weeks building software robots that could crawl the Internet and gather up stock prices and news stories on 127 stocks. There's an amazing amount of free data on the Web, notes Lavrenko, but his teammates had to spend lots of time cleaning it up into a form their computers could use.

One of the team members then took the tick-by-tick stock prices and ran them through software that can identify meaningful trends in a data series. In a three-month span of trading in Yahoo, for instance, the analysis found 11 price moves steep enough to classify as surges or plunges.
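
The column doesn't say which trend-finding algorithm the team used. As a rough illustration only, the sketch below flags a move whenever the price drifts a set percentage away from the last turning point. The function name `find_sharp_moves`, the 5% threshold and the sample prices are all assumptions made up for this example, not details from the project.

```python
def find_sharp_moves(prices, threshold=0.05):
    """Flag surges and plunges in a price series.

    Hypothetical threshold detector: a move is recorded whenever the
    price has changed by at least `threshold` (as a fraction) since the
    start of the current segment. Not the UMass team's actual method.
    """
    moves = []
    start = 0  # index where the current segment began
    for i in range(1, len(prices)):
        change = (prices[i] - prices[start]) / prices[start]
        if change >= threshold:
            moves.append((start, i, "surge"))
            start = i  # begin a new segment at the end of this move
        elif change <= -threshold:
            moves.append((start, i, "plunge"))
            start = i
    return moves


# Invented sample data, for illustration only.
sample_prices = [100, 101, 106, 105, 99, 100]
moves = find_sharp_moves(sample_prices)
print(moves)
```

On the invented series, the detector reports one surge (index 0 to 2) followed by one plunge (index 2 to 4). A real system would need to cope with noisy tick data and choose its threshold per stock.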

Lavrenko then took a database of contemporaneous news stories on those 127 stocks and had his text-modeling software find word patterns that typically appeared before a price surge or plunge.
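
To give a flavor of what "finding word patterns" can mean, here is a minimal sketch that assumes a simple word-frequency (unigram) model with add-one smoothing. The column does not describe Lavrenko's actual models at this level of detail, and the function names and tiny training headlines below are invented for illustration.

```python
import math
from collections import Counter


def train_unigram(stories):
    """Count word frequencies across the stories for one label."""
    counts = Counter(word for story in stories for word in story.lower().split())
    return counts, sum(counts.values())


def log_likelihood(story, model, vocab_size):
    """Log-probability of a story under a unigram model (add-one smoothing)."""
    counts, total = model
    return sum(
        math.log((counts[word] + 1) / (total + vocab_size))
        for word in story.lower().split()
    )


def classify(story, models, vocab_size):
    """Return the label whose model makes the story most likely."""
    return max(models, key=lambda label: log_likelihood(story, models[label], vocab_size))


# Invented training headlines: stories that preceded a surge or a plunge.
training = {
    "surge": ["company beats earnings forecast", "record profit and strong sales"],
    "plunge": ["company misses earnings forecast", "lawsuit filed after weak sales"],
}
vocab = {w for stories in training.values() for s in stories for w in s.split()}
models = {label: train_unigram(stories) for label, stories in training.items()}

print(classify("strong profit forecast", models, len(vocab)))
print(classify("lawsuit filed", models, len(vocab)))
```

With this toy data, the first headline scores higher under the "surge" model and the second under the "plunge" model. Real news text is far messier, which is presumably why the results were only slightly better than chance.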

After establishing some sort of correlation between the news stories and the price trends, Lavrenko's team then tested AEnalyst on a fresh batch of stock prices and news stories. The system recommended buys and sells based on its reading of the news. Positions were closed out after a 1% gain, or an hour's time, whichever came first. On paper, the system averaged a gain of 0.23% per trade. By comparison, a course of random trades with the same holding rules produced a loss of about 0.1%.
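
The exit rule described above (close the position after a 1% gain or an hour, whichever comes first) is easy to sketch. The version below assumes one price per minute, handles only a buy, ignores trading costs, and exits at the first observed price at or above the target. Those simplifications, the function name and the sample prices are assumptions for illustration, not details reported by the team.

```python
def simulate_long_trade(prices, entry, take_profit=0.01, max_hold=60):
    """Return the fractional gain of one buy under the two exit rules.

    prices:      one price per minute (assumed sampling interval)
    entry:       index at which the position is opened
    take_profit: close as soon as the gain reaches this fraction
    max_hold:    close after this many minutes if the target isn't hit
    """
    entry_price = prices[entry]
    last = min(entry + max_hold, len(prices) - 1)
    for i in range(entry + 1, last + 1):
        gain = (prices[i] - entry_price) / entry_price
        if gain >= take_profit:
            return gain  # target reached before the time limit
    # Time limit (or end of data) reached: close at the last price.
    return (prices[last] - entry_price) / entry_price


# Invented minute-by-minute prices, for illustration only.
winner = simulate_long_trade([100.0, 100.5, 101.2, 99.0], entry=0)
timed_out = simulate_long_trade([100.0, 99.5, 99.0, 105.0], entry=0, max_hold=2)
print(round(winner, 4), round(timed_out, 4))
```

In the first invented series the trade closes at the third price for a gain of about 1.2%. In the second, the two-minute limit expires first and the trade closes at a 1% loss, before the later jump. Averaging such per-trade results over many signals is how a figure like the 0.23% gain per trade would be computed.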

Lavrenko and his classmates got an "A" on the project. After presenting the paper at a conference, Lavrenko got peppered with questions by programmers who worked at a couple of financial startups. While AEnalyst's signal detection performance was significant, Lavrenko points out that the trading rules were ridiculously dumb. More sophisticated trading -- and the combination of his news signals with financial signals -- might produce bigger gains.

"It might look like a complete experiment," says Lavrenko. "But it really posed more questions than it answered." The Web bots built for the project have continued to collect stock prices. But AEnalyst lies dormant. Lavrenko's got to help the government manage its information glut. He's also got to write a dissertation. (Meanwhile, you can check out AEnalyst at http://ciir.cs.umass.edu/~lavrenko/aenalyst.)

Marc Andreessen won't go away.

Most New Economy celebrities retired to the ski slopes when their stock-market bubbles burst. But the 30-year-old Andreessen stubbornly labors on at LoudCloud, a Website services firm he founded with other veterans of Netscape.

As chairman of LoudCloud -- and as the guy who helped popularize Web browsers -- Andreessen can open a door or two for his Sunnyvale, California-based company. Who can refuse hearing Andreessen's appealingly rueful account of his ride on the dot.com roller coaster? The sometime centimillionaire grimaces with a sweet sourness when he admits: "None of us is either as smart or dumb as we think we are."

LoudCloud operates Websites for a collection of customers that include a dwindling number of dot.coms, and a growing number of more sturdy clients like Blockbuster and Nike. Luckily for Andreessen, LoudCloud squeaked out a $6-a-share initial public offering this past March, just as the new-issue window was slamming shut. The firm had a negative gross profit as large as its revenues in the July quarter. Its shares trade below 3.

When LoudCloud reports October quarter results on Monday, investors will look for a lower cash burn rate than the $33 million of the July quarter. With about $170 million cash on LoudCloud's July balance sheet, analysts like Morgan Stanley's Jeffrey Camp think Andreessen's firm will survive long enough to see the New Economy recover in 2003.
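
A back-of-the-envelope check shows why the burn rate matters. The sketch below simply divides the July cash balance by the July-quarter burn. It assumes the burn rate stays constant and that no new money is raised, which are simplifications for illustration rather than the analysts' forecast.

```python
def runway_quarters(cash_millions, burn_per_quarter_millions):
    """Quarters of cash left if the burn rate never changes (an assumption)."""
    return cash_millions / burn_per_quarter_millions


# Figures from the column: about $170 million cash, $33 million burned in the July quarter.
quarters = runway_quarters(170.0, 33.0)
print(f"{quarters:.1f} quarters")
```

At the July pace the cash would last only about five quarters, so a lower burn rate is what would stretch it further.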

While LoudCloud tries to see its own way to break even, it has a clear view of the technologies customers choose for new Internet applications. For large companies, says Andreessen, today's safe choice for computers remains the UNIX platforms of Sun Microsystems. On top of those Sun systems, customers are running software from BEA Systems and Oracle.

But every day, Microsoft's Windows software becomes a more viable alternative, says Andreessen. Microsoft offers its own alternatives to BEA and Oracle. And some folks are seriously experimenting with the Linux operating system that IBM has pushed as an alternative to both Sun and Microsoft.

With few customers starting new software projects in the current economy, Microsoft and IBM have time to make their offerings more competitive in anticipation of a recovery. "If the economic recovery were to start tomorrow," says Andreessen, "then Sun, BEA and Oracle would remain the default platforms. But if the recovery doesn't happen for three or four years, then all these new technologies can incubate and become stronger alternatives."
