Since graduating from Princeton in 2004, Josh Tauberer has led a double
life. By day, he's a mild-mannered graduate student in linguistics at
the University of Pennsylvania. By night, he commands a legion of computer
programs, trawling the Internet for data about congressional bills and
republishing the information on GovTrack.us, a popular Web site for bloggers,
policy wonks, and concerned voters.

Some 10,000 visitors view GovTrack each day — more when a hot
bill is up for debate — and its freely available databases feed
a handful of government watchdog sites, including OpenCongress.org, a
portal of congressional news, and MAPLight.org, which tracks the votes
of members of Congress in parallel with the contributions they receive
from special-interest groups. At the center of this web of information
is Tauberer, GovTrack's sole employee, who works from a slightly cluttered
desktop in his Philadelphia apartment.

He's just one citizen, doing his part for democracy.

"You could put it that way," Tauberer says, stifling a laugh,
"but ... I happen to enjoy it. It's not like I get up in the morning
and [say], 'Oh, I've got to save the world by making this site.'"

Indeed, when Tauberer began organizing his site as an undergraduate,
few thought that there was any need for it. The Library of Congress had
been publishing congressional bills on its THOMAS.loc.gov site since 1995.
But Tauberer found THOMAS difficult to navigate and filled with cumbersome
quirks. So, with hopes of building a better source for legislative data,
Tauberer, a largely self-taught computer programmer, began creating "screen-scraping"
programs that look for specific patterns on Web pages, copy the information
they find, and store it in a database. Technically, screen-scraping is
not very difficult, he says, but it can be a hassle to decipher page formats
and sort through data that may be incomplete, inconsistent, or unreliable.
And when a source Web site is redesigned, the screen-scrapers need to
be retooled as well. ("Fortunately, the government doesn't change
anything — ever," Tauberer jokes.)

Perseverance paid off for Tauberer when he launched GovTrack in September
2004, more than three years after he first envisioned the site. Users
began to take notice later that year after Tauberer was awarded the top
prize in a Web development contest run by Technorati.com — the citation
called GovTrack "School House Rock on steroids" — and
a January 2005 New York Times story about the site provided an
additional boost. Today, when Web searchers type a congressional bill
number into Google, more often than not the top result is a URL that begins
with "www.govtrack.us." Other GovTrack-supported sites are close
behind.

"GovTrack is really the central hub in federal legislative information,"
says John Wonderlich, director of the Sunlight Foundation's Open House
Project, which lobbies for better Web access to legislative data. "It's
the clearinghouse for data coming from the Library of Congress, and that's
kind of amazing that [Tauberer] has managed to do that on his own."

While Tauberer's hope was to improve government accountability by making
it easier to access and digest the details of legislation, he is the first
to admit that "information only gets you so far." Footnotes,
references, and amendments to amendments to amendments can make bills
nearly indecipherable, even to well-informed readers. So, in addition
to publishing the full text, status, and Library of Congress summary for
each bill introduced on Capitol Hill, GovTrack provides other useful tools:
e-mail alerts linked to specific bills, members of Congress, committees,
or topics of interest; detailed maps of congressional districts, created
by Tauberer using census data and Google Maps; graphs that illustrate
votes on a particular bill; and a blog of legislative analysis, written
mainly by unpaid contributors.

Each senator and representative also has a GovTrack page that includes
the member's voting history, links to bills he or she has sponsored, and
a graphic that shows the member's standing on GovTrack's "Ideometer,"
an ideological spectrum that Tauberer created using a statistical analysis
of bill sponsorship patterns. John McCain, for example, pushes the Ideometer's
needle to the right, about a third of the way toward the Republican end
of the spectrum, while Barack Obama is positioned to the left, about two-thirds
of the way toward the Democratic end. Both are labeled "rank-and-file,"
which means they fall within the middle 50 percent of their respective
parties. Sen. Barbara Boxer (D-Calif.) occupies the far left pole, and
Sen. Jim DeMint (R-S.C.) stands on the far right.

Tauberer would like to add more analytical features like the Ideometer,
but he concedes there are limitations to his skills. While he's a whiz
with databases, he lacks the design expertise needed to generate the slick
infographics that newspapers and magazines create. And then there's the
simple arithmetic of time. After four years of graduate school, Tauberer
is drafting a proposal for his dissertation in linguistics, which he hopes
to complete in the coming year.

Working on a Web site that promotes government transparency has been
a significant departure from Tauberer's academic work, which deals with
phonetics and how children acquire language skills. He majored in psychology
at Princeton while pursuing a certificate in computer science, and it
was his interest in technology — not politics — that drove
the development of GovTrack.

In the spring of 2001, Tauberer was a student in computer science professor
Andrew Appel '81's freshman seminar, "Speech Is a Machine,"
which addressed tech-related topics like copyright in the Internet age
and whether computer programs qualify as "speech." Tauberer
first encountered THOMAS while studying a recently passed bill that restricted
fair-use rights. He began thinking of ways to make the site's vast wells
of information more accessible. One year later, he devised a rough system
for GovTrack, and in his senior year, when most of his classmates were
immersed in thesis research, Tauberer laid the framework for his site,
which he would finish in the summer after graduation.

Classmate David Robinson '04, now the associate director of Princeton's
Center for Information Technology Policy, says that Tauberer showed the
same sort of dedication and ingenuity as an editor for The Daily Princetonian.
In 2003, Tauberer designed a survey methodology for conducting student-opinion
polls, using a random list of student phone numbers and a customized,
secure Web site in which pollsters could enter the data they collected.

Robinson says Tauberer was "sublimely confident" that GovTrack
would find an audience. (Tauberer calls it "naïveté.")
Tauberer, who hopes to work as an advocate for better access to government
data after completing his Ph.D., urges the government to share its data in
more user-friendly formats, and he practices what he preaches by providing
free access to his own databases. Advertising on
GovTrack pays for operating expenses like server space and provides a
modest profit.

The openness that Tauberer sought to expand with his own site is becoming
a major focus among scholars at Princeton and other institutions who are
studying technology, says Robinson. "In general, we're looking at
all the ways that digital technology and public life interact with one
another, and it's becoming clear that transparency is one of the main
ways that digital technology and public policy interact," he notes.
Robinson, Professor Ed Felten, and graduate students Harlan Yu and William
Zeller recently wrote a paper, "Government Data and the Invisible
Hand," outlining a novel strategy for more transparency: Reduce the
federal role in presenting data on the government's own Web sites, such
as THOMAS, but step up government provision of reliable, raw data that
nonprofit and commercial groups can use on their sites.

Wonderlich, of the Sunlight Foundation, says that most issues of government
openness remain unresolved. Some are technical, such as standardizing
the format in which data are released. Others involve making information
easier to obtain. In pre-Internet days, it was acceptable for items of public
record to sit in file drawers and on dusty bookshelves. Today, Web users have
grown to expect instant access. "People see that the Internet is
making it easier to shop and do a lot of other things," Wonderlich
says. "It intuitively makes sense to people that Congress should
operate in the same way."

On the campaign trail, both Barack Obama and Hillary Clinton spoke about
technology as a means of promoting openness. Clinton, in a January Meet the
Press interview, called for more transparency and Web access to government
information, and Obama's technology plan, outlined on his campaign Web
site, vows to "mak[e] government data available online in universally
accessible formats to allow citizens to make use of that data to comment,
derive value, and take action in their own communities."

Last year, Tauberer dipped a toe into the political waters, contributing
to the Sunlight Foundation's Open House Project report, which drew endorsements
from a handful of representatives on both sides of the aisle, and writing
an op-ed piece on improving government databases for The Hill, a daily
newspaper that covers Congress. But Tauberer's main interest is in
civics, not politics, he says, and GovTrack is nonpartisan. The site's
only official position is that the government should publish more data.

"Definitely, the information has to be out there and usable,"
Tauberer says. "I can only hope that it makes some sort of real difference."