2. Aha! That's it! That's my thing! Research as blogging

I've been struggling to figure out exactly how graduate school and my
courses fit into The Great Scheme of Things, and I think I've just
figured out exactly how to motivate myself. =)

I'm going to look at my reading course as a series of blogging
assignments. Because I'm in grad school, I have the time and resources
to dig through academic papers and books that most people won't even
hear about. Whee! I found my value-added niche!

SO. If I break my big deliverable down into lots and lots and lots of
small deliverables--a mini-paper each week, with plenty of references
and fun stuff--then it'll be much more fun _and_ much more useful.

I can't wait to wake up tomorrow and start writing! This Friday:
Blogging as personal knowledge management...

3. Google: Organizing the World's Information... and Loving It!

The Google recruitment talk was given by John Abd-El-Malek
(pronounced abdelMAHLik). Other engineers were also around for
the question and answer: Amit Agarwal, Tim James, Jon McAlister, Peter
Szulczewski, Joel Zacaharias. There were two women from HR whose names
I didn't catch.

Google's mission statement is to organize the world's information and
make it universally accessible and useful. The presentation covered
the following points:

<b>Large data set, simple structure.</b> Key insight: Google works
with large data sets with simple structure. For example, web page
repositories, query logs, status records from thousands of machines,
source code control and software build records, etc. These aren't
stored in SQL databases because they're too large for DBMSes
(terabytes of data!) and they don't need the full complexity of a
DBMS.

<b>Simple statistical analysis.</b> Often, analyses of data tend to be
simple. General statistical analysis often only requires computing a
small number of statistics, then performing more complex operations
using only these statistics. For example, if we're trying to find the
most popular query, we don't need to check all the queries.

<b>Data as a sequence of records.</b> For commutative operations,
record order is irrelevant (example: addition). For associative
operations, aggregation order is irrelevant (example: finding the
maximum). This allows you to write parallel programs to take advantage
of Google's distributed computing power. For example, consider a week
of code submissions. This short program calculates the minute for one
entry and emits an instruction to add one to the record for that
minute. The emit statements are delivered to an aggregator, which then
combines the results into a graph. (As you can see, we do have weekends.)
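
I didn't copy down the program from the slide, so here's my own minimal
sketch of the emit-and-aggregate idea in Python. This is not Google's
code: the function names, the minute-of-week key, and the sample
timestamps are all my assumptions. Each record is handled on its own,
and because addition is commutative and associative, the aggregator
gets the same totals no matter how the records are ordered or split
across machines.

```python
from collections import Counter
from datetime import datetime


def minute_of_week(timestamp: datetime) -> int:
    """Reduce one submission record to its minute within the week (Monday 00:00 is 0)."""
    return timestamp.weekday() * 24 * 60 + timestamp.hour * 60 + timestamp.minute


def emit_counts(records):
    """Emit one (minute, 1) pair per record; no record depends on any other."""
    for timestamp in records:
        yield minute_of_week(timestamp), 1


def aggregate(emitted):
    """Sum the emitted values per key, in whatever order they arrive."""
    totals = Counter()
    for key, value in emitted:
        totals[key] += value
    return totals


# Hypothetical sample data standing in for a week of code submissions.
submissions = [
    datetime(2005, 10, 3, 9, 15),   # a Monday
    datetime(2005, 10, 3, 9, 15),
    datetime(2005, 10, 4, 14, 30),  # a Tuesday
]

totals = aggregate(emit_counts(submissions))

# Different record order, same totals.
reordered = aggregate(emit_counts(reversed(submissions)))

# Two "machines" each aggregate a shard, then the partial results are merged.
merged = aggregate(emit_counts(submissions[:1])) + aggregate(emit_counts(submissions[1:]))
```

The merged shards are what would make this parallelizable: each machine
only needs its own slice of the records.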

(Demo followed by a totally awesome video of query traffic represented
as points of light on a map of the Earth.)

Harnessing the power of data

The conventional wisdom is that given an order of magnitude increase
in computational power, you can solve previously impractical problems.

Google's insight: <b>Given an order of magnitude increase in data, you
can solve previously unsolvable problems!</b>

It's not just about getting a more robust solution. Some methods that
appear to fail with limited data work with much larger data sets.

Consider spelling correction. The old way was to use a
lexicon/dictionary - 100k words. This allows you to suggest correction
words that have a short edit distance from unrecognized words. What's
the challenge? Proper names, which are rarely in lexicons. Example:
Kofi Annan.
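
To make the old approach concrete, here's a small sketch of
lexicon-based correction using edit distance. The tiny lexicon, the
`suggest` helper, and the distance cutoff of 2 are my own assumptions
for illustration; a real lexicon would have around 100k words.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row of the table at a time."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def suggest(word, lexicon, max_distance=2):
    """Return lexicon entries within max_distance edits, closest first."""
    if word in lexicon:
        return []
    scored = sorted((edit_distance(word, entry), entry) for entry in lexicon)
    return [entry for distance, entry in scored if distance <= max_distance]


# Hypothetical toy lexicon with no proper names in it.
LEXICON = {"secretary", "general", "spelling", "correction"}

ordinary = suggest("speling", LEXICON)  # an ordinary word gets a suggestion
proper_name = suggest("anan", LEXICON)  # a misspelled proper name gets nothing
```

The second call is the Kofi Annan problem in miniature: if the name
isn't in the lexicon, no amount of edit-distance cleverness will find it.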

The set of terms on the web is much larger than standard lexicons and
changes regularly. People misspell queries, even popular ones such as
"britney spears". Dictionary-based spelling correction has problems
with context.

Solution? Use the web as a contextual lexicon. Find misspellings based
on contextual usage on the web. Build a probabilistic model of term
spellings. Context is key.
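
The talk didn't go into how the model is built, so this is only my
guess at the flavor of it: count terms and adjacent-word pairs in a
pile of text, then prefer the candidate spelling that is most common
after the previous word. The sample queries, `build_model`, and
`correct` are all hypothetical, and raw counts stand in for a proper
probabilistic model.

```python
import string
from collections import Counter


def edits1(word):
    """All strings one delete, transpose, replace, or insert away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def build_model(texts):
    """Count single terms and adjacent term pairs in the given texts."""
    unigrams, bigrams = Counter(), Counter()
    for text in texts:
        words = text.lower().split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def correct(previous_word, word, unigrams, bigrams):
    """Pick the known spelling that best fits the previous word, then overall frequency."""
    candidates = {c for c in edits1(word) | {word} if c in unigrams}
    if not candidates:
        return word  # nothing known nearby, so leave the term alone
    return max(candidates, key=lambda c: (bigrams[(previous_word, c)], unigrams[c]))


# Hypothetical stand-in for text gathered from the web.
SAMPLE_TEXTS = [
    "kofi annan", "kofi annan speech", "kofi annan un",
    "anna kournikova", "anna kournikova photos", "anna karenina", "anna nicole smith",
]
unigrams, bigrams = build_model(SAMPLE_TEXTS)

with_context = correct("kofi", "anan", unigrams, bigrams)
without_context = correct("", "anan", unigrams, bigrams)
```

In this toy data "anna" is more frequent than "annan", so without
context the correction goes to the wrong word. Once "kofi" is taken
into account, "annan" wins. Context is key, indeed.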

You can also find interesting patterns in data. For example, here are
the most popular queries from the past few Januarys. (Points out
Superbowl, points out one year when Janet Jackson and "superbowl
halftime" topped the Google queries.)

Innovations

- Plenty of crazy hacks to make it work across browsers
- Mozilla/Safari/Opera don't support vector markup. Draw driving directions on server in a PNG image and overlay it
- IE does not support alpha transparency in PNGs. Use a little known ActiveX control that's enabled by default
- Safari and Opera don't support parsing XML strings, so we wrote an XML parser in JavaScript (no joke)

The benefit of DHTML: Simple API

- Putting map on page requires only two lines of JavaScript:
- Initially designed to integrate
- Developers figured this out before we published API

Goal: Provide automatic high-quality translations of text between
different languages. This enables all text data on the web to be
accessible in any language, no matter what the language of the
original text.

Approach: statistical machine translation. Build a statistical model of
translation. Use decision theory to make optimal decisions.
Sentence-by-sentence level.

Pre-translated pairs of text to learn parameters of log-linear model.
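
For my own reference, here's a rough sketch of what a log-linear model
and a decision rule look like in general. The feature names, weights,
and candidate sentences below are invented for illustration, and the
weights are fixed by hand rather than learned from pre-translated
pairs. Nothing here comes from the talk beyond the phrase "log-linear
model".

```python
def log_linear_score(features, weights):
    """Weighted sum of feature values; higher means a better candidate."""
    return sum(weights[name] * value for name, value in features.items())


def best_translation(candidates, weights):
    """Decision rule: choose the candidate with the highest score."""
    return max(candidates, key=lambda c: log_linear_score(c["features"], weights))


# Hypothetical weights; in practice these are the parameters to be learned.
WEIGHTS = {"log_translation_prob": 1.0, "log_language_prob": 0.5, "length": 0.1}

# Hypothetical candidate translations of one source sentence.
CANDIDATES = [
    {"text": "the house is small",
     "features": {"log_translation_prob": -2.0, "log_language_prob": -3.0, "length": 4}},
    {"text": "small the house is",
     "features": {"log_translation_prob": -2.0, "log_language_prob": -8.0, "length": 4}},
]

chosen = best_translation(CANDIDATES, WEIGHTS)
```

Both candidates cover the source words equally well in this example, so
the language-model feature is what breaks the tie in favor of the
fluent word order.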

Throw statistics at the problem!
BLEU% score: how closely a machine translation matches a human translation
Outperformed Chinese-English translation and Arabic-English translation. Why Chinese and Arabic? They're very different from English. If we can do these languages, then it would be easier to do Spanish and French.

Questions and answers

- How does Google make money off Orkut? We never worry about profit for product. We make it first, and then we see if we can make money off it.
- Is there any reality to a Google online office? Can't comment on any rumors.
- How many people are you looking to hire? No specific number in mind. As many great, talented people as are out there.
- Server count? Can't answer that.
- Does the majority of Google's revenue come from licensing technologies? Revenue statements are largely open now that Google is a public company. Most of it comes from AdSense. Some revenue comes from Google Earth and the Google Search Appliance.
- There are only some publications from Google Labs. Is that something encouraged within Google, or does it just happen? Very fine line betwe... we want competitive advantage also. We have opened up software. Historically we haven't been a huge research company.
- Where do you stand on privacy? "Don't be evil." You need to get special permission to go through query logs, for example.
- What about Linux and Mac versions of things like Google Desktop? We want to focus on what will give us the most impact. Cross-platform thing is 20%-time stuff. Most Googlers use Linux, so it's frustrating having to borrow someone else's computer to try things out.
- What about linkspam? 50-100 people working on linkspam. Matt Cutts is one of the Googlers working on this.
- What about corporate structure? I've heard Google's supposed to be very democratic. Teams themselves figure out what features should be added. We just meet and figure out what to do. Engineers have a lot of power. More motivation to work on things.
- How many engineers do you have? 3000+ engineers.
- Why do you help out Firefox? What do you have planned? Sometimes Google just does things to help make the Web a better place. Part of the philosophy of not being evil.
- What about UI design? UI designers really help us a lot. For example, sidebar. UI designers helped us do that.
- Software engineering? We have design documents and we review them. Testing. 20% projects are an exception; rules are looser. For most projects, there are design documents, all the code is reviewed before it's submitted, unit tests are encouraged...
- What are you looking for? Well-rounded bright individuals. We want to be able to learn something from you. We want to make sure you're a solid recruit for Google. We want to make sure we keep learning something. Something that wows us. "Wow, this guy is sharp."

4. Google recruitment talk: Impressions

Google is, of course, t3h k3wl. In fact, working at Google is probably
cooler than studying at MIT, in terms of geek status. ;) This
recruitment talk wasn't about convincing U of T students how cool
Google is. That would've been preaching to the choir. Rather, the talk
was about some of the interesting challenges people might get to work
on at Google. This should help students think about their projects and
their resumes...

I was a bit disappointed that there weren't any female engineers. The
two women there were both from HR. They wore Google shirts with the
second "o" replaced by the sign for woman, and that's something I want
to think about further. I talked to one of the women after the
presentation. She said that there was supposed to be one, but she got
pulled into a project at the last minute. They do try to pay attention
to these things, though, and occasionally have all-female events.

I confess. I loiter near the front during post-talk mingling not
because I have burning questions to ask, but because I like
eavesdropping on other people's questions. I learn a lot from other
people's concerns. For example, like students around the world, U of T
students are worried about their GPA and whether their grades will
affect their admissions. They want to know what companies are looking
for. They want to know about where the company's going. The usual HR
stuff. I like watching out for the unusual questions, like the way
someone asked "So, important question: vi or emacs?" (Wish I knew who
asked that one!) And the person who asked about Python. Interesting.

this stuff, it's a good time to learn AJAX and figure out how to use
the Google APIs. Google Desktop looks _really_ interesting and it's
right up my personal info/knowledge management alley, but it's
Microsoft Windows-based. (That's another option, though; get
something running on Linux...)

So if I want to boost my chances for next year's job application
cycle, I should work on a project. Come to think of it, anyone can do
that from anywhere in the world--so don't lose hope, people back home!
=)

Next question. Do I want to work at Google?

I didn't need to see this presentation to know that Google is totally
cool. It's every geek's dream company. Imagine hanging out with
incredibly brilliant geeks, working on great projects, eating nice
(and free!) food, and enjoying all the computing power you can throw
at a problem.

Does it fit what I want to do?

Well, if I get in, it will certainly push me in terms of technical
skills. I'll learn a _lot._ But I don't just want to work on my
technical skills... I don't think I know enough about Google yet to
like them immensely.

It's nice that Google matches employee donations, and it's great that
they've got a motto of "Don't be evil." I need to learn more about
them and how they might fit into my personal mission statement,
though... I think I need a lot more user contact, a lot more
involvement in people's lives.

And hah! yes, ego comes into it too. I want people to know me. Not
just the systems I build, but to know _me_, and I want to know them
not just as statistics but as people too. As much as I'm glad that
those Googlers can keep Google running and can develop all sorts of
cool new systems, they're still anonymous to me and to the millions of
people who use Google without thinking.

There you go. I've confessed it. I'm egotistic. I want people to know
me and I want to know them. I want to be within talking distance of
users.

Is that something Google can let me do? I don't know. We'll see.

Ack! I can't believe I feel uncertainty about _the_ geek company of
our time!

5. Geek girl T-shirts

The two women from HR wore Google Women's Tees.
From the website: "We originally designed this shirt for our efforts in recruiting women engineers."
Seeing the shirt on them made me think about my geekwear, and why I
found the Google Women's Tee a bit strange.

I like wearing tech shirts. They're a great way to identify myself to
other people. They make it easier for geeks to talk to me. They
provide instant conversation starters for people in the know.

I'm still not used to the Venus symbol, though, and that's probably
because I think of the symbol in different contexts. It feels too
serious for me. I guess I'm also more used to the "girl" aspect of my
identity than I am to the "woman" aspect. That's why I self-identify
as "geek girl".

Maybe it's a socialization thing. I'm more used to subtle gender
signs, like the "geekette" in my signature. I like wearing baby tees
with the same logos as the regular shirts. The logo connects me to
other geeks, but the slightly more flattering cut makes a small
difference.

Ah. That's probably it. I want my geekwear to connect me with other
geeks, which is why I'd go for something generic like "emacs" over
something like "geek. girl. goddess." I'd wear "emacs girl" if I want
to point out that yes, I can _too_ be a girl _and_ be into Emacs, but
I prefer focusing on what I have in common with other geeks.

It's pretty much a moot point, anyway, as they only had white
long-sleeved men's style shirts earlier, and they ran out before I
could get one. The swag would've been nice, but it wasn't essential. I
learned enough from the conversations and the talk itself to make the
time worthwhile. <laugh> I can understand why they probably
wouldn't bring women's tees to a mixed talk. Still, I'm endlessly
appreciative of conferences and tech sessions that actually have baby
tees, like the totally cool open source conference I spoke at in Cebu
and the blogging summit I attended in Manila right before I left. I
left the blogging shirt at home, but I love my open source baby tee to
pieces.

Ah, the trouble with being a geek girl in a guy's world... Swag rarely
fits.