Gen Y Entrepreneur

August 30, 2009

Wait, what?

Every so often, I like to code up something random. I think of it as a productive way to expand my technical abilities, and although I don't have any concrete evidence, I figure it probably gives me a broader base against which to think about problems in other areas. So, as a weekend project, I thought it would be challenging to teach myself something about image processing. The field of "computer vision" is evolving rapidly, and I didn't (still don't) know a thing about it. I initially thought it would be cool to build something that could identify features of a face... geometry detection. But after reading a bit about it, I found that almost every modern method of doing this is based on computing geometry from detected edges in the image. So, after clicking a few Wikipedia links, I realized I first needed to learn about edge detection.

Edge Detection

After some brief research, I found the Canny edge detection algorithm. Although it was developed over 20 years ago (in 1986), almost all modern methods are based on, or at least acknowledge, Canny's ideas, and it seems to remain one of the best methods for finding edges in an image.

Method

As interesting as the topic is, a plain-text description of the edge-detection method would be really boring to read. Not that boring is a problem, since the focus of this blog isn't to be interesting so much as to get things down on paper. But since I think I can make it more interesting, I'll try to do so.

And (looking ahead to the solution), here's the edge detected version. In the rest of this post, I'll explain how I got there.

Step 1: Noise Reduction

Now, the first stage of the Canny detector is noise reduction. Before doing that, I reduced the color space to greyscale. Since I'm not an expert in the field, I just picked a formula from the internet that I thought looked good. It is:

my $grey = byte(77/256 * $red + 150/256 * $green + 29/256 * $blue);

Having done this, I then blurred the image. Thankfully, Wikipedia provides a suggested blur filter: a convolution of the image (A) with a small square Gaussian kernel matrix. Since the PDL (Perl Data Language) library provides a function for 2D convolution, this was actually remarkably easy.

As a side note, I did find a few papers suggesting that using a different color space will make it way easier to detect skin, but I thought it might be better to build a generic detector first, then specialize the program later if I wanted to work on face geometry.

This yielded a greyscale, slightly blurry version of the original picture, seen here:
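Since the original Perl/PDL source isn't shown in this post, here's a hypothetical Python/NumPy sketch of this noise-reduction step: greyscale conversion with the same weights as the Perl one-liner, followed by convolution with the 5x5 Gaussian kernel suggested by Wikipedia's Canny article.

```python
import numpy as np

# 5x5 Gaussian kernel (sigma ~ 1.4) from Wikipedia's Canny article
GAUSS = np.array([[2,  4,  5,  4, 2],
                  [4,  9, 12,  9, 4],
                  [5, 12, 15, 12, 5],
                  [4,  9, 12,  9, 4],
                  [2,  4,  5,  4, 2]], dtype=float) / 159.0

def to_grey(rgb):
    """Greyscale using the same weights as the Perl one-liner above."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return (77 * r + 150 * g + 29 * b) / 256.0

def blur(grey):
    """'Same'-size convolution with zero padding. The kernel is symmetric,
    so convolution and correlation coincide here."""
    kh, kw = GAUSS.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(grey, ((ph, ph), (pw, pw)))
    out = np.zeros_like(grey, dtype=float)
    for dy in range(kh):
        for dx in range(kw):
            out += GAUSS[dy, dx] * padded[dy:dy + grey.shape[0],
                                          dx:dx + grey.shape[1]]
    return out
```

The sliding-window sum above is essentially all that PDL's 2D convolution function does for you in one call.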

Step 2: Gradient Detection

Next, I find the actual edges by computing the intensity gradient of the image. This is very easy: I just find the delta in intensity between each pixel and its neighbor in the North-South and East-West directions. In the images below, areas of high gradient (high first derivative) have higher intensity values (they are whiter). As you can see, the NS edges are primarily horizontal and the EW edges are primarily vertical. Here are the gradients in the North-South (NS) direction.

And here are the gradients in the East-West (EW) direction.

There's also some work out there that suggests using the second derivative instead of a simple gradient to find the edges (zeros of the second derivative correspond to maxima in the gradient). This would naturally yield higher accuracy, thin down the lines, and eliminate the need for non-maximum suppression (step 5) later on.

The Canny algorithm also calls for gradients to be calculated in the NW/SE and NE/SW directions (four gradients instead of two). For simplicity's sake, I only implemented the NS and EW gradients here.
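A minimal NumPy sketch of the NS gradient as described above (simple absolute differences between vertically adjacent pixels; the function name is my own):

```python
import numpy as np

def ns_gradient(grey):
    """Absolute pixel-to-pixel difference between each row and the next;
    high values where intensity changes in the North-South direction."""
    g = np.zeros_like(grey, dtype=float)
    g[:-1, :] = np.abs(grey[1:, :] - grey[:-1, :])
    return g
```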

Side Note:

From an implementation standpoint, it's clearly redundant to write separate functions for the NS and EW directions. Since the image-manipulation functions I wrote all operate in place, I can transpose the matrix to operate on it "sideways," then change it back.

So, more specifically, to get the EW implementation, I take the original input, transpose it, and pass it into the NS function. Then, I take the resulting image, and transpose it again to get the EW result out.
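In NumPy terms, that transpose trick looks something like this (a sketch, not the original Perl):

```python
import numpy as np

def ns_gradient(grey):
    """North-South gradient: difference between vertically adjacent pixels."""
    g = np.zeros_like(grey, dtype=float)
    g[:-1, :] = np.abs(grey[1:, :] - grey[:-1, :])
    return g

def ew_gradient(grey):
    """East-West gradient by reusing the NS code on the transposed image,
    then transposing the result back."""
    return ns_gradient(grey.T).T
```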

Step 3: Thresholding

The third step is to identify which edges are important and which are not. The algorithm assumes that areas of high difference (where the gradient is large) mark more important edges than areas where it is smaller. Thus, a threshold can be used to find the important lines.

How much noise survives, versus how many important edges are accurately detected, really depends on the threshold chosen. Below, I pick two thresholds: one at 20% of the maximum gradient and one at 10% of the maximum gradient. (That is, if the greatest pixel-to-pixel difference were 100, I'd only show lines over 20 and over 10, respectively.)
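Thresholding at a fraction of the maximum gradient is a one-liner; here's a sketch in Python:

```python
import numpy as np

def threshold(grad, fraction):
    """Keep only pixels whose gradient exceeds `fraction` of the image's
    maximum gradient, e.g. 0.20 for the high threshold, 0.10 for the low."""
    return grad > fraction * grad.max()
```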

The high threshold (20%):

And the low threshold:

And of course, these are only in the NS direction. We'll also want to do this in the EW direction. As seen here (high):

And with the EW, low (10%) threshold:

As you can see, the higher threshold has less noise but, in both the NS and EW directions, fails to pick up the full line that is visible at the lower threshold.

Step 4: Edge Tracing

The algorithmic solution to the problem above is to use the high threshold as a baseline to figure out which lines are the important ones, and the lower threshold to follow those lines. I'll quote Wikipedia, whose explanation is relatively clear:

Making the assumption that important edges should be along continuous
curves in the image allows us to follow a faint section of a given line
and to discard a few noisy pixels that do not constitute a line but
have produced large gradients. Therefore we begin by applying a high
threshold. This marks out the edges we can be fairly sure are genuine.
Starting from these, using the directional information derived earlier,
edges can be traced through the image. While tracing an edge, we apply
the lower threshold, allowing us to trace faint sections of edges as
long as we find a starting point.

So, starting with the pixels in the high-threshold image, I add any pixels in the low-threshold image that are adjacent. My implementation was akin to the Paint Bucket tool in an image-editing program, spreading (recursively) until all adjacent pixels in the region were detected. Again, I ignored the NE/SW-type directions for simplicity.
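Here's a sketch of that paint-bucket hysteresis in Python. I use an explicit stack instead of recursion (which avoids blowing the call stack on large regions), and only follow the 4-connected neighbors, matching the simplification above:

```python
import numpy as np

def trace_edges(high, low):
    """Hysteresis: start from high-threshold pixels and spread, paint-bucket
    style, through any 4-connected pixels present in the low-threshold map."""
    out = np.zeros_like(high, dtype=bool)
    stack = list(zip(*np.nonzero(high)))  # seed with the "sure" edge pixels
    while stack:
        y, x = stack.pop()
        if out[y, x]:
            continue
        out[y, x] = True
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < low.shape[0] and 0 <= nx < low.shape[1]:
                if low[ny, nx] and not out[ny, nx]:
                    stack.append((ny, nx))
    return out
```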

The edge-traced images are shown below, first in the NS direction:

And also in the EW direction:

As you can see, the major lines are fully traced (better than at the 20% threshold), while most of the noise is eliminated (much better than at the 10% threshold).

Step 5: Non-Maximum Suppression

The lines in the above images are all really thick, and all the extra pixels would make the detected edges hard to use in practice, for example in a geometry-detection algorithm. What we'd like to do is thin down the lines so that only the highest delta in a region shows up. To do this, we refer back to the (post-step-1) greyscale image and find the local maxima of the gradient function. My implementation (for the NS direction) looks at each "column" of pixels independently and finds the local maximum of each "cluster" of white. The resulting edges are shown below:

North-South:

And EW:

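A Python sketch of that column-wise thinning: within each vertical run ("cluster") of edge pixels, keep only the pixel whose gradient is the largest.

```python
import numpy as np

def thin_columns(grad, edges):
    """For each column, keep only the highest-gradient pixel inside every
    vertical run of edge pixels."""
    out = np.zeros_like(edges, dtype=bool)
    h, w = edges.shape
    for x in range(w):
        y = 0
        while y < h:
            if edges[y, x]:
                start = y
                while y < h and edges[y, x]:  # walk to the end of the run
                    y += 1
                run = grad[start:y, x]
                out[start + int(np.argmax(run)), x] = True
            else:
                y += 1
    return out
```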
Step 6: Composite Images

Finally, we need to join the NS and EW edges into a single image. The implementation is easy: just binary-OR each pixel of the two images. Hooray!

As you can see below, we finally have a clean edge-detection image to use.
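The compositing step is a per-pixel OR, e.g. in NumPy:

```python
import numpy as np

def composite(ns_edges, ew_edges):
    """Join the NS and EW edge maps with a per-pixel binary OR."""
    return np.logical_or(ns_edges, ew_edges)
```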

August 24, 2009

So this post is a brain dump of something that's been tickling my brain for a while now. It's mostly a collection of poorly cohered thoughts, but they're rattling around in my head at night and preventing me from sleeping well, so I need to get them out of my brain and into a semi-permanent medium. I'd also like to get feedback on the concept... I think it could be interesting.

--

A/B Testing

The starting idea is the concept of A/B testing webpages. That is, testing variations of a webpage, randomized to eliminate interference from confounding variables, to see which page is more effective at getting a user to "convert" (i.e., do what we want).

Prediction Algorithms

The next concept is the prediction algorithm -- using (usually past) data to predict what a user is going to do. Or, in this case, which page (A or B) will drive the user to a conversion.

The Netflix Prize

Now, the best prediction algorithms in the world right now are probably those designed to win the Netflix Prize, so I'll discuss those next. The BellKor prize solution isn't quite as broadly applicable to other problems (it mostly describes the weights of the winning algorithm), so it's slightly less interesting, but it does convey one tidbit we should keep in mind: taking a wide selection of recommendation algorithms (the BellKor paper uses 107 elements) and combining their results in some weighted fashion is probably better than going "deep" into one algorithm.

Parallels

So, to make sure we're clear: if you were to compare A/B testing to the Netflix recommender, movies would be the equivalent of page variants. We'd want to predict which page variant a user would "prefer," based on some characteristics of the user and the pages. Got it? Good. Now it gets more interesting.

Neighborhood-based Collaborative Filtering

The Algorithms Analyzed explanation reduced the prediction problem down to essentially two formulas (roughly, the standard neighborhood forms). In these formulas, the variables u, v represent two users and the variables i, j, k represent the movies. The predicted rating of movie i by user u is either

    r(u,i) = sum_j [ s(i,j) * r(u,j) ] / sum_j s(i,j)    (over movies j similar to i)

or

    r(u,i) = sum_v [ s(u,v) * r(v,i) ] / sum_v s(u,v)    (over users v similar to u)

So, in the first case, you'd be considering the similarity between two movies and combining that with the user's preference for one of them. In the second case, you'd be comparing two similar users and looking at how the other user rated the movie. For the parallel case of A/B-testing pages, you'd only use the second formula (which page similar users prefer), rather than trying to compare different pages within the A/B test (which would be impossible if the user's never been to your site before).
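As a sketch of the second (user-based) formula applied to page variants -- everything here (the function names, the similarity callback) is hypothetical:

```python
def predict_preference(target, others, ratings, similarity):
    """Similarity-weighted mean: predict how `target` will respond to a page
    variant from how similar users responded. `ratings[v]` is user v's
    conversion score for the variant; `similarity(u, v)` is the user-user
    similarity function discussed below."""
    num = sum(similarity(target, v) * ratings[v] for v in others)
    den = sum(abs(similarity(target, v)) for v in others)
    return num / den if den else 0.0
```

Note that the formula itself is trivial to evaluate; the hard part, as discussed below, is coming up with a good similarity function.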

Similarity Functions

So, assuming you could actually pull off the above and implement the algorithm for an A/B test, you still have to come up with some way to compute the similarity function between two users. How could you do that? A few ideas readily come to mind. IP geolocation gives you pretty close data for location; if you could narrow it to a set of zip codes, you could use demographic information such as household income, age distribution, and race distribution. You have data about the user's computer from the browser's user agent (browser, operating system, .NET installation), as well as via JavaScript (screen resolution, color depth). You can guess their ISP. You could even portscan them if you really had to.

Processing Power

And one of the best things about this concept is that the hard work isn't done on the fly. The most difficult part of the process is deriving all the weights for the equation, and that can be done from log data. How to accomplish this is left as an exercise for the reader; you could use Singular Value Decomposition (SVD) if you want high precision, or just use a neural network or genetic algorithm for a good approximation that's simpler than doing the SVD. Regardless, once you figure out the weights, the only thing that needs to be done in real time is *applying* the formula, which is just a few lines of code.

Linearity and Variable Dependence

One possible problem with the nearest-neighbor algorithm as defined above is that, for simplicity's sake, all variables are assumed to be linear functions -- an assumption that may not hold: certainly not linearity, and most likely not even that they're functions at all (i.e., do they pass the vertical line test?). The other issue is variable dependence, which I'll illustrate anecdotally from our A/B testing example: if all variables are independent, the model assumes all rich people will respond a certain way, regardless of whether they're in the east, west, or south. It precludes even the possibility that a poor New Englander will respond more like a rich Westerner, or anything like that. This model is clearly very simplified.

Feedback

Please leave any thoughts or feedback in the comments, or e-mail me at blog@ryankosai.com.

October 08, 2008

Design is hard enough without trying to reproduce a commonly used visual element. Here are some stencils that cover the basics.

Yahoo's GUI Stencils: This is a fairly complete package of common web elements, as well as basic viewports for handheld devices. It also has some nice "Windows" elements and the sizes of banner ads. http://developer.yahoo.com/ypatterns/wireframes/

iPhone GUI Development: Yahoo's stencil set has some elements for the iPhone, but this package gives you some additional iPhone options that aren't in the Yahoo package. http://www.teehanlax.com/blog/?p=447

Web Browser Elements: This package contains cursors and common web elements for a variety of browsers on different operating systems. http://radassembly.com/blog/?p=23

August 01, 2008

I think that great startups make you go, "Wow, that's much better than the old way." Having spent a decent chunk of my undergraduate education (electrical engineering at the UW) looking up datasheets for electronic components, I have to say that this is 100 times better than Googling.

http://octopart.com

I don't think most people will get it. After all, you need to know you're looking for an LM348N chip. But if you're immersed in that world, then it's a perfect startup. They should be wildly successful if they can figure out how to market their product.

July 26, 2008

S3 is Great

Amazon S3 is a pretty cool way to store data online, for just 15 cents per GB per month. The bandwidth is pretty expensive at 10c per GB up and 17c per GB down, but it's still viable depending on what you're planning to do with it.

They also have a really slick API for getting files to and from the server. One feature in particular allows clients to upload files directly to your storage space from their machines via HTTP POST (so via the browser, Flash, etc.). This is great, since your server doesn't have to proxy the files to S3. But wait! If you have to authenticate to the service with a secret key on the client side, then you're basically giving away your key. No good.

But this is not the case, and their solution is really well implemented. You sign a policy that specifies what can be uploaded -- maximum file size, key name, etc. The policy is in a standard format; you base64-encode it and send it along with an HMAC-SHA1 signature of the encoded policy, computed with your secret key and itself base64-encoded.
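For comparison, here's what that signing scheme looks like in Python (the policy values are hypothetical; note that Python's base64 module emits the trailing '=' padding that a 20-byte HMAC-SHA1 digest requires):

```python
import base64
import hashlib
import hmac
import json

def sign_policy(policy_document, secret_key):
    """Base64-encode the policy JSON, then HMAC-SHA1 the encoded policy with
    the secret key and base64-encode the resulting digest."""
    policy = base64.b64encode(json.dumps(policy_document).encode("utf-8"))
    signature = base64.b64encode(
        hmac.new(secret_key.encode("utf-8"), policy, hashlib.sha1).digest()
    )
    return policy.decode(), signature.decode()

# Hypothetical policy: bucket name and expiration are made up for illustration.
policy, sig = sign_policy(
    {"expiration": "2009-01-01T00:00:00Z",
     "conditions": [{"bucket": "examplebucket"}]},
    "SECRETKEY")
```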

Perl is Great

Athleon is written in Perl, which I've been extremely happy with on this project. My code to encode a policy, generate the HMAC-SHA1 signature, and base64-encode it uses the MIME::Base64 and Digest::HMAC_SHA1 modules.

S3 + Perl is Almost Great

Quick and easy, right? Alas, this code doesn't work. What gives? I didn't see any documentation anywhere that says this, but looking inside some modules on CPAN, everyone's adding a '=' to the end of the signature.

So I concatenated ( . '=' ) onto the end of the signature, and everything works great. Maybe it has something to do with a spec I don't know about. Or maybe it's something obvious that I missed. If anyone knows why, I'd appreciate a comment.

Still, though, Amazon did a good job with AWS, and it's been an absolute pleasure working with the S3 API so far.

July 03, 2008

Back from Silicon Valley fundraising. Blogging should resume regularly. For my first post back, here's a second quote from Winner Takes All about Harrah's, the casino run by math types.

When Loveman realized that losers are miserable, he figured he could keep them gambling longer if he could reduce their perception of losing. [...] Harrah's began tracking gamblers' losing streaks in real time while they were still sitting at the slot machine. As soon as a gambler stuck their Total Rewards frequent gambler card in the machine, the computer started comparing their actual losses and winnings against the predicted odds. Big losers were flagged in the system. A "luck ambassador" was then dispatched to perk them up with friendliness and a token gift.

Arguably, web apps can collect any real-time usage data they want. Are there any that use this info in real time to make users happy?

June 21, 2008

The title of this post bears repeating. What does your audience really want? If you're going to be best in the world, then you ought to be able to figure this out.

It's not what they tell you they want. And if it's a feature of your product, you're most likely wrong. You're supposed to sell the benefits, not the features, but I don't think the benefits are what you think they are.

As I mentioned in my last post, I think Steve Jobs is one of the most in-touch founders out there. Here's one of my all-time favorite quotes by him, in an interview with Newsweek (via SVN).

Q: Microsoft has announced its new iPod competitor, Zune. It says that this device is all about building communities. Are you worried?

A: In a word, no. I’ve seen the demonstrations on the Internet about how you can find another person using a Zune and give them a song they can play three times. It takes forever. By the time you’ve gone through all that, the girl’s got up and left! You’re much better off to take one of your earbuds out and put it in her ear. Then you’re connected with about two feet of headphone cable.

Wait, what? Girl? Hint: the chief benefit of an iPod, for the 16-25 crowd, is not how it plays music. Which is probably why they cut the FM receiver (a very cool feature, IMO) from the original design. It's a smaller, sleeker, sexier life.

June 17, 2008

I love Steve Jobs. I think he's brilliant: in one sense, he can envision things the "correct way" to satisfy customers, even though they don't yet know what they want. He's a sort of personal hero, and I could probably be classified as a groupie.

When I saw the book Inside Steve's Brain a few months back, I purchased and read it immediately. Here's something interesting on what customers are willing to pay for:

Take the iTunes online music store, which launched in 2001, at the height of the popularity of online file sharing. [...] Why would anyone spend $1 a song, when they could get the same song for free? Jobs's answer was the "customer experience." Instead of wasting time on the file-sharing networks, trying to find songs, music fans could log on to iTunes and buy songs with a single click. They're guaranteed quality and reliability [...] "We're going to offer you a better experience... and it's only gonna cost you a dollar a song."

But the target here is clearly a Bobo audience (from another book I finished recently). What's a Bobo? Bobos are the new upper class. And Bobos don't like to spend money on "conspicuous consumption" (e.g., Donald Trump), which they consider vulgar. It's showing off. However, they are willing to spend significant money to prove their refinement of the "common necessities" of life (e.g., organic African dishware) -- there, it's alright to splurge.

From the back cover of Bobos in Paradise (David Brooks):

Do you believe that spending $15,000 on a media center is vulgar, but that spending $15,000 on a slate shower stall is a sign that you are at one with the Zenlike rhythms of nature? Do you work for one of those visionary software companies where people come to work wearing hiking boots and glacier glasses, as if a wall of ice were about to come through the parking lot? If so, you might be a Bobo.

After all, it is not conspicuous to shower; and if you're going to buy a pair of boots, there's no point in buying something mediocre. And everyone listens to music -- but iTunes is elegant. ThePirateBay is crass.

I'm not sure if the majority of average America thinks like this. You might, but you're not average.

So who's your audience? I think that if your web application's audience is primarily bobo, customer experience is probably disproportionately important to your success. And if you're good at user experience, if you're a future Steve Jobs, then your sweet spot is affluent "bobo" buyers.

June 15, 2008

According to Yahoo, 80% of end-user response time is spent in the front end rather than in code generation. For Athleon, a large part of this is loading static images, which can occasionally be quite large (~100k). If you believe that your users' perception of speed is more important than actual speed, here's something that can help:

June 11, 2008

Recently finished the book "Winner Takes All" by Christina Binkley. It's mainly about Steve Wynn and the other big players in Las Vegas, but there were a few interesting chapters on Harrah's, the casino company run by quants.

Gary Loveman, the CEO of Harrah's, is a former professor at Harvard Business School. The book briefly describes the great lengths his team went to in order to quantify patron interactions with its casinos.

It is an interesting case study to read about Harrah's and apply its lessons to Athleon. From the book, Harrah's was able to determine that the gamblers who took the least time between pushes of the slot-machine button were the most likely to be convinced to gamble more:

To entice [those gamblers] to make two visits that month, Harrah's sent cash and food offers that expired in consecutive two-week periods. The gamblers responded like maze-running rats: The group's average number of trips per month rose from 1.1 to 1.4. Harrah's new direct mail programs were so successful that, in its Las Vegas casino alone, the rate at which people responded to mail offers more than doubled.

Unbeknownst to the gamblers, Harrah's statistical model set calendars and budgets that predicted when they would gamble and how much. [...] Harrah's computers spit out "behavior modification reports" so personalized that they could suggest that one gambler would respond best to a cash offer while another would be more motivated by a free hotel room. [...] A gambler who was overdue for a visit to the casino would receive an "invitation" by mail or e-mail. If they didn't respond, they got a phone call from a Harrah's telemarketer. "We get him motivated, back in an observed frequency pattern," Loveman said.

Casinos make for an interesting real-world analogue to web applications. In contrast to casinos, data collection is particularly easy for web applications, but few websites utilize that data nearly as effectively as Harrah's did. Coupons and mail offers are essentially the casino equivalent of alert e-mails sent to existing users. Sending the right message to a casino patron is a subset of Josh Kopelman's Lifecycle Messaging. Harrah's Total Rewards program is like a web analytics package, but with segmentation analysis of the highest caliber.

Sending the right message to the right group of people. More on this in a future post.