Month: December 2006

This post is a little more philosophical than most that you will see here. It provides a little background as to why edgeio is in the business of bringing together, organizing and distributing listings to the edge of the network. In short, it is because we believe that the Internet is moving away from big centralized portals, which have gathered the lion's share of Internet traffic, towards a pattern where traffic is generally much flatter. The mountains, if you will, continue to exist. But the foothills advance and take up more of the overall pie. Fred Wilson had a post earlier this week about the de-portalization of the Internet which makes essentially the same point, seen from the point of view of Yahoo.

Update: 11am Pacific, Sunday 10 December

Several commentators are seeing the word “de-portalization” (first coined by Fred Wilson) and reading “end of portals”. To be clear, and apologies if I wasn't already, de-portalization represents a change in the relative weight of portals in a traffic sense, and the emergence of what I call the “foothills” as a major source of traffic. This will affect money flows. Portals will remain large and will continue to grow, but relatively less than the traffic in the foothills. The foothills will monetize under greater control of their publishers, and the dollar value of their traffic is already large and will get much larger.

The following 3 graphics illustrate what we believe has happened already and is likely to continue.

The first picture is a rough depiction of Internet traffic before the flattening.

The second picture is a rough depiction of today – with the mountains still evident, but much less so.

The third picture is where these trends are leading: to a flatter world of more evenly distributed traffic.

Some of the consequences of this trend are profound. Here are our top 10 things to watch as de-portalization continues.

1. The revenue growth that has characterized the Internet since 1994 will continue. But more and more of the revenue will be made in the foothills, not the mountains.
2. If the major destination sites want to participate in it they will need to find a way to be involved in the traffic that inhabits the foothills.
3. Widgets are a symptom of this need to embed yourself in the distributed traffic of the foothills.
4. Portals that try to widgetize the foothills will do less well than those who truly embrace distributed content, but better than those who ignore the trends.
5. Every pair of eyeballs in the foothills will have many competing advertisers looking to connect with them. Publishers will benefit from this.
6. Because of this competition, the dollar value of the traffic in the foothills will be (already is) vastly more than a generic ad platform like Google AdSense or Yahoo's Panama can realize. TechCrunch ($180,000 last month according to the SF Chronicle) is an example of how much more money a publisher who sells advertising and listings to target advertisers can make than when in the hands of an advertiser-focused middleman like Google.
7. Publisher-driven revenue models will increasingly replace middlemen. There will be no successful advertiser-driven models in the foothills, only publisher-centric models. Successful platform vendors will put the publisher at the center of the world in a sellers' market for eyeballs. There will be more publishers able to make $180,000 a month.
8. Portals will need to evolve into platform companies in order to participate in a huge growth of Internet revenues. Service to publishers will be a huge part of this. Otherwise they will end up like Infospace, or maybe Infoseek. Relics of the past.
9. Search, however, will become more important as content becomes more distributed. Yet it will command a smaller and smaller proportion of the growing Internet traffic.
10. Smart companies will (a) help content find traffic by enabling its distribution. (b) help users find content that is widely dispersed by providing great search. (c) help the publishers in the rising foothills maximize the value of their publications.

edgeio is hoping to play a role in these trends. We will talk about some new products later in the month that follow from this approach.

edgeio has had ambitions in China's market since its launch. When I talked to Keith this May, I found that Keith is familiar with China. And most important of all, he is very interested in China's internet market (Keith went to Beijing during his RealNames days). After I visited edgeio's Menlo Park office twice on behalf of one of the craigslist wannabes in China – http://www.edeng.cn (a startup I run with my ex-Siebel coworker, Yan Ma) – I started getting interested in edgeio's distributed world of classifieds. Soon edgeio and edeng became partners, and edeng became the first listing provider from the Chinese classifieds market to edgeio.

When Matt mentioned the Chinese version of edgeio, I thought it would be a nightmare for most Chinese users – at least the website name: edge-io. Actually, I was surprised that Matt pronounced the word “ed-geio” rather than “edge-I-O” when we first met. “Use a different domain name, for god's sake,” I said. Well, this is a blog – I actually did not say “for god's sake”. Yes, it is TRUE that the name matters. For example, my observation is that Google will never be able to compete with Baidu in China's search market simply because of the domain name. More than 80% of the internet users in China will not be able to spell g-o-o-g-l-e or g-o-o-g-o-l no matter how sky-high the GOOG stock is. Since edgeio is essentially a syndication of classified listings from all over the web, it is becoming a giant real-time catalog for businesses of all sizes. Therefore, we later picked a Chinese word, 目录 (mulu), which means “catalog”. In Chinese, 100 means 100% complete. So mulu100 became our favorite, which means this is a place for all the catalogs.

Nowadays China is the second-largest source of traffic to edgeio/mulu100. Given that online classified ads are growing dramatically in China, I expect that the traffic of mulu100 will soon take the lead.

Hot on the heels of our acquisition of Adaptive Real Estate Services, we can today announce the initial rollout of edgeio's relevance-based search engine. This is the first step in our efforts to make edgeio.com the best place to find “stuff” anywhere in the world.

By way of background, edgeio launched in March with zero listings. We took in about 100 new listings per day at that time. Today we take in about 700,000 new listings per day. The search engine we began with (free text matching and then results in reverse chronological order) simply was not good enough to function with this number of listings.

We now have a dedicated search team and this is their first push. It is not yet perfect but it is a vast improvement on what was there before.

In this upgrade we are acknowledging the way partners and users are using edgeio and trying to improve their experience. Many listings-based sites are uploading their listings to us, and we are providing search traffic back to them. We are being used as a listings search service by companies with listings and by users looking for listings. A “search engine for stuff”, if you will.

These are all global searches (edgeio has data from about 15,000 cities worldwide). You can use the geography widget (top right of the results screen) to choose a city. Once you have done that, the slider control can be used to fine-tune the results (zip, city, state, country, continent, world). Of course, you can also sort by price or by date listed.
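The slider behaves like a hierarchical geography filter: moving it from zip toward world simply coarsens the field being compared. A minimal Python sketch of that idea (the field names and the match rule here are my own illustration, not edgeio's actual implementation):

```python
# Slider levels, from most specific to least.
GEO_LEVELS = ["zip", "city", "state", "country", "continent", "world"]

def geo_match(listing, query, level):
    """A listing matches when it agrees with the chosen city's
    location at the slider's granularity; "world" matches anything."""
    if level == "world":
        return True
    return listing.get(level) == query.get(level)

sf = {"zip": "94110", "city": "San Francisco", "state": "CA",
      "country": "US", "continent": "NA"}
oakland = {"zip": "94601", "city": "Oakland", "state": "CA",
           "country": "US", "continent": "NA"}

geo_match(oakland, sf, "city")   # False: different cities
geo_match(oakland, sf, "state")  # True: both in CA
```

Because each level strictly contains the one before it, widening the slider can only grow the result set.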

Arun Jagota, Josh Myer, and Dale Johnson are the team – mostly quite new at edgeio – who are working on search; they have moved us from a reverse chronological display of results to a relevance-ranked display. Of course they have had a lot of help from others, most notably our technical advisors. And they have a lot of work still to do to make the results the best there are.

Going forward, as edgeio strives to bring together, organize and distribute the world’s marketplaces, edgeio.com will be the place that our organizing efforts are most obvious. It will be the place to find “stuff”.

From here on relevance will be our default sorting method. Of course we will enable users to modify the sort order (by time, by price, and in the future by other criteria). Our outbound APIs will eventually reflect these options also.

There is a whole lot more to come from us, and this is a baby step in many ways, but a significant directional move. Let us know what you think. In future posts we will talk about the bring-together and distribute parts of our vision – these are realized through our edgedirect product.

But for now, let's meet the team working on search:

Arun Jagota

I am a search engineer at edgeio, working on the design and evaluation of algorithms for improving relevance in particular and search in general.

One of the key challenges is the relevance problem itself. A tough nut to crack. The challenge is to find methods that are both simple and efficient, yet effective in returning relevant results. Another challenge (specific to edgeio) is to fetch relevant results from a variety of sources in real-time, recompute their relevance internally in real-time, and merge them into a single set of results that the user sees. A third issue, also specific to edgeio, is that our documents (unlike general web pages) are listings in verticals with varying degrees of structure. So there are special issues involving relevance and search for finding “stuff” rather than web pages.
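The merge step described above – combining already-ranked result sets from several sources into a single list the user sees – can be sketched in a few lines. This is a toy Python illustration under the assumption that each source returns (score, listing_id) pairs sorted by descending relevance; edgeio's real pipeline, which also recomputes relevance internally, is surely more involved:

```python
import heapq

def merge_ranked(result_sets, k=10):
    """Merge several already-ranked result lists (each a list of
    (score, listing_id) tuples sorted by descending score) into a
    single top-k list, still sorted by descending score."""
    # heapq.merge expects ascending order, so negate the scores.
    streams = [((-score, lid) for score, lid in rs) for rs in result_sets]
    merged = heapq.merge(*streams)
    return [(-neg, lid) for neg, lid in list(merged)[:k]]

internal = [(0.9, "a1"), (0.4, "a2")]   # hypothetical internal index results
partner  = [(0.7, "b1"), (0.2, "b2")]   # hypothetical real-time partner results
print(merge_ranked([internal, partner], k=3))
# -> [(0.9, 'a1'), (0.7, 'b1'), (0.4, 'a2')]
```

The key property is that the merge never needs to re-sort the full union; it only interleaves streams that are each already ordered.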

What keeps me motivated is that “relevance and search” supply me with a constant source of challenging (but not impossible) problems to solve, and algorithms from computer science, statistics, and information retrieval present me with solution methods to consider and evaluate. Another thing that keeps me going is constant incremental progress and quick feedback. You have an idea, try it out, sometimes it improves relevance, and you notice it quickly.

Before working at edgeio, I worked at another start-up (Xoom corporation) as a data analyst and machine learner. There I designed improved algorithms for predictive modeling in an e-commerce setting and also some for improved fuzzy matching of names and addresses of people. Prior to that I taught graduate courses as an adjunct faculty member in computer engineering at Santa Clara University, including one on “Information Retrieval And Search Algorithms”.

Josh Myer

Hi, I’m Josh. I’m the Young Guy at the office, but I make up for it with an intense background. Before going to college, I spent a few years working as a reverse engineer and general puzzle-solver, in fields ranging from accounting to instant messaging. I just wrapped up two degrees from the University of North Carolina (Chapel Hill), one in Linguistics and one in Mathematics. I focused on the typically-impractical formal aspects of both, but it’s actually come in handy when working on search problems.

I spend a lot of time in the plumbing of edgeio, but have been working more on search lately. The user-visible bits that I’ve done so far are the real-time search results from external providers. I’m currently working on several things to make search better, faster, and more user-friendly.

Working here has been great: there’s always a new problem to solve and the freedom to solve it the way you want to. All told, I get to use my entire background at work, ranging from unix arcana to the acquisition of language in children. It’s all the fun parts of college (laid-back, lots of new knowledge) with the fun parts of a job (making useful things, getting paid).

Dale Johnson

I have done 15 years of database work on relational databases, data warehousing, and search. I have done work on PostgreSQL internals, and have studied MySQL internals. Most recently I worked at Tellme, where I designed and developed a 1.5TB data repository to drive data warehouse reporting functions for the call details of over 1 billion Tellme-answered phone calls. This involved a redundant and reliable cluster of over 50 MySQL servers using inexpensive off-the-shelf hardware, with a combination of MySQL and record-oriented raw data files.

I am currently coding extensions in C++ to our search engine, doing things like parsing Chinese sentences into searchable blocks to support http://mulu100.com. I have also recently implemented some statistical approaches to our full-text search, gathering a corpus profile and applying it in real time to search terms to improve the selectivity of results.
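The corpus-profile idea resembles classical inverse-document-frequency weighting: terms that appear in most listings discriminate poorly, so they get low weight, while rare terms get high weight. A minimal Python sketch (the real extensions are in C++, and the exact statistics used are not specified in the text, so this is only an illustration of the general technique):

```python
import math

def corpus_weights(doc_term_sets):
    """Build a term -> weight map from a corpus profile. Terms that
    appear in many listings get low weight; rare terms get high weight."""
    n = len(doc_term_sets)
    df = {}  # document frequency per term
    for terms in doc_term_sets:
        for t in terms:
            df[t] = df.get(t, 0) + 1
    return {t: math.log(n / c) for t, c in df.items()}

# Four toy listings, each reduced to its set of terms.
docs = [{"vintage", "bike"}, {"bike", "red"},
        {"vintage", "lamp"}, {"bike", "frame"}]
w = corpus_weights(docs)
# "bike" appears in 3 of 4 listings, "lamp" in only 1,
# so "lamp" is the more selective search term.
```

Applying these weights to an incoming query lets the engine rank matches on rare, informative terms above matches on ubiquitous ones.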

The key challenge, I think, is being flexible enough to implement a solution as we discover the most natural way for a user to navigate through millions of items: providing a back-end that can support the dynamic, state-of-the-art interactive user experience people now expect, and delivering those results in real time. Many requests need to distinguish between tens of thousands of documents that contain one or more of the search terms, and determine the top 10 / 100 / 1000 of those items in under a quarter of a second. Under these operational constraints the traditional relational database approach completely falls down; quite the fun engineering challenge.
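The "top 10 / 100 / 1000 in under a quarter of a second" requirement is a classic top-k selection problem: rather than sorting every matching document, a bounded heap keeps only the k best seen so far. A hypothetical Python sketch of the idea (the production extensions described above are in C++; Python here is only for illustration):

```python
import heapq
import random

def top_k(scored_candidates, k):
    """Return the k highest-scoring items in descending order without
    sorting the whole candidate set: heapq.nlargest maintains a bounded
    heap, giving O(n log k) work instead of O(n log n)."""
    return heapq.nlargest(k, scored_candidates)

# Tens of thousands of candidate listings that matched some search term...
candidates = [(random.random(), f"listing-{i}") for i in range(50_000)]
best = top_k(candidates, 10)  # ...reduced to the 10 most relevant
```

For k of 10 to 1000 against tens of thousands of candidates, the log-factor savings over a full sort is exactly the kind of constant that decides whether a request fits in the quarter-second budget.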

What keeps me motivated is the knowledge that the web is still 95% noise and 5% signal. Search is the thing that has the potential to cut through the noise, so we're really fighting the good fight: taking listings from potentially obscure but highly useful sites and making them available to the people they will really matter to, and doing it in a fair and egalitarian way.