Unofficial news and tips about Google

March 31, 2006

Yahoo is tired of buying Web 2.0 companies one at a time. They just want the whole package.

"We're in the midst of buying Dogg (a Web 2.0 cross between Digg and Dogster "Where Every Dog Has A Webpage"), and you know what? It's a lot of work. Buying up Web 2.0 companies here and there in piecemeal fashion gets old after a while.

It's a lot of work on our Corporate Development team, the Public Relations team, and really isn't that efficient.

So after some long discussions with Tim O'Reilly, Michael Arrington, and other Web 2.0 experts, we've decided to just buy Web 2.0.

All of it. All the people, the round cornered boxes, crazy business ideas, and pastel colors." (Yahoo Search Blog)

So if you want to be a Yahoo company, just create a Web 2.0 startup and you'll automatically be a part of Yahoo's big family.

"If the name of your app isn't short and catchy, it won't take off. Try recording yourself reading paragraphs from Geoffrey Chaucer's The Canterbury Tales, then play the audio backwards, slowly, for inspiration.

Make access to the app invite-only, but don't actually invite anyone. Nothing creates more desire for a product than its exclusivity. If no-one has it, everyone will want it. Simple!

Ask an A-list blogger to review your app (linking to your own "review", so that you drive traffic to both your web app and the site that reviews it). While it's unlikely that they'll do so without actually logging in and trying out your app for real, they might change their tune if you offer them some kind of incentive. Send them some Photoshopped screenshots showing tag clouds and images of your app in use, and blame your data centre for the server being down -- the perfect excuse for why they can't login just now. Then promise them 20% of your profit when you sell the app to Yahoo! Negotiate as required -- everyone has their price."

Philipp Lenssen has information that Google Rooms is Google's April Fool's prank for this year. In fact, Philipp Lenssen is so sure that Google Rooms is this year's hoax that he photoshopped and posted some screenshots. Google Rooms shows a map of Larry Page's room with pictures of Playboy magazine, Marissa Mayer as Larry's ex-girlfriend and Star Wars stuff.

"We have seen a few cases where users report that their accounts have been deleted, and in each case our investigations have revealed that the accounts were deleted by someone with that account's password. In these cases, we're unfortunately unable to restore accounts," a Google spokeswoman told ZDNet UK.

Many users think that even if someone knows their Gmail password, that person shouldn't be able to delete the Gmail account. But that's absurd: if you log in to Gmail, you can select mails (in batches of up to 100) and delete them. That's almost the same thing as deleting the account.

Until identification systems evolve from username/password to biometric authentication, there will always be problems like these. Biometric solutions use unique biological or behavioral characteristics (like fingerprint matching) to verify identity. A growing number of notebook PCs and computer peripherals are coming to market with built-in fingerprint readers, including keyboards, mice, external hard drives, USB flash drives and readers built into PC card and USB plug-in devices. For example, the Wireless Intellimouse Explorer from Microsoft has a fingerprint reader that allows you to log on to your PC and your favorite Web sites with your fingerprint.

If you go to Google, enter [Google] and press "I'm Feeling Lucky" nothing happens. In fact, Google sends you to the first result for [Google], which is, of course, Google.com. Conan O'Brien has a different opinion.

Google launched a feature for businesses that want to target customers based on location. Marketers will be able to place photos and logos inside balloons that pop up on Google maps exactly where the merchants are located.

If you search for "New York books" and "Ralph Lauren New York" on Google Local, you will find some small icons on the map that represent a coffee cup, a shopping bag, a grocery cart, a flower or something related to the business. If you click on the link, you'll see more information about the place, the address, driving directions, the site.

* start with a written plan of action to avoid getting distracted
* keep your plan simple and straightforward
* start with the one thing you must get done today to feel productive
* should be a manageable item you can complete in 10-15 minutes
* your tasks should match your values or purpose
* bring each task into congruence with your basic mission
* if you can't, take it off of your list
* don't put any "to-do" on your list that takes more than 30 minutes
* if it takes longer, it's actually a series of smaller "to-do's"

* don't try to do everything perfectly
* perfectionism often causes procrastination
* any small step toward completion is an accomplishment
* do the worst job (or part of the job) first and get it out of the way
* set a time limit -- "I'll file papers for 5 minutes"
* alternate unpleasant jobs with tasks you enjoy
* delegate out items you can't make yourself do

* interruptions tend to occur in identifiable patterns
* notice when interruptions occur, by whom, and why
* take steps to prevent those interruptions before they occur
* if they can't be prevented, learn how to delegate to someone else
* if they can't be delegated, learn how to delay until you are finished

* make the project and environment as pleasant as possible
* give yourself the best tools and work space for the project
* take a few minutes to organize your work space
* schedule a regular time to check in with a friend or colleague
* rewarding your accomplishments encourages productivity

March 30, 2006

Google's PageRank was the key to Google's success because it gave a plausible answer to a simple question: "How valuable is a site?". Let's see who links to it and how important these sites are. But as time went by, Google realized that many links from less important sites could outweigh a few links from major sites. And that's especially true for blogs. So Google set out to fight spam in its search results.

Jagger's [Google index update from October 2005] main change is the switch from the elegant but overly trusting PageRank system to the more realistically cynical TrustRank system, which is designed to only count votes from sites it trusts.

TrustRank imitates human behavior - if a stranger on a train recommends a movie, I'm going to value it a lot less than a recommendation from a close friend or movie critic, both of whom have earned my trust by either how long I've known them or by their reputation. Trust comes from two sources - site age and links from trusted sources. From my movie recommendation analogy above, site age is the close friend who has gained trust through the age of the relationship, whereas trusted sources are sites that have been granted a position of authority by links from a small seed group of trusted sites.

Another way to look at this is from the point of view of a content publisher with a new site. At first, your links will be untrusted and will not contribute to the Page Rank of the page they link to. The site has to undergo an aging delay before it is considered authoritative, which has led to discussion of the "Sandbox" (or the "Trustbox"). The idea is that new sites are sandboxed so they can't mess up the rankings until they've proven themselves, at which time they can participate in Page Rank voting.

There are two ways to gain trust and escape the Trustbox:
* Acquire links from highly trusted sources (the "movie critic recommendation")
* Acquire links from somewhat trusted sources and let them age (the "friend recommendation")

Google Sandbox is a filter whose criterion is the age of a site. After, say, 4-6 months, or when the site acquires highly trusted links, the site is given credit for the backlinks it has established: its PageRank increases and it becomes more visible in the search results.

"You will identify key market trends that are shaping user behavior when watching Television. These include but are not limited to the intersection of internet and Television technologies, video-on-demand, personal video recorders and emergence of next generation set-top-boxes with IP connectivity. You will then identify areas where use of Google’s search and advertising technology can enhance this user experience and define appropriate products to deliver these user benefits." (Google Jobs)

Google aims to extend its AdSense advertising program offline, to print and television. The question is how Google will deliver contextual ads: will they match the show's content or the viewer's profile?

Bart's PE Builder helps you build a bootable Windows CD or DVD from the original Windows XP or Windows Server 2003 setup CD.

You will have a complete Win32 environment with network support, a graphical user interface (800x600) and FAT/NTFS/CDFS filesystem support. It's useful for testing systems with no OS, data recovery or virus scan.

What's great about Bart's PE Builder is that it has many plugins that extend the operating system and allow you to do many tasks:

* Access USB drives.
* Load the CD with SSH, Remote Desktop Client and VNC so you can use the boot CD as a workstation.
* Recover deleted files.
* Defragment the hard drive (much faster than defragmenting from a boot HD).
* Use Internet Explorer and Firefox from the boot CD to surf the web.

Google may periodically sell up to 5.3 million shares of stock according to a regulatory filing on Wednesday. At its current share price of $394.98 a share, a sale of 5.3 million shares would raise nearly $2.1 billion.

Facebook, the Web site where students around the world socialize and swap information, has put itself on the block, BusinessWeek Online has learned. The owners of the privately held company have turned down a $750 million offer and hope to fetch as much as $2 billion in a sale, senior industry executives familiar with the matter say.

Well, can you see a connection between the two news items? BusinessWeek says there may be one. After all, Facebook might integrate with Google Scholar, Google Books, Google Groups. But $2.1 billion is a lot of money. Just consider that Rupert Murdoch bought Myspace.com for $580 million.

March 29, 2006

Remember I told you about Joga, a community site created by Google and Nike for football fans (I mean soccer fans)? Well, now you don't need an invitation to become a member of Joga. Just go to the site and create an account using your Google login.

The site will ask you a lot of questions about football: what teams you like, favorite players, most embarrassing football moment, favorite other sports. The good thing is that most of them are optional.

But what can you do on Joga? You can create a blog, upload videos, create albums, add bookmarks, add friends based on simple criteria like location and age, and join communities. You can also create your own teams, find local fields, and play other Joga teams in your community.

"Whether we've succeeded, of course, will be up to all of you to determine. We look forward to seeing football-crazy people from around the world playing as beautifully as possible at Joga.com." (Google Blog)

The new Yahoo Mail, currently in beta, will be available to all users very soon. One piece of evidence is that Yahoo Mail has started to display obtrusive ads: along with an animated Flash ad in the right sidebar, you can now see huge ads (like the one for Vonage) that fill an entire page. Ironically, this ad is included in the Spam section.

There's another usability problem with ads in Yahoo Mail Beta. There are two groups of ads on most pages. One sits in the right sidebar (animated Flash), next to the scrollbars that let you select a mail or read it. The other sits at the bottom, right under the navigation, and includes the usual small "credit card" and "free loan consult" ads. The problem is that it's very easy to click on the ads by mistake. And if you click on an ad, it doesn't open a new page or tab; the page just replaces the ad in the iframe, so you'll get this funny picture.

Because Yahoo chose to stick with the Flash ads, if you move from one page to another (from the inbox to the RSS reader, for example), you'll notice a big delay in page loading, especially if you have a slow connection.

And another thing: Yahoo didn't drop the welcome page that tells you how many emails you have, and whose only purpose is to make people click on the ad if they don't have any new messages.

Torrentspy, a BitTorrent search engine that was sued by the Motion Picture Association of America (MPAA), said that the movie industry might just as well have dragged Google into court. They say Google does the same thing as Torrentspy, sometimes much better: it searches for torrents without hosting them.

The judges probably won't be too impressed to hear that searching for [photoshop bittorrent] on Google yields as a top result a site that links to Adobe Photoshop torrents. If you search for something unlawful, Google is not responsible for the sites you find.

The MPAA filed seven lawsuits against Torrentspy and other search companies that help visitors find torrents or instruct them how to download them.

"You can easily add for example many HTML, CSS, Java, and JavaScript features by simply creating your web page for example with an HTML editor (e.g. Mozilla Composer), a WYSIWYG editor (e.g. OpenOffice.org) or with a text editor (e.g. SuperEdi, Notepad, or emacs) and then you just upload the file to GPC. You can also use Google videos, Google maps and stat counters by uploading your pages, which isn't possible with the Google Page Creator's Page/HTML editor, because it removes objects that use features such as JavaScript, which usually are in elements like script or embed in the HTML source."

So if you want Flash, Google AdSense, Google Maps, Google Video, Java, JavaScript, music, stat counters, or videos, create your pages offline and upload them to Google's servers. And if you think the whole point of Google Page Creator was to create pages online, you're wrong. What other web host gives you a 100 MB quota, a 10 MB maximum file size, unlimited bandwidth (well, almost) and no restrictions on file types (mp3, avi, exe, zip), all for free and without putting their ads in your pages?

What hoax will come out of the Googleplex this year for April Fool's Day? They have made a tradition of making fun of their products, their corporate culture and their geekiness.

Let's take a guess:

1) Google will take over the world. Google will produce a device that will scan your important information (documents, bills, pictures, favorite colors, what you say, SIM content, hard-drive content), organize it, upload it to your Google account and make it searchable. EGoogle device.

2) Google will drop its simple and clean homepage in favour of the Personalized Home, that will also contain ads. Furthermore, you won't be able to search Google unless you have a Google account and you are logged in.

3) All mails from Gmail will be deleted. The Gmail team realized it's so inefficient to type so many emails every day, so they will introduce AudioGmail, where you can record, search and organize voicemails. The new AudioGmail has another advantage: spam will be easier to recognize.

On a more serious note, everybody is waiting for Google Calendar, and there are some people out there who still hope that Gmail will go public.

Googlebot follows every link in the pages it indexes. But what if those links have disastrous effects? Here's a lesson about Googlebot's power to destroy a badly conceived site.

"Josh Breckman worked for a company that landed a contract to develop a content management system for a fairly large government website. Much of the project involved developing a content management system so that employees would be able to build and maintain the ever-changing content for their site.

Things went pretty well for a few days after going live. But, on day six, things went not-so-well: all of the content on the website had completely vanished and all pages led to the default "please enter content" page. Whoops.

Josh was called in to investigate and noticed that one particularly troublesome external IP had gone in and deleted *all* of the content on the system. The IP didn't belong to some overseas hacker bent on destroying helpful government information. It resolved to googlebot.com, Google's very own web crawling spider. Whoops.

As it turns out, Google's spider doesn't use cookies, which means that it can easily bypass a check for the "isLoggedOn" cookie to be "false". It also doesn't pay attention to Javascript, which would normally prompt and redirect users who are not logged on. It does, however, follow every hyperlink on every page it finds, including those with "Delete Page" in the title. Whoops."

So next time, don't assume every visitor has JavaScript enabled, and validate actions on both the client side and the server side. If you want to validate your data for accuracy and security, you must use server-side code to check your form inputs.

And something else: according to the HTTP 1.1 specification, GET is defined as a safe method, which "SHOULD NOT have the significance of taking an action other than retrieval." If you want to change state (delete content, replace data), you should use POST.
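Here is a minimal sketch of that advice in Python. The `pages` store, the `handle` function and its `logged_in` flag are hypothetical names made up for illustration; they are not taken from the system in the story. The point is only that the delete action is refused for GET and that the login check happens on the server.

```python
# Hypothetical in-memory "CMS": page name -> content (illustration only).
pages = {"home": "Welcome", "about": "About us"}

SAFE_METHODS = {"GET", "HEAD"}


def handle(method, action, page, logged_in=False):
    """Return (status_code, body) for a toy request."""
    if action == "view":
        # Retrieval only: safe methods are fine here.
        if method not in SAFE_METHODS:
            return 405, "Method Not Allowed"
        if page not in pages:
            return 404, "Not Found"
        return 200, pages[page]

    if action == "delete":
        # State change: never reachable through a plain link (GET),
        # so a crawler that follows hyperlinks cannot trigger it.
        if method != "POST":
            return 405, "Method Not Allowed"
        # The check runs on the server, not in JavaScript or a cookie test
        # that a client can skip.
        if not logged_in:
            return 403, "Forbidden"
        if pages.pop(page, None) is None:
            return 404, "Not Found"
        return 200, "Deleted"

    return 400, "Bad Request"
```

A crawler-style request such as `handle("GET", "delete", "home")` is rejected and leaves the page in place, while an authenticated POST removes it.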

March 28, 2006

If you have forgotten your Windows password and you don't want to reset it, you have a new option: Ophcrack LiveCD, a Windows password cracker that claims to be able to crack 99.9% of alphanumeric passwords in seconds. Ophcrack is based on a time-memory trade-off using rainbow tables, precalculated data stored in memory. A rainbow table is a lookup table that is constructed by placing a plaintext password entry in a chain of keys and ciphertexts, generated by a one-way hash (a function that is easy to calculate but hard to invert). The end result is a table that has a statistically high chance of revealing a password within a short period of time, generally less than a minute.
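To make the chain idea concrete, here is a toy sketch in Python. It is not Ophcrack's code: MD5 stands in for the Windows hash, the password space is just three digits, and the names `h`, `reduce_`, `build_table` and `lookup` are made up for this example. Each chain alternates hashing with a "reduction" that maps a hash back to a candidate password, and only the two ends of each chain are stored.

```python
import hashlib

PW_LEN = 3          # toy password space: "000" .. "999"
SPACE = 10 ** PW_LEN
CHAIN_LEN = 20      # hash/reduce steps per chain


def h(pw):
    """One-way hash (MD5 is only a stand-in for this sketch)."""
    return hashlib.md5(pw.encode()).hexdigest()


def reduce_(digest, step):
    """Map a digest back into the password space; varies with the step."""
    return str((int(digest, 16) + step) % SPACE).zfill(PW_LEN)


def build_table(starts):
    """Store only endpoint -> start for each chain (the memory saving)."""
    table = {}
    for start in starts:
        pw = start
        for step in range(CHAIN_LEN):
            pw = reduce_(h(pw), step)
        table.setdefault(pw, start)
    return table


def lookup(table, target):
    """Try to find a password whose hash is `target`."""
    for pos in range(CHAIN_LEN - 1, -1, -1):
        # Assume the target hash sits at column `pos`; walk to the chain end.
        pw = reduce_(target, pos)
        for step in range(pos + 1, CHAIN_LEN):
            pw = reduce_(h(pw), step)
        if pw in table:
            # Rebuild the chain from its start and look for the plaintext.
            cand = table[pw]
            for step in range(CHAIN_LEN):
                if h(cand) == target:
                    return cand
                cand = reduce_(h(cand), step)
    return None  # not covered by the table (or only false alarms)
```

The trade-off is visible in the sizes: the table keeps one entry per chain, yet each entry can cover up to `CHAIN_LEN` passwords at the cost of recomputing the chain during lookup.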

"Do you know that there is so much internal competition in Google? For instance, there are some situations when a project is set to start its life cycle, that there is, on a parallel other teams that work on the same project (and if I'm right, sometimes the identity of the opposing team members or even this aspect of parallel teams existing for a project is not revealed until a later time in the SDLC). The final product that gets released belongs to the team that comes with the best proof of concept AND / OR the best design AND / OR the best pilot AND / OR the best final product AND / OR the best something else based on various parameters."

It seems that Google uses both competition and collaboration to deliver great products, but it's weird to work at Google as an undercover programmer.

Let's say you change the password very often on your Windows XP admin account from your home computer and you suddenly realize you don't remember the password. What do you do? No, you don't reinstall Windows XP and you don't format your hard-disk. You download Offline NT password & Registry editor, create a bootable floppy or CD, boot from the floppy / CD and start the utility. You can reset the password of any user that has a local account on your system.

This tip works for NT-based operating systems: Windows NT4, 2000, XP and 2003.

Google's share of the US search market increased to 42.3 percent from 36.3 percent a year earlier, according to a study by ComScore. Yahoo's share dropped to 27.6%, MSN's share dropped to 13.5%, while AOL's share increased slightly to 8.0% from 7.9% in January 2006. Ask's market share rose to 6.0% from 5.6% in January 2006.

FlashGet is a feature-rich download manager with a nice interface and powerful features. Why would you need a download manager when you can download files with your browser? If you use Internet Explorer and your connection drops, you lose your file. If you want to download big files, FlashGet is a lot faster and can shut down your PC when it finishes downloading. Another advantage of a download manager is that it forces you to organize your downloads.

FlashGet features:

* clipboard monitoring, browser monitoring (Explorer, Firefox, Opera) - you just click or copy the download link you want and FlashGet will get it for you

* split files into sections or splits, and download each split simultaneously

* supports RTSP and MMS protocols

* it displays only one ad, at the top of the window (similar to the ad-supported Opera)

* a very nice Site Explorer utility that lets you explore HTTP and FTP sites. You can crawl the Google site, starting from the index and following all the links recursively.

* you can create rules for download management: for example, all mp3 files should go into a specific folder

Last night, Google deleted its blog by mistake. That allowed a so-called hacker, Trey Philips, to go to Blogspot and create a blog at googleblog.blogspot.com, the Google Blog's address. You can see how the Google Blog looked for a couple of hours.

"We've determined the cause of tonight's outage. The blog was mistakenly deleted by us (d'oh!) which allowed the blog address to be temporarily claimed by another user. This was not a hack, and nobody guessed our password. Our bad." says Jason Goldman, Blogger Product Manager.

March 27, 2006

TestDisk can help you recover lost partitions and make non-booting disks bootable again. It works with the following partitions: FAT12, FAT16, FAT32, Linux (EXT2/EXT3/JFS/RFS/XFS), Linux swap (version 1 and 2), NTFS (Windows), BeFS (BeOS), UFS (BSD), and Netware.

The famous Microsoft tagline from 1996, "Where do you want to go today?", is a very big question for many big websites. What do you visit on Google, Yahoo and MSN? The most visited page for Google is, of course, the Google homepage. But things aren't the same for Yahoo. Most users visit Yahoo just to check their mail. That's why Yahoo disabled POP3 access on Yahoo Mail. MSN's situation looks even more dramatic: 3 out of 4 visitors go to Hotmail.

WebSideStory reports that Google held a 75 percent market share in the U.K. in February 2006. Google's search referral percentage in the U.K. exceeds both the U.S. average for the month (55.39 percent) and the global average (62.4 percent).

March 26, 2006

Gmail Skins is a Firefox extension that lets you customize the Gmail interface.

Features:

* Change the colour scheme of your inbox
* Insert smileys/emoticons and images into your emails
* Make the navigation (Inbox, Starred, Sent Mail) horizontal
* Zebra stripes on the inbox (alternating colors for rows)
* Change the attachment paperclip to an icon indicating the file type of attachment
* Hide invite panel, page footer
* Hide your email address at the top - for privacy issues

The skins don't look very pretty, but they break up the monotonous Gmail look a little.

Approximately 30% of the pages on the web are (near) duplicates. Google has a patent for some improved duplicate and near duplicate detection techniques.

"From the perspective of users, duplicate and near-duplicate documents raise problems. More specifically, when users submit a query to a search engine, most do not want links to (and descriptions of) Web pages which have largely redundant information. For example, search engines typically respond to search queries by providing groups of ten results. If pages with duplicate content were returned, many of the results in one group may include the same content. Thus, there is a need for a technique to avoid providing search results associated with (e.g., having links to) Web pages having duplicate content."

One idea might be indexing the keywords in the documents and comparing the percentage of terms shared by the two documents, but that is highly inefficient.

Or you can try to compute the edit distance (Damerau-Levenshtein distance) between the two documents. The edit distance between two input strings is the minimum cost of a sequence of edit operations (substitution of one symbol for another, insertion of an extraneous symbol, deletion of a symbol, and transposition of two adjacent symbols) needed to change one input string into the other.
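As a rough illustration, here is a small Python sketch of that distance. It implements the common "optimal string alignment" simplification of Damerau-Levenshtein, with every operation costing 1; the function name `osa_distance` is just a label chosen for this example.

```python
def osa_distance(a, b):
    """Edit distance with substitution, insertion, deletion and
    transposition of adjacent symbols (optimal string alignment)."""
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] = distance between the first i symbols of a and first j of b
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i          # delete all i symbols
    for j in range(cols):
        d[0][j] = j          # insert all j symbols
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # transposition of two adjacent symbols
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]
```

For example, `osa_distance("goolge", "google")` is 1, because one swap of adjacent letters fixes it. The table has one cell per pair of positions, which is why this approach gets expensive for whole documents.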

A much better method for detecting duplicate and near-duplicate documents involves generating "fingerprints" (hashes) for elements (paragraphs, sentences, words, shingles) of documents. Two documents would be considered to be near-duplicates if they share more than a predetermined number of fingerprints.

A k-shingle is a sequence of k consecutive words from a document. If S(A) is the set of shingles contained in A, we can compute the resemblance of A and B like this: |S(A) ∩ S(B)| divided by |S(A) ∪ S(B)|. The problem is that the intersection is hard to compute, so it has to be estimated.
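Here is a small Python sketch of shingling and resemblance. The exact version follows the formula above. The estimate uses min-hashing, which is one well-known way to approximate the ratio and is my choice for the example, not necessarily what Google's patent describes. The function names and the use of MD5 as the hash are assumptions made for illustration.

```python
import hashlib


def shingles(text, k=3):
    """Set of k-shingles (tuples of k consecutive words)."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def resemblance(a, b, k=3):
    """Exact |S(A) ∩ S(B)| / |S(A) ∪ S(B)|."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def minhash_signature(shingle_set, num_hashes=64):
    """For each seeded hash function, keep the smallest hash value.
    Assumes the set is non-empty (the text has at least k words)."""
    return [
        min(hashlib.md5(f"{seed}:{' '.join(s)}".encode()).hexdigest()
            for s in shingle_set)
        for seed in range(num_hashes)
    ]


def estimated_resemblance(a, b, k=3, num_hashes=64):
    """Fraction of positions where the two signatures agree."""
    sig_a = minhash_signature(shingles(a, k), num_hashes)
    sig_b = minhash_signature(shingles(b, k), num_hashes)
    return sum(x == y for x, y in zip(sig_a, sig_b)) / num_hashes
```

The appeal of the estimate is that each document is reduced to a short fixed-size signature once, so pairs can be compared without touching the full shingle sets again.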

Western corporations provide much of the equipment and services for China's Internet system. Major players include Cisco Systems, Nortel Networks, Sun Microsystems, 3COM, Google, Yahoo!, Microsoft, IBM and others.

Cisco Systems has been integral to China's Internet development. Its router equipment, which reportedly provides no anonymity or encryption and was specifically designed for China, is in the core of the nation's surveillance of the Internet.

WHERE(is censorship exercised)

Called wangba, or Net bars, cybercafés are required to keep detailed logs of customers' online activity on file for 60 days. If a user tries to access forbidden Web sites, a café must disconnect the user and file a report with state agencies. Penalties for violations include fines and even imprisonment.

People use proxy relays to get around Internet filtering and monitoring.

Tunneling allows a user in a censored location to access information through a tunnel to a computer in an unfiltered location. All requests run through an encrypted tunnel to a non-filtered computer, which forwards requests and responses transparently. Both private and commercial tunneling services are available.

Google announced a new "MentalPlex" search technology that supposedly read the user's mind to determine what the user wanted to search for, thus eliminating the step of actually typing in the search query. Of its origins, Google comments that "As with the Internet itself, MentalPlex began as a highly classified Dept. of Defense initiative under the direction of Al Gore." In the FAQ, Google co-founder Larry Page said that "typing in queries is so 1999."

In 2003, Google decided to ignore April Fool's Day. But in 2002, Google explained that "the heart of Google's search technology is PigeonRank™, a system for ranking web pages developed by Google founders Larry Page and Sergey Brin at Stanford University."

"PigeonRank's success relies primarily on the superior trainability of the domestic pigeon (Columba livia) and its unique capacity to recognize objects regardless of spatial orientation. The common gray pigeon can easily distinguish among items displaying only the minutest differences, an ability that enables it to select relevant web sites from among thousands of similar pages.

By collecting flocks of pigeons in dense clusters, Google is able to process search queries at speeds superior to traditional search engines, which typically rely on birds of prey, brooding hens or slow-moving waterfowl to do their relevance rankings."

March 25, 2006

"I have accepted a job offer and will go back to a full time job in a few months. To avoid speculation and rumours: I am going to work for Google in Zurich. Fortunately I can spend part of my time on Vim." says Bram Moolenaar, author of Vim, the most famous text editor for Unix (you can also download the Windows version).

In 1998, when Google was getting started, Scott Rosenberg spoke about Google as a better search engine:

"Google ... is important -- as a sign, amid the profusion of look-alike portals, that there's still plenty of room for improvement in the basic technologies we use on the Web every day. If the portals themselves don't generate innovation, smart people elsewhere will. Commerce is a big driving force in how the Web evolves, but creativity is another. Just as imaginative marketers will keep finding ways to sell us more stuff, inventive programmers will keep finding ways to reduce noise and confusion online and help us all find what we're looking for. ... The irony here is that the big portal sites are the ones, increasingly, making it harder to use the Web: They're under such pressure to turn a profit to justify their market valuations that their pages have become crowded, blinking arrays of commercial distractions. Meanwhile, they're failing to drive forward the technology at the root of their business.

That a couple of grad students could build a better search engine than a whole raft of media and technology companies with stock-market valuations in the billions does not speak well of how these firms are spending their budgets. ... Which is one more reason to distrust the conventional view that the portals have the future of the Web sewn up. There's something ultimately dumb about these all-things-to-all-people sites in a medium whose greatest strength is the ability to be specific things to specific people. If the portals can't even build a better search engine, I am not betting on their ability to control an industry as fast-moving, innovative and metamorphic as the Internet -- next year or any year."

Google is not a portal, it's a homepage for the web, a door for information. It has a personalized home feature, but that's not a collection of links that promote other services or articles like: "Should I Forgive Her For Cheating?" (see the screenshot from msn.com).

If Google is not already a portal, how will we know when it actually becomes one? When the screenshot that illustrates this post comes from google.com.

If you search for [python] on Ask.com and Google, you will see that Google's first 10 results are all about the programming language called Python, while Ask has a one-box result that features a picture of a python (the snake) and a definition, and also alternates results for the two meanings of the word.

Of course, most of the pages that contain the word "python" will be about the programming language (you can check this by comparing the 2,310,000 results for [python snake] with the 234,000,000 results for [python]). That's why Google considers these pages the most relevant. In fact, none of the first 100 results for [python] in Google is about the snake (a few of them are about Monty Python).

Ask.com uses ExpertRank, which finds clusters for the query you entered and returns the most authoritative sites for each cluster.

After retiring Jeeves, Ask also retired Teoma, a search engine started in 2000 by Apostolos Gerasoulis and later acquired by Ask Jeeves.

The algorithm behind Teoma was rebranded ExpertRank: "Ask's ExpertRank algorithm provides relevant search results by identifying the most authoritative sites on the Web. With Ask search technology, it's not just about who's biggest: it's about who's best. Our ExpertRank algorithm goes beyond mere link popularity (which ranks pages based on the sheer volume of links pointing to a particular page) to determine popularity among pages considered to be experts on the topic of your search. This is known as subject-specific popularity. Identifying topics (also known as "clusters"), the experts on those topics, and the popularity of millions of pages amongst those experts -- at the exact moment your search query is conducted -- requires many additional calculations that other search engines do not perform. The result is world-class relevance that often offers a unique editorial flavor compared to other search engines."

ExpertRank is an evolution of IBM's CLEVER project, a search engine that never made it to the public. "Clever attempts to ensure that the information it retrieves is useful by pointing people toward either of two classes of sites: authorities and hubs. An authority is a site to which many other sites have links, which Dom sees as implied endorsements of the site's usefulness. A hub is a site that has links to many other sites, and is therefore a potentially good reference. Clever's job is to identify the best hubs (those that link to the best authorities) and the best authorities (those that are linked to by the best hubs)."

The difference between PageRank and ExpertRank is that ExpertRank does not treat the quality of a page as absolute: quality is measured relative to a subject.

"Clever starts with 200 pages that are the result of an ordinary keyword search. It then adds all pages that link to, or are linked to by, one of those 200 pages. This step typically swells the set of pages to 1,000 or more. Clever initially assigns each page a hub score of one and an authority score of one. It sums up all the authority scores to get a page's hub score, and sums up all the hub scores to get a page's authority score. Then it repeats the process some five times until the system has identified the hubs that link to the top-scoring authorities and the authorities that are linked to by the top-scoring hubs."

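The hub and authority iteration that Clever uses can be sketched in a few lines of Python. This is only an illustration, not IBM's or Ask's code: the tiny link graph, the function name `hits` and the normalization step (added so the scores stay bounded) are all assumptions.

```python
def hits(links, iterations=5):
    """Hub/authority iteration on a dict {page: [pages it links to]}.

    Hypothetical sketch; the normalization step is an assumption added
    to keep scores comparable between rounds.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages linking to the page.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        # Hub score: sum of the authority scores of the pages it links to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        # Normalize so the largest score in each set is 1.
        max_auth = max(auth.values()) or 1.0
        max_hub = max(hub.values()) or 1.0
        auth = {p: s / max_auth for p, s in auth.items()}
        hub = {p: s / max_hub for p, s in hub.items()}
    return hub, auth


# Made-up example graph: two directory pages link to the same sites.
example_links = {
    "directory-a": ["python.org", "docs.example"],
    "directory-b": ["python.org"],
    "python.org": [],
    "docs.example": [],
}
hub_scores, auth_scores = hits(example_links)
best_authority = max(auth_scores, key=auth_scores.get)
best_hub = max(hub_scores, key=hub_scores.get)
```

In this toy graph the page linked to by both directories ends up as the top authority, and the directory that links to more of the good authorities ends up as the top hub.
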
George Kurtz, McAfee's senior vice president for risk management, demonstrated at an RSA conference in February that there are many interesting things to be found in the Google database.

If you search for sites with "Remote desktop web connection" in the title, you'll find... remote desktops that you can take over: [intitle:"Remote Desktop Web Connection"]

During a series of demonstrations, Kurtz showed how fairly straightforward queries will bring up user names and passwords as well as sensitive information such as social security numbers. Just search for [ssn 111111111..999999999 death records].

If you search Google for [inurl:robots.txt], you might be able to see the contents of that file, including subdirectories that weren't meant to be public. For example, you can find Google MDB.
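
As a rough illustration of why a public robots.txt can leak directory names, the sketch below pulls the `Disallow` paths out of a file's text. The sample content and the function name `disallowed_paths` are made up for the example.

```python
def disallowed_paths(robots_txt):
    """Return the paths listed in Disallow lines of a robots.txt text."""
    paths = []
    for line in robots_txt.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical file content, not taken from any real site.
sample = """\
User-agent: *
Disallow: /private/   # internal area
Disallow: /beta-project/
Allow: /public/
"""
hidden = disallowed_paths(sample)
```

Every path a site asks crawlers to skip is listed in plain text, so the file works as a map of the places the owner would rather keep quiet.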

March 24, 2006

"Google is interviewing candidates for engineering positions at our lunar hosting and research center, opening late in the spring of 2007. This unique opportunity is available only to highly-qualified individuals who are willing to relocate for an extended period of time, are in top physical condition and are capable of surviving with limited access to such modern conveniences as soy low-fat lattes, The Sopranos and a steady supply of oxygen.

The Google Copernicus Hosting Environment and Experiment in Search Engineering (G.C.H.E.E.S.E.) is a fully integrated research, development and technology facility at which Google will be conducting experiments in entropized information filtering, high-density high-delivery hosting (HiDeHiDeHo) and de-oxygenated cubicle dwelling. This center will provide a unique platform from which Google will leapfrog current terrestrial-based technologies and bring information access to new heights of utility."

If you emailed Google about the supposed jobs, you would've got an auto-reply:

"Thank you for contacting Google about our Copernicus Research Center.

We've received an overwhelming response to this opportunity and are not currently accepting additional resumes. We will, however, keep your information on file should we have an opening in the future. At the current staffing levels, we anticipate that we may need additional applicants on or around April Fool's Day in 2104. Until then, we appreciate your interest in Google and your taking the time to write us."

"At Google our mission is to organize the world's information and make it useful and accessible to our users. But any piece of information's usefulness derives, to a depressing degree, from the cognitive ability of the user who's using it. That's why we're pleased to announce Google Gulp (BETA)™ with Auto-Drink™ (LIMITED RELEASE), a line of "smart drinks" designed to maximize your surfing efficiency by making you more intelligent, and less thirsty.

Think a DNA scanner embedded in the lip of your bottle reading all 3 gigabytes of your base pair genetic data in a fraction of a second, fine-tuning your individual hormonal cocktail in real time using our patented Auto-Drink™ technology, and slamming a truckload of electrolytic neurotransmitter smart-drug stimulants past the blood-brain barrier to achieve maximum optimization of your soon-to-be-grateful cerebral cortex. Plus, it's low in carbs! And with flavors ranging from Beta Carroty to Glutamate Grape, you'll never run out of ways to quench your thirst for knowledge."

Up to 60% of the code in the new consumer version of Microsoft's Vista operating system is set to be rewritten, Smarthouse reports. In an effort to meet a deadline of the 2007 CES show in Las Vegas, Microsoft has pulled programmers from the highly successful Xbox team to help resolve many problems associated with entertainment and media centre functionality inside the OS.

On a related note, Microsoft confirmed that it is also pushing the mainstream launch of Office 2007 to next year. The reason is that Microsoft wants to launch Office and Vista in tandem.

So the word of the week for Microsoft was DELAY. But what if the problems are so difficult to solve that they need another year? They've been working on Windows Longhorn since 2003, three years ago, and 60% of the code they wrote needs to be rewritten. I think this needs almost 2 years, so the best thing to do is set a new deadline: December 2007.

Remember I talked about a very nice system rescue live CD? There are many Linux distributions available as Live CDs: Knoppix, Kanotix, Adios, PCLinuxOS, MandrakeMove, Gnoppix, RiP, SystemRescueCD, Ultimate Boot CD and others. What if you want to create a bootable DVD that contains several Live CDs and lets you choose between them? You can do that using a script developed by Nautica.net, where you can also find a list of Live CD distributions.

Google Books (formerly Google Print) lets you see a book in Full Book View if the book is out of copyright. This way you can view any page from the book. Now the home page of Google Books includes an option to search only for full view books.

"[0046] In stage 610, the first entity, in turn, credits the WAP provider with a portion of the advertisement revenue. The portion of the revenue may include a flat rate, a percentage of the advertisement revenue, or a combination thereof. In one embodiment, the first entity identifies the WAP to be credited via the IP address.

[0047] As a result of receiving a portion of the advertisement revenue, the WAP provider may cover the expenses of providing the WAP and may recoup a profit, while providing end-users with access to the WAP at a reduced rate.

[0048] In alternative embodiments, data other than advertisements could be inserted by the first entity into the view presented to the end-user accessing a WAP. For example, the data could be in the form of a message, or a static advertisement that does not include a hyperlink.

[0049] Furthermore, the processes and architecture described above may be used to provide wireless access at a reduced rate for multiple WAPs, including multiple disparate WAPs."
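
Paragraph [0046] allows a flat rate, a percentage of the revenue, or both. Here is a minimal sketch of that arithmetic, with made-up numbers and a hypothetical function name (the application gives no formula and no rates):

```python
def wap_provider_credit(ad_revenue, flat_rate=0.0, percentage=0.0):
    """Credit owed to the WAP provider: flat rate plus a share of ad revenue.

    `percentage` is a fraction between 0 and 1. Both parameters are
    illustrative assumptions; the application does not specify any values.
    """
    if not 0.0 <= percentage <= 1.0:
        raise ValueError("percentage must be between 0 and 1")
    return flat_rate + ad_revenue * percentage


# Made-up figures: 100.0 units of ad revenue attributed to one access point.
flat_only = wap_provider_credit(100.0, flat_rate=5.0)
share_only = wap_provider_credit(100.0, percentage=0.25)
combined = wap_provider_credit(100.0, flat_rate=5.0, percentage=0.25)
```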

It will be interesting to see whether advertising-based WiFi turns out to be a viable solution. Google might combine this with Web Accelerator and distribute the content via a proxy.

March 23, 2006

SystemRescueCd is a Linux system on a bootable CD/DVD for repairing your system and your data after a crash. It also aims to provide an easy way to carry out admin tasks on your computer, such as creating and editing the partitions of the hard disk. It contains a lot of system utilities (parted, partimage, fstools) and basic ones (editors, midnight commander, network tools). The kernel of the system supports most important file systems (ext2/ext3, reiserfs, reiser4, xfs, jfs, vfat, ntfs, iso9660), and network ones (samba and nfs).

"This privacy flaw has caused my fiancé and I to break-up after having dated for 5 years."

Bugzilla Bug 330884:

Summary: When different users on one system choose to save or not save passwords for sites, any other user can see sites they not only saved passwords for but can also see what other users have been saving/never saving passwords for.

4. Attempt to log-in to the site so that Firefox will ask whether or not you want your password saved.

5. Choose not to save the password.

6. After successfully logging in and having selected the "never save password" option, logout.

7. Log-in as Mary and open Firefox.

8. Browse, browse, browse ... but you don't really have to. Just go to "View Saved Passwords," click on the tab that will show you sites to never save passwords for, and you'll see whatever painful site Joe denied to save a password for.

Google.com has seen a steady increase in traffic in recent months, and it has managed to overtake the previous number two, msn.com. That's a big deal if you consider that both yahoo.com and msn.com are portals, while google.com is just a search box.

March 22, 2006

Sequencing the human genome was far from the last step in explaining human genetics. Researchers still need to figure out which of the 20,000-plus human genes are active in any one cell at a given moment. Chemical modifications can interfere with the machinery of protein manufacture, shutting genes down directly or making chromosomes hard to unwind. Such chemical interactions constitute a second order of genetics known as epigenetics.

In 1998, Alexander Olek founded Berlin-based Epigenomics to create a rapid and sensitive test for gene methylation, a common DNA modification linked to cancer. The company's forthcoming tests will determine not only whether a patient has a certain cancer but also, in some cases, the severity of the cancer and the likelihood that it will respond to a particular treatment.

Philip Avner, an epigenetics pioneer at the Pasteur Institute in Paris, says that Epigenomics' test is a powerful tool for accurately diagnosing and understanding cancers at their earliest stages. "If we can't prevent cancer, at least we can treat it better," says Avner.

Yahoo launched a VoIP service in the United States that lets people make phone calls through the company's instant messaging software.

Available in several other countries since December, the service allows users to make calls from their computers for 2 cents a minute or less to the top 30 national phone markets, including the United States.

Here are the new features of the VoIP service:

Phone Out: Calls within the U.S. and to more than 30 other countries can be made for two U.S. cents a minute or less.

Phone In: For $2.99 a month or $29.90 a year, people can select a personal phone number, and receive incoming calls free. In the beta service, country-based phone numbers are initially available in France, the United Kingdom, and the United States with additional country-based numbers available in the coming months.

Free Voicemail. Additionally, Yahoo! Mail now includes useful links to Yahoo! Messenger with Voice, enabling people to easily check their voicemail directly from Yahoo! Mail.

Of course, this service competes directly with Skype, which offers similar features at slightly higher prices. Yahoo Messenger with Voice rates average between 20 percent and 30 percent lower than Skype's to many major markets outside the United States, according to a comparison furnished by Yahoo. Yahoo has struck phone partnerships with headset maker Plantronics, VTech, a maker of USB handsets, and Siemens AG, a big maker of cordless phones.

As mentioned here, Google will change the layout of its SERPs (search engine results pages). They've experimented with many designs and it seems they chose the simplest one, the layout that uses more space for the results.

There is a similar screenshot on Flickr, where the ads are put at the bottom of the page.

"We face formidable competition in every aspect of our business, and particularly from other companies that seek to connect people with information on the web and provide them with relevant advertising. Currently, we consider our primary competitors to be Microsoft Corporation and Yahoo! Inc. Microsoft has announced plans to develop features that make web search a more integrated part of its Windows operating system or other desktop software products. We expect that Microsoft will increasingly use its financial and engineering resources to compete with us. Both Microsoft and Yahoo have more employees than we do (in Microsoft’s case, approximately 11 times as many). Microsoft also has significantly more cash resources than we do. Both of these companies also have longer operating histories and more established relationships with customers and end users. They can use their experience and resources against us in a variety of competitive ways, including by making acquisitions, investing more aggressively in research and development and competing more aggressively for advertisers and web sites. Microsoft and Yahoo also may have a greater ability to attract and retain users than we do because they operate Internet portals with a broad range of content products and services. If Microsoft or Yahoo are successful in providing similar or better web search results compared to ours or leverage their platforms or products to make their web search services easier to access than ours, we could experience a significant decline in user traffic. Any such decline in traffic could negatively affect our revenues."

The revenue growth will decline

"We expect that our revenue growth rate will decline over time and anticipate that there will be downward pressure on our operating margin. We believe our revenue growth rate will generally decline as a result of increasing competition and the inevitable decline in growth rates as our revenues increase to higher levels. We believe our operating margin will experience downward pressure as a result of increasing competition and increased expenditures for many aspects of our business."

Ad-blocking may kill Google

"Technologies may be developed that can block the display of our ads. Most of our revenues are derived from fees paid to us by advertisers in connection with the display of ads on web pages. As a result, ad-blocking technology could, in the future, adversely affect our operating results."

March 21, 2006

Microsoft said on Tuesday it plans to delay the launch of the consumer version of Windows Vista until after this year's holiday shopping season.

Microsoft pushed back the consumer version of Vista until January 2007 from an earlier target for the second half of 2006 and pledged to ship the next version of its operating system to business customers in November.

"It is a critical eight- to 10-weeks for retailing and for the producers. The retailers and PC hardware manufacturers work on razor-thin margins, so the impact there could be pretty severe," said David Smith, analyst at Gartner.

The explanation for the delay is that Microsoft wants to improve overall quality, particularly in security, and that PC makers didn't want the operating system introduced in the middle of holiday sales, because a new version would create instability in the market.

It's not the first time Microsoft has delayed the launch of Vista (previously codenamed Longhorn); 2005 was an earlier deadline.

"Adam Bosworth is a Vice President of Engineering at Google Inc. He was previously VP Engineering at BEA Systems and was responsible for the engineering efforts for BEA's Framework Division. Prior to joining BEA, Bosworth co-founded Crossgain, a software development firm acquired by BEA in 2001. Crossgain's "Cajun" project developed into BEA's WebLogic Workshop product. At BEA, Bosworth also developed the Alchemy intelligent caching framework in a team consisting of Bosworth and his son, Alex.

Known as one of the pioneers of XML, Bosworth previously held various senior management positions at Microsoft, including General Manager of the WebData group, a team focused on defining and driving XML strategy. While at Microsoft, he was responsible for designing and delivering the Microsoft Access PC database product (codenamed 'Cirrus') and assembling and driving the team that developed Internet Explorer 4.0's HTML engine (codenamed 'Trident')."

According to Garett Rogers, it seems that Adam Bosworth is working on a new Google project, known as Google Health. His title is "Architect, Google Health". Maybe Google Health is the same thing as Google MDB (Google Medical and Biological Database).

March 20, 2006

Google (NASDAQ:GOOG) launched Google Finance, a product that offers "information about North American stocks, mutual funds and public and private companies along with charts, news and fundamental financial data".

You can search for stocks, mutual funds, and public and private companies, find news and even blog posts about companies, and see related companies, company summaries and management information.

You can create a portfolio, if you have a Google Account. Google Finance portfolios allow you to keep track of financial information, including how many shares you own and at what price, for up to 200 stocks or mutual funds.

But probably the best feature of Google Finance is the interactive charts, which correlate market data with corresponding dated news stories to help you determine if there is a relationship between them.

Of course, the product is far from perfect if you compare it with Yahoo! Finance: it doesn't have real-time quotes, statistics, SEC filings, a list of competitors, analyst estimates, a list of major holders, income statements, an option to compare stocks and many other features. So I think Google Finance can't be considered real competition for sites like fool.com or finance.yahoo.com. Yet...

Update: Google Blog says that Google Finance "started as a small project led by a few engineers in Bangalore and later joined by more engineers and finance enthusiasts in Mountain View and New York".

Google Video Player has been updated to version 1.1. The improvements are: support for DirectX, true frame-by-frame display and some rendering optimizations.

There is also a funny bug in the installer. If you rename GoogleVideoPlayerSetup.exe to GoogleVideoPlayer.exe and run the setup, it will give you an error: "Google Video Player is running. Please close it to continue." even if the player isn't started. That can only mean one thing: the setup checks whether a process called GoogleVideoPlayer.exe is running (that's the name of the Google Video Player main exe), and the process it finds is the installer itself. That's a really dumb way to check if the player is running. I'm sure Googlers have heard about mutexes (even InnoSetup supports them). You can even rename ANY executable to GoogleVideoPlayer.exe, start it, and then try to run the setup: you get the same error message.
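
To make the difference concrete, here is a small Python sketch. It is not how the Google Video Player installer works internally (that part is a guess based on the error message); the process list, the function names and the lock-file path are invented. A lock file created exclusively by the application stands in for the named mutex a Windows program would use.

```python
import os
import tempfile


def running_by_name(process_names, exe_name="GoogleVideoPlayer.exe"):
    """Naive check: any process with a matching file name counts as the player."""
    return exe_name.lower() in (name.lower() for name in process_names)


def try_acquire_lock(lock_path):
    """Exclusive-create a lock file; returns a file descriptor, or None if held."""
    try:
        return os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return None


def release_lock(fd, lock_path):
    """Close and delete the lock file so the next caller can acquire it."""
    os.close(fd)
    os.remove(lock_path)


# A renamed installer fools the name-based check even though no player runs.
false_alarm = running_by_name(["explorer.exe", "GoogleVideoPlayer.exe"])

# Only the real application takes the lock, so the file name is irrelevant.
lock_path = os.path.join(tempfile.mkdtemp(), "video-player.lock")
player_fd = try_acquire_lock(lock_path)       # the "player" starts
second_attempt = try_acquire_lock(lock_path)  # the "installer" sees it is held
release_lock(player_fd, lock_path)
after_exit = try_acquire_lock(lock_path)      # the player has exited
release_lock(after_exit, lock_path)
```

One caveat of this stand-in: a lock file can be left behind if the program crashes, whereas the operating system releases a mutex when its owner exits.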

That reminds me of another error message from Google Desktop. I had a version downloaded on 11 March, then downloaded the latest version when Desktop came out of beta (on 15 March), and I couldn't install it: "A newer version of Google Desktop is already installed." It's also interesting to note that, although Google Desktop has reached version 3, the software presents itself as "Google Desktop 4.2006.306.1208-en".