Matt Cutts: Here's What You Should Read To Learn About Search Engines

"What resources (textbooks, online PDFs, etc.) would you recommend to people interested in learning more about LSI, search engine algorithms, etc.?"

Cutts first suggests reading the original Google and PageRank papers. "So there's a whole bunch of different stuff about the anatomy of a large-scale hypertext search engine and then also a bunch of papers about PageRank," he says.

Cutts also recommends two textbooks. "One is Modern Information Retrieval," he says. "That's got a lot of good stuff about the scoring and the science and thinking about that. And then there's also one called Managing Gigabytes. I think Ian Witten wrote that one. And that one is just a little bit more about the logistics and being able to horse around that much data and thinking about some of the machine's issues and how does a large scale engine work."

"So those three together, and then of course, you can always do searches," says Cutts. "Google Research actually has a ton of different papers that we've published. So you might want to look into that a little bit as well. But basically PageRank, the early Google papers, can give you an idea of how to write a very simple search engine that can scale to 100 million documents or so, Managing Gigabytes, and Modern Information Retrieval, and that will give you a pretty good view of the sort of different parts of the space."