CiteSeer's goal is to improve the dissemination of and access to academic and scientific literature. As a non-profit service that anyone can use freely, it is considered part of the open access movement, which is attempting to change academic and scientific publishing to allow greater access to scientific literature.

The name can be read in at least two ways. As a pun, a 'sightseer' is a tourist who looks at the sights, so a 'cite seer' would be a researcher who looks at cited papers. Alternatively, a 'seer' is a prophet, so a 'cite seer' is a prophet of citations.

CiteSeer has not been comprehensively updated since roughly 2005 due to limitations in its architecture design. It is a representative sampling of research in computer and information science, but its coverage is limited because it only has access to papers that are freely available, usually on an author's homepage. A comparison with DBLP will always find CiteSeer's references lacking, since DBLP is a manually compiled bibliography. As an example, consider the references in DBLP for well-known authors such as Alex Pentland (MIT) or Ramesh Jain (UCI) (DBLP listings for Alex Pentland - http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/p/Pentland:Alex.html or Ramesh Jain - http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/j/Jain:Ramesh.html). DBLP shows a steady number of publications (about 9) each year through 2007, while CiteSeer has only one of their publications after 2000. DBLP, however, holds none of the actual publications; it only links to them on publisher websites.

A new version and design of CiteSeer can be found at the Next Generation CiteSeer website, CiteSeerx. It is important to note that CiteSeer-like engines and archives usually harvest documents only from publicly available websites and do not crawl publisher websites. As such, authors whose documents are freely available are more likely to be represented in the index.

Other Seer-like search and repository systems have been built for chemistry (ChemXSeer) and for archaeology (ArchSeer). Another, BotSeer, has been built for robots.txt file search. All of these are built on the open source indexer Lucene.

Next Generation CiteSeer (CiteSeerx)

The Next Generation CiteSeer project, CiteSeerx, funded by the National Science Foundation and Microsoft Research, enhances CiteSeer both as a search engine and as a digital library. As an example, CiteSeer's notion of "contribution" is extended to acknowledgments in addition to citations, which would make it the first automatically generated acknowledgment index. CiteSeerx is designed differently from CiteSeer, with new algorithms for entity extraction and a modular, expandable, robust, scalable architecture based on open source tools such as Lucene and many Apache projects. As such, CiteSeerx will promote the creation of other Seer-like systems.

The Next Generation CiteSeer, CiteSeerx, is now available in alpha, with over one million documents indexed and the index constantly growing.