Chapter 33. How Agents Work

The Internet has grown so quickly and its resources are so vast that we need help navigating around it. Special software called agents can help us access the Net's resources.

Although there are a lot of technical definitions for agents, put simply, agents are programs that do your bidding automatically. Many of them run over the Internet or on individual computers every day. Agents can find the latest news for you and download it to your computer; they can automatically monitor Internet traffic and report on its total usage; they can find you the best deal on the CD you want to buy; they can perform important web maintenance tasks; and they can do far more. They are becoming so complex that systems are being developed to allow agents to interact with one another so they can perform jobs cooperatively.

On the Internet, agents are commonly called spiders, robots (often shortened to "bots"), and knowbots, among other terms. Those used for searching automatically create indexes of almost every resource on the Web and then allow people to search through those indexes to find things more quickly. Common search tools such as Google and AltaVista use spiders in this way. This specialized use of spiders is discussed in Chapter 27, "How Internet Searching Works."
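To make the idea concrete, here is a simplified sketch of what a search spider does, written in Python. A real spider fetches pages over the network; to keep this example self-contained, the "Web" is a small dictionary of hypothetical pages. The spider follows links from page to page and builds an inverted index mapping each word to the pages that contain it — the same basic structure a search engine queries.

```python
from html.parser import HTMLParser
from collections import defaultdict

class TextExtractor(HTMLParser):
    """Collects the visible text and outgoing links from one HTML page."""
    def __init__(self):
        super().__init__()
        self.words = []
        self.links = []

    def handle_data(self, data):
        self.words.extend(data.lower().split())

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

def crawl(pages, start):
    """Visit every page reachable from `start`, building an inverted
    index: word -> set of page URLs containing that word."""
    index = defaultdict(set)
    to_visit, seen = [start], set()
    while to_visit:
        url = to_visit.pop()
        if url in seen or url not in pages:
            continue
        seen.add(url)
        parser = TextExtractor()
        parser.feed(pages[url])
        for word in parser.words:
            index[word].add(url)
        to_visit.extend(parser.links)   # follow links to new pages
    return index

# Two tiny hypothetical "pages" standing in for the live Web.
pages = {
    "/home": '<p>welcome to agents</p><a href="/news">news</a>',
    "/news": '<p>latest agent news</p>',
}
index = crawl(pages, "/home")
print(sorted(index["news"]))   # every page containing the word "news"
```

A search, in this miniature model, is just a dictionary lookup in `index`; production engines add ranking, word stemming, and storage measured in terabytes, but the crawl-and-index loop is the same in outline.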

All these agents are software programs that are invisible to the user. You just determine the task you want done, and behind the scenes the agent automatically goes off and performs that task. A variety of programming languages can be used to write agent programs.

Agents might well alter the way we all use the Internet in the future. Not only do they respond to our requests, but they also "learn" from our requests the types of tasks and information that interest us. They then go off on their own and perform those tasks and get that information, even before we make these additional requests. As we use these types of agents more, they'll become even smarter and more efficient.

Robots and agents can cause problems for some websites. For example, they can overload web servers by swamping them with too many requests in too short a time. Users who then try to reach those web pages may be denied access entirely, or may find the site exceedingly slow.
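Part of the remedy lies with the robots themselves. A well-behaved robot spaces out its requests so it never swamps a server. The sketch below shows one simple way to do that, assuming a hypothetical `PoliteFetcher` class that a robot author might write; it enforces a minimum delay between successive requests to the same host.

```python
import time

class PoliteFetcher:
    """Waits at least `delay` seconds between successive requests to
    the same host, so a robot cannot swamp any one server."""
    def __init__(self, delay=1.0):
        self.delay = delay
        self.last_request = {}   # hostname -> time of the last fetch

    def wait_turn(self, host):
        """Block until it is polite to contact `host` again."""
        now = time.monotonic()
        elapsed = now - self.last_request.get(host, float("-inf"))
        if elapsed < self.delay:
            time.sleep(self.delay - elapsed)
        self.last_request[host] = time.monotonic()

# Three back-to-back requests to one host take at least two delays.
fetcher = PoliteFetcher(delay=0.2)
start = time.monotonic()
for _ in range(3):
    fetcher.wait_turn("example.com")   # the actual fetch would go here
elapsed = time.monotonic() - start
print(f"3 requests took {elapsed:.2f}s")
```

The first request goes out immediately; each later request to the same host is held back until the delay has passed, which caps the robot's load on that server regardless of how fast it can crawl.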

Another problem has to do with the way websites make money. Many websites sell ads to support themselves and charge advertisers based on the number of pages that have been viewed. If many of those pages "viewed" are never actually seen by people but are instead accessed only by robots, both the advertisers and the website suffer.

Several ways exist to solve these problems and limit robot access. One is to create a file called robots.txt that describes the areas of a site that are off limits to robots; well-behaved robots automatically read this file, adhere to it, and stay out of those areas. Another is to use a technology that automatically detects whether a page was visited by a robot or a human, and forgo charging advertisers for robot visits.
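Python's standard library includes a parser for this exclusion file, `urllib.robotparser`, which a robot can use to check each URL before fetching it. The robots.txt content below is a made-up example of the kind a site might publish at the top of its server; the user-agent name "MyBot" is likewise hypothetical.

```python
from urllib import robotparser

# A hypothetical robots.txt, as a site might serve at
# http://example.com/robots.txt: all robots are barred from
# the /private/ and /cgi-bin/ areas.
robots_txt = """\
User-agent: *
Disallow: /private/
Disallow: /cgi-bin/
"""

rp = robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

# A well-behaved robot asks before every fetch.
print(rp.can_fetch("MyBot", "http://example.com/news.html"))   # True
print(rp.can_fetch("MyBot", "http://example.com/private/x"))   # False
```

Note that robots.txt is purely advisory: it works only because reputable robots choose to honor it, which is why sites that must enforce limits also turn to server-side measures such as the robot-detection schemes mentioned above.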