A new kind of pub crawl

Aug 24, 2012 By Angela Herring

Engin Kirda, an associate professor of information assurance at Northeastern, developed new software for detecting and containing malicious web crawlers. Photo: Dreamstime.

Websites like Facebook, LinkedIn and other social-media networks contain massive amounts of valuable public information. Automated web tools called web crawlers sift through these sites, pulling out information on millions of people in order to tailor search results and create targeted ads or other marketable content.

But what happens when "the bad guys" employ web crawlers? According to Engin Kirda, Sy and Laurie Sternberg Interdisciplinary Associate Professor for Information Assurance in the College of Computer and Information Science and the Department of Electrical and Computer Engineering, the crawlers then become tools for spamming, phishing or targeted Internet attacks.

"You want to protect the information," Kirda said. "You want people to be able to use it, but you don't want people to be able to automatically download content and abuse it."

Kirda and his colleagues at the University of California–Santa Barbara have developed new software called PubCrawl to solve this problem. PubCrawl both detects and contains malicious web crawlers without limiting normal browsing capacities. The team joined forces with one of the major social-networking sites to test PubCrawl, which is now being used in the field to protect users' information.

Kirda and his collaborators presented a paper on their novel approach at the 21st USENIX Security Symposium in early August. The article will be published in the proceedings of the conference this fall.

In the cybersecurity arms race, Kirda explained, malicious web crawlers have become increasingly sophisticated in response to stronger protection strategies. In particular, they have become more coordinated: Instead of utilizing a single computer or IP address to crawl the web for valuable information, efforts are distributed across thousands of machines.

"That becomes a tougher problem to solve because it looks similar to benign user traffic," Kirda said. "It's not as straightforward."

Traditional protection mechanisms such as CAPTCHAs, which operate on an individual basis, are still useful, but their deployment comes at a cost: Users may be annoyed if too many CAPTCHAs are shown. As an alternative, nonintrusive approach, PubCrawl was specifically designed with distributed crawling in mind. By identifying IP addresses with similar behavior patterns, such as connecting at similar intervals and frequencies, PubCrawl detects what it suspects to be distributed web-crawling activity.
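The idea of grouping IP addresses by similar connection patterns can be illustrated with a small sketch. This is not PubCrawl's actual algorithm, which the article does not describe in detail; it is a simplified stand-in that bins each IP's requests into fixed time buckets, compares the resulting count vectors with cosine similarity, and merges IPs whose similarity exceeds a threshold. The function names, the 60-second bucket width and the 0.9 threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict
from math import sqrt


def count_vector(timestamps, start, end, bucket_seconds):
    """Bin one IP's request timestamps into fixed-width time buckets."""
    n_buckets = int((end - start) // bucket_seconds) + 1
    counts = [0] * n_buckets
    for t in timestamps:
        counts[int((t - start) // bucket_seconds)] += 1
    return counts


def cosine(a, b):
    """Cosine similarity of two equal-length count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_similar_ips(requests, bucket_seconds=60, threshold=0.9):
    """Group IPs whose request timing looks alike.

    requests: iterable of (ip, unix_timestamp) pairs (hypothetical log format).
    Returns a list of sets, one per group of two or more similar IPs.
    """
    by_ip = defaultdict(list)
    for ip, t in requests:
        by_ip[ip].append(t)
    if not by_ip:
        return []

    start = min(min(ts) for ts in by_ip.values())
    end = max(max(ts) for ts in by_ip.values())
    vectors = {
        ip: count_vector(ts, start, end, bucket_seconds)
        for ip, ts in by_ip.items()
    }

    # Union-find: merge any two IPs whose vectors are similar enough.
    parent = {ip: ip for ip in vectors}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    ips = sorted(vectors)
    for i, a in enumerate(ips):
        for b in ips[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = defaultdict(set)
    for ip in ips:
        groups[find(ip)].add(ip)
    return [g for g in groups.values() if len(g) > 1]
```

The pairwise comparison is quadratic in the number of IPs, so a real deployment at social-network scale would need something more economical; the sketch only shows why two machines that connect on the same schedule stand out even when each one looks harmless on its own.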

Once a crawler is detected, the question is whether it is malicious or benign. "You don't want to block it completely until you know for sure it is malicious," Kirda explained. "Instead, PubCrawl essentially keeps an eye on it."

Potentially malicious connections can be rate-limited while a human operator takes a closer look. If the operator decides that the activity is malicious, the IPs can also be blocked.
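A containment policy of this kind, rate-limit first and block only after review, might look roughly like the following. The article does not say how PubCrawl limits traffic, so the token-bucket scheme, the class name `SuspectThrottle` and its parameters are assumptions made for the sake of the example.

```python
import time


class SuspectThrottle:
    """Rate-limit flagged IPs with a token bucket; block confirmed ones.

    Hypothetical sketch: unflagged IPs are never limited, flagged IPs get
    `burst` requests up front and `rate_per_second` afterwards, and blocked
    IPs are always refused.
    """

    def __init__(self, rate_per_second, burst, clock=time.monotonic):
        self.rate = rate_per_second
        self.burst = burst
        self.clock = clock
        self.buckets = {}    # ip -> (tokens_left, time_of_last_update)
        self.blocked = set()

    def flag(self, ip):
        """Mark an IP as a suspected crawler and start rate-limiting it."""
        self.buckets.setdefault(ip, (float(self.burst), self.clock()))

    def block(self, ip):
        """An operator has confirmed the IP is malicious."""
        self.blocked.add(ip)

    def allow(self, ip):
        """Return True if a request from this IP should be served now."""
        if ip in self.blocked:
            return False
        if ip not in self.buckets:
            return True  # ordinary traffic is left alone
        tokens, last = self.buckets[ip]
        now = self.clock()
        # Refill tokens for the elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[ip] = (tokens - 1, now)
            return True
        self.buckets[ip] = (tokens, now)
        return False
```

Keeping flagging and blocking as separate steps mirrors the point Kirda makes above: a suspected crawler is slowed down rather than cut off until someone has confirmed that it is malicious.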

To evaluate the approach, Kirda and his colleagues used it to scan logs from a large-scale social network, which then provided feedback on its success. The social network then deployed it in real time for a more robust evaluation. Currently, the social network is using the tool as part of its production system. Going forward, the team expects to identify areas where the software could be evaded and make it even stronger.
