
Advanced Web Based Honeypot Techniques

The GHH project develops web-based honeypots designed to lure "Google hackers" who use malicious search engine tactics, along with tools and documentation that let others develop customized honeypots, decreasing the exposure of vulnerable applications in the Google index.

This tutorial will expand upon extension spoofing and transparent linking, and how to apply them in the creation of customized web-based honeypots. The v1.1 honeypots and documentation released by GHH will be used as a reference throughout.

Spoofed file extensions

While browsing through the Google Hacking Database (GHDB), you should notice that not all of the signatures target server-side scripts (.php, for example). Take this hack:

inurl:passwd.txt

That hack searches for files with the extension .txt. The contents of these files are usually interesting, and their exposure can introduce a vulnerability on the server hosting them. In cases like these, the risk introduced to the environment is usually greater than that of a typical web application vulnerability.

Or perhaps these:

inurl:admin.mdb
inurl:customer.mdb
inurl:users.mdb

Depending on its contents, a database file like these could cause extreme losses. In order to emulate filetypes like these, GHH relies on Apache .htaccess files to spoof its file extension. We can then take advantage of server-side scripting to log and handle the attack any way we want; if we're using GHH as an engine, that means logging remotely and applying signatures to the attack.

So, following the previous tutorial on GHH v1.0 (it should still be compatible), we can leverage .htaccess and Apache to let our honeypot spoof another file extension. Place a .htaccess file in the same directory as your honeypot with the following lines:
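Something along these lines (a sketch, assuming a mod_php Apache setup; .xyz is just a placeholder extension):

```apache
# .htaccess -- have Apache hand .xyz files to the PHP interpreter
AddType application/x-httpd-php .xyz
```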

Apache and PHP will then interpret the .xyz file as a PHP script. The only problem is that browsers won't behave normally when viewing some extensions (.mdb and .txt, for example). To handle this, we can place the following PHP code at the beginning of our honeypot:
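For instance, to pose as a plain text file (a sketch; swap in whatever Content-Type matches the extension you're spoofing):

```php
<?php
// Claim to be plain text so the browser renders the decoy normally
header('Content-Type: text/plain');
?>
```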

This tells the browser to handle the file as a certain type of content. The previous code would be acceptable for a .sql, .txt, .log, .dat, or similar file. When the content reaches the attacker, the browser behaves as it should (we've already captured them, but it's best not to tip them off anyhow). If you were emulating a database file, you'd want it to open in Access, for example, which would require sending 'Content-Type: application/msaccess' to the browser.

Transparent linking

Transparent linking is the process of advertising your honeypot to search engines, but not to the casual users of your website. There are a few ways to do this, some better than others. The better your transparent link, the fewer false positives you'll have in your logs. The goal is for visitors to your honeypot to be referred from a search engine, not from the site it's hosted on. This forces them to find the honeypot through the engine, and through that vector you can retrieve the search query they used against your site (intention and motive!).

Direct link
Simply making an obvious hyperlink with some text in your top level website:

PHP Code:

<a href="http://yourwebsite.com/honeypot.php">blah</a>

Obvious problems include users clicking on the link and filling your logs with false positives. Don't use this type of link.

Camo Link
The following CSS style will make the link the same color as your background. You should change black to match your background.
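Something like this (the class name is just an example):

```css
/* Camouflaged link: same color as a (here, black) page background */
a.camo {
    color: #000000;        /* change to match your background */
    text-decoration: none; /* underlines would give it away */
}
```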

This has its problems as well. It's cumbersome, because you might not know what the background behind the link will be. That makes a literally transparent link desirable; however, I haven't found any options other than the CSS alpha() function, which doesn't seem to work well with text.

Disappearing link
The following CSS will prevent the link from being shown to the user at all, as long as their browser renders CSS.
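A minimal version (again, the class name is arbitrary):

```css
/* Hide the honeypot link entirely in CSS-capable browsers */
a.honeypot {
    display: none;
}
```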

The link is now completely invisible, except in the source. The thought was that being completely invisible would be the best option; however, the GHH project learned the hard way that display:none is ignored by Google because it can be abused. Contrary to what seems to be popular belief, Google does not index links with a CSS style of display:none (such a smart spider!). Less sophisticated crawlers, however, will still index it.

Shy Link
In order to leverage a disappearing link, you'll need to plug in some PHP to detect when the Googlebot comes around (You have to cater to Googlebot :))
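A sketch of the idea in PHP (the user-agent check and markup here are illustrative, not the actual GHH code):

```php
<?php
// Emit the link hidden by default; drop the display:none style for
// Googlebot, since Google refuses to index display:none links while
// most other spiders crawl them regardless.
$style = ' style="display:none"';
if (stristr($_SERVER['HTTP_USER_AGENT'], 'Googlebot')) {
    $style = ''; // a plain, indexable link for Googlebot
}
echo '<a href="honeypot.php"' . $style . '>honeypot</a>';
?>
```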

This is also a pain, but it does the job. Other spiders aren't as smart as Googlebot and freely crawl links with the display:none style, so this technique completely hides the link from casual browsers while still letting it be discovered by Google.

Map Link
The use of image maps can be a quick way to link multiple honeypots: create nearly untouchable links inside an image.
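A rough illustration (filenames and coordinates are placeholders):

```html
<!-- One image, several tiny hotspots, each leading to a honeypot -->
<img src="banner.png" usemap="#pots" alt="">
<map name="pots">
  <area shape="rect" coords="0,0,1,1" href="honeypot1.php" alt="">
  <area shape="rect" coords="2,0,3,1" href="honeypot2.php" alt="">
</map>
```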

Buddy Link
Buddy linking is as simple as having other domains link to your honeypot. When they are crawled, spiders will hopefully follow through to your site. Casual users of your site are not likely to cause false positives, but users of your buddies' sites may, so it's a good idea to combine this with the tactics described above.

"Tattletale" Link
TELL the search engine where you are and forget about linking. Most engines have a URL-suggestion feature; Google has Sitemaps. If you don't feel like using the Python tool or writing XML, there's the option to submit a text file with URLs separated by CRLFs. Check it out here: http://www.google.com/webmasters/sit...id=us-et-about

GHH Theory

The nature of GHH is to be known but not seen, which is why working with GHH is challenging. The concepts of Google hacking and honeypots are simple; however, the design of the web and the design of a honeypot in tandem present the challenge of "hiding in plain sight" on the web. GHH is developed under that concept, which is useful in the creation of new tools related to these attacks.

Benefits of GHH include very early warning of a potential attack, by catching an attacker in the reconnaissance phase and learning their possible motives. GHH also improves other vulnerable targets' chances of survival on the web: by saturating a search engine index with specific false positives, it turns what was once a foolproof vector into a less reliable source of victims. So, in short, it benefits others as well.

------------

I attached the installation flowchart that was just released in v1.1 of the GHH package since it's kind of handy to have a visual nearby. Comments encouraged and appreciated as always.

Different file extensions for files like .mdb are not going to help at all... it is more likely that an experienced user would select the link for a "save as" rather than attempting to actually open the file. You're better off using a real .mdb file, else the attacker will know something is up before they've even begun.

Also, it seems simpler to just make an alternate page for googlebot with the link in plain sight and skip all the CSS crap. This should reduce your false positives to zero (by adding a refresh to deal with Google cache hits).

Originally posted here by catch Different file extensions for files like .mdb are not going to help at all... it is more likely that an experienced user would select the link for a "save as" rather than attempting to actually open the file. You're better off using a real .mdb file, else the attacker will know something is up before they've even begun.

Also, it seems simpler to just make an alternate page for googlebot with the link in plain sight and skip all the CSS crap. This should reduce your false positives to zero (by adding a refresh to deal with Google cache hits).

Neat ideas otherwise.

cheers,

catch

They select the link for a "save as"... and get caught. "Save As" is no different from opening the file; it behaves the same way. Behind the .mdb is a server-side script:

1. GET request is made
2. .mdb file is generated server side
3. Attacker details are calculated
4. Log is made
5. Generated .mdb file is returned.

That's how the GHH .mdb honeypot is designed. You were caught as soon as you made the GET request.
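The flow above might look something like this in practice (a simplification, not the actual GHH code; decoy.mdb and honeypot.log are placeholder names):

```php
<?php
// Sketch of the request flow described above
header('Content-Type: application/msaccess');        // present as an .mdb
$entry = date('c') . ' ' . $_SERVER['REMOTE_ADDR'] . ' '
       . (isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '-');
error_log($entry . "\n", 3, 'honeypot.log');         // steps 3-4: log details,
                                                     // incl. the referring query
readfile('decoy.mdb');                               // step 5: return the decoy
?>
```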

If you were to make an alternate page for googlebot with the link in plain sight, then users of your site may click the link, firing off the honeypot and creating a bad log entry. The CSS is there to hide the link from the user to prevent that from happening, but still allow the crawler to reach it. There are better and worse ways to handle this problem, as I stated.

Get caught with "save as"? At that point they've done nothing wrong. The point of honeypots is to bait attackers in order to telegraph attacks, to pick up zero-day techniques, or because you've got nothing better to do.

The point of allowing them to open the file would be to include information that baits the attacker further into the system (that would be a fun tutorial to write). "Save as" on a fake .mdb file is no different from pulling a real .mdb file, except that the attacker won't be tipped off that something is awry with your system.

If you were to make an alternate page for googlebot with the link in plain sight, then users of your site may click the link, firing off the honeypot and creating a bad log entry.

How may they click it? It wouldn't even exist unless they happened to be a googlebot. Having the script output the link on a variable basis is better than outputting CSS on a variable basis. You are likely to have more users without CSS than users browsing as googlebots.

cheers,

catch

PS. I used to have a honeypotted cmd.exe file on my web server... oh the logs I ended up with.

Originally posted here by catch Get caught with "save as"? At that point they've done nothing wrong.

Let's pretend my website sells knick-knacks. My GHH logs say someone is hitting my honeypots. The referral header says they've been searching for vulnerable software (which is being emulated) all day. That specific visitor? They didn't want my knick-knacks! Bingo: you have motive, and a reason to search for that IP in your IDS. That's how GHH is useful.

The point of honeypots is to bait attackers in order to telegraph attacks or to pick up zero day techniques, or because you've got nothing better to do.

Telegraph attacks? When someone searches your online floral business for admin.mdb databases and various other vulnerabilities, they've telegraphed that they aren't looking for flowers!

Want a honeypot that will detect a zero day? Customize one to emulate a version of software that doesn't have a known exploit.
Want a production honeypot to alert you to an attacker's presence (or in this case, motive)? Use one that has a known exploit, or that just has interesting garbage information completely irrelevant to your site (they won't reach it unless they actively google for it).

I feel like pretending again. This time, you own a retail storefront at Clark and Division. Wouldn't you like to know which visitors in your store intend to rob you? The same concept applies, except on the web. Did they find your store in the Yellow Pages? Or through a safe manufacturer's purchasing department, looking for owners of a vulnerable model of safe?

How may they click it? It wouldn't even exist unless they happened to be a googlebot. Having the script output the link on a variable basis is better than outputting CSS on a variable basis. You are likely to have more users without CSS than users browsing as googlebots.

Right off the bat, I think I misunderstood you, and I might be doing it again, so bear with me.

In order to allow maximum exposure to ALL spiders, I want the link echoed to EVERYONE. This places the honeypot in other search databases. Google happens to parse CSS; the other crawlers don't. I do not want to output the link on a variable basis; I want to output the link all the time, conforming to the visitor (Googlebot AND all other spiders).

I don't understand the last sentence in what I quoted. Hopefully I got the point regardless.

PS. I used to have a honeypotted cmd.exe file on my web server... oh the logs I ended up with.

Let's pretend my website sells knick-knacks. My GHH logs say someone is hitting my honeypots. The referral header says they've been searching for vulnerable software (which is being emulated) all day. That specific visitor? They didn't want my knick-knacks! Bingo: you have motive, and a reason to search for that IP in your IDS. That's how GHH is useful.

Unfortunately, there is no evidence of a crime here. By doing nothing more than following the Google link, they are not violating "appropriate use," since the link was provided by a third party, where it was placed by the "victim" (even more so with Google cached pages).

Telegraph attacks? When someone searches for admin.mdb databases and other various vulnerabilities on your online floral business, you've telegraphed that they aren't looking for flowers!

Not exactly what I meant, but fair enough.

Want a honeypot that will detect a zero day? Customize one to emulate a version software that doesn't have a known exploit.

Or just use the actual software, since this saves you the time of testing their attempts against a system that has little more than the appearance of valuable assets, and it ensures sound auditing.

Want a production honeypot to alert you to an attacker's presence (or in this case, motive)?

Unfortunately, this doesn't pin down a motive beyond exploring the capabilities of Google, no matter how exotic the link is.

In order to allow maximum exposure to ALL spiders, I want the link echo'ed to EVERYONE.

Ah... I disagree with this approach, as it only furthers the unpredictability of where the links may turn up, and in what form, potentially even reducing the honeypot's sensitivity below the internet noise floor. Google is so widely accepted and so favored for such "hacking" techniques for precisely those reasons... it is predictable and usefully sensitive.

Unfortunately there is no evidence of a crime here. By doing nothing more than following the google link they are not violating "appropriate use" as the link was provided by a third party where it was placed by the "victim." (even more so with google cached pages)

Yup, a GHH log sure as hell wouldn't hold up in court. But that's not the point; it's meant to support other logs and act as a cross-reference. It's not evidence of a crime, but neither is purchasing The Anarchist Cookbook. We could go on about this point, but I hope you're seeing the scope. You won't catch them on GHH logs alone, but you'll get them on everything else. GHH is useful for pointing out "everything else," and acts as support.

Or just use the actual software, since this saves you the time of testing their attempts on a system with little more than the appearances of valuable assets and ensure secure auditing.

And there's the difference between a low and high interaction honeypot.

Unfortunately this doesn't pin down a motive beyond exploring the capabilities of google. No matter how exotic the link is.

I disagree, I think it depends on the log and the story it tells.

Ah... I disagree with this approach, as it only furthers the unpredictability of where the links may turn up, and in what form, potentially even reducing the honeypot's sensitivity below the internet noise floor. Google is so widely accepted and so favored for such "hacking" techniques for precisely those reasons... it is predictable and usefully sensitive.

I see your point and can agree with it, but with the advent of Google's censorship, I'm inclined to let it be indexed in other databases as well.