Why is this? Well – a web application scanner can’t test attack surface it doesn’t know exists. How do most scanners determine an application’s attack surface? They typically:

Spider the application – The web scanner acts much like Google does when it indexes pages on the Internet: it analyzes a web page’s HTML, JavaScript and other assets to find links to pages that have not yet been visited, as well as any GET or POST parameters that may be passed to those pages. Scanners also look for the cookies an application sets so those can become part of the attack surface to be tested as well. For applications that require a user to be logged in to access functionality, this spidering process must be coupled with login and session-management capabilities in the scanner.
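The per-page extraction a spider performs can be sketched with the Python standard library alone. This is an illustrative sketch, not ThreadFix or ZAP code – the class name and the sample HTML are invented for the example.

```python
from html.parser import HTMLParser

class PageSurfaceParser(HTMLParser):
    """Collect outbound links and form parameters from one page's HTML."""
    def __init__(self):
        super().__init__()
        self.links = []          # hrefs the spider should visit next
        self.params = []         # (method, name) pairs discovered in forms
        self._form_method = None # method of the <form> we're currently inside

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])
        elif tag == "form":
            self._form_method = attrs.get("method", "get").upper()
        elif tag == "input" and self._form_method and "name" in attrs:
            self.params.append((self._form_method, attrs["name"]))

    def handle_endtag(self, tag):
        if tag == "form":
            self._form_method = None

page = """<a href="/orders.jsp">Orders</a>
<form method="post" action="/login.jsp">
  <input name="user"><input name="pass" type="password">
</form>"""

parser = PageSurfaceParser()
parser.feed(page)
print(parser.links)   # ['/orders.jsp']
print(parser.params)  # [('POST', 'user'), ('POST', 'pass')]
```

A real spider would then fetch each discovered link, repeat the extraction, and also record cookies and JavaScript-generated URLs, which is exactly where coverage starts to degrade.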

Guess – Web scanners can also make guesses about additional URLs that might be exposed, as well as parameters that can be passed in. These may be guesses at common URLs (like an admin/ directory) or permutations of previously identified objects (like looking for /index.php.bak if a site exposes the URL /index.php).
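The guessing step reduces to a wordlist plus permutations of already-discovered objects. The path and suffix lists below are tiny illustrative stand-ins for the much larger wordlists real scanners ship with.

```python
# Illustrative wordlists -- real scanners use far larger ones.
COMMON_PATHS = ["/admin/", "/backup/", "/test/"]
BACKUP_SUFFIXES = [".bak", ".old", "~"]

def guess_candidates(known_urls):
    """Generate candidate URLs to probe beyond what the spider found."""
    candidates = set(COMMON_PATHS)            # blind guesses at common URLs
    for url in known_urls:
        for suffix in BACKUP_SUFFIXES:        # permutations of known objects
            candidates.add(url + suffix)
    return sorted(candidates - set(known_urls))

print(guess_candidates(["/index.php"]))
```

Each candidate is then requested, and anything that does not come back as a 404 gets added to the attack surface under test.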

That’s all well and good, but what about pages and parameters that are missed by these approaches? Elements can be missed for a variety of reasons:

Weaknesses in the spider – No spider is perfect, so the spidering process can miss exposed attack surface. URLs might be discoverable by analyzing JavaScript, Flash and other site elements, but not all spiders take these into account.

Pages with no inbound links – Complex applications may have all sorts of pages that are not exposed to the spidering process. One example is a landing page that serves as an entry point to the application, with links back into other parts of the application, but with no inbound links pointing to it.

Invisible parameters – I once did some work on an application where every page would respond to a “d” parameter being passed in with a request by attempting to delete the order whose ID matched the value of the “d” parameter. (Yikes!) This appeared to be utility functionality that a site developer had placed in the application for debugging and convenience, and, thankfully, you would never find any “d” parameters when crawling the application. But the application behavior was there nonetheless. This isn’t the only time we’ve found application behaviors like this, and when we do find them, the impact of exploiting them is usually severe.

So seeding application scans with attack surface data gives us the opportunity to jump-start the spidering process and can help get us better scan coverage for applications that expose these sorts of hidden capabilities.

ThreadFix does a lightweight static analysis of the source code to create a database of mappings between attack surface points and the source code responsible for that attack surface. (The screenshot is of ThreadFix’s command-line Hybrid Analysis Mapping tool, shown to illustrate the underlying analysis.)

From OWASP ZAP, you configure the ThreadFix server and API key to be used to pull the attack surface data.

Then you select the application whose attack surface you want to retrieve.

And provide a relative base URL where the scanning will occur.

OWASP ZAP then pulls the attack surface data from ThreadFix, consisting of URLs and GET/POST parameters that will be used to seed ZAP’s spidering and scanning. Note that the ZAP scan now knows about the “admin.jsp” page and multiple “debug” parameters that would not have been found otherwise. This results in a more thorough scan and the identification of vulnerabilities that would otherwise have been missed.
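As a rough sketch of what seeding amounts to, the snippet below turns attack surface records (URL, HTTP method, parameter names) into concrete seed requests a scanner could visit. The record format, base URL and placeholder value are assumptions for illustration – this is not the actual ThreadFix data format or the ZAP API.

```python
from urllib.parse import urlencode

# Attack surface records of the kind pulled from ThreadFix: URL, method,
# parameter names. (Hypothetical format, not the literal wire format.)
surface = [
    {"url": "/admin.jsp", "method": "GET",  "params": ["debug"]},
    {"url": "/order.jsp", "method": "POST", "params": ["id", "debug"]},
]

def seed_requests(base, records, placeholder="1"):
    """Turn surface records into (method, url, body) seeds for a scanner."""
    seeds = []
    for rec in records:
        # Give every known parameter a benign placeholder value so the
        # scanner has a concrete request to mutate during fuzzing.
        query = urlencode({p: placeholder for p in rec["params"]})
        full = base.rstrip("/") + rec["url"]
        if rec["method"] == "GET":
            seeds.append(("GET", full + "?" + query, None))
        else:
            seeds.append(("POST", full, query))
    return seeds

for seed in seed_requests("http://localhost:8080/app/", surface):
    print(seed)
```

Seeded this way, the scanner starts out already knowing about admin.jsp and the debug parameters instead of hoping to stumble across them.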

This attack surface calculation provides the scanner with an exhaustive list of the URLs and parameters it needs to fuzz for a thorough examination of the application. This isn’t foolproof – a scanner won’t be able to fuzz parts of the application it knows about but doesn’t know how to get to. One example is a multi-step process like an e-commerce checkout. However, this sets the stage for running an analysis of the URLs a scanner should have hit, but did not. Down the road we might look to automate some of this checking as well.
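The coverage check described above reduces to a set difference between the endpoints the attack surface calculation produced and the endpoints the scanner actually requested. The URLs below are invented for illustration.

```python
# Endpoints the attack surface analysis says exist (illustrative values).
expected = {"/index.jsp", "/admin.jsp", "/checkout/step2.jsp"}

# Endpoints the scanner's traffic log shows it actually requested.
visited = {"/index.jsp", "/admin.jsp"}

# Known attack surface the scan never reached -- e.g. a mid-checkout page
# the scanner couldn't navigate to on its own.
missed = sorted(expected - visited)
print(missed)  # ['/checkout/step2.jsp']
```

Anything in the `missed` list is a candidate for manual testing or for teaching the scanner a navigation sequence (such as the checkout flow).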

These examples show how this technique can be used with ThreadFix and the open source OWASP ZAP dynamic scanner. We also have a plugin for PortSwigger’s Burp Suite, and I’ll follow up with a Burp-specific blog post before too much longer.

About Dan Cornell

Dan Cornell has over fifteen years of experience architecting and developing web-based software systems. He leads Denim Group's security research team in investigating the application of secure coding and development techniques to improve web-based software development methodologies.
Dan was the founding coordinator and chairman of the Java Users Group of San Antonio (JUGSA) and currently serves as the OWASP San Antonio chapter leader. Dan has spoken at international conferences such as RSA, ROOTs in Norway and OWASP AppSec EU.

By Daniel March 12, 2014 - 3:57 pm

By Daniel Miessler March 12, 2014 - 4:09 pm

By dancornell March 13, 2014 - 2:43 am

Daniel: Images updated – should be a bit more readable now. Thanks for the feedback.

Daniel Miessler: Cool stuff. Right now we have language/framework-specific support for Java/JSP and Java/Spring. We have a little bit of Python/Django (internally, not yet released) and we’re looking to add support for ASP.NET (C#), PHP and some others.

By Sam March 17, 2014 - 9:18 pm

That would be a good start, but it assumes that the traffic to the website has covered all of the landing pages, hidden pages, etc., as well as all the available parameters. I think that would be effective in finding landing pages that might not be seen by a crawl, but wouldn’t be as effective at finding debug/admin parameters that aren’t in common use. Certainly a better starting point than a raw, uninformed crawl.