Monday, 23 January 2012

When you're a company that provides both security consultancy (advice and code review services) and independent software development, there is always the danger of not practising what you preach. The problem is compounded when resources aren't as abundant as in larger companies and time is ideally spent either doing paid consultancy or writing new features for products. As a result we think we're a good Petri dish, so we're happy to talk openly about our experience, and the time costs incurred, of streamlining a Security Development Lifecycle during a recent development effort.

The story began when we set out to write our newest product (Recx GPS Image Forensics for Maltego). When starting a project, rigorously following a traditional SDL in the early stages may consume too much resource to be economically viable. We decided to use an 'SDL diary' (new buzzword alert) in order to maintain security mindfulness and track how much development time security cost us. Using a shared resource (such as a wiki) allows input that can be easily reviewed and recorded, and can be used as the basis for future documentation.

Simply keeping the SDL in mind during the early development/research cycle can guide a project in the right direction and ease the transition into a fully imposed SDL. What follows are notes from our SDL diary. Prior to product development starting we knew what we wanted to do and had settled on an open source component to build upon.

All code should be signed.

All new code should be written in a managed language to minimise the risk of certain classes of vulnerability.

All mitigations against successful exploitation of arbitrary code vulnerabilities in native code via OS and compiler/linker features should be leveraged where needed.

High risk native processes that work on externally supplied data should run with minimal privileges.

High risk native processes should be blocked from speaking directly to the network.

Any single crash when processing an image should not halt processing of others.

Input extracted from external data should not be trusted when generating output.

You'll notice we didn't stipulate a requirement of identifying possible memory corruption vulnerabilities in native code. The reason? We knew from the start that we had to contend with 20MB of C/C++ source code from open source components (GPL people: we've included the source in the installation binary to be compliant with the license). Why not review the code, either manually or using automation? Our risk analysis said that, if we implemented the other requirements, the likelihood of successful exploitation of any issues was low based on current understanding (the assumption); plus we would be reducing the process's privileges to the lowest possible, and thus the risk was worth bearing versus the effort. We accept mitigations are not a replacement for secure coding practices, but we felt they were sufficient given our deployment model.
If at this point you're questioning if someone is really going to target forensics software, we encourage you to go and read the paper from iSec Partners from 2007.

Design / Functional Requirements

The first requirement meant obtaining code signing keys, which was easy enough to do. We decided to use C# to satisfy the second requirement.

To satisfy the third requirement we wanted our code to be 'Recx SDL Windows Binary Auditor' clean (a soon to be released product which is currently in the final throes of legal review) and, to be doubly sure, Microsoft BinScope clean. Meeting this requirement meant that instead of using the typical GCC-produced binaries available from the open source projects, we would recompile them all with Visual Studio 2010 to leverage all the available defensive compiler/linker and OS features.

When we conceived the product, as we've previously mentioned, we decided to use an open source component for the image parsing, which is a mass of C++. We knew through our threat modelling exercise that this represented the biggest attack surface. We also recognized it was highly unlikely that we'd have the resources to run static or manual analysis. As a result we designed the solution to run this functionality in a self contained process that could run as Low Integrity under Windows. The idea being that if the process was successfully compromised, the impact on the overall system would hopefully be minimized (if the OS does what it says on the tin), satisfying our fourth requirement. For the fifth requirement we leveraged the built-in firewall in Microsoft Windows Vista/7/Server 2008 to specifically block our image processor from speaking to the network, by creating a rule during the installation process.
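
As an illustration of that install-time firewall rule, a sketch using the built-in `netsh advfirewall` interface might look something like the following (the rule name and executable path are hypothetical, not our actual product layout):

```shell
rem Hedged sketch of an installer step: block all outbound traffic
rem from the (hypothetical) image processing executable.
netsh advfirewall firewall add rule name="Block Image Processor" ^
    dir=out action=block enable=yes ^
    program="C:\Program Files\Recx\ImageProcessor.exe"
```

The rule is outbound-only; the processor never listens on the network, so no inbound rule is needed.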

To satisfy the sixth requirement we decided to spawn a new native process for the parsing of each image. This would also have the benefit of mitigating an attacker's ability to use exploitation techniques that rely on multiple images being parsed consecutively within the same process.
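
Our processor is native code driven from C#, but the one-process-per-image pattern is language-neutral. A minimal Python sketch (with `image_parser` standing in for a hypothetical stand-alone parser executable) might look like:

```python
import subprocess

def parse_images(paths, parser_cmd="image_parser"):
    """Spawn a fresh process per image so one crash can't halt the rest.

    `parser_cmd` is a hypothetical stand-alone parser; a non-zero exit
    code or timeout in one child only loses that image's result.
    """
    results = {}
    for path in paths:
        try:
            proc = subprocess.run(
                [parser_cmd, path],
                capture_output=True, text=True, timeout=30,
            )
            # A crash (non-zero exit) is isolated to this one image.
            results[path] = proc.stdout if proc.returncode == 0 else None
        except (OSError, subprocess.TimeoutExpired):
            results[path] = None
    return results
```

A crash-prone parser also gains a second benefit here: each image starts from a fresh address space, defeating techniques that groom state across consecutive parses.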

For the final requirement this meant stipulating a functional requirement that CDATA should be used for all image-originating data output in XML to Maltego.
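
To illustrate the requirement (a hedged sketch, not our actual C# output code): wrapping untrusted, image-derived text in CDATA needs one subtlety handled, because the value itself may contain the CDATA terminator `]]>`:

```python
from xml.dom.minidom import Document

def append_cdata(doc, element, value):
    """Append untrusted, image-derived text to `element` as CDATA.

    A CDATA section cannot itself contain the terminator "]]>", so the
    value is split so that every emitted section stays well-formed.
    """
    parts = value.split("]]>")
    for i, part in enumerate(parts):
        if i > 0:
            part = ">" + part      # the '>' reopens after the split
        if i < len(parts) - 1:
            part = part + "]]"     # ']]' safely ends this section
        element.appendChild(doc.createCDATASection(part))

# Hypothetical Maltego-style entity field holding EXIF-derived text:
doc = Document()
field = doc.createElement("Value")
doc.appendChild(field)
append_cdata(doc, field, "Canon <EOS> ]]> test")
xml = doc.toxml()
```

A parser reading the two adjacent CDATA sections back recovers the original string exactly, while markup characters in the untrusted value never become live XML.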

Implementation

For the implementation phase we performed the following security specific tasks:

Now for a little digression (sort of). The standard C# API has, simply put, not kept up with Windows Vista / Windows 7 security features. So if you're using the C# Process class there is no easy way to launch a process with a custom token and thus low integrity. Microsoft do provide an example of how to create a low integrity process in .NET in KB2278183, where they use CreateProcessAsUser. However, this example doesn't support redirection of stdin and stdout, and is frankly a little clunky compared to the standard Process class. While others have overcome a similar problem using CreateProcessAsUser in C# with stdin and stdout redirection, the solution was messy compared to the standard .NET classes, so we decided not to go with it. It's important to point out that Microsoft's private classes in System do come close to doing what we need, for example the private method:

So in our humble opinion it shouldn't be a huge leap to provide a supported public method that allows launching low integrity processes in .NET with all the niceness of the existing Process class. So Santa, if you're listening, I think you know what we're asking for. Anyway, the solution we used in the end? We used icacls.exe during the installation process to set the image processing executable to run as low integrity.
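
For reference, the install-time step amounts to a single command along these lines (the path is hypothetical); labelling the executable with a low mandatory integrity level causes processes created from it to run at low integrity:

```shell
rem Hedged sketch: give the (hypothetical) image processor a low
rem mandatory label so it launches as a low integrity process.
icacls "C:\Program Files\Recx\ImageProcessor.exe" /setintegritylevel Low
```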

Verification

For the verification phase we did a number of different things:

Ensured our XML output was using CDATA for image originating data so as to not provide a XML injection vector:

via manual inspection of code coupled with inspection of the XML output.

Using BinScope we had ~20 'failures' against the 'GSFunctionOptimizeCheck' with an error similar to:

Click for Larger Image

This is an interesting failure for several reasons:

The check isn't explained in the BinScope help.

Microsoft don't provide any indication on how to fix it.

It was present in large quantities in the debug build - as debug builds disable optimization by default, this was to be expected.

On the release build there was one instance, in a function with only a single local variable, two input variables and no output variables. So we put this down as a false positive.

People will likely ask why we didn't do fuzzing. Again, given our security requirements and design decisions we felt it would be like manually reviewing the code or using static analysis: yes, some value, but given the other mitigations, likely a lot of work for little extra value based on our current risk analysis.

Release / Response Planning

Even doing what we've discussed above, we knew we needed to prepare for the worst case. As a result we followed the standard practice of establishing a secure@ e-mail address to allow third parties to report any security issues should they find any.

Sustainment / On Going Actions

We recognize there is some not-so-obvious security debt associated with using open source (and some obvious debt, such as the need for a code review of all that C/C++ code when we're a massive success, to repay some of the security debt we've incurred). Anyway, as they say, what you don't pay for upfront you end up paying for in the end. In our case the non-obvious security cost is the ongoing need to monitor three open source projects for new releases. We need to review these releases to check whether they resolve obvious, and not so obvious, security issues. What does this mean in practice?

Monitoring the release pages for each project.

Reviewing change logs of new releases, even when not accompanied by a security advisory, to see if they resolve security defects which might not be obvious.

All of which adds to the on-going sustainment cost.

The Cost To-date

The total cost of following a streamlined SDL for our product was ~14% of our time prior to the release. This can be a substantial cost to a product that has not yet become profitable and we are currently testing new ways to balance this overhead without incurring too much 'security debt'.

Foot Notes
Something we would have liked to have done but is slated for a future release:

Distribute EMET and a configuration for our native process to provide mitigations to XP users

Friday, 20 January 2012

It's time for another notes.txt, this time from a long weekend in May 2009, shared with a few people at the time (about 80 on a mailing list). If you have the time/inclination to take it any further please do...

So during our travels in May 2009 we came across Extended Display Identification Data (EDID) [1][2]. What is EDID you might ask? Well EDID is the thing which tells the graphics card or operating system about the monitor you have just plugged in. From the wikipedia article [1]:

"The channel for transmitting the EDID from the display to the graphics card is usually the I2C bus. The combination of EDID and I2C is called the Display Data Channel version 2, or DDC2. The 2 distinguishes it from VESA's original DDC, which used a different serial format."

Now why is this interesting? Well, there are some variable(ish) length strings (see the specs bit below):

Now the specification says (originally at the end of page 13, in a file called EEDIDguideV1.pdf, which VESA have removed/hidden from their site since 2009 - although you can find copies down the back of the Internet's sofa if you break out some Google-fu):

"Descriptors identified by tag numbers FFh, FEh and FCh contain ASCII strings. The length of the string in the descriptors is limited to 13 characters. Strings that are less than 13 characters must use the line feed character (0Ah) following the final string character and the space character (20h) for all remaining bytes of the descriptor. Use of 00h or 01h to fill remaining bytes is not compliant."

Now the fact that it is hard limited means it is unlikely there will be a memory corruption vulnerability (unless the format string gods are kind to us). But let's look at a Linux implementation [4] anyway (the bit that copies the string):

We see this is well and truly safe. The only other implementation [6] we found in our five minutes of searching back in 2009 didn't guarantee NULL termination, but no great shakes. It would have had more impact if the destination was a structure, as in the first, as we might have been able to effectively build one big string across the three structure fields (subject to compiler structure padding, of course).

So in our haste in reading the first set of specifications, we missed the extension flag on page 14:

"3.8 Extension Flag and Checksum: 2 bytes offset 7Eh-7Fh"

Extensions, we like extensions; and didn't we find a good one [8] (the other one we found wasn't interesting [9]):

It supports UTF-8, UTF-16 and UTF-32 encodings, variable length strings etc. What could possibly go wrong? Sadly, however, we searched high and low and could not find an open source or operating system tool which implemented this feature at the time, or even code which parses the extension blocks, to perform a sample source code audit upon.

This closes another (and rather dusty) notes.txt, we hope you enjoyed it.

We've done many application source code assessments looking for security issues. The outputs are essentially lists of risks or vulnerabilities identified in the product, described in detail along with relevant suggested mitigations, workarounds or example code. Clients typically exercise this sort of assessment on an infrequent basis: per release, six-monthly or annually. This raised the question internally of how best to perform security assessments of applications and how to integrate the results into the Software Development Life-Cycle.

Discussions that we've had with developers over the use of security testing have raised some interesting points. One developer we talked to said they got great value from free trials of code auditing tools, but didn't feel that the value was sufficient in many cases to justify purchase. Essentially the tools were used to modify the development process, educating the developers and refining coding practices. If a tool reported an exposure, that exposure was investigated, the code fixed, and the fix applied not only to all examples in the code (identified or not) but also to the approach taken to all new code. In our experience this is very much an exception to the norm.

The norm appears to be formal, semi-regular assessments of the code, typically late in the life-cycle, when code is either:

Soon to enter beta.

Soon to ship to end customers.

Soon to enter production state.

We see lots of requests for security consulting at the sharp end of a development cycle: the product is ready to ship or enter production, and now needs a security rubber stamp. Does this approach work? Of course it doesn't (unless you have exceptionally security savvy developers). The reality is that with more and more development moving away from waterfall models to agile models, where iterative, daily or continuous integration approaches are used, there needs to be a parallel approach for security testing. With more agile development models you don't only do quality assurance testing at the end; as security is a measure of quality, it follows that you shouldn't leave security testing to the end either. As quality assurance testing (functional, new feature and regression) has adapted to this evolution, so should security testing.

The reality is that with security testing done at or near the end of a development cycle there can be pressure on the assessment team to find less or downgrade findings so as not to derail the release schedule. Our pub-fuelled conversations (at the not so sharp end) have raised the point that customers can, at times, want an 'everything is ok' assessment service, the 'tick in the box', where they get security sign-off and things are good to go.

Diligent assessment of an application performed on an infrequent basis tends to have the following characteristics:

Highly detailed reporting.

Large volume of results with detailed analysis.

Can be overwhelming for the application developers, particularly if they're not security savvy.

Large time investment in fixing code and mitigating exposures.

New feature development suspended whilst fixes are developed, deployed and tested.

Often perceived as a restrictive process on the development practice.

The developer perspective of having their "homework" marked.

Can cause developers to reject security due to the impact an assessment has on progress.

The assumption is, of course, that it's better to have visibility of security exposures within an application. A comprehensive audit can educate the developers, potentially to a point where the assessment creates an advanced process in which routine checks against coding standards and criteria are augmented with tests for security faults. An approach that we like is that of making wrong code look wrong, blogged about by Joel Spolsky.

The list above (which is by no means exhaustive) raises some questions: is formal infrequent testing the best approach, and should security exposures in an application be treated any differently to any other software bug or fault? In our opinion, no: as we've said previously in this post and others, security is just another measure of overall quality, just with different risks associated with the bugs (or vulnerabilities, as they're also known). For more on these different risks refer to our earlier post 'The Business v Security Bugs'.

One of the things we've observed (which helped form the basis for this post) was the iterative use of our online ApexSec engine against the same set of Oracle Application Express source code. At regular intervals over a period of around a week, the same application was uploaded to us and analysed. Each time, fewer and fewer issues were identified, to the point where (we assume) the residual risk was deemed acceptable. Although for the developers this was the execution of an external process, the principle of frequent security analysis or inspection does appear to have distinct advantages:

Security issues are treated the same as other routine software bugs.

Smaller volume of findings at any one time.

Becomes part of the daily development routine.

Encourages secure development.

Educates developers to embrace security as just another facet of development.

Limited impact to onward development.

To us, frequent security assessments of an application, integrated into the development cycle, make a great deal of sense. Using software to inspect your source code as part of the daily/nightly build process, identifying vulnerabilities in the same way as any other software bug, seems more than sensible. Of course there will always be a place for manual assessment by human beings to catch the issues that static analysis just can't today, such as the logic issues that are typically not present in traditional systems (ever try walking into a betting shop and entering a negative sum on your betting slip?).

In summary, as Secure Development Life-cycles have taught us, security must exist throughout the development process. The quality gates that an SDL sets up are good as final checks before progression; however, security must be a consistent work item in the same way features, testing and overall quality are. We'd go as far as saying security must be a theme, not just a work item. Pushing tooling out to developers so they can identify new security issues on a daily basis is a powerful weapon in the software security war. When integrated into the secure development process, staff who perform manual analysis throughout development can create a potent mix for finding, eradicating and reducing new security issues before they see wider release.

Tuesday, 17 January 2012

We're very happy to announce a new set of local Maltego transforms and supporting entity types. These transforms will be available to customers in the next week or so (contact maltego@recx.co.uk for purchasing and pricing information) and are currently going through the final throes of beta testing.

We recognized that having a large collection of images in a forensic case can make it difficult to make sense of them. During a case investigation you may want to perform a number of different investigative procedures on the images:

See images taken in the same location.

See images taken in the same location but with different devices.

See images taken in the same location but altered via software.

Search for images taken in a certain location across your acquired set.

As a result we developed a new set of local transforms and entity types for Maltego to address these needs. To start we introduced a new entity type called 'Filesystem Path', as the name implies it allows you to specify the local path upon which to run the transform.

The 'Filesystem Path' has an optional field of 'Location' which allows you to restrict the search to only the images taken in a particular physical location. This physical location might for example be:

Road or street name.

A town or city.

State or country.

Country.

At this point we can run our first transform, 'Discover Images with EXIF Data'. This transform will search the provided path for all images containing EXIF data with GPS properties. If you do a location search, the GPS data for each image is extracted, resolved and compared to the location you provided. In return you'll receive a number of image entities back, including thumbnail views (contact us if you wish to have this feature disabled due to the material you're working with).

In the detail view of the image entity there are a number of other features which can be seen in the screenshot below. These include a larger view of the image and a link to view the full sized version.

Click for Larger Version

Next we're able to run our second transform, 'Extract EXIF Data'. This transform goes through each selected image and extracts a number of properties as separate entities, as can be seen below:

Click for Larger Version

The transform is able to extract the following image properties:

Manufacturer and device type.

GPS location.

Image time.

Original image time.

GPS image time.

We're now free to use Maltego's existing transforms on the GPS location to convert it into a place name or specific street. We've also populated the detailed view of the GPS object with a quick link to Google maps.

This gives us an end to end workflow looking something like:

Click for Larger Version

In the above example we worked with a single image to walk you through the process. If we now scale this up to work with a larger set, we get something like this:

Anyway we hope you've enjoyed this brief introduction to our upcoming release. We think the Recx EXIF GPS Image Forensics Pack for Maltego combined with the new Casefile entities in the upcoming Maltego 3.1 make a powerful combination for photo forensic investigators. If you're interested in purchasing a copy or finding out more please contact us via e-mail (maltego@recx.co.uk), use our on-line form or give us a call.

Saturday, 14 January 2012

For every vulnerability, blog post, whitepaper or conference presentation there are numerous dead ends, blind alleys and wasted hours that don't bear fruit. However, researchers rarely talk about these expeditions even though the workings and thought processes might be interesting; instead the findings are consigned to a notes.txt in a folder somewhere, to be forgotten about. When these failures do get discussed, though, they can be insightful; a good example is the now infamous Ben Nagy e-mail 'How to FAIL at Fuzzing' from December 2010.

While we don’t promise to be as amusing or insightful as Ben, this is our story from Friday 13th January, 2012. In the middle of the afternoon I tweeted:

This tweet prompted a private response, asking for more information. So join us on a tale that started with reading a totally unrelated set of guidelines, then a Friday morning spent digging, to come out with nothing. This is our notes.txt for ‘BMP and ICC’.

Say what? Bitmap (BMP) images able to access local files? This has got to be interesting; the low hanging fruit bell goes off. The initial theory was that this would be a good way to trace when files were opened via UNC paths. With a hop, skip and a jump, off to Google we went looking for details. It turns out that Oracle are a bit anxious about this attack, as they've suffered a couple of bloody noses in the past:

The BMP image parser in Sun Java Development Kit (JDK), when running on Unix/Linux systems, allows remote attackers to cause a denial of service (JVM hang) via untrusted applets or applications that open arbitrary local files via a crafted BMP file, such as /dev/tty.

But how does it work? It turns out that ICC color profiles are not limited to BMP images; instead they're a high-level graphics standard which provides the following functionality:

Profiles describe the color attributes of a particular device or viewing requirement by defining a mapping between the device source or target color space and a profile connection space (PCS).

OK, so? Well back to BMP. First we dug into the BMP file header (who knew there were five versions). Thankfully Microsoft provide a clean structure for working with BMP headers. The key part of the header is:

bV5CSType - PROFILE_LINKED. This value implies that bV5ProfileData points to the file name of the profile to use (gammas and endpoints are ignored).

OK great, so a quick tool was hacked together to set bV5CSType to PROFILE_LINKED and bV5ProfileData to 0xFFFFFFFF (the theory being that any application parsing this field would either crash or at least throw a first chance exception). And... drum roll... nothing in any of the BMP parsers on our Windows 7 machines, other than some colour in what was previously a completely white image.
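
For the curious, a rough Python sketch of what such a header-patching tool might do (offsets follow the documented BITMAPV5HEADER layout after the 14-byte file header; this is our illustrative reconstruction, not the original tool):

```python
import struct

def point_profile_out_of_bounds(bmp_bytes):
    """Patch a BMP carrying a 124-byte BITMAPV5HEADER so that
    bV5CSType = PROFILE_LINKED and bV5ProfileData = 0xFFFFFFFF."""
    FILE_HDR = 14                         # BITMAPFILEHEADER size
    dib_size = struct.unpack_from("<I", bmp_bytes, FILE_HDR)[0]
    if dib_size != 124:
        raise ValueError("not a BITMAPV5HEADER BMP")
    out = bytearray(bmp_bytes)
    PROFILE_LINKED = 0x4C494E4B           # the 'LINK' multi-char constant
    struct.pack_into("<I", out, FILE_HDR + 40, PROFILE_LINKED)   # bV5CSType
    struct.pack_into("<I", out, FILE_HDR + 112, 0xFFFFFFFF)      # bV5ProfileData
    return bytes(out)
```

Any parser that honours PROFILE_LINKED should at least stumble over the nonsense profile offset, which was exactly the behaviour we were probing for.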

So we ran our new BMP header tool on Chris's test case from 2007. Hmmm, the output sort of looks correct (red text), but not exactly, compared to the Microsoft specification:

bV4CSType (Recx note: also known as ColorSpaceType) - The color space of the DIB. The following table lists the value for bV4CSType. LCS_CALIBRATED_RGB - This value indicates that endpoints and gamma values are given in the appropriate fields. See the LOGCOLORSPACE structure for information that defines a logical color space.

lcsFilename - A null-terminated string that names a color profile file. This member is typically set to zero, but may be used to set the color space to be exactly as specified by the color profile. This is useful for devices that input color values for a specific printer, or when using an installable image color matcher. If a color profile is specified, all other members of this structure should be set to reasonable values, even if the values are not completely accurate.

But it looked more like a v5 than a v4 header. The reason we say this is that Microsoft defines PROFILE_LINKED as 'LINK' in WinGDI.h, compared to '3' in Java and ImageMagick. So it looks valid for Java but not for Microsoft, although the size indicates a v5 header.

Got it; now we realised there are no CRC checks on any of this. Take Chris's original evil2.bmp, change the path contained within it (to c:\Moooo instead of /dev/tty), fire up Process Monitor from Microsoft Sysinternals, load the BMP file in every installed application that can parse BMPs and... drum roll... nothing...

At which point we came to the conclusion that this is an obscure feature that basically no one supports. So how did Java get rolled? Off to the code we went. Sure enough, we found where Java had its issues:

So we can see the code, understand how the issue came about in the Java implementation, and see the patches they've put in. We also had a quick look at a number of BMP parsers and none seemed to support this external file feature for ICC colour maps. But alas, a vulnerability that had promise didn't yield anything, during our three hour investigation at least.

We don't think all is lost if someone wants to pick up the baton; there may be mileage in other formats that support similar ICC features, or in high-end graphics applications we don't have access to. For example, here are some quotes we found during our investigation regarding other formats that appear a little interesting.

PDF/VT-2 is designed for multi-file exchange and based on one of PDF/X-4p, PDF/X-5g, or PDF/X-5pg. PDF/VT-2 documents can reference external ICC profiles, external page contents, or both. A PDF/VT document and all its referenced PDF files and external ICC profiles are collectively called a PDF/VT-2 file set.

ICC profile part names SHOULD contain four segments, using "/Documents/n/Metadata/" as the first three segments, where n is the fixed document that uses these parts [S2.30]. If an ICC profile part is shared across documents, the part name SHOULD contain two segments, using "/Metadata/" as the first segment and a second segment that is a string representation of a globally unique identifier, followed by an extension [S2.30]. ICC profiles SHOULD use an appropriate extension for the color profile type. [S2.30] [Example: ".icm" end example]

For us, for now at least, it's time to save and close our BMP ICC notes.txt file.

Note: Updated 3 hours after the original post to add a little more information and correct a few small errors.

So what's the purpose of this post, you may ask? Well, we needed a quick way to enumerate which aspects of the system were accessible from low integrity processes on Microsoft Windows, to aid with the SDL verification phase. So we wrote a small utility to do exactly that. It enumerates different objects and looks for a mandatory label at low integrity. Currently the tool enumerates the following aspects:

File system

Registry

Objects

Named pipes

The way we implemented it was as follows: first enumerate the objects (via their respective mechanisms) and then call the following functions on each:

If you're interested in using the utility, it can be downloaded in binary form from here. The tool has been statically compiled against the CRT, so you won't need the correct redistributable installed for it to work.

Before posting we checked with Tom Keetch to see if he was aware of any other tools that would do something similar, as we didn't want to waste people's time. Tom pointed out that AccessChk from Windows Sysinternals can be used to do something similar with the -w -e command line options, but won't specifically filter out just the low integrity objects (accepted that you could do some grep-foo to post-process the output).

Tom also mentioned that the Attack Surface Analyzer from Microsoft may also flag low integrity accessible objects. The downside of Attack Surface Analyzer is that it needs to be run before and after product installation, so it may be a little too cumbersome in some situations, specifically if you've been given an already installed box to assess.

Anyway we hope you find the tool useful and if you have any feedback, bugs or omissions please do get in touch.

"Once systems are up and running, deploying a critical security patch out of the regular patch cycle introduces a greater risk of outage and therefore, a failure to meet SLAs. Customers want a regular predictable patch cycle they can build into their SLAs, emergency critical patches screw these up."

We paraphrased and incorporated Phil's input into our post; as a vendor, whether or not your clients will deploy a patch is a key driver of the patch development process. Patches, although core to overall security, are relative to the risk position of the business. The 'risk appetite' of a business is not something to be over- or under-estimated, and at times we've been very surprised. For example: the organisation considering an upgrade of its user network to gigabit Ethernet as a solution to the network slowdown caused by a rampant Conficker infection; they understood the cause, but an upgrade was seen by them as the 'least cost' option for dealing with the effects. For us, as a security consultancy and software development business, it's sometimes a challenge to understand the mindset of clients who have ravenous risk appetites, particularly when you're being paid to advise them on their technical risks.

The reality, of course, is that within a business there is a huge amount of decision conflict between the different risk positions of suppliers, stake-holders and customers. Balancing the three is a complex task where, more often than not, at least one party is left disappointed and with an uneasy feeling. However, Phil has presented an interesting perspective which is worthy of some more detailed discussion.

Service Level Agreements (SLAs) are commonplace; they exist at various levels, but mostly between a business and its suppliers and between the business and its customers. For a large proportion of the time, the focus is on the upkeep of the SLA: for example, ensuring that an Internet connection is maintained with five-nines [1][2] availability rather than at the latest patch version of the router; or ensuring that customers can check out of your web store within certain usability time frames rather than implementing additional security checks to the input/output routines that may come with a performance penalty.
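To put five-nines in concrete terms: 99.999% availability leaves a budget of only about 5.26 minutes of downtime a year. A quick sketch of the arithmetic (the downtime_budget helper below is our own, purely illustrative):

```python
def downtime_budget(availability_pct, period_hours=365 * 24):
    """Return the permitted downtime, in minutes, for a given
    availability percentage over a period (default: one year)."""
    unavailability = 1 - availability_pct / 100.0
    return period_hours * 60 * unavailability

# "Five nines" leaves roughly 5.26 minutes of unplanned downtime a year
print(round(downtime_budget(99.999), 2))  # -> 5.26
# Three nines, by contrast, allows nearly nine hours
print(round(downtime_budget(99.9), 1))    # -> 525.6
```

Every scheduled outage, emergency patch and genuine fault has to fit inside that budget, which is why the window is guarded so jealously.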

A typical SLA will have provisions which permit the provider to perform scheduled work such as maintenance, often outside the guaranteed service agreement. This leaves, as Phil implies, the 0.001% for unplanned events, which encompass all manner of outages and disruptions as well as unplanned emergency security upgrades or mitigation exercises. The nature of emergency upgrades/changes is such that, unlike scheduled outages, they don't necessarily have the same logical planning and management oversight, and are often deployed hurriedly to mitigate an exposure, which in itself is not a risk-free process.

Of course, with a system implementing high-availability there should be sufficient resilience and resource to take down any component to perform maintenance without impacting the SLA; likewise there should be a structured duplicate test environment through which patches can be rolled out to provide detailed impact analysis. However, this is not always (and, if we're being honest, rarely) the case, and with budgetary constraints getting ever tighter in all sectors, spare equipment or capacity is constantly being eyed up for repurposing. However robust a system, the penalties of breaching the SLA are often too commercially great to risk implementing unplanned emergency changes to a functioning system, when that window may be required for an actual fault or outage.

The stance you take as a vendor will likely never please all of your clients. They have to deal with the negative perception of service updates and outage announcements to their customers; and how reactive organisations appear regardless of whether the choice is their own. Of course, if that vulnerability is being actively exploited as a vendor you can't appear to react fast enough or provide detailed enough advice; irrespective of whether that impacts your normal patch development process and diligence. Sometimes you just have to be seen doing something.

Of course, you could read Phil's comment and say 'why do we care?'. An emergency patch was required and as such you released it; if the SLAs maintained by your customers mean that they can't roll the patch out without risk of breach, surely that's their problem, and maybe they should have negotiated their terms more effectively. Although commercially harsh, this is a valid standpoint. As a business you can't bend to the whims of every customer. Inevitably you will have one who wants a patch yesterday, and another who'd like the whole thing delayed a year or so. There is also the potential for customers to place pressure on the vendor to downgrade or upgrade the classification of a vulnerability in order to avoid or force the implementation.

Providing work-arounds or alternate mitigations is an approach vendors often take. As is implementing solutions which support 'hot patching', preserving system up-time whilst allowing risks to be mitigated. However, neither of these approaches is sufficiently different from the deployment of an emergency patch to make them anything other than optional decision routes on the same branch.

The business question we're asking is how much weighting you should apply to customer SLAs when making the decision to develop an emergency patch for a security exposure in your product; and of course there is no definitive answer. Advising your clients to build provisions into their service and customer SLAs to allow for reactive implementation of security fixes may be one solution: provisions that factor in security related actions without negatively impacting the overall SLA. Of course, this is ripe for abuse by unscrupulous providers looking to squeeze the agreement when balancing on the 0.00149% boundary.

Emergency patches do throw a spanner in the works. A regular, predictable cycle of patch releases, each robust and diligently tested to the nth degree, is the ideal; but it is almost impossible to achieve without offloading considerable risk onto your clients. Emergency patches will be released for your systems; it will happen and you will have to react to them. How you react, how you balance the risk and how big your risk appetite is are all relative. However, we feel that having the security of your system potentially compromised due to an overbearing or highly restrictive SLA is unlikely to be a solution with solid foresight.

With more and more companies moving services onto hosted or cloud solutions, the bigger security question is what impact that 99.999% availability has on your security position. How are they patching the hardware, software and applications providing your environment, and what provisions are there within your SLA for reacting to unplanned security risks?

Tuesday, 3 January 2012

The title might sound like a court case and, in reality, security bugs, like all defects, are an annoyance that the business has to deal with on an increasingly regular basis, which may lead to it feeling like one. We say annoyance as security bugs are exactly that: without their existence life would be simpler. When feature complete, functional bugs may exist, but the business can ship product safe in the knowledge that if the bugs are stumbled upon they won't likely be splashed all over the BBC, The Register, H-Online or CERT's front page, and their key customers won't be phoning for a chat.

We think it’s safe to say that software security vulnerabilities (and likely hardware too) are going to be a fact of life for at least our careers (and we’re being optimistic). This statement is based on the fact they’ve been around for over forty years already and there are no signs of the volume or diversity slowing down. This leads us to an on-going decision battle that all businesses who write software have to wrestle with when shipment time nears - ‘Should we stop the release for a fix?’ In reality the question when asked explodes into ‘Should we stop the release to do root cause analysis, development, testing, regression testing, packaging and then release?’.

The modern reality is that this needs to be a risk management decision. On one side you have the business and on the other the engineering facts about the vulnerability or vulnerabilities. The alternate perspective is that software development organizations, inside every sector, have wrestled with this problem since the first programs were written. Bugs, both new and old are found in every release, at every stage of the development cycle; which leads us to the question: what does a ‘stop release’ bug actually mean? Much has been written about end user organization software vulnerability/patch risk management, but little in the context of software security from an independent software vendor release risk management perspective.

“A bug bar is a quality gate that applies to the entire software development project and defines the severity thresholds of security vulnerabilities—for example, no known vulnerabilities in the application with a “critical” or “important” rating at time of release. The bug bar, once set, should never be relaxed.”

“... none of these factors should play any role in determining whether to fix security bugs—that is, bugs that could potentially lead to security vulnerabilities in the product. Classification of security bugs must be objective and consistent. It doesn’t make any difference to an attacker that you found a vulnerability only a week before your code-complete milestone; he’ll exploit it just the same”

We agree that vulnerability classification should be objective and consistent. However, this classification alone will rarely be used to decide to delay product shipment in organizations where security is seen as a cost. Software vulnerabilities are a reality that organizations who develop software have to deal with, and in our experience it is rarely a purely objective decision unless the business both embraces security and is exceptionally flexible, with cash to burn.

Ask The Questions

If your product or system has vulnerabilities that are sufficiently severe in terms of impact or importance, there are a number of questions that require answers before a release delay decision can be made. We've raised some example questions below for the business, engineering and security teams; the answers can then be fed into a risk decision process or model.

Of The Business

What are the direct costs of delaying the release? Financial cost to the business if the planned release date is missed.

Are there any overriding internal factors? Internal factors aside from fiscal costs which mean that the shipment can't be delayed, such as dependent products.

Are there overriding external factors to consider? External non-cost factors, which mean the shipment can't be delayed, like external commitments.

What are the direct costs if an emergency patch is required? Financial costs associated with producing, testing, releasing and communicating an out of band / emergency patch.

Are there any indirect costs if an emergency patch is required? Various indirect costs which include company, brand and product reputation, customer good will and other less tangible costs.

Will clients deploy an emergency / critical patch? Once clients have deployed a system, a critical security patch out of the patch cycle potentially introduces a greater risk of outage and therefore a failure to meet SLAs. Customers typically want a regular, predictable patch cycle they can build into their SLAs. If emergency critical patches will be precluded from deployment in the majority of customer deployments, then there is a counter-argument to developing / releasing one at all.

Of The Engineering and Security Teams

Is this an internal or external application? The position of the application, such as: is it purely internal, Internet facing, in active support or being sold to end customers.

Who is the discoverer? Whether the source of the exposure is internal or external to the business and the way in which it was raised.

What is the technical risk of the vulnerability? Microsoft's STRIDE (although their DREAD questions will likely need answering too), CVSS or an equivalent scoring scheme.

What is the likelihood of discovery? A purely subjective view that could well be wrong, but used as an indication.

How complex is the fix? Requires root cause analysis to have been completed followed by a technical estimate from the engineering team.

What sort of testing effort is required? Once the impacted code has been identified (through root cause analysis) the scope and scale of the quality testing required can be understood and effort estimates created.

When is the next patch opportunity? Next scheduled patch opportunity or release window. This then becomes the period of exposure for the business, which, when combined with typical patch application times, gives the customer exposure period.

Are there any mitigating factors? Anything else not captured previously that could be used to define a business case for either supporting or delaying the release.
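On the technical risk question above, CVSS at least gives something repeatable. As an illustration, the CVSS v2 base score can be computed directly from the published equations; the cvss2_base helper below is our own minimal sketch, not a library API:

```python
# Metric values and equations from the CVSS v2 specification.
AV = {"L": 0.395, "A": 0.646, "N": 1.0}    # Access Vector
AC = {"H": 0.35, "M": 0.61, "L": 0.71}     # Access Complexity
AU = {"M": 0.45, "S": 0.56, "N": 0.704}    # Authentication
CIA = {"N": 0.0, "P": 0.275, "C": 0.660}   # Conf/Integ/Avail impact

def cvss2_base(av, ac, au, c, i, a):
    """CVSS v2 base score for a vulnerability described by its metrics."""
    impact = 10.41 * (1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a]))
    exploitability = 20 * AV[av] * AC[ac] * AU[au]
    f_impact = 0 if impact == 0 else 1.176
    return round((0.6 * impact + 0.4 * exploitability - 1.5) * f_impact, 1)

# AV:N/AC:L/Au:N/C:C/I:C/A:C - remote, unauthenticated, total compromise
print(cvss2_base("N", "L", "N", "C", "C", "C"))  # -> 10.0
# AV:N/AC:M/Au:N/C:P/I:P/A:P - a typical remote partial-impact flaw
print(cvss2_base("N", "M", "N", "P", "P", "P"))  # -> 6.8
```

The number on its own decides nothing; it is simply one consistent, objective input to feed into the wider decision process.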

Without being in a position to answer most of these questions, it is unlikely that the business will be able to make an informed risk decision, and it will likely push back or just ship regardless. Of course, we can rely on gut instinct and experience (or shouting); but if we want something repeatable that can stand up to scrutiny (and the inevitable worst case fallout) then you'll likely want to consider adopting something similar: a list of questions to inform the decision process, prior to presenting the case to the judge and jury (aka the business) on the risk being carried with a release.

At which point we should probably be posting an elegant equation that is a panacea to software vulnerability risk and gives a yes or no answer to the 'should we ship?' question. But we're going to point out the uncomfortable truth and say it's just not that simple. There will be horse trading, there will be discussion, there will be bartering. So be prepared and don't be a risk automaton.

“Our results show that vulnerability disclosure leads to a significant loss of market value for software vendors. This indicates that the stock markets react negatively to the news of a vulnerability disclosure, because the discovery of a vulnerability could suggest a loss in future cash flow for the software vendors.”

This conclusion comes with many caveats, so we encourage you to read the paper rather than take our salacious, out of context quote as read. For example, it doesn't deal with internal applications, nor touch upon positive security vulnerabilities. By positive security vulnerabilities we mean those that allow users to exercise a product in a way the original manufacturer did not intend, yet through the bug's existence, subsequent exploitation and press coverage lead to more sales of the product in question or improved brand recognition. Additionally, if you've never experienced a public security vulnerability you'll be unlikely to be able to answer how much it costs to respond.

Finally, it's important to keep in mind that even with the most comprehensive risk analysis process, the reality is that the final veto lies with the business. If the appetite exists to assume the risk, no matter how grave, then we as security professionals must provide support whilst documenting, planning and mitigating as much as possible given the constraints placed upon us.

In conclusion, we don't feel bug bars and similar on their own are enough, given the reality in most software development organizations. The decision of whether a product should or should not ship given the presence of serious security issues is a complex one. However, a business-aware risk decision process which captures input from all stakeholders can remove barriers and reduce confrontation, whilst supporting the goal of security teams, assessors and accreditors - and more secure products in the long run.