Nessus scripts and Moore's Law

[. I have been thinking about this for some time, and I recently discussed all this privately with a couple of people ]

Last year, we had fewer than 2000 Nessus scripts, if I remember correctly. Today we have 5570. Although some of them are related to a brand new feature (local tests) and are automatically generated, it seems that the number of "classical" plugins increases faster than Moore's Law (doubling every 18 months).

In the very far future, we will not be able to find a computer fast enough to run the full test set before the target machine becomes obsolete. For now, this is only a nice topic for a geek scifi story :-)

The situation is not dramatic: most "local tests" are skipped when you run "optimized" tests. As I did not like some implications of script_exclude_keys(), I added an "optimization_level" setting in nessusd.conf - the default (old behaviour) is 3, but I set mine to 2 instead, which only enforces script_require_ports() and script_require_keys(). Lowering it to 1 is not recommended, as the useless local tests will then be run (script_require_keys() is no longer enforced, only script_require_ports()).
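For concreteness, here is a hypothetical nessusd.conf fragment matching the description above (the option name and values come from the text; the comments summarize the described semantics, and the exact syntax may differ between versions):

```
# optimization_level = 3   default (old behaviour), also honours script_exclude_keys()
# optimization_level = 2   enforces only script_require_ports() and script_require_keys()
# optimization_level = 1   enforces only script_require_ports() - not recommended
optimization_level = 2
```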

However, as we pile up new tests, scans are expected to run (a little?) slower. Removing old tests is definitely a bad idea, as I have seen machines that had not been patched for more than 5 years connected to the Internet! Why they were not rooted remains a mystery...

I have come up with several possibilities to reduce the overall scanning time:

1. Cut down the number of banner-matching plugins and replace them with generic tests. Thus, we might even find unpublished flaws. But:
- not everything can be tested easily;
- most security advisories are fuzzy, and reproducing the bug is not easy;
- generic tests are often dangerous and would be declared ACT_DESTRUCTIVE_ATTACK, i.e. disabled by people who run "safe checks".
IMHO, "safe checks" should not be used most of the time. Users ask for 99.99% uptime and ultra-high reliability, but they often do not need it. The exceptions are few: medical life-support systems (don't even scan them!), big banking / money-transfer systems (but you may have time to play between two cut-offs), round-the-clock worldwide airline reservation systems...

2. Rewrite the NASL interpreter using a VM. According to the gforth / vmgen developers, such an interpreter might be 10 to 100 times quicker than a simple "syntax tree walker" (which is what the current NASL2 is). But:
- some people say the speed gain is not that great, and definitely not worth the trouble;
- the current code is clean and flexible; a stack-machine-based interpreter might not be;
- CPU is not that much of an issue except on huge networks. In my opinion, memory is (because I usually use an old laptop to scan a couple of machines, maybe? :)

3. Try other "simple" NASL optimizations. I have implemented a cache for include files (they are parsed only once for all plugins) in the CVS head branch. Even if it saves only 1% of CPU, it is worth it because the code is simple & safe (less than one page of C, two exported functions). Another idea would be to convert some very common NASL functions to C so that they run quicker.

4. Make sure that all plugins avoid redundant operations, e.g. if a plugin needs a banner, it should be grabbed from the KB, not read from the network. I guess that there are few ill-designed plugins (maybe none).
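The include-file cache from point 3 is essentially memoization keyed on the file name. A minimal sketch of the logic (parse_include() and the toy _parse are stand-ins for illustration, not the actual C code):

```python
# Illustrative memoization of include-file parsing: each file is parsed
# once and the parsed form is reused by every subsequent plugin.
# _parse is a stand-in for the real NASL parser.
_include_cache = {}

def parse_include(path, _parse=lambda text: text.split()):
    """Return the parsed form of an include file, parsing it at most once."""
    tree = _include_cache.get(path)
    if tree is None:
        with open(path) as f:
            tree = _parse(f.read())   # the expensive step, done once per file
        _include_cache[path] = tree
    return tree
```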

> I have come up with several possibilities to reduce the overall
> scanning time:

What are people's feelings about combining related plugins, especially those that rely on simple version strings? For example, I count 9 plugins named apache_2_0_*.nasl, most of which do nothing more than check the banner. It seems to me that they could be replaced by a single plugin that checks the banner against the latest version and is updated as new versions and vulnerabilities are announced. [. Perhaps nessus-update-plugins could be augmented to support removing plugins (optionally??) to help with this. ]

This suggests, though, a greater degree of editorial control over plugin development than currently exists. Are there resources for that, and is it still consistent with the open and distributed nature of plugin development?

> 4. Make sure that all plugins do not perform redundant operations.
> e.g. if a plugin needs a banner, it should be grabbed from the KB,
> not read from the network.

This is a good idea, one that I've strived to implement with my plugins, especially those involving web applications (which reduces the need to scan a bunch of locations looking for the application itself).

> What are people's feelings about combining related plugins, especially
> those that rely on simple version strings? For example, I count 9
> plugins named apache_2_0_*.nasl, most of which do nothing more than
> check the banner.

The philosophy of Nessus is to have one plugin for each flaw (whenever possible). As the different Apache versions are vulnerable to different flaws, it makes sense to have different scripts. There might be one rare case where we can (and should) remove old plugins: if Gizmo 1.23 is vulnerable to a web directory traversal, for example, which is fixed in 1.24, and if some time later somebody finds a new way to exploit the directory traversal, which is fixed in 1.45, then we don't need two plugins. But I'm afraid that this situation is rare :-\

Over-optimizing might be dangerous too: if the web server banner is not in the KB because of a network glitch, the plugin should try to grab it again (that's what get_http_banner does). Maybe we should spend a little more time on service identification and banner grabbing so as to be able to run at full speed once the KB is filled with information. As I sometimes had incomplete reports, I wrote doublecheck_std_services.nasl, which tries hard to identify common services: I'd rather lose some time during the first phase than run the whole scan again. The service identification system is split into several scripts 1) to make it more flexible (as find_service is in C), and 2) to make it more reliable (let's hope!)

> It seems to me that they could be replaced by a single plugin that
> checks the banner against the latest version and is updated as new
> versions and vulnerabilities are announced.

We don't want to just read "upgrade to the latest version". We need to know whether the flaws are dangerous or not, considering the environment.

Another point before we try to enhance anything: profiling Nessus is not easy. Some time ago, I wrote a quick & dirty patch which printed the resources used (real time, CPU time...) at the end of each plugin. It is still in exec.c (but commented out, of course). We may enable it and look at the results...
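The idea behind that patch can be sketched outside of C as well. This is a hypothetical equivalent of the per-plugin printout described above, using getrusage(); it is not the actual exec.c code, and run_with_stats() is an invented name:

```python
import resource
import time

def run_with_stats(name, fn, *args):
    """Run fn and print the resources it used (real time, CPU time),
    in the spirit of the per-plugin resource printout described above."""
    t0 = time.monotonic()
    r0 = resource.getrusage(resource.RUSAGE_SELF)
    result = fn(*args)
    r1 = resource.getrusage(resource.RUSAGE_SELF)
    real = time.monotonic() - t0
    cpu = (r1.ru_utime - r0.ru_utime) + (r1.ru_stime - r0.ru_stime)
    print("%s: real %.3fs, cpu %.3fs" % (name, real, cpu))
    return result
```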

On Thu, Nov 11, 2004 at 06:06:29PM +0100, Michel Arboi wrote:
>
> > I have come up with several possibilities to reduce the overall
> > scanning time:
>
> What are people's feelings about combining related plugins, especially
> those that rely on simple version strings? For example, I count 9
> plugins named apache_2_0_*.nasl, most of which do nothing more than
> check the banner. It seems to me that they could be replaced by a
> single plugin that checks the banner against the latest version and is
> updated as new versions and vulnerabilities are announced. [. Perhaps
> nessus-update-plugins could be augmented to support removing plugins
> (optionally??) to help with this. ]

Consider this message the virtual hand raised in support of this suggestion (of combining plugins that are quite similar to one another).

Hugo.

--
I hate duplicates. Just reply to the relevant mailinglist.
hvdkooij [at] vanderkooij    http://hvdkooij.xs4all.nl/
Don't meddle in the affairs of magicians,
for they are subtle and quick to anger.

George Theall wrote:
> On Thu, Nov 11, 2004 at 06:06:29PM +0100, Michel Arboi wrote:
>
>> I have come up with several possibilities to reduce the overall
>> scanning time:
>
> What are people's feelings about combining related plugins, especially
> those that rely on simple version strings? For example, I count 9
> plugins named apache_2_0_*.nasl, most of which do nothing more than
> check the banner. It seems to me that they could be replaced by a
> single plugin that checks the banner against the latest version and is
> updated as new versions and vulnerabilities are announced. [. Perhaps
> nessus-update-plugins could be augmented to support removing plugins
> (optionally??) to help with this. ]

This is *way* out there... but didn't Kaminsky (sp?) write a two-part scanner? You have the 'scanner' portion, which sends the stimulus to the server, and the 'collector', which reads the return data. This way, the redundant banner checks (and similar) are done in the 'collector'.

On Thu, Nov 11, 2004 at 06:06:29PM +0100, Michel Arboi wrote:

> 1. Cut down the number of banner matching plugins and replace them
> with generic tests. Thus, we might even find unpublished flaws.

This is actually a _slower_ alternative to the current banner-matching plugins we have today - it's _way_ faster to write one plugin which connects to port N on the remote host and stores the result in the KB, and then run hundreds (if not thousands) of egrep() calls on it, than it is to test all the permutations of a given protocol like a generic plugin would.
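That "grab once, match many times" pattern can be sketched as follows (the patterns and banner here are made up for illustration; in real Nessus plugins the banner string comes out of the KB):

```python
import re

# The banner is fetched once (in Nessus, stored in the KB by one plugin);
# afterwards any number of cheap regex checks run against the string,
# with no further network traffic.
PATTERNS = [
    ("old Apache 1.3.x", re.compile(r"Apache/1\.3\.\d+")),
    ("ProFTPD 1.2.9", re.compile(r"ProFTPD 1\.2\.9")),
]

def check_banner(banner):
    """Return the names of all patterns matching the cached banner."""
    return [name for name, pat in PATTERNS if pat.search(banner)]
```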

There are tons of drawbacks to generic plugins:

- They are very slow and most of the time unreliable. Say you send "USER XXXX[...]XXXX" to a remote FTP server, and it cuts the connection down. How do you distinguish a segfault from an exit()? You simply can't. If you run Nessus without safe checks _today_, most of the false positives come from such plugins;

- They are destructive. Crashing the remote service is not an option

- They are too fuzzy. When most users read that the remote server _might_ be vulnerable to a buffer overflow, without any reference to a BID or CVE, they just assume it's a false positive. And even if your plugins had references to hundreds of BIDs, don't expect anyone to click on every one of them to determine whether their product is listed or not.

In short, generic plugins are useful BUT unreliable, and I want to move most of them into the "thorough checks" section.

> 2. Rewrite the NASL interpreter using a VM. According to gforth /
> vmgen developers, such an interpreter might be 10 to 100 times quicker

That would be good. The issue today is not really speed, but CPU usage. If you are testing 3 hosts in parallel, you don't care about such a VM. However, if you intend to scan your class B, you want a high level of optimization, which can be achieved if each process has a very small CPU footprint.

So this one would not be a waste of time.

> 3. Try other "simple" NASL optimizations. I have implemented a cache [...]

I don't think they will attenuate your current fears.

> 4. Make sure that all plugins do not perform redundant operations.

This is also important. I introduced a very crude HTTP caching mechanism a while ago (now disabled) and I want to re-do it (mostly, to cache only static pages); this would speed up the checks a lot. What slows down Nessus scans today are all the dumb cross-site scripting / SQL injection checks in nearly unknown PHP scripts, as they are tough to optimize using the KB and each script has to look in a lot of places. However, a lot of the "banner grabbing" plugins already use this mechanism, and Nessus 2.2 paves the way for ever-increasing usage of the KB, since the cost of accessing it is now nearly zero.

On Thu, Nov 11, 2004 at 03:59:14PM -0500, George Theall wrote:
> On Thu, Nov 11, 2004 at 06:06:29PM +0100, Michel Arboi wrote:
>
> > I have come up with several possibilities to reduce the overall
> > scanning time:
>
> What are people's feelings about combining related plugins, especially
> those that rely on simple version strings? For example, I count 9

I don't feel good about this. The idea behind one script per vulnerability is to give users the ability to perform grep-like searches on vulnerabilities.

Even if you consider the overflow in Apache mod_proxy a non-issue for your organization, you still want to know that your Apache 2.0.0 is vulnerable to a great number of other flaws. If you start to aggregate vulnerabilities under one given plugin ID, you basically decide what is important instead of the users, and that makes everyone's life more difficult.

> This is actually a _slower_ alternative to the current banner matching
> plugins we have today - it's _way_ faster to write one plugin which
> connects to port N on the remote host, store the result in the KB, and
> run hundreds (if not thousands) of egrep() on it, than it is to test all
> the permutations of a given protocol like a generic plugin would.

Optimization is always a matter of compromise: CPU vs RAM, CPU vs network. Generic tests eat more resources on the network & target machines and less on the scanner. Currently (and for a loooong time ahead), local CPU is more abundant and cheaper, I admit.

> - They are very slow and most of the time unreliable. So you send
> "USER XXXX[...]XXXX" to a remote FTP server, and it cuts the connection
> down. How do you distinguish a segfault from an exit()? You
> can't. If you run Nessus without safe checks _today_, most of the
> false positives come from such plugins ;

IIRC, we have made many improvements on them.

> - They are destructive. Crashing the remote service is not an option

It depends on what your "security objectives" are. If reliability is your priority, you certainly don't want to crash a service to verify whether it is vulnerable to a known DoS that might be exploited one day. If confidentiality is your priority, then you don't care. That's the case with many military systems that protect secret data: better dead than caught. I was told that a French defence "network cutter" was vulnerable to the old ping o' death. I'm not sure this isn't a legend spread by competitors, but anyway, the goal of the device is to prevent information leaks, at *any* price.

As I said in my previous message, I'm quite sure that most users don't need 99.99% reliability.

> - They are too fuzzy. When most users read that the remote server
> _might_ be vulnerable to a buffer overflow, without any reference to
> any BID or CVE, they just assume it's a false positive.

Maybe we could enhance the message and give more data so that the user can investigate?

> In short, generic plugins are useful BUT unreliable, and I want to move
> most of them in the "thorough checks" section.

They are already in ACT_DESTRUCTIVE_ATTACK, no?

>> 2. Rewrite the NASL interpreter using a VM. According to gforth /
>> vmgen developers, such an interpreter might be 10 to 100 times quicker

> That would be good.

*If* the VM really speeds things up. Does anybody know how I could benchmark it without fully rewriting the interpreter? I don't want to code for a month only to discover that I save 3% of CPU time!

> I don't think they will attenuate your current fears.

I am looking for enhancements more than "fearing" anything. As I said, the "sluggish Nessus" scenario is good for scifi. <troll> This is different from some other scanners, which have sacrificed completeness or reliability to get a reasonable speed. </troll>

> This is also important. I introduced a very crude HTTP caching
> mechanism a while ago (now disabled)

IIRC, it did not really improve speed and ate a lot of CPU and RAM :-] However, we have since fixed a terrible memory leak (I plead guilty for this horror).

> Last year, we had less than 2000 Nessus scripts if I remember well.
> Today we have 5570. Although some of them are related to a brand new
> feature (local tests) and are automatically generated, it seems that
> the number of "classical" plugins increase quicker than Moore's Law
> (double every 18 months).

BTW: The sheer number of plugins in one directory makes CVSweb (as well as grep ... *.nasl) unusable.

> 3. Try other "simple" NASL optimizations. I have implemented a cache
> for include files (they are only parsed once for all plugins) in the
> CVS head branch. [...]

I got a feeling there was a plan to preparse all scripts and save them in a parsed form?

On Thu, 11 Nov 2004, Michel Arboi wrote:

> Over optimizing might be dangerous too: if the web server banner is
> not in the KB because of a network glitch, the plugin should try to
> grab it again (that's what get_http_banner does)

This is silly. Rather than ending up with a completely and obviously bogus report (no information about the service), you end up with a partially bogus report (some plugins got the banner, some plugins did not). The latter is worse, IMHO.

On Fri, 12 Nov 2004, Renaud Deraison wrote:

> - They are very slow and most of the time unreliable. So you send
> "USER XXXX[...]XXXX" to a remote FTP server, and it cuts the connection
> down. How do you distinguish a segfault from an exit()? You simply
> can't. [...]

> That would be good. The issue today is not really speed, but CPU usage.
> If you are testing 3 hosts in parallel, you don't care about such a VM.
> However, if you intend to scan your class B, you want a high level of
> optimization, which can be achieved if each process has a very little
> CPU footprint.

As far as I can tell, Nessus has always been more memory (*) and network bandwidth hungry than CPU hungry. I had to reduce parallelism in order to prevent thrashing and network congestion on several occasions, but I don't recall ever having had to reduce it because the CPU was overloaded. YMMV.

> BTW: The sheer number of plugins in one directory makes CVSweb
> (as well as grep ... *.nasl) unusable.

I guess that we could create sub-directories. Somebody suggested one directory for each family. Why not? We just have to change the install script.

> I got a feeling there was a plan to preparse all scripts and save them
> in a parsed form?

I've tried to save the syntax tree in a simple binary format, but loading this is not quicker than parsing the file. Bison is really good! Another trick: we could parse all the .nasl files and keep all the syntax trees in memory. But this might be too expensive (in MB).

>> Over optimizing might be dangerous too: if the web server banner is
>> not in the KB because of a network glitch, the plugin should try to
>> grab it again (that's what get_http_banner does)

> This is silly. Rather than ending with a completely and obviously bogus
> report (no information about the service)

You will still get information about other services, so you might miss the untested service. The service detection system tries to warn you anyway: e.g. find_service will say "an unknown service is running on this port, it is usually reserved for HTTP", and then find_service2 or doublecheck_std_services will detect a web server. You will know that you have a problem, and that you should increase the timeout or fix the network.


> On Thu, 11 Nov 2004, Michel Arboi wrote:
>
>> Over optimizing might be dangerous too: if the web server banner is
>> not in the KB because of a network glitch, the plugin should try to
>> grab it again (that's what get_http_banner does)
>
> This is silly. Rather than ending with a completely and obviously bogus
> report (no information about the service), you end with a partially
> bogus report (some plugins got the banner, some plugin did not). The
> latter is worse IMHO.

I agree. I'd much rather have something obviously inaccurate than something that is inaccurate but appears accurate on the surface.

On a side note, port scanning is the most sensitive and time-consuming part for us. With optimized tests enabled, many of the tests will not run. Plus, the tests seem to run very quickly in comparison. Kudos to the Nessus devel team for keeping them that way. :-)

>> That would be good. The issue today is not really speed, but CPU usage.
>> If you are testing 3 hosts in parallel, you don't care about such a VM.
>> However, if you intend to scan your class B, you want a high level of
>> optimization, which can be achieved if each process has a very little
>> CPU footprint.
>
> As far as I can tell, Nessus has always been more memory (*) and
> network bandwith hungry than CPU hungry. I had to reduce parallelism
> in order to prevent thrashing and network congestion on several
> occasions but I don't recall I have ever had to reduce it because CPU
> was overloaded. YMMV.

Another point of reference: our experience has been that network bandwidth is the critical resource. Second would be memory, with CPU a distant third. However, most of our assessments have been done over the Internet, where we obviously don't have as much bandwidth to play with.

> I agree. I'd much rather have something obviously inaccurate than
> something that is inaccurate but appears accurate on the surface.

But you can never be sure that you did not miss something. Renaud planned to implement adaptive network timeouts; this might help.

We could imagine that every Nessus test tries very hard to get an answer from a previously identified service. This way, if there is a network problem, we will not miss anything. *However*, if the machine or the service goes down, then the scanner will remain stuck on the port, waiting for an answer that will never come. Anything could happen: somebody unplugged the wrong cable on the switch, or a router was misconfigured, or the machine was rooted and the cracker wants to "glue" your scanner here while he looks for other vulnerable machines...

I don't think there is a silver bullet. But if you have any idea...

> On a side note, port scanning is the most sensitive and time consuming
> for us.

netstat and snmpwalk "pseudo port scanners" might help you.

> Another point of reference, our experience has been that network
> bandwidth is the critical resource. Second would be memory, then cpu
> a distant third. However, most of our assessments have been done over
> the Internet where we obviously don't have as much bandwidth to play
> with.

There are in fact several categories of people who run Nessus, with different goals. Fully satisfying everybody is impossible, but it seems that Nessus has made a good compromise:
- some run it on a fast LAN, others over the Internet;
- some run it against at most a couple of machines at a time, others watch a full class B network;
- some want an in-depth audit, others want only the biggest holes (and no false positives);
- some can afford to crash any service (and will be happy to find an unknown flaw before a 0-day exploit is out), others want their systems to stay up & running.

On Fri, Nov 12, 2004 at 12:02:48PM +0100, Michel Arboi wrote:

> Optimization is always a matter of compromise: CPU vs RAM, CPU vs
> network. Generic tests eat more resources on the network & target
> machines and less on the scanner. Currently (and for a loooong time
> ahead), local CPU is more abundant and cheaper, I admit.

Generic tests don't produce *helpful* output. If the output is fuzzy, most users will think it's a false positive - trust me on that.

> > - They are very slow and most of the time unreliable. So you send
> > "USER XXXX[...]XXXX" to a remote FTP server, and it cuts the connection
> > down. How do you distinguish a segfault from an exit()? You
> > can't. If you run Nessus without safe checks _today_, most of the
> > false positives come from such plugins ;
>
> IIRC, we have made many improvements on them.

Sure, but they are still less reliable than other tests. And their output is less useful than that of other tests, in the sense that they don't point to an exact flaw description, nor to a solution.

> > - They are destructive. Crashing the remote service is not an option
>
> It depends on what are your "security objectives". If reliability is
> [...]
> As I said in my previous message, I'm quite sure that most users don't
> need 99.99% reliability.

Crashing a service is _NOT_ acceptable, unless your objective really is to throw everything you can at the remote host. In other words, while generic tests are useful, they cannot replace dedicated and less intrusive tests. Ergo, they do not solve the initial problem you raised on the list.

> > - They are too fuzzy. When most users read that the remote server
> > _might_ be vulnerable to a buffer overflow, without any reference to
> > any BID or CVE, they just assume it's a false positive.
>
> Maybe we might enhance the message and give more data so that the user
> can investigate?

You have a "single host" vision. The truth is that there are big networks out there being scanned by Nessus, and Nessus already reports hundreds of thousands of _known_ flaws in them. Who wants to (or can) investigate unknown/possible flaws when dealing with big networks?

> > In short, generic plugins are useful BUT unreliable, and I want to move
> > most of them in the "thorough checks" section.
>
> They are already in ACT_DESTRUCTIVE_ATTACK, no?

The slowest ones should be in the "thorough tests" section. John Lampe wrote several fuzzers that I want to include in the future, but they are way too slow for a generic scan.

> >> 2. Rewrite the NASL interpreter using a VM. According to gforth /
> >> vmgen developers, such an interpreter might be 10 to 100 times quicker
>
> > That would be good.
>
> *If* the VM really speeds things up. Does anybody know how I could

Not "speed things up". Lower the CPU usage.

> > This is also important. I introduced a very crude HTTP caching
> > mechanism a while ago (now disabled)
>
> IIRC, it did not really improved speed and ate much CPU and RAM :-]
> However, we fixed a terrible memory leak since (I plead guilty for
> this horror)

The problem with the original HTTP caching mechanism is that it would grow the KB, which would in turn make forking way too slow on Linux/FreeBSD systems (where the fork() time is proportional to the process size).

> > and I want to re-do it (mostly, to only cache static pages)
>
> Maybe in C?

On Fri, Nov 12, 2004 at 12:50:11PM +0100, Michel Arboi wrote:

> I've tried to save the syntax tree in a simple binary format but
> loading this is not quicker than parsing the file. Bison is really
> good!

Just a side note on this: what takes time is loading the .nasl file from disk to memory. The way the interpreter is done today, a compiled .nasl file is actually bigger than an uncompiled one, so the CPU time gained by using a binary file is lost in system calls to read the file from disk.

Then we should tune them with another, as yet unused, global setting: "Report paranoia" (I'm not sure the name is great; if anybody has something better...)

>> *If* the VM really speeds things up. Does anybody know how I could

> Not "speed things up". Lower the CPU usage.

That's what I meant: speeding up the interpreter => using less CPU for the same job. By the way, the fact that gforth relies upon a VM is surprising. The implementation of the original Forth was simpler (and very efficient). I will have to look at this.

If anybody finds a way to benchmark such a thing without writing a new interpretor, that would be great.

> Just a side note on this: what takes time is the loading of the .nasl
> file from disk to memory. The way the interpreter is done today, a
> compiled .nasl file is actually bigger than an un-compiled one, so the
> CPU time gained by using a binary file is lost in system calls to read
> the file from disk.

Have you actually tested this with buffered reading, and so on?

> I guess that we could create sub-directories. Somebody suggested one
> directory for each family. Why not?
> We just have to change the install script.

That wouldn't help with CVSweb. :)

> Another trick: we could parse all the .nasl, and keep all the syntax
> tree in memory. But this might be too expensive (in MB).

The memory will stay shared among all nessusd processes unless you write to it. You might even dump the result into a big file and map the file read-only to make sure it will be shared (and this approach makes it possible to choose between a permanent mmap() plus extra fork() overhead, and individual mmap()'s in the children plus the overhead of minor page faults).
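In outline, the dump-and-map idea might look like this (a sketch, not Nessus code; the file contents and function name are invented). The read-only mapping is shared between the parent and all forked children, so the pages are never copied:

```python
import mmap
import os

def map_readonly(path):
    """Map a file read-only; the pages are shared, not copied, across fork()."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return mmap.mmap(fd, 0, access=mmap.ACCESS_READ)
    finally:
        os.close(fd)   # the mapping stays valid after the fd is closed
```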

> You will get information about other services, so you might miss the
> untested service.

Yes, but it is much easier to notice that all the reports for a particular service are missing than that only some of the reports are missing.

> The service detection system tries to warn you anyway:
> e.g. find_service will say "an unknown service is running on this
> port, it is usually reserved for HTTP" and then find_service2 ou
> doublecheck_std_services will detect a web server. You will know that
> you have a problem, that you should increase the timeout or fix the
> network.

Fine. It only works for services running on standard ports, but that covers the vast majority of cases. On the other hand, I find it quite confusing to get both that warning and some real reports simultaneously, and I am tempted to ignore the warning as irrelevant noise. Joe Average Luser's temptation to ignore the warning would probably be much higher. Perhaps such warnings should be promoted to "Nessus Alerts"?

But what's the point of repeated attempts to grab the banner in every plugin that needs it? It might slow the test down quite noticeably (esp. if the problem does not disappear spontaneously), and I'll have to rerun it anyway if the problem was caused by a transient/fixable communication glitch.

I still think it makes more sense to try harder to get the banner at the very beginning of the test than to retry in every individual plugin.

Well, yes, there is one reason not to make pure KB-checking plugins: the ability to run plugins in standalone mode (esp. when the standard nasl is still unable to load and provide KB data).

> > MS Windows kills TCP connections with RST when the task dies (GPF etc.).
>
> Excellent idea! We just have to enhance the API to get the last error
> code - I guess that nessusd will get ECONNRESET, no?

Probably. You can check this by connecting to a task running on Windows and killing the task from the Task List.

On Fri, 12 Nov 2004, Michel Arboi wrote:

> We could imagine that every Nessus test tries very hard to get an
> answer from a previously identified service. This way, if there is a
> network problem, we will not miss anything.

It might be a good idea to write tests in such a way that a warning (or alert... see above) is generated whenever the service fails to respond in either of the expected ways (vulnerable or not vulnerable). Tests expecting no output may do an additional http_is_dead()-like check; this is somewhat controversial because it adds more work, but I think most of them are destructive/DoS tests, ergo they should check the service's liveness anyway.

Lowering CPU usage is pointless unless you want 1. to run other CPU intensive tasks on the same machine, or 2. to save some joules of electric energy, or 3. want to "speed things up" and CPU is the bottleneck. :)

> The problem of the original HTTP caching mechanism is that it would grow
> the KB, which would in turn make forking way too slow on Linux/FreeBSD
> systems (where the fork() time is proportional to the process size).

How much memory did it consume? Yes, you pay for every page whose entry must be copied (and prepared for COW if it is writable) but there are already hundreds of pages to be duplicated for the program itself, dynamic libraries etc.

On Sat, 13 Nov 2004, Renaud Deraison wrote:

> Just a side note on this: what takes time is the loading of the .nasl
> file from disk to memory. The way the interpreter is done today, a
> compiled .nasl file is actually bigger than an uncompiled one, so the
> CPU time gained by using a binary file is lost in system calls to read
> the file from disk.

Do you need more than 3 syscalls to read the whole file: open(), read(), and close()?
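
For reference, a minimal C sketch of what those three syscalls look like. One extra fstat() is used to learn the file size; the path is a stand-in created just for the demo, not a real plugin:

    /* Read an entire script file with open() + read() + close()
     * (plus one fstat() for the size), assuming it fits in one buffer. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/stat.h>
    #include <unistd.h>

    static char *slurp(const char *path, size_t *len)
    {
        struct stat st;
        int fd = open(path, O_RDONLY);                      /* syscall 1 */
        if (fd < 0) return NULL;
        if (fstat(fd, &st) < 0) { close(fd); return NULL; } /* size lookup */
        char *buf = malloc((size_t)st.st_size + 1);
        ssize_t n = read(fd, buf, (size_t)st.st_size);      /* syscall 2 */
        close(fd);                                          /* syscall 3 */
        if (n < 0) { free(buf); return NULL; }
        buf[n] = '\0';
        if (len) *len = (size_t)n;
        return buf;
    }

    int main(void)
    {
        /* create a small stand-in script */
        FILE *f = fopen("/tmp/demo_script.nasl", "w");
        fputs("display(\"hello\");\n", f);
        fclose(f);

        size_t len;
        char *text = slurp("/tmp/demo_script.nasl", &len);
        printf("%zu bytes: %s", len, text);
        free(text);
        return 0;
    }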

On Sat, 13 Nov 2004, Michel Arboi wrote:

> If anybody finds a way to benchmark such a thing without writing a
> new interpreter, that would be great.

Put one "start timer" call at the beginning of the interpreter, one "stop timer" plus "report timer value" at the end, and a pair of "stop" and "start" around all I/O procedures. It will reveal how much time is spent interpreting the code.

BTW: One obvious optimization: rewrite recv_line() to read input in bigger chunks rather than sucking it by single characters. Esp. in non-SSL mode where one character == two syscalls, and one of them is select() with a rather high overhead. This leads to another obvious optimization: poll() (if supported by the OS) is much more efficient than select() when only a small number of fd's (such as a single fd) is watched.
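
A hedged sketch of such a buffered reader in C. The struct and function names are ours, not the actual recv_line() implementation, and the demo feeds it from a pipe instead of a network socket:

    /* Buffered line reader: one read() per chunk instead of one
     * select()+read() pair per character. */
    #include <stdio.h>
    #include <unistd.h>

    #define BUFSZ 4096

    typedef struct {
        int fd;
        char buf[BUFSZ];
        size_t start, end;   /* window of unconsumed bytes in buf */
    } linebuf;

    /* Copy one line (without trailing newline) into out; -1 on EOF. */
    static int buf_recv_line(linebuf *lb, char *out, size_t outsz)
    {
        size_t o = 0;
        for (;;) {
            while (lb->start < lb->end) {
                char c = lb->buf[lb->start++];
                if (c == '\n') { out[o] = '\0'; return (int)o; }
                if (o + 1 < outsz) out[o++] = c;
            }
            ssize_t n = read(lb->fd, lb->buf, BUFSZ);  /* one syscall per chunk */
            if (n <= 0) { out[o] = '\0'; return o ? (int)o : -1; }
            lb->start = 0;
            lb->end = (size_t)n;
        }
    }

    int main(void)
    {
        int fds[2];
        if (pipe(fds) != 0) return 1;
        (void)!write(fds[1], "banner line 1\nbanner line 2\n", 28);
        close(fds[1]);

        linebuf lb = { .fd = fds[0], .start = 0, .end = 0 };
        char line[256];
        while (buf_recv_line(&lb, line, sizeof line) >= 0)
            printf("got: %s\n", line);
        return 0;
    }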

Yet another possible optimization: the hashing function in kb.c is suspicious (repeated use of << shifts the starting characters into oblivion when it gets a long key) and might lead to an uneven distribution of keys.
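
To illustrate the concern, assuming a shift-based hash of the form h = (h << 3) ^ c (a guess at kb.c's scheme, not a quote of it): on a 32-bit word, a character's contribution is shifted out entirely after about eleven more characters, so long keys that differ only near the start collide. A multiplicative hash like djb2 keeps the early characters mixed in:

    #include <stdio.h>

    static unsigned shift_hash(const char *s)
    {
        unsigned h = 0;
        for (; *s; s++)
            h = (h << 3) ^ (unsigned char)*s;  /* old bits fall off the top */
        return h;
    }

    static unsigned djb2_hash(const char *s)   /* multiplication keeps them */
    {
        unsigned h = 5381;
        for (; *s; s++)
            h = h * 33 + (unsigned char)*s;
        return h;
    }

    int main(void)
    {
        /* two long keys differing only in the first character */
        const char *a = "Axxxxxxxxxxxxxxxxxxx/banner";
        const char *b = "Bxxxxxxxxxxxxxxxxxxx/banner";

        printf("shift collides: %s\n",
               shift_hash(a) == shift_hash(b) ? "yes" : "no");
        printf("djb2 collides: %s\n",
               djb2_hash(a) == djb2_hash(b) ? "yes" : "no");
        return 0;
    }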

> The memory will stay shared among all nessusd processes unless you write > to it.

Only the cleaning process would write to it. We can disable that.

> You might even dump the result into a big file and map the file > readonly

Not easy, because the tree contains pointers and each cell is allocated individually on the heap, among other things (like strings).

> Yes but it is much easier to catch that all the reports for a particular
> service are missing than that some of the reports are missing.

Honestly, I think that an adaptive timeout would be the best answer to this problem. It could issue a proper warning when it triggers under some circumstances, like a timeout. I once worked on a WAN which was supposed to be the 8th wonder of the world and happened to be very unreliable. It was not as quick as fully switched 100 Mb Ethernet, of course, so I increased the timeouts and lowered the parallelism, and everything should have run well. But the gizmo decided to lose packets from time to time, sometimes for 30 s. The 1st Nessus report contained crap, so I re-ran it. The 2nd report contained crap (in other places), so I re-ran it. The 3rd report contained crap *again*. I looked at my watch, it was 1 a.m., so I merged the 3 reports and went home :-\ As the network glitches appeared at random times, I had a good chance of getting a full report from the merge, but I had no way to be sure (not in a reasonable time, anyway).
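
For the record, the classic adaptive-timeout recipe is the TCP retransmission timer (RFC 6298 style): smooth the observed RTT, track its variance, and time out at srtt + 4*rttvar. A small illustrative sketch, not nessusd code:

    #include <stdio.h>

    typedef struct { double srtt, rttvar; int ready; } adapto;

    /* Feed one RTT sample; returns the new timeout = srtt + 4*rttvar. */
    static double adapto_update(adapto *a, double rtt)
    {
        if (!a->ready) {
            a->srtt = rtt;
            a->rttvar = rtt / 2;
            a->ready = 1;
        } else {
            double err = rtt - a->srtt;      /* deviation from the estimate */
            if (err < 0) err = -err;
            a->rttvar = 0.75 * a->rttvar + 0.25 * err;
            a->srtt   = 0.875 * a->srtt + 0.125 * rtt;
        }
        return a->srtt + 4.0 * a->rttvar;
    }

    int main(void)
    {
        adapto a = { 0, 0, 0 };
        double samples[] = { 0.1, 0.1, 0.1, 3.0 };  /* last one is the glitch */
        for (int i = 0; i < 4; i++)
            printf("rtt %.1f -> timeout %.3f\n",
                   samples[i], adapto_update(&a, samples[i]));
        return 0;
    }

The timeout shrinks while the network behaves and jumps as soon as a glitchy sample arrives, which is exactly when a scanner should slow down instead of reporting crap.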

> Fine. It works for services running on standard ports only but it
> covers the vast majority of cases. On the other hand, I find it quite
> confusing to get both that warning and some real reports
> simultaneously

The warning is issued by the first detection script, and the second one detects the web server.

> and I am tempted to ignore the warning as irrelevant noise.

Not any more, now you know :-)

> Joe Average Luser's temptation to ignore the warning would probably > be much higher.

I agree.

> Perhaps such warnings should be promoted to "Nessus Alerts"?

Not the 1st one. But the 2nd one could be: "A web server was detected on this port but was missed by the 1st detector. Your report might be incomplete. You should re-run your scan with higher timeouts." This way, we can even issue warnings on all ports, not only standard ports.

I can fix find_service2 & its brothers.

> But what's the point of repeated attempts to grab the banner in every
> plugin that needs it? It might slow the test down quite noticeably (esp.
> if the problem does not disappear spontaneously), and I'll have to rerun
> it anyway if the problem was caused by a transient/fixable communication
> glitch.
>
> I still think it makes more sense to try harder to get the banner at the
> very beginning of the test rather than to retry in every individual
> plugin.

> Well, yes, there is one reason not to make pure kb-checking plugins:
> the ability to run plugins in a standalone mode (esp. when the standard
> nasl is still unable to load and provide kb data).

This could be changed by using the new COMMAND_LINE pre-defined variable. For example:

  port = get_kb_item("Services/www");
  if (! port)
  {
    if (COMMAND_LINE) port = 80;
    else exit(0);
  }

> Lowering CPU usage is pointless unless you want 1. to run other CPU
> intensive tasks on the same machine, or 2. to save some joules of
> electric energy, or 3. want to "speed things up" and CPU is the
> bottleneck. :)

If I understand correctly, CPU might be an issue on huge parallel scans. Fortunately, 1. there are quite powerful processors now, and 2. Nessus runs well on multiprocessor machines.

[HTTP cache]
> How much memory did it consume?

I do not remember, but it was supposed to speed up web tests and did not.

> Do you need more than 3 syscalls to read the whole file: open(), read(),
> and close()?

For most scripts, no. For bigger scripts, maybe a couple of read() calls.

>> If anybody finds a way to benchmark such a thing without writing a
>> new interpreter, that would be great.
>
> Put one "start timer" call at the beginning of the interpreter, one "stop
> timer" plus "report timer value" at the end, and a pair of "stop" and
> "start" around all I/O procedures. It will reveal how much time is spent
> interpreting the code.

We already have this in exec.c (it prints the result of getrusage). What I need is a way to estimate the speed gain between the current interpreter and a VM-based interpreter without rewriting the whole interpreter around a VM, i.e. I don't want to do it and then discover that it wasn't worth the effort.

> BTW: One obvious optimization: rewrite recv_line() to read input in bigger
> chunks rather than sucking it by single characters.

This is done indirectly by buffered network IO. I considered it an experimental feature and it is enabled only by http_open_socket. Should we generalize it?

I want (1): to run more processes at the same time. If we can lower the CPU usage by 50%, assuming that the network connection is not the bottleneck, we could test twice as many hosts at the same time (and there are ways to optimize the bandwidth usage anyway).

> > The problem of the original HTTP caching mechanism is that it would grow
> > the KB, which would in turn make forking way too slow on Linux/FreeBSD
> > systems (where the fork() time is proportional to the process size).
>
> How much memory did it consume? Yes, you pay for every page whose entry
> must be copied (and prepared for COW if it is writable) but there are
> already hundreds of pages to be duplicated for the program itself,
> dynamic libraries etc.

I don't have exact figures - it worked fine for me, but Michel attempted to strangle me because it brought his laptop to its knees during an audit, so I disabled it for now and I'm thinking of a better approach.

But this is definitely something I want to investigate.

> On Sat, 13 Nov 2004, Renaud Deraison wrote:
>
> > Just a side note on this: what takes time is the loading of the .nasl
> > file from disk to memory. The way the interpreter is done today, a
> > compiled .nasl file is actually bigger than an uncompiled one, so the
> > CPU time gained by using a binary file is lost in system calls to read
> > the file from disk.
>
> Do you need more than 3 syscalls to read the whole file: open(), read(),
> and close()?

No, but the kernel has to actually read the file from disk - and this is slow.

When I had the plugin-server running (one process loading all the plugins in memory and "handing them out" to other processes), the system load was lower (less open()/read()/close() calls) but the results were not entirely there yet (and these changes bring a lot of complexity to the behavior of Nessus, so I decided to remove them). However, this approach, mixed with other optimizations, might prove to be worthwhile (especially if we use a real VM, where the compilation of the scripts will probably be slower).

> BTW: One obvious optimization: rewrite recv_line() to read input in bigger
> chunks rather than sucking it by single characters.

> Yet another possible optimization: the hashing function in kb.c is
> suspicious (repeated use of << shifts the starting characters into
> oblivion when it gets a long key) and might lead to an uneven
> distribution of keys.

I'll have a look at it, but since the number of keys is low anyway, I'm not too concerned about this (it won't have a significant performance impact).

> I don't have exact figures - it worked fine for me, but Michel attempted
> to strangle me because it brought his laptop to its knees during an
> audit, so I disabled it for now and I'm thinking of a better approach.

As memory is more of a limiting factor than CPU today, I think that any kind of optimization should not cost a single byte more. Otherwise, you will not be able to run more processes in parallel: RAM will be the limit.

> No, but the kernel has to actually read the file from disk - and this is
> slow.

Unless we have enough memory and the scripts are kept in the file system cache. The scripts directory weighs 27 MB, which is not that big.

> When I had the plugin-server running (one process loading all the
> plugins in memory and "handing them out" to other processes)

IMHO, we should not attempt to do the buffer cache's job. There are OSes where you can give "hints" to the VM manager. We could lock the files in RAM with mmap + mlock, but I'm not sure this is a great idea. At least, it is easy to implement in a separate process and probably more efficient than any "plugin server".
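
A minimal sketch of the mmap + mlock idea. The file name is a stand-in created for the demo, and mlock failure is treated as non-fatal, since unprivileged processes usually hit RLIMIT_MEMLOCK:

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        /* stand-in for a .nasl plugin file */
        const char *path = "/tmp/demo_plugin.nasl";
        FILE *f = fopen(path, "w");
        fputs("security_note(port);\n", f);
        fclose(f);

        int fd = open(path, O_RDONLY);
        struct stat st;
        fstat(fd, &st);

        char *map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        close(fd);                      /* the mapping survives the close */
        if (map == MAP_FAILED) return 1;

        /* pin the pages in RAM; soft-fail if we lack the privilege */
        if (mlock(map, st.st_size) != 0)
            fprintf(stderr, "mlock failed, mapping stays pageable\n");

        printf("mapped %lld bytes: %.20s\n", (long long)st.st_size, map);
        munmap(map, st.st_size);
        return 0;
    }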

> (especially if we use a real VM, where the compilation of the scripts
> will probably be slower).

Even after the port scan of a system is finished, yes, it can take quite a bit of time to get through all the scripts. When a lot of systems are being scanned simultaneously it can put a great load on the CPU. I would say to concentrate on optimizing CPU, because nessus already has user-definable settings for the max number of IPs to scan and the max number of scripts to run simultaneously, should the network or memory load be too great.

I agree a great way to optimize CPU is to run scripts smarter, i.e. not to run scripts unnecessarily. So I think the optimize setting is great, and anything that can improve the reliability of the optimization (namely, running as few things unnecessarily as possible, without skipping anything that should have been run) will be a good thing.

Since I don't know much about nessus internals I can't speak much to that, but I can offer the perspective of someone who is able to use nessus on a wide range of systems, and sees strange results regularly.

In regard to banner collection it seems like a good idea to collect the banners first, reliably, and then move on to the analysis of that collection, rather than trying to recollect. If there was a problem collecting, then either try harder (i.e. quickly send the request again, with longer timeouts, etc.) or freeze for a while to ride out whatever problem might be out there before trying again. (And perhaps nessus could automatically try harder on well-known ports.) But I'm in agreement that it is a bad idea to distribute collection into scripts. At least if the user knows there is "no result" from the scan, they know to try again later. With "some result", they do not.

Further on that topic, it seems to me that the current plugins are producing a lot of conflicting results from scripts. A great example is simply knowing what kind of web server is on a port. One script detects that it's a VNC HTTP server. Another script says oh yeah, that port is Pi3Web. Or a bunch of stuff on a CompaqHTTPd. Or IIS findings on a platform another script figured out was a Unice. The kb/optimization could do a better job of not running scripts that don't apply.

Among those scripts whose job it is to figure out what kind of platform it is, the kb can collect results. The first script to submit a result (i.e. asserts a platform like windows/unice or an application like apache/iis) is presumed to be correct. Any conflicting results (i.e. unice+iis, iis+apache, apache+other, etc.) can be dealt with before the "worker bee" scripts are launched. The way to deal with it could range from logging a security_hole of "Not sure what platform" and setting both kbs, to silently keeping a vote and ignoring vastly outvoted kb settings, or whatever. Fundamentally, you can't have two different services listening on the same port of the same IP address, and I think stuff related to http is the majority of conflicting results, so this would be a good way to not only weed out bad results but also keep unnecessary scripts from being run at all.

I agree that old scripts shouldn't be removed, but one possible input into the optimization decision could be whether to trust version numbers. There are two ways to look at version numbers. If version 1.14 is vulnerable and 1.15 is fixed, nessus can trust that if you see 1.14 it's vulnerable (although this is frequently wrong because vendors do not see fit to alter the version numbers in any way to so indicate). However, it can also trust that if it sees 1.15, it's not vulnerable and there's no point in running the script to test a vulnerability that went away (unless it comes back in a later version).
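
A sketch of the comparison such an optimization would need. version_cmp() is our hypothetical helper (plain dotted numeric versions only), and the "skip" decision is just an illustration of the policy described above:

    #include <assert.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Compare dotted numeric versions; returns <0, 0, >0 like strcmp. */
    static int version_cmp(const char *a, const char *b)
    {
        while (*a || *b) {
            char *ea, *eb;
            long x = strtol(a, &ea, 10);  /* next numeric component */
            long y = strtol(b, &eb, 10);
            a = ea; b = eb;
            if (x != y) return x < y ? -1 : 1;
            if (*a == '.') a++;
            if (*b == '.') b++;
        }
        return 0;
    }

    int main(void)
    {
        assert(version_cmp("1.14", "1.15") < 0);    /* still vulnerable */
        assert(version_cmp("1.15.2", "1.15") > 0);  /* past the fix */
        assert(version_cmp("1.15", "1.15") == 0);

        const char *banner_version = "1.15";  /* hypothetically from the KB */
        const char *fixed_in = "1.15";
        if (version_cmp(banner_version, fixed_in) >= 0)
            printf("version %s >= fix %s: skip this test\n",
                   banner_version, fixed_in);
        return 0;
    }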

(Another option that comes to mind is a firewall/no firewall detected/ known no firewall result, possibly even set by the user if they wish. It's a waste of time to be told about "holes in the firewall" ala port 53 etc, when I'm scanning a LAN and not going through a firewall.)

> In regard to banner collection it seems like a good idea to collect the
> banners first, reliably

I have written a new TCP scanner that is quick and also grabs banners. It saves time in find_service.nes.

> I'm in agreement that this is a bad idea to distribute collection
> into scripts.

We need a collection part to debug the scripts: it is easier with the command line NASL. Maybe we should change this kind of code so that it behaves differently when COMMAND_LINE is set. This kind of thing could be changed too, from:

  port = get_kb_item("Services/www");
  if (! port) port = 80;

into:

  port = get_kb_item("Services/www");
  if (! port) if (COMMAND_LINE) port = 80; else exit(0);

> Further on that topic, it seems to me that the current plugins are
> producing a lot of conflicting results from scripts.

You should report them so that we can fix them. Be sure that you are using up to date plugins before you do: the bug might be already fixed.

> Or IIS findings on a platform another script figured out was a
> Unice.

This can happen with load balancers or port redirectors. This might be more common than we think, so we cannot rely upon os_fingerprinting to remove false alerts on services.

> The kb/optimization could do a better job of not running
> scripts that don't apply.

Usually, they do. Some script_require_keys instructions might be missing, however.

> Among those scripts whose job it is to figure out
> what kind of platform is, the kb can collect results. The first script to
> submit a result (i.e. asserts platform like windows/unice or application
> like apache/iis) is presumed to be correct. Any conflicting results (i.e.
> unice+iis, iis+apache, apache+other, etc) can be dealt with before the
> "worker bee" scripts are launched.

Service fingerprinting is the answer: the experimental www_fingerprinting_hmap (too verbose currently) has never mixed up an Apache with an IIS, or a Compaq WM with a VNC.

> The way to deal with it could range
> from logging a security_hole of "Not sure what platform" and setting both
> kbs

The idea is interesting anyway. This could be a way to detect a service behind a load balancer, which means that DoS plugins will be unreliable and should not be launched, among other things.

> Fundamentally, you can't have two different services listening
> the same port of the same ip address

Yes you can, with Pound for example: this reverse proxy can redirect different URLs to different machines. I suspect that this is common on huge web sites behind load balancers, when they have both a "static" part (only HTML and JPG) and an "application" part (CGI...).

But unless the load balancer is seriously broken, this should not trip up our simple service identification plugins.

> I agree that old scripts shouldn't be removed, but one possible input into
> the optimization decision could be whether to trust version numbers. There
> are two ways to look at version numbers. If version 1.14 is vulnerable and
> 1.15 is fixed, nessus can trust that if you see 1.14 it's vulnerable
> (although frequently wrong because vendors do not see fit to alter the
> version numbers in any way to so indicate.) However, it can also trust that
> if it sees 1.15, that it's not vulnerable and there's no point in running
> the script to test a vulnerability that went away (unless it comes back in
> a later version).

This means that we would have to put version numbers in the KB, and maybe add a new kind of optimization, something like script_require_version. I'm afraid this is not a simple modification.

And now for something completely new!

I have played with oprofile, and I may have found a good way to save CPU. I will not tell more until I am sure this works, because the profiler output looks a little strange and inconsistent from one run to another.

> We need a collection part to debug the scripts: it is easier with the
> command line NASL. Maybe we should change this kind of code so that it
> behaves differently when COMMAND_LINE is set.
> This kind of thing could be changed too, from:
>   port = get_kb_item("Services/www");
>   if (! port) port = 80;
> into:
>   port = get_kb_item("Services/www");
>   if (! port) if (COMMAND_LINE) port = 80; else exit(0);

I like this idea a lot. How difficult would it be to better variablize the port? What if I want to test against a non-standard port? Is the answer to just modify the nasl script? Ideally, it could be passed in on command line or nasl could accept the script on stdin so that I could massage the script on-the-fly, put it in a pipe, and smoke it.

On Thu, Nov 18, 2004 at 11:14:51AM -0700, Erik Stephens wrote:

> I like this idea a lot. How difficult would it be to better variablize
> the port? What if I want to test against a non-standard port? Is the
> answer to just modify the nasl script? Ideally, it could be passed in
> on command line or nasl could accept the script on stdin so that I
> could massage the script on-the-fly, put it in a pipe, and smoke it.

It would make more sense to implement one of the two following:

- When in command-line mode, get_kb_item() prompts the user for a value;

- When in command-line mode, the user can 'import' a KB when executing the script.

Changing behavior when run in command-line mode is wrong:

- Command-line mode is used for debugging. If the script behaves differently when in command-line mode rather than run from within nessusd, it makes problems un-fixable (or hard to fix at least);