How To Tell A Good Security Test Report From A Bad One

Suppose you had a penetration test, a vulnerability assessment, a security test, or whatever it was called. (Different people use different names for different kinds of tests.) Now you have a report and, apart from having to sort out the problems that were discovered, you want to know whether the testers did a good job.

This post should help you with that last one.

First of all, it is important to distinguish between fully automatic tests and what I'll call, for lack of a better term, "manual" tests. Fully automatic tests basically consist of running a vulnerability scanner (such as Nessus, Retina, etc.) and giving the report produced by the scanner to the customer. The final report has little or no human input. "Manual" tests may also use automated scanners for some part of the work. However, the findings produced by the scanner are validated by a human and used as input for further testing or exploitation.

The two kinds of test differ in price: automatic tests are cheap, manual ones are expensive. Because of that, comparing the two does not make a lot of sense. Below I will be talking about "manual" tests.

A test report will always contain a section with test findings and recommendations. This is, of course, the most important part of the report, because it lists the discovered security problems and the suggestions for fixing them. Another interesting section is the test details or test log. It should describe the actual tests that were performed and their outcome. Not all testers include it in the report. I personally believe that this is the part of the report that best allows you to distinguish between a good job and a lousy one.

So, let's start with findings and recommendations. What indicates a good report:

  • The findings are aggregated. That is, you don't have 10 different findings for missing Windows updates, you have one finding that says "There are a number of missing updates, here is a list of all affected hosts and the missing updates". Why does it make a difference? A vulnerability scanner will output a separate finding for each missing patch. For the customer it is important to know that there are unpatched systems (and which ones they are) and whether anything important is missing. The fact that findings are aggregated indicates that a human processed the results and made some sense out of them.
  • The descriptions and recommendations are specific to your situation. If your web application is written in ASP.NET and the description gives an example in PHP, you can tell it was copy-pasted without much thought.
  • The descriptions give both background information (for example, what SQL injection is in general) and the information specific to your particular case - what exactly is vulnerable and how. The background information is not always necessary - if a tester has been working with this customer for a while, he or she may know that the customer has a good idea of what an SQL injection is, and so does not have to explain it.
  • The severity of the findings takes into account the specifics of the environment. A reflected cross-site scripting flaw may be a very important problem in one case (a single sign-on system) and not particularly important in another (an application that does not store any session-specific information).
  • The description includes a proof of concept exploit or a screenshot or log of successful exploitation. This is not always strictly necessary, but if the tester says that something is really a severe problem, I believe that the tester has to back up this statement with some proof. At the very least, the report has to state whether the tester managed to exploit the problem and to what extent.
  • Again, not strictly necessary, but always a good sign - the description of the way to reproduce a problem. This information can be used by the customer to validate the fixes, or by the testers if a re-test is required. This information may not be included in findings and recommendations, but at least should be present in the test log or test details (see below).
  • The findings include a list of references that give background information about this type of problem and the ways to fix it. This allows a customer to get a better idea of the problem.
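To make the aggregation point concrete, here is a minimal sketch in Python of what "making sense" of raw scanner output might look like. The input format, the field names (`host`, `category`, `detail`) and the patch identifiers are all made up for illustration; real scanners each have their own export formats, and a human still has to decide which of the missing updates actually matter.

```python
from collections import defaultdict

# Hypothetical raw scanner output: one entry per host and missing patch.
# Field names and patch identifiers are invented for this example.
raw_findings = [
    {"host": "10.0.0.5", "category": "missing_update", "detail": "PATCH-EXAMPLE-2"},
    {"host": "10.0.0.5", "category": "missing_update", "detail": "PATCH-EXAMPLE-1"},
    {"host": "10.0.0.9", "category": "missing_update", "detail": "PATCH-EXAMPLE-1"},
    {"host": "10.0.0.9", "category": "default_password", "detail": "admin console"},
]


def aggregate(findings):
    """Collapse per-patch entries into one finding per category.

    Returns {category: {host: sorted list of details}}.
    """
    grouped = defaultdict(lambda: defaultdict(set))
    for finding in findings:
        grouped[finding["category"]][finding["host"]].add(finding["detail"])
    return {
        category: {host: sorted(details) for host, details in sorted(hosts.items())}
        for category, hosts in grouped.items()
    }


def summarize(aggregated):
    """Render one report line per category, listing the affected hosts."""
    lines = []
    for category, hosts in sorted(aggregated.items()):
        lines.append(f"{category}: {len(hosts)} affected host(s)")
        for host, details in hosts.items():
            lines.append(f"  {host}: {', '.join(details)}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(summarize(aggregate(raw_findings)))
```

Four scanner entries become two findings, each with its list of affected hosts, which is roughly the shape a human-processed report would have.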

Now let's talk about the test details section. The first good sign is that it is present in the report. It allows the customer to see what exactly has been done, and also how it has been done. It also allows the tester to do the same if a re-test is required or some claim in the report has to be validated.

A "manual" test can use automated scanners. The test details section should state clearly what software was used, what findings it produced and why they are considered valid or invalid. In the case of a large test (an internal company network, for example) only the valid findings may be mentioned to save time and space.

For any security problem listed in findings and recommendations, the test details section should show the way to detect and/or exploit the vulnerability. If a number of similar problems have been found, it is usually enough to demonstrate the exploit for one case. Let's say there are 200 workstations missing Windows updates: hacking into each one of them may be a waste of time, but showing that you can get a shell on one of them is fine.

I think that proof of concept exploits are important. First of all, actually exploiting a problem may give a tester a better idea of its impact. There may be conditions that make it not exploitable, or that increase or decrease the impact. Secondly, it makes it easier for the customer's security officer, or whoever else commissioned the test, to make a point. The first thing a person who mentions a security problem to the people responsible for it hears is "Yes, but it is not possible, because..." Saying that while looking at the evidence of successful exploitation is slightly more difficult.

The test details section should also include the tests that produced no evidence of vulnerability. This is important for the peace of mind of both the tester and the customer. It gives evidence that the tester was sufficiently thorough. Also, if at some later point a serious vulnerability is discovered that was not mentioned in the test report (a pen-tester's nightmare), the test details section should help explain why it was missed. (For example: yes, we tested it, and at the time of the test the system behaved differently from what we see now - this is what we got then.)

The detailed description of how the vulnerability was identified or exploited is also important for re-tests. A re-test may be performed by the customer themselves, in which case it is very important to know exactly what to do to reproduce the problem. It may be performed by another tester; in that case, too, knowing what was done previously saves time and improves reliability. (The exploit seems to be blocked by the IDS. Did the original tester have to use some trick to get around IDS filtering, or is this how the vulnerability is fixed now?) Even if it is the original tester who does the re-test, having the details helps a lot. Remembering the exact syntax of a particular SQL injection six months later is a lot to ask of human memory.
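As an illustration only, a test log entry that serves both purposes (recording negative results and supporting re-tests) could carry something like the fields below. This is a sketch in Python; the `LogEntry` structure, its field names and the example target URL are my own assumptions, not any standard report format.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class LogEntry:
    """One test step, recorded whether or not it found anything (hypothetical format)."""
    timestamp: datetime   # when the test was run
    target: str           # what was tested
    technique: str        # what kind of test it was
    exact_input: str      # the exact input used, so the step can be repeated
    observed: str         # what the system did at the time
    vulnerable: bool      # the tester's verdict


def format_entry(entry):
    """Render an entry as a single log line."""
    verdict = "VULNERABLE" if entry.vulnerable else "no evidence of vulnerability"
    return (
        f"[{entry.timestamp:%Y-%m-%d %H:%M}Z] {entry.target} | {entry.technique} | "
        f"input: {entry.exact_input!r} | observed: {entry.observed} | {verdict}"
    )


if __name__ == "__main__":
    # Made-up entries against a made-up target.
    log = [
        LogEntry(
            datetime(2020, 1, 2, 10, 15, tzinfo=timezone.utc),
            "https://app.example.test/login",
            "SQL injection, username field",
            "' OR '1'='1",
            "generic 'invalid login' page, no error details",
            False,
        ),
        LogEntry(
            datetime(2020, 1, 2, 10, 40, tzinfo=timezone.utc),
            "https://app.example.test/search",
            "SQL injection, q parameter",
            "x' AND 1=0 --",
            "database error message returned in the response body",
            True,
        ),
    ]
    for entry in log:
        print(format_entry(entry))
```

The first entry is the kind that answers "did you test this?" later on; the second keeps the exact input that nobody will remember at re-test time.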

Another sign of a good report is appendices containing custom scripts and tools. Not every test requires the development of custom test tools. However, if a tester has done that, it usually shows general clue and capability.


Do you have any opinion on the scale used for the severity rating? In other words, would you value a report that uses a common, standard scale (e.g. CVSS) more than a report that uses an arbitrary scale that is only in use at the tester's company? I would personally prefer the former, since the use of a standard scale - no matter how flawed it is in principle - helps the client better integrate the findings into their defect management/tracking system. Wanted to hear your opinion on this...

The intention of severity ratings is to assign the priority and the deadline to the corresponding fix and to decide if the action is needed at all. That's all the severity ratings are for - no more, no less.

The decision on the priority, the deadline and the necessity of a fix will be affected by two factors: the business impact of the problem and how difficult the fix is. But, unfortunately, the pen-tester has a very limited view of either factor.

The business impact of exactly the same vulnerability (e.g. SQL injection) can range from almost negligible (there is nothing in that database that is not otherwise publicly available) to a total disaster. On the other hand, what seems to the pen-tester like a trivial info leak may make the client's hair stand on end and get escalated to the CEO. A pen-tester usually has some understanding of what the tested system is doing in terms of the client's business, but it is nowhere close to complete. So, in the end, only the client can fully estimate the business impact of any particular problem.

The difficulty or the cost of the fix is also something that the pen-tester may have no idea about. What seems like a trivial change in the code (just escape the stupid string!) may actually involve getting a patch from a vendor that has a months-long turnaround on bug reports (while the client's system is three days away from release). It can also involve various political considerations, usability problems (if we tell the users that their passwords must contain non-alphanumeric characters, there is going to be a riot), and other weirdness (see Warranty Void If Password Changed).

So, what am I getting at here? In my opinion, a carefully calculated CVSS score is about as useful as wetting your finger, sticking it in the air, and declaring it "medium severity". The penetration tester just does not have the necessary information to assign a meaningful severity rating. I might even argue that giving a CVSS score is actually misleading, because when you say "it's 7.8 exactly", it seems like you mean it.