When 'scan completed successfully' means nothing
- Edgar Bellot Micó
- Data-driven scans
- March 18, 2026
A security scan just finished. The report shows some vulnerabilities. The security checkbox gets marked. The deployment proceeds. But here’s what nobody asked: how much was actually tested?
It doesn’t matter whether that scan was done with a DAST tool, an AI-assisted pentest, or a manual pentest. The gap is the same: “scan completed” and “target tested” are not the same thing.
Maybe that gap sounds familiar. This post is the first in a series where metrics, not reports, guide the path to scan quality.
This series doesn’t focus on findings. It focuses on measurement. Specifically: how to tell if any dynamic security scan actually tested your target (API, web application, mobile app…) or just went through the motions. The data below will show what that looks like in practice, why it matters, and why your current scans might be giving you a false sense of security.
The setup: two APIs, one scanner
To show what good measurement looks like versus guessing, two APIs will be scanned throughout this series:
- crAPI: OWASP’s intentionally vulnerable API. It’s designed to be exploited, which makes it perfect for tracking how security findings evolve as scan quality improves.
- EnterpriseAPI: A FastAPI application with Pydantic validation and complex business logic. This one reflects what you’d typically find in a real enterprise environment: strict input validation, interconnected endpoints, and workflows that require specific data to function.
Both APIs will be scanned using the same OpenAPI specification file in every test. This ensures that all experiments are comparable.
The scanner used throughout this series is ZAP, running fully automated DAST scans. DAST was chosen deliberately: unlike manual or AI-assisted approaches, an automated scan is deterministic - same configuration, same payloads, same target, same results. That consistency is what makes it possible to measure the impact of each improvement across iterations.
These targets are both APIs, but the principles demonstrated throughout this series apply equally to web applications, mobile apps, or any other system that processes requests. The measurement framework, the failure modes, and the quality signals are the same regardless of asset type or testing method.
The goal is straightforward: demonstrate that high-quality dynamic security scans with real coverage are achievable. To get there, stop blindly trusting the final report and start measuring what actually happened during the scan.
The baseline configuration
For this first experiment, the configuration was kept deliberately minimal:
- ZAP’s default API scan policy with its out-of-the-box settings
- A Replacer rule to inject the authentication token into the Authorization header
- Slightly optimized Active Scan settings for API testing
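For reference, the Replacer rule above can be expressed as ZAP configuration properties. A sketch, using the key names from ZAP's Replacer add-on; `<TOKEN>` is a placeholder for the real value:

```
# Loaded at startup, e.g.: zap.sh -daemon -configfile replacer.prop
replacer.full_list(0).description=inject-auth-token
replacer.full_list(0).enabled=true
replacer.full_list(0).matchtype=REQ_HEADER
replacer.full_list(0).matchstr=Authorization
replacer.full_list(0).regex=false
replacer.full_list(0).replacement=Bearer <TOKEN>
```

With this rule active, every request ZAP sends gets the Authorization header rewritten, so authenticated endpoints don't all fail with 401s.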
What the metrics tell us
So what separates a scan that worked from one that just looked good? Once any dynamic security scan completes, looking at the final report isn’t enough. Understanding what actually happened requires digging into the data. These metrics apply universally:
- Endpoints/URLs — Total endpoints/URLs in scope
- Scanned — Endpoints/URLs that returned at least one successful response (labeled “Reached app logic” in the charts). This is a critical distinction: a request counts only if it passed validation and actually executed code inside the application — not just if a request was sent and rejected.
- Failing — Endpoints that always return HTTP 4XX/5XX errors
- Unscanned — Endpoints in scope that were never tested
- Hitting rate limits — Endpoints returning rate limit responses
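These buckets can be computed straight from the scan traffic. A minimal sketch, assuming a hypothetical log that maps each endpoint to the HTTP status codes observed for it; an endpoint counts as scanned once any request gets a non-error response back:

```python
def classify(in_scope, observed):
    """Bucket in-scope endpoints by what the scan traffic shows.

    observed maps endpoint -> list of HTTP status codes seen for it.
    scanned / failing / unscanned partition the scope; rate_limited
    overlaps them, since an endpoint can be scanned AND throttled.
    """
    scanned = [ep for ep in in_scope if any(c < 400 for c in observed.get(ep, []))]
    failing = [ep for ep in in_scope
               if observed.get(ep) and all(c >= 400 for c in observed[ep])]
    unscanned = [ep for ep in in_scope if not observed.get(ep)]
    rate_limited = [ep for ep in in_scope if 429 in observed.get(ep, [])]
    return scanned, failing, unscanned, rate_limited

log = {
    "/login": [200, 200, 429],   # reached app logic, then throttled
    "/orders": [422, 422, 400],  # every request rejected by validation
}
s, f, u, r = classify(["/login", "/orders", "/admin"], log)
print(s, f, u, r)
```

The point of the overlap is deliberate: "/login" shows up as both scanned and rate-limited, which is exactly the combination the rate-limit metric is there to surface.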
Why does “Failing” matter? If an endpoint returns errors on every single request, no valid request ever reached the application logic. That’s not a scanned endpoint; it’s a configuration problem hiding in plain sight.
Rate limits are a silent killer of scan quality. When the scanner encounters rate limiting, it backs off and moves to other targets. A single rate limit response can cause the scanner to abandon an endpoint before it’s been tested thoroughly. Across dozens of endpoints, this creates massive coverage gaps. While rate limits are essential in production, in a testing environment designed for security validation they create blind spots where vulnerabilities can hide.
Coverage results
The “Effective coverage” metric tells the real story. It’s the percentage of endpoints where valid requests actually reached the application logic. In crAPI, more than half the attack surface was tested. In EnterpriseAPI, only one endpoint in six was.
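The metric itself is simple arithmetic. A sketch using the EnterpriseAPI baseline figures from this post (5 endpoints reached app logic out of 32 in scope):

```python
def effective_coverage(reached_app_logic: int, in_scope: int) -> float:
    """Percent of in-scope endpoints where a valid request reached app logic."""
    return 100.0 * reached_app_logic / in_scope

# EnterpriseAPI baseline: 5 of 32 in-scope endpoints reached app logic
pct = effective_coverage(5, 32)
print(f"{pct:.1f}%")  # 15.6% - roughly one endpoint in six
```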
The unscanned endpoints in EnterpriseAPI are marked as “failing” on every request. From the scanner’s perspective, they’re invisible.
Now, here’s another detail worth studying: EnterpriseAPI has a rate limit of 40 requests per minute on all endpoints. For this baseline test, rate limiting was intentionally left enabled to measure how many endpoints would be silently skipped because of it. A thorough DAST scan should inject hundreds of payloads per endpoint within seconds, easily triggering the rate limit. The result: only 1 of the 5 scanned endpoints hit the rate limit. This reveals two problems. First, the scan barely tested those endpoints. Second, and more importantly, it signals that scan depth itself needs improvement. If payloads were being tested exhaustively, rate limits would trigger across most scanned endpoints. They didn’t.
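The arithmetic behind that claim can be sketched with illustrative numbers (not measured figures from these scans):

```python
def trips_rate_limit(payloads: int, window_seconds: float,
                     limit_per_minute: int = 40) -> bool:
    """True if sending `payloads` requests within `window_seconds`
    exceeds a per-minute rate limit like EnterpriseAPI's 40 req/min."""
    requests_per_minute = payloads * 60.0 / window_seconds
    return requests_per_minute > limit_per_minute

# A thorough scan: hundreds of payloads within seconds
print(trips_rate_limit(300, 10))  # True - 1800 req/min against a 40 req/min cap
# A shallow scan: a handful of probes spread over a minute stays under the cap
print(trips_rate_limit(30, 60))   # False
```

If most scanned endpoints stay under a 40 req/min cap, the scan simply isn't sending enough payloads per endpoint, which is exactly what the single rate-limit hit above reveals.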
Info
In a real testing environment, rate limits should be disabled during security testing. A test environment should be configured to allow complete and correct security testing. Other impediments that should be removed include WAF rules that block payloads with injection patterns. The testing environment should be sterile, isolated, and configured for testing. Production defenses belong in production, not in testing.
Vulnerability findings
The contrast is stark. crAPI, designed to be vulnerable, shows 31 findings across 10 different vulnerability types. EnterpriseAPI shows just 2 findings, both informational or low severity.
Is EnterpriseAPI more secure than crAPI? Maybe. But in this case, the scanner simply couldn’t test it properly.
Why the difference?
This is where it gets interesting. The gap between crAPI and EnterpriseAPI comes down to two factors, and understanding them explains most dynamic security testing failures in production:
1. Test data quality
crAPI’s OpenAPI file came pre-configured with example test data values for every parameter. When the scanner builds requests, it uses those examples to create valid payloads. The requests work, the endpoints respond, and the scan can do its job. The same principle applies to a manual pentest: a tester who has valid test accounts and example payloads will reach more application logic than one working blind.
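For illustration, here is the kind of spec fragment involved. This is a hypothetical OpenAPI 3.0 parameter, not copied from crAPI's actual file; the point is the `example` value, which gives the scanner something that passes validation:

```
parameters:
  - name: vehicleId
    in: path
    required: true
    schema:
      type: string
      format: uuid
    example: "4bae9968-ec7f-4de3-a3a0-ba1b2ab5e5e5"
```

Without that `example`, the scanner has to invent a value for `vehicleId`, and a made-up placeholder rarely survives server-side validation.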
EnterpriseAPI’s OpenAPI file had no examples. Pydantic’s strict validation rejected almost everything before it even reached the application logic. The 5 endpoints that got scanned? Those are the ones with no parameters at all.
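To see why, here is a minimal sketch of what happens when a scanner sends placeholder values at a strictly validated endpoint. The model is hypothetical (not EnterpriseAPI's real schema) and assumes pydantic v2:

```python
from pydantic import BaseModel, Field, ValidationError

class CreateOrder(BaseModel):
    # The kind of strict schema an enterprise API typically enforces
    customer_id: int = Field(gt=0)
    sku: str = Field(pattern=r"^[A-Z]{3}-\d{4}$")
    quantity: int = Field(ge=1, le=100)

# Without examples in the spec, a scanner falls back to generic placeholders
payload = {"customer_id": "test", "sku": "test", "quantity": "test"}
reached_app_logic, rejected_fields = True, 0
try:
    CreateOrder(**payload)
except ValidationError as exc:
    reached_app_logic = False
    rejected_fields = len(exc.errors())

print(reached_app_logic, rejected_fields)  # False 3: rejected before any app code runs
```

Every field fails validation, so the request dies at the framework boundary. From the scanner's point of view that's just another 422, yet from a coverage point of view the application was never tested at all.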
2. Business logic complexity
crAPI has straightforward endpoints. Send the right data, get a response. EnterpriseAPI has interdependencies. Some endpoints require data from previous calls, certain fields need specific formats, and relationships between objects must be valid.
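A toy simulation of that interdependency (hypothetical endpoints, not EnterpriseAPI's real ones): fuzzing the second endpoint in isolation can never succeed, because it needs an id minted by the first.

```python
_customers = set()

def post_customer() -> dict:
    """Hypothetical POST /customers: mints an id that later calls depend on."""
    new_id = len(_customers) + 1
    _customers.add(new_id)
    return {"status": 201, "id": new_id}

def post_order(customer_id) -> dict:
    """Hypothetical POST /orders: rejects any id that was never created."""
    if customer_id not in _customers:
        return {"status": 422}  # scanner payloads die here
    return {"status": 201}

# A scanner fuzzing /orders blindly never reaches app logic...
blind = post_order("test")
# ...while a workflow-aware client chains the calls and gets through
cid = post_customer()["id"]
chained = post_order(cid)
print(blind["status"], chained["status"])  # 422 201
```

Scanning each endpoint independently, as a baseline configuration does, is equivalent to the blind call: every `/orders` request fails regardless of how good the payload data is.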
This is the reality of enterprise APIs. They’re not designed to be easily scanned. They’re designed to be robust against invalid input. That robustness, ironically, makes them resistant to security testing too.
What this baseline teaches us
These metrics reveal something uncomfortable: a completed scan with a vulnerability report tells you almost nothing about scan quality. It doesn’t matter if the scan was a DAST tool, an AI-automated pentest, or a human tester - the same blind spots exist, and they’re invisible without measurement. Here’s what the data actually shows:
Coverage isn’t binary. “Scan completed” doesn’t mean “target tested.” In EnterpriseAPI, 84% of endpoints failed on every request. That’s not a scan. That’s a configuration problem - one that would affect any testing approach that relied on the same broken test data.
Failing endpoints are invisible risks. If you’re only looking at the final report, you’d never know that 27 endpoints in EnterpriseAPI weren’t tested at all. Those 27 endpoints could contain critical vulnerabilities that will never appear in any report.
Correct test data is non-negotiable. Input validation, strict schemas, complex business logic: for testing to work, whatever performs the test needs to send data that passes those validations. Without proper test data, no scanner, AI agent, or human tester can reach the application logic to test it.
The uncomfortable question
This baseline shows where the starting point is: two APIs, one with decent coverage and one that’s barely tested at all. Both scans completed successfully. Both generated reports. Both got approved. But the data tells a completely different story.
Which one is your production API more like?
In the next post in this series, the focus shifts to the most impactful change in automated scanning: improving how valid requests are built.
The deeper question: if your test environment silently blocks 84% of your API from being tested, what is your security assessment actually telling you? And if you haven’t measured coverage metrics this way yet, how confident are you that your production environment isn’t doing the same thing?