The API Penetration Testing Checklist I Actually Use

Most API pentesting guides read like a textbook. They list the OWASP API Top 10, define each item, and stop. That is not how testing works in practice. When I open Burp and look at a target API for the first time, I am not thinking about categories. I am running a specific sequence of checks, in a specific order, because some bugs are cheap to test and some are expensive, and you want the cheap signal first.

This is that sequence. It is the same one I run on every engagement. Steal it, adapt it, criticize it. Whatever makes you a better tester.

Step Zero: Understand What You Are Looking At

Before any active testing, I spend roughly an hour just reading. The goal is to answer:

What does this API do, and who is supposed to call it?
How does authentication work? Cookies, bearer tokens, API keys, mTLS, signed requests?
Is there documentation? Swagger, OpenAPI, GraphQL introspection, Postman collections?
Are there multiple user roles or tenants? Where are the boundaries between them?
What backend technology is in use? Headers, error messages, and timing all leak this.
Are there public clients consuming this API? Mobile apps and SPAs both reveal endpoint shapes.

If documentation is missing, I generate my own. Proxy the official client (web app, mobile app) through Burp and let it walk the API for me. Mobile apps especially are gold because developers often expose more endpoints there, assuming the client is hard to inspect. It is not.
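A minimal sketch of turning captured traffic into an endpoint inventory, assuming the proxied session has been exported as a HAR file and parsed with json.load. The function names and the {id} placeholder are illustrative, not part of any tool mentioned here.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

# Path segments that look like numeric ids or UUIDs are collapsed to a placeholder
ID_SEGMENT = re.compile(
    r"^(\d+|[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})$"
)


def normalize_path(path):
    """Replace id-looking segments so /users/42 and /users/43 group together."""
    return "/".join("{id}" if ID_SEGMENT.match(p) else p for p in path.split("/"))


def endpoint_inventory(har):
    """Map each normalized path to the set of HTTP methods seen for it."""
    inventory = defaultdict(set)
    for entry in har["log"]["entries"]:
        request = entry["request"]
        path = urlsplit(request["url"]).path  # query string is dropped
        inventory[normalize_path(path)].add(request["method"].upper())
    return dict(inventory)
```

The output is a rough map, not documentation. Routes the client never exercised during the capture will not appear in it.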

Authentication

Authentication bugs are usually the highest-impact findings, so they get tested first. If I can become another user or skip authentication entirely, almost nothing else matters.

The checks I run:

Remove the auth header entirely. Does the endpoint respond with data?
Replace the token with garbage. Does it fail differently than no token at all? Different error messages can indicate that parsing happens before verification.
If JWT, run through the full JWT checklist (algorithm confusion, weak secrets, alg none, kid injection).
If API keys, check whether they are scoped, whether they expire, whether they appear in URLs or referer headers.
If OAuth, check the redirect URI validation, the state parameter, PKCE on public clients, and whether the authorization code is single-use.
Try the auth header from a low-privileged user on endpoints that should require admin.
Try the auth header from a different tenant entirely.
Look for endpoints that accept either session cookies or bearer tokens. The validation paths are often different and one may be weaker than the other.
Check password reset and account recovery flows specifically. These are usually outside the main auth path and weaker than the main login.
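The first few checks in this list can be prepared mechanically. The sketch below builds the header sets to replay against one endpoint: the valid token, no header, a garbage token, and, when the token looks like a JWT, a copy with alg set to none and the signature dropped. It assumes bearer tokens in the Authorization header and a well-formed JWT; the function names are hypothetical.

```python
import base64
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the stripped padding."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def alg_none_variant(jwt: str) -> str:
    """Rewrite the JWT header to alg=none and send an empty signature."""
    header_b64, payload_b64, _signature = jwt.split(".")
    header = json.loads(b64url_decode(header_b64))
    header["alg"] = "none"
    new_header = b64url(json.dumps(header, separators=(",", ":")).encode())
    return f"{new_header}.{payload_b64}."


def auth_variants(token: str) -> dict:
    """Return named header sets to replay against a single endpoint."""
    variants = {
        "valid": {"Authorization": f"Bearer {token}"},
        "missing": {},
        "garbage": {"Authorization": "Bearer not-a-real-token"},
    }
    if token.count(".") == 2:  # three dot-separated parts: treat it as a JWT
        forged = alg_none_variant(token)
        variants["alg_none"] = {"Authorization": f"Bearer {forged}"}
    return variants
```

Send each variant to the same endpoint and compare status codes and error bodies side by side. The differences between "missing" and "garbage" are often as informative as an outright bypass.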

Authorization: BOLA and BFLA

Broken Object Level Authorization (BOLA) is the number one bug in modern APIs, and it holds the first slot in the OWASP API Security Top 10. I find it on roughly two out of every three engagements.

The test is simple. Find an endpoint that returns or modifies a specific object by id. Create two test accounts, A and B. Have A create an object. As B, try to access or modify A's object using A's object id.

Variations to try:

Numeric ids that you can simply guess or enumerate.
UUIDs that you might find via other endpoints, search functionality, audit logs, or response headers.
IDs embedded in JWT claims that the server uses directly without re-checking the database.
Bulk endpoints that accept arrays of ids and may not check authorization on every element.
GraphQL queries that traverse from an authorized object to a related unauthorized one through field resolution.
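The two-account test can be sketched as a small comparison function. This assumes a fetch callable that takes a token and an object id and returns the status code and body; the urllib-based helper, the bearer scheme, and the URL layout are assumptions for illustration only.

```python
import urllib.error
import urllib.request


def make_http_fetch(base_url):
    """Build a fetch(token, object_id) callable for GET {base_url}/{object_id}."""
    def fetch(token, object_id):
        request = urllib.request.Request(
            f"{base_url}/{object_id}",
            headers={"Authorization": f"Bearer {token}"},
        )
        try:
            with urllib.request.urlopen(request, timeout=10) as response:
                return response.status, response.read().decode("utf-8", "replace")
        except urllib.error.HTTPError as err:
            # urllib raises on 4xx/5xx; the status is still the signal we want
            return err.code, err.read().decode("utf-8", "replace")
    return fetch


def check_bola(fetch, owner_token, other_token, object_id):
    """Compare what the owner (A) and another user (B) get for A's object."""
    owner_status, owner_body = fetch(owner_token, object_id)
    if owner_status != 200:
        return "inconclusive: owner could not read the object"
    other_status, other_body = fetch(other_token, object_id)
    if other_status == 200 and other_body == owner_body:
        return "vulnerable: other user received the owner's object"
    if other_status == 200:
        return "review: other user got a 200 with a different body"
    return f"ok: other user was refused with {other_status}"
```

The "review" outcome matters. Some APIs return 200 with an empty or filtered body, which needs a human look rather than an automatic verdict.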

Broken Function Level Authorization (BFLA) is the sibling. The endpoint exists, it just should not be callable by your role. Find an admin route, hit it with a non-admin token, see what comes back. Vary the HTTP method too. Many APIs check authorization on GET routes and forget DELETE.

A real finding from a 2025 engagement: an admin user-deletion endpoint accepted requests from any authenticated user, but only when the method was DELETE. The web client never sent DELETE, so the bug had survived multiple internal reviews.
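A method sweep like the one that would have caught that bug is only a few lines. The sketch assumes a send callable that takes a method, a path, and a token and returns the status code; the set of statuses treated as a refusal is a judgment call, not a standard.

```python
METHODS = ("GET", "POST", "PUT", "PATCH", "DELETE")

# Statuses that count as the server saying no. Anything else deserves a look.
REFUSED = {401, 403, 404, 405}


def method_sweep(send, path, low_priv_token, methods=METHODS):
    """Try every method on an admin path with a low-privileged token.

    Only run this against objects created for the test: a successful
    DELETE or PUT really does change state.
    """
    findings = []
    for method in methods:
        status = send(method, path, low_priv_token)
        if status not in REFUSED:
            findings.append((method, status))
    return findings
```

An empty result means every method was refused. Anything in the list, including a 400 or a 500, suggests the request got past the authorization check and is worth replaying by hand.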

Input Validation

Once authentication and authorization are tested, I start tampering with values. The mindset is "what happens if I send something the server did not expect?"

Things to try on every input field:

Type confusion. Send a string where a number is expected, an array where an object is expected, null where a string is expected.
Length. Send an empty string, a string of one character, a string of one million characters.
Encoding. Send lone UTF-16 surrogates, invalid UTF-8 sequences, null bytes, RTL override characters, zero-width spaces.
Injection. SQL, NoSQL, LDAP, command, template, XPath, XML. The injection class depends on the backend, but the test is the same: insert syntax for that language and watch the response.
Path traversal in any field that touches the filesystem.
SSRF in any field that takes a URL, hostname, or even an image reference.
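The type, length, and encoding cases are mechanical enough to generate. A minimal sketch, assuming JSON request bodies; the case labels are arbitrary, and the injection, traversal, and SSRF payloads are deliberately left out because they depend on the backend.

```python
import json


def field_mutations(value):
    """Return labeled replacement values for one JSON field."""
    cases = {
        "null": None,
        "empty_string": "",
        "one_char": "a",
        "long_string": "A" * 1_000_000,
        "wrapped_in_array": [value],
        "wrapped_in_object": {"value": value},
        "null_byte": "a\x00b",
        "rtl_override": "a\u202eb",
        "zero_width_space": "a\u200bb",
        "lone_surrogate": "\ud800",  # serialized by json.dumps as a \ud800 escape
    }
    # Type confusion depends on what the field normally holds
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        cases["number_as_string"] = str(value)
    elif isinstance(value, str):
        cases["string_as_number"] = 0
    return cases


def mutated_bodies(base, field):
    """Yield (label, json_body) pairs, changing one field at a time."""
    for label, mutated in field_mutations(base[field]).items():
        body = dict(base)  # shallow copy so the other fields stay valid
        body[field] = mutated
        yield label, json.dumps(body)
```

Changing one field per request keeps the results readable: when a response differs, there is exactly one candidate cause.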

GraphQL deserves special attention. Send introspection queries even if introspection is supposedly disabled, because partial disablement is common. Try aliasing to bypass rate limits. Test for batching attacks. Send deeply nested queries to find DoS vectors. Look at every mutation and ask whether the resolver re-checks authorization or assumes the parent did.
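Two of those GraphQL checks are easy to script: alias batching and deep nesting. The sketch below only builds the query documents. The login mutation, its arguments, the token field, and the posts/author pair are hypothetical names; substitute whatever the target schema exposes.

```python
import json


def aliased_login_query(username, passwords):
    """Pack many login attempts into one GraphQL document using aliases.

    json.dumps is used to quote the string arguments; JSON string
    escaping is close enough to GraphQL's for typical inputs.
    """
    attempts = [
        f"  attempt{i}: login(username: {json.dumps(username)}, "
        f"password: {json.dumps(password)}) {{ token }}"
        for i, password in enumerate(passwords)
    ]
    return "mutation {\n" + "\n".join(attempts) + "\n}"


def nested_query(field_a, field_b, depth):
    """Alternate two mutually-referencing fields, field_a outermost."""
    query = "id"
    for level in range(depth):
        # Build from the inside out; position 0 is the outermost field
        position = depth - 1 - level
        field = field_a if position % 2 == 0 else field_b
        query = f"{field} {{ {query} }}"
    return f"query {{ {query} }}"
```

If a rate limiter counts HTTP requests, the aliased document is one request carrying many attempts. For the nested query, raise the depth gradually and watch response times instead of starting at a value that could take a production service down.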

Mass Assignment

If an endpoint accepts a JSON object and maps it to a database record, the developer may have forgotten to allowlist which fields are settable. The bug is simple but the impact is often large.

Find a profile or account update endpoint. Look at what fields the official client sends. Then try sending:

"role": "admin"
"is_verified": true
"balance": 999999
"tenant_id": <other tenant>
"created_at": <past date>
"id": <other user id>

The field names depend on the application. Read the response when you GET the same object, and try setting every field you see in the response. Then try the field names that are not in the response but might exist in the database, things like email_verified, is_active, subscription_tier, permissions.
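That procedure can be sketched as two small helpers: one that lists the fields worth trying, and one that confirms which of them actually stuck. Both work on already-parsed JSON dictionaries, and the function names are made up for this example.

```python
def mass_assignment_candidates(client_body, get_response, guesses):
    """List fields the official client never sends but the server might accept.

    Fields visible in the GET response come first, then guessed names
    that appear in neither the client body nor the response.
    """
    sent = set(client_body)
    from_response = [f for f in get_response if f not in sent]
    from_guesses = [
        f for f in guesses if f not in sent and f not in get_response
    ]
    return from_response + from_guesses


def changed_fields(before, after, attempted):
    """Report attempted fields whose stored value now equals what was sent."""
    return [
        field
        for field, value in attempted.items()
        if after.get(field) == value and before.get(field) != value
    ]
```

Fetch the object before and after the update and pass both to changed_fields. A 200 on the update proves nothing on its own, because many frameworks silently drop fields they do not accept.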

I have found banking apps that let users set their own account balance through mass assignment. It happens.

Rate Limiting and Resource Abuse

APIs that rely on the client to enforce limits are common, especially in B2B contexts where the developer thought the consumer would always be trusted.

What to test:

Does the login endpoint have rate limiting at all? Try a hundred wrong passwords and see if anything happens.
Is the rate limit per IP, per token, or per account? IP-based limits fall to anyone with a botnet or a residential proxy service.
Are there expensive operations (report generation, export, bulk import) that can be triggered repeatedly?
Are there endpoints that send emails or SMS messages and can be abused to send to arbitrary recipients?
Are there endpoints that consume third-party API quotas the target pays for?
Can you bypass the rate limit by varying case in the URL, adding query string parameters, or changing the HTTP method?
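The last check in that list lends itself to a small generator. The sketch below produces the three variant classes named above (path case, an extra query parameter, a different method) and replays them once the original request is being throttled. It assumes the limiter answers with 429 and that send returns a status code; the x=1 padding parameter is arbitrary.

```python
from urllib.parse import urlsplit, urlunsplit


def rate_limit_variants(method, url, other_methods=("PUT", "PATCH")):
    """Return (method, url) pairs that a naive limiter may key differently."""
    parts = urlsplit(url)
    upper_path = urlunsplit(parts._replace(path=parts.path.upper()))
    padding = "x=1"
    padded_query = f"{parts.query}&{padding}" if parts.query else padding
    with_param = urlunsplit(parts._replace(query=padded_query))
    variants = [(method, upper_path), (method, with_param)]
    variants += [(m, url) for m in other_methods if m != method]
    return variants


def variants_that_evade(send, method, url):
    """Replay variants after the original request has started returning 429.

    A variant counts only if it is neither throttled nor rejected as an
    unknown route or method; confirm by hand that it reached the handler.
    """
    ignored = {429, 404, 405}
    return [
        (m, u)
        for m, u in rate_limit_variants(method, url)
        if send(m, u) not in ignored
    ]
```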

This category often gets dismissed as low severity. It should not. A login endpoint without rate limiting is an open door to credential stuffing, and credential stuffing is how a huge fraction of real breaches actually happen.

Error Handling and Information Disclosure

Every error message is a free hint. I deliberately trigger errors and read what comes back.

Stack traces with file paths reveal the framework and the application structure.
SQL errors confirm the backend and the parameter that reaches the database.
"User not found" versus "Invalid password" lets you enumerate accounts.
Different response times for valid versus invalid users have the same effect even when the messages are identical.
Verbose error settings from staging environments that got copied to production.
Debug and documentation endpoints (/debug, /actuator, /api-docs, /swagger, /.git) that should not be exposed.
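The timing check is the one item in this list that needs measurement instead of reading. A minimal sketch, assuming an attempt_login callable that performs one login request and returns when the response arrives; the sample count and the use of the median are choices, not requirements.

```python
import statistics
import time


def median_latency(attempt_login, username, samples=15):
    """Median wall-clock time of failed logins for one username."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        attempt_login(username, "definitely-wrong-password")
        timings.append(time.perf_counter() - start)
    # The median is less sensitive to a few slow network outliers than the mean
    return statistics.median(timings)


def timing_gap(attempt_login, known_user, unknown_user, samples=15):
    """Seconds by which a known-valid username is slower than an invalid one."""
    known = median_latency(attempt_login, known_user, samples)
    unknown = median_latency(attempt_login, unknown_user, samples)
    return known - unknown
```

A consistently positive gap usually means the server hashes the password only when the account exists. Repeat the measurement a few times before reporting it, since network jitter can easily produce a gap of a few milliseconds on its own.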

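The Spring Boot actuator family of endpoints is a personal favorite. When exposed and unauthenticated, /actuator/env shows you the configuration properties and where they come from (recent versions mask secret values by default), and /actuator/heapdump hands you a memory snapshot from which the unmasked values, database credentials included, can usually be recovered. I find one of these per quarter, conservatively.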
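Sweeping for these paths takes a few lines. The sketch assumes a get_status callable that issues an unauthenticated GET for a path and returns the status code; the path list is the one from this section, and which statuses count as "not exposed" is a judgment call.

```python
DEBUG_PATHS = (
    "/debug",
    "/actuator",
    "/actuator/env",
    "/actuator/heapdump",
    "/api-docs",
    "/swagger",
    "/.git",
)

# Statuses that mean the path is absent or protected
NOT_EXPOSED = {401, 403, 404}


def exposed_paths(get_status, paths=DEBUG_PATHS):
    """Return (path, status) for every path that was not refused outright."""
    results = []
    for path in paths:
        status = get_status(path)
        if status not in NOT_EXPOSED:
            results.append((path, status))
    return results
```

Redirects are kept in the results on purpose, since a 302 to a login page and a 302 to the real console look the same until you follow them.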

Business Logic

This is the category that automated scanners cannot find, and the category where the highest-impact bugs live.

You have to actually understand what the application is trying to do, then think about every step in the workflow and ask what happens if a step is skipped, repeated, performed in the wrong order, or performed with values just outside the expected range.

Examples I have actually exploited:

A checkout flow that calculated tax on the client side and trusted the value.
A refund endpoint that did not check whether the order had already been refunded.
A coupon code system that did not enforce one-use-per-customer.
A wire transfer endpoint that accepted negative amounts, effectively withdrawing money from the destination account.
A two-factor enrollment flow that, if you opened two browser tabs and submitted the second one, replaced the new TOTP secret with one the attacker had chosen.

None of these were findable with a scanner. All of them were findable by reading the API carefully and asking "what is this assuming, and what happens if I violate the assumption?"
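Code cannot do the thinking here, but it can enumerate the variations so none get forgotten. A minimal sketch that takes a workflow as an ordered list of step names and produces the skipped, repeated, and reordered sequences to try by hand. The step names in the usage note are invented.

```python
def workflow_variants(steps):
    """Return (label, sequence) pairs that each violate the expected order once."""
    variants = []
    for i, step in enumerate(steps):
        # Leave one step out entirely
        variants.append((f"skip {step}", steps[:i] + steps[i + 1:]))
        # Perform one step twice in a row
        variants.append((f"repeat {step}", steps[:i + 1] + steps[i:]))
    for i in range(len(steps) - 1):
        # Swap two neighbouring steps
        swapped = list(steps)
        swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
        variants.append((f"swap {steps[i]} and {steps[i + 1]}", swapped))
    return variants
```

For a flow such as ["add_to_cart", "pay", "refund"], "repeat refund" is exactly the double-refund case from the list above. Out-of-range values are a separate axis and still need to be chosen per field.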

Tooling

For completeness, the tools that actually earn their place in my workflow:

Burp Suite Professional. The Autorize extension is essential for BOLA and BFLA testing at scale.
mitmproxy for mobile app interception when I want a CLI-driven workflow.
ffuf for endpoint discovery.
nuclei for known-vulnerability sweeps and quick wins on third-party components.
jwt_tool for JWT-specific manipulation.
A custom Python script library for the recurring tedious tasks. Build your own. Do not rely on someone else's wrapper for testing primitives you should understand intimately.

Closing Thought

Most pentest reports I read are checklist-driven and miss the interesting findings. The checklist exists to catch the easy stuff so that you have time for the hard stuff. Run through every item above quickly, find what you find, then spend the rest of the engagement actually thinking about the application's logic.

A pentester who only runs the checklist will find the same bugs every other tester finds. A pentester who thinks finds the bug nobody else saw, the one the client actually needed to know about.

Be the second kind.