Types of Penetration Testing: Black, White, and Gray Box

Types of Penetration Testing: Black, White, and Gray Box

Types of Penetration Testing: Black, White, and Gray Box

TL;DR

  • Penetration tests divide first by knowledge level (black box, white box, gray box) and second by target surface (network, web app, mobile, API, cloud, social engineering, physical).
  • Black box gives testers no prior information and best simulates an external opportunistic attacker. White box gives testers full access to source code and architecture, maximizing defect yield. Gray box sits in between and is the most commonly commissioned type.
  • PCI DSS Requirement 11.4 mandates both internal and external network pen tests annually. SOC 2 and ISO 27001 do not name a type but expect evidence of at least one annual third-party assessment.
  • Red team engagements are not a variety of pen test. They use pen test techniques over weeks or months against a defined objective, guided by frameworks such as MITRE ATT&CK.
  • OWASP publishes three free testing guides that define scope and technique for the most common surfaces: WSTG v4.2 for web apps, MASVS / MASTG for mobile, and API Security Top 10 (2023) for APIs.

Who this is for: Security engineers preparing for a pen test engagement, compliance managers scoping annual testing programs under PCI DSS, SOC 2, ISO 27001, or HIPAA, and startup CTOs running their first third-party assessment. The guide assumes you understand what a penetration test is but want to know how to choose the right type.


What a penetration test actually produces

A penetration test is an authorized, scoped, simulated attack against your systems, run by external security professionals to find exploitable vulnerabilities before real attackers do. The output is a written report that ranks findings by severity (critical, high, medium, low, informational) with reproduction steps and remediation guidance.

A pen test is not a vulnerability scan. A scan automatically catalogs known weaknesses against signature databases. A pen test chains those weaknesses together to demonstrate real-world impact: a tester might combine an exposed admin panel with a weak credential to show full database access, not just note that the panel exists. For the distinction in depth, see our pen test vs vulnerability assessment guide.

NIST SP 800-115 ("Technical Guide to Information Security Testing and Assessment," 2008) is the foundational published US government reference on methodology. It defines four phases common to all pen test types: planning, discovery, attack, and reporting. Note that NIST has signaled a revision is forthcoming; check the NIST CSRC page for the current draft status.


The three knowledge-level types of penetration testing

Illustration related to The three knowledge-level types of penetration testing
Photo by Keyla Brito

The first choice is how much information you give the testing firm before the engagement begins. This determines whose behavior the test simulates and which classes of bugs surface.

Black box penetration testing

Testers start with nothing but the target identifier: a domain, an IP range, or an app store listing. They perform open-source intelligence gathering, fingerprint the stack, and attack from the outside, the same way an unaffiliated external attacker would.

Where it fits: Simulating opportunistic external attackers. Validating perimeter defenses. Satisfying compliance requirements that specify "external" testing (PCI DSS Requirement 11.4.1 requires testing from outside the network perimeter).

What it finds: Exposed services, misconfigured cloud assets, leaked credentials in public repositories, forgotten subdomains, and authentication bypass on internet-facing assets.

What it misses: Reconnaissance consumes engagement time instead of exploitation work. Authenticated application paths, internal lateral movement chains, and business logic flaws often go untested within a standard engagement window.

White box penetration testing

Testers receive full access: source code, architecture diagrams, valid credentials at every privilege level, infrastructure-as-code repositories, and direct access to engineering contacts for questions. The engagement is collaborative.

Where it fits: Finding deep application vulnerabilities (insecure deserialization, race conditions, business logic flaws, cryptographic implementation errors) before launch. Building the highest-assurance evidence package for SOC 2 or ISO 27001 reviews.

What it finds: Logic bugs that are invisible from the outside. Code-level issues that no amount of external probing would surface within a realistic timeframe.

What it misses: External misconfigurations that are obvious to an outside attacker. Testers focus on what the code does, not on what the deployed surface looks like to someone with no context.

Gray box penetration testing

Testers receive partial information: a low-privilege user account, a high-level architecture summary, and sometimes a public API specification. They simulate an authenticated user acting maliciously, or an attacker who has already compromised low-tier credentials.

Where it fits: Most modern SaaS applications, where the realistic threat is account takeover or privilege escalation by an authenticated user, not purely an external port scan. Gray box is the most commonly commissioned type for this reason.

What it finds: Both perimeter bugs and authenticated-flow issues — broken access controls, insecure direct object references, privilege escalation paths, and session management weaknesses.

Where scope discipline matters: The exact access level provided determines what the test actually covers. An engagement that provides admin credentials instead of user-level credentials produces a fundamentally different test. Define this precisely in the statement of work.


The seven target surfaces

The second choice is what the testers are attacking. Most engagements combine two or three surfaces; few cover all seven.

1. Network penetration testing

Tests the corporate network from outside (external network test) or inside (internal network test). An external test looks for exposed services, misconfigured firewall rules, and unpatched internet-facing servers. An internal test models post-phishing lateral movement: once a foothold exists inside the perimeter, how far can an attacker go?

PCI DSS Requirement 11.4 mandates both external and internal network pen tests at least annually, plus segmentation testing if you use network segmentation to reduce cardholder data environment scope. This is the one compliance requirement that names the test type explicitly.

2. Web application penetration testing

The most frequently commissioned surface. Testers evaluate the application against the 12 testing domains defined in OWASP WSTG v4.2: information gathering, configuration management, identity management, authentication, authorization, session management, input validation, error handling, cryptography, business logic, client-side security, and API security.

The OWASP Top 10 (currently the 2021 edition, with the 2025 edition in development) provides the widely accepted short-form summary of the most critical web application risks.

Engagement length scales with application complexity. A small marketing site differs fundamentally from a multi-tenant SaaS platform with role-based access control across thousands of customer accounts.

3. Mobile application penetration testing

Targets iOS or Android applications. Testing covers static binary analysis, dynamic analysis of the running app, transport security (certificate pinning, TLS configuration), and any backend APIs the app communicates with.

The OWASP Mobile Application Security Testing Guide (MASTG) is the standard methodology reference for mobile pen testing. It maps to the OWASP Mobile Application Security Verification Standard (MASVS), which defines the security requirements that mobile apps should meet.

Common findings include insecure local data storage, weak certificate validation, hardcoded credentials in the binary, and exposed third-party SDK integrations.

4. API penetration testing

APIs now carry the majority of application traffic in modern SaaS architectures. The OWASP API Security Top 10 (2023) defines the test scope: broken object-level authorization, broken authentication, broken object property level authorization, unrestricted resource consumption, broken function-level authorization, unrestricted access to sensitive business flows, server-side request forgery (SSRF), security misconfiguration, improper inventory management, and unsafe consumption of third-party APIs.

API testing is increasingly priced as a standalone engagement rather than bundled with web app testing, because the attack surface is distinct: GraphQL introspection abuse, mass assignment through JSON property manipulation, and rate limiting failures are invisible to a tester focused on browser-delivered HTML.

5. Cloud penetration testing

Tests the configuration of AWS, Azure, or GCP environments for misconfigured IAM roles, publicly readable storage buckets, overly permissive security groups, and metadata service abuse paths (the SSRF-to-instance-metadata attack pattern that has appeared in several high-profile cloud breaches).

Cloud pen testing operates within provider-specific rules of engagement. AWS publishes its Penetration Testing Policy directly; testing is permitted for your own resources without prior approval for most services. Azure and GCP have their own policies. Review the relevant policy before scoping to avoid accidental terms-of-service violations.

6. Social engineering penetration testing

Tests the human layer rather than technical systems. Standard deliverables include phishing simulations (email-based), vishing (phone-based pretexting), and tailgating (physical access through deception). The report covers credential capture rates, click rates, and which controls actually triggered (DMARC rejection, endpoint alerts, helpdesk escalations).

The 2024 Verizon Data Breach Investigations Report found that the human element was involved in 68% of breaches. Social engineering testing moves that finding from an abstract statistic into a measured result for your specific organization and workforce.

7. Physical penetration testing

Tests physical access controls: badge cloning, lock picking, tailgating into server rooms, unauthorized equipment removal, and dumpster diving for sensitive documents. Less common than technical testing, but required for some regulated environments (financial services data centers, defense contractor facilities, healthcare sites with PHI stored on physical media).


Specialized categories worth knowing

Wireless penetration testing. Targets Wi-Fi networks for weak authentication protocols (WEP, WPA-Personal with weak passphrases), rogue access point detection, and Bluetooth attack paths. Often paired with physical testing when the assessment covers a physical facility.

IoT and OT penetration testing. Targets industrial control systems, medical devices, or consumer IoT hardware. Requires specialized firms with hardware reverse-engineering capability and familiarity with protocols like Modbus, DNP3, and BACnet. Pricing and engagement complexity are significantly higher than standard IT testing.

Red team engagements. A red team is not a type of pen test. It is a different operational model: a small adversary simulation team works against a defined high-value objective (exfiltrate a specific dataset, achieve persistence on a domain controller) over weeks or months, using whatever combination of network exploitation, social engineering, and physical access works. The deliverable is a narrative of how the adversary succeeded or was stopped, mapped to the MITRE ATT&CK Enterprise Matrix tactics: initial access, execution, persistence, privilege escalation, defense evasion, credential access, discovery, lateral movement, collection, exfiltration, and impact.

For the distinction between red teaming and traditional pen testing, see our pen testing vs red teaming comparison.


Compliance-driven decision table

Illustration related to Compliance-driven decision table
Photo by Mikhail Nilov

The right test type depends heavily on which framework is driving the requirement.

Compliance driverWhat is requiredRecommended combination
PCI DSS (Req. 11.4)Annual internal + external network pen test; segmentation test if applicableBlack box external + gray box internal network
SOC 2 Type 2No type specified; auditors expect evidence of annual third-party assessment and remediation of high/critical findingsGray box web application + API
ISO 27001 (Annex A 8.8)Technical vulnerability management; certifying bodies expect at least one annual pen test as supporting evidenceGray box web application + external network
HIPAA (§164.308(a)(1))Risk analysis covering technical safeguards for ePHI; no pen test type namedNetwork + web app + API covering all ePHI-touching systems
First pen test, no specific driverNo requirement; choose based on highest-risk surfaceGray box web application against the most business-critical product

HIPAA's Security Rule does not name penetration testing, but satisfying the risk analysis requirement at §164.308(a)(1) without technical testing is difficult to defend under HHS enforcement review.


Comparison table

Type Knowledge level Realism Defect yield Best for
Black box None Highest Lowest per day External attack simulation, PCI DSS 11.4.1
Gray box Partial Medium-high Medium-high SaaS apps, SOC 2, ISO 27001, HIPAA
White box Full Lowest Highest per day Critical pre-launch reviews, deep assurance
Network (external) Black or gray High Medium PCI DSS, perimeter validation
Web application Usually gray Medium High SaaS, e-commerce, customer portals
API Gray Medium High API-first SaaS, fintech
Cloud Gray or white Medium High Cloud-native orgs (AWS, Azure, GCP)
Social engineering Black High Medium Phishing resilience, security awareness validation
Physical Black or gray High Medium Data centers, regulated facilities


How to scope an engagement in five decisions

  1. Identify the trigger. PCI DSS, SOC 2, ISO 27001, HIPAA, a customer security questionnaire, or a self-initiated review each imply different surfaces and frequencies.
  1. Set the knowledge level. Default to gray box for most production SaaS applications. Use black box when the explicit goal is validating what an external attacker sees. Use white box only for high-assurance reviews of critical or pre-launch systems.
  1. Choose the target surfaces. Web app plus API covers most modern SaaS stacks. Add external network if PCI DSS applies or if you have significant internet-facing infrastructure. Add cloud if you are AWS, Azure, or GCP-native with complex IAM configurations.
  1. Agree on the testing window. Most engagements run five to fifteen business days of active testing, followed by a reporting phase. Define this in the statement of work with a clear retesting window after remediation.
  1. Confirm deliverables before signing. Written report with severity ratings, an executive summary, reproduction steps, and a retest letter for closed findings. Verify whether the retesting pass is included or priced separately.

Common scoping mistakes

Illustration related to Common scoping mistakes
Photo by Marta Nogueira

Treating black box as the compliance default. PCI DSS does require external testing (black or gray box), but most other frameworks specify only "a pen test." Choosing black box for a SOC 2 engagement because it appears less expensive leaves authenticated-flow vulnerabilities untested.

Excluding the API surface from a web app test. Modern SaaS applications route a large share of traffic through REST or GraphQL APIs that operate independently of the browser-rendered UI. A web app engagement that excludes underlying APIs misses the surface where broken object-level authorization and mass assignment vulnerabilities typically live, as defined in the OWASP API Security Top 10 (2023).

Skipping segmentation testing under PCI DSS. If you use network segmentation to reduce your cardholder data environment scope, PCI DSS Requirement 11.4.5 requires segmentation testing to confirm the isolation actually holds. This is a separate line item from the standard network pen test.

No retesting budget. A finding is not remediated until a tester confirms the fix works. Most reputable firms include 30 days of retesting as standard; verify this before signing the engagement letter. A high-severity finding without a retest letter is not a closed finding for auditor purposes.

Running the same test annually without updating scope. The attack surface changes with every major release, new integration, and infrastructure migration. The test scope should reflect the current state of the production environment, not last year's architecture diagram.


Mini-FAQ

What is the most common type of penetration testing?

Gray box web application testing is the most commonly commissioned type for modern software companies. It balances realism (the test simulates an authenticated attacker or account-takeover scenario) with defect yield (the tester can reach authenticated endpoints and role-based access paths that a pure black box engagement would never reach in a standard window).

Do I need external testers or can my internal team do the pen test?

For compliance evidence under SOC 2, PCI DSS, and ISO 27001, auditors expect independence from the team that built and operates the system. A third-party firm with no existing access to your environment provides that independence. Internal red teams and security engineers add value for continuous testing between annual external engagements, not as replacements for third-party assessments.

How long does a penetration test take?

Active testing typically runs five to fifteen business days depending on surface complexity, followed by five to ten business days for reporting. Add a retesting window of two to four weeks after the initial report is delivered, to allow for remediation and verification. Plan the full cycle at eight to twelve weeks from contract to final retest letter.

What credentials should a penetration tester hold?

Look for OSCP (Offensive Security Certified Professional) for network and infrastructure testing, OSWE for web application testing, CRTO (Certified Red Team Operator) for red team engagements, or PNPT (Practical Network Penetration Tester). For mobile application testing, look for OMAS (OffSec Mobile Application Security) or practitioners who demonstrate competency against the OWASP MASTG standard. CISSP alone is a management and governance credential, not a hands-on pen testing qualification.

Can a penetration test disrupt production systems?

A small risk exists with any active testing. Reputable firms minimize production exposure by running destructive or denial-of-service tests against staging environments and limiting active exploitation of production systems to low-impact confirmation of vulnerabilities. Discuss the testing window, rollback procedures, and specific out-of-scope actions (no destructive payloads on production databases, for example) during scoping.

How does MITRE ATT&CK relate to penetration testing?

MITRE ATT&CK is a publicly maintained knowledge base of adversary tactics and techniques observed in real-world attacks. Pen testers and red teams use it to structure engagements — mapping their techniques to ATT&CK tactic categories (initial access, persistence, lateral movement, etc.) — and to help clients measure which ATT&CK techniques their defenses can detect versus miss. It does not replace a methodology standard like NIST SP 800-115 but complements it.


For the foundational guide, see our penetration testing complete business guide. For a cost breakdown by surface type, see our penetration test cost guide.


Sources

  1. NIST, "Technical Guide to Information Security Testing and Assessment (SP 800-115)," September 2008. https://csrc.nist.gov/publications/detail/sp/800-115/final — accessed 2026-05-12.
  2. OWASP, "Web Security Testing Guide v4.2," December 2020. https://owasp.org/www-project-web-security-testing-guide/ — accessed 2026-05-12.
  3. OWASP, "API Security Top 10 2023." https://owasp.org/www-project-api-security/ — accessed 2026-05-12.
  4. OWASP MAS, "Mobile Application Security Testing Guide (MASTG)." https://mas.owasp.org/ — accessed 2026-05-12.
  5. MITRE, "ATT&CK Enterprise Matrix." https://attack.mitre.org/matrices/enterprise/ — accessed 2026-05-12.
  6. PCI Security Standards Council, "PCI DSS v4.0.1 Requirements 11.4 and 11.4.5." https://www.pcisecuritystandards.org/document_library/ — accessed 2026-05-12.
  7. Verizon, "2024 Data Breach Investigations Report." https://www.verizon.com/business/resources/reports/dbir/ — accessed 2026-05-12.
  8. AWS, "Penetration Testing Policy." https://aws.amazon.com/security/penetration-testing/ — accessed 2026-05-12.

Last reviewed: 2026-05-12. This article was prepared by the Security Compliance Guide Editorial Team. We use AI to draft initial summaries of publicly available cybersecurity compliance documentation, then verify every claim against primary sources before publication. We are not licensed auditors, attorneys, or compliance consultants. For binding decisions, consult a qualified professional. See our editorial standards for full sourcing rules.

Security Compliance Guide Editorial Team
Security Compliance Guide Editorial Team
Author
Security Compliance Guide Editorial Team covers topics in this category and related fields. Views expressed are editorial and based on research and experience.