Types of Penetration Testing: Black, White, and Gray Box

A penetration test is only as useful as the type you commission. Pick the wrong type and you can spend $25,000 to learn what an attacker would have found in week one. Pick the right one and you uncover the bugs an attacker would have used to get into your stack.

This guide compares the main types of penetration testing across two axes:

  • Knowledge level: black box, white box, gray box
  • Target surface: network, web app, mobile, API, cloud, social engineering, physical

By the end you will know which type of penetration testing to ask for, what to share with the testing firm, and how to budget for it.

What is a penetration test?

A penetration test is an authorized, simulated attack against your systems run by security professionals to find exploitable vulnerabilities before real attackers do. The output is a written report ranking findings by severity (critical, high, medium, low, informational), with reproduction steps and remediation guidance.
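As a rough illustration of how a report's findings list is ordered, the sketch below sorts findings by the five severity levels named above. The `Finding` class and the sample titles are hypothetical and are not drawn from any particular firm's report format.

```python
from dataclasses import dataclass

# Severity levels from most to least urgent, as listed in the text.
SEVERITY_ORDER = ["critical", "high", "medium", "low", "informational"]


@dataclass
class Finding:
    title: str      # hypothetical finding name
    severity: str   # must be one of SEVERITY_ORDER


def rank_findings(findings):
    """Return findings sorted from most to least severe (stable sort)."""
    return sorted(findings, key=lambda f: SEVERITY_ORDER.index(f.severity))


# Made-up sample data, for illustration only.
sample = [
    Finding("Verbose server banner", "informational"),
    Finding("SQL injection in login form", "critical"),
    Finding("Missing rate limit on password reset", "medium"),
]
ranked = rank_findings(sample)
for f in ranked:
    print(f"{f.severity:>13}: {f.title}")
```

A real report would also carry reproduction steps and remediation guidance for each entry; those fields are left out here to keep the sketch short.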

Penetration testing is required or strongly recommended by most major compliance frameworks:

  • PCI DSS requires annual external and internal pen tests, plus segmentation testing if segmentation is used to reduce scope.
  • SOC 2 Type 2 reports almost always reference an annual third-party pen test.
  • ISO 27001 Annex A.12.6.1 (control 8.8 in the 2022 edition) requires technical vulnerability management. Most certifying bodies expect at least one annual pen test as supporting evidence.
  • HIPAA does not name pen testing explicitly, but the Security Rule's risk analysis requirement (§164.308(a)(1)) is hard to satisfy without one.

According to Cobalt's 2024 State of Pentesting report, 79% of organizations now run at least one penetration test per year, up from 61% in 2021. A mid-market pen test typically costs between $15,000 and $40,000 depending on scope.

📝 Note
A penetration test is not the same as a vulnerability scan. A scan automatically catalogs known weaknesses. A pen test chains weaknesses together to demonstrate real impact. For the difference in detail, see our pen test vs vulnerability assessment guide.

The three main types of penetration testing by knowledge level

The first axis to choose on is how much information the testing firm gets up front. This decides who the test simulates and what kind of bugs surface.

Black box penetration testing

The testers start with nothing but the target name (a domain, an IP range, an app store listing). They reconnoiter, fingerprint the stack, and attack from outside, the same as an unaffiliated attacker would.

Best for: simulating opportunistic external attackers, validating perimeter defenses, satisfying compliance requirements that reference "external" testing (PCI DSS v4.0 Requirement 11.4.3, formerly 11.3.1).

Strengths: highest realism for outside-in attack chains. Surfaces exposed services, misconfigured cloud assets, leaked credentials, and forgotten subdomains.

Weaknesses: time gets burned on reconnaissance instead of finding deeper bugs. Internal logic flaws and authenticated paths often go untested. A 5-day black-box engagement may only produce 2 to 3 days of true exploitation work.

Typical cost: $8,000 to $25,000 for a single application or small network.

White box penetration testing

The testers receive everything: source code, architecture diagrams, valid credentials at every privilege tier, infrastructure-as-code repos, and engineering contacts. The engagement is collaborative and code-aware.

Best for: finding deep vulnerabilities (insecure deserialization, race conditions, business logic flaws), validating critical applications before launch, satisfying SOC 2 and ISO 27001 evidence needs at the highest assurance level.

Strengths: highest defect yield per dollar. White-box engagements typically surface 3x to 5x more findings than black-box engagements of the same duration, according to NCC Group's 2023 testing data.

Weaknesses: lowest realism. A real attacker rarely walks in with source code. White-box can also miss obvious external misconfigurations because the testers focus on what is interesting, not on what is exposed.

Typical cost: $25,000 to $80,000 for a complex web application.

Gray box penetration testing

A middle path. Testers receive partial information: a low-privilege user account, a high-level architecture diagram, maybe a public API specification. They simulate either an authenticated user with malicious intent or an attacker who has compromised low-tier credentials.

Best for: most modern SaaS applications, where the realistic threat is account takeover or insider abuse rather than pure external recon. This is the most common type of penetration testing commissioned in 2026.

Strengths: balances realism with depth. Surfaces both external bugs and authenticated-flow flaws. Efficient use of the engagement window.

Weaknesses: scoping is the hardest of the three. The exact level of access provided determines what the test actually covers, and ambiguity here leads to disappointing reports.

Typical cost: $12,000 to $35,000 for a single application.

For a sense of where a SaaS company should start, see our web application penetration testing checklist.

The seven main types of penetration testing by target surface

The second axis is the target environment. Most engagements combine 1 to 3 of these surfaces.

1. Network penetration testing

Tests the corporate network from outside (external) or inside (internal). Looks for exposed services, weak firewall rules, default credentials on network appliances, unpatched servers, and lateral movement paths once a foothold is established.

External network tests are the foundation of PCI DSS v4.0 Requirement 11.4.3 (formerly 11.3.1). Internal network tests model what happens after a phishing email lands.

2. Web application penetration testing

The most common type of penetration test commissioned today. Targets a web app, looking for the OWASP Top 10 (injection, broken access control, cryptographic failures, etc.), business logic flaws, and authentication weaknesses.

A typical web app pen test runs 5 to 15 business days depending on application complexity. Expect $10,000 to $40,000.

3. Mobile application penetration testing

Targets iOS or Android apps. Includes static analysis of the binary, dynamic analysis of the running app, transport security testing, and analysis of any backend APIs the app talks to. The OWASP Mobile Top 10 is the standard framework.

Mobile testing tends to surface insecure data storage, weak certificate pinning, and exposed third-party SDKs.

4. API penetration testing

A growing category as SaaS companies expose more REST and GraphQL APIs. Tests against the OWASP API Security Top 10, including broken object-level authorization, broken function-level authorization, mass assignment, and missing rate limiting.
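To make broken object-level authorization concrete, here is a minimal sketch that uses an in-memory stand-in for an API. The invoice store, the user names, and both handler functions are hypothetical. The point is only the shape of the check a tester performs: request an object that belongs to someone else and see whether it comes back.

```python
# Hypothetical in-memory data store: invoice id -> owning user and amount.
INVOICES = {
    101: {"owner": "alice", "amount": 250},
    102: {"owner": "bob", "amount": 990},
}


def get_invoice_vulnerable(requesting_user, invoice_id):
    """Returns any invoice by id and never checks who is asking."""
    return INVOICES.get(invoice_id)


def get_invoice_fixed(requesting_user, invoice_id):
    """Returns the invoice only if the requester owns it."""
    invoice = INVOICES.get(invoice_id)
    if invoice is None or invoice["owner"] != requesting_user:
        return None  # a real API would answer with 403 or 404 here
    return invoice


def leaks_other_users_data(handler, attacker, victim_invoice_id):
    """Tester's check: can `attacker` read an object they do not own?"""
    return handler(attacker, victim_invoice_id) is not None


print(leaks_other_users_data(get_invoice_vulnerable, "alice", 102))
print(leaks_other_users_data(get_invoice_fixed, "alice", 102))
```

Against a live API the same idea is usually exercised with two test accounts, which is one reason gray-box engagements ask for more than one low-privilege login.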

Often bundled with web app testing, but increasingly priced separately for API-first companies.

5. Cloud penetration testing

Targets your AWS, Azure, or GCP environment for misconfigurations, over-privileged IAM roles, exposed storage buckets, and metadata service abuse paths. Critical for any cloud-native organization.

Note: every major cloud provider publishes rules about what can be tested. AWS dropped its prior-approval requirement for most services in 2019, and Azure and GCP likewise do not require prior approval, but all three maintain policies that limit what is permitted. Check each provider's current policy during scoping.
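One of the checks behind "over-privileged IAM roles" can be sketched as a scan of a policy document for Allow statements that grant wildcard actions on every resource. The sketch assumes the standard AWS IAM JSON policy shape (`Statement`, `Effect`, `Action`, `Resource`). The function names and the sample policy are hypothetical, and the check deliberately ignores `NotAction`, `Condition`, and other elements a real review would weigh.

```python
def _as_list(value):
    """IAM allows either a single value or a list in these fields."""
    return value if isinstance(value, list) else [value]


def find_wildcard_statements(policy):
    """Return Allow statements granting wildcard actions on all resources."""
    flagged = []
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        actions = _as_list(stmt.get("Action", []))
        resources = _as_list(stmt.get("Resource", []))
        # "*" means every action; "service:*" means every action in a service.
        broad_action = any(a == "*" or a.endswith(":*") for a in actions)
        if broad_action and "*" in resources:
            flagged.append(stmt)
    return flagged


# Hypothetical policy document, for illustration only.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:GetObject",
         "Resource": "arn:aws:s3:::example-bucket/*"},
        {"Effect": "Allow", "Action": ["iam:*"], "Resource": "*"},
    ],
}
flagged = find_wildcard_statements(policy)
print(len(flagged), "statement(s) flagged")
```

In practice testers lean on purpose-built tooling for this; the sketch only shows what "over-privileged" means at the policy level.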

6. Social engineering penetration testing

Tests humans, not systems. Phishing campaigns, vishing (phone), pretexting, and tailgating. The deliverable is usually a report on click rates, credential capture rates, and which controls (DMARC, anti-phishing tools, security awareness training) actually fired.
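The click and credential capture rates in such a report are simple ratios over the number of emails sent. The sketch below shows the arithmetic; the function name and the campaign numbers are made up for illustration.

```python
def campaign_rates(sent, clicked, submitted_credentials):
    """Return phishing simulation rates as fractions of emails sent."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return {
        "click_rate": clicked / sent,
        "credential_capture_rate": submitted_credentials / sent,
    }


# Hypothetical campaign numbers, for illustration only.
rates = campaign_rates(sent=200, clicked=38, submitted_credentials=12)
for name, value in rates.items():
    print(f"{name}: {value:.1%}")
```

Tracking the same two ratios across successive campaigns is what lets you say whether awareness training is actually moving the numbers.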

The 2024 Verizon DBIR found a human element involved in 68% of breaches. Social engineering testing has moved from optional to standard.

7. Physical penetration testing

Tests physical access controls: badge cloning, lock picking, tailgating into a data center, removing equipment. Less common but still required for some regulated industries (financial services, defense).

For a deeper look at when each fits, see our guide on how often you should run a penetration test.

Specialized types of penetration testing

A few categories that do not fit neatly into the matrix above but are worth knowing.

Wireless penetration testing. Targets Wi-Fi networks for weak authentication, rogue access points, and Bluetooth attack paths. Often paired with physical testing.

IoT and OT penetration testing. Targets industrial control systems, medical devices, or consumer IoT. Specialized firms only. Pricing is significantly higher because the testers need niche hardware and reverse-engineering skills.

Red team engagements. Not technically a "type" of pen test but a different mode entirely. A red team simulates a determined adversary over weeks or months, using whatever combination of network, social, and physical tactics gets them to a defined objective. The deliverable is a story of how the adversary won, not a list of CVEs.

For the difference between red teaming and traditional pen testing, see our pen testing vs red teaming comparison.

How to choose the right type of penetration test

The choice depends on what you are trying to learn and what compliance requires.

If your driver is PCI DSS: Requirement 11.4 (11.3 in v3.2.1) mandates both internal and external network pen tests at least annually, plus segmentation testing if you use segmentation to reduce scope. Black-box external + gray-box internal is the standard combination.

If your driver is SOC 2 Type 2: auditors typically expect one annual pen test of the production application, plus evidence of remediation for high and critical findings. Gray-box web application is the most common choice.

If your driver is ISO 27001: the standard does not prescribe a type, but Annex A.12.6.1 expects evidence of technical vulnerability management. A combined gray-box web application + external network test usually satisfies certifying bodies.

If your driver is HIPAA: the Security Rule §164.308(a)(1) risk analysis is the trigger. Healthcare-specific firms usually run combined network + web app + API tests against the platform handling ePHI.

If your driver is "we have never done one": start with a gray-box web application test against your most business-critical product. It surfaces the highest concentration of fixable bugs per dollar.

For pricing benchmarks, see our penetration test cost breakdown.

Comparison of penetration testing types

Type | Knowledge level | Realism | Defect yield | Typical cost | Best for
--- | --- | --- | --- | --- | ---
Black box | None | Highest | Lowest | $8K - $25K | External attack simulation, PCI DSS 11.4.3
Gray box | Partial | Medium | Medium-high | $12K - $35K | Most SaaS apps, SOC 2, ISO 27001
White box | Full | Lowest | Highest | $25K - $80K | Critical apps pre-launch, deep assurance
Network (external) | Black or gray | High | Medium | $8K - $20K | PCI DSS, perimeter validation
Web application | Usually gray | Medium | High | $10K - $40K | SaaS, e-commerce, customer portals
API | Gray | Medium | High | $8K - $25K | API-first SaaS, fintech
Cloud | Gray or white | Medium | High | $12K - $30K | Cloud-native orgs
Social engineering | Black | High | Medium | $5K - $15K | Phishing resilience, security awareness validation
Physical | Black or gray | High | Medium | $10K - $30K | Data centers, regulated facilities

How to choose a type of penetration testing in 5 steps

A simple decision sequence for landing on the right combination of test types for your engagement:

  1. Identify the trigger. Is this driven by PCI DSS, SOC 2, ISO 27001, HIPAA, customer demand, or a self-initiated review?
  2. Pick the knowledge level. Default to gray box for most modern SaaS apps. Use black box only when external attack simulation is the explicit goal. Use white box only for critical pre-launch reviews.
  3. Pick the target surfaces. Almost every modern engagement scopes web app + API. Add cloud if you are AWS/Azure/GCP-native. Add network if PCI DSS applies.
  4. Decide the testing window. 5 to 10 business days for most apps. Add a buffer for retesting after fixes.
  5. Confirm the deliverables. Written report with severity ratings, executive summary, retest letter, and (ideally) Slack or call access during the engagement.
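Steps 1 to 3 above can be sketched as a small lookup function. This is only an illustration of the defaults stated in the list, not a scoping tool; the function name, the trigger strings, and the `cloud_native` flag are assumptions made for the example.

```python
def recommend_scope(trigger, cloud_native=False):
    """Map a trigger to a knowledge level and target surfaces (steps 1-3)."""
    trigger = trigger.lower()
    # Step 2: gray box is the default; the two exceptions follow the text.
    if trigger == "external attack simulation":
        knowledge = "black box"
    elif trigger == "critical pre-launch review":
        knowledge = "white box"
    else:
        knowledge = "gray box"
    # Step 3: web app + API is the baseline for almost every engagement.
    surfaces = ["web app", "API"]
    if cloud_native:
        surfaces.append("cloud")
    if trigger == "pci dss":
        surfaces.append("network")
    return {"knowledge_level": knowledge, "surfaces": surfaces}


print(recommend_scope("PCI DSS", cloud_native=True))
print(recommend_scope("SOC 2"))
```

Steps 4 and 5, the testing window and the deliverables, are negotiated with the firm and do not reduce to a lookup, so they are left out.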

Common mistakes when selecting a penetration test type

A few patterns that show up consistently in post-engagement debriefs:

  • Buying black-box for a compliance need. PCI DSS does require external testing, but most other frameworks just require "a pen test." Buying black-box to save money on a SOC 2 engagement leaves easier-to-find authenticated bugs unfixed.
  • Scoping out the API surface. Modern apps are 60% to 80% API by traffic volume. A web app pen test that excludes the underlying APIs misses where most of the real bugs live.
  • Skipping the segmentation test. If you use network segmentation to reduce PCI scope, segmentation testing is its own line item. Auditors will ask for it.
  • Treating pen tests as one-shot events. Vulnerabilities reappear. Rerunning the same test after major releases or annually is the floor, not the ceiling. For a deeper take, see our guide on how often you should run a penetration test.
  • Not budgeting for retesting. A finding is not closed until a tester confirms the fix. Most firms include 30 days of retest as standard, but verify before signing.

Frequently asked questions

What is the most common type of penetration testing?

Gray-box web application testing is the most commonly commissioned type, accounting for roughly 45% of all pen tests according to Cobalt's 2024 State of Pentesting report. It hits the sweet spot of realism and defect yield for modern SaaS applications.

How long does a typical penetration test take?

Most engagements run 5 to 15 business days of active testing, plus 5 to 10 days for reporting. A small black-box external network test may take 3 days. A complex white-box engagement on a multi-tenant SaaS platform can run 4 to 6 weeks.

Do small businesses need penetration testing?

If you handle credit cards, protected health information, or any regulated data, yes. If you are a B2B SaaS startup pursuing SOC 2 or ISO 27001, yes. If you are a brick-and-mortar business with no online operations, probably not. See our guide on pen testing for small businesses for the threshold.

What is the difference between penetration testing and ethical hacking?

"Ethical hacking" is the broader skill set; "penetration testing" is the structured, scoped, contracted application of that skill set against a defined target. All penetration testers are ethical hackers, but not all ethical hacking work is penetration testing.

Should I use an internal team or hire an external pen tester?

For compliance evidence, almost always external. SOC 2, PCI DSS, and ISO 27001 auditors expect independence. Internal red teams are valuable for continuous testing between annual external engagements, not as replacements.

What credentials should a penetration tester have?

Look for OSCP (Offensive Security Certified Professional), OSWE (web), OSEP (advanced), CRTO (Certified Red Team Operator), or PNPT (Practical Network Penetration Tester). For wireless, OSWP. CISSP alone is not a pen testing credential; it is a management certification.

Can a penetration test break my production environment?

Reputable testers minimize risk to production, but a small possibility of disruption always exists. Most engagements run against a production-equivalent staging environment, with a separate, low-risk pass against production. Discuss the testing window and rollback plan during scoping.

Bottom line

The two main families of penetration test types, knowledge level (black, white, gray) and target surface (network, web app, mobile, API, cloud, social engineering, physical), are not mutually exclusive. They combine. The right answer for most SaaS businesses in 2026 is a gray-box web application test plus an external network test, run annually, with a social engineering component layered in if security awareness is on the agenda.

The wrong answer is the cheapest test that ticks the compliance box. According to IBM's 2024 Cost of a Data Breach Report, the average breach now costs $4.88 million, and the average time to identify and contain it is 258 days. A $20,000 pen test that surfaces three exploitable bugs before an attacker does is one of the highest-ROI security investments available.
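A back-of-the-envelope way to read those two numbers: divide the test price by the breach cost to get the reduction in breach probability at which the test pays for itself in expected-value terms. This is a deliberately simplistic model, and it assumes the IBM average applies to your organization, which it may not.

```python
AVERAGE_BREACH_COST = 4_880_000  # USD, IBM 2024 figure cited above
PEN_TEST_COST = 20_000           # USD, example price from the text

# Break-even point under a simple expected-value assumption:
# the test pays off if it lowers breach probability by at least cost / loss.
break_even_reduction = PEN_TEST_COST / AVERAGE_BREACH_COST
print(f"Break-even reduction in breach probability: {break_even_reduction:.2%}")
```

Under these assumptions the test breaks even if it lowers the chance of an average-cost breach by well under one percentage point.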

When evaluating types of penetration testing, match the type to the question you are trying to answer. Document the methodology. Retest after remediation. The compliance and security gains compound.

For the foundational guide, see our penetration testing complete business guide. For methodology references, see OWASP's testing guide.

Security Compliance Guide Editorial Team
Author
Security Compliance Guide Editorial Team covers topics in this category and related fields. Views expressed are editorial and based on research and experience.