Methodology: sources, math, and limitations
Every dollar figure on CodeSmellCost.com traces to a named source. This page is the source list, the per-component formula, the limitations, and the refresh cadence. If a number on the site does not check out against this methodology, email [email protected] and we will correct it.
The methodology rests on three layers: the canonical software-engineering references (Fowler, Martin, Beck, Feathers, McConnell, Ousterhout) for vocabulary and pattern catalog; peer-reviewed empirical research (Bavota, Bird, Rahman, Khomh) for the smell-to-defect correlation evidence; and named industry research (Stripe, DORA, Pluralsight, GitClear) for the productivity-cost translation that makes the dollar figures defensible to a finance team.
| Source | Type | Refresh | What we take |
|---|---|---|---|
| Fowler, Refactoring (2nd ed., 2018, Addison-Wesley) | Canonical reference | Static (book) | The 22-smell taxonomy, the refactoring catalog, and the prose definitions. Every per-smell page on the site cites the relevant chapter or page reference. |
| Robert C. Martin, Clean Code (2008, Prentice Hall) | Canonical reference | Static (book) | Naming discipline, function-length guidance, and the Single Responsibility Principle. Cited alongside Fowler on the home page and /the-22-smells, with the explicit caveat that some Clean Code guidance is contested by Ousterhout. |
| Kent Beck, Test-Driven Development: By Example (2002, Addison-Wesley) | Canonical reference | Static (book) | TDD as a precondition for cheap refactoring. Cited in /refactoring-roi as the answer to the CFO question 'how do you refactor safely without breaking everything?'. |
| Michael Feathers, Working Effectively with Legacy Code (2004, Prentice Hall) | Canonical reference | Static (book) | Seams, characterisation tests, and the legacy-code refactoring playbook. Cited in /case-studies and /refactoring-roi for the practical 'how do you start refactoring a smell-dense codebase?' question. |
| Steve McConnell, Code Complete (2nd ed., 2004, Microsoft Press) | Canonical reference | Static (book) | Empirical defect-density data on function length, complexity, and naming. Cited on /pr-review-time and /cost-model for the cognitive-complexity to defect-rate translation. |
| John Ousterhout, A Philosophy of Software Design (2018, Yaknyam Press) | Counterpoint reference | Static (book) | The structural counterpoint to Clean Code's small-function gospel. Deep modules over shallow ones. Cited prominently on /duplicate-code (the DRY counterpoint) and /when-smells-are-ok. |
| Bavota et al., ICSE 2015: 'Are Code Smells Harmful?' | Peer-reviewed research | Static (published) | Empirical evidence that smells emerge under schedule pressure and immediately increase fault-proneness. Cited on /bug-rate-correlations as one of the three primary defect-density references. |
| Bird et al., FSE 2011: 'Don't Touch My Code!' | Peer-reviewed research | Static (published) | Fragmented ownership doubles or triples defect rates. Cited on /bug-rate-correlations and /cost-model for the incident-cost-attribution component. |
| Rahman 2025 meta-study | Peer-reviewed research | Static (published) | Effect sizes across 28 primary studies. God Class rho=0.38, Feature Envy rho=0.31, Duplicate Code rho=0.27. Cited prominently across all per-smell pages and /bug-rate-correlations. |
| Khomh et al. 2012: 'An Exploratory Study of the Impact of Code Smells on Software Change-proneness' | Peer-reviewed research | Static (published) | Change-proneness correlation with smell density, especially Shotgun Surgery and Divergent Change. Cited on /the-22-smells and the per-smell change-preventer pages. |
| Stripe Developer Coefficient Report (2018, updated 2023) | Industry research | Periodic | The macro number: $85 billion per year lost globally to technical debt and bad code (16 percent of developer time). Used as the upper-bound sanity check for our per-team cost ranges. Our team-of-8 estimates summed across smells fall within the Stripe 15-25 percent range. |
| DORA State of DevOps 2024 | Industry research | Annual | The performance-tier benchmarks (elite, high, medium, low) and the 10-25 percent rework-percentage band for low-performers vs under 5 percent for elites. Cited on the home page card grid, /refactoring-roi, /pr-review-time, and /cost-model. |
| Pluralsight State of Developer Onboarding 2024 | Industry research | Annual | Time-to-first-PR data: 2-4 weeks on clean codebases vs 6-12 weeks on smell-dense ones. The 4-8 week gap, at fully-loaded cost, drives the onboarding component of our cost model. Cited on /onboarding-drag and /cost-model. |
| Cisco / SmartBear, Best Kept Secrets of Peer Code Review (2007) | Industry research | Static (published) | Reviewer effectiveness drops after 60 minutes and 400 LOC. Cited on /pr-review-time and /calculator for the PR review drag formula. |
| GitClear State of the Code 2024 | Industry research | Annual | AI-assistant copy-paste rate research. Cited on /duplicate-code with the explicit caveat that the AI-assistant-era duplication-rate observation is still being validated. |
| SonarSource, SonarQube documentation | Vendor reference | Continuous | The CodeSmell taxonomy reference (Sonar's internal categorisation), Cognitive Complexity definition (Campbell, SonarSource white paper 2018), and the rule set used by most teams in production. Cited on /tools and per-smell detection sections. |
| Software Engineering Institute (CMU SEI), Technical Debt research program | Academic / institutional | Periodic | The standard taxonomy of technical-debt items and the cost-of-deferral framing. Cited on /refactoring-roi and /cost-model. |
| IEEE SWEBOK (Software Engineering Body of Knowledge, v3) | Academic / institutional | Periodic | The discipline-level definitions of software maintainability, quality attributes, and measurement. Cited on /cost-model and /methodology. |
| Empirical Software Engineering journal (Springer) | Peer-reviewed venue | Continuous | Source venue for Khomh 2012 and several Bavota / Palomba follow-up studies. Treated as the gold-standard publication venue alongside ICSE / FSE / ASE. |
| BLS Occupational Employment and Wage Statistics (Software Developers) | Public statistics | Annual | Anchor for fully-loaded cost defaults. US median software engineer wage approximately $130K base; fully-loaded cost (benefits, equipment, overhead) approximately $180K-$200K. Used as the calculator default. |
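The Stripe macro number works as a sanity check: summed per-smell estimates for a team of 8 should land inside 15-25 percent of fully-loaded payroll. A minimal sketch of that check, assuming the team-of-8 baseline and the $180K US anchor described below (the function name is illustrative, not site code):

```python
# Sanity check: summed per-smell costs should land inside the Stripe-derived
# 15-25 percent band of fully-loaded payroll (team-of-8, $180K US anchor).
TEAM_SIZE = 8
FULLY_LOADED = 180_000  # USD per engineer per year

payroll = TEAM_SIZE * FULLY_LOADED                    # $1.44M
band_low, band_high = 0.15 * payroll, 0.25 * payroll  # $216K .. $360K

def within_stripe_band(total_smell_cost: float) -> bool:
    """True if a summed per-smell estimate passes the macro sanity check."""
    return band_low <= total_smell_cost <= band_high
```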
What the site covers
- Dollar cost ranges per code smell, derived from the peer-reviewed and industry-research sources above.
- Refactoring ROI math (NPV, risk-weighted incident cost, velocity reclaim, retention math) in CFO language.
- Empirical research synthesis on smell-to-defect correlations (Rahman 2025, Bavota 2015, Bird 2011).
- Named-incident case studies with sourced dollar figures (Knight Capital, Therac-25, plus three positive examples).
- Detection-tool reference for the eight tools that matter in 2026 (SonarCloud, CodeScene, Sourcegraph, GHAS, CodeClimate, ESLint, PMD, Semgrep).
- Counter-narrative coverage (Ousterhout, when not to refactor, prototype and expiring-code exceptions).
- Executive-committee deliverables (memo template, NPV calculator, prioritisation framework, decision tree).
What the site does not cover
- Vendor-pricing pages (no /sonarqube-pricing, /jetbrains-pricing, /codescene-pricing, /sourcegraph-pricing). Vendor pricing changes frequently, and the brand-pricing SEO surface carries trust-ceiling and Lanham Act exposure.
- Specific compliance-grade certifications or audit frameworks (ISO 25010, CISQ, etc.). Adjacent territory but a different audience.
- Language-specific tutorial content (how to write a class in Java, how to use TypeScript generics). Refactoring patterns are language-agnostic at the conceptual level; per-language deep-dives live elsewhere.
- Framework-level architecture rewriting patterns (microservices vs monolith, event-driven vs RPC). Architectural smells are a distinct surface from code smells.
- Performance-only tuning that does not change structure. Cache eviction, query optimisation, and similar work is performance engineering, not refactoring.
- Vendor-preference rankings. The /tools page covers eight tools by what they detect; we do not rank them with affiliate-driven scores or compute a 'best tool' synthetic verdict.
Team-of-8 baseline
All per-smell dollar bands on the site assume a team of 8 engineers. Scale linearly for other team sizes: a team of 12 multiplies the figure by 1.5x, a team of 4 by 0.5x. The team-of-8 baseline matches the median engineering-team size in DORA, Pluralsight, and GitClear survey populations.
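The linear scaling rule above can be sketched as a one-line helper (an illustrative sketch, not the site's calculator code):

```python
BASELINE_TEAM = 8  # all published per-smell bands assume this team size

def scale_for_team(baseline_cost: float, team_size: int) -> float:
    """Scale a team-of-8 dollar band linearly to another team size."""
    return baseline_cost * (team_size / BASELINE_TEAM)
```

A team of 12 gets 1.5x the published band; a team of 4 gets 0.5x.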
Fully-loaded cost
US anchor: $180K-$200K per engineer fully loaded (base + benefits + equipment + overhead). Derived from the BLS OEWS median wage ($130K) plus a 1.4-1.5x loaded multiplier consistent with Stripe and DORA survey populations. The calculator allows an override; per-smell badges use $180K.
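The anchor derivation is just the BLS median times the loaded multiplier; the raw product ($182K-$195K) is rounded outward to the published $180K-$200K band. A minimal sketch:

```python
BLS_MEDIAN_BASE = 130_000       # BLS OEWS median base wage, USD/year
LOADED_MULTIPLIER = (1.4, 1.5)  # benefits + equipment + overhead

anchor_low = BLS_MEDIAN_BASE * LOADED_MULTIPLIER[0]   # $182K
anchor_high = BLS_MEDIAN_BASE * LOADED_MULTIPLIER[1]  # $195K
```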
PR review drag
Smell-dense code is reviewed approximately 2.3x slower (CodeScene Code Health research). Formula: team_size x prs_per_week x review_hours x smell_overhead_percent x 52 x hourly_rate. Smell overhead percent defaults to 30 percent (Cisco / SmartBear 2007 reviewer-effectiveness research bounds it at 20-40 percent).
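The PR review drag formula, as a runnable sketch. Only the 30 percent smell overhead is a stated site default; the PR count, review hours, and hourly rate below are illustrative assumptions (the hourly rate is derived from the $180K anchor over a 52-week, 40-hour schedule):

```python
def pr_review_drag(team_size: int, prs_per_week: float, review_hours: float,
                   smell_overhead: float, hourly_rate: float) -> float:
    """Annual dollar cost of smell-driven review slowdown:
    team_size x prs_per_week x review_hours x smell_overhead x 52 x hourly_rate.
    """
    return team_size * prs_per_week * review_hours * smell_overhead * 52 * hourly_rate

# Illustrative inputs: 8 engineers, 3 PRs/engineer/week, 1 review hour per PR.
HOURLY_RATE = 180_000 / (52 * 40)  # ~$86.54/hr from the fully-loaded anchor
annual = pr_review_drag(8, 3, 1.0, 0.30, HOURLY_RATE)  # $32,400/year
```

Note that the 52 weeks cancel against the weekly-hours denominator, so the result reduces to 8 x 3 x 0.30 x ($180K / 40) = $32,400.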
Defect-density translation
Rahman 2025 rho values map to expected-value incident cost. God Class rho=0.38, combined with the Bird 2011 ownership effect, translates to smell-dense modules carrying approximately 1.5-3x the defect rate of clean ones. Multiply by base incident frequency, average incident hours, and responding-engineer count to get the annual expected-value cost contribution.
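The prose above does not fully pin down the arithmetic, so the sketch below is one plausible reading: charge only the excess incidents over a clean-codebase baseline (multiplier minus 1). The function name and example inputs are illustrative assumptions, not site defaults:

```python
def incident_cost(base_incidents_per_year: float, defect_multiplier: float,
                  avg_incident_hours: float, responders: int,
                  hourly_rate: float) -> float:
    """Annual expected-value cost of elevated defect density.

    defect_multiplier is the 1.5-3x smell-dense-vs-clean defect-rate ratio;
    only the excess over the clean baseline (multiplier - 1) is charged.
    """
    extra_incidents = base_incidents_per_year * (defect_multiplier - 1)
    return extra_incidents * avg_incident_hours * responders * hourly_rate
```

At a 2x multiplier, a module that would otherwise cause 10 incidents a year contributes 10 extra incidents' worth of response cost.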
Onboarding drag
Pluralsight 2024: 2-4 weeks to first PR on clean, 6-12 weeks on smell-dense. Use the 4-8 week delta. Cost: hires_per_year x extra_ramp_weeks x (fully_loaded_cost / 52). For a 3-hire-per-year team at the US anchor, onboarding drag alone is approximately $42K-$84K per year before any other smell cost.
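The onboarding-drag arithmetic from the paragraph above, as a sketch (reproducing the stated 3-hire, $180K-anchor example):

```python
def onboarding_drag(hires_per_year: int, extra_ramp_weeks: float,
                    fully_loaded_cost: float) -> float:
    """hires_per_year x extra_ramp_weeks x weekly fully-loaded cost."""
    return hires_per_year * extra_ramp_weeks * (fully_loaded_cost / 52)

# 3 hires/year at the $180K US anchor, 4-8 extra ramp weeks per hire:
low = onboarding_drag(3, 4, 180_000)   # ~$41.5K/year
high = onboarding_drag(3, 8, 180_000)  # ~$83.1K/year
```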
Severity multiplier
Three bands: cosmetic (0.6x), structural (1.0x baseline), critical (2.5x). Calibrated against Mantyla taxonomy. God Class is the canonical critical smell; Lazy Class is the canonical cosmetic; the bulk of the catalog sits in the structural band. See /cost-model for the full multiplier table.
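The three-band multiplier table, as a lookup sketch (the mapping mirrors the bands above; the function name is illustrative):

```python
SEVERITY_MULTIPLIER = {
    "cosmetic": 0.6,    # e.g. Lazy Class
    "structural": 1.0,  # baseline; the bulk of the catalog
    "critical": 2.5,    # e.g. God Class
}

def apply_severity(base_annual_cost: float, band: str) -> float:
    """Scale a structural-band (1.0x) cost estimate by its severity band."""
    return base_annual_cost * SEVERITY_MULTIPLIER[band]
```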
Refresh cadence
Numbers, source URLs, and tool pricing are re-verified in the first business week of each month. LAST_VERIFIED_DATE in src/lib/schema.ts rolls forward when the pass finishes. The layout footer date, every per-page Article schema dateModified, the badge at the top of /about and /methodology, and every lastVerified page-shell prop all read from that single constant, so one change rolls the entire site's freshness signal forward.
Out-of-cycle refresh triggers:
- Vendor pricing change announcement (SonarSource, CodeScene, Sourcegraph, JetBrains).
- New peer-reviewed publication with effect-size data on a top-5 smell (we watch ICSE, FSE, ASE, ICSME, and Empirical Software Engineering).
- DORA State of DevOps annual release (typically Q3 each year).
- Stripe Developer Coefficient Report refresh.
- Substantive editorial correction submitted via [email protected].
Limitations
- Effect sizes from Rahman 2025 (rho=0.27-0.38 for the top smells) are correlations, not causation. The empirical literature is honest about this; we are honest about it on /bug-rate-correlations and /faq.
- The team-of-8 baseline normalises across team sizes; an actual team of 12 should scale the figures by 1.5x, a team of 4 by 0.5x. The calculator on /calculator handles this; the per-smell badge figures assume the baseline.
- Fully-loaded cost of $180K is a US anchor. International teams should adjust. The calculator allows the override; per-smell page badges use the US anchor.
- Tool pricing on /tools is verified monthly, but vendor pages drift between cycles. Prices are 'as of LAST_VERIFIED_LABEL' rather than continuously refreshed.
- Open-source-study selection bias applies to several of the cited empirical studies (Bavota 2015, Khomh 2012). Proprietary-codebase smell distributions and effect sizes may differ; we note this on /bug-rate-correlations.
Corrections
Found a number that does not check out, a citation that does not resolve, or a smell missing from the catalog? Email [email protected]. Corrections turn around within 5 business days. The corrected version ships with the next monthly pass and rolls LAST_VERIFIED_DATE forward.