Methodology: sources, math, and limitations
Every dollar figure on CodeSmellCost.com traces to a named source. This page is the source list, the per-component formula, the limitations, and the refresh cadence. If a number on the site does not check out against this methodology, email [email protected] and we will correct it.
The methodology rests on three layers: the canonical software-engineering references (Fowler, Martin, Beck, Feathers, McConnell, Ousterhout) for vocabulary and pattern catalog; peer-reviewed empirical research (Bavota, Bird, Rahman, Khomh) for the smell-to-defect correlation evidence; and named industry research (Stripe, DORA, Pluralsight, GitClear) for the productivity-cost translation that makes the dollar figures defensible to a finance team.
| Source | Type | Refresh | What we take |
|---|---|---|---|
| Fowler, Refactoring (2nd ed., 2018, Addison-Wesley) | Canonical reference | Static (book) | The 22-smell taxonomy, the refactoring catalog, and the prose definitions. Every per-smell page on the site cites the relevant chapter or page reference. |
| Robert C. Martin, Clean Code (2008, Prentice Hall) | Canonical reference | Static (book) | Naming discipline, function-length guidance, and the Single Responsibility Principle. Cited alongside Fowler on the home page and /the-22-smells, with the explicit caveat that some Clean Code guidance is contested by Ousterhout. |
| Kent Beck, Test-Driven Development: By Example (2002, Addison-Wesley) | Canonical reference | Static (book) | TDD as a precondition for cheap refactoring. Cited in /refactoring-roi as the answer to the CFO question 'how do you refactor safely without breaking everything?'. |
| Michael Feathers, Working Effectively with Legacy Code (2004, Prentice Hall) | Canonical reference | Static (book) | Seams, characterisation tests, and the legacy-code refactoring playbook. Cited in /case-studies and /refactoring-roi for the practical 'how do you start refactoring a smell-dense codebase?' question. |
| Steve McConnell, Code Complete (2nd ed., 2004, Microsoft Press) | Canonical reference | Static (book) | Empirical defect-density data on function length, complexity, and naming. Cited on /pr-review-time and /cost-model for the cognitive-complexity to defect-rate translation. |
| John Ousterhout, A Philosophy of Software Design (2018, Yaknyam Press) | Counterpoint reference | Static (book) | The structural counterpoint to Clean Code's small-function gospel. Deep modules over shallow ones. Cited prominently on /duplicate-code (the DRY counterpoint) and /when-smells-are-ok. |
| Bavota et al., ICSE 2015: 'Are Code Smells Harmful?' | Peer-reviewed research | Static (published) | Empirical evidence that smells emerge under schedule pressure and immediately increase fault-proneness. Cited on /bug-rate-correlations as one of the three primary defect-density references. |
| Bird et al., FSE 2011: 'Don't Touch My Code!' | Peer-reviewed research | Static (published) | Fragmented ownership doubles or triples defect rates. Cited on /bug-rate-correlations and /cost-model for the incident-cost-attribution component. |
| Rahman 2025 meta-study | Peer-reviewed research | Static (published) | Effect sizes across 28 primary studies. God Class rho=0.38, Feature Envy rho=0.31, Duplicate Code rho=0.27. Cited prominently across all per-smell pages and /bug-rate-correlations. |
| Khomh et al. 2012: 'An Exploratory Study of the Impact of Code Smells on Software Change-proneness' | Peer-reviewed research | Static (published) | Change-proneness correlation with smell density, especially Shotgun Surgery and Divergent Change. Cited on /the-22-smells and the per-smell change-preventer pages. |
| Stripe Developer Coefficient Report (2018, updated 2023) | Industry research | Periodic | The macro number: $85 billion per year lost globally to technical debt and bad code (16 percent of developer time). Used as the upper-bound sanity check for our per-team cost ranges. Our team-of-8 estimates summed across smells fall within the Stripe 15-25 percent range. |
| DORA State of DevOps 2024 | Industry research | Annual | The performance-tier benchmarks (elite, high, medium, low) and the 10-25 percent rework-percentage band for low-performers vs under 5 percent for elites. Cited on the home page card grid, /refactoring-roi, /pr-review-time, and /cost-model. |
| Pluralsight State of Developer Onboarding 2024 | Industry research | Annual | Time-to-first-PR data: 2-4 weeks on clean codebases vs 6-12 weeks on smell-dense ones. The 4-8 week gap, at fully-loaded cost, drives the onboarding component of our cost model. Cited on /onboarding-drag and /cost-model. |
| Cisco / SmartBear, Best Kept Secrets of Peer Code Review (2007) | Industry research | Static (published) | Reviewer effectiveness drops after 60 minutes and 400 LOC. Cited on /pr-review-time and /calculator for the PR review drag formula. |
| GitClear State of the Code 2024 | Industry research | Annual | AI-assistant copy-paste rate research. Cited on /duplicate-code with the explicit caveat that the AI-assistant-era duplication-rate observation is still being validated. |
| SonarSource, SonarQube documentation | Vendor reference | Continuous | The CodeSmell taxonomy reference (Sonar's internal categorisation), Cognitive Complexity definition (Campbell, SonarSource white paper 2018), and the rule set used by most teams in production. Cited on /tools and per-smell detection sections. |
| Software Engineering Institute (CMU SEI), Technical Debt research program | Academic / institutional | Periodic | The standard taxonomy of technical-debt items and the cost-of-deferral framing. Cited on /refactoring-roi and /cost-model. |
| IEEE SWEBOK (Software Engineering Body of Knowledge, v3) | Academic / institutional | Periodic | The discipline-level definitions of software maintainability, quality attributes, and measurement. Cited on /cost-model and /methodology. |
| Empirical Software Engineering journal (Springer) | Peer-reviewed venue | Continuous | Source venue for Khomh 2012 and several Bavota / Palomba follow-up studies. Treated as the gold-standard publication venue alongside ICSE / FSE / ASE. |
| BLS Occupational Employment and Wage Statistics (Software Developers) | Public statistics | Annual | Anchor for fully-loaded cost defaults. US median software engineer wage approximately $130K base; fully-loaded cost (benefits, equipment, overhead) approximately $180K-$200K. Used as the calculator default. |
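The Stripe macro number works as a sanity check: summed per-smell estimates for a team of 8 should land inside 15-25 percent of fully-loaded payroll. A minimal sketch of that check, assuming the team-of-8 baseline and the $180K US anchor described below (the function name is illustrative, not site code):

```python
# Sanity check: summed per-smell costs should land inside the Stripe-derived
# 15-25 percent band of fully-loaded payroll (team-of-8, $180K US anchor).
TEAM_SIZE = 8
FULLY_LOADED = 180_000  # USD per engineer per year

payroll = TEAM_SIZE * FULLY_LOADED                    # $1.44M
band_low, band_high = 0.15 * payroll, 0.25 * payroll  # $216K .. $360K

def within_stripe_band(total_smell_cost: float) -> bool:
    """True if a summed per-smell estimate passes the macro sanity check."""
    return band_low <= total_smell_cost <= band_high
```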
What the site covers
- Dollar cost ranges per code smell, derived from the peer-reviewed and industry-research sources above.
- Refactoring ROI math (NPV, risk-weighted incident cost, velocity reclaim, retention math) in CFO language.
- Empirical research synthesis on smell-to-defect correlations (Rahman 2025, Bavota 2015, Bird 2011).
- Named-incident case studies with sourced dollar figures (Knight Capital, Therac-25, plus three positive examples).
- Detection-tool reference for the eight tools that matter in 2026 (SonarCloud, CodeScene, Sourcegraph, GHAS, CodeClimate, ESLint, PMD, Semgrep).
- Counter-narrative coverage (Ousterhout, when not to refactor, prototype and expiring-code exceptions).
- Executive-committee deliverables (memo template, NPV calculator, prioritisation framework, decision tree).
What the site does not cover
- Vendor-pricing pages (no /sonarqube-pricing, /jetbrains-pricing, /codescene-pricing, /sourcegraph-pricing). Vendor pricing changes frequently, and the brand-pricing SEO surface carries trust-ceiling and Lanham Act exposure.
- Specific compliance-grade certifications or audit frameworks (ISO 25010, CISQ, etc.). Adjacent territory but a different audience.
- Language-specific tutorial content (how to write a class in Java, how to use TypeScript generics). Refactoring patterns are language-agnostic at the conceptual level; per-language deep-dives live elsewhere.
- Framework-level architecture rewriting patterns (microservices vs monolith, event-driven vs RPC). Architectural smells are a distinct surface from code smells.
- Performance-only tuning that does not change structure. Cache eviction, query optimisation, and similar work is performance engineering, not refactoring.
- Vendor-preference rankings. The /tools page covers eight tools by what they detect; we do not rank them with affiliate-driven scores or compute a 'best tool' synthetic verdict.
Team-of-8 baseline
All per-smell dollar bands on the site assume a team of 8 engineers. Scale linearly for other team sizes: a team of 12 multiplies the figure by 1.5x, a team of 4 by 0.5x. The team-of-8 baseline matches the median engineering-team size in DORA, Pluralsight, and GitClear survey populations.
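The linear scaling rule above can be sketched as a one-line helper (an illustrative sketch, not the site's calculator code):

```python
BASELINE_TEAM = 8  # all published per-smell bands assume this team size

def scale_for_team(baseline_cost: float, team_size: int) -> float:
    """Scale a team-of-8 dollar band linearly to another team size."""
    return baseline_cost * (team_size / BASELINE_TEAM)
```

A team of 12 gets 1.5x the published band; a team of 4 gets 0.5x.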
Fully-loaded cost
US anchor: $180K-$200K per engineer fully loaded (base + benefits + equipment + overhead). Derived from the BLS OEWS median wage ($130K) plus a 1.4-1.5x loaded multiplier consistent with Stripe and DORA survey populations. The calculator allows an override; per-smell badges use $180K.
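The anchor derivation is just the BLS median times the loaded multiplier; the raw product ($182K-$195K) is rounded outward to the published $180K-$200K band. A minimal sketch:

```python
BLS_MEDIAN_BASE = 130_000       # BLS OEWS median base wage, USD/year
LOADED_MULTIPLIER = (1.4, 1.5)  # benefits + equipment + overhead

anchor_low = BLS_MEDIAN_BASE * LOADED_MULTIPLIER[0]   # $182K
anchor_high = BLS_MEDIAN_BASE * LOADED_MULTIPLIER[1]  # $195K
```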
PR review drag
Smell-dense code is reviewed approximately 2.3x slower (CodeScene Code Health research). Formula: team_size x prs_per_week x review_hours x smell_overhead_percent x 52 x hourly_rate. Smell overhead percent defaults to 30 percent (Cisco / SmartBear 2007 reviewer-effectiveness research bounds it at 20-40 percent).
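The PR review drag formula, as a runnable sketch. Only the 30 percent smell overhead is a stated site default; the PR count, review hours, and hourly rate below are illustrative assumptions (the hourly rate is derived from the $180K anchor over a 52-week, 40-hour schedule):

```python
def pr_review_drag(team_size: int, prs_per_week: float, review_hours: float,
                   smell_overhead: float, hourly_rate: float) -> float:
    """Annual dollar cost of smell-driven review slowdown:
    team_size x prs_per_week x review_hours x smell_overhead x 52 x hourly_rate.
    """
    return team_size * prs_per_week * review_hours * smell_overhead * 52 * hourly_rate

# Illustrative inputs: 8 engineers, 3 PRs/engineer/week, 1 review hour per PR.
HOURLY_RATE = 180_000 / (52 * 40)  # ~$86.54/hr from the fully-loaded anchor
annual = pr_review_drag(8, 3, 1.0, 0.30, HOURLY_RATE)  # $32,400/year
```

Note that the 52 weeks cancel against the weekly-hours denominator, so the result reduces to 8 x 3 x 0.30 x ($180K / 40) = $32,400.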
Defect-density translation
Rahman 2025 rho values map to expected-value incident cost. God Class rho=0.38, combined with the Bird 2011 ownership effect, translates to smell-dense modules carrying approximately 1.5-3x the defect rate of clean ones. Multiply by base incident frequency, average incident hours, and responding-engineer count to get the annual expected-value cost contribution.
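The prose above does not fully pin down the arithmetic, so the sketch below is one plausible reading: charge only the excess incidents over a clean-codebase baseline (multiplier minus 1). The function name and example inputs are illustrative assumptions, not site defaults:

```python
def incident_cost(base_incidents_per_year: float, defect_multiplier: float,
                  avg_incident_hours: float, responders: int,
                  hourly_rate: float) -> float:
    """Annual expected-value cost of elevated defect density.

    defect_multiplier is the 1.5-3x smell-dense-vs-clean defect-rate ratio;
    only the excess over the clean baseline (multiplier - 1) is charged.
    """
    extra_incidents = base_incidents_per_year * (defect_multiplier - 1)
    return extra_incidents * avg_incident_hours * responders * hourly_rate
```

At a 2x multiplier, a module that would otherwise cause 10 incidents a year contributes 10 extra incidents' worth of response cost.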
Onboarding drag
Pluralsight 2024: 2-4 weeks to first PR on clean, 6-12 weeks on smell-dense. Use the 4-8 week delta. Cost: hires_per_year x extra_ramp_weeks x (fully_loaded_cost / 52). For a 3-hire-per-year team at the US anchor, onboarding drag alone is approximately $42K-$84K per year before any other smell cost.
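The onboarding-drag arithmetic from the paragraph above, as a sketch (reproducing the stated 3-hire, $180K-anchor example):

```python
def onboarding_drag(hires_per_year: int, extra_ramp_weeks: float,
                    fully_loaded_cost: float) -> float:
    """hires_per_year x extra_ramp_weeks x weekly fully-loaded cost."""
    return hires_per_year * extra_ramp_weeks * (fully_loaded_cost / 52)

# 3 hires/year at the $180K US anchor, 4-8 extra ramp weeks per hire:
low = onboarding_drag(3, 4, 180_000)   # ~$41.5K/year
high = onboarding_drag(3, 8, 180_000)  # ~$83.1K/year
```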
Severity multiplier
Three bands: cosmetic (0.6x), structural (1.0x baseline), critical (2.5x). Calibrated against Mantyla taxonomy. God Class is the canonical critical smell; Lazy Class is the canonical cosmetic; the bulk of the catalog sits in the structural band. See /cost-model for the full multiplier table.
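The three-band multiplier table, as a lookup sketch (the mapping mirrors the bands above; the function name is illustrative):

```python
SEVERITY_MULTIPLIER = {
    "cosmetic": 0.6,    # e.g. Lazy Class
    "structural": 1.0,  # baseline; the bulk of the catalog
    "critical": 2.5,    # e.g. God Class
}

def apply_severity(base_annual_cost: float, band: str) -> float:
    """Scale a structural-band (1.0x) cost estimate by its severity band."""
    return base_annual_cost * SEVERITY_MULTIPLIER[band]
```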
Refresh cadence
Numbers, source URLs, and tool pricing are re-verified in the first business week of each month. LAST_VERIFIED_DATE in src/lib/schema.ts rolls forward when the pass finishes. The layout footer date, every per-page Article schema dateModified, the badge at the top of /about and /methodology, and every lastVerified page-shell prop all read from that single constant, so one change rolls the entire site's freshness signal forward.
Out-of-cycle refresh triggers:
- Vendor pricing change announcement (SonarSource, CodeScene, Sourcegraph, JetBrains).
- New peer-reviewed publication with effect-size data on a top-5 smell (we watch ICSE, FSE, ASE, ICSME, and Empirical Software Engineering).
- DORA State of DevOps annual release (typically Q3 each year).
- Stripe Developer Coefficient Report refresh.
- Substantive editorial correction submitted via [email protected].
Limitations
- Effect sizes from Rahman 2025 (rho=0.27-0.38 for the top smells) are correlations, not causation. The empirical literature is honest about this; we are honest about it on /bug-rate-correlations and /faq.
- The team-of-8 baseline normalises across team sizes; an actual team of 12 should scale the figures by 1.5x, a team of 4 by 0.5x. The calculator on /calculator handles this; the per-smell badge figures assume the baseline.
- Fully-loaded cost of $180K is a US anchor. International teams should adjust. The calculator allows the override; per-smell page badges use the US anchor.
- Tool pricing on /tools is verified monthly, but vendor pages drift between cycles. Prices are 'as of LAST_VERIFIED_LABEL' rather than continuously refreshed.
- Open-source-study selection bias applies to several of the cited empirical studies (Bavota 2015, Khomh 2012). Proprietary-codebase smell distributions and effect sizes may differ; we note this on /bug-rate-correlations.
Corrections
Found a number that does not check out, a citation that does not resolve, or a smell missing from the catalog? Email [email protected]. Corrections turn around within 5 business days. The corrected version ships with the next monthly pass and rolls LAST_VERIFIED_DATE forward.