Code smell FAQ: 20 questions on definitions, cost, detection, and when to refactor
The questions engineers and technical managers ask most often, answered without hedging. Where the answer is contested, we say so and cite the source.
What is a code smell?
A code smell is a surface symptom in source code that usually indicates a deeper design problem. The term was coined by Kent Beck and popularised by Martin Fowler in Refactoring (1999). A smell does not necessarily mean the code is broken - it means the code has a structural characteristic that tends to correlate with higher maintenance cost, higher defect rates, and slower change velocity. Common examples: a method over 20 lines, a class with more than one clear responsibility, duplicated logic across multiple locations.
Are code smells the same as bugs?
No. A bug is a functional defect: the code does something other than intended. A code smell is a structural characteristic that makes bugs more likely and more expensive to fix. You can have a codebase full of smells that passes all its tests. The cost shows up in indirect ways: slower code reviews, more merge conflicts, longer onboarding, higher incident frequency over time. The empirical literature (Rahman 2025, Bavota 2015, Palomba 2019) consistently finds a statistically significant correlation between smell density and post-release defect counts, but causation is contested.
How were the 22 smells identified?
Martin Fowler catalogued 22 smells in Refactoring (1999), building on Kent Beck's original smell vocabulary. A second edition (2018) updated language examples and refined several definitions. The catalog has become the standard reference. Other researchers have proposed extensions - Girish Suryanarayana's Refactoring for Software Design Smells (2014) adds architectural-level smells - but the Fowler 22 remain the most widely cited and tool-supported.
Which code smell is the most expensive?
God Class consistently ranks highest in empirical studies of maintenance cost and defect density. A single God Class can account for 30-40% of all bug reports in a module (Palomba et al. 2019). The estimated annual drag for a 30-engineer team is $42,000 to $180,000, depending on codebase age. Long Method and Duplicate Code follow closely. The rankings vary by context: for onboarding cost, God Class and Divergent Change dominate. For incident rate, God Class and Inappropriate Intimacy score highest.
Can you put a dollar figure on code smells?
Approximately, yes - with important caveats. The calculator on this site estimates annual cost from four mechanisms: PR review drag, incident overhead, story-point inflation (rework), and onboarding drag. The methodology uses a fully-loaded engineering cost and empirically-derived percentage ranges from the research literature. The outputs are ranges, not point estimates. The purpose is not forensic accounting - it is to make an otherwise invisible cost visible enough to justify a conversation about remediation. See /calculator for the full methodology.
What is the difference between a code smell and technical debt?
Technical debt is the broader concept: any shortcut taken now that creates future maintenance cost. Code smells are a subset of technical debt that manifests in source code structure. Technical debt also includes: missing tests, outdated dependencies, undocumented architecture decisions, and infrastructure shortcuts. Ward Cunningham coined the technical debt metaphor in 1992. Steve McConnell later distinguished between deliberate and inadvertent debt. Code smells are typically inadvertent - they accumulate gradually without deliberate decision.
How do I detect code smells automatically?
Several tools detect a significant subset of the 22 smells automatically. SonarCloud and SonarQube have the widest language support and the most mature ruleset. ESLint (with complexity and max-lines rules) covers JavaScript and TypeScript well. PMD is strongest for Java. CodeScene combines static analysis with git history to find smells in the code that changes most frequently - a uniquely valuable signal. No tool detects all 22 smells reliably. Shotgun Surgery, Divergent Change, and Feature Envy all require contextual judgement that exceeds current static analysis. See /tools for full reviews.
What threshold makes a method 'too long'?
There is no universal answer. Fowler's heuristic: if you feel the need to write a comment to explain what a block does, extract a method instead. Practical thresholds used in tooling: SonarQube defaults to a warning at 20 lines per method; ESLint's max-lines-per-function can be set to 30 or 50 depending on team convention. The research literature (Ferme et al. 2013) finds that methods over 50 lines have measurably higher defect rates regardless of cyclomatic complexity. Cyclomatic Complexity over 10 is a better proxy for review cost than raw line count.
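As a rough illustration of how a line-count gate works (not tied to any particular tool), a check of this kind can be sketched in a few lines of Python using the standard ast module; the 20-line threshold mirrors SonarQube's default:

```python
import ast

def long_methods(source: str, max_lines: int = 20) -> list[str]:
    """Flag functions whose full span exceeds max_lines source lines.

    Illustrative sketch only: real tools count statements, exclude
    blank lines and comments, and handle nested definitions carefully.
    """
    tree = ast.parse(source)
    offenders = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            span = node.end_lineno - node.lineno + 1  # inclusive line span
            if span > max_lines:
                offenders.append(node.name)
    return offenders
```

Whatever threshold a team picks, the value of encoding it in a tool is consistency: the number stops being relitigated in every code review.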
What is Cognitive Complexity and how does it differ from Cyclomatic Complexity?
Cyclomatic Complexity (McCabe 1976) counts the number of linearly independent paths through a function - essentially the minimum number of test cases for branch coverage. It correlates with defect rate but does not penalise nested structures more than linear ones. Cognitive Complexity (Campbell 2018, SonarSource) was designed to measure the effort to understand code. It penalises structural depth: a nested if inside a loop inside a try block increments the score more than three sequential ifs. Empirically, Cognitive Complexity is a better predictor of review time than Cyclomatic Complexity. See /pr-review-time for details.
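The distinction is easiest to see side by side. The two Python functions below behave identically and have a comparable branch count, but Cognitive Complexity charges extra for each level of nesting in the first version and nothing extra for the sequential guards in the second (exact scores depend on the tool):

```python
def categorize_nested(age: int, member: bool) -> str:
    # Nested branching: each extra level of depth adds to Cognitive
    # Complexity, even though the path count matches the flat version.
    if age >= 18:
        if member:
            return "adult-member"
        else:
            return "adult-guest"
    else:
        if member:
            return "minor-member"
        else:
            return "minor-guest"

def categorize_flat(age: int, member: bool) -> str:
    # Same four outcomes as sequential guard clauses: similar Cyclomatic
    # Complexity, lower Cognitive Complexity (no nesting increments).
    if age >= 18 and member:
        return "adult-member"
    if age >= 18:
        return "adult-guest"
    if member:
        return "minor-member"
    return "minor-guest"
```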
Is the DRY principle always right about duplicate code?
No. Several respected practitioners argue for exceptions. Sandi Metz: prefer duplication over the wrong abstraction - premature unification creates dependencies that are harder to undo than the duplication itself. John Ousterhout: deep modules justify some repetition if the interface is significantly simpler. Duplication in tests is often a feature, not a smell - test isolation benefits from self-contained setup. The practical test is whether the duplicated code is likely to diverge. If two blocks do the same thing now but have different change reasons, unifying them creates coupling that hurts more than the duplication. See /duplicate-code for a full treatment.
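A minimal Python illustration of the divergence test (the function names and rates are hypothetical):

```python
# These two functions compute the same number today, but one follows
# marketing's discount policy and the other follows finance's refund
# policy. They have different reasons to change, so unifying them
# would couple two departments' rules behind one abstraction.

def promotional_discount(price: float) -> float:
    # Owned by marketing; could become tiered or time-limited.
    return price * 0.10

def goodwill_refund(price: float) -> float:
    # Owned by finance; could become capped or means-tested.
    return price * 0.10
```

The duplication here is cheap insurance: when one rate changes, the other is untouched, and no shared helper has to grow a flag parameter.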
What is Feature Envy?
Feature Envy is a smell where a method is more interested in the data of another class than its own. Signs: a method that calls five accessors on another object before doing anything with its own data; a method in class A that could move wholesale to class B with only a parameter change. Fowler's original example involved calculation methods that knew everything about another class's fields. The fix is usually Move Method. The cost is coupling: when class B's interface changes, every envious method in A breaks. Rahman (2025) found Feature Envy correlated with post-release defect count at ρ = 0.28.
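A hypothetical before/after in Python (class and field names are illustrative). The envious version would read three fields off Order from inside the printer; after Move Method, the calculation sits next to the data it uses:

```python
class Order:
    def __init__(self, unit_price: float, quantity: int, discount_rate: float):
        self.unit_price = unit_price
        self.quantity = quantity
        self.discount_rate = discount_rate

    def total(self) -> float:
        # After Move Method: the calculation lives with its data. Before
        # the fix, this arithmetic sat in InvoicePrinter, which reached
        # into unit_price, quantity, and discount_rate -- Feature Envy.
        return self.unit_price * self.quantity * (1 - self.discount_rate)

class InvoicePrinter:
    def line_for(self, order: Order) -> str:
        # The printer now asks Order for a result instead of its fields.
        return f"Total: {order.total():.2f}"
```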
When should I not refactor a code smell?
Four contexts where the smell may be the right call: (1) Prototype code that will be thrown away - investing in structure is waste if the feature is being validated. (2) Expiring code - if a module is being replaced in a known timeline, refactoring it delays replacement value. (3) Deadline-bound production fixes - adding method extractions during a live incident is a risk multiplier. (4) Cognitively-coupled code - some algorithms are genuinely complex, and extracting methods fragments the reasoning without aiding comprehension. The test is whether the complexity is accidental (introduced by the implementation) or essential (inherent to the problem). See /when-smells-are-ok for a full treatment.
How do I make the business case for refactoring to a CFO?
The three framings that work: (1) Risk reduction - smelly code fails more often; cost the failure modes (MTTR, incident engineer-hours, SLA breach penalties). (2) Velocity reclaim - story-point inflation is measurable from sprint data; a 20% reduction in cycle time at a 10-engineer team is worth roughly $250K/year in delivered features. (3) Retention - engineers who work in high-smell codebases leave faster; a single senior engineer replacement costs $100K-$300K fully loaded. See /refactoring-roi for an NPV worksheet and a memo template that works in a board pack.
What is Shotgun Surgery?
Shotgun Surgery is a smell where a single logical change requires editing many small pieces spread across the codebase. It is the inverse of Divergent Change (one class changes for many reasons). Signs: a single configuration value stored in five places; a business rule implemented across four files in three layers. The cost is coordination: every change requires a mental map of all touch points, and missing one creates a subtle defect. The fix is usually Move Method or Move Field to consolidate related behaviour. Detection is difficult automatically because it requires understanding change coupling across commits - CodeScene's temporal coupling analysis handles this best.
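A minimal sketch of the consolidation, using a hypothetical free-shipping rule in Python:

```python
# Before: the free-shipping threshold was re-stated in checkout, email,
# and banner code -- changing it meant three coordinated edits.
# After: one definition that every call site uses.

FREE_SHIPPING_THRESHOLD = 50.00  # single source of truth

def qualifies_for_free_shipping(order_total: float) -> bool:
    return order_total >= FREE_SHIPPING_THRESHOLD

def banner_text(order_total: float) -> str:
    remaining = FREE_SHIPPING_THRESHOLD - order_total
    if remaining <= 0:
        return "You qualify for free shipping!"
    return f"Spend {remaining:.2f} more for free shipping."
```

After consolidation, a change to the threshold is one edit, and there is no touch point left to miss.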
What is Primitive Obsession?
Primitive Obsession is using primitive types (string, int, boolean) for domain concepts that deserve their own type. Examples: using a raw string for an email address instead of an EmailAddress value object; passing (double lat, double lng) instead of a Coordinate type; using a boolean flag for a state machine that needs three states. The symptom is that the same validation logic (is this string a valid email?) appears in multiple places. The fix is Replace Primitive with Object. The benefit is that validation, formatting, and comparison logic are encapsulated once. Semgrep's custom patterns are effective at enforcing rules against Primitive Obsession.
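A minimal value-object sketch in Python (the regex is deliberately simplistic; real email validation is considerably harder):

```python
import re

class EmailAddress:
    """Value object: validation and normalisation happen once, here."""

    _PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # illustrative only

    def __init__(self, raw: str):
        if not self._PATTERN.match(raw):
            raise ValueError(f"not a valid email address: {raw!r}")
        self.value = raw.lower()  # normalise once, at the boundary

    def __eq__(self, other: object) -> bool:
        return isinstance(other, EmailAddress) and self.value == other.value

    def __hash__(self) -> int:
        return hash(self.value)
```

Code that accepts an EmailAddress never needs to re-validate it; the raw string can only become an EmailAddress by passing through the constructor.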
How does code smell density relate to onboarding time?
Brooks's Law (1975) established that adding engineers to a late project makes it later. The mechanism is communication overhead. Code smell density multiplies that overhead: a God Class that one senior engineer understands intuitively takes a new hire four to six weeks to navigate safely. The calculator uses a ramp-time multiplier for smell overhead: the formula is (extra_ramp_weeks * hires_per_year * (fully_loaded_cost / 52)). At a 20-engineer team with 15% annual turnover and a 4-week average ramp extension, that is approximately $65,000 per year. See /onboarding-drag for the full treatment.
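Plugging the worked example into the formula looks like this in Python. The $280,000 fully-loaded cost is an assumption chosen to reproduce the quoted figure, not a number stated above; substitute your own:

```python
def onboarding_drag(extra_ramp_weeks: float,
                    hires_per_year: float,
                    fully_loaded_cost: float) -> float:
    """Annual cost of smell-extended onboarding, per the formula above:
    extra ramp weeks x hires per year x weekly fully-loaded cost."""
    return extra_ramp_weeks * hires_per_year * (fully_loaded_cost / 52)

# 20 engineers at 15% turnover is roughly 3 hires/year; a 4-week ramp
# extension at an assumed $280k fully-loaded cost lands near $65k/year.
annual = onboarding_drag(extra_ramp_weeks=4, hires_per_year=3,
                         fully_loaded_cost=280_000)
```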
What is the difference between SonarCloud and SonarQube?
SonarQube is self-hosted. SonarCloud is the SaaS version hosted by SonarSource. Functionally they share the same analysis engine and ruleset. The practical difference: SonarCloud connects directly to GitHub, GitLab, and Bitbucket without infrastructure overhead; SonarQube is preferred when data must stay on-premises (financial services, government contracts). SonarCloud is free for open source projects. For closed-source teams, pricing scales by lines of code analysed. Both support 30+ languages. For most teams starting fresh, SonarCloud is the simpler choice.
Does CodeScene replace SonarCloud?
No - they are complementary. SonarCloud does static analysis: it scans code as it is now and flags rule violations. CodeScene does behavioural analysis: it mines git history to find which smells are in the code that changes most frequently. A God Class touched once is a different risk to a God Class changed 80 times in the last quarter. CodeScene finds the high-risk intersection; SonarCloud finds the full population of issues. The recommended setup for mature codebases: SonarCloud as a standing CI gate, CodeScene as a quarterly audit tool to prioritise the remediation backlog.
Can AI tools like GitHub Copilot generate code smells?
Yes, and the research is unambiguous on this. Copilot and similar LLM-based tools complete code based on immediate context; they do not maintain awareness of the full codebase architecture. In practice this means they readily generate Duplicate Code (the model has no access to existing utility functions in other files), Long Method (they optimise for a working solution, not for decomposition), and Primitive Obsession (they default to primitives unless explicitly prompted for domain types). A SonarCloud or ESLint gate in CI is more valuable, not less, for teams using AI-assisted development.
Where can I learn more about code smell research?
Five sources: (1) Martin Fowler, Refactoring (2nd edition, 2018) - the primary catalog. (2) Adam Tornhill, Software Design X-Rays (2018) - the definitive source on behavioural analysis and hotspots. (3) Foutse Khomh's systematic review (2012, EMSE) - the most rigorous meta-analysis of smell-to-defect correlations. (4) Fabio Palomba et al. (2019) - god class and defect density across industrial projects. (5) Google's Potvin and Levenberg, CACM 2016 - monorepo experience at scale, including duplication costs. For practitioner-level treatment with specific refactoring patterns, see /books-and-references.