Long Method: why a 300-line method costs your team more than a 300-line class
Fowler places Long Method near the top of the smell catalog for a reason. It is among the most common smells in production code, the easiest to introduce under deadline pressure, and one of the most tractable to fix once the decision to fix it is made.
From Refactoring 2nd ed., p. 82: “the longer a procedure is, the more difficult it is to understand.” Fowler's observation, first made in 1999, is now supported by twenty-five years of empirical research on review time, defect density, and maintenance cost.
The line-count intuition is correct but incomplete. The real cost driver is Cognitive Complexity, as defined in Campbell 2018 (G. Ann Campbell, “Cognitive Complexity: A New Way of Measuring Understandability,” SonarSource whitepaper). Cognitive Complexity measures the mental effort required to understand a piece of code, rather than the number of paths through it.
The critical insight: nesting adds more to cognitive complexity than length. A 40-line method with three nested if/else/switch blocks can have a higher Cognitive Complexity score than a 120-line method with a flat switch statement. The 40-liner is harder to review, harder to test, and correlates more strongly with defects.
// 40-line method, Cognitive Complexity: 28
// Each nesting level adds one more point to every structure inside it
function processPayment(order, user) {
  if (order.isValid()) {                          // +1
    if (user.hasCreditCard()) {                   // +2 (nesting)
      if (order.amount > user.creditLimit) {      // +3 (nesting)
        if (user.canOverride()) {                 // +4 (nesting)
          // ...
        } else {                                  // +1
          for (const item of order.items) {       // +5 (nesting)
            if (item.isRestricted()) {            // +6 (nesting)
              // ...
            }
          }
        }
      }
    }
  }
}
// 120-line method, Cognitive Complexity: 1
// Flat switch: the whole switch counts once, however many cases it has
function getShippingRate(region) {
  switch (region) {              // +1
    case 'US': return 4.99;
    case 'EU': return 12.99;
    case 'UK': return 8.99;
    case 'AU': return 18.99;
    // ... 30 more cases, none of which adds to the score
    default: return 25.99;
  }
}
The 40-liner costs more in review time and correlates more strongly with defects. SonarSource's default threshold is a Cognitive Complexity of 15 per method. Methods above 25 are flagged critical.
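To make the scoring concrete, here is a minimal sketch of the nesting rule, based on a simplified reading of Campbell's whitepaper: each if, loop, or switch costs 1 plus its current nesting depth, an else costs a flat 1, and a switch counts once regardless of its cases. The `FlowNode` tree is a hypothetical stand-in for a real parser's AST, and the sketch ignores other rules such as boolean-operator sequences, recursion, and jumps.

```typescript
// Simplified control-flow tree. This shape is an assumption for illustration,
// not the output of any real parser.
type Kind = "if" | "else" | "loop" | "switch" | "block";

interface FlowNode {
  kind: Kind;
  children?: FlowNode[];
}

// Score a node at the given nesting depth, then recurse into its children.
function cognitiveComplexity(node: FlowNode, nesting = 0): number {
  let score = 0;
  let childNesting = nesting;
  switch (node.kind) {
    case "if":
    case "loop":
    case "switch":
      score += 1 + nesting; // structural increment plus nesting penalty
      childNesting = nesting + 1;
      break;
    case "else":
      score += 1; // flat increment: no nesting penalty for the else itself
      childNesting = nesting + 1; // but its body is one level deeper
      break;
    case "block":
      break; // plain blocks (e.g. a function body) cost nothing
  }
  for (const child of node.children ?? []) {
    score += cognitiveComplexity(child, childNesting);
  }
  return score;
}

// The control structures visible in processPayment above.
// An else is modelled as a sibling of the if it belongs to.
const processPaymentShape: FlowNode = {
  kind: "block",
  children: [{
    kind: "if", // order.isValid()
    children: [{
      kind: "if", // user.hasCreditCard()
      children: [{
        kind: "if", // order.amount > user.creditLimit
        children: [
          { kind: "if" }, // user.canOverride()
          {
            kind: "else",
            children: [{
              kind: "loop", // for (const item of order.items)
              children: [{ kind: "if" }], // item.isRestricted()
            }],
          },
        ],
      }],
    }],
  }],
};

// A function whose body is a single flat switch, such as getShippingRate.
const flatSwitchShape: FlowNode = { kind: "block", children: [{ kind: "switch" }] };

console.log(cognitiveComplexity(processPaymentShape)); // 22
console.log(cognitiveComplexity(flatSwitchShape)); // 1
```

The 22 covers only the structures visible in the listing; anything in the elided bodies would add to it. The flat switch scores 1 no matter how many cases it carries, which is why length alone says little about review cost.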
The Cisco / SmartBear “Best Kept Secrets of Peer Code Review” study (2007, 2,500 code reviews across Cisco Systems) found that reviewers lose effectiveness after approximately 60 minutes of reviewing and 400 lines of code. Beyond those thresholds, defect-detection rates drop sharply while time spent keeps rising.
Google's internal data, published in Software Engineering at Google (Winters, Manshreck, Wright, O'Reilly 2020, chapter 9): the median PR review latency at Google is approximately 24 hours. The primary predictors of latency are PR size and complexity - both of which Long Methods inflate directly.
Long Methods inflate both the LOC budget and the cognitive budget of a review. A method that crosses the 400-LOC reviewer-saturation threshold is a method whose review will be partially skimmed rather than fully read. Bugs in the skimmed section survive review.
Extract Method (Fowler ch. 6) is the primary refactoring. The mechanical steps are straightforward; the hard work is naming and boundary selection.
// Before: 200-line processOrder() doing everything
// (declared async because the persistence block awaits the database)
async function processOrder(order: Order, user: User): Promise<Receipt> {
  // --- validation block (lines 1-40) ---
  if (!order.items.length) throw new Error('Empty order');
  if (order.totalAmount <= 0) throw new Error('Invalid amount');
  // ... 38 more validation lines

  // --- pricing block (lines 41-100) ---
  let discount = 0;
  if (user.membershipTier === 'GOLD') { discount = 0.1; }
  // ... 58 more pricing lines

  // --- persistence block (lines 101-200) ---
  await db.orders.insert(order);
  // ... 99 more persistence lines
}
// After: each responsibility extracted and named
async function processOrder(order: Order, user: User): Promise<Receipt> {
  validateOrder(order);
  const pricing = calculatePricing(order, user);
  return await persistOrder(order, pricing);
}
function validateOrder(order: Order): void { /* ... */ }
function calculatePricing(order: Order, user: User): Pricing { /* ... */ }
async function persistOrder(order: Order, pricing: Pricing): Promise<Receipt> { /* ... */ }
Modern IDEs (JetBrains: Refactor → Extract Method; VS Code with language-server plugins) handle most of the mechanical extraction. The human work is: (1) naming the extracted method accurately; (2) choosing the right boundary (do not extract a block that modifies many variables, which signals that it is not a cohesive unit); (3) reviewing the parameter list (if the extracted method takes more than three parameters, the boundary is probably wrong).
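As an illustration of a clean boundary, the sketch below fills in two of the extracted functions using only the fragments shown in the original method. The type shapes, the fields of `Pricing`, and the `'STANDARD'` tier are assumptions made for the example; the real pricing block would carry the elided 58 lines of rules.

```typescript
// Hypothetical minimal types; the real Order, User, and Pricing are not shown in the article.
interface Order {
  items: unknown[];
  totalAmount: number;
}
interface User {
  membershipTier: "GOLD" | "STANDARD";
}
interface Pricing {
  subtotal: number;
  discount: number;
  total: number;
}

// Validation reads one input and either throws or returns nothing.
function validateOrder(order: Order): void {
  if (!order.items.length) throw new Error("Empty order");
  if (order.totalAmount <= 0) throw new Error("Invalid amount");
}

// Pricing reads two inputs and returns one value instead of mutating
// locals shared with the caller, which keeps the parameter list short.
function calculatePricing(order: Order, user: User): Pricing {
  const discountRate = user.membershipTier === "GOLD" ? 0.1 : 0;
  const discount = order.totalAmount * discountRate;
  return {
    subtotal: order.totalAmount,
    discount,
    total: order.totalAmount - discount,
  };
}

const order: Order = { items: ["widget"], totalAmount: 100 };
validateOrder(order);
const pricing = calculatePricing(order, { membershipTier: "GOLD" });
console.log(pricing.total); // 90
```

Each function here takes at most two parameters and communicates through its return value. If an extraction instead needs four or five locals passed in and several passed back, that is usually the sign that the block was cut in the wrong place.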
Four contexts where a Long Method is the correct choice:
- Top-down procedural scripts - data migrations, one-off reports, build scripts. These are linear, run once, and will not accumulate callers.
- Generated code - ORM-generated schema files, protocol buffer output, parser generators. Extracting these creates no improvement.
- Flat enumerations - a 200-line switch statement that routes 50 event types to handlers. The length is not the problem; the nesting would be.
- Test setup methods - integration tests that require a detailed environment setup. Extracting each setup step often makes the test harder to read, not easier.
The test is not length; it is Cognitive Complexity. A 200-line method with CC of 8 is better than a 40-line method with CC of 28. Use your linter's CC threshold, not a line count, as the gate.
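A gate of that kind can be sketched in a few lines. The metrics record and the method names below are hypothetical; in practice the numbers would come from your analyzer's report, and in an ESLint setup the `cognitive-complexity` rule from eslint-plugin-sonarjs is one way to enforce a similar limit.

```typescript
// Hypothetical per-method metrics, as an analyzer report might expose them.
interface MethodMetrics {
  name: string;
  lines: number;
  cognitiveComplexity: number;
}

// Limit taken from the SonarSource default mentioned earlier.
const COGNITIVE_COMPLEXITY_LIMIT = 15;

// The gate looks only at Cognitive Complexity; line count is deliberately ignored.
function failsGate(method: MethodMetrics, limit = COGNITIVE_COMPLEXITY_LIMIT): boolean {
  return method.cognitiveComplexity > limit;
}

const report: MethodMetrics[] = [
  { name: "routeEvent", lines: 200, cognitiveComplexity: 8 },
  { name: "processPayment", lines: 40, cognitiveComplexity: 28 },
];

const flagged = report.filter((m) => failsGate(m)).map((m) => m.name);
console.log(flagged); // ["processPayment"]
```

The 200-line method passes and the 40-line method is flagged, which is the opposite of what a line-count gate would report.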