Data Clumps: What It Costs and How to Fix It
Groups of data items that repeatedly appear together across multiple classes or method signatures.
What It Is
Data Clumps occur when the same group of variables (for example, street, city, state, and ZIP code) appears together in multiple classes, method signatures, or data structures. Each location maintains its own copy of the grouping logic, so any change to the group's structure must be replicated everywhere. The clump should be a first-class object; instead, it exists as a scattered pattern that developers must recognize and maintain by convention.
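To make the definition concrete, here is a minimal Python sketch of the smell. The function names (`create_shipping_label`, `estimate_delivery_days`, `is_deliverable`) and the delivery rule are hypothetical. Only the four address fields come from the text above.

```python
# Hypothetical functions showing a Data Clump: the same four address
# values travel together through every signature as loose strings.

def create_shipping_label(name: str, street: str, city: str,
                          state: str, zip_code: str) -> str:
    return f"{name}\n{street}\n{city}, {state} {zip_code}"


def estimate_delivery_days(street: str, city: str,
                           state: str, zip_code: str) -> int:
    # Toy rule for illustration: assume the warehouse is in CA.
    return 2 if state == "CA" else 5


def is_deliverable(street: str, city: str,
                   state: str, zip_code: str) -> bool:
    # Yet another copy of the same four parameters.
    return all([street, city, state, zip_code])


# Every caller must also pass all four values, in the right order.
label = create_shipping_label("Ada", "1 Main St", "Springfield", "IL", "62701")
days = estimate_delivery_days("1 Main St", "Springfield", "IL", "62701")
```

Adding a `country` value to this design would mean editing all three signatures and every call site, which is the change-propagation cost discussed next.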
Why It Costs Money
Changes to the data group propagate to every location where the clump appears. Adding a country field to an address that exists as 4 separate parameters in 20 functions requires updating all 20 functions and their callers.
Validation inconsistency creeps in over time. Each location that handles the clump may validate the data slightly differently, leading to subtle bugs where data passes through one path but fails in another.
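A minimal sketch of how this inconsistency arises, assuming US-style ZIP codes. Both function names are hypothetical, and neither rule is presented as the correct one; the point is that they disagree. A ZIP+4 value passes checkout and is then rejected at shipping:

```python
import re

# Two hypothetical code paths that each validate the ZIP code part of the
# address clump on their own. Because there is no shared Address type,
# nothing forces the rules to agree.

def checkout_accepts(zip_code: str) -> bool:
    # Checkout path: accepts "12345" or ZIP+4 such as "12345-6789".
    return re.fullmatch(r"\d{5}(-\d{4})?", zip_code) is not None


def shipping_accepts(zip_code: str) -> bool:
    # Shipping path, written later: accepts exactly five digits only.
    return len(zip_code) == 5 and zip_code.isdigit()


for zip_code in ["12345", "12345-6789", "1234"]:
    print(zip_code, checkout_accepts(zip_code), shipping_accepts(zip_code))
# 12345 True True
# 12345-6789 True False
# 1234 False False
```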
The missing abstraction prevents useful behaviour. An Address object can calculate its shipping zone, validate its own format, and format itself for display. Four separate strings cannot.
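A minimal Python sketch of such an `Address`, assuming US addresses. The zone table, the validation rules, and the method names (`shipping_zone`, `format_for_display`) are illustrative assumptions, not a real shipping API:

```python
import re
from dataclasses import dataclass

# Hypothetical state-to-zone table, for illustration only.
_SHIPPING_ZONES = {"CA": 1, "OR": 1, "WA": 1, "NV": 2, "AZ": 2}


@dataclass(frozen=True)
class Address:
    street: str
    city: str
    state: str
    zip_code: str
    # A new field with a default touches one definition, not every signature.
    country: str = "US"

    def __post_init__(self) -> None:
        # One validation rule, applied on every code path that builds an Address.
        if not re.fullmatch(r"\d{5}(-\d{4})?", self.zip_code):
            raise ValueError(f"invalid ZIP code: {self.zip_code!r}")
        if not re.fullmatch(r"[A-Z]{2}", self.state):
            raise ValueError(f"invalid state code: {self.state!r}")

    def shipping_zone(self) -> int:
        # States missing from the table fall back to zone 3.
        return _SHIPPING_ZONES.get(self.state, 3)

    def format_for_display(self) -> str:
        return f"{self.street}\n{self.city}, {self.state} {self.zip_code}"


home = Address("1 Main St", "Portland", "OR", "97201")
print(home.shipping_zone())  # 1
print(home.format_for_display())
```

Making the dataclass frozen means an `Address` that passed validation cannot later be reassigned field by field into an invalid state.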
Specific Cost Mechanisms
- Change propagation: each structural change to the clump requires updating 10-30 locations
- Validation inconsistency: different validation rules at different sites cause intermittent bugs
- Missing encapsulation: behaviour that belongs to the concept is scattered across consuming code
Estimated Annual Cost
Estimated cost per instance, by team size and codebase size, assuming a $120,000 average developer salary.
| Team Size | Small (<50k LOC) | Medium (50k-200k LOC) | Large (200k+ LOC) |
|---|---|---|---|
| 3 devs | $2,000 | $4,500 | $9,000 |
| 5 devs | $3,300 | $7,500 | $15,000 |
| 10 devs | $5,000 | $11,000 | $15,000 |
| 20 devs | $6,600 | $15,000 | $15,000 |
How to Detect It
Approaches used by automated detection tools:
- Repeated parameter groups across methods
- Detection of repeated parameter patterns
- Indirect detection through parameter count
- Built-in inspection that detects repeated field groups
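The first approach, finding repeated parameter groups across methods, can be approximated in a few lines. This is a minimal sketch for Python source using the standard `ast` module; `find_parameter_clumps`, its thresholds (`min_size=3`, `min_occurrences=2`), and the sample code are assumptions, not the rules of any particular tool. It matches parameter names only, so it will miss clumps whose names vary between functions.

```python
from __future__ import annotations

import ast
from collections import Counter
from itertools import combinations


def find_parameter_clumps(source: str, min_size: int = 3,
                          min_occurrences: int = 2) -> dict[tuple[str, ...], int]:
    """Report groups of parameter names shared by several function signatures."""
    counts: Counter[tuple[str, ...]] = Counter()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        args = node.args.posonlyargs + node.args.args + node.args.kwonlyargs
        names = sorted({a.arg for a in args} - {"self", "cls"})
        # Count every subset of at least min_size names. This is exponential
        # in the parameter count, which is acceptable only for a sketch.
        for size in range(min_size, len(names) + 1):
            for group in combinations(names, size):
                counts[group] += 1

    clumps = {g: n for g, n in counts.items() if n >= min_occurrences}
    # Keep only maximal groups: drop a group when a strict superset of it
    # was seen in the same number of signatures.
    return {
        g: n for g, n in clumps.items()
        if not any(set(g) < set(h) and m == n for h, m in clumps.items())
    }


SAMPLE = """
def ship(name, street, city, state, zip_code): ...
def add_tax(street, city, state, zip_code, amount): ...
def label(street, city, state, zip_code): ...
def unrelated(a, b, c): ...
"""

print(find_parameter_clumps(SAMPLE))
# {('city', 'state', 'street', 'zip_code'): 3}
```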
Refactoring Patterns
Standard refactorings that eliminate this smell:
Extract Class
Use when the same group of fields appears in multiple classes.
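A before/after sketch in Python. The `Customer`, `Warehouse`, and `same_state_as` names are hypothetical. Once the fields live in `Address`, behaviour that compares or formats addresses has an obvious home:

```python
from dataclasses import dataclass


# Before: the same four fields are duplicated in two unrelated classes.
@dataclass
class CustomerBefore:
    name: str
    street: str
    city: str
    state: str
    zip_code: str


@dataclass
class WarehouseBefore:
    code: str
    street: str
    city: str
    state: str
    zip_code: str


# After Extract Class: the field group becomes its own type...
@dataclass(frozen=True)
class Address:
    street: str
    city: str
    state: str
    zip_code: str

    def same_state_as(self, other: "Address") -> bool:
        return self.state == other.state


# ...and each former host holds a single Address field instead.
@dataclass
class Customer:
    name: str
    address: Address


@dataclass
class Warehouse:
    code: str
    address: Address


customer = Customer("Ada", Address("1 Main St", "Portland", "OR", "97201"))
warehouse = Warehouse("PDX-1", Address("9 Dock Rd", "Salem", "OR", "97301"))
print(customer.address.same_state_as(warehouse.address))  # True
```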
Introduce Parameter Object
Use when the same group of parameters appears in multiple method signatures.
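A minimal sketch using a start/end date pair, another group that often travels together through signatures. `DateRange`, `amount_invoiced`, and `average_daily_invoiced` are hypothetical names, and invoices are assumed to be `(date, amount)` tuples:

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date

# Before (hypothetical):
#   def amount_invoiced(invoices, start, end): ...
#   def average_daily_invoiced(invoices, start, end): ...
# Each signature repeats the start/end pair, and each body re-implements
# the range check.


@dataclass(frozen=True)
class DateRange:
    start: date
    end: date

    def __post_init__(self) -> None:
        if self.start > self.end:
            raise ValueError("start must not be after end")

    def contains(self, day: date) -> bool:
        # Inclusive on both ends.
        return self.start <= day <= self.end

    def days(self) -> int:
        # Number of calendar days covered, counting both ends.
        return (self.end - self.start).days + 1


# After: one parameter object replaces the pair in every signature.
def amount_invoiced(invoices: list[tuple[date, float]], period: DateRange) -> float:
    return sum(amount for day, amount in invoices if period.contains(day))


def average_daily_invoiced(invoices: list[tuple[date, float]], period: DateRange) -> float:
    return amount_invoiced(invoices, period) / period.days()


january = DateRange(date(2024, 1, 1), date(2024, 1, 31))
invoices = [(date(2024, 1, 5), 100.0), (date(2024, 2, 1), 50.0)]
print(amount_invoiced(invoices, january))  # 100.0
```

A side benefit is that the range check is written once, in `contains`, rather than once per function, which addresses the validation-inconsistency cost described earlier.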
Preserve Whole Object
Use when callers destructure an object only to pass its fields individually.
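A minimal Python sketch. `Order`, `zone_from_fields`, `zone_for`, and the zone rule are hypothetical. Both versions compute the same result, but only the second keeps the clump intact across the call:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    street: str
    city: str
    state: str
    zip_code: str


@dataclass
class Order:
    order_id: str
    ship_to: Address


WEST_COAST = {"CA", "OR", "WA"}


# Before: the callee takes loose fields, so the caller must unpack the Address.
def zone_from_fields(state: str, zip_code: str) -> int:
    # Hypothetical rule: West Coast states with a 9xxxx ZIP are zone 1.
    return 1 if state in WEST_COAST and zip_code.startswith("9") else 2


def order_zone_before(order: Order) -> int:
    address = order.ship_to
    return zone_from_fields(address.state, address.zip_code)


# After Preserve Whole Object: pass the Address itself. If the rule later
# needs the city as well, only the callee changes, not every caller.
def zone_for(address: Address) -> int:
    return 1 if address.state in WEST_COAST and address.zip_code.startswith("9") else 2


def order_zone_after(order: Order) -> int:
    return zone_for(order.ship_to)


order = Order("A-100", Address("1 Main St", "Portland", "OR", "97201"))
print(order_zone_before(order), order_zone_after(order))  # 1 1
```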