Data Clumps: What It Costs and How to Fix It

Groups of data items that repeatedly appear together across multiple classes or method signatures.

Annual Cost: $2k - $15k
Severity: 3/5 (Medium)
Category: Bloaters
Detection: 4 tools

What It Is

Data Clumps occur when the same group of variables (for example, street, city, state, ZIP code) appears together in multiple classes, method signatures, or data structures. Each location maintains its own copy of the grouping logic, and any change to the group's structure must be replicated everywhere. The clump should be a first-class object, but instead it exists as a scattered pattern that developers must recognize and maintain by convention.

Threshold: When the same 3+ fields appear together in 3+ locations, extract a class. Even 2 fields that always travel together warrant consideration if they carry domain meaning.
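
The threshold above can be checked mechanically. The following is a minimal sketch, not how any of the tools listed later work: it assumes the code under inspection is a set of Python functions, and `find_data_clumps` and the four sample functions are hypothetical names introduced for illustration.

```python
import inspect
from collections import defaultdict
from itertools import combinations


def find_data_clumps(functions, min_fields=3, min_locations=3):
    """Return {parameter group: function names} for groups that repeat."""
    seen = defaultdict(set)
    for func in functions:
        params = sorted(inspect.signature(func).parameters)
        # Record every min_fields-sized subset of this function's parameters.
        for group in combinations(params, min_fields):
            seen[group].add(func.__name__)
    return {
        group: sorted(names)
        for group, names in seen.items()
        if len(names) >= min_locations
    }


# Hypothetical functions; three of them carry the same address fields.
def create_customer(name, street, city, state, zip_code): ...
def ship_order(order_id, street, city, state, zip_code): ...
def estimate_tax(amount, street, city, state, zip_code): ...
def send_email(recipient, subject, body): ...


clumps = find_data_clumps([create_customer, ship_order, estimate_tax, send_email])
for group, locations in clumps.items():
    print(group, "appears in", locations)
```

Because the sketch enumerates fixed-size subsets, a four-field clump is reported as several overlapping three-field groups. A real detector would merge these into the largest shared group.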

Why It Costs Money

1. Changes to the data group propagate to every location where the clump appears. Adding a country field to an address that exists as 4 separate parameters in 20 functions requires updating all 20 functions and their callers.

2. Validation inconsistency creeps in over time. Each location that handles the clump may validate the data slightly differently, leading to subtle bugs where data passes through one path but fails in another.

3. The missing abstraction prevents useful behaviour. An Address object can calculate shipping zones, validate formats, and format for display. Four separate strings cannot.
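
As a rough illustration of the three points above, an address modelled as one object gives the new field, the validation rule, and the display logic a single home. This is a sketch under stated assumptions: the `Address` class, the US-only ZIP pattern, the `"US"` default, and `shipping_label` are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    street: str
    city: str
    state: str
    zip_code: str
    country: str = "US"  # a new field is added once, here, with a default

    def __post_init__(self):
        # Validation lives in one place instead of at every call site.
        # Assumption: only US ZIP codes (12345 or 12345-6789) are checked.
        if self.country == "US" and not re.fullmatch(r"\d{5}(-\d{4})?", self.zip_code):
            raise ValueError(f"invalid ZIP code: {self.zip_code!r}")

    def format_for_display(self) -> str:
        return f"{self.street}, {self.city}, {self.state} {self.zip_code}"


def shipping_label(recipient: str, address: Address) -> str:
    # Callers take one parameter, so adding `country` did not change this signature.
    return f"{recipient}\n{address.format_for_display()}"
```

Every path that builds an `Address` goes through the same check, so data cannot pass one path and fail another.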

Specific Cost Mechanisms

  • Change propagation: each structural change to the clump requires updating 10-30 locations
  • Validation inconsistency: different validation rules at different sites cause intermittent bugs
  • Missing encapsulation: behaviour that belongs to the concept is scattered across consuming code

Estimated Annual Cost

Cost per instance by team size and codebase size. Based on $120,000 average developer salary. See full methodology.

Team Size   Small (<50k LOC)   Medium (50k-200k LOC)   Large (200k+ LOC)
3 devs      $2,000             $4,500                  $9,000
5 devs      $3,300             $7,500                  $15,000
10 devs     $5,000             $11,000                 $15,000
20 devs     $6,600             $15,000                 $15,000

How to Detect It

Specific rules and thresholds for automated detection. See full tool comparison.

SonarQube
squid:S107 / squid:S1448

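Indirect detection: S107 flags long parameter lists and S1448 flags classes with too many methods, so repeated parameter groups show up only as a side effect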

CodeClimate
identical-code / similar-code

Duplication checks that can surface repeated parameter patterns as structurally similar code

PMD
ExcessiveParameterList

Indirect detection through parameter count

IntelliJ IDEA
Data clump inspection

Built-in inspection that detects repeated field groups

Refactoring Patterns

Proven techniques to eliminate this smell. See all refactoring patterns.

Extract Class

The same group of fields appears in multiple classes

Effort: 2-4 hours
Impact: Centralises validation and eliminates duplication
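
A minimal before-and-after sketch of Extract Class, assuming two hypothetical classes (`Customer` and `Warehouse`) that both carry the address fields used as the running example:

```python
from dataclasses import dataclass


# Before: both classes repeat the same four address fields.
@dataclass
class CustomerBefore:
    name: str
    street: str
    city: str
    state: str
    zip_code: str


@dataclass
class WarehouseBefore:
    code: str
    street: str
    city: str
    state: str
    zip_code: str


# After: the clump becomes one class that both owners hold by composition.
@dataclass(frozen=True)
class Address:
    street: str
    city: str
    state: str
    zip_code: str


@dataclass
class Customer:
    name: str
    address: Address


@dataclass
class Warehouse:
    code: str
    address: Address


def same_location(customer: Customer, warehouse: Warehouse) -> bool:
    # Value equality comes from the dataclass, not from field-by-field checks.
    return customer.address == warehouse.address
```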

Introduce Parameter Object

The same group of parameters appears in multiple method signatures

Effort: 1-3 hours
Impact: Simplifies all call sites and enables shared behaviour
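
A sketch of Introduce Parameter Object, using a different clump for variety: a start date and end date that travel together through reporting functions. `DateRange`, `total_sales`, and `count_orders` are hypothetical names, not part of any library.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DateRange:
    start: date
    end: date

    def __post_init__(self):
        # The ordering rule is checked once, not in every function that takes a range.
        if self.end < self.start:
            raise ValueError("end must not precede start")

    def contains(self, day: date) -> bool:
        return self.start <= day <= self.end


# Before, both functions took (…, start, end) as separate parameters.
def total_sales(sales: list[tuple[date, float]], period: DateRange) -> float:
    return sum(amount for day, amount in sales if period.contains(day))


def count_orders(order_dates: list[date], period: DateRange) -> int:
    return sum(1 for day in order_dates if period.contains(day))
```

The shared `contains` method is the kind of behaviour that becomes possible once the parameters are one object.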

Preserve Whole Object

Callers destructure an object only to pass its fields individually

Effort: 30-60 minutes per call site
Impact: Reduces parameter count and coupling
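
A sketch of Preserve Whole Object, assuming a hypothetical `Address` class and a placeholder zone rule chosen only to keep the example short:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    street: str
    city: str
    state: str
    zip_code: str


# Before: the function takes individual fields, so every caller destructures
# the object first: shipping_zone_before(addr.state, addr.zip_code)
def shipping_zone_before(state: str, zip_code: str) -> str:
    return "local" if state == "CA" else "national"


# After: the function receives the whole object and reads what it needs.
def shipping_zone(address: Address) -> str:
    # Placeholder rule (an assumption): California counts as the local zone.
    return "local" if address.state == "CA" else "national"
```

If the zone rule later needs the city as well, only `shipping_zone` changes; the call sites keep passing the same object.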

Related Smells