We've got some data, and we know we need to clean it up before we
can safely persist it, so we just clean it at the point where
we need it cleaned. It's the simplest thing to do.
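Here is a minimal, hypothetical sketch of this inline style. `load_rows`, `save_row`, the `name`/`address` fields and their defaults are all assumptions made for illustration.

```python
# Hypothetical "before" version: cleaning happens inline, at the point of use.
# load_rows, save_row, the row fields and their defaults are all assumptions.

saved = []


def load_rows():
    # Stand-in for a real data source.
    return [
        {"id": 1, "name": "Ada", "address": None},
        {"id": 2, "name": "", "address": "1 Main St"},
    ]


def save_row(row):
    # Stand-in for real persistence: just remember what was saved.
    saved.append(row)


def process_data():
    for row in load_rows():
        # Nothing here says *why* this loop exists; the reader has to work it out.
        clean_row = dict(row)
        for field, default in (("name", "unknown"), ("address", "n/a")):
            if not clean_row.get(field):
                clean_row[field] = default
        save_row(clean_row)


process_data()
```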

The problem is that to realize we're cleaning at all (the
variable name does give a hint), or to understand what
"cleaning" means when we want to change it or fix a bug, we
have to analyze the code to discover that "this is the bit
that guarantees all required fields have a reasonable value
in them."

Wrapping the ugly bit in a method whose name explains
its purpose makes the code self-documenting and cleans up
the calling method. We call this pattern "Intention-Revealing
Method".
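A minimal sketch of the extraction, under the same hypothetical assumptions as before (the field names and defaults are invented). The loop body moves into `fill_in_required_fields`, and the call site simply states its intent:

```python
# Hypothetical sketch of an Intention-Revealing Method: the inline loop
# becomes fill_in_required_fields. Field names and defaults are assumptions.

REQUIRED_DEFAULTS = {"name": "unknown", "address": "n/a"}


def fill_in_required_fields(row):
    """Return a copy of row in which every required field has a usable value."""
    filled = dict(row)
    for field, default in REQUIRED_DEFAULTS.items():
        if not filled.get(field):
            filled[field] = default
    return filled


def process_data(rows, save_row):
    for row in rows:
        # The call site now says what it needs, not how it's done.
        save_row(fill_in_required_fields(row))


saved = []
process_data([{"id": 1, "name": "Ada"}], saved.append)
print(saved)  # [{'id': 1, 'name': 'Ada', 'address': 'n/a'}]
```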

Let's not kid ourselves: all the complexity of filling in
fields is still there, but it's been moved to a method
that only fills in fields for a single row, which is easier
to think about because we can ignore everything else
that process_data is doing.

It gets even better when there are multiple steps to get
a "clean" row.
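As a hypothetical illustration, here is what `process_data` might look like with several cleaning steps. `strip_whitespace` and `normalize_email` are invented example steps, not taken from the original:

```python
# Hypothetical multi-step cleaning. Each step is an intention-revealing
# method; the step names and rules are illustrative assumptions.

REQUIRED_DEFAULTS = {"name": "unknown", "address": "n/a"}


def strip_whitespace(row):
    return {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}


def fill_in_required_fields(row):
    filled = dict(row)
    for field, default in REQUIRED_DEFAULTS.items():
        if not filled.get(field):
            filled[field] = default
    return filled


def normalize_email(row):
    normalized = dict(row)
    if normalized.get("email"):
        normalized["email"] = normalized["email"].lower()
    return normalized


def process_data(rows, save_row):
    # The "big picture": each line names one step of getting a clean row.
    for row in rows:
        row = strip_whitespace(row)
        row = fill_in_required_fields(row)
        row = normalize_email(row)
        save_row(row)
```

Stripping whitespace runs first in this sketch so that a field containing only spaces counts as blank and receives its default.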

Imagine how large and hard to read process_data would be
if all of that logic were implemented at the point of use!
Now we can see the "big picture" of what process_data does,
and if we're interested in a specific aspect, we can jump
to the method that implements it.

We've also created new testing opportunities:
we can pass a sample row object into any of those methods
and ask "does this particular bit work?" without worrying
about any of the other bits messing up our results.
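A minimal sketch of such tests, using Python's standard `unittest` module. The `fill_in_required_fields` rules and the sample rows are the same hypothetical assumptions as above. Each test hands the method a row built by hand, with no loading, persisting, or other cleaning steps involved:

```python
# Hypothetical unit tests for one cleaning step in isolation.
# Run with any unittest runner, e.g. `python -m unittest`.
import unittest

REQUIRED_DEFAULTS = {"name": "unknown", "address": "n/a"}


def fill_in_required_fields(row):
    filled = dict(row)
    for field, default in REQUIRED_DEFAULTS.items():
        if not filled.get(field):
            filled[field] = default
    return filled


class FillInRequiredFieldsTest(unittest.TestCase):
    def test_blank_fields_get_defaults(self):
        row = {"id": 7, "name": "", "address": None}
        self.assertEqual(
            fill_in_required_fields(row),
            {"id": 7, "name": "unknown", "address": "n/a"},
        )

    def test_existing_values_are_kept(self):
        row = {"id": 8, "name": "Ada", "address": "1 Main St"}
        self.assertEqual(fill_in_required_fields(row), row)

    def test_input_row_is_not_mutated(self):
        row = {"id": 9}
        fill_in_required_fields(row)
        self.assertEqual(row, {"id": 9})
```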

Now process_data is distilled down to its
essence (load, clean, persist), and the details of what it
means to clean data are neatly gathered in a dedicated module
called DataCleaner, which can be extended and tested
independently of process_data.

DataCleaner knows nothing about getting or persisting
data, and the only thing process_data knows about "cleaning"
is that DataCleaner.clean knows how to do it.
DataCleaner can be tested with any data we can fake up,
without loading or persisting anything, which greatly
simplifies our setup and results-checking.
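One possible shape for this split, as a hypothetical sketch. Python modules are usually lowercase, so `DataCleaner` is modeled here as a class of static methods. Its steps and defaults are assumptions, and `process_data` takes its loader and saver as parameters so the example stays self-contained:

```python
# Hypothetical sketch: DataCleaner is modeled as a class of static methods
# (the text calls it a module). The steps and defaults are assumptions.


class DataCleaner:
    REQUIRED_DEFAULTS = {"name": "unknown", "address": "n/a"}

    @staticmethod
    def clean(row):
        """Run every cleaning step, in order, and return a cleaned copy of row."""
        row = DataCleaner.strip_whitespace(row)
        row = DataCleaner.fill_in_required_fields(row)
        return row

    @staticmethod
    def strip_whitespace(row):
        return {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}

    @staticmethod
    def fill_in_required_fields(row):
        filled = dict(row)
        for field, default in DataCleaner.REQUIRED_DEFAULTS.items():
            if not filled.get(field):
                filled[field] = default
        return filled


def process_data(load_rows, save_row):
    # Load, clean, persist: the only cleaning detail known here is DataCleaner.clean.
    for row in load_rows():
        save_row(DataCleaner.clean(row))


# DataCleaner exercised with a faked-up row; nothing is loaded or saved.
print(DataCleaner.clean({"name": " Ada ", "address": ""}))
# {'name': 'Ada', 'address': 'n/a'}
```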

Choosing good names

Of course, it's still possible to ruin all our good work by choosing
names that are brittle, misleading, or unhelpful.

Good names describe what we need done, not how to do it.
Keeping this in mind leads us to names that are
"future-proofed": they won't have to
change if the underlying code changes.

Good name: not_in_system
Bad name: cant_find_id_in_mysql
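A hypothetical sketch of why the "what" name holds up. Callers ask whether a row is in the system and never learn how that is answered. `InMemoryStore` and its `has_id` method are invented stand-ins for whatever backend is really used:

```python
# Hypothetical sketch: not_in_system hides *how* presence is checked.
# InMemoryStore / has_id are illustrative stand-ins for a real backend.


class InMemoryStore:
    def __init__(self, ids=()):
        self._ids = set(ids)

    def has_id(self, row_id):
        return row_id in self._ids


def not_in_system(row, store):
    """True if the row hasn't been recorded yet, however that is determined."""
    return not store.has_id(row["id"])


def rows_to_insert(rows, store):
    # Reads as intent: keep the rows that are not in the system.
    return [row for row in rows if not_in_system(row, store)]
```

Swapping the store for a different database would change only the store and possibly the body of `not_in_system`. Its callers, and its name, stay the same.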

The bad name breaks if we change how we determine that a row is
missing, or if we switch from MySQL to Postgres or Redis.

Good name: fill_in_required_fields
Bad name: fill_in_name_and_address

The bad name breaks if we add or remove required fields.
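A small hypothetical sketch of the difference. The required fields live in data (`REQUIRED_DEFAULTS` is an assumed name), so making `phone` required is a one-line change and `fill_in_required_fields` is still an accurate name. A function called `fill_in_name_and_address` would now be misleading:

```python
# Hypothetical sketch: the list of required fields is data, so adding "phone"
# changes REQUIRED_DEFAULTS but not the function or its name.

REQUIRED_DEFAULTS = {
    "name": "unknown",
    "address": "n/a",
    "phone": "none given",  # newly required field; only this line was added
}


def fill_in_required_fields(row):
    filled = dict(row)
    for field, default in REQUIRED_DEFAULTS.items():
        if not filled.get(field):
            filled[field] = default
    return filled


print(fill_in_required_fields({"name": "Ada"}))
# {'name': 'Ada', 'address': 'n/a', 'phone': 'none given'}
```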

Takeaways

When you realize you need a bit of behavior at a certain spot,
write something to do it for you (a method, or even a class
or module) with a name that explains the behavior, then call
it from the spot where you need it.

When choosing a name, explain the behavior in terms of
"what" and not "how" in order to future-proof your code.