The Migration That Would Have Taken Ten Months

When a manager describes a technical problem, most people listen for what they say. I listen for what the approach reveals.

In 2015 I joined VisionMAX, a food takeout software provider building a system for Panago Pizza. The project required migrating Panago’s existing customer and order data from a Microsoft SQL Server database in Vancouver to a MySQL database in the cloud in Virginia. Ten million records across five to eight tables — two and a half million customers and seven and a half million related records covering orders, payments, deliveries, and everything attached to them.

The manager told me they had already built a migration tool. Java-based. Nearly finished. Just a couple of bugs causing it to fail. He needed me to fix the bugs.

The moment he said "Java-based migration tool," I knew it would not work. Not because of the bugs. Because of the approach.


Migration 101

There is one rule in data migration that supersedes everything else. You do not migrate data record by record. Never, ever.

This is not an advanced principle. It is the first thing you learn. Record-by-record processing — connecting to a source, extracting one record, transforming it, writing it to the destination, then starting again — is appropriate for real-time event processing. It is catastrophically wrong for bulk data migration. The overhead of individual transactions, connections, and commits at scale is not a performance problem you optimize your way out of. It is a structural problem baked into the approach before a single line of code is written.
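The anti-pattern is easy to see once it is written down. A sketch of the shape of that loop, with toy stand-ins for the source and destination (Python here for brevity; the actual tool was Java, and every name below is illustrative):

```python
class Source:
    """Toy stand-in for the source database."""
    def __init__(self, rows):
        self.rows = iter(rows)

    def fetch_one(self):
        return next(self.rows, None)  # one round trip per record


class Dest:
    """Toy stand-in for the destination; counts commits to expose the overhead."""
    def __init__(self):
        self.rows, self.commits = [], 0

    def insert(self, rec):
        self.rows.append(rec)

    def commit(self):
        self.commits += 1


def migrate_record_by_record(source, dest, transform):
    # One fetch, one insert, one commit per record: at bulk scale,
    # this per-record overhead, not the data volume, sets the runtime.
    while (rec := source.fetch_one()) is not None:
        dest.insert(transform(rec))
        dest.commit()


dest = Dest()
migrate_record_by_record(Source(range(1000)), dest, transform=str)
# dest.commits == 1000: a separate transaction for every single record
```

Every record pays the fixed cost of a round trip and a commit, so the total cost grows with the record count regardless of how cheap each transform is.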

A Java tool that processes records individually is not a slow migration tool. It is the wrong tool, written in the wrong paradigm, for the wrong reason. Making it faster — multithreading it, optimizing the queries, tuning the connection pool — does not change what it fundamentally is.

I told the manager this before I touched the code. He was confident the tool would work once the bugs were fixed.

So I fixed the bugs.


The Mathematics

The first full run completed overnight. I checked the results in the morning.

Eight thousand customer records had migrated.

I did the calculation. Two and a half million customers to migrate. Eight thousand per run. At that rate, running continuously, twenty-four hours a day, seven days a week — ten months.

I told the manager. Even if I made the tool multithreaded, parallelizing the record-by-record processing across multiple threads, the best realistic outcome was cutting the runtime to ten or fifteen percent of the original. One to one and a half months instead of ten. Still completely unacceptable for a one-time data migration that needed to happen in a defined cutover window.
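The arithmetic behind those estimates is worth making explicit. A quick sketch using the figures from the story, with one assumption: that a run occupies roughly a full day.

```python
# Back-of-the-envelope timeline for the record-by-record tool,
# using the figures above (a run length of ~24 hours is an assumption).
customers = 2_500_000      # customer records to migrate
per_run = 8_000            # records completed in one overnight run
run_hours = 24             # assume each run occupies roughly a full day

runs_needed = customers / per_run              # 312.5 runs
months = runs_needed * run_hours / 24 / 30     # ~10.4 months of nonstop running

# Multithreading only trims per-record overhead; even the optimistic case
# of finishing in ten to fifteen percent of the original time is far too long.
best_case_months = months * 0.10               # ~1 month
```

The batch size per run, not the transform logic, is the term that dominates; no amount of tuning inside the loop changes the numerator.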

The whole approach was off-base. Not the implementation. The approach.


What The Data Actually Looked Like

Before anything could be migrated, the source data needed to be understood. What I found when I examined it was what twenty years of accumulated production data almost always looks like when nobody has been actively maintaining its quality.

Millions of illegal characters embedded in text fields. Addresses that were not addresses — partial entries, malformed strings, fields used for purposes they were never designed for. Phone numbers in formats that bore no relationship to any telephone numbering standard. Duplicate records. Null values in fields that downstream systems would not accept as null.

This was not unusual. Production databases accumulate garbage over time. Users enter bad data. Systems import records from sources with different validation rules. Edge cases get handled with workarounds that leave artifacts. Two and a half million customer records built up over years of real-world use will contain real-world imperfection at scale.
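The specific cleaning rules do not survive in the story, but the categories do: illegal characters, malformed phone numbers, empty-but-not-null fields, duplicates. A minimal sketch of that kind of pass, in Python rather than the Perl actually used, with field names and rules that are purely illustrative:

```python
import re

# Control characters and other bytes a destination schema typically rejects.
CTRL = re.compile(r"[\x00-\x1f\x7f]")


def clean_record(rec: dict) -> dict:
    """One illustrative cleaning pass per record (field names are made up)."""
    out = {}
    for key, val in rec.items():
        if isinstance(val, str):
            # Strip illegal characters; treat empty strings as missing.
            val = CTRL.sub("", val).strip() or None
        out[key] = val
    # Reduce phone numbers to bare digits; discard anything that is not
    # a plausible ten-digit number.
    digits = re.sub(r"\D", "", out.get("phone") or "")
    out["phone"] = digits if len(digits) == 10 else None
    return out


def dedupe(records):
    """Drop duplicates on a chosen identity key, keeping the first seen."""
    seen, kept = set(), []
    for rec in records:
        key = (rec.get("email"), rec.get("phone"))
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept
```

The real rules are always messier than this, but the structure is the point: every record passes through the same deterministic cleaning stage before anything is loaded.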

Any migration tool that moved this data without cleaning it first would have delivered garbage to the destination. Fast garbage, but garbage.


The Right Approach

I told the manager I would build a process that would complete the migration in days. He gave me the time.

The correct approach to bulk data migration has a clear structure. Extract everything from the source in bulk. Clean the data completely before it touches the destination. Load it to the target in bulk. The individual steps can be complex. The principle is simple.

I built a set of Perl and bash scripts over two to three weeks. The process downloaded the complete source dataset in bulk from the Vancouver SQL Server, ran it through a cleaning stage that handled the illegal characters, the malformed addresses, the phone number formatting, the duplicates — everything that would have caused failures or corrupted the destination data. It formatted the cleaned data as XML and loaded it into the MySQL database in Virginia in bulk.
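The structure of that process can be reduced to a few lines. A skeleton of the extract-clean-load shape (Python here for brevity; the real implementation was Perl and bash scripts, and the batch size below is an arbitrary stand-in):

```python
def chunks(seq, size):
    """Yield consecutive slices of `seq` of length `size`."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]


def migrate(source_rows, clean, load_batch, batch_size=10_000):
    """Bulk-migration skeleton: extract everything, clean it all up front,
    then hand the destination large batches -- one transaction per batch,
    never one per record."""
    rows = list(source_rows)                    # 1. extract in bulk
    cleaned = [clean(r) for r in rows]          # 2. clean before anything loads
    for batch in chunks(cleaned, batch_size):   # 3. load in bulk batches
        load_batch(batch)


# Usage with stand-ins for the real endpoints:
loaded = []
migrate(range(25_000), clean=lambda r: r, load_batch=loaded.append)
# loaded holds 3 batches: 10,000 + 10,000 + 5,000 rows
```

The fixed per-transaction cost is now paid once per batch instead of once per record, which is where the difference between ten months and a few days comes from.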

The entire migration ran in three to four days.


What The Story Is Actually About

The manager was not incompetent. He had a team that had built a tool, identified a problem with it, and brought in someone to fix it. That is a reasonable response to a technical problem when you do not have deep expertise in the specific domain.

The issue was that the tool itself — not its bugs — was the problem. And that distinction is only visible if you have enough experience with data migration to recognize what the approach reveals before you see the implementation.

Five years of Morgan Stanley data migration work had given me that experience. I knew what bulk migration looked like, what record-by-record processing costs at scale, and what the mathematics of ten million records would produce against an eight-thousand-records-per-run baseline.

Most consultants would have fixed the bugs, run the tool, seen the results, and then proposed a new approach. I told him before I fixed the bugs what the result would be. I fixed them anyway because the mathematics needed to speak for themselves. Eight thousand records. One overnight run. Ten months to completion.

Some problems are not what you are told they are. The skill is hearing the description and knowing, before you open a single file, where the real problem lives.


Russ Profant is a solutions architect and independent consultant with 30 years of experience across financial services, investment banking, healthcare, and government. He runs PC4IT, offering cloud cost diagnostics and architecture advisory to mid-market organizations. pc4it.com