
High-performance data extraction for large MySQL tables

This topic describes how Data Integration uses Key-Set Pagination to extract data efficiently and reliably from large MySQL tables during a migration.

Why this matters

When migrating massive tables ranging from a few gigabytes to terabytes, traditional offset-based extraction slows as it progresses, because each new page forces the database to scan past every row already read. This puts increasing strain on your database and can cause migrations to stall. Data Integration uses Key-Set Pagination, a seek-based strategy, to keep your migration fast and stable from the first row to the last.

How it works

Instead of re-reading rows to find where it left off, the Data Integration engine remembers the last unique identifier it read and asks the database only for rows beyond that point. Because the filter is on an indexed column, the database can jump directly to the next set of data with an index lookup instead of a scan. This is referred to as the seek method.

Key benefits include:

  • Consistent speed: Performance remains high and steady regardless of total table size, whether you are moving the first 1,000 rows or the last billion.
  • Reduced database stress: By avoiding heavy sorting tasks, your source database stays responsive and your daily business operations are not interrupted.
  • Built-in safety: If the system encounters a record that is too large to move safely, it pauses the migration and alerts you rather than skipping data or causing a crash.
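The seek pattern described above can be sketched in a few lines. This is a minimal illustration, not Data Integration's actual implementation; it uses SQLite as a stand-in for MySQL, and the table name, column names, and batch size are placeholders.

```python
import sqlite3

# Build a small table to extract from. In a real migration this would be
# a large MySQL table with a primary key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO orders (id, payload) VALUES (?, ?)",
    [(i, f"row-{i}") for i in range(1, 26)],
)

def extract_in_batches(conn, batch_size=10):
    """Yield batches by seeking past the last key seen, never using OFFSET."""
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT id, payload FROM orders WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            break
        last_id = rows[-1][0]  # remember where this batch ended
        yield rows

batches = list(extract_in_batches(conn))
print(len(batches))      # 3 batches: 10 + 10 + 5 rows
print(batches[-1][-1])   # (25, 'row-25')
```

The key point is the `WHERE id > ?` filter: each query starts exactly where the previous one ended, so the cost per batch stays constant no matter how far into the table the extraction has progressed.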

Requirements and limitations

To use high-speed extraction, your tables must meet the following criteria.

  • Storage engine

    Your tables must use the InnoDB storage engine, which is standard for most modern MySQL setups.

  • Unique identifiers (indexes)

    The system requires a unique, ordered reference to navigate through your data accurately:

    • Primary key (recommended): This is the most efficient way to move data.
    • Unique index (alternative): If no primary key exists, the system can use other unique columns, such as email_address or serial_number.
  • Automatic fallback

    If a table uses an older engine such as MyISAM or lacks a unique identifier, the system automatically switches to Legacy Chunking. This method remains reliable but is significantly slower on very large datasets and increases CPU, I/O, and memory usage due to full-table scans and disk-based sorting.
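The difference between the two methods comes down to query shape. The snippet below contrasts illustrative versions of the two queries; these are not Data Integration's actual SQL, and the table and column names are placeholders.

```python
def legacy_chunk_query(offset, batch_size):
    # Offset-based chunking: the server must walk past and discard every
    # row before the offset, so each page is slower than the last.
    return (
        "SELECT id, payload FROM orders "
        f"ORDER BY id LIMIT {batch_size} OFFSET {offset}"
    )

def seek_query(last_id, batch_size):
    # Seek-based pagination: the server jumps straight to the index entry
    # after last_id, so every page costs roughly the same.
    return (
        "SELECT id, payload FROM orders "
        f"WHERE id > {last_id} ORDER BY id LIMIT {batch_size}"
    )

# Deep into a large table, the offset query forces a scan of a billion
# rows before anything is returned; the seek query starts immediately.
print(legacy_chunk_query(1_000_000_000, 1_000))
print(seek_query(999_999_999, 1_000))
```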

Best practices for a smooth migration

  • Check your engines: Before starting, ensure your largest tables are converted to InnoDB for maximum performance.
  • Identify your keys: If a large table lacks a unique identifier, adding a primary key before the migration can save hours, or even days, of transfer time.
  • Select only what you need: Deselect large, unnecessary columns (such as long text descriptions or internal logs) that are not required in your destination to keep the migration moving efficiently.
  • Monitor database capacity: While this method reduces CPU and memory overhead, it still performs reads on your database. Ensure your source database has sufficient I/O capacity to handle the data being pulled during peak hours.

Pre-migration checklist: optimizing your MySQL source

Before starting your migration, complete this four-step check to ensure your tables move at maximum speed.

1. Check your storage engines

Confirm that your large tables use InnoDB.

The high-speed seek method only works with InnoDB. If a large table uses MyISAM, convert it to InnoDB before starting your migration.
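As a hedged example, the query below finds non-InnoDB tables via MySQL's `information_schema`. The schema name `your_database` is a placeholder, and the conversion helper is illustrative; a conversion rebuilds and locks the table, so take a backup and run it during a maintenance window.

```python
# Lists every table in the target schema that does not use InnoDB.
FIND_NON_INNODB = """
SELECT TABLE_NAME, ENGINE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'your_database'
  AND ENGINE <> 'InnoDB';
"""

def convert_to_innodb(table_name):
    """Build the statement that converts one table to InnoDB."""
    return f"ALTER TABLE `{table_name}` ENGINE = InnoDB;"

print(convert_to_innodb("legacy_orders"))
```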

2. Confirm unique identifiers

Confirm that every large table has a primary key or a unique index.

The system requires a reference point to navigate through your data. Without one, the migration falls back to a slower method. Add a primary key to any table with over 1 million rows for a significantly faster migration experience.
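The following sketch shows one way to find tables that lack a primary key, again via `information_schema`. The schema name and the column in the `ALTER` helper are placeholders; choose a column (or add a new `AUTO_INCREMENT` column) that uniquely identifies each row.

```python
# Lists base tables in the target schema that have no PRIMARY KEY constraint.
FIND_TABLES_WITHOUT_PK = """
SELECT t.TABLE_NAME
FROM information_schema.TABLES t
LEFT JOIN information_schema.TABLE_CONSTRAINTS c
       ON c.TABLE_SCHEMA = t.TABLE_SCHEMA
      AND c.TABLE_NAME = t.TABLE_NAME
      AND c.CONSTRAINT_TYPE = 'PRIMARY KEY'
WHERE t.TABLE_SCHEMA = 'your_database'
  AND t.TABLE_TYPE = 'BASE TABLE'
  AND c.CONSTRAINT_NAME IS NULL;
"""

def add_primary_key(table_name, column_name):
    """Build the statement that promotes an existing unique column to a PK."""
    return f"ALTER TABLE `{table_name}` ADD PRIMARY KEY (`{column_name}`);"

print(add_primary_key("events", "event_id"))
```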

3. Audit your columns

Deselect unnecessary columns before syncing.

Large text blocks, internal logs, or BLOB data can slow down the transfer. Sync only the columns your business needs in the destination.

4. Verify database capacity

Confirm that your source database is under 80% I/O load.

Migrating data requires reading from your disk. If your database is already under heavy daily traffic, the migration may take longer. Schedule your initial large-scale migration during off-peak hours if your database resources are limited.
