Published 2 months ago

Seamless MySQL to OceanBase Data Migration with Apache SeaTunnel

Software Development
Seamless MySQL to OceanBase Data Migration with Apache SeaTunnel

Seamless MySQL to OceanBase Data Migration with Apache SeaTunnel

Migrating your data from MySQL to OceanBase can be a complex undertaking. This guide simplifies the process by demonstrating a lightweight, efficient solution using Apache SeaTunnel and its powerful Zeta engine. We'll cover full data synchronization, offline incremental synchronization, and real-time Change Data Capture (CDC), providing a comprehensive approach for various data migration needs.

Preparing Your Environment

Before we begin, let's ensure your environment is properly configured.

Install Java

SeaTunnel requires Java 8 or higher. While Java 8 is recommended for optimal compatibility, later versions should work without issue. Verify your Java installation using the following command:

java -version

Ensure that the JAVA_HOME environment variable is correctly set.

Download and Install Apache SeaTunnel

Download the latest stable release of Apache SeaTunnel from the official website. For this tutorial, we'll use version 2.3.9. You can download and extract it using these commands:

wget https://dlcdn.apache.org/seatunnel/2.3.9/apache-seatunnel-2.3.9-bin.tar.gz
tar -zxvf apache-seatunnel-2.3.9-bin.tar.gz

Installing Connector Plugins

SeaTunnel's core framework and the Zeta engine are included in the installation package. To connect to various data sources, including MySQL and OceanBase, you'll need the appropriate connector plugins. Let's explore two installation methods.

Automatic Plugin Installation

The easiest way is to utilize SeaTunnel's automatic plugin installation. Modify the config/plugin_config file to specify the required connectors. This example includes the essential connectors for this guide:

connector-cdc-mysql
connector-jdbc
connector-fake
connector-console

Then, install the plugins using:

sh bin/install-plugin.sh 2.3.9

Manual Plugin Installation

Alternatively, manually download the necessary JAR files from the Apache Maven Repository. For example:

connector-cdc-mysql-2.3.9.jar
connector-console-2.3.9.jar
connector-fake-2.3.9.jar
connector-jdbc-2.3.9.jar
seatunnel-transforms-v2-2.3.9.jar

Place these JAR files in the Connectors directory within your SeaTunnel installation.

Verifying Connector Installation

Confirm correct installation using:

./bin/seatunnel-connector.sh -l

You should see a list of installed sources, sinks, and transforms, including Jdbc and MySQL-CDC.

Because we'll be using JDBC to connect to OceanBase, you'll also need the MySQL JDBC driver. Download it from the official MySQL website and place the mysql-connector-j-*.jar file in the SeaTunnel lib directory.

Verifying SeaTunnel Installation

Test your SeaTunnel installation with a simple batch processing job using the default configuration template:

./bin/seatunnel.sh --config ./config/v2.batch.config.template -m local

Successful execution will display log output indicating data processing. A final summary log will show statistics, such as total read and write counts.

Full Data Synchronization

Let's demonstrate full data synchronization by creating test tables in both MySQL and OceanBase. We'll then migrate data from MySQL to OceanBase.

Creating Test Tables

Create identical tables in both your MySQL and OceanBase databases. Example schema for table table1:

CREATE TABLE table1 (
    id INT NOT NULL AUTO_INCREMENT,
    value1 VARCHAR(255) NOT NULL,
    value2 VARCHAR(255),
    value3 VARCHAR(255),
    value4 VARCHAR(255),
    value5 VARCHAR(255),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    PRIMARY KEY (id),
    UNIQUE INDEX idx_value1 (value1),
    INDEX idx_value2_value3 (value2, value3),
    INDEX idx_value3_value4_value5 (value3, value4, value5)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

Populate your MySQL table1 with sample data.

Configuring SeaTunnel for Full Synchronization

This configuration file demonstrates single-table full synchronization:

env {
  parallelism = 5
  job.mode = "BATCH"
}
source {
    Jdbc {
        url = "jdbc:mysql://127.0.0.1:3306/mysql?&useUnicode=true&characterEncoding=UTF-8&rewriteBatchedStatements=true"
        driver = "com.mysql.cj.jdbc.Driver"
        connection_check_timeout_sec = 100
        user = "xxx"
        password = "xxx"
        query = "select * from seatunnel.table1"
    }
}
sink {
    jdbc {
        url = "jdbc:mysql://127.0.0.1:2883/mysql?&useUnicode=true&characterEncoding=UTF-8&rewriteBatchedStatements=true"
        driver = "com.mysql.cj.jdbc.Driver"
        user = "xxx@xxx"
        password = "xxx"
        generate_sink_sql = true
        database = seatunnel
        table = table1
    }
}

Run this configuration using ./bin/seatunnel.sh --config <config_file>

For multi-table synchronization, you can extend this configuration by listing multiple tables in both the source and sink sections. Remember to handle schema differences appropriately.

Incremental Synchronization

Incremental synchronization efficiently updates the target database with only the changes since the last synchronization. This configuration uses a query to filter based on an id or updatetime column:

env {
  parallelism = 1
  job.mode = "BATCH"
}
source {
    Jdbc {
        url = "jdbc:mysql://127.0.0.1:3306/mysql?&useUnicode=true&characterEncoding=UTF-8&rewriteBatchedStatements=true"
        driver = "com.mysql.cj.jdbc.Driver"
        connection_check_timeout_sec = 100
        user = "xxx"
        password = "xxx"
        query = "SELECT * FROM seatunnel.table1 WHERE updatetime > '2024-01-01' "
    }
}
sink {
    jdbc {
        url = "jdbc:mysql://127.0.0.1:2883/mysql?&useUnicode=true&characterEncoding=UTF-8&rewriteBatchedStatements=true"
        driver = "com.mysql.cj.jdbc.Driver"
        user = "xxx@xxx"
        password = "xxx"
        generate_sink_sql = true
        database = seatunnel
        table = table1
    }
}

For more robust incremental sync, consider using workflow schedulers like Apache DolphinScheduler to manage the update time or ID.

CDC Synchronization

Change Data Capture (CDC) provides real-time data synchronization. This requires MySQL binlog to be enabled. Ensure your MySQL user has the necessary privileges:

GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'user' IDENTIFIED BY 'password';
FLUSH PRIVILEGES;

Verify binlog settings:

show variables where variable_name in ('log_bin', 'binlog_format', 'binlog_row_image', 'gtid_mode', 'enforce_gtid_consistency');

Your binlog configuration should be similar to:

log_bin = ON
binlog_format = ROW
binlog_row_image = FULL
enforce_gtid_consistency = ON
gtid_mode = ON

SeaTunnel's CDC configuration:

env {
  parallelism = 1
  job.mode = "STREAMING"
  checkpoint.interval = 10000
}
source {
  MySQL-CDC {
    base-url = "jdbc:mysql://127.0.0.1:3306/mysql"
    username = "xxx"
    password = "xxx@xxx"
    table-names = ["seatunnel.table1", "seatunnel.table2"]
    startup.mode = "initial"
  }
}
sink {
    jdbc {
        url = "jdbc:mysql://127.0.0.1:2883/mysql?&useUnicode=true&characterEncoding=UTF-8&rewriteBatchedStatements=true"
        driver = "com.mysql.cj.jdbc.Driver"
        user = "xxx@xxx"
        password = "xxx"
        database = "seatunnel"
        table-names = ["seatunnel.table1", "seatunnel.table2"]
        generate_sink_sql = true
    }
}

startup.mode controls the initial sync behavior: initial (default), earliest, latest, or specific (requires offset file and position).

Conclusion

Apache SeaTunnel provides a powerful and flexible solution for data migration between MySQL and OceanBase. We've explored full, incremental, and CDC synchronization strategies, enabling you to choose the best approach for your specific requirements. Remember to carefully configure your connectors and handle schema differences to ensure a smooth and successful data migration.

Hashtags: #ApacheSeaTunnel # MySQL # OceanBase # DataMigration # CDC # IncrementalSync # FullSync # ZetaEngine # JDBC # DatabaseMigration

Related Articles

thumb_nail_Unveiling the Haiku License: A Fair Code Revolution

Software Development

Unveiling the Haiku License: A Fair Code Revolution

Dive into the innovative Haiku License, a game-changer in open-source licensing that balances open access with fair compensation for developers. Learn about its features, challenges, and potential to reshape the software development landscape. Explore now!

Read More
thumb_nail_Leetcode - 1. Two Sum

Software Development

Leetcode - 1. Two Sum

Master LeetCode's Two Sum problem! Learn two efficient JavaScript solutions: the optimal hash map approach and a practical two-pointer technique. Improve your coding skills today!

Read More
thumb_nail_The Future of Digital Credentials in 2025: Trends, Challenges, and Opportunities

Business, Software Development

The Future of Digital Credentials in 2025: Trends, Challenges, and Opportunities

Digital credentials are transforming industries in 2025! Learn about blockchain's role, industry adoption trends, privacy enhancements, and the challenges and opportunities shaping this exciting field. Discover how AI and emerging technologies are revolutionizing identity verification and workforce management. Explore the future of digital credentials today!

Read More
Your Job, Your Community
logo
© All rights reserved 2024