Seamless MySQL to OceanBase Data Migration with Apache SeaTunnel
Seamless MySQL to OceanBase Data Migration with Apache SeaTunnel
Migrating your data from MySQL to OceanBase can be a complex undertaking. This guide simplifies the process by demonstrating a lightweight, efficient solution using Apache SeaTunnel and its powerful Zeta engine. We'll cover full data synchronization, offline incremental synchronization, and real-time Change Data Capture (CDC), providing a comprehensive approach for various data migration needs.
Preparing Your Environment
Before we begin, let's ensure your environment is properly configured.
Install Java
SeaTunnel requires Java 8 or higher. While Java 8 is recommended for optimal compatibility, later versions should work without issue. Verify your Java installation using the following command:
java -version
Ensure that the JAVA_HOME
environment variable is correctly set.
Download and Install Apache SeaTunnel
Download the latest stable release of Apache SeaTunnel from the official website. For this tutorial, we'll use version 2.3.9. You can download and extract it using these commands:
wget https://dlcdn.apache.org/seatunnel/2.3.9/apache-seatunnel-2.3.9-bin.tar.gz
tar -zxvf apache-seatunnel-2.3.9-bin.tar.gz
Installing Connector Plugins
SeaTunnel's core framework and the Zeta engine are included in the installation package. To connect to various data sources, including MySQL and OceanBase, you'll need the appropriate connector plugins. Let's explore two installation methods.
Automatic Plugin Installation
The easiest way is to utilize SeaTunnel's automatic plugin installation. Modify the config/plugin_config
file to specify the required connectors. This example includes the essential connectors for this guide:
connector-cdc-mysql
connector-jdbc
connector-fake
connector-console
Then, install the plugins using:
sh bin/install-plugin.sh 2.3.9
Manual Plugin Installation
Alternatively, manually download the necessary JAR files from the Apache Maven Repository. For example:
connector-cdc-mysql-2.3.9.jar
connector-console-2.3.9.jar
connector-fake-2.3.9.jar
connector-jdbc-2.3.9.jar
seatunnel-transforms-v2-2.3.9.jar
Place these JAR files in the Connectors
directory within your SeaTunnel installation.
Verifying Connector Installation
Confirm correct installation using:
./bin/seatunnel-connector.sh -l
You should see a list of installed sources, sinks, and transforms, including Jdbc
and MySQL-CDC
.
Because we'll be using JDBC to connect to OceanBase, you'll also need the MySQL JDBC driver. Download it from the official MySQL website and place the mysql-connector-j-*.jar
file in the SeaTunnel lib
directory.
Verifying SeaTunnel Installation
Test your SeaTunnel installation with a simple batch processing job using the default configuration template:
./bin/seatunnel.sh --config ./config/v2.batch.config.template -m local
Successful execution will display log output indicating data processing. A final summary log will show statistics, such as total read and write counts.
Full Data Synchronization
Let's demonstrate full data synchronization by creating test tables in both MySQL and OceanBase. We'll then migrate data from MySQL to OceanBase.
Creating Test Tables
Create identical tables in both your MySQL and OceanBase databases. Example schema for table table1
:
CREATE TABLE table1 (
id INT NOT NULL AUTO_INCREMENT,
value1 VARCHAR(255) NOT NULL,
value2 VARCHAR(255),
value3 VARCHAR(255),
value4 VARCHAR(255),
value5 VARCHAR(255),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (id),
UNIQUE INDEX idx_value1 (value1),
INDEX idx_value2_value3 (value2, value3),
INDEX idx_value3_value4_value5 (value3, value4, value5)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
Populate your MySQL table1
with sample data.
Configuring SeaTunnel for Full Synchronization
This configuration file demonstrates single-table full synchronization:
env {
parallelism = 5
job.mode = "BATCH"
}
source {
Jdbc {
url = "jdbc:mysql://127.0.0.1:3306/mysql?&useUnicode=true&characterEncoding=UTF-8&rewriteBatchedStatements=true"
driver = "com.mysql.cj.jdbc.Driver"
connection_check_timeout_sec = 100
user = "xxx"
password = "xxx"
query = "select * from seatunnel.table1"
}
}
sink {
jdbc {
url = "jdbc:mysql://127.0.0.1:2883/mysql?&useUnicode=true&characterEncoding=UTF-8&rewriteBatchedStatements=true"
driver = "com.mysql.cj.jdbc.Driver"
user = "xxx@xxx"
password = "xxx"
generate_sink_sql = true
database = seatunnel
table = table1
}
}
Run this configuration using ./bin/seatunnel.sh --config <config_file>
For multi-table synchronization, you can extend this configuration by listing multiple tables in both the source
and sink
sections. Remember to handle schema differences appropriately.
Incremental Synchronization
Incremental synchronization efficiently updates the target database with only the changes since the last synchronization. This configuration uses a query to filter based on an id
or updatetime
column:
env {
parallelism = 1
job.mode = "BATCH"
}
source {
Jdbc {
url = "jdbc:mysql://127.0.0.1:3306/mysql?&useUnicode=true&characterEncoding=UTF-8&rewriteBatchedStatements=true"
driver = "com.mysql.cj.jdbc.Driver"
connection_check_timeout_sec = 100
user = "xxx"
password = "xxx"
query = "SELECT * FROM seatunnel.table1 WHERE updatetime > '2024-01-01' "
}
}
sink {
jdbc {
url = "jdbc:mysql://127.0.0.1:2883/mysql?&useUnicode=true&characterEncoding=UTF-8&rewriteBatchedStatements=true"
driver = "com.mysql.cj.jdbc.Driver"
user = "xxx@xxx"
password = "xxx"
generate_sink_sql = true
database = seatunnel
table = table1
}
}
For more robust incremental sync, consider using workflow schedulers like Apache DolphinScheduler to manage the update time or ID.
CDC Synchronization
Change Data Capture (CDC) provides real-time data synchronization. This requires MySQL binlog to be enabled. Ensure your MySQL user has the necessary privileges:
GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'user' IDENTIFIED BY 'password';
FLUSH PRIVILEGES;
Verify binlog settings:
show variables where variable_name in ('log_bin', 'binlog_format', 'binlog_row_image', 'gtid_mode', 'enforce_gtid_consistency');
Your binlog configuration should be similar to:
log_bin = ON
binlog_format = ROW
binlog_row_image = FULL
enforce_gtid_consistency = ON
gtid_mode = ON
SeaTunnel's CDC configuration:
env {
parallelism = 1
job.mode = "STREAMING"
checkpoint.interval = 10000
}
source {
MySQL-CDC {
base-url = "jdbc:mysql://127.0.0.1:3306/mysql"
username = "xxx"
password = "xxx@xxx"
table-names = ["seatunnel.table1", "seatunnel.table2"]
startup.mode = "initial"
}
}
sink {
jdbc {
url = "jdbc:mysql://127.0.0.1:2883/mysql?&useUnicode=true&characterEncoding=UTF-8&rewriteBatchedStatements=true"
driver = "com.mysql.cj.jdbc.Driver"
user = "xxx@xxx"
password = "xxx"
database = "seatunnel"
table-names = ["seatunnel.table1", "seatunnel.table2"]
generate_sink_sql = true
}
}
startup.mode
controls the initial sync behavior: initial
(default), earliest
, latest
, or specific
(requires offset file and position).
Conclusion
Apache SeaTunnel provides a powerful and flexible solution for data migration between MySQL and OceanBase. We've explored full, incremental, and CDC synchronization strategies, enabling you to choose the best approach for your specific requirements. Remember to carefully configure your connectors and handle schema differences to ensure a smooth and successful data migration.
Related Articles
Software Development
Unveiling the Haiku License: A Fair Code Revolution
Dive into the innovative Haiku License, a game-changer in open-source licensing that balances open access with fair compensation for developers. Learn about its features, challenges, and potential to reshape the software development landscape. Explore now!
Read MoreSoftware Development
Leetcode - 1. Two Sum
Master LeetCode's Two Sum problem! Learn two efficient JavaScript solutions: the optimal hash map approach and a practical two-pointer technique. Improve your coding skills today!
Read MoreBusiness, Software Development
The Future of Digital Credentials in 2025: Trends, Challenges, and Opportunities
Digital credentials are transforming industries in 2025! Learn about blockchain's role, industry adoption trends, privacy enhancements, and the challenges and opportunities shaping this exciting field. Discover how AI and emerging technologies are revolutionizing identity verification and workforce management. Explore the future of digital credentials today!
Read MoreSoftware Development
Unlocking the Secrets of AWS Pricing: A Comprehensive Guide
Master AWS pricing with this comprehensive guide! Learn about various pricing models, key cost factors, and practical tips for optimizing your cloud spending. Unlock significant savings and efficiently manage your AWS infrastructure.
Read MoreSoftware Development
Exploring the GNU Verbatim Copying License
Dive into the GNU Verbatim Copying License (GVCL): Understand its strengths, weaknesses, and impact on open-source collaboration. Explore its unique approach to code integrity and its relevance in today's software development landscape. Learn more!
Read MoreSoftware Development
Unveiling the FSF Unlimited License: A Fairer Future for Open Source?
Explore the FSF Unlimited License: a groundbreaking open-source license designed to balance free software distribution with fair developer compensation. Learn about its origins, strengths, limitations, and real-world impact. Discover how it addresses the challenges of open-source sustainability and innovation.
Read MoreSoftware Development
Conquer JavaScript in 2025: A Comprehensive Learning Roadmap
Master JavaScript in 2025! This comprehensive roadmap guides you through fundamental concepts, modern frameworks like React, and essential tools. Level up your skills and build amazing web applications – start learning today!
Read MoreBusiness, Software Development
Building a Successful Online Gambling Website: A Comprehensive Guide
Learn how to build a successful online gambling website. This comprehensive guide covers key considerations, technical steps, essential tools, and best practices for creating a secure and engaging platform. Start building your online gambling empire today!
Read MoreAI, Software Development
Generate Images with Google's Gemini API: A Node.js Application
Learn how to build an AI-powered image generator using Google's Gemini API and Node.js. This comprehensive guide covers setup, API integration, and best practices for creating a robust image generation service. Start building today!
Read MoreSoftware Development
Discover Ocak.co: Your Premier Online Forum
Explore Ocak.co, a vibrant online forum connecting people through shared interests. Engage in discussions, share ideas, and find answers. Join the conversation today!
Read MoreSoftware Development
Mastering URL Functions in Presto/Athena
Unlock the power of Presto/Athena's URL functions! Learn how to extract hostnames, parameters, paths, and more from URLs for efficient data analysis. Master these essential functions for web data processing today!
Read MoreSoftware Development
Introducing URL Opener: Open Multiple URLs Simultaneously
Tired of opening multiple URLs one by one? URL Opener lets you open dozens of links simultaneously with one click. Boost your productivity for SEO, web development, research, and more! Try it now!
Read More
Software Development, Business
Unlocking the Power of AWS: A Deep Dive into Amazon Web Services
Dive deep into Amazon Web Services (AWS)! This comprehensive guide explores key features, benefits, and use cases, empowering businesses of all sizes to leverage cloud computing effectively. Learn about scalability, cost-effectiveness, and global infrastructure. Start your AWS journey today!
Read MoreSoftware Development
Understanding DNS in Kubernetes with CoreDNS
Master CoreDNS in Kubernetes: This guide unravels the complexities of CoreDNS, Kubernetes's default DNS server, covering configuration, troubleshooting, and optimization for seamless cluster performance. Learn best practices and avoid common pitfalls!
Read MoreSoftware Development
EUPL 1.1: A Comprehensive Guide to Fair Open Source Licensing
Dive into the EUPL 1.1 open-source license: understand its strengths, challenges, and real-world applications for fair code. Learn how it balances freedom and developer protection. Explore now!
Read MoreSoftware Development
Erlang Public License 1.1: Open Source Protection Deep Dive
Dive deep into the Erlang Public License 1.1 (EPL 1.1), a crucial open-source license balancing collaboration and contributor protection. Learn about its strengths, challenges, and implications for developers and legal teams.
Read MoreSoftware Development
Unlocking Kerala's IT Job Market: Your Path to Data Science Success
Launch your data science career in Kerala's booming IT sector! Learn the in-demand skills to land high-paying jobs. Discover top data science courses & career paths. Enroll today!
Read More
Software Development
Automation in Software Testing: A Productivity Booster
Supercharge your software testing with automation! Learn how to boost productivity, efficiency, and accuracy using automation tools and best practices. Discover real-world examples and get started today!
Read MoreSoftware Development
Mastering Anagram Grouping in JavaScript
Master efficient anagram grouping in JavaScript! Learn two proven methods: sorting and character counting. Optimize your code for speed and explore key JavaScript concepts like charCodeAt(). Improve your algorithms today!
Read More
Software Development
Mastering Kubernetes Deployments: Rolling Updates and Scaling
Master Kubernetes Deployments for seamless updates & scaling. Learn rolling updates, autoscaling, and best practices for high availability and efficient resource use. Improve your application management today!
Read More