AWS Data Migration Service – Replication Aspect

 

AWS Data Migration Service (DMS) reduces migration time and migrates databases to and from AWS easily and securely. This AWS service supports homogeneous migrations such as MS SQL to MSSQL, heterogeneous migrations among different database platforms, such as Oracle to PostgreSQL or MySQL to Oracle or migration to different versions (upgrade or downgrade) from the same databases with virtually no downtime. Migration can be done to and from most widely used commercial and open-source databases, such as Oracle, MySQL, and PostgreSQL.

AWS Database Migration Service can migrate your data in several ways:

  • Migrating Data to the Target Database – This process creates files or tables in the target database, automatically defines the metadata that is required at the target, and populates the tables with data from the source. The data from the tables is loaded in parallel for improved efficiency. This process is the Migrate existing data option in the AWS console and is called Full Load in the API.
  • Capturing Changes During Migration – This process captures changes to the source database that occur while the data is being migrated from the source to the target. When the migration of the originally requested data has completed, the change data capture (CDC) process then applies the captured changes to the target database. Changes are captured and applied as units of single committed transactions, and several different target tables can be updated as a single source commit. This approach guarantees transactional integrity in the target database. This process is the Migrate existing data and replicate ongoing changes option in the AWS console and is called full-load-and-cdc in the API and will be in focus of this blog.
  • Replicating Only Data Changes on the Source Database – This process reads the recovery log file of the source database management system (DBMS) and groups together the entries for each transaction. If AWS DMS can’t apply changes to the target within a reasonable time (for example, if the target is not accessible), AWS DMS buffers the changes on the replication server for as long as necessary. It doesn’t reread the source DBMS logs, which can take a large amount of time. This process is the Replicate data changes only option in the AWS DMS console.

Using AWS DMS allows customized table mappings and translations. This post is focused on continuous data replication feature, which keeps the target database up to date after the initial migration.

 

Migrate from on-premise MS SQL to AWS RDS PostgreSQL database – continuous replication

 

The main players in one AWS DMS migration process are replication instance, endpoints and migration task.

 

how DMS operates

 

To understand what happened, we need to consider how DMS operates. DMS functions as an intermediate broker to a variety of different database engines. To make this conversion happen, DMS first maps the source data type (JSON in this case) to an intermediate DMS type and then maps that type to the target data type (long text). It’s very important to take into account these mappings when designing your migration strategy. Note that I used the default mapping. DMS also allows you to customize the mappings if you want.

AWS DMS provides comprehensive ongoing replication of data, although it replicates only a limited amount of data definition language (DDL). AWS DMS doesn’t propagate items such as indexes, users, privileges, stored procedures, and other database changes not directly related to table data.

If you want to use ongoing replication, you must enable the Multi-AZ option on your replication instance. The Multi-AZ option provides high availability and failover support for the replication instance.

Also there are several configuration steps that needs to be done on source enpoint. For example, if you migrate from MS SQL source, you need to decise how Capture Data Change will be done:

  • MS-Replication
  • MS-CDC
  • Without MS-Replication and MS-CDC only for INSERT and DELETE

If MS SQL source is configured to use MS-CDC there are limitation listed here: http://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.SQLServer.html#CHAP_Source.SQLServer.CDCLimitations .

After source is configures are ready to go, you can configure DMS replication task.

 

Ongoing replication setup

 

Let’s create one task that will support replication of source changes to the target. We assume that endpoints and replication instances are already set up: http://interworks.com.mk/data-migration-service/

Next is to create a task:

  • Navigate to Tasks from DMS main menu
  • Select Create Task
  • Choose Task name, Replication instance, Source and Target endpoints
  • For Migration type choose Migrate existing data and replicate ongoing changes
  • Because task will do the initial load in Task Setting you can choose Drop tables on target (if any)
  • If you have LOB, choose setting that suits your need
  • You can enable logging for extra info about migration process
  • That’s it. You can hit Create task.

DMS first migrates initial data and than enables replication for ongoing changes. If you make data change in source you will quickly see the change in the destination data set.

DMS provides good statistics dashboard for Task: Task Monitoring and Table Statistics.

 

Conclusion

 

Companies change their database resources and migrates the data that usually cause them extended outage. The AWS DMS provides quick setup, fast database migration workloads to AWS or database’s engine change and minimize any associated downtime.