Data Migration Service

09 Jun, 2017 | 6 minutes read

Data migration is a time-consuming process and it requires effort to set up, transform, replicate, and keep up-to-date data that is migrating from one database to another. When replication is heterogeneous meaning different source and target database types are present, this process has an even longer execution time.

The introduction of AWS Data Migration Service (DMS) reduces migration time and migrates databases to and from AWS easily and securely. The service supports homogeneous migrations such as MS SQL to MSSQL, heterogeneous migrations among different database platforms, such as Oracle to PostgreSQL or MySQL to Oracle, or migration to different versions (upgrade or downgrade) from the same databases with virtually no downtime. Migration can be done to and from the most widely used commercial and open-source databases, such as Oracle, MySQL, and PostgreSQL.

Although there are menu constraints and unsupported migration options, AWS DMS is a powerful tool that will be even more sophisticated in the future, knowing the AWS upgrading trend.

Migration

AWS Database Migration Service helps you migrate databases to AWS easily and securely. The source database remains operational during the migration, minimizing downtime to applications that rely on the database. You can use the AWS Database Migration Service to migrate your data to and from the most widely used commercial and open-source databases.

Current supported sources are available here: (http://docs.aws.amazon.com/dms/latest/userguide/CHAP_Introduction.Sources.html ).

Current supported targets are available here: (http://docs.aws.amazon.com/dms/latest/userguide/CHAP_Introduction.Targets.html ).

If you are using homogeneous migration, where the destination schema is the same as the schema on the source database, you can use the AWS DMS console to set up and start your migration.

If your destination schema is different from the source schema you can use the AWS Schema migration tool to describe in detail how the mappings from source to destination need to be done. We will dive into the details of setting migration and go through current AWS DMS options.

Migrate from on-premise MS SQL to AWS RDS PostgreSQL database

The main players in one AWS DMS migration process are replication instances, endpoints, and migration tasks.

Transaction log processing

Before you start the migration make sure your source and target databases are set up and ready to use. Note: one of the endpoints – source or target – must be the AWS cloud database.

1. Configure replication instance on AWS.

This instance reads data from the source database and loads it into the destination database, and can be used both as a one-time migration or as an ongoing replication. A replication instance is a type of EC2 instance and it needs to be run on the same VPC as the endpoint that is under VPC. In this case, target database.

Configuration steps:

  • Navigate to AWS DMS
  • On the menu to the left select Replication instances
  • Choose Create replication instance
  • Enter the Name and Description for the instance
  • Choose Replication instance type
  • Choose VPC under your target database is configured
  • Choose Multi-AZ (Yes or No)
  • Check if you like to be Publicly available
  • Allocated storage (by default is 50 GB)
  • Replication Subnet Group. DMS created one Replication Subnet Group by default that adds all Subnets from selected VPC. You can configure this Subnet Group at any moment.
  • Availability zone (if it is Multi-AZ, you should not specify any)
  • VPC security groups (use the default one if you do not have created one, especially for AWS DMS)
  • Choose the KMS master key (one that you have created or from available ones)
  • Account and Key ARN are populated by AWS

Create a replication instance. It took around 3-5 minutes to create an operational replication instance. Here is an example of an already-created one:

Creating an operational replication instance

   2. Configure endpoints.

Endpoints are connections to source and target databases.

  • Navigate to AWS DMS
  • On the menu to the left select Endpoints
  • Select Create endpoint
  • In a new window choose Source/Target
  • Enter endpoint name in Endpoint identifier
  • Choose Source engine. In our case MSSQL/Posgres
  • Enter server name
  • Enter Port (by default is 1433)
  • For SSL mode choose none, if it is not defined differently
  • Enter server username and password
  • In the Advanced section leave everything by default
  • In the Test endpoint connection, choose the VPC and replication instance that you have created in previous steps and hit Run test. If you have success, choose Create endpoint.

3. Create Task

  • Navigate to AWS DMS
  • On the menu to the left select Tasks
  • Select Create task
  • Enter Task name
  • Choose the replication instance that you’ve created in step 1
  • Choose the source endpoint that you’ve created in step 2
  • Choose the target endpoint that you’ve created in step 2
  • Choose Migration type. In our initial load migration, we choose to Migrate existing data. DMS supports ongoing replication to keep the target in sync with the source. On-going replication can’t be done using the Aws Schema conversion tool.
  • If you like to start the task immediately after creation, leave the option Start task on Create checked
  • In the Task Settings section choose Do nothing for tables in Target and Do not include LOB columns because in our case we do not have any
  • Jump to the Table mappings section. Here you enter schemas and tables that need to be included in migration. This section will look like:
Table mappings section
  • Transformation rules are used when we need to take only a subset of data from the source.
  • Choose Create task.
  • Migration will start automatically. First, the structure on target will be created and then the data will be migrated.

4. Re-run the migration

  • Choose the task
  • Modify if needed
  • Start the task again

Different schema and table definitions in the target database

If you want to migrate data from one database to another one where table definitions are different, you will need to use the AWS Schema migration tool.

You can download it and set it up locally from the AWS link Download, under the AWS Schema conversion tool section, on the right menu from the AWS DMS welcome page.

Note: In order to make data migration using the AWS Schema conversion tool, one of the endpoints must be on AWS and accessible from a local machine (outside AWS). SCD is used only for initial migration and does not support ongoing replication.

Improving the Performance of an AWS Database Migration Service Migration

A number of factors affect the performance of your AWS DMS migration. In our case, we were using RDS db.t2.small PostgreSQL database instance and we made a migration of four tables with more than 6 million records. The migration of structure and data was done in less than a minute. Check and optimize the following factors in order to have a faster migration:

  • Resource availability on the source
  • The available network throughput
  • The resource capacity of the replication server
  • The ability of the target to ingest changes
  • The type and distribution of source data
  • The number of objects to be migrated

Migration’s performance can be limited to one or more bottlenecks along the way, so we can do a few things to increase performance:

  • Load multiple tables in parallel. By default, AWS DMS loads eight tables at a time, but this number can be increased and we can produce some performance improvements when using a very large replication server, such as a dms.c4.xlarge or a larger instance. However, at some point increasing this parallelism reduces performance. If the replication server is relatively small, such as a dms.t2.medium, in order to achieve better performance this number should be reduced.
  • Remove bottlenecks on the target, meaning remove all processes that might compete with each other for writing into resources on the target database like disabling unnecessary triggers, validation, and secondary indexes. When migrating to an Amazon RDS database, it is a good idea to disable backups and Multi-AZ on the target until you are ready to cutover. Similarly, when migrating to non-Amazon RDS systems, disabling any logging on the target until cutover is usually a good idea.
  • Use multiple tasks. Using multiple tasks for a single migration can improve performance particularly if migration has sets of tables that do not participate in common transactions. Transactional consistency is maintained within a task, so it is important that tables in separate tasks do not participate in common transactions.

AWS DMS processes change in a transactional mode, which preserves transactional integrity. Each task independently reads the transaction stream, so be careful not to put too much stress on the source system.

Conclusion

As companies move database resources to AWS they often change their primary database engine. Most of the current methods for data migration to the cloud or switching engines require an extended outage. The AWS DMS provides fast database migration workloads to AWS or database engine change while minimizing any associated downtime.