Data migration service

    June, 2017

    Data migration is a time consuming process and it requires effort to set up, transform, replicate and keep up-to-date data that is migrating from one database to another. When replication is heterogeneous meaning different source and target database types are present, this process has an even longer execution time.

    Introduction of AWS Data Migration Service (DMS) reduces migration time and migrates databases to and from AWS easily and securely. The service supports homogeneous migrations such as MS SQL to MSSQL, heterogeneous migrations among different database platforms, such as Oracle to PostgreSQL or MySQL to Oracle or migration to different versions (upgrade or downgrade) from the same databases with virtually no downtime. Migration can be done to and from most widely used commercial and open-source databases, such as Oracle, MySQL, and PostgreSQL.

    Although there are menu constraints and unsupported migration options, AWS DMS is a powerful tool that will be even more sophisticated in future, knowing the AWS upgrading trend.

    Migration

     

    AWS Database Migration Service helps you migrate databases to AWS easily and securely. The source database remains operational during the migration, minimizing downtime to applications that rely on the database. You can use the AWS Database Migration Service to migrate your data to and from the most widely used commercial and open-source databases.

    Current supported sources are available here: (http://docs.aws.amazon.com/dms/latest/userguide/CHAP_Introduction.Sources.html ).

    Current supported targets are available here: (http://docs.aws.amazon.com/dms/latest/userguide/CHAP_Introduction.Targets.html ).

    If you are using homogeneous migration, where destination schema is the same as schema on source database, you can use AWS DMS console to set up and start your migration.

    If your destination schema is different from the source schema you can use AWS Schema migration tool to describe in details how the mappings from source to destination needs to be done. We will dive into details of setting migration and go through current AWS DMS options.

     

    Migrate from on-premise MS SQL to AWS RDS PostgreSQL database

     

    The main players in one AWS DMS migration process are replication instance, endpoints and migration task.

    Before you start the migration make sure your source and target databases are set up and ready to use. Note: one of the endpoints – source or target – must be AWS cloud database.

    1. Configure replication instance on AWS. This instance reads data from the source database and loads it into the destination database, and can be used both as one-time migration or as an on-going replication. Replication instance is a type of EC2 instance and it needs to be run on the same VPC as endpoint that is under VPC. In this case, target database.

    Configuration steps:

    • Navigate to AWS DMS
    • On the menu to the left select Replication instances
    • Choose Create replication instance
    • Enter Name and Description for the instance
    • Choose Replication instance type
    • Choose VPC under your target database is configured
    • Choose Multi-AZ (Yes or No)
    • Check if you like to be Publicly available
    • Allocated storage (by default is 50 GB)
    • Replication Subnet Group. DMS created one Replication Subnet Group by default that adds all Subnets from selected VPC. You can configure this Subnet Group at any moment.
    • Availability zone (if it is Multi-AZ, you should not specify any)
    • VPC security groups (use the default one if you do not have created one specially for AWS DMS)
    • Choose KMS master key (one that you have created or from available ones)
    • Account and Key ARN are populated by AWS

    Create replication instance. It took around 3-5 minutes to create an operational replication instance. Here is an example of an already created one:

       2.Configure endpoints. Endpoints are connections to source and target databases.

    • Navigate to AWS DMS
    • On menu to the left select Endpoints
    • Select Create endpoint
    • In a new window choose Source/Target
    • Enter endpoint name in Endpoint identifier
    • Choose Source engine. In our case MSSQL/Posgres
    • Enter server name
    • Enter Port (by default is 1433)
    • For SSL mode choose none, if it is not defined differently
    • Enter server username and password
    • In Advanced section leave everything by default
    • In Test endpoint connection, choose VPC and replication instance that you have created in previous steps and hit Run test. If you have success, choose Create endpoint.

    3.Create Task

    • Navigate to AWS DMS
    • On the menu to the left select Tasks
    • Select Create task
    • Enter Task name
    • Choose the replication instance that you’ve created in step 1
    • Choose the source endpoint that you’ve created in step 2
    • Choose the target endpoint that you’ve created in step 2
    • Choose Migration type. In our initial load migration, we choose Migrate existing data. DMS supports ongoing replication to keep the target in sync with the source. On-going replication can’t be done using Aws Schema conversion tool.
    • If you like to start the task immediately after creation, leave the option Start task on create checked
    • In Task Settings section choose Do nothing for tables in target and Do not include LOB columns because in our case we do not have any
    • Jump to Table mappings section. Here you enter schemas and tables that need to be included in migration. This section will look like:

    • Transformation rules are used when we need to take only subset of data from source.
    • Choose Create task.
    • Migration will start automatically. First, the structure on target will be created and then the data will be migrated.

    4. Re-run the migration

    • Choose the task
    • Modify if needed
    • Start the task again

    Different schema and table definitions in target database

     

    If you want to migrate data from one database to another one where table definitions are different, you will need to use AWS Schema migration tool.

    You can download it and set it up locally from AWS link Download, under AWS Schema conversion tool section, on the right menu from AWS DMS welcome page.

    Note: In order to make data migration using AWS Schema conversion tool, one of the endpoints must be on AWS and accessible from local machine (outside AWS). SCD is used only for initial migration and does not support on-going replication.

     

    Improving the Performance of an AWS Database Migration Service Migration

     

    A number of factors affect the performance of your AWS DMS migration. In our case, we were using RDS db.t2.small PostgreSQL database instance and we made migration of four tables with more than 6 million records. Migration of structure and data was done in less than a minute. Check and optimize the following factors in order to have a faster migration:

    • Resource availability on the source
    • The available network throughput
    • The resource capacity of the replication server
    • The ability of the target to ingest changes
    • The type and distribution of source data
    • The number of objects to be migrated

    Migration’s performance can be limited to one or more bottlenecks along the way, so we can do a few things to increase performance:

    • Load multiple tables in parallel. By default, AWS DMS loads eight tables at a time, but this number can be increased and we can produce some performance improvements when using a very large replication server, such as a dms.c4.xlarge or a larger instance. However, at some point increasing this parallelism reduces performance. If replication server is relatively small, such as a dms.t2.medium, in order to achieve better performance this number should be reduced.
    • Remove bottlenecks on the target, meaning remove all processes that might compete with each other for write into resources on target database like: disable unnecessary triggers, validation, and secondary indexes. When migrating to an Amazon RDS database, it is a good idea to disable backups and Multi-AZ on the target until you are ready to cutover. Similarly, when migrating to nonAmazon RDS systems, disabling any logging on the target until cut over is usually a good idea.
    • Use multiple tasks. Using multiple tasks for a single migration can improve performance particularly if migration has sets of tables that do not participate in common transactions. Transactional consistency is maintained within a task, so it is important that tables in separate tasks do not participate in common transactions.

    AWS DMS processes change in a transactional mode, which preserves transactional integrity. Each task independently reads the transaction stream, so be careful not to put too much stress on the source system.

    Conclusion

     

    As companies move database resources to AWS they often change their primary database engine. Most of the current methods for data migration to cloud or switching engines require an extended outage. The AWS DMS provides fast database migration workloads to AWS or database’s engine change while minimizing any associated downtime.

     

    Katerina Tashoska,

    Cloud Practice Lead