Friday 15 January 2021

Utilising AWS DataSync and AWS Cloud Native Storage to migrate file servers to support workloads running in VMware Cloud on AWS

Whilst working with customers on the technical aspects of migrating their applications from on-premises into VMware Cloud on AWS, the topic that comes up most often is file servers. If the customer currently has their file servers running as virtual machines on top of ESXi then it's very straightforward: they can use HCX to migrate the workload. Depending on the size of the file server, the customer may not want to do this because it could increase the host count of the cluster, and may instead prefer a native AWS storage service such as FSx, EFS or even S3. Another scenario I have come across is customers with physical file servers, or file shares hosted on storage arrays such as NetApp.

During my investigation, I came across AWS DataSync, which automates and accelerates moving data from on-premises into AWS storage services as well as between AWS storage services.

As of writing this article, the following services/protocols are supported:
  • Amazon EFS file system (Source and Target)
  • Amazon FSx for Windows File Server (Source and Target)
  • Amazon S3 (Source and Target)
  • Network File System (NFS) (Source Only)
  • Object storage (Source Only)
  • Server Message Block (SMB) (Source Only)
As an example, DataSync can be used to move data from an on-premises Windows file server into Amazon FSx for Windows, or from Amazon EFS into Amazon S3. As of writing, it cannot be used to go from AWS back to on-premises, for example from Amazon FSx for Windows back to an on-premises Windows file server. In this article, I'm going to move data from an on-premises Windows file server running Windows Server 2019 into an Amazon FSx for Windows environment. The Amazon FSx service is already deployed and integrated with my on-premises Active Directory servers.
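The support matrix above can be captured as a small lookup, which is handy for sanity-checking a planned transfer before building anything. This is only a sketch: the location labels (`SMB`, `FSX_WINDOWS` and so on) are my own shorthand rather than DataSync API values, and the table simply mirrors the list above as of writing.

```python
# Roles each location type could play, mirroring the list in this article.
# The keys are hypothetical shorthand labels, not DataSync API identifiers.
SUPPORTED_ROLES = {
    "EFS": {"source", "target"},
    "FSX_WINDOWS": {"source", "target"},
    "S3": {"source", "target"},
    "NFS": {"source"},
    "OBJECT_STORAGE": {"source"},
    "SMB": {"source"},
}


def transfer_supported(source: str, target: str) -> bool:
    """Return True if the source/target pair is allowed by the table above."""
    return (
        "source" in SUPPORTED_ROLES.get(source, set())
        and "target" in SUPPORTED_ROLES.get(target, set())
    )
```

With this table, `transfer_supported("SMB", "FSX_WINDOWS")` is true (the scenario in this article), while `transfer_supported("FSX_WINDOWS", "SMB")` is false because SMB is listed as source only.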

My Amazon FSx service is already running and has been deployed into the connected VPC, in the same Availability Zone as the SDDC, to avoid any cross-AZ charges:

I've also created a new share within FSx using Windows Shared Folders (fsmgmt.msc) which matches the name of the share that I already have on-premises:

I now need to deploy the AWS DataSync agent on-premises and configure the replication task. Within the AWS console, change to the region where your SDDC is deployed and find the AWS DataSync service:

We want to create a transfer task between on-premises and AWS:

Select the agent VM hypervisor image, then download and deploy it into your infrastructure. In my example, I will be using VMware ESXi as my hypervisor of choice, but KVM, Hyper-V and EC2 are also supported. Once the appliance has been downloaded and deployed, follow the AWS instructions to configure the agent's network settings. With the agent deployed and configured, we need to decide which route the traffic should take when moving the data. We can either go via the internet and use an AWS public service endpoint, or use a VPC endpoint via AWS PrivateLink. Since I don't have a VPN or Direct Connect established to my VMC connected VPC, I will be using the public service endpoint.

When configuring the agent, I need to manually enter the agent's activation key, which can be obtained by logging into the agent VM console and specifying the AWS region and service endpoint type:

You can now enter the agent VM activation key as well as an optional name and any tags you might care to use:
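If you would rather script the registration than use the console, the same three inputs (activation key, optional name, optional tags) map onto the DataSync `CreateAgent` API. The following is a minimal sketch that only builds the request; the helper name `build_agent_request` and the example values are my own and purely illustrative.

```python
from typing import Dict, Optional


def build_agent_request(
    activation_key: str,
    name: Optional[str] = None,
    tags: Optional[Dict[str, str]] = None,
) -> dict:
    """Build keyword arguments for the DataSync CreateAgent call."""
    request = {"ActivationKey": activation_key}
    if name:
        request["AgentName"] = name
    if tags:
        # The API expects tags as a list of Key/Value pairs.
        request["Tags"] = [{"Key": k, "Value": v} for k, v in tags.items()]
    return request
```

Assuming boto3 is installed and credentials are configured, the result could be passed as `boto3.client("datasync", region_name=...).create_agent(**request)`, which returns the new agent's ARN under `AgentArn`.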

The agent should then successfully communicate with the service and we should see the agent status as being online:

If you are having connectivity issues using either the public endpoint or the VPC endpoint, check your firewall rules against the DataSync network requirements.
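A quick first check is whether outbound HTTPS to the regional DataSync endpoint works from the network segment the agent sits on. This sketch assumes the public endpoint option and tests only the regional API hostname on TCP 443; the agent needs more endpoints than this one, so treat a success as necessary rather than sufficient. The function names are my own.

```python
import socket


def datasync_endpoint(region: str) -> str:
    """Regional DataSync API hostname, e.g. datasync.eu-west-2.amazonaws.com."""
    return f"datasync.{region}.amazonaws.com"


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False
```

Running `can_connect(datasync_endpoint("eu-west-2"))` from a machine on the same VLAN as the agent gives a rough indication of whether a firewall is in the way.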

Now that we have our agent deployed and online as well as our Amazon FSx service deployed with a share created and connected to our on-premises Active Directory for authentication, we can create the scheduled task to sync the data on a regular basis. Within the AWS DataSync service create a new task and specify the location type. Since we are copying data from a Windows file server the location type will be Server Message Block (SMB). We then select the agent that we deployed and supply the SMB server name/IP and share mount:

We then need to specify the user credentials that have access to read from that share:
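For anyone scripting this step, the source location corresponds to the DataSync `CreateLocationSmb` API. Below is a minimal sketch that builds the request from the same details entered in the console. The helper name and all example values are hypothetical, and in practice the password should come from a secrets store rather than source code.

```python
from typing import Optional


def build_smb_location(
    server: str,
    share: str,
    user: str,
    password: str,
    agent_arn: str,
    domain: Optional[str] = None,
) -> dict:
    """Build keyword arguments for the DataSync CreateLocationSmb call."""
    request = {
        "ServerHostname": server,
        # The API expects the share path with forward slashes, e.g. /Shared.
        "Subdirectory": "/" + share.strip("/\\"),
        "User": user,
        "Password": password,
        "AgentArns": [agent_arn],
    }
    if domain:
        request["Domain"] = domain
    return request
```

With boto3, the dictionary could be passed to `client.create_location_smb(**request)`, which returns the ARN of the new location under `LocationArn`.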

Once we have created the source location we need to create the target location. We are copying the data to Amazon FSx so select that as the location type and enter the FSx file system and share name:

Enter the user credentials that have access to the FSx server and click Next:
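The target location maps onto the `CreateLocationFsxWindows` API in the same way. One difference worth noting in this sketch is that the API also asks for the security groups that allow access to the file system, which is why the hypothetical helper below refuses an empty list. Names and values are again illustrative only.

```python
from typing import List, Optional


def build_fsx_location(
    filesystem_arn: str,
    share: str,
    user: str,
    password: str,
    security_group_arns: List[str],
    domain: Optional[str] = None,
) -> dict:
    """Build keyword arguments for the DataSync CreateLocationFsxWindows call."""
    if not security_group_arns:
        raise ValueError("at least one security group ARN is required")
    request = {
        "FsxFilesystemArn": filesystem_arn,
        "Subdirectory": "/" + share.strip("/\\"),
        "SecurityGroupArns": list(security_group_arns),
        "User": user,
        "Password": password,
    }
    if domain:
        request["Domain"] = domain
    return request
```

The result could be passed to `client.create_location_fsx_windows(**request)` on a boto3 DataSync client.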

Give the task a suitable name. You can then specify additional options such as verification of data, bandwidth limits and whether or not you wish to enable queueing. The only change I have made from the default options is to not keep deleted files, because I want my FSx file server to be identical to my on-premises file server:
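These console settings correspond to fields in the `Options` structure of the DataSync task API. As a rough sketch, the helper below (my own naming, with my own choice of defaults) turns the choices described above into that structure, with deleted files removed from the destination so that it mirrors the source.

```python
from typing import Optional


def build_task_options(
    keep_deleted_files: bool = False,
    bandwidth_limit_bytes: Optional[int] = None,
    queueing: bool = True,
    verify_mode: str = "POINT_IN_TIME_CONSISTENT",
) -> dict:
    """Build the Options structure for a DataSync task."""
    allowed = {"POINT_IN_TIME_CONSISTENT", "ONLY_FILES_TRANSFERRED", "NONE"}
    if verify_mode not in allowed:
        raise ValueError(f"verify_mode must be one of {sorted(allowed)}")
    return {
        "VerifyMode": verify_mode,
        # REMOVE deletes files at the destination that no longer exist at the source.
        "PreserveDeletedFiles": "PRESERVE" if keep_deleted_files else "REMOVE",
        # -1 means no bandwidth limit.
        "BytesPerSecond": -1 if bandwidth_limit_bytes is None else bandwidth_limit_bytes,
        "TaskQueueing": "ENABLED" if queueing else "DISABLED",
    }
```

Calling `build_task_options()` with no arguments gives the configuration used in this article: no bandwidth limit, queueing enabled and deleted files not kept.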

You can filter out specific files that you don't want to be replicated, such as temp files, using wildcard filter patterns. I've set my sync to occur every hour at five minutes past the hour. You can specify hourly, daily, weekly, days of the week or a custom cron expression:
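In API terms, exclusions are passed as a single pattern string with entries separated by `|`, and the schedule is an AWS-style six-field cron expression. The sketch below shows what "exclude temp files, run at five past every hour" might look like; the helper names and the example patterns are my own.

```python
from typing import List


def build_excludes(patterns: List[str]) -> list:
    """Build the Excludes list for a DataSync task from simple wildcard patterns."""
    return [{"FilterType": "SIMPLE_PATTERN", "Value": "|".join(patterns)}]


def hourly_schedule(minute: int) -> dict:
    """Build a Schedule that runs every hour at the given minute past the hour."""
    if not 0 <= minute <= 59:
        raise ValueError("minute must be between 0 and 59")
    # Fields: minutes hours day-of-month month day-of-week year.
    return {"ScheduleExpression": f"cron({minute} * * * ? *)"}
```

For the settings in this article, `hourly_schedule(5)` yields `cron(5 * * * ? *)`, and `build_excludes(["*.tmp", "*.temp"])` yields the pattern `*.tmp|*.temp`.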

You can then specify whether you want task logging enabled. This would allow you to set up alerts in the event of a task failing to ensure the issue is resolved straight away.
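When scripted, logging comes down to two settings on the task: a CloudWatch log group ARN and a log level. The sketch below assembles a `CreateTask` request with logging switched on only when a log group is supplied. The helper name is hypothetical, and the log group itself (plus any alarm built on top of it) is assumed to exist already.

```python
from typing import Optional


def build_task_request(
    name: str,
    source_location_arn: str,
    destination_location_arn: str,
    log_group_arn: Optional[str] = None,
    log_level: str = "BASIC",
) -> dict:
    """Build keyword arguments for the DataSync CreateTask call."""
    if log_level not in {"OFF", "BASIC", "TRANSFER"}:
        raise ValueError("log_level must be OFF, BASIC or TRANSFER")
    request = {
        "Name": name,
        "SourceLocationArn": source_location_arn,
        "DestinationLocationArn": destination_location_arn,
        # Without a log group there is nowhere to send logs, so force OFF.
        "Options": {"LogLevel": log_level if log_group_arn else "OFF"},
    }
    if log_group_arn:
        request["CloudWatchLogGroupArn"] = log_group_arn
    return request
```

The other options, filters and schedule discussed earlier would be merged into the same request before passing it to `client.create_task(**request)`.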

You get the chance to review your settings and, if you are happy, you can create the task. During the task creation process, your settings are pushed down to the agent VM on-premises and, once this is complete, the task should show a status of Available. Rather than wait for the configured time for the task to start, I will just start it manually:
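Starting a task manually can also be done through the API by calling `StartTaskExecution` and then polling `DescribeTaskExecution` until it reaches a final state. Here is a minimal sketch that takes an already-constructed DataSync client (for example from `boto3.client("datasync")`); the function name, polling interval and injectable `sleep` are my own choices.

```python
import time
from typing import Callable

# Final states reported by DescribeTaskExecution.
TERMINAL_STATES = {"SUCCESS", "ERROR"}


def run_task(
    client,
    task_arn: str,
    poll_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Start a task execution and block until it succeeds or fails."""
    execution_arn = client.start_task_execution(TaskArn=task_arn)["TaskExecutionArn"]
    while True:
        status = client.describe_task_execution(
            TaskExecutionArn=execution_arn
        )["Status"]
        if status in TERMINAL_STATES:
            return status
        # Still queued, launching, preparing, transferring or verifying.
        sleep(poll_seconds)
```

Passing the client in, rather than creating it inside the function, keeps the sketch easy to exercise against a stub before pointing it at a real account.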

Prior to the job running, we can see that my on-premises file server has some files but my Amazon FSx is currently empty:

Once the task finishes we can see that our Amazon FSx file server now has an exact copy of the on-premises file server including permissions and timestamps:

The task will continue to run every hour and copy new or changed files across.

If you are looking to use AWS DataSync and FSx (or any other supported service) as part of a migration strategy, then you could simply keep the task running until you agree downtime on the on-premises file server with the business. At that point, you would change the on-premises share to read-only, let a final sync complete, and use logon scripts/GPOs to map the drive to the new location running on FSx. If this was for a disaster recovery scenario, then you could just leave the task in place permanently and, if a 1-hour RPO (worst case) is acceptable, your data will be ready and waiting for you when you fail over into VMware Cloud on AWS.
