Monday, 22 February 2021

HCX Mobility Optimized Networking Policy Routes

With the R145 release of HCX on 30th October 2020, VMware Cloud on AWS customers were treated to some great new functionality at no additional cost, including Replication Assisted vMotion, Application Path Resiliency, TCP Flow Conditioning, Mobility Groups and my personal favourite, Mobility Optimized Networking. I'm not going to go into too much detail on Mobility Optimized Networking (MON) since Patrick Kremer (Blog | Twitter) has already covered it extensively.

On an internal Slack channel, a question was asked about how internet-bound traffic from a migrated VM is routed once MON is enabled.


When we migrate a VM that resides on a stretched layer 2 network from on-premises into VMC without enabling MON, any traffic that needs to leave that network, whether destined for VMs on-premises, VMs within VMC or the internet, has to go via on-premises. In my lab I have two VMs currently on-premises in the same VLAN:

MigrateVM-12 - 172.30.41.12/24
MigrateVM-13 - 172.30.41.13/24

If I SSH into 172.30.41.12 and ping 172.30.41.13, my latency is <1ms:



If I ping a workload currently running in VMC (172.30.119.129), my latency is ~100ms because the traffic has to route from Chicago to Frankfurt over a route-based VPN:



If I ping an IP address on the internet such as 8.8.8.8, my latency is ~2ms:



I'm now going to migrate 172.30.41.12 from on-premises into VMC:


With MON disabled, when I ping 172.30.41.13 my latency is now ~100ms because traffic has to traverse the L2 extension, which is as expected:



If I ping 172.30.119.129, which is also running in VMC, my latency is ~200ms: to leave the network, traffic has to go via the on-premises gateway and then come back into VMC. Once again, with MON disabled, this is as expected:



Now, for completeness, if I ping 8.8.8.8 my latency is ~100ms because, once again, traffic has to traverse the L2 extension in order to egress the network, which is as expected:
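
The MON-disabled latencies can be roughly reproduced by counting how many times each ping crosses the Chicago to Frankfurt link. This is a back-of-envelope sketch, not a measurement tool: it assumes the ~100ms round trip splits evenly into two ~50ms one-way legs, that latency inside each site is negligible, and that replies return along the reverse of the request path. `estimate_rtt_ms` is a hypothetical helper.

```python
# Approximate figures from the pings in this post, in milliseconds.
WAN_RTT_MS = 100                  # Chicago <-> Frankfurt round trip (L2 extension or VPN)
ONE_WAY_WAN_MS = WAN_RTT_MS / 2   # assumption: WAN latency is symmetric
ON_PREM_INTERNET_MS = 2           # 172.30.41.12 -> 8.8.8.8 before migration


def estimate_rtt_ms(request_crossings: int, reply_crossings: int, extra_ms: float = 0) -> float:
    """Estimate a ping RTT from the number of one-way trips across the WAN.

    Assumption: latency inside each site is negligible compared with the WAN.
    """
    return (request_crossings + reply_crossings) * ONE_WAY_WAN_MS + extra_ms


# VM in VMC -> 172.30.41.13: one crossing over the L2 extension each way.
print(estimate_rtt_ms(1, 1))                       # 100.0
# VM in VMC -> 172.30.119.129: L2 extension to the on-premises gateway,
# then the VPN back into VMC; the reply is assumed to take the reverse path.
print(estimate_rtt_ms(2, 2))                       # 200.0
# VM in VMC -> 8.8.8.8: L2 extension each way plus on-premises internet latency.
print(estimate_rtt_ms(1, 1, ON_PREM_INTERNET_MS))  # 102.0
```

The hairpin case is the expensive one: four one-way WAN crossings instead of two, which is why the ping to a VM sitting in the same SDDC doubles to ~200ms.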



I'm now going to enable MON on MigrateVM-12 (172.30.41.12) and test the same scenarios as above:


If I ping 172.30.41.13 my latency is still ~100ms as expected since I have to go back across the L2 extension to on-premises:



Now when I ping 172.30.119.129, which is running in VMC, my latency is <1ms. This is where the benefit of MON is realised: I now have optimised routing to workloads running in VMC on either native network segments (i.e. segments created directly in VMC) or additional stretched networks that have been presented into VMC:
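
The MON-enabled results so far can be summarised as a first-hop decision for MigrateVM-12. This is a simplified model of the behaviour observed in these tests, not HCX's implementation: the /24 masks are assumptions (the post only gives host addresses), `mon_first_hop` is a hypothetical helper, and anything outside the SDDC is left to the policy routes discussed below.

```python
import ipaddress

# Hypothetical prefixes: the post only gives host addresses, so the /24 masks are assumptions.
STRETCHED_SEGMENT = ipaddress.ip_network("172.30.41.0/24")  # extended from on-premises, MON on for MigrateVM-12
SDDC_NETWORKS = [
    ipaddress.ip_network("172.30.119.0/24"),  # native VMC segment hosting 172.30.119.129
]


def mon_first_hop(dst_ip: str) -> str:
    """Where a MON-enabled VM in VMC sends a packet first (simplified model)."""
    dst = ipaddress.ip_address(dst_ip)
    if dst in STRETCHED_SEGMENT:
        # Same broadcast domain: the frame is switched, so a peer still
        # on-premises is reached across the L2 extension (~100ms in this lab).
        return "layer 2 (same segment)"
    if any(dst in net for net in SDDC_NETWORKS):
        # Routed locally by the SDDC instead of hairpinning via on-premises.
        return "SDDC gateway (local)"
    # Anything else is decided by the HCX policy routes.
    return "policy routes"


for ip in ("172.30.41.13", "172.30.119.129", "8.8.8.8"):
    print(f"{ip:>15} -> {mon_first_hop(ip)}")
```

Note that MON does not change the first case: 172.30.41.13 is on the same segment, so there is no routing decision to optimise.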



Back to the original question around internet traffic: the traffic now egresses directly out of the SDDC rather than going via on-premises. We can tell because the latency is ~3ms, whereas egressing via on-premises would take over 100ms:



The reason internet traffic egresses via the SDDC, even if the default route of 0.0.0.0/0 is being advertised into the SDDC, is to avoid asymmetric routing. For stretched networks, if internet traffic must always egress via on-premises (for example, to ensure the on-premises security posture is maintained whilst extending into the cloud), we need to configure HCX policy routes. By default, all RFC 1918 addresses are automatically configured to route via the source gateway rather than the cloud gateway. To route internet traffic via the source gateway, we need to add the default route of 0.0.0.0/0 to the policy routes. Within the HCX Network Extension Advanced menu, select Policy Routes:


We see that, by default, the source gateway is used for routing to the 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16 networks. We simply need to add 0.0.0.0/0 to the policy and ensure the Redirect to Peer option is set to Allow:
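
The policy route table can be sketched as a prefix lookup for destinations that are not inside the SDDC (as the 172.30.119.129 test shows, SDDC destinations are routed locally even though they fall within 172.16.0.0/12). This is an illustrative model only: the `PolicyRoute` class and `egress_for` helper are hypothetical, and the rule that the most specific matching entry wins is an assumption rather than documented HCX behaviour.

```python
import ipaddress
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class PolicyRoute:
    network: ipaddress.IPv4Network
    redirect_to_peer: bool  # True = "Allow" (use source gateway), False = "Deny"


# Default entries described in this post: RFC 1918 ranges go to the source gateway.
DEFAULT_POLICY = [
    PolicyRoute(ipaddress.ip_network("10.0.0.0/8"), True),
    PolicyRoute(ipaddress.ip_network("172.16.0.0/12"), True),
    PolicyRoute(ipaddress.ip_network("192.168.0.0/16"), True),
]


def egress_for(dst_ip: str, policy: Sequence[PolicyRoute]) -> str:
    """Pick the egress gateway for a destination that is not inside the SDDC.

    Assumption: the most specific (longest-prefix) matching entry wins.
    """
    dst = ipaddress.ip_address(dst_ip)
    matches = [route for route in policy if dst in route.network]
    if not matches:
        return "cloud gateway"
    best = max(matches, key=lambda route: route.network.prefixlen)
    return "source gateway" if best.redirect_to_peer else "cloud gateway"


# Adding 0.0.0.0/0 with Redirect to Peer = Allow catches all internet destinations.
with_default_route = DEFAULT_POLICY + [PolicyRoute(ipaddress.ip_network("0.0.0.0/0"), True)]

print(egress_for("8.8.8.8", DEFAULT_POLICY))       # cloud gateway
print(egress_for("8.8.8.8", with_default_route))   # source gateway
print(egress_for("192.168.10.5", DEFAULT_POLICY))  # source gateway
```

With only the default RFC 1918 entries, 8.8.8.8 matches nothing and leaves via the cloud gateway, which is the ~3ms result above; the 0.0.0.0/0 entry is what pulls it back to on-premises.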


Once we add that route and click Submit, the latency from 172.30.41.12 to the internet is ~102ms, since traffic has to traverse the L2 extension and then egress via on-premises:



This ensures that on-premises security policies are applied to VMs running in VMC that need to access the internet, while those VMs still benefit from optimised routing when communicating with other workloads running in the SDDC.
