I recently worked on a project to deploy several VMs in Azure. One of the requirements was to block all internet access from the Azure VMs. This is a prudent step in securing an environment because it keeps the VMs from pulling in malicious code from web-based threats.
Update 1/2018 – Microsoft has implemented NSG Service Tags for storage and Azure SQL. Information on that is located here. Additional information and the opportunity to vote on adding other services can be found here.
To accommodate this, a Network Security Group (NSG) was created and applied to the VM subnet. Several rules were applied, including one similar to the picture below. The rule simply blocked outbound traffic from the VirtualNetwork tag to the Internet tag on any source or destination port.
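To make the effect of that rule concrete, here is a minimal sketch of how an NSG picks a rule for outbound traffic: rules are checked in priority order (lowest number first) and the first match wins. The VNet address space, the rule name Deny-Internet-Out, its priority of 200, and the simplified tag matching are all assumptions for illustration. The three rules at 65000 and above mirror the NSG default outbound rules. The real Internet and VirtualNetwork tags cover more than this model does.

```python
import ipaddress
from dataclasses import dataclass

# Hypothetical VNet address space; not taken from the original environment.
VNET = ipaddress.ip_network("10.0.0.0/16")


def in_tag(tag: str, address: str) -> bool:
    """Simplified tag matching: 'Internet' is treated as anything outside the VNet."""
    ip = ipaddress.ip_address(address)
    if tag == "*":
        return True
    if tag == "VirtualNetwork":
        return ip in VNET
    if tag == "Internet":
        return ip not in VNET
    # Anything else is treated as a CIDR prefix.
    return ip in ipaddress.ip_network(tag)


@dataclass
class Rule:
    priority: int
    name: str
    source: str
    destination: str
    access: str  # "Allow" or "Deny"


def evaluate_outbound(rules, source, destination):
    """Return the first rule, in ascending priority order, that matches the flow."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if in_tag(rule.source, source) and in_tag(rule.destination, destination):
            return rule
    return None


RULES = [
    # Custom rule similar to the one described above (name and priority assumed).
    Rule(200, "Deny-Internet-Out", "VirtualNetwork", "Internet", "Deny"),
    # Default outbound rules present in every NSG.
    Rule(65000, "AllowVnetOutBound", "VirtualNetwork", "VirtualNetwork", "Allow"),
    Rule(65001, "AllowInternetOutBound", "*", "Internet", "Allow"),
    Rule(65500, "DenyAllOutBound", "*", "*", "Deny"),
]
```

In this model, traffic from 10.0.1.4 to another VNet address falls through to AllowVnetOutBound, while traffic to any address outside the VNet hits the custom deny rule before the default AllowInternetOutBound rule is ever considered.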
After the rule was put in place and tested, I began to set up the rest of the environment. Right away I ran into trouble: the VMs took up to 30 minutes to deploy and errored out with the message “New-AzureRmVm : Long Running Operation Failed with status ‘Failed’.”
I also noticed that BGInfo did not deploy on the VMs. I attempted to redeploy the VM, but that failed as well. With Redeploy not functioning, troubleshooting and recovery options were limited, and resolving the issue became a priority.
I found a clue in Azure’s Virtual Network documentation. Under “Special Rules” it says not to block any traffic from 168.63.129.16. This is a public IP Microsoft uses on all the Azure hosts for DHCP, DNS and health service monitoring. A public IP is used so it will not overlap with any customer’s non-routable or public IPs. Thinking that might be the cause of the issue, I created inbound and outbound rules allowing the IP. But no luck; the issue persisted.
We had created several rules blocking inbound and outbound traffic from different subnets. In an effort to identify the source of the issue, I disabled each one in turn, deploying a VM at each step. After several hours (remember, each VM took 30 minutes to deploy) I found that disabling the rule that blocks internet access fixed the issue. The VM deployed much quicker, the BGInfo extension worked properly, and Redeploy worked as expected.
The next step was to identify why. The only documentation I found was related to allowing the 168.63.129.16 address on the subnet. I started by looking at what the VM was trying to connect to. I ran Netstat on a VM with the internet blocked and saw this:
Notice the two connection attempts with the state of SYN_SENT to blob:http. This indicates the server has tried to connect and is waiting for a response. Looking at the event logs gave more clues. The Plugins log under Microsoft > WindowsAzure > Status indicated an error connecting to blob.core.windows.net when trying to install BGInfo. I suspect that the inability to connect to the BGInfo blob was causing a timeout, delaying the deployment and causing the error message.
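A connection stuck in SYN_SENT can also be confirmed from a script. The following is a rough sketch, not something from the original troubleshooting: it attempts a TCP connection with a timeout and reports whether the handshake completed. The function name can_connect is made up for this example, and any host name passed to it (such as a storage account under blob.core.windows.net) would be a placeholder you supply yourself.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP handshake to host:port completes within the timeout.

    A blocked outbound rule typically shows up as a timeout, because the
    SYN never receives a reply. DNS failures and refused connections also
    return False, since they are all subclasses of OSError.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling can_connect with a blob storage host name and port 80 from a VM behind the deny rule should return False after the timeout elapses, assuming the NSG silently drops the traffic, which matches the SYN_SENT state seen in Netstat.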
Reviewing other logs under WindowsAzure further supported the idea that the VM needs to access Microsoft datacenter services, such as blob storage, to function properly. Access is not limited to blob storage. The VM will also need to reach internet-based endpoints to use any PaaS offerings such as SQL.
With that, I’m left with only the option of allowing Internet access. It’s unfortunate that Microsoft does not provide a tag in the NSG for “Azure Resources” so they could be explicitly allowed while Internet access stays disabled. An alternative would be to add rules for each Azure IP range. Below is a link to Azure’s datacenter public IPs. It’s a large list and, according to the description, it updates every Wednesday.
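If you did go the route of allowing each Azure range, the rules would almost certainly need to be generated rather than typed. Here is a minimal sketch of that idea: it merges adjacent CIDR ranges and assigns each one an allow rule with its own priority. The function name, the dictionary keys and the rule naming scheme are hypothetical and do not correspond to any Azure SDK. The only Azure-specific detail relied on is that custom NSG rule priorities run from 100 to 4096.

```python
import ipaddress

# Custom NSG rules may use priorities 100 through 4096.
MIN_PRIORITY = 100
MAX_PRIORITY = 4096


def build_allow_rules(cidrs, start_priority=MIN_PRIORITY):
    """Build one outbound allow-rule definition per merged IPv4 range.

    The returned dictionaries use made-up keys for illustration only.
    """
    networks = [ipaddress.ip_network(cidr) for cidr in cidrs]
    # collapse_addresses merges adjacent or overlapping ranges and sorts them,
    # which keeps the rule count (and the priorities consumed) down.
    merged = ipaddress.collapse_addresses(networks)

    rules = []
    for offset, network in enumerate(merged):
        priority = start_priority + offset
        if not MIN_PRIORITY <= priority <= MAX_PRIORITY:
            raise ValueError("ran out of NSG priorities at " + str(network))
        rules.append({
            "name": "Allow-Azure-Out-" + str(priority),
            "priority": priority,
            "direction": "Outbound",
            "access": "Allow",
            "source": "VirtualNetwork",
            "destination": str(network),
            "destination_port": "*",
        })
    return rules
```

The deny-internet rule would need a priority number higher than every generated allow rule so the allows are evaluated first. Given the size of the published list and its weekly updates, the priority range and the per-NSG rule limit are likely to become a problem quickly, which is part of why this alternative is unattractive.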
Microsoft Datacenter IPs: https://www.microsoft.com/en-us/download/details.aspx?id=41653