Wednesday, April 3, 2013

CloudNaaS: A Cloud Networking Platform for Enterprise Applications

Here is the paper.
The paper is not well written: it is all over the place and is missing many details. The introduction moves in circles, repeating the same points, and leaves you confused about what problem the paper is trying to solve. It is only on page 4, when the paper gets to the implementation, that you understand what it is actually about.

I have rewritten the premise of the paper in my words:

A network admin has an enterprise network. He wants to migrate it to the cloud, but he is reluctant because clouds don't support middleboxes (MBs) and private IPs (customers get little or no control to configure the network). The cloud provider needs to reduce the need to rewrite applications when moving them to the cloud, so customers should be allowed to keep the private IP addresses they were using in their enterprise networks. The paper says another key issue is to allow broadcasts; I don't think this is an issue, as broadcasts are generally bad and should be avoided unless absolutely necessary. There are ways of getting around ARP and DHCP broadcasts by turning them into unicasts to the network controller/directory service, and other broadcasts can be replaced with IP multicasts.
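
To make the broadcast point concrete, here is a minimal sketch (mine, not the paper's) of a hypervisor's software switch answering ARP locally from a directory service instead of flooding. The DIRECTORY contents and the handle_arp_request hook are hypothetical; in a real deployment the directory would be populated by the cloud controller or directory service.

```python
# Sketch: answer ARP at the edge instead of broadcasting it into the fabric.
# DIRECTORY and handle_arp_request are hypothetical; a real deployment would
# populate the directory from the cloud controller / directory service.

# Hypothetical directory: (tenant, private IP) -> (MAC, hosting server)
DIRECTORY = {
    ("tenant-42", "10.0.0.5"): ("52:54:00:aa:bb:01", "server-17"),
    ("tenant-42", "10.0.0.9"): ("52:54:00:aa:bb:02", "server-03"),
}

def handle_arp_request(tenant, target_ip):
    """Called by the hypervisor's software switch when a local VM broadcasts
    'who-has target_ip'. Returns the MAC for a unicast ARP reply generated
    locally, or None to drop the request; nothing is ever flooded."""
    entry = DIRECTORY.get((tenant, target_ip))
    if entry is None:
        return None                  # unknown host: drop instead of flooding
    mac, _server = entry
    return mac

# Example: a VM of tenant-42 ARPs for 10.0.0.9
print(handle_arp_request("tenant-42", "10.0.0.9"))   # 52:54:00:aa:bb:02
```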

Now CloudNaaS comes into the picture: it asks the admin for a network policy and tries to satisfy this logical topology using the cloud's physical resources.


CloudNaaS is an "under the cloud" implementation, i.e. it is implemented by the cloud provider.
It allows the customer to specify network requirements (nothing new); a rough example of such a policy follows below:
1) Tenants specify bandwidth requirements for applications hosted in the cloud.
2) Tenants specify MB traversal.
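
For illustration, here is roughly what such a tenant policy could look like written down in code. The class and field names are mine, not the paper's actual policy-language syntax.

```python
# Hypothetical rendering of a tenant network policy of the kind CloudNaaS
# takes as input: VM groups, a middlebox traversal requirement, and a
# bandwidth reservation. Names are illustrative only.
from dataclasses import dataclass, field

@dataclass
class NetworkService:
    src_group: str                                   # e.g. "frontend" VMs
    dst_group: str
    middleboxes: list = field(default_factory=list)  # MB traversal order
    min_bandwidth_mbps: int = 0                      # 0 = best effort

policy = [
    NetworkService("frontend", "backend",
                   middleboxes=["firewall", "ids"],
                   min_bandwidth_mbps=100),
    NetworkService("backend", "database",
                   min_bandwidth_mbps=500),
]

for svc in policy:
    path = " -> ".join([svc.src_group, *svc.middleboxes, svc.dst_group])
    print(f"{path}  (>= {svc.min_bandwidth_mbps} Mbps)")
```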

Components:

Cloud controller: takes the user's network policy and translates it into a communication matrix.
Network controller: monitors and configures the network devices, and decides the optimal placement of VMs in the cloud to satisfy the network policies, monitoring and adjusting as needed to keep meeting them. (A rough sketch of this pipeline follows below.)
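
Here is how I read this split, as a toy sketch: the cloud controller turns the policy into a communication matrix, and the network controller uses that matrix to place VMs and push rules to the software switches. The data structures and the greedy placement below are my own simplification, not the paper's algorithm.

```python
# Toy sketch of the controller pipeline: communication matrix -> placement.
# All structures and the greedy heuristic are my simplification.

# Communication matrix: (src_group, dst_group) -> required bandwidth (Mbps)
comm_matrix = {
    ("frontend", "backend"): 100,
    ("backend", "database"): 500,
}

# Hypothetical host capacities (free VM slots per server)
hosts = {"server-01": 2, "server-02": 2}

def place_vms(comm_matrix, hosts):
    """Toy placement: co-locate the most talkative group pairs on the same
    host when slots allow, to keep heavy traffic off the core network."""
    placement = {}
    pairs = sorted(comm_matrix.items(), key=lambda kv: -kv[1])
    for (src, dst), _bw in pairs:
        for host, free in hosts.items():
            needed = int(src not in placement) + int(dst not in placement)
            if needed <= free:
                for group in (src, dst):
                    if group not in placement:
                        placement[group] = host
                        hosts[host] -= 1
                break
    return placement

print(place_vms(comm_matrix, hosts))
# {'backend': 'server-01', 'database': 'server-01', 'frontend': 'server-02'}
```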

Drawbacks
1) Multiple VMs can have the same private IP address, so how do you distinguish between them? Each VM has both a public and a private IP address, and the software switch inside the hypervisor rewrites the IP addresses of incoming and outgoing packets (how does this interact with tunneling?). On migration these rules are updated, so communication still happens over the public IP address. To allow MB traversal, CloudNaaS installs rules in the software switches of the source VM and of each subsequent MB, which simply tunnel the packet along the policy path (see the sketch after this list). This works, but now you need policy rules at each hop, which means changing policies is inflexible, you use more switch memory, and installing certain policies becomes infeasible. You still have to identify the previous hop somehow, but the paper does not talk about it.
2) This assumes the MB is a layer 3 device. How do I handle transparent firewalls/NAT? These devices will now have to be installed at choke points (the paper says they provide NAT service at the cloud gateway) within the data center network, which has the disadvantages explained in the PLayer paper.
3) The software switch attached to the MB has to know the private IP <-> public IP translation for all the hosts of the tenant. Migration is still not seamless. How does an LB work? It can't use DSR/DNAT/tunneling.
4) Bandwidth reservation schemes are not very useful in a cloud with a high churn rate. They are either overly conservative, leading to low network utilization, or overly lenient, resulting in poor isolation.
5) The communication matrix business is not super clear, nor is how VM placement is decided using it.
6) The paper does not talk about how the core network topology is set up; it implicitly assumes it somehow works in a scalable manner. The policies are pushed to the edge switches in the hypervisors. The paper suggests, as an optimization, breaking up the address space and allocating each subnet to the servers attached to a ToR. This is exactly what a traditional hierarchical data center looks like (and it doesn't allow seamless VM migration). Also, migration changes the public IP address of a host, and this requires major rewriting of rules on all switches.
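
To illustrate the per-hop state that drawbacks 1 and 6 complain about, here is a small sketch of the kind of rewrite/tunnel rules a policy path would need at the source hypervisor and at each MB's hypervisor. The rule format, the nat_table, and all addresses are made up for illustration.

```python
# Sketch of per-hop rewrite/tunnel rules along a policy path.
# The rule format, nat_table, and addresses are made up.

# Private -> public mapping maintained per tenant (changes on migration)
nat_table = {
    "10.0.0.5": "172.16.8.21",   # source VM
    "10.0.0.9": "172.16.9.34",   # destination VM
}

# Public address of the firewall middlebox the policy inserts on the path
firewall_mb = "172.16.7.2"

def build_rules(src_priv, dst_priv, nat_table, mb_chain):
    """Emit one rewrite/forward rule per hop on the policy path:
    source hypervisor -> each MB's hypervisor -> destination.
    Every hop carries state, so changing the policy or migrating a VM
    (which changes its public IP) means touching all of these rules."""
    hops = mb_chain + [nat_table[dst_priv]]
    rules = []
    for i, next_hop in enumerate(hops):
        rules.append({
            "switch": "src-hypervisor" if i == 0 else f"mb{i}-hypervisor",
            "match":  {"src_pub": nat_table[src_priv], "dst_priv": dst_priv},
            "action": {"rewrite_dst": nat_table[dst_priv],
                       "tunnel_to":   next_hop},
        })
    return rules

for rule in build_rules("10.0.0.5", "10.0.0.9", nat_table, [firewall_mb]):
    print(rule)
```

The point of the sketch is that the private-to-public mapping appears in every rule, so a single VM migration (which changes its public IP) forces rule rewrites at every hop of every path that touches that VM.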

Things learnt from the paper
The hardware OpenFlow switches use expensive and power-hungry TCAMs, while the software switches use DRAM. As a result, a software switch can store many more rules. So, try to push more intelligence to the edge of the network while treating the core as a simple, dumb packet pusher.

