
Tuesday, April 2, 2013

A Policy-aware Switching Layer for Data Centers

A good paper with good ideas about the role and placement of middleboxes (MBs) in data center topology. The premise of the paper is simple: middleboxes cause a lot of agony in data centers. 78% of data center downtime is caused by misconfiguration, largely because humans are involved in implementing middlebox policies. The typical methods used to ensure correct MB traversal are:
1) Overloading path selection mechanisms (e.g., adjusting weights in the spanning tree protocol)
2) Placing MBs at choke points
3) Placing MBs at all possible choke points, or incorporating MB functionality into switches - costly.

Problems
1) Ad-hoc practices like overloading existing path selection mechanisms are hard to configure and maintain (think link/switch failure). Other practices fare no better: removing links to create choke points is complex and sacrifices fault tolerance; separate VLANs with MBs at the interconnection points hurt efficiency and break seamless VM migration.
2) Inefficiency - packets should not traverse MBs they are not supposed to; doing so wastes resources and can also cause incorrect behavior.
3) Inflexibility - new MB instances may need to be deployed to balance load (ideally automatically), and policies may change in the future. Today both require human intervention, which leads to errors.
4) If an MB fails, the network partitions (since MBs are placed at choke points).

What we need
1) Correctness - Traffic should traverse middleboxes in the sequence specified by the network administrator under all network conditions.
2) Flexibility - The sequences of middleboxes should be easily reconfigured as application requirements change.
3) Efficiency - Traffic should not traverse unnecessary middleboxes

Solution
1) Don't place MBs at choke points; instead, attach them to policy-aware switches (pswitches) and configure the switches to implement MB policies, i.e., take MBs off the physical network path. Data centers have low network latency, so sending packets to off-path MBs is not costly.
2) Separate the logical topology from the physical topology; in other words, separate policy from reachability.

Architecture
Consists of a policy controller, a middlebox controller, and pswitches.

The policy controller accepts policies from the network admin and converts them into pswitch rules (a sketch of this conversion follows the rule format below).

Policies are of the form:
[Start Location, Traffic selector (5-tuple)] --> sequence of MB types

Pswitch rules are of the form:
[Previous Hop, Traffic selector (5-tuple)] --> Next Hop
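
A minimal sketch of how the policy controller might unroll a policy into per-hop pswitch rules. The Policy/Rule types and compile_policy are illustrative names, not the paper's API:

```python
from dataclasses import dataclass
from typing import List, Tuple

# Traffic selector: the usual 5-tuple, with "*" as a wildcard (illustrative).
FiveTuple = Tuple[str, str, str, str, str]

@dataclass
class Policy:
    start_location: str      # ingress point named in the policy
    selector: FiveTuple
    mb_sequence: List[str]   # ordered MB types, e.g. ["firewall", "load-balancer"]

@dataclass
class Rule:
    prev_hop: str
    selector: FiveTuple
    next_hop: str

def compile_policy(policy: Policy) -> List[Rule]:
    """Unroll one policy into per-hop pswitch rules:
    [previous hop, selector] --> next hop, one rule per step."""
    hops = [policy.start_location] + policy.mb_sequence + ["final destination"]
    return [Rule(hops[i], policy.selector, hops[i + 1])
            for i in range(len(hops) - 1)]

# Example: web traffic entering at the core router must cross a firewall,
# then a load balancer, before reaching the destination server.
policy = Policy("core-router", ("*", "*", "tcp", "*", "80"),
                ["firewall", "load-balancer"])
for rule in compile_policy(policy):
    print(rule)
```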

The middlebox controller monitors the liveness of MBs and informs pswitches when MB instances are added or fail.
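
The paper does not spell out the monitoring mechanism; a minimal sketch of one plausible polling loop, where is_alive and notify are hypothetical stand-ins for the health probe and the pswitch update channel:

```python
import time

def monitor_liveness(middleboxes, pswitches, is_alive, interval=1.0):
    """Poll each MB instance; on a state change, notify every pswitch so it
    can add the instance to (or drop it from) its set of usable next hops."""
    status = {mb: True for mb in middleboxes}
    while True:
        for mb in middleboxes:
            alive = is_alive(mb)          # e.g. a heartbeat or health probe
            if alive != status[mb]:
                status[mb] = alive
                for sw in pswitches:
                    sw.notify(mb, "added" if alive else "failed")
        time.sleep(interval)
```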

Pswitches perform three operations:
1) Identify the previous hop traversed by the frame, based on its source MAC address (this can cause problems if the MB modifies the source MAC address) or its incoming interface.
2) Determine the next hop to be traversed by the frame
3) Forward the frame to its next hop using L2 encapsulation: a redirected frame is encapsulated in a new Ethernet frame identified by a new EtherType code. The dst MAC is set to the next MB or server, and the src MAC is that of the original frame or the last MB instance traversed. Some MBs require the original MAC addresses to be preserved for correctness.
Forwarding also allows balancing load across multiple instances of the same MB type: a rule can specify multiple next hops, and 5-tuple consistent hashing distributes traffic among them in a way that is resilient to MB failure (see the sketch below).
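
A minimal sketch of 5-tuple consistent hashing over MB instances, using a hash ring (one common construction; the paper does not specify one). The canonicalization step, which pins the forward and reverse directions of a connection to the same instance, is an illustrative addition relevant to drawback 6 below:

```python
import hashlib
from bisect import bisect

class ConsistentHashRing:
    """Map flows to MB instances; adding/removing an instance only remaps
    the flows that hashed near it, not every flow."""
    def __init__(self, instances, vnodes=100):
        self.ring = sorted(
            (self._h(f"{inst}#{v}"), inst)
            for inst in instances for v in range(vnodes))
        self.keys = [k for k, _ in self.ring]

    @staticmethod
    def _h(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def lookup(self, five_tuple):
        # Canonicalize so (A->B) and (B->A) hash identically, keeping both
        # directions of a connection on one instance (my addition, not the
        # paper's; it matters for stateful MBs).
        src, dst, proto, sport, dport = five_tuple
        if (src, sport) > (dst, dport):
            five_tuple = (dst, src, proto, dport, sport)
        i = bisect(self.keys, self._h(str(five_tuple))) % len(self.ring)
        return self.ring[i][1]

ring = ConsistentHashRing(["fw-1", "fw-2", "fw-3"])
fwd = ("10.0.0.5", "10.0.1.9", "tcp", 44123, 80)
rev = ("10.0.1.9", "10.0.0.5", "tcp", 80, 44123)
assert ring.lookup(fwd) == ring.lookup(rev)
```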
If the next hop is a transparent device (say, a firewall), we need some way to identify it by a MAC address and to set the dst MAC of packets going to it; likewise, the pswitch after the firewall needs to identify the firewall as the previous hop. The paper gets around this by assigning such a device a fake MAC address, registered with the MB controller when the device first comes up. If the device is attached to a non-pswitch, however, we need a SrcMacRewriter between the MB and the switch: a stateless device that inserts a special source MAC address uniquely identifying the MB. A sketch of both the encapsulation and rewriting steps follows.
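
A minimal sketch of the L2 encapsulation and source-MAC rewriting steps, on an illustrative Frame type; the EtherType value and field names are assumptions for the example, not from the paper:

```python
from dataclasses import dataclass, replace
from typing import Optional

PLAYER_ETHERTYPE = 0x88B5   # an experimental EtherType; illustrative value

@dataclass(frozen=True)
class Frame:
    dst_mac: str
    src_mac: str
    ethertype: int
    inner: Optional["Frame"] = None   # encapsulated original frame, if any

def encapsulate(original: Frame, next_hop_mac: str) -> Frame:
    # New outer header: dst = next MB/server, src preserved from the original
    # frame (or last MB instance), so MBs that inspect MACs still work.
    return Frame(dst_mac=next_hop_mac, src_mac=original.src_mac,
                 ethertype=PLAYER_ETHERTYPE, inner=original)

def decapsulate(frame: Frame) -> Frame:
    # At the last hop, strip the outer header and deliver the original frame.
    return frame.inner if frame.ethertype == PLAYER_ETHERTYPE else frame

def src_mac_rewrite(frame: Frame, fake_mac: str) -> Frame:
    # Stateless SrcMacRewriter: stamp the transparent MB's registered fake
    # MAC so the downstream pswitch can identify the previous hop.
    return replace(frame, src_mac=fake_mac)
```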

Drawbacks of the paper
1) The number of rules stored in pswitches does not scale very well. Each pswitch essentially stores rules for all the policies implemented in the data center; this gets worse in a public cloud with multiple tenants.
2) It leaves MBs unmodified but modifies switches, so it cannot be adopted in its current form.
3) The paper does not say much about how pswitches fit into a traditional data center topology (it briefly says they'll replace layer-2 switches). The examples given focus on simple line topologies; there is a disconnect here.
4) It relies on flooding-based L2 learning switches, which do not scale well; broadcasts become a problem.
5) Extra SrcMacRewriter boxes for transparent MBs.
6) Some MBs are stateful (e.g., firewalls), so packets in both directions (forward and reverse) must traverse the same MB instance; the pswitches must make sure this condition is met.
7) If the policy changes in the future, ensuring that all pswitches switch to the new policy simultaneously is not possible*. Some frames will violate the middlebox traversal guarantee (i.e., they may conform to neither the previous policy nor the new one). This is an eventual consistency model and has security vulnerabilities. An end-to-end model would be better here: change the policy at the endpoints and get stronger consistency guarantees.
8) Since pswitches use consistent hashing, adding new MB instances of the same type reassigns some existing flows, which is bad for stateful MBs.

* It is possible to get consistency guarantees even in this case, but it requires essentially doubling the number of rules installed on the switches (http://conferences.sigcomm.org/sigcomm/2012/paper/sigcomm/p323.pdf).
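
The rule doubling amounts to keeping both policy generations installed and tagging frames with a version at ingress, in the spirit of the linked consistent-updates paper; a minimal sketch with an assumed tagging scheme:

```python
# Two-phase consistent update: both rule generations stay installed, keyed by
# a version tag stamped on each frame at ingress. In-flight frames keep their
# old tag and finish under the old policy; new frames get the new tag.
rules = {
    ("v1", "core-router"): "firewall",      # old policy: firewall only
    ("v1", "firewall"):    "destination",
    ("v2", "core-router"): "firewall",      # new policy: firewall, then IDS
    ("v2", "firewall"):    "ids",
    ("v2", "ids"):         "destination",
}

def next_hop(version, prev_hop):
    return rules[(version, prev_hop)]

# Ingress pswitches start stamping "v2" only after every pswitch holds both
# generations; "v1" rules are garbage-collected once old flows drain.
print(next_hop("v1", "firewall"))  # destination (old policy)
print(next_hop("v2", "firewall"))  # ids (new policy)
```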
