FutureGrid MOOC - IPOP Unit 6 - IPOP Architecture - IPOP Address Spaces

Hello and welcome to Unit 6 of our course, where we're going to cover the use of address spaces in IPOP to support multiple virtual networks sharing the same peer-to-peer overlay. [pause] So the goal of using address spaces in the virtual network is to support this feature where you can have multiple virtual networks sharing the same peer-to-peer overlay. So this example illustrates a scenario where we have three users with three different virtual networks, VN1, VN2, and VN3, and they are sharing the same underlying P2P overlay. So the machines in blue, belonging to virtual network 3, for example, can communicate within their virtual network. They can have IP addresses that are isolated from the addresses that are allocated to other virtual networks, and they are able to communicate without worrying about collisions with the namespace or the address space of the other virtual networks. So how is this supported in the context of IPOP? [pause] The idea is that we introduce this abstraction of a namespace, which is a user-provided string that uniquely identifies a virtual network that is multiplexing a P2P overlay. And then every IPOP node is bound to a namespace. Now one thing that's important to keep in mind is that this namespace is not seen by applications. The applications only see the IP address that's bound to a virtual network interface. The namespace is part of the configuration of an IPOP router when it is started up, and it's provided by the user who creates this virtual network. So then the key primitive that's used to help in the mapping of addresses to peer-to-peer identifiers is a lookup of the IPOPid, which is, as you recall, the unique identifier of every node in the P2P network. So to perform this lookup we need two parameters. One is the namespace and the other is the IP address within the virtual network. For example, let's look at a scenario where virtual network 3, VN3, has two nodes that have been allocated
IP addresses 10.10.1.5 and 10.10.1.6. So when 10.10.1.5 needs to communicate with 10.10.1.6 in VN3, the first thing that it does is the lookup in the first row here of the slide. And this lookup should yield the value n3, which is the IPOP identifier bound to the machine with that IP address in that namespace. And conversely, if 10.10.1.6 wants to communicate with 10.10.1.5, it does a lookup using the virtual network's namespace and the IP address 10.10.1.5. Now because we use unique virtual network namespaces, it's possible for different virtual networks to have nodes which have the same virtual IP address. So in this example, virtual network 2 also has a node with the address 10.10.1.6, but that is mapped to a different IPOP identifier, n5. And that's because it belongs to a different namespace. Now to accomplish a lookup, what you basically need is a table, and a hash table would be one approach to implementing this lookup. But we don't want to have a centralized server that's responsible for performing these lookups. So the goal is to perform the lookup in a decentralized way. And to do this, we leverage the fact that we can use a distributed hash table, which is a data structure that's supported by a structured peer-to-peer system like IPOP. A distributed hash table works similarly to a regular hash table, but its entries are distributed across peers in the network. And to look up this table we're going to use a combination of the namespace and the virtual IP address as the key. And the value that we store in this table is the 160-bit IPOP identifier. [pause] So in a distributed hash table, the key that is used to look up and store data is hashed to a value that fits within the peer-to-peer identifier address space. In the case of IPOP, it's 160 bits, and we use the [unknown] hash function to compute a hash of the key that's being inserted.
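As a rough illustration of the namespace-scoped lookup described above, here is a minimal sketch using an ordinary in-memory table. This is the centralized version that the lecture says IPOP avoids; it is shown only to make the two-parameter lookup concrete. The function name `lookup_ipop_id` and the table layout are assumptions made for this example, and the entries are the two mappings mentioned for the slide.

```python
from typing import Dict, Optional, Tuple

# Centralized stand-in for the lookup table (IPOP distributes it over a DHT).
# The same virtual IP maps to different IPOP identifiers in different namespaces.
TABLE: Dict[Tuple[str, str], str] = {
    ("VN3", "10.10.1.6"): "n3",
    ("VN2", "10.10.1.6"): "n5",
}


def lookup_ipop_id(namespace: str, ip: str) -> Optional[str]:
    """Return the IPOPid bound to (namespace, ip), or None if there is no mapping."""
    return TABLE.get((namespace, ip))
```

Because the namespace is part of the key, the two entries for 10.10.1.6 do not collide.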
And that information is stored in the nodes that are to the left and right of this identifier, this hashed value. And to support additional redundancy, we can append unique values to the key and hash the key multiple times. Let's say you hash four times: you'd have a total of eight replicas of a DHT entry, because you store in the two nodes around the value of each hashed entry. So if you hash 'k' times, you have k times two, or in this case four times two, entries in the DHT. But the basic primitives of a distributed hash table, or DHT, are very simple. There's a PUT of a key/value pair, which translates into sending a message addressed to the hash of the key, and again, potentially appending values and recomputing the hash multiple times for more redundancy. And the value is, in this particular case, our IPOPid. And a GET, to look up a value from the DHT, is also a simple operation. You provide the key, and the return will be the value associated with that key, or nothing if there's no value associated with the key. [pause] So in general, in the DHT, you would have, for example, node n1 in this case holding a key-value pair with "foo" as the key and "bar" as the value. So the first thing it could do is append a unique value to this key and compute a hash of that key. Now in this example, let's say that the hash of this value is 107. Now there's no node in this network that has exactly the identifier 107. So storing that value "bar" associated with the key "foo" ends up being a message that is delivered to both neighbors of this value, 107, which in this example would be n6, which has id 101, and n7, which has id 205. And again, recall that these identifiers are arranged in ascending order. So once you find the identifier that's immediately smaller than 107 and the one that's immediately larger, you know that you can stop routing this message and you can store the information on these two nodes.
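The PUT and GET behavior described above can be sketched as a small in-memory model. This is a toy under stated assumptions, not IPOP's implementation: the class name `ToyDHT`, the `#i` suffix used to get distinct hashes, and the use of SHA-1 are all choices made for this example (the lecture does not name the hash function; SHA-1 is used here only because it yields 160 bits). There is no routing; the model simply finds the two ring neighbors of each hashed point.

```python
import bisect
import hashlib
from typing import Dict, List, Optional, Tuple


class ToyDHT:
    """In-memory model of a ring DHT with 2 * k replicas per key."""

    def __init__(self, node_ids: List[int], hashes_per_key: int = 4) -> None:
        self.ring = sorted(node_ids)  # node identifiers in ascending order
        self.k = hashes_per_key
        self.stores: Dict[int, Dict[int, str]] = {n: {} for n in self.ring}

    def neighbors(self, point: int) -> Tuple[int, int]:
        """Return the nodes immediately left and right of point on the ring."""
        i = bisect.bisect_left(self.ring, point)
        left = self.ring[i - 1]                 # index -1 wraps to the largest id
        right = self.ring[i % len(self.ring)]   # wraps to the smallest id
        return left, right

    def points(self, key: str) -> List[int]:
        """Hash the key k times, appending a counter (assumed scheme)."""
        return [
            int.from_bytes(hashlib.sha1(f"{key}#{i}".encode()).digest(), "big")
            for i in range(self.k)
        ]

    def put(self, key: str, value: str) -> int:
        """Store value at both neighbors of every hashed point; return write count."""
        writes = 0
        for p in self.points(key):
            for node in self.neighbors(p):
                self.stores[node][p] = value
                writes += 1
        return writes

    def get(self, key: str) -> Optional[str]:
        """Return the first replica found for key, or None if none exists."""
        for p in self.points(key):
            for node in self.neighbors(p):
                if p in self.stores[node]:
                    return self.stores[node][p]
        return None
```

With node identifiers 101 and 205 on the ring, `neighbors(107)` returns those two nodes, matching the example, and a PUT with four hashes performs eight writes.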
And again, to look up a value for this "foo" key, all we need to do is, again, compute the hash, send a message to this identifier, and obtain the result. In this case it could come from either n6 or n7; they're both storing this information. [pause] So that's the general behavior of a DHT. In the case of IPOP, the specific keys and values that are used are as follows: the key is a concatenation of a keyword, 'dhcp', with the namespace and the IP address, separated by colons. And the value is a string that has, again, a keyword, 'brunet:node', followed by a 160-bit IPOPid encoded in ASCII format. So that's the long string that you see in green in this example. [pause] So having this basic capability of looking up addresses also allows us to implement flexible ways of managing the allocation of IP addresses. And IPOP supports two different approaches. One is dynamic assignment, where we have a DHCP proxy that understands DHCP requests that come in on the IPOP tap device, and [pause] provides the functionality of dynamic address assignment without having a single centralized server. The way this is done is that the node itself generates an IP address at random within the range of the DHCP configuration, and then it attempts to store the mapping in the distributed hash table. Now it makes sure that a majority of the replicas in the DHT acknowledge the insertion: let's say if you have eight replicas total, you expect at least five replies acknowledging that the value has been inserted before binding the address to the interface. If it's not possible to insert this mapping in the DHT, the DHCP proxy will generate another address and retry, and continue this process until it finds an address that has not been allocated to any other node. It's also possible to support static addresses by inserting their mappings into the DHT.
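To make the key format and the dynamic assignment loop more concrete, here is a minimal sketch. Several details are assumptions for illustration: the exact separators (`dhcp:<namespace>:<ip>` and `brunet:node:<id>`), the function names, the `max_attempts` limit, and the `try_create` callback, which stands in for a DHT insert that reports how many replicas acknowledged the write. It is not the DHCP proxy's actual code.

```python
import ipaddress
import random
from typing import Callable, Optional


def dhcp_key(namespace: str, ip: str) -> str:
    """Build the DHT key; the colon layout is an assumed reading of the lecture."""
    return f"dhcp:{namespace}:{ip}"


def dhcp_value(ipop_id: str) -> str:
    """Build the DHT value; the separator before the id is an assumption."""
    return f"brunet:node:{ipop_id}"


def allocate_address(
    namespace: str,
    ipop_id: str,
    network: str,
    try_create: Callable[[str, str], int],  # hypothetical: returns replica acks
    replicas: int = 8,
    max_attempts: int = 100,
) -> Optional[str]:
    """Pick random addresses until a majority of replicas accept one."""
    hosts = list(ipaddress.ip_network(network).hosts())
    quorum = replicas // 2 + 1  # eight replicas -> at least five acknowledgements
    for _ in range(max_attempts):
        ip = str(random.choice(hosts))
        acks = try_create(dhcp_key(namespace, ip), dhcp_value(ipop_id))
        if acks >= quorum:
            return ip  # safe to bind this address to the interface
    return None  # gave up; every attempt fell short of the quorum
```

A caller would bind the returned address to the tap interface only when the result is not `None`. The retry limit is a convenience for the sketch; the lecture describes retrying until a free address is found.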
And in both cases it's important to keep in mind that the DHT stores a value only for a certain amount of time, a 'time to live', or TTL. So these values have to be refreshed in the DHT, or these mappings expire and the node will not be addressable anymore. Nodes in the IPOP network can be moved across physical links and maintain their IPOP identifier. For example, this is useful if you're migrating virtual machines from one data center to another. They can have a different physical address on the physical network, but IPOP will maintain the virtual IP address at the destination. The key idea is that the IPOP identifier remains the same, and also the mapping between the virtual IP address and the IPOPid can remain the same in the distributed hash table. What changes is that at the destination, the node will re-initiate the process of creating edges with its neighbors, if edges have been torn down during the migration. So the node will go through the process we saw in the previous unit: learn the URIs for itself, the endpoints that it has in the new network, begin creating edges with its left and right neighbors and the far edges, and eventually reconnect, becoming routable again on the virtual network. All of this, again, without losing connectivity, without losing the IP address allocation in the process. [pause] So that concludes this unit, and in the next unit we're going to look at some of the performance optimizations in the system.