Wednesday, July 21, 2010

Tech-Ed 2010 Session Summaries 5: Benjamin Armstrong on Hyper-V and Dynamic Memory

In previous years, Jeff Woolsey has been one of the leading Microsoft speakers on Hyper-V; this year he was absent, but his place was filled by Benjamin Armstrong, feature manager for the Hyper-V team. He was clearly pumped about the new release and its major new feature, dynamic memory, and that enthusiasm came across in a clear, focused, productive presentation.


Why care about this? Partly because we seldom know exactly how much RAM an app will need, since this depends so much on factors such as the number of users, hours of use, and types of use. For VMs this is even more true. Armstrong gave some good examples of “best practices” here. And since memory can be the biggest single budget item in a server quote, it merits attention and thoughtful calculation. Given this centrality of RAM, one pivotal concern for Armstrong and his team was optimizing its use. With this in mind, they gave extended thought to:


- Scale of variation in memory utilization for workloads

- How well the systems were sized initially

- Something that would work for both servers and desktops

- Minimizing system overload

- Satisfying users


How Does it Work? What Does it Do? Hyper-V can now add memory to a running VM in real time. One difficulty they discovered was that on server boot, the BIOS reports memory slots as empty or full, and one early attempt involved changing the BIOS to report 4,096 memory slots on the motherboard. An early build, needless to say. Early efforts also produced weird bugs with no parallels in the real world; he gave the example, “After adding 60 sticks of memory in half an hour, the 61st was not correctly enumerated.” Removing memory raised other issues, with the result that for this first release they are using a tactic of ballooning, rather than a true hot-remove, the exact reverse of adding.
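This isn't Armstrong's code, of course, but the idea behind ballooning can be sketched in a few lines of Python (all names here are my own, purely illustrative): a driver inside the guest pins pages the guest then cannot use, so the host can hand that memory to other VMs without ever hot-removing RAM from the guest's view.

```python
# Conceptual sketch of ballooning (hypothetical names, not Hyper-V internals).
# The guest keeps its original memory map; the balloon driver just makes some
# of it untouchable so the host can reuse the backing pages elsewhere.

class Guest:
    def __init__(self, assigned_mb):
        self.assigned_mb = assigned_mb   # memory the guest believes it has
        self.balloon_mb = 0              # pages pinned by the balloon driver

    def usable_mb(self):
        # The guest still reports assigned_mb, but cannot use ballooned pages.
        return self.assigned_mb - self.balloon_mb

def reclaim(guest, mb):
    """Host asks the balloon driver to pin `mb` of guest pages for reuse."""
    guest.balloon_mb += mb
    return mb  # memory now available to other VMs

g = Guest(assigned_mb=2048)
freed = reclaim(g, 512)
print(g.usable_mb())  # 1536: the guest still "has" 2048 MB on paper
```

The point of the trick is exactly what Armstrong described: inflating a balloon sidesteps the enumeration and removal bugs that true hot-remove of memory would have to solve.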


Naturally, VMware and its memory overcommit tactic entered the discussion. Here Armstrong made the point that the reward of such approaches depends on how well you sussed out your memory needs at the outset. So, someone with a penchant for four-gig RAM VMs would see vast returns on this, whereas a tighter-fisted admin would not.


The system requirements for using dynamic memory are predictable: Windows Server 2008 R2 SP1 on the host. Supported guests are Server 2003, 2008, and 2008 R2, in 32- and 64-bit versions (where applicable). In addition, Vista and Win7, 32- and 64-bit, Enterprise and Ultimate editions, will work with this.


Next he talked about the Dynamic Memory architecture. The key component here is the memory balancer, which operates in user mode and balances the memory needs of the various VMs. He illustrated how this works, with the balancer able to take memory from one VM and commit it to another. The main settings guiding it are the startup memory and the maximum memory. The startup memory represents the minimum needed to run; 512 MB is the default. The maximum value is an astonishingly generous 64 GB!


Availability and priority were also discussed. Priority is assigned a value from 1 to 10,000, with a default of 5,000. The memory buffer setting was briefly covered.
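To make the settings concrete, here is a toy model of how a balancer like the one Armstrong described might divide host memory (my own simplification, not the actual Hyper-V algorithm): every VM gets at least its startup memory, then spare RAM is split in proportion to priority, capped at each VM's maximum.

```python
# Toy balancer model (illustrative only, not the real Hyper-V balancer).
# Each VM config: startup (MB), max (MB), priority (1..10000, default 5000).

def balance(vms, host_mb):
    # Everyone gets their startup memory first.
    grants = {name: cfg["startup"] for name, cfg in vms.items()}
    spare = host_mb - sum(grants.values())
    total_prio = sum(cfg["priority"] for cfg in vms.values())
    # Split the spare RAM in proportion to priority, capped at max.
    for name, cfg in vms.items():
        share = spare * cfg["priority"] // total_prio
        grants[name] = min(cfg["startup"] + share, cfg["max"])
    return grants

vms = {
    "web": {"startup": 512, "max": 4096, "priority": 5000},   # default priority
    "sql": {"startup": 512, "max": 8192, "priority": 10000},  # favored VM
}
print(balance(vms, host_mb=4096))  # sql gets twice web's share of the spare
```

Even this crude version shows why the startup/max pair matters: startup is the floor the balancer can never dip below, and max is the ceiling it will never exceed no matter how idle the other VMs are.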


Armstrong next presented a demo to show dynamic memory in action, with mixed success. Soon enough it was followed by a discussion of changes to the root reserve. This is memory always set aside for the parent partition, but sometimes the reserve is jeopardized by “rampaging VMs”. The new Hyper-V manages this more smoothly, albeit at the cost of some memory not being available for VMs.


NUMA management was also changed. Armstrong mercifully explained what NUMA (Non-Uniform Memory Access) systems are: an architecture for memory management. Without it, the cores on a motherboard can generate contention as they all reach for memory and board resources. Under NUMA, the board is split into logical groups, or “NUMA nodes”, each with its own bus and memory. There is also a backchannel which allows NUMA-ignorant software to still work. However, the backchannel's speed varies between systems, with the result that performance can range from adequate to awful.


Currently, Hyper-V tries to get all of a VM's memory from a single NUMA node; only when it cannot does it span nodes. We can specify which NUMA nodes are used, but only through WMI, not the VM management GUI. This of course reduces spanning and the attendant chance of performance hits. The concern is that machines dynamically adjusting their memory might wind up spanning NUMA nodes, which would have dire consequences for performance. Therefore, SP1 will allow node spanning to be disabled, and this can be done from the GUI. The result is that the host behaves like several separate computers, one per NUMA node.
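The single-node-first behavior Armstrong described can be sketched like so (again a toy model of the policy, not Hyper-V's allocator): satisfy the request from one node if any node can hold it all, and only otherwise scatter it across nodes.

```python
# Toy model of NUMA-aware allocation: prefer one node, span only as a fallback.

def allocate(nodes_free_mb, need_mb):
    # First choice: a single node that can hold the whole request.
    for i, free in enumerate(nodes_free_mb):
        if free >= need_mb:
            nodes_free_mb[i] -= need_mb
            return [(i, need_mb)]
    # Fallback: span nodes (slower, since remote accesses cross the backchannel).
    plan, remaining = [], need_mb
    for i, free in enumerate(nodes_free_mb):
        take = min(free, remaining)
        if take:
            nodes_free_mb[i] -= take
            plan.append((i, take))
            remaining -= take
    if remaining:
        raise MemoryError("not enough memory across all nodes")
    return plan

nodes = [4096, 4096]
print(allocate(nodes, 3072))   # [(0, 3072)]: fits entirely in node 0
print(allocate(nodes, 3072))   # [(1, 3072)]: node 0 is too fragmented now
print(allocate(nodes, 2048))   # [(0, 1024), (1, 1024)]: forced to span
```

The last call is exactly the case the new SP1 switch guards against: with spanning disabled, that request would simply fail rather than quietly take the performance hit.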


Memory Techniques/Competition. Here Armstrong explained the philosophical differences between MS and VMware on this subject. The first difference pertains to knowledge of system performance: in architecting memory overcommit, VMware doubted the reliability of counters reporting guest memory use, and chose instead to construct a black-box model. Microsoft obviously had no such difficulty, and could build on a strong understanding of the guest OS's inner workings. The second difference is that VMware started with swapping for memory management and has been working ever since to make it work better, whereas MS started by setting a min and a max and letting the platform decide.


This is one reason for the differences between dynamic memory and overcommit. He made a distinction between oversubscription and overcommitment. Oversubscription, as Armstrong described it, is selling more tickets for an airline flight than you have seats; overcommitment is when everyone shows up.
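Armstrong's airline analogy reduces to two simple comparisons, which a few lines of Python make explicit (the numbers are mine, invented for illustration):

```python
# Oversubscription vs. overcommitment, per Armstrong's airline analogy.

physical_mb = 16384                           # the "seats": host RAM

promised = [4096] * 5                         # five VMs promised 4 GB each
oversubscribed = sum(promised) > physical_mb  # tickets sold > seats: True

demand = [1024, 2048, 1024, 3072, 2048]       # what the VMs actually touch
overcommitted = sum(demand) > physical_mb     # everyone showed up? False today

print(oversubscribed, overcommitted)          # True False
```

Oversubscription is a promise and is often harmless; overcommitment is the moment real demand exceeds physical supply, and that is when swapping, ballooning, or worse has to kick in.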


Why use the dynamic add/remove model? For VMware, transparent page sharing is the “golden-haired child” of their memory management. It is based on external inspection of the VM's memory: they hash the memory pages, then compare the hashes, looking for common content and consolidating from there. Microsoft studied this and decided to go down the add/remove memory route because the hash creation and review process is quite processor-intensive. It can also be quite slow, taking hours to complete, so it is neither dynamic nor responsive. Furthermore, recent Intel and AMD architectures have introduced (much) larger pages, 2 MB instead of 4 KB, engineered to help with hardware-assisted virtualization memory management, among other things. Obviously this tremendously complicates the effort of hash comparison, and it is no surprise that the most recent ESX turns transparent page sharing off, since it offers few benefits.
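The hashing scheme Armstrong criticized is easy to sketch (a simplification; real ESX also does a full byte-for-byte comparison after a hash match before merging, and tracks copy-on-write for shared pages):

```python
# Conceptual sketch of transparent page sharing: hash each page and keep one
# physical copy per group of identical pages. Illustrative only.

import hashlib

def share_pages(pages):
    """pages: list of bytes objects. Returns (physical_store, per_page_mapping)."""
    store = {}       # content hash -> the single physical copy
    mapping = []     # for each virtual page, the hash it now points at
    for page in pages:
        h = hashlib.sha256(page).hexdigest()
        if h not in store:
            store[h] = page       # first time we see this content: keep it
        mapping.append(h)         # duplicates just point at the existing copy
    return store, mapping

zero = bytes(4096)                # all-zero pages dedupe especially well
pages = [zero, zero, b"a" * 4096, zero]
store, mapping = share_pages(pages)
print(len(store), len(mapping))   # 2 physical pages back 4 virtual pages
```

The sketch also shows why 2 MB pages hurt the technique: a single differing byte anywhere in a 2 MB page changes the hash, so identical large pages become vastly rarer than identical 4 KB ones.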


Second-level paging, or paging at the virtualization layer. This can cause many problems, as Jeff Woolsey has already elaborated. When paging is done at this layer, you lack the context of the guest OS, which knows what everything is doing; external inspection cannot provide that. VMware's technique for identifying memory pages to page out is random selection, which Armstrong says is not a bad choice because it gets you 75% of the way to optimal and has no CPU overhead. The biggest advantage of this approach is that it always works.


Armstrong referred briefly to other techniques: VirtualBox has talked about memory fusion, corresponding to MS's guest-directed page sharing. VMware has spoken of memory compression; this evoked grins at MS because it is such an old idea. The key message here is that this story is not over, and efforts continue to find new tools and techniques. Hyper-V is still a work in progress, with hopes and plans for the future. And obviously it helps to be in the same building as the Windows kernel team.


Overall, clear and informative. One of the things I appreciated most was the differentiation not merely of techniques and tactics, but the juxtaposition of different ways of looking at memory and its management. Worth seeking out and watching.

