Saturday, June 19, 2010

Tech-Ed 2010 Session Summaries 1: Performance Enhancements in the Windows 7/Windows 2008 Server R2 Kernel

At Tech-Ed, the high-level presentations by Mark Russinovich are always among the best-attended sessions. This year was no exception, and he fielded no fewer than four strong topics, including one, improbably, co-hosted with Mark Minasi (more on that in a later post). Below I summarize a session (WCL404) which repeats and expands on one from 2009: kernel changes in Windows 7 and Windows Server 2008 R2.

One of the most widespread and devastating criticisms directed against Windows Vista was its slow performance, even on modern machines. Microsoft took this torrent of censure to heart and re-engineered Vista dramatically in its service packs, achieving substantial improvements in performance across the board, often with great ingenuity. These efforts continued with the development of Windows 7 and Windows Server 2008 R2 (W7 and W2K8SR2 hereafter); Mark explored them in detail in this session. He divided his presentation topically, focusing on the key areas of improvement in the W7/W2K8SR2 kernel. Again, this is a summary, so parallels of wording and presentation are inevitable.

1. Performance

One big first is that this is the first client version to be smaller rather than larger than its predecessor. Some 400 changes were made to improve performance, addressing over-aggressive caching, buffer management inefficiencies, etc. This in turn led to a 10-25% reduction in memory use, which freed resources for more important uses. One major aspect of this was re-architecting the desktop window manager (DWM), whose memory footprint was reduced by 50%. In addition, the registry is now read into paged pool rather than memory-mapped.

Physical memory is also managed better. The system cache, paged pool, and system code each receive a separate, distinct working set rather than drawing on a shared pool of memory. This prevents runaway growth in one from affecting the other two, as used to happen with, for example, copying large files.

PerfTrack. Some 300 key user operations were identified, such as booting, using the Start menu, opening the Control Panel, etc. Benchmarks were then set for each, and the code paths involved were re-worked repeatedly until substantial improvements were achieved.

2. Power Efficiency

This matters to consumers watching movies on their notebooks on planes, and to datacenter admins dreading the monthly power bill. The W2K8SR2/W7 kernel team found that a 10% decrease in CPU usage led to a notebook getting a 10% boost in its battery life. Related to this on servers is the innovation of core parking. In core parking, rather than evenly distributing the workload across all the cores of all a system's processors, the power manager tells the scheduler to consolidate threads onto fewer cores, so that the cores left idle can be parked in a low-power state. There are some caveats to this: each NUMA node always has one core unparked, and cores with targeted interrupts are unaffected. Microsoft also worked with Intel and AMD to make better use of the processors' own power features.
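As a rough illustration of the core-parking idea, here is a minimal sketch in Python. It is not how the Windows power manager actually decides; the function name, the `target_utilization` ceiling, and the lowest-index-first ordering are all assumptions made for the example. Only the caveat that every NUMA node keeps one core unparked comes from the session.

```python
import math

def cores_to_keep_unparked(core_loads, node_of_core, target_utilization):
    """Return the set of core indexes that stay unparked.

    core_loads: utilization of each core, between 0.0 and 1.0.
    node_of_core: NUMA node id of each core (same length as core_loads).
    target_utilization: hypothetical ceiling for a core after consolidation.
    """
    total_load = sum(core_loads)
    needed = max(1, math.ceil(total_load / target_utilization))

    # Caveat from the session: every NUMA node keeps one core unparked.
    unparked = set()
    seen_nodes = set()
    for core, node in enumerate(node_of_core):
        if node not in seen_nodes:
            seen_nodes.add(node)
            unparked.add(core)

    # Unpark further cores, lowest index first, until capacity suffices.
    for core in range(len(core_loads)):
        if len(unparked) >= needed:
            break
        unparked.add(core)
    return unparked

# Eight lightly loaded cores spread over two NUMA nodes.
nodes = [0, 0, 0, 0, 1, 1, 1, 1]
busy = cores_to_keep_unparked([0.25] * 8, nodes, target_utilization=0.5)
idle = cores_to_keep_unparked([0.0] * 8, nodes, target_utilization=0.5)
print(sorted(busy), sorted(idle))
```

Every core not in the returned set is a candidate for parking; the threads that were running there would be moved to the unparked cores.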

3. Reliability

Fault Tolerant Heap. Heap corruption (processes corrupting their memory buffer allocations) accounted for 15% of all user-mode crashes, and 30% of user-mode crashes during shutdown. Such crashes are often due to programming errors by an application vendor, not MS, and are often hard to diagnose. The fault tolerant heap (FTH) monitors for heap-corrupting behavior and applies mitigations dynamically, such as padding allocations with a few extra bytes. The rules call for FTH to be invoked if an app process crashes four times in an hour in ntdll.dll. Russinovich compared this to putting a trampoline under a process, and gave a good demonstration.
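The trigger rule (four crashes in ntdll.dll within an hour) is essentially a sliding-window counter. The sketch below shows that idea only; the `CrashTracker` class and its method names are hypothetical, and the real FTH bookkeeping is surely more involved.

```python
from collections import deque

class CrashTracker:
    """Sliding-window crash counter for one application (illustrative)."""

    def __init__(self, threshold=4, window_seconds=3600):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._crashes = deque()  # timestamps of counted crashes

    def record_crash(self, timestamp, faulting_module):
        """Record a crash; return True if mitigation should be enabled.

        Timestamps are seconds and must be passed in non-decreasing order.
        """
        # Only crashes in ntdll.dll count toward the rule.
        if faulting_module.lower() == "ntdll.dll":
            self._crashes.append(timestamp)
        # Forget crashes that have fallen out of the one-hour window.
        while self._crashes and timestamp - self._crashes[0] >= self.window_seconds:
            self._crashes.popleft()
        return len(self._crashes) >= self.threshold

tracker = CrashTracker()
for t in (0, 600, 1200, 1800):
    enabled = tracker.record_crash(t, "ntdll.dll")
print(enabled)
```

Crashes spread further apart than the window never accumulate to the threshold, so an app that fails only occasionally is left alone.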

Process Reflection. Detects processes which seem hung or to have leaked memory, and then creates a snapshot of these processes for analysis. This copy is then reviewed by a leak detection diagnostic.

4. Security

Virtual accounts for processes are a major improvement: these are accounts whose management and passwords are handled by the computer itself. They also have reduced privileges. In addition, services sometimes require a network identity. Previously a domain account was the only solution for this, requiring administrative support with passwords. W2K8SR2 overcomes this with managed service accounts. Their passwords and SPNs are managed by AD, and they can be configured with PowerShell. Each one can, though, only be used on a single machine.

BitLocker now allows full-volume encryption, and makes it easier to start using by always keeping 100 megs free. In addition, BitLocker is now available as BitLocker To Go for USB drives and other removable storage. Russinovich gave a great example of how this works.

5. Native VHD Support

Allows orderly shutdown of VHD volumes, booting from them, and improved management of these volumes. This also contributes to faster provisioning and repurposing, and faster patching and rollback.

6. Scalability

This kernel release employs hyperthreading (also known as SMT, or Simultaneous MultiThreading) to improve performance and scalability. This allows new versions of Windows to take better advantage of the recent proliferation of cores in processors by, for instance, moving threads to fully idle cores in preference to less-in-use ones. Very significant performance improvements came from this; Mark gave one example from Windows 7 of a 23% increase in Windows Media Encoder performance over Vista SP1.

In Terminal Services, Dynamic Fair Share Scheduling (DFSS) has been introduced to prevent one connection’s app from hogging a disproportionate share of the resources through assigning budgets to sessions. Then, when a session exceeds its quota, its threads go to the idle-only queue. Good demo of this.
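The budget mechanism can be pictured with a small Python sketch. Everything here beyond "sessions get budgets, and a session over quota runs only from the idle-only queue" is an assumption for illustration: the class name, the equal per-session budgets, the millisecond units, and the periodic reset are not taken from the session.

```python
class FairShareScheduler:
    """Toy model of per-session CPU budgets (illustrative only)."""

    def __init__(self, session_ids, interval_budget_ms):
        # Assumption: every session gets the same budget per interval.
        self.budget_ms = {sid: interval_budget_ms for sid in session_ids}
        self.used_ms = {sid: 0 for sid in session_ids}

    def charge(self, session_id, cpu_ms):
        """Account CPU time consumed by a session's threads."""
        self.used_ms[session_id] += cpu_ms

    def queue_for(self, session_id):
        """Pick the queue for a session's threads."""
        if self.used_ms[session_id] > self.budget_ms[session_id]:
            # Over quota: run only when nothing else wants the CPU.
            return "idle-only"
        return "ready"

    def start_new_interval(self):
        """Reset usage so every session starts with a fresh budget."""
        for sid in self.used_ms:
            self.used_ms[sid] = 0

sched = FairShareScheduler(["alice", "bob"], interval_budget_ms=100)
sched.charge("alice", 150)  # alice's app hogs the CPU
print(sched.queue_for("alice"), sched.queue_for("bob"))
```

The point of the idle-only queue is that the greedy session is not starved outright: it still makes progress whenever the other sessions leave the processor free.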

Bitmasks. The new kernel also recognizes the growing number of CPUs, and has been re-architected to support new apps which take advantage of as many LPs (logical processors) as are available. Because an affinity bitmask is only 64 bits wide, the new kernel divides the LPs into up to four groups of up to 64 LPs each. Then, by default, processes are affinitized so that all their threads run on the LPs of a single group. Other important changes include the removal of the Dave Cutler-era Dispatcher Lock, which serialized access to the scheduler's data structures. Instead, each object is now protected by its own lock, and many operations were redesigned to run lock-free.
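To make the grouping concrete, here is a small Python sketch of how a system-wide LP index could be split into a group number plus a position inside that group, and how a per-group affinity bitmask is built. The function names are hypothetical, and `GROUP_SIZE` is a parameter: it is set to 64 on the assumption that a group is bounded by the width of a 64-bit affinity mask.

```python
GROUP_SIZE = 64  # assumption: one bit per LP in a 64-bit affinity mask

def locate_lp(system_lp_index, group_size=GROUP_SIZE):
    """Split a system-wide LP index into (group, number within group)."""
    return divmod(system_lp_index, group_size)

def group_affinity_mask(lp_numbers_in_group, group_size=GROUP_SIZE):
    """Build a bitmask with one bit set per allowed LP in a single group."""
    mask = 0
    for number in lp_numbers_in_group:
        if not 0 <= number < group_size:
            raise ValueError(f"LP number {number} does not fit in the group")
        mask |= 1 << number
    return mask

group, number = locate_lp(70)
print(group, number)
print(bin(group_affinity_mask([0, 1, 3])))
```

A thread's affinity is then a pair (group, mask) rather than a single mask, which is why older apps that only know about one mask stay confined to one group by default.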

Important parts of this material were presented last year; even so, this presentation was well-received, and one of the best-attended. It makes me look forward to the next edition of Inside Windows.