Thursday, August 26, 2010

Change of Venue

Thanks very much for visiting. I am changing the location for this blog to affinitymask.wordpress.com, and all future posts after September 1, 2010, will be at this new location. Hope to see you there.

Sunday, August 22, 2010

Tech-Ed-2009-Peschka-Capacity Planning with SharePoint 2007

One major consideration for a successful SharePoint deployment is knowing how much to request in the way of resources. In some cases this is done by guess and by golly. However, there are some effective principles to use in planning and testing capacity, and in this rich, deep presentation Steve Peschka covers them. This is an exceptionally valuable session for anyone looking to make their SharePoint deployment a success. It's the best kind of presentation: meat and potatoes all the way through.

How to determine the necessary capacity for planned expansion/new deployments?

What’s the goal of your testing?

Most important for most of us is RPS (Requests per Second). We might also be interested in a specific operation, in terms of the way it affects the farm or the way the farm affects it. Perhaps even just a single page, in terms of its TTLB (time to last byte). Usually capacity planning testing means verifying that an existing approach/plan is viable, or it means proving a concept.

Once we know what we want to do, we know what we need to measure. Most tests are based on RPS, since so many things depend on it. However, TTLB is also crucial, and many tests might include both. Peschka gives the example of a farm that needs to satisfy 100 RPS, with pages loading within 5 seconds.

Also worth measuring: crawl time, meaning how long a crawl takes and how much material there is to index. Document indexing rate is more complicated.

Determining Throughput Requirements, or RPS

This can be complicated. It must reflect not what can theoretically be done, but what your farm's customers actually need. Here he makes one of his most important points: the raw number of users means nothing by itself. What matters is the request load those users generate, so ascertaining it is imperative, and Peschka offers several ways:

- Historical data, from IIS logs and Log Parser, Web Trends, etc.

- Start with the number of users, divide them into profiles, multiply the users in each profile by that profile's operations per hour, and base your RPS target on the peak concurrency of the result.
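For the first approach, the historical data can be as simple as walking the IIS logs on the web front ends and finding the busiest second. The sketch below is my own illustration, not something from the session; the log path, and the assumption that date and time are the first two fields on each W3C log line, are placeholders to adjust for your environment.

```python
from collections import Counter
import glob

hits_per_second = Counter()

# Assumed log location; point this at your web front ends' IIS log directories.
for path in glob.glob(r"C:\inetpub\logs\LogFiles\W3SVC1\*.log"):
    with open(path, encoding="utf-8", errors="ignore") as log:
        for line in log:
            if line.startswith("#") or not line.strip():
                continue  # skip W3C header/comment lines and blanks
            fields = line.split(" ")
            if len(fields) < 2:
                continue
            # W3C extended format usually begins with: date time ...
            hits_per_second[f"{fields[0]} {fields[1]}"] += 1

peak_second, peak_rps = hits_per_second.most_common(1)[0]
print(f"Peak load: {peak_rps} requests/sec at {peak_second}")
```

Log Parser or WebTrends will give you the same answer with less effort; the point is simply that the RPS target should come from observed peaks, not from a head count.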

Peschka gives an example of the second, profile-based approach. It's such a great example of his method, and of the merits of this presentation, that I reproduce it below.

1. Contoso has 80,000 users; during any 8 hours, up to 40,000 may be at work. So we have 80,000 users, 40,000 active, and concurrency of 5% to 10%. Concurrency means active at the same time, and can be estimated.

2. Of these, 10% are light, 70% are typical, 15% are heavy, and 5% are extreme. This is a best guess.

3. Let’s say light users do 20 RPH (requests per hour). Collectively, this means 80,000 RPH.

4. Let’s say typical users do 36 RPH (requests per hour). Collectively, this means 1,008,000 RPH.

5. Let’s say heavy users do 60 RPH (requests per hour). Collectively, this means 360,000 RPH.

6. Let’s say extreme users do 120 RPH (requests per hour). Collectively, this means 240,000 RPH.

7. This means 1,688,000 RPH in total, or about 469 RPS (1,688,000 ÷ 3,600 seconds) for these 40,000 active users.

8. When we factor in peak concurrency (10%), this comes to 46.9 RPS. That’s the target we need for this farm.
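The same arithmetic in a few lines of Python, purely as a sanity check of the numbers above (my own sketch, nothing shown in the session):

```python
active_users = 40_000

# Share of active users and requests per hour for each profile (steps 2-6 above).
profiles = {
    "light":   (0.10, 20),
    "typical": (0.70, 36),
    "heavy":   (0.15, 60),
    "extreme": (0.05, 120),
}

total_rph = sum(active_users * share * rph for share, rph in profiles.values())
average_rps = total_rph / 3600          # requests per hour -> requests per second
target_rps = average_rps * 0.10         # step 8: 10% peak concurrency

print(f"{total_rph:,.0f} RPH -> {average_rps:.0f} RPS -> target {target_rps:.1f} RPS")
# 1,688,000 RPH -> 469 RPS -> target 46.9 RPS
```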

Excellent, logical analysis. Now that we have some idea of throughput, we need to consider what test mixes to prepare. What activities should these test mixes reflect? Historical information, if available, helps with this; otherwise Peschka refers us to the test mixes in the "Plan for software boundaries" documents on TechNet. Yet inevitably some educated guesses will be necessary.

Once we know what our throughput target is, and what sort of transactional mix is needed, we can design a test environment. Peschka notes that few people invest the necessary time in this; he recommends two months, including three weeks in a test lab.

The test environment needs to reflect crucial infrastructure factors, including AD (how many forests and domains? how many user accounts and groups are needed?); Peschka mentioned he aims for one DC for every 3-4 web front ends. How will load balancing be implemented?

Hardware must also be considered. You will need a Visual Studio Team Test controller and several VSTT agents, as well as a separate SQL Server for the test rig (so it does not impact the MOSS SQL Server, which is, after all, one of the main choke points). He also recommended turning off anti-virus software on the load test controller and its agents.

Certain configuration changes should also be made, such as stopping the timer and admin services, as well as profile imports and crawls. All pages should be published, and none should be checked out. A wide variety of pages should be included. He also points out that each Write scenario will change the content database, so this database should be restored from backup after any test run that includes writes; this ensures a consistent, uniform baseline. Again, he recommends stopping anti-virus software unless you want to measure performance with the SharePoint-integrated anti-virus capability.

Other account-related tasks to consider include how many users are needed and in what roles, how these will be populated, what audiences, profile imports, and search content sources will be needed (if any), and whether crawl content or profiles will need to be imported for testing (he gave the example of one crawl and profile import which took three days to complete).

More on Test Design

Sample data is a major stumbling block for many implementations. The sample content should be varied, not merely numerous iterations of a single document; otherwise search query test results will be ridiculous. Using a backup of an existing farm is the best option for sample data. Peschka recommended the tools at www.codeplex.com/sptdatapop for populating test environment data. Even so, he noted that in his experience you will almost always have to write some of your own tools for this.
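As an illustration of the sort of small tool you may end up writing (my own sketch, not one of the sptdatapop tools), here is a generator that produces varied plain-text documents from a word list, so search and query tests aren't skewed by thousands of copies of the same file. The word-list file name and document counts are arbitrary assumptions.

```python
import random
from pathlib import Path

WORDS = Path("wordlist.txt").read_text().split()  # any plain-text word list will do
OUTPUT = Path("sample_docs")
OUTPUT.mkdir(exist_ok=True)

random.seed(42)  # reproducible corpus, so repeated test runs index identical content

for i in range(1000):
    # Vary both length and vocabulary so the index and query results look realistic.
    body = " ".join(random.choices(WORDS, k=random.randint(200, 2000)))
    (OUTPUT / f"doc_{i:04}.txt").write_text(body)
```

The resulting files still have to be uploaded into the test site structure, which is where the sptdatapop-style tools (or your own uploader) come in.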

Web Test Best Practices. A mixed bag of good insights. The system behaves differently for different roles, so a variety of roles should figure in testing; do not simply use the farm admin account for everything. Test with model client apps such as RSS readers, Outlook, etc. In addition, don't neglect time-based operations, such as Outlook syncs. Validation rules should also be used. Test the web test itself: does it work for all your URLs, and does it work for all users? He also recommended setting Parse Dependent Requests to no.

Load Test Best Practices. Make sure your planned test reflects a good mix of tasks. Restore from a backup before each test run, remembering to defragment the indexes. Use iisreset. Allow for a warm-up period, since the first test after a restore will invoke many things which are not characteristic of regular operations. He briefly discussed "think times", mainly to say that such user behavior is almost impossible to model accurately.
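Pulling those pre-run steps together, a reset script might look something like the sketch below. This is my own assumption of how to script it, not something from the session; the SQL instance, database name, and backup path are placeholders, and it assumes sqlcmd is available on the machine running the script.

```python
import subprocess

SQL_SERVER = "SQLTEST01"                          # assumed SQL instance for the test farm
CONTENT_DB = "WSS_Content_Test"                   # assumed content database name
BACKUP_FILE = r"D:\Backups\WSS_Content_Test.bak"  # assumed backup file

def sql(query: str) -> None:
    """Run a T-SQL statement with sqlcmd using Windows authentication."""
    subprocess.run(["sqlcmd", "-S", SQL_SERVER, "-E", "-Q", query], check=True)

# 1. Restore the content database so every run starts from the same baseline.
sql(f"RESTORE DATABASE [{CONTENT_DB}] FROM DISK = N'{BACKUP_FILE}' WITH REPLACE")

# 2. Rebuild the indexes, since the restore carries over whatever fragmentation the backup had.
sql(f"USE [{CONTENT_DB}]; EXEC sp_MSforeachtable 'ALTER INDEX ALL ON ? REBUILD'")

# 3. Reset IIS on this web front end; the first minutes after this are the warm-up period.
subprocess.run(["iisreset"], check=True)
```

Run it, wait out the warm-up, and then start the load test from the VSTT controller.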

Sample Tests and Data Population. Peschka again referred to www.codeplex.com/sptdatapop as a great resource for many basic operations tests, with sample tests that can be adapted for your own use. Other tools include a script to create users in AD; a tool to scrape a list of webs, lists, libraries, list items, etc., for use in web tests; tools to create webs, lists, libraries, and list items; and a tool to create My Sites. These are fairly generic tasks.

Demo of a Test. Peschka next crossed his fingers and did a demo, adding a web test and running a load test. What struck me most was that this testing was all done from within Visual Studio, in a very straightforward manner. Even so, it will take a few viewings, preferably with the software running on your own machine, for all of this to make sense.

Questions to Consider. Good wisdom here about testing. Always assume there is a bottleneck, ask yourself where it might be and how it might be alleviated, and ask whether it is in an unexpected location. Is the throughput spiky? Are many errors showing up? And be as concerned about tests which are extremely good as about tests which are extremely bad.

Post-Testing Investigation

Investigation Techniques: There was more here than the standard troubleshooting checklist, and it reflected real, painful experience. Make sure the configuration is correct for hardware, the load balancer, SP settings, etc. Try running just portions of the test to divide and conquer. Simplify the farm topology. Isolate the workloads, operations, or pages. Use Read scenarios rather than read/write ones. Avoid taking things for granted. This also helps illustrate why more time will be needed for testing than you might expect.

Investigating with Visual Studio Team Tests. Peschka presented a demo of this. He gave examples of table and graph views for finding RPS and TTLB patterns and analyzing perf counters. He also gave recommendations for perf counters to focus on: fairly standard machine counters (processor, memory, disk IO, network) and counters from the VSTT default sets, adding SharePoint-, Search-, and Excel-specific ones. One good example showed WFE CPU dropping while SQL CPU surged and SQL Lock Wait Time went up, meaning the bottleneck in that case was SQL. Peschka gave some good examples of reading test results and root-causing them.
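For counters beyond what VSTT collects by default, something as simple as typeperf can log them alongside a run. The sketch below is my own illustration, not from the session; the counter paths are standard Windows/SQL Server counters, but verify the exact names and instances on your own servers (a named SQL instance uses a different object prefix, for example).

```python
import subprocess

# Hypothetical counter set echoing the ones called out above; adjust instances as needed.
COUNTERS = [
    r"\Processor(_Total)\% Processor Time",
    r"\Memory\Available MBytes",
    r"\PhysicalDisk(_Total)\Avg. Disk Queue Length",
    r"\Network Interface(*)\Bytes Total/sec",
    r"\SQLServer:Locks(_Total)\Lock Wait Time (ms)",  # watch this against WFE vs. SQL CPU
]

# Sample every 15 seconds into a CSV to review next to the VSTT results; stop with Ctrl+C.
subprocess.run(
    ["typeperf", *COUNTERS, "-si", "15", "-f", "CSV", "-o", "perf_during_loadtest.csv"],
    check=True,
)
```

Reviewing that CSV next to the VSTT graphs makes patterns like the WFE-drop/SQL-surge example above much easier to spot.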

Scaling Points. A loooong list of things to consider when scaling once capacity is understood. Too many to list here, so I'll include some highlights. Put data files and log files on separate spindles. Does custom code need to be revisited? For the various databases, should additional data files be added, up to one file per processor core, spread across multiple drives? The SQL disk design is critical for high-user or high-write implementations. Perhaps have separate farms: one just for My Sites, one just for publishing, one just for search, etc. Be sure to run on x64. In virtual environments, are the VMs optimally configured? Are you monitoring the object cache and sizing it accordingly? Does your farm have two VLANs, one for page traffic and one as a backend channel for interserver communication and SQL? Is there a dedicated WFE for indexing? There were more, but these seemed the most germane.

Conclusion: Exceptional granularity and depth; this goes far beyond mere scenarios or demos. Worth absorbing, and worth coming back to in the future, regardless of which version of SharePoint you deploy. This is why we go to Tech-Ed.

Sunday, August 15, 2010

Tech-Ed-2010-Vir204-Comparing-Hyper-V-and-VMware

This presentation is a good intro to the major differences between Hyper-V and VMware. That being said, the materials were more polished than the presenter. Microsoft has a long history of annihilating competitors, and can enter almost any market and immediately become a dominant player. Yet in this presentation, Jason Fulenchek seemed quite tentative, and reluctant to go toe to toe with VMware. This is quite different from some presentations last year, and much less go-for-the-throat than some others. In fact, I was sitting next to a VMware rep who was texting jibes back and forth with the other VMware folks in the audience. Microsoft's case against VMware can be very strong, but you could not see it very clearly in this presentation. Parenthetically, the case was made more technically in Armstrong's "Hyper-V and Dynamic Memory" presentation.

This is a 200-level presentation, so the presentation materials were quite polished, and seemed based on ones from technical marketing or sales engineering. The price for this polish was depth: much of the time was spent at a very high level.

The strategy seems to have been to emphasize the breadth of Microsoft's platform environments, naturally looking to contrast this with VMware's much more confined view of platforms.

Part One: Details, Details

Main benefits: familiarity, cost, ease of implementation, and consistency. Also important is the system monitoring capability for VMs: integrated, offering intra-VM monitoring and optimization, with similar monitoring capabilities for applications and services as well. I would add the large community ecosystem around MS products in general, and the superior documentation.

Unified Management. Much time went into this. The main point was that users care about applications, not VMs, an emphasis said to be better served by Microsoft's approach. Most interesting here was the availability of "in-guest management and monitoring", allowing you to integrate virtualization into existing processes, reducing start-up and operating costs. MS VM Manager 2008 R2 is completely integrated with System Center Operations Manager 2007 R2. This allows VMs to be monitored and managed just like other machines, and also allows superior monitoring of applications, processes, and services.

At this point he launched into a detailed, list-based comparison of vSphere (various editions) and Hyper-V in terms of features and costs. This was interesting in principle, and worth reviewing at one’s leisure, but unsuitable for a presentation. This is not to say that there’s no value here; it’s just that you have to look and listen pretty hard to get a clean handle on what the key differences are. The presentation, for an IT pro crowd, would have made more sense if it had focused with more depth on a few key themes.

Part Two: Roger Johnson from Crutchfield Corp.

This began with a video explaining the benefits of MS virtualization for Crutchfield Corporation, a consumer electronics retailer. They had started out as a VMware shop, but grew displeased with the perceived high cost and reconsidered. Eventually they switched to Hyper-V and related management technologies, saving about half a million dollars by doing so. Their corporate datacenter is now 77% virtualized (not bad), with 5 virtual hosts and an average VM density of 45:1, including all dev and test environments. Johnson described substantial savings versus VMware, on the scale of 3:1.

Part Three: Duking It Out

To the extent the gloves come off at all, it does not happen until 50 minutes into the presentation. In a slide titled "Responding to FUD", Fulenchek invests ten minutes in busting some myths.

  1. Software footprint is not synonymous with security. The admittedly small footprint of ESXi does not equate with invulnerability, and does not mean fewer patches. Fulenchek invites the audience to compare the security track records of ESX and Windows Server 2008, but does not make the comparison himself.
  2. Numerous technical cavils are put forth against Windows. Fulenchek argues that solutions for these issues are all included within Windows itself. After all, companies willing to run Exchange, SQL, AD, and CRM systems on Windows should be willing to consolidate dev and test machines on it. "Think about it. That's all I'm asking," he concluded.
  3. Hyper-V is just a role for Server 2008. Fulenchek argues this is a strength, not a weakness.
  4. Some argue that VMware drivers are “harder”. How? Harder than what? In what way?
  5. There are some things in vCenter which cannot be done in MS VM Manager. Fulenchek says the point of VM Manager is not to replace vCenter but to allow management of VMs within a Windows network, so these observations are beside the point.
  6. Memory management, often touted by VMware, relies on overcommitment. With Windows Server 2008 R2, Microsoft will be offering comparable memory management features, ones which rely on understanding the internals of VMs. Armstrong talks about this in his Dynamic Memory session.

Fulenchek followed this with some demos, which were reasonably successful and showed the impressive resources available for managing VMs, most notably (for me) being able to manage apps running on particular servers from within System Center Operations Manager 2007. This can be contrasted with the "black-box" (as another MS presenter described it) approach used by VMware in VM management. Other features he touched on seemed less revolutionary: "storage migration", which parallels Storage VMotion, and volume re-sizes which can be done without needing Storage VMotion at all.

So there is definitely a case to be made that Microsoft's approach to virtualization is quite competitive with VMware, not least because anyone with Server 2008 already has everything needed to try this out. However, despite the strength of this case, the presentation lacked force. More focus on these key points of comparison would have made it more effective. So would doing more with the Crutchfield material, which was presented and then hardly utilized at all. Oddly, the strongest case for Microsoft against VMware was made in other presentations.

Tuesday, August 10, 2010

Tech-Ed-2010 Session Summaries 9: Laura Chappell on Corporate Espionage

Broadly speaking, there are two types of presentations at Tech-Ed. There are the meticulously outlined, PowerPointed, clearly rehearsed presentations. Then there are the seemingly wing-it presentations, where the presenter has few slides, often laughs that off, and prefers to cram demos into the session. The former goes by practice, the latter by passion. Mark Minasi and Jeff Woolsey exemplify the former, and some of Steve Riley's later presentations represent the latter.


The difficult thing is, six months after the presentation, the video and the powerpoint are usually all we have to rely on. And if all you have is an 80-minute presentation, it’s unnecessarily difficult to find what you need for your work, and you often have to go almost frame by frame to see the details. Maybe you enjoy slow-motion spelunking. Maybe I don’t.


Laura Chappell's presentation, ostensibly on corporate espionage, split into two halves. The first was narrative, the second technical. Throughout, she tossed in insights, names, sites, and suggestions.


The first part consisted of five case studies. The first one was extremely illuminating; all were worth hearing:


1. A company had outsourced some development work to India. When they received the code and put it on their servers, it made a connection to a website called "five knives", and then massive outbound traffic started. The outsourcing firm had put in code that searched the entire drive for any documents containing words like "agreement", "signature", "title", etc., stealing all the contracts. Nothing was encrypted, and everything could be seen in plaintext in transit. Incidentally, this story alone was worth the time investment in this presentation. It would have been interesting to learn its outcome.

2. A company planned to release a new cell phone product, expecting it to be a cash cow for the next few years. They decided to outsource some production to India. The product manager was sent to India bearing a single hard drive with all the product plans. He reported back that the drive had been lost. Indian law enforcement was of no assistance. Eventually a competitor released a comparable product first.

3. Outplacement/Separation. Some types of firewalls are verbose, and some are silent. Verbose firewalls in effect inform people their access has been blocked, which enables some of them to find ways around the block. Laura gave the example of a company which discovered almost $200,000 had been siphoned off through a remote office which had been closed (management was not aware of this). The most successful such attacks take place before major holidays.

4. Lost Prototype. An iPhone 4 prototype was lost in a public place, retrieved, and sold. A familiar story, with good discussion of the efforts made to carry out, and to thwart, such attempts to steal products.

5. Blabla and Stephen Watt. This case involved the theft of 170 million credit/debit card numbers. Watt designed a program that by itself accessed at least 45 million credit and debit card numbers. Other partners were also bought off and brought in.


There was some good insight here. For instance, she mentioned business aspects of breaches, from practicalities like fireproof safes to how to interact with law enforcement, how to build relationships with law enforcement through HTCIA (and its great value), and valuable products for host forensics (AccessData with their Forensic Toolkit, and Guidance Software with EnCase). Another example was a recommendation to search for "cybercrime DOJ forensics" and to look for an excellent DOJ paper for first responders.


The second half was non-narrative and more technical, built around a series of network traffic traces. Oddly, it seemed very impressionistic; watching her muse in Wireshark over which capture file to pull up made me wonder how much advance thought and planning had gone into this presentation. But I digress.


Chappell showed a site listing credit card numbers for sale, with related info (security codes, etc.). Next she showed Wireshark in action, using a sample trace file available from the Wireshark book site. She also discussed Nmap (a network scanning and discovery tool); Zenmap, its graphical front end, lets you traceroute to the discovered hosts and see the relationships of devices on the internet. These are the kinds of resources used for reconnaissance. There was much practical advice on social engineering, on traffic analysis techniques in general, and on Wireshark in particular. For example, even handshake refusals are significant, since they frequently identify chatty firewalls. Chappell gave a good description of using taps to capture traffic for analysis, the logistics involved, what she looks for, and the characteristics of troublesome packets.
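As a small illustration of that last point (my own sketch, not something Chappell showed), counting which hosts answer with TCP resets in a capture is a quick way to spot the chatty refusals she described. It assumes the scapy library is installed and a capture file named capture.pcap.

```python
from collections import Counter
from scapy.all import rdpcap, IP, TCP  # assumes scapy is installed

RST_FLAG = 0x04  # the TCP reset bit

resets = Counter()
for pkt in rdpcap("capture.pcap"):                # hypothetical capture file
    if IP in pkt and TCP in pkt and int(pkt[TCP].flags) & RST_FLAG:
        resets[pkt[IP].src] += 1                  # who is refusing connections?

for host, count in resets.most_common(10):
    print(f"{host} sent {count} TCP resets")
```

A verbose firewall or filtered port tends to show up near the top of a list like this; a silent one simply never appears.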


The second half of the presentation seemed driven by samples chosen on the fly, rather than by samples selected to illustrate planned, organized ideas. That's not to say there was no insight or value. But it did seem wandering and unstructured. Re-listening and trying to outline her presentation, I was often left scratching my head. That being said, Chappell consistently gets high ratings for her Tech-Ed presentations; I'll have to listen again to see what I'm missing.


Monday, August 02, 2010

Tech-Ed-2010 Session Summaries 8: Kai Axford et al on Trends in Cybercrime

Kai Axford’s security talks are consistently well-attended, and he has been presenting on security for a long time. This year he teamed with Allyn Lynd, an FBI special agent. Their talk was long on stories and, truth be told, short on insight relevant to the corporate IT security world, as we will see.

Let’s look at the case studies briefly.

Case Study #1: Phone Phreakers. Lynd does a good job of explaining the defective mentality of people who pull these pranks, and the trivial, trite application of occasional technical savviness. The pranks and scams were ingenious, but pointless. The most egregious was an example of "SWATting", an attempt to trick emergency responders into dispatching a team when no emergency exists, usually by spoofing a phone number. Obviously this can lead to confusion, misunderstanding, loss, and injury. Interesting, but I'm not sure why this is central to a presentation to IT professionals.

Case Study #2: Trusted insider case, someone working the night shift at a hospital. Using his access card, he interfered with hospital HVAC operations, and publicly bragged about it, going so far as to post a YouTube video about it (nonetheless, he denied doing it when arrested). Entertaining, but also a few lessons here: carefully screen even low-level staff, and audit, because there's no such thing as trivial physical access to servers.

Lynd tangentially made the point that social network sites are major sources of risk for penetration and intelligence-gathering, and FBI agents do not create them. Identity theft and theft of trade secrets were also covered, albeit quite briefly and with little depth. Then came nearly twenty minutes of Q&A.

So afterwards, I was impressed by the intricacy and depth of malice out there, and obviously, as a society, we have to wonder what's going on with our culture that we're producing sociopaths like the ones featured here. That being said, the relevance of the case study material for organizational IT was not all that strong, and in the end it is difficult to see why some of this material was chosen. Clearer focus on the likely needs of the audience would have made this a more enlightening presentation.