NetApp SAN Boot with Windows

October 6, 2011

Thoughts and ideas on booting Windows from SAN, based on our experience doing this with our NetApp array.

  1. SAN Boot is fully supported by MSFT. The first thing we were told was that SAN boot is not supported and that we could not get Microsoft support for this configuration. That turns out to be incorrect. SAN boot is fully supported by Microsoft along with HW partners like NetApp. This Microsoft Knowledge Base article outlines MSFT's support for SAN Boot: http://support.microsoft.com/kb/305547
  2. Zoning is the #1 issue with SAN Boot on FC. In talking with the NetApp support team (who were a HUGE help on this issue), we learned that zoning is the most common problem with SAN boot over Fibre Channel. Because zoning can be complex, it is the most likely source of errors. We strongly recommend you check and then double-check your zoning before opening a support ticket. In our case, the zoning for the server was correct, but we did make a zoning error on another server that we were able to correct on our own.
  3. Windows MPIO support at install time is limited. Because WinPE is not MPIO aware, you can get strange results when deploying to a LUN that is visible via multiple paths. Keep in mind that at install time, Windows boots from boot.wim, which runs WinPE rather than a full Windows install. After the bits are copied locally, Windows reboots to complete the install, and only then is full Windows actually running. Because of this, the NetApp support team recommends having only one path to the LUN at install time and adding the other paths once Windows is up and running and you can enable Windows MPIO.
  4. AND YET… MPIO is strongly recommended for SAN Boot. Because a Windows host will blue screen if it loses access to its boot LUN, MPIO is strongly recommended for boot LUNs. This is documented here: http://technet.microsoft.com/en-us/library/cc786214(WS.10).aspx. This can seem contradictory at first, but the bottom line is that MPIO is good; just add it later, once Windows is up and running correctly.
  5. Yes, but what about Sysprep? It turns out that MPIO is not supported for Sysprep-based deployments: http://support.microsoft.com/kb/2504934. So, again, you need to configure MPIO after deployment when you deploy from Sysprepped images. In the case of NetApp, we strongly recommend using Sysprepped boot LUNs, which you can then clone for new machines. This significantly shortens deployment time compared with doing a full Windows install for each new host.
  6. It's all about BIOS. Installing Windows on a boot LUN requires that Windows Setup sees your target LUN as a valid install target. This means the server must report the drive as a valid install target, or Setup will not let you select the disk. For FC, you need to enable the BIOS setting and select your target LUN in the HBA setup screen; this process varies by vendor. Then you need to make this disk the #1 boot target in your server's BIOS; again, this process varies by manufacturer. As noted above, you should establish only one path. This includes dual-port HBAs: configure only one of the ports.
  7. Where's my disk? Even if you do all of the above correctly, Setup may still refuse to show you the disk. This could be because the correct driver is not present on the install media. One way to fix this is to inject the drivers into your Boot.WIM and Install.WIM. This is required if you are using WDS, but optional if you are hand-building a server from DVD or ISO. In our case, we were building a single master image that we were going to Sysprep, so we simply inserted the driver media and added the drivers manually during Setup.
  8. OK, the disk is there, but I can't install! One funny thing about Windows Setup is that if you are installing from DVD, the DVD must be present to install (duh). That is fine, unless you used the process above to load the driver: to do that, you had to remove the install DVD and insert the driver media. Then you see the disk, click Install, and Windows fails with a fairly obtuse error. You need to remove the driver DVD at this point and put the install DVD back in. Seems obvious, but it took me a few minutes to figure out what was wrong the first time I tried it.
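Because zoning mistakes (tip 2) and extra paths at install time (tips 3 and 6) are the usual culprits, it can help to sanity-check a zoning plan on paper before touching the HBA BIOS. The Python sketch below is a hypothetical helper, not a NetApp or switch-vendor tool: it models each zone as a plain set of WWPNs (the WWPNs are made up) and counts how many initiator/target pairs can see each other. During Windows Setup that count should be exactly one; after MPIO is enabled you can add the second zone.

```python
from itertools import product

# Made-up WWPNs for illustration only; use the values from your own HBAs
# and NetApp target ports.
HOST_HBA_PORTS = ["20:00:00:25:b5:00:00:0a", "20:00:00:25:b5:00:00:0b"]
NETAPP_TARGET_PORTS = ["50:0a:09:81:00:00:00:01", "50:0a:09:82:00:00:00:01"]


def count_paths(zones, initiators, targets):
    """Count initiator/target port pairs that share at least one zone.

    Each pair is a potential path to the boot LUN, provided the LUN is also
    mapped (via an igroup) to that initiator.
    """
    paths = set()
    for members in zones.values():
        for initiator, target in product(initiators, targets):
            if initiator in members and target in members:
                paths.add((initiator, target))
    return len(paths)


# Install time: a single zone containing one HBA port and one target port.
install_zones = {
    "host1_hba0_to_fas_0a": {HOST_HBA_PORTS[0], NETAPP_TARGET_PORTS[0]},
}

# After Windows is running and MPIO is enabled: zone the second HBA port too.
mpio_zones = dict(install_zones)
mpio_zones["host1_hba1_to_fas_0b"] = {HOST_HBA_PORTS[1], NETAPP_TARGET_PORTS[1]}

for label, zones in (("install", install_zones), ("with MPIO", mpio_zones)):
    print(label, count_paths(zones, HOST_HBA_PORTS, NETAPP_TARGET_PORTS))
```

The zone names follow a single-initiator naming pattern purely for readability; the check itself only looks at zone membership.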

The Windows Host Utilities Installation and Setup Guide (http://now.netapp.com/NOW/knowledge/docs/hba/win/relwinhu53/pdfs/setup.pdf) has a very detailed description of Windows support for FC LUNs, including a step-by-step process for configuring boot LUNs.

Categories: netapp

Avoid Server 2008 VMtools and VUM Woes with PowerCLI

October 3, 2011

Last week, after upgrading to vSphere 4.1 U1, I noticed that a lot of our guests did not have the current VMtools installed. After a quick look I realized they were all Windows Server 2008 or 2008 R2 guests. The initial update for all of the guests was done using VUM, but the tools install had hung on every one of those systems.

Prepare to wait.

Apparently VUM triggers the "Interactive Services Dialog Detection" prompt in Windows, which looks like the message below.

Just log in and click this on all your guests; you'll be done by Update 2.

Luckily there is an incredibly easy workaround. Using PowerCLI, a single two-cmdlet pipeline updates your VMtools install without triggering this nasty little message:

Get-VM | Update-Tools

If you don't want the machines to reboot, just add -NoReboot to the end of the Update-Tools command. Here is the syntax for updating only Windows 2008 guests in a cluster named QA:

Get-Cluster "QA" | Get-VM | where {$_.Guest -like "*Server 2008*"} | Update-Tools
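The where filter in that pipeline is a plain wildcard match on the guest OS string, so "*Server 2008*" selects both Windows Server 2008 and 2008 R2 guests. As a quick illustration outside PowerCLI, the Python sketch below mimics PowerShell's case-insensitive -like operator with fnmatch. The VM names and OS strings are hypothetical, and it assumes the Guest value carries the guest OS full name reported by VMware Tools.

```python
from fnmatch import fnmatchcase


def like(value, pattern):
    """Rough stand-in for PowerShell's case-insensitive -like operator.

    Both sides are lower-cased and matched with fnmatch-style wildcards
    (* and ?), which behave the same way as -like for simple patterns.
    """
    return fnmatchcase(value.lower(), pattern.lower())


# Hypothetical guest OS names, similar to what vSphere reports.
guests = {
    "web01": "Microsoft Windows Server 2008 (64-bit)",
    "sql01": "Microsoft Windows Server 2008 R2 (64-bit)",
    "dc01": "Microsoft Windows Server 2003 Standard (32-bit)",
    "app01": "Red Hat Enterprise Linux 5 (64-bit)",
}

selected = sorted(name for name, os_name in guests.items()
                  if like(os_name, "*Server 2008*"))
print(selected)
```

If you only want 2008 R2 guests, a narrower pattern such as "*Server 2008 R2*" would do it.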
Categories: VMware

Set NetApp NFS Export Permissions for vSphere NFS Mounts

October 3, 2011

One of the things missing from the NetApp VSC is the ability to set permissions on NFS exports when you add a host to an existing cluster. If you have a lot of NFS datastores and don't want to set permissions by hand across NetApp arrays every time you add a host, this script should ease the pain. Here are the situations where it is useful:

  1. You change a VMkernel IP for NFS traffic on a host
  2. You add a VMkernel IP for NFS traffic on a host
  3. You add a new host to a cluster
  4. You remove a host from a cluster
You'll notice that removing a host is a reason to run this script. This is important: running this script replaces the existing NFS export permissions with the NFS VMkernel IPs found in the vCenter you run it against. If any additional IP addresses are assigned to the export, they will get blown away by this script! I also thought it would be cool to turn this into a form, so I used PrimalForms to design the very simple front-end you can see below.
The Data ONTAP PowerShell Toolkit 1.2 has support for networking, but we don't have any systems running Data ONTAP 7.3.3 or later, so I wasn't able to make use of those cmdlets in this script. Because of that, I hard-code the NetApp VIFs. Additionally, the way I parse the data depends on the length of the VIF IP address, so every VIF must have the same length; VIFs of different lengths are not supported. The VMkernel ports for NFS are found using a wildcard search for "NFS" in the port group name.
Don't be intimidated by all this code; 99% of it was generated by PrimalForms to build the GUI. Modify the 5 variables up front to add your NetApp VIFs, controller names, and vCenter servers. You can make a simple batch file that calls the script and run it from a desktop icon for a nice, easy way to modify your NFS permissions on vSphere. Thanks to @jasemccarty and @glnsize for help with finding the NFS mount in vSphere!

 

$array1VIF = "10.1.1.40", "10.1.1.41", "10.1.1.42", "10.1.1.43"
$array2VIF = "10.1.1.44", "10.1.1.45", "10.1.1.46", "10.1.1.47"

$array1Name = "netapp1"
$array2Name = "netapp2"

$vCenters = "server1", "server2"

$vifLength = $array1VIF[0].Length
$volStart = $vifLength + 9

#Generated Form Function
function GenerateForm {
##############################################################
# Code Generated By: SAPIEN Technologies PrimalForms
#(Community Edition) v1.0.8.0
# Generated On: 10/24/2010 9:34 PM
# Generated By: theselights.com
##############################################################

#region Import the Assemblies
[reflection.assembly]::loadwithpartialname("System.Windows.Forms") | Out-Null
[reflection.assembly]::loadwithpartialname("System.Drawing") | Out-Null
#endregion

#region Generated Form Objects
$form1 = New-Object System.Windows.Forms.Form
$cancelButton = New-Object System.Windows.Forms.Button
$okButton = New-Object System.Windows.Forms.Button
$groupBox1 = New-Object System.Windows.Forms.GroupBox
$vcenter = New-Object System.Windows.Forms.ComboBox
$groupBox2 = New-Object System.Windows.Forms.GroupBox
$nfsDatastores = New-Object System.Windows.Forms.ListBox
$InitialFormWindowState = New-Object System.Windows.Forms.FormWindowState
#endregion Generated Form Objects

#----------------------------------------------
#Generated Event Script Blocks
#----------------------------------------------
#Provide Custom Code for events specified in PrimalForms.
$handler_vcenter_DropDownClosed= 
{

Connect-VIServer $vcenter.SelectedItem

# Store the list in script scope so the OK button handler can read it
$script:nfsDS = get-datastore | where {$_.Type -eq "NFS"} | get-view | select Name,@{n="url";e={$_.summary.url}}
$nfsDS | % {$nfsDatastores.Items.Add($_.URL.substring($volStart)) | Out-Null }

}

$handler_vcenter_DropDown= 
{

$nfsDS | % {$nfsDatastores.Items.Remove($_.url.substring($volStart)) | Out-Null }
$nfsDatastores.Items.Remove("Select a Virtual Center to gather NFS mounts.")|Out-Null

}

$okButton_OnClick= 
{

$esxNFSIP = Get-VMHostNetworkAdapter -VMKernel | where {$_.PortGroupName -like "*NFS*"} | select IP -Unique
$esxNFSIP = $esxNFSIP | % {$_.IP}

Foreach ($ds in $nfsDS) {

 $nfsVIF = $ds.url.substring(8,$vifLength)
 $nfsMount = $ds.url.substring($volStart)
 $nfsName = $ds.name

 #//// Set permissions on source NFS exports

 $array1VIF | % { If ($_ -eq $nfsVIF) { $storageArray = $array1Name } }
 $array2VIF | % { If ($_ -eq $nfsVIF) { $storageArray = $array2Name } }

 Connect-NaController $storageArray

 Set-NaNfsExport $nfsMount -Persistent -ReadWrite $esxNFSIP -Root $esxNFSIP

 }

}

$cancelButton_OnClick= 
{

$form1.close()

}

$OnLoadForm_StateCorrection=
{#Correct the initial state of the form to prevent the .Net maximized form issue
 $form1.WindowState = $InitialFormWindowState
}

#----------------------------------------------
#region Generated Form Code
$form1.Text = "Set VMware NFS Permissions"
$form1.Name = "form1"
$form1.DataBindings.DefaultDataSourceUpdateMode = 0
$System_Drawing_Size = New-Object System.Drawing.Size
$System_Drawing_Size.Width = 344
$System_Drawing_Size.Height = 379
$form1.ClientSize = $System_Drawing_Size

$cancelButton.TabIndex = 5
$cancelButton.Name = "cancelButton"
$System_Drawing_Size = New-Object System.Drawing.Size
$System_Drawing_Size.Width = 103
$System_Drawing_Size.Height = 23
$cancelButton.Size = $System_Drawing_Size
$cancelButton.UseVisualStyleBackColor = $True

$cancelButton.Text = "Cancel"

$System_Drawing_Point = New-Object System.Drawing.Point
$System_Drawing_Point.X = 204
$System_Drawing_Point.Y = 328
$cancelButton.Location = $System_Drawing_Point
$cancelButton.DataBindings.DefaultDataSourceUpdateMode = 0
$cancelButton.add_Click($cancelButton_OnClick)

$form1.Controls.Add($cancelButton)

$okButton.TabIndex = 4
$okButton.Name = "okButton"
$System_Drawing_Size = New-Object System.Drawing.Size
$System_Drawing_Size.Width = 103
$System_Drawing_Size.Height = 23
$okButton.Size = $System_Drawing_Size
$okButton.UseVisualStyleBackColor = $True

$okButton.Text = "Set Permissions"

$System_Drawing_Point = New-Object System.Drawing.Point
$System_Drawing_Point.X = 45
$System_Drawing_Point.Y = 328
$okButton.Location = $System_Drawing_Point
$okButton.DataBindings.DefaultDataSourceUpdateMode = 0
$okButton.add_Click($okButton_OnClick)

$form1.Controls.Add($okButton)

$groupBox1.Name = "groupBox1"

$groupBox1.Text = "Virtual Center"
$System_Drawing_Size = New-Object System.Drawing.Size
$System_Drawing_Size.Width = 265
$System_Drawing_Size.Height = 94
$groupBox1.Size = $System_Drawing_Size
$System_Drawing_Point = New-Object System.Drawing.Point
$System_Drawing_Point.X = 42
$System_Drawing_Point.Y = 26
$groupBox1.Location = $System_Drawing_Point
$groupBox1.TabStop = $False
$groupBox1.TabIndex = 2
$groupBox1.DataBindings.DefaultDataSourceUpdateMode = 0

$form1.Controls.Add($groupBox1)
$vcenter.FormattingEnabled = $True
$System_Drawing_Size = New-Object System.Drawing.Size
$System_Drawing_Size.Width = 226
$System_Drawing_Size.Height = 21
$vcenter.Size = $System_Drawing_Size
$vcenter.DataBindings.DefaultDataSourceUpdateMode = 0
$vcenter.Name = "vcenter"
$vCenters | % {$vcenter.Items.Add($_) | out-null}
$System_Drawing_Point = New-Object System.Drawing.Point
$System_Drawing_Point.X = 19
$System_Drawing_Point.Y = 35
$vcenter.Location = $System_Drawing_Point
$vcenter.TabIndex = 0
$vcenter.add_DropDownClosed($handler_vcenter_DropDownClosed)
$vcenter.add_DropDown($handler_vcenter_DropDown)

$groupBox1.Controls.Add($vcenter)

$groupBox2.Name = "groupBox2"

$groupBox2.Text = "NFS Mounts"
$System_Drawing_Size = New-Object System.Drawing.Size
$System_Drawing_Size.Width = 262
$System_Drawing_Size.Height = 167
$groupBox2.Size = $System_Drawing_Size
$System_Drawing_Point = New-Object System.Drawing.Point
$System_Drawing_Point.X = 45
$System_Drawing_Point.Y = 141
$groupBox2.Location = $System_Drawing_Point
$groupBox2.TabStop = $False
$groupBox2.TabIndex = 3
$groupBox2.DataBindings.DefaultDataSourceUpdateMode = 0

$form1.Controls.Add($groupBox2)
$nfsDatastores.FormattingEnabled = $True
$System_Drawing_Size = New-Object System.Drawing.Size
$System_Drawing_Size.Width = 226
$System_Drawing_Size.Height = 134
$nfsDatastores.Size = $System_Drawing_Size
$nfsDatastores.DataBindings.DefaultDataSourceUpdateMode = 0
$nfsDatastores.Items.Add("Select a Virtual Center to gather NFS mounts.")|Out-Null
$nfsDatastores.HorizontalScrollbar = $True
$nfsDatastores.Name = "nfsDatastores"
$System_Drawing_Point = New-Object System.Drawing.Point
$System_Drawing_Point.X = 16
$System_Drawing_Point.Y = 24
$nfsDatastores.Location = $System_Drawing_Point
$nfsDatastores.TabIndex = 0

$groupBox2.Controls.Add($nfsDatastores)

#endregion Generated Form Code

#Save the initial state of the form
$InitialFormWindowState = $form1.WindowState
#Init the OnLoad event to correct the initial state of the form
$form1.add_Load($OnLoadForm_StateCorrection)
#Show the Form
$form1.ShowDialog()| Out-Null

} #End Function

#Call the Function
GenerateForm
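The fixed-offset substring parsing is what ties the script to VIFs of a single length. A pattern match removes that restriction. The Python sketch below is not part of the original script: it assumes the datastore URL has the form netfs://<VIF>//vol/<volume>/, which is what the script's offsets (8 characters for "netfs://", then the VIF, then one "/") imply, and it maps the VIF to a controller with a dictionary built from the same five variables.

```python
import re

# Assumed URL shape, inferred from the script's substring offsets:
#   netfs://10.1.1.40//vol/nfs_ds1/
NFS_URL = re.compile(r"^netfs://(?P<vif>[^/]+)/(?P<export>/.*)$")

# Same data as the script's $array1VIF/$array2VIF and controller names.
ARRAYS = {
    "netapp1": ["10.1.1.40", "10.1.1.41", "10.1.1.42", "10.1.1.43"],
    "netapp2": ["10.1.1.44", "10.1.1.45", "10.1.1.46", "10.1.1.47"],
}
VIF_TO_ARRAY = {ip: name for name, ips in ARRAYS.items() for ip in ips}


def parse_nfs_url(url):
    """Split an NFS datastore URL into (VIF address, export path)."""
    match = NFS_URL.match(url)
    if match is None:
        raise ValueError(f"unexpected NFS datastore URL: {url!r}")
    return match.group("vif"), match.group("export")


def controller_for(url):
    """Return (controller name or None, export path) for a datastore URL."""
    vif, export = parse_nfs_url(url)
    return VIF_TO_ARRAY.get(vif), export


print(controller_for("netfs://10.1.1.45//vol/nfs_ds2/"))
print(parse_nfs_url("netfs://10.1.1.9//vol/short_ip/"))
```

Returning None for an unknown VIF makes a missing mapping obvious, instead of silently reusing the controller name left over from the previous datastore in the loop.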
Categories: netapp, VMware

Safely Virtualize Oracle on NetApp, VMware, and UCS

October 3, 2011

Virtualizing your Tier 1 applications is one of the last hurdles on the way to a truly dynamic and flexible datacenter, and large Oracle databases almost always fall into that category. In the past, much of the concern revolved around performance, but with faster hardware and support for ever-larger virtual machines this worry is starting to fade. The lingering question is what your software vendor does and does not support in a virtual environment.
Although Oracle has relaxed its stance on virtualization, it takes the same approach most vendors do when it comes to support in virtual environments. Take, for example, the following excerpt from Oracle's database support matrix: "Oracle will provide support for issues that are known to occur on the native OS, or can be demonstrated not to be a result of running on the server virtualization software. Oracle may request that the problem be reproduced on the native hardware."

That last part is the killer for most companies. If there is a problem, how could you quickly re-create a multi-terabyte database on physical hardware once it is virtualized? Luckily, NetApp, VMware, and Cisco UCS provide a very elegant solution to this issue. Let's look at a simple diagram depicting a 10TB virtualized Oracle DB instance connected via 10GbE and using Oracle's Direct NFS client.
The guest OS has been virtualized and resides on the VMFS datastore, the vSphere host is booting from SAN, and the database is directly hosted and accessed on the NetApp array using NFS. Each data volume in the picture is connected using a different technology to illustrate protocol independence (outside of Oracle where NFS is used for simplicity of setup).
As you can see from the diagram, the real challenge is re-creating that 10TB database in a way that is fast and cost effective. NetApp's FlexClone technology allows the instant creation of zero-space virtual copies. The process is similar to VMware's linked clones, but NetApp does it with LUNs or file data, and with no performance hit.
To build your safety net follow the steps below.
  1. Create LUN on NetApp array
  2. Create UCS Service Profile Template
  3. Configure Service Profile Template and set to boot from LUN in step 1
  4. Deploy Service Profile from template
  5. Install same OS as virtualized instance (OEL 5.5 in this case)
  6. Create FlexClone of Oracle files/volumes
  7. Create exports and set permissions for newly created server
  8. Configure OS with mount points designed for FlexCloned file/volume
At this point you have a full physical environment of that 10TB virtualized Oracle database. The diagram below shows what this looks like.
The next step is to clean this up since you don’t want this UCS blade occupied with the test environment.
  1. Shut down the OS
  2. Delete the Service Profile (not the template)
  3. Delete the FlexClone(s)
Now, if you have some nasty database issue and Oracle tells you to reproduce it on physical hardware, you can listen on the phone as the support engineer's jaw hits the floor when you tell him to give you 5 minutes. The entire process can be scripted easily using the Data ONTAP and UCS PowerShell Toolkits, or with an orchestration tool of your choice.
Reserving a blade or two for this unlikely scenario may seem wasteful to some, but because of the flexibility of UCS you can quickly put that hardware into production for things like hardware maintenance without a performance hit, or as capacity on demand for your vSphere environment. With NetApp, VMware, and Cisco you can safely and efficiently take your company to a 100% virtualized private cloud environment.
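If you do script it, the main design point is that cleanup must run even when the reproduction itself fails, so the blade and the FlexClones are always released. The Python sketch below shows only that structure: the step names are hypothetical placeholders for Data ONTAP and UCS PowerShell Toolkit calls, and it assumes the boot LUN (with OEL already installed) and the service profile template from the one-time setup still exist.

```python
from contextlib import contextmanager

# Hypothetical step names; in a real script each would be a toolkit call.
REPRO_STEPS = [
    "deploy service profile from template (boots OEL from SAN LUN)",
    "create FlexClone of Oracle volumes",
    "create exports and set permissions for the physical server",
    "mount FlexCloned volumes on the physical OS",
]
CLEANUP_STEPS = [
    "shut down the OS",
    "delete the service profile (keep the template)",
    "delete the FlexClone(s)",
]


@contextmanager
def physical_repro(run_step):
    """Stand up the physical copy, hand control to the caller, always clean up.

    Cleanup runs even if reproducing the issue raises an exception. A more
    careful version would undo only the steps that actually completed.
    """
    try:
        for step in REPRO_STEPS:
            run_step(step)
        yield
    finally:
        for step in CLEANUP_STEPS:
            run_step(step)


log = []
with physical_repro(log.append):
    log.append("reproduce the issue for Oracle support")
print("\n".join(log))
```

Here run_step just records each step; swapping in a function that actually calls the toolkits is what would turn this into a working runbook.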
Categories: netapp, VMware

NetApp and VMware View 5,000-Seat Performance Report

September 28, 2011

This report is a follow-up to the 50,000-Seat VMware View Deployment white paper, in which NetApp analyzed a single pod of 5,000 virtual desktops. This report is an in-depth storage performance analysis of what VDI really is. VDI is not only about steady state, login, or boot; it's about all phases in the life span of the virtual desktop. Below is one of the many charts and graphs that help demonstrate this fact. The chart shows that each phase has its own unique characteristics and so impacts storage very differently.

[Figure: Lifecycle]

For simplicity, NetApp takes a unique approach in this document and overlays the performance tests on top of a calendar. This way, each of the different events in "2 weeks in the life" of a virtual desktop can be easily analyzed and explained.

NetApp measured the deployment of 2,500 virtual desktops using the NetApp Virtual Storage Console. We then look at first login, where the user has never logged into the virtual machine before. This simulates a scenario where the desktop has been re-provisioned after patching or something similar. We look at a cached login, for example on "a Tuesday", where the user has already logged onto the desktop and this is the second time they log in. Here the user logs in and starts working, which is probably the most common login workload. We then look at a boot storm, where the environment has to be shut down and rebooted, to demonstrate that with NetApp and VST, rebooting an entire VDI environment can be done quite rapidly (5,000 VMs in 18 minutes, to be exact). This demonstrates that booting or rebooting an entire environment doesn't have to take forever!


So what does all this mean, and what do we look at in this paper? We dive deep into read/write ratios, I/O sizes, and sequential vs. random access, and demonstrate that it's not all about IOPS.

Customers are often asked by their partners, virtualization vendors, and storage vendors, "How many IOPS are your desktops doing?" They often reply with a number like 16 IOPS, or with an even more generic response like "we have a percentage of task workers, a percentage of power users, and a percentage of developers". If the response is along these lines, the environment will almost certainly be sized wrong.

Let's take the simplest sizing approach…

Vendor: “Mr Customer, how many IOPS do you need each of your desktops to do?”

Customer: “Great question, I need my desktops to each do 16IOPS!”

Vendor: “Thanks for the info!  I’ll get you a sizing right away!”

OK, does anyone else see the significant flaw in this sizing methodology? Let's do some simple math to see how it could go wrong…

If my I/O size is 4K, then: 16 IOPS x 4 KB/IO = 64 KB/sec

If my I/O size is 32K, then: 16 IOPS x 32 KB/IO = 512 KB/sec

So 16 IOPS != 16 IOPS: there is a difference of 448 KB/sec between the two calculations.
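The same arithmetic as a tiny Python function, which makes it easy to plug in other I/O sizes. It assumes a single fixed I/O size, which is exactly the simplification the rest of this post argues against.

```python
def throughput_kb_per_sec(iops, io_size_kb):
    """Bandwidth generated by an IOPS rate at a fixed I/O size."""
    return iops * io_size_kb


small_io = throughput_kb_per_sec(16, 4)    # 64 KB/sec
large_io = throughput_kb_per_sec(16, 32)   # 512 KB/sec
print(small_io, large_io, large_io - small_io)  # 64 512 448
```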

Why, then, does everyone size only for IOPS and not ask the more difficult questions? There are so many other questions that MUST BE ASKED!

Are the IOPS 4K or 32K or a blend of all sizes? Are these reads or writes? Are they sequential or random?  Each of these has a SIGNIFICANT impact on storage as you can see by the example above!

This is why it is so important to perform an assessment with a product like Liquidware Labs Stratusphere Fit. Then, and only then, are you able to get the sizing right the first time!

Here are a couple of key takeaways from the paper!

  1. Assessments are the only way to get VDI right!
  2. VDI is not all small-block random reads.
  3. There's more to VDI than steady state.
  4. Login storms are the hardest workloads, because they mix reads and writes.
  5. IOPS is only one part of a much larger story.
    1. Saying my desktops NEED 16 IOPS is useless!
    2. Saying 16 IOPS, 80% reads / 20% writes, 20K reads / 8K writes, 50% sequential / 50% random reads gets you a correct sizing!
  6. Memory overcommitment hurts badly… The answer: buy more memory for your hosts, or buy more storage!
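To show why the fuller description in takeaway 5 is so much more useful, here is a small Python sketch that breaks that example profile into read and write IOPS, random reads, and bandwidth, then scales it to a 5,000-desktop pod like the one in the report. The class and field names are illustrative, I/O sizes are in KB, and the model deliberately ignores caching, write coalescing, and how the mix changes between login, boot, and steady state.

```python
from dataclasses import dataclass


@dataclass
class DesktopProfile:
    """Per-desktop I/O profile; the field names are illustrative."""
    iops: float
    read_fraction: float            # e.g. 0.8 for 80% reads
    read_size_kb: float
    write_size_kb: float
    sequential_read_fraction: float

    @property
    def read_iops(self):
        return self.iops * self.read_fraction

    @property
    def write_iops(self):
        return self.iops * (1 - self.read_fraction)

    @property
    def read_kb_per_sec(self):
        return self.read_iops * self.read_size_kb

    @property
    def write_kb_per_sec(self):
        return self.write_iops * self.write_size_kb

    @property
    def random_read_iops(self):
        return self.read_iops * (1 - self.sequential_read_fraction)


desktop = DesktopProfile(iops=16, read_fraction=0.8, read_size_kb=20,
                         write_size_kb=8, sequential_read_fraction=0.5)
seats = 5000  # one pod, as in the report

print(f"read IOPS:        {desktop.read_iops * seats:,.0f}")
print(f"write IOPS:       {desktop.write_iops * seats:,.0f}")
print(f"random read IOPS: {desktop.random_read_iops * seats:,.0f}")
print(f"read MB/sec:      {desktop.read_kb_per_sec * seats / 1024:,.1f}")
print(f"write MB/sec:     {desktop.write_kb_per_sec * seats / 1024:,.1f}")
```

Two profiles with the same 16 IOPS can produce very different numbers here, which is the point of asking for the full breakdown.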

http://media.netapp.com/documents/tr-3949.pdf

Categories: netapp, VMware

Convert VMware SQL Server Express to SQL Server

August 10, 2011

One of the things I realized about having SQL Server Express installed with the vCenter server is that if the vCenter Server crashes (if it is a stand-alone physical server), you are stuck! You are trying like hell to get this server back up. However, if you have a SQL Server 2008 Standard Edition database server, why NOT use it? And if you can convince your company, non-profit, or hospital to pony up for the SQL Server 2008 license, I would say go for it! Yes, you "can" install SQL Server 2008 Express on a server and have vCenter connect to it. However, this tutorial is for an environment where you want a centralized SQL Server 2008 database server. This server will be the DB server for vCenter, VUM, and whatever else you want to use it for.

The scenario is based on 2 physical stand-alone Dell PowerEdge R310 servers (DC/vCenter) and 5 Dell PowerEdge R710 servers, running Windows Server 2008 Datacenter Edition, VMware ESXi 4.1, vCenter Standard 4.1, and a NetApp 3270.

40 VMs consisting of 2K3, 2K8, 2K8R2, RHEL 5.5, SUSE11-4VMware, UBUNTU10.4 Templates, DHCP on VM, DC2, DC3, File Server Cluster (2 Clustered on iSCSI SAN drive), Print Server Cluster (2), AV, WSUS, SQLSRV2K8 DB SRV, PROXY RHEL Cluster (2), VUM, VDR, VSHIELD (5), vDistributed Switches for 1Gb Ethernet Intel NICs, HA/DRS Clusters.

Here we go…Screenshots soon to come.

Before you start, make sure your ISOs are on a share so that you can access them. Use Virtual Clone Drive to mount the ISOs; Virtual Clone Drive should be an ABSOLUTE MUST in your VMware arsenal.

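1. Stop all of the VMware vCenter services (i.e. vCenter, VUM, etc.).
--> Start >> Run >> services.msc. If a service name says vCenter, stop it; leave VMware Tools alone. Also stop the SQL Server Express service that hosts the vCenter database; otherwise the database files stay locked and cannot be copied.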

2. Copy the vCenter database files (VIM_VCDB.mdf and its matching .ldf log file) to your SQL 2008 server.
--> Start >> Run >> \\SQLSRV2K8DB\c$\Program Files\Microsoft SQL Server\ and press Enter. Browse to the instance's "Data" folder and drop the database files into it.

3. Log in to your SQL Server and launch SQL Server Management Studio. Connect to the server, right-click Databases, and "Attach" the database. (I created a new database and imported the migrated DB into it. You may decide to use the attached DB directly, but I wouldn't.)

4. Right-click on the database and Backup the Database with a Full Backup.
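Steps 3 and 4 use the SSMS GUI. If you prefer a query window or sqlcmd, the equivalent T-SQL is CREATE DATABASE ... FOR ATTACH followed by BACKUP DATABASE. The small Python helper below only builds that script text so the quoting is right. The database name, file paths, and the .ldf file name are hypothetical placeholders, and it attaches the copied files under their original name; importing into a new VCENTER database, as described in step 3, is a separate step.

```python
def tsql_string(value):
    """Quote a T-SQL string literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def tsql_name(name):
    """Quote a T-SQL identifier with brackets."""
    return "[" + name.replace("]", "]]") + "]"


def attach_and_backup(db_name, mdf_path, ldf_path, backup_path):
    """Build T-SQL that attaches copied .mdf/.ldf files, then takes a full backup."""
    db = tsql_name(db_name)
    return "\n".join([
        f"CREATE DATABASE {db}",
        f"    ON (FILENAME = {tsql_string(mdf_path)}),",
        f"       (FILENAME = {tsql_string(ldf_path)})",
        "    FOR ATTACH;",
        "GO",
        f"BACKUP DATABASE {db}",
        f"    TO DISK = {tsql_string(backup_path)}",
        "    WITH INIT;",
        "GO",
    ])


# Hypothetical paths; point these at the folder you copied the files into.
script = attach_and_backup(
    "VIM_VCDB",
    r"C:\SQLData\VIM_VCDB.mdf",
    r"C:\SQLData\VIM_VCDB_log.ldf",
    r"C:\SQLBackup\VIM_VCDB_full.bak",
)
print(script)
```

GO is a batch separator understood by SSMS and sqlcmd rather than a T-SQL statement; WITH INIT overwrites any earlier backup sets in the same .bak file.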

5. Open ODBC (under Administrative Tools). Go to System DSN, set up a DSN that uses the SQL Server Native Client, and test connectivity. Make sure your DB is the default database (I named mine VCENTER, since you can't jack this up and even Joe New Guy will know not to touch this database). Make sure you can connect to the server, because if you can't, guess what? vCenter won't either.

6. Uninstall vCenter Server from the Server and just re-install it. Point the vCenter DB to the new SQL Server 2008 Server and make sure you DO NOT OVERWRITE THE DATABASE!!!

7. Launch vCenter, and if it comes up, you're golden!

References:
http://www.vmware.com/files/pdf/vc_microsoft_sql_server.pdf
http://www.ntpro.nl/blog/archives/1423-How-to-migrate-the-vCenter-database-to-Microsoft-SQL-Server-2008.html
http://get-admin.com/blog/?p=646
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1028601
http://www.sysadmintutorials.com/tutorials/vmware-vsphere-4/esx4/installing-vmware-view-4-composer/

The 4 Most Common Misconfigurations with NetApp Deduplication

August 2, 2011

Being a field engineer, I work with customers from all industries. When I tell customers that the usual deduplication savings I see on production VMware workloads is 60-70%, I am often met with skepticism. "But my VM workload is different" is the usual response, followed by "I'll believe it when I see it". I also get the occasional "That's not what your competitor tells me I will see." I love those ones.

Consistently, though, when customers do a proof of concept or simply buy our gear and begin their implementation, this is exactly the savings they tend to see in their VMware environment. Quite recently one of my clients moved 600+ VMs, which were using 11.9TB of disk on their incumbent array, to a new NetApp array. Those 600 VMs of varied application, OS type, and configuration deduped down to 3.2TB, a 73% savings!

Once in a while though I get the call from a customer saying “Hey, I only got 5% dedupe! What gives?” These low dedupe numbers are almost always because of one of the following deduplication configuration mistakes.

Misconfiguration #1 – Not turning on dedupe right away (or forgetting the -s or scan option)

As Dr. Dedupe pointed out in a recent blog post, NetApp recommends deduplication on all VMware workloads. If you use our Virtual Storage Console (VSC) plug-in for vCenter, you may have noticed that creating a VMware datastore with the plug-in results in dedupe being turned on. We recommend enabling dedupe right away for a number of reasons, but here is the primary one:

Enabling dedupe (ASIS) on a NetApp volume makes the controller start tracking the new blocks written to that volume. Then, during the scheduled deduplication pass, the controller looks at those new blocks and eliminates any duplicates. What if, however, you already had some VMs in the volume before you enabled deduplication? Unless you specifically told the NetApp to scan the existing data, those VMs are never examined or deduped! That is what produces the low dedupe results. The good news: this is a very easy fix. Simply start a deduplication pass from the VSC with the "scan" option enabled, or from the command line with the "-s" switch.

[Figure: VSC deduplication management]

Above, where to enable a deduplication volume scan in VSC.

Below, how to do one in System Manager:

[Figure: System Manager deduplication settings]

For you command-line guys, it's "sis start -s /vol/myvol". Note the -s; amazing what 2 characters can do!

This is by far the most common mistake I come across, but thanks to more customers provisioning their VMware storage with the free VSC plug-in, it is becoming less common.
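To see why skipping the scan hurts so much, here is a toy Python model of a dedupe pass. It is not how Data ONTAP fingerprints blocks; it only encodes the behavior described above: blocks written before dedupe was enabled are never examined and each keeps its own physical block, while blocks written afterwards are fingerprinted and duplicates share storage. Running with the scan is modeled as examining everything from the first block. The 10 + 2 cloned VMs are made-up numbers.

```python
import hashlib

BLOCK_SIZE = 4096


def physical_blocks(blocks, first_tracked):
    """Toy model of a dedupe pass.

    Blocks before `first_tracked` were written before dedupe was enabled and
    are never examined, so each one keeps its own physical block. Blocks from
    `first_tracked` on are fingerprinted and duplicates share one block.
    A scan (-s) corresponds to first_tracked=0.
    """
    untracked = len(blocks[:first_tracked])
    tracked = {hashlib.sha256(b).digest() for b in blocks[first_tracked:]}
    return untracked + len(tracked)


# A 100-block "template VM"; every clone writes the same blocks.
template = [bytes([i]) * BLOCK_SIZE for i in range(100)]
before_enable = template * 10   # 10 VMs already in the datastore
after_enable = template * 2     # 2 VMs added after dedupe was turned on
volume = before_enable + after_enable

for label, start in (("no scan", len(before_enable)), ("with -s", 0)):
    used = physical_blocks(volume, start)
    print(f"{label}: {used} of {len(volume)} blocks, "
          f"{1 - used / len(volume):.0%} saved")
```

Without the scan only the two newest VMs collapse onto each other; with it, all twelve share one set of blocks.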

Misconfiguration #2 – LUN reservations

Thin provisioning has gotten a bad reputation in the last few years. Storage admins who have been burned by thin provisioning in the past tend to get a bit reservation-happy. On a NetApp controller we have multiple levels of reservations depending on your needs, but with regard to VMware two stand out. First, there is the volume reservation. This reserves space from the large storage pool (the aggregate) and ensures that whatever object you place into that volume has space. Inside the volume we then create the LUN for VMware. Again, you can choose to reserve space for the LUN, which removes that space from the available space in the volume. There are two problems with this. First, there is no need to do it: you have already reserved the space with the volume reservation, so there is no need to reserve it AGAIN with a LUN reservation. Second, the LUN reservation means that the unused space in the LUN will always consume the reserved space. That is, a 600GB LUN with space reservation turned on will consume 600GB of space with no data in it. Deduping a space-reserved LUN will reclaim some space from the used data, but any unused space will remain reserved.

For example, say I had a 90GB LUN in a 100GB volume, and the LUN was reserved. With no data in the LUN, the volume will show 90GB used: the unused but reserved LUN. Now I place 37GB of data in the LUN. The volume will still show 90GB used; no change. Next I dedupe that 37GB, and say it dedupes to 10GB. The volume will now report 63GB used, since I reclaimed 27GB from deduping. However, when I remove the LUN reservation, I can see the data is actually taking up only 10GB, with the volume now reporting 90GB free. [I updated this section from my original post. Thanks to Svetlana for pointing out my error here.]

In these cases, simply deselecting the LUN reservation reveals the actual savings from dedupe (yes, this can be done live with the VMs running). Once the actual dedupe savings are displayed (likely back in that 60-70% range), we can adjust the size of the volume to suit the size of the actual data in the LUN (yes, this too can be done live).
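The 90GB example boils down to one rule, written out below as a small Python function. This follows the simplified accounting used in this post only; it ignores fractional reserve, Snapshot reserve, and metadata overhead, so treat it as a way to check the arithmetic rather than a model of Data ONTAP space accounting.

```python
def lun_space_used_gb(lun_size_gb, data_gb, after_dedupe_gb, reserved):
    """Space a LUN consumes in its volume, per the simplified model in this post.

    With a space reservation the whole LUN is held; dedupe gives back only
    the blocks it reclaimed. Without one, only the deduped data is charged.
    """
    if reserved:
        return lun_size_gb - (data_gb - after_dedupe_gb)
    return after_dedupe_gb


VOLUME_GB, LUN_GB = 100, 90

print(lun_space_used_gb(LUN_GB, 0, 0, reserved=True))     # 90, empty LUN
print(lun_space_used_gb(LUN_GB, 37, 37, reserved=True))   # 90, before dedupe
print(lun_space_used_gb(LUN_GB, 37, 10, reserved=True))   # 63, after dedupe
used = lun_space_used_gb(LUN_GB, 37, 10, reserved=False)
print(used, VOLUME_GB - used)                              # 10 90
```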


Misconfiguration #3 – Misaligned VMs

The problem of some guest operating systems being misaligned with the underlying storage architecture has been well documented. In some cases, though, this misalignment can cause lower-than-expected deduplication numbers. Clients are often surprised (I know I was) at how many blocks we can dedupe between unlike operating systems, say between Windows 2003 and 2008, or Windows XP and 2003. However, if the starting offset of one OS type is different from the starting offset of the other, then almost none of the blocks will align.

In addition to lowering your dedupe savings and using more disk space than required, misalignment can also place more load on your storage controller (any storage controller; this is not a NetApp-specific problem). Thus it is a great idea to fix this situation. There are a number of tools on the market that can correct it, including the MBRalign tool, which is free for NetApp customers and included as part of the VSC. As you align the misaligned VMs, you will see your dedupe savings rise and your controller load decrease. Goodness!
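Why does the starting offset matter so much? Deduplication compares fixed 4 KB blocks, so the same guest data only produces identical blocks when it starts at the same position relative to the 4 KB boundaries. The Python toy below (not a NetApp tool, and it ignores everything about real filesystems except the partition start) writes identical data behind a sector-63 partition start, the Windows XP/2003 default, and behind a 1 MiB start, the Windows 2008 default, then counts the non-zero 4 KB blocks the images have in common.

```python
import hashlib
import random

BLOCK = 4096    # WAFL block size in bytes
SECTOR = 512


def block_hashes(image):
    """Hashes of every full, non-zero 4 KiB block in a disk image."""
    zero = bytes(BLOCK)
    hashes = set()
    for offset in range(0, len(image) - BLOCK + 1, BLOCK):
        chunk = image[offset:offset + BLOCK]
        if chunk != zero:
            hashes.add(hashlib.sha256(chunk).digest())
    return hashes


def disk_image(partition_start_sector, payload):
    """Toy virtual disk: zeros up to the partition start, then guest data."""
    return bytes(partition_start_sector * SECTOR) + payload


rng = random.Random(0)
payload = bytes(rng.randrange(256) for _ in range(16 * BLOCK))  # identical data

legacy = disk_image(63, payload)      # XP/2003 default: starts at sector 63
aligned = disk_image(2048, payload)   # 2008 default: starts at 1 MiB
aligned_too = disk_image(2048, payload)

print(len(block_hashes(aligned) & block_hashes(aligned_too)))  # 16
print(len(block_hashes(legacy) & block_hashes(aligned)))       # 0
```

The 63-sector start puts the data 512 bytes past a 4 KB boundary, so every block of the legacy image straddles two blocks of the aligned one and nothing matches.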

Misconfiguration #4 – Large amounts of data in the VMs

Now, this one isn't really a misconfiguration; it's more of a design option. You see, most of my customers do not separate their data from their boot VMDK files. The simplicity of having an entire VM in a single folder is just too good to mess with. Customers are normally still able to achieve very high deduplication ratios even with the application data mixed in with the OS data blocks. Sometimes, though, customers have very large data files, such as large database files, large image file repositories, or large message datastores, mixed in with the VM. These large data files tend not to deduplicate well and so drive down the percentage seen. No harm is done, though, since the NetApp will still deduplicate all the OS and other data around these large sections. However, the customer can also move these VMDKs off to other datastores, which exposes the higher dedupe ratios on the remaining application and OS data. Either option is fine.

So there it is: the 4 most common misconfigurations I see with deduplication on NetApp in the field. Please feel free to post and share your savings; we always love to hear from our customers directly.
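To see how a few large, non-deduplicating VMDKs pull the headline number down, the Python sketch below computes savings as 1 - physical/logical, using the 600-VM figures from earlier in this post plus a hypothetical 2TB of database data that doesn't dedupe at all.

```python
def savings(logical_tb, physical_tb):
    """Fraction of space saved by deduplication."""
    return 1 - physical_tb / logical_tb


os_and_apps = (11.9, 3.2)   # logical, physical (the 600-VM example)
big_db = (2.0, 2.0)         # hypothetical 2TB of data that doesn't dedupe

print(f"{savings(*os_and_apps):.0%}")  # 73%
combined = savings(os_and_apps[0] + big_db[0], os_and_apps[1] + big_db[1])
print(f"{combined:.0%}")  # 63%
```

The OS and application blocks dedupe exactly as well in both cases; only the reported percentage changes, which is why moving those VMDKs to their own datastore "exposes" the higher ratio.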

Categories: netapp