Tuesday, December 10, 2013

NFS VAAI Statistics for NetApp Storage

Welcome: To stay updated with all my Blog posts follow me on Twitter @arunpande !!
 
In this blog I will discuss the NFS VAAI statistics that can be used on NetApp storage to measure performance and troubleshoot VAAI-related issues. These statistics will help you determine whether Copy Offload is actually being used by the storage array. I will cover both 7-Mode and Clustered Data ONTAP.


On the NetApp storage, use the following commands to monitor the NFS VAAI statistics. I have highlighted the important stats in red throughout the blog. Note that I have deliberately removed some metrics from the output to make it more readable.


In general, irrespective of the Data ONTAP version, you can use sysstat -x 1 to monitor CPU, memory, disk, network and other utilization. When VAAI primitives are used, network utilization is comparatively low relative to disk usage, because the copy and clone operations are offloaded to the storage array, which reduces traffic between the ESXi hosts and the NetApp array. This command can therefore give some indication that Copy Offload and the other primitives are working. However, it is not conclusive, because other workloads may keep network usage high even while VAAI is in use. To precisely monitor copy successes and errors, use the following commands.
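For example, on a 7-Mode controller you could run the command below while a clone or Storage vMotion is in progress and compare the network in/out columns against the disk read/write columns (a quick illustration; the exact column headings vary slightly between releases):

fas2040> sysstat -x 1

With Copy Offload working you would expect the disk columns to stay busy while the network columns remain relatively quiet.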


  1. Data ONTAP 7-Mode – In 7-Mode, two commands are available to view the NFS VAAI statistics.
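Before looking at the counters, it is worth confirming that vStorage (VAAI over NFS) support is enabled on the controller. This is a minimal sanity check, assuming the 7-Mode option name nfs.vstorage.enable used on the systems I have worked with:

fas2040> options nfs.vstorage.enable
fas2040> options nfs.vstorage.enable on

The first command displays the current value; the second enables the option if it is off.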


fas2040> nfs vstorage stats
NFS COL counters are :
                    Copy Reqs: 0
                   Abort Reqs: 0
                  Status Reqs: 0
                  Notify Reqs: 0
                  Revoke Reqs: 0
                Invalid Parms: 0
       Authorization Failures: 0
      Authentication Failures: 0
              Copy Fail ISDIR: 0
            Copy Fail OFFLINE: 0
              Copy Fail STALE: 0
                 Copy Fail IO: 0
            Copy Fail NOSPACE: 0
          Copy Fail DISKQUOTA: 0
           Copy Fail READONLY: 0
               Copy Fail PERM: 0
            Copy Fail EXPIRED: 0
           Copy Fail RESOURCE: 0
           Copy Fail TOOSMALL: 0
        Copy Fail BAD STATEID: 0
              Copy Fail OTHER: 0
               Intravol Moves: 0
               Intervol Moves: 0
               Fail Space RES: 0



fas2040> nfs stat


Server rpc:
TCP:
calls       badcalls    nullrecv    badlen      xdrcall
2           0           0           0           0


UDP:
calls       badcalls    nullrecv    badlen      xdrcall
0           0           0           0           0


IPv4:
calls       badcalls    nullrecv    badlen      xdrcall
2           0           0           0           0


IPv6:
calls       badcalls    nullrecv    badlen      xdrcall
0           0           0           0           0


Server nfs:
calls       badcalls
2           0


Server nfs V3: (2 calls)
null       getattr    setattr    lookup     access     readlink   read
2 100%     0 0%       0 0%       0 0%       0 0%       0 0%       0 0%
write      create     mkdir      symlink    mknod      remove     rmdir
0 0%       0 0%       0 0%       0 0%       0 0%       0 0%       0 0%
rename     link       readdir    readdir+   fsstat     fsinfo     pathconf
0 0%       0 0%       0 0%       0 0%       0 0%       0 0%       0 0%
commit
0 0%


Read request stats (version 3)
0-511      512-1023   1K-2047    2K-4095    4K-8191    8K-16383   16K-32767  32K-65535  64K-131071 > 131071
0          0          0          0          0          0          0          0          0          0
Write request stats (version 3)
0-511      512-1023   1K-2047    2K-4095    4K-8191    8K-16383   16K-32767  32K-65535  64K-131071 > 131071
0          0          0          0          0          0          0          0          0          0
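If you are reproducing a copy-offload test, it helps to start from clean counters so that the deltas are obvious. On 7-Mode the NFS statistics can be zeroed before the test (hedged: the -z flag resets the counters on the releases I have used, so verify it on yours):

fas2040> nfs stat -z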



  2. Clustered Data ONTAP 8.x


NOTE: For Clustered Data ONTAP 8.2 you have to execute this command from diagnostic mode and use the statistics-v1 command to get the copy_manager statistics.


To enter diagnostic mode use the following:
cluster1::> set diag
Warning: These diagnostic commands are for use by NetApp personnel only.
Do you want to continue? {y|n}: y
cluster1::*>
cluster1::*> statistics-v1 show -node cluster1-01 -object copy_manager


For previous versions of Clustered Data ONTAP use the following:


cluster1::> statistics show -node cluster1-01 -object copy_manager


Node: cluster1-01
   Object.Instance.Counter                                 Value         Delta
   ----------------------------------------------- ------------- -------------
   copy_manager.copy_stats.instance_name              copy_stats             -
   copy_manager.copy_stats.node_name                           -             -
   copy_manager.copy_stats.instance_uuid                       -             -
   copy_manager.copy_stats.copy_success                        1             -
   copy_manager.copy_stats.copy_failure                        0             -
   copy_manager.copy_stats.copyStatus_success                  0             -
   copy_manager.copy_stats.copyStatus_failure                  0             -
   copy_manager.copy_stats.copyAbort_success                   0             -
   copy_manager.copy_stats.copyAbort_failure                   0             -
   copy_manager.copy_stats.copyCallback_success                0             -
   copy_manager.copy_stats.copyCallback_failure                0             -
   copy_manager.copy_stats.copyNotify_success                  1             -
   copy_manager.copy_stats.copyNotify_failure                  0             -
   copy_manager.copy_stats.copyRevoke_success                  1             -
   copy_manager.copy_stats.copyRevoke_failure                  0             -
   copy_manager.copy_stats.copyAuthCheck_success               0             -
   copy_manager.copy_stats.copyAuthCheck_failure               0             -
   copy_manager.copy_stats.bytes_copied                        0             -
Node: cluster1-01
   Object.Instance.Counter                                 Value         Delta
   ----------------------------------------------- ------------- -------------
   copy_manager.copy_stats.intra_vol_copy_cnt                  1             -
   copy_manager.copy_stats.inter_vol_copy_cnt                  0             -
   copy_manager.copy_stats.inter_node_copy_cnt                 0             -
   copy_manager.copy_stats.inter_clust_copy_cnt                0             -
   copy_manager.copy_stats.fail_mem_alloc                      0             -
   copy_manager.copy_stats.fail_isdir                          0             -
   copy_manager.copy_stats.fail_offline                        0             -
   copy_manager.copy_stats.fail_stale                          0             -
   copy_manager.copy_stats.fail_io                             0             -
   copy_manager.copy_stats.fail_nospace                        0             -
   copy_manager.copy_stats.fail_readonly                       0             -
   copy_manager.copy_stats.fail_authcheck                      0             -
   copy_manager.copy_stats.fail_no_resource                    0             -
   copy_manager.copy_stats.fail_other                          0             -
   copy_manager.copy_stats.intra_volume_copy_success           1             -
   copy_manager.copy_stats.intra_volume_copy_failure           0             -
   copy_manager.copy_stats.intra_volume_copyStatus_success     0             -
   copy_manager.copy_stats.intra_volume_copyStatus_failure     0             -
   copy_manager.copy_stats.intra_volume_copyAbort_success      0             -


Node: cluster1-01
   Object.Instance.Counter                                 Value         Delta
   ----------------------------------------------- ------------- -------------
   copy_manager.copy_stats.intra_volume_copyAbort_failure      0             -
   copy_manager.copy_stats.inter_volume_copy_success           0             -
   copy_manager.copy_stats.inter_volume_copy_failure           0             -
   copy_manager.copy_stats.inter_volume_copyStatus_success     0             -
   copy_manager.copy_stats.inter_volume_copyStatus_failure     0             -
   copy_manager.copy_stats.inter_volume_copyAbort_success      0             -
   copy_manager.copy_stats.inter_volume_copyAbort_failure      0             -
   copy_manager.copy_stats.inter_volume_copyCallback_success   0             -
   copy_manager.copy_stats.inter_volume_copyCallback_failure   0             -
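On Clustered Data ONTAP 8.2 and later you can also collect these counters as a named sample with the newer statistics command set, which makes before/after comparisons during a test easier. This is a sketch: the sample name copy_sample is arbitrary and the counter list assumes the counter names shown in the output above.

cluster1::*> statistics start -object copy_manager -sample-id copy_sample
cluster1::*> statistics show -sample-id copy_sample -counter "copy_success|copy_failure|bytes_copied"
cluster1::*> statistics stop -sample-id copy_sample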


In addition to the command above, you can also check the nps1 counters to troubleshoot NFS VAAI-related issues.

cluster1::> system node run -node cluster1-01 -command stats show nps1
nps1:nps1:instance_name:nps1
nps1:nps1:node_name:
nps1:nps1:instance_uuid:
nps1:nps1:null_success:0
nps1:nps1:null_error:0
nps1:nps1:compound_success:0
nps1:nps1:compound_error:0
nps1:nps1:access_success:0
nps1:nps1:access_error:0
nps1:nps1:verify_success:0
nps1:nps1:verify_error:0
nps1:nps1:write_success:0
nps1:nps1:write_error:0
nps1:nps1:set_ssv_error:0
nps1:nps1:test_stateid_success:0
nps1:nps1:test_stateid_error:0
nps1:nps1:want_delegation_success:0
nps1:nps1:want_delegation_error:0
nps1:nps1:destroy_clientid_success:0
nps1:nps1:destroy_clientid_error:0
nps1:nps1:reclaim_complete_success:0
nps1:nps1:reclaim_complete_error:0
nps1:nps1:copy_notify_success:1
nps1:nps1:copy_notify_error:0
nps1:nps1:copy_revoke_success:1
nps1:nps1:copy_revoke_error:0
nps1:nps1:copy_success:1
nps1:nps1:copy_error:0
nps1:nps1:copy_abort_success:0
nps1:nps1:copy_abort_error:0
nps1:nps1:copy_status_success:0
nps1:nps1:copy_status_error:0
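If none of the copy counters increment at all, confirm that vStorage is enabled on the NFS server of the Vserver that owns the datastore. This is a minimal check; the field name vstorage is what I have seen in Clustered Data ONTAP, and <vserver_name> is a placeholder:

cluster1::> vserver nfs show -fields vstorage
cluster1::> vserver nfs modify -vserver <vserver_name> -vstorage enabled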


Wednesday, November 27, 2013

esxtop Statistics for Block VAAI

Welcome: To stay updated with all my Blog posts follow me on Twitter @arunpande !!
 
We all know how important esxtop is when troubleshooting various vSphere-related issues. In this blog I will share the esxtop metrics that you can use while troubleshooting the various VAAI primitives. This will help you not only diagnose VAAI-related issues but also measure the performance benefits that VAAI provides.
To demonstrate this I have replicated some scenarios where VAAI is used so that I can capture the esxtop stats.
To access the esxtop metrics, log in to the ESXi host using SSH:
# esxtop
# press u for disk view
# press f to change fields
# press o for VAAI stats
# press p for VAAI latency stats
# press Enter
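
If you prefer to capture these counters for the duration of a test rather than watching them live, esxtop batch mode can record everything to a CSV file for later analysis (a short sketch; the 5-second delay and 120 iterations are arbitrary values):

# esxtop -b -d 5 -n 120 > /tmp/esxtop_vaai.csv

The resulting file can then be opened in Windows perfmon or a spreadsheet to graph the VAAI counters over time.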

Block Zero & Hardware Assisted Locking (ATS)


In this section we will cover the Block Zero VAAI primitive.
Scenario 1: Test the BLOCK ZEROING primitive by creating a new Windows 2008 R2 VM with a Lazy Zeroed Thick disk.

While monitoring the ZERO counter I observed that it incremented from 4 to 7007 during the OS installation.


Scenario 2: Test the BLOCK ZEROING primitive by adding a new Eager Zeroed Thick virtual disk.
In this scenario I added a 150 GB Eager Zeroed Thick disk, and while monitoring esxtop I observed that the ZERO counter incremented from 7013 to 148020.
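
If you want to reproduce this scenario from the ESXi shell instead of the vSphere Client, an eager-zeroed thick VMDK can be created directly with vmkfstools (a sketch; the datastore path and file name below are just examples):

# vmkfstools -c 150G -d eagerzeroedthick /vmfs/volumes/iscsi_2/Win2k8-1/test_ezt.vmdk

Creating the disk this way should drive the ZERO counter up in the same way as adding the disk through the vSphere Client.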



UNMAP

Scenario 3: To demonstrate this primitive, either delete a VM or Storage vMotion a VM to a different datastore.
We will now trigger the UNMAP primitive from the ESXi shell using the command:
# esxcli storage vmfs unmap -l iscsi_2


While monitoring esxtop I observed that the DELETE counter increased to 52527.
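
If the DELETE counter does not move at all, check that the device actually advertises the Delete (UNMAP) primitive before troubleshooting further. The command below is a sketch; the naa identifier is only an example, so substitute the device ID backing your datastore:

# esxcli storage core device vaai status get -d naa.600a09802d6474573924384a79717958

The output lists the ATS, Clone, Zero and Delete status for the device; Delete Status must be supported for UNMAP to be offloaded.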

Full Copy

In this section we will cover the Full Copy VAAI primitive.


Scenario 4: Test the VAAI FULL COPY primitive by creating multiple clones of the same VM.


In this scenario we initiate a clone of a Windows 2008 R2 VM from vCenter. While monitoring esxtop I observed that the CLONE_RD and CLONE_WR counters incremented. Note that MBC_RD/s and MBC_WR/s are the throughput for Full Copy reads and writes.
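
If CLONE_RD and CLONE_WR do not increment at all during a clone, it is worth confirming that the Full Copy primitive is enabled on the host; a value of 1 means enabled:

# esxcli system settings advanced list -o /DataMover/HardwareAcceleratedMove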




Scenario 5: Test the VAAI FULL COPY primitive by relocating a VM using Storage vMotion.


In this scenario we migrate the Windows VM to another iSCSI LUN that is managed by the same controller in the same Vserver. While monitoring esxtop I observed that the CLONE_RD (source datastore), CLONE_WR (destination datastore), ATS, ZERO (destination datastore) and AAVG (destination datastore) metrics incremented.




To all VMware and NetApp administrators: go prepared when you walk into the war room to discuss VAAI-related (break/fix and performance) issues. All the best!

Wednesday, November 20, 2013

Using VAAI UNMAP on vSphere 5.5 & NetApp Storage

Welcome: To stay updated with all my Blog posts follow me on Twitter @arunpande !!


In my previous blog vStorage APIs for Array Integration (VAAI) & NetApp – How to set it right? I shared the steps to use VAAI. In this blog I will cover the steps required to use the VAAI UNMAP primitive in vSphere 5.5. The UNMAP primitive is used by the ESXi host to tell the storage array which storage blocks can be reclaimed after a VM is deleted or migrated to another datastore using Storage vMotion. In vSphere 5.5 the # esxcli storage vmfs unmap command is used, whereas in earlier versions the vmkfstools -y command was used. You can now specify the number of blocks to be reclaimed using the -n option, whereas with vmkfstools -y you had to specify the percentage of blocks that you wanted to reclaim. It is advisable to perform this step after business hours or when there is no active I/O on the datastore; I have not tested the impact of running it against a datastore with active I/O.
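
For reference, the two approaches look like this (a sketch: the datastore name, the percentage and the block count are only examples, and vmkfstools -y has to be run from the root of the datastore being reclaimed):

Old method (vSphere 5.0/5.1):
# cd /vmfs/volumes/iscsi_2
# vmkfstools -y 60

New method (vSphere 5.5):
# esxcli storage vmfs unmap -l iscsi_2 -n 200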


In this scenario I am using a thin provisioned LUN from NetApp storage, and to demonstrate space reclamation I will cover two scenarios: (i) deleting a thick disk, and (ii) migrating VMs using Storage vMotion. I will also share the storage capacity reported by NetApp Virtual Storage Console (VSC), which gives a view of the available space not only on the VMFS datastore but also on the underlying LUN, volume and aggregate.


Scenario 1 – Deleting a thick disk from the virtual machine
Here is an overview of the capacity of the datastore/LUN/volume/aggregate as per VSC.

[VSC screenshot: datastore/LUN/volume/aggregate capacity before deleting the disk]


Capacity of the datastore as per ESXi Shell


# du -h /vmfs/volumes/iscsi_2/
1.0M    /vmfs/volumes/iscsi_2/.sdd.sf
8.0K    /vmfs/volumes/iscsi_2/ntap_rcu1374646447227
8.0K    /vmfs/volumes/iscsi_2/ntap_rcu1374459789333
8.0K    /vmfs/volumes/iscsi_2/.naa.600a09802d6474573924384a79717958
194.1G  /vmfs/volumes/iscsi_2/Win2k8-1
194.9G  /vmfs/volumes/iscsi_2/


This indicates that the total used capacity on the datastore is 194.9 GB.
We will now delete the 150 GB Eager Zeroed Thick disk. After deleting this virtual disk, the ESXi shell reports the following capacity.


# du -h
1.0M    ./.sdd.sf
8.0K    ./ntap_rcu1374646447227
8.0K    ./ntap_rcu1374459789333
8.0K    ./.naa.600a09802d6474573924384a79717958
44.1G   ./Win2k8-1
44.9G   .


The free space on the datastore is now 205 GB and the used space is approximately 44.9 GB. However, the NetApp storage does not yet see this space as free on the LUN; here is the output of the lun show command executed from the Clustered Data ONTAP CLI.


clus-1::> lun show -v /vol/iscsi_2/iscsi_2
              Vserver Name: vmwaretest
                  LUN Path: /vol/iscsi_2/iscsi_2
               Volume Name: iscsi_2
                Qtree Name: ""
                  LUN Name: iscsi_2
                  LUN Size: 250.3GB
                   OS Type: vmware
         Space Reservation: disabled
             Serial Number: -dtW9$8JyqyX
                   Comment: The Provisioning and Cloning capability created this lun at the request of Administrator
Space Reservations Honored: false
          Space Allocation: enabled
                     State: online
                  LUN UUID: 7fe6d24a-f782-476d-827e-a4d20f371abb
                    Mapped: mapped
                Block Size: 512
          Device Legacy ID: -
          Device Binary ID: -
            Device Text ID: -
                 Read Only: false
Inaccessible Due to Restore: false
                 Used Size: 237.9GB
       Maximum Resize Size: 2.50TB
             Creation Time: 12/16/2010 03:27:26
                     Class: regular
                     Clone: false
  Clone Autodelete Enabled: false
          QoS Policy Group: -


VSC also reports the same capacity for this LUN.

[VSC screenshot: LUN capacity matching the lun show output above]

We will now use the UNMAP primitive from the ESXi shell using the command
# esxcli storage vmfs unmap -l iscsi_2


NOTE: You can also specify the number of blocks that you want to reclaim using the -n option. If you specify 500, the reclaim is issued in units of 500 x 1 MB blocks (1 MB being the default block size in VMFS-5).
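
For example, to issue the reclaim on the same datastore in batches of 500 blocks (the value is arbitrary):

# esxcli storage vmfs unmap -l iscsi_2 -n 500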


While monitoring esxtop I observed that the DELETE counter increased to 52527.


VSC now reports the following capacity, where we can see that the free space has been updated for both the LUN and the volume.



Scenario 2 – Test UNMAP after relocating VMs using Storage vMotion.


NetApp VSC reports the following storage usage.

[VSC screenshot: storage usage before Storage vMotion]


Datastore Usage according to ESXi Shell


~ # df -h
Filesystem   Size   Used Available Use% Mounted on
VMFS-5       1.0T 881.5G    143.0G  86% /vmfs/volumes/FC-Infra


Datastore Usage per VM is given below


~ # du -h /vmfs/volumes/FC-Infra/
74.5G   /vmfs/volumes/FC-Infra/VC
78.3G   /vmfs/volumes/FC-Infra/DB
15.4G   /vmfs/volumes/FC-Infra/Oncommand-Proxy
8.0K    /vmfs/volumes/FC-Infra/.vSphere-HA
1.3M    /vmfs/volumes/FC-Infra/.dvsData/7a 4c 23 50 26 82 38 5d-d9 e5 e2 78 4f 7d af 26
32.0K   /vmfs/volumes/FC-Infra/.dvsData/3e 55 23 50 21 27 03 84-e3 f4 4a 7f de 48 08 32
1.3M    /vmfs/volumes/FC-Infra/.dvsData
29.4G   /vmfs/volumes/FC-Infra/AD
64.1G   /vmfs/volumes/FC-Infra/VASA
23.5G   /vmfs/volumes/FC-Infra/VSI Launcher-9
23.5G   /vmfs/volumes/FC-Infra/VSI Launcher-7
12.0G   /vmfs/volumes/FC-Infra/OnCommand Balance
32.7G   /vmfs/volumes/FC-Infra/ViewComposer
63.3G   /vmfs/volumes/FC-Infra/View Connection Server
19.5G   /vmfs/volumes/FC-Infra/VSIShare
19.5G   /vmfs/volumes/FC-Infra/VSI Launcher-10
21.4G   /vmfs/volumes/FC-Infra/UM-6.0
20.6G   /vmfs/volumes/FC-Infra/VSI Launcher
15.3G   /vmfs/volumes/FC-Infra/VSI Launcher-Template
24.4G   /vmfs/volumes/FC-Infra/VSI Launcher-2
23.5G   /vmfs/volumes/FC-Infra/VSI Launcher-4
24.7G   /vmfs/volumes/FC-Infra/VSI Launcher-3
23.5G   /vmfs/volumes/FC-Infra/VSI Launcher-5
25.5G   /vmfs/volumes/FC-Infra/VSI Launcher-6
25.5G   /vmfs/volumes/FC-Infra/VSI Launcher-8
34.4G   /vmfs/volumes/FC-Infra/UI VM
181.5G  /vmfs/volumes/FC-Infra/Analytics VM
1.5G    /vmfs/volumes/FC-Infra/vmkdump
881.3G  /vmfs/volumes/FC-Infra/


To make some free space on the storage, I have Storage vMotioned the following VMs to another datastore:
29.4G   /vmfs/volumes/FC-Infra/AD
78.3G   /vmfs/volumes/FC-Infra/DB


After the above VMs were migrated to other datastores, the following datastore usage was reported:


From the Filer, notice that the LUN Used Size remains the same.


veo-f3270::> lun show -v /vol/infra_services/infra


             Vserver Name: Infra_Vserver
                 LUN Path: /vol/infra_services/infra
              Volume Name: infra_services
               Qtree Name: ""
                 LUN Name: infra
                 LUN Size: 1TB
                  OS Type: vmware
        Space Reservation: disabled
            Serial Number: 7T-iK+3/2TGu
                  Comment:
Space Reservations Honored: false
         Space Allocation: disabled
                    State: online
                 LUN UUID: ceaf5e6e-5a6a-11dc-8751-123478563412
                   Mapped: mapped
               Block Size: 512B
         Device Legacy ID: -
         Device Binary ID: -
           Device Text ID: -
                Read Only: false
                Used Size: 848.9GB
            Creation Time: 9/3/2007 18:12:49


NetApp VSC does not report any changes in LUN Usage either.

[VSC screenshot: LUN usage unchanged after the migration]


ESXi Shell reports the updated free space.
~ # df -h
Filesystem   Size   Used Available Use% Mounted on
VMFS-5       1.0T 773.8G    250.7G  76% /vmfs/volumes/FC-Infra


I have now performed the reclaim operation from the ESXi shell using the command below:
# esxcli storage vmfs unmap -l FC-Infra


NOTE: As in Scenario 1, you can specify the number of blocks to reclaim using the -n option; if you specify 500, the reclaim is issued in units of 500 x 1 MB blocks (the default VMFS-5 block size).

VSC now reports free space in the LUN Usage.

[VSC screenshot: LUN usage showing the reclaimed free space]



The filer also reports the updated Storage Capacity.
veo-f3270::> lun show -v /vol/infra_services/infra


             Vserver Name: Infra_Vserver
                 LUN Path: /vol/infra_services/infra
              Volume Name: infra_services
               Qtree Name: ""
                 LUN Name: infra
                 LUN Size: 1TB
                  OS Type: vmware
        Space Reservation: disabled
            Serial Number: 7T-iK+3/2TGu
                  Comment:
Space Reservations Honored: false
         Space Allocation: disabled
                    State: online
                 LUN UUID: ceaf5e6e-5a6a-11dc-8751-123478563412
                   Mapped: mapped
               Block Size: 512B
         Device Legacy ID: -
         Device Binary ID: -
           Device Text ID: -
                Read Only: false
                Used Size: 742.6GB
            Creation Time: 9/3/2007 18:12:49