Tuesday, December 10, 2013

NFS VAAI Statistics for NetApp Storage

Welcome: To stay updated with all my Blog posts follow me on Twitter @arunpande !!
 
In this blog I will discuss the NFS VAAI statistics that can be used on NetApp storage to measure performance and troubleshoot VAAI-related issues. These statistics will help you determine whether Copy Offload is actually being used by the storage array. I will cover both 7-Mode and Clustered Data ONTAP.


On the NetApp storage, use the following commands to monitor the NFS VAAI statistics. I have highlighted the important stats in red throughout the blog. Note that I have deliberately removed some metrics from the output to make it more readable.


In general, irrespective of the Data ONTAP version, you can use sysstat -x 1 to monitor CPU, memory, disk, network and other utilization. When VAAI primitives are used, network utilization is comparatively low relative to disk usage, because the copy and clone operations are offloaded to the storage array, which reduces traffic between the ESXi hosts and the NetApp array. This command can therefore give some indication that Copy Offload and the other primitives are working. However, it is not conclusive, because other workloads may keep network usage high even while VAAI is in use. To precisely monitor copy successes and errors, use the following commands.
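For example, on a 7-Mode controller you could run the command below while a clone or Storage vMotion is in progress and compare the network in/out columns against the disk read/write columns (a quick illustration; the exact column headings vary slightly between releases):

fas2040> sysstat -x 1

With Copy Offload working you would expect the disk columns to stay busy while the network columns remain relatively quiet.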


  1. Data ONTAP 7-Mode – In 7-Mode, two commands are available to view the NFS VAAI statistics.
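Before looking at the counters, it is worth confirming that vStorage (VAAI over NFS) support is enabled on the controller. This is a minimal sanity check, assuming the 7-Mode option name nfs.vstorage.enable used on the systems I have worked with:

fas2040> options nfs.vstorage.enable
fas2040> options nfs.vstorage.enable on

The first command displays the current value; the second enables the option if it is off.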


fas2040> nfs vstorage stats
NFS COL counters are :
                    Copy Reqs: 0
                   Abort Reqs: 0
                  Status Reqs: 0
                  Notify Reqs: 0
                  Revoke Reqs: 0
                Invalid Parms: 0
       Authorization Failures: 0
      Authentication Failures: 0
              Copy Fail ISDIR: 0
            Copy Fail OFFLINE: 0
              Copy Fail STALE: 0
                 Copy Fail IO: 0
            Copy Fail NOSPACE: 0
          Copy Fail DISKQUOTA: 0
           Copy Fail READONLY: 0
               Copy Fail PERM: 0
            Copy Fail EXPIRED: 0
           Copy Fail RESOURCE: 0
           Copy Fail TOOSMALL: 0
        Copy Fail BAD STATEID: 0
              Copy Fail OTHER: 0
               Intravol Moves: 0
               Intervol Moves: 0
               Fail Space RES: 0



fas2040> nfs stat


Server rpc:
TCP:
calls       badcalls    nullrecv    badlen      xdrcall
2           0           0           0           0


UDP:
calls       badcalls    nullrecv    badlen      xdrcall
0           0           0           0           0


IPv4:
calls       badcalls    nullrecv    badlen      xdrcall
2           0           0           0           0


IPv6:
calls       badcalls    nullrecv    badlen      xdrcall
0           0           0           0           0


Server nfs:
calls       badcalls
2           0


Server nfs V3: (2 calls)
null       getattr    setattr    lookup     access     readlink   read
2 100%     0 0%       0 0%       0 0%       0 0%       0 0%       0 0%
write      create     mkdir      symlink    mknod      remove     rmdir
0 0%       0 0%       0 0%       0 0%       0 0%       0 0%       0 0%
rename     link       readdir    readdir+   fsstat     fsinfo     pathconf
0 0%       0 0%       0 0%       0 0%       0 0%       0 0%       0 0%
commit
0 0%


Read request stats (version 3)
0-511      512-1023   1K-2047    2K-4095    4K-8191    8K-16383   16K-32767  32K-65535  64K-131071 > 131071
0          0          0          0          0          0          0          0          0          0
Write request stats (version 3)
0-511      512-1023   1K-2047    2K-4095    4K-8191    8K-16383   16K-32767  32K-65535  64K-131071 > 131071
0          0          0          0          0          0          0          0          0          0
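If you are reproducing a copy-offload test, it helps to start from clean counters so that the deltas are obvious. On 7-Mode the NFS statistics can be zeroed before the test (hedged: the -z flag resets the counters on the releases I have used, so verify it on yours):

fas2040> nfs stat -z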



  2. Clustered Data ONTAP 8.x


NOTE: For Clustered Data ONTAP 8.2 you have to execute this command from diagnostic mode and use the statistics-v1 command to get the copy_manager statistics.


To enter diagnostic mode use the following:
cluster1::> set diag
Warning: These diagnostic commands are for use by NetApp personnel only.
Do you want to continue? {y|n}: y
cluster1::*>
cluster1::*> statistics-v1 show -node cluster1-01 -object copy_manager


For previous versions of Clustered Data ONTAP use the following:


cluster1::> statistics show -node cluster1-01 -object copy_manager


Node: cluster1-01
   Object.Instance.Counter                                 Value         Delta
   ----------------------------------------------- ------------- -------------
   copy_manager.copy_stats.instance_name              copy_stats             -
   copy_manager.copy_stats.node_name                           -             -
   copy_manager.copy_stats.instance_uuid                       -             -
   copy_manager.copy_stats.copy_success                        1             -
   copy_manager.copy_stats.copy_failure                        0             -
   copy_manager.copy_stats.copyStatus_success                  0             -
   copy_manager.copy_stats.copyStatus_failure                  0             -
   copy_manager.copy_stats.copyAbort_success                   0             -
   copy_manager.copy_stats.copyAbort_failure                   0             -
   copy_manager.copy_stats.copyCallback_success                0             -
   copy_manager.copy_stats.copyCallback_failure                0             -
   copy_manager.copy_stats.copyNotify_success                  1             -
   copy_manager.copy_stats.copyNotify_failure                  0             -
   copy_manager.copy_stats.copyRevoke_success                  1             -
   copy_manager.copy_stats.copyRevoke_failure                  0             -
   copy_manager.copy_stats.copyAuthCheck_success               0             -
   copy_manager.copy_stats.copyAuthCheck_failure               0             -
   copy_manager.copy_stats.bytes_copied                        0             -
Node: cluster1-01
   Object.Instance.Counter                                 Value         Delta
   ----------------------------------------------- ------------- -------------
   copy_manager.copy_stats.intra_vol_copy_cnt                  1             -
   copy_manager.copy_stats.inter_vol_copy_cnt                  0             -
   copy_manager.copy_stats.inter_node_copy_cnt                 0             -
   copy_manager.copy_stats.inter_clust_copy_cnt                0             -
   copy_manager.copy_stats.fail_mem_alloc                      0             -
   copy_manager.copy_stats.fail_isdir                          0             -
   copy_manager.copy_stats.fail_offline                        0             -
   copy_manager.copy_stats.fail_stale                          0             -
   copy_manager.copy_stats.fail_io                             0             -
   copy_manager.copy_stats.fail_nospace                        0             -
   copy_manager.copy_stats.fail_readonly                       0             -
   copy_manager.copy_stats.fail_authcheck                      0             -
   copy_manager.copy_stats.fail_no_resource                    0             -
   copy_manager.copy_stats.fail_other                          0             -
   copy_manager.copy_stats.intra_volume_copy_success           1             -
   copy_manager.copy_stats.intra_volume_copy_failure           0             -
   copy_manager.copy_stats.intra_volume_copyStatus_success     0             -
   copy_manager.copy_stats.intra_volume_copyStatus_failure     0             -
   copy_manager.copy_stats.intra_volume_copyAbort_success      0             -


Node: cluster1-01
   Object.Instance.Counter                                 Value         Delta
   ----------------------------------------------- ------------- -------------
   copy_manager.copy_stats.intra_volume_copyAbort_failure      0             -
   copy_manager.copy_stats.inter_volume_copy_success           0             -
   copy_manager.copy_stats.inter_volume_copy_failure           0             -
   copy_manager.copy_stats.inter_volume_copyStatus_success     0             -
   copy_manager.copy_stats.inter_volume_copyStatus_failure     0             -
   copy_manager.copy_stats.inter_volume_copyAbort_success      0             -
   copy_manager.copy_stats.inter_volume_copyAbort_failure      0             -
   copy_manager.copy_stats.inter_volume_copyCallback_success   0             -
   copy_manager.copy_stats.inter_volume_copyCallback_failure   0             -
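On Clustered Data ONTAP 8.2 and later you can also collect these counters as a named sample with the newer statistics command set, which makes before/after comparisons during a test easier. This is a sketch: the sample name copy_sample is arbitrary and the counter list assumes the counter names shown in the output above.

cluster1::*> statistics start -object copy_manager -sample-id copy_sample
cluster1::*> statistics show -sample-id copy_sample -counter "copy_success|copy_failure|bytes_copied"
cluster1::*> statistics stop -sample-id copy_sample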


In addition to the command above, you can also check the nps1 counters to troubleshoot NFS VAAI-related issues.

cluster1::> system node run -node cluster1-01 -command stats show nps1
nps1:nps1:instance_name:nps1
nps1:nps1:node_name:
nps1:nps1:instance_uuid:
nps1:nps1:null_success:0
nps1:nps1:null_error:0
nps1:nps1:compound_success:0
nps1:nps1:compound_error:0
nps1:nps1:access_success:0
nps1:nps1:access_error:0
nps1:nps1:verify_success:0
nps1:nps1:verify_error:0
nps1:nps1:write_success:0
nps1:nps1:write_error:0
nps1:nps1:set_ssv_error:0
nps1:nps1:test_stateid_success:0
nps1:nps1:test_stateid_error:0
nps1:nps1:want_delegation_success:0
nps1:nps1:want_delegation_error:0
nps1:nps1:destroy_clientid_success:0
nps1:nps1:destroy_clientid_error:0
nps1:nps1:reclaim_complete_success:0
nps1:nps1:reclaim_complete_error:0
nps1:nps1:copy_notify_success:1
nps1:nps1:copy_notify_error:0
nps1:nps1:copy_revoke_success:1
nps1:nps1:copy_revoke_error:0
nps1:nps1:copy_success:1
nps1:nps1:copy_error:0
nps1:nps1:copy_abort_success:0
nps1:nps1:copy_abort_error:0
nps1:nps1:copy_status_success:0
nps1:nps1:copy_status_error:0
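If none of the copy counters increment at all, confirm that vStorage is enabled on the NFS server of the Vserver that owns the datastore. This is a minimal check; the field name vstorage is what I have seen in Clustered Data ONTAP, and <vserver_name> is a placeholder:

cluster1::> vserver nfs show -fields vstorage
cluster1::> vserver nfs modify -vserver <vserver_name> -vstorage enabled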


Wednesday, November 27, 2013

esxtop Statistics for Block VAAI

Welcome: To stay updated with all my Blog posts follow me on Twitter @arunpande !!
 
We all know how important esxtop is when troubleshooting various vSphere-related issues. In this blog I will share the esxtop metrics that you can use while troubleshooting the various VAAI primitives. This will help you not only diagnose VAAI-related issues but also measure the performance benefits that VAAI provides.
To demonstrate this I have replicated some scenarios where VAAI is used so that I can capture the esxtop stats.
To access the esxtop metrics, log in to the ESXi host using SSH:
# esxtop
# press u for disk view
# press f to change fields
# press o for VAAI stats
# press p for VAAI latency stats
# press Enter
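
If you prefer to capture these counters for the duration of a test rather than watching them live, esxtop batch mode can record everything to a CSV file for later analysis (a short sketch; the 5-second delay and 120 iterations are arbitrary values):

# esxtop -b -d 5 -n 120 > /tmp/esxtop_vaai.csv

The resulting file can then be opened in Windows perfmon or a spreadsheet to graph the VAAI counters over time.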

Block Zero & Hardware Assisted Locking (ATS)


In this section we will cover the Block Zero VAAI primitive.
Scenario 1: Test the BLOCK ZEROING primitive by creating a new Windows 2008 R2 VM with a Lazy Zeroed Thick disk.

While monitoring the ZERO counter I observed that it incremented from 4 to 7007 during the OS installation.


Scenario 2: Test the BLOCK ZEROING primitive by adding a new Eager Zeroed Thick virtual disk.
In this scenario I added a 150 GB Eager Zeroed Thick disk, and while monitoring esxtop I observed that the ZERO counter incremented from 7013 to 148020.
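
If you want to reproduce this scenario from the ESXi shell instead of the vSphere Client, an eager-zeroed thick VMDK can be created directly with vmkfstools (a sketch; the datastore path and file name below are just examples):

# vmkfstools -c 150G -d eagerzeroedthick /vmfs/volumes/iscsi_2/Win2k8-1/test_ezt.vmdk

Creating the disk this way should drive the ZERO counter up in the same way as adding the disk through the vSphere Client.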



UNMAP

Scenario 3: To demonstrate this primitive, either delete a VM or Storage vMotion a VM to a different datastore.
We will now trigger the UNMAP primitive from the ESXi shell using the command:
# esxcli storage vmfs unmap -l iscsi_2


While monitoring esxtop I observed that the DELETE counter increased to 52527.
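
If the DELETE counter does not move at all, check that the device actually advertises the Delete (UNMAP) primitive before troubleshooting further. The command below is a sketch; the naa identifier is only an example, so substitute the device ID backing your datastore:

# esxcli storage core device vaai status get -d naa.600a09802d6474573924384a79717958

The output lists the ATS, Clone, Zero and Delete status for the device; Delete Status must be supported for UNMAP to be offloaded.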

Full Copy

In this section we will cover the Full Copy VAAI primitive.


Scenario 4: Test the VAAI FULL COPY primitive by creating multiple clones of the same VM.


In this scenario we initiate a clone of a Windows 2008 R2 VM from vCenter. While monitoring esxtop I observed that the CLONE_RD and CLONE_WR counters incremented. Note that MBC_RD/s and MBC_WR/s are the throughput for Full Copy reads and writes.
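
If CLONE_RD and CLONE_WR do not increment at all during a clone, it is worth confirming that the Full Copy primitive is enabled on the host; a value of 1 means enabled:

# esxcli system settings advanced list -o /DataMover/HardwareAcceleratedMove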




Scenario 5: Test the VAAI FULL COPY primitive by relocating a VM using Storage vMotion.


In this scenario we migrate the Windows VM to another iSCSI LUN that is managed by the same controller in the same Vserver. While monitoring esxtop I observed that the CLONE_RD (source datastore), CLONE_WR (destination datastore), ATS, ZERO (destination datastore) and AAVG (destination datastore) metrics incremented.




To all VMware and NetApp administrators: go prepared when you walk into the war room to discuss VAAI-related (break/fix and performance) issues. All the best!

Wednesday, November 20, 2013

Using VAAI UNMAP on vSphere 5.5 & NetApp Storage

Welcome: To stay updated with all my Blog posts follow me on Twitter @arunpande !!


In my previous blog vStorage APIs for Array Integration (VAAI) & NetApp – How to set it right? I shared the steps to use VAAI. In this blog I will cover the steps required to use the VAAI UNMAP primitive in vSphere 5.5. The UNMAP primitive is used by the ESXi host to tell the storage array which storage blocks can be reclaimed after a VM is deleted or migrated to another datastore using Storage vMotion. In vSphere 5.5 the # esxcli storage vmfs unmap command is used, whereas in earlier versions the vmkfstools -y command was used. You can now specify the number of blocks to be reclaimed using the -n option, whereas with vmkfstools -y you had to specify the percentage of blocks that you wanted to reclaim. It is advisable to perform this step after business hours or when there is no active I/O on the datastore; I have not tested the impact of running it against a datastore with active I/O.
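
For reference, the two approaches look like this (a sketch: the datastore name, the percentage and the block count are only examples, and vmkfstools -y has to be run from the root of the datastore being reclaimed):

Old method (vSphere 5.0/5.1):
# cd /vmfs/volumes/iscsi_2
# vmkfstools -y 60

New method (vSphere 5.5):
# esxcli storage vmfs unmap -l iscsi_2 -n 200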


In this scenario I am using a thin provisioned LUN from NetApp storage, and to demonstrate space reclamation I will cover two scenarios: (i) deleting a thick disk, and (ii) migrating VMs using Storage vMotion. I will also share the storage capacity reported by NetApp Virtual Storage Console (VSC), which gives a view of the available space not only on the VMFS datastore but also on the underlying LUN, volume and aggregate.


Scenario 1 – Deleting a thick disk from the virtual machine
Here is an overview of the capacity of the datastore/LUN/volume/aggregate as per VSC.

[VSC screenshot: datastore/LUN/volume/aggregate capacity before deleting the disk]


Capacity of the datastore as per ESXi Shell


# du -h /vmfs/volumes/iscsi_2/
1.0M    /vmfs/volumes/iscsi_2/.sdd.sf
8.0K    /vmfs/volumes/iscsi_2/ntap_rcu1374646447227
8.0K    /vmfs/volumes/iscsi_2/ntap_rcu1374459789333
8.0K    /vmfs/volumes/iscsi_2/.naa.600a09802d6474573924384a79717958
194.1G  /vmfs/volumes/iscsi_2/Win2k8-1
194.9G  /vmfs/volumes/iscsi_2/


This indicates that the total used capacity on the datastore is 194.9 GB.
We will now delete the 150 GB Eager Zeroed Thick disk. After deleting this virtual disk, the ESXi shell reports the following capacity.


# du -h
1.0M    ./.sdd.sf
8.0K    ./ntap_rcu1374646447227
8.0K    ./ntap_rcu1374459789333
8.0K    ./.naa.600a09802d6474573924384a79717958
44.1G   ./Win2k8-1
44.9G   .


The free space on the datastore is now 205 GB and the used space is approximately 44.9 GB. However, the NetApp storage does not yet see this space as free on the LUN; here is the output of the lun show command executed from the Clustered Data ONTAP CLI.


clus-1::> lun show -v /vol/iscsi_2/iscsi_2
              Vserver Name: vmwaretest
                  LUN Path: /vol/iscsi_2/iscsi_2
               Volume Name: iscsi_2
                Qtree Name: ""
                  LUN Name: iscsi_2
                  LUN Size: 250.3GB
                   OS Type: vmware
         Space Reservation: disabled
             Serial Number: -dtW9$8JyqyX
                   Comment: The Provisioning and Cloning capability created this lun at the request of Administrator
Space Reservations Honored: false
          Space Allocation: enabled
                     State: online
                  LUN UUID: 7fe6d24a-f782-476d-827e-a4d20f371abb
                    Mapped: mapped
                Block Size: 512
          Device Legacy ID: -
          Device Binary ID: -
            Device Text ID: -
                 Read Only: false
Inaccessible Due to Restore: false
                 Used Size: 237.9GB
       Maximum Resize Size: 2.50TB
             Creation Time: 12/16/2010 03:27:26
                     Class: regular
                     Clone: false
  Clone Autodelete Enabled: false
          QoS Policy Group: -


VSC also reports the same capacity for this LUN.

[VSC screenshot: LUN capacity matching the lun show output above]

We will now use the UNMAP primitive from the ESXi shell using the command
# esxcli storage vmfs unmap -l iscsi_2


NOTE: You can also specify the number of blocks that you want to reclaim using the -n option. If you specify 500, the reclaim is issued in units of 500 x 1 MB blocks (1 MB being the default block size in VMFS-5).
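
For example, to issue the reclaim on the same datastore in batches of 500 blocks (the value is arbitrary):

# esxcli storage vmfs unmap -l iscsi_2 -n 500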


While monitoring esxtop I observed that the DELETE counter increased to 52527.


VSC now reports the following capacity, where we can see that the free space has been updated for both the LUN and the volume.



Scenario 2 – Test UNMAP after relocating VMs using Storage vMotion.


NetApp VSC reports the following storage usage.

[VSC screenshot: storage usage before Storage vMotion]


Datastore Usage according to ESXi Shell


~ # df -h
Filesystem   Size   Used Available Use% Mounted on
VMFS-5       1.0T 881.5G    143.0G  86% /vmfs/volumes/FC-Infra


Datastore Usage per VM is given below


~ # du -h /vmfs/volumes/FC-Infra/
74.5G   /vmfs/volumes/FC-Infra/VC
78.3G   /vmfs/volumes/FC-Infra/DB
15.4G   /vmfs/volumes/FC-Infra/Oncommand-Proxy
8.0K    /vmfs/volumes/FC-Infra/.vSphere-HA
1.3M    /vmfs/volumes/FC-Infra/.dvsData/7a 4c 23 50 26 82 38 5d-d9 e5 e2 78 4f 7d af 26
32.0K   /vmfs/volumes/FC-Infra/.dvsData/3e 55 23 50 21 27 03 84-e3 f4 4a 7f de 48 08 32
1.3M    /vmfs/volumes/FC-Infra/.dvsData
29.4G   /vmfs/volumes/FC-Infra/AD
64.1G   /vmfs/volumes/FC-Infra/VASA
23.5G   /vmfs/volumes/FC-Infra/VSI Launcher-9
23.5G   /vmfs/volumes/FC-Infra/VSI Launcher-7
12.0G   /vmfs/volumes/FC-Infra/OnCommand Balance
32.7G   /vmfs/volumes/FC-Infra/ViewComposer
63.3G   /vmfs/volumes/FC-Infra/View Connection Server
19.5G   /vmfs/volumes/FC-Infra/VSIShare
19.5G   /vmfs/volumes/FC-Infra/VSI Launcher-10
21.4G   /vmfs/volumes/FC-Infra/UM-6.0
20.6G   /vmfs/volumes/FC-Infra/VSI Launcher
15.3G   /vmfs/volumes/FC-Infra/VSI Launcher-Template
24.4G   /vmfs/volumes/FC-Infra/VSI Launcher-2
23.5G   /vmfs/volumes/FC-Infra/VSI Launcher-4
24.7G   /vmfs/volumes/FC-Infra/VSI Launcher-3
23.5G   /vmfs/volumes/FC-Infra/VSI Launcher-5
25.5G   /vmfs/volumes/FC-Infra/VSI Launcher-6
25.5G   /vmfs/volumes/FC-Infra/VSI Launcher-8
34.4G   /vmfs/volumes/FC-Infra/UI VM
181.5G  /vmfs/volumes/FC-Infra/Analytics VM
1.5G    /vmfs/volumes/FC-Infra/vmkdump
881.3G  /vmfs/volumes/FC-Infra/


To make some free space on the storage, I have Storage vMotioned the following VMs to another datastore:
29.4G   /vmfs/volumes/FC-Infra/AD
78.3G   /vmfs/volumes/FC-Infra/DB


After the above VMs were migrated to other datastores, the following datastore usage was reported:


From the Filer, notice that the LUN Used Size remains the same.


veo-f3270::> lun show -v /vol/infra_services/infra


             Vserver Name: Infra_Vserver
                 LUN Path: /vol/infra_services/infra
              Volume Name: infra_services
               Qtree Name: ""
                 LUN Name: infra
                 LUN Size: 1TB
                  OS Type: vmware
        Space Reservation: disabled
            Serial Number: 7T-iK+3/2TGu
                  Comment:
Space Reservations Honored: false
         Space Allocation: disabled
                    State: online
                 LUN UUID: ceaf5e6e-5a6a-11dc-8751-123478563412
                   Mapped: mapped
               Block Size: 512B
         Device Legacy ID: -
         Device Binary ID: -
           Device Text ID: -
                Read Only: false
                Used Size: 848.9GB
            Creation Time: 9/3/2007 18:12:49


NetApp VSC does not report any changes in LUN Usage either.

[VSC screenshot: LUN usage unchanged after the migration]


ESXi Shell reports the updated free space.
~ # df -h
Filesystem   Size   Used Available Use% Mounted on
VMFS-5       1.0T 773.8G    250.7G  76% /vmfs/volumes/FC-Infra


I have now performed the reclaim operation from the ESXi shell using the command below:
# esxcli storage vmfs unmap -l FC-Infra


NOTE: As in Scenario 1, you can specify the number of blocks to reclaim using the -n option; if you specify 500, the reclaim is issued in units of 500 x 1 MB blocks (the default VMFS-5 block size).

VSC now reports free space in the LUN Usage.

[VSC screenshot: LUN usage showing the reclaimed free space]



The filer also reports the updated Storage Capacity.
veo-f3270::> lun show -v /vol/infra_services/infra


             Vserver Name: Infra_Vserver
                 LUN Path: /vol/infra_services/infra
              Volume Name: infra_services
               Qtree Name: ""
                 LUN Name: infra
                 LUN Size: 1TB
                  OS Type: vmware
        Space Reservation: disabled
            Serial Number: 7T-iK+3/2TGu
                  Comment:
Space Reservations Honored: false
         Space Allocation: disabled
                    State: online
                 LUN UUID: ceaf5e6e-5a6a-11dc-8751-123478563412
                   Mapped: mapped
               Block Size: 512B
         Device Legacy ID: -
         Device Binary ID: -
           Device Text ID: -
                Read Only: false
                Used Size: 742.6GB
            Creation Time: 9/3/2007 18:12:49