Raj2796's Blog

October 15, 2009

VMware ESX 3.5 MRU policy and path persistence

Filed under: san,vmware — raj2796 @ 3:31 pm

VMware ESX 3.5 – MRU policy and path persistence, from sakacc on YouTube. The same author has lots of other interesting videos on YouTube if you're into VMware.

For other HP EVA 4400 users (or EVA 6xxx/4xxx), the HP-recommended settings for ESX 4 are Round Robin or MRU, with the IOPS counter changed from 1000 to 1 to allow true load balancing. Round Robin gives the best performance; note, however, that ESX versions below 4 were not truly ALUA-aware.
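On ESX 4 the IOPS change can be made per LUN with the classic vSphere 4 esxcli. A sketch, assuming the VMW_SATP_ALUA / VMW_PSP_RR plugin names that ship with ESX 4 and using a made-up naa device ID as a placeholder; verify the syntax against your build before running:

```sh
# Make Round Robin the default path selection policy for ALUA arrays
# such as the EVA 4400 (applies to devices claimed by VMW_SATP_ALUA)
esxcli nmp satp setdefaultpsp --satp VMW_SATP_ALUA --psp VMW_PSP_RR

# Drop the Round Robin IOPS counter from the default 1000 to 1 for one LUN.
# The naa ID below is a placeholder; list yours with: esxcli nmp device list
esxcli nmp roundrobin setconfig --device naa.600508b4000f0000000000000000 --type iops --iops 1
```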

VMware NetWare Tivoli slow backup: performance tuning parameters for NSS and TSAFS.NLM on an EVA 4400

Filed under: edir,eva,Netware,Software,Tivoli — raj2796 @ 11:57 am

Before I cover what works for me, I have posted the official TID on this issue below, since different people will have different environments/versions/setups to mine and will find it useful:

There are many issues that can affect backup/restore performance. There is tuning that can be done on the server and NSS volumes. These are only ballpark figures. The server must be benchmarked to find the optimum settings.

These two parms must be set in c:\nwserver\nssstart.cfg. Make sure there are no typos or NSS won’t load. Nssstart.cfg is not created by default.
/AuthCacheSize=20000
/NumWorkToDos=100
These parms can be set in AUTOEXEC.NCF. Note: if they are placed in this file they must start with NSS, for example nss /ClosedFileCacheSize=2000. They can also be placed in C:\NWSERVER\NSSSTART.CFG, where they are used without the NSS prefix.

/ClosedFileCacheSize=200000
/MinBufferCacheSize=20000
/MinOsBufferCacheSize=20000
/CacheBalanceMaxBuffersPerSession=20000
/CacheUserMaxPercent=70
/AllocAheadBlks=63
/NameCacheSize=200000
/NoCopyBuffersOnXlatch
/ReadAheadBlks=:64 (NetWare 6.5 boxes; a line must be added for each volume, with the volume name before the colon). This sets the number of 4k blocks to read with each request; 64 blocks is 256k at a time.
These settings are ballpark figures. They may need to be adjusted depending on how much ram the server has.
Setting these too high can cause excessive memory usage and can affect other apps as well as performance. The closed file cache size and the name cache size, if set too high, can cause NSS.NLM to take excessive amounts of memory. These settings can help performance, but experience shows there are usually several problems that add up to one big problem, and setting these two parms too high can actually degrade performance. If the server has about 2 GB of RAM or less, the default of 100000 should be used.
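As the TID notes above, the same parameter takes a slightly different form depending on which file it lives in (I'm assuming the usual `#` comment syntax for NetWare config files here):

```
# In AUTOEXEC.NCF the setting needs the NSS prefix:
nss /ClosedFileCacheSize=200000

# In C:\NWSERVER\NSSSTART.CFG the same setting goes in bare:
/ClosedFileCacheSize=200000
```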

1. Make sure you have the latest updates for the tape software.
2. Faster hardware can make a big difference.
3. The type of data can make a huge difference. Lots of small files will slow down performance, especially if they're all in one directory: the backup spends more time opening, scanning and closing files than reading data. If larger files are mixed in with the smaller ones, performance can increase because more time is spent reading data rather than opening files, which is what increases throughput.
4. Background processes like compression, virus scans and large data copies will slow performance down.
5. Virus scanners can also be an issue. They usually hook into the OS file system to intercept file opens so they can scan files prior to backup; the scanner can be configured to run at some time other than the backup. The problem is compounded if the files being scanned are compressed, since the scanner may decompress them before scanning for viruses, which slows things down even more. A good way to see if this is happening is to enable NSS /COMPSCREEN at the server console during the backup to see if files are being decompressed.
6. Lots of open files will slow down performance. These are usually seen with the error FFFDFFF5, which means the file is open by some other application. If the tape software can be configured to skip open files until the end of the job rather than retrying immediately, performance can increase, as some tape software solutions will by default retry a locked file multiple times before moving on.
7. Backing up over the wire is slower than backups local to the server, especially if most of the files are small (64k or less). Any LAN latency can cause a significant performance hit: the wire is much slower at transferring data than reading it directly from disk. One thing that may help is to

set tcp nagle algorithm=off
set tcp delayed acknowledgement=off
set tcp sack option=off

on both host and target servers.

Tsatest can be used to determine if the LAN is a bottleneck. There is more information about tsatest below.

8.

– Make sure you have the latest disk drivers and firmware updates for your HBAs. There have been cases where later firmware/drivers increased performance greatly.
– Use the tsatest.nlm utility on different LAN segments to see if there is a problem. This tool now ships with tsa5up19.exe. Tsatest can be used to test throughput on the wire and on the server itself, to see if the LAN could be a bottleneck. Tsatest is also useful because it does not require a tape drive, so the tape drive can be eliminated as a possible problem as well.
– Make sure you have the latest TSA files.

– RAID 5 systems with a small stripe size can also be a problem. Check the configuration of the disk storage or SAN; if using a RAID system, a larger stripe size can help performance.

– Creating one large LUN on the RAID rather than several smaller ones can result in significant performance loss. It's faster to have multiple LUNs with the volumes/data spread out over them.

– Make sure you have the latest BIOS/firmware updates for your server.

– There have been issues where full backups are fast and incremental/differential backups are slow. This can happen when the tape software does its own filtering on inc/diff backups rather than letting tsafs.nlm do it. There is a parm in tsafs.nlm that can help:

LOAD TSAFS /NOCACHINGMODE

This will disable the read ahead cache for tsafs.nlm so that files are not cached unnecessarily during inc/diff backups. You can re-enable this cache when doing full backups:

LOAD TSAFS /CACHINGMODE

This is a load time parameter so you could create a script that would load/unload tsafs accordingly.
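Since /CachingMode and /NoCachingMode are load-time parameters, the reload can be scripted as the TID suggests. A minimal sketch with hypothetical file names, assuming `#` comment syntax in NCF files:

```
# incbackup.ncf (hypothetical) - run before incremental/differential backups
unload tsafs
load tsafs /NoCachingMode

# fullbackup.ncf (hypothetical) - run before full backups
unload tsafs
load tsafs /CachingMode
```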

Tsafs itself can be tuned as well. Once tsafs is loaded, typing tsafs again at the server console will show what most of the parameters are set to. If most of the data consists of small files, make a best estimate of the mean file size; that will help determine the best size for the read buffers. Tsafs can then be tuned to favour smaller files with:

tsafs /ReadBufferSize=16384

That would set the read buffers for tsafs to 16k. If the mean file size is 16k or less, tsafs can then read each file with fewer read requests. Setting the NSS cache balance to a lower percentage gives tsafs more memory for caching files. If the mean file size is around 64k, set tsafs /readbuffersize=65536. The read buffers in the tape software could also be set to similar values.
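The sizing rule above (a buffer at least as big as the mean file, so each file needs one read) can be sketched as a small helper. This is purely illustrative, not a Novell tool; the function name and the 4k–256k bounds are my assumptions:

```python
def suggest_read_buffer(mean_file_size):
    """Pick a tsafs /ReadBufferSize that covers the mean file in one read.

    Illustrative helper: round the mean file size up to the next power of
    two, clamped to an assumed sensible range of 4k to 256k.
    """
    size = 4096
    while size < mean_file_size and size < 262144:
        size *= 2
    return size

# Mean file ~14k -> tsafs /ReadBufferSize=16384, as in the TID's example
print(suggest_read_buffer(14000))   # 16384
# Mean file ~60k -> tsafs /ReadBufferSize=65536
print(suggest_read_buffer(60000))   # 65536
```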

tsafs /cachememorythreshold=5

may help as well. Setting this value too high has caused memory problems; 10 would be a good place to start, and 1 is the recommended setting for servers that have memory-fragmentation problems. The value is a percentage of server memory, so on a server with more memory even a setting of 1 gives tsafs a useful amount of memory to cache file data.

– On servers that have 2 or 4 processors, tsafs /readthreadsperjob=x can be set to 2 or 4. On machines with only one processor, set /readthreadsperjob=1. Setting /readthreadsperjob too high will result in performance loss.
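The thread-count guidance above amounts to a simple rule, sketched here as a hypothetical helper (the name and the cap of 4 are my reading of the TID, which only discusses 1, 2 and 4 CPU boxes):

```python
def suggest_read_threads(cpu_count):
    """Illustrative: pick a tsafs /ReadThreadsPerJob value.

    One thread on single-CPU servers; otherwise match the CPU count,
    capped at 4, since setting it too high loses performance.
    """
    if cpu_count <= 1:
        return 1
    return min(cpu_count, 4)

print(suggest_read_threads(1))   # 1
print(suggest_read_threads(2))   # 2
print(suggest_read_threads(8))   # 4
```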

– Tsatest is a good tool for finding potential bottlenecks. It's an NLM that can be loaded on the target server for a local backup, or run from another NetWare server over the wire. It's a backup simulator that requires no special hardware, tape drives, databases, etc. By loading tsatest on the target server, the wire and tape software can be eliminated as potential bottlenecks: throughput can be gauged locally, and then a backup can be done over the wire to see if the LAN could be slowing things down. For a complete listing of tsatest load-line parameters, type tsatest /?. Usually it's loaded like this:

load tsatest /s= /u= /p= /v=

Individual paths can be specified as well. By default, tsatest will do full backups; an incremental backup can be specified by adding the /c=2 parameter to the load line. The sys:\etc\tsatest.log file can be created with the /log parameter, and this file can be sent to Novell for analysis.
Backup/restore performance can be reduced when backing up over the LAN; sometimes up to half the performance can be lost to LAN latency alone. Tsatest is a good way to determine if that's happening: run tests on the target server itself, then back the target server up over the wire from another NetWare server and compare the results.
For a good document on tsatest read:

http://developer.novell.com/ndk/doc/samplecode/smscomp_sample/tsatest/tsatest.html

Our renewed Tivoli-on-NetWare problems arose when we started to migrate our users to the nine new virtual NetWare 6.5 SP8 servers I built on a couple of EVA 4400s at our two sites. The virtual NetWare servers run on HP DL380 G5s with 32 GB of RAM; each virtual server has 4 GB of RAM dedicated to it.

Utilising my previous experience with Tivoli and the problems it causes, I changed the tsafs parameters. To do this you first need to unload Tivoli on the NetWare servers via the command line:

type > unload dsmcad

Enter confirmation on the Tivoli screens.

Now you need to unload tsafs, which is originally loaded via smsstart.ncf:

type > smsstop.ncf

Now that Tivoli, tsafs and related services are stopped, navigate to the file

SYS:\SYSTEM\smsstart.ncf

change the file from:

LOAD SMSUT.NLM
LOAD SMDR.NLM
LOAD TSAFS.NLM

to:

LOAD SMSUT.NLM
LOAD SMDR.NLM
LOAD TSAFS.NLM /NoCluster /NoCachingMode /noConvertUnmappableChars /CacheMemoryThreshold=10

Now restart backup services:

type > smsstart

Next restart Tivoli. Change the commands if you're not using a newer version of Tivoli, and remove the second line if you don't use the web interface:

type > dsmcad -optfile=dsm.opt
type > dsmcad -optfile=dsm_gui.opt

CacheMemoryThreshold is set to the default of 10 on the servers, but they barely use any memory, as you can see in the memory usage chart for the server below; I might try increasing it to 25 to see if it speeds up backups. There are under a million files on each server at the moment, and they are only running at 40% load since we haven't finished moving all the users onto them yet.

The changes I've listed above were made at the end of work yesterday: I changed the tsafs load parameters on the server shown below and it seems to have done the trick, with backup times reduced by 11 hours! Copies of the backup schedule reports are below the memory diagram for those interested in the speed increase.

Server mem usage

Tuesday Night/Wednesday morning
10/14/2009 11:21:00 — SCHEDULEREC STATUS BEGIN
10/14/2009 11:21:00 Total number of objects inspected: 814,790
10/14/2009 11:21:00 Total number of objects backed up: 20,070
10/14/2009 11:21:00 Total number of bytes transferred: 596.97 MB
10/14/2009 11:21:00 Data transfer time: 1,043.65 sec
10/14/2009 11:21:00 Network data transfer rate: 585.73 KB/sec
10/14/2009 11:21:00 Aggregate data transfer rate: 12.99 KB/sec
10/14/2009 11:21:00 Objects compressed by: 0%
10/14/2009 11:21:00 Elapsed processing time: 13:04:12
10/14/2009 11:21:00 — SCHEDULEREC STATUS END

Wednesday Night/Thursday morning
10/15/2009 00:36:48 — SCHEDULEREC STATUS BEGIN
10/15/2009 00:36:48 Total number of objects inspected: 821,288
10/15/2009 00:36:48 Total number of objects backed up: 15,844
10/15/2009 00:36:48 Total number of bytes transferred: 562.12 MB
10/15/2009 00:36:48 Data transfer time: 510.50 sec
10/15/2009 00:36:48 Network data transfer rate: 1,127.53 KB/sec
10/15/2009 00:36:48 Aggregate data transfer rate: 72.26 KB/sec
10/15/2009 00:36:48 Objects compressed by: 0%
10/15/2009 00:36:48 Elapsed processing time: 02:12:45
10/15/2009 00:36:48 — SCHEDULEREC STATUS END
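The improvement can be quantified directly from the two schedule reports above; a quick sanity check on the numbers:

```python
def to_seconds(hms):
    """Convert an elapsed-time string like '13:04:12' to seconds."""
    h, m, s = (int(x) for x in hms.split(":"))
    return h * 3600 + m * 60 + s

before = to_seconds("13:04:12")   # Tuesday night elapsed processing time
after = to_seconds("02:12:45")    # Wednesday night elapsed processing time

# Hours saved by the tsafs change
print(round((before - after) / 3600, 1))   # 10.9

# Aggregate data transfer rate improved in rough proportion (KB/sec)
print(round(72.26 / 12.99, 1))             # 5.6x
```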

October 12, 2009

Rights Required for Novell Edir Subcontainer Administrators

Filed under: edir,Netware — raj2796 @ 2:39 pm

Rights required for Novell subcontainer administrators, to be assigned in eDir via ConsoleOne or iManager.

For security reasons, you might want to create one or more subcontainer administrators with sufficient rights to install or upgrade additional OES NetWare servers, without granting them full rights to the entire tree. A subcontainer administrator needs the following rights to install or upgrade a NetWare server in the tree:

• Supervisor right to the container where the server will be installed
• Read right to the Security container object for the eDirectory tree
• Read right to the NDSPKI:Private Key attribute on the Organizational CA object, which is located in the Security container
• Supervisor right to the W0 object located inside the KAP object in the Security container

These rights are typically granted by placing all administrative users in a Group or Role, and then assigning the above rights to the Group or Role.

Some of the products that can be selected to install along with OES NetWare require schema extensions of their own. Currently, only an administrator with rights at [Root] can extend the schema of an eDirectory tree; a subcontainer administrator does not have sufficient rights. One way to work around this is to have a root administrator install an OES NetWare server with all products selected. This takes care of extending the schema for every possible server configuration. Subcontainer administrators can then install or upgrade subsequent OES NetWare servers without worrying about schema extensions.

An easier method for extending the schema for OES products and services is to run the Schema Update task in Deployment Manager. This task extends the schema for the OES products you select for both the NetWare and Linux platforms.

By default, the first three servers installed in an eDirectory partition automatically receive a replica of the partition. To install a server into a partition that does not already contain three replica servers, the user must have either Supervisor rights at the [Root] of the tree or administrative rights to the container in which the server holding the partition resides.

Whilst this worked fine in our organisation for months, for some reason, despite no schema changes or user trustee changes, it has suddenly stopped working; I'll post more if we find out anything. Anyone else notice the fall in Novell's share price over the last few years?
