Raj2796's Blog

September 30, 2011

Novell Application Launcher (NAL) – diagnosing freezing, slow starting and other problems.

Filed under: edir,Netware — raj2796 @ 2:58 pm

This post is a collated note to myself of useful links and information from googling these issues when I encountered them in the past. We recently hit a problem where NAL failed to load at random.

Whilst it turned out to be eDir corruption of our user container, it has reoccurred, and my money’s on corruption again!

You can, however, investigate in detail by enabling NAL debugging and tracing what happens. With debugging enabled at the level you need, logs will be generated which should hopefully give you a clue as to what’s wrong. You have two options:

  •  manually hack in the relevant reg settings
Enabling Debug Logging for the Novell Application Launcher
Debug logging for the Novell® Application Launcher™ is enabled in the Windows registry or by using the diagnostic tool, naldiag.exe. This section focuses on using the registry to enable debug logging.
NOTE:  The keys in this table are the same as those that are set by naldiag.exe.
The table below specifies the information you need to edit the registry of the workstation where the Application Launcher agent is running:
Registry key and hive (all rows): HKLM\Software\NetWare\NAL\1.0\Debug
Log location (all rows except MSI): program files\novell\zenworks\zfdnal.csv (default location if none is specified)

Value Name        Value Type   Data                                          Notes
Level             DWORD        0 (Off)
                               1 (Informational messages only)
                               2 (Warning messages only)
                               3 (Informational and Warning messages only)
                               4 (Critical messages only)
                               5 (Informational and Critical messages only)
                               6 (Warning and Critical messages only)
                               F (All messages)
LogFileLocation   STRING       location_and_filename_of_output_file
Browser           DWORD        0 (Off), 1 (On)                               Destination: Browser log file as specified
Explorer          DWORD        0 (Off), 1 (On)                               Destination: Application Explorer log file as specified
IPC               DWORD        0 (Off), 1 (On)                               Destination: IPC log file as specified
Library           DWORD        0 (Off), 1 (On)                               Destination: Library log file as specified
MUP               DWORD        0 (Off), 1 (On)                               Destination: MUP log file as specified
Reporting         DWORD        0 (Off), 1 (On)                               Destination: Reporting log file as specified
Service           DWORD        0 (Off), 1 (On)                               Destination: NT Services log file as specified
Start             DWORD        0 (Off), 1 (On)                               Destination: Startup log file as specified
Workstation       DWORD        0 (Off), 1 (On)                               Destination: Workstation log file as specified
MSI               DWORD        0 (Off), 1 (On)                               Location: program files\novell\zenworks\zappmsi.log. Alternatively, zappmsi.log is in the path listed in the LogFileLocation string value, if specified
  • OR use the naldiag.exe tool, which will create the registry keys for logging for you. You should find it located under

c:\Program Files\Novell\ZENworks\NalDiag.exe
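If you would rather script the manual route than click through regedit, here is a minimal sketch that builds the text of a .reg file from the key and value names in the table above. The function name, the default choices and the example log path are my own assumptions for illustration, not anything Novell ships.

```python
# Sketch: build .reg file text that switches on NAL debug logging.
# Key and value names are taken from the table in this post; everything
# else (function name, defaults) is hypothetical.

DEBUG_KEY = r"HKEY_LOCAL_MACHINE\Software\NetWare\NAL\1.0\Debug"

# The on/off DWORD values listed in the table.
COMPONENTS = ("Browser", "Explorer", "IPC", "Library", "MUP", "Reporting",
              "Service", "Start", "Workstation", "MSI")

# Only the Level values the table documents (F is hex 15).
VALID_LEVELS = {0, 1, 2, 3, 4, 5, 6, 0xF}


def build_nal_debug_reg(level=0xF, log_file=None, components=COMPONENTS):
    """Return .reg text setting Level, the chosen components and,
    optionally, LogFileLocation."""
    if level not in VALID_LEVELS:
        raise ValueError("level must be one of 0-6 or F")
    unknown = set(components) - set(COMPONENTS)
    if unknown:
        raise ValueError(f"unknown component(s): {sorted(unknown)}")

    lines = ["Windows Registry Editor Version 5.00", "",
             f"[{DEBUG_KEY}]", f'"Level"=dword:{level:08x}']
    for name in components:
        lines.append(f'"{name}"=dword:00000001')
    if log_file:
        # Backslashes have to be doubled inside .reg string values.
        escaped = log_file.replace("\\", "\\\\")
        lines.append(f'"LogFileLocation"="{escaped}"')
    return "\r\n".join(lines) + "\r\n"
```

Write the returned text to a file and import it on the workstation with regedit. Remember to set Level back to 0 once you have your logs.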

USEFUL LINKS

a.k.a where I got the information from

The importance of launcher configuration

Enable debug logging in ZDM

Slow Nal start-up or refresh

November 9, 2009

Reducing cached memory usage, Linux high memory usage diagnosing and troubleshooting on Vmware and out of memory (Oom) killer problem and solution

Filed under: Linux,OS — raj2796 @ 3:27 pm

Diagnosing and troubleshooting high memory usage on SLES 11 under VMware suddenly became an issue for us when VMware alarms started triggering for our new WordPress journalism blog server, which runs SLES 11 under VMware 3.5 U3.

Surprisingly, the server only had a dozen users; surprising since a commercial WordPress provider I talked to had up to 50,000 hits per day and dozens of users on a box with 2 gigs of memory and no problems. Chances are it’s a memory leak, probably from a WordPress plug-in, that’s causing all the problems. However, this being a Linux server, there are other ways to manage the memory. The following is written for my staff to help bring them up to speed on memory and its troubleshooting.

MEMORY TROUBLESHOOTING

Start by checking your server has enough memory. If processes are dying unexpectedly, have a look at your /var/log/messages file and see if you are running out of memory or if processes are being killed off due to lack of memory.

I normally use the free command first to see how memory is being used. I like to use the -m flag to have the output formatted in megs to simplify reading the information, e.g.:

[Server] <<-PRODUCTION->> :~ # free -m
             total       used       free     shared    buffers     cached
Mem:          3777       3516        260          0        228       2921
-/+ buffers/cache:        366       3410
Swap:         2055          0       2055

I could go over the output in depth however there’s a really easy way to understand what’s happening,  just look at the line:

                          Used       Free
-/+ buffers/cache:        366       3410

The first value is how much memory is being used by applications and the second value is how much memory is free or can be freed for use by applications. As long as you have memory that can be used by applications you’re generally fine. Another thing to note in the output is the swap line:

             Total       Used       Free
Swap:         2055          0       2055

Swapping generally only occurs when memory usage is impacting performance, unless you manually change its aggressiveness, more on that later.
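If you want to check this from a script rather than by eye, here is a rough sketch that pulls the two interesting lines out of free -m output. It assumes the older output layout shown above, with a -/+ buffers/cache row (newer versions of free drop that row in favour of an "available" column), and the function names are my own.

```python
# Sketch: extract the "-/+ buffers/cache" and "Swap" rows from old-style
# `free -m` output. Function names are made up for illustration.

SAMPLE = """\
             total       used       free     shared    buffers     cached
Mem:          3777       3516        260          0        228       2921
-/+ buffers/cache:        366       3410
Swap:         2055          0       2055
"""


def parse_free(output):
    """Return {row label: [numbers]} for every 'label: n n n' row."""
    rows = {}
    for line in output.splitlines():
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # the header row has no colon
        try:
            rows[label.strip()] = [int(x) for x in rest.split()]
        except ValueError:
            continue  # e.g. a shell prompt line that happens to contain ':'
    return rows


def memory_summary(output):
    """Megabytes used by applications, available to them, and swap used."""
    rows = parse_free(output)
    used_by_apps, available = rows["-/+ buffers/cache"]
    swap_total, swap_used, swap_free = rows["Swap"]
    return {"used_by_apps": used_by_apps,
            "available": available,
            "swap_used": swap_used}
```

With the sample above, memory_summary(SAMPLE) reports 366 MB used by applications, 3410 MB available and no swap in use, which matches the reading described earlier.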

Swap

If your server is heavily using swap, things are bad: you’re running out of memory. The exception to this is where you have a distro with cache problems and may well decide to max out swappiness to reduce the problems the cache creates. To find the space dedicated to swap type:

more /proc/swaps

[Image: more /proc/swaps output]

To find your current level of swappiness type:

cat /proc/sys/vm/swappiness

[Image: cat /proc/sys/vm/swappiness output]

The default value is 60. However, different systems require different levels of swappiness; a server is not the same as a home computer. The value ranges between 0 and 100. At 100 the kernel will aggressively swap out inactive pages; at 0 it will avoid swapping for as long as it can and shrink the cache instead when applications want RAM, i.e. 0 means less likely to swap, 100 very likely. You can change the value by echoing a new one to the /proc/sys/vm/swappiness file, e.g.

echo 10 > /proc/sys/vm/swappiness

To change the default level on boot edit the /etc/sysctl.conf file (since kernel 2.6) e.g.

vm.swappiness = 10
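For anyone scripting this across several servers, a small sketch of the same two steps: read the current value and produce the sysctl.conf line. The 0 to 100 range check is the range given in this post, and the function names are my own invention.

```python
# Sketch: read the current swappiness and build the sysctl.conf line.
# Range check uses the 0-100 range described in this post.

def parse_swappiness(text):
    """Turn the contents of /proc/sys/vm/swappiness into a checked int."""
    value = int(text.strip())
    if not 0 <= value <= 100:
        raise ValueError("swappiness should be between 0 and 100")
    return value


def read_swappiness(path="/proc/sys/vm/swappiness"):
    """Read the live value (Linux only)."""
    with open(path) as handle:
        return parse_swappiness(handle.read())


def sysctl_conf_line(value):
    """Line to add to /etc/sysctl.conf so the value survives a reboot."""
    return f"vm.swappiness = {parse_swappiness(str(value))}"
```

For example, sysctl_conf_line(10) gives the same line as shown above.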

MEMORY PROCESS ALLOCATION

Along with other aspects of the server, virtual memory statistics can be reported with vmstat. Its main use for memory diagnosis is that it reports page-ins and page-outs as they happen. The best way to see this is to run vmstat with a delay, otherwise it just reports averages since the last boot. State the delay in seconds after the command, followed by the number of updates you wish to see, e.g. vmstat 2 4 runs vmstat with a 2 second delay and 4 updates.

[Image: vmstat output]

Read the man page for detailed info if need be, otherwise just look at:

free       –              free memory
si             –              memory swapped in from disk (page ins)
so           –              memory swapped out to disk (page outs)

Page ins are expected, e.g. when starting an application and its information is paged in.

Regular page outs are not wanted; occasional page outs are expected as the kernel frees up memory. If page outs occur so often that the server is spending more time managing paging than running apps, performance suffers. This is referred to as thrashing. At this point you could use top and ps to identify the processes that are causing problems.
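To make "regular page outs" a bit more concrete, here is a rough sketch that reads the si and so columns out of captured vmstat output and flags the case where every interval is swapping out. The sample output is made up for illustration, and the "every interval" rule is just my own crude assumption, not an official definition of thrashing.

```python
# Sketch: pull si/so out of `vmstat <delay> <count>` output.
# VMSTAT_SAMPLE is invented data, not output from a real server.

VMSTAT_SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 271872 233472 2989056    0    0     5    12   40   60  2  1 97  0  0
 0  1  10240  20480 233472 2989056  120  340   900  1500  400  800 10  5 60 25  0
 0  2  14336  18432 233472 2989056   80  200   700  1100  380  760  9  6 58 27  0
"""


def swap_activity(vmstat_output):
    """Return a list of (si, so) tuples, one per data row."""
    columns = None
    samples = []
    for line in vmstat_output.splitlines():
        fields = line.split()
        if "si" in fields and "so" in fields:
            columns = fields  # the column-name row (vmstat repeats it)
            continue
        if (columns and len(fields) == len(columns)
                and all(f.isdigit() for f in fields)):
            row = dict(zip(columns, map(int, fields)))
            samples.append((row["si"], row["so"]))
    return samples


def sustained_swap_out(samples, min_so=1):
    """True if every interval after the first swapped out at least min_so.

    The first row is skipped because vmstat reports averages since boot
    there, not current activity.
    """
    recent = samples[1:]
    return bool(recent) and all(so >= min_so for _, so in recent)
```

On the invented sample the first row is quiet but both later intervals swap out, so sustained_swap_out returns True and it would be time to go hunting with top and ps.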

To see where all your memory is going, the easiest way is to use the top command, then press M (shift+m) to sort by memory. Press q or ctrl+c to exit the top screen.

[Image: top sorted by memory]

For more detailed information you can always use ps aux and see which processes are using memory and how much. Apache and MySQL are normally top users, along with psad for busy web servers.

To sort the output of ps by memory you are supposed to be able to use:

ps aux --sort pmem

however I find this does not work on all flavours of Linux, so I prefer to use the sort command to sort by memory usage (the fourth column, %MEM):

ps aux | sort -n -k 4

Then if I just want to look at the top 10 memory hogs or the top memory hog, I do a further pipe and use the tail command, e.g. to find the 10 highest memory consuming processes:

ps aux | sort -n -k 4 | tail -10

[Image: ps aux output]
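The same "top memory hogs" idea can be done on captured ps aux output in a script. This is only a sketch: the sample output is invented, and it assumes the usual ps aux column order with %MEM as the fourth column.

```python
# Sketch: list the biggest memory users from `ps aux` output.
# PS_SAMPLE is made-up data; column order is assumed to be the usual
# USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND.

PS_SAMPLE = """\
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0   1064   400 ?        Ss   Oct01   0:02 init [3]
mysql     2301  1.2  8.5 310000 32800 ?        Sl   Oct01  12:40 /usr/sbin/mysqld --user=mysql
wwwrun    2450  0.5  3.1 120000 12000 ?        S    Oct01   3:10 /usr/sbin/httpd2-prefork -f /etc/apache2/httpd.conf
"""


def top_memory_processes(ps_output, count=10):
    """Return up to `count` (%MEM, PID, command) tuples, largest first."""
    procs = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 10)        # keep the command in one piece
        if len(fields) < 11:
            continue
        procs.append((float(fields[3]), int(fields[1]), fields[10]))
    procs.sort(reverse=True)
    return procs[:count]
```

Unlike the sort and tail pipeline, this puts the largest consumer first rather than last.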

If you want to monitor a process’s memory usage, look up the pid for the process and set up a cron job to write the output of the command ps ev --pid=<PID> to a file you can check later. If you want to check memory usage changes straight away, keep entering the command:

ps ev --pid=<PID>

Once you know the process that is responsible for the memory problems you can optimise it, or kill it. Here are a few common tricks for processes that can use a lot of memory:

Java

Java memory heaps need a limit to their sizes set by passing a -Xmx option, else the heap increases until you’re out of memory. Custom Java apps should be able to use the java command line option -XmxNNm, where NN = number of megs. With JBoss and Tomcat, check the settings in your relevant JBoss (48m to 160m recommended) or Tomcat files (48m to 96m recommended).

A rough way to work out the largest size you can set is to stop the Java processes, then look at the free -m output for the -/+ buffers/cache line as shown earlier and subtract the used from the free to allow for unexpected memory usage. The resultant number is the max memory you could set.

However, keep in mind these are just guidelines. It’s up to you to decide how high to set the memory limit for the heap, since only you really know how much memory you have on the server and how much memory the Java process needs.
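As a worked example of that rule of thumb, here is a tiny sketch that does the subtraction and formats the flag. The figures in the usage note are simply the ones from the free -m sample earlier in the post, used as if Java had been stopped, and the function names are hypothetical.

```python
# Sketch: the "free minus used" ceiling for -Xmx described above.

def max_heap_mb(free_mb, used_mb):
    """Upper bound for the heap in MB: free minus used from the
    -/+ buffers/cache line, taken while the Java processes are stopped."""
    ceiling = free_mb - used_mb
    if ceiling <= 0:
        raise ValueError("no headroom left for a Java heap")
    return ceiling


def xmx_flag(free_mb, used_mb):
    """Format the ceiling as a java command line option."""
    return f"-Xmx{max_heap_mb(free_mb, used_mb)}m"
```

With 3410 MB free and 366 MB used, xmx_flag(3410, 366) gives -Xmx3044m. Treat that as a ceiling, not a recommendation.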

Apache

When Apache loads it starts multiple servers and distributes the traffic amongst these ‘servers’. The memory usage can grow large as each loads libraries for PHP and Perl. You can adjust the number spawned with the settings:

StartServers
MinSpareServers
MaxSpareServers

These are in the httpd config file. However, depending on the distro you might need to adjust the prefork values; google for your OS. The MaxClients value can be worked out by finding the memory usage of the largest Apache process, stopping Apache, looking at free memory and dividing the free memory by the memory usage of that largest Apache process. Apache has default configurations for small, medium and large servers. For many of you out there hosting your own low traffic site, you’ll get better performance using the settings optimised for small servers.
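The MaxClients sum is simple enough to do in your head, but here is a sketch of it anyway. The numbers in the usage note are invented, and the function name is my own.

```python
# Sketch: MaxClients = free memory (with Apache stopped) divided by the
# memory used by the largest Apache process, rounded down.

def max_clients(free_mb, largest_process_mb):
    """Whole number of Apache processes that fit in the free memory."""
    if largest_process_mb <= 0:
        raise ValueError("largest process size must be positive")
    return int(free_mb // largest_process_mb)
```

For example, with 3000 MB free and a largest Apache process of 25 MB, max_clients(3000, 25) comes out at 120. Leave yourself some margin below that for everything else on the box.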

SQL

However in some cases the problem is down to the cache.

Reducing cached memory

Linux memory management tries to minimise disk access. To do this it will use any unused ram to cache, this is because reading from disk is slow compared to reading from memory. When the cache is used up the data that has been there the longest is freed, theoretically data that is used often will not be removed whilst data that is no longer needed slowly gets moved out of the cache. When an application needs memory the kernel should reduce the size of the cache and free up memory. This is why people sometimes get confused when using the free command, since linux uses memory for cache it can appear to the untrained eye that most of the memory has been used up. This is in fact normal; it’s when the server can no longer free memory from the cache that problems occur.

Freeing cache memory therefore does not usually make your computer faster, but the converse: Linux becomes slower having to re-read information into the cache. Ironic then that some of the latest Linux distros, namely SUSE and Mandriva, seem to have forgotten this. There are numerous reports of these, and other Linux distros, deciding cached memory is too important to free up for actual processes. Luckily a solution was added in kernel 2.6.16, allowing us to free cached memory by writing to /proc/sys/vm/drop_caches. There are three options depending on what you need to do: free the pagecache; free dentries and inodes; or free pagecache, dentries and inodes. We run sync first to ensure all cached objects are freed, as this is a non-destructive operation and dirty objects are not freed:

To free pagecache:

sync; echo 1 > /proc/sys/vm/drop_caches

To free dentries and inodes:

sync; echo 2 > /proc/sys/vm/drop_caches

To free pagecache, dentries and inodes:

sync; echo 3 > /proc/sys/vm/drop_caches

You can automate these in a cron job e.g. hourly if you have the misfortune to use a distro with problems.
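If you would rather call this from a Python maintenance script than a shell one-liner, here is a minimal sketch of the same sync-then-write sequence. The mode names are my own labels for the three values above; it needs root to write to the real file, and os.sync is only available on Unix with Python 3.3 or later.

```python
# Sketch: sync, then write 1, 2 or 3 to /proc/sys/vm/drop_caches.
import os

# Hypothetical labels for the three documented values.
MODES = {"pagecache": "1", "dentries_inodes": "2", "all": "3"}


def drop_caches(mode="all", path="/proc/sys/vm/drop_caches"):
    """Flush dirty pages to disk, then ask the kernel to drop clean caches.

    Returns the value written. `path` is a parameter only so the function
    can be tried out against a harmless file first.
    """
    value = MODES[mode]  # KeyError for an unknown mode
    os.sync()            # same job as running `sync` first
    with open(path, "w") as handle:
        handle.write(value + "\n")
    return value
```

A dry run such as drop_caches("pagecache", os.devnull) goes through the motions without touching the kernel setting.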

Another issue with cache is that if you copy a large amount of data, e.g. a file tree, the copied data will end up in the cache flushing out your existing cache. There is  an interesting article on improving linux performance by selectively preserving cache state at:

http://insights.oetiker.ch/linux/fadvise/

OOM – 32 bit system memory problems (64 bit safe)

If you are running 32 bit Linux and have plenty of memory then you might be a victim of the out of memory (OOM) killer. However, in 64 bit Linux all memory is low memory, so you are safe from this particular problem, and out of memory errors really are down to running out of memory!

SOLUTION:

Oom problems can be easily solved by:

running the hugemem kernel

editing /etc/sysctl.conf with the below line to make the kernel more aggressive about recovering low memory:

vm.lower_zone_protection = 250

or finally editing /etc/sysctl.conf to disable oom on boot with the line:

vm.oom-kill = 0

CAUSE:

Oom kills processes on servers even when there is a large amount of memory free.  Oom problems are caused by low memory exhaustion.  Systems that are victim to Oom suffer more as memory is increased since they have kernels where memory allocation is tracked using low memory, so the more memory you have the more low memory is used up and the more you have problems. When low memory starts running out Oom starts killing processes to keep memory free!

DIAGNOSIS

To check low and high memory usage, use the commands below, though the info is from a 64 bit system since I’m sensible :)

[Server] <<-PRODUCTION->> :~ # egrep 'High|Low' /proc/meminfo
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:      3868296 kB
LowFree:        271872 kB

[Server] <<-PRODUCTION->> :~ # free -lm
             total       used       free     shared    buffers     cached
Mem:          3777       3512        265          0        228       2919
Low:          3777       3512        265
High:            0          0          0
-/+ buffers/cache:        364       3413
Swap:         2055          0       2055
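For keeping an eye on low memory over time, here is a rough sketch that parses /proc/meminfo text and reports how much of the low zone is free. The sample is the egrep output above. Note that the HighTotal and LowTotal lines may not appear at all on some kernels, so the code treats them as optional. The function names are my own.

```python
# Sketch: parse /proc/meminfo and summarise low memory.

MEMINFO_SAMPLE = """\
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:      3868296 kB
LowFree:        271872 kB
"""


def parse_meminfo(text):
    """Return {field name: integer value} for each 'Name: value' line."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def low_memory_summary(info):
    """Whether a highmem zone exists and what percentage of low memory is free."""
    low_total = info.get("LowTotal", 0)
    low_free_pct = None
    if low_total:
        low_free_pct = round(100 * info.get("LowFree", 0) / low_total, 1)
    return {"has_highmem": info.get("HighTotal", 0) > 0,
            "low_free_pct": low_free_pct}
```

On a live box you would feed it open("/proc/meminfo").read() instead of the sample.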

DETAILED MEMORY INFORMATION

To obtain detailed memory information type cat /proc/meminfo e.g.:

[Image: cat /proc/meminfo output]

I was going to type something up when I found a nice explanation on Red Hat’s site, which I’ve quoted and amended where relevant below:

http://www.redhat.com/advice/tips/meminfo.html

The information comes in the form of both high-level and low-level statistics. First we will discuss the high-level statistics.

High-Level Statistics

[Image: high-level statistics from /proc/meminfo]

MemTotal: Total usable ram (i.e. physical ram minus a few reserved bits and the kernel binary code)
MemFree: Is sum of LowFree+HighFree (overall stat)
Buffers: Memory in buffer cache. mostly useless as metric nowadays
Cached: Memory in the pagecache (diskcache) minus SwapCache
SwapCache: Memory that once was swapped out, is swapped back in but still also is in the swapfile (if memory is needed it doesn’t need to be swapped out AGAIN because it is already in the swapfile. This saves I/O)

Detailed Level Statistics

VM Statistics

[Image: VM statistics from /proc/meminfo]

VM splits the cache pages into “active” and “inactive” memory. The idea is that if you need memory and some cache needs to be sacrificed for that, you take it from inactive since that’s expected to be not used. The vm checks what is used on a regular basis and moves stuff around.

When you use memory, the CPU sets a bit in the pagetable and the VM checks that bit occasionally, and based on that, it can move pages back to active. And within active there’s an order of “longest ago not used” (roughly, it’s a little more complex in reality). The longest-ago used ones can get moved to inactive. Inactive is split into two in the above kernel (2.4.18-24.8.0). Some have it three.

Active: Memory that has been used more recently and usually not reclaimed unless absolutely necessary.
Inactive — The total amount of buffer or page cache memory, in kilobytes, that are free and available. This is memory that has not been recently used and can be reclaimed for other purposes.

Memory Statistics

[Image: memory statistics from /proc/meminfo]

HighTotal: is the total amount of memory in the high region. Highmem is all memory above (approx) 860MB of physical RAM. Kernel uses indirect tricks to access the high memory region. Data cache can go in this memory region.
LowTotal: The total amount of non-highmem memory.
LowFree: The amount of free memory of the low memory region. This is the memory the kernel can address directly. All kernel datastructures need to go into low memory.
SwapTotal: Total amount of physical swap memory.
SwapFree: Total amount of swap memory free.

Dirty — The total amount of memory, in kilobytes, waiting to be written back to the disk.
Writeback — The total amount of memory, in kilobytes, actively being written back to the disk.
Mapped — The total amount of memory, in kilobytes, which have been used to map devices, files, or libraries using the mmap command.

Slab — The total amount of memory, in kilobytes, used by the kernel to cache data structures for its own use.
Committed_AS — The total amount of memory, in kilobytes, estimated to complete the workload. This value represents the worst case scenario value, and also includes swap memory.

[Image: page table and hugepage statistics from /proc/meminfo]

PageTables — The total amount of memory, in kilobytes, dedicated to the lowest page table level.
VMallocTotal — The total amount of memory, in kilobytes, of total allocated virtual address space.
VMallocUsed — The total amount of memory, in kilobytes, of used virtual address space.

VMallocChunk — The largest contiguous block of memory, in kilobytes, of available virtual address space.
HugePages_Total — The total number of hugepages for the system. The number is derived by dividing Hugepagesize by the megabytes set aside for hugepages specified in /proc/sys/vm/hugetlb_pool. This statistic only appears on the x86, Itanium, and AMD64 architectures.
HugePages_Free — The total number of hugepages available for the system. This statistic only appears on the x86, Itanium, and AMD64 architectures.

Hugepagesize — The size for each hugepages unit in kilobytes. By default, the value is 4096 KB on uniprocessor kernels for 32 bit architectures. For SMP, hugemem kernels, and AMD64, the default is 2048 KB. For Itanium architectures, the default is 262144 KB. This statistic only appears on the x86, Itanium, and AMD64 architectures.

October 15, 2009

Vmware Netware Tivoli Slow backup performance tuning parameters for NSS and TSAFS.NLM on an EVA 4400

Filed under: edir,eva,Netware,Software,Tivoli — raj2796 @ 11:57 am

Before I cover what works for me, I have posted below the official TID on this issue, since different people will have differing environments/versions/setups to myself and will find this useful:

There are many issues that can affect backup/restore performance. There is tuning that can be done on the server and NSS volumes. These are only ballpark figures. The server must be benchmarked to find the optimum settings.

These two parms must be set in c:\nwserver\nssstart.cfg. Make sure there are no typos or NSS won’t load. Nssstart.cfg is not created by default.
/AuthCacheSize=20000
/NumWorkToDos=100
These parms can be set in AUTOEXEC.NCF. Note: If these are placed in this file they must start with NSS. For example – nss /ClosedFileCacheSize=2000. They can also be placed in the C:\NWSERVER\NSSSTART.CFG and there they would be used without the NSS in the beginning.

/ClosedFileCacheSize=200000
/MinBufferCacheSize=20000
/MinOsBufferCacheSize=20000
/CacheBalanceMaxBuffersPerSession=20000
/CacheUserMaxPercent=70
/AllocAheadBlks=63
/NameCacheSize=200000
/NoCopyBuffersOnXlatch
/ReadAheadBlks=:64 — on NetWare 6.5 boxes. A line must be added for each volume. This sets a count for the number of 4k blocks to read with each request. In this case, 256k at a time.
These settings are ballpark figures. They may need to be adjusted depending on how much ram the server has.
Setting these too high can cause excessive memory usage and can affect other apps as well as performance. The closed file cache size and the name cache size, if set too high, can cause NSS.NLM to take excessive amounts of memory. These can help performance, but experience shows that there are usually several problems that add up to one big problem. Setting these two parms too high can actually degrade performance. If the server has about 2 gig or less, then the default of 100000 should be used.
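The one rule in the TID that is easy to trip over is where each switch can live: AuthCacheSize and NumWorkToDos only in nssstart.cfg, the rest in either file but prefixed with nss when placed in AUTOEXEC.NCF. Here is a small sketch of that rule for anyone generating these files from a script. The function name is hypothetical, and the switch values are just the ballpark figures quoted above.

```python
# Sketch: turn NSS switches into AUTOEXEC.NCF lines, refusing the two
# that the TID says must go in c:\nwserver\nssstart.cfg.

STARTUP_ONLY = {"authcachesize", "numworktodos"}


def autoexec_lines(switches):
    """Prefix each switch with 'nss ' as AUTOEXEC.NCF requires."""
    lines = []
    for switch in switches:
        name = switch.lstrip("/").split("=")[0].lower()
        if name in STARTUP_ONLY:
            raise ValueError(f"{switch} must be set in nssstart.cfg")
        lines.append("nss " + switch)
    return lines
```

For example, autoexec_lines(["/ClosedFileCacheSize=200000", "/NameCacheSize=200000"]) gives the two nss lines ready to paste into AUTOEXEC.NCF.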

1. Make sure you have the latest updates for the tape software.
2. Faster hardware can make a big difference.
3. The type of data can make a huge difference. Lots of small files will slow down performance, especially if they’re all in one directory. The backup will spend more time opening, scanning and closing files rather than reading data. If there are more large files mixed in with the smaller ones, then performance can increase because more time is spent reading data rather than opening files, which is what increases throughput.
4. Background processes like compression, virus scans and large data copies will slow performance down.
5. Virus scanners also can be an issue. They usually hook into the OS file system to intercept file opens so they can scan the files prior to backup. The virus scanner can be configured to run at some other time than the backup. This can also compound the problem if the files being scanned are compressed. The virus scanner can decompress them before scanning for viruses, which will slow things down even more. A good way to see if this is happening is to enable the NSS /COMPSCREEN at the server console during the backup to see if files are being decompressed.
6. Lots of open files will slow down performance. These are usually seen with the error FFFDFFF5. This means the file is open by some other application. If the tape software can be configured to skip open files until the end of the job rather than retrying to open them immediately, then performance can be increased as some tape software solutions, by default, will retry to open the locked file multiple times before moving on.
7. Backing up over the wire is slower than backups local to the server especially if most of the files are small files, 64k or less. If there is any LAN latency performance can take a significant hit. The wire is much slower at transferring data than reading the data directly from the disk. One thing that may help is to

set tcp nagle algorithm=off
set tcp delayed acknowledgement=off
set tcp sack option=off

on both host and target servers.

tsatest can be used to determine if the lan is a bottleneck. There is more information about tsatest below.

8.

- Make sure you have the latest disk drivers and firmware updates for your HBAs. There have been issues where performance was increased greatly by later firmware/drivers.
- Use the tsatest.nlm utility on different LAN segments to see if there is a problem. This tool now ships with tsa5up19.exe. Tsatest can be used to test the throughput on the wire and on the server itself to see if the LAN could be a bottleneck. Tsatest is also useful because it does not require a tape drive, so the tape drive can be eliminated as a possible problem as well.
- Make sure you have the latest tsa files.

- Raid5 systems with a small stripe size can also be a problem. Check the configuration of the disk storage or SAN. If using a raid system, a larger stripe size can help performance.

- Creating one large LUN on the raid rather than several smaller ones can result in significant performance loss. It’s faster to have multiple LUNs with the volumes/data spread out over them.

- Make sure you have the latest bios/firmware updates for your server.

- There have been issues where full backups are fast and incremental/differential backups are slow. This can happen because of the tape software doing its own filtering on inc/diff backups rather than letting the tsafs.nlm do it. There is a parm in tsafs.nlm that can help this:

LOAD TSAFS /NOCACHINGMODE

This will disable the read ahead cache for tsafs.nlm so that files are not cached unnecessarily during inc/diff backups. You can re-enable this cache when doing full backups:

LOAD TSAFS /CACHINGMODE

This is a load time parameter so you could create a script that would load/unload tsafs accordingly.

Tsafs can also be tuned as well. Once tsafs is loaded, typing tsafs again at the server console will show what most of the parameters are set for. If most of the data consists of small files, then make a best estimate as to what the mean file size is. That will help in determining what the best size of the read buffers should be. Tsafs could then be tuned to favor smaller files with the:

tsafs /ReadBufferSize=16384

That would set the read buffers for tsafs to 16k. If the mean file size is 16k or less, that would enable the tsafs to read the files with less read requests. Setting the nss cache balance to a lower percent would give tsafs more memory for caching files. If the mean file size is 64k or thereabouts, set the tsafs /readbuffersize=65536. The read buffers in the tape software could also be set to similar values.
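To make the "best estimate of the mean file size" less of a guess, here is a sketch that averages a list of file sizes and picks between the two buffer sizes the TID mentions. Treating everything above 16k as a case for 65536 is my own simplification, since the TID only gives the two examples, and the function names are hypothetical.

```python
# Sketch: estimate mean file size and suggest a tsafs /ReadBufferSize.
import os


def file_sizes(root):
    """Yield the size in bytes of every file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                yield os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable, skip it


def mean_file_size(sizes):
    """Arithmetic mean of an iterable of sizes in bytes."""
    sizes = list(sizes)
    if not sizes:
        raise ValueError("no files to average")
    return sum(sizes) / len(sizes)


def suggest_read_buffer(sizes):
    """16384 if the mean is 16k or less, otherwise 65536 (assumed cut-off)."""
    return 16384 if mean_file_size(sizes) <= 16384 else 65536
```

You could run suggest_read_buffer(file_sizes(path)) against a copy or a mapped drive of the volume to get a starting value, then benchmark from there.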

tsafs /cachememorythreshold=5

may help as well. There have been problems with memory setting this value too high. 10 would be a good place to start. The recommended setting is 1 for servers that have memory fragmentation problems. If the server has more memory, then even a setting of 1 would give tsafs more memory to cache file data.

– On servers that have 4 or 2 processors, the tsafs /readthreadsperjob=x can be set to 2 or 4. On machines with only one processor, set the /readthreadsperjob=1. Setting the /readthreadsperjob too high will result in performance loss.

-Tsatest is a good tool for finding out where potential bottlenecks are. This is an nlm that can be loaded on the target server for a local backup, or from another NetWare server over the wire. It’s a backup simulator that requires no special hardware, tape drives, databases, etc. By loading tsatest on the target server, the wire and tape software can be eliminated as potential bottlenecks. Throughput can be gauged and then a backup can be done over the wire to see if the lan could be slowing things down. For a complete listing of tsatest load line parameters, type tsatest /?. Usually it’s loaded like this:

load tsatest /s= /u= /p= /v=

individual paths can be specified as well. By default, tsatest will do full backups. An incremental backup can be specified by adding the /c=2 parameter to the load line. The sys:\etc\tsatest.log file can be created with the /log parameter. This file can be sent to Novell for analysis.
Backup/restore performance can be reduced when backing up over the lan. Sometimes up to 1 half of the performance can be lost due to lan latency alone. Tsatest is a good way to determine if that’s happening. Tests can be run on the target server itself and then the target server can be backed up over the wire from another NetWare server. The results can be compared.
For a good document on tsatest read:

http://developer.novell.com/ndk/doc/samplecode/smscomp_sample/tsatest/tsatest.html

Our renewed tivoli on netware problems arose when we started to migrate our users to the 9 new virtual netware 6.5 sp8 servers i built on a couple of eva 4400’s at our two sites. The virtual netware 6.5 sp8 servers are running on HP DL380g5’s with 32 gigs of ram. Each virtual server has 4 gigs of ram dedicated to it.

Utilising my previous experience with Tivoli and the problems it causes, I changed the tsafs parameters. To do this you first need to unload Tivoli on the NetWare servers via the command line:

type > unload dsmcad

enter confirmation on the tivoli screens

now you need to unload tsafs which is originally loaded via smsstart.ncf

type > smsstop.ncf

now that both tivoli and tsafs and related services are stopped navigate to the file

\\SYS\SYSTEM\smsstart.ncf

change the file from:

LOAD SMSUT.NLM
LOAD SMDR.NLM
LOAD TSAFS.NLM

to:

LOAD SMSUT.NLM
LOAD SMDR.NLM
LOAD TSAFS.NLM /NoCluster /NoCachingMode /noConvertUnmappableChars /CacheMemoryThreshold=10

now to restart backup services

type > smsstart

next restart Tivoli; change the commands if you’re not using a newer version of Tivoli, and remove the second line if you don’t use the web interface:

type > dsmcad -optfile=dsm.opt
type > dsmcad -optfile=dsm_gui.opt

CacheMemoryThreshold is set to the default of 10 on the servers, however they barely use any memory, as you can see in the memory usage chart for the server below. I might try increasing it to 25 to see if it speeds up backups. There are under a million files on each server at the moment, however they are only running at 40% load since we haven’t finished moving all the users onto them yet.

The changes I’ve listed above were made at the end of work yesterday. I changed the tsafs load parameters on the server shown below and it seems to have done the trick: backup times reduced by 11 hours! Copies of the backup schedule reports are below the memory diagram for those interested in speed increases and time reduction.

[Image: server memory usage chart]

Tuesday Night/Wednesday morning
10/14/2009 11:21:00 — SCHEDULEREC STATUS BEGIN
10/14/2009 11:21:00 Total number of objects inspected: 814,790
10/14/2009 11:21:00 Total number of objects backed up: 20,070
10/14/2009 11:21:00 Total number of bytes transferred: 596.97 MB
10/14/2009 11:21:00 Data transfer time: 1,043.65 sec
10/14/2009 11:21:00 Network data transfer rate: 585.73 KB/sec
10/14/2009 11:21:00 Aggregate data transfer rate: 12.99 KB/sec
10/14/2009 11:21:00 Objects compressed by: 0%
10/14/2009 11:21:00 Elapsed processing time: 13:04:12
10/14/2009 11:21:00 — SCHEDULEREC STATUS END

Wednesday Night/Thursday morning
10/15/2009 00:36:48 — SCHEDULEREC STATUS BEGIN
10/15/2009 00:36:48 Total number of objects inspected: 821,288
10/15/2009 00:36:48 Total number of objects backed up: 15,844
10/15/2009 00:36:48 Total number of bytes transferred: 562.12 MB
10/15/2009 00:36:48 Data transfer time: 510.50 sec
10/15/2009 00:36:48 Network data transfer rate: 1,127.53 KB/sec
10/15/2009 00:36:48 Aggregate data transfer rate: 72.26 KB/sec
10/15/2009 00:36:48 Objects compressed by: 0%
10/15/2009 00:36:48 Elapsed processing time: 02:12:45
10/15/2009 00:36:48 — SCHEDULEREC STATUS END
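For anyone wanting to track this night by night, here is a small sketch that pulls the elapsed time out of each schedule report and works out the saving. The two report lines are copied from the logs above; the function names are my own.

```python
# Sketch: compare "Elapsed processing time" between two schedule reports.
import re

BEFORE = "10/14/2009 11:21:00 Elapsed processing time: 13:04:12"
AFTER = "10/15/2009 00:36:48 Elapsed processing time: 02:12:45"


def elapsed_seconds(report):
    """Return the elapsed processing time in a report, in seconds."""
    match = re.search(r"Elapsed processing time:\s*(\d+):(\d{2}):(\d{2})", report)
    if match is None:
        raise ValueError("no elapsed processing time found")
    hours, minutes, seconds = map(int, match.groups())
    return hours * 3600 + minutes * 60 + seconds


def format_hms(total_seconds):
    """Format a number of seconds as HH:MM:SS."""
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


saved = elapsed_seconds(BEFORE) - elapsed_seconds(AFTER)
```

Here format_hms(saved) comes out at 10:51:27, which is the "11 hours" quoted above.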

October 12, 2009

Rights Required for Novell Edir Subcontainer Administrators

Filed under: edir,Netware — raj2796 @ 2:39 pm

Rights required for Novell subcontainer administrators, to be assigned in eDirectory via ConsoleOne or iManager.

For security reasons, you might want to create one or more subcontainer administrators with sufficient rights to install or upgrade additional OES NetWare servers, without granting them full rights to the entire tree. A subcontainer administrator needs the following rights to install or upgrade a NetWare server in the tree:

• Supervisor right to the container where the server will be installed
• Read right to the Security container object for the eDirectory tree
• Read right to the NDSPKI:Private Key attribute on the Organizational CA object, which is located in the Security container
• Supervisor right to the W0 object located inside the KAP object in the Security container

These rights are typically granted by placing all administrative users in a Group or Role, and then assigning the above rights to the Group or Role.
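
If you keep a note of what has been assigned to the Group or Role, the list above can be turned into a simple checklist. The following is a minimal Python sketch under stated assumptions: it does not query eDirectory at all, the `REQUIRED_RIGHTS` list and `missing_rights` helper are hypothetical names made up for illustration, and the "granted" list is something you would type in by hand from ConsoleOne or iManager.

```python
# The four rights from the list above, as (right, target) pairs.
REQUIRED_RIGHTS = [
    ("Supervisor", "container where the server will be installed"),
    ("Read", "Security container object"),
    ("Read", "NDSPKI:Private Key attribute on the Organizational CA object"),
    ("Supervisor", "W0 object inside the KAP object in the Security container"),
]


def missing_rights(granted):
    """Return the required (right, target) pairs absent from `granted`."""
    granted = set(granted)
    return [pair for pair in REQUIRED_RIGHTS if pair not in granted]


# Hand-entered example: a group that only has the first two rights so far.
granted_to_group = [
    ("Supervisor", "container where the server will be installed"),
    ("Read", "Security container object"),
]

for right, target in missing_rights(granted_to_group):
    print(f"still needed: {right} on {target}")
```

A checklist like this is only as good as the notes fed into it, but it is handy when the same set of rights has to be repeated for several groups.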

Some of the products that can be selected to install along with OES NetWare require schema extensions of their own. Currently, only an administrator with rights at [Root] can extend the schema of an eDirectory tree; a subcontainer administrator does not have sufficient rights. One way to work around this is to have a root administrator install an OES NetWare server with all products selected. This takes care of extending the schema for every possible server configuration. Subcontainer administrators can then install or upgrade subsequent OES NetWare servers without worrying about schema extensions.

An easier method for extending the schema for OES products and services is to run the Schema Update task in Deployment Manager. This task extends the schema for the OES products you select for both the NetWare and Linux platforms.

By default, the first three servers installed in an eDirectory partition automatically receive a replica of the partition. To install a server into a partition that does not already contain three replica servers, the user must have either Supervisor rights at the [Root] of the tree or administrative rights to the container in which the server holding the partition resides.

Whilst this worked fine in our organisation for months, it has suddenly stopped working for some reason, despite no schema changes or user trustee changes. I’ll post more if we find out anything. Anyone else notice the fall in Novell’s share price over the last few years?

September 16, 2009

Netware user container moves using JRB utils and relevant changes for user subcontainers

Filed under: edir,Netware — raj2796 @ 10:03 am

Years ago we inherited a few thousand users, a few aging server rooms and a couple of schools that were located on another of the university’s campuses. Although I took over the 3com/Cisco network and upgraded it to the latest equipment (at that time 2950s) and our advanced configs, the server team never took over the home drive servers at the site, which today are still managed by another department. To cut a long story short, the other department wants us off their servers, so I’ve built half a dozen new virtualised 6.5 sp8 servers and we’re migrating the data over. Whilst I was doing work on the users I decided to split them up into smaller subcontainers, dividing them by the last digit of their usernames. A couple of things to watch out for in the subcontainers:

1 – Login scripts – the subcontainers need login scripts. Go to properties, then login scripts, and add an include for the parent container. This way each subcontainer inherits its default login script from the parent container, meaning there is only one script to update, which avoids mistakes from maintaining multiple copies of the same code. If you need subcontainer-specific login script changes, add them after the include statement.

2 – Inheritance levels for applications – check you are inheriting all relevant applications at the new container depth. Open ConsoleOne and select Tools – ZENworks Utilities – Application Launcher Tools – Show Inherited Applications.
If you are missing applications that are available at the parent container, select the subcontainer and view properties – ZENworks – Launcher Configuration. Now change the mode to view the object’s effective settings and note down the Set Application Inheritance Level (User) value. Change the mode to View/Edit object’s custom configuration and enter the new value for Set Application Inheritance Level (User). The new value is the previous value plus one for each extra level of subcontainer.

3 – Moving users – easy to script. Just use the getrest command and feed its output to move_obj, with delays between moves, e.g.

Display just the site2 staff who are not logged in, and log the output to a file on the C drive:
getrest .*.faculty.staff.site2.org na eq "none" /j /u /yc /l=c:\site2staff.log <- use the file for move_obj

I moved a few thousand users into the relevant subcontainers overnight without errors 🙂
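
To illustrate the split by last digit and the inheritance level sum from point 2, here is a minimal Python sketch. It makes several assumptions that are not from the JRB tools: that you have a list of users as leading-dot distinguished names, that the subcontainers are simply named 0 to 9 under the existing container, and that each extra level of subcontainer adds one to the inheritance level. It only works out where each user should go; it does not call getrest or move_obj.

```python
from collections import defaultdict


def target_container(dn):
    """Return the subcontainer for a user DN, chosen by the last digit of the
    username, or None if the username does not end in a digit.

    Assumes leading-dot names such as ".ab1234.faculty.staff.site2.org" and
    hypothetical subcontainers named "0" to "9" under the current container.
    """
    cn, _, parent = dn.lstrip(".").partition(".")
    if not cn or not cn[-1].isdigit():
        return None
    return f"{cn[-1]}.{parent}"


def plan_moves(dns):
    """Group user DNs by the subcontainer they should be moved into."""
    plan = defaultdict(list)
    for dn in dns:
        target = target_container(dn)
        if target is not None:
            plan[target].append(dn)
    return dict(plan)


def inheritance_level(previous_level, extra_depth=1):
    """New inheritance level: previous value plus one per extra subcontainer."""
    return previous_level + extra_depth


users = [
    ".ab1234.faculty.staff.site2.org",
    ".cd5674.faculty.staff.site2.org",
    ".ef9017.faculty.staff.site2.org",
]
for container, members in sorted(plan_moves(users).items()):
    print(container, len(members), "user(s)")
```

Usernames that do not end in a digit are skipped rather than guessed at, so they can be dealt with by hand.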

To move the actual data, just use JRB utils!

JRB - saviour of the netware sysadmins

September 11, 2009

Netware 6.5 sp8 and Vmware esx 3.5u4 compatibility problem

Filed under: Netware,vmware — raj2796 @ 11:13 am

It seems NetWare’s newest release, NetWare 6.5 sp8, has problems with VMware, in this case VMware ESX 3.5 u4. It is allegedly the last release of NetWare, and this time they mean it, though they really meant it when they said NetWare 6.5 sp6 was the last release.

We identified an easily repeatable error where the inclusion of a virtual floppy drive causes abends on NetWare 6.5 sp8 VMware servers after a “restart server” or a “reboot server” command is issued! As of this time neither VMware nor Novell have released TIDs on the problem or seem aware it exists, though this does raise the question of what kind of an idiot would still be adding floppy drives to VMware servers?

Novell Netware install guide
