Monday, August 29, 2011

How to extract an individual file from an AIX mksysb on tape

To extract an individual file from a mksysb on tape (files in a mksysb are archived relative to root, so give the path with a leading "./"):

# restore -s4 -xqvf /dev/rmt0.1 ./path/to/filename

Example, extracting /etc/passwd:

restore -s4 -xqvf /dev/rmt0.1 ./etc/passwd

How to enable full core dumps in IBM AIX

To enable full core dumps in IBM AIX:

# chdev -l sys0 -a fullcore='true'



How to find all the rpm packages installed on a particular date

To find all the RPM packages which were installed on a particular date:

# rpm -qa --queryformat "%{NAME}-%{VERSION}.%{RELEASE} (%{ARCH}) INSTALLED: %{INSTALLTIME:date}\n" | grep my_date

Example:

rpm -qa --queryformat "%{NAME}-%{VERSION}.%{RELEASE} (%{ARCH}) INSTALLED: %{INSTALLTIME:date}\n" | grep "29 Sep 2006"

To find the install date and time of an RPM package:

# rpm -qa --queryformat "%{NAME}-%{VERSION}.%{RELEASE} (%{ARCH}) INSTALLED: %{INSTALLTIME:date}\n" | grep rpm_package_name

If you want the epoch time rather than a human-readable date:

# rpm -qa --queryformat "%{NAME}-%{VERSION}.%{RELEASE} (%{ARCH}) INSTALLED: %{INSTALLTIME}\n" | grep rpm_package_name


Example:

rpm -qa --queryformat "%{NAME}-%{VERSION}.%{RELEASE} (%{ARCH}) INSTALLED: %{INSTALLTIME:date}\n" | grep libaio
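The raw INSTALLTIME above is a Unix epoch value. As a quick sketch of converting it back to a readable date, here is a GNU date example (the -d flag is not available in AIX's native date, so this assumes a Linux box or GNU coreutils; the epoch value itself is hypothetical):

```shell
# Convert an rpm INSTALLTIME epoch value into a readable date (GNU date only).
epoch=1159524104   # hypothetical INSTALLTIME value
date -u -d "@$epoch" +%Y-%m-%d   # prints 2006-09-29
```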

How to find the maximum supported logical track group (LTG) size of a disk in AIX

To find the maximum supported logical track group (LTG) size of a disk in IBM AIX, you can use the lquerypv command with the -M flag. The output gives the LTG size in KB.

# /usr/sbin/lquerypv -M hdisk#


Example:

/usr/sbin/lquerypv -M hdisk0

How to find the number of asynchronous i/o servers running in IBM AIX

To determine how many POSIX AIO servers (aios) are currently running, as root:

# pstat -a | grep -c posix_aioserver

To determine how many legacy AIO servers (aios) are currently running, as root:

# pstat -a | grep -c aioserver



How to find the values for asynchronous i/o in IBM AIX

To find the values for asynchronous I/O in IBM AIX:

# lsattr -El aio0 



How to find what level your IBM VIO Server is running at

To determine which level of Virtual I/O Server (VIOS) you're running:

1) Log in to the VIO partition as the user "padmin"

2) Issue the ioslevel command:

# ioslevel 



How to find the world-wide name (WWN) of a fibre-channel card in IBM AIX

To find the world-wide name (WWN) or network address of a fibre-channel (FC) card in IBM AIX:

First find the name of your fibre-channel cards:

# lsdev -Cc adapter | grep fcs

Then get the WWN (for fcs0 in this example):

# lscfg -vp -l fcs0 | grep "Network Address" 



How to mount a CD manually in AIX

To manually mount a CD in IBM AIX:

# mount -V cdrfs -o ro /dev/cd0 /cdrom



How to mount an ISO file as a filesystem in AIX

In AIX you "dd" the ISO file into a raw LV, then mount the LV as a filesystem.

Here are the steps for copying the ISO named "image.iso" into "/cd1iso", a JFS filesystem:

1. Create a filesystem with size slightly bigger than the size of the ISO image. Do NOT mount the filesystem:
# /usr/sbin/crfs -v jfs -g rootvg -a size=800M -m /cd1iso -A no -p ro -t no -a frag=4096 -a nbpi=4096 -a ag=8

2. Get the logical volume name associated with the new filesystem:
# lsfs | grep cd1iso (assume it is /dev/lv00)

3. dd the ISO image into rlv00 (raw lv00):
# dd if=image.iso of=/dev/rlv00 bs=10M

4. Change the /cd1iso stanza in /etc/filesystems to vfs=cdrfs and options=ro (read-only):

/cd1iso:
            dev             = /dev/lv00
            vfs             = cdrfs
            log             = /dev/loglv00
            mount           = false
            options         = ro
            account         = false

5. Mount the file system :
# mount /cd1iso

6. When finished, remove the filesystem:
# rmfs /cd1iso
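The size flag in step 1 has to be at least the ISO's size. Here is a small sketch of computing a suitable size in plain shell; the dd line just fabricates a stand-in "ISO" so the arithmetic can run anywhere, and on a real system you would point $iso at your actual image:

```shell
# Compute a crfs size value from the ISO's size, with a little headroom.
iso=/tmp/image.iso
dd if=/dev/zero of="$iso" bs=1024 count=700 2>/dev/null   # stand-in 700KB "ISO" for the example
bytes=$(wc -c < "$iso")
mb=$(( (bytes + 1048575) / 1048576 + 8 ))   # round up to whole MB, plus 8MB headroom
echo "crfs -v jfs -g rootvg -a size=${mb}M -m /cd1iso -A no -p ro"
```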



How to recover failed MPIO paths from an IBM VIO server on an AIX LPAR

If you have set up disks from 2 VIO servers using MPIO to an AIX LPAR, then you need to make some changes to your hdisks.

You must make sure the hcheck_interval and hcheck_mode are set correctly:

Example for default hdisk0 settings:

# lsattr -El hdisk0
PCM             PCM/friend/vscsi                 Path Control Module        False
algorithm       fail_over                        Algorithm                  True
hcheck_cmd      test_unit_rdy                    Health Check Command       True
hcheck_interval 0                                Health Check Interval      True
hcheck_mode     nonactive                        Health Check Mode          True
max_transfer    0x40000                          Maximum TRANSFER Size      True
pvid            00cd1e7cb226343b0000000000000000 Physical volume identifier False
queue_depth     3                                Queue DEPTH                True
reserve_policy  no_reserve                       Reserve Policy             True

IBM recommends a value of 60 for hcheck_interval, and hcheck_mode should be set to "nonactive".

To change these values (if necessary):

# chdev -l hdisk0 -a hcheck_interval=60 -P

# chdev -l hdisk0 -a hcheck_mode=nonactive -P

A reboot is then required for automatic path recovery to take effect.

If you did not set hcheck_interval and hcheck_mode as described above, or did not reboot, then after a path failure you will see the following even after the path is back online:
# lspath
Enabled hdisk0 vscsi0
Failed  hdisk0 vscsi1

To fix this, you would need to execute the following commands:

# chpath -l hdisk0 -p vscsi1 -s disable

# chpath -l hdisk0 -p vscsi1 -s enable

Now, check the status again:
# lspath
Enabled hdisk0 vscsi0
Enabled hdisk0 vscsi1


How to remove a dynamically allocated i/o slot in a DLPAR in IBM AIX

To remove a dynamically allocated I/O slot (it must be a "desired" component) from a partition on an IBM pSeries server:

1) Find the slot you wish to remove from the partition:

# lsslot -c slot
# Slot Description Device(s)
U1.5-P2/Z2 Logical I/O Slot pci15 scsi2
U1.9-P1-I8 Logical I/O Slot pci13 ent0
U1.9-P1-I10 Logical I/O Slot pci14 scsi0 scsi1

In our case, it is pci14.

2) Delete the PCI adapter and all of its children in AIX before removal:

# rmdev -l pci14 -d -R
cd0 deleted
rmt0 deleted
scsi0 deleted
scsi1 deleted
pci14 deleted

3) Now, you can remove the PCI I/O slot device using the HMC:

a) Log in to the HMC

b) Select "Server and Partition", and then "Server Management"

c) Select the appropriate server and then the appropriate partition

d) Right click on the partition name, and then on "Dynamic Logical Partitioning"

e) In the menu, select "Adapters"

f) In the newly created popup, select the task "Remove resource from this partition"

g) Select the appropriate adapter from the list (only "desired" adapters will appear)

h) Select the "OK" button

i) You should have a popup window which tells you if it was successful. 



Access Control Lists (ACLs) in AIX

I have a directory named "/data" and a user called "steve".

To enable an ACL on this directory and add specific permissions for a user, type:

# acledit /data

A screen like this will appear:

attributes:
base permissions
    owner(root):  rwx
    group(system):  r-x
    others:  r-x
extended permissions
    disabled

Using "vi" commands, change the extended permissions to "enabled" and add the specific entries, like this:

attributes: SGID
base permissions
    owner(root):  rwx
    group(system):  rwx
    others:  ---
extended permissions
    enabled
    permit   rwx     u:steve
    permit   r-x     g:group
    permit   rw-     u:test

r = read, w = write, x = execute
u: = user, g: = group
permit = grant access

After this, save the file (as you would in the "vi" editor).
To list the ACLs, type:

# aclget /data

To copy the ACL permissions from one directory to another, type:

# aclget /data | aclput /data2

Now we will collect all the ACL permissions in an output file called acldefs:

# aclget -o acldefs /data

Edit the acldefs file and make whatever changes you need, then save it; it will serve as the input for future ACL operations.

To apply the ACL in that file to another directory or file:

# aclput -i acldefs /data3

This puts the ACLs on the target with the values specified in the acldefs file.
Extended permissions:
AIXC ACL extended permissions allow the owner of a file to more precisely define access to that file. Extended permissions modify the base file permissions (owner, group, others) by permitting, denying, or specifying access modes for specific individuals, groups, or user and group combinations. Permissions are modified through the use of keywords.
The permit, deny, and specify keywords are defined as follows:
permit: Grants the user or group the specified access to the file
deny: Restricts the user or group from using the specified access to the file
specify: Precisely defines the file access for the user or group
If a user is denied a particular access by either a deny or a specify keyword, no other entry can override that access denial.
The enabled keyword must be specified in the ACL for the extended permissions to take effect.
The default value is the disabled keyword.
In an AIXC ACL, extended permissions are in the following format:

extended permissions:
    enabled | disabled
    permit   Mode   UserInfo...
    deny     Mode   UserInfo...
    specify  Mode   UserInfo...

Use a separate line for each permit, deny, or specify entry. The Mode parameter is expressed as rwx (with a hyphen (-) replacing each unspecified permission). The UserInfo parameter is expressed as u:UserName, g:GroupName, or a comma-separated combination of u:UserName and g:GroupName.
Note: If more than one user name is specified in an entry, that entry cannot be used in an access control decision because a process has only one user ID.
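As a concrete illustration of the format above, here is a hypothetical acldefs file for /data (the user and group names are just the examples from this post), written out so the permit entries can be counted; on AIX you would generate it with aclget and feed it back with "aclput -i acldefs":

```shell
# Write a hypothetical acldefs file in the aclget output format, then count
# the extended-permission "permit" entries it grants.
cat > /tmp/acldefs <<'EOF'
attributes:
base permissions
    owner(root):  rwx
    group(system):  rwx
    others:  ---
extended permissions
    enabled
    permit   rwx     u:steve
    permit   r-x     g:group
EOF
grep -c '^    permit' /tmp/acldefs   # prints 2
```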

Adding a default gateway in SUSE Linux

Changing the default route permanently in SUSE Linux:

To set the default route, add the following line to the file /etc/sysconfig/network/routes (create the file if it does not exist). Here <gateway-ip> is a placeholder for your gateway's IP address:

default <gateway-ip> - -

Adding a route using the route command

Route all traffic via a gateway reached through the eth0 network interface:

# route add default gw <gateway-ip> eth0

Adding a route using the ip command

Just like above, but with the ip command:

# ip route add default via <gateway-ip> dev eth0

Changing the IP address

You can change the IP address using the ifconfig command itself. To set the IP address, enter:

# ifconfig eth0 <ip-address> netmask <netmask> up
# ifconfig eth0

EMC Replication Technologies

EMC has many ways to replicate SAN data. Here is a short list of the technologies and where they're used:

- MirrorView: used on CLARiiON and Celerra arrays; comes in two flavors, /A (asynchronous) and /S (synchronous).
- Celerra Replicator: used to replicate data between Celerra unified storage platforms.
- RecoverPoint: an appliance-based continuous data protection solution that also provides replication between EMC arrays. It is commonly referred to as a "Tivo for the datacenter" and is the latest technology when it comes to data protection and replication.
Each of these replication technologies replicates LUNs between arrays, but they have different overhead requirements that you should consider.

- MirrorView requires 10 to 20% overhead to operate. So if you have 10TB of data to replicate, you are going to need an additional 1 to 2TB of storage space on your production array.
- Celerra Replicator can require up to 250% overhead. The number varies depending on what you are replicating and how you plan to do it. This means that 10TB of replicated data could require an additional 25TB of disk space between the production and DR arrays.
- While RecoverPoint is certainly used as a replication technology, it provides much more than that. The ability to roll back to any point in time (similar to a Tivo) provides the ultimate granularity for DR. This is accomplished via a write splitter that is built in to CLARiiON arrays. RecoverPoint is also available for non-EMC arrays.
- RecoverPoint can be deployed in three different configurations: CDP (local only, not replicated), CRR (remote), and CLR (local and remote).
  • CRR replicates data to your DR site, where your "Tivo" capability resides. You essentially have an exact copy of the data you want to protect/replicate and a "journal" which keeps track of all the writes and changes to the data. There is an exact copy of your protected data plus roughly 10 to 20% overhead for the journal, so 10TB of "protected" data would require around 11 to 12TB of additional storage on the DR array.
  • CLR is simply a combination of local CDP protection and remote CRR protection together. This provides the ultimate in protection and granularity at both sites, and requires additional storage at both sites for the CDP/CRR copy and the journal.
This is obviously a very simplified summary of replication and replication overhead.  The amount of additional storage required for any replicated solution will depend on the amount, type and change rate of data being replicated.  There are also many things to consider around bandwidth between sites and the type of replication, Sync or Async, that you need.
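The overhead figures above reduce to simple arithmetic. As a quick shell sanity check (the percentages are the ones quoted in this post):

```shell
# Extra capacity needed = data size (TB) x overhead percentage.
overhead() { echo $(( $1 * $2 / 100 )); }
overhead 10 20    # MirrorView at 20% of 10TB: 2TB extra
overhead 10 250   # Celerra Replicator at 250% of 10TB: 25TB extra
```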

AIX Security Expert (aixpert)

One of the important activities of a system administrator is keeping servers secure. That includes complying with health monitoring checks and other scans.

Defining baseline security settings for your current environment can be complicated, and applying baseline security settings to newly created systems is a difficult job for a system admin.

The Base line security includes
  1. User Settings
  2. Network settings
  3. Services and Daemons
  4. Root access
  5. File permissions

Creating a baseline security setting for the above list is tedious work, with a lot of time and manpower to be spent.
Instead, we can now use aixpert, a simple system hardening utility available for free on AIX 5.3 ML 03 and later.

In this document I'm going to go through setting up aixpert, creating a baseline security for your system, and gathering the proof for audit and the logs for aixpert.

What is aixpert?
aixpert is an AIX hardening utility which helps us secure the system and run checks with the help of predefined scripts. aixpert can be driven from the command line, SMIT, or WebSM. It has more than 300 AIX standard settings, defined at four levels: high, medium, low, and default.

Notable things about aixpert:
            It can capture a baseline security configuration in an XML file, which can be passed to other servers and implemented there.
            When we apply aixpert it creates an undo XML file, by which we can go back to the original settings from before aixpert was implemented.
            A security check against the baseline can be performed to identify any compromises.


Commonly used aixpert flags:

            -l sets the security level to {high|medium|low|default}
            -n writes the associated security level settings to a file (-o must be specified to name the file)
            -o stores the security output in a file
            -u undoes the security settings (uses the undo.xml created in the core directory)
            -c checks for failed baseline security settings and writes them to check_report.txt

Which security settings to use, and when:

            high: the server is exposed to the internet, e.g. web servers and other application servers connected to the internet; ftp and telnet are disabled.
            medium: the server is connected to the internal network only, without a highly secured firewall.
            low: the server is connected to the local network only and reaches the WAN through a highly secured firewall; ftp and telnet are enabled.
            default: the settings that come with AIX out of the box.


aixpert -l high -n -o /tmp/high_security.xml

Now you can edit the file /tmp/high_security.xml and remove the security settings which are not required for your environment. Each security setting in the XML file includes a description and the script used to implement it.

Once you have completed the editing you can consider it as a baseline security for your system.

Now apply the security settings to your system with:

aixpert -f /tmp/high_security.xml

If you find something has gone wrong, you can undo all the changes with:

aixpert -u

To check every day that security has not been compromised, use:

aixpert -c

You can find a log in the default location /etc/security/aixpert/check_report.txt, which lists all the compromised security settings:

# cat check_report.txt
Network option extendednetstats's value should be 1, but it is 0 now
User attribute rlogin in stanza root, should have value false, but its value is NULL now
Process ps is still running

Understanding How Cluster Quorums Work

Based on conferences that I have attended and E-mails that I receive, it always seems to me that when it comes to clustering, quorums are one of the most commonly misunderstood topics. In order to effectively administer a cluster though, you need to understand what a quorum is and you need to know about the various types of quorums. In this article, I will explain what a quorum is and what it does. Since this tends to be a confusing topic for a lot of people, I will attempt to keep my explanations as simple as I can.
Clustering Basics
Before I can really talk about what a quorum is and what it does, you need to know a little bit about how a cluster works. Microsoft server products support two main types of clustering: server clusters and network load balancing (NLB). The design philosophies behind these two types of clusters couldn't be more different, but the one thing that both designs share is the concept of a virtual server.
There are several different meanings to the term virtual server, but in clustering it has a specific meaning. It means that users (and other computers) see the cluster as a single machine even though it is made up of multiple servers. The single machine that the users see is the virtual server. The physical servers that make up the virtual server are known as cluster nodes.
Network Load Balancing
These two different types of clusters have two completely different purposes. Network Load Balancing is known as a share all cluster. It gets this name because an application can run across all of the cluster's nodes simultaneously. In this type of cluster, each server runs its own individual copy of an application. It is possible that each server can link to a shared database though.
Network Load Balancing clusters are most often used for hosting high demand Web sites. In a network load balancing architecture, each of the cluster's nodes maintains its own copy of the Web site. If one of the nodes were to go down, the other nodes in the cluster pick up the slack. If performance starts to dwindle as demand increases, just add additional servers to the cluster and those servers will share the workload. A Network Load Balancing cluster distributes the current workload evenly across all of the cluster's active nodes. Users access the virtual server defined by the cluster, and the user's request is serviced by the node that is the least busy.
Server Clusters
The other type of cluster is simply known as a server cluster. A server cluster is known as a share nothing architecture. This type of cluster is appropriate for applications that can not be distributed across multiple servers. For example, you couldn't run a database server across multiple nodes because each node would receive updates independently, and the databases would not be synchronized.
In a server cluster, only one node is active at a time. The other node or nodes are placed in a sort of stand by mode. They are waiting to take over if the active node should fail.
As you may recall, I said that server clusters are used for applications that can not be distributed across multiple nodes. The reason that it is possible for a node to take over running an application when the active node fails is because all of the nodes in the cluster are connected to a shared storage mechanism. This shared storage mechanism might be a RAID array, it might be a storage area network, or it might be something else. The actual media type is irrelevant, but the concept of shared storage is extremely important in understanding what a quorum is. In fact, server clusters are the only type of clustering that uses quorums; network load balancing does not use quorums. Therefore, the remainder of this discussion will focus on server clusters.
What is a Quorum?
OK, now that I have given you all of the necessary background information, let's move on to the big question. What is a quorum? To put it simply, a quorum is the cluster's configuration database. The database resides in a file named \MSCS\quolog.log. The quorum is sometimes also referred to as the quorum log.
Although the quorum is just a configuration database, it has two very important jobs. First of all, it tells the cluster which node should be active. Think about it for a minute. In order for a cluster to work, all of the nodes have to function in a way that allows the virtual server to function in the desired manner. In order for this to happen, each node must have a crystal clear understanding of its role within the cluster. This is where the quorum comes into play. The quorum tells the cluster which node is currently active and which node or nodes are in stand by.
It is extremely important for nodes to conform to the status defined by the quorum. It is so important in fact, that Microsoft has designed the clustering service so that if a node can not read the quorum, that node will not be brought online as a part of the cluster.
The other thing that the quorum does is to intervene when communications fail between nodes. Normally, each node within a cluster can communicate with every other node in the cluster over a dedicated network connection. If this network connection were to fail though, the cluster would be split into two pieces, each containing one or more functional nodes that can not communicate with the nodes that exist on the other side of the communications failure.
When this type of communications failure occurs, the cluster is said to have been partitioned. The problem is that both partitions have the same goal; to keep the application running. The application can't be run on multiple servers simultaneously though, so there must be a way of determining which partition gets to run the application. This is where the quorum comes in. The partition that "owns" the quorum is allowed to continue running the application. The other partition is removed from the cluster.
Types of Quorums
So far in this article, I have been describing a quorum type known as a standard quorum. The main idea behind a standard quorum is that it is a configuration database for the cluster and is stored on a shared hard disk, accessible to all of the cluster's nodes.
In Windows Server 2003, Microsoft introduced a new type of quorum called the Majority Node Set Quorum (MNS). The thing that really sets a MNS quorum apart from a standard quorum is the fact that each node has its own, locally stored copy of the quorum database.
At first, each node having its own copy of the quorum database might not seem like a big deal, but it really is because it opens the doors to long distance clustering. Standard clusters are not usually practical over long distances because of issues involved in accessing a central quorum database in an efficient manner. However, when each node has its own copy of the database, geographically dispersed clusters become much more practical.
Although MNS quorums offer some interesting possibilities, they also have some serious limitations that you need to be aware of. The key to understanding MNS is to know that everything works based on majorities. One example of this is that when the quorum database is updated, each copy of the database needs to be updated. The update isn't considered to have actually been made until over half of the databases have been updated ((number of nodes / 2) +1). For example, if a cluster has five nodes, then three nodes would be considered the majority. If an update to the quorum was being made, the update would not be considered valid until three nodes had been updated. Otherwise if two or fewer nodes had been updated, then the majority of the nodes would still have the old quorum information and therefore, the old quorum configuration would still be in effect.
The other way that a MNS quorum depends on majorities is in starting the nodes. A majority of the nodes ((number of nodes /2) +1) must be online before the cluster will start the virtual server. If fewer than the majority of nodes are online, then the cluster is said to "not have quorum". In such a case, the necessary services will keep restarting until a sufficient number of nodes are present.
One of the most important things to know about MNS is that you must have at least three nodes in the cluster. Remember that a majority of nodes must be running at all times. If a cluster has only two nodes, then the majority is calculated to be two ((2 / 2) + 1 = 2). Therefore, if one node were to fail, the entire cluster would go down because it would not have quorum.
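The majority arithmetic above is simple enough to sanity-check in a couple of lines of shell:

```shell
# The MNS majority rule, (number of nodes / 2) + 1, as a tiny shell function.
majority() { echo $(( $1 / 2 + 1 )); }
majority 5   # prints 3: three of five nodes must be up and in agreement
majority 2   # prints 2: losing either node in a two-node cluster loses quorum
```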
In this article I have explained the differences between a network load balancing cluster and a server cluster. I then went on to describe the roles that the quorum plays in a server cluster. Finally, I went on to discuss the differences between a standard quorum and a majority node set quorum.

migratepv VS replacepv

Replacepv simply moves all the logical partitions on one physical volume to another physical volume. The command is designed to make it easy to replace a disk in a mirrored configuration.

Migratepv is very similar. The biggest difference is that migratepv allows you to copy the LPs on a logical volume basis, not just on a physical volume basis. For instance, if you have a disk that has two logical volumes on it and you want to reorganize and put each logical volume on a different disk, migratepv can do it.

migratepv -l lv01 hdisk1 hdisk2
migratepv -l lv02 hdisk1 hdisk3

In this case, the logical partitions from logical volume lv01 are moved from hdisk1 to hdisk2, and the logical partitions from logical volume lv02 are moved from hdisk1 to hdisk3.

Enable load balance for MPIO devices in AIX and monitor the usage of load balance / Check which paths are available or missing

To configure the load balance of MPIO devices in AIX:

chdev -l hdisk1 -a reserve_policy=no_reserve -a algorithm=round_robin
Note: to run the above command, the disk should be detached from the VG.

To check the modified parameters:
lsattr -El hdisk1

Monitoring I/O traffic of Load balance:

iostat -a | grep fcs

Run nmon and use the "a" option to show the adapter I/O. If the paths are load balanced, both adapters should show roughly the same amount of busy time; if it is failover, one adapter will be busy and the other will be doing nothing.

To check which paths are available / missing with AIX MPIO:

lspath -l hdisk1 -s available -F"connection:parent:path_status:status"

lspath -l hdisk1 -F"connection:parent:path_status:status"
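The path_status and status fields make failed paths easy to spot programmatically. Here is a sketch of counting non-Enabled paths; canned sample output stands in for the real lspath command, which only exists on AIX:

```shell
# Count paths whose status is anything other than "Enabled".
lspath_output='Enabled hdisk1 vscsi0
Failed  hdisk1 vscsi1'
printf '%s\n' "$lspath_output" | awk '$1 != "Enabled" { n++ } END { print n+0 }'   # prints 1
```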

Virtualization Tricks - Four steps to implement and support virtualization on the System p platform

Today's competitive corporate environment requires nimble IT departments with the capability to respond quickly to changes in capacity and the use of innovative methods to reduce time to market for new applications and systems. Escalating costs for power, raised floor capacity and administrative requirements also drive the need to utilize technology in new ways to maximize a company's IT investment.

Virtualization is an excellent vehicle to address business needs while controlling costs. The IBM* System p* Advanced Power Virtualization (APV) offers advanced technology to facilitate server consolidation, reduce costs, increase utilization and adapt capacity to quickly meet demand. APV, standard on the p5* 590 and 595 servers, can be used to reduce the need for static adapters and respond to increasing capacity demands.
Increasing Utilization
A common benefit of server consolidation is curing the malady of underutilized servers. Most companies without virtualization and server consolidation report UNIX* utilization under 25 percent.
APV facilitates server consolidation by allowing rapid response to changes in memory or CPU as well as removing the need to physically move I/O or network adapters. System p customers typically drive 60- to 80-percent utilization using virtual CPUs, Micropartitioning* and capped/uncapped capacity techniques. Capacity Upgrade On Demand is designed to provide rapid deployment of spare CPUs. The unparalleled granularity for CPU virtualization that System p virtualization may provide places APV in a category of its own in increasing flexibility and driving better utilization of IT resources.
Cost savings can be realized not only in reducing the number of servers and network and I/O adapters, but also in reducing floor space, power and cooling. Some companies see significant reductions in their overall capacity spending, making cost reduction a significant benefit of virtualization.
The capability to quickly change course and put up a new, isolated partition provides businesses with significant competitive advantage over less flexible methods. The degree of isolation provided by the IBM POWER5* hypervisor translates into reduced risk for new applications and increased opportunities to test changes. Although virtualization may be a newer concept to the UNIX world and many users hesitate to implement test partitions within the same footprint as production, this technology was fine tuned on the IBM mainframe; 40 years of knowledge and lessons learned provide the System p platform with an edge with respect to delivering stability and reliability.
Because deciding how best to utilize virtualization may be daunting at first, we're outlining an easy, four-step approach to implementing and supporting virtualization on System p servers.
Step One
The first and easiest step is to review the number of CPUs on the system. Using this information, create a pool of CPUs and simply allow the hypervisor to distribute the workload most efficiently. This drives increased utilization and can produce significant software savings as well. Each LPAR can "cap" or limit the number of CPUs utilized, reducing the number of CPUs that must be licensed for a specific software package. One example might be a System p5 570 server with 16 CPUs running 20 LPARs, letting the system distribute the workload as needed.
Companies wishing to preserve some CPU capacity for upgrade on demand may choose larger machines with some capacity in reserve. It's also important when choosing the size of memory DIMMs to consider future expansion needs or requests. Smaller DIMMs that fill all available slots may be less expensive at the time, but having some unpopulated slots may be cheaper in the long run.
Step Two
The next step involves virtualizing the I/O by creating a pair of virtual I/O servers. Two virtual I/O servers provide redundancy. We place rootvg on a storage-area network (SAN) and boot from the SAN. This helps save on internal disk space, as SAN storage is more cost-effective and generally provides faster I/O throughput. On average, only two 2Gb Host Bus Adapter (HBA) cards are necessary. This should handle the workload and paging space requirements of 20 rootvg volumes. Since the majority of bandwidth is consumed during the boot process, there will be unused bandwidth on the HBAs unless all 20 LPARs were booting concurrently, which would be unlikely.
Another area of virtualization to consider is network administration and sharing network adapters. This can create a requirement for network backup, which we'll explore further. Examining any system for single points of failure (SPoF) is a crucial step as more components are virtualized. Although the current generation of the System p platform has superior reliability, availability and serviceability (RAS), server consolidation makes it necessary to provide the right availability architecture to support virtualization and consolidation levels.
Looking again at the p570 server running 20 LPARs, we configure two virtual I/O servers for load balancing and redundancy. The system administrator reviews other potential areas of concern (e.g., application availability, failover partitions and ensuring dual power feeds are coming from two different circuits to the system).
Most issues on well-designed systems are caused by sleep-deprived system administrators doing work at 2 a.m. Having two virtual I/O servers is one way to protect against that same sleep-deprived administrator typing the wrong command and bringing a single virtual I/O server down. A sample configuration is depicted in Figure 1.
The system with 20 LPARs needs eight HBAs to serve the rootvg for 20 partitions instead of requiring two for each partition. This removes 32 HBAs from the solution and may provide significant hardware cost savings as well as greater flexibility. With dual-port cards, the eight HBAs occupy only four PCI slots, which allows for greater future growth. This design allows the system to present a single logical unit number (LUN) per LPAR to both virtual I/O servers. Multipath I/O (MPIO) software (such as SDDPCM, PowerPath and HDLM) on the virtual I/O servers is utilized for load balancing. Using MPIO on the AIX* OS LPARs eliminates a potential SPoF by providing dual paths, one through virtual I/O server 1 and one through virtual I/O server 2. Some installations run all odd-numbered LPARs primarily through virtual I/O server 1 and even-numbered LPARs through virtual I/O server 2. If one of the virtual I/O servers is brought down for maintenance (or even brought down in error), this arrangement avoids the administrative overhead of dealing with stale disks in each of the LPARs.
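The dual-path setup can be verified from a client LPAR with lspath. In this sketch, hdisk0 is an assumed example device name, and the snippet is guarded so it degrades gracefully off AIX:

```shell
# lspath lists MPIO paths; a healthy dual-VIOS client shows two
# Enabled paths for each disk, one per vscsi adapter.
# hdisk0 is an assumed example device name.
if command -v lspath >/dev/null 2>&1; then
  paths=$(lspath -l hdisk0)
else
  paths="lspath not available (AIX-only command)"
fi
echo "$paths"
```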
Step Three
The third step involves examining the networking aspect of virtualization, which is conceptually very similar to disk virtualization. The complexity of the setup is determined by the network requirements. Beyond basic networking, the virtual I/O server provides advanced options for network administration, including Network Interface Backup and the Shared Ethernet Adapter (SEA).
Network Interface Backup (NIB) provides not only redundant network cards but also rudimentary load balancing across network I/O adapters: each adapter is defined and utilized, rather than having one active network card and an idle standby failover card. Combining this technique with AIX OS features provides network redundancy with fast recovery from network card failure.
If a network failure occurs, the traffic will route through one virtual I/O server. This is a cost-effective way to set up a virtual network, as it needs fewer network cards, but it requires more administration time because the feature must be set up on each LPAR, and each LPAR must be touched after a network failure to set the paths back to the original virtual I/O server. There are also some limitations: MTU sizes larger than 1500 and 802.1Q VLAN tagging aren't supported with NIB configurations. Figure 2 depicts a typical NIB configuration.
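On the client LPAR, a NIB configuration is an EtherChannel device with a backup adapter. This is a hedged sketch only: the adapter names and the ping address are placeholders, and the command runs only on AIX:

```shell
# Create an EtherChannel in NIB mode on a client LPAR: ent0 is the
# primary adapter, ent1 the backup, and netaddr is an address (usually
# the default gateway) pinged to detect a dead path. Adapter names and
# the address are placeholders -- adjust for your environment.
if [ "$(uname 2>/dev/null)" = "AIX" ]; then
  mkdev -c adapter -s pseudo -t ibm_ech \
    -a adapter_names=ent0 -a backup_adapter=ent1 -a netaddr=192.0.2.1
  nib_status="EtherChannel device created"
else
  nib_status="skipped: NIB/EtherChannel creation is AIX-only"
fi
echo "$nib_status"
```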
The second way to set up a virtual network is the SEA method. SEA supports VLAN tagging and large packets, and because it is set up on the virtual I/O servers, it is easier to manage. SEA requires backup network cards on the opposite virtual I/O server, but if the enterprise goal is the best highly available network at a lower cost, this may be the preferred choice. In our example of 20 LPARs with two separate VLANs, instead of having two network cards in each LPAR for 40 total network ports, we reduce the required number to eight network cards. Again, best practice is to deploy a schema that splits LPARs between virtual I/O servers, such as odd-numbered LPARs to virtual I/O server 1 and even-numbered LPARs to virtual I/O server 2. This SEA setup would leave four network cards sitting idle waiting for an event, but that's a low cost for such high availability. Figure 3 shows one possible configuration.
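An SEA is created on the Virtual I/O Server itself with mkvdev. The adapter names and PVID below are assumptions for illustration (physical ent0 bridged to virtual trunk adapter ent2), and the command is guarded since mkvdev exists only in the VIOS restricted shell:

```shell
# Run from the padmin shell on the VIO server: bridge physical ent0 to
# the virtual trunk adapter ent2; -defaultid sets the PVID used for
# untagged frames. Adapter names and the PVID here are assumptions.
if command -v mkvdev >/dev/null 2>&1; then
  mkvdev -sea ent0 -vadapter ent2 -default ent2 -defaultid 1
  sea_status="SEA created"
else
  sea_status="skipped: mkvdev exists only on a Virtual I/O Server"
fi
echo "$sea_status"
```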
Step Four
The fourth step looks at virtualizing the non-rootvg or datavg disks. On systems with very high, busy or volatile I/O, this may be the least desirable virtualization. A guideline to keep in mind: If your current system is very busy with I/O, you shouldn't virtualize the disk. System p hardware provides flexibility in this respect. Physical or virtual disk or networking options can be used in any of the LPARs. Start by virtualizing the rootvg to reduce the rootvg HBAs and further virtualize and reduce HBAs for datavg when warranted. Allocate four HBAs for each virtual I/O server in a 20 LPAR system and add the datavg to existing OS LUNs that are already defined for normal workloads (i.e., systems that aren't utilizing significant bandwidth).
Careful analysis of the current SAN environment and port utilization is necessary before planning out the transition, but it's relatively simple to move from physical to virtual when data is on a SAN. When capacity needs outgrow the bandwidth of the virtual subsystem, it's a matter of adding more dedicated HBAs and migrating back from virtual to physical. A hypothetical configuration is shown in Figure 4.
Vital Virtualization
Virtualization provides flexibility and cost savings and allows your System p platform to grow and change with business needs. Hopefully, this four-step process gives you a way to approach the adoption of this important technology. Virtualization is a powerful tool that provides the flexibility necessary for IT to evolve as the business changes.

How to mount an ISO image in AIX

First, create a file system with the crfs command:
# /usr/sbin/crfs -v jfs -g rootvg -a size=800M -m /cd1iso -A no -p ro -t no -a frag=4096 -a nbpi=4096 -a ag=8
This command creates the /cd1iso file system on the rootvg volume group.
Now dd the ISO image into the raw logical volume created by crfs (rlv00 here; check the crfs output or lsvg -l rootvg for the actual logical volume name):
# dd if=image.iso of=/dev/rlv00 bs=10M
Use chfs to change the file system type so the logical volume can be mounted as a CD-ROM file system:
# chfs -a vfs=cdrfs /cd1iso
# mount /cd1iso
# cd /cd1iso
When done, unmount and remove the file system:
# umount /cd1iso
# rmfs /cd1iso
This removes the /cd1iso file system, its entry in the /etc/filesystems file (created by the crfs command), and the underlying logical volume.

AIX default users / special users & Removing unnecessary default user accounts

System special user accounts
AIX® provides a default set of system special user accounts that prevents the root and system accounts from owning all operating system files and file systems.
Attention: Use caution when removing a system special user account. You can disable a specific account by inserting an asterisk (*) at the beginning of its corresponding line of the /etc/security/passwd file. However, be careful not to disable the root user account. If you remove system special user accounts or disable the root account, the operating system will not function.
The following accounts are predefined in the operating system:
The adm user account owns the following basic system functions:
Diagnostics, the tools for which are stored in the /usr/sbin/perf/diag_tool directory.
Accounting, the tools for which are stored in the /usr/sbin/acct and /var/adm directories.
The bin user account typically owns the executable files for most user commands. This account's primary purpose is to help distribute the ownership of important system directories and files so that everything is not owned solely by the root and sys user accounts.
The daemon user account exists only to own and run system server processes and their associated files. This account guarantees that such processes run with the appropriate file access permissions.
The nobody user account is used by the Network File System (NFS) to enable remote printing. This account exists so that a program can permit temporary root access to root users. For example, before enabling Secure RPC or Secure NFS, check the /etc/publickey file on the master NIS server to find a user who has not been assigned a public key and a secret key. As root user, you can create an entry in the database for each unassigned user by entering:
newkey -u username
Or, you can create an entry in the database for the nobody user account, and then any user can run the chkey program to create their own entries in the database without logging in as root.
The root user account, UID 0, through which you can perform system maintenance tasks and troubleshoot system problems.
The sys user owns the default mounting point for the Distributed File Service (DFS) cache, which must exist before you can install or configure DFS on a client. The /usr/sys directory can also store installation images.
System group is a system-defined group for system administrators. Users of the system group have the privilege to perform some system maintenance tasks without requiring root authority.

esaadmin - The Electronic Service Agent application automatically monitors and collects hardware problem information
invscout - The invscout program is "a setuid root application, installed by default under newer versions of IBM AIX, that surveys the host system for currently installed microcode or Vital Product Data (VPD)".

snapp - An extensible, XML-based application that provides a menu-driven interface for UNIX system administration tasks on a handheld device
Removing unnecessary default user accounts
During installation of the operating system, a number of default user and group IDs are created. Depending on the applications you are running on your system and where your system is located in the network, some of these user and group IDs can become security weaknesses, vulnerable to exploitation. If these users and group IDs are not needed, you can remove them to minimize security risks associated with them.
Note: You can remove unneeded users and group IDs from systems that do not undergo system updates (for example, CAPP/EAL4 systems). However, if you remove unneeded users and group IDs from AIX® systems that are updated, installation errors may occur during AIX update installations. To avoid these errors, use one of the following methods:
Instead of deleting the users, use the following command to lock those accounts so that users cannot log into the system:
chuser account_locked=true username
Before deleting a user, uninstall the fileset associated with that user. For example, if you plan to delete the uucp and nuucp users, remove the corresponding fileset before you delete the users.
The following table lists the most common default user IDs that you might be able to remove:
Table 1. Common default user IDs that you might be able to remove.
User ID       Description
uucp, nuucp   Owners of hidden files used by the uucp protocol. The uucp user account is used for the UNIX-to-UNIX Copy Program, which is a group of commands, programs, and files, present on most AIX systems, that allows the user to communicate with another AIX system over a dedicated line or a telephone line.
lpd           Owner of files used by the printing subsystem
guest         Allows access to users who do not have access to accounts
The following table lists common group IDs that might not be needed:
Table 2. Common group IDs that might not be needed.
Group ID      Description
uucp          Group to which the uucp and nuucp users belong
printq        Group to which the lpd user belongs
Analyze your system to determine which IDs are indeed not needed. There might also be additional user and group IDs that you might not need. Before your system goes into production, perform a thorough evaluation of available IDs.
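The lock-instead-of-delete advice above can be scripted. This is a sketch only: the user list is an example drawn from Table 1 and should be reviewed before running as root, and the loop is guarded because chuser is AIX-specific:

```shell
# Lock (rather than remove) default accounts from Table 1 so that later
# AIX update installs don't hit missing-user errors.
# The user list is an example -- review it before running as root.
locked=""
for u in uucp nuucp lpd guest; do
  if command -v chuser >/dev/null 2>&1; then
    chuser account_locked=true "$u" && locked="$locked $u"
  else
    locked="$locked (skipped $u: chuser is AIX-only)"
  fi
done
echo "locked:$locked"
```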

Accounts created by security components
When security components such as LDAP and OpenSSH are installed or configured, user and group accounts are created.
The user and group accounts created include:
Internet Protocol (IP) Security: IP Security adds the user ipsec and the group ipsec during its installation. These IDs are used by the key management service. Note that the group ID in /usr/lpp/ cannot be customized before the installation.
Kerberos and Public Key Infrastructure (PKI): These components do not create any new user or group accounts.
LDAP: When the LDAP client or server is installed, the ldap user account is created. The user ID of ldap is not fixed. When the LDAP server is installed, it automatically installs DB2®. The DB2 installation creates the group account dbsysadm. The default group ID of dbsysadm is 400. During the configuration of the LDAP server, the mksecldap command creates the ldapdb2 user account.
OpenSSH: During the installation of OpenSSH, the user sshd and the group sshd are added to the system. The corresponding user and group IDs must not be changed, because the privilege separation feature in SSH requires these IDs.

We remove snapp, invscout, ipsec, lp, and uucp along with the associated packages. Why IBM still insists on installing the SNAPP package is beyond me...
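For reference, that cleanup boils down to something like the following, run as root; rmuser -p also deletes the user's password and authentication attributes (it does not remove home directories), and the associated filesets would still need to be removed separately with installp. The user list reflects our environment, not a recommendation for every system:

```shell
# Hedged cleanup sketch: remove unneeded default users as root on AIX.
# rmuser -p also deletes the user's password/authentication attributes;
# it does not remove home directories. Guarded so it is a no-op off AIX.
removed=""
for u in snapp invscout ipsec lp uucp nuucp; do
  if command -v rmuser >/dev/null 2>&1; then
    rmuser -p "$u" && removed="$removed $u"
  else
    removed="$removed (skipped $u: rmuser is AIX-only)"
  fi
done
echo "removed:$removed"
```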