File Systems

January 31, 2012

An enhancement has been made available in the IBM i 7.1 and 6.1 releases that may help with integrated file system SAV performance. A new parameter, Asynchronous Bring (ASYNCBRING), has been added.

The ASYNCBRING support enables objects to be asynchronously brought into memory early so they will not need to be paged in when first accessed by the SAV processing. While results will vary, testing in the lab and by a few customers has shown some dramatic improvements - up to 60% faster save times in some cases. The performance gain depends on the directory structure, the number and size of objects saved, the amount of memory available, and the system configuration. This enhancement is available on both the SAV command and the QsrSave() API.

To use this new function, you will need to install the necessary PTFs:

One final note - the best performance improvement may be seen with a well-balanced directory tree in which all objects qualify for the save. In situations where a large number of objects reside in a single directory, few objects qualify for the save, or the system is memory constrained, performance may degrade with ASYNCBRING(*YES) specified.
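As a quick sketch, enabling the enhancement is a one-parameter change on the save; the device and path below are hypothetical:

SAV DEV('/QSYS.LIB/TAP01.DEVD') OBJ(('/home/myapp')) ASYNCBRING(*YES)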

I'd like to thank Jerry Simon from the Save/Restore team and David Bhaskaran from the BRMS team for their assistance in creating this blog article.

September 13, 2011

A few weeks ago, “iZone Shows You the Enhancements” described how IBM has been developing updates to IBM i software that get delivered via PTFs, and how those enhancements are being written about on the IBM i developerWorks website. Recently, additional topics have been added to the developerWorks site; here's a brief preview of two of those enhancements.

Have you ever wanted to search your spooled files for key words or phrases? How about searching text in IFS stream files? Do you need more than simple pattern matching in your search, such as word variations on the search terms?

In V1R2, OmniFind adds new interfaces to support text indexing and searching IBM i spool files and IFS stream files.

The article that has been published on the developerWorks site provides more detail about how to leverage the new search support in your applications. The product is available at no additional charge and the capabilities it provides should prove very useful for many applications.

Many customers run their Domino servers on IBM i. Domino on IBM i differentiates itself from Domino on other platforms in that you can install multiple versions of Domino on a single IBM i partition and configure multiple Domino partitions (instances) of each version, while on other platforms you usually install and configure only one Domino server instance per server. This makes IBM i a natural platform for enterprises that need to deploy multiple Domino servers.

But did you know that there is a set of APIs unique to Domino for IBM i that you can use to manage your Domino servers automatically? These APIs allow you to list all the Domino servers, retrieve information from a server, get or set values in a Domino server's Notes.ini, and so on. By using the Domino for IBM i APIs together with some of the Domino CL commands, you can easily write programs to manage your Domino servers automatically. If you are seeking such a solution, take a few minutes to explore this article written by Bin Yang and Shuang Hong Wang of the IBM i Domino team.

April 19, 2011

The Work Management team recently released PTF SI42845 for the 7.1 release that changes how IBM i manages jobs that exceed their CPU or storage limits.

The class object defines the processing attributes for a job. The routing entry in the subsystem description is used to determine which class object is used when a job is initiated. Two of these processing attributes within the class object are Maximum processing unit time (CPUTIME) and Maximum temporary storage allowed (MAXTMPSTG), which both have default values of *NOMAX. Prior to this recent PTF, if values were entered for these parameters, the job would be ended if one of the limits was hit. For the maximum processing unit time, the job would be ended with CPC1218 (Job ended abnormally); for the maximum temporary storage allowed, the job would be ended with CPC1217 (Job ended abnormally). The cause for each of these messages tells you whether the job ended abnormally due to the maximum CPU time being consumed or the maximum temporary storage limit being exceeded.

The system has no way of knowing whether a job is near the completion of its work when it ends the job at one of these limits. It's possible that, given a little more CPU time or temporary storage, the job would be able to run to completion. Because of the difficulty in predicting the upper CPU or temporary storage limits required by a job, along with the fact that the job would be ended when these limits were hit, many customers simply left these values at their default settings.

The recently released PTF changes this behavior: jobs are no longer ended when they exceed their maximum processing unit time or maximum temporary storage limit. Rather, the jobs are held. When a job is held by the system due to these conditions, a message is sent to the QSYSOPR message queue:

This change allows the system operator to determine whether the jobs should be ended or if they should be allowed to continue to run to completion.

If you want the jobs to continue to run, you must change the limit that was hit and then use the Release Job (RLSJOB) command (you can’t release a job that’s above the limit). To allow these values to be changed, the Change Job command and the Change Job APIs have been enhanced.

The Change Job (CHGJOB) command has been enhanced with two new parameters:

• Maximum CPU time (CPUTIME): The maximum CPU time parameter specifies the maximum processing unit time (in milliseconds) that the job can use. If the maximum time is exceeded, the job is held.
• Maximum temporary storage (MAXTMPSTG): The maximum temporary storage parameter specifies the maximum amount of temporary auxiliary storage (in megabytes) that the job can use. This temporary storage is used for storage required by the program itself and by implicitly created internal system objects used to support the job. (It doesn’t include storage for objects in the QTEMP library.) If the maximum temporary storage is exceeded, the job is held.
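For example, if a job has been held after reaching its CPU limit, you might raise the limit and then release the job; the job name and values here are hypothetical (CPUTIME is in milliseconds):

CHGJOB JOB(123456/QUSER/NIGHTLY) CPUTIME(7200000)
RLSJOB JOB(123456/QUSER/NIGHTLY)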

The Change Job (QWTCHGJB) API has been enhanced to support two new keys on the JOBC0100 and JOBC0200 formats:

This PTF makes it easier for you to protect your system from the effects of a runaway job that consumes more CPU or temporary storage than expected. Set these limits larger than what any job should legitimately use; because the job will be held rather than ended, the limits don’t need to be set perfectly. If either limit is hit, you can increase the limit with the Change Job command or API and then release the job to allow it to continue to run. If the new upper limit is hit, the system will once again hold the job.

With the change introduced with this PTF, you should start to move away from the default *NOMAX values and set appropriate limits. Particularly with the temporary storage limit, you can prevent a system outage by setting an upper limit on the class object for the maximum temporary storage that a job can use (but be sure to keep that limit lower than the amount of storage available on the system). With the new behavior of the job being held when the limit is hit, you have the capability to assess and determine the best action for the job.
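To set these limits in the class object itself, the Change Class (CHGCLS) command can be used; the class name and values below are illustrative only (CPUTIME in milliseconds, MAXTMPSTG in megabytes):

CHGCLS CLS(MYLIB/BATCHCLS) CPUTIME(14400000) MAXTMPSTG(102400)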

I'd like to thank Dan Tarara from the IBM i work management development team for his assistance in writing this blog article.

December 14, 2010

This week, I thought I’d write about some enhancements that have been made to the history log in recent releases. While these changes have been available for some time, they are well hidden because they were generally small changes on existing interfaces. The history log, as we all know, can be very valuable for understanding what has happened on the system, but it can also be a little overwhelming to deal with because of the potential volume of messages that get logged.

History Log Size - *DAILY

In V5R4, we made a little enhancement to the History Log Size (QHSTLOGSIZ) system value. A special value of *DAILY was added. This option allows for a new version of the history log to be created each day, rather than based upon size. If you have a busy system and reach the maximum number of records in a day, you will get a new log version when that occurs, so on these busy systems, you may have more than one log version a day even if you specify *DAILY for the system value.

I'll make an observation about the values allowed for the history log size. IBM i allows you to specify a size of 1. I have no idea why this is, but it’s silly! In fact, if you do specify 1 for the QHSTLOGSIZ system value, the system actually uses 10 – which is still ridiculously small. The default size is 5,000, and that is also too small for today's environments! I like the capability to have history logs created on a daily basis and that approach eliminates the need to play around with log size values.
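Switching to daily history log versions is a single system value change:

CHGSYSVAL SYSVAL(QHSTLOGSIZ) VALUE(*DAILY)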

History Log API

An API provides a formal programming interface to the history log; prior to its introduction, you had to read from the QHST files directly to programmatically access the history log messages. The API offers several advantages:

• You can retrieve many messages with one invocation of the API, as opposed to reading the files record by record.
• Using the API, you can filter on message severity or on message type; in addition, you can specify whether the filters include or omit the matching data.
• You can also identify specific message IDs that should be retrieved or omitted.
• The API can be used to get messages from the QHST message queue into the history log files. Sometimes you need to do a DSPLOG twice in order to see the latest messages, but that is not the case when using the API.

Watches

I wrote a blog on watches some time ago. Using watches, you can watch for messages that are sent to the history log. But be careful! Do not watch for common messages (such as job started/job ended messages), since those will be matched frequently. If you do need to watch for a commonly sent message, qualify it with comparison data so the match criteria are more selective.

If you have a need for real-time notification of messages sent to the history log, using watches is more effective than using the API or the DSPLOG command. Note that while Management Central has message monitors, you cannot monitor the history log messages with these message monitors; rather, you use file monitors to monitor the history log through Management Central. However, file monitors are more resource intensive than watches.
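As a rough sketch, a watch on a history log message qualified with comparison data might be started as follows; the session ID, exit program, message ID, and comparison data are all hypothetical, so check the STRWCH command help for the exact parameter details on your release:

STRWCH SSNID(HSTWATCH) WCHPGM(MYLIB/WCHEXIT) WCHMSG((CPF1164 NIGHTLY *MSGDTA)) WCHMSGQ((*HSTLOG))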

Print Second Level Message Text

Way back in V5R3, support was added to the Display Log (DSPLOG) command to allow you to print the second level text of messages from the history log. This was done by extending the keywords on the Output (OUTPUT) parameter to support *PRTSECLVL.
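For example, to print the history log messages, including the second-level text:

DSPLOG LOG(QHST) OUTPUT(*PRTSECLVL)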

December 07, 2010

There is a system value, Library Locking Level (QLIBLCKLVL), that you can use to control whether libraries that are in a job's library search list are locked. This system value is not new – it has been around since the V5R1 release – but it seems that it’s not very well known as I've talked with several people who’ve never heard of it.

The Library Locking Level system value has the default value of '1', which indicates that the libraries in the library search list are to be locked in each job. Each library in a job's library search list is locked when the job is started and the locks are released when the job ends.

As the number of active jobs on a system grows, the number of locks just for libraries can get very large. Consider an example where there are 1,500 user jobs and the library search list has 10 libraries – that's 15,000 locks held on the system just to protect those 10 libraries. This is not an efficient use of the system's locking table. This affects the performance of job initiation and termination when the locks are allocated and released; it also has an effect on overall system performance as there is a single lock table and additional overhead as this table grows very large.

While IBM recommends setting this system value to 0, to not lock the libraries, it could not make that value the default value because, without additional changes, setting this system value to 0 would allow a library to be deleted or renamed even when the library is in the library search list.

If you change the QLIBLCKLVL system value to 0, you need to take an additional step to prevent the libraries from being deleted or renamed—you just need one job to allocate the libraries you want to protect. This job can be set up as an autostart job that runs in the controlling subsystem; it would simply do an Allocate Object (ALCOBJ) on each library that needs to be protected, then loop on a Delay Job (DLYJOB) command. This job would remain active until the controlling subsystem is ended or the job is explicitly ended.
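A minimal sketch of such an autostart job's CL program follows; the library names are hypothetical, and each *SHRRD lock is enough to block the exclusive lock that a delete or rename would require:

PGM
             ALCOBJ     OBJ((APPLIB1 *LIB *SHRRD) (APPLIB2 *LIB *SHRRD))
 LOOP:       DLYJOB     DLY(86400)
             GOTO       CMDLBL(LOOP)
ENDPGM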

By using this strategy, you can protect the libraries from being deleted or renamed and you can greatly reduce the number of locks on the system.

One final point - the QSYSARB system job locks the libraries that are defined in the system library list (QSYSLIBL) and the user library list (QUSRLIBL) system values. IBM has a support article on this.

November 03, 2010

IBM i (and its predecessors) has had the capability to automatically identify and report software problems to IBM for many releases. This was first introduced as the “Software error logging” (QSFWERRLOG) system value.

The capability to automatically identify problems when they occur is something we call First Failure Data Collection (FFDC). The intent is the first time a problem occurs, the data necessary for problem determination is automatically collected. That data, for operating system problems, can then be sent to IBM for diagnostic purposes. The lofty goal is to identify and resolve problems the first time they happen without ever having to recreate the problem.

This is an admirable goal, but in reality it is very difficult to achieve. The challenge is that defects occur in unknown and unexpected places, simply because defects are unexpected failures and programmers don't intend for them to occur. As such, any capability to programmatically identify what data should be collected for a potential software bug is inherently limited: it requires predicting what the problem may be in order to program the data collection routines.

In the 5.4 release, we decided to take a new approach to our FFDC support; key to that new approach is the recognition that one can never predict where a problem may occur. As such, there needed to be a dynamic way to modify the parameters around the identification of potential problems and the data that is collected for those problems –– this new support for software problem reporting was called Service Monitor. With the introduction of Service Monitor in the 5.4 release, we moved to a design that is dependent upon a “policy” that identifies potential problems along with the data that should be collected for those problems; this policy can be updated dynamically. Service Monitor is automatically started and supported with the *LOG option of the QSFWERRLOG system value.

While the Service Monitor has existed for a few releases now, it seems that it is a relatively unknown feature. The biggest reason is probably due to the fact that there’s no high-level overview of the Service Monitor function within the Information Center. Bits and pieces of information can be found by reviewing APIs, but there’s no summarization of the function and its capabilities.

Service Monitor, then, is a policy-based software function that automatically identifies problems when they occur and takes the actions defined by the policy. It’s primarily used for problems within the operating system and licensed internal code.

In the 5.4 release, customers primarily noticed the presence of the Service Monitor function through the many QSRVMONxxxx jobs that run in the QUSRWRK subsystem; a job was started for each potential problem that could be reported to IBM. A common question I've heard is “What are all those jobs for?” In the 6.1 release, the design was changed to use prestart jobs rather than individual jobs, so the number of jobs has been reduced, but the Service Monitor function remains.

The policy file that is used by Service Monitor is maintained by the IBM Support Center in Rochester. As experience is gained with known problems, or when new problems are encountered, the policy file can be updated. The latest version of the policy file is downloaded to your system when you connect to IBM using Electronic Service Agent.

The policy file identifies the symptom of the problem, which can be a message, a licensed internal code (LIC) log (also known as a Vlog), or a product activity log (PAL) entry. The policy also identifies the action that is taken when that symptom occurs - that action could be to collect diagnostic data to send to IBM, to download a PTF that corrects the issue, or to take some other action.

Several months ago I wrote a blog about Watches. Service Monitor uses watches as the underlying mechanism to implement the automated notification mechanism for problems that can be detected. Service Monitor policies identify the conditions to watch for –– which can be messages, LIC logs and problem activity logs (PAL entries) –– and thus Service Monitor sets up a watch for each item in the policy file. When the watch condition is matched, Service Monitor has an exit program that gets invoked and uses the policy definition as the way to identify the actions that should be taken when the watch occurs.

If you use the WRKWCH command to look at the *SRVMON watches, you will find many active watches (unless you have changed the QSFWERRLOG system value to *NOLOG). By displaying individual watch entries, you can see the kinds of things that service monitor watches for –– messages, LIC logs or PALs. For example, on a 6.1 system, if you look at the SRVMON0003 watch, you can see that it’s monitoring for CPF1101 sent to the QSYSOPR message queue. CPF1101 is “Subsystem &1 had a function check.” You can imagine that IBM probably wants to know if a subsystem takes a function check and has some basic diagnostic information that should be reviewed if this situation occurs. When this message occurs, the policy indicates that the job log should be collected and the problem reported to IBM.
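To see these watches on your own system:

WRKWCH WCH(*SRVMON)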

Most of the policies used by Service Monitor are for LIC logs. Since LIC logs are commonly used to log diagnostic information for problems detected by the Licensed Internal Code, LIC logs are something easy to monitor for and then to automatically send the log data to IBM for review.

September 27, 2010

In the 7.1 release, IBM Systems Director Navigator was enhanced with what’s called Set Target System. This option is at the top of the list of available IBM i Management tasks.

When you select Set Target System, you get the following display. It allows you to specify the system you'd like to manage and to sign on to that target system.

A few key points when using Set Target System are:
• You only need to have the Web infrastructure set up on the partition that you initially sign on to. You don’t need this running on the target system that you’re connected to.
• Set Target System allows the target system to be at 5.4, 6.1, or 7.1. Thus, with this support, you can now manage a 5.4 partition with IBM Systems Director Navigator. Not all functions will be available to you if you’re managing a 5.4 partition (for example, the performance tasks are only available on 6.1 or later), but many of the tasks are available on 5.4.
• This is only available when you initially sign on to a 7.1 partition with IBM Systems Director Navigator.

After you’ve set the target system, you’ll see the following confirmation panel that identifies the release of the target system. Once you've set a target system, you’ll see the system name displayed at the top of the window on all subsequent panels; it should be clear which system you’re managing. If you go back and change to manage the local system, the target system name will be updated to reflect the fact you are now managing the local system.

What is the temporary file system? To understand, it’s helpful to know what happens with objects that
exist in a traditional (we sometimes call it permanent) file system.

The IFS must perform many disk operations to ensure the
integrity of the objects. For objects that are created only to be deleted a
short time later, the disk operations can slow down an application. The
temporary file system is a special type of user-defined file system (UDFS), which contains temporary objects that are automatically deleted by the system
when the system is restarted or the file system is unmounted. Since objects are temporary, the disk operations aren’t required, and applications that
use temporary objects may be able to increase their performance. A temporary
file system isn’t intended for objects that contain critical data that must
persist across an IPL.

Since creating and destroying objects in a temporary file system is faster, applications that create and destroy a large number of objects will see the most improvement in performance, due to the large number of disk operations that are no longer required. In lab testing, we've seen some applications that create and destroy a large number of objects complete in half the time when using the temporary file system. The results for each application depend on what other processing is performed.

One possible use of this support would be to mount a
temporary file system over the '/tmp' directory so that the behavior of that
directory would be more like other platforms. On IBM i, however, you’re not
limited to a single temporary file system.

Temporary UDFSs are created and can be used much like the
permanent UDFSs. The Create User-Defined FS (CRTUDFS) command is used to create
a temporary UDFS specifying the extension .tmpudfs for the name. Permanent
UDFSs must end with .udfs. For example:

CRTUDFS UDFS('/dev/QASP01/mynew.tmpudfs')

Only users who have *ALLOBJ special authority are allowed to create a temporary UDFS, and it must be created in the system auxiliary storage pool (ASP).

The Add Mounted FS (MOUNT) command is then used to mount
the temporary UDFS into the namespace. For example, to mount the temporary UDFS
created above over the directory '/mytemporaryfilesystem':
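Assuming the standard MOUNT parameters, the command would look something like this:

MOUNT TYPE(*UDFS) MFS('/dev/QASP01/mynew.tmpudfs') MNTOVRDIR('/mytemporaryfilesystem')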

Now any objects created under '/mytemporaryfilesystem'
will be temporary objects. The temporary UDFS may not be mounted as a read-only
file system. This makes sense because there are no objects in the temporary
UDFS when it’s mounted.

The Remove Mounted FS (UNMOUNT) command may be used to
manually unmount the temporary UDFS if you don’t wish to wait for an IPL of the
system. For example:

UNMOUNT TYPE(*UDFS) MNTOVRDIR('/mytemporaryfilesystem')

As stated earlier, unmounting a temporary UDFS will
delete all of the objects in it. If there are a large number of objects, this
could be a long-running operation and could affect other processes on the
system that are attempting to access mounted file systems, including
system-supplied file systems such as QSYS.LIB or QDLS. You can avoid this delay
by deleting all the objects first. Additionally, the unmount of a temporary
UDFS requires the same authorization as the Delete User-Defined FS (DLTUDFS)
command since it will be performing the same function. This means the user will
need enough authority to the objects within the UDFS to delete them. If the
user issuing the UNMOUNT command doesn’t have the required authority, the
command will fail and diagnostic messages will be sent to the joblog indicating
the cause of the failure as well as a message indicating how many objects were
removed.

It should also be pointed out that the Reclaim Storage
(RCLSTG) command automatically forces all mounted file systems to be unmounted,
including all UDFSs.

Users of temporary file systems must also be aware that
the storage for the objects isn’t accounted for against the owning user
profile's maximum storage allowed, nor against the process. Therefore, it’s
possible for a user to own objects whose storage totals more than the maximum allowed
for the user profile. The Retrieve Directory Information (RTVDIRINF) command can be used to manage this storage.

Objects in a temporary UDFS can be secured like any other
object in the IFS with one exception. Objects in a temporary UDFS may not be
secured with an authorization list. Otherwise, the owner and primary group may
be assigned as well as user and *PUBLIC authority.

Is a temporary file system a good fit for you? It’d be
impossible to list all the situations where a temporary file system may be
used, but if you (or your application) use integrated file system objects that
are considered temporary and contain noncritical data, then you may want to
consider using a temporary file system for those objects.

The author of this week's post is Margaret Fenlon.
Margaret is the team leader of the IFS and servers development team in the IBM
i development lab. Thanks, Margaret!

IBM Systems Magazine is a trademark of International Business Machines Corporation. The editorial content of IBM Systems Magazine is placed on this website by MSP TechMedia under license from International Business Machines Corporation.