AlphaServer ES47 7/1000 - Need a SUPER fast (big) io device!

I want to find a proven solution (a compatible disk and interface) using a big 1TB or 2TB disk (like the new ones used with PCs) that is recognized by OpenVMS, so that I can perform the image backup to it instead of to the tape drive, and then later on copy from there to tape.

The main purpose is to speed up the backup process of the databases, which takes an average of 5.5 hrs every night.

Please note that I have already done all kinds of things to improve the backup process with quotas, system parameters, etc.

I want the hardware solution.

Has anybody been able to use something like this, and/or a tape drive capable of backing up 1TB in, let's say, a 1 to 2 hr time frame, and compatible with OpenVMS?

Re: AlphaServer ES47 7/1000 - Need a SUPER fast (big) io device!

Hello Edmundo

Can you give us more information:
1. What is your OS version?
2. How much data do you back up every day?
3. What type of tape device do you have?
4. What is your storage subsystem? Is it external storage (EVA, MSA or some other storage) or some DAS?

Re: AlphaServer ES47 7/1000 - Need a SUPER fast (big) io device!

Edmundo,

I have to agree with Andy!

It takes:
1 extra device (set) such as you are using
1 shadowing license
a little modification of the bootstrap procedure (modify the MOUNT command(s) to use shadow set(s) instead of disk(s))
modification of the backup procedure (quiesce database; dismount 1 member of shadow set(s); reactivate database).

Your database interruption period will be seconds and, depending on the particular database, can be reduced to just lower performance if your database has online checkpointing capability.

Now you have (nearly) until your next backup schedule to copy the dissolved member(s) to external, and MOUNT the dissolved member(s) again to the set(s) - for which NO database actions are desired.

>>>I want the hardware solution. <<<

Ah well -- this may be stressing a little, but you can consider this to be some extra hardware, and hardware manipulation...

Re: AlphaServer ES47 7/1000 - Need a SUPER fast (big) io device!

Edmundo,

In general, I would agree with Andy and Jan. The optimum way to collapse a backup window is to use host-based shadowing to expand a shadow set; momentarily quiesce the database; disconnect the extra shadow set members; and restart the database. Done within a procedure, this can be accomplished in seconds.

In terms of increasing hardware, the OP did not mention the details of the current mass storage configuration. Details always matter.

Re: AlphaServer ES47 7/1000 - Need a SUPER fast (big) io device!

Fewer disk spindles will be slower than configurations with more disk spindles, as disks are glacial devices.

Newer storage is SAS and SATA, and those are not the easiest devices to connect onto an Alpha.

If you want faster storage, you want SSD. Again, disks are glacial-speed rotating rust devices. HP offers SSD as an option on BladeSystem, though apparently isn't making a particular push to get SSD into other configurations.

To add to the difficulty, you have slow I/O buses on this VMS box, which means you're also limited to some older storage I/O controllers. Newer stuff tends to be PCIe based, and the PCIe storage is massively faster than the parallel SCSI possible on this box. (I don't know off-hand if Alpha has any possible 8 Gb FC HBA options with the PCI-X, that's typical on PCIe on other platforms.)

Others have discussed RAID-based archival processing and the split-RAID and that's certainly functional. You must have a way to quiesce the environment.

An alternative approach is to use the features of a replication-capable database, and run multiple boxes and online backup. Various database packages offer this, and can perform continuous backups. Most places now use continuous backups; the classic "backup window" is becoming a rarity.

Various x86 boxes routinely obliterate the performance of the Alpha boxes, too. You may be in line for an upgrade, depending on your requirements. Possibly a replacement server, or potentially a dedicated database server, and connecting to that from the VMS box and the clients. (This box generation is around seven years old, and it's going to be showing its age, in terms of performance and available options and I/O buses.)

If you're waving some money around for some new storage (or a new server), then call up an HP reseller or sales rep and make them do some work.

If you're looking to get speed by connecting off-the-shelf SAS or SATA drives by yourself, then you're looking at buying off on your own debug and integration and testing and support effort. And at just getting to a SAS or SATA bus. Or buying off on somebody that will do that work for you.

But I'd look at the distributed database replication or on-line backups first, then at quiescing and splitting volumes, then at storage or more serious upgrades.

Re: AlphaServer ES47 7/1000 - Need a SUPER fast (big) io device!

Figure out where you're spending your 5.5 hours. What's the bottleneck? Could be getting stuff out of the database. Could be the I/O bus or the controller or the SAN giblets, could be the disks, or the target archival media, or could be the processor or memory.

The following is a generic answer...

Your SAN is probably using 1 or 2 Gb HBAs, which are slow. 4 Gb is around, and 8 Gb HBAs are around, but I don't know off-hand if VMS has support for those.

Your disks and your SAN controller are probably also slow. There are differing 14-slot I/O connections around.

While tape bandwidth is faster than disk, tapes are not usually the primary archival medium chosen if you're looking for speed.

AFAIK, there is no way to upgrade the existing PCI-X slots of an AlphaServer ES47 box to a more modern PCIe bus, short of wholesale replacement of the box. PCIe (which is now widespread in the industry) only appeared in some of the more recent Integrity boxes.

Look at your online archival processing for the database and at quiescing and splitting RAID (shadowing, mirroring) volume sets.

Then start looking at whether you should move forward with an effort to upgrade your existing I/O, or start looking at a port to a faster box. If your applications are local or available on Itanium, moving to a newer box might be the most feasible approach.

If you have backup windows, then your database archives are not being run as a 24x7 operation; you're working with an older system management model and older archival design. Whether you're willing to take a hit of a day on your activities, or if there are secondary and shorter-term archives available here is an open question. I'd definitely look to see if there's a continuous-archive option in the database package(s) involved here.

Re: AlphaServer ES47 7/1000 - Need a SUPER fast (big) io device!

Sometimes it is difficult for some people to get the broad spectrum if they don't see details!

Again... more for you, nothing tangible for me yet.

The disk volumes are mounted in an IBM8100 SAN with a farm of switches, connected to the Alpha ES47 via 2 Gb FCA2684 PCI HBA.

This is a 24*7 operation. The only time the application environment can freeze the databases and generate a journal is at 8:00 PM every night, and that lasts a little more than 6 minutes. Then, and only then, the process dismounts one member of each shadow set in order to produce an image backup save set of each one to tape.

This application environment is running an instance of 72 databases with a very complex structure, where some of them have over 1000 Globals. We need more disk volumes in the SAN in order to have a better distribution of the databases across more spindles and reduce the current per-volume I/O. I know that moving to a much more modern box like an Itanium will make a difference, but I cannot do that at this point.

Important: There are multiple background jobs that run after 6:00 PM every day (besides the interactive access) and they compete with the backup. Due to time scheduling and compliance there is no way to change this. Take a look at the attachment to get an idea of the disk I/O rate on a normal weekday!

So, yes, I want to improve the overall present I/O and reduce the backup time.
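The idea of spreading the databases across more volumes to lower the per-volume I/O can be sketched as a simple greedy balancing exercise. This is only an illustration: the database names and I/O rates below are hypothetical placeholders, not figures from this system, and the function name `spread_databases` is made up for the example.

```python
import heapq
from typing import Dict, List, Tuple


def spread_databases(io_rates: Dict[str, float],
                     volume_count: int) -> List[Tuple[float, List[str]]]:
    """Greedy placement: take databases busiest-first and put each one on
    the volume with the lowest total I/O rate so far."""
    # Heap entries: (total I/O on the volume, volume index, database names).
    # The unique volume index breaks ties, so the lists are never compared.
    heap = [(0.0, index, []) for index in range(volume_count)]
    heapq.heapify(heap)
    for name, rate in sorted(io_rates.items(), key=lambda item: -item[1]):
        load, index, names = heapq.heappop(heap)
        names.append(name)
        heapq.heappush(heap, (load + rate, index, names))
    # Return (total I/O, databases) per volume, in volume order.
    return [(load, names) for load, _, names in sorted(heap, key=lambda e: e[1])]


if __name__ == "__main__":
    # Hypothetical I/O rates (operations per second) for six databases.
    rates = {"DB01": 400, "DB02": 300, "DB03": 200,
             "DB04": 100, "DB05": 100, "DB06": 100}
    for volumes in (2, 3):
        layout = spread_databases(rates, volumes)
        busiest = max(load for load, _ in layout)
        print(f"{volumes} volumes: busiest volume carries {busiest:.0f} I/O per sec")
```

With these made-up rates, going from 2 to 3 volumes lowers the load on the busiest volume from 600 to 400. Real placement would also have to account for the shared array controller and for how the backup jobs overlap, which this sketch ignores.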

Re: AlphaServer ES47 7/1000 - Need a SUPER fast (big) io device!

Edmundo,

Thank you for the attachment. It clarifies some things and leaves a great many questions unanswered.

1TB in an hour translates to 284KB/sec (1,024MB/(60*60)). Converted to bps, that translates to 2.27Mbps. Granted this is a sustained rate, which must be discounted significantly, but offhand, I am not sure I would be so fast to attribute the problem to the SAN.

I would recommend a deep examination of that backup procedure. I have seen more than my share of backup and other end-of-day procedures that were effectively fratricidal in one way or another.

If this is the case, adding a disk drive may very well make the problem worse, not better. I would recommend a careful, in-depth review of the backup/end-of-day procedures, either in-house or out-of-house (Disclosure: We provide services in this area, as do others who regularly contribute to this forum).

As a side note, I would also be interested in seeing the CPU utilization data corresponding to the same time period as the IO queue length chart.

There is a good chance that working the problem without looking at the script in detail is a futile exercise.

Re: AlphaServer ES47 7/1000 - Need a SUPER fast (big) io device!

As Bob (Robert G) points out, it's time to collect and review system metrics. Before deciding on any approach, identify what the system is really doing, what bottlenecks may exist and potential solutions.

Look at CPU load, processes in the COM state, memory used including XFC numbers.

Some potential options include adding a third set of shadow set members or clustering another server to manage the backup operation while the production server continues to service user and report loads. Before deciding on an approach, identify the current system metrics.

If you don't have time to do it right, when will you have time to do it over? Reach me at first_name + "." + last_name at sysmanager net

Re: AlphaServer ES47 7/1000 - Need a SUPER fast (big) io device!

Trust me Robert (and all others), you don't want to see (much less analyze) a 3rd-party procedure of 3334 lines, where you can see all kinds of routines dealing not only with OpenVMS DCL commands but also MUMPS and SQL.

Sorry to say it like this, but there is nothing to be improved here.

This is a proven procedure that has been running here and at other places for 7 years.

Attached you will find more graphical snapshots of the AlphaServer ES47 for the same period.

You will notice (if you can rationalize performance vs. load) that the only thing straining a little bit (and only during the peaks of the day) is physical memory, which is not a problem during the backup time from 8:00 PM to 1:15 AM.

A single spindle mechanical PC drive won't cut it, even if you could connect it to your ES47.

Edmundo,

You said you are using shadowing, but it isn't clear that you are using it for backups.

If you are backing up 1TB in 5.5 hours, that is about 53MB/sec, which is reasonably good.

Why is 5.5 hours too slow? If you are using volume shadowing with split members for backup, that is essentially using your storage as a snapshot. If you are going to use shadowing, make sure you have the HBVS patch that supports minicopy on 7.3-2, and make sure that minicopy gets used when you reintroduce your shadow set members.
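The 53MB/sec figure above can be checked with a few lines of arithmetic, and the same calculation shows what the 1 to 2 hour target would demand. This is a rough sketch: it assumes binary units (1 TB = 1024 * 1024 MB) and takes the nominal payload rate of a 2 Gb Fibre Channel link as about 200 MB/sec per direction, which real configurations will not fully reach.

```python
# Sustained throughput needed to move a given volume of data in a given window.
# Binary units (1 TB = 1024 * 1024 MB) match the 53MB/sec figure quoted above.

MB_PER_TB = 1024 * 1024

# Assumption: nominal payload rate of one 2 Gb Fibre Channel link, per direction.
FC_2GB_NOMINAL_MB_S = 200.0


def required_mb_per_sec(terabytes: float, hours: float) -> float:
    """Average MB/sec needed to move `terabytes` of data in `hours`."""
    return terabytes * MB_PER_TB / (hours * 3600)


if __name__ == "__main__":
    for hours in (5.5, 2.0, 1.0):
        rate = required_mb_per_sec(1.0, hours)
        share = rate / FC_2GB_NOMINAL_MB_S
        print(f"1 TB in {hours} h: {rate:.1f} MB/sec "
              f"({share:.0%} of one nominal 2 Gb FC link)")
```

Under these assumptions, 1 TB in 5.5 hours needs about 53 MB/sec, roughly a quarter of one nominal 2 Gb link, while 1 TB in one hour needs about 291 MB/sec, which is more than a single 2 Gb link can nominally carry.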

Re: AlphaServer ES47 7/1000 - Need a SUPER fast (big) io device!

Ok here is more for the good, the bad and whatever understanding...

The backup procedure, after checking the overall standing of the application environment and setting an immediate cutoff time for transactions, freezes the databases and generates a journal. This phase takes on average around 7 min. and should not take more than 5. Why? Because the application is constantly processing transactions in order to generate data transfers via communications and they cannot wait longer.

We are expecting that more and more users are going to be interacting with the system, so more of these transactions are going to be occurring during the backup period.

Then the backup procedure, after analyzing the status of the disk volumes, proceeds to dismount the secondary member of all shadow sets and only then proceeds to do an image backup of each one. As they finish, there is a background process checking that the member is free, and it mounts it back into the pertinent shadow set using minicopy.

As all this is going on, the operator console is actively receiving messages about whatever is going on. If the first tape is used up, then a second tape is mounted and the process proceeds.

Even though all this seems simple, it is a very complicated procedure that takes multiple variants and possibilities into consideration, but it is straightforward. Nothing in the procedure causes a roadblock that slows down the backup process.

It is the actual backup process, which depends on the system tuning and hardware capabilities, that could worsen or improve the results.

Re: AlphaServer ES47 7/1000 - Need a SUPER fast (big) io device!

There seems to be an assumption that you need to back up all of the databases in full every night.

I don't believe all that data changes every day, so why spend so much time shoveling the same bits again and again?

Think backwards. What needs to be restored, under what circumstances. What strategies can you use to achieve that restored state, without using the big hammer of moving all your bits every day. Maybe a full backup on the weekend with incrementals during the week?

It's all very well buying bigger and faster devices, but the brute force approach will eventually get too expensive.
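To put rough numbers on the weekend-full-plus-incrementals idea, here is a small sketch comparing the data moved per week under the two schemes. The total size and the daily change rate are hypothetical placeholders; the real change rate would have to be measured.

```python
# Weekly data volume: nightly full backups vs. one weekly full plus nightly
# incrementals. The size and change rate used below are hypothetical.


def weekly_tb_full_every_night(total_tb: float, nights: int = 7) -> float:
    """Data moved per week when every night is a full backup."""
    return total_tb * nights


def weekly_tb_full_plus_incrementals(total_tb: float,
                                     daily_change_fraction: float,
                                     nights: int = 7) -> float:
    """Data moved per week with one full backup and incrementals on the
    remaining nights, each copying only that day's changed fraction."""
    incremental_nights = nights - 1
    return total_tb + incremental_nights * total_tb * daily_change_fraction


if __name__ == "__main__":
    total_tb = 1.0                # hypothetical total size
    daily_change_fraction = 0.05  # hypothetical: 5% of the data changes per day
    full = weekly_tb_full_every_night(total_tb)
    mixed = weekly_tb_full_plus_incrementals(total_tb, daily_change_fraction)
    print(f"Full every night:        {full:.1f} TB per week")
    print(f"Weekly full + increment: {mixed:.1f} TB per week")
```

Whether the daily change really is that small depends on how the databases store their data. A file-level incremental copies every file that changed at all, so large database files that are touched every day would still be copied in full; in that case the saving has to come from the database's own journaling or incremental facilities.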

Re: AlphaServer ES47 7/1000 - Need a SUPER fast (big) io device!

Jon,

Mea culpa. I did skip one step. The problem remains; while 2Gb SAN would be a problem for achieving the one hour goal, it is far less of an issue over a longer period.

I have seen many occasions where optimization can backfire. What works for one configuration can backfire in a subtle fashion on a different configuration. In this particular case, one aspect that comes to mind is the activity of the storage array.

I would question whether the SAN is achieving its maximum capability. What is the performance profile of the IBM SAN/controller? Individual volumes are not the only potential bottleneck, the volumes also all share an array controller, and that may very well be the bottleneck.

Yes, understanding a 3,300 line procedure with multiple threads can be a challenge, but there has been nothing mentioned that excludes some issue relating to task sequencing. Such issues are always perfectly obvious retrospectively; up front they always seem unreasonable. I will not lengthen this post with examples, however I will mention that even as simple a thing as BACKUP has had challenges with performance (outside of quotas).

More performance data illuminates, but tuning metrics does not correct problems with the underlying code.