I have two SUN 280R servers running Solaris 8 and Informix Dynamic Server 9.3 that
were an HDR pair. A previous DBA added/changed the dbspaces and chunks to accommodate
immediate space needs of the primary HDR server, but failed to do the same on the
secondary HDR server. HDR was still performing its replication from the primary
to the secondary after several space "improvements" until an array cable
became dislodged and HDR failed on the secondary server. The secondary HDR server
is now in fast recovery/offline. I am attempting to take a level 0 backup (ontape)
from the primary server to restore to the secondary HDR server to fix HDR and start
it working again. The problem that I am running into is that the dbspace and chunk
names and sizes must be the same. Since the primary server was altered, and the
secondary server was not, the ontape restore fails. I have attached a cut/paste of
"onstat -d" output from both the primary and secondary HDR servers:

I am relatively new to Informix administration, and am having trouble constructing
the "onspaces" commands needed to bring the secondary server's dbspace/chunk
names and sizes into alignment with the primary server. Investigation of the onspaces
command shows me examples of it as follows:
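
(These are the general forms, as best I can tell; everything in angle
brackets is a placeholder, and offsets/sizes are given in kilobytes.)

    # create a new dbspace
    onspaces -c -d <dbspace_name> -p <pathname> -o <offset_KB> -s <size_KB>

    # add a chunk to an existing dbspace
    onspaces -a <dbspace_name> -p <pathname> -o <offset_KB> -s <size_KB>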

If anyone has the time to examine the above "onstat -d" output and construct/help
get me started with some onspace commands that will bring the secondary's dbspaces
and chunks back into alignment with the primary server, it would be greatly appreciated.

Administrator's Guide for IBM Informix Dynamic Server Version 9.3, Part Number 000-8324, section 10-17 "Adding a Chunk to a Dbspace or Blobspace" has all this information. It can be downloaded from IBM's website.

Yes, there currently is a level 0 backup of the primary server.
The primary server is up, and can be backed up to remote tape at
any time. ontape is used for level 0 backups to tape and cold
restores from tape. ISM is used for logical and physical log
backups on a continuous basis. ISM and ontape output files are
not interchangeable for restores.

It doesn't appear that the systems need to be identical; the target just
needs to be configured large enough to hold the replicated data.

I'm running replication on IDS 9.4 on AIX, and my systems are not
exactly identical. When I have needed to do a level 0 restore, I always
use ontape -p. BTW, it will stay in fast recovery for quite a while
until everything syncs up. Monitor it with onstat -m.

Tim

What do you need: The instance
The dbspaces must be similarly configured on the two servers, since the
first step in initiating HDR is to restore a Level 0 archive from the
primary onto the secondary. However, whilst the same link for each chunk
must exist, that link can point to anything, as long as it is large
enough to contain the chunk. For instance, the root dbspace on the
primary may be a 1 GB raw disk partition pointed to by the link
$INFORMIXDIR/links/rootdbs, but the same link on the secondary could
point to a cooked file. You do use links for the chunk pathnames, don't
you? Make sure that the tape parameters (TAPEDEV/TAPEBLK/TAPESIZE and
LTAPEDEV/LTAPEBLK/LTAPESIZE) are the same between the two instances; you
will be restoring archives and perhaps log tapes written on the primary,
and HDR will not start to initiate replication if the block sizes or
capacities differ.
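
For example, the relevant onconfig entries on both machines might look
like this (the device names here are made up; what matters is that block
size and capacity match on both sides):

    TAPEDEV    /dev/rmt/0     # archive tape device
    TAPEBLK    32             # tape block size, KB
    TAPESIZE   4000000        # tape capacity, KB
    LTAPEDEV   /dev/rmt/1     # logical log tape device
    LTAPEBLK   32
    LTAPESIZE  4000000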

What do you need: The network
The two servers must be able to see each other on the network, and be
able to connect to each other. If there is a firewall in the way, this
must be opened up to allow traffic on the TCP ports used by the two
instances. Once the restore to the secondary has been done (or there is
any sort of instance on the secondary), bring it online, and use
dbaccess on the primary to connect across the network to the secondary,
and vice versa. Even though you may be prompted for a username/password,
if you see the list of databases on the remote server the connectivity
is OK.
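
As a sketch, the sqlhosts file on each machine carries an entry for both
instances (the server, host and service names below are invented):

    # dbservername   nettype    hostname   servicename
    prim_tcp         onsoctcp   sunprim    prim_svc
    sec_tcp          onsoctcp   sunsec     sec_svc

Then from the primary, dbaccess sysmaster@sec_tcp (and the reverse from
the secondary) makes a quick connectivity check.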

It's also worth checking the volume of logical logs you get through; it
is essentially this data which is transferred from primary to secondary,
and you will need sufficient network bandwidth to carry it.
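
One rough way to gauge that volume (an approximation, not an exact
method): run onstat -l a few hours apart and see how far the uniqid
column has advanced.

    onstat -l
    # compare the highest uniqid between runs:
    # (logs filled * LOGSIZE) / elapsed time ~= bandwidth required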

Initiating Replication
Use ontape: even if the normal backup strategy is onbar (we're assuming
here that no-one still uses OnArchive!), it is far easier to do this with
ontape. See below for additional info on using onbar for this. On the
primary run onstat -g dri; this will tell us the current HDR status. You
should see something like this:
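
(Roughly like the following; the exact layout varies a little between
versions, so treat this as an approximation:)

    Data Replication:
       state          paired server       last DR CKPT (id/pg)
       off                                -1 / -1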

If "Last DR CKPT (id/pg)" has any values, replication is unlikely to
start; run onmode -d standard and restart the instance. If you don't do
this you are likely to get all sorts of errors. Start with the level 0
restore. It doesn't have to be the latest; the main criteria are that
the configuration on the secondary will take the restore (same chunk
links, compatible onconfig file etc.) and that the secondary is able to
access all logical logs backed up from the primary since the level 0 was
taken. The command to use is ontape -p, since we do not want to bring
the instance online. Don't ask it to back up the logs, and don't restore
a level 1 tape; replication can only start from a Level 0.
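
From memory the restore dialogue runs something like this (the prompt
wording may differ slightly between versions):

    ontape -p
    Please mount tape 1 on /dev/rmt/0 and press Return to continue ...
    Continue restore? (y/n) y
    Do you want to back up the logs? (y/n) n
    Restore a level 1 archive (y/n) n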

Once you get "Program over." start replication (you can do this earlier,
but you'll just fill the log files with errors). Run onmode -d primary
secondary_server_name on the primary, and onmode -d secondary
primary_server_name on the secondary. Check the online log on the
secondary. If there is a message DR: Start failure recovery from tape
... you need to recover logs from tape. The primary is unable to start
sending the logs completed since the level 0 because the first one it
needs has already been backed up to tape and is no longer on disk. On
the secondary's log tape device mount the tape containing that log, and
run ontape -l; this will start the log recovery. This is why it is
essential that the tape parameters match between the servers. If more
than one log archive has been done since the level 0, keep loading the
tapes and continuing the restore.
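
Putting the commands from this step together (the server names here are
placeholders for your DBSERVERNAME values):

    # on the primary:
    onmode -d primary sec_server
    # on the secondary:
    onmode -d secondary prim_server
    # only if the secondary asks for logs from tape: mount the log tape
    # on the secondary's LTAPEDEV and run
    ontape -l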

If no log tapes need restoring, or after all that are required have been
restored, you will see "DR: Failure recovery from disk in progress ..." in
the secondary online log. Now the primary is sending all non-archived
log files to the secondary, which is in turn rolling forward. In due
course each server will confirm that DR is up and running, i.e. "DR:
Primary server operational" and "DR: Secondary server operational" will
appear in the respective log files. The banner from onstat commands will
show the status On-Line (Prim) or On-Line Read-Only (Sec). The secondary
server is now accessible exactly as though it were a normal server,
except that it is read-only. For normal operation no further action need
be taken on either server.
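
Any onstat command shows that state in its first line; on the primary it
would look something like this (the version string and uptime are of
course just illustrative):

    onstat -
    Informix Dynamic Server Version 9.30.UC1 -- On-Line (Prim) -- Up 2 days ...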

Restarting Replication
That depends entirely on why it failed in the first place. One thing that
is common to almost all scenarios is to make sure the secondary doesn't
come up as standard. This almost certainly means restarting from a Level
0.

Loss of network connection
Later versions of the engine will try to reconnect every few seconds;
this can cause the log file to become quite large. But it does mean that
as soon as the network is back, replication will resume.

HDR is a bit sensitive over VPNs, and has been known to reset the odd
firewall parameter, blocking its own traffic.

Other reasons
Assuming the logical logs on the primary haven't wrapped round, i.e. the
last one rolled forward by the secondary has not yet been overwritten,
there's a good chance HDR will resume. Shut down the secondary, and
restart it in quiescent mode. Switch the primary back to standard, then
primary again. You may have to try a combination of these.
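
One combination that often does the trick (the commands are the standard
ones, but treat the sequence as a sketch):

    # on the secondary:
    onmode -ky                           # shut it down
    oninit -s                            # restart in quiescent mode
    # on the primary:
    onmode -d standard
    onmode -d primary <secondary_server_name>
    # on the secondary:
    onmode -d secondary <primary_server_name>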

Common 'gotchas'
You cannot change the logging type of an unlogged database whilst
replication is in progress. ondblog will just ignore the command, but if
you use ontape you'll get an error. You have to break replication on the
primary, set the database logging as required and then do a Level 0
archive. Restore this archive on the secondary, and re-initialise
replication. Basically, OnLine is not prepared to copy an entire database
via HDR.
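
A sketch of that sequence (the database name is invented; -U here
requests unbuffered logging, use -B or -A for buffered/ANSI as
appropriate):

    # on the primary, with replication broken:
    onmode -d standard
    ontape -s -L 0 -U mydb    # level 0 archive that also turns logging on
    # restore that archive on the secondary, then re-run the
    # onmode -d primary / onmode -d secondary pair as above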

Adding chunks/spaces
This is a common one! Make sure that the raw devices and links are set
up on the secondary as well as the primary, and run the onspaces (or
onmonitor) command on the primary only; the command is then replicated
over the network to the secondary, and the space/chunk is added
automatically.
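
For example (the paths, device and sizes here are all hypothetical):

    # on BOTH machines, point the same link at a suitably sized device:
    ln -s /dev/rdsk/c1t1d0s4 $INFORMIXDIR/links/ap_chunk2
    # on the PRIMARY only; HDR replays the change on the secondary:
    onspaces -a ap -p $INFORMIXDIR/links/ap_chunk2 -o 0 -s 1024000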

I think I have it sorted out. There are four dbspaces on the secondary
which are missing one chunk each; they are:

1. rootdbs;
2. ap;
3. ori;
4. cts.

A dbspace exists on the secondary that does not exist on the primary; it
is:

hier.

Since dbspace hier is not part of the primary, it probably should be
deleted. Note of caution: make SURE there are no tables located in this
dbspace which you need to keep before deleting it. To delete a dbspace:
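
Assuming hier really is empty, the drop itself is a one-liner; checking
first with oncheck -pe shows what actually lives in each chunk:

    oncheck -pe        # review the extent listing for dbspace hier first
    onspaces -d hier   # then drop the dbspace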

Using the proper offsets when creating chunks on raw devices is very
important. If a new chunk is added without the proper offset, two
things might happen:

1. Part of the disk may go unused, because of a gap between the offsets
of adjacent chunks;
2. Part of the disk may be overwritten, because the offsets of two
chunks overlap.

Point number 2 could be disastrous because it would overwrite part of a
chunk and corrupt any table that lies in the overlapping region.
Please double and triple check the offsets I have calculated.
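
A worked example with made-up numbers: if an existing chunk on a raw
device starts at offset 0 and onstat -d shows it as 500000 pages on a
2 KB page system, the next chunk on that device must start no lower than
offset 1000000 KB:

    #   500000 pages * 2 KB/page = 1000000 KB
    # so a new 250000 KB chunk on the same device would be added as:
    onspaces -a ap -p /dev/rdsk/c1t0d0s4 -o 1000000 -s 250000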

The first offset is the chunk's offset, and the last offset is
the offset for a mirrored chunk. Chunk offsets and sizes need to
be given in kilobytes, but are usually displayed as a page count
in onstat or onmonitor, thus the calculation (number of pages *
page size = chunk size in bytes, then convert to kilobytes). Thanks
to all who contributed info to this thread so I could get to this point.