Converts a sparse matrix in SenseClusters format to Harwell-Boeing (HB) sparse format, which is the format required by SVDPACKC. Optionally, this program also creates the lap2 file, which provides parameter settings for SVDPACKC.

A sparse MATRIX in SenseClusters format that is to be converted into Harwell-Boeing format.

The first line should contain exactly 3 numbers separated by blanks:

#nrows #ncols #nnz

where

#nrows = Number of rows
#ncols = Number of columns
#nnz = Total number of non-zero values

in the MATRIX.

Each subsequent line should contain one row of the MATRIX in sparse format. A sparse row is a space-separated list of pairs of numbers, where the first number of each pair is the column index of a non-zero value and the second number is the non-zero value itself.

Column index counting starts from 1.

Sample MATRIX:

5 5 15
2 9 4 9
1 6 2 5 3 7 4 8 5 6
1 4 2 5
1 7 2 6 3 7
1 9 2 8 3 9

This shows a 5 x 5 integer matrix containing a total of 15 non-zero elements. The ith line after the first line lists the non-zero elements of the ith row. For example, the 2nd line (1st row) has 2 non-zero values (both 9) at column indices 2 and 4, and the 6th line (5th row) has 3 non-zero values: 9 at index 1, 8 at index 2 and 9 at index 3.
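The input format can be checked with a small reader. The following is a minimal Python sketch, not the parser used by mat2harbo.pl; the function name parse_sparse_matrix is hypothetical, and treating an empty line as a row with no non-zero values is an assumption. It validates the header counts and returns each row as a list of (column index, value) pairs.

```python
def parse_sparse_matrix(text):
    """Parse a matrix in the SenseClusters sparse format.

    Returns (nrows, ncols, rows), where rows[i] holds the
    (column_index, value) pairs of row i + 1. Column indices stay 1-based.
    Assumption: an empty line stands for a row with no non-zero values.
    """
    lines = text.splitlines()
    header = lines[0].split()
    if len(header) != 3:
        raise ValueError("first line must contain exactly 3 numbers")
    nrows, ncols, nnz = (int(x) for x in header)

    body = lines[1:]
    if len(body) != nrows:
        raise ValueError(f"expected {nrows} rows, found {len(body)}")

    rows = []
    for line in body:
        tokens = line.split()
        if len(tokens) % 2 != 0:
            raise ValueError(f"unpaired column index or value in: {line!r}")
        pairs = []
        for i in range(0, len(tokens), 2):
            col = int(tokens[i])
            if not 1 <= col <= ncols:
                raise ValueError(f"column index {col} outside 1..{ncols}")
            pairs.append((col, float(tokens[i + 1])))
        rows.append(pairs)

    if sum(len(pairs) for pairs in rows) != nnz:
        raise ValueError("number of non-zero values does not match #nnz")
    return nrows, ncols, rows


sample = "5 5 15\n2 9 4 9\n1 6 2 5 3 7 4 8 5 6\n1 4 2 5\n1 7 2 6 3 7\n1 9 2 8 3 9\n"
nrows, ncols, rows = parse_sparse_matrix(sample)
print(rows[4])  # [(1, 9.0), (2, 8.0), (3, 9.0)]
```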

Specifies the Column Pointer Format. The column pointer should have a format of type MiN, which indicates that each line in Block1 contains M integer pointers, each occupying N character positions. The default format is 10i8.
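To illustrate what an MiN layout means for Block1, here is a hedged Python sketch; the function name format_int_block is hypothetical and this is not the writer used by mat2harbo.pl. It writes M integers per line, each right-justified in N characters, so the default 10i8 yields lines of 10 x 8 = 80 characters.

```python
def format_int_block(values, m=10, n=8):
    """Write integers M per line, each right-justified in N characters (MiN).

    The defaults m=10, n=8 correspond to the 10i8 column pointer format.
    """
    lines = []
    for start in range(0, len(values), m):
        fields = []
        for v in values[start:start + m]:
            text = str(v)
            if len(text) > n:
                raise ValueError(f"{v} does not fit in {n} characters")
            fields.append(text.rjust(n))
        lines.append("".join(fields))
    return "\n".join(lines)


# 12 pointers in 10i8: one full 80-character line, then a line with 2 values.
block = format_int_block(list(range(1, 13)))
print(block)
```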

Specifies the number of iterations I for las2. If specified, I should not exceed the number of columns in the MATRIX and should be at least as large as maxprs. The default is I = min(3 * maxprs, #cols), where maxprs = min(K, N/RF).
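The default iteration count can be computed as below. This is a sketch under stated assumptions: the function names default_iterations and check_iterations are hypothetical, N is taken to be the number of columns, and N/RF is truncated to an integer; the actual rounding in mat2harbo.pl may differ.

```python
def default_iterations(k, ncols, rf):
    """Default las2 iteration count, following I = min(3 * maxprs, #cols).

    Assumptions (hypothetical, not confirmed by the text): N is the number
    of columns, and N/RF is truncated to an integer.
    """
    maxprs = min(k, ncols // rf)
    return maxprs, min(3 * maxprs, ncols)


def check_iterations(iters, maxprs, ncols):
    """A user-supplied I must satisfy maxprs <= I <= #cols."""
    if not maxprs <= iters <= ncols:
        raise ValueError(f"iterations {iters} must lie in [{maxprs}, {ncols}]")
    return iters


print(default_iterations(300, 1000, 10))  # (100, 300)
```

Because maxprs never exceeds the number of columns, the default always satisfies the constraint iter >= maxprs.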

The header file las2.h in SVDPACKC specifies values of various constants for las2. This section provides some guidelines on setting these constants for using SenseClusters. Please note that the version of SVDPACKC found in /External has been modified with the settings as described below.

NMAX

Specifies the maximum possible number of columns in the matrix given to las2. las2.h initially has a value of NMAX = 3000, which allows a maximum of 3000 columns. However, we have found this default is too small for many of our experiments, so we recommend setting NMAX much higher. We routinely use a value of 30,000, and will assume that the user has reset NMAX in las2.h to this value in the rest of this discussion.

In general, this value should be higher than NCOLS shown by the 3rd column on the 3rd line in the output of mat2harbo.pl.

NZMAX

Specifies the maximum possible number of non-zero values in the matrix. Initially the settings in las2.h have NZMAX = 100000. However, again we have found this to be too small. If the user sets NMAX to 30,000, and if we assume a 30,000 x 30,000 matrix is approximately 1% dense, NZMAX could be set to 9,000,000 (30,000 x 30,000 / 100). This is the value we routinely use, and we will assume that the user has reset NZMAX to this value in the rest of this discussion.

The user can check the exact number of non-zero values (NNZ) in their matrix in column 4 of line 3 of the output matrix written by mat2harbo.pl, and then set NZMAX to something higher than that.
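Reading NCOLS and NNZ from the HB output can be automated. A minimal Python sketch, assuming the standard Harwell-Boeing layout in which line 3 starts with a matrix type code followed by the numbers of rows, columns and non-zero values; the function name read_hb_dimensions and the sample header lines are hypothetical.

```python
def read_hb_dimensions(hb_text):
    """Return (nrows, ncols, nnz) from line 3 of a Harwell-Boeing file.

    Line 3 starts with the matrix type code (e.g. 'rra'), followed by the
    number of rows, columns and non-zero values, so NCOLS is the 3rd
    whitespace-separated field and NNZ the 4th.
    """
    fields = hb_text.splitlines()[2].split()
    return int(fields[1]), int(fields[2]), int(fields[3])


NMAX, NZMAX = 30_000, 9_000_000  # values assumed in this discussion

# Placeholder header: only line 3 matters here; lines 1-2 are made up.
header = "example title\n  5  1  2  2\nrra   4   6   12   0\n"
nrows, ncols, nnz = read_hb_dimensions(header)
print(ncols <= NMAX, nnz <= NZMAX)  # True True
```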

LMTNW

This specifies the maximum total memory to be allocated by las2. The initial setting of LMTNW in las2.h is 600000; however, we find that this is often too small. In general, the size of LMTNW is determined by the values you choose for NMAX and NZMAX. LMTNW should be at least as large as:

LMTNW = (6*NMAX + 4*NMAX + 1 + NMAX*NMAX)

mat2harbo.pl assumes that NMAX has been reset to 30,000 and NZMAX to 9,000,000. Thus,

LMTNW = ((6 * 30,000) + (4 * 30,000) + 1 + (30,000 * 30,000))

This gives a new LMTNW value of 900,300,001, which corresponds to a maximum working memory size of roughly 1 GB. We have found this size to be more than adequate for computing the SVD of a 25,000 x 25,000 matrix.
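The arithmetic above can be reproduced directly. This Python sketch (the function name lmtnw_lower_bound is hypothetical) evaluates 6*NMAX + 4*NMAX + 1 + NMAX*NMAX, the same expression used in the worked example with NMAX = 30,000.

```python
def lmtnw_lower_bound(nmax):
    """Minimum LMTNW for a given NMAX: 6*NMAX + 4*NMAX + 1 + NMAX*NMAX."""
    return 6 * nmax + 4 * nmax + 1 + nmax * nmax


print(f"{lmtnw_lower_bound(30_000):,}")  # 900,300,001
```

The NMAX*NMAX term dominates, so doubling NMAX roughly quadruples the required LMTNW.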

mat2harbo.pl will show an advisory message indicating the minimum value to which LMTNW should be set, and will issue a warning if the size actually needed for the user's matrix exceeds 900,300,001 (approximately 1 GB).

las2 allocates memory dynamically according to the size of the input matrix, regardless of the value of LMTNW. In short, LMTNW specifies an upper limit on memory consumption, and the actual consumption depends on the size of the matrix. Hence, LMTNW does not specify the memory that las2 will *always* consume; rather, it is an upper limit that may be reached if necessary.

If las2 fails because these parameters in las2.h are set too low, an error message in the output file lao2 will suggest that the matrix is too large. In that case, the user is advised to check the 3rd line of the Harwell-Boeing matrix (as produced by this program) that is given to las2. If NCOLS, shown in column 3 of line 3, exceeds NMAX, increase NMAX to something higher than NCOLS. Otherwise, if NNZ, shown in column 4 of line 3, exceeds NZMAX in las2.h, increase NZMAX. If neither applies, increase LMTNW to something higher than (6*NMAX + 4*NMAX + 1 + NMAX*NMAX), or simply keep increasing it without too much computation until las2 succeeds :-)
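The troubleshooting order just described can be written as a small decision function. A hedged Python sketch: diagnose_las2_failure is a hypothetical helper, not part of mat2harbo.pl or SVDPACKC, and its inputs are NCOLS and NNZ from line 3 of the HB matrix plus the NMAX and NZMAX values currently in las2.h.

```python
def diagnose_las2_failure(ncols, nnz, nmax, nzmax):
    """Suggest which las2.h constant to raise, in the order described above.

    ncols and nnz come from columns 3 and 4 of line 3 of the HB matrix.
    """
    if ncols > nmax:
        return f"increase NMAX above {ncols}"
    if nnz > nzmax:
        return f"increase NZMAX above {nnz}"
    needed = 6 * nmax + 4 * nmax + 1 + nmax * nmax
    return f"increase LMTNW above {needed}"


print(diagnose_las2_failure(25_000, 12_000_000, 30_000, 9_000_000))
# increase NZMAX above 12000000
```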

Another problem a user might notice is that las2 sometimes runs for a very long time, for example more than a few days. In such a case, the user is advised to restart las2 after reducing the values of the parameters 'maxprs' and 'iter' in the parameter file lap2. Specifically, the 2nd parameter in lap2 is iter and the 3rd is maxprs. Remember that iter must be >= maxprs.
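Editing lap2 by hand is simple, but the constraint is easy to forget. The following Python sketch (reduce_lap2 is a hypothetical helper) replaces only the 2nd (iter) and 3rd (maxprs) whitespace-separated fields of a lap2 line and refuses values with iter < maxprs. The sample lap2 line is an illustrative assumption; check the actual file written by mat2harbo.pl.

```python
def reduce_lap2(line, new_iter, new_maxprs):
    """Return a copy of a lap2 parameter line with new iter and maxprs.

    iter is the 2nd whitespace-separated field and maxprs the 3rd; all
    other fields are copied unchanged.
    """
    if new_iter < new_maxprs:
        raise ValueError("iter must be >= maxprs")
    fields = line.split()
    if len(fields) < 3:
        raise ValueError("lap2 line has fewer than 3 fields")
    fields[1], fields[2] = str(new_iter), str(new_maxprs)
    return " ".join(fields)


# Hypothetical lap2 contents; only fields 2 and 3 are touched.
old = "matrix 300 100 -1.0e-30 1.0e-30 TRUE 1.0e-6"
print(reduce_lap2(old, 150, 50))
# matrix 150 50 -1.0e-30 1.0e-30 TRUE 1.0e-6
```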

Pointers and Row Indices can use an MiN format, which specifies that there are M integers on each line, each represented with N digits. (M x N must equal 80, as this format supports a maximum line width of 80 characters.)
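A format string can be validated against this rule before conversion. A minimal Python sketch; parse_int_format is a hypothetical name, and accepting enclosing parentheses, as in '(16I5)', is an assumption about how the format might be written.

```python
import re


def parse_int_format(fmt):
    """Parse an MiN integer format such as '10i8' into (M, N).

    Enclosing parentheses, as in '(10I8)', are tolerated. M x N must be 80.
    """
    match = re.fullmatch(r"\(?\s*(\d+)[iI](\d+)\s*\)?", fmt.strip())
    if match is None:
        raise ValueError(f"not an MiN format: {fmt!r}")
    m, n = int(match.group(1)), int(match.group(2))
    if m * n != 80:
        raise ValueError(f"{fmt}: M x N = {m * n}, expected 80")
    return m, n


print(parse_int_format("10i8"), parse_int_format("(16I5)"))  # (10, 8) (16, 5)
```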

Numeric Values can use either the MiN format, with M and N interpreted as above, or the MfD.F format, which specifies that there are M real numbers on each line, each occupying a total of D character positions, of which the last F digits are the fractional part.

Note: D is the total width used to represent a number, including the decimal point and the +/- sign, if any.
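The MfD.F layout can be illustrated as follows. A hedged Python sketch (format_real_block is a hypothetical name, not the writer used by mat2harbo.pl): each value is right-justified in D characters with F fractional digits, so the decimal point and any minus sign count toward D, and values that do not fit are rejected.

```python
import re


def format_real_block(values, fmt):
    """Write real numbers using an MfD.F format such as '8f10.4'.

    Each value is right-justified in D characters with F fractional digits;
    D includes the decimal point and any sign.
    """
    match = re.fullmatch(r"(\d+)[fF](\d+)\.(\d+)", fmt.strip())
    if match is None:
        raise ValueError(f"not an MfD.F format: {fmt!r}")
    m, d, f = (int(g) for g in match.groups())
    lines = []
    for start in range(0, len(values), m):
        fields = []
        for v in values[start:start + m]:
            text = f"{v:{d}.{f}f}"
            if len(text) > d:
                raise ValueError(f"{v} does not fit in {fmt}")
            fields.append(text)
        lines.append("".join(fields))
    return "\n".join(lines)


print(repr(format_real_block([2.0, -1.5, 0.25], "8f10.4")))
# '    2.0000   -1.5000    0.2500'
```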

The first block (Block1) is an array whose entries give the indices (in Block3) of the leading non-zero value of every column.

For example, if a given matrix is

4 6
2 3 0 0 0 1
0 2 0 1 2 0
0 0 2 4 1 0
1 1 0 0 5 0

Then the first block will contain the pointers

[1 3 6 7 9 12 13]

This shows that

The first column begins at the 1st non-zero entry (2).
The second column begins at the 3rd non-zero entry (3) [in COLUMN ORDER].
The third column begins at the 6th non-zero entry (2).
The fourth column begins at the 7th non-zero entry (1), and so on.
The final pointer (13) is one past the 12th and last non-zero entry.
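The pointer block can be derived from the sparse rows by regrouping the non-zero values column by column. The following Python sketch is an illustration, not the code of mat2harbo.pl; rows_to_hb_blocks is a hypothetical name, and it assumes the other two blocks hold the 1-based row index and the value of each non-zero entry in the same column order. Applied to the 4 x 6 example above, it reproduces the pointers [1 3 6 7 9 12 13].

```python
def rows_to_hb_blocks(rows, ncols):
    """Convert sparse rows into column pointers, row indices and values.

    rows[i] is a list of (column_index, value) pairs for row i + 1, as in
    the SenseClusters format; all indices are 1-based.
    """
    columns = [[] for _ in range(ncols)]
    for row_number, pairs in enumerate(rows, start=1):
        for col, value in pairs:
            columns[col - 1].append((row_number, value))

    pointers, row_indices, values = [1], [], []
    for entries in columns:
        # Rows were visited in increasing order, so entries are sorted by row.
        for row_number, value in entries:
            row_indices.append(row_number)
            values.append(value)
        pointers.append(len(values) + 1)  # start of the next column
    return pointers, row_indices, values


# The 4 x 6 example matrix, written as sparse rows.
example = [
    [(1, 2), (2, 3), (6, 1)],
    [(2, 2), (4, 1), (5, 2)],
    [(3, 2), (4, 4), (5, 1)],
    [(1, 1), (2, 1), (5, 5)],
]
pointers, row_indices, values = rows_to_hb_blocks(example, 6)
print(pointers)  # [1, 3, 6, 7, 9, 12, 13]
```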

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to