5-6 FILE PORTING AND FTP
*************************
Files may be transferred between machines:
1) Over some network, e.g. using the FTP protocol (TCP/IP) or
Kermit (usually over a serial line, but not necessarily).
2) Over some network, archived by a program that adds some
file-system specific info, e.g. ZIP (VMS), BACKUP (VMS).
This is important for files created on a record-oriented
file-system, where files have an internal structure,
and the operating system keeps the relevant info.
3) Using standard ANSI magnetic tape
Porting formatted files between different machines using FTP is usually
no problem: the FTP protocol automatically performs the few needed
conversions (ASCII/EBCDIC, record structure conventions).
Unformatted files are very machine dependent, and the FTP protocol
doesn't support the required conversions, so porting them between
different machines may be very difficult.
Some relevant information for the purpose of file porting:
Hardware            Floats    Endianity  Unformatted  Control
------------------  --------  ---------  -----------  -------
Sun UNIX            IEEE      BIG        Variable     4
IRIX                IEEE      BIG        Variable     4
CRAY                CRAY      BIG
DEC VAX             DEC       LITTLE     Segmented    2+2
DEC ALPHA           IEEE+DEC  LITTLE     Variable     4
IBM PC compatibles  IEEE      LITTLE
IBM mainframes      IBM       BIG
DEC compilers provide good options for the conversion of unformatted
files between different platforms. Sun provides conversion software.
These machines can be used as "conversion platforms" for others.
However, the best methods are:
1) Modify the program that produced the unformatted file
to produce a formatted one, and run it on the original
machine or similar one.
2) Write a program that will read the unformatted file and
write an equivalent formatted one, ON A MACHINE LIKE THE
ONE THAT WROTE IT, thus avoiding the machine-specific
complications discussed below.
The translation program may use code excerpts from the
original program, or be based on some knowledge of the
unformatted file's structure.
Some people use the XDR routine library to solve the problem of
porting unformatted files. They write and read files using the XDR
routines instead of Fortran I/O statements, but of course this is
not standard Fortran, and makes the programs less portable.
By the way, HDF files are self-documenting and should be read with
HDF routines.
A short digression on FTP
-------------------------
The File Transfer Protocol (FTP) is usually used interactively by
invoking a program with that name.
Many of the transfer options proposed by Postel and Reynolds in RFC 959
were not implemented, and FTP programs can properly handle only text
file transfers. Binary transfers are properly handled only in the
simplest case, between two byte-oriented (e.g. UNIX) file-systems.
FORTRAN requires record-oriented files. On byte-oriented systems the
FORTRAN compiler has to support this requirement itself, so it produces
and reads files with variable-length records.
However, binary FTP transfers between a record-oriented system (e.g. VMS)
and a byte-oriented one are not supported, and all or some of the control
information of each record is discarded in one direction, and is passed
without proper translation in the other.
FTP shortcomings can be worked around by proper modification in the
FORTRAN source code. When writing files intended to be transferred
from a record-oriented system to a byte-oriented one, a count-field
value can be prefixed to each record. In the other direction a routine
that understands the foreign record format should be used for reading.
FTP of archived files
---------------------
Archiving programs like the VMS version of ZIP (used with "-V") and
the VMS BACKUP program store some control information of the file.
When the file is restored that information can be used.
This is useful when transferring files between two VMS machines,
via a UNIX one.
Porting formatted files
-----------------------
This is relatively simple; possible problems are:
1) Different character codes (EBCDIC on IBM mainframes,
ASCII on all others).
2) File type translations (Variable-size-records on VMS,
some Stream type on almost all others).
Direct FTP can take care of both these problems. Character codes are
transformed into a standard character set (standard 8-bit Network
Virtual Terminal-ASCII) before transmission and are transformed again
to the local character set upon reception.
Similarly, records are translated to a standard form (stream CR/LF)
before transmission and transformed to the local structure upon reception.
It is recommended to use formatted files to transfer information
between different systems. The disadvantages are that the formatted
files are larger and some precision is lost on the radix translations.
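The precision loss can be avoided by writing enough decimal digits.
The following is only a sketch, assuming IEEE single precision REAL*4
values (for which 9 significant decimal digits are enough to recover
the binary value when read back) and a compiler whose formatted I/O
rounds correctly. The function name RTOK is made up for illustration;
it checks one value by passing it through an internal file:

```fortran
      logical function rtok(x)
C     Check that X survives a write/read cycle through decimal text.
C     Assumes X is an IEEE single precision REAL*4.
      real x, y
      character*16 buf
C     1PE16.8 gives 9 significant digits: one before the point, 8 after
      write (buf, '(1PE16.8)') x
      read (buf, '(E16.8)') y
      rtok = x .eq. y
      end
```

For IEEE double precision the same idea needs 17 significant digits,
e.g. a format like 1PE25.16, at the price of even larger files.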
Porting unformatted files
-------------------------
Here the problems start:
1) Different endianity (DEC machines and PCs are little
endian, all else are big endian).
2) Different integer sizes / float formats (integers have
the same general format, most floats are now IEEE).
3) Different character codes (EBCDIC on IBM mainframes,
ASCII on all others).
4) File type translations (Variable-size-records on VMS,
some Stream type on others).
5) All the above problems are solvable in principle,
but if you don't know the layout of variables in
the unformatted file, you would have to guess it
using too little information, with unsafe results.
The required knowledge can be found in the source
of the program that wrote it, or in notes left by
the programmer(s).
Problems #1-3 make porting unformatted files content dependent,
i.e. you need to know the contents of a file in order to port it.
In the general case each variable has to be converted separately,
so the converting program has to know in detail the layout of
variables in the file.
Provided you know the internal structure of the file, porting
unformatted files is less frightening than the above list of
problems suggests. For example, UNIX workstations are compatible
except for the endianity problem.
Reading general binary files from Fortran
-----------------------------------------
Sometimes you want to read the content of a file "as it is", and
bypass the logical structure.
The record-oriented Fortran I/O routines must consider files either
as formatted or unformatted, and in both cases they treat one or more
bytes at the end of each record as control information, not as data.
You may need this ability when you want to process non-ASCII files,
e.g. files in one of the many graphics formats, or unformatted files
written on another machine.
There is no portable solution to this problem. Some possible solutions
are:
1) Some compilers support a special OPEN keyword:
STREAM (VMS, Digital UNIX)
BINARY (MS Powerstation)
TRANSPARENT
2) Some UNIX compilers allow you to open a file
in DIRECT access mode, with RECL=1.
You can then read each byte by specifying
its location in the file.
Compilers supporting a DELETE statement for
direct files (VMS with the default /VMS option,
Digital UNIX with the -vms option) expect a
special flag located inside the record.
DEC uses the first byte of the record, with
value equal to '@' (or NUL, ASCII value 0).
To do the trick, the support for the DELETE
statement has to be disabled on VMS by the
/NOVMS compiler option; on Digital UNIX the
compiler option -vms should not be used.
On VMS you should use RECL=2, as RMS assumes
all records are word aligned. File attributes
have to be modified by:
SET FILE/ATTRIBUTES=(RFM:FIX,LRL:2,MRS:2)
When OPENing the file use: RECORDTYPE='FIXED'.
3) VMS offers in addition many special techniques:
o Mapping the file to a memory area,
e.g. a common block, and reading it.
o Low-level routines: RMS block-mode.
o Getting the file size, declaring it to
contain fixed-size records, and reading
it with a buffering routine.
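Solution 2 above (direct access with RECL=1) might look like the
following sketch. It is not portable: it assumes that RECL is counted
in bytes (some compilers count 4-byte words for unformatted files
unless told otherwise) and that ICHAR returns values in the range
0-255. The routine name RDBYTE and the unit number 20 are arbitrary
choices made for this illustration:

```fortran
      subroutine rdbyte(fname, pos, ival, ios)
C     Return in IVAL the byte at 1-based position POS of file FNAME.
C     IOS is nonzero on failure. Unit 20 is an arbitrary choice.
      character*(*) fname
      integer pos, ival, ios
      character*1 c
C     RECL = 1 is assumed to mean one byte, see the remarks above
      open (unit = 20, file = fname, status = 'OLD',
     &      access = 'DIRECT', form = 'UNFORMATTED',
     &      recl = 1, iostat = ios)
      if (ios .ne. 0) return
C     With one-byte records the record number is the byte position
      read (unit = 20, rec = pos, iostat = ios) c
      if (ios .eq. 0) ival = ichar(c)
      close (unit = 20)
      end
```

Opening and closing the file on every call keeps the sketch short;
a real program would open the file once and read many bytes.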
Porting from a typical UNIX to Digital UNIX
-------------------------------------------
This is an easy case. The DEC Fortran compiler supports options
that make such porting easy (again, provided you know the internal
structure of the file).
Transfer the file to the DUNIX machine (I didn't use FTP as our
machines here share filesystems, but I think that FTP wouldn't
make a difference).
The following conversion program assumes:
1) All variables are REAL*4 (can be modified)
2) No record holds more than MAXREC values
      program convuf
C     Convert a big-endian UNIX unformatted file to a formatted one
      integer MAXREC, BYTE2REAL
      parameter (MAXREC = 100000, BYTE2REAL = 4)
      real data(MAXREC)
      integer count1, count2, i
C     ------------------------------------------------------------------
      open (unit = 10,
     &      file = 'unixfile',
     &      status = 'OLD',
     &      form = 'UNFORMATTED',
     &      convert = 'BIG_ENDIAN',
     &      recordtype = 'STREAM')
      open (unit = 11,
     &      file = 'decfile',
     &      status = 'NEW',
     &      form = 'FORMATTED')
C     ------------------------------------------------------------------
C     Each record is: prefix byte count, data, suffix byte count
  100 continue
      read(unit=10, end=999) count1,
     &      (data(i), i = 1, count1/BYTE2REAL),
     &      count2
C     write (*,*) '... ', count1, count2
      if (count1 .eq. count2) then
        write(unit=11,fmt=*) (data(i), i = 1, count1/BYTE2REAL)
      else
        write (*,*) ' something is wrong '
        write (*,*) ' prefix count is: ', count1
        write (*,*) ' suffix count is: ', count2
        stop ' '
      endif
      goto 100
C     ------------------------------------------------------------------
  999 write (*,*) ' end of file reached '
      close (10)
      close (11)
      end
The docs are not clear about the "RECORDTYPE" OPEN keyword;
the DEC Fortran 90 docs are self-contradictory on this point.
It seems that the keyword was once meant to support text files
with records delimited by CR/LF, but evolved to support
non-record-oriented files.
Endianity conversion
--------------------
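When both machines use the same integer and float formats and differ
only in byte order, each 4-byte value can be converted by reversing
its bytes. The following is a minimal sketch, assuming the default
INTEGER occupies 4 bytes; the function name SWAP4 is made up for this
illustration:

```fortran
      integer function swap4(n)
C     Reverse the byte order of a 4-byte integer.
C     Assumes the default INTEGER occupies 4 bytes.
      integer n, b1, b2, b3, b4
C     Isolate the four bytes, lowest-order byte first
      b1 = iand(n, 255)
      b2 = iand(ishft(n, -8), 255)
      b3 = iand(ishft(n, -16), 255)
      b4 = iand(ishft(n, -24), 255)
C     Reassemble them in the opposite order
      swap4 = ior(ishft(b1, 24), ishft(b2, 16))
      swap4 = ior(swap4, ior(ishft(b3, 8), b4))
      end
```

Applying SWAP4 twice gives back the original value. A REAL*4 value
could be handled the same way by treating its bit pattern as an
integer (e.g. through an EQUIVALENCE), but only if both machines
use the same float format.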
Integer/Float format conversion
-------------------------------
Control information conversion
------------------------------
If you have the program source you can do it with a few modifications;
in the general case you'll need a conversion program.
1) Unformatted file from VMS to UNIX:
On VMS you can use unformatted variable records if your records
are no longer than 32764 bytes, specify RECORDTYPE='VARIABLE'
in the OPEN statement, as the default for unformatted I/O is
'SEGMENTED'.
FTP discards the 2-byte count-field of the variable records,
you can re-prefix (and re-suffix) the record length to the data
in each WRITE statement:
      INTEGER RECLEN
      REAL X, Y, Z
      ......................
      RECLEN = SIZEOF(X) + SIZEOF(Y) + SIZEOF(Z)
      WRITE (10) RECLEN, X, Y, Z, RECLEN
If your unformatted records have to be longer, write each record
in parts, each one smaller than 32764 bytes; write the record
length at the beginning of the first part and at the end of
the last part.
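Writing a long record in parts could be sketched as follows. This is
an illustration only: it assumes REAL*4 data (4 bytes per value) and
a unit already opened for unformatted variable-length records; the
routine name WRLONG and the part size MAXPRT are arbitrary choices
(8000 values are 32000 bytes, which leaves room for the count):

```fortran
      subroutine wrlong(lun, x, n)
C     Write X(1..N) as one logical record, split into parts of at
C     most MAXPRT values. Assumes REAL*4 data, 4 bytes per value.
      integer lun, n
      real x(*)
      integer MAXPRT
      parameter (MAXPRT = 8000)
      integer reclen, i, first, last
      reclen = 4 * n
      first = 1
      last = min(n, MAXPRT)
C     Short record: a single part carries both counts
      if (last .eq. n) then
        write (lun) reclen, (x(i), i = 1, n), reclen
        return
      endif
C     First part starts with the length of the whole record
      write (lun) reclen, (x(i), i = first, last)
   10 continue
      first = last + 1
      last = min(n, first + MAXPRT - 1)
      if (last .lt. n) then
        write (lun) (x(i), i = first, last)
        goto 10
      endif
C     Last part ends with the length of the whole record
      write (lun) (x(i), i = first, last), reclen
      end
```

Once FTP has discarded the count-fields of the parts, the data should
appear on the UNIX side as one record with a 4-byte prefix and suffix.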
A less recommended option is using the C Run-Time-Library function
'write'; it converts VAX/ALPHA little-endian longwords (4 bytes)
to big-endian. 'write' doesn't add the prefix & suffix count-fields,
and creates a stream/LF file, an unsuitable type. Include the unixio.h
and file.h standard headers, as they contain the function prototype
and associated argument constants.
2) Unformatted file from UNIX to VMS:
By default FTP will create fixed-length records, 512 bytes long.
If the records are of the same known length, you may change the formal
record-length to that value, without changing anything in the file.
Use either: SET FILE/ATTRIBUTES=(LRL:size) filespec
or Joe Meadows' FILE utility, then use OPEN with RECORDTYPE='FIXED'
and RECL=size, read and ignore the count field (first 4 bytes).
If the records are not the same length, you'll need a routine that
can reconstruct the original structure.
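The core of such a routine could be sketched as follows. It is only
an illustration under stated assumptions: the file content has already
been read into an INTEGER array, one byte value (0-255) per element,
and each record carries 4-byte big-endian prefix and suffix counts.
The names SCNREC and CNT4 are made up for this example:

```fortran
      subroutine scnrec(buf, nbytes, maxr, start, length, nrec, ok)
C     Walk the byte image BUF(1..NBYTES) of a UNIX unformatted file
C     and return, for each record, the position of its first data
C     byte and its length. OK is false if the image is inconsistent.
      integer nbytes, maxr, nrec
      integer buf(*), start(maxr), length(maxr)
      logical ok
      integer p, c1, c2, cnt4
      nrec = 0
      p = 1
      ok = .true.
   10 continue
      if (p .gt. nbytes) return
      ok = .false.
      if (p + 3 .gt. nbytes .or. nrec .ge. maxr) return
      c1 = cnt4(buf, p)
C     The data and the suffix count must fit in what is left
      if (c1 .lt. 0 .or. c1 .gt. nbytes - p - 7) return
      c2 = cnt4(buf, p + 4 + c1)
      if (c1 .ne. c2) return
      nrec = nrec + 1
      start(nrec) = p + 4
      length(nrec) = c1
      p = p + 8 + c1
      ok = .true.
      goto 10
      end

      integer function cnt4(buf, p)
C     Combine four byte values, most significant first, into a count
      integer buf(*), p
      cnt4 = ((buf(p)*256 + buf(p+1))*256 + buf(p+2))*256 + buf(p+3)
      end
```

Comparing the prefix and suffix counts of every record is a cheap
check that the guessed record layout is right.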
EBCDIC/ASCII conversion
-----------------------
File type conversion
--------------------
FTP options
-----------
FTP transfer options can be divided into 5 categories; most
of the options are unimplemented:
File structure (e.g. stru file)
-------------------------------
file        Byte-oriented file system
record      Record-oriented file system
page        ----
mount       ----
vms         [On VMS/MULTINET] Preserve all VMS characteristics
            automatically negotiated.
Transfer type (e.g. type ascii)
-------------------------------
ascii             For text files, default.
ebcdic            For IBM mainframes
backup            [On VMS/MULTINET] for VMS/BACKUP files
binary            Same as IMAGE
image             For unformatted data files and executables.
local-byte-size
logical-byte      Same as LOCAL-BYTE-SIZE
tenex
Transfer mode (e.g. mode stream)
--------------------------------
stream       Usual mode
compressed   Supported by TGV/MULTINET only?
block
Form formats
------------
non-print
telnet format effectors
carriage control (ASA)
Auxiliary
---------
record-size        [On VMS/MULTINET]
site rms recsize   [On VMS/MULTINET]
block              [On VMS/MULTINET]
case