Fundamentals of Sequence Analysis, 1998-1999
Problem set 1: Computing basics
Problem group 1. Logging in
1A. Which version of the Genbank database is available locally?
GenBank 109.0 10/98
1B. What does the GCG program "DIVERGE" do?
The command
$ HINTS
points you to GENHELP, and the answer then comes from
$ GENHELP DIVERGE
Diverge measures the percent divergence of two protein coding
sequences using the method of Perler et al.
1C. A disk block is 512 bytes. How much disk space do you
have available in bytes, and how many bytes can you
put on disk before you run out of space?
At the end of the login you will see two lines like:
User [GROUP,USERNAME] has 99 blocks used, 19901 available,
of 20000 authorized and permitted overdraft of 50000 blocks on USRDISK
So your quota is 512 * 20000 = 10,240,000 bytes (about 10 Mbytes) of disk
space, of which 512 * 19901 = 10,189,312 bytes are still available.
If you go over your diskquota you will be unable to put new files on disk,
which will break just about all of the software, and you will see a message
about "quota exceeded".
(The overdraft is available so that a running program can go over diskquota
without being forced to fail. If this happens, be sure to clean up after
it or you will see the standard over diskquota problems for all subsequent
programs.)
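The block arithmetic in 1C can be checked with a short script. This is only a sketch in Python (not something you would run on SEQAXP); the numbers are the ones from the example quota lines above, and the function name is made up for illustration.

```python
# Sketch: convert OpenVMS disk blocks to bytes, using the example
# quota line from 1C. blocks_to_bytes is a hypothetical helper name.
BLOCK_BYTES = 512  # one disk block, as stated in the question


def blocks_to_bytes(blocks: int) -> int:
    """Return the number of bytes held by the given number of blocks."""
    return blocks * BLOCK_BYTES


used, available, authorized = 99, 19901, 20000

print("used:      ", blocks_to_bytes(used))        # 50688
print("available: ", blocks_to_bytes(available))   # 10189312
print("authorized:", blocks_to_bytes(authorized))  # 10240000
```

Note that used plus available equals the authorized total (99 + 19901 = 20000 blocks).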
Problem group 2. Commands.
2A. $ COPY/CONFIRM CLASS:generic_login.com []
It copies the file "generic_login.com" from the "CLASS" directory to
your default directory.
2B. $ DIR/SINCE
This lists all files that have been placed in the default directory since
midnight.
2C. $ TYPE/PAGE generic_login.com
Display the contents of the file, one page (screenful) at a time.
2D. $ HELP @LOCAL GENER OPEN EDITORS
Enter the HELP utility and look at information on text editors.
2E. $ HELP @LOCAL -
GENER OPEN -
EDITORS
Same as above - this demonstrates continuation lines. Not important
here, but it would be in a very long command.
2F. $ RECALL HE
Recall the last command beginning with "HE", which is the previous
help command.
2G. Try pushing the up/down/right/left arrows on the keyboard
Up and down move among recent commands, left and right move across them
so that they can be edited and reissued.
2H. $ mytype :== type/page
$ mytype generic_login.com
Define your own symbol for a particular action and then execute it. If
you want some shortcuts always defined put the definitions in your
login.com file.
Problem group 3. Directories and files
3A. How many files are in your directory and how much
space do they occupy?
The answer varies. To find out, use the command:
$ DIR/SIZE
3B. Print jobs can be directed to local laserprinters.
Issue the command: $ SHOW QUEUE *
What is the name of the queue that goes to your local
laser printer?
(If you don't see one, and you have a networked printer,
request that one be set up for you.)
The answer varies depending on your lab. If you were in the Zinn
lab the answer would be: ZINN_LW. Most labs have print queues
named in the same way.
3C. Do you have a LOGIN.COM file in your home directory?
(The commands in this file run automatically when
you login to configure your process.) If not,
rename generic_login.com (from 2A, above) and edit
it (see 2D, above) to reflect the appropriate print
queue for your lab. Invoke it with the command:
$ @login
then verify that print jobs come out on your printer
with the command:
$ print login.com
Ok, there's no answer - it was a ruse to get you to create
a working login.com file!
3D. What command do you use to clean out old versions of
files that are in your directory? Try it now, did it
work?
$ PURGE
removes older versions of files (those with smaller version numbers).
The easiest way to tell if it worked, meaning, that files were deleted,
is to use
$ PURGE/LOG
which will list the name of each file as it is deleted.
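To make the version rule concrete, here is a small Python sketch of what a plain PURGE does to a directory listing: for each file name only the highest version number survives. The file names are invented for the example, and this models only the default behavior described above.

```python
# Sketch of the default PURGE rule: keep only the highest-numbered
# version of each file. File names below are made up for illustration.
def purge(filenames):
    """Split 'NAME.EXT;version' entries into (kept, deleted) lists."""
    newest = {}
    for full in filenames:
        name, _, version = full.rpartition(";")
        name = name.upper()  # OpenVMS file names are case-insensitive
        newest[name] = max(newest.get(name, 0), int(version))

    kept, deleted = [], []
    for full in filenames:
        name, _, version = full.rpartition(";")
        if int(version) == newest[name.upper()]:
            kept.append(full)
        else:
            deleted.append(full)
    return kept, deleted


files = ["LOGIN.COM;1", "LOGIN.COM;2", "LOGIN.COM;3", "NOTES.TXT;7"]
kept, deleted = purge(files)
for name in deleted:
    print("deleted", name)  # roughly what PURGE/LOG reports
print("kept", kept)
```

A file with only one version (NOTES.TXT;7 here) is left alone, whatever its version number.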
Problem group 4. File protections
4A. What is the protection on the files in your directory?
(Hint, HELP DIR)
Use the command:
$ DIR/OWNER/PROT
and you should see a series of lines like this:
CMD.HTH;1 [GROUP,USERNAME] (RWED,RWED,RE,)
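The protection field is read as four comma-separated groups in the order SYSTEM, OWNER, GROUP, WORLD, each listing Read, Write, Execute, and Delete access. The following Python sketch decodes the example above; the function name is hypothetical and it handles only the simple form shown here.

```python
# Sketch: decode an OpenVMS protection mask such as (RWED,RWED,RE,).
# parse_protection is an illustrative helper, not a system utility.
CLASSES = ("SYSTEM", "OWNER", "GROUP", "WORLD")
ACCESS = {"R": "read", "W": "write", "E": "execute", "D": "delete"}


def parse_protection(mask: str) -> dict:
    """Map each user class to the list of access types it is granted."""
    fields = mask.strip().strip("()").split(",")
    if len(fields) != len(CLASSES):
        raise ValueError(f"expected 4 fields, got {len(fields)}: {mask!r}")
    return {
        user_class: [ACCESS[letter] for letter in field.strip().upper()]
        for user_class, field in zip(CLASSES, fields)
    }


for user_class, rights in parse_protection("(RWED,RWED,RE,)").items():
    print(f"{user_class:6s} {', '.join(rights) or 'no access'}")
```

For the example file, SYSTEM and OWNER have full access, GROUP can read and execute, and the empty fourth field means WORLD has no access at all.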
4B. What happens when you try to read a file that you don't
have access to? Try: $ COPY [-.MATHOG]login.com []
It doesn't let you do it, and gives you this error message:
Error opening SEQAXP$DKA200:[USERS.MATHOG]LOGIN.COM;55 as input
Insufficient privilege or file protection violation
4C. What do you think will happen when you block access
to a file from the SYSTEM account? Daily backup
tapes are made of all user files from the SYSTEM account.
If the user disk fails, and the files are restored from
tape onto a replacement drive, will a file that was
protected from SYSTEM read access be restored to your directory?
If you block read access to a file from a SYSTEM account you're living
very dangerously. The file may not be backed up on tape, and
consequently cannot be restored. Furthermore, you're not protecting the
file's contents in any way since anybody with SYSTEM level privileges can
bypass the file protection system. (Because users sometimes do this,
the backup process runs with BYPASS privileges.)
Problem group 5. Data transfer
5A. Use FTP on your PC or Macintosh. Copy login.com
from your account to your PC/Mac, then back to seqaxp.
(Remember, this is a text file.) Call the new copy
"new_login.com". Did the transfer work correctly?
Look for subtle errors with this command:
$ DIFF login.com new_login.com
There is no answer - it should have worked.
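If you want to run a similar check on the PC or Macintosh side, the Python sketch below separates the common failure (only the line terminators changed, as happens when a text file goes through the wrong transfer mode) from a real change in content. It is an illustration under that assumption, not a replacement for DIFF, and the function name is made up.

```python
# Sketch: classify how a transferred copy differs from the original.
# classify_transfer is a hypothetical helper for local copies of the files.
def classify_transfer(original: bytes, copy: bytes) -> str:
    """Return 'identical', 'line endings differ', or 'content differs'."""
    if original == copy:
        return "identical"

    def normalize(data: bytes) -> bytes:
        # Treat CR LF, bare CR, and bare LF as the same line terminator.
        return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

    if normalize(original) == normalize(copy):
        return "line endings differ"
    return "content differs"


print(classify_transfer(b"$ set term/vt100\n", b"$ set term/vt100\n"))
print(classify_transfer(b"$ set term/vt100\n", b"$ set term/vt100\r\n"))
print(classify_transfer(b"$ set term/vt100\n", b"$ set term/vt10\n"))
```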
5B. Login to seqaxp, create a subdirectory called [.KILLME],
and copy your login.com file into it. Repeat the transfer
as in 5A, but this time against the file in the new
subdirectory. Again, check that the transfer did
not change the file's content. Now remove any files
in the [.KILLME] directory, and then delete the
directory itself.
There is no answer - it should have worked.
5C. There are numerous ways to mess up file transfers, sending ASCII
as BINARY, or vice versa, or sending files with lines that are too
long. If a file that you have loaded on SEQAXP misbehaves you can
analyze it to see what is wrong. Issue the command:
$ ANALYZE/RMS CLASS:TOOLONG.TXT
and look at the RMS FILE ATTRIBUTE section. Why might this file
cause problems for some programs?
This is the line that matters:
Longest Record: 326
Some programs may choke on lines this long.
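The same figure can be estimated before a file is ever sent to SEQAXP. This Python sketch finds the longest line in a local text file, assuming lines end in LF or CR LF; it is not how ANALYZE/RMS works internally, and the demonstration file is generated on the spot rather than being TOOLONG.TXT.

```python
# Sketch: report the longest line in a local text file, the rough
# equivalent of the "Longest Record" value. The demo file is synthetic.
import os
import tempfile


def longest_record(path: str) -> int:
    """Return the length in bytes of the longest line, terminators excluded."""
    longest = 0
    with open(path, "rb") as handle:
        for raw in handle:
            longest = max(longest, len(raw.rstrip(b"\r\n")))
    return longest


# Build a small demonstration file with one 326-character line.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"short line\r\n" + b"A" * 326 + b"\n")

demo_longest = longest_record(tmp.name)
os.remove(tmp.name)
print("Longest Record:", demo_longest)
```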
Optional question, only for Pathworks users.
5D. DECNET allows most OpenVMS commands to function over
the net. This can be very convenient for moving text
files to/from a Macintosh. (A version for PCs also
exists, but I don't believe that anybody here uses it.)
Let's assume that your hard disk is called "BIGDISK"
on the machine "MACNAME", and that you have run the NCP
program on your Macintosh and configured it to allow
proxy connections from your SEQAXP account. Try this
from your SEQAXP account:
$ DEFINE DESKTOP MACNAME::BIGDISK:[DESKTOP_FOLDER]
$ copy login.com desktop:
If DECNET is working, you should see a file called
"login.com" on your desktop. What command would you
use *on SEQAXP* to view the contents of that file?
(Hint, DESKTOP has been defined as a logical name.)
$ type desktop:login.com
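The reason this works is that the logical name DESKTOP is replaced by its definition before the file is opened. A toy Python model of that single substitution step follows; the real translation is done by OpenVMS and is more involved, and the function and table here are purely illustrative.

```python
# Toy model of one logical name substitution. The table mirrors the
# DEFINE command in 5D; translate is a hypothetical helper.
def translate(filespec: str, logicals: dict) -> str:
    """Replace a leading 'NAME:' with its definition, if one exists."""
    device, sep, rest = filespec.partition(":")
    if not sep or rest.startswith(":"):
        # No device part, or a node name ("NODE::"), so nothing to do.
        return filespec
    target = logicals.get(device.upper())
    return filespec if target is None else target + rest


logicals = {"DESKTOP": "MACNAME::BIGDISK:[DESKTOP_FOLDER]"}
print(translate("desktop:login.com", logicals))
```

Under this model "desktop:login.com" becomes "MACNAME::BIGDISK:[DESKTOP_FOLDER]login.com", which is the file that TYPE then reads over DECNET.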
Problem group 6. GCG basics
6A. Configure your graphics device appropriately (for most terminal
emulators that is some form of Tektronix emulation). Issue the
command: $ SHOWPLOT
What do you see?
You should see a bunch of squares, circles, and text, with the phrase
"Genetics Computer Group" in fancy text at the bottom.
6B. What are the command line options for REFORMAT?
$ reformat/check
Reformat rewrites sequence file(s), scoring matrix file(s), or enzyme
data file(s) so that they can be read by GCG programs.
Minimal Syntax: $ REformat [/INfile=]Reformat.Txt /Default
Prompted Parameters: None
Local Data Files:
/DATa=Translate.Txt        three-letter to one-letter codes
Optional Parameters:
/LINesize=50               sets number of characters per line
/BLOcksize=10              sets number of characters per block
/BLAnklines=1              puts blank lines between the sequence lines
/NONUMbering               suppresses numbering
/NOCOMments                suppresses comments
/DNA                       changes U into T
/RNA                       changes T into U
/UPPer                     makes all sequence characters uppercase
/LOWer                     makes all sequence characters lowercase
/LIStfile[=Reformat.List]  writes a list file of output sequence names
/MSF                       reformats sequences into an MSF output file
/DEGap                     removes gap characters (.) from the sequence
/THReeintoone              translates three-letter peptides into one-letter
/ONEIntothree              translates one-letter peptides into three-letter
/COMparison                reformats a table instead of a sequence
/ENZymedata                reformats an enzyme data file instead of a
                           sequence (used with /PROtein, reformats a
                           protein enzyme data file)
/PROtein                   insists that the sequences are reformatted as
                           protein sequences
/NUCleotide                insists that the sequences are reformatted as
                           nucleic acid sequences
/PROFile                   reformats an old profile into the new profile
                           format
/EXTension=.Seq            defines a file name extension
/TRANSlate=FileName.Txt    lets you name the output translation table
[/OUTfile=]NewSeqName      lets you name the output file
/NOMONitor                 suppresses the screen trace showing each output
                           file
/BEGin                     beginning of range, defaults to 1
/END                       end of range, defaults to Maximum sequence length
                           Use these to extract a subsequence from a sequence
                           or MSF file.
/DELete                    delete the subsequence in the range, leave the rest
/LOOKup="U.,T,"            Convert characters in first string to matching
                           character in second string.
/NODots                    Assume input sequence has no ".."
Note that the SAF uses a locally modified version of REFORMAT,
and that options from "/BEGin" on are only available here.
Commands like the following are easier than going into SEQED:
$ reformat/infile=initial.seq/outfile=final.seq/begin=100/end=200
$ reformat/infile=initial.seq/outfile=final.seq/begin=100/end=200/delete
$ reformat/infile=initial.msf{*}/outfile=final.msf/msf -
/begin=100/end=200/delete
The first creates an output file containing only bases 100-200
(inclusive), the second creates an output file with those same bases
deleted, and the third deletes those columns from every sequence in the
MSF file.
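Since the ranges are 1-based and inclusive, it is easy to be off by one. The Python sketch below shows the intended arithmetic for the three commands; it only models the range logic on strings, the function names are invented, and the short sequences are made up for the demonstration.

```python
# Sketch of the /BEGin, /END and /DELete range logic on plain strings.
# Positions are 1-based and inclusive, as in the REFORMAT commands above.
def extract(seq: str, begin: int, end: int) -> str:
    """Keep only positions begin..end."""
    return seq[begin - 1:end]


def delete(seq: str, begin: int, end: int) -> str:
    """Remove positions begin..end and keep the rest."""
    return seq[:begin - 1] + seq[end:]


def delete_columns(alignment: dict, begin: int, end: int) -> dict:
    """Remove the same columns from every sequence of an alignment."""
    return {name: delete(row, begin, end) for name, row in alignment.items()}


seq = "ACGTACGTAC"
print(extract(seq, 3, 5))  # GTA
print(delete(seq, 3, 5))   # ACCGTAC
print(delete_columns({"one": "ACGTACGTAC", "two": "AC..ACGTAC"}, 3, 5))
```

With /begin=100/end=200 the extracted piece is therefore 101 bases long, not 100.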
6C. Use REFORMAT to put into GCG format the following PROTEIN
sequence: AAAGCTCTTGGGTTTT
(Hint, put that sequence into a file, and then run REFORMAT on it).
Now look at the resulting file (TYPE) - does the line with a ".."
indicate that this is protein? Figure out the correct operation
to make this sequence into a GCG protein sequence file.
(Note, get in the habit of naming GCG protein sequence files
whatever.pep, and GCG nucleic sequence files whatever.seq -
that way you can more easily keep track of them.)
REFORMAT has to guess if this is a peptide or a nucleic acid and does so
based on composition. In this case, it (understandably) guesses wrong:
Killme.Seq Length: 16 January 16, 1999 16:47 Type: N Check: 625 ..
^
If it were in fact a peptide, you could have forced it to the correct
type with:
$ reformat/infile=killme.pep/protein
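To see why the guess goes wrong, consider the Python sketch below. The type guess is a hypothetical composition rule (the fraction of letters that are nucleotide codes, with an arbitrary threshold), not GCG's actual test. The checksum follows the commonly described GCG scheme (position weights cycling from 1 to 57, multiplied by the uppercase character code, summed modulo 10000), which is also an assumption here, although it does reproduce the "Check: 625" in the header line above.

```python
# Sketch: a composition-based type guess (hypothetical rule) and the
# commonly described GCG checksum, applied to the sequence from 6C.
NUCLEOTIDE_LETTERS = set("ACGTUN")


def guess_type(seq: str, threshold: float = 0.85) -> str:
    """Return 'N' if most letters are nucleotide codes, otherwise 'P'.

    The threshold is an arbitrary illustrative value.
    """
    letters = [c for c in seq.upper() if c.isalpha()]
    if not letters:
        raise ValueError("sequence contains no letters")
    fraction = sum(c in NUCLEOTIDE_LETTERS for c in letters) / len(letters)
    return "N" if fraction >= threshold else "P"


def gcg_checksum(seq: str) -> int:
    """Weighted character sum with weights cycling 1..57, modulo 10000."""
    total = 0
    for index, char in enumerate(seq.upper()):
        total += (index % 57 + 1) * ord(char)
    return total % 10000


seq = "AAAGCTCTTGGGTTTT"
print("Type:", guess_type(seq), " Check:", gcg_checksum(seq))  # Type: N  Check: 625
```

Every letter of the example is also a valid nucleotide code, so any rule based on composition alone will call it nucleic acid. That is why the type has to be forced with /protein.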
6D. Use the GCG program SEQED to edit this sequence - put a P on the end
and change the first A to an S. What happens if you leave the program
with a QUIT, and what happens when you leave with an EXIT? (Look
at the edited file to see).
QUIT doesn't save your changes, EXIT does. This is the same as the
OpenVMS EDT editor.