From: steveb@shade.UUCP (Steve Barber)
Newsgroups: unix-pc.general,comp.sys.att
Subject: Kernel Patch to Improve Serial Line Response Time (update)
Date: 15 Apr 90 06:15:25 GMT
Reply-To: steveb@shade.ann-arbor.mi.us (Steve Barber)
Organization: Ripley Computing/Consulting Services (Ripley, MI)
Back in December, Gene Olson (gene@digibd) posted information explaining
how to patch your 3.51a kernel to allow it to move characters from the
raw interrupt queue to the raw input queue every 1/60th of a second,
rather than every 4/60ths of a second as the code in the kernel is written.
The rest of this article is an update to Gene's original article
explaining how to make the change to the 3.51m kernel. (Yes, due to
the MeterMaid code, the addresses are different.)
I should add a few comments here regarding the use of this code, now
that I've been running this way for a while:
1) I can definitely notice a difference when typing on a terminal at 19200.
The characters are echoed almost instantaneously, giving the system
a much crisper feel.
2) These changes do tend to bog the system down if you have a lot of data
moving in and out of your serial ports. I don't have any kind of numbers
to back this up, but during a 19200 uucp transfer to a directly connected
Sun, doing much of anything else with the system can be frustrating.
The same number of interrupts is being serviced, but now characters are
being moved 4 times as often. I dunno, maybe it's just me. A busy 19200
port is going to slow things down with or without this modification.
3) I can testify that setting nttyhog in ktune to 0 and running two
serial ports heavily (in my case, 19200 and 2400) can lock up your
system. I guess 19200 is fast enough to steal all the clists if you
don't use ttyhog to flush the buffers.
4) My friend whose Sun I'm now connected to used to have a 3B1. When we
applied this patch to both of our machines, our average UUCP transfer
rates jumped from around 750 bytes/sec to almost 1400 bytes/sec. When
only one machine of the two had the patch, average transfer rates were
around 1000 bytes/sec, but presumably the machine with the patch could
read data at 1400 bytes/sec, and the other machine was still reading
at 750 bytes/sec, thus causing the average to be 1000 bytes/sec.
5) Just as another data point in the hardware flow control issue (please
don't start another argument out of this!): I've been using hardware
flow control on 19200 ports on my 3b1 since OS 3.51. For the most part,
it works. uucp works great that way, and in fact due to the packet
nature of the g protocol, I seem to recall that it doesn't end up needing
to use the flow control much anyway. The only problems I've noticed
have been when I'm logged in with cu over a 19200 serial line with
flow control. Occasionally characters are lost, and I suspect that
sometimes (during a long ls, for instance) I lose a few lines rather
than a few characters, but I've never worried about it too much.
pcomm 1.1 handled the speed with no problem, but 1.2 seems to lose
some characters now and then too. (That and the feeble vt100 emulation
slows it down a lot. I may just comment that out.)
The remainder of this article was written by Gene Olson. It describes
how to patch your 3.51m kernel using adb. I've changed the portions that
need to be changed after the upgrade from 3.51a to 3.51m.
---------------------------------------------------------------------------
BACKGROUND
----------
I noticed that the response time to keystrokes on my UNIX-pc
was not as good as I have come to expect from terminals and
other systems. I poked around a bit and found the cause.
In the Kernel, there is a routine called "serscan" which moves
data (this is conjecture) from the raw interrupt queue to the
raw input queue. As it turns out, "serscan" (apparently) is
only called by the "clock" routine, and only every 4 clock TICs.
This 4 clock delay causes up to a 4/60 second variation in
keystroke echo times which is quite noticeable to some people.
(Me for example).
The following "adb" patch changes the "clock" routine to call
"serscan" every TIC instead, improving both response time and
consistency.
Some extra CPU time is taken with the extra calls. I found on
my system that CPU intensive benchmarks took 0.3% longer. This
decrease in performance was not noticeable of course, and it is
well worth it to a crackpot like myself.
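For a rough sense of scale, the sketch below works out the worst-case
pickup delay and the number of characters that can pile up between scans.
It is only arithmetic, not kernel code. The 60 tics per second figure
comes from this article; the 10 bits per character (start + 8 data + stop)
is an assumption, and the function names are made up for illustration.

```c
#define CLOCK_HZ 60   /* clock tics per second, as stated in the article */

/* Worst-case wait, in milliseconds, before a received character is
 * picked up when the scan runs once every `tics` clock tics.
 * This counts only the scan interval, nothing else in the path. */
double worst_case_delay_ms(int tics)
{
    return 1000.0 * tics / CLOCK_HZ;
}

/* Characters that can arrive between two scans on a saturated line.
 * ASSUMPTION: 10 bits on the wire per character (start + 8 data + stop). */
long chars_per_scan(long baud, int tics)
{
    long chars_per_sec = baud / 10;
    return chars_per_sec * tics / CLOCK_HZ;
}
```

Under these assumptions, a saturated 19200 line can deliver up to 128
characters between scans at 4 tics and 32 at 1 tic, and the worst-case
pickup delay drops from about 67 ms to about 17 ms.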
*WARNING* +----------------------------------------------------------+
*WARNING* | ***** DISCLAIMER ***** |
*WARNING* | |
*WARNING* | If you do this and blow it, you have just corrupted |
*WARNING* | your UNIX kernel and may wind up reloading your system |
*WARNING* | from scratch. Make a backup copy of /unix before |
*WARNING* | beginning so you can copy it back if necessary. |
*WARNING* | |
*WARNING* | You may wish to patch the in-core version of UNIX |
*WARNING* | first so you can try it out before altering your |
*WARNING* | permanent version of /unix. |
*WARNING* | |
*WARNING* | You take personal responsibility for any patches to |
*WARNING* | your kernel; you will have no-one to blame but |
*WARNING* | yourself if you lose or corrupt data. |
*WARNING* | |
*WARNING* | [SWB: Please DO NOT try this if you are not familiar |
*WARNING* | with the use of the adb debugger. It is very easy to |
*WARNING* | make a mistake or typo without realizing it. At that |
*WARNING* | point recovery can be tricky! I recommend making a |
*WARNING* | copy of your kernel to work with. I also recommend |
*WARNING* | stepping through the procedure the first time WITHOUT |
*WARNING* | the write (-w) option to adb!] |
*WARNING* +----------------------------------------------------------+
--------------- Log of ADB session begins here --------------
--------------- # comments added --------------
##### The -w means WRITE. With the -w option set, any changes you make
##### while in adb WILL be written. Make sure you know what you're doing!
# adb -w unix
##### Check out a portion of the Clock
##### interrupt routine.
#####
##### Note that it calls "serscan" only after "fserscan"
##### is incremented to 4, so "serscan" is called only
##### every 4 clock tics.
clock+490,10?ia
clockspecial+a4: tst.b serinprogress
clockspecial+aa: bne.b clockspecial+bc
clockspecial+ac: tst.l serbufcnt
clockspecial+b2: beq.b clockspecial+bc
clockspecial+b4: clr.l (%sp)
clockspecial+b6: jsr serrint
clockspecial+bc: and.w &-701,sr
clockspecial+c0: mov.b fserscan,%d0
clockspecial+c6: addq.b &1,fserscan
clockspecial+cc: cmp.b &2,%d0
clockspecial+d0: ble.b clockspecial+de
clockspecial+d2: clr.b fserscan
clockspecial+d8: jsr serscan
clockspecial+de: and.w &-701,sr
clockspecial+e2: tst.l idleflag
clockspecial+e8: beq.b clockspecial+104
##### Change the compare so we test "fserscan" against
##### 0 instead of 2. Now we call "serscan" every 2
##### clock tics.
clock+4ba?x
clockspecial+ce: 2
.?w 0
clockspecial+ce: 2 = 0
##### Change the "ble" to a "blt", so now we call "serscan"
##### every clock tic.
clock+4bc?x
clockspecial+d0: 6f0c
.?w 6d0c
clockspecial+d0: 6f0c = 6d0c
##### Check our work. Note that the intent of the original
##### code is preserved. If you wish to slow the scan back
##### down, you can still do it by patching clock+4ba.
clock+490,10?ia
clockspecial+a4: tst.b serinprogress
clockspecial+aa: bne.b clockspecial+bc
clockspecial+ac: tst.l serbufcnt
clockspecial+b2: beq.b clockspecial+bc
clockspecial+b4: clr.l (%sp)
clockspecial+b6: jsr serrint
clockspecial+bc: and.w &-701,sr
clockspecial+c0: mov.b fserscan,%d0
clockspecial+c6: addq.b &1,fserscan
clockspecial+cc: cmp.b &0,%d0
clockspecial+d0: blt.b clockspecial+de
clockspecial+d2: clr.b fserscan
clockspecial+d8: jsr serscan
clockspecial+de: and.w &-701,sr
clockspecial+e2: tst.l idleflag
clockspecial+e8: beq.b clockspecial+104
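The effect of the two patches on the counter test can be checked with a
small C model. This is a sketch of the logic as read from the disassembly
above, not kernel source: "serscan_calls", "limit" and "strict" are names
invented here, and the model assumes nothing else in the kernel touches
"fserscan".

```c
/* Model of the fserscan test in the clock routine.
 * `limit`  is the immediate operand of cmp.b (2 in the stock kernel).
 * `strict` selects blt (1) instead of ble (0) for the skip branch.
 * Returns how many times serscan would be called in `tics` clock tics. */
int serscan_calls(int limit, int strict, int tics)
{
    signed char fserscan = 0;   /* one-byte counter, as in the kernel */
    int calls = 0;

    for (int t = 0; t < tics; t++) {
        signed char d0 = fserscan;      /* mov.b  fserscan,%d0 */
        fserscan++;                     /* addq.b &1,fserscan  */

        /* cmp.b &limit,%d0 followed by ble (or blt) past the call */
        int skip = strict ? (d0 < limit) : (d0 <= limit);
        if (!skip) {
            fserscan = 0;               /* clr.b  fserscan */
            calls++;                    /* jsr    serscan  */
        }
    }
    return calls;
}
```

Over 60 tics (one second), serscan_calls(2, 0, 60) gives 15 for the stock
kernel, serscan_calls(0, 0, 60) gives 30 after the first patch, and
serscan_calls(0, 1, 60) gives 60 after both patches.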
--
--**-Steve Barber----steveb@shade.Ann-Arbor.MI.US----(cmode)-------------------