sburke@cpan.org

From: gkn@ucsd.Edu (Gerard K. Newman) Newsgroups: alt.folklore.computers Subject: HAKMEM (well, the pieces I have anyway) Keywords: HAKMEM, PDP-10 Message-ID: <18495@ucsd.Edu> Date: 6 Sep 90 16:01:18 GMT Organization: San Diego Supercomputer Center

I got 8 requests for HAKMEM in a period of about 36 hours, so I decided to go ahead and post it. It follows the dotted line below. Enjoy.

Programming hacks for the PDP10/DECSYSTEM-20. The first 36 hacks are taken verbatim from:

M. Beeler, R. W. Gosper & R. Schroeppel, February 29, 1972

Items 1 - 36 correspond to items 145 - 180 (pages 72 - 86) in the original document.

105=69.. (And 105.=69 hexadecimal.)

0/ TLCA 1,1(1) 1/ see below 2/ ROT 1,9 3/ JRST 0

This is a display hack (that is, it makes pretty patterns) with the low 9 bits = Y and the 9 next higher = X; also, it makes interesting, related noises with a stereo amplifier hooked to the X and Y signals. Recommended variations include:

CHANGE: GOOD INITIAL CONTENTS OF 1: none 377767,,377767; 757777,,757757; etc. TLC 1,2(1) 373777,,0; 300000,,0 TLC 1,3(1) -2,,-2; -5,,-1; -6,,-1 ROT 1,1 7,,7; A0000B,,A0000B e ROTC 1,11 ;Can't use TLCA over data. AOJA 1,0

It is thought that this was discovered by Jackson Wright on the RLE PDP-1 circa 1962.

DATAI 2 ADDB 1,2 ROTC 2,-22 XOR 1,2 JRST .-4

2=X, 3=Y. Try things like 1001002 in data switches. This also does
interesting things with operations other than XOR, and rotations other
than -22. (Try ```
IOR; AND; TSC; FADR; FDV(!); ROT -14, -9,
-20,
```

...)

`Y = X XOR T`

for consecutive values for T = time.
`FADR 2,1`

before the
`XOR`

.
Try data switches =

4000,,4 1000,,2002 2000,,4 0,,1002 (Notation: <left half>,,<right half>Also try the FADR after the XOR, switches = 1001,,1.

NEW X = OLD X - e * OLD Y NEW Y = OLD Y + e * NEW(!) X

This makes a very round ellipse centered at the origin with its size determined by the initial point. e determines the angular velocity of the circulating point, and slightly affects the eccentricity. If e is a power of 2, then we don't even need multiplication, let alone square roots, sines, and cosines! The "circle" will be perfectly stable because the points soon become periodic.

The circle algorithm was invented by mistake when I tried to save one register in a display hack! Ben Gurley had an amazing display hack using only about six or seven instructions, and it was a great wonder. But it was basically line-oriented. It occurred to me that it would be exciting to have curves, and I was trying to get a curve display hack with minimal instructions.

X(N+1) = (2-e^2)*X(N) - X(N-1) Y(N+1) = (2-e^2)*Y(N) - Y(N-1).These are just the Chebychev recurrence with cos A (the angular increment) = 1-(e^2)/2. Thus X(N) and Y(N) are expressible in the form R cos(N A + B). The B's and R for X(N) and Y(N) can be found from N=0,1. The B's will differ by less than Pi/2 so that the curve is not really a circle. The algorithm is useful nevertheless, because it needs no sine or square root function, even to get started.

X(N) and Y(N) are also expressible in closed form in the algebra of ordered pairs described under linear recurrences, but they lack the remarkable numerical stability of the "simultaneous" form of the recurrence.

X^2 - e * X * Y + Y^2 = constant,where the constant is determined by the initial point. This ellipse has its major axis at 45 degrees (if e > 0) or 135 degrees (if e < 0) and has eccentricity

(e/(1 + e/2)^2.

If the result loops with period = 1 with sign +, you are on a sign-magnitude machine.

If the result loops with period = 1 at -1, you are on a twos-complement machine.

If the result loops with period > 1, including the beginning, you are on a ones-complement machine.

If the result loops with period > 1, not including the beginning, your machine isn't binary--the pattern should tell you the base.

If you run out of memory, you are on a string or Bignum system. If arithmetic overflow is fatal error, some fascist pig with a read-only mind is trying to enforce machine independence. But the very ability to trap overflow is machine dependent.

By this strategy, consider the universe, or, more precisely algebra:

`AOBJN`

counter):
IMUL A,[377777,,1]

`AOBJN`

pointer when the origin is fixed and the
length is a variable in A:
HRLOI A,-1(A) EQVI A,ORIGIN

HRLOI A,-ORIGIN(A) EQVI A,ORIGIN

Slightly faster: change the `HRLOI`

s to `MOVSI`

s and
the `EQVI`

addresses to `-ORIGIN-1`

. These two
routines are clearly adjustable for BLKOs and other fenceposts.

COS: FADR A,[1.57079632679] ;Pi/2 SIN: MOVM B,A ;argument in A CAMG B,[.00017] ;< 3^(-3) / 2^(13) POPJ P, ;sin X = X, within 27. bits FDVRI A,(-3.0) PUSHJ P,SIN ;sin -X/3 FMPR B,B FSC B,2 FADRI B,(-3.0) FMPRB A,B ;sin X = 4(sin -X/3)^3 -3(sin -X/3) POPJ P, ;sin in A, sin or |sin| in B ;|sin| in B occurs when angle is smaller than end test

Changing both -3.0's to +3.0's gives sinh:

sinh X = 3 sinh X/3 + 4 (sinh X/3)^3.

Changing the first -3.0 to a +9.0, then inserting `PUSHJ P,.+1`

after `PUSHJ P,SIN`

gains about 20% in speed and uses half the
pushdown space (< 5 levels in the first 4 quadrants). ```
PUSHJ
P,.+1
```

is a
nice way to have something happen twice. Other useful angle multiplying
formulas are:

tanh X = (2 tanh X/2)/(1 + (tanh X/2)^2) tan X = (2 tan X/2)/(1 - (tan X/2)^2)if infinity is handled correctly. For cos and cosh, one can use:

cos X = 1 - 2 (sin X/2)^2, cosh X = 1 + 2 (sinh X/2)^2.

In general, to compute functions like e^X, cos X, elliptic functions, etc. by iterated application of double and triple argument formulas, it is necessary to subtract out the constant in the Taylor series and transform the range reduction formula accordingly. Thus:

F(X) = cos(X)-1 F(2X) = 2F*(F+2) F(e) = -e^2/2 G(X) = e^X-1 G(2X) = G*(G+2) G(e) = e

This is to prevent the destruction of the information in the
range-reduced argument by the addition of a quantity near 1 upon the
success of the e test. The addition of such a quantity in the actual
recurrences is OK since the information is restored by the multiply.
In fact, a cheap and dirty test for F(e) sufficiently small is to see
if the addition step has no effect. People lucky enough to have a
square root instruction can get natural log by iterating
`X <- X/((1+X)^2 + 1)`

until 1+X = 1.

Then multiply by 2^(number of iterations). Here, a LSH or FSC would work.

The correct epsilon test in such functions as the foregoing SIN are generally the largest argument for which addition of the second term has no effect on the first. In SIN, the first term is x and the second is -(x^3)/6, so the answer is roughly the x which makes the ratio of those terms (1/(2^27); so x = (3^(-3))/(2^(13)). But this is not exact, since the precise cutoff is where the neglected term is the power of 2 whose 1 bit coincides with the first neglected (28th) bit of the fraction. Thus, (x^3)/6 = 1/(2^(27)) * 1/(2^(13)), so x = (3^(-3)) / (2^(13).

`PUSHJ P,LOG2`

with a floating point argument in A.
LOG2: LSHC A,-33 MOVSI C,-201(A) TLC C,211000 ;Speciner's bum MOVEI A,200 ;exponent and sign sentinel LOGL: LSH B,-9 REPEAT 7, FMPR B,B ;moby flunderflo LSH B,2 LSHC A,7 SOJG A,LOGL ;fails on 4th try LSH A,-1 FADR A,C POPJ P, ;answer in A

Basically, you just square seven times and use the low seven bits of the exponent as the next seven bits of the log.

EXCH A,Note:LOC1EXCH A,LOC2EXCH A,LOC1

`MOVE-EXCH-MOVEM`

, clobbering A.
TRCE A,BITS TRCE A,BITS TRCE A,BITSNote (Nelson): last

`TRCE`

never skips, and used to be
`TRC`

, but `TRCE`

is less forgettable. Also, use
`TLCE`

or `TDCE`

if the bits are not in the right
half.
(SETQ X (PROG2 O Y (SETQ Y X)))

ROTC A,6 CAMG A,B EXCH A,B ROTC A,-6

MULI A,<# bytes/word> SUBI B,1-<# b/w>(A)

To get full word character address, use `SUB`

into magic table.

ROTC A,N ROT B,-N ROTC B,N

Thus M AC's can be `ROTC`

'ed in 2M-3 instructions. (Stallman):
For 73. > N > 35.:

ROTC A,N-36 EXCH A,C ROT B,36.-N ROTC A,N-72.

;B gets 7 bit character in A with even parity IMUL A,[2010040201] ;5 adjacent copies AND A,[21042104377] ;every 4th bit of left 4 copies + right copy IDIVI A,17 ;casting out 15.'s in hexadecimal shifted 7 ;odd parity on 7 bits (Schroeppel) IMUL A,[10040201] ;4 adjacent copies IOR A,[755555540] ;leaves every 3rd bit+offset+right copy IDIVI A,9 ;powers of 2^3 are +- mod 9 ;changing 7555555400 to 27555555400 gives even parity ;if A is a 9 bit quantity, B gets number of 1's (Schroeppel) IMUL A,[1001001001] ;4 copies AND A,[42104210421] ;every 4th bit IDIVI A,17 ;casting out 15.'s in hexadecimal ;if A is 6 bit quantity, B gets 6 bits reversed (Schroeppel) IMUL A,[2020202] ;4 copies shifted AND A,[104422010] ;where bits coincide with reverse repeated base 2^8 IDIVI A,377 ;casting out 2^8 -1's ;reverse 7 bits (Schroeppel) IMUL A,[10004002001] ;4 copies set by 000's base 2 (may set arith. o'flow) AND A,[210210210010] ;where bits coincide with reverse repeated base 2[8] IDIVI A,377 ;casting out 377's ;reverse 8 bits (Schroeppel) MUL A,[100200401002] ;5 copies in A and B AND B,[20420420020] ;where bits coincide with reverse repeated base 2[10] ANDI A,41 ;" DIVI A,1777 ;casting out 2[10]-1's

foo, lat /DATAI switches adm a /ADDB and (707070 adm b iot 14 /output AC sign bit to a music flip-flop jmp foo

Makes startling chords, arpeggios, and slides, with just the sign of the AC. This translates to the PDP-6 (roughly) as:

FOO: DATAI 2 ADDB 1,2 AND 2,[707070707070] ;or 171717171717, 363636363636, 454545454545, ... ADDB 2,3 LDB 0,[360600,,2] JRST FOOListen to the square waves from the low bits of 0.

LDB B,[014300,,A] ;or MOVE B,A then LSH B,-1 AND B,[333333,,333333] SUB A,B LSH B,-1 AND B,[333333,,333333] SUBB A,B ;each octal digit is replaced by number of 1's in it LSH B,-3 ADD A,B AND A,[070707,070707] IDIVI A,77 ;casting out 63.'sThese ten instructions, with constants extended, would work on word lengths up to 62.; eleven suffice up to 254..

Example (Gosper):

DPT: IDIVI F,12 HRLM G,(P) ;tuck remainder on pushdown list SKIPE F PUSHJ P,DPT LDB G,[220600,,(P)] ;retrieve low 6 bits of remainder TRCE G,"0 ;convert digit to character SETOM CCT ;that was no digit! TYO: .IOT TYOCHN,G ;or DATAO or IDPB ... AOS G,CCT POPJ P,

This is the standard recursive decimal print of the positive number in
F, but with a `LDB`

instead of a `HLRZ`

. It
falls into the typeout routine which returns in G the number of characters
since the last carriage return. When called with -36.,
`DPT`

types carriage return, line feed, and resets CCT, the
character position counter.

`CONS`

, is defined to: find a free cell in memory, store
the argument in it, remove it from the set of free cells, return a pointer
to it, and call the garbage collector when the set is empty. This can be
done in two instructions:
CONS: EXCH A,[EXCH A,[...[PUSHJ P,GC]...]] EXCH A,CONSOf course, the address-linked chain of

`EXCH`

's indicated by the
nested brackets is concocted by the garbage collector. This method has
the additional advantage of not constraining an accumulator for the free
storage pointer.
UNCONS: HRLI A,(EXCH A,) EXCH A,CONS EXCH A,@CONSReturns cell addressed by A to free storage list; returns former cell contents in A.

MULI A,400 ;exponent to A, fraction to A+1 TSC A,A ;1's complement magnitude of excess 200 exponent ASH A+1,-200-27.-8(A) ;answer in A+1

If number is known positive, you can omit the `TSC`

. On the PDP-10

UFA A,[+-233000,,] ;not in PDP-6 repertoire TLC A+1,233000 ;if those bits really bother youWhen you know the sign of A, and |A| < 2^(26), you can

FAD A,[+-233400,,] ;or FADR for rounded fix! TLC A,233400 ;if those bits are relevantwhere the sign of the constant must match A's. This works on both machines and doesn't involve A+1. On the 10,

`FADRI`

saves a
cycle and a constant, and rounds.
MOVE B,A MOVN C,B AND C,B ADD A,C MOVE D,A XOR D,B LSH D,-2 IDIVM D,C IOR A,C

Many curves and contours are definable by F(X,Y) = 0 with F changing sign on opposite sides of the curve. The following algorithm will draw most such curves more accurately than polygonal approximations and more easily than techniques which search for a "next" X and Y just one move away.

We observe that a good choice of lattice points is just those for which F, when evaluated on one of them, has opposite sign and smaller magnitude than on one or more of its four immediate neighbors.[+] This tends to choose the nearer endpoint of each graph paper line segment which the curve crosses, if near the curve F is monotone with distance from the curve.

First, divide the curve into arcs within which the curve's tangent lies within one 45 degree semiquadrant. We can show that for reasonable F, only two different increments (say north and northwest) are needed to visit the desired points.

Thus, we will be changing one coordinate (incrementing Y) every step, and we have only to check whether changing the other (decrementing X) will reduce the magnitude of F. (If F increases with Y, F(X,Y+1) > -F(X-1,Y+1) means decrement X.) F can often be manipulated so that the inequality simplifies and so that F is easily computed incrementally from X and Y.

As an example, the following computes the first semiquadrant of the circle

F = X^2+Y^2-R^2 = 0. C0: F <- 0, Y <- 0, X <- R C1: F <- F+2Y+1, Y <- Y+1 C2: if+ F +> X, F <- F-2X+1, X <- X-1 C3: if Y < X-1, go to C1 C4: (Link to next arc) if Y = X-1, Y <- Y+1, X <- X-1

This can be bummed by maintaining Z = 2Y+1 instead of Y. Symmetry may be
used to compute all eight semiquadrants at once, or the loop may be closed
at C2 and C3 with two `PUSHJ`

's to provide the palindrome
of decisions for the first quadrant. There is an expression for the
number of steps per quadrant, but it has a three-way conditional
dependent upon the midpoint geometry. Knowing this value, however, we can
replace C3 and C4 with a simple loop count and an odd-even test for C4.

The loop must be top-tested (C3 before C1) if the "circle" R=1, with four diagonal segments, is possible.

All this suggests that displays might be designed with an increment mode which accepts bit strings along with declarations of the form: "0 means north, 1 means northwest". 1100 (or 0011) will not occur with a curve of limited curvature; thus, it could be used as an escape code, but this would be an annoying restriction.

See the following illustration of circles drawn this way.

[+] In case of a tie, i.e., F has equal magnitudes with opposite signs on adjacent points, do not choose both points but rather have some arbitrary yet consistent preference for, say, the outer one. The problem can't arise for C2 in the example because the inequality F >= X is really F > -(F-2X+1) or F > X-.5.

P(X)Y(Nth derivative) + ..... + Q(X) =R(X)where P, ..... Q, and R are polynomials in X (for example, Bessel's equation, X^2Y''+XY'+(X^2-N^2)Y = 0) and A is an algebraic number. Then Y(A) can be evaluated to N places in time proportional to N(ln N)^3.

Further, e^X and ln X or any elementary function can be evaluated to N places in n(ln N)^2 for X a real number. If F(X) (by Newton's method), and the first derivative of F(X). Also zeta(3) and gamma can be done in N(ln N)^3.

`ILDB-CAIE-JRST`

on the PDP6/10) once for each character in the
sought string. The destinations of the transfers (address fields of
the JRST's) must, however, be computed as functions of the sought string.
Let
0 1 2 3 4 S A S S Y 0 1 0 2 2stand for the program

T0: ILDB C,A ;C gets next char from pointer in A T1: CAIE C,"S ;skip if it's an S JRST TO ;loop back on failure ILDB C,A ;next T2: CAIE C,"A ;skip if A JRST T1 ;could be an S ILDB C,A T3: CAIE C,"S JRST TO ;S, A, non S, so start over ILDB C,A ;next T4: CAIE C,"S JRST T2 ;could be SAS.ASSY ILDB C,A CAIE C,"Y JRST T2 ;could be SASS.ASSY ;found SASSY

In other words, a number >0 in the top row is a location in the program where the corresponding letter of the middle row is compared with a character of th input string. If it differs, the number in the bottom row indicates the location where comparison is to resume. If it matches, the next character of the middle row is compared with the next character of the input string.

Let J be a number in the top row and K be the number below J, so that TK
is the address of the Jth `JRST`

. For each J = 1, 2, ... we
compute K(J) as follows:

K(1) = 0. Let P be a counter, initially 0.

For each succeeding J, increment P. If the Pth letter = the Jth, K(J) = K(P). Otherwise, K(J) = P, and P is reset to 0. (P(J) is the largest number such that the first P characters match the last P characters in the first J characters of the sought string.)

J= 0 1 0 1 2 3 4 5 M I S S I S S I P P I I S S I S S I P P I K(J)= 0 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 0 5 1 0 0 1 2 3 0 1 2 3 C O C A C O L A S A S S A F R A S 0 1 0 2 0 1 3 1 0 1 0 2 1 3 1 1 0

To generalize this method to search for N strings at once, we produce a
program of `ILDB-CAIE=JRST`

's for each of the sought strings,
omitting the initial `ILDB`

from all but the first. We
must compute the destination of the Jth `JRST`

in the Ith
program, `TKM(I,J)`

which is the location of the Kth compare in
the Mth program.

It might be reasonable to compile such an instruction sequence whenever a search is initiated, since alternative schemes usually require saving or backing up the character pointer.

The perpendicular distance which the vector C lies from the line connecting vectors A and B is just

(C - A) x (B - A) ------------------- , 2 |A - B|but maximizing this can lose on very pointy V's. The distance sum hack can lose on very squashed Z's.

;Initialize random-number generation GTAD ;Get current time and date MOVEM T1,SEED ;Make it the seed ... CALL RANDOM ;Go get a random number ... ;Return with psuedo random number in T2 RANDOM: MOVE T1,SEED ;Get the seed MUL T1,RAN ;Make next random number MOVEM T2,SEED ;Save low order as new seed IDIVI T1,10 ;Get random number in range 0-7 RET RAN: 343277,,244615 ;Low order portion of 5^15 SEED: 0

gkn Gerard K. Newman gkn@sds.sdsc.edu 619.534.5076 San Diego Supercomputer Center gkn@sdsc.bitnet 619.534.5152 FAX PO Box 85608 sdsc::gkn (27.1/span) San Diego, CA 92186-9784 ucsd!gkn