D. Allen [CGL]" <idallen@watcgl.waterloo.edu> (11/26/90)
Using MIT's R4 xdvi on our MIT Xqdss colour VAXes often hangs the display. Killing the X server gets things going again. Echoing "xxx" to /dev/console also unsticks things, but the system enters a semi-comatose state. A bunch of these appear in the uerf log:

    EVENT CLASS OPERATIONAL EVENT
    OS EVENT TYPE 250. ASCII MSG
    SEQUENCE NUMBER 573.
    OPERATING SYSTEM ULTRIX 32
    OCCURRED/LOGGED ON Sun Nov 25 15:16:20 1990 EST
    OCCURRED ON SYSTEM watdaffy
    SYSTEM ID x08000000
    SYSTYPE REG. x01010000
    FIRMWARE REV = 1.
    PROCESSOR TYPE KA630
    MESSAGE write_ID: timeout trying to write to _VIPER

and the kernel becomes very sluggish:

    cgl# ping -v watdaffy
    64 bytes from 129.97.128.71: icmp_seq=0. time=10. ms
    64 bytes from 129.97.128.71: icmp_seq=6. time=0. ms
    64 bytes from 129.97.128.71: icmp_seq=1. time=5349. ms
    64 bytes from 129.97.128.71: icmp_seq=2. time=4349. ms
    64 bytes from 129.97.128.71: icmp_seq=3. time=3359. ms
    64 bytes from 129.97.128.71: icmp_seq=4. time=2359. ms
    64 bytes from 129.97.128.71: icmp_seq=5. time=1360. ms
    64 bytes from 129.97.128.71: icmp_seq=10. time=0. ms
    64 bytes from 129.97.128.71: icmp_seq=11. time=0. ms
    64 bytes from 129.97.128.71: icmp_seq=7. time=4879. ms
    64 bytes from 129.97.128.71: icmp_seq=8. time=3889. ms
    64 bytes from 129.97.128.71: icmp_seq=9. time=2890. ms
    64 bytes from 129.97.128.71: icmp_seq=16. time=0. ms
    64 bytes from 129.97.128.71: icmp_seq=17. time=0. ms
    64 bytes from 129.97.128.71: icmp_seq=12. time=5829. ms
    64 bytes from 129.97.128.71: icmp_seq=13. time=4839. ms
    64 bytes from 129.97.128.71: icmp_seq=14. time=3839. ms
    64 bytes from 129.97.128.71: icmp_seq=15. time=2849. ms
    64 bytes from 129.97.128.71: icmp_seq=20. time=0. ms

What is going on, and how can I fix it? The only way out of this seems to be a reboot.

--
-IAN! (Ian! D. Allen) idallen@watcgl.uwaterloo.ca idallen@watcgl.waterloo.edu
[129.97.128.64] Computer Graphics Lab/University of Waterloo/Ontario/Canada
chris@mimsy.umd.edu (Chris Torek) (11/28/90)
In article <1990Nov25.202720.11199@watcgl.waterloo.edu> idallen@watcgl.waterloo.edu (Ian! D. Allen [CGL]) writes:
>Using MIT's R4 xdvi on our MIT Xqdss colour VAXes often hangs the display.

The QDSS is a horrible thing. (The Ultrix 1.2 QDSS driver is even worse. The story I heard was something like `VMS programmer gets first Ultrix assignment: write kernel driver for QDSS'.) It is not possible for X11 and the kernel to stay completely in sync always, but as long as something has console output captured (so that console writes go to some pty rather than directly to the display) this should not be a problem.

Here is what I wrote for my own documentation when I rewrote the QDSS driver for my own purposes. This is not a solution, but might give people some insights (and will tell you why I say the QDSS is a horrible thing.) It was intended eventually to be standalone documentation, but at the moment really needs the VCB02 hardware manual as a companion.

Incidentally, there is a section in the VCB02 manual that says `do not do XYZ as it can short out the drivers in the vipers'. The Ultrix driver did (and probably still does) XYZ. (Mine does not.)

------------------------------------------------------------------------

Before we begin, here is a short description of the hardware. (Well, okay, so it is a long description. The hardware is very complextificated.)

The QDSS (VCB02 or `Dragon') is composed of a bunch of special-purpose chips. The simplest (from our point of view) are the so-called `vipers', or video processors. There is one viper per memory plane, with a maximum of 8.

Say we have a four-plane system. Any point on the screen has a `color value' between 0 and 15 inclusive. This value is a composite of planes 0, 1, 2, and 3 as bits 0,1,2,3 respectively. Fiddling with plane 0 changes the low-order bit of the color value.
(The color value goes through a lookup table to produce red, green, and blue intensities as in many conventional display systems; like those systems, only the green intensity is used on grey-scale monitors.) Each viper can only talk directly to its own plane. This creates an interesting problem: how to communicate between vipers, e.g., to make all pixels that were odd have the value 15 (copy plane 0 to all other planes). This is accomplished through the `I/D' bus. The I/D bus is only 8 bits wide, but each `cycle' is actually made of two bus cycles to make it look 16 bits wide. It operates in pairs of these pairs, alternating instructions and data (hence the name). The instructions are piddly little things that can affect only 16 bits of bitmap memory or viper register at a time, but a tremendous number of instructions run each second, so things move reasonably fast. Moreover, all vipers operate in parallel, so each instruction can diddle up to 8 16-bit words at a time. Often you do not want all the vipers to do the same thing. For instance, often one viper should write bits from its plane onto the I/D bus for others to see, but the others should not overwrite those bits with their own. To disable some vipers, there is a separate chip select register. When it is set to 0xff all 8 planes are enabled and will work together. Setting it to 0x01 disables all but plane 0. Typically a rasterop will have all planes enabled; the select register is mainly used to set various viper registers differently, but rasterops do honor the chip select, and have no effect on planes that are not selected. There is a separate chip select used for scrolling (to be described later), since it is a sort of `background' operation and should not disturb normal rasterops. The viper chips were designed with some flexibility in mind, and not all of their `features' are used in the QDSS. In addition, there appear to be some bits left over from earlier revisions. 
In a few cases we just have to ignore some oddity and press on. The (up to 8) vipers are all directed by a chip called the `adder' (yes, this Dragon is full of snakes). The adder (`address processor') mostly computes rasterops. It does a lot of other stuff too, but we cannot make much use of that since it has some peculiar limitations. So the adder does rasterops, telling the vipers when to do their things by counting scan lines and pixels and providing the right instructions at the right times. The adder is also responsible for moving data from the I/D bus to or from the CPU (polled or DMA, but see below) so that we can get bits into and out of the consarned thing in the first place. In any case, a plane's memory can only be reached through its viper; the adder and vipers have to cooperate, and the CPU has to tell them how. For the most part, `how' is determined by the contents of various registers in each viper. A `logical function' register controls the combination of a `source' and a `destination' to produce a new bit of a new color; the new bit is written, or the old left alone, depending on two mask values that are ANDed together. (The adder supplies a third mask that is all ones in the center of the rasterop, and has zeroes as needed along the horizontal fringes to keep the rasterop in bounds.) The `destination' value is always whatever was on the screen before, but the `source' value and the contents of both mask registers are under control of still more viper registers. (Now how much would you pay? But wait, there's more!...) The bits derived from the logical function select either a foreground or background color. 
That is:

    /* approximation of viper rasterop algorithm */
    old_data = *screen_location;    /* 16 bits */
    switch (control_register) {
    case discard:
        break;
    case to_source:
        source_register = old_data;
        break;
    case to_m1m2:
        mask1_register = mask2_register = old_data;
        break;
    case to_m2:
        mask2_register = old_data;
        break;
    }
    v = apply_lf(logical_function_register, old_data, source_register);
    new_data = (v & foreground_register) | (~v & background_register);
    mask = mask1_register & mask2_register & rasterop_mask_from_adder;
    *screen_location = (new_data & mask) | (old_data & ~mask);

Remember, though, that each plane gives only one bit of a displayed color, so typically the fg and bg registers of all vipers are all 1s and all 0s, and the `color' is determined by the result v of the logical function. In essence, a zero in the fg and bg registers has the effect of *suppressing* the result at that point; a one in both has the effect of *setting* it; and a zero in fg and a one in bg *inverts* it. (Confused yet? *I* was.)

To write a solid color c, one sets the source register to all 1s, the control register to `discard', and the various vipers' fg registers to all-1s or all-0s according to the value c (e.g., color 5, binary 0101, has vipers 0 and 2 all-1s, and 1 and 3 all-0s) (there is an easy way to do this, called `Z-axis' register setting, described below).

But this is *still* not the whole story: each LF register---there are four available---also contains bits that control whether the source, mask 1, and mask 2 are complemented, and whether something called `resolution mode' is applied. Resolution mode is only for displays with fewer than 1024x864 pixels, so we shall ignore it, and complementing the source can be done directly with the function, so we shall ignore that too. (Perhaps the source complement mode is useful with resolution mode. It seems completely useless otherwise.)

Before we can correct the approximation above, though, we need to know a bit more about the way the hardware does rasterops.
A rasterop specifies 0, 1 or 2 sources and 1 destination. (A rasterop without a destination is pointless.) With only a destination, the rasterop simply combines the source register with the bits at the destination according to the given logical function. With one or two sources, we get one or two memory read operations before the r/m/w that updates the plane's screen memory. Those operations are dumped through two control registers. These are chosen from one of two `banks' of operand control registers. In addition to the disposition of the screen data, the control register tells whether to read the I/D bus, and whether to write it. The actual algorithm, then, is this:

    /* viper registers */
    static short ctl[2][4];     /* ctl[x][3] present but unused */
    static short lf[4];         /* logical function registers */
    static short src, m1, m2;   /* source, mask1, mask2 regs */
    static short fg, bg;        /* fore- and background regs */

    /* there are 0, 1, or 2 src_cycles (but can have #2 without #1) */
    /* these parameters come from adder */
    /* there is also a shift constant, which I have simplified away */
    void
    rop_src_cycle(bank, mem, which)
        int bank;               /* bank 0 or 1 */
        short *mem;             /* bitmap memory address */
        int which;              /* source 1 or 2 */
    {
        short c = ctl[bank][which - 1], id, md;

        /*
         * All these operations occur in parallel.  Presumably,
         * if c&SEND, id will be the same as md, but while something
         * like this is mentioned in passing in the manual, I would
         * not count on it myself (it depends on the timing in
         * the viper).
         */
        md = *mem;              /* shifted left or right if necessary */
        if (c & SEND)
            *ID_BUS = md;
        id = *ID_BUS;
        /*
         * These may occur in parallel too (i.e., do not route
         * remote and local data into the same register; it is
         * not guaranteed and might even break the hardware):
         */
        RD_REGS(c) = id;        /* NONE, SRC, M1M2, or M2 */
        LD_REGS(c) = md;        /* NONE, SRC, M1M2, or M2 again */
        if (c & SS)
            magic();
        else
            slow_to_half_speed();
    }

    /* again, parameters are from adder */
    void
    rop_rmw_cycle(bank, mem, lfnum, edgemask)
        int bank;               /* bank 0 or 1 */
        short *mem;             /* bitmap memory address */
        int lfnum;              /* logical function 0/1/2/3 */
        short edgemask;         /* left or right edge mask, or ~0 */
    {
        short c = ctl[bank][2], id, md, s, v, mask, f;

        /*
         * The same comments as for rop_src_cycle above apply.
         * In addition, it is not obvious that all of this really
         * works in the hardware (the r/m/w timing is tighter).
         * But RD_SRC does work for PTB X mode, despite the
         * manual's claim that ctl[bank][2] ``may be unnecessary''
         * since ``there may be no reason to program either
         * destination CSR to other than 000000''.
         */
        md = *mem;
        if (c & SEND)
            *ID_BUS = md;
        id = *ID_BUS;
        RD_REGS(c) = id;
        LD_REGS(c) = md;
        if (c & SS)
            bad_stuff_happens_I_guess();
        f = lf[lfnum];
        s = f & LF_NOTNOTSRC ? src : ~src;
        if ((f & LF_NORES) == 0)
            s = smear(s);       /* ``resolution mode'' */
        mask = (f & LF_NOTM1 ? ~m1 : m1) & (f & LF_NOTM2 ? ~m2 : m2) &
            edgemask;
        v = apply_lf(LF_MASK(f), md, s);    /* s: the possibly complemented/smeared source */
        *mem = (((v & fg) | (~v & bg)) & mask) | (md & ~mask);
    }

Using two sources, and directing one of the sources to the mask register(s) and the other to the source register, we can get the effect of tiling or stippling, or more generally, writing under a mask. In particular, for tiling, the adder has a way to specify that source 2 (but not source 1) has a size which is a small power of two; the adder will feed the vipers a repeating address pattern (thus repeating the tile apparently-infinitely).
This brings us back to rasterops, and in particular, rasterop `modes'. In addition to the two optional sources, the logical function register index, and the control register bank index, the rasterop can be in one of three modes: `normal', where the source and destination are the same size (but see below); `linear pattern', where source 1 repeats as needed if it is smaller than the destination, and `fill', for polygon filling, where the source and destination are not used as rasterops at all. Two more bits are used for fill mode: X or Y fill; and normal two-edge fill, or baseline fill. Filled polygons are described below. For regular rasterops (and, presumably, polygons), there are four more mode bits: hole fill enable (normally on, but off for single-pixel-wide lines); source 1 index enable; source 2 index enable; and pen down. Pen down must be set; if it is not, nothing happens. (Pretty stupid, eh? But apparently REGIS wants it.) `Indexing' is used to make up for the sins of scrolling. It should be enabled whenever source 1 and/or the destination are in on-screen memory and scrolling might be going on (more below). [N.B.: the manual calls the banks 1 and 2, and the logical function registers 1, 2, 3, and 4. I have subtracted 1 since things make more sense that way.] Before I can describe fill mode, I need to explain something else. Some clever fellow observed that, if the destination of a rasterop were defined by an arbitrary pair of vectors, the `rasterop' could draw solid-color lines in arbitrary directions, or rotate text, or accomplish all manner of uninteresting things. So, while sources 1 and 2 must be rectangular, the destination is described by a `fast vector' and a `slow vector'. Bits are read and written along the fast vector until it runs out, then the adder steps along the slow vector. If the fast vector points along the X axis, and the slow vector along the Y axis, we get a normal rectangular rasterop. 
It also goes much faster: when the fast vector has no Y component, the adder does its thing 16 bits at a time. (The slow vector can have an X component; this does not hobble the adder.)

These vectors are defined with origin-x, origin-y and delta-x, delta-y pairs so as to make it convenient for the adder to use Bresenham's Algorithm to paint the pixels. This (B's A) can result in writing some pixels twice, or in skipping some; the hole fill enable is used to fix up the latter, and the former only matters if the rasterop uses the destination bits for exclusive-or, or complements them. Note that holes and doubling cannot occur for normal (x/y axis aligned) rectangular rasterops. For the most part, we can ignore these phenomena. Note also that this does not write the last point along the vector (so we get a half-open interval).

Polygon filling is done by taking over the source 1 and destination slow vectors. Starting from a point (normally one held in common), the adder will draw lines along either the X or Y axis until one or both vectors run out. (Thus, the `fast' vector has dy=0 [x axis] or dx=0 [y axis] and has its dx or dy depend on the difference between the current points along each of the two vectors, where that point is scanned in the direction of that axis.) When a vector runs out, the adder says it is done, and by reloading one or both vectors and doing the polygon fill again, one can finish or continue the polygon. Really, this is a fill-from-line-to-line operation, where the filling is done by drawing horizontal (X mode) or vertical (Y mode) lines. Optionally, the source 2 vector can be replaced with a horizontal or vertical line; this is the `baseline fill' mode. (Why it exists at all, when one of the two edge vectors can be horizontal or vertical anyway, is beyond me.)

Polygon fill can suffer from doubling, but not from holes. Polygon fill does write the last point: the lines it draws are over the *closed* interval that includes the two edge points.
All rasterops and polygon fills use Bresenham's error-accumulation technique to define which points will be plotted. Two `error adjustment' registers in the adder allow changing the initial error value for the fast and slow destination vectors (only occasionally useful) or for the polygon lines. The latter allows shifting the polygon edges by half a pixel, which *is* often useful.

Scrolling, and the index enable bits, are another clever hack. Someone noted that since the display has to sweep across the screen horizontally (as a `fast' vector) and vertically (as a `slow' one) anyway, it should be possible to read bits from the screen offset from where they would normally be displayed, and to copy them to their `correct' position at the same time. The adder contains a set of scroll registers for controlling this action. The scrolling area is a rectangle somewhere on the screen (off-screen memory cannot be scrolled this way since it is not displayed). Bits within that rectangle are read at some offset. The offset can be any positive value in the Y direction, but cannot be more than +15/-16 in the X direction. Bits beyond the offset are replaced with a `scroll fill' value from the viper's FILL register. A negative Y offset would cause duplication, so negative offsets are not allowed; instead, another bit `everts' the region, so that everything *not* in the scrolling region moves upward, and the video-memory Y-offset register is adjusted when the display frame is all displayed.

The index enable bits simply tell the adder that, if its operation reaches into the area that is scrolling, it should add the new or old index values to the x and y coordinates of those points, to compensate for the fact that the bits are about to show up elsewhere.

Alas, the scroll hardware will only do vertical scrolls on a four-bit boundary, so most of the time we cannot use it.
When we can, it seems like too much trouble anyway, as various operations must be done at the start of a video frame, which appears to require instantaneous response to a framing interrupt. X11 does not use the scrolling hardware.

Just when you thought you were done with rasterops...: The source 1 raster can also be scaled up or down during a rasterop (but not a polygon fill). A 13-bit binary fraction is available for up- or down-scaling. We have no particular use for it and never touch it.

Of course, there has to be a way to set the viper registers and the two chip select registers. This is done with a `register load' command. There are three kinds of register load (write) operations: external, viper, and `Z-axis viper' loads. External loads are used to set the chip selects; viper loads are used to set viper registers. Each viper load sets that register in all the selected vipers, so to load just one viper's foreground color register, for instance, we have to disable all the others, do the load, and then reenable them. This rapidly gets annoying, so there is the third kind of load. A Z-axis load writes one (1) bit to each of the currently-selected vipers, by writing 16 bits and having each viper pick up the one corresponding to its plane number. The viper then makes 16 copies of that bit and shoves it into the appropriate register. Only the foreground and background color and the fill and source registers can be loaded this way (but those are the ones needing all-1 or all-0 values most often). These Z-axis loads also specify a `Z block', which must be 0 for the VCB02. It appears to be intended for 24-bit color displays, which appear never to have got off the ground. That appears to be a good thing.

All of this lets us move bits around on the screen, but not get them there in the first place. Fortunately, the adder also supports CPU-to-bitmap and bitmap-to-CPU (`processor') transfers, and in two modes.
PTB and BTP transfers can be done in `Z-axis' mode, where the 8 vipers get or put one bit at a time from each screen position. The 8 bits are assembled to (or disassembled from) a byte, which shows up as the low byte on the I/D bus. Thus we can read or write the current color at any pixel location. The other mode, `X-mode', lets us read from one viper (which must be set up beforehand with the chip select register) or write to one or more vipers (but only if they all get the same value for each pixel---this can only write all 1s or all 0s; Z-axis transfers are easier, so it will usually be one viper). Both of these are actually implemented as a form of rasterop; most of the same features apply, except that PTB rasterops do only an r/m/w cycle regardless of which csr register is used (see the pseudo-code below). PTB and BTP transfers can be assisted by the DMA gate array (more about this below). Finally, the adder also does all the timing and sync generation for the QDSS display. It is explained a bit in the VCB02 manual, but is irrelevant for our purposes, and need only be set up once, thence to be correct forever (unless some goon fiddles with the knobs). The MicroVAX hardware takes care of it for the console display; the driver does it once for other displays. Take heart, for we are done with the adder and viper chips. All we have left are DMA and template RAM, and the DUART, video RAM CSR, and color maps. Of these, only the DMA gate array is fancy. The DGA acts as the interface between the rest of the QDSS and the Q-bus, so it has interrupt enable registers that affect everyone. It also takes care of displaying a cursor. The cursor is simply a 16x16 pixel object that either obscures the bits underneath it, or allows them to show through; for each obscured bit, the cursor is either on or off. The first 16 words (`A data') are the enables (obscures), and the second 16 (`B data') the bits to show where enabled. 
The bits for the cursor appear in the last 32 words of the `template RAM'.

This `template RAM' is an 8 Kword (16 KB) chunk of memory on the QDSS. The first 64 words are used as a DMA FIFO. When it is not busy showing the cursor, the DMA gate array can be in one of three modes: idle; doing PTB or BTP DMA; or processing `display list' commands. The FIFO must always be empty before changing modes, but otherwise (when doing another operation like the last one), it suffices to wait only for the DMA byte count to reach zero.

PTB and BTP DMA are straightforward: simply set up the adder to do the appropriate PTB or BTP, then ask the DGA to do it. If the DMA is a Z-axis transfer, it can (but need not) be done with `byte packing' mode, to make each 16-bit Qbus transaction carry two bytes to or from the bitmap. The hardware appears to be able to transfer an odd number of bytes even with packing enabled.

Display list mode has nothing to do with real display lists; forget whatever you may know about them. The QDSS uses it instead to mean running a bunch of microcoded commands. The commands are loaded into the FIFO before being run; a special command (JMPT) saves the current FIFO-execution address (if running from the FIFO) and loads a new address, which must be somewhere in template RAM. It continues to run from that location until it gets another JMPT. A JMPT that jumps to location 0 (actually, anywhere in words 0..63) fetches the saved address and resumes. JMPT is thus a subroutine call instruction in the FIFO and a branch or a return in template RAM. Another special command (PTB n) tells the DGA to treat the next n words as data for the adder's IDD register. (Byte unpacking is not available here.) Otherwise, unless bit 15 is set, the command is treated as data to be stuffed into the adder's ADCT register (thus indirecting into some other adder register, since bit 15 is off). If bit 15 is set, bits 14, 13, and 12 have special meanings.
Bit 14 suppresses writing to the ADCT register. Unless it is on, bits 11..0 are sent to ADCT (along with bit 15, thus setting ADCT itself; my guess is that all 16 bits are sent, and the adder ignores the extras). Bit 13 makes the DGA read and execute one word from the FIFO (even if it is already running from the FIFO, though it is then pointless). Bit 12 forces the next `execute' cycle to treat the whole word as data, to be stuffed into ADCT. Typically 14 and 13 would be set together, lest the `fetch from the FIFO' command be written to ADCT. (But this could be useful for, e.g., `write the next argument to the foo register'.)

The template RAM is thus used to hold `macros' for oft-repeated operations. These can end in infinite JMPT-loops, provided the loop reads the FIFO, as the loop will stop when the DMA byte count runs out, and the next operation will start from the FIFO.

The DUART is a perfectly ordinary DUART, probably some flavour of Intel or Signetics part. The color map (what do you mean, me inconsistent? the QDSS manual says it is a color map, not a colour map) is also perfectly ordinary, except that instead of a red, green, and blue value for each position, it has a red table, a green table, and a blue table, each 256 words long. Both the DUART and the color map are entirely write-only.

Here is the overall rasterop algorithm again, with all the nonsense compressed out. `which==3' is the r/m/w cycle. (Refer to the expanded version for details.)

    short ctl[2][4], src, m1, m2, lf[4], fg, bg;

    do_rop_cycle(which, bank, is_ptb, mem, lfnum, edgemask)
        int which, bank, is_ptb;
        short *mem;
        int lfnum;
        short edgemask;
    {
        short c = ctl[bank][which-1], id, md, s, v, mask, f;

        md = *mem;
        if (c & SEND)
            *ID_BUS = md;
        id = *ID_BUS;
        <<simultaneous>>
        RD_REGS(c) = id; LD_REGS(c) = md;   <<set null|src|m1m2|m2>>
        <<should have c&SS iff which!=3>>;
        if (which == 3 || is_ptb) {
            /* do an r/m/w cycle */
            f = lf[lfnum];
            mask = (f&LF_NOTM1? ~m1:m1) & (f&LF_NOTM2? ~m2:m2) & edgemask;
            v = apply_lf(LF_MASK(f), md, src);
            *mem = (((v & fg) | (~v & bg)) & mask) | (md & ~mask);
        }
    }

--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 405 2750)
Domain: chris@cs.umd.edu Path: uunet!mimsy!chris