[comp.os.vms] Problems With VMS 4.6/.7 Relative To LAT Printers...

CLAYTON@XRT.UPENN.EDU ("Clayton, Paul D.") (04/10/88)

Information From TSO Financial - The Saga Is Coming To A Close...
Chapter 34 - April 9, 1988


It has been a lengthy period since my last saga, and a new episode has 
presented the subject matter for this article.

I have been watching the messages over INFO-VAX with special interest in 
stories that other sites have had to date with VMS 4.6/.7. The bottom line 
is that there were few messages, and they dealt with topics that did not 
impact me, so I felt comfortable upgrading to VMS 4.6/.7.

So on a brisk night, the upgrade was done to a VAX 11/750 which I have 
configured as a printer driver. After some problems trying to get the system 
to boot from a brand new, out of the box RM80 HDA assembly, all appeared to be 
fine and there was no problem with the resulting system. By the way, did you 
know that a special INTERNAL drive diagnostic MUST be run on the RM80 HDA 
assembly after it is installed if you are to boot from it? If it is not, the 
drive appears and behaves PERFECTLY, with ONE minor exception. You CAN NOT 
boot from it. Oh well.

Back to VMS 4.6/.7. The 11/750 worked well for two weeks after which we felt 
that the upgrade could be done on the first of the two clusters we have. The 
first cluster consists of six nodes ranging from an 8200 to 8700's and is 
what I call an AC/DC cluster. It's a hetero/homogeneous cluster. Three nodes 
do one thing in combination and three do another. Sometimes. We did the 
upgrade on a Sunday and it finally completed, after several major and minor 
problems, at noon on Monday. The major problem was a mass cluster suicide, 
with the informative bugcheck message 'CLUEXIT'. This stands for 'Cluster Exit'
which is EXACTLY what occurred. Sigh...

Anyway, the systems came back up, the bank opened for business, late, and the 
user community proceeded to 'pounce' on the system to make up for lost time. 
Then the fun started. Our systems and network are large, by some standards. On 
most of the systems, we have 60+ printer queues defined in various ways. If 
you are on the cluster, there is one generic queue and two execution queues 
PER printer. This allows for continued printing in the event a system is not 
up. This adds up to 60+ generic queues and 130+ execution queues. Then there 
are the 30+ BATCH queues that are also on the systems. Now multiply this by six 
or seven separate systems and the numbers are LARGE. If you had not guessed, 
queues are a way of life here. The network we have is also a big thing for us 
and consists of T1's, Ethernet, Bridges and 158+ terminal servers ranging from 
DS100's to DS500's.
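
As a rough sketch of the arithmetic above, the following Python snippet adds 
up the queue counts. It takes the post's "60+", "130+" and "30+" figures as 
lower bounds and assumes, as the post does, that the same set of queues is 
defined on each of six or seven systems. The function names are illustrative 
only.

```python
# Lower-bound figures quoted in the post ("60+", "130+", "30+").
GENERIC_QUEUES = 60      # one generic queue per printer
EXECUTION_QUEUES = 130   # two execution queues per printer
BATCH_QUEUES = 30


def queues_per_system(generic=GENERIC_QUEUES,
                      execution=EXECUTION_QUEUES,
                      batch=BATCH_QUEUES):
    """Lower bound on the number of queues defined on one system."""
    return generic + execution + batch


def total_queues(systems):
    """Lower bound across `systems` systems, assuming each one carries
    the full set of queues (an assumption taken from the post)."""
    return systems * queues_per_system()


if __name__ == "__main__":
    for systems in (6, 7):
        print(systems, "systems: at least", total_queues(systems), "queues")
```

With these lower bounds the total comes to well over a thousand queue 
definitions to look after.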

Well, the fastest way to waste a day is to be at work when the T1's go south. 
From that point on, you are dedicated to STOPping/STARTing queues with the 
/ON qualifier and in some cases deleting jobs, logging terminal server ports 
out and in some worst case events, giving a printer its own processor 
(LATSYM) image to run. This is done on EACH system. Needless to say this is 
time consuming.
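
A minimal sketch of the per-queue recovery step described above, written in 
Python only to generate the DCL text that would have to be entered on each 
system. The queue names and LTA port names are hypothetical, and the choice 
of STOP/QUEUE/RESET is an assumption; the post says only that queues were 
stopped and then started again with the /ON qualifier.

```python
def restart_commands(queue, device):
    """Return the DCL lines that stop one queue and start it again
    on the given device. STOP/QUEUE/RESET is an assumed choice."""
    return [
        f"$ STOP/QUEUE/RESET {queue}",
        f"$ START/QUEUE/ON={device} {queue}",
    ]


def recovery_script(printers):
    """Build the full command list for a mapping of queue -> device."""
    lines = []
    for queue, device in printers.items():
        lines.extend(restart_commands(queue, device))
    return lines


if __name__ == "__main__":
    # Hypothetical execution queues and the LAT ports they print on.
    printers = {
        "PRT01_EXEC_A": "LTA101:",
        "PRT02_EXEC_A": "LTA102:",
    }
    print("\n".join(recovery_script(printers)))
```

Even with the commands generated, somebody still has to run them on EACH 
system and then deal by hand with the jobs and server ports that stay stuck.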

Why not use .COM files to automate this procedure, I hear you asking!! We 
have, and the problems below defeat anything a .COM file can do. The list of 
problems that we have or had is the basis for my suggestion that if you have 
lots of LAT printers, maybe 4.6/.7 is NOT for you JUST YET. The following is 
a partial list of the problems we have had on the systems.

1. Aborting of print jobs with a 'Structure Level' error returned from the 
print symbiont.

2. Print jobs going to a RANDOM printer, with total disregard of the '/ON' 
qualifier.

3. Multiple jobs printing AT THE SAME TIME on the same printer. This results 
in one line of one file, with the next from some other file. It is very 
interesting reading memos printed on these printers. 

4. Inability to delete jobs from a queue once a job is in an 'ABORTING' state 
and the printer is STALLED or PAUSED. A REBOOT is the fix here.

5. Failure of the JBCSYSQUE system to correctly process a START/STOP command 
which has qualifiers on it. There appears to be an internal TIME delay and 
the result changes due to processor load, number of queues and other such 
factors.

6. Queues STOPping themselves for NO apparent reason. Also getting into 
PAUSEd states for no apparent reason at times.

7. Apparent problems with high speed printers on the LAT. We have about a 
dozen 600 LPM printers out there on the network and the throughput appears 
to cause problems for the LATSYM image. Single processors for these are our 
only hope for this one.

8. Print symbionts going into RWAST state and never getting out. Reboot is the 
fix here also.

The bottom line, after spending hours on the phone with local field service 
and Colorado Tech Support, is that problems ARE present in the software.

We have upgraded the LTDRIVER.EXE image twice and the LATSYM.EXE image three 
times. Problems continue to occur, so we will be upgrading again. One option 
that has most recently been offered is to use the latest upgraded LTDRIVER 
image with the LATSYM 1.1 (VMS 4.5) image. This results in single threaded, 
one processor (LATSYM) image per printer, which is the behavior of VMS 4.5. 
Not something I want for long. Having 60+ processes hanging there in the system 
for the printers is one MAJOR factor in GOING to 4.6/.7 from my vantage 
point. I do not want to argue the HIB state, or the swapped out state as 
eliminating this concern.

So, should you be thinking UPGRADE, and have LOTS of printers, you MAY want 
to hold off and start calling your local office and Colorado to get the 
latest news AND images. They should be able to pull them off the internal DEC 
network. The latest versions of the images we have are listed below.

LTDRIVER.EXE 		X7N-17
LATSYM.EXE		2.4-001

Hope this helps, so you are not in the same shape we are.

Paul D. Clayton - Manager Of Systems
Advanta Corp. - Horsham, Pa. USA
Formerly - TSO Financial
Address - CLAYTON%XRT@CIS.UPENN.EDU

CHRIS@YMIR.BITNET (Chris Yoder) (04/23/88)

[RAID, Rid Any Insects Directly]

     Regarding Paul Clayton's problems with the print queues and VMS 4.6/4.7: I
had many of the same problems, with queues getting stuck and staying stuck,
SYMBIONT_XXXX processes hanging around in RWAST states essentially forever
(thus keeping those printers permanently allocated), and general havoc with the
queues.  The solution to this problem turned out to be increasing the sysgen
parameter MAXBUF to over 5000.  Since doing so (and then rebooting...), I
haven't experienced any problems with my queues.  Several other sites have also
used this solution with success.

     I suppose that I should have posted this earlier, but I heard it through
the grapevine and naturally assumed that I had just missed it on Info-Vax.  (We
BITNETers seem to never get any of our postings back and miss about 1/4 of the
messages...)

-- Chris Yoder
     Harvey Mudd College        Bitnet ----- Chris@Ymir.Bitnet
-------