CLAYTON@XRT.UPENN.EDU ("Clayton, Paul D.") (04/10/88)
Information From TSO Financial - The Saga Is Coming To A Close...
Chapter 34 - April 9, 1988

It has been a long time since my last saga, and a new episode has provided the subject matter for this article. I have been watching the messages on INFO-VAX with special interest in the experiences other sites have had so far with VMS 4.6/.7. The bottom line is that there were few messages, and they dealt with topics that did not affect me, so I felt comfortable upgrading to VMS 4.6/.7.

So on a brisk night, the upgrade was done on a VAX 11/750 which I have configured as a printer driver. After some problems getting the system to boot from a brand new, out-of-the-box RM80 HDA assembly, all appeared to be fine and there were no problems with the resulting system.

By the way, did you know that a special INTERNAL drive diagnostic MUST be run on the RM80 HDA assembly after it is installed if you are to boot from it? If it is not, the drive appears and behaves PERFECTLY, with ONE minor exception: you CAN NOT boot from it. Oh well.

Back to VMS 4.6/.7. The 11/750 worked well for two weeks, after which we felt that the upgrade could be done on the first of our two clusters. This cluster consists of six nodes ranging from an 8200 to 8700s and is what I call an AC/DC cluster: a hetero/homogeneous cluster. Three nodes do one thing in combination and three do another. Sometimes.

We did the upgrade on a Sunday and it finally completed, after several major and minor problems, at noon on Monday. The major problem was a mass cluster suicide, with the informative bugcheck message 'CLUEXIT'. This stands for 'Cluster Exit', which is EXACTLY what occurred. Sigh... Anyway, the systems came back up, the bank opened for business (late), and the user community proceeded to 'pounce' on the system to make up for lost time. Then the fun started.

Our systems and network are large, by some standards.
On most of the systems, we have 60+ printer queues defined in various ways. On the cluster, there is one generic queue and two execution queues PER printer, which allows printing to continue if a system is down. This adds up to 60+ generic queues and 130+ execution queues. Then there are the 30+ BATCH queues that are also on the systems. Now multiply this by six or seven separate systems and the numbers are LARGE. In case you had not guessed, queues are a way of life here.

The network is also a big thing for us and consists of T1s, Ethernet, bridges and 158+ terminal servers ranging from DS100s to DS500s. Well, the fastest way to waste a day is to be at work when the T1s go south. From that point on, you are dedicated to STOPping/STARTing queues with the /ON qualifier, in some cases deleting jobs and logging terminal server ports out, and in some worst-case events, giving a printer its own process (LATSYM image) to run. This is done on EACH system. Needless to say, this is time consuming.

Why not use .COM files to automate this procedure, I hear you asking? We have, but the problems defeat what .COM files can do. The list of problems that we have had is the basis for my suggestion that if you have lots of LAT printers, maybe 4.6/.7 is NOT for you JUST YET. The following is a partial list of the problems we have had on the systems:

1. Print jobs aborting with a 'Structure Level' error returned from the print symbiont.
2. Print jobs going to a RANDOM printer, with total disregard for the '/ON' qualifier.
3. Multiple jobs printing AT THE SAME TIME on the same printer. This results in one line from one file, with the next line from some other file. It is very interesting reading memos printed on these printers.
4. Inability to delete jobs from a queue once it is in an 'ABORTING' state and the printer is STALLED or PAUSED. A REBOOT is the fix here.
5. Failure of the JBCSYSQUE system to correctly process a START/STOP command that has qualifiers on it. There appears to be an internal TIME delay, and the result changes with processor load, number of queues and other such factors.
6. Queues STOPping themselves for NO apparent reason, and at times getting into PAUSEd states for no apparent reason.
7. Apparent problems with high-speed printers on the LAT. We have about a dozen 600 LPM printers on the network, and their throughput appears to cause problems for the LATSYM image. A dedicated LATSYM process for each of these is our only hope for this one.
8. Print symbionts going into RWAST state and never getting out. A reboot is the fix here also.

The bottom line, after spending hours on the phone with local field service and Colorado Tech Support, is that problems ARE present in the software. We have upgraded the LTDRIVER.EXE image twice and the LATSYM.EXE image three times. Problems continue to occur, so we will be upgrading again.

One option that has most recently been offered is to use the latest upgraded LTDRIVER image with the LATSYM 1.1 (VMS 4.5) image. This results in a single-threaded LATSYM process per printer, which is the VMS 4.5 behavior. Not something I want for long. From my vantage point, having 60+ processes hanging around in the system for the printers was one MAJOR reason for GOING to 4.6/.7. I do not want to debate whether the HIB state or the swapped-out state eliminates this concern.

So, should you be thinking UPGRADE and have LOTS of printers, you MAY want to hold off and start calling your local office and Colorado to get the latest news AND images. They should be able to pull them off the internal DEC network. The latest versions of the images we have are:

  LTDRIVER.EXE   X7N-17
  LATSYM.EXE     2.4-001

Hope this helps, so you are not in the same shape we are.

Paul D. Clayton - Manager Of Systems
Advanta Corp. - Horsham, Pa. USA
Formerly - TSO Financial
Address - CLAYTON%XRT@CIS.UPENN.EDU
CHRIS@YMIR.BITNET (Chris Yoder) (04/23/88)
[RAID, Rid Any Insects Directly]

Regarding Paul Clayton's problems with the print queues and VMS 4.6/4.7: I had many of the same problems with queues getting stuck and staying stuck, with SYMBIONT_XXXX processes hanging around in RWAST state essentially forever (thus keeping those printers permanently allocated), and with general havoc in the queues. The solution turned out to be increasing the SYSGEN parameter MAXBUF to over 5000. Since doing so (and then rebooting...), I haven't experienced any problems with my queues. Several other sites have also used this solution successfully.

I suppose I should have posted this earlier, but I heard it through the grapevine and naturally assumed that I had just missed it on Info-Vax. (We BITNETers never seem to get our own postings back, and we miss about 1/4 of the messages...)

--
Chris Yoder
Harvey Mudd College
Bitnet: Chris@Ymir.Bitnet