[comp.sys.hp] buggy hp9000/800 3.01 code?

gentry@kcdev.UUCP (Art Gentry) (03/30/89)

A few words to the wise on possible nasty bugs in the 3.01 release of 
HP9000/850 UNIX.

1) terminfo/curses? screens are reversed, i.e., what should be inverse video
   is normal and vice versa.

2) IMAGE/9000 - new error code 8200 - suddenly started appearing about every
   20-25 transactions.  A 'gotcha' for programmers, normally we would only
   capture and print word(1) of the status array for error info.  With this
   new code (8200), you need to print all 10 words, as they contain additional
   info as to what the error is.  This is another of those infamous "call HP
   for assistance" errors.  They are supposed to be able to take the additional
   numbers and make some sense out of them.  By the way, this is a DBCore
   error.

3) several new programs - PATH conflicts - HP has released their menu driven
   system management package.  Nice, BUT, they stuck several new programs 
   into /usr/bin which can conflict with program/script names you may already
   have in directories defined after /usr/bin in $PATH.  I had one that drove
   me buggy trying to figure out why a previously perfectly acting program
   was suddenly giving me "permission denied".  Turned out I was grabbing one
   of their new programs, not mine.  Moral of this story, put your directories
   ahead of the system default directories in $PATH or check VERY carefully
   the program list in the update manuals (which by the way are a TREMENDOUS
   improvement over previous issues).
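The name clash in item 3 comes down to $PATH being searched left to right, with the first match winning.  A minimal sketch of that lookup follows; first_in_path is a hypothetical helper written for illustration, not anything shipped with HP-UX, and it only approximates what a shell does.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Walk a colon-separated search path from left to right and report the
 * first directory that holds an executable regular file called name.
 * Returns 1 and stores the full path in out, or 0 if nothing matches.
 * (Hypothetical helper: a sketch of the lookup order only.)
 */
int first_in_path(const char *path, const char *name, char *out, size_t outsz)
{
    struct stat st;

    assert(path != NULL && name != NULL && out != NULL);
    while (*path != '\0') {
        size_t len = strcspn(path, ":");
        int n;

        if (len == 0)   /* an empty component means the current directory */
            n = snprintf(out, outsz, "./%s", name);
        else
            n = snprintf(out, outsz, "%.*s/%s", (int)len, path, name);

        /* Skip candidates that did not fit in the caller's buffer. */
        if (n >= 0 && (size_t)n < outsz &&
            stat(out, &st) == 0 && S_ISREG(st.st_mode) &&
            access(out, X_OK) == 0)
            return 1;

        path += len;
        if (*path == ':')
            path++;
    }
    return 0;
}
```

Calling it with the value of $PATH and the name of one of your own scripts shows which copy would actually run; if the answer is under /usr/bin, that script is being shadowed.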

In our case, HP could not readily figure the cause of items 1 & 2 so we
ended up reinstalling (read that - restored backups) for 2.1 and all is
happy again.

On the subject of restoring, learned another valuable lesson.  If you restore
root and/or /usr while in single user mode, do NOT do an INIT 2 to restart
the system!  During the restore, you will have overlaid several "in use"
system files and the INIT 2 will drop you back to 'login'.  One of the files
overlaid is utmp and login will barf as you no longer have a valid entry in
there and WILL NOT LET YOU LOG IN!!!  Always do a 'shutdown -r' to bring the
system back up after a restoral. That will get everything back into sync.

Like, later kids......
Art
opinions are my own, etc......

scf@statware.UUCP (Steve Fullerton) (03/31/89)

In article <659@kcdev.UUCP> gentry@kcdev.UUCP (Art Gentry) writes:
>A few words to the wise on possible nasty bugs in the 3.01 release of 
>HP9000/850 UNIX.

I have encountered another curious problem, anomaly, bug, with the
HP-UX 3.0 release.  Our code for doing raw input that worked fine
under HP-UX 2.0 (as well as about 20 other UN*X systems) now fails
under 3.0.  At first the 2.0x executable failed (I'll explain below)
when run under 3.0.  Next, we tried linking under 3.0 and got the same
failure.  Finally, re-compiling and linking yielded the same result.

What we are doing is setting termio to raw and reading characters 1 at
a time.  It works fine until a cursor key is pressed and then the 2nd
character gets hung in the buffer until the next read...very strange.
We are doing blocked reads so what happens is:

  1) Issue the read
  2) User enters a character, it is read and all is fine, go to 1)
  3) User enters a cursor key so the read returns an ESC, issue another read
  4) This second read blocks waiting for input.  When a key is
     pressed, the second character from the cursor key is what
     is read.
  5) Subsequent reads remain out-of-sync; i.e., when a character is
     entered, the previous entry is what is read.

The code we are using for raw input was extracted directly from Marc Rochkind's
"Advanced UNIX Programming".  Specifically, pages 88-89.  He suggests
termio.c_cc[VMIN] = 5 and termio.c_cc[VTIME] = 2.  I wasn't sure whether
or not the bug was related to VMIN and VTIME or else ICANON.  I was
finally able to get it to work by setting the VMIN value to 1 and the
VTIME value to 0.

I am confident in the code---it has been ported to all major UN*X systems
and even worked fine for HP-UX 2.0x as well as on HP9000/3xx 6.x systems.
So something is really wrong and programs doing screen handling might
have a big surprise when moving to 3.0.

-- 
Steve Fullerton                        Statware, Inc.
scf%statware.uucp@cs.orst.edu          260 SW Madison Ave, Suite 109
orstcs!statware!scf                    Corvallis, OR  97333
                                       503/753-5382

paul@prcrs.UUCP (Paul Hite) (03/31/89)

In article <659@kcdev.UUCP>, gentry@kcdev.UUCP (Art Gentry) writes:
> A few words to the wise on possible nasty bugs in the 3.01 release of 
> HP9000/850 UNIX.
> 
> 1) terminfo/curses? screens are reversed, i.e., what should be inverse video
>    is normal and vice versa.

We are running 3.01 on our 850.  We have not experienced this problem with
curses.  We "ported" a curses program to the 850 and it worked fine.  The
program has inverse-video regions on the screen.  The same code works on
several different versions of hp-ux.  Can you provide more info on your
problem?   Could it be your terminfo entry?  Have you tried it on more than
one type of terminal?  Can you post a small sample program that illustrates
the problem?

Paul Hite   PRC Realty Systems  McLean,Va   uunet!prcrs!paul    (703) 556-2243
                      DOS is a four letter word!

jp@otter.hpl.hp.com (Julian Perry) (04/01/89)

>The code we are using for raw input was extracted directly from Marc Rochkind's
>"Advanced UNIX Programming".  Specifically, pages 88-89.  He suggests
>termio.c_cc[VMIN] = 5 and termio.c_cc[VTIME] = 2.  I wasn't sure whether
>or not the bug was related to VMIN and VTIME or else ICANON.  I was
>finally able to get it to work by setting the VMIN value to 1 and the
>VTIME value to 0.

The following code fragment is the correct way to set the terminal modes to
read one character at a time (blocking for each one):

	ioctl(0,TCGETA,&modes);		/* Read the current settings */
	modes.c_lflag &= ~ICANON;	/* Off with line mode */
	modes.c_cc[VMIN] = 1;		/* Ask for 1 at a time */
	modes.c_cc[VTIME] = 0;		/* Ignore the timer */
	ioctl(0,TCSETAW,&modes);	/* Use the new settings */

I think that many implementations of VMIN and VTIME are to ignore the timer
completely and effectively VMIN is always 1.  HP-UX used to be like this
a long time ago.  It's very MUX dependent.
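For comparison, here is a sketch of the same settings with error checking and a way to put the terminal back afterwards.  It swaps in the POSIX termios calls (tcgetattr/tcsetattr with TCSADRAIN) in place of the TCGETA/TCSETAW ioctls, on the assumption that they are available; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <termios.h>
#include <unistd.h>

/* Edit *t so that a blocking read() returns as soon as one byte arrives. */
void set_char_at_a_time(struct termios *t)
{
    assert(t != NULL);
    t->c_lflag &= ~(tcflag_t)ICANON;    /* off with line mode */
    t->c_cc[VMIN] = 1;                  /* one byte satisfies the read */
    t->c_cc[VTIME] = 0;                 /* no timer */
}

/*
 * Switch fd to one-byte-at-a-time input, saving the previous settings
 * in *saved.  Returns 0 on success, -1 on error (errno is set).
 */
int enter_char_mode(int fd, struct termios *saved)
{
    struct termios t;

    assert(saved != NULL);
    if (tcgetattr(fd, saved) == -1)
        return -1;                      /* not a terminal, or bad fd */
    t = *saved;
    set_char_at_a_time(&t);
    /* TCSADRAIN waits for pending output first, like TCSETAW. */
    return tcsetattr(fd, TCSADRAIN, &t);
}

/* Restore the settings saved by enter_char_mode(). */
int leave_char_mode(int fd, const struct termios *saved)
{
    assert(saved != NULL);
    return tcsetattr(fd, TCSADRAIN, saved);
}
```

Keeping the flag edits in a function that only touches the struct makes it possible to check them without a terminal attached.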

Jules
-----
E-MAIL:		jp@hplb.hpl.hp.com || jp@hplb.hp.co.uk || jp@hplb.uucp
IN-REAL-LIFE:	Julian Perry
ORGANISATION:	Hewlett-Packard Laboratories, Bristol
ADDRESS:	Filton Road, Stoke Gifford, Bristol, England, BS12 6QZ
TELEPHONE:	+44 272 799910 x 24019

guy@auspex.auspex.com (Guy Harris) (04/02/89)

>I think that many implementations of VMIN and VTIME are to ignore the timer
>completely and effectively VMIN is always 1.

I should hope implementations like that are dying off, especially since
the SVID says that VMIN and VTIME should work the way they're documented
in the "termio" man page....

>HP-UX used to be like this a long time ago.  It's very MUX dependent.

You mean "vendor dependent".  As distributed by AT&T, VMIN and VTIME are
handled by the line discipline, not by the serial port driver, so it
should, at least in principle, work the same for all terminal muxes.  I
think the intent is to do so in the S5R4 streams-based tty mechanism as
well, so that it'll be handled by a streams module, not the driver (the
SunOS 4.0 streams-based driver does it in a streams module and the
stream head). 

markf@hpupnja.HP.COM (Mark Fresolone) (04/02/89)

>Steve Fullerton
>>A few words to the wise on possible nasty bugs in the 3.01 release of 
>>HP9000/850 UNIX.
>I have encountered another curious problem, anomaly, bug, with the
>HP-UX 3.0 release. ...
>What we are doing is setting termio to raw and reading characters 1 at
>a time. ...

As I understand it, the change in non-canonical processing for the VMIN>0,
VTIME>0 case was made quite intentionally, to better conform with the specs.
Apparently, VMIN was never intended to be used with reads of less than VMIN
characters.

>  1) Issue the read
>  2) User enters a character, it is read and all is fine, go to 1)

The scenario in steps 1 and 2 is this: at the first read, the initial timeout
is infinite.  When the first character is received, the read is still not
satisfied (neither VMIN=5 nor VTIME, which has not been started, is satisfied.
Of course, one can argue that the read() is satisfied if not the line
discipline...  You see the ambiguity.)  Anyhow, after this character is received,
the intercharacter timer is started (2 x 0.10 seconds, in your case).  For
normal keystrokes, this timer expires before the next character is received by
the LD, and for THIS reason, the read(,,1) returns the one character.

>  3) User enters a cursor key so the read returns an ESC, issue another read
>  4) This second read blocks waiting for input.  When a key is
>     pressed, the second character from the cursor key is what
>     is read.
>  5) Subsequent reads remain out-of-sync; i.e., when a character is
>     entered, the previous entry is what is read.

Perhaps you can guess what happens when a cursor control key is pressed.  Both
characters are received by the LD, but the read is not satisfied yet as far as
VMIN/VTIME are concerned.  A fifth of a second later, the LD times out, and
proceeds to give the user program just what it asked for - one of the two
characters.  For the next read, the initial timeout begins as infinite
(arguable, perhaps, since one might expect it to treat the queued character as
"just received", and kick off the intercharacter timer).  The LD waits for the
"first character received".  When this (second) character is received, the
timer starts, causing the queued character to be copied to user space when it
expires.
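The sequence described above can be reproduced with a small simulation.  This is only a model of the behaviour as this post describes it, not HP-UX code: struct key_event and simulate_reads are hypothetical names, times are in tenths of a second, and the application is assumed to reissue a 1-byte read the moment the previous one returns.

```c
#include <assert.h>
#include <stddef.h>

/* One byte arriving at the line discipline; time is in tenths of a second. */
struct key_event {
    int time;
    char ch;
};

/*
 * Model back-to-back read(fd, &c, 1) calls with MIN = vmin, TIME = vtime:
 *   - a read is satisfied once vmin bytes are queued, or once the
 *     intercharacter timer expires;
 *   - the timer starts only when a byte arrives during the read, so a
 *     byte that is already queued does not start it.
 * ev[] must be sorted by time.  out[i] and when[i] receive the byte and
 * completion time of the i-th read.  Returns the number of reads that
 * complete; a read that would block forever is not counted.
 */
size_t simulate_reads(const struct key_event *ev, size_t nev,
                      int vmin, int vtime,
                      char *out, int *when, size_t max_reads)
{
    size_t head = 0;    /* oldest byte queued but not yet read */
    size_t next = 0;    /* next byte that has not arrived yet */
    size_t nreads = 0;
    int now = 0;

    assert(ev != NULL && out != NULL && when != NULL);
    assert(vmin > 0 && vtime >= 0);

    while (nreads < max_reads) {
        if ((int)(next - head) < vmin) {
            if (next == nev)
                break;                  /* nothing more arrives: blocked */
            /* The initial wait has no timeout: block until a byte arrives. */
            if (ev[next].time > now)
                now = ev[next].time;
            next++;
            /* Each arrival inside the window restarts the timer. */
            while ((int)(next - head) < vmin && next < nev &&
                   (vtime == 0 || ev[next].time < now + vtime)) {
                if (ev[next].time > now)
                    now = ev[next].time;
                next++;
            }
            if ((int)(next - head) < vmin) {
                if (vtime == 0)
                    break;              /* no timer and too few bytes */
                now += vtime;           /* timer expired before MIN bytes */
            }
        }
        out[nreads] = ev[head++].ch;    /* a 1-byte read takes the oldest */
        when[nreads] = now;
        nreads++;
    }
    return nreads;
}
```

Feeding it 'a' at time 0, ESC and 'A' together at time 10, and 'b' at time 30 with MIN=5 and TIME=2 gives three completed reads: 'a' at 2, ESC at 12, and 'A' only at 32, after 'b' has been typed, with 'b' itself left in the queue.  With MIN=1 and TIME=0 the same input yields four reads, each returning as its byte arrives.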

>... He suggests termio.c_cc[VMIN] = 5 and termio.c_cc[VTIME] = 2. ...
>I was finally able to get it to work by setting the VMIN value to 1 and the
>VTIME value to 0.

This is the most common "workaround" for this new behavior.  One might consider
whether this, or [ VMIN=0, VTIME>0 ], is not the behavior you wanted in the
first place.  [ VMIN>0, VTIME>0 ] is generally used for burst mode.  By the
way, while Mr. Rochkind does not show the cooperating read() for setraw() in
Sec. 4.5, in his use of non-canonical mode in Sec. 4.4.8, he goes so far as
to set VMIN explicitly as:

	" tbuf.c_cc[4] = sizeof(buf);  /* MIN */ "

where he later uses:

	" switch (total = read(0, buf, sizeof(buf))) { "

What I find interesting about his non-canonical examples is that in Sec. 8.7,
he sets VMIN = VTIME = 0, in a program where he wishes to time a read using
alarm() (see top paragraph on p. 224).  It appears, however, that he might
end up with quite a few "Mysterious EOF"s!  I have not tried it.
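A sketch of the burst-mode pairing mentioned above, where MIN matches the size of the buffer handed to read() so that a whole cursor-key sequence comes back from one call.  It uses the POSIX termios struct, and set_burst_mode, classify_burst and the enum are hypothetical names; how an application should treat each case is left open.

```c
#include <assert.h>
#include <stddef.h>
#include <termios.h>

#define ESC 0x1b

enum burst_kind {
    BURST_EMPTY,        /* read returned no bytes */
    BURST_PLAIN,        /* ordinary character(s) */
    BURST_ESC_ALONE,    /* ESC with nothing after it inside the timer */
    BURST_ESC_SEQ       /* ESC followed by more bytes, e.g. a cursor key */
};

/* Edit *t for burst reads: MIN = size of the read buffer, TIME = tenths. */
void set_burst_mode(struct termios *t, size_t bufsize, unsigned tenths)
{
    assert(t != NULL);
    t->c_lflag &= ~(tcflag_t)ICANON;
    /* c_cc slots are small, so clamp both values to 255. */
    t->c_cc[VMIN] = (cc_t)(bufsize > 255 ? 255 : bufsize);
    t->c_cc[VTIME] = (cc_t)(tenths > 255 ? 255 : tenths);
}

/* Classify the n bytes that one read(fd, buf, sizeof buf) returned. */
enum burst_kind classify_burst(const unsigned char *buf, size_t n)
{
    assert(buf != NULL || n == 0);
    if (n == 0)
        return BURST_EMPTY;
    if (buf[0] != ESC)
        return BURST_PLAIN;
    return n == 1 ? BURST_ESC_ALONE : BURST_ESC_SEQ;
}
```

Because the read size equals MIN, the case discussed in this thread (a read asking for fewer bytes than MIN) does not arise with this arrangement.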

>I am confident in the code---it has been ported to all major UN*X systems
>and even worked fine for HP-UX 2.0x as well as on HP9000/3xx 6.x systems.
>So something is really wrong and programs doing screen handling might
>have a big surprise when moving to 3.0.

Many machines and programs use the pre-3.01 behavior, where the read size is
less than VMIN, and I have a feeling we have not heard the last of this.

Mark Fresolone
HP Systems Engineer
These opinions are solely mine, and not necessarily those of my employer...

gentry@kcdev.UUCP (Art Gentry) (04/03/89)

In article <1370@prcrs.UUCP>, paul@prcrs.UUCP (Paul Hite) writes:
> In article <659@kcdev.UUCP>, gentry@kcdev.UUCP (Art Gentry) writes:
> > A few words to the wise on possible nasty bugs in the 3.01 release of 
> > HP9000/850 UNIX.
> > 
> > 1) terminfo/curses? screens are reversed, i.e., what should be inverse video
> >    is normal and vice versa.
> 
> We are running 3.01 on our 850.  We have not experienced this problem with
> curses.  We "ported" a curses program to the 850 and it worked fine.  The
> program has inverse-video regions on the screen.  The same code works on
> several different versions of hp-ux.  Can you provide more info on your
> problem?   Could it be your terminfo entry?  Have you tried it on more than
> one type of terminal?  Can you post a small sample program that illustrates
> the problem?
> 
Hmmmm, interesting.  We tried both the "standard" terminfo entry that came
with 3.01 and our own, which are slightly modified (for printer usage).  Both
caused the inverse video to be reversed on 3.01 and worked correctly when we
went back to 2.1.

As a follow up to the bug I reported in HPImage, we have discovered 5 chains
that appear to have been corrupted by 3.01.  On doing a hpifind to locate the
head of the chain and then doing a chained hpiget, Image returns an 8100
error code (and, yep, ya gotta print out the rest of the status array to 
find out what the problem is!)  Status(2) returns a 13137 DBCore internal
error - missing tuple.  Remedy is to unload the database, erase the database,
recreate the database and reload.  And you MUST do a SERIAL unload; with
a defaulted CHAIN unload, you'll get the same 13137 error and the unload will
abort.  Preliminary investigation suggests that 3.01 caused several chains
to lose their pointers from MASTER to DETAIL datasets.  More to follow as
we discover more.
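Since the extra status words are what HP needs to see, here is a sketch of a helper that formats all ten instead of just the first.  format_status is a hypothetical name, and declaring the status array as ten shorts is an assumption about the C binding, not something taken from the IMAGE/9000 manuals.

```c
#include <assert.h>
#include <stdio.h>

#define STATUS_WORDS 10

/*
 * Format every word of a status array as "status(1)=... status(10)=..."
 * using the 1-based numbering from the posts.  Returns 0 on success or
 * -1 if out is too small (its contents are then incomplete).
 * Assumption: the status array is ten 16-bit words.
 */
int format_status(const short status[STATUS_WORDS], char *out, size_t outsz)
{
    size_t used = 0;
    int i;

    assert(status != NULL && out != NULL && outsz > 0);
    out[0] = '\0';
    for (i = 0; i < STATUS_WORDS; i++) {
        int n = snprintf(out + used, outsz - used, "%sstatus(%d)=%d",
                         i ? " " : "", i + 1, status[i]);

        if (n < 0 || (size_t)n >= outsz - used)
            return -1;
        used += (size_t)n;
    }
    return 0;
}
```

Logging this string whenever word 1 is nonzero means the detail words are already on hand when the call to HP is made.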

Art

scf@statware.UUCP (Steve Fullerton) (04/03/89)

In article <2980004@otter.hpl.hp.com> jp@otter.hpl.hp.com (Julian Perry) writes:
>>The code we are using for raw input was extracted directly from Marc Rochkind's
>>"Advanced UNIX Programming".  Specifically, pages 88-89.  He suggests
>>termio.c_cc[VMIN] = 5 and termio.c_cc[VTIME] = 2.  I wasn't sure whether
>>or not the bug was related to VMIN and VTIME or else ICANON.  I was
>>finally able to get it to work by setting the VMIN value to 1 and the
>>VTIME value to 0.
>
>The following code fragment is the correct way to set the terminal modes to
>read one character at a time (blocking for each one):
>
>	ioctl(0,TCGETA,&modes);		/* Read the current settings */
>	modes.c_lflag &= ~ICANON;	/* Off with line mode */
>	modes.c_cc[VMIN] = 1;		/* Ask for 1 at a time */
>	modes.c_cc[VTIME] = 0;		/* Ignore the timer */
>	ioctl(0,TCSETAW,&modes);	/* Use the new settings */

This should probably read "correct for HP-UX 3.0".  It doesn't explain why
the code worked fine for HP-UX 2.0x, HP-UX 6.x (on the 300's) as well as
on the Series 500's.  The only thing that changed was the operating system
revision from 2.0x to 3.0.  The same code also works fine for every other
non-BSD UNIX system I have tried which includes Pyramid, Sequent, AT&T 6386,
Altos, Unisys 5000/7000, NCR Tower,..., the list is long.

Furthermore, Rochkind's book "Advanced UNIX Programming" was supplied by
HP with our manuals for our HP9000/825.  To quote from the book, "The idea
behind MIN and TIME is to allow a process to get characters as, or soon
after, they are typed without losing the benefits of reading several
characters with a single read system call."

>I think that many implementations of VMIN and VTIME are to ignore the timer
>completely and effectively VMIN is always 1.  HP-UX used to be like this
>a long time ago.  It's very MUX dependent.

The idea of having VMIN and VTIME be MUX dependent is quite scary to me
and would appear to make it non-SVID compliant.

-- 
Steve Fullerton                        Statware, Inc.
scf%statware.uucp@cs.orst.edu          260 SW Madison Ave, Suite 109
orstcs!statware!scf                    Corvallis, OR  97333
                                       503/753-5382