[comp.sys.hp] DIL bug in hp9000-s300...

jeffh@weycord.WEYCO.COM (08/15/88)

Here's one for Drew... 

Configuration:
       Hp9000-s300  (350)
       HP-UX 6.01   (6.2 DIL)
       (the rest is irrelevant)

Operation:
       The 350 has passed control to an instrument and
       is in a loop:

	   for (;;) {   /* forever */
		 hpib_status_wait(raw_id,6); /* wait addressed to listen */
		 read(raw_id,buffer,131872); /* read a CHUNK of data */
		 crunch_the_data(); /* fft's, synchronous averages etc... */
	   }

Well, that's about the extent of it. The program loops every 2.133 seconds
and runs just fine for EXACTLY 9.6506 hours. Ever try waiting around for
9.6506 hours to catch a bug, only to discover you didn't have enough debug stuff
in your code? :-( Anyway, it dies: the read returns -1 and errno is set
to EINVAL. So what happens after 9.6506 hours that makes this program
crash? In 9.6506 hours it has looped 16288 times?

It turns out that it is not related to the run time at all- it's the total
number of I/O bytes accumulated on the open device file. When the total
number of bytes exceeds 2,147,483,647 (i.e., the largest value a signed 32-bit
long can hold), DIL doesn't like it one bit!
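The byte-count explanation can be checked against the timing. Here is a minimal sketch, assuming (as inferred above) that the per-file byte counter is a signed integer of `bits` bits and every read transfers the full 131872 bytes. It computes which read first pushes the total past the maximum and roughly how long the loop runs before reaching it. The function names are illustrative only.

```c
/* Index (1-based) of the first fixed-size read that pushes a signed
   `bits`-bit byte counter past its maximum, 2^(bits-1) - 1.
   Valid for 2 <= bits <= 64. */
static long long failing_read(int bits, long long chunk)
{
    long long max = (long long)((1ULL << (bits - 1)) - 1); /* 2,147,483,647 for 32 bits */
    return max / chunk + 1;
}

/* Approximate run time in hours before that read, for a loop that
   issues one read every `period_s` seconds. */
static double hours_until_failure(int bits, long long chunk, double period_s)
{
    return failing_read(bits, chunk) * period_s / 3600.0;
}
```

For 32 bits and 131872-byte reads this gives read number 16285, or about 9.649 hours at 2.133 seconds per loop. That is within a few iterations of the 16288 loops and 9.6506 hours observed, so the overflow explanation fits the numbers.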

The workaround is to close the device file, reopen it and continue what
you were doing, every time the running total of I/O bytes on the open device
file is about to exceed 2,147,483,647. Unless, of course, it's a time-critical
task LIKE THE ONE I'M WORKING ON NOW!
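Here is a minimal sketch of that close/reopen bookkeeping using plain POSIX open(), read() and close(). The limit is a parameter so the logic can be exercised with a small value. The function names and the idea of passing a path are assumptions for illustration. The sketch counts only bytes read, so add written bytes to `total` if you also write to the device. It also assumes that any per-descriptor setup (timeouts and the like) must be redone after the reopen.

```c
#include <fcntl.h>
#include <unistd.h>

/* Largest byte count a signed 32-bit long can hold. */
#define DIL_BYTE_LIMIT 2147483647LL

/* Nonzero if reading `chunk` more bytes would push `total` past `limit`. */
static int must_reopen(long long total, long long chunk, long long limit)
{
    return total + chunk > limit;
}

/* Reads `iterations` chunks from `path`, closing and reopening the file
   whenever the per-descriptor byte total would exceed `limit`.
   Returns the number of reopens performed, or -1 on error. */
static int acquire(const char *path, char *buf, size_t chunk,
                   long iterations, long long limit)
{
    int fd = open(path, O_RDONLY);
    long long total = 0;
    int reopens = 0;

    if (fd < 0)
        return -1;
    for (long i = 0; i < iterations; i++) {
        if (must_reopen(total, (long long)chunk, limit)) {
            close(fd);
            fd = open(path, O_RDONLY);
            if (fd < 0)
                return -1;
            /* re-apply any per-descriptor setup here (assumption) */
            total = 0;
            reopens++;
        }
        ssize_t n = read(fd, buf, chunk);
        if (n < 0) {
            close(fd);
            return -1;
        }
        total += n;
        /* crunch_the_data() would go here */
    }
    close(fd);
    return reopens;
}
```

For the real loop you would pass the raw HP-IB device file, a 131872-byte buffer and `DIL_BYTE_LIMIT`. The reopen still costs time in the loop, which is exactly the problem for a time-critical task.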

I've looked in the documentation and haven't found this documented,
so I'm assuming it's a bug. Of course, we all know what assuming does. :-)
Unless you're doing a continuous data acquisition task, you're unlikely to
see it. But if it happens, it's guaranteed to drive you up the wall!

--
Jeff (Spectra Software) Harrell

I know what you're thinking:
   Hey, I've got an idea! Let's use a 48-bit counter. Then Jeff will
   have to wait for 140,737,488,355,327 bytes before it craps out.
   That's roughly 632,000 hours (about 72 years)- he'll be retired by then.
   
   I'll wait...  :-)(