[comp.sys.next] Strange optical disk problems

carlton@ji.Berkeley.EDU (Mike Carlton) (03/31/89)

We've been having some strange problems with our optical disks and I was
hoping someone in netland might be able to explain what's going on (and 
how to fix them!).

We're running a SCSI-based NeXT and using the optical disks for data storage.
The data files tend to be quite large (on the order of 75 megabytes each).
We've reformatted the optical to a single partition to allow us to store the
large data files.

Now for the fun stuff.  The first problem is 'soft timeouts'.  These don't
appear to cause any real harm, but can someone explain what's going on?  Here
is a portion of the console log from when the problems occurred (we were in the
process of writing one of these 75MB files to the optical):
	Home is /bootdisk/Homes/holmer
	Path is .:~/Apps:/bootdisk/Apps:/bootdisk/NeXT/Apps:
	 	    /bootdisk/Programming/Demos
	 	Disk Label: backup
		Label Version #2
	off -8 lastexp 94 lastmode 1 RJ 0x513f
	od0a: write re-spin (soft timeout) block 4440 phys block 4720 (4444:0:0)
	off -8 lastexp 93 lastmode 1 RJ 0x513f
	od0a: write re-spin (soft timeout) block 67800 phys block 68400 (8424:0:0)

The other problem is more serious.  We aren't able to write more than one
75 meg file.  When we try to generate the second one, we get lots of strange
disk full errors even though df shows something like 134 meg unused.  The file
also appears to be there when we do an ls (it shows a size of ~75 megs).
However, the file is empty when we look at it!?  Below is a script showing
some of the oddities.  Anyone have any ideas?


--------------------------------------------------------------------------------
	Script started on Sun Mar 19 19:52:52 1989
	bam:/MyDisk/Homes/carlton <1> cd ~holmer/backup/traces
	bam:/bootdisk/Homes/holmer/backup/traces <2> ls -l
	total 148688
	-rw-r--r--  1 carlton  77019406 Mar 19 19:42 foo
	-rw-r--r--  1 holmer   76697566 Mar 18 19:42 foo.old
	-rw-r--r--  1 carlton   166812 Mar 19 17:24 nrev

	bam:/bootdisk/Homes/holmer/backup/traces <3> head foo.old
	f 1 0
	f 1 1
	f 1 2			
	w 2 10
	w 2 11
	f 1 3
	w 2 12
	w 2 13
	f 1 4
	f 1 5

# The foo.old file was generated first and is fine

	bam:/bootdisk/Homes/holmer/backup/traces <4> head foo

# The foo file was generated second, however, it is empty!?  
# Why does 'ls' show it to have 77 meg?

	bam:/bootdisk/Homes/holmer/backup/traces <5> du .
	148689  .
	bam:/bootdisk/Homes/holmer/backup/traces <6> df
	Filesystem            kbytes    used   avail capacity  Mounted on
	/dev/sd0a             258991  146798   86293    63%    /
	/dev/sd0b              88423   70482    9098    89%    /bootdisk/NeXT
	/dev/od0a             234963  149706   61760    71%    /bootdisk/Homes/holmer/backup
	bam:/bootdisk/Homes/holmer/backup/traces <7> wc foo
	       0       0       0 foo

# Here is the console log

	bam:/bootdisk/Homes/holmer/backup/traces <8> more /tmp/console.log
	Home is /bootdisk/Homes/holmer
	Path is .:~/Apps:/bootdisk/Apps:/bootdisk/NeXT/Apps:/bootdisk/Programming/Demos
	Mar 19 18:47:37 bam su: carlton on /dev/ttyb
	Mar 19 18:50:10 bam su: carlton on /dev/ttyb
	Mar 19 19:00:29 bam su: carlton on /dev/ttyb
	IO error on pageout: error = 22.
	vnode_pageout: failed!
	IO error on pageout: error = 22.
	vnode_pageout: failed!

# and 28 more 'IO error on pageout: error = 22' deleted

	IO error on pageout: error = 28.
	vnode_pageout: failed!
	IO error on pageout: error = 28.
	vnode_pageout: failed!

# Filesystem full??

	Mar 19 19:42:19 bam vmunix: /bootdisk/Homes/holmer/backup: file system full

# and 35 more 'IO error on pageout: error = 28.' deleted
	
	script done on Sun Mar 19 19:57:21 1989
--------------------------------------------------------------------------------
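
For anyone decoding the console log above: on BSD-derived systems error 22
is EINVAL (invalid argument) and error 28 is ENOSPC (no space left on
device), which matches the "file system full" message.  A minimal sketch for
looking the numbers up, assuming the machine it runs on uses the same errno
numbering as the NeXT kernel (the helper name `describe` is made up for this
example):

```python
import errno
import os

def describe(code):
    """Return (symbolic name, message) for a numeric errno value."""
    # errno.errorcode maps numbers to names such as 'ENOSPC'.
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)

# The two values that appear in the "IO error on pageout" lines.
for code in (22, 28):
    name, message = describe(code)
    print(f"error = {code}: {name} ({message})")
```

This only names the errors.  It does not say why the pageout path saw
EINVAL before the disk filled.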

And finally, just to make things more confusing, we unmounted the optical,
rebooted the system and remounted the optical.  The contents of the foo file
now appeared, but the end of the file was truncated (we didn't find out how
much of the file was truncated, unfortunately).
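
A rough cross-check is possible from the ls -l and du output in the script
above.  The three ls sizes add up to more kbytes than du reports as
allocated, by roughly 1.5 meg, so at the time of the script some of the
space the files claim had no blocks behind it.  A minimal sketch of the
arithmetic, assuming du counts 1024-byte units like the df "kbytes" column
(the function names are made up for this example):

```python
# Byte sizes from the `ls -l` listing and the kbyte total from `du .`.
LS_SIZES = {"foo": 77019406, "foo.old": 76697566, "nrev": 166812}
DU_KBYTES = 148689  # assumption: du counts 1024-byte units

def apparent_kbytes(sizes):
    """Total of the ls-reported sizes, rounded up to whole kbytes."""
    total_bytes = sum(sizes.values())
    return -(-total_bytes // 1024)  # ceiling division

def unallocated_kbytes(sizes, du_kbytes):
    """How far the allocated space falls short of the apparent sizes."""
    return apparent_kbytes(sizes) - du_kbytes

print("apparent:", apparent_kbytes(LS_SIZES),
      "allocated:", DU_KBYTES,
      "shortfall:", unallocated_kbytes(LS_SIZES, DU_KBYTES))
```

This ignores indirect blocks and fragment rounding, which push the du figure
up rather than down, so the shortfall is a lower bound.  It describes the
state when the script was taken, not what was truncated after the remount.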

My guess is that somehow the pageout process died or got confused (I don't
know anything about the internals of Mach, so this is almost certainly wrong).
Could the problem be the virtual memory size of the process creating the file?
Or, is there some hard-coded limit to partition size?  I notice that 150 meg 
is the size of the /a partition on a normally formatted, two-partition 
optical.  Could there be some relation here?  
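
On the 150 meg observation: the "used" figure for /dev/od0a in the df output
is 149706 kbytes.  All three df rows are also consistent with 10% of each
filesystem being held in reserve, so "avail" is smaller than kbytes minus
used.  A minimal sketch that reproduces the avail column, assuming the usual
BSD fast file system reserve of 10% (the post does not show the actual
setting, and the names below are made up for this example):

```python
# Figures copied from the df output in the post (all in kbytes):
# device -> (kbytes, used, avail)
DF_ROWS = {
    "/dev/sd0a": (258991, 146798, 86293),
    "/dev/sd0b": (88423, 70482, 9098),
    "/dev/od0a": (234963, 149706, 61760),
}

MINFREE_PERCENT = 10  # assumption: default reserve, not confirmed by the post

def expected_avail(kbytes, used, minfree_percent=MINFREE_PERCENT):
    """Space left for ordinary users once the reserve is set aside."""
    usable = kbytes * (100 - minfree_percent) // 100
    return usable - used

for device, (kbytes, used, avail) in DF_ROWS.items():
    print(device, "reported avail:", avail,
          "computed:", expected_avail(kbytes, used))
```

If the reserve really is 10%, an ordinary user would see "file system full"
on the optical with about 23 meg of raw blocks still free.  That accounts
for part of the gap in the df numbers but not for the empty file.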

Thanks in advance for any and all hints you can offer.  This is a real problem
if we can't put more than one 75 meg file on each optical.