[comp.sys.encore] Misc. Umax 4.3 Problems

dcourte@valhalla.wright.edu (Dale Courte) (11/20/90)

(I am sorry if this is posted multiple times. We are experiencing some
posting problems here.)

Before I start a game of phone tag with Encore, I thought I'd find out
if the following are "known bugs". Below I describe several problems I
have been encountering under Umax 4.3. Feel free to provide any help or
guidance if any of you have had similar problems.

We have a Multimax 320, 2 APC boards, 64 Meg, and 2 NEC 650 Meg drives.

1) Problems with 'dump'.

   I have encountered two problems with the 'dump' program distributed
   with 4.3. One of them is minor; the other caused me a couple of days'
   work. First, dump seems more liberal in determining how much can fit
   onto a tape than the previous (4.2) version. Using the same tapes we
   had been using in backup rotation for about two years, suddenly dump
   would write clear to the end of tape, at which point a write error
   would occur. This happened at least three weeks in a row during
   routine backups. The problem was solved by using the -s option and
   giving a size of 2150, rather than 2300 for tape length.
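
   A rough sketch of why the smaller -s value helps, for anyone who
   wants to check the arithmetic. It is only a model, not Encore's code:
   it assumes 10240-byte blocks at 1600 BPI (as in the dump output shown
   under item 2) and an inter-record gap of 0.7 inch after each block.
   The gap figure is an assumption borrowed from stock BSD dump and may
   not match Umax 4.3. The function names are made up for illustration.

```shell
#!/bin/sh
# Model of dump's tape estimate, in tenths of an inch.
# Assumptions (not taken from Encore's sources): 10240-byte blocks,
# 1600 BPI (160 bytes per tenth of an inch), 0.7 inch gap per block.

BLOCK_TENTHS=$((10240 / 160 + 7))    # 64 tenths of data + 7 tenths of gap

# blocks_per_tape FEET: whole blocks that fit on a tape of that length
blocks_per_tape() {
    echo $(( $1 * 120 / BLOCK_TENTHS ))
}

# tape_hundredths BLOCKS FEET: tape used, in hundredths of one tape
tape_hundredths() {
    echo $(( $1 * BLOCK_TENTHS * 100 / ($2 * 120) ))
}

echo "2300 ft holds $(blocks_per_tape 2300) blocks"
echo "2150 ft holds $(blocks_per_tape 2150) blocks"
echo "905 blocks use $(tape_hundredths 905 2300)/100 of a 2300 ft tape"
```

   Under this model the last line agrees with the "0.23 tape(s)"
   estimate in the rdump output, and telling dump the tape is 2150 feet
   long makes it change volumes about 250 blocks earlier, which leaves
   some slack before the physical end of tape.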

   Secondly, I discovered the hard way that the new dump program does
   not seem to be writing end-of-file markers at the end of all but the
   last volume in multiple-volume dumps. I backed up all file systems,
   repartitioned the disks, and attempted a restore. Restore failed
   with a read error at the end of the first tape volume for both file
   systems I had backed up. This was a near disaster, as these were user
   file systems. I then did some investigating and found that all weekly
   backups done since 4.3 was installed had the same problem (A simple
   'dd if=/dev/rtape of=/dev/null' resulted in a read error at the end
   of tape each time). So, I had no immediate backup of the file systems
   I had repartitioned, and no usable weekly backups. After considerable
   head-scratching, I discovered that if I ran these tapes out to the
   point of error, without rewind, then used mt to backspace two records
   and write an end-of-file mark, the tapes could be used. I have since
   resorted to using the old (4.2) version of dump, reloaded from tape.
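
   The repair steps above can be sketched as a small script. This is a
   sketch under assumptions, not a tested procedure for Umax: the
   no-rewind device name /dev/nrtape is hypothetical, and the script
   only prints the commands unless DRYRUN is set to 0. The mt
   subcommands bsr and weof are the usual BSD ones.

```shell
#!/bin/sh
# Sketch of the tape repair described above. The no-rewind device
# name is hypothetical; substitute whatever your system uses.
NRTAPE=${NRTAPE:-/dev/nrtape}
DRYRUN=${DRYRUN:-1}    # 1 = only print the commands, 0 = run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRYRUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

repair_volume() {
    # 1. Read forward until the read error at the end of the data,
    #    without rewinding. The failure is expected, so ignore it.
    run dd if="$NRTAPE" of=/dev/null || true
    # 2. Backspace two records.
    run mt -f "$NRTAPE" bsr 2
    # 3. Write one end-of-file mark.
    run mt -f "$NRTAPE" weof 1
}

repair_volume
```

   Step 3 writes to the tape, so it is worth reading the printed
   commands first and running them for real only on a tape that is
   already unreadable by restore.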

   Anyone have a clue on this?

2) Problems with rmt.

   We recently purchased and installed NFS on our Multimax in the hopes
   that we could mount remote file systems that cost much less per byte
   than any storage options available from Encore. The plan was to have
   backups done across the net via rdump on the file server. The server
   in this case is a DECstation 3100, but in testing we have found
   identical problems to occur with a VAXstation 3100 and a SUN 3
   system. The problem is best shown by an example:

   zebra # rdump 0of eve:/dev/rtape /dev/rz1a
     DUMP: Date of this level 0 dump: Tue Oct 23 09:16:17 1990
     DUMP: Date of last level 0 dump: the epoch
     DUMP: Dumping /dev/rrz1a (/) to /dev/rtape on host eve
     DUMP: Mapping (Pass I) [regular files]
     DUMP: Mapping (Pass II) [directories]
   
     DUMP: Estimates based on 2300 feet of tape at a density of 1600 BPI...
     DUMP: This dump will occupy 905 (10240 byte) blocks on 0.23 tape(s).
   
     DUMP: Protocol to remote tape server botched (in rmtgets).
   Lost connection to remote host.
   Try using the -o option.

   The -o option is an Ultrix option to allow for "non-Ultrix" systems.
   Note that the option is already being used, so the message suggesting
   it can be ignored. Note also that the same problem occurs from a SUN
   3 system, which rules out the possibility of an Ultrix compatibility
   problem.
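
   For anyone chasing the same message, here is one way to picture what
   "botched" means. rmt(8) documents its replies as "A" plus a number
   and a newline for success, and "E" plus an error number, a newline,
   and an error message for failure. The sketch classifies the first
   line of a reply on that basis. The function name, the sample banner
   text, and the 30-character limit are assumptions for illustration;
   the real check lives inside dump and may differ on Umax.

```shell
#!/bin/sh
# Classify the first line of an rmt reply (newline already stripped).
# Reply format per rmt(8): "A<number>" = success, "E<errno>" = error.
# MAXREPLY is an assumed limit on the reply line, not a Umax fact.
MAXREPLY=30

classify_reply() {
    line=$1
    if [ "${#line}" -ge "$MAXREPLY" ]; then
        echo "botched"              # reply line too long to be valid
        return
    fi
    case $line in
        A[0-9]*) echo "ok ${line#A}" ;;
        E[0-9]*) echo "error ${line#E}" ;;
        *)       echo "botched" ;;  # anything else is not rmt talking
    esac
}

classify_reply "A0"                 # success reply
classify_reply "E5"                 # error reply
classify_reply "Welcome to eve"     # hypothetical stray banner text
```

   Anything on the connection that is not rmt's own output, for
   instance text printed by a login script on the tape host, would land
   in the "botched" branch. Whether that is what is happening here is
   only a guess.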

   This could be a big problem, because the only other way I have to
   back up a Gigabyte of storage is via a _very_ slow DEC TK50 drive.

   Again, any info would be appreciated.

3) Problems with the new 'date' program.

   The 4.3 date program does not seem to want to realize daylight
   savings time. The parameter in /Umax.param is set to indicate EDT, but
   date will only display EST. If I set the time and display it with the
   old (4.2) date program, it displays one hour ahead of what the new
   'date' displays, and clearly indicates EDT, while new date will only
   display EST.

4) New sh very slow.

   This isn't really a big problem, but tests I have run on several
   scripts I run frequently show that the new sh is very slow compared
   to the old sh. Using 'time' to time the scripts under both shells has
   revealed a slowdown of nearly 100%. The slowdown is significant
   enough that I have put "#!/usr/old/sh" in most of my scripts.
   Anyone else encounter this?
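
   To put a number on it, one could time the same script under each
   shell (for example "time /usr/old/sh myscript" and "time /bin/sh
   myscript", where myscript is a placeholder) and feed the two elapsed
   times to a helper like the one below. The helper name is made up and
   the sample figures are illustrative, not measurements.

```shell
#!/bin/sh
# slowdown_percent OLD NEW: how much longer NEW took, as a percent of OLD.
# Both arguments are whole seconds, e.g. the "real" figures from time.
slowdown_percent() {
    if [ "$1" -le 0 ]; then
        echo "n/a"                  # avoid dividing by zero
        return
    fi
    echo $(( ($2 - $1) * 100 / $1 ))
}

# Illustrative numbers only, not measurements from the post:
slowdown_percent 12 23
```

   A result of 100 means the new shell took twice as long as the old
   one. Whole seconds are coarse, so this is only meaningful for
   scripts that run for a while.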

5) Looking for a "watcher" program. Untamo has broken.

   I had been using a public-domain program named 'untamo' to log idle
   users off the system. This seems to have broken under 4.3. I can
   start the daemon, and it works for a while, but then it seems to
   just hang: it goes into a permanent wait state of some kind, never
   logging anyone off and never using any additional CPU time. Does
   anyone have an alternative daemon they use for this task? Right now
   I am doing it by hand.

Sorry to be so long-winded here, but I have been saving these up. All in
all, 4.3r4.0 has been quite reliable for a major release ending in '.0',
and I do not mean for this posting to indicate I am unhappy with it. My
only complaint is that resolving problems via Encore's 1-800 number is
often slow and tedious (numerous phone calls, having to re-explain the
problem multiple times, etc.). Is there an electronic alternative to the
1-800 number at Encore? 

-Dale Courte
 Wright State University
 CSNET: dcourte@eve.wright.edu
-- 
-----------------------------       -------------------------------------
Dale Courte                         CSNET: dcourte@eve.wright.edu
University Computing Services       BITNET: dcourte@wsu
Wright State University             UUCP: ..!uunet!ncrlink!wright!dcourte

jrm@stegosaur.cis.ohio-state.edu (John R. Mudd) (11/22/90)

In article <1990Nov19.182329.868@cs.wright.edu> dcourte@valhalla.wright.edu (Dale Courte) writes:
 >My
 >only complaint is that resolving problems via Encore's 1-800 number is
 >often slow and tedious (numerous phone calls, having to re-explain the
 >problem multiple times, etc.). Is there an electronic alternative to the
 >1-800 number at Encore? 

Send mail to cmc@encore.com.  cmc stands for (I believe) Call Management
Center.

... John

john@loverso.leom.ma.us (John Robert LoVerso) (11/22/90)

Dale Courte writes:
> 4) New sh very slow.
> 
>    This isn't really a big problem, but tests I have run on several
>    scripts I run frequently show that the new sh is very slow compared
>    to the old sh. Using 'time' to time the scripts under both shells has
>    revealed a slowdown of nearly 100%. The slowdown is significant
>    enough that I have put "#!/usr/old/sh" in most of my scripts.
>    Anyone else encounter this?

You should do yourself a favor and replace the Umax4.3 /bin/sh with the
"/usr/old/sh".  Why?  Up until Umax"4.3", /bin/sh was the BRL bourne
shell, which is a SysVr2 /bin/sh with job control and various built-ins
(test).  However, it was compiled without job control and without
some of the built-ins.  Nevertheless, this is a vast improvement
over the ancient V7 /bin/sh that 4BSD still includes.  In other words,
it was a good thing.

For Umax4.3, in order to be "more" 4.3-compatible, they apparently
decided (and documented in the release notes) to toss out the BRL /bin/sh
and use the straight 4.3 /bin/sh.  To say ``for our newly upgraded
system, we have *removed* shell functions and replaced a vital tool with
a version 7 years older in order to improve things'' is ridiculous.

So, don't bother with `#!/usr/old/sh', just move /bin/sh to /bin/sh.4.3
(where it was in Umax4.2) and put back the better /bin/sh.

John LoVerso, Our House, john@loverso.leom.ma.us
To reach the corporate puppet: loverso@westford.ccur.com

terryk@encore.com (Terence Kelleher) (11/22/90)

In article <1990Nov19.182329.868@cs.wright.edu>, dcourte@valhalla (Dale Courte) writes:
>(I am sorry if this is posted multiple times. We are experiencing some
>posting problems here.)
>1) Problems with 'dump'.
>
>   would occur. This happened at least three weeks in a row during
>   routine backups. The problem was solved by using the -s option and
>   giving a size of 2150, rather than 2300 for tape length.
>

I have personally not heard of this problem.  It may be possible that
you are seeing high error rates on your tape deck, which would cause it
to consume more tape.  

>   Secondly, I discovered the hard way that the new dump program does
>   not seem to be writing end-of-file markers at the end of all but the
>   last volume in multiple-volume dumps. I backed up all file systems,
>
>   Anyone have a clue on this?
>

This problem we have seen, and a fixed version is in release 4.1 on
Umax4.3.  This is not yet distributed to customers.  It will be as
soon as our QA is satisfied with it.

In the meantime, the fixed dump program can be ftp'ed from
encore.encore.com.  Anonymous login is allowed.
-- 
Terence Kelleher
Encore Computer Corporation
terryk@encore.com

terryk@encore.com (Terence Kelleher) (11/22/90)

Regarding the Umax4.3 dump EOF marker problem, there is also an
updated "restore" program available on encore.encore.com that will
read the dump tapes made with the problem dump utility.  When an error
is encountered, the operator has the option of continuing the restore,
by assuming the error is actually the end-of-file.

The 2 files are:
	pub/dump 
	pub/restore.4.0
-- 
Terence Kelleher
Encore Computer Corporation
terryk@encore.com

naresh@encore.Berkeley.EDU (Naresh Dharnidharka) (11/27/90)

In article <1990Nov19.182329.868@cs.wright.edu>,
dcourte@valhalla.wright.edu (Dale Courte) writes:
> 
> 3) Problems with the new 'date' program.
> 
>    The 4.3 date program does not seem to want to realize daylight
>    savings time. The parameter in /Umax.param is set to indicate EDT, but
>    date will only display EST. If I set the time and display it with the
>    old (4.2) date program, it displays one hour ahead of what the new
>    'date' displays, and clearly indicates EDT, while new date will only
>    display EST.
> 
I believe if you do a "zic -d /usr/etc/zoneinfo -l EST5EDT" to set localtime
to EST5EDT it should solve your problem.

----
Naresh Dharnidharka
Encore Computer Corporation
Mach Software Development Group
UUCP: {talcott,linus,necis,decvax}!encore!naresh
Internet: naresh@encore.com

wtm@buengc.BU.EDU (W. Thomas Meier) (11/29/90)

 Hi,

 1) Dumps: 

	There is a patch for the EOT marker problem; however, I have not
	run across the length problem that you described.  If you have
	already tried the "dsf" options to dump to make sure of your
	settings, then you may have found something new.

 2) rmt:

	We don't run remote backups on our Multimaxes (yet).  I haven't
	seen that error, but then again we are not running the 4.3 NFS
	on our multies (yet).  I know that the NFS versions are sometimes
	an older version than the plain Umax.image files that Encore sends
	you.

 3) date:

	Haven't felt that one yet.  You may have something.

 4) sh: 
	
	I haven't noticed that sh is slower.  It may make sense though.
	I think that the old shell has many /bin functions embedded in 
	the shell binary.  The new 4.3 shell is pretty small (1/2 the 
	size of the old one).  Somehow I don't believe that they just
 	wrote more efficient code.

 5) watcher:

	I would have to see the code to figure that one out.

 _Tom Meier
  
 P.S.

	If you have problems like these that go on and on, you should
	get in touch with Encore management.  Specifically, Bob Pisacani
	in Marlborough.  He is a regional manager for the Customer Support
	division.

	As an aside, I have been going to the Encore Users Group meetings
	in Florida.  There are not a lot of Multimax owners there.  How come?
	

	
	

wtm@buengc.BU.EDU (W. Thomas Meier) (11/29/90)

In article <13329@encore.Encore.COM> terryk@encore.com (Terence Kelleher) writes:
>Regarding the Umax4.3 dump EOF marker problem, there is also an
>updated "restore" program available on encore.encore.com that will
>read the dump tapes made with the problem dump utility.  When an error
>is encountered, the operator has the option of continuing the restore,
>by assuming the error is actually the end-of-file.
>
>The 2 files are:
>	pub/dump 
>	pub/restore.4.0
>-- 
>Terence Kelleher
>Encore Computer Corporation
>terryk@encore.com

 I didn't see these files at encore.com.  Also, when we found this error
 Encore didn't give us a new restore.4.0.  We still see this problem with
 your new dump when we load multiple dump files across two volumes.  

 The dump file that is across the volume set will continue to experience
 data loss.

 _Tom Meier

rickert@mp.cs.niu.edu (Neil Rickert) (11/29/90)

In article <6253@buengc.BU.EDU> wtm@buengc.bu.edu (W. Thomas Meier) writes:
>>
>>The 2 files are:
>>	pub/dump 
>>	pub/restore.4.0
>>-- 
>>Terence Kelleher
>>Encore Computer Corporation
>>terryk@encore.com
>
> I didn't see these files at encore.com.  Also, when we found this error

  As soon as that message showed up, I 'ftp'ed to Encore.com, and quickly
found the files.  I am not sure if they were in pub, or in an obviously
named subdirectory of pub.  I had no difficulty finding them, getting a copy,
and installing them.

> Encore didn't give us a new restore.4.0.  We still see this problem with
> your new dump when we load multiple dump files across two volumes.  
>
> The dump file that is across the volume set will continue to experience
> data loss.

 There may be good reasons for multiple dump files on a single volume.  But
why would you ever want multiple dump files across two volumes?  That seems to
be asking for trouble.  Have you ever thought about how you would do an
interactive restore, when it asks you to load the second volume first -- how
will you correctly reposition the first volume when it is called for?

-- 
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
  Neil W. Rickert, Computer Science               <rickert@cs.niu.edu>
  Northern Illinois Univ.
  DeKalb, IL 60115.                                  +1-815-753-6940

wtm@buengc.BU.EDU (W. Thomas Meier) (12/14/90)

Hi,

Terence removed the files from encore.com:/pub.  Apparently Customer Service
wants to keep a list of sites getting this particular patch.

In reference to my message on multiple dumps to a single volume:  there may
not be a bug after all.

The background is that Operations puts the dump of more than one partition on
a tape volume to save time and resources.  On occasion, more than the average
number of files will be touched so the next incremental may exceed a single
tape volume.  

Last month I was restoring a file from a two volume, multi-partition dump.
My file started on vol 1 and continued to vol 2, or so I thought.  When 
restore complained that it hit the EOT, I mounted the second tape and tried
to continue.  Restore became momentarily confused, lost some data, and then
continued restoring the file.

I concluded that restore had fumbled when in fact it had acted pretty well.
What I didn't know at the time was that Operations forces the incremental 
dump of the affected partition to start over on a second tape.  What I
could have done was to skip volume one and run the restore directly from
volume 2.


_Tom Meier