[comp.sys.apollo] Unix eof detection and long filenames under tar.

GBOPOLY1@NUSVM.BITNET (fclim) (05/25/89)

Hi,
    This is a pretty long reply.  My apologies to others.

    In article <8905230413.AA23343@umix.cc.umich.edu>,
CORNELLC.cit.cornell.edu:Jacques_Gelinas@CMR001.BITNET writes

>     First question:   tar and very-very-long-filenames
>[stuff deleted]
>      Can I get these files out of the tape ?
>      (How did prof. Mackay get them on the tape ?)
>[stuff deleted]
>%  tar xf /dev/rct8
>tar: ./tex82/README.WRITE-WHITE - cannot create
>tar: ./DVIware/laser-setters/dvi2adobe_fonts/
>                 StoneInformal-SemiboldItalic.tfm - cannot create
>                           (10 similar lines deleted)

The maximum filename length on Domain/IX at SR9.7 is 32; see the MAXNAMLEN
macro in /usr/include/sys/dir.h.  A file whose name is longer than that
can't be created.

Even though strlen("README.WRITE-WHITE") is < 32, the length of the name
actually stored in the Domain VTOC (their version of the UNIX inode) is > 32.
Aegis at SR9.7 is case-insensitive (e.g. /COM/SH is no different from /cOm/Sh),
but Unix has always been case-sensitive.  Their workaround is to map each
upper-case char to a ':' followed by the lower-case char.  E.g. README.WRITE-WHITE
is stored in the VTOC as :r:e:a:d:m:e.:w:r:i:t:e-:w:h:i:t:e (34 characters),
which is why this file can't be extracted.
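
To make the arithmetic concrete, here is a minimal sketch in C of the mapping
as I understand it.  The name stored_name() is my own invention, not anything
in Domain/IX, and I'm assuming the rule is exactly "upper-case letter becomes
':' plus its lower-case form" with every other character copied unchanged.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: write the SR9.7-style stored form of `name` into
 * `out` (at most outsz-1 chars plus a NUL) and return the full stored
 * length, even when it did not fit.  Assumed rule: an upper-case letter
 * is stored as ':' followed by its lower-case form. */
size_t stored_name(const char *name, char *out, size_t outsz)
{
    size_t n = 0;

    assert(name != NULL);
    for (; *name != '\0'; name++) {
        unsigned char ch = (unsigned char)*name;

        if (isupper(ch)) {
            if (out != NULL && n + 1 < outsz)
                out[n] = ':';
            n++;
            ch = (unsigned char)tolower(ch);
        }
        if (out != NULL && n + 1 < outsz)
            out[n] = (char)ch;
        n++;
    }
    if (out != NULL && outsz > 0)
        out[n < outsz ? n : outsz - 1] = '\0';
    return n;
}
```

Calling stored_name("README.WRITE-WHITE", buf, sizeof buf) with a 64-byte buf
returns 34, two over the SR9.7 limit of 32.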

There are two ways to extract those files:
(1) hack John Gilmore's pd tar.  When the fileNameLength is > MAXNAMLEN,
    then prompt for a new filename or truncate the filename.
(2) get SR10 and use BSD4.3.  MAXNAMLEN should be 1024 (me think).
    Furthermore, Aegis at SR10 is case-sensitive.
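
For option (1), the truncation could look something like the sketch below.
This is not code from John Gilmore's tar; truncate_for_limit() and its
parameters are hypothetical, and it assumes each upper-case letter costs two
stored characters.  It is meant to be applied to one pathname component at a
time, with limit set to MAXNAMLEN.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of any tar: copy as much of `name` into
 * `out` as fits in `limit` stored characters, counting each upper-case
 * letter as two (assumed ':' + lower-case mapping).  Returns the number
 * of characters of `name` that were kept. */
size_t truncate_for_limit(const char *name, char *out, size_t outsz,
                          size_t limit)
{
    size_t kept = 0, stored = 0;

    assert(name != NULL && out != NULL && outsz > 0);
    while (name[kept] != '\0' && kept + 1 < outsz) {
        size_t cost = isupper((unsigned char)name[kept]) ? 2 : 1;

        if (stored + cost > limit)
            break;              /* next char would exceed the limit */
        out[kept] = name[kept];
        stored += cost;
        kept++;
    }
    out[kept] = '\0';
    return kept;
}
```

With limit 32, "README.WRITE-WHITE" is cut to "README.WRITE-WHIT".  Two long
names can truncate to the same short one, so prompting for a new name is
probably the safer variant.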

Prof MacKay probably created the tape on a Sun box which has MAXNAMLEN set at
1024 (me think again).

>(also shows that BSD4.2 at SR9.7 is compatible with other systems)
What'll you say now?

>     Second question:   Paranoia and (text) eof
>[stuff deleted]
>      Can this be simplified on Apollo BSD4.2 systems ?
>[stuff deleted]
>testeof(iop)
>FILE *iop;
>{       register int c;
>        if (feof(iop))
>                return(TRUE);
>        else { /* check to see if next is EOF */
>                c = getc(iop);
>                if (c == EOF)
>                        return(TRUE);
>                else {
>                        (void) ungetc(c,iop);
>                        return(FALSE);
>}       }       }

The simplest way is to delete everything but the else body.  Then, if the file
has n bytes, there will be n fewer feof() tests.  This should work for Domain/IX
at SR9.7 and most probably for BSD4.3 at SR10, or at least once /lib/clib
becomes ANSI-compatible.
However, Harbison and Steele in "C: A Reference Manual" say that feof()
should be used to check for EOF.
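
Spelled out, the trimmed version would be roughly the following.  It is only
a sketch: I return 1/0 instead of TRUE/FALSE so it stands alone, and it
relies on getc() handing back every data byte as a non-negative value.

```c
#include <assert.h>
#include <stdio.h>

/* Return 1 if the next read on iop would hit end-of-file, else 0.
 * Peeks at one character and pushes it back if there was one. */
int testeof(FILE *iop)
{
    int c;

    assert(iop != NULL);
    c = getc(iop);
    if (c == EOF)
        return 1;
    (void) ungetc(c, iop);
    return 0;
}
```

Note that getc() also returns EOF on a read error, so a caller who cares
about the difference can still check ferror(iop) afterwards.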

>The 2nd ed. of the K.R. white book ...

The 2nd ed. describes the ANSI definition of C and the standard library.  Domain/C
and /lib/clib are not ANSI-compatible at SR9.7.  I suggest you refer to the
manuals provided by Apollo.

>     Third question:   eof and binary files.
>[stuff deleted]
>      Could someone explain to me the line
>            "fgetc returns EOF: Error 0"   ?
>      Why is the first use of fgetc different ?
>      (By permuting the calls to getc and fgetc, you
>       can get other results. This looks weird.)
>[stuff deleted]
>% cat  fgetc.c==getc.c
>/* ------ is  fgetc  "like"  getc  ? -------- */
>  main                          (){
># include        <stdio.h>
>  FILE * datf                   ;
>  int   c                       ;
>
>  datf = fopen("fgetc.dat","w+")                                ;
>
># define    BYTE     0377
>  printf(  "BYTE = %o, (int)(char)BYTE = %o\n",BYTE,(int)(char)BYTE);
>  if(fputc( BYTE, datf)==EOF )    perror("fputc returns EOF")   ;
>  if( putc( BYTE, datf)==EOF )    perror(" putc returns EOF")   ;
>  c = fputc(BYTE, datf )        ; printf("fputc: c = %o\n", c ) ;
>  c =  putc(BYTE, datf )        ; printf(" putc: c = %o\n", c ) ;
>
>  fseek( datf, 0L, 0)                                           ;
>
>  if( (c = fgetc(datf)) == EOF )  perror("fgetc returns EOF")   ;
>  printf("fgetc: c = %o\n", c)                                  ;
>  if( (c =  getc(datf)) == EOF )  perror(" getc returns EOF")   ;
>  printf(" getc: c = %o\n", c)                                  ;
>  if( (c = fgetc(datf)) == EOF )  perror("fgetc returns EOF")   ;
>  printf("fgetc: c = %o\n", c)                                  ;
>  if( (c =  getc(datf)) == EOF )  perror(" getc returns EOF")   ;
>  printf(" getc: c = %o\n", c)                                  ;
>
>  if(fclose(datf))                perror("fclose")              ;
>  system( "od -b fgetc.dat      ; rm -i fgetc.dat" )            ;
>                                }
>% cc !*
>cc fgetc.c==getc.c
>
>% a.out
>BYTE = 377, (int)(char)BYTE = 37777777777
>fputc returns EOF: Error 0
> putc returns EOF: Error 0
>fputc: c = 37777777777
> putc: c = 37777777777
>fgetc: c = 377
> getc: c = 377
>fgetc returns EOF: Error 0
>fgetc: c = 37777777777
> getc: c = 377
>0000000  377 377 377 377
>0000004

Fgetc() is broken.

Getc() is a macro defined in /usr/include/stdio.h as
      #define getc(p)    (--(p)->_cnt >= 0 ? *(p)->_ptr++ & 0377 : _filbuf(p))
Fgetc() and getc() are among the buffered I/O routines.  p->_base points
to the buffer and p->_ptr points to the next byte to be read in.
Normally, getc() will return an int with a value equal to the byte masked
with 0377.  In effect, this returns an unsigned char.
When the buffer is empty, an (undocumented) routine _filbuf() is called to
fill the buffer.  After filling, _filbuf() also returns the next byte as
an unsigned char if there is a next byte.
Otherwise, when the end-of-file has been reached, _filbuf() returns EOF
which is -1 or 0xffffffff or 0377...7.

Fgetc() is similar to getc() but it is a function and not a macro.
The first time it's called, it returns the value of _filbuf() which is
an unsigned char since eof has not been reached.
The next time fgetc() is called, it should return an unsigned char or EOF
(this is the ANSI definition of fgetc()).  However, Domain/IX SR9.7 fgetc()
returns the next char promoted to an int.
In Domain/C, when a char is promoted to an int, the signed-ness is preserved.
Therefore, 377 (a char -1) is promoted to 377...7 (an int -1).  This int
value is indistinguishable from the EOF -1 value.
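
The two promotions can be compared side by side.  The function names below
are mine, purely for illustration.  The cast to signed char stands in for
Domain/C's signed plain char; strictly, the result of that cast for values
above 127 is implementation-defined, though on the usual two's-complement
machines 0377 becomes -1.

```c
#include <assert.h>
#include <stdio.h>

/* What the broken path effectively does: treat the byte as a signed
 * char and let it be promoted to int, sign bit and all. */
int promote_signed(int byte)
{
    assert(byte >= 0 && byte <= 0377);
    return (int)(signed char)byte;
}

/* What getc()'s "& 0377" does: keep only the low 8 bits, so the result
 * is always in 0..255 and can never collide with a negative EOF. */
int promote_masked(int byte)
{
    assert(byte >= 0 && byte <= 0377);
    return byte & 0377;
}

/* Nonzero if the value would be taken for end-of-file by a caller. */
int looks_like_eof(int c)
{
    return c == EOF;
}
```

On a system where EOF is -1, looks_like_eof(promote_signed(0377)) is true
while looks_like_eof(promote_masked(0377)) is false, which is the whole bug.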

No error has occurred.  This is indicated by perror()'s output: "Error 0".

Fgetc() does work consistently except when it needs to call _filbuf() to
fill the buffer.  Normally, it will return the next byte promoted to an int;
but when _filbuf() is called, it returns the next byte as an unsigned char.
To illustrate this, let's
      cat fgetc.dat fgetc.dat fgetc.dat fgetc.dat > foo
Foo has 16 bytes of 377.  When we read the first 12 of them with
      f = fopen("foo", "r");
      for (i = 0; i < 12; i++)
          printf("%o\n", fgetc(f));
we'll get
      377
      377...7    \
      377...7     |__ 11 times
        ...       |
      377...7    /

By default, I/O is buffered with a 1024-byte buffer.  We can change this
by
      char buf[4];
      f = fopen("foo", "r");
      (void) setbuffer(f, buf, sizeof(buf));
      for (i = 0; i < 12; i++)
          printf("%o\n", fgetc(f));
Now, we'll get
      377       \
      377...7    |__ pattern
      377...7    |
      377...7   /
      377
      377...7
      377...7
      377...7
      repeated 2 more times.
Here, we have a 4-byte buffer, so _filbuf() is called every 4 bytes.
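
For anyone without an SR9.7 node handy, here is a toy model of the behaviour
I'm describing.  It is NOT Apollo's source; struct model, model_fgetc() and
MODEL_EOF are made up for the illustration, and the "file" is just an array
in memory.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_EOF (-1)

/* Toy stand-in for a FILE: the whole "file" lives in memory and
 * bufsize plays the role of the stdio buffer size. */
struct model {
    const unsigned char *data;  /* contents of the file */
    size_t len;                 /* number of bytes in the file */
    size_t pos;                 /* index of the next byte to hand out */
    size_t bufsize;             /* a refill happens every bufsize bytes */
};

/* Model of the misbehaving fgetc(): the byte that triggers a refill
 * comes back unsigned (as from _filbuf()), every other byte comes back
 * sign-extended. */
int model_fgetc(struct model *m)
{
    int at_refill;
    unsigned char b;

    assert(m != NULL && m->bufsize > 0);
    if (m->pos >= m->len)
        return MODEL_EOF;                   /* genuine end of file */
    at_refill = (m->pos % m->bufsize == 0); /* buffer was empty here */
    b = m->data[m->pos++];
    if (at_refill)
        return b;                           /* 0..255 */
    return (int)(signed char)b;             /* 0377 turns into -1 */
}
```

With twelve bytes of 377 and bufsize 4, calls number 1, 5 and 9 return 377
and all the others return -1; with bufsize 1024 only the first call returns
377.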

>     Last question:   default  cc  flags
>All the machines we have are DN3000 or DN4000. Why is it necessary
>to specify the  -M3000  flag for the cc compiler? The RT/11 operating
>system permitted me -in 1979- to customize the compilers by setting
>some switches (like the number of lines per page for listings).
>Can this be done also at installation time for the Apollo system?

Don't know why -M3000 is needed.
You can edit the Makefile and add -M3000 to CFLAGS.

Hope this helps.  :-)

fclim          --- gbopoly1 % nusvm.bitnet @ cunyvm.cuny.edu
computer centre
singapore polytechnic
dover road
singapore 0513.

krowitz@RICHTER.MIT.EDU (David Krowitz) (05/25/89)

The -M3000 switch to /bin/cc is the same as the -CPU 3000
switch to /com/cc. It tells the compiler that the code
being generated will be run on a machine with a
Motorola 68020 or 68030 processor and a 68881 or 68882
floating point chip (or an FPX or FPA floating point
accelerator option running in 68881 emulation mode), and
that the code does not have to be downwards compatible
with the older 68010 based machines (ie. the DN300/320,
the DSP80/80A, the DN400/420/600, and the DN460/660).
By default, the compiler will generate code which can
be run on all of the Motorola based Apollo nodes (ie.
everything except the DN10000). This means using
only 68010 integer instructions and performing all
floating point arithmetic using system calls (since some
of the earlier machines had no floating point processors).

Code compiled with the -M3000 / -CPU 3000 switch will
run quite a bit faster, especially if it does a lot of
floating point arithmetic, and it will execute on any of
the following machines:

DSP90, DN330, DN560/570/580/590, DN570-T/580-T/590-T,
DN3000/3500/4000/4500.


 -- David Krowitz

krowitz@richter.mit.edu   (18.83.0.109)
krowitz%richter@eddie.mit.edu
krowitz%richter@athena.mit.edu
krowitz%richter.mit.edu@mitvma.bitnet
(in order of decreasing preference)

krowitz@RICHTER.MIT.EDU (David Krowitz) (05/25/89)

Actually, the maximum file name length under SR10 is 255
characters (the 256th character is a null terminator), and
the maximum length of the entire pathname (ie. including
all of the subdirectory names and the '/' characters) is
1023 characters (again, the 1024th character is a null
terminator).


 -- David Krowitz
