[comp.sys.atari.st] FILE I/O

jdn@homxc.UUCP (J.NAGY) (11/05/87)

The functions Fseek, fseek (and Fread, fread, Fwrite, fwrite, etc.)
seem to perform essentially equivalent functions.  To do simple file
i/o, I could use the capital F's or the small f's to do essentially
the same things.  Is there a preferred choice?  How do you choose?

My inclination is to use the small f's since I'll end up with code
which is easier to port to a unix environment.  Is there something subtle
going on here that I'm missing?

Thanks,

Jonathan Nagy
{ihnp4|allegra|harvard}!homxc!jdn
(201) 615-4349

ws1i+@andrew.cmu.edu (William Manchester Shubert) (11/09/87)

	As far as I can tell, it's up to you.  The "F" functions are probably
a bit faster, since they're just #defines that jump you right into the system
stuff; the "f" functions, while it is true that they're more compatible with
other environments, will probably be a bit slower since they go through some
translation routines first.  However, any extra time that the "f" functions
take will probably be so small (as in less than a hundred machine
instructions) that it won't matter (especially since the functions take
forever, anyway, since they all need to do disk I/O).  Anyway, there's your
difference.

braner@batcomputer.tn.cornell.edu (braner) (11/09/87)

[]

The GEMDOS Fread(), etc are more similar to the UNIX read(), etc,
NOT fread(), etc.  The raw GEMDOS functions are faster (due to
no buffering) but you should set up your own buffering.  (I found out
that a 9Kbyte buffer is good enough for almost-maximum floppy-disk
performance.)  If you want to write code so that it is easy to port
to UNIX and such, write read(), etc calls in the UNIX syntax, and
set up a bunch of routines with those names that simply translate
to the GEMDOS calls.   (BEWARE of int-vs-long problems!)

- Moshe Braner
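
A minimal sketch of the shim approach described above, not code from the
thread: UNIX-style names that simply forward to the raw calls.  The names
ux_read and ux_lseek and the ST_GEMDOS build flag are hypothetical.  The
GEMDOS branch assumes the usual osbind.h bindings, Fread(handle, count, buf)
and Fseek(offset, handle, mode); the other branch maps to POSIX read() and
lseek() so the sketch also builds off the ST.  Counts and offsets stay long
throughout, which is where the int-vs-long trap lies.

```c
/* Sketch only: UNIX-style wrappers over the raw OS calls.
 * ST_GEMDOS, ux_read and ux_lseek are made-up names for illustration. */
#ifdef ST_GEMDOS
#include <osbind.h>     /* assumed: Fread(handle, count, buf), Fseek(offset, handle, mode) */
#define RAW_READ(h, n, p)   Fread((h), (long)(n), (p))
#define RAW_SEEK(h, off, w) Fseek((long)(off), (h), (w))
#else
#include <sys/types.h>
#include <unistd.h>     /* host fallback so the sketch compiles elsewhere */
#define RAW_READ(h, n, p)   ((long)read((h), (p), (size_t)(n)))
#define RAW_SEEK(h, off, w) ((long)lseek((h), (off_t)(off), (w)))
#endif

/* UNIX argument order is (fd, buf, count).  The count is a long so that
 * a compiler with 16-bit ints cannot silently truncate it. */
long ux_read(int fd, void *buf, long count)
{
    return RAW_READ(fd, count, buf);
}

/* UNIX argument order is (fd, offset, whence); whence is 0, 1 or 2. */
long ux_lseek(int fd, long offset, int whence)
{
    return RAW_SEEK(fd, offset, whence);
}
```

Code written against ux_read() and ux_lseek() then needs only the two
macros changed when it moves between systems.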

stuart@cs.rochester.edu (Stuart Friedberg) (11/09/87)

In article <2023@homxc.UUCP>, jdn@homxc.UUCP (J.NAGY) writes:
> The functions Fseek, fseek (and Fread, fread, Fwrite, fwrite, etc.)
> seem to perform essentially equivalent functions.
> Is there a preferred choice?  How do you choose?

The capital F functions are GEMDOS functions, similar but not
identical to UNIX system calls.  That is, Fread is similar to read,
Fseek is similar to lseek.

The little f functions are STDIO library functions, identical to
standard IO library functions everywhere.  (Or at least as close
as MW could make them.)  That is, fread under MWC on an Atari does
exactly what fread under UNIX on a VAX does.

If you want to write portable code, I suggest using the little f
functions from the standard IO library.  They're standard. :-)

If you need more control over exactly what is going on, use the
capital F functions from GEMDOS.  If you port something that close
to the machine, changing a function name from "Fseek" to "lseek" will
be the least of your problems.

Stu Friedberg  {ames,cmcl2,rutgers}!rochester!stuart  stuart@cs.rochester.edu
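
For comparison, a small sketch of the portable route using only the
little-f functions.  The function name read_record and the fixed-size
record layout are assumptions made for the example; fseek() and fread()
are the standard library calls.

```c
#include <stdio.h>

/* Read record number `index` (0-based) of `size` bytes from fp into rec.
 * Returns 1 on success, 0 if the seek fails or the record is not fully there. */
int read_record(FILE *fp, long index, void *rec, size_t size)
{
    if (fseek(fp, index * (long)size, SEEK_SET) != 0)
        return 0;
    return fread(rec, size, 1, fp) == 1;
}
```

The same source should compile unchanged wherever a standard IO library
exists, provided the file was opened in binary mode ("rb") so that byte
offsets mean what they say.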

jdn@homxc.UUCP (11/09/87)

In article <2862@batcomputer.tn.cornell.edu>, braner@batcomputer.tn.cornell.edu (braner) writes:
> 
> The raw GEMDOS functions are faster (due to
> no buffering) but you should set up your own buffering.  

I don't quite understand this.  If I have to do my own buffering with
Fread, while fread does the buffering for me, then the fread call
appears easier to use.  And since the application has to buffer Fread
calls itself, any speed advantage of Fread (due to no buffering!) is
negated. So I can see no advantage to the Fread.

From your posting, I gather that you use the capital F's rather than the
small f's.  Moshe, could you please clarify this.

Jonathan Nagy
{ihnp4|allegra|harvard}!homxc!jdn
(201) 615-4349

apratt@atari.UUCP (Allan Pratt) (11/10/87)

in article <2023@homxc.UUCP>, jdn@homxc.UUCP (J.NAGY) says:
> 
> The functions Fseek, fseek (and Fread, fread, Fwrite, fwrite, etc.)
> seem to perform essentially equivalent functions.  To do simple file
> i/o, I could use the capital F's or the small f's to do essentially
> the same things.  Is there a preferred choice?  How do you choose?

The cap-F functions are OS calls, and the little-f functions are library
calls.  If you use little-f functions, you are bound to the vagaries
of your library.  In the case of Alcyon's GEMLIB, these vagaries
are actually debilitating bugs.  I don't use GEMLIB's little-f functions.

Among the pitfalls you can encounter are: newline translation and redirection
problems.  The cap-F functions all treat files as bags o' bits, with no
translation or anything.  Many libraries translate \r\n in a file into
\n alone; this takes longer and may not be what you want.

Also note that the parameters are not in the same order:  the guy who
wrote GEMDOS got it wrong relative to UNIX.

============================================
Opinions expressed above do not necessarily	-- Allan Pratt, Atari Corp.
reflect those of Atari Corp. or anyone else.	  ...ames!atari!apratt
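
To make the newline-translation pitfall concrete, here is a sketch of
roughly what a translating library does to data read in text mode.  The
function name crlf_to_lf is hypothetical, and no particular library is
claimed to work exactly this way.

```c
#include <stddef.h>

/* Collapse every "\r\n" pair in buf[0..len) to a single '\n', in place.
 * A lone '\r' is kept.  Returns the new, possibly shorter, length. */
size_t crlf_to_lf(char *buf, size_t len)
{
    size_t in, out = 0;

    for (in = 0; in < len; in++) {
        if (buf[in] == '\r' && in + 1 < len && buf[in + 1] == '\n')
            continue;           /* drop the CR; the LF is copied next pass */
        buf[out++] = buf[in];
    }
    return out;
}
```

Because the translated data is shorter than what is on disk, byte counts
and seek offsets no longer line up with the file, which is harmless for
text and ruinous for binary data.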

braner@batcomputer.tn.cornell.edu (braner) (11/10/87)

[oops...]

What I meant was that reading a big block is a lot faster than reading
char by char using fgetc() and such (getc, getchar).  If you are going
to read a big block you might as well use low level calls (Fread) for
maximum speed.  But I guess you _could_ use fread() as supplied by your
compiler vendor.  Just hope they implemented it efficiently...

One more warning: if you use a compiler with 16-bit int's (as it should be
on a 68000!) you are limited as to the amounts you can specify in the
count arguments for fread(), malloc(), etc.  You _have_ to descend to
the raw OS calls if you need to use a long count.

- Moshe Braner
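
One workaround the post does not mention, offered only as a sketch: if
the library's count argument is too narrow, split a long request into
several smaller fread() calls.  The name read_long and the CHUNK_MAX
value are assumptions; the raw OS call, which takes a long count, remains
the single-call alternative.

```c
#include <stdio.h>

#define CHUNK_MAX 16384L    /* hypothetical cap, well under the 16-bit int limit of 32767 */

/* Read up to `total` bytes into dst using fread() calls of at most
 * CHUNK_MAX bytes each.  Returns the number of bytes actually read. */
long read_long(FILE *fp, char *dst, long total)
{
    long done = 0;

    while (done < total) {
        long want = total - done;
        size_t got;

        if (want > CHUNK_MAX)
            want = CHUNK_MAX;
        got = fread(dst + done, 1, (size_t)want, fp);
        done += (long)got;
        if (got < (size_t)want)
            break;              /* end of file or read error */
    }
    return done;
}
```

This does nothing for malloc(), which has the same limit on such a
compiler and would still need a long-capable allocator underneath.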

dillon@CORY.BERKELEY.EDU (Matt Dillon) (11/10/87)

> The raw GEMDOS functions are faster (due to
> no buffering) but you should set up your own buffering.  

	Amusing.  If the raw GEMDOS functions are faster due to not
buffering, why would you want to add buffering?  

	In fact, the raw GEMDOS functions are only faster if you do huge
read/write requests, and even then the difference shouldn't be noticeable.
Try reading a character at a time.... or a line at a time... this is why
buffering exists.

>I don't quite understand this.  If I have to do my own buffering with
>Fread, while fread does the buffering for me, then the fread call
>appears easier to use.  And since the application has to buffer Fread
>calls itself, any speed advantage of Fread (due to no buffering!) is
>negated. So I can see no advantage to the Fread.

	fread() USES Fread().  Using fread() means that more code is imported
from the C link library (or whatever language you are using).  Fread() is a 
direct OS call.  So if you are an application written in straight assembly,
fread() isn't available to you without setting up the proper environment... 
done automatically by the startup code in, say, a C link library.  fread()
would be a function IN that library.  Or, perhaps, you want to save space
and make the executable as small as possible; in that case, you wouldn't 
want to include the X Kbytes of code that stdio takes.

					-Matt
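
A rough sketch of the layering described above: a character-at-a-time
reader that asks the OS for data only when its buffer runs dry.  The
Reader type, its function names and the raw_calls counter are invented
for the example, and POSIX read() stands in for the raw OS call so the
sketch can be tried on a UNIX-like host.

```c
#include <sys/types.h>
#include <unistd.h>     /* read(): stand-in for the raw OS read call */

#define BUF_SIZE 8192   /* assumed size, in the range the thread reports as good */

typedef struct {
    int   fd;
    long  raw_calls;    /* how many times the OS was actually asked for data */
    char *bp;           /* next unread byte */
    char *ep;           /* one past the last valid byte */
    char  buf[BUF_SIZE];
} Reader;

void reader_init(Reader *r, int fd)
{
    r->fd = fd;
    r->raw_calls = 0;
    r->bp = r->ep = r->buf;     /* empty buffer forces a read on first use */
}

/* Return the next byte (0..255), or -1 once the raw read reports
 * end of file (0 bytes) or an error (negative). */
int reader_getc(Reader *r)
{
    if (r->bp >= r->ep) {
        long n = (long)read(r->fd, r->buf, BUF_SIZE);

        r->raw_calls++;
        if (n <= 0)
            return -1;
        r->bp = r->buf;
        r->ep = r->buf + n;
    }
    return *r->bp++ & 0xFF;
}
```

Reading a 10,000-byte file through reader_getc() costs a handful of raw
calls rather than 10,000, which is the whole argument for buffering.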

minow@decvax.UUCP (Martin Minow) (11/11/87)

In article <2023@homxc.UUCP> jdn@homxc.UUCP (J.NAGY) asks about the
difference between Fseek and fseek (etc.), noting that they look
quite similar.

The "lower-case" functions match the C stdio library routines.
The "Upper-case" functions are direct calls to the operating system.
(As such, they are closer to lseek(), open(), etc.)  As noted, the
lower-case functions are generally preferable for portability.

The Upper-case functions are the only ones that can be used in a desk
accessory (they don't expand memory).  They are *much* harder to use.
As a public service, I'm including a small open-a-file-and-read-a-byte
routine.  The code is made ugly as the operating system doesn't seem
to have a notion of "end of file."  (It's extracted from a longer program
and untested as such, so use it as a model only.)

Martin Minow
decvax!minow

-----
#include <stdio.h>			/* NULL and EOF			*/
#include <osbind.h>
#include <stat.h>

#define EOS	'\0'			/* Not defined in this excerpt	*/

#define FileBufferSize	512
typedef struct {
    int		handle;
    long	filesize;
    long	offset;
    char	*bp;
    char	*ep;
    char	buffer[FileBufferSize];
} FileInfo;

#define FileTell(fid)	((fid)->offset)

#define strchr	index			/* Brain-damaged Software Dev.	*/

FileInfo	InFile;
char		filename[64];		/* Stores file name for open	*/


typeout(infile)
char		*infile;
{
	register int		c;

	if (strchr(infile, ':') == NULL
	 && strchr(infile, '\\') == NULL) {
	    filename[0] = Dgetdrv() + 'a';
	    filename[1] = ':';
	    filename[2] = EOS;
	    Dgetpath(&filename[2], 0);	/* Must use "default" here	*/
	    strcat(filename, "\\");
	    strcat(filename, infile);
	}
	else {
	    strcpy(filename, infile);	/* User specified drive name	*/
	}
	if (FileOpen(&InFile, filename, 0) < 0) {
	    Cconws("Can't open the file\r\n");
	    Pterm(1);
	}
	while ((c = FileGetc(&InFile)) != EOF)
	    Cconout(c);
	Fclose(InFile.handle);
	Pterm(0);
}

int
FileOpen(fid, filename, mode)
register FileInfo	*fid;		/* Store stuff here		*/
char			*filename;	/* Fully-qualified file name	*/
int			mode;		/* Open mode			*/
{
	DMABUFFER	stupid;		/* Needed to get file size	*/
	int		status;

	Fsetdta(&stupid);
	if ((status = Fsfirst(filename, 0)) < 0)
	    return (status);
	fid->filesize = stupid.d_fsize;
	fid->offset = 0;
	fid->bp = fid->ep = NULL;
	fid->handle = Fopen(filename, mode);
	return (fid->handle);
}

int
FileGetc(fid)
register FileInfo	*fid;
{
	register long		readsize;
	register int		result;

	if (fid->offset >= fid->filesize)
	    return (EOF);
	while (fid->bp >= fid->ep) {
	    readsize = fid->filesize - fid->offset;
	    if (readsize > FileBufferSize)
		readsize = FileBufferSize;
	    if ((readsize = Fread(fid->handle, readsize, fid->buffer)) <= 0)
		return (EOF);
	    fid->bp = &fid->buffer[0];
	    fid->ep = &fid->buffer[readsize];
	}
	fid->offset++;
	return(*fid->bp++ & 0xFF);
}

preston@felix.UUCP (Preston Bannister) (11/11/87)

In article <183@decvax.UUCP> minow@decvax.UUCP (Martin Minow) writes:
>In article <2023@homxc.UUCP> jdn@homxc.UUCP (J.NAGY) asks about the
>difference between Fseek and fseek (etc.), noting that they look
>quite similar.
>
>The Upper-case functions ...  are *much* harder to use.
[ text of example deleted ]

There is a much simpler way to find the length of a file.  You simply
do an Fseek to the end of the file.  This means of getting the length
of a file is also usable with Unix and MSDOS (if not always necessary).
(I'm doing this from memory):

void
Example (filename)
  char *filename;
{
  int f;
  long length;			/* Fseek returns a long */
  f = Fopen(filename,0);
  if (f > 0)
  {
    length = Fseek(0L,f,2);	/* seek to end of file; GEMDOS order is (offset, handle, mode) */
    Fseek(0L,f,0);		/* seek back to beginning of file */
    /* we now know the size of the file */
    ProcessFile(f,length);	/* for instance */
    Fclose(f);
  }
}
--
Preston L. Bannister
USENET	   :	ucbvax!trwrb!felix!preston
BIX	   :	plb
CompuServe :	71350,3505
GEnie      :	p.bannister
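
The same seek-to-end idea sketched for a UNIX-like host, using open()
and lseek().  The function name length_by_seek is hypothetical.  On the
ST the corresponding raw calls would be Fopen and Fseek, keeping in mind
that Fseek takes its arguments as (offset, handle, mode).

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Length of the named file in bytes, found by seeking to its end.
 * Returns -1 if the file cannot be opened or the seek fails. */
long length_by_seek(const char *path)
{
    int   fd = open(path, O_RDONLY);
    off_t end;

    if (fd < 0)
        return -1L;
    end = lseek(fd, (off_t)0, SEEK_END);    /* offset of end == size in bytes */
    close(fd);
    return (end == (off_t)-1) ? -1L : (long)end;
}
```

A caller that wants to go on reading from the same handle would instead
keep it open and seek back to the start, as in the example above.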

c9c-eh@dorothy.Berkeley.EDU (Warner Young (WHY)) (11/13/87)

In article <2064@homxc.UUCP> jdn@homxc.UUCP (J.NAGY) writes:
>In article <2862@batcomputer.tn.cornell.edu>, braner@batcomputer.tn.cornell.edu (braner) writes:
>> 
>> The raw GEMDOS functions are faster (due to
>> no buffering) but you should set up your own buffering.  
>
>I don't quite understand this.  If I have to do my own buffering with
>Fread, while fread does the buffering for me, then the fread call
>appears easier to use.  And since the application has to buffer Fread
>calls itself, any speed advantage of Fread (due to no buffering!) is
>negated. So I can see no advantage to the Fread.
>

	I wondered about this also, for some time.  Then I decided to
	test it out.  A friend and I each wrote the same program, but
	I used the capital F calls, and he the lower case f calls.  I
	even put more bells and whistles into my program, and the speed
	is significantly higher, even accounting for the fact that I
	have to do my own buffering and managing.

	Buffers of 8K to 16K give the best speeds, to get the most out of your
	floppies.  I haven't done any tests to see how much difference
	buffer size makes on a hard disk (my Supra died!).

						\          /
Disclaimer:  I'm not associated			 \  /\    /arner
	with the latest revision		  \/  \__/
	of SANITY.				         |oung
						     \___|
Last known address: c9c-eh@dorothy.Berkeley.EDU
			or
		    ucbvax!dorothy!c9c-eh

singer@XN.LL.MIT.EDU (Matthew R. Singer) (11/13/87)

In article <12361@felix.UUCP>, preston@felix.UUCP (Preston Bannister) writes:
> 
> There is a much simpler way to find the length of a file.  You simply
> do an Fseek to the end of the file.  This means of getting the length
> of a file is also usable with Unix and MSDOS (if not always necessary).
> (I'm doing this from memory):
> 
> void
> Example (filename)
>   char *filename;
> {
>   int f;
>   long length;			/* Fseek returns a long */
>   f = Fopen(filename,0);
>   if (f > 0)
>   {
>     length = Fseek(0L,f,2);	/* seek to end of file; GEMDOS order is (offset, handle, mode) */
>     Fseek(0L,f,0);		/* seek back to beginning of file */
>     /* we now know the size of the file */
>     ProcessFile(f,length);	/* for instance */
>     Fclose(f);
>   }
> }
> --
> Preston L. Bannister

This is a VERY slow way to do it. Especially on LARGE files where
the seek is worse than the open.

Why not just use Fsfirst and read the file size out of the DTA?
This works on both Gemdos and MSdos.

Matt Singer

rwa@auvax.UUCP (Ross Alexander) (11/14/87)

Preston Bannister and Matt talk about (to my mind, d*mned baroque) ways
to get the size of a file in the context of the MWC programming environment.

Try stat().  That's what it's for, among other things.  It also works on
Un*x, which to my mind is a far nicer thing to have compatibility with than
mushdos.  One trap you are ignoring is: what if the file is a directory?
Nice to have its attributes, perhaps?

hmmpf grump mutter. (it's 5 am and my code *still* doesn't work.  Someone
must pay, and as luck would have it, you two win...)

Ross Alexander @ Athabasca University
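
A minimal sketch of the stat() approach, assuming a POSIX-style
<sys/stat.h>; header names and struct fields under MWC may differ.  The
function name size_by_stat and its return codes are made up for the
example.  It also covers the directory trap mentioned above.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Size of a regular file in bytes.
 * Returns -1 if stat() fails (e.g. no such file),
 *         -2 if the path exists but is not a regular file (e.g. a directory). */
long size_by_stat(const char *path)
{
    struct stat sb;

    if (stat(path, &sb) != 0)
        return -1L;
    if (!S_ISREG(sb.st_mode))
        return -2L;
    return (long)sb.st_size;
}
```

Unlike the seek method, this never opens the file, and the same struct
also carries the attributes and modification time.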

preston@felix.UUCP (Preston Bannister) (11/15/87)

>> There is a much simpler way to find the length of a file.  You simply
>> do an Fseek to the end of the file.  This means of getting the length
>> of a file is also usable with Unix and MSDOS (if not always necessary).
>> (I'm doing this from memory):

>This is a VERY slow way to do it. Especially on LARGE files where
>the seek is worse than the open.

The Fseek operation doesn't actually cause any physical disk activity.
All the file system has to do to implement Fseek is change its internal
'position in file' number.  (Assuming that the file system
implementation isn't incredibly brain-damaged :-)

--
Preston L. Bannister
USENET	   :	ucbvax!trwrb!felix!preston
BIX	   :	plb
CompuServe :	71350,3505
GEnie      :	p.bannister