jms@central.cis.upenn.edu (Jonathan M. Smith) (11/09/90)
Some Notes on Experiences Programming Shared Memory

1. Purpose

I, and others, have claimed that using shared memory support for programming
can reduce the copying required in IPC. Being a "doubting Thomas", I decided
to play a little and apply shared memory to a task for which other, highly
optimized mechanisms exist. Specifically, a System V UNIX system provides
both pipes and shared memory facilities.

In my experiments, I used an HP9000 Series 370 with 16 megabytes of main
memory, running HPUX version 6.5 (deans.cis.upenn.edu). The system runs
diskless except for a 70MB swap device, and my home directory is on
"central.cis.upenn.edu".

2. Experimental Description

The experiment was as follows:

1. Write a two-process program using pipes to communicate between reader
   and writer.
2. Rewrite the program using shared memory for the same communication.
3. Compare the writing effort.
4. Measure the performance of the two implementations.

3. Pipes

"piper.c" follows:

#include <stdio.h>

#ifndef PIPESIZ
#define PIPESIZ 8192
#endif

main()
{
    int pfd[2], pid, fork(), nread, nwrite;
    char buf[PIPESIZ];

    if( pipe( pfd ) < 0 ) {
        fprintf( stderr, "Can't open pipe. Exiting.\n" );
        exit( 1 );
    }
    switch( (pid = fork()) ) {
    case -1:
        fprintf( stderr, "Can't fork. Exiting.\n" );
        exit( 2 );
        break;
    case 0: /* child: copy from the pipe to standard output */
        close( pfd[1] );
        while( (nread = read( pfd[0], buf, PIPESIZ )) > 0 ) {
            nwrite = write( 1, buf, nread );
            if( nread != nwrite ) {
                fprintf( stderr, "IO error. Exiting.\n" );
                exit( 3 );
            }
        }
        break;
    default: /* parent: copy from standard input to the pipe */
        close( pfd[0] );
        while( (nread = read( 0, buf, PIPESIZ )) > 0 ) {
            nwrite = write( pfd[1], buf, nread );
            if( nread != nwrite ) {
                fprintf( stderr, "IO error. Exiting.\n" );
                exit( 4 );
            }
        }
    }
    exit( 0 );
}

The program was written essentially as displayed; the only change was making
the buffer size PIPESIZ rather than the original BUFSIZ, so that the hidden
buffering of pipe data provided by UNIX did not distort the measurements.
The program took about 15 minutes to type in and debug (the errors were
mainly lexical). I am an experienced UNIX programmer, and IPC by pipe is the
idiom, so this is to be expected. The program (if you're counting) is about
55 lines of source text.

4. Shared Memory

I tried to keep the same model: a parent process reads from the standard
input and hands the data to a child, which writes it to the standard output.
Thus, in their IPC relationship, the parent is the "WRITER" and the child is
the "READER". The shared data structure was a buffer plus a counter, set up
to be the same size as in "piper.c".

I realized that I needed mutual exclusion, and had to give it some thought.
At first I envisioned using the memory itself, but one problem is that C
provides no access to an atomic "test-and-set" style instruction (a sketch
of such a primitive appears just after the lists of changes below), and with
preemptive multiprocessing such an instruction is obviously necessary.
Having used semaphores while writing code for my thesis, I was attracted to
that mechanism for mutual exclusion, and the HPUX kernel provides
semaphores. So I had to add code to:

A. allocate the shared memory for the buffer
B. allocate the semaphore
C. initialize the semaphore
D. perform mutual exclusion when accessing the shared buffer
E. explicitly deallocate the semaphore and shared memory on completion

I was able to subtract the code to:

A. create the pipe
B. read from / write to the pipe
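For concreteness, here is roughly what the missing primitive looks like in a
language that does expose one. This is only a sketch, written in the
notation of the much later C11 <stdatomic.h> header; the names spin_lock()
and spin_unlock() are illustrative. As noted above, a lock that spins is a
poor citizen under preemptive scheduling, which is part of why semaphores
were the better choice here:

#include <stdatomic.h>

static atomic_flag lock = ATOMIC_FLAG_INIT;

void spin_lock(void)
{
    /* test-and-set returns the flag's previous value; we hold the
     * lock only when we are the one who flipped it from clear to set */
    while (atomic_flag_test_and_set(&lock))
        ;   /* spin: on a uniprocessor this wastes the whole quantum */
}

void spin_unlock(void)
{
    atomic_flag_clear(&lock);
}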
In some initial trials, I found that the shared memory program ("mem")
performed far worse than "piper". This surprised me, so I profiled the code
to find out where the time was going: it was being eaten up by the calls to
"semop()", which are used to implement P() and V(). But the cost of the
calls was not the problem; it was the *number* of calls! For a file (my
"mbox") with 184 8192-byte blocks, about 30-40 K calls to semop() were made!
This didn't seem possible from examining the code, and I must admit to a lot
of hacking trying to "fix" the problem, which I was sure was a BUG.

The problem was that pipes are well-integrated with process synchronization
and scheduling. Executing a "V()" operation freed the semaphore, but it did
not give up the processor, so the process which had just executed the V()
could immediately execute a P() again. Once I saw this, I rewrote the
program with two semaphores, so that better control over process scheduling
(in the manner of "sleep" and "wakeup") and response to events could be
achieved. The correct (and reasonably fast) "mem.c" is given below:

#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/sem.h>

#ifndef PIPESIZ
#define PIPESIZ 8192
#endif

struct shared_buf {
    char buf[PIPESIZ];
    int count;
};

#define READ 0
#define WRITE 1
#define NSEM 2

int Semid;
struct sembuf Sops[1];

get_sema()
{
    if( (Semid = semget( IPC_PRIVATE, NSEM, 0600 )) < 0 ) {
        fprintf( stderr, "get of sema failed. Exiting.\n" );
        exit( 1 );
    }
}

set_sema()
{
    int i;

    for( i = 0; i < NSEM; i += 1 ) {
        if( semctl( Semid, i, SETVAL, 1 ) < 0 ) {
            fprintf( stderr, "set of sema failed. Exiting.\n" );
            exit( 1 );
        }
    }
    return;
}

P( sema )
int sema;
{
    Sops[0].sem_flg = 0;
    Sops[0].sem_num = sema;
    Sops[0].sem_op = -1;
    semop( Semid, Sops, 1 );
    return;
}

V( sema )
int sema;
{
    Sops[0].sem_flg = 0;
    Sops[0].sem_num = sema;
    Sops[0].sem_op = 1;
    semop( Semid, Sops, 1 );
    return;
}

main()
{
    int shmid, pid, fork(), shmget();
    struct shared_buf *sbp;
    char *shmat();

    if( (shmid = shmget( IPC_PRIVATE, sizeof( *sbp ), 0600 )) < 0 ) {
        fprintf( stderr, "Can't get shared mem. Exiting.\n" );
        exit( 1 );
    }
    sbp = (struct shared_buf *) shmat( shmid, (char *) 0, 0 );
    if( sbp == (struct shared_buf *) -1 ) {
        fprintf( stderr, "Can't attach shared mem. Exiting.\n" );
        exit( 1 );
    }
    get_sema();
    set_sema();
    P( READ );  /* taken pre-fork, so the child blocks until data is ready */
    switch( (pid = fork()) ) {
    case -1:
        fprintf( stderr, "Can't fork. Exiting.\n" );
        exit( 2 );
        break;
    case 0: /* child: the READER */
        do {
            P( READ );
            write( 1, sbp->buf, sbp->count );
            V( WRITE );
        } while( sbp->count > 0 );
        /* clean-up */
        shmdt( sbp );
        shmctl( shmid, IPC_RMID, 0 );
        semctl( Semid, 0, IPC_RMID, 0 );
        break;
    default: /* parent: the WRITER */
        do {
            P( WRITE );
            sbp->count = read( 0, sbp->buf, PIPESIZ );
            V( READ );
        } while( sbp->count > 0 );
    }
    exit( 0 );
}

This program is about 130 lines of code, although the length of the code
inside main() is about the same as before. The main processing loops are
much more elegant, but the setup is uglier. To force the WRITER (parent) to
go first, the parent calls P( READ ) before forking, so that the child
blocks until the writer calls V( READ ) after it has read data into the
buffer.

It took me quite a long time (4-7 hours, depending on how you count, spread
over several days) to get the program working, although I am now sure I
could apply the paradigm effectively in a much shorter period of time, since
I understand the details. Note that a program with multiple readers and
writers would require a semaphore for mutual exclusion as well as a
mechanism for scheduling; a sketch follows.
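To make that note concrete, here is a sketch (not part of the measured
program) of a writer's loop with a third semaphore added. MUTEX, NSEM = 3,
and the local int n are my additions; P(), V(), sbp, and PIPESIZ are those
of mem.c above:

#define READ  0
#define WRITE 1
#define MUTEX 2     /* new: guards updates to the shared structure */
#define NSEM  3     /* get_sema()/set_sema() now manage three semaphores */

        /* each writer's loop: */
        do {
            P( WRITE );     /* wait for the buffer to be writable */
            P( MUTEX );     /* exclude other writers while updating */
            n = sbp->count = read( 0, sbp->buf, PIPESIZ );
            V( MUTEX );
            V( READ );      /* hand the buffer to a reader */
        } while( n > 0 );

(The local n keeps the loop test from touching the shared structure outside
the critical section.) For this single-slot buffer, P( WRITE ) already
admits only one writer at a time, so MUTEX buys nothing; it becomes
necessary once the shared structure carries more state than one
buffer-plus-count, for instance a multi-slot queue with head and tail
indices.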
(I originally included such a semaphore, MUTEX, but discarded it after I
concluded it was unnecessary here.)

5. Comparative Measurements

The programs take essentially the same time! My "mbox" file is a 1.5MB text
file. The programs were compiled with optimization turned on and run with
the file as input and /dev/null as output, as shown below:

$ time ./piper <mbox >/dev/null
real    0m4.04s
user    0m0.00s
sys     0m0.78s
$ time ./piper <mbox >/dev/null
real    0m3.96s
user    0m0.02s
sys     0m0.70s
$ time ./mem <mbox >/dev/null
real    0m3.50s
user    0m0.08s
sys     0m0.62s
$ time ./mem <mbox >/dev/null
real    0m3.46s
user    0m0.00s
sys     0m0.66s

The "mem" runs seem slightly faster, but there is no substantial difference.
Once again I profiled (using PROFDIR=. in the environment, so that I could
examine the profiles of the child and the parent separately). Here are the
top few lines for the parent:

$ prof -m2765.mem mem
%Time  Seconds  Cumsecs  #Calls  msec/call  Name
 78.8     0.52     0.52     184       2.83  _read
 15.2     0.10     0.62     369       0.27  _semop
  3.0     0.02     0.64       1      20.    _main
  3.0     0.02     0.66       1      20.    _fork
  0.0     0.00     0.66     185       0.00  _P

and the child:

$ prof -m2766.mem mem
%Time  Seconds  Cumsecs  #Calls  msec/call  Name
 72.0     0.16     0.16     369       0.43  _semop
 18.0     0.06     0.22     185       0.32  _write
  0.0     0.00     0.22       1       0.    _main
  0.0     0.00     0.22     185       0.00  _P

(There was a bogus timing for fork() here, and of course many lines of calls
taking no time have been edited out for the reader's sanity.)

For piper, the parent's profile was:

$ prof -m2779.piper piper
%Time  Seconds  Cumsecs  #Calls  msec/call  Name
 58.1     0.50     0.50     184       2.72  _read
 37.2     0.32     0.82     184       1.74  _write
  2.3     0.02     0.84       1      20.    _main
  2.3     0.02     0.86       1      20.    _pipe

and the child's:

$ prof -m2780.piper piper
%Time  Seconds  Cumsecs  #Calls  msec/call  Name
 82.0     0.36     0.36     184       1.96  _read
  9.0     0.04     0.40     184       0.22  _write
  4.5     0.02     0.42       1      20.    _main
  4.5     0.02     0.44       1      20.    _pipe

Thus we see that it is the external I/O (read+write = .58s in "mem") and the
semaphores (.26s) that are eating up the time. The shared memory version
looks faster overall (.84s versus 1.30s of profiled time), and its IPC is
definitely faster: subtracting the I/O due to external sources gives .30s
versus .72s, more than a factor of two. I will try to cook up an experiment
a little later to get timings based only on the IPC time (a sketch of one
way to do this appears at the end of this article).

6. Summary (so far)

It is helpful to observe the following facts:

1. IPC may not be the dominant cost in any case (here, it is external I/O).
2. There was no major difference in the response times of "mem" and "piper".
3. This may be a best-case comparison for pipes: pipes are highly optimized
   on UNIX, and pipe flow control is tightly integrated with scheduling and
   process control.

In fact, I suspect that fact 3 is the most reasonable explanation for
fact 2, given the profile results.
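As a down payment on that experiment, here is a sketch of how an IPC-only
timing might be taken: the same two-semaphore protocol as mem.c, but with
memcpy() standing in for the external read() and write(), so that running it
under time(1) measures only synchronization plus copying through the shared
segment. NITER, the sem_change() helper, and the minimal error handling are
my choices; none of the figures above came from this program.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/sem.h>

#define PIPESIZ 8192
#define NITER   184     /* same block count as the mbox runs above */
#define READ    0
#define WRITE   1

static int semid;

static void sem_change( int sema, int by )
{
    struct sembuf op;

    op.sem_num = sema;
    op.sem_op = by;
    op.sem_flg = 0;
    if( semop( semid, &op, 1 ) < 0 ) {
        perror( "semop" );
        exit( 1 );
    }
}
#define P(s) sem_change( (s), -1 )
#define V(s) sem_change( (s), 1 )

int main()
{
    int shmid, i;
    char *buf, src[PIPESIZ], dst[PIPESIZ];

    if( (shmid = shmget( IPC_PRIVATE, PIPESIZ, 0600 )) < 0 ||
        (buf = (char *) shmat( shmid, 0, 0 )) == (char *) -1 ||
        (semid = semget( IPC_PRIVATE, 2, 0600 )) < 0 ) {
        perror( "setup" );
        exit( 1 );
    }
    semctl( semid, READ, SETVAL, 0 );   /* reader starts blocked */
    semctl( semid, WRITE, SETVAL, 1 );  /* writer goes first */
    memset( src, 'x', sizeof src );

    if( fork() == 0 ) {                 /* child: the READER */
        for( i = 0; i < NITER; i += 1 ) {
            P( READ );
            memcpy( dst, buf, PIPESIZ );    /* stands in for write(1,...) */
            V( WRITE );
        }
        return 0;
    }
    for( i = 0; i < NITER; i += 1 ) {   /* parent: the WRITER */
        P( WRITE );
        memcpy( buf, src, PIPESIZ );        /* stands in for read(0,...) */
        V( READ );
    }
    wait( (int *) 0 );
    shmdt( buf );
    shmctl( shmid, IPC_RMID, 0 );
    semctl( semid, 0, IPC_RMID, 0 );
    return 0;
}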
pardo@cs.washington.edu (David Keppel) (11/11/90)
jms@central.cis.upenn.edu (Jonathan M. Smith) writes:
>[Programming with pipes vs. shared memory]

Some other lessons from the experiment:

* Programming with good, familiar primitives is always easier.

* Never confuse MODEL with IMPLEMENTATION -- there are pipes that are
  implemented using shared memory (Mach, Topaz) and shared memory
  implemented using pipes (more or less, anyway; Ivy).

* The results are always different on a multiprocessor: the shared-memory
  version has a simple implementation that (after setup) makes no kernel
  calls and does no copying -- at least on some platforms.

* There is also a pipe model, implemented using shared memory, that runs on
  (some) multiprocessors with no kernel calls and no copying; a sketch
  appears below.

* Simple, understandable examples are almost controversial :^)

Thank you for this opportunity to generalize...
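To sketch that last model (as an illustration only, not the Mach or Topaz
code): a single-producer, single-consumer ring in a shared segment needs no
kernel calls once it is set up, just loads and stores in the right order,
and by queuing *pointers* into the shared segment rather than the bytes
themselves it avoids copying too. Written here with C11 atomics, which
postdate this thread but make the required ordering explicit; it assumes the
segment is mapped at the same address in both processes (otherwise store
offsets instead of pointers):

#include <stdatomic.h>
#include <stddef.h>

#define SLOTS 64                /* must be a power of two */

struct ring {
    _Atomic size_t head;        /* advanced only by the producer */
    _Atomic size_t tail;        /* advanced only by the consumer */
    void *slot[SLOTS];          /* pointers into the shared segment */
};

/* producer side: returns 0 when the ring is full (spin or yield, retry) */
int ring_put(struct ring *r, void *p)
{
    size_t h = atomic_load_explicit(&r->head, memory_order_relaxed);

    if (h - atomic_load_explicit(&r->tail, memory_order_acquire) == SLOTS)
        return 0;
    r->slot[h % SLOTS] = p;
    /* release: the slot store above is visible before the new head is */
    atomic_store_explicit(&r->head, h + 1, memory_order_release);
    return 1;
}

/* consumer side: returns NULL when the ring is empty */
void *ring_get(struct ring *r)
{
    size_t t = atomic_load_explicit(&r->tail, memory_order_relaxed);
    void *p;

    if (atomic_load_explicit(&r->head, memory_order_acquire) == t)
        return NULL;
    p = r->slot[t % SLOTS];
    atomic_store_explicit(&r->tail, t + 1, memory_order_release);
    return p;
}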
;-D on ( General Semaphore?  This is Private Memory.  Sir! )  Pardo
--
pardo@cs.washington.edu
{rutgers,cornell,ucsd,ubc-cs,tektronix}!uw-beaver!june!pardo

ajudge@maths.tcd.ie (Alan Judge) (11/13/90)
In <8754@darkstar.ucsc.edu> jms@central.cis.upenn.edu (Jonathan M. Smith)
writes:
>Some Notes on Experiences Programming Shared Memory

Just out of interest, I thought that I would try piper and mem on a
different machine type; it should give a better idea of the effects of
differing kernel implementations on the performance. Below are the results
of piper and mem run on a DECsystem 5810 (MIPS based) running Ultrix 3.1C.
The system has 32MB of memory and runs from local disks. The size of the
buffer cache should remove the effects of the disks after the first
execution. The test file used, /vmunix, is 2.8MB in size.

parade 1% time pipe < /vmunix > /dev/null
0.0u 2.0s 0:05 39% 3+4k 1+0io 0pf+0w av:8k m:4k refpf: 0
parade 2% time pipe < /vmunix > /dev/null
0.0u 2.0s 0:02 67% 3+4k 0+0io 0pf+0w av:8k m:4k refpf: 0
parade 2% time mem < /vmunix > /dev/null
0.0u 1.1s 0:01 80% 3+2k 3+1io 0pf+0w av:6k m:5k refpf: 0
parade 3% time mem < /vmunix > /dev/null
0.0u 1.1s 0:01 91% 3+2k 0+0io 0pf+0w av:6k m:5k refpf: 0

It is interesting to note that, on this system, shared memory is almost
twice as fast as pipes (1.1s versus 2.0s of system time).
--
Alan Judge  ajudge@maths.tcd.ie  a.k.a. amjudge@cs.tcd.ie  +353-1-772941 x1782
File names are infinite in length where infinity is set to 255 characters.
        - Peter Collinson, "The Unix File System"
bdf@rex.cs.tulane.edu (Brett Fleisch) (11/14/90)
In article <8754@darkstar.ucsc.edu> jms@central.cis.upenn.edu (Jonathan M.
Smith) writes:
>Some Notes on Experiences Programming Shared Memory
>
>In some initial trials, I found that the shared memory program ("mem")
>performed far worse than "piper". This surprised me, so I profiled the
>code to find out where the time was going: it was being eaten up by the
>calls to "semop()", which are used to implement P() and V(). But the cost
>of the calls was not the problem; it was the *number* of calls! For a file
>(my "mbox") with 184 8192-byte blocks, about 30-40 K calls to semop() were
>made! This didn't seem possible from examining the code, and I must admit
>to a lot of hacking trying to "fix" the problem, which I was sure was a
>BUG.
>
>The problem was that pipes are well-integrated with process
>synchronization and scheduling. Executing a "V()" operation freed the
>semaphore, but it did not give up the processor, so the process which had
>just executed the V() could immediately execute a P() again. Once I saw
>this, I rewrote the program with two semaphores, so that better control
>over process scheduling (in the manner of "sleep" and "wakeup") and
>response to events could be achieved.

I observed a similar problem in Mirage. You might read over the article
"Mirage: A Coherent Distributed Shared Memory Design" in SOSP 12, noting the
yield() operation which was added to better control UNIX synchronization and
scheduling interactions. If you have questions, contact me:

Brett D. Fleisch
Assistant Professor
Computer Science Department
Tulane University
301 Stanley Thomas Hall
New Orleans, LA 70118-5698
(504) 865-5840
bdf@cs.tulane.edu
--
brett@cs.tulane.edu
aglew@crhc.uiuc.edu (Andy Glew) (11/14/90)
..> Pipes versus shared memory, spinning on a non-running processor,
..> and scheduling.  [jms@central.cis.upenn.edu (Jonathan M. Smith)]

>I observed a similar problem in Mirage. You might read over the article
>"Mirage: A Coherent Distributed Shared Memory Design" in SOSP 12, noting
>the yield() operation which was added to better control UNIX
>synchronization and scheduling interactions.
>[Brett D. Fleisch]

The yield() operation sounds very much like the suspend()/resume() facility
found in many real-time executives, which I put into Gould UTX-32/RT. As I
recall, it took me 45 minutes to prototype -- it was essentially a variant
of SIGSTOP with much reduced overhead -- and maybe a man-month for someone
else to turn it into a product, do the after-the-fact design, and so on.

Suspend() and resume(), and the combined suspres(), were low-overhead system
calls to suspend the current process and start another process running. So,
in the shared memory pipe example, the producer would suspend itself and
start the consumer when the buffer is full, and the consumer would do the
reverse when the buffer is empty. (A sketch of this handoff pattern in
portable terms appears below.)

By the way, some UNIX pipe implementations provide another feature: they
give a priority boost, PPIPE, to a process made runnable by a pipe.
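For readers without a Gould handy, the shape of that handoff can be sketched
with plain POSIX signals: SIGUSR1 plays the part of resume(), and
sigsuspend() plays the part of suspend(). Keeping SIGUSR1 blocked except
inside sigsuspend() supplies the atomicity that the combined suspres() call
provided, avoiding the lost-wakeup race a naive SIGSTOP/SIGCONT scheme would
have. The NITER count and the printf() calls are stand-ins for real buffer
work:

#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

#define NITER 5

static void resume_handler(int sig)
{
    (void)sig;   /* exists only to let the signal interrupt sigsuspend() */
}

int main(void)
{
    struct sigaction sa;
    sigset_t block, wait_mask;
    pid_t peer;
    int i;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = resume_handler;
    sigaction(SIGUSR1, &sa, NULL);

    /* Block SIGUSR1: a "resume" sent while we are running stays pending
     * and is picked up by our next "suspend", so no wakeup is lost. */
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    sigprocmask(SIG_BLOCK, &block, &wait_mask);
    sigdelset(&wait_mask, SIGUSR1);     /* mask used inside sigsuspend() */

    if ((peer = fork()) == 0) {         /* child: the consumer */
        peer = getppid();
        for (i = 0; i < NITER; i++) {
            sigsuspend(&wait_mask);     /* suspend(): wait to be resumed */
            printf("consumer: got block %d\n", i);
            kill(peer, SIGUSR1);        /* resume() the producer */
        }
        return 0;
    }
    for (i = 0; i < NITER; i++) {       /* parent: the producer */
        printf("producer: filled block %d\n", i);
        kill(peer, SIGUSR1);            /* resume() the consumer... */
        if (i < NITER - 1)
            sigsuspend(&wait_mask);     /* ...and suspend() ourselves */
    }
    wait(NULL);
    return 0;
}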
--
Andy Glew, a-glew@uiuc.edu [get ph nameserver from uxc.cso.uiuc.edu:net/qi]

jesup@cbmvax.cbm.commodore.com (Randell Jesup) (01/26/91)
In article <8754@darkstar.ucsc.edu> jms@central.cis.upenn.edu (Jonathan M.
Smith) writes:
>I, and others, have claimed that using shared memory support for
>programming can reduce the copying required in IPC.

I agree that it can, and have some evidence (albeit from another OS). I
rewrote your examples for AmigaDos (which is based on lightweight
processes). I also coded up a null version (reads from input, writes to
output) to subtract out the I/O overhead of reading and writing the
input/output channels. I ran all I/O to and from the ramdisk to reduce
filesystem overhead (the ramdisk does about 3 to 5MB/sec for read/write on
this machine). Times for a 2.5MB file were (averaged over several runs):

null:  0.60 sec
msg:   0.70 sec
pipe:  2.92 sec

So it looks like 0.10 seconds for msg versus 2.32 seconds for pipes; I think
23:1 is a pretty clear win for shared memory.

Note that on a system built around shared memory, there's no need to
reinvent the wheel in order to have processes share memory. I simply pass
messages back and forth telling the child that the buffer is available and
how many bytes are in it. Messages are shared, and the entire OS is built on
the message primitives (GetMsg, PutMsg, ReplyMsg, WaitPort). It took no
longer for me to write the shared-memory version than the pipe version
(AmigaDos uses named pipes). By the way, PutMsg in AmigaDos merely links a
message into the list of messages waiting at a port and does whatever
notification is needed (usually a signal to a process).

For those that care, this is an Amiga 3000: 25MHz '030, 8MB of on-board 80ns
SCRAM, and 8MB of expansion-bus RAM (Zorro-III bus) about half as fast as
the on-board RAM. All execution was out of the on-board RAM. The AmigaDos
version is 2.0.

>6. Summary (so far)
>It is helpful to observe the following facts:
>1. IPC may not be the dominant cost in any case (here, it is external I/O).

In many cases, true. However, extensive use of shared-memory techniques can
produce large improvements in OS/application performance, especially on a
system where most I/O is handled by filesystem processes and device
processes (i.e. not running as part of a kernel, nor special in any way).
This can be quite important in interactive processes, especially graphical
ones, where any delay in response is noticeable and annoying to the user.

>2. There was no major difference in the response times of "mem" and "piper".

In my case there was (including filesystem overhead, it's still 3:1).

>3. This may be a best-case comparison for pipes: pipes are highly optimized
>on UNIX, and pipe flow control is tightly integrated with scheduling and
>process control.

And of course AmigaDos is optimized for fast task switching (where most of
that 0.1 seconds went) and fast messaging; pipes are just another user-mode
filesystem. Most Unix systems have more state to switch (and caches to
flush; with lightweight processes no flush is needed).
Here's the source code for those that are interested:

/* pipe.c */
#include <dos/dos.h>
#include <dos/dostags.h>
#include <clib/dos_protos.h>
#include <pragmas/dos_pragmas.h>
#include <stdio.h>

#define PIPESIZ 8192
#define PIPENAME "Pipe:bar"

extern struct DosLibrary *DOSBase;

struct Process *parent;
int sig;
void child();

struct TagItem tags[] = {
    NP_Entry,       child,
    NP_Output,      NULL,
    NP_CloseOutput, FALSE,
    TAG_END,        0
};

char buffer[PIPESIZ];

int main (argc, argv)
{
    BPTR pipe, input;
    struct Process *p;
    LONG len;

    parent = (struct Process *) FindTask(0);
    input = Input();
    if (!(pipe = Open(PIPENAME, MODE_NEWFILE))) {
        printf("Can't open pipe!\n");
        return RETURN_ERROR;
    }
    if ((sig = AllocSignal(-1)) == -1) {
        printf("Can't get signal?\n");
        Close(pipe);
        return RETURN_ERROR;
    }
    SetSignal(0, 1<<sig);       /* clear signal, paranoia */

    tags[1].ti_Data = Output(); /* fill in NP_Output */
    p = CreateNewProc(tags);
    if (!p) {
        printf("Can't create process!\n");
        FreeSignal(sig);
        Close(pipe);
        return RETURN_ERROR;
    }
    while ((len = Read(input, buffer, PIPESIZ)) > 0) {
        Write(pipe, buffer, len);
    }
    if (len < 0)
        PrintFault(IoErr(), "Error on Read:");
    Close(pipe);
    Wait(1<<sig);               /* wait for child to exit */
    FreeSignal(sig);
    return RETURN_OK;
}

char child_buffer[PIPESIZ];

void child ()
{
    BPTR pipe, output;
    LONG len;

    output = Output();
    if (!(pipe = Open(PIPENAME, MODE_OLDFILE))) {
        Signal(parent, 1<<sig);
        return;
    }
    while ((len = Read(pipe, child_buffer, PIPESIZ)) > 0) {
        Write(output, child_buffer, len);
    }
    Close(pipe);
    Signal(parent, 1<<sig);
}

/* msg.c */
#include <dos/dos.h>
#include <dos/dostags.h>
#include <clib/dos_protos.h>
#include <pragmas/dos_pragmas.h>
#include <clib/exec_protos.h>
#include <pragmas/exec_pragmas.h>
#include <stdio.h>

#define PIPESIZ 8192
#define PIPENAME "Pipe:bar"

extern struct ExecBase *SysBase;
extern struct DosLibrary *DOSBase;

struct Process *parent;
LONG sig;
void child();

struct TagItem tags[] = {
    NP_Entry,       child,
    NP_Output,      NULL,
    NP_CloseOutput, FALSE,
    TAG_END,        0
};

char buffer[PIPESIZ];

int main (argc, argv)
{
    struct Process *p;
    struct Message msg;
    struct MsgPort *port;
    BPTR input;
    LONG len;

    parent = (struct Process *) FindTask(0);
    input = Input();
    if ((sig = AllocSignal(-1)) == -1) {
        printf("Can't get signal?\n");
        return RETURN_ERROR;
    }
    SetSignal(0, 1<<sig);       /* clear signal, paranoia */

    tags[1].ti_Data = Output(); /* fill in NP_Output */
    p = CreateNewProc(tags);
    if (!p) {
        printf("Can't create process!\n");
        FreeSignal(sig);
        return RETURN_ERROR;
    }
    Wait(1<<sig);               /* wait for child to initialize */
    port = FindPort(PIPENAME);
    if (!port) {
        printf("No port!\n");
        FreeSignal(sig);
        return RETURN_ERROR;
    }
    msg.mn_ReplyPort = &(parent->pr_MsgPort);
    while ((len = Read(input, buffer, PIPESIZ)) > 0) {
        msg.mn_Length = len;
        PutMsg(port, &msg);
        WaitPort(&(parent->pr_MsgPort));    /* wait for reply */
        GetMsg(&(parent->pr_MsgPort));
    }
    if (len < 0)
        PrintFault(IoErr(), "Error on Read:");
    msg.mn_Length = 0;          /* tell child to exit */
    PutMsg(port, &msg);
    WaitPort(&(parent->pr_MsgPort));
    GetMsg(&(parent->pr_MsgPort));
    Wait(1<<sig);               /* wait for child to exit */
    FreeSignal(sig);
    return RETURN_OK;
}

void child ()
{
    struct MsgPort *port;
    struct Message *msg;
    BPTR output;

    if (!(port = CreateMsgPort())) {
        Signal(parent, 1<<sig);
        return;
    }
    port->mp_Node.ln_Name = PIPENAME;
    AddPort(port);              /* add to system list */
    Signal(parent, 1<<sig);     /* tell parent to wake up */
    output = Output();
    while (1) {
        WaitPort(port);
        msg = GetMsg(port);
        if (msg->mn_Length) {
            Write(output, buffer, msg->mn_Length);
            ReplyMsg(msg);
        } else {
            ReplyMsg(msg);
            break;              /* EOF */
        }
    }
    RemPort(port);
    DeleteMsgPort(port);
    Signal(parent, 1<<sig);
}

/* null.c */
#include <dos/dos.h>
#include <dos/dostags.h>
#include <clib/dos_protos.h>
#include <pragmas/dos_pragmas.h>
#include <stdio.h>

#define PIPESIZ 8192

extern struct DosLibrary *DOSBase;

char buffer[PIPESIZ];

int main (argc, argv)
{
    BPTR output, input;
    LONG len;

    input = Input();
    output = Output();
    while ((len = Read(input, buffer, PIPESIZ)) > 0) {
        Write(output, buffer, len);
    }
    if (len < 0)
        PrintFault(IoErr(), "Error on Read:");
    return RETURN_OK;
}