[comp.os.vms] 2 C questions

STEINBERGER@KL.SRI.COM (Richard Steinberger) (09/13/87)

    I have 2 questions concerning VAX C.  Thanks in advance to anyone who can
help.

Question 1: I wrote a short main routine (see below) to test a function.  One
of the inputs is an integer number, and the next input is a filename.  The
problem is that the scanf that reads the number apparently leaves a LF
character in a buffer that is then read by the code that is expecting the
filename.  Because it's a LF, I never get a chance to enter a filename (see
code).  My "kludgy" solution was to put the line "i = getchar()" after the
scanf to remove the LF character; when this is done the following lines that
get a filename work fine.  Am I missing something fundamental?  I've tried
using gets after the scanf and the result is the same, i.e. unless the "i =
getchar()" is there to remove the LF, gets doesn't "work" either.  Why does
scanf leave a LF character, or am I misinterpreting what's going on?  Are
there solutions other than using a getchar() call after a scanf that precedes a
gets or getchar call?  The relevant code section is below:


	printf("\nEnter a string size: ");
	scanf("%d",&n_chars);
/*
                             clear NEWLINE char left after scanf 
*/
	i = getchar();          /* WHY DO I NEED THIS ? */

	printf("\nEnter a file to use: ");
	for (i=0; (input_file[i] = getchar()) != '\n'; i++);
        input_file[i] = '\0';

______________________________________________________________________________

Question 2: I am trying to read and write binary files similar to ones 
created by FORTRAN open statements (that produce sequential, unformatted,
fixed recordsize files).  Specifying "rb" in an fopen statement seems to
be enough to access existing files (using an fread call).  If I want to
create a new binary file with fixed-length records, which, if any, of
the keywords on p 4-5 of the C RTL Ref Man are appropriate? Is "rfm=fix"
enough?  Do I need "mrs=size" as well, and if so, is size in units of bytes,
or longwords (like Fortran)?

Also, is there any reason to use the chapter 4 open and read or write
functions instead of the chapter 2 functions (fopen, fread and fwrite)?
______________________________________________________________________________


Ric Steinberger
steinberger@kl.sri.com
-------

leichter@VENUS.YCC.YALE.EDU.UUCP (09/17/87)

	Question 1: I wrote a short main routine ... to test a function.  One
	of the inputs is an integer... and the next a filename....  [T]he
	scanf that reads the number apparently leaves a LF character in a
	buffer that is then read by the code that is expecting the filename....
	Why does scanf leave a LF character, or am I misinterpreting what's
	going on?  Are there solutions other than using a getchar() call? ...

		printf("\nEnter a string size: ");
		scanf("%d",&n_chars);
	/*
	                             clear NEWLINE char left after scanf 
	*/
		i = getchar();          /* WHY DO I NEED THIS ? */

		printf("\nEnter a file to use: ");
		for (i=0; (input_file[i] = getchar()) != '\n'; i++);
	        input_file[i] = '\0';

You've been bitten by a very common Unix programming "gotcha".  The problem
is that you are thinking of scanf() as operating on lines - a seemingly
natural way to look at things, because after all you are typing one value
per line.  Unfortunately, that's NOT the way scanf() is actually defined; it
reads streams of bytes, and attaches no special significance to the end of
an input line - as far as it is concerned, '\n' is just another whitespace
character.  You've provided nothing to "swallow" that whitespace character,
so it remains in the buffer, ready to screw up the next input request.
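A minimal sketch of the behavior described above, using sscanf() on a fixed string so it can be run without a terminal. The %n conversion records how many characters the call consumed; char_left_after_int() is a hypothetical helper name.

```c
#include <stdio.h>

/* Parse an int the way scanf("%d") would and report how many characters
   were consumed.  Returns the character left unread after the number,
   or EOF if the input does not start with a number. */
static int char_left_after_int(const char *input, int *value, int *consumed)
{
    *consumed = 0;
    if (sscanf(input, "%d%n", value, consumed) != 1)
        return EOF;
    return (unsigned char)input[*consumed];
}
```

For the input "42\nfoo.dat\n" this returns '\n' after consuming 2 characters: the newline typed after the number is still waiting, and that is what the filename loop reads first.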

There are two work-arounds.  A direct, strange-looking, and not very good
technique is to provide something to "swallow" the newline.  Since a SPACE in
the format specification matches zero or more whitespace characters in the
input, you need merely change your scanf() call to:

		scanf("%d ",&n_chars);
	
(Actually,

		scanf("%d\n",&n_chars);
	
is equivalent and looks a bit better.)  The big problem with this technique
is that it will keep reading input until it sees a non-whitespace character
(which it then pushes back onto the input stream).  This is fine for reading
files, terrible for interactive input.  So, scratch the direct approach.
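The following sketch (again sscanf() with %n; consumed_by_int_space() is a hypothetical name) shows why: a whitespace directive in the format swallows every whitespace character, several newlines included, and stops only at the next non-whitespace character. On a terminal, that means scanf() cannot return until the user has started typing the next answer.

```c
#include <stdio.h>

/* Returns how many characters "%d " consumes from input, or -1 if no
   number was found.  The space directive eats ALL following whitespace. */
static int consumed_by_int_space(const char *input)
{
    int value, consumed = -1;

    if (sscanf(input, "%d %n", &value, &consumed) != 1)
        return -1;
    return consumed;
}
```

With "42\n\n  foo.dat" the call consumes 6 characters: the number, both newlines, and both spaces.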

The RIGHT way to solve this problem is to forget about scanf() entirely.  You
want line-at-a-time parsing, which scanf doesn't give you - but you can easily
build it:  Use gets() or fgets() to read a line, then use sscanf() to parse
it.  (In fact, my advice - it's not just mine, my Unix-oriented officemate
agrees - is that you'll probably be happiest if you take your manual and ink
out the entries for scanf() and fscanf().  Actually, he says he wouldn't be
at all unhappy if sscanf() suddenly vanished at the same time....)
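A minimal sketch of the read-a-line-then-sscanf() approach. read_line() and read_int_line() are hypothetical helper names, the 128-byte buffer is an arbitrary choice, and fgets() is used instead of gets() because it takes a buffer size.

```c
#include <stdio.h>
#include <string.h>

/* Read one whole line into buf, with the newline removed.
   Returns 0 on end of file or error. */
static int read_line(FILE *fp, char *buf, size_t size)
{
    if (fgets(buf, (int)size, fp) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';   /* drop the trailing newline, if any */
    return 1;
}

/* Read one line and parse a single int from it with sscanf().
   Returns 1 on success.  Lines longer than the buffer are not handled. */
static int read_int_line(FILE *fp, int *out)
{
    char line[128];

    if (!read_line(fp, line, sizeof line))
        return 0;
    return sscanf(line, "%d", out) == 1;
}
```

In the original program, assuming input_file is a char array, the two prompts would be followed by read_int_line(stdin, &n_chars) and read_line(stdin, input_file, sizeof input_file), with no stray getchar() needed: each call consumes exactly one line, newline included.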

	Question 2: I am trying to read and write binary files similar to ones
	created by FORTRAN open statements (that produce sequential,
	unformatted, fixed recordsize files).  Specifying "rb" in an fopen
	statement seems to be enough to access existing files (using an fread
	call).  If I want to create a new binary file with fixed-length
	records, which, if any, of the keywords on p 4-5 of the C RTL Ref Man
	are appropriate? Is "rfm=fix" enough?  Do I need "mrs=size" as well,
	and if so, is size in units of bytes, or longwords (like Fortran)?

Yes, you must specify "mrs=" as well, e.g. "mrs=512".  (The size is in bytes.)
You may also need to specify the carriage control attributes (as none,
probably); check a full directory listing of a FORTRAN file.  Note:  You can
specify only one setting per argument to open/fopen.  That is:

		fopen("foo","w","rfm=fix","mrs=512");	/* Correct	*/
		fopen("foo","w","rfm=fix,mrs=512");	/* **WRONG**	*/

While the documentation says this, it's easy to misunderstand. 
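A sketch of creating such a file and writing fixed 512-byte records with fwrite(). It is untested on VMS and rests on assumptions: the extra RMS keyword arguments are passed only when VMS or __VMS is predefined (DEC C defines __VMS; older VAX C may define only VMS), the carriage-control setting mentioned above is not shown, and the function names and RECORD_SIZE are hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define RECORD_SIZE 512   /* must match the "mrs=" value below */

/* Create a file for fixed-length records.  The extra keyword arguments
   are a VMS C RTL extension, so they are passed only on VMS. */
static FILE *create_fixed_file(const char *name)
{
#if defined(__VMS) || defined(VMS)
    return fopen(name, "wb", "rfm=fix", "mrs=512");
#else
    return fopen(name, "wb");
#endif
}

/* Write one record, zero-padding short data out to RECORD_SIZE bytes.
   Returns 1 on success, 0 if the data is too long or the write fails. */
static int write_record(FILE *fp, const void *data, size_t len)
{
    unsigned char rec[RECORD_SIZE];

    if (len > RECORD_SIZE)
        return 0;
    memset(rec, 0, sizeof rec);
    memcpy(rec, data, len);
    return fwrite(rec, 1, sizeof rec, fp) == sizeof rec;
}
```

Padding every record to the full size in the program keeps each fwrite() call the same length as the declared maximum record size.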

	Also, is there any reason to use the chapter 4 open and read or write
	functions instead of the chapter 2 functions (fopen, fread and
	fwrite)?

open/read/write may be somewhat faster; I have no idea if the difference would
be noticeable.  I'd probably use them for clarity - I use the Chapter 2
functions when I want their "added value".  In using the Chapter 4 ones, I'm
saying that I'm NOT thinking of the files as Unix-style byte streams.

							-- Jerry
------

R022DB3L@VB.CC.CMU.EDU.UUCP (09/18/87)

For Richard Steinberger's query:

>        I wrote a short main routine (see below) to test a function. One
> of the inputs is an integer number, and the next input is a filename.
> The problem is that the scanf() that reads the number apparently leaves
> a LF character in a buffer that is then read by the code that is
> expecting the filename.       ..... 
> 					      Why does scanf leave a LF
> character, or am I misinterpreting what's going on?  Are there solutions
> other than using a getchar() call after a scanf() that preceeds a gets()
> or getchar() call?  ....

I've never actually programmed in Vax C, but just from my general C work,
I would expect this to be the case, according to the definition of scanf.
Basically, your call to scanf is causing an I/O request, to which you
enter the string size and press RETURN.  Thus, your input buffer contains
the number followed by a newline.

You're asking scanf to read a decimal input field into an integer
variable.  According to scanf (or at least the definition of it that I
have handy), an "input field" is:

    * All characters up to **(but not including)** the next whitespace
      character
    * All characters up to the first one that cannot be converted under
      the current format specification (such as an 8 or 9 under octal
      format)
    * Up to 'n' characters where 'n' is the specified field width.

"Whitespace" is defined as one of the characters blank ( ), tab (\t) or
**newline (\n)**.

Therefore, when you do a scanf("%d",&variable), scanf simply parses through
the input until it hits the whitespace character (the newline).  The number
it reads gets assigned into the variable, and according to the first rule
above, the whitespace character is not parsed, and therefore stays in the input
buffer for the next call to scanf (or any function accessing the standard
input stream).
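The second and third rules can be checked the same way. The sketch below (hypothetical helper names) uses sscanf() with %n to show where each conversion stops.

```c
#include <stdio.h>

/* Rule 2: an octal field stops at the first non-octal digit.
   Returns the number of characters consumed, or -1 on failure. */
static int octal_field(const char *s, unsigned *value)
{
    int used = 0;
    return sscanf(s, "%o%n", value, &used) == 1 ? used : -1;
}

/* Rule 3: a field width caps how many characters are read. */
static int width3_field(const char *s, int *value)
{
    int used = 0;
    return sscanf(s, "%3d%n", value, &used) == 1 ? used : -1;
}
```

For "1289", %o reads only "12" (octal 12, i.e. 10) and stops at the 8; for "12345", %3d reads "123" and leaves "45" for the next conversion.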

As far as a solution, how about just changing your scanf to:

	scanf("%d\n",&variable)

(Actually, according to scanf, if there is any whitespace in the format
 string, it will scan over as much whitespace as necessary in the input at
 that point, so I suppose technically, you could use a space or \t in
 place of the \n above since they're all whitespace, but the \n seems logical).

I do presume however, that even the above code would run into problems if the
user were to input a string like "10    20{RETURN}", since the scanf would
stop at the space, and you'd still have the rest of the string in the buffer
for the next scanf.  Perhaps an even better solution would be:

	scanf("%d%*[^\n]\n",&variable)

What this code will do is read an integer, followed by a string which will
be ignored (due to the *), the string consisting of the series of characters
in the input which aren't a newline.  The [] construct is a character search
set specifier - since the first character is ^, it's an inverted set, matching
any characters not listed.  Effectively, this will skip over any extra
characters on the input up to the newline, and then the \n in the format
string will read the newline out of the buffer.

Please note that I haven't actually had the time to try out these function
calls, but I used very similar function calls this summer, so if they
aren't quite perfect, play around a bit with the concept... :-)
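One caution about the last format: a %[ conversion must match at least one character, so with input like "10{RETURN}" the %*[^\n] fails, scanf() returns at once, and the trailing \n directive is never reached, leaving the newline in the buffer after all. (When it is reached, it also has the skip-all-whitespace behavior discussed earlier in the thread.) A sketch that avoids both problems reads the number with fscanf() and then discards the rest of the line with a getc() loop; read_int_rest_of_line() is a hypothetical name.

```c
#include <stdio.h>

/* Read an int, then throw away everything up to and including the next
   newline, whether or not the line held extra characters.
   Returns 1 if an int was read, 0 otherwise (including at EOF). */
static int read_int_rest_of_line(FILE *fp, int *out)
{
    int c;
    int ok = (fscanf(fp, "%d", out) == 1);

    while ((c = getc(fp)) != EOF && c != '\n')
        ;                           /* discard leftover characters */
    return ok;
}
```

Both "10    20{RETURN}" and "10{RETURN}" leave the stream positioned at the start of the next line.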


-- David Bolen				Arpanet:  R022DB3L@VB.CC.CMU.EDU
   Carnegie-Mellon University		Bitnet :  R022DB3L@CMCCVB
   Pittsburgh, PA  15213

STEINBERGER@KL.SRI.COM (Richard Steinberger) (04/04/88)

C vs. Fortran speed:  Since C variables are automatic (i.e., dynamic) by
default and Fortran variables are static, is it fair to conclude that in
general, a routine in C having the same number of local variables as
a "roughly identical" Fortran routine will take a bit longer because
the OS must allocate (and deallocate) space for the local variables?
If the C local variables are made static, does this possible performance
advantage disappear?

In Fortran, it is sometimes convenient to use the END=n, where n is a label
number, in a READ statement to transfer control when an EOF is detected.
Has anyone found a *simple* way to get the same effect in C?  I had been
calling fopen on TT:, then using a while (!feof(fptr)) {...} loop, but this
still results in the loop body being executed once more unless yet another
feof() is put within the loop to "double" check for EOF.
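The usual C counterpart to END= is to test the value returned by the read call itself rather than calling feof() first: feof() becomes true only after a read has already failed, which is why the loop body runs one extra time. A minimal sketch; sum_ints() is a hypothetical name.

```c
#include <stdio.h>

/* Sum integers until input runs out.  The loop condition is the fscanf()
   result, so the body never runs after a failed read.
   Returns 1 if input ended at EOF, 0 if it stopped at a non-number. */
static int sum_ints(FILE *fp, long *total)
{
    int x;

    *total = 0;
    while (fscanf(fp, "%d", &x) == 1)
        *total += x;
    return feof(fp) ? 1 : 0;        /* feof() is checked only afterwards */
}
```

For line-at-a-time input the same pattern is while (fgets(buf, sizeof buf, fp) != NULL) { ... }.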

Thanks for any help or suggestions.

-Ric Steinberger
steinberger@kl.sri.com

-------

hvo@hawk.ulowell.edu (Huy D. Vo) (04/05/88)

In article <12387779883.20.STEINBERGER@KL.SRI.COM> STEINBERGER@KL.SRI.COM (Richard Steinberger) writes:
>C vs. Fortran speed:  Since C variables are automatic (i.e., dynamic) by
>default and Fortran variables are static, is it fair to conclude that in
>general, a routine in C having the same number of local variables as
>a "roughly identical" Fortran routine will take a bit longer because
>the OS must allocate (and deallocate) space for the local variables?

This requires one instruction to raise the stack pointer to point
beyond the C local variables.

Now my point: most of the time, C functions want values, whereas
Fortran subroutines invariably expect addresses. If fetching an
address is faster than fetching a value, then Fortran wins. Any 
microcode experts out there?

How about some benchmarks?

Huy D. Vo
hvo@hawk.ulowell.edu

jayz@cullsj.UUCP (Jay Zorzy) (04/05/88)

From article <12387779883.20.STEINBERGER@KL.SRI.COM>, by STEINBERGER@KL.SRI.COM (Richard Steinberger):
> C vs. Fortran speed:  Since C variables are automatic (i.e., dynamic) by
> default and Fortran variables are static, is it fair to conclude that in
> general, a routine in C having the same number of local variables as
> a "roughly identical" Fortran routine will take a bit longer because
> the OS must allocate (and deallocate) space for the local variables?
> If the C local variables are made static, does this possible performance
> advantage disappear?

It depends on the size of the variables.  Dynamic variables are allocated
from the stack, so if you've got huge arrays, VMS will obviously have more
work to do to allocate them on the stack.  Another point to consider, is
that static variables are allocated during image activation in R/W, copy-
on-reference (CRF) sections.  Depending on the frequency these sections
are accessed, you may have paging activity to consider.

Jay Zorzy
Cullinet Software
San Jose, CA

levy@ttrdc.UUCP (Daniel R. Levy) (04/06/88)

In article <5981@swan.ulowell.edu>, hvo@hawk.ulowell.edu (Huy D. Vo) writes:
> Now my point: most of the time, C functions want values, whereas
> Fortran subroutines invariably expect addresses. If fetching an
> address is faster than fetching a value, then Fortran wins. Any 
> microcode experts out there?

I'm not a "microcode expert" but I can say right off the bat that I
can't imagine indirection ever being faster, and it may well be slower
(needing two bus cycles to fetch the address and then the object being
addressed).  If the object is available directly as a value, that obviates
the need for a double fetch.  In both cases, the object might then be cached
in a register, which makes any further fetching unnecessary.
-- 
|------------Dan Levy------------|  Path: ..!{akgua,homxb,ihnp4,ltuxa,mvuxa,
|         an Engihacker @        |  	<most AT&T machines>}!ttrdc!ttrda!levy
|     AT&T Data Systems Group    |  Disclaimer?  Huh?  What disclaimer???
|--------Skokie, Illinois--------|

jeh@crash.cts.com (Jamie Hanrahan) (04/07/88)

In article <281@cullsj.UUCP> jayz@cullsj.UUCP (Jay Zorzy) writes:
>From article <12387779883.20.STEINBERGER@KL.SRI.COM>, by STEINBERGER@KL.SRI.COM (Richard Steinberger):
>> C vs. Fortran speed:  Since C variables are automatic (i.e., dynamic) by
>> default and Fortran variables are static, is it fair to conclude that in
>> general, a routine in C having the same number of local variables as
>> a "roughly identical" Fortran routine will take a bit longer because
>> the OS must allocate (and deallocate) space for the local variables?
>> If the C local variables are made static, does this possible performance
>> advantage disappear?
>
>It depends on the size of the variables.  Dynamic variables are allocated
>from the stack, so if you've got huge arrays, VMS will obviously have more
>work to do to allocate them on the stack.  Another point to consider, is
>that static variables are allocated during image activation in R/W, copy-
>on-reference (CRF) sections.  Depending on the frequency these sections
>are accessed, you may have paging activity to consider.
>

Er, no.  The job of allocating the space on the stack is accomplished by
a single SUBL2 instruction -- the stack pointer value is decremented by the
number of bytes in the dynamic variables.  All references to such variables
are handled as displacements from either the SP or another register in which
the appropriate value of the SP is stored (this is because other things might
be pushed on the stack).  The stack is in a copy-on-reference section just
like Fortran static variables are; offhand I'd say the number of page faults
would be similar for similarly-designed programs.  

Another question in the original posting had to do with the efficiency of 
pass-by-reference (Fortran default) vs. pass-by-value (C default).  It will
take longer for the called procedure to pick up an argument passed by 
reference, as one additional fetch is required.  On the other hand, if we're
comparing C to Fortran, recall that C always pushes its argument lists on the
stack and uses CALLS, while Fortran allocates static argument lists, modifies
only those arguments that need to be modified at runtime, and uses CALLG.  So
if your argument list has nothing that needs to be evaluated at runtime (an
array element with a non-constant subscript, for instance), the Fortran call
will be faster.  Now you get to worry about frequency of procedure call vs.
frequency of the procedure's access to its arguments...
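In C terms, the two argument-passing styles look like this. The by-reference version must fetch the address from the argument list and then fetch the value it points to, which is the one additional fetch mentioned above. A sketch only, with hypothetical names; how much the extra fetch matters depends on the compiler and on whether the value ends up in a register.

```c
/* C default: the argument list holds the value itself. */
static int twice_by_value(int n)
{
    return 2 * n;
}

/* Fortran default: the argument list holds the address of the value,
   so the callee must dereference it (one extra memory fetch). */
static int twice_by_reference(const int *n)
{
    return 2 * *n;
}
```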

darin@laic.UUCP (Darin Johnson) (04/08/88)

In article <281@cullsj.UUCP>, jayz@cullsj.UUCP (Jay Zorzy) writes:
> From article <12387779883.20.STEINBERGER@KL.SRI.COM>, by STEINBERGER@KL.SRI.COM (Richard Steinberger):
> > C vs. Fortran speed:  Since C variables are automatic (i.e., dynamic) by
> > default and Fortran variables are static, is it fair to conclude that in
> > general, a routine in C having the same number of local variables as
> > a "roughly identical" Fortran routine will take a bit longer because
> > the OS must allocate (and deallocate) space for the local variables?
> 
> It depends on the size of the variables.  Dynamic variables are allocated
> from the stack, so if you've got huge arrays, VMS will obviously have more
> work to do to allocate them on the stack.

I haven't looked at the compiler output for a long time, BUT...
doesn't the allocation of automatic variables just involve adjusting
the stack (and possibly frame) pointers?  (In fact, I am pretty sure
that this is what happens, but I have been known to be worng :-)

If this is all that happens, then the size and number of automatic
variables makes no difference at all to the overhead, which would just
be one or two instructions.  Since I don't know how FORTRAN allocates
static variables (I hope they aren't put into a demand paged section :-)
I would assume the C method is just as efficient.  It is probably more
efficient memory-wise since variables in rarely called routines never get
allocated, whereas the memory for these variables would always hang around
(of course, they could be paged out).
-- 
Darin Johnson (...ucbvax!sun!sunncal!leadsv!laic!darin)
              (...lll-lcc.arpa!leadsv!laic!darin)
	All aboard the DOOMED express!

IMHW400@INDYVAX.BITNET (04/09/88)

Regarding relative speed of C and FORTRAN code:  no, you can't assume that
dynamic allocation takes significantly more time.  Most VAX languages
"allocate" dynamic local storage by assuming that the stack has enough space
left and defining their addresses relative to the stack pointer.  The runtime
overhead involved is a single MOVAx instruction with registers for source
and destination, plus indirection overhead.  Depending on the degree of
overlap during instruction decoding for the particular machine, the indirection
overhead could in fact be zero.  So, the best-case cost for stack allocation
is one very brief instruction per CALL.  The overhead may hold academic
interest, but no real practical interest.

On the other hand, IF your machine uses a significant amount of time for
each indirection, AND the FORTRAN compiler is using absolute rather than
PC-relative addressing (probably not!) then the overhead in C might become
significant.  But if it is really causing you grief, you probably have refined
your code into too many itty-bitty subroutines.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mark H. Wood    IMHW400@INDYVAX.BITNET   (317)274-0749 III U   U PPPP  U   U III
Indiana University - Purdue University at Indianapolis  I  U   U P   P U   U  I
799 West Michigan Street, ET 1023                       I  U   U PPPP  U   U  I
Indianapolis, IN  46202 USA                             I  U   U P     U   U  I
[@disclaimer@]                                         III  UUU  P      UUU  III

nagy%warner.hepnet@LBL.GOV (Frank J. Nagy, VAX Wizard & Guru) (04/10/88)

> C vs. Fortran speed:  Since C variables are automatic (i.e., dynamic) by
> default and Fortran variables are static, is it fair to conclude that in
> general, a routine in C having the same number of local variables as
> a "roughly identical" Fortran routine will take a bit longer because
> the OS must allocate (and deallocate) space for the local variables?
> If the C local variables are made static, does this possible performance
> advantage disappear?
     
On the VAX there is very little performance lost due to the automatic
variables.  Assuming the variables are not initialized (in which case
they would be essentially the same as static variables in terms of
performance), then all that is needed is a single instruction:

	SUBL2	#<# of bytes of auto variables>,SP

to bump the stack pointer down to allocate the auto variables.

Nothing need be done to deallocate them since the old value of the
stack pointer is taken from the call frame by the RET instruction.
This automatically "pops" the auto variables, the call frame AND
the argument list (when a CALLS instruction is used).

With VAX C versus VAX FORTRAN the performance "hit" in doing a
routine call is that C (also PASCAL for that matter) uses a CALLS
and must PUSH the argument list onto the stack (in reverse order)
before the CALLS.  (Well, normally the argument is build directly
onto the stack and is not PUSHed on from the static location).
VAX FORTRAN uses CALLG and a static argument list; but here
FORTRAN must overwrite argument values which have changed from
the values in the static list - this is true when a subroutine
passes arguments it was called with to an inner routine.  In these
cases, much of the performance advantage of VAX FORTRAN is lost.

The other side of the coin is that by building the argument list
on the stack at run-time, C and PASCAL automatically support
recursion and ASTs in a natural and easy manner.  Try that in
FORTRAN (I have, I know and that's why I use C these days).
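The recursion point also bears on the original question about making C locals static: a static local is shared by every active call, so it is safe only in routines that are never re-entered (by recursion or from an AST). A minimal sketch, with hypothetical function names, contrasting the two storage classes in a recursive routine:

```c
/* Automatic local: each recursive activation gets its own copy. */
static int sum_down_auto(int n)
{
    int mine = n;
    int rest;

    if (n == 0)
        return 0;
    rest = sum_down_auto(n - 1);
    return mine + rest;             /* 'mine' still holds this call's n */
}

/* Static local: one copy shared by every activation. */
static int sum_down_static(int n)
{
    static int mine;
    int rest;

    mine = n;
    if (n == 0)
        return 0;
    rest = sum_down_static(n - 1);
    return mine + rest;             /* deeper calls overwrote 'mine' with 0 */
}
```

sum_down_auto(4) returns 4+3+2+1 = 10, while sum_down_static(4) returns 0, because by the time each call adds 'mine' the innermost call has set it to 0.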

= Frank J. Nagy   "VAX Guru & Wizard"
= Fermilab Research Division EED/Controls
= HEPNET: WARNER::NAGY (43198::NAGY) or FNAL::NAGY (43009::NAGY)
= BitNet: NAGY@FNAL
= USnail: Fermilab POB 500 MS/220 Batavia, IL 60510

jayz@cullsj.UUCP (Jay Zorzy) (04/13/88)

From article <281@cullsj.UUCP>, by jayz@cullsj.UUCP (Jay Zorzy):
> It depends on the size of the variables.  Dynamic variables are allocated
> from the stack, so if you've got huge arrays, VMS will obviously have more
> work to do to allocate them on the stack.  Another point to consider, is
> that static variables are allocated during image activation in R/W, copy-
> on-reference (CRF) sections.  Depending on the frequency these sections
> are accessed, you may have paging activity to consider.

I've taken a lot of heat on this one.  Yes, it's true that all that's 
required to expand the stack is a simple SUBL instruction to reset the stack
pointer to a lower address in P1 space.  Where you pay a penalty is when this
address is beyond any previously allocated P1 space; the next instruction that
attempts to access this address will incur an access violation.  The access
violation is then handled by an exception service routine which, if possible,
must allocate additional virtual memory in P1 space.  

So there is some dynamic memory allocation overhead involved in some cases, 
particularly in recursive routines that have large local data segments.  The
difference is the memory is allocated during execution instead of image 
activation.

By the way, if the aforementioned exception service routine fails to allocate
the needed memory (insufficient PGFLQUO/VIRTUALPAGECNT/etc), then the access
violation is passed on to user or system condition handlers.  So an ACCVIO
with a reason mask bit 0 set and virtual address in P1 space (i.e. 3FFFFFFF
< VA < 80000000) is really an insufficient memory condition.

Speaking of stacks, here's a common gotcha to watch out for:  If you're 
issuing any asynchronous $QIOs (no wait, I/O completion handled by AST
routine), make sure to declare your IOSB variables as static.  Otherwise,
once the calling routine returns, the stack is reset, and when the I/O
completes the status is written to an IOSB address that may now be
never-never land (or some other routine's local variables).
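The hazard can be illustrated without VMS by simulating a request that records the address of a status block and fills it in later, as $QIO does with an IOSB. Everything here (struct status_block, start_request(), complete_request(), issue_read()) is a hypothetical stand-in, not the real system-service interface; the point is only that the block needs static storage so its address stays valid after issue_read() returns.

```c
/* Hypothetical stand-in for an I/O status block. */
struct status_block {
    unsigned short status;      /* 0 = in progress, 1 = success (simulated) */
    unsigned short count;       /* bytes transferred */
};

/* The simulated "device" remembers where to report completion. */
static struct status_block *pending_sb;

static void start_request(struct status_block *sb)
{
    sb->status = 0;
    pending_sb = sb;
}

/* Runs later, like an AST after the caller has returned. */
static void complete_request(unsigned short bytes)
{
    pending_sb->status = 1;
    pending_sb->count = bytes;
}

static struct status_block *issue_read(void)
{
    static struct status_block iosb;   /* static: outlives this call */

    start_request(&iosb);
    return &iosb;
}
```

Had iosb been an automatic variable, complete_request() would write into stack space that later calls may already have reused.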

Hope this information is useful...

Jay Zorzy
Cullinet Software
San Jose, CA