[comp.lang.c] Inline assembler; a quiz

mash@mips.UUCP (John Mashey) (06/19/87)

In article <21211@sun.uucp> guy%gorodish@Sun.COM (Guy Harris) writes:
>> >asm("<assembler code>");
>>    Isn't this spo'sta be apart of the "Standard C".  I seem to remember
>> it being metioned in the K&R.
>It is absolutely NOT supposed to be in standard C!

	.... Guy gives good list of reasons why not....
In support of at least one, our code generator normally passes its output
to the assembler in binary form, although assembler can be generated.
Use of asm would of course force the less efficient form, or force the
code generator to parse the asm statement itself, a distasteful thought.

More important:
And finally, "asm" and global-optimizing compilers are fairly
contradictory.  A good optimizer:
	a) Mostly ignores register declarations. It allocates
	registers as appropriate.  The same variable may well appear
	in several different places during the code.
	b) May, given slightly different source code, rearrange the
	register use substantially.
	c) Will find it VERY hard to figure out what an arbitrary
	"asm statement" is doing, in terms of side-effects.

Finally, some questions:
	a) What global optimizing C compilers do people use out there?
		Known ones: MIPS, Green Hills, Sun's, DEC [newest Ultrix
		one, I think], Multiflow, Tartan Labs, newest IBM one
		for RT PC, HP's Spectrum compiler.
		Any others?
		Any for PC's?
	b) What global optimizers do people use to compile their 
	UNIX kernels? i.e., which of them have implemented "volatile",
	without which device drivers are not easy.

Having used both asm and serious optimizers, I know which I'd rather have.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD:  	408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

jimv@omepd (Jim Valerio) (06/22/87)

This is a paraphrase and clarification of some mail that I sent to John Mashey
which he urged me to post to the net.

In article <464@winchester.UUCP> mash@winchester.UUCP (John Mashey) argues
against inline assembly language:
>And finally, "asm" and global-optimizing compilers are fairly
>contradictory.  A good optimizer:
>	a) Mostly ignores register declarations. It allocates
>	registers as appropriate.  The same variable may well appear
>	in several different places during the code.
>	b) May, given slightly different source code, rearrange the
>	register use substantially.
>	c) Will find it VERY hard to figure out what an arbitrary
>	"asm statement" is doing, in terms of side-effects.

I don't believe that inline assembly language and globally optimizing compilers
need to be contradictory.  In particular, the compiler I've been using here
implements inline assembly lanuage in such a way that (a) and (b) are transparent,
and (c) is not a problem in practice.

When I say inline assembly language what I mean is not the the usage found, for
example, in 4.3bsd Unix, where an arbitrary string is randomly inserted between C
statements.  The compiler I've been using here implements inline assembly language
functions using a construction similar to the asm function declarations found in
the PCC2 (?) C compiler.  A sample of our syntax and usage can be found at the end
of this article.

Basically, what the programmer does is declare a function in the usual C
way, but with the special storage class "asm".  The body of the function is coded
as a set of assembly language templates, one of which is selected by the compiler
depending on the storage class of the operands and results.
Registers, temporary storage, local labels, and so on are declared by name,
and the compiler just uses it's regular allocation schemes and naming conventions
to generate the true names.

The programmer is expected to follow certain sensible rules when writing the code
templates.  For example, the source operands to the function may not be modified
unless the compiler matches a template saying that the operand is a temporary.
Similarly, the compiler has sensible rules to follow when generating code that
interacts with code in asm functions.  For example, in the case that the compiler
does not or chooses not to recognize the side-effects of the instructions in the
selected asm template, it makes a worst-case assumption about side effects and
performs the cleanup actions it would do if a subroutine call were emitted.


I've found this asm function facility useful in a variety of situations.
I don't use it for inline functions that I can write in C, since we have
a separate facility for that (thanks, Steve).  Instead, I use it in the
following ways (and more).

1)  I've used it most in our floating-point math library, to cause calls
    to some basic floating-point function to emit the processor's equivalent
    instruction inline.

2)  I've used it to get at processor control registers, such as examining
    floating-point exception flags.

3)  I've used it to do double-precision and quad-precision integer arithmetic
    by defining the simple asm functions that get at the add-with-carry
    sorts of instructions, and the extended multiply and divide instructions.

4)  Last, I've found the asm functions to be useful for generating particular
    test cases for odd instruction combinations or instructions not normally
    generated by our compiler; it allows me to do most of the work using our
    regular programming environment, and diving down to the assembly language
    only for that particular case I care about.

Thus, in 3 cases the benefit is efficiency without sacrificing the advantages
of writing most of the code in C, and in the other case the benefit is
programmer convenience for knock-off programs.

In his mail to me, John argues that these 4 cases are less important to them
because in their implementation:
    a) leaf procedures like these only have a 2-cycle overhead,
    b) in most cases, features that cannot be gotten to from a high-level
       language don't exist,
    c) the compiler would need to know a great deal about the side effects,
       and a function call automatically has the right effects.

In our case, (c) is the same, (a) is nearly the same, and (b) is apparently
a legitimate difference.  (I suspect that the true cost of leaf procedures
is more than 2 cycles here, when you count overhead due to i-cache misses,
software calling conventions required for separate linking, and whatever.)


In conclusion (and despite the reasons John gives), I feel that the inline
assembly language function facility was well worth its implementation cost.
--
Jim Valerio	{verdix,intelca!mipos3}!omepd!jimv, jimv@omepd.intel.com

--- appended example, as promised above ---

/*
 * You will find below fragments of the standard header file that defines
 * the many inline and asm functions used in our floating-point support
 * library, followed by a sample function that uses these (and other)
 * functions.  In the "generic" function fp_exp, there is only 1 true subroutine
 * call: fp_fault.  All the other functions are either inline functions or
 * asm functions.
 */

/*
 * Return the current floating-point environment.
 */
asm
fp_env
fp_getenv(void)
{
%reglit return;
	modac	0,0,return
%error;
}

/*
 * Set the floating-point environment to `env',
 * and return the previous environment.
 */
asm
fp_env
fp_setenv(fp_env env)
{
%reglit return; reglit env;
	ldconst	0xdf1f0000,return
	modac	return,env,return
%error;
}


/*
 * Return the current exception flags.
 */
inline
fp_except
fp_getflags(void)
{
	register fp_env env;

	env = fp_getenv();
	return env.flags;
}

/*
 * Set exception flags to `flags', and return the previous
 * flags.
 */
asm
fp_except
fp_setflags(fp_except flags)
{
/*
 * Optimized cases when return value is ignored.
 */
%void return; const(0) flags; tmpreg mask; /* FPX_NONE */
	ldconst	31<<16,mask
	modac	mask,0,mask

%void return; const flags; tmpreg mask,tflags;
	ldconst	31<<16,mask
	ldconst	flags<<16,tflags
	modac	mask,tflags,mask

%void return; tmpreg flags; tmpreg mask;
	ldconst	31<<16,mask
	shlo	16,flags,flags
	modac	mask,flags,mask

%void return; reglit flags; tmpreg mask,tflags;
	ldconst	31<<16,mask
	shlo	16,flags,tflags
	modac	mask,tflags,mask

/*
 * Same as above, but return value required.
 */
%reglit return; const(0) flags; /* FPX_NONE */
	ldconst	31<<16,return
	modac	return,0,return
	shlo	11,return,return
	shro	27,return,return

%reglit return; const flags; tmpreg tflags;
	ldconst	31<<16,return
	ldconst	flags<<16,tflags
	modac	return,tflags,return
	shlo	11,return,return
	shro	27,return,return

%reglit return; tmpreg flags; tmpreg mask;
	ldconst	31<<16,return
	shlo	16,flags,flags
	modac	mask,flags,mask
	shlo	11,return,return
	shro	27,return,return

%reglit return; reglit flags; tmpreg mask,tflags;
	ldconst	31<<16,return
	shlo	16,flags,tflags
	modac	mask,tflags,mask
	shlo	11,return,return
	shro	27,return,return

%error;
}

/*
 * Restore the previously saved environment `env', adding in whatever
 * masked exceptions (from `flags') that have occurred.
 * The unmasked exceptions (i.e. traps) are returned.
 */
inline
fp_except
fp_restorenv(fp_env env, fp_except flags)
{
	register fp_except masks;

	masks = env.masks;
	env.flags |= flags & masks;
	fp_setenv(env);
	return (flags &~ masks);
}

/*
 * Scale `x' by `i' (i.e. multiply x by 2^i).
 */
asm
float
fps_scalb(float x, int i)
{
%reglit return; reglit x, i;
	scaler	i,x,return
%error;
}

asm
double
fpd_scalb(double x, int i)
{
%reglit(2) return; reglit(2) x; reglit i;
	scalerl	i,x(0),return(0)
%error;
}


/*
 * The exponential function e^x is approximated using the mathematical
 * identity:
 *	e^x = 2^(log2(e) * x)
 * Since the underlying approximation function is 2^f - 1, for abs(f) <= 0.5,
 * the actual algorithm used is
 *	e^x = scale( (2^f - 1) + 1, I)
 * where (log2(e) * x) = I+f and I is the (IEEE) nearest integer to
 * (log2(e) * x).
 */
GENERIC
fp_exp(GENERIC x)
{
	register GENERIC r;
	register fp_except traps;
	register fp_env env;

	switch (fp_class(x)) {

	case FPC_POSZERO:
	case FPC_NEGZERO:
		return (GENERIC)1.0;

	case FPC_POSINF:
		return x;

	case FPC_NEGINF:
		return (GENERIC)0.0;

	case FPC_POSQNAN:
	case FPC_NEGQNAN:
	case FPC_POSSNAN:
	case FPC_NEGSNAN:
		env = fp_setenv(FP_DEFENV);
		r = fp_nan1(x); /* propagate NaN */
		break;

	case FPC_POSNORM:
	case FPC_NEGNORM:
	case FPC_POSDENORM:
	case FPC_NEGDENORM:
		env = fp_setenv(FP_DEFENV);

		/*
		 * We make a range check here to avoid two different overflow
		 * conditions.  If x is a very large extended precision number,
		 * then x*log2(e) can overflow to infinity, which will then
		 * precipitate an invalid operation exception when computing
		 * ex-ei.  The more likely overflow avoided by this check is
		 * when ei is too large to fit in an integer.  In no case
		 * should exp() ever signal integer overflow.
		 */
		if (fp_abs(x) < (GENERIC)65536.0) {
			register long double ex, ei;

			/*
			 * Two subtle points here:
			 * (1) Exp2m1() might generate a spurious underflow
			 *     when ei = 0.  The spurious flag must be cleared.
			 *     The true underflow (and overflow) indication
			 *     comes from the scale() operation.
			 * (2) The inexact exception will always be signaled
			 *     because either the multiplication or the round()
			 *     operation (and usually both) will signal inexact.
			 */
			ex = x * log2_e;
			ei = fpe_round(ex);
			r = (GENERIC)1.0 + fpe_exp2m1(ex - ei);
			fp_clrflags(FPX_UNFL);
			r = fp_scalb(r, (int)ei);
		} else {
			/*
			 * When x >= 2^16, overflow is certain.
			 * When x <= -2^16, underflow is certain.
			 */
			if (x > 0) {
				fp_setflags(FPX_OVFL | FPX_INEX);
				r = fp_posinf;
			} else {
				fp_setflags(FPX_UNFL | FPX_INEX);
				r = (GENERIC)0.0;
			}
		}
	}
	traps = fp_restorenv(env, fp_getflags());
	if (traps == FPX_NONE)
		return r;
	return fp_fault(FPSL_EXP, traps, r, x);
}

mcg@omepd (Steven McGeady) (06/22/87)

My colleague, Jim Valerio, posted a long response (with which I agree) to
John Mashey's negative comments about the value of asm inserts in C.  Jim
includes a long batch of example code, which, I fear, will be incomprehensible
to someone who doesn't know the meaning of the constructs used in the asm
inserts.  Herewith follows a (slightly sanitized) version of the
specification for the asm inserts that Jim refers to.  They were designed by
Jim and myself based on some ideas gleaned from the 3B2 PCC-2 C compiler.

This specification is implemented in an existing compiler.

Questions may be addressed to me.

S. McGeady
Intel Corp.
(503) 696-4393

This specification is:
	 (c) Copyright, 1986, 1987, Intel Corporation, all rights reserved




               Assembly-Language Inclusion in C Code
                             Third Draft

                             S. McGeady
                             Jim Valerio

  
                        Language Software Group
                     Component Development Support
                    Oregon Microcomputer Engineering
                              Intel Corp.


Introduction and Background

     The mechanism defined here is very similar to, and derived from the
mechanism provided by the AT&T System V.3 PCC2 (QCC) compilers for the
3b2 and Intel 386 processors.

     In this draft of this document, the mechanism has been significantly
extended and generalized from the PCC2 version, but retains its flavor.

1.  Traditional Mechanism

     If the keyword 'asm' is seen, immediately followed by a parenthesis
and a quoted string, e.g.:

       asm("movq    g0,r0");

then the quoted string is stripped of the quotes and emitted.  If the
compiler is generating assembly language output, the compiler does no
interpretation on the string.  The string may be an instruction, pseudo-
operation, or any other string, whether legal or illegal P7 assembly 
language.  If the compiler is generating COFF output, the compiler
transparently switches to assembly-output mode.  The front_end emits a
special return code to the driver instructing it to invoke the assembler.

     The asm() statement may occur anywhere a C statement may occur.  The
asm() statement is *not* an expression, and has no return value.

     Unlike the PCC-class of compilers, asm() may not occur in declaration
sections of programs.  Since the compiler does complex data reordering, 
assembler directives (e.g. .align) embedded in the data section may not 
have the intended effect.

     The traditional mechanism is provided in the compiler as well as a new
mechanism that is defined below.

2.  New Mechanism

     If a function declaration has the storage class asm, e.g.:

       asm int foo() {...}

then the function declaration and any formal parameters are parsed normally,
but formal declarations of the formal parameters must be present, in the
prototyped form, and text within the function block (between the
'{'...'}') is interpreted specially.  The interpretation is described
below.


     This form of asm insertion effectively allows the programmer to
dynamically define new code-generation templates to be used by the compiler.
A single assembly insertion may specify several templates, of which one is
matched at code-generation time.  This allows very flexible in-line insertion
of special-purpose code sequences which would otherwise need to be hard-
wired into the compiler.

     The declaration of an assembly insertion is the same as for any other
procedure.  The declaration specifies the types of the arguments, in 
the new (function prototype) style.

     Asm insertions must be defined before any use in a module in order to
get well-defined semantics.  A program that calls a procedure with the 
name of a later-defined asm function is in error, though this error may
go undiagnosed by the compiler.

     The code within the block describes a set of assembly-language sequences
which may be expanded in-line at the site of calls to the named function.
For example:

       asm int test(int a, int b) {
       %    reglit a,b,return; label lab1;

       lab1:
            test a,b
            move a,return
            beq lab1

       }

       main() {
            register int x,y;
            register int a;


            ...
            a = test(x,y);

       }

generates this code:

       _main:
            # entry code
            # ...
            # x is in r4, y is in r5, a is in r6
       L0001:
            test r4,r5
            mov  r4,g0
            beg  L0001

            mov g0,r6
            # ...


2.1  Syntax of Asm Functions

     The contents of an asm block are interpreted as follows:



       asm_block::        asm_line +   ;

       asm_line::         '%' [control]*
                |    expansion_line
                |    blank_line
                ;

       control::     arg_def
                |    'error' [text] ';'
                |    'call' [text] ';'
                |    'spillall' ';'
                ;
       arg_def::     [ class name [',' name ]* ';']*
                |    'use' reg [',' reg*]* ';'
                ;

       name::              formal_param_id
                |    local_var_id
                ;

       expansion_line::    text [operand [',' operand]*]
                |    label ';'
                ;

       
       operand::     formal_param_id
                |    formal_param_id '('[[0-9]+['-'[0-9]+]]')'
                |    local_var_id
                |    local_var_id '('[[0-9]+[0-9]+]] ')'
                |    'return'
                |    'return('[[0-9]+['-'[0-9]+]] ')'
                ;




       class::       'reglit'
                |    'reglit('[[0-9]+[0-9]+]]')'
                |    'tmpreg'
                |    'tmpreg('[[0-9]+['-'[0-9]+]] ')'
                |    'freglit'
                |    'ftmpreg'
                |    'const'
                |    'const('[[0-9]+[':'[0-9]+]]')'
                |    'label'
                |    'void'
                ;

Storage classes have the following meanings:

       reglit     - any register, or constant that
                  can be used as a literal

       reglit(n)  - same as reglit, except a register pair, triple or
                  quad

       tempreg    - any register being used as a
                  temporary (anonymous variable,compiler temporary)

       tempreg(n) - as above, except a pair, triple, or quad

       freglit    - any floating-point register or literal

       ftmpreg    - any floating-point register being used as
                      a temporary (anonymous variable, compiler temporary)

       const      - any constant

       const(n)   - any constant with the given value or range

These storage classes are also available, but may not be applied to formal
parameters:

       label  - a label
       use    - special (see below)
       void   - applied only to 'return' (see below)

     Paramater and variable names (defined in an argument definition
statement) may be any C identifiers, except that identifiers that are
assembler reserved words are reserved (specifically register names).

     Each control line (those beginning with '%') marks the beginning of
an assembler expansion leaf.  An expansion leaf is a section of code that
will be expanded inline in the calling routing if the actual parameters
provided by the caller match those defined in the argument definition
statement.  Control lines may be continued by a backslash character'.
There can be only one control line per expansion leaf.

     An assembler pseudo-function may have zero or more expansion leaves,
%error leafs, or %call leaves.

     Each leaf represents a different set of code that may be expanded 
into the caller under certain circumstances, except %error which signals
a compile-time error to the user.

 

2.2.  Expansion Semantics

     When a call to an asm pseudo-function occurs, a part of the
contents of the asm pseudo-function is expanded in-line at the call location,
using the following rules:

1)  The actual arguments are coerced into the types of the formal arguments
    to the asm function, if necessary.  If this coercion occurs, or if they
    were expressions, these values will be in temporary registers, (tmpreg
    or ftmpreg) otherwise they will be in registers or memory locations.

2)  The storage classes (register, temporary registers, floating-register, 
    constant, or memory) of the actual arguments and the return value are
    ascertained, and the asm block is searched for argument definition
    control lines that define corresponding storage classes for the formal
    parameters of asm pseudo-function using the following rules:

    a)  tmpreg and tmpreg(n) match actual arguments that are in one or 
        more temporary registers as the result of coercions or because 
        the argument was an expression;  if the actual arguments are in
        memory (not currently allocated to a register),
        then a set of temporary registers is allocated and code is emitted
        prior to the assembly insert to move the value into these registers;
        if the actual argument is a constant, it is handled as though the
        the argument was in memory, but a loadconst instruction is emitted
        instead of a normal load; n is 1,2,3,4.
      
    b)  reglit and reglit(n) match actual arguments that are in one or
        more local or global registers, or are integer literals; if
        the actual argument is in memory, or the constant is out of
        literal range, then the rules for tmpreg are used; n is 1,2,3,4.

    c)  ftmpreg matches an actual argument that is in a floating-point
        register, and is a temporary, as defined above; if the actual is
        in memory, then a floating point temporary register is allocated
        and code is emitted to move the value there;

    d)  freglit matches an actual argument that is in a floating-point
        register, or a floating-point literal; if the actual argument
        is in memory, then a floating point temporary register is allocated
        and code is emitted to move the value there;

    e)  const matches any constant expression(integral type); const(n) 
        matches a constant expression whose value is n; const(n..m) matches
        a constant expression in the (inclusive) range n..m; as a special
        case;

    f)  the return specifier is handled specially:  the return specifier
        void is matched only if the return value of the insertion is not
        used; a return specifier typed as reglit or tmpreg (or their
        length-specified variants) matches a return destination in the
        appropriate number of registers; if the return destination is in
        memory, then a set of registers is allocated and code is generated
        after the assembly insertion to move the return value from the
        allocated registers into the appropriate memory location; the
        same applies to freglit and ftmpreg for floating-point returns;
        any return specifier matches an ignored return value - a register
        is allocated for the return value, but is otherwise ignored;


     g) label, void, and use never match actual arguments - application
        of label or use class to a formal argument is an error.

3)  If a leaf is found all of whose formal argument storage class match
    the actual argument storages classes, the code in the leaf is expanded,
    using the leaf expansion rules below;

4)  If the leaf matched is a %call, the asm pseudo-function call is coerced
    to a call to the argument to the %call control, or, if there is no
    argument, to a call to an external procedure of the same name as asm
    pseudo-function.  The compiler generates normal function-call entry
    code;  if no match or no %error line is found then a default call to
    an external procedure with the same name as the asm pseudo-function
    will be generated.

5)  If the %error specification is matched, a compile-time error is
    generated and no code is expanded;  if the %spillall specification is
    found, all aliased values currently cached in registers are spilled to
    memory by the compiler;

6)  If parameters are found in the argument definition control line that 
    do not correspond to formal parameters, then these parameters are
    considered local variables.  These local variables may be of any
    register storage class except use and const.  If a register variable is
    introduced, a temporary variable is allocated by the compiler (potentially
    of the global or local class).  If a memory variable is allocated, an
    integer-wide place on the caller's stack is allocated.  If the storage
    class is label, a compiler-generated label is allocated.

7)  If the storage class use is found in an argument definition control line,
    it serves to inform the compiler that the registers listed as the use
    statement's parameters are used by the code in the asm leaf, and those
    registers, if live at entry to the asm block must be saved prior to
    entry and restored upon exit from the asm block.

8)  During expansion, any token which fully matches a parameter is substituted
    with the value of that paramater, as defined above.  No explicit checking
    of the validity of the resulting assembler statement is performed.  For
    reglit or treglit parameters, if the parameter was specified as a register
    set (multiple registers), then the identifier for this parameter may have
    "(n)", specifying which if the registers in the set to use, in the
    range 0..3;


2.3 Special Notes

1)  Assembly inserts are not allowed to return or have as parameters
    aggregates longer than 4 words;

2)  Enums are handled as normal integers;

3)  Char and short types are promoted using the normal rules;

4)  Preprocessing is done inside asm inserts as within any other text; 
    This means that lines that have a '#' as the first non_white character
    are in error unless they are preprocessor control lines.

5)  Assembly inserts are not allowed to specify register sets longer
    than 4 words;

6)  A return must be specified as a register class.



-----------------------------------------------------------
the end
-----------------------------------------------------------

mash@mips.UUCP (John Mashey) (06/22/87)

In article <829@omepd> jimv@omepd.UUCP (Jim Valerio) writes:
>This is a paraphrase and clarification of some mail that I sent to John Mashey
>which he urged me to post to the net.
To summarize this discussion, it started with some mnetion of asms.
I observed that asms got in the way of optimizers.
Jim sent me a well-done note that said asms were OK, especially if done
as they had done them.  When apparently reasonable people disagree, it's
often worth looking for deeper reasons to gain insight about the different
conditions that lead to this.

jim says:
>When I say inline assembly language what I mean is not the the usage found, for
>example, in 4.3bsd Unix, where an arbitrary string is randomly inserted between
>C statements.....

>In his mail to me, John argues that these 4 cases are less important to them
>because in their implementation:
>    a) leaf procedures like these only have a 2-cycle overhead,
>    b) in most cases, features that cannot be gotten to from a high-level
>       language don't exist,
>    c) the compiler would need to know a great deal about the side effects,
>       and a function call automatically has the right effects.
>
>In our case, (c) is the same, (a) is nearly the same, and (b) is apparently
>a legitimate difference.  (I suspect that the true cost of leaf procedures
>is more than 2 cycles here, when you count overhead due to i-cache misses,
>software calling conventions required for separate linking, and whatever.)
Not really, or at most 2+ a fraction.
>
>In conclusion (and despite the reasons John gives), I feel that the inline
>assembly language function facility was well worth its implementation cost.

Actually, I'd agree with Jim, for his environment.  Note, however, that
what he's talking about is REALLY a way to dynamically add something like
code-generation templates to the compiler (not exactly, but much more like
that than the typical asm statement).

The deeper generalizations that I proposed are as follows:

I think there is a spectrum of reasonable viewpoints, based on 3 things:
a) Overall architecture: extremes are:
	A1: architecture has many features not easily generated
	directly by compilers
	A2: architecture has few such features
b) Subroutine call speed [especially for minimal routines]:
	B1: very slow
	B2: very fast
c) Compiler technology level
	C1: pcc or near-equivalent
	C2: all-out optimization technology (IBM's internal one
	for the RT PC), HP's, MIPS
In general, the more of the "1" cases, the more likely that "asm",
especially a good one of the type you've described, seems to be useful.
The more of the "2" cases, the less interest people are likely to have,
not because there are NO cases where it would be useful, but because it
seems to be not worth the effort.

Bottom line: I think both positions are reasonable; they just get
there from different assumptions.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD:  	408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

bright@dataio.Data-IO.COM (Walter Bright) (06/23/87)

In article <464@winchester.UUCP> mash@winchester.UUCP (John Mashey) writes:
<More important:
<And finally, "asm" and global-optimizing compilers are fairly
<contradictory.  A good optimizer:
<	a) Mostly ignores register declarations. It allocates
<	registers as appropriate.
Datalight C can even allocate more than one variable to a specific register,
if their usages do not overlap. This is known as 'register allocation by
coloring'.
<	b) May, given slightly different source code, rearrange the
<	register use substantially.
<	c) Will find it VERY hard to figure out what an arbitrary
<	"asm statement" is doing, in terms of side-effects.
It's not that hard. Assuming that the asm doesn't contain jumps out of
or into itself, flow analysis can be done over asm statements by assuming
that they can but not necessarilly do modify everything.
<Finally, some questions:
<	a) What global optimizing C compilers do people use out there?
<		Known ones: MIPS, Green Hills, Sun's, DEC [newest Ultrix
<		one, I think], Multiflow, Tartan Labs, newest IBM one
<		for RT PC, HP's Spectrum compiler.
<		Any others? Any for PC's?

The Datalight C compiler is a globally optimizing compiler. It does
most of the classic global optimizations that depend on data flow analysis.
For example, it does constant propagation, copy propagation, very busy
expressions, partially redundant expressions, loop induction variables,
register allocation by coloring, etc.

Disclaimer: I have no association with Datalight other than the fact that
I wrote the compiler and receive money for it.

peter@sugar.UUCP (Peter DaSilva) (06/30/87)

Not all C compilers are running on UNIX.

Some operating systems require heavy assembly language glue around
system calls. It's so much easier to put this glue in asm statements
in 'C' source.
-- 
-- Peter da Silva `-_-' ...!seismo!soma!uhnix1!sugar!peter (I said, NO PHOTOS!)

henry@utzoo.UUCP (Henry Spencer) (07/06/87)

> Not all C compilers are running on UNIX.
> 
> Some operating systems require heavy assembly language glue around
> system calls. It's so much easier to put this glue in asm statements
> in 'C' source.

I have news for you:  Unix systems generally require assembly-language glue
around system calls too, since virtually no Unix system does system calls
exactly the same way it does function calls.  (If nothing else, the actual
call instruction almost invariably differs.)  The standard solution, used
by the original Unix and still used by most Unixes, is that the "system
call" in C is just an ordinary function call, and the function that's called
is written in assembler and does the actual system call.

The original PDP-11 Unixes had grossly non-C system-call conventions and
worked fine despite a C compiler that never supported "asm" at all.
-- 
Mars must wait -- we have un-         Henry Spencer @ U of Toronto Zoology
finished business on the Moon.     {allegra,ihnp4,decvax,pyramid}!utzoo!henry

francus@cheshire.columbia.edu.UUCP (07/07/87)

In article <8253@utzoo.UUCP> henry@utzoo.UUCP (Henry Spencer) writes:
>> Not all C compilers are running on UNIX.
>> 
>> Some operating systems require heavy assembly language glue around
>> system calls. It's so much easier to put this glue in asm statements
>> in 'C' source.
>
>I have news for you:  Unix systems generally require assembly-language glue
>around system calls too, since virtually no Unix system does system calls
>exactly the same way it does function calls.  (If nothing else, the actual
>call instruction almost invariably differs.)  The standard solution, used
>by the original Unix and still used by most Unixes, is that the "system
>call" in C is just an ordinary function call, and the function that's called
>is written in assembler and does the actual system call.
>
>The original PDP-11 Unixes had grossly non-C system-call conventions and
>worked fine despite a C compiler that never supported "asm" at all.
>-- 
>Mars must wait -- we have un-         Henry Spencer @ U of Toronto Zoology
>finished business on the Moon.     {allegra,ihnp4,decvax,pyramid}!utzoo!henry

Well, if you look at the source code, if you have source, you'll notice
that most - if not all - of the system call found in Section 2 of the
manual are written in 'C'. In fact even kernel system calls like namei
are written in 'C'. (By kernel system call I simply mean that the user
does not have access to these functions.)  WHat is interesting is that
the strcmp family, section 3 functions, are written in assembly on many
macines, the Amdahl UTS is one that comes to mind.

Yoseff

******************************************************************
yf
In Xanadu did Kubla Khan a stately pleasure dome decree
But only if the NFL to a franchise would agree.

ARPA: francus@cs.columbia.edu
UUCP: seismo!columbia!francus

guy@gorodish.UUCP (07/07/87)

> Well, if you look at the source code, if you have source, you'll notice
> that most - if not all - of the system call found in Section 2 of the
> manual are written in 'C'. In fact even kernel system calls like namei
> are written in 'C'.

You've completely missed Henry's point.  Henry has source and is
quite familiar with the UNIX kernel, so he is certainly aware that
most of the kernel is written in C.  The point is that the code that
USER-mode programs use to make system calls is written in assembler.
The original poster was claiming that you needed to have support for
embedded assembler language in C compilers in order to perform system
calls from C programs under OSes other than UNIX, since the interface
to those system calls involves some code that cannot be expressed in
pure C.  Henry was pointing out that the UNIX system call interface
also generally requires code that cannot be expressed in pure C, but
that UNIX has gotten along quite well now without requiring embedded
assembler language to make system calls; there are small "wrapper"
routines written in assembler language that are called by C code.

> WHat is interesting is that the strcmp family, section 3 functions, are
> written in assembly on many macines, the Amdahl UTS is one that comes to
> mind.

That's not particularly surprising; many machines have idioms that
can be used to speed up string processing, but it may be hard for a C
compiler to figure out when to use those idioms, or it may be that
the compiler would generate them so infrequently (e.g., only in the
source code to "strcmp" or equivalent code) that it's not worth
teaching the compiler about them.
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com

wesommer@bloom-beacon.UUCP (07/07/87)

In article <4788@columbia.UUCP> francus@cheshire.columbia.edu.UUCP (Yoseff Francus) writes:
>In article <8253@utzoo.UUCP> henry@utzoo.UUCP (Henry Spencer) writes:
>>I have news for you:  Unix systems generally require assembly-language glue
>>around system calls too, since virtually no Unix system does system calls
>>exactly the same way it does function calls.  (If nothing else, the actual
>>call instruction almost invariably differs.)  The standard solution, used
>>by the original Unix and still used by most Unixes, is that the "system
>>call" in C is just an ordinary function call, and the function that's called
>>is written in assembler and does the actual system call.
>
>Well, if you look at the source code, if you have source, you'll notice
>that most - if not all - of the system call found in Section 2 of the
>manual are written in 'C'. 

Yes, but to get to them, you have to use some form of assembler "glue"
to warp between user mode and kernel mode.. for example, on the VAX,
you use a "chmk" instruction (CHange Mode to Kernel); this causes the
processer to jump through a trap vector through some kernel assembler
code in locore.s, and into the "syscall" routine.  This winds up
copying the system call arguments from user space into a known area of
kernel space (pointed at by u.u_ap), and then calls one of the
top-level system call functions.  These functions thus get their REAL
arguments by looking at u.u_ap.

				Bill Sommerfeld
				wesommer@athena.mit.edu

francus@cheshire.columbia.edu (Yoseff Francus) (07/07/87)

In article <1084@bloom-beacon.MIT.EDU> wesommer@priam.UUCP (William Sommerfeld) writes:
>In article <4788@columbia.UUCP> francus@cheshire.columbia.edu.UUCP (Yoseff Francus) writes:
>>In article <8253@utzoo.UUCP> henry@utzoo.UUCP (Henry Spencer) writes:
>>>I have news for you:  Unix systems generally require assembly-language glue
>>>around system calls too, since virtually no Unix system does system calls
>>>exactly the same way it does function calls.  (If nothing else, the actual
>>>call instruction almost invariably differs.)  The standard solution, used
>>>by the original Unix and still used by most Unixes, is that the "system
>>>call" in C is just an ordinary function call, and the function that's called
>>>is written in assembler and does the actual system call.
>>
>>Well, if you look at the source code, if you have source, you'll notice
>>that most - if not all - of the system call found in Section 2 of the
>>manual are written in 'C'. 
>
>Yes, but to get to them, you have to use some form of assembler "glue"
>to warp between user mode and kernel mode.. for example, on the VAX,
>you use a "chmk" instruction (CHange Mode to Kernel); this causes the
>processer to jump through a trap vector through some kernel assembler
>code in locore.s, and into the "syscall" routine.  This winds up
>copying the system call arguments from user space into a known area of
>kernel space (pointed at by u.u_ap), and then calls one of the
>top-level system call functions.  These functions thus get their REAL
>arguments by looking at u.u_ap.
>
>				Bill Sommerfeld
>				wesommer@athena.mit.edu
The point I'm trying to make is that on Unix the programmer doesn't
have to worry about the assembler "glue" thats being used.  Not only
can a programmer use a system call in the same manner as a function
call, but can even get an understanding of what the system call is
doing (what tables is it modifying etc.) because the guts of it
are also written in 'C'.

No argument from me that even UNIX needs things like machine.s,
trap.s etc.

Yoseff





******************************************************************
yf
In Xanadu did Kubla Khan a stately pleasure dome decree
But only if the NFL to a franchise would agree.

ARPA: francus@cs.columbia.edu
UUCP: seismo!columbia!f/V.p:
>

guy%gorodish@Sun.COM (Guy Harris) (07/08/87)

> The point I'm trying to make is that on Unix the programmer doesn't
> have to worry about the assembler "glue" thats being used.  Not only
> can a programmer use a system call in the same manner as a function
> call,

And the point that many others have been making is that this is in NO
WAY unique to UNIX; other OSes do the same thing.  Even if the OS
doesn't provide assembler wrappers for system calls as part of its
standard library (or if the C compiler doesn't do so), these wrappers
can be written *without* recourse to "asm".  The original posting
seemed to imply that UNIX was somehow unique in that you could get
away without "asm", but that C implementations on other OSes
couldn't; this isn't true in general, because the same techniques
used on UNIX can be used on those other OSes.

> but can even get an understanding of what the system call is
> doing (what tables is it modifying etc.) because the guts of it
> are also written in 'C'.

Well, first of all, you can't do that if you don't have source, and
many - probably most - UNIX sites don't have source.  Second, I
wouldn't go so far as to say that *any* programmer could get an
understanding of what the system call is doing just by looking at the
source code; some stuff in the system is very complex and subtle, and
won't be obvious without a LOT of study.  Third of all, UNIX is NOT
the only OS not entirely written in assembler.  And fourth, this has
nothing to do with what the original discussion was about, anyway.
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com

peter@sugar.UUCP (Peter DaSilva) (07/10/87)

In article <8253@utzoo.UUCP> henry@utzoo.UUCP (Henry Spencer) writes:
>> [my comments about operating systems needing heavy glue around sys calls]
>
>[UNIX systems also require glue arouns system calls]

Well, yes. But when you get a UNIX 'C' compiler you also get a library that
provides that glue for all the calls. Those of us with Amigas, Atari STs,
and IBM-PCs (in increasing order of grossness of O/S interface) find that
a lot of glue must be added in production programs. In the Amiga this is
pretty much restricted to high level interrupt handlers, but still it's
a pain to have a bunch of assembly files around. In the Atari ST the calling
conventions are mostly C, but there are some things you need to go into
assembly to do (handling ^C (!), writing desk accessories, etc). On the
IBM-PC, it's a nightmare. Many of the calls return data that high level
languages can't easily deal with even after you wipe off the glue.

>The original PDP-11 Unixes had grossly non-C system-call conventions and
>worked fine despite a C compiler that never supported "asm" at all.

I know. I did a port of John James' PDP11 FIG-Forth to Version 7. But, once
again, you get the glue with the kit.

>Mars must wait -- we have un-         Henry Spencer @ U of Toronto Zoology
>finished business on the Moon.     {allegra,ihnp4,decvax,pyramid}!utzoo!henry

Have you ever heard Hope Eyrie?
-- 
-- Peter da Silva `-_-' ...!seismo!soma!uhnix1!sugar!peter (I said, NO PHOTOS!)

thomson@uthub.UUCP (07/10/87)

In article <22887@sun.uucp> guy%gorodish@Sun.COM (Guy Harris) writes:

>Well, first of all, you can't do that if you don't have source, and
>many - probably most - UNIX sites don't have source.

In fact, if they are clients of a certain solar workstation manufacturer,
they don't even have documentation (/usr/doc, that is).
-- 
		    Brian Thomson,	    CSRI Univ. of Toronto
		    utcsri!uthub!thomson, thomson@hub.toronto.edu

guy%gorodish@Sun.COM (Guy Harris) (07/13/87)

> >Well, first of all, you can't do that if you don't have source, and
> >many - probably most - UNIX sites don't have source.
> 
> In fact, if they are clients of a certain solar workstation manufacturer,
> they don't even have documentation (/usr/doc, that is).

Big deal.  You get documentation, you just don't get the "troff"
source to it.  How many vendors *do* supply that stuff?  (Besides,
getting it wouldn't do you any good; we use our own macro packages
and other tools, and it would be too much trouble to supply and
support them.)
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com

henry@utzoo.UUCP (Henry Spencer) (07/13/87)

> > In fact, if they are clients of a certain solar workstation manufacturer,
> > they don't even have documentation (/usr/doc, that is).
> 
> Big deal.  You get documentation, you just don't get the "troff"
> source to it.  How many vendors *do* supply that stuff?

Well, I recall one that did.  No support, no bug fixes, no help, no advice,
don't-call-us-we'll-call-you... but they did ship sources.  Funny, their
system became awfully popular for some reason...

(For those who haven't figured it out, I'm talking about AT&T and the original
Unix distributions.)

Shipping the printed copies only is just dandy if the user wants to run your
software absolutely straight out of the box.  Oddly enough, many don't, and
as a result they end up writing their own documentation from scratch.  Not
that Sun's documentation is any great shakes, in fact it stinks, but it
might be a useful starting point for some people.

> (Besides,
> getting it wouldn't do you any good; we use our own macro packages
> and other tools, and it would be too much trouble to supply and
> support them.)

Funny, I seem to recall IBM using an explanation along those lines when asked
why they didn't supply sources in a high-level language.  "Well, we've got
all this great stuff, but we can't be bothered letting the peons, oops I
mean the customers, have it."  Is this really the model Sun wants to emulate?
-- 
Mars must wait -- we have un-         Henry Spencer @ U of Toronto Zoology
finished business on the Moon.     {allegra,ihnp4,decvax,pyramid}!utzoo!henry

guy%gorodish@Sun.COM (Guy Harris) (07/14/87)

> Well, I recall one that did.  No support, no bug fixes, no help, no advice,
> don't-call-us-we'll-call-you... but they did ship sources.  Funny, their
> system became awfully popular for some reason...

Funny thing is, it became a lot more popular, at least by the count
of the number of systems sold outside AT&T, after the V7 license came
out; most of those systems didn't come with source, because there was
an order-of-magnitude difference between the price of the source
license and the binary license.

Furthermore, there are a lot of customers out there who probably
wouldn't make use of the source if they had it.  I don't think it's
current popularity is due to the availability of sources to all
customers, since most of them don't have it; it may be due, in part,
to the availability of source to the select few who *could* make use
of it, but that's a different matter.

I would be very surprised if, for example, most organizations doing
desktop publishing would prefer to get the source to the OS and the
desktop publishing software running on their workstation and getting
"no support, no bug fixes, no help, no advice" from their vendors, to
getting what level of support their vendor offers but no source.
There are certainly customers who, because they have a sufficiently
expert staff (and are, perhaps, having trouble with vendor support),
*would* be better off under those circumstances.  There are certainly
others who would still not be better off.

It's a question of tradeoffs.  I have seen no evidence that the
tradeoff made by the original UNIX community is appropriate for all
potential customers of systems with UNIX as their OS.  The fact that
it happened to be the correct tradeoff for a technically
sophisticated group of users with specialized needs that vendors
often have difficulty responding to in no way indicates that it would
be the correct tradeoff for an organization with few or no
programmers in it.

> Funny, I seem to recall IBM using an explanation along those lines when asked
> why they didn't supply sources in a high-level language.  "Well, we've got
> all this great stuff, but we can't be bothered letting the peons, oops I
> mean the customers, have it."  Is this really the model Sun wants to emulate?

How the hell should I know?  I don't make Sun policy on this matter;
I don't make Sun policy on *most* matters.  I resent the fact that
Brian somehow seemed to consider a complaint about Sun's policy on
this to somehow be an appropriate response to my comment.

The point *I* was making was that UNIX source is not something that
every UNIX user has available; at this point, it may be that *most*
UNIX users don't have it.  Some consider this a Bad Thing; I don't,
because I'm not particularly interested in having only those people
who could make use of this source and are willing to fork out the
dough to AT&T and other vendors for it.  And no, that's not just for
purely pecuniary reasons; I think UNIX has a lot to offer people who
don't fit into that category, and I like the idea of getting it out
to those customers.  I would be overjoyed if the number of UNIX sites
where the C compiler is *never* used was an appreciable fraction of
the number of sites where it is used.

The claim that you don't have documentation if you don't have it in a
machine-readable form is, when taken as a general claim, absurd.
There are users who could use, or who need, the machine-readable
documentation.  There are others who have little use for it; they
don't have the time to spend learning how to produce that
documentation, modifying it, or printing it.

The original UNIX releases were done in a fashion that worked out
very well for its original constituency.  I am well aware of that,
having worked with it since 1977 or so.  However, the original
constituency no longer represents all of the UNIX community; it may
not even represent a majority of the community.  The response by
vendors to this change has caused some problems for the original
constituency.  This is unfortunate, and it may be that some of the
problems are due to inappropriate responses to these changes, but I
don't think that all of it is.

As for documentation tools:  there *would* be some effort involved in
supplying those tools.  I have no idea whether the cost of supplying
those tools exceeds the benefits of doing so.  I assume that the
people directly involved in this have reason to believe that the
costs currently outweigh the benefits.
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com

lyndon@ncc.UUCP (Lyndon Nerenberg) (07/21/87)

In article <23262@sun.uucp>, guy%gorodish@Sun.COM (Guy Harris) writes:
> 
> Big deal.  You get documentation, you just don't get the "troff"
> source to it.  How many vendors *do* supply that stuff?  (Besides,
> getting it wouldn't do you any good; we use our own macro packages
> and other tools, and it would be too much trouble to supply and
> support them.)

That's a bit of a cheap shot! From what I've seen, Sun tends to
supply alot of "nice" things other vendors leave out. Why scrimp
on /usr/doc? If you use your own macro packages to produce the doc
then they (the macros) already exist, so there is no (non-politcol)
reason that they can't be shipped with the troff source. Nobody
says you have to support the macro package... Even without the "tools",
it's nice to be able to troff a basic copy of things to use when
you're working at home on the weekend (yes photocopiers exist,
but then again, why do people buy laser printers?)

--lyndon

-- 
Ollie for president: the tradition continues.

steve@nuchat.UUCP (Steve Nuchia) (08/01/87)

In article <23369@sun.uucp>, guy%gorodish@Sun.COM (Guy Harris) writes:
> The claim that you don't have documentation if you don't have it in a
> machine-readable form is, when taken as a general claim, absurd.

First, I applaud your stated desire to get unix out to the non-tech types.

that said,  F L A M E   O N  !!!!

If you don't have documentation in machine readable form, and you
have more than one user, all but one of your users don't have documentation.

If you don't have documentation in machine readable form, and you
have a modem, anyone who uses the modem doesn't have documentation.

FLAME FLAME flame smolder flame spark smolder glow smoke poof.

I made this point to microport after the Nth time a user of my machine
complained about nothing being documented.  They said "but we give
you such nice printed manuals", which is true.  And which does my
users exactly no good, since they all call in on the modem.  So I
snagged the man pages from my opus, which are close but by no means
exactly the same as the microport.

A company I worked for used a plexus as the development station at
the time and having the source to all the documentation was nice.
It allowed us to look things up without leaving our desks.  It allowed
us to learn how to write man pages and more elaborate ?roff documents
by example.  It made us feel that the manufacturer cared, or at least
wasn't actively hostile.

You're often on the right side of an argument Guy, but you're dead
wrong on this one.

	Steve Nuchia
	(713) 334 6720  voice   (713) 334 1204  2400N81  login "trouble"
	{sun!housun,{soma,academ}!uhnix1}!nuchat!steve