[comp.sys.ibm.pc] Passing pointers in C to 8086 assembler

jimx@ihlpa.ATT.COM (Harris) (04/27/88)

I'm getting myself confused, and need some pointers (pun intended)
on passing pointers from a C function to an 8086 assembly
language procedure.

Let's assume that I pass a pointer to a string of unknown length,
and that the asm procedure then copies that string to a local variable:

C:
	char msg[80];
	foo( msg );

ASM, leaving out header stuff:

	local_msg	db	80 dup (0)
	_foo		proc	near
			push	bp
			mov	bp, sp
			push	di
			push	si

Here's where I get confused.  My pointer (16-bit offset, I believe)
is at [bp+4].  How do I address the second character of msg[]?
[bp+5] is one address above [bp+4], right?  Or is it the address
pointed to by bp+5?  If I want msg[4], how do I address it?

	ex: does [si+cx] = [bp+4+cx]??
			mov	si, [bp+4]
			mov	di, local_msg
			mov	cx, 4
			mov	al, [si+cx]
			mov	[di+cx], al

	or graphically:
                 ____                ____
	bp+4 -->|____|-->[bp+4]---->|____|
		|____|              |____|
		|____|              |____|
		|____|              |____|
      bp+4+4 -->|____|-->[bp+8]-->? |____|<--[bp+4][4] == [bp+4+4]???

I think we can agree that obviously [bp+8] is not the same thing
as [bp+4]+4.  But the book I have seems to imply that [bp+4][4] is
the same as [bp+4+4], which I have trouble accepting.  So what is
the solution?  

If you have a good reference for passing pointers to assembly language,
please let me know.  I have "Supercharging C with Assembly Language",
by Chesley and Waite.  I haven't found where (if) they deal with this,
so even a page reference would help.  The more I look at this, the
more confusing it seems to get!

---
As an alternative, how can I get a segment:offset pair for a variable
to put into the dos.h register structures?  If I could do this, it would
solve my problem.  
---

Thanks in advance, and you can email me or post, as you like.

			Jim Harris
			ihlpa!jimx

murphys@cod.NOSC.MIL (Steven P. Murphy) (04/27/88)

The following example should help.

If you find any errors, let me know (I'm trying to do the same thing).


Assume MSC 5.0 & MASM 5.0   small model

main()
{
    char msg[12] = "Hello World";     /* 11 char + \0 */
    sub(msg);			      /* C always passes arrays by address */
}


DOSSEG
.MODEL SMALL		    ; must precede .CODE; also defines DGROUP
.CODE
    PUBLIC _sub
_sub	PROC	NEAR
    push    bp
    mov     bp,sp
    push    si
    push    di

    mov     ax,DGROUP	    ; DGROUP is the group that holds the C data segments
    mov     ds,ax	    ; ds already equals DGROUP in small model (just a guarantee)
    mov     si,[bp+4]	    ; load offset of msg[0] into si
			    ; now ds:si is the full address of msg
    cld			    ; clear direction flag so lodsb moves upward
    lodsb		    ; LOaD String Byte: al = byte pointed to by ds:si
			    ; lodsb also automatically increments si to point
			    ; to the next byte in msg

; first char is now in al; do what you want, then lodsb to get the next char
;
; suggestion:
;		also pass the char count and load it into cx; then you can
;		use the loop instruction and lodsb's auto-increment feature
;		to go through the whole string


    pop     di
    pop     si
    pop     bp
    ret
_sub	endp
    end
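
For comparison, here is a minimal C sketch of what the suggested
count-plus-loop version would do.  The names copy_counted and local_msg
are made up for illustration, and the asm equivalents in the comments
assume the count is passed as a second argument; treat it as a model of
the idea, not as tested MSC/MASM code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical local buffer, standing in for the asm "local_msg". */
static char local_msg[80];

/* Model of the loop: cx counts down, lodsb reads *si and bumps si,
   and each byte is then stored through di. */
const char *copy_counted(const char *msg, size_t count)
{
    const char *si = msg;        /* mov si,[bp+4]            */
    char *di = local_msg;        /* mov di,offset local_msg  */
    size_t cx;

    assert(count < sizeof local_msg);   /* leave room for the '\0' */
    for (cx = count; cx != 0; cx--) {   /* "loop" decrements cx    */
        char al = *si++;                /* lodsb                   */
        *di++ = al;                     /* store al, advance di    */
    }
    *di = '\0';                         /* terminate the local copy */
    return local_msg;
}
```

A caller would pass the length explicitly, e.g. copy_counted(msg, 11)
for "Hello World", so the routine never has to search for the '\0'.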


------------------------------------------- 
 _ _ _				Clarke's law, 2nd variation:
' ) ) )             /
 / / / . . __  _   /_		    "Any sufficiently advanced technology
/ ' (_(_/_/ (_/_)_/ /_		    is indistinguishable from a rigged demo"
             /			
            '
------------------------

S. P. Murphy
Internet: murphys@cod.nosc.mil      UUCP: {ucbvax,hplabs}!sdcsvax!nosc!murphys

Howard_Reed_Johnson@cup.portal.com (04/28/88)

This is from ihnp4.uucp!ihlpa!jimx (Jim Harris):
> I'm getting myself confused, and need some pointers (pun intended)
> on passing pointers from a C function to an 8086 assembly
> language procedure.

This may be old hat to experienced 8086 / C programmers.

Judging from ihnp4.uucp!ihlpa!jimx's recent posting, he'd be content to
have his assembly procedure working under just the small memory model.
I could talk about segment registers and other memory models, but I'll
leave that for another occasion.  However, one needs to distinguish
between the code segment and data segment found in a typical small-model
.EXE program.  Just remember to put data in the data segment, not
the code segment.  Typical Microsoft segment info looks like this:

_TEXT	segment	byte public 'CODE'
	public	_foo
_foo	proc	near
;	...
	ret
_foo	endp
_TEXT	ends

_DATA	segment	byte public 'DATA'
local_msg	db	80 dup (0)
_DATA	ends

DGROUP	group	_DATA
CGROUP	group	_TEXT

	assume	cs:CGROUP, ds:DGROUP

Here's his ASM code, with modifications:

	_foo		proc	near
			push	bp
			mov	bp, sp
			push	di
			push	si

So far, so good.  Looks pretty standard.  Graphically, it looks like this:

	+---------------------------------------+
msg[80]	| parameter data array |  |  |  | . . .	| =AAAA
	+---------------------------------------+

	+---------------------------------------+
 bp+4->	| data parameter: address (offset) =AAAA| =BBBB
	+---------------------------------------+
 bp+2->	| return address (offset)		|
	+---------------------------------------+
 bp  ->	| saved previous value of bp register	| =CCCC
	+---------------------------------------+
	| saved previous value of di register	|
	+---------------------------------------+
 sp  ->	| saved previous value of si register	|
	+---------------------------------------+

> Here's where I get confused.  My pointer (16-bit offset, I believe)
> is at [bp+4].  How do I address the second character of msg[]?

	mov	bx, [bp+4]
	mov	al, [bx+1]	; 2nd char at offset 1 in 0-origin array

> [bp+5] is one address above [bp+4], right?  Or is it the address
> pointed to by bp+5?

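	Jim, you're out in left field.  [bp+5] is simply the byte one
	address above [bp+4] on the stack, i.e. the high byte of the
	pointer itself.  It has nothing to do with the characters of msg.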

>		       If I want msg[4], how do I address it?

	mov	bx, [bp+4]
	mov	al, [bx+4]	; offset 4 in 0-origin array

At this point, it is important to distinguish between pointers and
data.  A pointer contains an address which in turn can be used to
"reference" data at various (different) locations.

C:
	char dat_var = 'a';
	char *ptr_var = &dat_var;
	register char al;

	al = *ptr_var;

ASM:
	dat_var	db	'a'
	ptr_var	dw	dat_var

	mov	bx, ptr_var
;	mov	bx, word ptr ptr_var	; implied in previous line
	mov	al, [bx]		; "reference" 'a' via a pointer
;	mov	al, byte ptr [bx]	; equivalent to previous line

Conversely, to generate a pointer for later use, one needs to take the
address of a variable (MASM's "offset" operator, C's "&"):

C:
	ptr_var = &dat_var;
	al = *ptr_var;

ASM:
	mov	bx, offset dat_var
	mov	word ptr ptr_var, bx
	mov	al, [bx]
;	mov	al, dat_var	; at this point, equivalent to prev. line

When an ordinary variable such as dat_var has its address stored
into a pointer variable and later used (referenced), the ordinary
variable is being affected through an "alias".  (Look for the word
"alias" in your compiler's tutorial on its code optimizations).

>	ex: does [si+cx] = [bp+4+cx]??

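Data can be "referenced" via addresses stored in a limited set of registers,
as well as through ordinary variable references.  Acceptable register
combinations are:  [bx], [bp], [si], [di], [bx][si], [bx][di], [bp][si], and
[bp][di], each with an optional constant displacement.  Period.  (Anything
based on bp defaults to the stack segment; the rest default to ds.)
Therefore, both [si+cx] and [bp+4+cx] are illegal.  Even if they were legal,
the two would reference different memory locations:  the first indexes off
the pointer's value, the second indexes into the stack frame itself.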
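
The rule can be written down as a small C check.  This is only a sketch:
the enum and the function name ea_is_legal are made up here, and it
models just the register part of an 8086 memory operand (at most one
base, bx or bp, plus at most one index, si or di), ignoring
displacements and segment overrides.

```c
#include <assert.h>

/* Hypothetical register tags for this sketch; not an assembler API. */
enum reg { R_NONE, R_AX, R_BX, R_CX, R_DX, R_SI, R_DI, R_BP, R_SP };

static int is_base(enum reg r)  { return r == R_BX || r == R_BP; }
static int is_index(enum reg r) { return r == R_SI || r == R_DI; }

/* Returns 1 if [r1+r2] is a legal 8086 memory operand, else 0.
   Order does not matter: [si+bx] and [bx+si] are the same operand.
   Two R_NONE slots mean a plain direct address, which is legal. */
int ea_is_legal(enum reg r1, enum reg r2)
{
    enum reg regs[2];
    int i, bases = 0, indexes = 0;

    assert(r1 <= R_SP && r2 <= R_SP);   /* reject values outside the enum */
    regs[0] = r1;
    regs[1] = r2;
    for (i = 0; i < 2; i++) {
        if (regs[i] == R_NONE)
            continue;                   /* slot unused */
        if (is_base(regs[i]))
            bases++;
        else if (is_index(regs[i]))
            indexes++;
        else
            return 0;                   /* ax, cx, dx, sp cannot address memory */
    }
    return bases <= 1 && indexes <= 1;  /* at most one of each kind */
}
```

With this model, ea_is_legal(R_SI, R_CX) fails because cx is neither a
base nor an index, while ea_is_legal(R_SI, R_BX) passes, which is why
swapping cx for bx fixes the code that follows.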

Going back to your code:
			mov	si, [bp+4]
;	oops		mov	di, local_msg
			mov	di, offset local_msg	; need the address, not the contents
;	oops		mov	cx, 4
			mov	bx, 4	; let's try a different register
;	oops		mov	al, [si+cx]
;	oops		mov	[di+cx], al
			mov	al, [si+bx]
			mov	[di+bx], al
			...

>	or graphically:
>		 ____                ____
>	bp+4 -->|____|-->[bp+4]---->|____|
>		|____|              |____|
>		|____|              |____|
>		|____|              |____|
>     bp+4+4 -->|____|-->[bp+8]-->? |____|<--[bp+4][4] == [bp+4+4]???

"Near" pointers occupy 2 bytes, so there could be no more than 3
pointers between the stack locations bp+4 and bp+8:  bp+4, bp+6, bp+8.
It would be a faux pas (bug) to reference pointers at bp+5 or bp+7.

> I think we can agree that obviously [bp+8] is not the same thing
> as [bp+4]+4.  But the book I have seems to imply that [bp+4][4] is
> the same as [bp+4+4], which I have trouble accepting.  So what is
> the solution?  

Microsoft got weird on us when they put together the syntax for ASM/MASM.
Any time you see brackets [] in an operand, there is only ONE level of
"reference" indirection.  [bx][si] does NOT mean we're doing 2-dimensioned
arrays.  Try this series of canonicalizations for clarification:

	[bp+8]			<==>		[ bp +8 ]
	[ bp +8 ]		<==>		ptr bp +8

	[bp+4]+4		<==>		[ bp +4 ] +4
	[ bp +4 ] +4		<==>		ptr bp +4 +4
	ptr bp +4 +4		<==>		ptr bp +8

	[bp+4][4]		<==>		[ bp +4 ] [ +4 ]
	[ bp +4 ] [ +4 ]	<==>		ptr bp +4 +4
	ptr bp +4 +4		<==>		ptr bp +8

	[bp+4+4]		<==>		[ bp +4 +4 ]
	[ bp +4 +4 ]		<==>		ptr bp +4 +4
	ptr bp +4 +4		<==>		ptr bp +8

	-6[bx][si]		<==>		-6 [ +bx ] [ +si ]
	-6 [ +bx ] [ +si ]	<==>		ptr -6 +bx +si
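
To see the difference with actual numbers, here is a small C model of
memory and a stack frame.  Everything in it is an assumption made for
illustration: the 256-byte "memory", the addresses 0x40 and 0x80, the
junk word 0x1234 at bp+8, and the helper names.  It only demonstrates
that [bp+4][4] is a single fetch from the frame, while reaching msg[4]
takes two steps.

```c
#include <assert.h>
#include <string.h>

enum { MEM_SIZE = 256, MSG_ADDR = 0x40, FRAME_BP = 0x80 };

/* Read a little-endian 16-bit word, as "mov bx,[addr]" would. */
static unsigned read16(const unsigned char *mem, unsigned addr)
{
    assert(addr + 1 < MEM_SIZE);
    return mem[addr] | ((unsigned)mem[addr + 1] << 8);
}

static void write16(unsigned char *mem, unsigned addr, unsigned value)
{
    assert(addr + 1 < MEM_SIZE);
    mem[addr] = (unsigned char)(value & 0xFF);
    mem[addr + 1] = (unsigned char)(value >> 8);
}

/* Lay out msg and a made-up stack frame; returns the frame's bp. */
unsigned build_frame(unsigned char *mem)
{
    memset(mem, 0, MEM_SIZE);
    memcpy(mem + MSG_ADDR, "Hello World", 12);  /* msg[] including '\0' */
    write16(mem, FRAME_BP + 4, MSG_ADDR);       /* the pointer argument */
    write16(mem, FRAME_BP + 8, 0x1234);         /* unrelated stack word */
    return FRAME_BP;
}

/* One level of indirection: mov al,[bp+disp] */
unsigned char byte_at_frame(const unsigned char *mem, unsigned bp, unsigned disp)
{
    assert(bp + disp < MEM_SIZE);
    return mem[bp + disp];
}

/* Two steps: mov bx,[bp+4] then mov al,[bx+i] */
unsigned char byte_via_pointer(const unsigned char *mem, unsigned bp, unsigned i)
{
    unsigned bx = read16(mem, bp + 4);
    assert(bx + i < MEM_SIZE);
    return mem[bx + i];
}
```

In this layout byte_at_frame(mem, bp, 4+4) returns the low byte of
whatever sits at bp+8, and only byte_via_pointer(mem, bp, 4) returns
the 'o' of "Hello World".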

> If you have a good reference for passing pointers to assembly language,
> please let me know.  I have "Supercharging C with Assembly Language",
> by Chesley and Waite.  I haven't found where (if) they deal with this,
> so even a page reference would help.  The more I look at this, the
> more confusing it seems to get!

I'd consider this article a good start.  Another way to study this is to
generate an assembly listing from your compiler.  Microsoft C can do this
via "msc /Fc foo.c" or "cl /Fc /c foo.c".

I've found the following book to be useful for both novices and experts
(who are somewhat new to DOS programming):

	Campbell, Joe
		Crafting C tools for the IBM PC's
		Prentice-Hall, Englewood Cliffs, NJ  07632
		QA76.8.I2594C36 1986
		ISBN 0-13-188418-2


If you thought this was bad, wait 'til you deal with segmented addressing
headaches.  Better yet, try writing useful OS/2 device drivers with the
contortions of its pain-in-the-neck "protected mode"-without-context-info!

Devin_E_Ben-Hur@cup.portal.com (04/28/88)

Howard_Johnson@portal writes:
> ...this may be old hat to experienced programmers...
[ and yaks for > 100 lines about basic 8086 assembly stuff ]

Yup, old hat.  The original poster was obviously an assembler neophyte
and needed some help, but couldn't you have answered through mail
instead of making the whole net read long-winded explanations
they either already know or don't care about?

mintz@hpindda.HP.COM (Ken Mintz) (04/30/88)

> Howard_Johnson@portal writes:
> > ...this may be old hat to experienced programmers...
> > [ and yaks for > 100 lines about basic 8086 assembly stuff ]
>
> Yup, old hat.  The original poster was obviously an assembler neophyte
> and needed some help, but couldn't you have answered through mail
> instead of making the whole net read long-winded explanations
> they either already know or don't care about?

  By the same token, yours is a one-on-one complaint that might have been
  handled best through email.  But I'm glad you posted it so that others
  might comment.

  I, for one, prefer people to post their (informative) answers rather than
  email them.  Often, I anticipate having the same question and can benefit
  from the answer.  I was not aware that this forum is only for expert
  questions (if you can define that), and I would not presume to know what
  others know or care about.  Also, sometimes there are good ways and better
  ways to do things.  Postings here might draw out the better answers.

  Certainly, it's a judgment call.  And you might be correct in this 
  particular case.  But I don't think it deserved the public lambasting which
  others might (mis)interpret as de rigueur in general.

Ken Mintz ("can you spare 2 cents?")