[comp.lang.c] Separate data and function address spaces

chip@ateng.com (Chip Salzenberg) (11/10/89)

According to chris@mimsy.umd.edu (Chris Torek), about the PDP-11:
>One of the (somewhat less frequently used) capabilities was what was
>called `split I&D', in which each machine address had to be qualified
>with `instruction' or `data' before it was unique.  There were two
>separate locations 0400, one containing code, and one containing data.

Recent processors also have this "feature".  When the '286 and '386
processors are in protected mode -- i.e. when they're running Unix
-- they do not permit program execution from any data segment. This
restriction can be bypassed only by the subterfuge of pointing two
segment descriptors at the same piece of memory. 

I happen to think it's a feature.  So sue me.

>	char *p;
>	int fn();
>	p = (char *)fn;
>
>is non-portable.

Definitely.  It's too bad that Kyoto Common Lisp includes code like this.
KCL won't run on '386s.
-- 
You may redistribute this article only to those who may freely do likewise.
Chip Salzenberg at A T Engineering;  <chip@ateng.com> or <uunet!ateng!chip>

poser@csli.Stanford.EDU (Bill Poser) (11/10/89)

	Is there a reason why someone would write code like:

	char *p;
	int fn();
	p = (char *)fn;


whose non-portability has been under discussion? In this case you can
just declare p to be a function pointer to start with. The only
thing I can think of is that if you malloc some space for function pointers
you need to cast the (char *) result of malloc to the appropriate function
pointer type, and cast it back to (char *) if you want to free() it.

cpcahil@virtech.uucp (Conor P. Cahill) (11/10/89)

In article <2559F3AE.9260@ateng.com>, chip@ateng.com (Chip Salzenberg) writes:
> Recent processors also have this "feature".  When the '286 and '386
> processors are in protected mode -- i.e. when they're running Unix
> -- they do not permit program execution from any data segment. This
> restriction can be bypassed only by the subterfuge of pointing two
> segment descriptors at the same piece of memory. 

I don't know what unix you are using, but the System V/386 Unixes use the small
model for compiled programs, which places the data and text portions into the
same segment.  I have executed out of data space on these systems.  I have
even executed out of a shared memory segment.

> >	char *p;
> >	int fn();
> >	p = (char *)fn;

While this is non-portable, it can be done on the Unixes I described above.
Try the following on your 386 system:

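#include	<stdio.h>
#include	<errno.h>

main()
 {
	int		   a();
	int		   b();
	int		(* func )();
	void		 * malloc();
	char		 * shmaddr;
	/* writable array: modifying a string literal is not safe everywhere */
	static char	   test[] = "If the word 'shared' appears here: ......  it works.";

	if( (shmaddr=(char *)malloc(512)) == 0 )
	{
		printf("malloc failed, errno = %d\n", errno);
		exit(10);
	}
	/* non-portable: assumes b() immediately follows a() in the text */
	cpy(shmaddr,(char *)a,(char *)b);

	/* non-portable: treat the copied bytes as a function */
	func = (int (*)()) shmaddr;

	(* func)(test);

	printf("%s\n",test);

	exit(0);
}	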

cpy(tgt,src,srcend)
 char * tgt;
 char * src;
 char * srcend;
 {
  while ( src != srcend )
    *tgt++ = *src++;
 }

a( s )
 char *s;
 {
  s[35] = 'S'; s[36] = 'H'; s[37] = 'A'; s[38] = 'R'; s[39] = 'E'; s[40] = 'D'; 
  return;
 }

b( s )
 char *s;
 {
  s[35] = 'N'; s[36] = 'O'; s[37] = 'R'; s[38] = 'M'; s[39] = 'A'; s[40] = 'L'; 
  return;
 }

-- 
+-----------------------------------------------------------------------+
| Conor P. Cahill     uunet!virtech!cpcahil      	703-430-9247	!
| Virtual Technologies Inc.,    P. O. Box 876,   Sterling, VA 22170     |
+-----------------------------------------------------------------------+

cpcahil@virtech.uucp (Conor P. Cahill) (11/10/89)

In article <10984@csli.Stanford.EDU>, poser@csli.Stanford.EDU (Bill Poser) writes:
> 	Is there a reason why someone would write code like:
> 
> 	char *p;
> 	int fn();
> 	p = (char *)fn;
> 
> whose non-portability has been under discussion?

Yes, at least for that kind of operation (where data and function pointers
are converted and used).  I worked on a project that used shared memory to
implement shared libraries, so functions had to be in data space and the
pointers to those functions were based upon the shmaddr returned by
shmat().

This was definitely not portable to every machine, but was usable on many
System V.2 systems with just a bit of tweaking (mostly in the initialization
code and loader scripts).  Here the advantage of the non-portable code
outweighed the non-portability.  Besides, it was coded so that it could
be turned on/off at compile time, so if the system did not support it, we
just turned it off.

palowoda@fiver.UUCP (Bob Palowoda) (11/12/89)

From article <1989Nov10.123033.2494@virtech.uucp>, by cpcahil@virtech.uucp (Conor P. Cahill):
> In article <2559F3AE.9260@ateng.com>, chip@ateng.com (Chip Salzenberg) writes:
[some stuff deleted]
> I don't know what unix you are using, but the System V/386 Unixes use the small
> model for compiled programs, which places the data and text portions into the
> same segment.  I have executed out of data space on these systems.  I have
> even executed out of a shared memory segment.

  I'm curious, where did you find out that the System V/386 Unixes use the
small model?  I looked through my manuals and cannot find any reference to
"models".

---Bob

-- 
Bob Palowoda  pacbell!indetech!palowoda    *Home of Fiver BBS*  login: bbs
Home {sun|daisy}!ys2!fiver!palowoda         (415)-623-8809 1200/2400
Work {sun|pyramid|decwrl}!megatest!palowoda (415)-623-8806 2400/9600/19200 TB
Voice: (415)-623-7495                        Public access UNIX XBBS   

cpcahil@virtech.uucp (Conor P. Cahill) (11/12/89)

In article <930@fiver.UUCP>, palowoda@fiver.UUCP (Bob Palowoda) writes:
> From article <1989Nov10.123033.2494@virtech.uucp>, by cpcahil@virtech.uucp (Conor P. Cahill):
> > I don't know what unix you are using, but the System V/386 Unixes use the small
> > model for compiled programs, which places the data and text portions into the
> > same segment.  I have executed out of data space on these systems.  I have
> > even executed out of a shared memory segment.
> 
>   I'm curious, where did you find out that the System V/386 Unixes use the
> small model?  I looked through my manuals and cannot find any reference to
> "models".

I believe it was in the original System V/386 Release 3.0 release notes, but
I no longer have them around and can't check them.  However, a check of the
assembly language generated by the C compiler shows that it does not use
any multi-segment function calls and/or data accesses.

darcy@bbm.UUCP (D'Arcy Cain) (11/15/89)

In article <1989Nov12.152753.9282@virtech.uucp> Conor P. Cahill writes:
>I believe it was in the original System V/386 Release 3.0 release notes, but
>I no longer have them around and can't check them.  However, a check of the
>assembly language generated by the C compiler shows that it does not use
>any multi-segment function calls and/or data accesses.
>
The 386 still uses 32-bit addresses.  Using virtual memory features and
paging, the chip can access up to 64 TB (64 trillion bytes).  Of course
each "segment" is limited to ONLY 4 gigabytes.  Once an application needs
more than this I suppose that the concept of models might make some sense
but I think technology may have moved on by then (and Intel's head office
will have moved to Andromeda).  In the meantime I think that our compilers
that treat it as a flat address space will work just fine.

D'Arcy J.M. Cain
darcy@(druid,bbm)

chip@ateng.com (Chip Salzenberg) (11/15/89)

According to cpcahil@virtech.uucp (Conor P. Cahill):
>According to chip@ateng.com (Chip Salzenberg):
>> The '286 and '386 processors, in protected mode, do not permit
>> program execution from any data segment. This restriction can be
>> bypassed only by the subterfuge of pointing two segment descriptors
>> at the same piece of memory.
>
>I don't know what unix you are using...

Conor then proceeds to describe how "real" SysV for the '386 actually does
perform the two-descriptors-pointing-at-the-same-memory trick.

Unlike SysV, SCO Xenix/286 and Xenix/386 versions 2.2 and 2.3 create
disjoint code and data segments.  The exceptions are called "impure"
binaries.  I've never seen impure binaries except for the '286/186/8086,
and such 16-bit binaries limit code and data to a total of 64K.

Incidentally, Xenix/386 2.3 can execute COFF binaries without conversion.
Perhaps Xenix would give them executable data...
davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (11/15/89)

In article <930@fiver.UUCP>, palowoda@fiver.UUCP (Bob Palowoda) writes:

|  > I don't know what unix you are using, but the System V/386 Unixes use the small
|  > model for compiled programs, which places the data and text portions into the
|  > same segment.  I have executed out of data space on these systems.  I have
|  > even executed out of a shared memory segment.
|  
|    I'm curious, where did you find out that the System V/386 Unixes use the
|  small model?  I looked through my manuals and cannot find any reference to
|  "models".

  Please don't confuse this already confusing issue. In the small model
the text and data do not share a segment; sharing one segment is the "tiny"
model.  Here's how models work for Intel systems (and others I've seen).

tiny:	the text, data, and stack segments are all the same. All
	pointers hold only an offset into the default segment.

small:	text is in one segment, data and stack are in another.
	All pointers hold only an offset into the default
	segment.

compact: text is in a single segment. Pointers to procedures
	hold only the offset into the default segment. Data and
	stack are in separate segments. Data pointers hold both segment
	and offset information (and are therefore usually larger than
	text pointers).

medium: text is in multiple segments. Pointers to procedures
	hold both segment and offset information. Pointers to data hold
	only offset information relative to the default segment.

large:	both text and data pointers hold segment and offset
	information.

  In addition there is the software "huge" model, in which a
single array may be larger than the size of a single segment.
For 80286 and Z8000 this is 64k and makes a difference, while
for machines with 4GB segment size the need is much smaller.
Again the hardware implication is that all pointers are segment
and offset.

  The implication of using models beyond small is that the
process will run somewhat slower, depending on the ability of
the CPU to process the segment and offset information. Since
most programs do more data access than procedure calling, the
performance penalty for multisegment data is usually greater.

  On many UNIX systems the choice between small and tiny model
is made by using the -i option (selects small, default tiny).
Tiny model does not allow sharing of the text segment between
multiple processes, at least on 286, 386 and Z8000 CPUs.

  If in the effort to simplify this I have left something out,
apologies in advance.
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
"The world is filled with fools. They blindly follow their so-called
'reason' in the face of the church and common sense. Any fool can see
that the world is flat!" - anon

peter@ficc.uu.net (Peter da Silva) (11/16/89)

On various compilers, I've seen the following (I may have mixed up compact
and medium):

64K code+data				compact, tiny, com, small, impure
64K code + 64K data			small, split, pure
64K code + segmented data		large data, compact
64K code + unsegmented data		huge data
segmented code + 64K data		large code, medium
segmented code + segmented data		large
segmented code + unsegmented data	huge, large

Any of these with >64K data may have 16- or 32- bit ints. 

And people wonder why I dislike segments.
-- 
`-_-' Peter da Silva <peter@ficc.uu.net> <peter@sugar.hackercorp.com>.
 'U`  --------------  +1 713 274 5180.
"*Real* wizards don't whine about how they paid their dues"
	-- Quentin Johnson quent@atanasoff.cs.iastate.edu

palowoda@fiver.UUCP (Bob Palowoda) (11/16/89)

From article <25604050.26537@ateng.com>, by chip@ateng.com (Chip Salzenberg):
> According to cpcahil@virtech.uucp (Conor P. Cahill):
>>According to chip@ateng.com (Chip Salzenberg):
[some stuff deleted]

> Unlike SysV, SCO Xenix/286 and Xenix/386 versions 2.2 and 2.3 create
> disjoint code and data segments.  The exceptions are called "impure"
> binaries.  I've never seen impure binaries except for the '286/186/8086,
> and such 16-bit binaries limit code and data to a total of 64K.
 
   Does this mean that the 386 version (a 32-bit version of Xenix) has
limitations on huge array sizes?  I was under the impression that the
386 version of the Xenix C compiler was a non-segmented 32-bit compiler
in all respects.  Other than the 32-bit ints, pointers, etc., what are the
other features that differ from the 286 version of the Xenix
compiler?

---Bob

chip@ateng.com (Chip Salzenberg) (11/18/89)

According to palowoda@fiver.UUCP (Bob Palowoda):
>According to chip@ateng.com (Chip Salzenberg):
>> Unlike SysV, SCO Xenix/286 and Xenix/386 versions 2.2 and 2.3 create
>> disjoint code and data segments.  The exceptions are called "impure"
>> binaries.  I've never seen impure binaries except for the '286/186/8086,
>> and such 16-bit binaries limit code and data to a total of 64K.
> 
>   Does this mean that the 386 version (a 32-bit version of Xenix) has
>limitations on huge array sizes?  I was under the impression that the
>386 version of the Xenix C compiler was a non-segmented 32-bit compiler
>in all respects.

Not quite.  As long as a program runs on a '386 it is "segmented", since the
'386 uses segments for all memory references.  However, since small model on
the '386 (one code segment, one data segment) provides 4G of code and 4G of
data, no one bothers with the more complicated models.

To state plainly what I described in a convoluted way above:

	'286 tiny model         code+data 64K, executable data
	'286 small model        code 64K, data 64K, no executable data
	'386 tiny model         code+data 4G, executable data
	'386 small model        code 4G, data 4G, no executable data

It seems that SysV/386 uses tiny model, whereas Xenix/386 uses small model.
-- 
You may redistribute this article only to those who may freely do likewise.
Chip Salzenberg at A T Engineering;  <chip@ateng.com> or <uunet!ateng!chip>
    "Did I ever tell you the Jim Gladding story about the binoculars?"

tony@oha.UUCP (Tony Olekshy) (11/19/89)

In message <930@fiver.UUCP>, palowoda@fiver.UUCP (Bob Palowoda) writes:
>
> I'm curious, where did you find out that the System V/386 Unixes use the
> small model?  I looked through my manuals and cannot find any reference to
> "models".

From man cc on a 386 Xenix system...

    Memory Models: cc can create programs for four different memory models:
    small, middle, large, and huge.  In addition, small model programs can be
    pure or impure.  On the 8086 and 80286 processors, these various
    segmentation models allow programs with code or data larger than 64K
    bytes.  Since the 80386 can address segments larger than 64K bytes, the
    middle, large and huge models are not supported on the 80386.

--
Yours, etc., Tony Olekshy (...!alberta!oha!tony or tony@oha.UUCP).