[comp.sys.atari.st] null fill eliminated

leo@philmds.UUCP (Leo de Wit) (06/01/88)

Remember the discussion about the slow null filling routine TOS uses after 
loading a program? Here's a nice small, fast and, what you hackers will like
most, dirty piece of code I made yesterday evening. It just eliminates the
null fill (there could be a 1/(vbls/sec) delay %-) and is guaranteed NOT to 
work on all ROM versions 8-).
The main problem will be the addresses (start, end) of the fill-routine in ROM.
If you know them, you can substitute them (fillhigh and filllow). I don't have
other ROMs or disassemblies. Maybe someone else can come up with a more general
solution?
The central function is fastload, which checks whether the interrupted PC lies
in the critical region; if so, the saved D5 is given a big value on return from
the VBL interrupt, which causes the fill routine to end.
The search for a zero VBL pointer starts at the queue address + 4, because I
suspect that some program (maybe GEM) writes a vector at queue_address[0]
no matter what is already there. Once assembled, fastload.prg can be put into
the AUTO folder.
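
The slot search described above can be modelled in C. This is only a sketch on a simulated queue: the names vbl_handler, find_free_vbl_slot and install_vbl_handler are made up for illustration, and the queue is an ordinary array rather than the real one reached through $456.

```c
#include <stdint.h>

/* Hypothetical model: each queue entry is a handler address, 0 = free slot. */
typedef uint32_t vbl_handler;

/* Return the index of the first free slot at or after index 1, or -1 if
 * there is none. Slot 0 is skipped on purpose, as in the post. */
int find_free_vbl_slot(const vbl_handler *queue, int nvbl)
{
    for (int i = 1; i < nvbl; i++) {
        if (queue[i] == 0)
            return i;
    }
    return -1;
}

/* Store a handler in the first free slot; return that slot, or -1 when
 * the queue has no free entry beyond slot 0. */
int install_vbl_handler(vbl_handler *queue, int nvbl, vbl_handler handler)
{
    int slot = find_free_vbl_slot(queue, nvbl);
    if (slot >= 0)
        queue[slot] = handler;
    return slot;
}
```

With a queue of eight entries in which only slot 1 is taken, install_vbl_handler() picks slot 2 and leaves slot 0 alone.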

	module fastload

	section s.ccode

gemdos		equ 1
bios		equ 13
super		equ $20
ptermres	equ $31
conws		equ 9
pterm		equ $4c
vbl_queue	equ $456
nvbl		equ $454
bpaglen		equ $100
textlen		equ 12
datalen		equ 20
bsslen		equ 28
fillhigh	equ $fc859c
filllow		equ $fc858a
bigint		equ $7cccccc0

fastinit
	clr.l	-(sp)
	move.w	#super,-(sp)
	trap	#gemdos
	addq.l	#6,sp
	move.l	d0,-(sp)	* Save ssp on stack
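	move.w	nvbl,d0		* Number of VBL queue slots
	movea.l	vbl_queue,a0	* Start of VBL queue
	addq.l	#4,a0		* Skip slot 0
	subq.w	#1,d0		* Slots left (word op: move.w left d0's high word unset)
	ble.s	tstque2		* No slot beyond slot 0
	bra.s	tstque1
tstque0
	tst.l	(a0)+		* Zero means a free slot
tstque1
	dbeq	d0,tstque0	* Until free slot found or count exhausted
	beq.s	tstok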
tstque2
	pea	noque(pc)
	move.w	#conws,-(sp)
	trap	#gemdos
	addq.l	#6,sp
	move.w	#1,-(sp)
	move.w	#pterm,-(sp)
	trap	#gemdos		* Ends here
tstok
	lea	fastload(pc),a1
	move.l	a1,-(a0)
	move.w	#super,-(sp)
	trap	#gemdos		* Restore ssp that is on stack
	addq.l	#6,sp
	move.l	4(sp),a0	* Basepage start
	move.l	#bpaglen,d0	* Basepage length
	add.l	textlen(a0),d0	* + text length
	add.l	datalen(a0),d0	* + data length
	add.l	bsslen(a0),d0	* + bss length
	clr.w	-(sp)		* Return value: 0 for success
	move.l	d0,-(sp)	* # bytes to keep
	move.w	#ptermres,-(sp)	* Keep process
	trap	#gemdos		* Stops here

fastload
	movea.l	74(sp),a0	* PC
	cmpa.l	#fillhigh,a0
	bhi.s	fastdone
	cmpa.l	#filllow,a0
	blt.s	fastdone
	move.l	#bigint,32(sp)	* Maximize D5 on stack
fastdone
	rts

	section s.data

noque	dc.b	'No vbl entry available!',13,10,0

	end
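
For readers who do not speak 68000, the decision fastload makes can be sketched in C. The struct saved_frame below is a hypothetical stand-in for the registers the VBL handler saved on the stack (the real code reaches them at fixed stack offsets), and fastload_model is an invented name; the three constants are the ones from the listing.

```c
#include <stdint.h>

#define FILL_LOW  UINT32_C(0x00fc858a)  /* filllow from the listing  */
#define FILL_HIGH UINT32_C(0x00fc859c)  /* fillhigh from the listing */
#define BIGINT    UINT32_C(0x7cccccc0)  /* bigint from the listing   */

/* Hypothetical stand-in for the saved register frame. */
struct saved_frame {
    uint32_t d5;  /* saved D5 */
    uint32_t pc;  /* interrupted program counter */
};

/* If the interrupt hit inside the fill routine, overwrite the saved D5
 * with a large value. Returns 1 if the frame was patched, 0 otherwise. */
int fastload_model(struct saved_frame *frame)
{
    if (frame->pc < FILL_LOW || frame->pc > FILL_HIGH)
        return 0;
    frame->d5 = BIGINT;
    return 1;
}
```

Any PC outside the range leaves the frame untouched, so the handler costs only two compares per VBL when no program is being loaded.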

I hope no little beasties have crept in while I was typing it over (have no
modem connection yet). Have fun!

	Leo.

apratt@atari.UUCP (Allan Pratt) (06/02/88)

From article <490@philmds.UUCP>, by leo@philmds.UUCP (Leo de Wit):
> null fill (there could be a 1/(vbls/sec) delay %-) and is guaranteed NOT to 
> work on all ROM versions 8-).
> The main problem will be the addresses (start, end) of the fill-routine
> in ROM. 


This is appalling! Please get over the idea that you can fool with stuff
in ROM.  For starters, some programs EXPECT that the whole heap (not
just the declared BSS) is zeroed at startup...  Microsoft Write is one. 
Maybe the disk cache you use, or the hard disk compaction utility, or
something equally deadly expects this -- and you'll learn the hard way
not to fool with this kind of thing.

I know the clearing takes a long time on 11/20 ROMs.  It's much faster
in Mega and future ROMs.  But it's still there, because it is a "settled
expectation" among developers.  ("Settled expectations" are things that
people count on despite the fact that nobody promised they'd stay true.)

I urge people not to use this trick or any other which changes the
environment that programs execute in, or depends so heavily on actual
addresses in ROM. 

============================================
Opinions expressed above do not necessarily	-- Allan Pratt, Atari Corp.
reflect those of Atari Corp. or anyone else.	  ...ames!atari!apratt

leo@philmds.UUCP (Leo de Wit) (06/03/88)

Here are some small corrections for the fast loader I put on the net this week.
1) There is a header for the module now. It says:

* Even when loading from ramdisk or harddisk the ROM program null fills all
* uninitialized data, heap, stack (often the major part of your RAM).
* This null filler makes loading programs faster. Its null filling is 7 times
* as fast as the ROM's, using the quick movem.l instruction. Besides it only
* clears the BSS space.
* At least the fillhigh and filllow addresses have to be adapted to suit your
* ROM version.

2) The bigint definition should read:

bigint		equ $7ffffff0

3) I abandoned the idea of no null filling at all. Some programs generated
bus errors when started with this VBL routine active, so I've looked things 
up in K & R. In paragraph 4.9 (Initialization):
...In the absence of explicit initialization, external and static variables 
are guaranteed to be initialized to zero; ...
So the routine now clears the BSS space; the programs that generated errors
now work OK. The null filling is performed by null filling chunks of 128
bytes using movem.l instructions; that seems to be the fastest way, especially
if you move many registers at a time. The 'modulo 128' part is cleared first,
at the top of the BSS. Here it is (I have left the initialization routine out):

fastload
	movea.l	74(sp),a0	* PC
	cmpa.l	#fillhigh,a0
	bhi.s	fastdone
	cmpa.l	#filllow,a0
	blt.s	fastdone
	lea.l	32(sp),a0	* Address D5 on stack
	cmp.l	#bigint,(a0)
	bge.s	fastdone	* Already filled
	move.l	#bigint,(a0)	* Maximize D5 on stack
	move.l	68(sp),a6	* Value of A6 on stack to A6
	move.l	-4(a6),a4	* Start of block to fill
	move.l	-58(a6),d0	* # bytes to fill: BSS size
	move.l	d0,d1
	and.w	#$7f,d1		* d1 = d0 & 0x7f
	moveq.l	#0,d2
	lea.l	(a4,d0.l),a5	* End (one past)
	bra.s	fastl1
fastl0
	move.b	d2,-(a5)	* Clear top d1 bytes
fastl1
	dbra	d1,fastl0
	moveq.l	#0,d0		* Nullify d0-d7/a0-a3
	move.l	d0,d1
	move.l	d0,d2
	move.l	d0,d3
	move.l	d0,d4
	move.l	d0,d5
	move.l	d0,d6
	move.l	d0,d7
	move.l	d0,a0
	move.l	d0,a1
	move.l	d0,a2
	move.l	d0,a3
	bra.s	fastl3		* a5 - a4 is now a multiple of 128
fastl2
	movem.l	d0-d7/a0-a3,-(a5)  * Clear 4 * (12 + 12 + 8) = 128 bytes / turn
	movem.l	d0-d7/a0-a3,-(a5)
	movem.l	d0-d7,-(a5)
fastl3
	cmpa.l	a4,a5
	bgt.s	fastl2		* Until start address A4 reached
fastdone
	rts

	section s.data

noque	dc.b	'No vbl entry available!',13,10,0

	end
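
The clearing order used above (first the 'modulo 128' remainder at the top of the block, then whole 128-byte chunks downwards) can be modelled in C as follows. This is a sketch, not a translation: memset() stands in for the three movem.l instructions, and the names clear_bss_model and all_zero are invented for the example.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK 128u  /* bytes cleared per loop turn, as in the listing */

/* Clear len bytes starting at start, working down from the end. */
void clear_bss_model(unsigned char *start, size_t len)
{
    unsigned char *end = start + len;   /* one past the last byte */
    size_t tail = len & (CHUNK - 1);    /* len modulo 128 */

    while (tail-- > 0)                  /* clear the remainder bytewise */
        *--end = 0;

    while (end > start) {               /* end - start is now a multiple of 128 */
        end -= CHUNK;
        memset(end, 0, CHUNK);
    }
}

/* Helper for checking the result: 1 if all n bytes are zero. */
int all_zero(const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (p[i] != 0)
            return 0;
    }
    return 1;
}
```

Clearing the remainder first means the main loop never has to test for a partial chunk, which is what lets the assembly version use fixed-size movem.l stores.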

fred@pnet01.cts.com (Fred Brooks) (06/04/88)

apratt@atari.UUCP (Allan Pratt) writes:
>From article <490@philmds.UUCP>, by leo@philmds.UUCP (Leo de Wit):
>> null fill (there could be a 1/(vbls/sec) delay %-) and is guaranteed NOT to 
>> work on all ROM versions 8-).
>> The main problem will be the addresses (start, end) of the fill-routine
>> in ROM. 
>
>
>This is appalling! Please get over the idea that you can fool with stuff
>in ROM.  For starters, some programs EXPECT that the whole heap (not
>just the declared BSS) is zeroed at startup...  Microsoft Write is one. 
>Maybe the disk cache you use, or the hard disk compaction utility, or
>something equally deadly expects this -- and you'll learn the hard way
>not to fool with this kind of thing.
>
>I know the clearing takes a long time on 11/20 ROMs.  It's much faster
>in Mega and future ROMs.  But it's still there, because it is a "settled
>expectation" among developers.  ("Settled expectations" are things that
>people count on despite the fact that nobody promised they'd stay true.)
>
>I urge people not to use this trick or any other which changes the
>environment that programs execute in, or depends so heavily on actual
>addresses in ROM. 
>
>============================================
>Opinions expressed above do not necessarily	-- Allan Pratt, Atari Corp.
>reflect those of Atari Corp. or anyone else.	  ...ames!atari!apratt

Why not make it official then? "Settled expectations" are a hell of a way
to make programming specs. All of the programming languages say don't count
on space in the heap being zero or variables being set to any value before
being init'ed.

UUCP: {cbosgd hplabs!hp-sdd sdcsvax nosc}!crash!pnet01!fred
ARPA: crash!pnet01!fred@nosc.mil
INET: fred@pnet01.cts.com

pes@bath63.UUCP (06/08/88)

In article <1067@atari.UUCP> apratt@atari.UUCP (Allan Pratt) writes:
>This is appalling! Please get over the idea that you can fool with stuff
>in ROM.  For starters, some programs EXPECT that the whole heap (not
>just the declared BSS) is zeroed at startup...  Microsoft Write is one. 

I may be coming close to starting a theological argument here.  On the other
hand, I spent ages, some years ago, in helping users convert IBM/360
applications (where storage was zeroed when you got it) to run on IBM/370
systems (where it wasn't -- or was it the other way round?).  Anyway,
I was always taught, and since that experience have believed, that you
should **NEVER** write a program which depends upon ANYTHING being ANYWHERE
in storage, unless it put it there itself -- with the obvious exception, of
course, of DEFINED system locations.

So, what I find appalling is that a company like Microsoft (who I generally
think highly of) would hire programmers who indulge in writing code that
relies on this sort of undocumented effect.  If Microsoft Write (or anything
else) WANTS an alloc'ed block of storage to contain zeroes, it oughta D**N
WELL put them there its-own-self.  Period.

leo@philmds.UUCP (Leo de Wit) (06/09/88)

I just couldn't resist...

In article <1067@atari.UUCP> apratt@atari.UUCP (Allan Pratt) writes:
>From article <490@philmds.UUCP>, by leo@philmds.UUCP (Leo de Wit):
>> null fill (there could be a 1/(vbls/sec) delay %-) and is guaranteed NOT to 
>> work on all ROM versions 8-).
>> The main problem will be the addresses (start, end) of the fill-routine
>> in ROM. 
>
>
>This is appalling! Please get over the idea that you can fool with stuff
>in ROM.  For starters, some programs EXPECT that the whole heap (not
>just the declared BSS) is zeroed at startup...  Microsoft Write is one. 

Then I think those programs are wrong, insofar as they depend too
strongly on the environment they run in. A portable C program should never
expect the heap to contain all zeros, although many O.S.'s may do this:
the clearing of pages that for instance Unix applies when a process asks
for more space (more heap by calling sbrk, more stack by pushing data on it)
has nothing to do with initializing, but with protection; that process may
not peek at what another process left there. It could just as well be set
to all 1's, or 0xe5 (but some Unix boxes probably have fast clear instructions).
When I come to think of it, it is not even trivial to write a program that
uses that 'zero' feature; let's see, assume we have

#define DATASIZE 5000
then:

1) first try:

char egdata[DATASIZE];
static char sgdata[DATASIZE];

but these are not good examples; both are BSS so my fastload also clears them
(according to K&R).

2) second try:

function()
{
	char adata[DATASIZE];
	static char sldata[DATASIZE];

	/* code here */
}

are also not good examples; sldata is BSS so my fastload also clears it; 
adata is on the stack so assuming it contains zeroes is bluntly wrong.

3) third try:

function()
{
	char *pdata;

	pdata = malloc(DATASIZE);
	/* code here */
}

Wrong again; if the malloc() was preceded by free() you're likely to get
non-zero space; use calloc(), that's a portable way to get cleared space.
The same applies when you use the GEMDOS call Malloc(); the block it returns
can just as well contain non-zero data. The same goes for sbrk() and brk().
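
To make the third case concrete, here is a small sketch of the two portable ways to get cleared heap space. The function names get_cleared, get_cleared_by_hand and is_all_zero are invented for the example; calloc(), malloc() and memset() are standard C.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DATASIZE 5000

/* calloc() is specified to return zero-filled storage. */
char *get_cleared(void)
{
    return calloc(DATASIZE, 1);
}

/* malloc() makes no promise about contents, so clear the block yourself. */
char *get_cleared_by_hand(void)
{
    char *pdata = malloc(DATASIZE);
    if (pdata != NULL)
        memset(pdata, 0, DATASIZE);
    return pdata;
}

/* 1 if all n bytes at p are zero, 0 otherwise. */
int is_all_zero(const char *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (p[i] != 0)
            return 0;
    }
    return 1;
}
```

Either function gives the caller a zeroed block regardless of what the allocator handed out before; the caller still has to check for NULL and free() the block.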

Moral: If you know the size beforehand, you can use BSS declarations to
get a zero-initialized data element; note that they take no space in the
program file (or use the initialized data segment by explicitly initializing it
to zero). If you don't, don't rely on some magic undocumented built-in
trick of some O.S. (you can put a T in front); do it the standard way,
using calloc() for heap and _bzero() or setmem() or repmem() or
zero_it_yourself_mem() for general space clearing. That is more likely to be
portable, and there is a good chance it is faster than the ROM's routines; not
all of them are slow, but some are really badly optimized, if at all (but
that's another chapter).
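
A minimal sketch of that moral, assuming nothing beyond standard C: the static array relies on the language guarantee for BSS, while the automatic array is cleared explicitly. zero_it_yourself_mem() is written out here as a plain byte loop (the name comes from the paragraph above, the body is only one possible implementation), and count_nonzero and demo_auto_buffer are invented helpers.

```c
#include <stddef.h>

#define DATASIZE 5000

/* BSS: zero by language guarantee, takes no room in the program file. */
static char sgdata[DATASIZE];

/* One possible hand-written clear routine. */
void zero_it_yourself_mem(void *p, size_t n)
{
    unsigned char *b = p;
    while (n-- > 0)
        *b++ = 0;
}

/* Number of non-zero bytes among the n bytes at p. */
size_t count_nonzero(const void *p, size_t n)
{
    const unsigned char *b = p;
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (b[i] != 0)
            count++;
    }
    return count;
}

/* An automatic array starts with indeterminate contents; clear it first. */
size_t demo_auto_buffer(void)
{
    char adata[DATASIZE];
    zero_it_yourself_mem(adata, sizeof adata);
    return count_nonzero(adata, sizeof adata);
}
```

Neither case depends on what the loader did to the rest of memory.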

>Maybe the disk cache you use, or the hard disk compaction utility, or
>something equally deadly expects this -- and you'll learn the hard way
>not to fool with this kind of thing.

Maybe the new ROMs have their disk IO reorganized. All disk cache owners
will be surprised then, but not happily 8-). They will learn not to fool
with machines that are so poorly documented (please no flames, this is
merely a plea for better and less terse manuals in the future). I
understand it's easier for Atari not to make any statements regarding
the behaviour of their machines, so that they can fix certain problems more
easily; but they should understand that in order to get a machine, or perhaps
better, an O.S. accepted, they have to specify a certain standard, whatever
that might be, so that developers can rely on it, have faith in it (this is
not a follow-up to 'What Atari should do' 8-).

>I know the clearing takes a long time on 11/20 ROMs.  It's much faster
>in Mega and future ROMs.  But it's still there, because it is a "settled
>expectation" among developers.  ("Settled expectations" are things that
>people count on despite the fact that nobody promised they'd stay true.)

I don't need to count on it and I wouldn't settle for it %-). Besides I think
it is a pure waste: 
On my 520 ST+ with harddisk many of the tools I use take more time to
clear their space than to actually load. My programs and shell scripts run
much faster after this fix and most programs don't use even 10% of the so 
slowly cleared space. The editor and compiler that use BSS space amongst 
others are perfectly happy.

>I urge people not to use this trick or any other which changes the
>environment that programs execute in, or depends so heavily on actual
>addresses in ROM. 

Ever taken a look at AHDI.PRG, the hard disk driver that comes with the SH204?
This one also uses fixed addresses to check for ROM version etc. I think
there's no other way if you want to fix things, or you end up rewriting
complete parts of TOS/GEMDOS, and I wouldn't like that. My utility could
just as well be extended to settle for more ROM versions, and I would help
anyone that wants to find things out for his version. You can also easily
modify it to give you all cleared space, if you insist on that; it will still
be more than 7 times as fast as what my ROM does.

>============================================
>Opinions expressed above do not necessarily	-- Allan Pratt, Atari Corp.
>reflect those of Atari Corp. or anyone else.	  ...ames!atari!apratt

Now you're making sense! And I hope my opinions CLEARed things a bit...
only the BSS, that will do 8-).

	Leo.

br@laura.UUCP (Bodo Rueskamp) (06/14/88)

In article <2657@bath63.ux63.bath.ac.uk> pes@ux63.bath.ac.uk (Smee) writes:
>I was always taught, that you
>should **NEVER** write a program which depends upon ANYTHING being ANYWHERE
>in storage

K&R say that global and static storage is zeroed before the main program is
executed. This null fill has to be done by the OS or by the startup module.

Because the null fill is done by GEMDOS, no startup module touches the
uninitialized static storage. If the null fill is eliminated,
most C programs will bomb.

--
Bodo Rueskamp
br@unido.uucp

rosenkra@Alliant.COM (Bill Rosenkranz) (06/17/88)

---
i have my crt0.s (originally gemstart.s by a.pratt) do the zeroing. i also
put the gemdos/bios/xbios traps in there so i don't have to worry about
linking osbind.o (saving 9 precious characters for the already limited
cmdline length :^). what's the big problem here? why is this even an issue?
doesn't MWC/Laser/etc provide source for their startup object? if u ask
me, the big developers are either a) lazy, or b) dumb for not taking care
of business. i myself strive very hard to make sure my stuff runs on as
many perturbations as possible. (now i have to prepare for the deluge of
68881 boards... :^). not always easy, but at least the simple things are
taken into account.

-bill
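
What a startup module that does its own zeroing amounts to can be sketched in C. The struct below is a simplified, hypothetical model of a basepage holding only the BSS start and length (the field names p_bbase and p_blen are assumed here), and startup_clear_bss and bss_is_clear are invented names; a real crt0.s would do this in assembly before calling main.

```c
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical basepage model: just the two BSS fields. */
struct basepage_model {
    unsigned char *p_bbase;  /* start of BSS  */
    size_t         p_blen;   /* length of BSS */
};

/* Zero the declared BSS so the program does not depend on the loader. */
void startup_clear_bss(const struct basepage_model *bp)
{
    memset(bp->p_bbase, 0, bp->p_blen);
}

/* 1 if the whole BSS region is zero, 0 otherwise. */
int bss_is_clear(const struct basepage_model *bp)
{
    for (size_t i = 0; i < bp->p_blen; i++) {
        if (bp->p_bbase[i] != 0)
            return 0;
    }
    return 1;
}
```

With this in the startup code, the K&R guarantee for static storage holds whether or not the loader cleared anything.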