[comp.sys.ibm.pc.programmer] Material on .ASM subtleties sought

rar@unix.cis.pitt.edu (Richard A Rubin) (07/31/90)

In article <37512@shemp.CS.UCLA.EDU> wales@CS.UCLA.EDU (Rich Wales) writes:
>Of the above, I'm familiar only with "The Zen of Assembly Language".  I
>recommend this book highly.  It's especially good at helping one
>understand the design of the PC and the impact that design has on the
>way a program works.

Could you please post the author, publisher, etc. for this book?

Rich Rubin
University of Pittsburgh Dept. of Radiology

wales@valeria.cs.ucla.edu (Rich Wales) (07/31/90)

In article <37512@shemp.CS.UCLA.EDU> I wrote:

	Of the above, I'm familiar only with "The Zen of Assembly
	Language".  I recommend this book highly.  It's especially
	good at helping one understand the design of the PC and
	the impact that design has on the way a program works.

In article <26363@unix.cis.pitt.edu>
rar@unix.cis.pitt.edu (Richard A Rubin) replied:

	Could you please post the author, publisher, etc. for this
	book?

OK, here it is.

	Abrash, Michael.
	Zen of Assembly Language:  Volume I, Knowledge.
	Scott, Foresman & Co., 1990.
	ISBN 0-673-38602-3.
	List price:  US $29.95

--
-- Rich Wales <wales@CS.UCLA.EDU> // UCLA Computer Science Department
   3531 Boelter Hall // Los Angeles, CA 90024-1596 // +1 (213) 825-5683
   "Indeed!  Twenty-four is the gateway to heroic salvation."

dmt@pegasus.ATT.COM (Dave Tutelman) (08/02/90)

In article <1068@ashton.UUCP> tomr@ashton.UUCP (Tom Rombouts) writes:
>I am a roughly intermediate C coder...
>I now seek references on 8086 assembly language that are more
>than just teaching of opcodes. 
	I agree with all of Rich Wales' well-written response.
	My response below is similar in substance, but maybe with
	slightly different emphasis.

>What are the main principles in optimizing .ASM code?  (Down to the
>clock ticks level) 
	Assuming you NEED to optimize clock ticks (see below for that
	decision), you will need a manual for the CPU you're using,
	with a cycle count for each instruction.  However, there are
	a couple of things that make it more complicated than just
	adding up the cycles burned:
	   -	The various 80x8x chips have DIFFERENT instruction
		sets.  If you really need to optimize, you may
		need different code for the 8086, 80286, and 80386.
		(I certainly have examples of 8086 and 80286
		differences in optimal code for a routine.)
	   -	The different chips also have a different number of
		cycles for the same instruction.  Example of this
		effect: one of my fast assembly programs had to
		multiply an integer by a constant integer.  The
		80286 and 80386 do integer multiplication pretty fast,
		but the 8086 did it faster as a string of
		shift and add instructions.
	   -	Which brings us to...  These CPUs have a prefetch
		queue.  A string of very fast (low-cycle-count)
		instructions won't run as fast as a sum-of-cycles
		calculation would lead you to believe, since the
		queue empties faster than the bus can refill it, and
		the CPU then waits on instruction and data fetches
		from memory.
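
The multiply-by-a-constant case in the second item can be sketched in C.
This is only an illustration: the constant 10 and the function names are
assumptions (the post does not say which constant was involved), and the
original routine was written in assembler, not C.  It shows the
decomposition 10*x = 8*x + 2*x, plus a check against the plain multiply.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply by the constant 10 using only shifts and adds:
 * 10*x = 8*x + 2*x = (x << 3) + (x << 1).
 * The result wraps modulo 65536, like a 16-bit register would. */
static uint16_t mul10_shift_add(uint16_t x)
{
    return (uint16_t)((x << 3) + (x << 1));
}

/* Reference version using the multiply operator. */
static uint16_t mul10_plain(uint16_t x)
{
    return (uint16_t)(x * 10u);
}

/* Confirm that both versions agree for every 16-bit input. */
static void check_mul10(void)
{
    uint32_t i;

    for (i = 0; i <= 0xFFFFu; i++)
        assert(mul10_shift_add((uint16_t)i) == mul10_plain((uint16_t)i));
}
```

Whether the shift-and-add form actually wins depends on the CPU, as noted
above, so it is worth timing both forms on the slowest target machine.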
	The whole thing is sort of an intuitive tradeoff.  One of the
	best rules I use for intuition is to optimize for the SLOWEST
	CPU I expect the program to run on, since the faster ones
	will be "pretty quick" anyway.  (E.g., if you can meet your
	performance requirement on a 4.7 MHz 8088, the same code is
	"fast enough" on an 8 MHz 286.)

>And the most important (IMHO) questions of all:
>WHEN is it practical to go to assembly rather than a more portable
>language such as C?  How can assembly be developed to make it
>re-usable and understandable by others?
	I agree that's the most important one.  (Or two, actually.)
	First the "when", then the "how":

WHEN:
   -	You need the speed.  This is one everyone admits, but it's
	too often abused.  I prefer to code the WHOLE program in C,
	then measure it to see where my performance problems (if any)
	are.  If there are performance problems in an area that ASM
	could help, I recode THAT portion.  A few words of experience
	here:
	   .	This approach yields far less assembly coding than
		that of the "hackers" who start off with assumptions
		about speed and resort to assembler immediately.
	   .	Most performance problems found this way yield more
		to algorithm improvement than to assembly code.
	   .	Assembly sections added this way are usually very
		small and localized.  If yours aren't, there is
		probably something wrong with your software architecture
		in the first place.  Revisit the decomposition of
		your program.
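
A minimal sketch of the "measure first" step, assuming nothing fancier
than the standard C clock() call.  The routine being timed (checksum) and
the harness name (time_checksum) are hypothetical stand-ins for whatever
part of your program you suspect; a real profiler will tell you more.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical suspected hot spot: sum the bytes of a buffer. */
static unsigned long checksum(const unsigned char *buf, unsigned long n)
{
    unsigned long sum = 0, i;

    for (i = 0; i < n; i++)
        sum += buf[i];
    return sum;
}

/* Call checksum() `reps` times and return the CPU time used, in seconds.
 * The last checksum value is stored through `last_result`. */
static double time_checksum(const unsigned char *buf, unsigned long n,
                            int reps, unsigned long *last_result)
{
    volatile unsigned long sink = 0;  /* keeps the calls from being optimized away */
    clock_t start, stop;
    int r;

    assert(reps > 0 && last_result != 0);

    start = clock();
    for (r = 0; r < reps; r++)
        sink = checksum(buf, n);
    stop = clock();

    *last_result = sink;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Time the C version this way first; only if the number is too large, and a
better algorithm doesn't fix it, is the routine a candidate for assembler.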

   -	You need the space.  This one is little-recognized, but
	important.  Some examples where this yields big gains
	(Freudian slip - I originally typed that as "bug gains"):
	   .	You need a small stand-alone program (.EXE or .COM).
		If you code in C, you'll get several kilobytes of
		essential library and startup code.  Assembler can
		cut this to a small fraction of that if the function
		to be performed is simple.  (E.g., "Hello world" in C
		will probably result in a 5K-10K .EXE; in assembler
		you can probably code it in about 100 bytes.)
		(My favorite example is a little .COM file that
		turns the NumLock key off.  If I recall correctly,
		it's under 20 bytes; try doing that in C.)
	   .	You need a specialized "library" routine whose
		standard-library version is much more general (and
		thus much bigger).  However, you can frequently gain
		much of the advantage by just recoding the routine
		in C.
	Of course, these gains only matter if program size is important.
	A few cases where it is include: (1) TSR programs that take
	up precious RAM all the time, and (2) squeezing a bunch of files
	onto the minimum number of floppies for distribution.

   -	There's no other way.  This is decreasingly a valid excuse.
	When I started PC hacking, you had to do a lot yourself that's
	now part of the major (MSC, Turbo C) commercial C products.
	For instance:
	   .	Interrupt handlers.
	   .	BIOS calls.
	   .	Efficient jump tables (e.g., for BIOS-like function
		dispatchers).
	   .	Interrupt-vector manipulation.
	I did all of these in assembler years ago.  I'd use the C
	facilities today.
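
As an illustration of the jump-table item, here is a minimal sketch in
portable C of a function dispatcher built on a table of function
pointers.  The function numbers, handler names, and return values are all
hypothetical; this is not the BIOS interface or any compiler's extension,
just the general technique.

```c
#include <assert.h>
#include <stddef.h>

/* Every handler takes one argument and returns one result. */
typedef int (*handler_fn)(int arg);

/* Hypothetical handlers, one per function number. */
static int fn_get_version(int arg) { (void)arg; return 0x0100; }
static int fn_double(int arg)      { return arg * 2; }
static int fn_negate(int arg)      { return -arg; }

/* The jump table: the function number is the index into this array. */
static const handler_fn dispatch_table[] = {
    fn_get_version,   /* function 0 */
    fn_double,        /* function 1 */
    fn_negate         /* function 2 */
};

#define NUM_FUNCTIONS (sizeof dispatch_table / sizeof dispatch_table[0])

/* Look up and call the handler for `function_number`.
 * Returns 0 and stores the handler's result on success,
 * or -1 if the function number is out of range. */
static int dispatch(unsigned function_number, int arg, int *result)
{
    assert(result != NULL);

    if (function_number >= NUM_FUNCTIONS)
        return -1;
    *result = dispatch_table[function_number](arg);
    return 0;
}
```

The range check is the part that is easy to forget when the same table is
hand-coded in assembler as an indexed indirect jump.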

   -	When dealing with hardware that has timing or register
	dependencies (generally special I/O).  My favorite example here
	is a speed optimization of an existing program, where the
	bottleneck was screen I/O.  I decided that, for the slower
	processors, I'd have to write directly to display memory
	(a decision frequently made by application developers).  It
	was easy enough to code this in C, but I wound up with snow
	on the screen.  In order to "de-snow" the screen, the timing
	was so critical I had to write the code in assembler.
	(I posted a tutorial on the subject a couple of years ago;
	it's on several bulletin boards, probably under the name
	"snowfree", or something like that.)

HOW:
   -	Identify whole functions or families of functions that can
	be recoded in assembler.  If your code has an "object"
	flavor, then the methods associated with a class of objects
	become a candidate for recoding.  If you can't break it apart
	in this way, consider restructuring before recoding.

   -	Recode the LOWEST possible levels of the function chain.
	(I think Rich mentioned this, too.)  This minimizes the
	work needed to port the program.  (The older I get, the more
	I consider portability early.)

   -	Remember those programming "standards" that you routinely
	ignore in your C coding?  Well, follow them RELIGIOUSLY in
	assembler; they're even more important here.  Especially:
	   .	Break up the code into small, modular functions.
		Be more ruthless about this than even the best
		C code, since some C "one-liners" may be a half
		page of ASM.
	   .	Include a FULL preamble for EVERY function.  Include
		input (arguments, register state at call time, etc.),
		output, what's done, which registers aren't preserved,
		etc.
	   .	All the rules Rich noted about commenting lines.
	   .	Choose meaningful names for variables, functions, and
		even labels (which you usually don't worry about in C).
	I'd like to nod some praise in the direction of a former
	colleague, Eric Bauer.  We shared an all-assembler project
	(a multi-tasking MS-DOS), and had to read one another's code
	to get anything done.  Eric's adherence to good coding
	practice made it easier to follow his code than even most
	C code (for instance, the C programs I currently maintain). 
	So it IS possible to write non-obscure assembler, even if
	you're doing something inherently obscure.

>(I personally think assembly language is becoming a lost art, and
>that it can produce significant payoffs in time critical applications
>such as database processing or graphics display.)
	Graphics display - ASM absolutely essential.
	Database processing - I'd guess that algorithm and structure
		optimization will yield bigger payoffs.  Maybe, just
		maybe, in the disk handling (on the assumption that
		you believe your DBMS wants to bypass the OS's file
		system, a questionable conclusion at best)....

>Tom Rombouts  Torrance Techie  tomr@ashtate.A-T.com  V:(213) 538-7108

Well, this has been fun for me.  Hope it's helpful to you.

Dave
+---------------------------------------------------------------+
|    Dave Tutelman						|
|    Physical - AT&T Bell Labs  -  Lincroft, NJ			|
|    Logical -  ...att!pegasus!dmt == dmt@pegasus.att.com	|
|    Audible -  (201) 576 2194					|
+---------------------------------------------------------------+