[comp.sys.mac.programmer] Flex fast lexical analyzer for MPW announcement.

earleh@eleazar.dartmouth.edu (Earle R. Horton) (05/22/88)

     I have successfully ported Flex, the lex replacement recently posted
to comp.sources.unix, to Macintosh Programmer's Workshop C and linked it
as an MPW Tool.  After I try it out on a few
more programs, I will post a binary of the tool to comp.binaries.mac
and the sources to comp.sources.mac.  The MPW implementation is pretty
much the same as the UNIX implementation, except the *stupid* compiler
and the *stupid* linker will not allow more than 32k of global data.
This means that the fast scanner options probably will not work,
unless you have a real small specification or are clever enough to
stuff the state tables in a resource.  The default scanner, which is
plenty fast, works fine.

     I didn't realize this until trying to link Flex, but a maximum of
32k for global data in a real program is *stupid* (imagine the word
*stupid* in bold, italic, shadowed style of the system font, size 24).
Get with it, Apple.

*********************************************************************
*Earle R. Horton, H.B. 8000, Dartmouth College, Hanover, NH 03755   *
*********************************************************************

dan@Apple.COM (Dan Allen) (05/23/88)

I agree that a 32K global data limit is stupid.  But the workaround in
most cases is so trivial (especially in C) that why waste time worrying
about it?  USE THE HEAP!  If a program needs big buffers or big arrays,
then use New in Pascal, malloc in C, or NewPtr in general for the Mac.
And with MultiFinder around, if a large buffer is only temporary, you
can request the space from MultiFinder's heap and have your application
look like a hero by running in a much smaller MultiFinder partition!
(This last suggestion may not work real well with MPW tools, but works
fine in apps.)
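
Dan's workaround, sketched in portable C rather than Toolbox calls (malloc
stands in here; NewPtr would be the Mac-native equivalent, and the table
size is invented for illustration):

```c
#include <stdlib.h>

#define TABLE_SIZE 20000  /* hypothetical: 160,000 bytes as doubles, far over 32K */

/* Instead of "static double table[TABLE_SIZE];", which lands in the
   A5-relative global data area, keep only a 4-byte pointer in globals
   and put the array itself in the heap at startup. */
static double *table = NULL;

static int init_table(void)
{
    table = malloc(TABLE_SIZE * sizeof *table);  /* NewPtr() on the Mac */
    if (table == NULL)
        return -1;
    for (int i = 0; i < TABLE_SIZE; i++)
        table[i] = (double)i;   /* stand-in for real initialization */
    return 0;
}
```

Only the pointer now counts against the 32K global-data budget; the
160,000-byte array lives in the heap.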

It is very rare for an application to REALLY NEED more than 32K of
global data.  I submit that if an app is using more than 32K, then it is
written wrong, that is, the allocation of the space should be at a more
local scope or whatever.

I have done a fair amount of work with several NON-TRIVIAL applications
for the Mac.  You may have even heard of them.

HyperCard 1.2	312732 bytes of code,  25872 bytes of global data = 8%
MPW Shell 2.0+  179874 bytes of code,  20674 bytes of global data = 11%

The percentages are simply global data divided by size of code (not
including resources, just real live CODE).  The MPW percentage is
actually a bit high because MPW is written in C, whose automatic global
data initialization is included in the code size, whereas HyperCard is
written in MPW Pascal, where that initialization is not automatic,
although HyperCard does initialize portions of its global data itself.

So anyway, I do not think that global data should be as big a deal.  If
you have a specific problem, I would be interested in hearing about it
to see if my conjecture (that no well written app NEEDS more than 32K of
global data) is actually true or not.

Use the heap, and then if you still need more memory, buy more SIMMs.

Dan Allen
Software Explorer
Apple Computer

wetter@tybalt.caltech.edu (Pierce T. Wetter) (05/23/88)

In article <10896@apple.Apple.Com> dan@apple.UUCP (Dan Allen) writes:
>I agree that a 32K global data limit is stupid.  But the workaround in
>most cases is so trivial (especially in C) that why waste time worrying
>about it?  USE THE HEAP!  If a program needs big buffers or big arrays,
   32K = 3,276.8 Real numbers.  I have seen programs with arrays holding
more than that many data points in the code (state tables for the solar
system).  These are static constants, which count as global data.  Granted,
this could be put in a data file, but when you're the one given the task of
porting the code over, it's not a fun job editing a huge file of numbers
from a mess of DATA statements into some other format.  Additionally, many
programs such as lex, gnuchess, yacc, bison, etc. were originally written
assuming there was infinite memory.  When you don't really understand what
the code does (which I didn't with bison; I just replaced array declarations
with malloc), it's a pain to go back and find ALL the large array references
and fix them.

  ABOVE ALL, 32K IS HALF AS MUCH AS THE LIMIT SHOULD BE.  Whoever is in
charge of writing linkers for Mac compilers could instantly double the
amount of global data simply by pointing A5 at the MIDDLE of the global
data rather than at the top.  After all, it's a SIGNED offset, so there's
32K in each direction.
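
The arithmetic behind that claim, as a sketch: the 68000's register-indirect
addressing with displacement takes a signed 16-bit offset, so an A5-relative
access can in principle reach from A5-32768 up to A5+32767:

```c
#include <stdint.h>

/* Total reach of a signed 16-bit displacement off A5: 64K, twice what
   you get if A5 sits at the very top of the global data area. */
long a5_reach_bytes(void)
{
    return (long)INT16_MAX - (long)INT16_MIN + 1;  /* 65536 */
}
```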

  Are you listening Apple, Think?

Pierce Wetter

----------------------------------------------------------------
wetter@tybalt.caltech.edu     Race For Space Grand Prize Winner.
-----------------------------------------------------------------
   Useless Advice #986: Never sit on a Tack.

singer@endor.harvard.edu (Rich Siegel) (05/23/88)

In article <6645@cit-vax.Caltech.Edu> wetter@tybalt.caltech.edu.UUCP (Pierce T. Wetter) writes:
>In article <10896@apple.Apple.Com> dan@apple.UUCP (Dan Allen) writes:
>>I agree that a 32K global data limit is stupid.  But the workaround in
>>most cases is so trivial (especially in C) that why waste time worrying
>>about it?  USE THE HEAP!  If a program needs big buffers or big arrays,

	Ditto!

>  lex, gnuchess, yacc, bison etc. were originally written assuming there was
>  infinite memory. When you don't really understand what the code does (which
>  I didn't with bison; I just replaced array declarations with malloc) it's a
>  pain to go back and find ALL large array references and fix them.

	I'm sorry, but that's a fundamental problem when porting mainframe
programs to microcomputers. I am not unsympathetic to your problem, because 
I've done it many times myself, and learned the hard way. Assuming that you
have infinite amounts of core to play around with is simply not valid.
You can't get something for nothing...

	When it comes to large constant data tables, the solution, while not
effortless, is nearly so.  Take your initialization statement, make the
necessary calls to allocate a big enough handle, initialize the handle
with this data, and write it out as a resource.  Then your program simply
needs to load this resource at initialization time.  (For an added bonus,
you can mark the resource as preloaded and locked, so it goes very low in
the heap, right above the master pointer blocks.)  This is much faster
than initializing all that data...
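
Rich's technique, sketched in portable C: on the Mac the one-time build
step would use AddResource/WriteResource and the startup step GetResource,
but plain stdio stands in here, and the file name and table contents are
made up for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

#define NWORDS 1024  /* hypothetical keyword/state table size */

/* One-time step: compute the table and write it out, instead of
   compiling it in as initialized global data. */
static int build_table(const char *path)
{
    short table[NWORDS];
    for (int i = 0; i < NWORDS; i++)
        table[i] = (short)(i * 3);   /* stand-in for real table data */
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    size_t n = fwrite(table, sizeof table[0], NWORDS, f);
    fclose(f);
    return n == NWORDS ? 0 : -1;
}

/* At startup: load the precomputed table into heap storage instead of
   running initialization code.  (On the Mac: GetResource, then lock it.) */
static short *load_table(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;
    short *table = malloc(NWORDS * sizeof *table);
    if (table != NULL && fread(table, sizeof *table, NWORDS, f) != NWORDS) {
        free(table);
        table = NULL;
    }
    fclose(f);
    return table;
}
```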

	I'm using this technique in a version of PEdit (the sample editor
that goes with Capps Prime) for keyword tables (for boldfacing Toolbox
calls), and it works beautifully for me.

>  ABOVE ALL, 32K IS HALF AS MUCH AS THE LIMIT SHOULD BE. Whoever is in charge
>  of writing linkers for Mac compilers could instantly double the amount of 
>  global data simply by pointing A5 at the MIDDLE of the global data 

	It's an interesting idea, but since I'm not writing the linker, all
I can do is pass the suggestion along (and it's also a compiler change
that needs to be made, I think).

	By the way, it's worth noting that LightspeedC places no limit on
the number of string and floating-point constants that you can declare,
which helps somewhat...

>  Are you listening Apple, Think?

	Always. :-)

		--Rich

CAVEAT: I'm not a compiler or linker writer, just a wimpy user interface
kinda guy. :-)

Rich Siegel
Quality Assurance Technician
THINK Technologies Division, Symantec Corp.
Internet: singer@endor.harvard.edu
UUCP: ..harvard!endor!singer
Phone: (617) 275-4800 x305

lsr@Apple.COM (Larry Rosenstein) (05/23/88)

In article <6645@cit-vax.Caltech.Edu> wetter@tybalt.caltech.edu.UUCP (Pierce T. Wetter) writes:
>
>  ABOVE ALL, 32K IS HALF AS MUCH AS THE LIMIT SHOULD BE. Whoever is in charge

Recall that positive offsets from A5 are used for jump table entries.  These
are necessary for cross-segment procedure calls.  So the other half of the
global space is not totally unused.

mce@tc.fluke.COM (Brian McElhinney) (05/27/88)

In article <10896@apple.Apple.Com> dan@apple.UUCP (Dan Allen) writes:
> I agree that a 32K global data limit is stupid.  But the workaround in
> most cases is so trivial (especially in C) that why waste time worrying
> about it?  USE THE HEAP!

You misunderstand.  Flex doesn't create buffers, but static, initialized data
structures.  It uses C as a sort of high level assembler.  And, no, it is not
trivial to use resources either.  Developing one software tool based on
another is obviously extremely desirable, and the 32K limit should be taken
out and shot.

> It is very rare for an application to REALLY NEED more than 32K of global
> data.  I submit that if an app is using more than 32K, then it is written
> wrong, that is, the allocation of the space should be at a more local scope
> or whatever.

I submit that you cannot call any program right or wrong based on some
arbitrary metric!  Software history is full of people making this same
mistake.  Software grows and grows and grows.  There are also conflicts
between "well written" and "maintainable" when you start applying rules
that define "good" programs.

> If you have a specific problem, I would be interested in hearing about it to
> see if my conjecture (that no well written app NEEDS more than 32K of global
> data) is actually true or not.

How about GNU Emacs: 778240 bytes of code, 212992 bytes of global data = 27%.

Emacs has been called a pig, but not non-trivial or badly written.  I have
never, ever, seen a more useful program (including MPW Shell).  I only wish it
"did graphics".

> The MPW percentage is actually a bit high because it is written in C which
> does a lot of global data initialization which is included in the code size,
> whereas HyperCard is written in MPW Pascal and the initialization code is not
> automatic, although HyperCard does initialize portions of its global data.

I have always suspected that the REAL reason resources were used in the
Macintosh was not because it is such a good idea, but rather because Pascal
does not allow initialized data structures.  :-)


Brian McElhinney
mce@tc.fluke.com