[comp.os.vms] Flex and DEC multi-nationals, help!

earleh@eleazar.dartmouth.edu (Earle R. Horton) (05/24/88)

I am trying to port Flex to Macintosh Programmer's Workshop C, which
like VAX C treats characters as signed.  Has anyone in this part of
the world had any luck making Flex scan DEC multi-national characters
properly?
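
A minimal sketch of why signed characters matter here, assuming the scanner uses each input character as a table index. The function names are hypothetical and are not taken from the Flex sources:

```c
#include <stddef.h>

/* On compilers where plain char is signed (VAX C, MPW C), a byte such as
 * 0233 becomes a negative int and would index before the table start. */
int index_as_signed(signed char c)
{
    return c;
}

/* Going through unsigned char first yields 0..255 on any compiler. */
int index_as_unsigned(char c)
{
    return (unsigned char)c;
}

/* Count the bytes in the range 0200-0377 using a 256-entry histogram. */
size_t count_high_bit(const char *buf, size_t len)
{
    size_t hist[256] = {0};
    size_t i, n = 0;

    for (i = 0; i < len; i++)
        hist[index_as_unsigned(buf[i])]++;
    for (i = 0200; i <= 0377; i++)
        n += hist[i];
    return n;
}
```

Any table lookup in the generated scanner would need the same cast before indexing.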

The best I have been able to do so far is to get Flex and its scanners
to pass characters in the range [200-377] (octal) unchanged, but the
following generates errors:

%%

<char with high bit set>

	{
		printf("High bit!");
	}
%%

Thanks.
*********************************************************************
*Earle R. Horton, H.B. 8000, Dartmouth College, Hanover, NH 03755   *
*********************************************************************

jdc@naucse.UUCP (John Campbell) (05/25/88)

I'm posting this because I could not find a uucp path to eleazar.  If
anyone is interested in flex esoterica, read on.

In article <8546@dartvax.Dartmouth.EDU>, earleh@eleazar.dartmouth.edu (Earle R. Horton) writes:
> I am trying to port Flex to Macintosh Programmer's Workshop C, which
> like VAX C treats characters as signed.  Has anyone in this part of
> the world had any luck making Flex scan DEC multi-national characters
> properly?
> 
> The best I have been able to do so far is to get Flex and its scanners
> to pass [200-377] unchanged, but ...

Well, I wanted to do the same thing using the original lex (I think
the following will hold for flex as well).  The best I could do was
fold the characters with the upper bit set back to 7-bit ASCII and then
build patterns that worked on the 7-bit representation (I wanted <CSI>,
of course, which folds to <ESC>).  The macro looked something like the
following lex fragment.

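: %{
: /* EOF (-1) masked with 0x7f reads as 127, so test for that instead.
:    Side effect: a real DEL (0x7f) or 0xff byte also ends the input. */
: #define NewEOF 127
: 
: /* Change lex's input to allow us to think csi (9b) is esc (1b):
:    every character read from yyin is masked down to 7 bits. */
: #undef input   /* lex defines input() earlier in lex.yy.c; replace it */
: # define input() (((yytchar=yysptr>yysbuf?U(*--yysptr):getc(yyin)&0x7f)\
: ==10?(yylineno++,yytchar):yytchar)==NewEOF?0:yytchar)
: 
: /* Done with lex substitution. */
: %}
: csi     "\033"
: eseq1   {csi}[ -/]*[0-~]
: eseq2   {csi}\[[0-?]*[ -/]*[@-~]
: eseq3   {csi}[0-?]*[ -/]*[@-~]
: %%
: {eseq1}         {/* Ignore */ }
: {eseq2}         {/* Ignore */ }
: {eseq3}         {/* Ignore */ }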
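
The folding done by the input() macro above can be sketched in plain C, outside of lex. This is only an illustration: fold7, folded_getc, and NEW_EOF are hypothetical names, and the sketch assumes EOF is -1 on a two's-complement machine.

```c
#include <stdio.h>

#define NEW_EOF 127   /* what EOF (-1) becomes after masking with 0x7f */

/* Fold an 8-bit code to its 7-bit counterpart, e.g. CSI (0x9b) to ESC (0x1b). */
int fold7(int c)
{
    return c & 0x7f;
}

/* Read one byte from fp, folded to 7 bits.  Returns 0 at end of input,
 * which is the value lex's input() uses to signal end of file.
 * Caveat: a real DEL (0x7f) or 0xff byte is also treated as end of input. */
int folded_getc(FILE *fp)
{
    int c = fold7(getc(fp));

    return c == NEW_EOF ? 0 : c;
}
```

The price of this approach is that the scanner can no longer tell <CSI> from <ESC>, or any other 8-bit character from its 7-bit twin.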
 
Note that flex doesn't have the same tables feature as lex, but I couldn't
extend the lex tables anyway.  Building a special version of flex that can
handle 255-character tables might not be too hard.  If I am right, you are
getting hit because flex assumes 127 characters in its character table.

You might try playing with CSIZE in flexdef.h (defined as 127).  I'm not sure
whether this will affect other values (like INITIAL_MAX_CCL_TBL_SIZE, etc.).  A
note to Vern Paxson (ucbvax!lbl-csam.arpa!vern) asking about the impact of
this change, with a plea for supporting character sets larger than
127 characters, may even be reasonable.
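
To illustrate the suspected limit, here is a rough sketch that assumes the character table is indexed 0..CSIZE. Only the value 127 comes from flexdef.h as described above; CSIZE_7BIT, CSIZE_8BIT, fits_table, and table_index are hypothetical names, and real flex internals may differ.

```c
#include <limits.h>

#define CSIZE_7BIT 127        /* value reported for CSIZE in flexdef.h */
#define CSIZE_8BIT UCHAR_MAX  /* hypothetical 8-bit setting (255) */

/* Return nonzero if character code c fits a table indexed 0..csize. */
int fits_table(int c, int csize)
{
    return c >= 0 && c <= csize;
}

/* Map a char to a table index, or -1 when it falls outside the table. */
int table_index(char ch, int csize)
{
    int c = (unsigned char)ch;   /* avoid sign extension on signed-char compilers */

    return fits_table(c, csize) ? c : -1;
}
```

With the 7-bit limit, table_index rejects <CSI> (0x9b); with the 8-bit limit it is accepted, which is the behavior a rebuilt flex would need.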

Sorry I don't have the final answer, but I do sympathize.
-- 
	John Campbell               ...!arizona!naucse!jdc

	unix?  Sure send me a dozen, all different colors.
