joel@techunix.BITNET (Yossi (Joel) Hoffman) (03/22/90)
Hi folks! I was trying to use LEX to process a text (yes, text) file that happens to use all eight bits (the 8th bit signifies Hebrew text). I just inserted the 8-bit letters in the usual way, but LEX choked on it. (It didn't produce any C output at all.) This couldn't just be a coincidence; is there any way I can tell LEX that I'm going to use all 8 bits?

Any help will be much appreciated.

-Joel (joel@techunix.technion.ac.il -or- joel@techunix.BITNET)
martin@mwtech.UUCP (Martin Weitzel) (03/23/90)
In article <9463@discus.technion.ac.il> joel%techunix.bitnet@jade.berkeley.edu (Yossi (Joel) Hoffman) writes:
>Hi folks! I was trying to use LEX to process a text (yes, text) file
>that happens to use all eight bits (the 8th bit signifies Hebrew text).
>I just inserted the 8-bit letters in the usual way, but LEX choked on
>it. (It didn't produce any C output at all.) This couldn't just be
>a coincidence; is there any way I can tell LEX that I'm going to use
>all 8 bits?
>Any help will be much appreciated.

Though there are some efforts to make U*IX '8-bit clean', I have not yet seen an implementation of 'lex' that supports 8-bit characters. The major problem is that 'lex' uses the 8th bit for its own purposes in the compiled representation of the regular expressions (and it seems that no one at AT&T or at the software companies that port U*IX is willing to dig into the sources of 'lex' ... :-()

SO BE AWARE: Even if 'lex' produces a compilable 'lex.yy.c', the behaviour may be strange if you feed it input with the 8th bit set! (This specific problem hit me some time ago, and I spent hours tracking down the root of the behaviour. The pity is that only a *few* characters trigger the erratic situation. So if SOME test input seems to be processed correctly under SOME circumstances, you have no guarantee that ALL input will be processed correctly under ALL circumstances!)

Whether there are workarounds or not depends on your problem. If you only want to process all characters with the high bit set in some more or less uniform way, you can roll your own version of the 'input' macro and translate the 8-bit characters to some other representation. E.g., you can keep a buffer parallel to 'yytext' where you store the 'real' input, but let the macro return a common representation for all the characters that you treat the same way anyhow.
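The translating input routine suggested above might look roughly like the following minimal sketch in C. Everything here is hypothetical: the names `eightbit_input`, `raw_buf`, `raw_reset` and the placeholder value 0x01 are illustrative choices, not part of any lex. It assumes you can hand the scanner a routine that reads from a `FILE *` (in a real scanner you would typically `#undef input` and redefine it in the definitions section; how that macro interacts with `unput()`, look-ahead and `yylineno` differs between lex implementations, and this sketch does not keep the parallel buffer in step with pushed-back characters).

```c
#include <stdio.h>

/* Hypothetical placeholder byte: every character with the 8th bit set is
   handed to the scanner as this 7-bit value.  Any otherwise unused code
   would do; 0x01 is an arbitrary choice for this sketch. */
#define HIGH_PLACEHOLDER 0x01

/* Size of the parallel "real input" buffer (arbitrary for this sketch). */
#define RAW_MAX 4096

/* Parallel buffer: the untranslated bytes, in the order they were read. */
static unsigned char raw_buf[RAW_MAX];
static size_t raw_len = 0;

/* Forget the remembered bytes, e.g. once a token's real text is consumed. */
static void raw_reset(void)
{
    raw_len = 0;
}

/* Replacement input routine (sketch).  Reads one byte from fp, remembers
   the original byte in raw_buf, and returns a 7-bit value: plain ASCII
   passes through unchanged, bytes with the 8th bit set become
   HIGH_PLACEHOLDER.  Returns 0 at end of file, mirroring classic lex,
   whose input() also returns 0 there (so a NUL byte in the input is
   indistinguishable from EOF, just as with the original). */
static int eightbit_input(FILE *fp)
{
    int c = getc(fp);   /* getc yields 0..255 or EOF, never a negative char */

    if (c == EOF)
        return 0;
    if (raw_len < RAW_MAX)          /* silently stop recording when full */
        raw_buf[raw_len++] = (unsigned char)c;
    return (c & 0x80) ? HIGH_PLACEHOLDER : c;
}
```

With this in place, the lex patterns only ever mention the placeholder (e.g. as an escaped octal character), and an action that needs the actual Hebrew letters reads them back from `raw_buf`.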
[To the poster: If you need any further hints, mail me a little more about your problem.]

As a general rule, avoid characters outside the range 1 .. 127 in your input as well as in the regular expression specification!

(BTW: Who knows how the PD Version FLEX handles this?)
--
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83
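The general rule above can be checked mechanically before a file ever reaches a classic lex scanner. The following is a minimal sketch; the function name `first_unsafe_byte` is hypothetical. It reports the offset of the first byte outside 1 .. 127 (a NUL or a byte with the 8th bit set), or -1 if the whole buffer stays in the safe range.

```c
#include <stddef.h>

/* Return the offset of the first byte outside 1..127 (NUL, or 8th bit
   set), or -1 if every byte in buf[0..len) is in the safe range. */
static long first_unsafe_byte(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == 0 || buf[i] > 127)
            return (long)i;
    }
    return -1;
}
```

Run over a whole input file (read with `fread`, say), this gives a quick yes/no before trusting an old lex with the data; it does not, of course, make lex itself 8-bit clean.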
terry@pride386.UUCP (Terry Lyons) (03/23/90)
In article <9463@discus.technion.ac.il>, joel@techunix.BITNET (Yossi (Joel) Hoffman) writes:
> is there any way I can tell LEX that I'm going to use
> all 8 bits?
> Any help will be much appreciated.
>
Yes: declare all chars as unsigned.

terry
--
UUNET ...!pride386!terry    FAX (714) 739 - 2203
Pern is a dragon's best friend
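Terry's advice presumably targets sign extension: where plain `char` is signed, a byte such as 0xE0 converts to a negative `int`, and any table indexed by character value is then indexed out of bounds. A minimal sketch of the difference follows; `char_class`, `class_of` and `as_plain_int` are hypothetical stand-ins for the per-character tables a generated scanner uses. Note that this only cures sign-extension bugs in code you control; it does nothing about lex's own use of the 8th bit described earlier in the thread.

```c
#include <limits.h>

/* Hypothetical per-character table, standing in for the kind of table a
   generated scanner indexes by input byte. */
static int char_class[UCHAR_MAX + 1];

/* Safe lookup: converting through unsigned char maps every byte,
   including those with the 8th bit set, into 0..UCHAR_MAX. */
static int class_of(char c)
{
    return char_class[(unsigned char)c];
}

/* The unsafe value: where plain char is signed, a byte such as 0xE0
   becomes a negative int here, which would be a bad array index. */
static int as_plain_int(char c)
{
    return (int)c;
}
```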
knighten@pinocchio (Bob Knighten) (03/26/90)
A recent posting on comp.compilers:

From: vern@cs.cornell.edu (Vern Paxson)
Newsgroups: comp.sources.d,comp.compilers
Subject: flex 2.2 alpha release available
Summary: anonymous ftp to svax.cs.cornell.edu or ftp.ee.lbl.gov
Keywords: flex, lex, scanner
Message-ID: <1990Mar21.153942.3237@esegue.segue.boston.ma.us>
Date: 21 Mar 90 15:39:42 GMT
Reply-To: vern@cs.cornell.edu (Vern Paxson)
Followup-To: comp.sources.d
Organization: Cornell Univ. CS Dept, Ithaca NY
Lines: 46

Release 2.2 of flex, a lex replacement, is now available. You can get it via anonymous ftp to svax.cs.cornell.edu (128.84.254.2, East coast) or ftp.ee.lbl.gov (128.3.254.68, West coast). Retrieve flex-2.2.alpha.tar.Z, using binary mode.

The more interesting changes between 2.2 and the previous 2.1 release are:

    - Full user documentation.
    - Support for 8-bit scanners.
    - Scanners now accept NULs.
    - A facility has been added for dealing with multiple input buffers.
    - A number of changes to bring flex closer into compliance with the latest POSIX lex draft.
    - C++ support; generated scanners can be compiled with a C++ compiler.
    - Support for MS-DOS, VMS, and Turbo-C integrated.

This is an alpha release. There are a number of new features which may not work quite right and which may have broken previous functionality. Because of this, I'd like to keep the distribution of this release limited to folks who don't mind that the software may be buggy and who are willing to report bugs back to me so I can fix them. Once the number of new bugs being found drops off sufficiently, a beta release will be made and posted to the Usenet, probably to comp.sources.unix. If the alpha release proves particularly stable, the beta will be skipped and 2.3 will instead be a full release. The intent is that in either case, 2.3 will come out by the end of May.

If you don't have anonymous ftp access, let me know and I'll mail you the uuencoded tar file.

        Vern

    Vern Paxson                 vern@cs.cornell.edu
    Computer Science Dept.      decvax!cornell!vern
    Cornell University          vern@LBL (bitnet)
--
Send compilers articles to compilers@esegue.segue.boston.ma.us
{spdcc | ima | lotus}!esegue. Meta-mail to compilers-request@esegue.
Please send responses to the author of the message, not the poster.
rsalz@bbn.com (Rich Salz) (03/26/90)
In article <9463@discus.technion.ac.il> joel%techunix.bitnet@jade.berkeley.edu (Yossi (Joel) Hoffman) writes:
> is there any way I can tell LEX that I'm going to use
>all 8 bits?

In <691@mwtech.UUCP> martin@mwtech.UUCP (Martin Weitzel) writes:
>(BTW: Who knows how the PD Version FLEX handles this?)

The latest version of FLEX, which just entered beta-test, handles eight-bit input. See comp.compilers for the test announcement. It will appear in comp.sources.unix sometime after the beta-test is done.
    /rich $alz
--
Please send comp.sources.unix-related mail to rsalz@uunet.uu.net.
Use a domain-based address or give alternate paths, or you may lose out.