[comp.text] String functions in TeX: instr and extract

armstrng@cs.dal.ca (Stan Armstrong) (08/30/89)

I need to do some simple string manipulation in TeX.  I would like to
redefine the \input command so that an alternate file extension is used
when the file is not found.  In order to do this, I need an instr function
to find the position of the file extension within the filename, and an
extract function to remove the extension.  I've looked through the TeXbook
and could not find anything useful, and I've also looked through an issue
of Tugboat which had an example using \expandafter to locate a substring
within a string, but did not return the location of the substring.

Help!  I don't have a lifetime to devote to learning TeX so that I can
write these functions myself.  Surely someone out there must have a working
solution.

It would be nice if the instr function could find the first occurrence after
a specified position within the string, although that could be simulated
using the extract function.  I need to recognize the first period which is
not between square brackets as the beginning of the file extension since
VMS filenames are of the form:

nodename::devicename:[directory.subdirectory]filename.extension;version

Any hints, suggestions, etc. would be deeply appreciated.

Ben Armstrong (UUCP: armstrng@dalcs BITNET: armstrong@STMARYS)

grunwald@anchor.colorado.edu (Dirk Grunwald) (08/30/89)

would

\def\VmsFileName#1::#2:[#3.#4]#5.#6;#7{
 \gdef\nodename{#1}
 \gdef\devicename{#2}
	..etc..
}

work?

chris@mimsy.UUCP (Chris Torek) (08/31/89)

In article <1989Aug30.133802.15579@cs.dal.ca> armstrng@cs.dal.ca
(Stan Armstrong) writes:
>I need to do some simple string manipulation in TeX.

The usual trick is to define delimited macros, e.g.,

	\def\@get#1.#2\@@{\def\@got{#1}}
	\def\namepart#1{\@get#1.\@@\@got}

then `\namepart{a.b}' expands to \@get a.b.\@@, which makes \@get's
#1 be `a' and #2 be `b.'.  If you give \namepart no `.', the second `.'
supplies the one \@get needs, so that \namepart{a} expands to \@get a.\@@,
making #1 be `a' and #2 be empty (no tokens at all, not an implicit space;
see the rules for delimited parameters in _The_TeXbook_).

A similar (but different) trick can be used to extract the `ext' part
`b'.  It is a bit more complicated.
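
A minimal sketch of that second trick, assuming plain TeX.  The names
\@getext, \@stripdot, \@ext and \extpart are made up for illustration;
\empty is plain TeX's empty macro.  The complication is that the `.'
appended as a sentinel lands inside #2 whenever the name has a `.' of
its own, and has to be stripped off again:

	\catcode`@=11
	% #2 is everything after the first `.', sentinel `.' included
	\def\@getext#1.#2\@@{\def\@tempa{#2}%
	  \ifx\@tempa\empty \def\@ext{}% no `.' in the name: no extension
	  \else \@stripdot#2\@@ \fi}
	% drop the sentinel `.' that \extpart appended
	\def\@stripdot#1.\@@{\def\@ext{#1}}
	\def\extpart#1{\@getext#1.\@@}
	\catcode`@=12

After \extpart{a.b}, \@ext holds `b'; after \extpart{a} it is empty.
A name with two periods, such as a.b.c, leaves `b.c' in \@ext.  Note
that \@ext can only be typed while `@' is a letter.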

However:

>I would like to redefine the \input command so that an alternate file
>extension is used when the file is not found.

You cannot achieve this.  The only way to test for a file's existence
is to open it; and open always succeeds.  If the file cannot be opened,
TeX interacts with its operator to get a different name---and it will
not take `no name' for an answer!  (I consider this a bug.)

>... I need to recognize the first period which is
>not between square brackets as the beginning of the file extension since
>VMS filenames are of the form:
>
>nodename::devicename:[directory.subdirectory]filename.extension;version

Not that it will do any good (see `However' above), but \catcode tricks
can help here.
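
Short of \catcode tricks, the delimited-macro idea above may already be
enough to skip the periods inside the brackets.  This is only a sketch,
assuming plain TeX and at most one [...] group in the name; \@afterdir,
\@dropbracket, \@rest and \stripdir are hypothetical names:

	\catcode`@=11
	% split at the first `]'; \stripdir appends one `]' as a sentinel
	\def\@afterdir#1]#2\@@{\def\@tempa{#2}%
	  \ifx\@tempa\empty \def\@rest{#1}% no `]' in the name at all
	  \else \@dropbracket#2\@@ \fi}
	% remove the sentinel `]' again
	\def\@dropbracket#1]\@@{\def\@rest{#1}}
	\def\stripdir#1{\@afterdir#1]\@@}
	\catcode`@=12

\stripdir{dev:[dir.sub]file.ext} leaves `file.ext' in \@rest, and
\stripdir{file.ext} leaves `file.ext' unchanged, so the first period in
\@rest is the one that starts the extension.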
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

chris@mimsy.UUCP (Chris Torek) (08/31/89)

>>I would like to redefine the \input command so that an alternate file
>>extension is used when the file is not found.

In article <19344@mimsy.UUCP> I wrote:
>You cannot achieve this.  The only way to test for a file's existence
>is to open it; and open always succeeds.  If the file cannot be opened,
>TeX interacts with its operator to get a different name---and it will
>not take `no name' for an answer!  (I consider this a bug.)

This is not quite true (as ken@cs.rochester.edu pointed out).  You
can first open it with `\openin', and then use `\ifeof' to check for
immediate EOF, which `appears' true for nonexistent files (and, after
testing, it seems it appears false for empty files!).  Then, if and
only if it exists, you can \input it.  Something like this (which I
may start using myself...):

\catcode`@=11 % plain TeX only, not in LaTeX style files.

% Give #1 as an error, with help #2.  Stolen from LaTeX.
\def\@errhelp#1#2{\edef\@tempc{#2}\expandafter\errhelp\expandafter{\@tempc}%
  \errmessage{#1}}

\let\@oldinput=\input
\newread\test@existence
% \openin and \closein always act at once, so the `\immediate' prefixes
% below are harmless but unnecessary.
\def\@inerr#1{\immediate\closein\test@existence
  \@errhelp{Cannot read #1.tex}{The file you named does not exist,
  or cannot be read.^^JIf you want to use a different file,
  type \space I \string\input{file} <return> \space here.}}
% The space after #1 ends the file name, so that text following
% \input{...} in the document is not scanned as part of the name.
\def\@inok#1{\immediate\closein\test@existence\@oldinput #1 }
\def\input#1{\immediate\openin\test@existence=#1
  \ifeof\test@existence\let\next\@inerr\else\let\next\@inok\fi\next{#1}}

\catcode`@=12 % plain TeX only again.

Incidentally, the reason for using \next and \@inerr and \@inok is that
otherwise TeX's error-interaction lets one delete tokens from the
expansion of \input.  (My first version, before I experimented, used
\ifeof...\errmessage{Cannot read #1}\else...\fi; this let you delete
the \else...\fi part.)

(Now it is up to the original poster to decide how to look for
extensions in the argument to \input.)
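
One possible direction, sketched under assumptions: the same
\openin/\ifeof test can choose between two extensions before anything
is \input.  The macro \inputwithfallback and the stream \try@file are
hypothetical names, the base name and both extensions are passed
separately, and \input is assumed to still be the primitive (plain TeX,
without the redefinition above):

\catcode`@=11
\newread\try@file
% #1 = base name, #2 = preferred extension, #3 = fallback extension
\def\inputwithfallback#1#2#3{%
  \openin\try@file=#1.#2 % the space ends the file name
  \ifeof\try@file % true at once when the file could not be opened
    \def\next{\input #1.#3 }%
  \else
    \def\next{\input #1.#2 }%
  \fi
  \closein\try@file
  \next}
\catcode`@=12

For example, \inputwithfallback{chapter1}{tex}{ltx} reads chapter1.tex
if it can be opened and chapter1.ltx otherwise.  If the fallback is
missing too, TeX prompts for a name as usual.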

armstrng@cs.dal.ca (Stan Armstrong) (08/31/89)

In article <GRUNWALD.89Aug30090437@anchor.colorado.edu> grunwald@flute.cs.uiuc.edu writes:
>
>would
>
>\def\VmsFileName#1::#2:[#3.#4]#5.#6;#7{
> \gdef\nodename{#1}
> \gdef\devicename{#2}
>	..etc..
>}
>
>work?

I don't see how that helps.  We currently have hundreds of documents which
contain \input{vmsfilename} commands.  My bias is against changing all of
these commands to something else.  That rules out a list of parameters.
Besides, any one of the above parts may be missing from the filename and
will be defaulted by VMS (except for the name and extension).  Furthermore,
the directory specification may contain any number of subdirectories.
For example, I need to be able to parse the following:

     \input{devname:file.ext}
     \input{[dir.subdir1]file.ext}
     \input{devname:[dir.subdir1.subdir2]file.ext}

The only way I can think of to solve this is using instr and extract
functions as described in my first posting.

Any more suggestions anyone?

Ben Armstrong (UUCP: armstrng@dalcs BITNET: armstrong@STMARYS)

bts@sas.UUCP (Brian T. Schellenberger) (09/06/89)

In article <1989Aug31.142820.18594@cs.dal.ca> armstrng@cs.dal.ca.UUCP (Stan Armstrong) writes:
|In article <GRUNWALD.89Aug30090437@anchor.colorado.edu> grunwald@flute.cs.uiuc.edu writes:
|>
|>would
|>
|>\def\VmsFileName#1::#2:[#3.#4]#5.#6;#7{
|> \gdef\nodename{#1}
|> \gdef\devicename{#2}
|>	..etc..
|>}
|>
|>work?
|
|I don't see how that helps.  We currently have hundreds of documents which
|contain \input{vmsfilename} commands.  My bias is against changing all of
|these commands to something else.  That rules out a list of parameters.

No, it doesn't.  You can redefine \input like:

\def \input #1 {\VmsFileName #1 . . . .}

Actually, you would want to add a \endVmsFileName to the end, and supply
it from \input; this prevents you from reading too far if you get an odd-
looking filename.

|Besides, any one of the above parts may be missing from the filename and
|will be defaulted by VMS (except for the name and extension).  

This is a serious problem.  You would actually have to break up the
"VmsFileName" into a sequence of pieces, and supply defaults; e.g.,

\def \Empty {}
% The pieces are passed on without braces: a brace group would hide
% the delimiters from the macro being called.
\def \VmsOne #1.#2\EndOne {\let\devname=\Empty \let\nodename=\Empty . . .
	\let\ver=\Empty
	\VmsTwo #1::\EndTwo  \VmsEnd #2;\EndEnd}
\def \VmsTwo #1::#2\EndTwo {\def\tmp{#2} \ifx\tmp\Empty
	\def\filename{#1}\else
		\def\nodename{#1} \VmsThree #2:\EndThree \fi}
\def \VmsThree #1:#2::\EndThree { . . .}

. . . what we do is to pass the parts to other macros, each of which
is given the delimiter it needs so that TeX will always find it.
Then we test to see if the delimiter was *really* there by seeing if
the second argument was empty.  Now, the only problem with this is that
the extra :: the caller tacked onto the VmsTwo argument is still there
if it *wasn't* empty.  No problem, we just add it onto the stuff that 
\VmsThree expects as delimiter, and it gets neatly tossed for us.
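
To make the sentinel idea concrete, here is a small self-contained
sketch for the node-name stage only, assuming plain TeX.  It uses a
slightly different way of discarding the extra `::' (a separate
stripping macro rather than a longer delimiter), and the names
\ParseNode, \SplitNode, \StripSentinel and \rest are made up for the
example:

\def\Empty{}
% the caller always appends `::', so the delimiter is sure to be found
\def\ParseNode#1{\SplitNode#1::\EndNode}
\def\SplitNode#1::#2\EndNode{\def\tmp{#2}%
  \ifx\tmp\Empty
    \def\nodename{}\def\rest{#1}% only the appended `::' was there
  \else
    \def\nodename{#1}\StripSentinel#2\EndStrip
  \fi}
% #2 above still ends with the appended `::'; remove it here
\def\StripSentinel#1::\EndStrip{\def\rest{#1}}

\ParseNode{node::dev:file.ext} sets \nodename to `node' and \rest to
`dev:file.ext'; \ParseNode{file.ext} leaves \nodename empty and sets
\rest to `file.ext'.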

|Furthermore,
|the directory specification may contain any number of subdirectories.
|For example, I need to be able to parse the following:
|
|     \input{devname:file.ext}
|     \input{[dir.subdir1]file.ext}
|     \input{devname:[dir.subdir1.subdir2]file.ext}

No problem.  If you have:

\def \foo #1[#2.#3]#4 {. . .

and you call it as

\foo a[b.c.d]e

Then all that happens is:
#1 <- a
#2 <- b
#3 <- c.d
#4 <- e

No problem.  You can iteratively call an #1.#2 parsing function, if you 
*really* need to know all the components (and quite frankly, I can't see
why you want anything other than 

[#1]

but perhaps there's some good reason . . .).
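
A rough sketch of such an iterated #1.#2 parser, assuming plain TeX
(\empty and \next come from plain; \SplitDots, \EachDot and \EndDots
are hypothetical names).  It just shows each component with \message,
where a real version would store it somewhere:

\def\SplitDots#1{\EachDot#1.\EndDots}% append `.' as a sentinel
\def\EachDot#1.#2\EndDots{%
  \message{[#1]}% do something with one component
  \def\tmp{#2}%
  \ifx\tmp\empty \let\next\relax   % nothing left but the sentinel
  \else \def\next{\EachDot#2\EndDots}\fi
  \next}

\SplitDots{b.c.d} visits `b', `c' and `d' in turn.  The recursive call
is made through \next, after the \fi, so the conditional is closed
before the next round begins.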
-- 
-- Brian, the Man from Babble-on.		...!mcnc!rti!sas!bts
--
"Every jumbled pile of person has a thinking part that wonders what the part
that isn't thinking isn't thinking of" -- THEY MIGHT BE GIANTS

gm@romeo.cs.duke.edu (Greg McGary) (09/09/89)

In article <19344@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>The only way to test for a file's existence
>is to open it; and open always succeeds.  If the file cannot be opened,
>TeX interacts with its operator to get a different name---and it will
>not take `no name' for an answer!  (I consider this a bug.)

I agree that it's a bug that you can't check for file existence without
opening and risking interaction with TeX, but TeX will take `no name'
for an answer--that's what `null.tex' is for.  When TeX forces you to
type a file-name, type `null' and you can escape!
-- Greg McGary
-- 10310 Main Street #109, Fairfax, Virginia 22030    	(703) 352-0407
-- {decvax,hplabs,uunet,mcnc}!duke!gm
--                                 gm@cs.duke.edu