[comp.unix.questions] why does -vi- set the hi bit when expanding `%' and `#'?

norm@oglvee.UUCP (Norman Joseph) (01/09/89)

From article <8700002@gistdev>, by dlp@gistdev.UUCP:
> => oglvee.UUCP!norm says:
> =>     I am editing a file with vi (say `file.c'), and I want to
> => print it without leaving the editor, so I escape to the command
> => line by hitting `:' and I type `` :!lp % ''.
> 
> Just out of curiosity, why don't you just `` :w !lp '' ?  I assume
> your `lp` reads stdin if there are no filename arguments.

You are the second poster to suggest this approach to me, and to be
honest, it never occurred to me (even though I use the related
command `` :r !<command> '').  This approach works on my system,
and I may never need to use the `%' macro again (except when I want
to use a command that -doesn't- read from stdin!).  I thank both you
and the other poster (fcival!dac) for the suggestion.

Good, so now I've learned a new vi trick, but I -haven't- learned why
vi sets extra bits in the characters of the file name when expanding
the `%' and `#' file name macros, which was the intent of my original
posting.

In article <15219@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) writes:
> vi believes that by setting bit 7, it is quoting the file name,
> so that if you are editing the file `foo*bar.c', the command
> 
> 	!echo %
> 
> produces
> 
> 	!echo \f\o\o\*\b\a\r\.\c
> 
> in shell-internal-quoting format (bit 7 set).

Maybe I'm just thick, or maybe I was home sick the day they explained
``shell-internal-quoting format'' to everyone, but would some kind
soul who knows what Chris is talking about care to fill me in? (E-mail
would be fine.  I'm sure people are falling asleep even as we speak :^).
Is this the same as quoting sh meta-characters with '\'?  Is this
something I need to care about beyond being curious?
-- 
Norm Joseph - Oglevee Computer System, Inc.
UUCP: ...!{pitt,cgh}!amanue!oglvee!norm
"Mate, that parrot wouldn't *VROOM* if you put four million volts through it!"

guy@auspex.UUCP (Guy Harris) (01/12/89)

>Maybe I'm just thick, or maybe I was home sick the day they explained
>``shell-internal-quoting format'' to everyone, but would some kind
>soul who knows what Chris is talking about care to fill me in?

Inside most versions of the Bourne, C, and Korn shells (and maybe the V6
and PWB shells as well), strings containing quoted characters (yes,
"quoted" as in "protected either with double-quotes or single-quotes, or
with a backslash," so yes,

>Is this the same as quoting sh meta-characters with '\'?

it is the same) are represented by turning the 8th bit of a byte
containing a quoted character on.  "vi", in a rather slimy move, "knew"
that this was the case, and instead of using, say, backslashes or
single-quotes to quote characters in file names, it turned the 8th bit
of the bytes containing those characters on, under the assumption that

	1) the 8th bit would be passed through the shell intact

and

	2) would thus be interpreted as meaning the characters were
	   quoted.

Unfortunately, more recent versions of the Bourne and Korn shells do
*not* use the 8th bit for this purpose, because they support 8-bit
character sets.  As such, while 1) is true, 2) isn't.
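
To make the mechanism concrete, here is a minimal sketch in Python of
the convention described above.  It is a toy model only: the function
names (vi_expand, old_shell_read, new_shell_read) are made up for
illustration and are not taken from the source of "vi" or of any shell.

```python
from typing import List, Tuple

QUOTE_BIT = 0x80  # bit 7, i.e. the "8th bit" of a byte


def vi_expand(name: bytes) -> bytes:
    """Hypothetical model of old vi expanding `%': set bit 7 on every byte."""
    return bytes(b | QUOTE_BIT for b in name)


def old_shell_read(word: bytes) -> Tuple[bytes, List[bool]]:
    """Toy model of a 7-bit shell: bit 7 marks a quoted character, then is cleared."""
    chars = bytes(b & 0x7F for b in word)
    quoted = [bool(b & QUOTE_BIT) for b in word]
    return chars, quoted


def new_shell_read(word: bytes) -> bytes:
    """Toy model of an 8-bit-clean shell: bytes pass through unchanged."""
    return word


name = b"foo*bar.c"
marked = vi_expand(name)
chars, quoted = old_shell_read(marked)
print(chars, all(quoted))      # old shell: original name back, every character quoted
print(new_shell_read(marked))  # new shell: the name arrives with bit 7 still set
```

In the second case the shell no longer sees a `*' at all, so nothing is
globbed, but the command is handed a file name that does not exist.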

>Is this something I need to care about beyond being curious?

It's useful to keep the "8th bit" convention in mind if you may be
working on a system whose shell uses it (older - pre S5R3 - Bourne
shells, older - pre-"ksh-i" Korn shells, and all currently-available
versions of the C shell that I know of), since you won't be able to use
8-bit character sets when typing commands to those shells.  If your OS
supports file names with 8 bit characters, for example, and a file with
such characters in its name is created, you may have trouble removing it
if you are using such a shell.

It's also useful to keep in mind that using the 8th bit in such a
fashion - or other fashions - interferes with support for 8-bit
character sets, such as the ISO 8859 character sets that include
accented characters for Western European languages other than English.

domo@riddle.UUCP (Dominic Dunlop) (01/16/89)

[Already it's hard to keep track of who's quoting whom in this thread.
Sorry if I've got it wrong...]

In article <450@oglvee.UUCP> norm@oglvee.UUCP (Norman Joseph) writes:
[Stuff about vi setting the high bit of each character in the filenames it
produces when expanding `%' and `#' on shell command lines omitted.]
>In article <15219@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) writes:
>> vi believes that by setting bit 7, it is quoting the file name,
>> so that if you are editing the file `foo*bar.c', the command
>> 
>> 	!echo %
>> 
>> produces [in effect]
>> 
>> 	!echo \f\o\o\*\b\a\r\.\c
>> 
>> in shell-internal-quoting format (bit 7 set).
>
>Maybe I'm just thick, or maybe I was home sick the day they explained
>``shell-internal-quoting format'' to everyone, but would some kind
>soul who knows what Chris is talking about care to fill me in? (E-mail
>would be fine.  I'm sure people are falling asleep even as we speak :^).
>Is this the same as quoting sh meta-characters with '\'?
                                ^^^^
Yes, except that, strictly, the backslash can be used to quote any character:
it's just that the quoting is a no-op on any character other than a
metacharacter.  (Yes, this topic has scope for soporific semantic pedantry.)
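
A rough way to see the no-op is with Python's standard shlex module,
which approximates Bourne-style word splitting in its POSIX mode.  It is
not a real shell (it does no globbing, for one), and the helper name
shell_words is made up for this sketch.

```python
import shlex


def shell_words(line: str) -> list:
    """Split a command line roughly the way a POSIX shell would."""
    return shlex.split(line, posix=True)


# Every character backslash-quoted, as in Chris Torek's example.
print(shell_words(r"echo \f\o\o\*\b\a\r\.\c"))

# Only the metacharacter quoted; the resulting words are the same.
print(shell_words(r"echo foo\*bar.c"))
```

Both calls yield the words `echo' and `foo*bar.c'; the backslashes in
front of the ordinary letters change nothing.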

>Is this
>something I need to care about beyond being curious?

No.  Apart from anything else, it's obsolescent, and its use by
applications software has been deprecated for A Long Time (this deprecation
having been broadcast in the same way as information about the `feature'
itself -- that is, by word of mouth).  As I understand it, we finally get
to say goodbye to bit seven internal quoting with the System V, release 4
version of the shell.  It's possible that it's been eliminated in V.3.1 and
later as well.  Comments, anybody?

Why has it gone?  Because it's a real pain in the butt for users of
character sets which require all eight bits of a byte in order to represent
all alphabetic characters.  This turns out to mean most Europeans.
(Asian character sets are something else again.)  Having the shell
interpret that eighth bit as a quote, then clear it, mangles text which
includes characters (usually accented letters) which ANSI didn't think
of all those years ago.
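
A small illustration of the mangling, as a Python sketch.  It assumes
ISO 8859-1 (Latin-1) as the character set, and strip_quote_bit is a
made-up name standing in for what a bit-seven-quoting shell does to each
byte; it is not code from any actual shell.

```python
def strip_quote_bit(data: bytes) -> bytes:
    """Clear bit 7 of every byte, as a shell using it for quoting would."""
    return bytes(b & 0x7F for b in data)


word = "caf\u00e9".encode("latin-1")  # e-acute is the single byte 0xE9
mangled = strip_quote_bit(word)       # 0xE9 & 0x7F == 0x69, which is 'i'
print(mangled.decode("ascii"))        # prints "cafi"
```

Plain 7-bit text passes through unchanged, which is why the problem went
unnoticed for so long by those of us who write only in English.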

The 1003.2 working group of the IEEE is drafting a standard for the shell
command language.  I don't have it to hand, but, as I recall, it
effectively outlaws eighth bit quoting in the shell.

gwyn@smoke.BRL.MIL (Doug Gwyn ) (01/19/89)

In article <969@riddle.UUCP> domo@riddle.UUCP (Dominic Dunlop) writes:
-As I understand it, we finally get
-to say goodbye to bit seven internal quoting with the System V, release 4
-version of the shell.  It's possible that it's been eliminated in V.3.1 and
-later as well.  Comments, anybody?

Yes, it's already gone.  Also, it was officially announced by AT&T.
No need to rely on "word of mouth".

guy@auspex.UUCP (Guy Harris) (01/20/89)

>As I understand it, we finally get to say goodbye to bit seven internal
>quoting with the System V, release 4 version of the shell.  It's
>possible that it's been eliminated in V.3.1 and later as well.

Heck, it was eliminated in the V.3 Bourne shell; that's why the V.3.1
(maybe the V.3, I don't remember) "vi" doesn't use the 8th-bit-quoting
hack anymore when running commands.  (It doesn't do any quoting.)