[alt.sources] jargon lookup, part 01 of 02

goer@ellis.uchicago.edu (Richard L. Goerwitz) (04/05/91)

This is the package I posted a while back to facilitate quick lookup
of entries in the jargon file.  Naturally when I snarfed the latest
version, 2.8.2, the format had changed enough to break my lookup pro-
gram.  Several bugs (minor) also turned up, which I've fixed.  The
entire package is small enough, and the diffs large enough, that I am
just re-posting it in toto.
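
For anyone who has not seen the earlier posting: the lookup works on a
plain text-base in which each key sits on its own line behind a leading
"::" and its value runs up to the next such line (see the gettext.icn
header below).  Here is a rough sh/awk sketch of the unindexed,
sequential case only.  It is not part of the package and is no
substitute for the Icon code; the function name "lookup" is made up
for illustration.

```shell
# lookup KEY FILE  (hypothetical helper, not part of the package)
# Print the value stored under "::KEY" in a gettext-format text-base.
# Exit status is 0 if the key was found, 1 otherwise.
lookup() {
    awk -v key="::$1" '
        found && /^::/ { exit }        # next key line ends the value
        found          { print }       # inside the value: copy it out
        $0 == key      { found = 1 }   # exact match on the key line
        END            { exit !found } # no such key: report failure
    ' "$2"
}
```

The nonzero exit on a missing key roughly mirrors gettext() failing, and
a key with no value prints nothing but still succeeds.  The index and
binary search that make the real thing fast are left out here.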

-Richard


---- Cut Here and feed the following to sh ----
#!/bin/sh
# This is a shell archive (produced by shar 3.49)
# To extract the files from this archive, save it to a file, remove
# everything above the "!/bin/sh" line above, and type "sh file_name".
#
# made 04/04/1991 15:36 UTC by goer@sophist.uchicago.edu
# Source directory /u/richard/Jargon
#
# existing files will NOT be overwritten unless -c is specified
# This format requires very little intelligence at unshar time.
# "if test", "cat", "rm", "echo", "true", and "sed" may be needed.
#
# This is part 1 of a multipart archive                                    
# do not concatenate these parts, unpack them in order with /bin/sh        
#
# This shar contains:
# length  mode       name
# ------ ---------- ------------------------------------------
#   1330 -rw-r--r-- jargon.src
#   6260 -r--r--r-- gettext.icn
#   1335 -r--r--r-- adjuncts.icn
#   3452 -r--r--r-- idxtext.icn
#   1640 -r--r--r-- jarg2get.icn
#   3157 -rw-r--r-- README
#   1928 -rw-r--r-- Makefile.dist
#
if test -r _shar_seq_.tmp; then
	echo 'Must unpack archives in sequence!'
	echo Please unpack part `cat _shar_seq_.tmp` next
	exit 1
fi
# ============= jargon.src ==============
if test -f 'jargon.src' -a X"$1" != X"-c"; then
	echo 'x - skipping jargon.src (File already exists)'
	rm -f _shar_wnt_.tmp
else
> _shar_wnt_.tmp
echo 'x - extracting jargon.src (Text)'
sed 's/^X//' << 'SHAR_EOF' > 'jargon.src' &&
X############################################################################
X#
X#	Name:	 jargon.icn
X#
X#	Title:	 look up words in hacker's jargon database
X#
X#	Author:	 Richard L. Goerwitz
X#
X#	Version: 1.4
X#
X############################################################################
X#
X#  Defines hacker's jargon.  Usage is simply "jargon word," where word
X#  is some bit of hacker's slang for which a definition is desired.
X#  Aborts with an exit code of 1 on no-arg invocation.  If a "word"
X#  arg is given, but no definition is found, jargon exits with status
X#  2.  Otherwise the appropriate entry for "word" is displayed.
X#
X#  Database is based on the jargon file, version 2.7.1, posted to alt.
X#  sources on March 1, 1991.  Might work on other versions, though I
X#  have not tested it out.
X#
X############################################################################
X#
X#  Links:  gettext.icn, adjuncts.icn
X#
X############################################################################
X
Xprocedure main(a)
X
X    local database, usage, no, yes
X
X    # Change this, if you use a different location.
X    database := "/usr/local/lib/jargon/jargon.wrd"
X
X    no := &ucase || "-/"; yes := &lcase || "  "
X    usage := "usage:  jargon word"
X    *a = 1 | stop(usage)
X    write(gettext(trim(map(a[1], no, yes)), database)) | exit(2)
X
Xend 
SHAR_EOF
true || echo 'restore of jargon.src failed'
rm -f _shar_wnt_.tmp
fi
# ============= gettext.icn ==============
if test -f 'gettext.icn' -a X"$1" != X"-c"; then
	echo 'x - skipping gettext.icn (File already exists)'
	rm -f _shar_wnt_.tmp
else
> _shar_wnt_.tmp
echo 'x - extracting gettext.icn (Text)'
sed 's/^X//' << 'SHAR_EOF' > 'gettext.icn' &&
X############################################################################
X#
X#	Name:	 gettext.icn
X#
X#	Title:	 gettext (simple text-base routines)
X#
X#	Author:	 Richard L. Goerwitz
X#
X#	Version: 1.16
X#
X############################################################################
X#
X#  Gettext() and associated routines allow the user to maintain a file
X#  of KEY/value combinations such that a call to gettext(KEY, FNAME)
X#  will produce value.  Gettext() fails if no such KEY exists.
X#  Returns an empty string if the key exists, but has no associated
X#  value in the file, FNAME.
X#
X#  The file format is simple.  Keys belong on separate lines, marked
X#  as such by an initial colon+colon (::).  Values begin on the line
X#  following their respective keys, and extend up to the next
X#  colon+colon-initial line or EOF.  E.g.
X#
X#    ::sample.1
X#    Notice how the key above, sample.1, has :: prepended to mark it
X#    out as a key.  The text you are now reading represents that key's
X#    value.  To retrieve this text, you would call gettext() with the
X#    name of the key passed as its first argument, and the name of the
X#    file in which this text is stored as its second argument (as in
X#    gettext("sample.1","tmp.idx")).
X#    ::next.key
X#    etc...
X#
X#  For faster access, an indexing utility is included, idxtext.  Idxtext
X#  creates a separate index for a given text-base file.  If an index file
X#  exists in the same directory as FNAME, gettext() will make use of it.
X#  The index becomes worthwhile (at least on my system) after the text-
X#  base file becomes longer than 5 kilobytes.
X#
X#  Don'ts:
X#      1) Don't nest gettext text-base files.
X#      2) Don't use spaces and/or tabs in key names.
X#      3) Don't modify indexed files in any way other than to append
X#         additional keys/values (unless you want to re-index).
X#
X#  This program is intended for situations where keys tend to have
X#  very large values, and use of an Icon table structure would be
X#  unwieldy.
X#
X#  BUGS:  Gettext() relies on the Icon runtime system and the OS to
X#  make sure the last text/index file it opens gets closed.
X#
X#  Note:  This program is NOT YET TESTED UNDER DOS.  In particular,
X#  I have no idea whether the indexing mechanism will work, due to
X#  translation that has to be done on MS-DOS text files.
X#
X############################################################################
X#
X#  Links: ./adjuncts.icn
X#
X#  Requires: UNIX (maybe MS-DOS; untested)
X#
X############################################################################
X
X# declared in adjuncts.icn
X# global _slash, _baselen
X
Xprocedure gettext(KEY,FNAME)
X
X    local line, value
X    static last_FNAME, intext, inidx
X    initial {
X	if find("UNIX", &features) then {
X	    _slash := "/"
X	    _baselen := 10
X	}
X	else if find("MS-DOS", &features) then {
X	    _slash := "\\"
X	    _baselen := 8
X	}
X	else stop("gettext:  OS not supported")
X    }
X
X    (/KEY | /FNAME) & stop("error (gettext):  null argument")
X
X    if FNAME == \last_FNAME then {
X	seek(intext, 1)
X	seek(\inidx, 1)
X    }
X    else {
X	# We've got a new text-base file.  Close the old one.
X	every close(\intext | \inidx)
X        # Try to open named text-base file.
X	intext := open(FNAME) | stop("gettext:  ",FNAME," not found")
X        # Try to open index file.
X	inidx := open(Pathname(FNAME) || getidxname(FNAME)) | &null
X    }
X    last_FNAME := FNAME
X
X    # Find the offset for key KEY.  If inidx (the index file) is null
X    # (which happens when none was found), get_offsets() defaults to
X    # 1.  Otherwise it returns the text-file offset recorded for KEY
X    # in the index or, if KEY is not indexed, the offset of the last
X    # indexed byte.  The latter lets us seek to the end of the indexed
X    # part and do a sequential search of any key/value entries that
X    # have been added since the last time idxtext was run.
X
X    seek(intext, get_offsets(KEY, inidx))
X
X    # Find key.  Should be right there, unless the user has appended
X    # key/value pairs to the end without re-indexing, or else has not
X    # bothered to index in the first place.  In this case we're
X    # supposed to start a sequential search for KEY up to EOF.
X
X    while line := (read(intext) | fail) do {
X	line ? {
X	    if (="::", =KEY, pos(0))
X	    then break
X	}
X    }
X
X    # Collect all text up to the next colon+colon-initial line (::)
X    # or EOF.
X    value := ""
X    while line := read(intext) do {
X	match("::",line) & break
X	value ||:= line || "\n"
X    }
X
X    # Note that a key with an empty value returns an empty string.
X    return trim(value, '\n')
X
Xend
X
X
X
Xprocedure get_offsets(KEY, inidx)
X
X    local bottom, top, loc, firstpart, offset, incr
X    # Use these to store values likely to be reused.
X    static old_inidx, firstline, SOF, EOF
X
X    # If there's no index file, then just return an offset of 1.
X    if /inidx then
X	return 1
X
X    # First line contains offset of last indexed byte in the main
X    # text file.  We need this later.  Save it.  Start the binary
X    # search routine at the next byte after this line.
X    seek(inidx, 1)
X    if not (inidx === \old_inidx) then {
X
X	# Get first line.
X	firstline := !inidx
X	# Set "bottom."
X	1 = (SOF := where(inidx)-1) &
X	    stop("get_offsets:  corrupt .IDX file; reindex")
X	# How big is this file?
X	seek(inidx, 0)
X	EOF := where(inidx)
X
X	old_inidx := inidx
X    }
X    # SOF, EOF constant for a given inidx file.
X    bottom := SOF; top := EOF
X
X    # If bottom gets bigger than top, there's no such key.
X    until bottom > top do {
X
X	loc := (top+bottom) / 2
X	seek(inidx, loc)
X
X	# Move past next newline.  If that puts us at EOF, search lower.
X	incr := 1
X	until reads(inidx) == "\n" do
X	    incr +:= 1
X	if loc+incr = EOF then {
X	    top := loc-1
X	    next
X	}
X
X	# Check to see if the current line contains KEY.
X	read(inidx) ? {
X
X	    # .IDX file line format is KEY\toffset
X	    firstpart := tab(find("\t"))
X	    if KEY == firstpart then {
X		# return offset
X		return (move(1), tab(0))
X	    }
X	    # Ah, this is what all binary searches do.
X	    else {
X		if KEY << firstpart
X		then top := loc-1
X		else bottom := loc + incr + *&subject
X	    }
X	}
X    }
X
X    # First line of the index file contains offset of last indexed
X    # byte + 1.  Might be the only line in the file (if it had no
X    # keys when it was indexed).
X    return firstline
X
Xend
SHAR_EOF
true || echo 'restore of gettext.icn failed'
rm -f _shar_wnt_.tmp
fi
# ============= adjuncts.icn ==============
if test -f 'adjuncts.icn' -a X"$1" != X"-c"; then
	echo 'x - skipping adjuncts.icn (File already exists)'
	rm -f _shar_wnt_.tmp
else
> _shar_wnt_.tmp
echo 'x - extracting adjuncts.icn (Text)'
sed 's/^X//' << 'SHAR_EOF' > 'adjuncts.icn' &&
X############################################################################
X#
X#	Name:	 adjuncts.icn
X#
X#	Title:	 adjuncts (adjunct utilities for gettext and idxtext)
X#
X#	Author:	 Richard L. Goerwitz
X#
X#	Version: 1.2
X#
X############################################################################
X#  
X#  Pretty mundane stuff.  Basename(), Pathname(), Strip(), and a utility
X#  for creating index filenames.
X#
X############################################################################
X#
X#  Links: none
X#
X#  See also: gettext.icn, idxtext.icn
X#
X############################################################################
X
X
Xglobal _slash, _baselen
X
Xprocedure Basename(s)
X
X    # global _slash
X    s ? {
X	while tab(find(_slash)+1)
X	return tab(0)
X    }
X
Xend
X
X
Xprocedure Pathname(s)
X
X    local s2
X
X    # global _slash
X    s2 := ""
X    s ? {
X	while s2 ||:= tab(find(_slash)+1)
X	return s2
X    }
X
Xend
X
X
Xprocedure getidxname(FNAME)
X
X    #
X    # Discard path component.  Cut basename down to a small enough
X    # size that the OS will be able to handle addition of the ex-
X    # tension ".IDX"
X    #
X
X    # global _slash, _baselen
X    return right(Strip(Basename(FNAME),'.'), _baselen, "x") || ".IDX"
X
Xend
X
X
Xprocedure Strip(s,c)
X
X    local s2
X
X    s2 := ""
X    s ? {
X	while s2 ||:= tab(upto(c))
X	do tab(many(c))
X	s2 ||:= tab(0)
X    }
X    return s2
X
Xend
SHAR_EOF
true || echo 'restore of adjuncts.icn failed'
rm -f _shar_wnt_.tmp
fi
# ============= idxtext.icn ==============
if test -f 'idxtext.icn' -a X"$1" != X"-c"; then
	echo 'x - skipping idxtext.icn (File already exists)'
	rm -f _shar_wnt_.tmp
else
> _shar_wnt_.tmp
echo 'x - extracting idxtext.icn (Text)'
sed 's/^X//' << 'SHAR_EOF' > 'idxtext.icn' &&
X############################################################################
X#
X#	Name:	 idxtext.icn
X#
X#	Title:	 idxtext (index text-base for gettext() routine)
X#
X#	Author:	 Richard L. Goerwitz
X#
X#	Version: 1.11
X#
X############################################################################
X#
X#      Idxtext turns a file associated with the gettext() routine into
X#  an indexed text-base.  Though gettext() will work fine with files
X#  that haven't been indexed via idxtext, access is faster with an
X#  index once the file is, say, over 10k (on my system the crossover
X#  point is actually about 5k).
X#
X#      Usage is simply "idxtext [-a] file1 [file2 [...]]," where file1,
X#  file2, etc are the names of gettext-format files that are to be
X#  (re-)indexed.  The -a flag tells idxtext to abort if an index file
X#  already exists.
X#
X#      Index files have a very simple format: keyname tab offset\n,
X#  one line per key, sorted by key name.  The first line of the index
X#  file is a pointer to the last indexed byte of the text-base file it
X#  indexes.
X#
X#  BUGS: Index files are too large.  Also, I've yet to find a portable
X#  way of creating unique index names that are capable of being
X#  uniquely identified with their original text file.  It might be
X#  sensible to hard code the name into the index.  The chances of a
X#  conflict seem remote enough that I haven't bothered.  If you're
X#  worried, use the -a flag.
X#
X############################################################################
X#
X#  Links: ./adjuncts.icn
X#  Requires: UNIX or MS-DOS
X#  See also: gettext.icn
X#
X############################################################################
X
X
X# declared in adjuncts.icn
X# global _slash, _baselen
X
Xprocedure main(a)
X
X    local ABORT, idxfile_name, fname, infile, outfile
X    initial {
X	if find("UNIX", &features) then {
X	    _slash := "/"
X	    _baselen := 10
X	}
X	else if find("MS-DOS", &features) then {
X	    _slash := "\\"
X	    _baselen := 8
X	}
X	else stop("idxtext:  OS not supported")
X    }
X
X    if \a[1] == "-a" then ABORT := pop(a)	
X
X    # Check to see if we have any arguments.
X    *a = 0 & stop("usage: idxtext [-a] file1 [file2 [...]]")
X
X    # Start popping filenames off of the argument list.
X    while fname := pop(a) do {
X
X	# Open input file.
X	infile := open(fname) |
X	    { write(&errout, "idxtext:  ",fname," not found"); next }
X	# Get index file name.
X	idxfile_name := Pathname(fname) || getidxname(fname)
X	if \ABORT then if close(open(idxfile_name)) then
X	    stop("idxtext:  index file ",idxfile_name, " already exists")
X	outfile := open(idxfile_name, "w") |
X	    stop("idxtext:  can't open ", idxfile_name)
X
X	# Write index to index.IDX file.
X	write_index(infile, outfile)
X
X	every close(infile | outfile)
X
X    }
X
Xend
X
X
Xprocedure write_index(in, out)
X
X    local key_offset_table, w, line, KEY
X
X    # Write to out all keys in file "in," with their byte
X    # offsets.
X
X    key_offset_table := table()
X
X    while (w := where(in), line := read(in)) do {
X	line ? {
X	    if ="::" then {
X		KEY := trim(tab(0))
X		if not (/key_offset_table[KEY] := KEY || "\t" || w)
X		then stop("idxtext:  duplicate key, ",KEY)
X	    }
X	}
X    }
X
X    # First line of index contains the offset of the last
X    # indexed byte of "in," so that gettext() can still
X    # search unindexed parts of in.
X    write(out, where(in))
X
X    # Write sorted KEY\toffset lines.
X    if *key_offset_table > 0 then
X	every write(out, (!sort(key_offset_table))[2])
X
X    return
X
Xend
SHAR_EOF
true || echo 'restore of idxtext.icn failed'
rm -f _shar_wnt_.tmp
fi
# ============= jarg2get.icn ==============
if test -f 'jarg2get.icn' -a X"$1" != X"-c"; then
	echo 'x - skipping jarg2get.icn (File already exists)'
	rm -f _shar_wnt_.tmp
else
> _shar_wnt_.tmp
echo 'x - extracting jarg2get.icn (Text)'
sed 's/^X//' << 'SHAR_EOF' > 'jarg2get.icn' &&
X############################################################################
X#
X#	Name:	 jarg2get.icn
X#
X#	Title:	 jargon to gettext format converter
X#
X#	Author:	 Richard L. Goerwitz
X#
X#	Version: 1.1
X#
X############################################################################
X#  
X#  Converts jargon.ascii (stdin) to a format suitable for use by gettext.
X#  Writes to stdout.  Jargon.ascii was posted recently (c. March 1, 1991)
X#  to alt.sources.
X#
X############################################################################
X
Xprocedure main()
X
X    local line, KEY, key_set, no, yes, blank_count
X
X    blank_count := 0
X    key_set := set()
X    no := &ucase || "-/"; yes := &lcase || "  "
X    # Isn't goal-directed evaluation nice?
X    (match("= A =", !&input), "" == !&input)
X
X    # Read stdin, looking for entries.  Entries can be distinguished
X    # a) by a preceding blank line, and b) by the presence of charac-
X    # ters beginning immediately at the margin, and c) by the presence
X    # of a colon plus a space on the line.
X    while line := trim(read(), '\t \xFF\r') do {
X
X	if "" == line then {
X	    if (blank_count +:= 1) > 2
X	    then exit(0)
X	    else write()
X	}
X	else {
X	    line ? {
X	        if match("Hacker Folklore"|"Appendix A: ")
X		then exit(0)
X		if blank_count > 0 &
X		   KEY :=
X		       trim(map(tab(any(&letters)) || tab(find(": ")),no,yes))
X		then {
X		    if not member(key_set, KEY)
X		    then write("::", KEY)
X		    insert(key_set, KEY)
X		}
X		(="= ", tab(any(&ucase)), =" =", !&input) | write(line)
X	    }
X	    blank_count := 0
X	}
X    }
X
X    stop("jarg2get:  aborting (are you sure you have the correct file?)")
X
Xend
SHAR_EOF
true || echo 'restore of jarg2get.icn failed'
rm -f _shar_wnt_.tmp
fi
# ============= README ==============
if test -f 'README' -a X"$1" != X"-c"; then
	echo 'x - skipping README (File already exists)'
	rm -f _shar_wnt_.tmp
else
> _shar_wnt_.tmp
echo 'x - extracting README (Text)'
sed 's/^X//' << 'SHAR_EOF' > 'README' &&
X-------
X
XProgram name:  jargon
XSource language:  icon
XPurpose:  quickly find entries in the hackers' jargon file
X
X-------
X
XDescription:
X
XThis shell archive contains a jargon database program, aptly enough
Xnamed "jargon."  If you have the recently posted (March 1, 1991, alt.
Xsources) hackers' jargon file around, you can use this package for
Xquick access to entries within that file, as in "jargon word" (where
X"word" is the bit of hacker's jargon you want a definition for).
X
X-------
X
XInstallation:
X
XCp Makefile.dist to Makefile and edit it to reflect local naming con-
Xventions.  After editing, type "make all."  If, after seeing a lot of
Xgarbage go by, you get the message "everything seems OK," then su root
Xand make install.  Be sure to set default ownership and permissions in
Xthe makefile to some value that makes sense for your system.  If you
Xdon't have root privileges, you must change the DESTDIR and LIBDIR
Xvariables in the makefile so that they reflect directories you have
Xaccess to.  Note that I am assuming a Unix installation in the
Xmakefile.  If you are using some other platform, you'll need to edit
Xthe file "jargon.src" so that the variable "database" is set to the
Xfull path of the jargon.wrd file (i.e. $(LIBDIR)/jargon.wrd in the
Xmakefile).  Then type:
X
X	icont jarg2get.icn
X	icont -o idxtext idxtext.icn adjuncts.icn
X	copy jargon.src jargon.icn
X	icont -o jargon jargon.icn gettext.icn adjuncts.icn
X	jarg2get < JARGONFILE > jargon.wrd
X	idxtext jargon.wrd
X
Xwhere "copy" is your system's file copy or rename command, and
XJARGONFILE is the name of the original jargon.ascii file.  To test a
Xnon-UNIX installation, type
X
X	(ficonx) jargon zork
X
XThe ficonx part may not be necessary.  If you get a definition for
X"zork," then everything is probably OK.  If you altered jargon.src so
Xthat the database variable points to a jargon.wrd file in the current
Xdirectory, then there is no need to do anything more.  If you named a
SHAR_EOF
true || echo 'restore of README failed'
fi
echo 'End of  part 1'
echo 'File README is continued in part 2'
echo 2 > _shar_seq_.tmp
exit 0
-- 

   -Richard L. Goerwitz              goer%sophist@uchicago.bitnet
   goer@sophist.uchicago.edu         rutgers!oddjob!gide!sophist!goer