[comp.sources.misc] v07i124: random names generator

allbery@uunet.UU.NET (Brandon S. Allbery - comp.sources.misc) (08/08/89)

Posting-number: Volume 7, Issue 124
Submitted-by: jrk@sys.uea.ac.uk (Richard Kennaway)
Archive-name: names_jrk

This is names.c, a program for generating random names for FRP characters.
Unlike many other such programs, this one will generate names to match
any language you like.  Feed it with text in that language, and it will
generate words statistically similar to the input text.  For example,
here is some of the output it gives when fed with the Sindarin words
from a Sindarin-English dictionary:

	annun ossen bered lamedo tolbrandirithron meregil arad doriel
	lothrond nim min rohir carch menel caradan uil las tolbrant arahad
	dol egalen rhiw iath remmen celeth arveduin elwing benn min forlan
	uil angborn morgai arad torn dain char thond toreth anfaladel

As you can see, not all the output is directly usable, but by
exercising some selection you can obtain results like:

	Ossiriel, Eredhel, Belain, Minarwen, Gwathlain, Gundaer,
	Suldor, Belebrethand, Berielegor, Gwairithir, Gaurgor, Nardol,
	Sammathremmir,...

For comparison, here's some output from the food-and-drink section of a
German phrasebook :-):

	kursch rhampelebans prottelm tradivier en bohl sauber arnen men
	rautt kabbeer banineln stetschahn blummeloneulen sarneclacher
	men chwarschen aal raustdorelone en garscht karadie blat raube
	sch kirschte protten flen mohl arderben audelspinguse trauchel
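
For anyone who wants to see the idea in miniature, here is a rough sketch.
It is not taken from names.c.  It makes each character depend on only one
predecessor, where the program uses three, and the names NSYM, SEP,
symbol_of, count_pairs and pick_successor are made up for this illustration.
The caller supplies the random draw r, so the result is repeatable.

```c
#include <assert.h>
#include <stddef.h>

#define NSYM  27          /* 'a'..'z' plus one separator symbol */
#define SEP   26          /* index of the separator symbol */

/* Letters map to 0..25 (case ignored); every other byte is a separator.
   Assumes an ASCII-like character set with contiguous letters. */
int symbol_of(unsigned char ch)
{
    if (ch >= 'a' && ch <= 'z') return ch - 'a';
    if (ch >= 'A' && ch <= 'Z') return ch - 'A';
    return SEP;
}

/* Add to counts[p][s] the number of times symbol s follows symbol p.
   The text is treated as if it began after a separator, runs of
   separators count once, and the last word is closed by a separator. */
void count_pairs(const char *text, unsigned counts[NSYM][NSYM])
{
    int prev = SEP;

    assert(text != NULL);
    for (; *text != '\0'; text++) {
        int cur = symbol_of((unsigned char)*text);
        if (cur == SEP && prev == SEP) continue;    /* collapse runs */
        counts[prev][cur]++;
        prev = cur;
    }
    if (prev != SEP) counts[prev][SEP]++;           /* close the last word */
}

/* Return the successor selected by draw r, where 0 <= r < sum of row.
   Each successor is chosen by as many values of r as its count.
   Returns -1 if the row is empty or r is out of range. */
int pick_successor(const unsigned row[NSYM], unsigned r)
{
    int j;

    for (j = 0; j < NSYM; j++) {
        if (r < row[j]) return j;
        r -= row[j];
    }
    return -1;
}
```

To generate a word with this sketch, start with prev = SEP, draw r uniformly
below the sum of counts[prev], take the successor, and repeat until SEP
comes up again.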

Runs on Macintosh (if you have MPW) and Unix.
Public domain.  Share and enjoy.

--
Richard Kennaway          SYS, University of East Anglia, Norwich, U.K.
uucp:  ...mcvax!ukc!uea-sys!jrk		Janet:  kennaway@uk.ac.uea.sys

-----cut-----cut-----cut-----cut-----cut-----cut-----cut-----cut-----cut-----
#! /bin/sh
# This is a shar archive.  "sh" it to unpack.
# Contents: names.1 Makefile names.c
echo x - names.1
sed 's/^X//' >names.1 <<'*-*-END-of-names.1-*-*'
X.TH NAMES 1 "August 1989"
X.UC
X.SH NAME
Xnames \- generate random names
X.SH SYNOPSIS
X.B names
X[
X.B \-3
X] [
X.BR \-w\ |\ \-s
X] [
X.BI \-l nnn
X] [
X.I files
X]
X.SH DESCRIPTION
X.I Names
Xis a random name generator.
XIt will read text from standard input or from files given on the
Xcommand line, and generate a random stream of words whose statistical
Xcharacteristics are similar to those of the input.
XThus if you give it a database of Elvish names, it will generate Elvish-like
Xnames; if a database of Orcish names, it generates Orc-like names, etc.
X.PP
XIt does this by counting the frequency of all 1-, 2-, 3-, and 4-character
Xsequences of letters or spaces in the input.
XCase is ignored, and all runs of non-letters are seen as single spaces.
XThe first character to be output, say "a",
Xis generated according to the relative frequencies
Xwith which each character was found to follow a space in the input.
XThe second, say "b", is generated according to the relative frequencies with
Xwhich each character can appear following the digraph " a".
XThe third is generated according to the relative frequencies with which
Xeach character follows the trigraph " ab".
XAnd so on ad infinitum, each new character depending on the previous three.
X.PP
XThe larger the input, the better.
XIt needs at least a few thousand bytes of input for useful results.
XIf the input is not large enough,
Xyou will tend to get words from the input appearing verbatim in the
Xoutput, as much of the time three consecutive characters will uniquely
Xdetermine the next character.
XIf more input of the desired form is not available, the
X.B \-3
Xoption makes the program use a third-order approximation instead, each
Xcharacter of the output depending only on the two preceding characters.
XThis is also useful if there is not enough memory to construct the
Xtetragraph table, which occupies just over half a megabyte
X(531441 bytes, one per four-character group).
X.PP
XThe output is wrapped to 76 chars maximum, hyphenating any word
Xthat has to be broken over a line-end.
X.PP
X.I Names
Xwill run on Unix, and on a Macintosh as an MPW shell tool.
X.SH OPTIONS
X.TP
X.B \-w
XGenerate successive words independently,
Xi.e. each word begins as if it were the beginning of the whole output,
Xignoring how the preceding word ended.  (Default.)
X.TP
X.B \-s
XNegation of
X.B \-w
Xoption.
XThe first character of each word will depend on the last three characters
Xgenerated (i.e. the last two characters of the preceding word,
Xand the inter-word space).
X.TP
X.BI \-l nnn
XGenerate nnn lines of output.  Default is 20.
XThere must be no space between the \-l and the nnn.
XIf \-l is given with no argument, the output will go on (nearly) forever.
X.TP
X.B \-3
XUse trigraph frequencies instead of tetragraph frequencies.
XGives better results when input data is limited.
XThis option will automatically be used if there is insufficient memory
Xto build the tetragraph table (a warning will be given).
X.SH DIAGNOSTICS
X.I Names
Xgives a usage message if the arguments are bad.
XExits with status 0 if all went well.
XExits with status 1, without generating any names, if there were bad
Xarguments (other than non-existent files) or insufficient memory for
Xeven the trigraph tables.
XOtherwise, exits with status 2 if any files were not found
X(however, it will read all the files it could find and generate names).
X.PP
XWrites to stderr a count of the characters read
X(i.e. letters and runs of non-letters).
X.PP
XIf compiled with SHOWTABLE defined,
Xdumps the 0th to 3rd order tables to standard output
Xbefore the random names.
X.SH LIMITATIONS
XThe tetragraph counts are limited to 255 maximum,
Xthe trigraph and digraph counts to 65535,
Xand the total number of characters read to 4294967295.
X(Counts that reach their maximum value stick there;
Xthey do not overflow.)
X.SH FURTHER IDEAS
XA more compact form of the tetragraph tables could be used.
XThis would allow the use of a larger character set (e.g. accented letters,
Xwhich play a large part in the Elvish languages).
X.PP
XArrange to write the tetragraph tables to a file
Xand read them in again, to avoid having to reconstruct them
Xevery time.  Use some sort of compression to keep the file size down.
X(Run-length encoding might do - most of the entries will be zero.)
X.PP
XWhen insufficient input is available for good fourth-order generation,
Xinstead of using trigraph statistics a better way of improving the output
Xmight be to fudge the tetragraph frequencies a little.  Wherever a trigraph is
Xfound with only one possible successor, choose some other letter
Xand make its frequency nonzero.  Make the choice based on some
Xnotion of similarity among letters.  Such a notion should not
Xbe defined a priori, but based on the statistics of the input.
X.SH AUTHOR
XRichard Kennaway.
X.PP
Xjrk@uk.ac.uea.sys (JANET), ...mcvax!uea-sys!jrk (UUCP).
X.PP
XThis program is public domain.
XDon't bother telling me the code could be improved, I know.
XBy all means tell me of any improvements you make.
*-*-END-of-names.1-*-*
echo x - Makefile
sed 's/^X//' >Makefile <<'*-*-END-of-Makefile-*-*'
Xnames : names.c
X	cc names.c -o names
*-*-END-of-Makefile-*-*
echo x - names.c
sed 's/^X//' >names.c <<'*-*-END-of-names.c-*-*'
X/* names.c */
X/* Random name generator */
X
X/* Richard Kennaway */
X/* JANET:  jrk@uk.ac.uea.sys */
X/* UUCP:   ...mcvax!uea-sys!jrk */
X
X/* August 1989 */
X/* Public domain! */
X
X
X#define FALSE  0
X#define TRUE   1
X
X/* Choose one... */
X#define UNIX   TRUE    /* Version for Unix */
X#define MPW    FALSE   /* Version for Apple Macintosh (MPW C) */
X
X
X/* System stuff */
X
X#include <stdio.h>
X
Xtypedef char int8;
Xtypedef unsigned char uint8;
Xtypedef short int16;
Xtypedef unsigned short uint16;
Xtypedef long int32;
Xtypedef unsigned long uint32;
X
X#define MAXUINT8		((uint8) ((int8) (-1)))
X#define MAXUINT16		((uint16) ((int16) (-1)))
X#define MAXUINT32		((uint32) ((int32) (-1)))
X
X#if MPW
X#include <QuickDraw.h>	/* need this for random numbers */
X#endif
X#if UNIX
X#define Boolean		int
Xint32 random();
X#define Random()	((int16) (random()))
X#endif
X
X#if MPW
X#define NEWLINECHAR     '\r'
X#endif
X#if UNIX
X#define NEWLINECHAR     '\n'
X#endif
X
X#define EOFCHAR     (-1)
X
Xchar *malloc();
X
X
X/* Parameters stuff */
X
Xint Argc;
Xchar **Argv;
Xint ExitStatus = 0;
X
Xint16 CurFile;
XBoolean FileArgs = FALSE;
X
XBoolean Big = TRUE, SeparateWords = TRUE;
X
X#define BREAK1		60
X#define BREAK2		75
X
Xint16 Column = 0;
Xuint32 Lines = 0;
X#define DEFAULTMAXLINES		20
Xuint32 MaxLines = DEFAULTMAXLINES;
X
X
X/* Tables */
X
X#define MAXINDEX        27
X#define SPACEINDEX      26
X#define T4SIZE		(MAXINDEX*MAXINDEX*MAXINDEX*MAXINDEX)
X
Xuint16 chartable[256];
X#define indextable(c)   ((c)==(-1) ? NEWLINECHAR : \
X                         (c)==SPACEINDEX ? ' ' : \
X                         ((c)+'a') \
X                        )
X
Xuint32 table0 = 0, *table1 = NULL;
Xuint16 **table2 = NULL, ***table3 = NULL;
Xuint8 *table4 = NULL;
X
X
X/* Memory allocation */
X
Xnomemory()
X{
X    fprintf( stderr, "Cannot get memory!%c", NEWLINECHAR );
X    ExitStatus = 1;
X    exit( ExitStatus );
X}  /* nomemory() */
X
Xgetmemory()
X{
Xuint32 i, j, k;
Xuint16 *t2, **t3, *tt3;
X
X    table1 = (uint32 *) malloc( MAXINDEX * sizeof(uint32) );
X    if (table1==NULL) nomemory();
X    table2 = (uint16 **) malloc( MAXINDEX * sizeof(uint16 *) );
X    if (table2==NULL) nomemory();
X    for (i=0; i<MAXINDEX; i++) {
X        table2[i] = NULL;
X    }
X    table3 = (uint16 ***) malloc( MAXINDEX * sizeof(uint16 **) );
X    if (table3==NULL) nomemory();
X    for (i=0; i<MAXINDEX; i++) {
X        table3[i] = NULL;
X    }
X
X    if (Big) {
X	table4 = (uint8 *) malloc( T4SIZE * sizeof(uint8) );
X	if (table4==NULL) {
X	    Big = FALSE;
X	    fprintf( stderr, "Cannot get space for 4th-order generation - using 3rd-order instead.%c",
X		NEWLINECHAR );
X	}
X	if (Big) for (i=0; i<T4SIZE; i++) table4[i] = 0;
X    }
X
X    for (i=0; i<MAXINDEX; i++) {
X        table1[i] = 0;
X
X        t2 = (uint16 *) malloc( MAXINDEX * sizeof(uint16) );
X        if (t2==NULL) nomemory();
X        table2[i] = t2;
X
X        t3 = (uint16 **) malloc( MAXINDEX * sizeof(uint16 *) );
X        if (t3==NULL) nomemory();
X        table3[i] = t3;
X        for (j=0; j<MAXINDEX; j++) {
X            t3[j] = NULL;
X        }
X        for (j=0; j<MAXINDEX; j++) {
X            t2[j] = 0;
X            tt3 = (uint16 *) malloc( MAXINDEX * sizeof(uint16) );
X            if (tt3==NULL) nomemory();
X            t3[j] = tt3;
X            for (k=0; k<MAXINDEX; k++) {
X                tt3[k] = 0;
X	    }
X        }
X    }
X}  /* getmemory() */
X
Xfreememory()
X{
Xuint16 i, j;
Xuint16 **t3;
X
X    if (table1 != NULL) free( table1 );
X    if (table2 != NULL) {
X        for (i=0; i<MAXINDEX; i++) {
X            if (table2[i] != NULL) free( table2[i] );
X        }
X        free( table2 );
X    }
X    if (table3 != NULL) {
X        for (i=0; i<MAXINDEX; i++) {
X            t3 = table3[i];
X            if (t3 != NULL) {
X                for (j=0; j<MAXINDEX; j++) {
X                    if (t3[j] != NULL) free( t3[j] );
X                }
X                free( t3 );
X            }
X        }
X        free( table3 );
X    }
X    if (table4 != NULL) free( table4 );
X    table1 = NULL;
X    table2 = NULL;
X    table3 = NULL;
X    table4 = NULL;
X}  /* freememory() */
X
X
X/* Preliminary setup */
X
Xmaketranstable()
X{
Xuint16 c;
X
X    for (c=0; c<256; c++) chartable[c] = SPACEINDEX;
X    for (c='A'; c<='Z'; c++) chartable[c] = c-'A';
X    for (c='a'; c<='z'; c++) chartable[c] = c-'a';
X}  /* maketranstable() */
X
X
X/* Input */
X
XBoolean openfile()
X{
XFILE *temp;
X
X    temp = freopen( Argv[CurFile], "r", stdin );
X    if (temp == NULL) {
X    	fprintf( stderr, "%s: could not open file \"%s\"%c",
X	    Argv[0], Argv[CurFile], NEWLINECHAR );
X	ExitStatus = 2;
X    }
X    return( temp != NULL );
X}  /* Boolean openfile() */
X
XBoolean getnextfile()
X{
X
X    while (((++CurFile) < Argc) && (! openfile())) { /* nothing */ }
X    return( CurFile < Argc );
X}  /* Boolean getnextfile() */
X
Xint16 getrawchar()
X{
Xint16 c;
X    c = getchar();
X    while ((c==EOFCHAR) && getnextfile()) {
X        c = getchar();
X    }
X    return(c);
X}  /* int16 getrawchar() */
X
X#define WASSPACE    0
X#define WASNONSPACE 1
X#define END         2
Xint16 Where = WASSPACE;
X
Xint16 nextchar()
X{
Xint16 c, result;
X
X    switch (Where) {
X        case WASSPACE:
X            while (((c = getrawchar()) != EOFCHAR) &&
X                   (chartable[c]==SPACEINDEX)) {
X                /* nothing */
X            }
X            if (c==EOFCHAR) {
X                Where = END;
X                return(-1);
X            } else {
X                Where = WASNONSPACE;
X                return(chartable[c]);
X            }
X        case WASNONSPACE:
X            c = getrawchar();
X            if (c==EOFCHAR) {
X                Where = END;
X                return(SPACEINDEX);
X            } else {
X                result = chartable[c];
X                if (result==SPACEINDEX) Where = WASSPACE;
X                return(result);
X            }
X        case END:
X            return(-1);
X    }
X    return(-1);	/* Never happens. */
X}  /* int16 nextchar() */
X
Xentergroup( a, b, c, d )
Xint16 a, b, c, d;
X{
Xuint32 ind;
XBoolean do_it;
X
X    if (table0 >= MAXUINT32) return;
X    do_it = table1[a] < MAXUINT16;
X    if (Big && do_it) {
X	ind = (((((a*MAXINDEX) + b)*MAXINDEX) + c)*MAXINDEX) + d;
X	do_it = table4[ind] < MAXUINT8;
X	if (do_it) table4[ind]++;
X    }
X    if (do_it) {
X	table0++;
X	table1[a]++;
X	table2[a][b]++;
X	table3[a][b][c]++;
X    }
X}  /* entergroup( a, b, c, d ) */
X
Xbuildtable()
X{
Xint16 a0, b0, c0, a, b, c, d;
X
X    a0 = nextchar();
X    b0 = nextchar();
X    c0 = nextchar();
X    if (c0 == -1) return;
X    a = a0;  b = b0;  c = c0;
X    while ((d = nextchar()) != (-1)) {
X    	entergroup( a, b, c, d );
X	a = b;  b = c;  c = d;
X    }
X    if (c==SPACEINDEX) {
X	entergroup( a, b, c, a0 );
X	entergroup( b, c, a0, b0 );
X	entergroup( c, a0, b0, c0 );
X    } else {
X	entergroup( a, b, c, SPACEINDEX );
X	entergroup( b, c, SPACEINDEX, a0 );
X	entergroup( c, SPACEINDEX, a0, b0 );
X	entergroup( SPACEINDEX, a0, b0, c0 );
X    }
X}  /* buildtable() */
X
X
X/* Dump the 0th to 3rd order tables.  Not the 4th-order! */
X/* Only called if SHOWTABLE is defined at compile time. */
X
Xshowtable()
X{
Xuint16 i, j, k;
Xuint16 *t2, **t3, *tt3;
X
X    for (i=0; i<MAXINDEX; i++) if (table1[i] != 0) {
X        printf( "%c\t%lu%c\t\ttot", i+'a', table1[i], NEWLINECHAR );
X        for (k=0; k<MAXINDEX; k++) {
X            printf( "\t%c", k+'a' );
X        }
X        putchar( NEWLINECHAR );
X        t2 = table2[i];
X        t3 = table3[i];
X        for (j=0; j<MAXINDEX; j++) if (t2[j] != 0) {
X            printf( "\t%c\t%u", j+'a', t2[j] );
X            tt3 = t3[j];
X            for (k=0; k<MAXINDEX; k++) {
X                putchar( '\t' );
X                if (tt3[k]==0) putchar( '-' );
X                else printf( "%u", tt3[k] );
X            }
X            putchar( NEWLINECHAR );
X        }
X        putchar( NEWLINECHAR );
X    }
X}  /* showtable() */
X
X
X/* Generation of output */
X
Xint16 randint( max )
Xuint16 max;
X{
X    return( max==0 ? 0 : (int16) (((uint16) Random())%max) );
X}  /* int16 randint( max ) */
X
Xint16 randchoice8( tot, dist )
Xuint32 tot;
Xuint8 *dist;
X{
Xint16 i, j;
X
X    if (tot==0) return(-1);
X    i = randint( tot );
X    for (j=0; j<MAXINDEX; j++) {
X        i -= dist[j];
X        if (i < 0) {
X            return(j);
X	}
X    }
X    return( -1 );	/* Should never happen. */
X}  /* int16 randchoice8( tot, dist ) */
X
Xint16 randchoice16( tot, dist )
Xuint32 tot;
Xuint16 *dist;
X{
Xint16 i, j;
X
X    if (tot==0) return(-1);
X    i = randint( tot );
X    for (j=0; j<MAXINDEX; j++) {
X        i -= dist[j];
X        if (i < 0) {
X            return(j);
X	}
X    }
X    return( -1 );	/* Should never happen. */
X}  /* int16 randchoice16( tot, dist ) */
X
Xint16 randchoice32( tot, dist )
Xuint32 tot;
Xuint32 *dist;
X{
Xint16 i, j;
X
X    if (tot==0) return(-1);
X    i = randint( tot );
X    for (j=0; j<MAXINDEX; j++) {
X        i -= dist[j];
X        if (i<0) return(j);
X    }
X    return( -1 );	/* Should never happen. */
X}  /* int16 randchoice32( tot, dist ) */
X
Xoutchar( c )
Xchar c;
X{
X    if ((c=='.') || (c==' ')) {
X	if (Column > BREAK1) {
X	    if (c=='.') putchar('.');
X	    putchar( NEWLINECHAR );
X	    Column = 0;  Lines++;
X	} else {
X	    if (c=='.') { putchar('.');  putchar(' ');  Column += 2; }
X	    putchar(' ');  Column++;
X	}
X    } else {
X	if (Column > BREAK2) {
X	    putchar('-');  putchar( NEWLINECHAR );  Column = 0;  Lines++;
X	}
X	putchar(c);  Column++;
X    }
X}  /* outchar( c ) */
X
Xgenerateword()
X{
Xint16 a, b, c, d;
X
X    a = SPACEINDEX;
X    b = randchoice16( table1[a], table2[a] );
X    if (b==(-1)) return;
X    outchar( indextable(b) );
X    c = randchoice16( table2[a][b], table3[a][b] );
X    if (c==(-1)) return;
X    outchar( indextable(c) );
X    while (Lines < MaxLines) {
X	if (Big) {
X	    d = randchoice8( table3[a][b][c], &(table4[ ((((a*MAXINDEX)+b)*MAXINDEX)+c)*MAXINDEX ]) );
X	} else {
X	    d = randchoice16( table2[b][c], table3[b][c] );
X	}
X	if (d==(-1)) {
X	    outchar( '.' );
X	    d = SPACEINDEX;
X	} else {
X	    outchar( indextable(d) );
X	}
X	if (SeparateWords && (d==SPACEINDEX)) return;
X        a = b;  b = c;  c = d;
X    }
X}  /* generateword() */
X
Xgenerate()
X{
X    if (table0 > 0) while (Lines < MaxLines) generateword();
X}  /* generate() */
X
X
X/* Argument parsing */
X
Xusageerror()
X{
X    fprintf( stderr, "Usage: %s [-3] [-s|-w] [-lnnn] [files]%c",
X    	Argv[0], NEWLINECHAR );
X    fprintf( stderr, "\t-3: 3rd-order statistics (default is 4th-order)%c",
X    	NEWLINECHAR );
X    fprintf( stderr, "\t-w: successive words are independent (default)%c",
X    	NEWLINECHAR );
X    fprintf( stderr, "\t-s: (sentences) successive words are dependent%c",
X    	NEWLINECHAR );
X    fprintf( stderr, "\t-lnnn: Generate nnn lines of output (default %d).%c",
X    	DEFAULTMAXLINES, NEWLINECHAR );
X    ExitStatus = 1;
X    exit( ExitStatus );
X}  /* usageerror() */
X
Xprocessoptions()
X{
Xint i;
X
X    CurFile = Argc;
X    for (i=1; i<Argc; i++) {
X    	if (Argv[i][0] == '-') {
X	    switch (Argv[i][1]) {
X	    	case 's':
X		    SeparateWords = FALSE;
X		    break;
X	    	case 'w':
X		    SeparateWords = TRUE;
X		    break;
X		case '3':
X		    Big = FALSE;
X		    break;
X		case 'l':
X		    if (Argv[i][2]==0) {
X			MaxLines = MAXUINT32;
X		    } else if (sscanf( &(Argv[i][2]), "%lu", &MaxLines ) != 1) {
X		        usageerror();  /* exits */
X		    }
X		    break;
X		default:
X		    usageerror();  /* exits */
X	    }
X	} else if (Argv[i][0] == 0) {
X	    FileArgs = FALSE;
X	} else {
X	    FileArgs = TRUE;
X	    CurFile = i-1;
X	    getnextfile();
X	    return;
X	}
X    }
X}  /* processoptions() */
X
X
X/* Control */
X
X#if UNIX
Xcleanup( status, ignore )
Xint status;
Xchar *ignore;
X#endif
X#if MPW
Xvoid cleanup( status )
Xint status;
X#endif
X{
X    freememory();
X}  /* cleanup( status, ignore ) */
X
Xmain( argc, argv )
Xint argc;
Xchar **argv;
X{
X#if MPW
X    InitGraf( &(qd.thePort) );  /* for random numbers */
X    GetDateTime( &(qd.randSeed) );
X#endif
X#if UNIX
X    srandom( time(0) );
X#endif
X
X    Argc = argc;  Argv = argv;
X#if UNIX
X    on_exit( cleanup, NULL );	/* probably not necessary */
X#endif
X#if MPW
X    onexit( cleanup );		/* maybe necessary */
X#endif
X    processoptions();
X    maketranstable();
X    getmemory();
X    buildtable();
X    fprintf( stderr, "%lu characters%c", table0, NEWLINECHAR );
X#ifdef SHOWTABLE
X    showtable();
X#endif
X    generate();
X    exit( ExitStatus );
X}  /* main() */
*-*-END-of-names.c-*-*
exit
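
A note on the tetragraph table, with a small sketch.  names.c keeps the
four-character counts in one flat block of 27*27*27*27 one-byte counters
and lets each counter stick at its maximum.  The fragment below shows that
indexing and the saturating increment in isolation.  It is an illustration
only: group_index and bump_saturating are hypothetical names, not functions
in the program, and the code assumes an 8-bit unsigned char.

```c
#include <assert.h>
#include <stddef.h>

#define NSYM    27                                    /* 26 letters + space */
#define T4SIZE  ((size_t)NSYM * NSYM * NSYM * NSYM)   /* 531441 counters */

/* Flat index of the four-symbol group (a, b, c, d), each in 0..26.
   The arithmetic is done in size_t so it cannot overflow a 16-bit int. */
size_t group_index(int a, int b, int c, int d)
{
    assert(a >= 0 && a < NSYM && b >= 0 && b < NSYM);
    assert(c >= 0 && c < NSYM && d >= 0 && d < NSYM);
    return (((size_t)a * NSYM + b) * NSYM + c) * NSYM + d;
}

/* Increment a one-byte counter unless it is already at 255.
   Returns 1 if the count changed, 0 if it was stuck at the maximum. */
int bump_saturating(unsigned char *counter)
{
    assert(counter != NULL);
    if (*counter == 255) return 0;
    (*counter)++;
    return 1;
}
```

All 27 successors of a given three-symbol context sit next to each other,
starting at group_index(a, b, c, 0), so a context's distribution can be
handed to a chooser as a plain 27-element array.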