allbery@uunet.UU.NET (Brandon S. Allbery - comp.sources.misc) (08/08/89)
Posting-number: Volume 7, Issue 124
Submitted-by: jrk@sys.uea.ac.uk (Richard Kennaway)
Archive-name: names_jrk

This is names.c, a program for generating random names for FRP characters.
Unlike many other such programs, this one will generate names to match any
language you like.  Feed it with text in that language, and it will generate
words statistically similar to the input text.

For example, here is some of the output it gives when fed with the Sindarin
words from a Sindarin-English dictionary:

    annun ossen bered lamedo tolbrandirithron meregil arad doriel lothrond
    nim min rohir carch menel caradan uil las tolbrant arahad dol egalen
    rhiw iath remmen celeth arveduin elwing benn min forlan uil angborn
    morgai arad torn dain char thond toreth anfaladel

As you can see, not all the output is directly usable, but by exercising
some selection you can obtain results like:

    Ossiriel, Eredhel, Belain, Minarwen, Gwathlain, Gundaer, Suldor
    Belebrethand, Berielegor, Gwairithir, Gaurgor, Nardol, Sammathremmir,...

For comparison, here's some output from the food-and-drink section of a
German phrasebook :-):

    kursch rhampelebans prottelm tradivier en bohl sauber arnen men rautt
    kabbeer banineln stetschahn blummeloneulen sarneclacher men chwarschen
    aal raustdorelone en garscht karadie blat raube sch kirschte protten
    flen mohl arderben audelspinguse trauchel

Runs on Macintosh (if you have MPW) and Unix.

Public domain.  Share and enjoy.

--
Richard Kennaway
SYS, University of East Anglia, Norwich, U.K.
uucp:   ...mcvax!ukc!uea-sys!jrk
Janet:  kennaway@uk.ac.uea.sys

-----cut-----cut-----cut-----cut-----cut-----cut-----cut-----cut-----cut-----
#! /bin/sh
# This is a shar archive.  "sh" it to unpack.
# Contents:  names.1 Makefile names.c
echo x - names.1
sed 's/^X//' >names.1 <<'*-*-END-of-names.1-*-*'
X.TH NAMES 1 "August 1989"
X.UC
X.SH NAME
Xnames \- generate random names
X.SH SYNOPSIS
X.B names
X[
X.B \-3
X] [
X.BR \-w\ |\ \-s
X] [
X.BI \-l nnn
X] [
X.I files
X]
X.SH DESCRIPTION
X.I Names
Xis a random name generator.
XIt will read text from standard input or from files given on the
Xcommand line, and generate a random stream of words whose statistical
Xcharacteristics are similar to those of the input.
XThus if you give it a database of Elvish names, it will generate Elvish-like
Xnames; if a database of Orcish names, it generates Orc-like names, etc.
X.PP
XIt does this by counting the frequency of all 1-, 2-, 3-, and 4-character
Xsequences of letters or spaces in the input.
XCase is ignored, and all runs of non-letters are seen as single spaces.
XThe first character to be output, say "a",
Xis generated according to the relative frequencies
Xwith which each character was found to follow a space in the input.
XThe second, say "b", is generated according to the relative frequencies with
Xwhich each character can appear following the digraph " a".
XThe third is generated according to the relative frequencies with which
Xeach character follows the trigraph " ab".
XAnd so on ad infinitum, each new character depending on the previous three.
X.PP
XThe larger the input, the better.
XIt needs at least a few thousand bytes of input for useful results.
XIf the input is not large enough,
Xyou will tend to get words from the input appearing verbatim in the
Xoutput, as much of the time three consecutive characters will uniquely
Xdetermine the next character.
XIf more input of the desired form is not available, the program can be
Xmade to use a third-order approximation instead, each character of the
Xoutput depending only on the two preceding characters.
This is also
Xuseful if there is not enough memory to construct the tetragraph
Xtable, which occupies just over half a megabyte.
X.PP
XThe output is wrapped to 76 chars maximum, hyphenating any word
Xthat has to be broken over a line-end.
X.PP
X.I Names
Xwill run on Unix, and on a Macintosh as an MPW shell tool.
X.SH OPTIONS
X.TP
X.B \-w
XGenerate successive words independently,
Xi.e. each word begins as if it was the beginning of the whole output,
Xignoring how the preceding word ended.  (Default.)
X.TP
X.B \-s
XNegation of
X.B \-w
Xoption.
XThe first character of each word will depend on the last three characters
Xgenerated (i.e. the last two characters of the preceding word,
Xand the inter-word space).
X.TP
X.BI \-l nnn
XGenerate nnn lines of output.  Default is 20.
XNo space between the \-l and the nnn.
XIf \-l is given with no argument, the output will go on (nearly) forever.
X.TP
X.B \-3
XUse trigraph frequencies instead of tetragraph frequencies.
XGives better results when input data is limited.
XThis option will automatically be used if there is insufficient memory
Xto build the tetragraph table (a warning will be given).
X.SH DIAGNOSTICS
X.I Names
Xgives a usage message if the arguments are bad.
XExits with status 0 if all went well.
XExits with status 1 if there were bad arguments (other than non-existent
Xfiles), or insufficient memory for even trigraph tables.
XIn either case no names are generated.
XOtherwise, exits with status 2 if any files were not found
X(however, it will read all the files it could find and generate names).
X.PP
XWrites to stderr a count of the characters read
X(i.e. letters and runs of non-letters).
X.PP
XIf compiled with SHOWTABLE defined,
Xdumps the 0th to 3rd order tables to standard output
Xbefore the random names.
X.SH LIMITATIONS
XThe tetragraph counts are limited to 255 maximum,
Xthe trigraph and digraph counts to 65535,
Xand the total number of characters read to 4294967295.
X(The counts stick at their maximum values if they are reached,
Xthey do not overflow.)
X.SH FURTHER IDEAS
XA more compact form of the tetragraph tables could be used.
XThis would allow the use of a larger character set (e.g. accented letters,
Xwhich play a large part in the Elvish languages).
X.PP
XArrange to write the tetragraph tables to a file
Xand read them in again, to avoid having to reconstruct them
Xevery time.  Use some sort of compression to keep the file size down.
X(Run-length encoding might do - most of the entries will be zero.)
X.PP
XWhen insufficient input is available for good fourth-order generation,
Xinstead of using trigraph statistics a better way of improving the output
Xmight be to fudge the tetragraph frequencies a little.  Wherever a trigraph is
Xfound with only one possible successor, choose some other letter
Xand make its frequency nonzero.  Make the choice based on some
Xnotion of similarity among letters.  Such a notion should not
Xbe defined a priori, but based on the statistics of the input.
X.SH AUTHOR
XRichard Kennaway.
X.PP
Xjrk@uk.ac.uea.sys (JANET), ...mcvax!uea-sys!jrk (UUCP).
X.PP
XThis program is public domain.
XDon't bother telling me the code could be improved, I know.
XBy all means tell me of any improvements you make.
*-*-END-of-names.1-*-*
echo x - Makefile
sed 's/^X//' >Makefile <<'*-*-END-of-Makefile-*-*'
Xnames : names.c
X	cc names.c -o names
*-*-END-of-Makefile-*-*
echo x - names.c
sed 's/^X//' >names.c <<'*-*-END-of-names.c-*-*'
X/* names.c */
X/* Random name generator */
X
X/* Richard Kennaway */
X/* JANET: jrk@uk.ac.uea.sys */
X/* UUCP: ...mcvax!uea-sys!jrk */
X
X/* August 1989 */
X/* Public domain! */
X
X
X#define FALSE 0
X#define TRUE 1
X
X/* Choose one...
*/
X#define UNIX TRUE    /* Version for Unix */
X#define MPW FALSE    /* Version for Apple Macintosh (MPW C) */
X
X
X/* System stuff */
X
X#include <stdio.h>
X
Xtypedef char int8;
Xtypedef unsigned char uint8;
Xtypedef short int16;
Xtypedef unsigned short uint16;
Xtypedef long int32;
Xtypedef unsigned long uint32;
X
X#define MAXUINT8 ((uint8) ((int8) (-1)))
X#define MAXUINT16 ((uint16) ((int16) (-1)))
X#define MAXUINT32 ((uint32) ((int32) (-1)))
X
X#if MPW
X#include <QuickDraw.h>    /* need this for random numbers */
X#endif
X#if UNIX
X#define Boolean int
Xint32 random();
X#define Random() ((int16) (random()))
X#endif
X
X#if MPW
X#define NEWLINECHAR '\r'
X#endif
X#if UNIX
X#define NEWLINECHAR '\n'
X#endif
X
X#define EOFCHAR (-1)
X
Xchar *malloc();
X
X
X/* Parameters stuff */
X
Xint Argc;
Xchar **Argv;
Xint ExitStatus = 0;
X
Xint16 CurFile;
XBoolean FileArgs = FALSE;
X
XBoolean Big = TRUE, SeparateWords = TRUE;
X
X#define BREAK1 60
X#define BREAK2 75
X
Xint16 Column = 0;
Xuint32 Lines = 0;
X#define DEFAULTMAXLINES 20
Xuint32 MaxLines = DEFAULTMAXLINES;
X
X
X/* Tables */
X
X#define MAXINDEX 27
X#define SPACEINDEX 26
X#define T4SIZE (MAXINDEX*MAXINDEX*MAXINDEX*MAXINDEX)
X
Xuint16 chartable[256];
X#define indextable(c) ((c)==(-1) ? NEWLINECHAR : \
X    (c)==SPACEINDEX ? \
' ' : \
X    ((c)+'a') \
X    )
X
Xuint32 table0 = 0, *table1 = NULL;
Xuint16 **table2 = NULL, ***table3 = NULL;
Xuint8 *table4 = NULL;
X
X
X/* Memory allocation */
X
Xnomemory()
X{
X    fprintf( stderr, "Cannot get memory!%c", NEWLINECHAR );
X    ExitStatus = 1;
X    exit( ExitStatus );
X}  /* nomemory() */
X
Xgetmemory()
X{
Xuint32 i, j, k;
Xuint16 *t2, **t3, *tt3;
X
X    table1 = (uint32 *) malloc( MAXINDEX * sizeof(uint32) );
X    if (table1==NULL) nomemory();
X    table2 = (uint16 **) malloc( MAXINDEX * sizeof(uint16 *) );
X    if (table2==NULL) nomemory();
X    for (i=0; i<MAXINDEX; i++) {
X        table2[i] = NULL;
X    }
X    table3 = (uint16 ***) malloc( MAXINDEX * sizeof(uint16 **) );
X    if (table3==NULL) nomemory();
X    for (i=0; i<MAXINDEX; i++) {
X        table3[i] = NULL;
X    }
X
X    if (Big) {
X        table4 = (uint8 *) malloc( T4SIZE * sizeof(uint8) );
X        if (table4==NULL) {
X            Big = FALSE;
X            fprintf( stderr, "Cannot get space for 4th-order generation - using 3rd-order instead.%c",
X                NEWLINECHAR );
X        }
X        if (Big) for (i=0; i<T4SIZE; i++) table4[i] = 0;
X    }
X
X    for (i=0; i<MAXINDEX; i++) {
X        table1[i] = 0;
X
X        t2 = (uint16 *) malloc( MAXINDEX * sizeof(uint16) );
X        if (t2==NULL) nomemory();
X        table2[i] = t2;
X
X        t3 = (uint16 **) malloc( MAXINDEX * sizeof(uint16 *) );
X        if (t3==NULL) nomemory();
X        table3[i] = t3;
X        for (j=0; j<MAXINDEX; j++) {
X            t3[j] = NULL;
X        }
X        for (j=0; j<MAXINDEX; j++) {
X            t2[j] = 0;
X            tt3 = (uint16 *) malloc( MAXINDEX * sizeof(uint16) );
X            if (tt3==NULL) nomemory();
X            t3[j] = tt3;
X            for (k=0; k<MAXINDEX; k++) {
X                tt3[k] = 0;
X            }
X        }
X    }
X}  /* getmemory() */
X
Xfreememory()
X{
Xuint16 i, j, k;
Xuint16 *t2, **t3, *tt3;
X
X    if (table1 != NULL) free( table1 );
X    if (table2 != NULL) {
X        for (i=0; i<MAXINDEX; i++) {
X            if (table2[i] != NULL) free( table2[i] );
X        }
X        free( table2 );
X    }
X    if (table3 != NULL) {
X        for (i=0; i<MAXINDEX; i++) {
X            t3 = table3[i];
X            if (t3 != NULL) {
X                for (j=0; j<MAXINDEX; j++) {
X                    if (t3[j] != NULL) free( t3[j] );
X                }
X                free( t3 );
X            }
X        }
X        free( table3 );
X    }
X    if
(table4 != NULL) free( table4 );
X    table1 = NULL;
X    table2 = NULL;
X    table3 = NULL;
X    table4 = NULL;
X}  /* freememory() */
X
X
X/* Preliminary setup */
X
Xmaketranstable()
X{
Xuint16 c;
X
X    for (c=0; c<256; c++) chartable[c] = SPACEINDEX;
X    for (c='A'; c<='Z'; c++) chartable[c] = c-'A';
X    for (c='a'; c<='z'; c++) chartable[c] = c-'a';
X}  /* maketranstable() */
X
X
X/* Input */
X
XBoolean openfile()
X{
XFILE *temp;
X
X    temp = freopen( Argv[CurFile], "r", stdin );
X    if (temp == NULL) {
X        fprintf( stderr, "%s: could not open file \"%s\"%c",
X            Argv[0], Argv[CurFile], NEWLINECHAR );
X        ExitStatus = 2;
X    }
X    return( temp != NULL );
X}  /* Boolean openfile() */
X
XBoolean getnextfile()
X{
XFILE *temp;
X
X    while (((++CurFile) < Argc) && (! openfile())) { /* nothing */ }
X    return( CurFile < Argc );
X}  /* Boolean getnextfile() */
X
Xint16 getrawchar()
X{
Xint16 c;
X    c = getchar();
X    while ((c==EOFCHAR) && getnextfile()) {
X        c = getchar();
X    }
X    return(c);
X}  /* int16 getrawchar() */
X
X#define WASSPACE 0
X#define WASNONSPACE 1
X#define END 2
Xint16 Where = WASSPACE;
X
Xint16 nextchar()
X{
Xint16 c, result;
X
X    switch (Where) {
X        case WASSPACE:
X            while (((c = getrawchar()) != EOFCHAR) &&
X                   (chartable[c]==SPACEINDEX)) {
X                /* nothing */
X            }
X            if (c==EOFCHAR) {
X                Where = END;
X                return(-1);
X            } else {
X                Where = WASNONSPACE;
X                return(chartable[c]);
X            }
X        case WASNONSPACE:
X            c = getrawchar();
X            if (c==EOFCHAR) {
X                Where = END;
X                return(SPACEINDEX);
X            } else {
X                result = chartable[c];
X                if (result==SPACEINDEX) Where = WASSPACE;
X                return(result);
X            }
X        case END:
X            return(-1);
X    }
X    return(-1);    /* Never happens.
*/
X}  /* int16 nextchar() */
X
Xentergroup( a, b, c, d )
Xint16 a, b, c, d;
X{
Xuint32 ind;
XBoolean do_it;
X
X    if (table0 >= MAXUINT32) return;
X    do_it = table1[a] < MAXUINT16;
X    if (Big && do_it) {
X        ind = (((((a*MAXINDEX) + b)*MAXINDEX) + c)*MAXINDEX) + d;
X        do_it = table4[ind] < MAXUINT8;
X        if (do_it) table4[ind]++;
X    }
X    if (do_it) {
X        table0++;
X        table1[a]++;
X        table2[a][b]++;
X        table3[a][b][c]++;
X    }
X}  /* entergroup( a, b, c, d ) */
X
Xbuildtable()
X{
Xint16 a0, b0, c0, a, b, c, d;
X
X    a0 = nextchar();
X    b0 = nextchar();
X    c0 = nextchar();
X    if (c0 == -1) return;
X    a = a0;  b = b0;  c = c0;
X    while ((d = nextchar()) != (-1)) {
X        entergroup( a, b, c, d );
X        a = b;  b = c;  c = d;
X    }
X    if (c==SPACEINDEX) {
X        entergroup( a, b, c, a0 );
X        entergroup( b, c, a0, b0 );
X        entergroup( c, a0, b0, c0 );
X    } else {
X        entergroup( a, b, c, SPACEINDEX );
X        entergroup( b, c, SPACEINDEX, a0 );
X        entergroup( c, SPACEINDEX, a0, b0 );
X        entergroup( SPACEINDEX, a0, b0, c0 );
X    }
X}  /* buildtable() */
X
X
X/* Dump the 0th to 3rd order tables.  Not the 4th-order! */
X/* Only called if SHOWTABLE is defined at compile time. */
X
Xshowtable()
X{
Xuint16 i, j, k;
Xuint16 *t2, **t3, *tt3;
X
X    for (i=0; i<MAXINDEX; i++) if (table1[i] != 0) {
X        printf( "%c\t%lu%c\t\ttot", i+'a', table1[i], NEWLINECHAR );
X        for (k=0; k<MAXINDEX; k++) {
X            printf( "\t%c", k+'a' );
X        }
X        putchar( NEWLINECHAR );
X        t2 = table2[i];
X        t3 = table3[i];
X        for (j=0; j<MAXINDEX; j++) if (t2[j] != 0) {
X            printf( "\t%c\t%u", j+'a', t2[j] );
X            tt3 = t3[j];
X            for (k=0; k<MAXINDEX; k++) {
X                putchar( '\t' );
X                if (tt3[k]==0) putchar( '-' );
X                else printf( "%u", tt3[k] );
X            }
X            putchar( NEWLINECHAR );
X        }
X        putchar( NEWLINECHAR );
X    }
X}  /* showtable() */
X
X
X/* Generation of output */
X
Xint16 randint( max )
Xuint16 max;
X{
X    return( max==0 ?
0 : (int16) (((uint16) Random())%max) );
X}  /* int16 randint( max ) */
X
Xint16 randchoice8( tot, dist )
Xuint32 tot;
Xuint8 *dist;
X{
Xint32 i;    /* must hold 0..65535; an int16 goes negative above 32767 */
Xint16 j;
X
X    if (tot==0) return(-1);
X    i = (uint16) randint( tot );    /* undo the int16 truncation in randint() */
X    for (j=0; j<MAXINDEX; j++) {
X        i -= dist[j];
X        if (i < 0) {
X            return(j);
X        }
X    }
X    return( -1 );    /* Should never happen. */
X}  /* int16 randchoice8( tot, dist ) */
X
Xint16 randchoice16( tot, dist )
Xuint32 tot;
Xuint16 *dist;
X{
Xint32 i;    /* must hold 0..65535; an int16 goes negative above 32767 */
Xint16 j;
X
X    if (tot==0) return(-1);
X    i = (uint16) randint( tot );    /* undo the int16 truncation in randint() */
X    for (j=0; j<MAXINDEX; j++) {
X        i -= dist[j];
X        if (i < 0) {
X            return(j);
X        }
X    }
X    return( -1 );    /* Should never happen. */
X}  /* int16 randchoice16( tot, dist ) */
X
Xint16 randchoice32( tot, dist )
Xuint32 tot;
Xuint32 *dist;
X{
Xint32 i;    /* must hold 0..65535; an int16 goes negative above 32767 */
Xint16 j;
X
X    if (tot==0) return(-1);
X    i = (uint16) randint( tot );    /* undo the int16 truncation in randint() */
X    for (j=0; j<MAXINDEX; j++) {
X        i -= dist[j];
X        if (i<0) return(j);
X    }
X    return( -1 );    /* Should never happen. */
X}  /* int16 randchoice32( tot, dist ) */
X
Xoutchar( c )
Xchar c;
X{
X    if ((c=='.') || (c==' ')) {
X        if (Column > BREAK1) {
X            if (c=='.') putchar('.');
X            putchar( NEWLINECHAR );
X            Column = 0;  Lines++;
X        } else {
X            if (c=='.') { putchar('.');  putchar(' ');  Column += 2; }
X            putchar(' ');  Column++;
X        }
X    } else {
X        if (Column > BREAK2) {
X            putchar('-');  putchar( NEWLINECHAR );  Column = 0;  Lines++;
X        }
X        putchar(c);  Column++;
X    }
X}  /* outchar( c ) */
X
Xgenerateword()
X{
Xint16 a, b, c, d;
X
X    a = SPACEINDEX;
X    b = randchoice16( table1[a], table2[a] );
X    if (b==(-1)) return;
X    outchar( indextable(b) );
X    c = randchoice16( table2[a][b], table3[a][b] );
X    if (c==(-1)) return;
X    outchar( indextable(c) );
X    while (Lines < MaxLines) {
X        if (Big) {
X            d = randchoice8( table3[a][b][c], &(table4[ ((((a*MAXINDEX)+b)*MAXINDEX)+c)*MAXINDEX ]) );
X        } else {
X            d = randchoice16( table2[b][c], table3[b][c] );
X        }
X        if (d==(-1)) {
X            outchar( '.'
);
X            d = SPACEINDEX;
X        } else {
X            outchar( indextable(d) );
X        }
X        if (SeparateWords && (d==SPACEINDEX)) return;
X        a = b;  b = c;  c = d;
X    }
X}  /* generateword() */
X
Xgenerate()
X{
X    if (table0 > 0) while (Lines < MaxLines) generateword();
X}  /* generate() */
X
X
X/* Argument parsing */
X
Xusageerror()
X{
X    fprintf( stderr, "Usage: %s [-3] [-s|-w] [-lnnn] [file]%c",
X        Argv[0], NEWLINECHAR );
X    fprintf( stderr, "\t-3: 3rd-order statistics (default is 4th-order)%c",
X        NEWLINECHAR );
X    fprintf( stderr, "\t-w: successive words are independent (default)%c",
X        NEWLINECHAR );
X    fprintf( stderr, "\t-s: (sentences) successive words are dependent%c",
X        NEWLINECHAR );
X    fprintf( stderr, "\t-lnnn: Generate nnn lines of output (default %d).%c",
X        DEFAULTMAXLINES, NEWLINECHAR );
X    ExitStatus = 1;
X    exit( ExitStatus );
X}  /* usageerror() */
X
Xprocessoptions()
X{
Xint i;
X
X    CurFile = Argc;
X    for (i=1; i<Argc; i++) {
X        if (Argv[i][0] == '-') {
X            switch (Argv[i][1]) {
X                case 's':
X                    SeparateWords = FALSE;
X                    break;
X                case 'w':
X                    SeparateWords = TRUE;
X                    break;
X                case '3':
X                    Big = FALSE;
X                    break;
X                case 'l':
X                    if (Argv[i][2]==0) {
X                        MaxLines = MAXUINT32;
X                    } else if ((sscanf( &(Argv[i][2]), "%lu", &MaxLines ) != 1) ||
X                               (MaxLines < 0)) {
X                        usageerror();  /* exits */
X                    }
X                    break;
X                default:
X                    usageerror();  /* exits */
X            }
X        } else if (Argv[i][0] == 0) {
X            FileArgs = FALSE;
X        } else {
X            FileArgs = TRUE;
X            CurFile = i-1;
X            getnextfile();
X            return;
X        }
X    }
X}  /* processoptions() */
X
X
X/* Control */
X
X#if UNIX
Xcleanup( status, ignore )
Xint status;
Xchar *ignore;
X#endif
X#if MPW
Xvoid cleanup( status )
Xint status;
X#endif
X{
X    freememory();
X}  /* cleanup( status, ignore ) */
X
Xmain( argc, argv )
Xint argc;
Xchar **argv;
X{
X#if MPW
X    InitGraf( &(qd.thePort) );    /* for random numbers */
X    GetDateTime( &(qd.randSeed) );
X#endif
X#if UNIX
X    srandom( time(0) );
X#endif
X
X    Argc = argc;  Argv = argv;
X#if UNIX
X    on_exit( cleanup, NULL );    /* probably not necessary */
X#endif
X#if MPW
X
onexit( cleanup );    /* maybe necessary */
X#endif
X    processoptions();
X    maketranstable();
X    getmemory();
X    buildtable();
X    fprintf( stderr, "%lu characters%c", table0, NEWLINECHAR );
X#ifdef SHOWTABLE
X    showtable();
X#endif
X    generate();
X    exit( ExitStatus );
X}  /* main() */
*-*-END-of-names.c-*-*
exit
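
The scheme described in names.1 (count how often each character follows
the preceding ones, then draw each new character in proportion to those
counts) can be illustrated with a much smaller program. What follows is
only a sketch under stated assumptions, not the code from the archive:
it is third-order only (each character depends on the two before it),
it always treats words as independent, it uses the standard rand()
instead of random(), and the names count, sym, train, pick and make_word
are made up for this illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define NSYM  27   /* 'a'..'z' plus one code for "space", as in names.c */
#define SPACE 26

/* count[a][b][c]: how often symbol c followed the pair (a, b).
   (Hypothetical table; names.c keeps separate tables per order.) */
static unsigned short count[NSYM][NSYM][NSYM];

/* Letters map to 0..25 with case ignored; everything else is a space. */
static int sym(int ch)
{
    if (ch >= 'A' && ch <= 'Z') return ch - 'A';
    if (ch >= 'a' && ch <= 'z') return ch - 'a';
    return SPACE;
}

/* Add the trigraph statistics of text to the table. */
static void train(const char *text)
{
    int a = SPACE, b = SPACE;               /* context at the start of a word */

    for (;;) {
        int c = *text ? sym(*text) : SPACE; /* end of text closes the last word */
        if (c != SPACE || b != SPACE) {     /* runs of non-letters count once */
            if (count[a][b][c] < USHRT_MAX)
                count[a][b][c]++;           /* stick at the maximum, no overflow */
            if (c == SPACE) { a = SPACE; b = SPACE; }
            else            { a = b;     b = c;     }
        }
        if (!*text) break;
        text++;
    }
}

/* Pick index i with probability proportional to dist[i]; -1 if all zero.
   rand() % total is slightly biased, which is acceptable for a sketch. */
static int pick(const unsigned short *dist)
{
    unsigned long total = 0;
    long r;
    int i;

    for (i = 0; i < NSYM; i++) total += dist[i];
    if (total == 0) return -1;
    r = (long) ((unsigned long) rand() % total);
    for (i = 0; i < NSYM; i++) {
        r -= dist[i];
        if (r < 0) return i;
    }
    return -1;                              /* not reached, since r < total */
}

/* Generate one word into out (at most cap-1 letters); returns its length. */
static size_t make_word(char *out, size_t cap)
{
    int a = SPACE, b = SPACE;
    size_t n = 0;

    assert(cap > 0);
    while (n + 1 < cap) {
        int c = pick(count[a][b]);
        if (c < 0 || c == SPACE) break;     /* unseen context or end of word */
        out[n++] = (char) ('a' + c);
        a = b;  b = c;
    }
    out[n] = '\0';
    return n;
}
```

To try it, call srand() once, pass the sample text to train(), and then
call make_word() repeatedly with a buffer of, say, 32 bytes. Because the
wide counter in pick() is a long, it does not run into the 16-bit range
problem that a short counter would have with large totals.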