ecragg@GMUVAX.GMU.EDU ("EDWARD CRAGG") (02/23/88)
Does anyone have either ideas or experience related to a hashing function in Ada for converting a string of unspecified length (probably less than 256) to an integer that doesn't eat you alive trying to get around all the type checking? Ideally I would like to use something like the old Fortran trick of equivalencing the string to an LOGICAL*4, XORing the component 4 byte units, and taking the absolute value of the result as an integer to be divided by the table size to give the hash code as the remainder. However the type conversion using attributes (to keep totaly machine independent) is eating me alive. Does anybody know what kind of hashing function the compiler writers are using? (Or is that the real reason Ada compilers are so slow :-) ------------------------------------------------------------------- Edward E Cragg Bitnet: ECRAGG@GMUVAX Internet: ECRAGG@GMUVAX.GMU.EDU ------
djones@megatest.UUCP (Dave Jones) (02/27/88)
in article <8802231548.AA16212@ajpo.sei.cmu.edu>, ecragg@GMUVAX.GMU.EDU ("EDWARD CRAGG") says: > > Does anyone have either ideas or experience related to a hashing > function in Ada ... > I don't speak Ada. So I don't know if my input will help at all, but I'll give it a try. (If you are wondering how I came to find this posting: a cohort told me about it, and suggested I might be able to help.) Like I said, I don't speak Ada, but if conversion of characters into unsigned integers is efficient (it should be), you can use the following hash function, which I will write in my native dialect, C: int hash(str) char *str; { int result; while(*str) result += result + (unsigned int)(*str++); return (result % table_size); } If you don't speak C, we are going to need an interpreter. I have discovered a rehashing technique which will save time on the mod-by-table-size part. It is based on the rather remarkable fact, which I have very tediously proven, that the range of the sequence defined below is the set of all numbers in 0..table_size-1 which are congruent to 0 or 3 mod 4, provided that table_size is a power of two. S(0) = 0 S(n+1) = ((S(n) + 1)*3) % table_size You use the function R(n) = ((n+1)*3) % table_size as a rehash function for collision resolution. You always use a table_size which is an even power of two, so that modding it out can be strength-reduced to masking with table_size-1. (Will the Ada strong-typing let you do that?) I am including a sharfile which may be of some use. It is a hash-table package written in C++. If you can translate it into Ada you are in business. But the type-checking may make that impossible. In C++ you have strong type-checking when you want it, but if it gets in the way, you can chuck it. Good luck, Dave Jones Megatest Corp. 880 Fox Lane San Jose, CA. 95131 (408) 437-9700 Ext 3227 UUCP: ucbvax!sun!megatest!djones ARPA: megatest!djones@riacs.ARPA #! /bin/sh # This is a shell archive. Remove anything before this line, then unpack # it by saving it into a file and typing "sh file". To overwrite existing # files, type "sh file -c". You can also feed this as standard input via # unshar, or by typing "sh <file", e.g.. If this archive is complete, you # will see the following message at the end: # "End of shell archive." # Contents: DISCLAIMER Assoc.C Assoc.H Assoc.doc assoc_demo.C heap.C # heap.H heap.doc idents.C idents.H idents.doc queue.C queue.H # queue.doc # Wrapped by djones@goofy on Fri Feb 26 15:35:22 1988 PATH=/bin:/usr/bin:/usr/ucb ; export PATH if test -f DISCLAIMER -a "${1}" != "-c" ; then echo shar: Will not over-write existing file \"DISCLAIMER\" else echo shar: Extracting \"DISCLAIMER\" \(243 characters\) sed "s/^X//" >DISCLAIMER <<'END_OF_DISCLAIMER' X X X// X// DISCLAIMER. X// X// Neither the author nor Megatest Corp. issues any waranty of the X// contents of these files. No claim is made that they are adequate to any X// purpose, nor that any is error-free, nor that any works at all. X// END_OF_DISCLAIMER if test 243 -ne `wc -c <DISCLAIMER`; then echo shar: \"DISCLAIMER\" unpacked with wrong size! fi # end of overwriting check fi if test -f Assoc.C -a "${1}" != "-c" ; then echo shar: Will not over-write existing file \"Assoc.C\" else echo shar: Extracting \"Assoc.C\" \(6850 characters\) sed "s/^X//" >Assoc.C <<'END_OF_Assoc.C' X X#include "Assoc.H" X X// This is a ASSOCIATIVE ARRAY class, which maps arbitrary things to X// structures of arbitary size. There is also an ITERATOR CLASS for X// the associative arrays. X// X// C.f. C++ Programming Language book, sections 2.3.10 and 6.7. X// X// X X// Author: Dave Jones X X// X// NOTES: X// X// Here's how it works. We keep an array of Pairs, X// { char* key; VOID* contents } X// X// A Pair is valid if and only if the contents is not null. X// If contents is not null, it points to a struct of the given size. X// We hash into the array to find keys, using a rehash function for X// collision avoidance. If you look up a previously invalid key, X// we create a valid Pair for it. By default, the first long int of a X// new field is zeroed, so a user may use it to determine whether or not an X// array element is a new one or not. Users may specify some other X// initialization routine. (It's "virtual".) X// X// To save space, the user may invalidate entries with the procedure remove(). X// X// When the array approaches half full, we double its size. X// X// WARNING: These algorithms are quite tricky. They are built for speed, X// not clarity. If you change something, you may very well introduce a bug X// which will not manifest itself except in very infrequent situations, X// and which will be impossible to track down. Fore-warned is skeptical, X// but fore-armed. X// X// For example, the algorithms will not work unless the table size is a X// power of two, and the table is kept strictly less than half full. X// In particular, the lookup routine only provably terminates X// because for a given pos. int. M, the sequence X// X// s(0) = 0 X// X// s(i+1) = ((s(i)+1)*3) mod 2**M X// X// repeats after exactly 2**(M-1) steps. I have proved this true X// for all pos. ints. M, but my proof is very inelegant. You can X// prove it true for small M, say less than 33, with a simple computer X// program. I have done that also, just to be on the safe side. X// X X X Xstatic Assoc_entry* new_table(int); Xstatic int round_up_to_pwr2(int); X X XAssoc::Assoc( int struct_size, int initial_size) X : heap(struct_size) X{ X size = round_up_to_pwr2(initial_size); X num_entries = 0; X max_entries = size/2 - 1; X mask = size - 1; // Binary arithmetic. x mod size == x & (size-1) X hash_table = new_table(size); X} X X X X X XAssoc::~Assoc() X{ X Assoc_iterator next(*this); X Assoc_entry* bucket; X X while(bucket=next()) X delete_key(bucket->key); X X delete hash_table; X} X X X X XVOID* XAssoc::lookup( char* key ) X{ X // The lookup may add an entry, so.. X if ( num_entries >= max_entries ) overflow(); X X register int bucket_number = HASH(key); X register Assoc_entry * bucket; X X while(1) X { X X bucket = hash_table + bucket_number; X X if ( bucket->contents == 0 ) X { X bucket->key = copy_key(key); X bucket->contents = (VOID*)heap.alloc(); X init(bucket->contents); X num_entries++; X break; // <====== created new entry X } X X if ( !equiv( bucket->key, key) ) X { X bucket_number = REHASH(bucket_number); X continue; // <====== search some more (collision) X } X X break; // <====== found old Assoc_entry X X } X X return bucket->contents; X X} X X X Xvoid XAssoc::overflow() X{ X Assoc_entry* old_hash = hash_table; X int old_size = size; X X max_entries = size - 1; X size = size * 2; X mask = size - 1; X hash_table = new_table(size); X X /* Take everything that was in the old table, and put it X ** into the new one. X */ X X register int recno; X for (recno = 0; recno < old_size; recno++) X { Assoc_entry *mem = old_hash + recno; X if ( mem->key ) X { X register int bucket_number = HASH(mem->key); X while(1) X { X register Assoc_entry* bucket = hash_table + bucket_number; X if ( bucket->contents == 0 ) X { X bucket->key = mem->key; X bucket->contents = mem->contents; X break; // <==== copied it X } X X // search some more X bucket_number = REHASH(bucket_number); X X } X } X X } X X delete old_hash; X} X X X X X Xvoid XAssoc::remove( char* key ) X{ X register int bucket_number = HASH(key); X register struct Assoc_entry * bucket; X X // Find old entry and remove it. X while(1) X { bucket = hash_table+bucket_number; X X if ( bucket->contents == 0 ) X return; // <===== nothing to delete X X if ( equiv(bucket->key, key )) X { // found the condemned item X delete_key(bucket->key); X heap.free(bucket->contents); X bucket->contents = 0; X num_entries-=1; X break; // <====== clobbered it X } X // collision X bucket_number = REHASH(bucket_number); X } X X // Entries that might have been "bumped down the rehash chain" X // when they collided with the now defunct Assoc_entry, may now be misplaced. X // So we remove them, then reinsert them. This is the trickiest part. X while( X bucket_number = REHASH(bucket_number), X bucket = hash_table + bucket_number, X bucket->contents != 0 X ) X { // check for bumpee X register char* key = bucket->key; X register int new_bucket_number = HASH(key); X X if(new_bucket_number != bucket_number) // unnecessary test (accelerator) X { // remove and reinsert X VOID* contents = bucket->contents; X bucket->contents = 0; // remove it. X X while(1) X { X Assoc_entry* new_bucket = hash_table + new_bucket_number; X X if ( new_bucket->contents == 0 ) X { X new_bucket->key = key; X new_bucket->contents = contents; X break; // <====== reinserted entry X } X X new_bucket_number = REHASH(new_bucket_number); X } X X }// end remove and reinsert X } // end check for bumpee X X return; X X} // end Assoc::remove() X X X X XAssoc_entry* XAssoc_iterator::operator()() X{ X for (; i < cs->size; i++) X if ( cs->hash_table[i].contents) X { X return &cs->hash_table[i++]; X } X i = 0; // Table is exhausted. We will start the sequence anew next time. X return 0; X} X X Xint XName_table::hash(register char* key) X{ X /* I looked at the assembly language generated for various hashing X ** algorithms, (Sun-3, release 3.4), and this one won. X */ X register int retval = 0; X while(*key) retval += retval + (unsigned char)(*key++); X return retval; X} X Xvoid XName_table::delete_key(char* key) X{ delete key; } X Xchar* XName_table::copy_key(char* key) X{ return strcpy( new char[strlen(key)], key ); } X Xint XName_table::equiv(char* key1, char* key2) X{ return strcmp(key1, key2) == 0; } X X X X XAssoc_entry* Assoc::new_table(int size) X{ X Assoc_entry* table = new Assoc_entry [size]; X int i = 0; X for(i = 0; i < size; i++) table[i].contents = 0; X return table; X} X Xstatic int round_up_to_pwr2(int initial_size) X{ X const int SMALLEST = 4; X int size = SMALLEST; X X while(initial_size > SMALLEST) X { size *= 2; X initial_size = ( initial_size + 1)/2; X } X return size*2; X} END_OF_Assoc.C if test 6850 -ne `wc -c <Assoc.C`; then echo shar: \"Assoc.C\" unpacked with wrong size! fi # end of overwriting check fi if test -f Assoc.H -a "${1}" != "-c" ; then echo shar: Will not over-write existing file \"Assoc.H\" else echo shar: Extracting \"Assoc.H\" \(2495 characters\) sed "s/^X//" >Assoc.H <<'END_OF_Assoc.H' X#ifndef NAME_TABLE_INCLUDED X#define NAME_TABLE_INCLUDED X X#include <string.h> X#include "heap.H" X#include <generic.h> X X// Author: Dave Jones X Xstruct VOID // structure of unknown size and content X{ char* pointer; // Assoc's make this null X}; X Xclass Assoc_entry X{ X friend class Assoc; X friend class Assoc_iterator; X char* key; X VOID* contents; X public: X inline const char* index() { return key; } X inline const VOID* value() { return contents; } X}; X X Xclass Assoc // generic associative array X{ X virtual int hash( char* whatever ) { return (int) whatever; } X X friend class Assoc_iterator; X Heap heap; X Assoc_entry* hash_table; X int size; X int max_entries; X int mask; X int num_entries; X inline int HASH( char* key) { return (hash(key)) & mask;} X inline int REHASH(int num) { return ((((num)+1)*3) & mask); } X void overflow(); X Assoc_entry* new_table(int); X X public: X X virtual char* copy_key(char* key) { return key; } X virtual void delete_key(char* key) {} X virtual int equiv( char* key1, char* key2) { return key1 == key2; } X virtual void init( VOID *member ) { member->pointer = 0; } X X Assoc( int struct_size, int initial_size = 4 ); X X ~Assoc(); X X VOID* lookup(char*); X VOID& operator[] (char* key) { return *Assoc::lookup(key); } X void remove( char * ); X X inline int entries( ) { return num_entries; } X X}; X Xclass Name_table : public Assoc X{ X int hash( char* ); X Xpublic: X char* copy_key(char *); X void delete_key(char*); X int equiv(char*, char*); X Name_table(int struct_size, int initial_size = 4) : X (struct_size, initial_size) {} X X}; X X X// CAVEAT X// Do not use assoc::operator[] or assoc::remove() while an assoc_iterator X// is active, or risk having the iterator miss some entries. X Xclass Assoc_iterator X{ X Assoc* cs; X int i; X public: X Assoc_iterator(Assoc& s) { cs = &s; i = 0; } X Assoc_entry* operator() (); X}; X X X#define NAME_TABLE(_name_,_type_) \ Xclass name2(_name_,_table) : public Name_table \ X{ \ X public: \ X name2(_name_,_table) () : (sizeof(_type_)) {} \ X _type_& operator[] (char* key) \ X { return *((_type_*)Name_table::lookup(key));} \ X}; \ X\ Xclass name2(_name_,_entry) : public Assoc_entry \ X{ \ X public: \ X inline const _type_* value() { return (_type_*)(Assoc_entry::value()); } \ X}; \ X\ Xclass name2(_name_,_iterator): public Assoc_iterator \ X{ \ X public: \ X name2(_name_,_entry)* operator() () \ X { return (name2(_name_,_entry)*) Assoc_iterator::operator() (); } \ X}; X#endif NAME_TABLE_INCLUDED END_OF_Assoc.H if test 2495 -ne `wc -c <Assoc.H`; then echo shar: \"Assoc.H\" unpacked with wrong size! fi # end of overwriting check fi if test -f Assoc.doc -a "${1}" != "-c" ; then echo shar: Will not over-write existing file \"Assoc.doc\" else echo shar: Extracting \"Assoc.doc\" \(1716 characters\) sed "s/^X//" >Assoc.doc <<'END_OF_Assoc.doc' X// X// Class Assoc is a generic associative-array class. X// Class Name_table is a somewhat less generic derived class which X// maps null-terminated strings to arbitrary things. X// X// There is also an iterator class for the arrays. X// X// These classes use open hash-tables. The "keys" are typed a char*, but X// as far as Assoc is concerned, they could be anything. X// X// Virutal functions: X// X// copy_key(char*) Makes a copy of the key, to be stored in the hash- X// table for future comparison X// delete_key(char*) Deletes the key when no longer needed X// equiv(char*,char*) Returns 1 iff the keys are equivalent. X// hash(char*) Used by hashing algorithm. X// Requirement: hash(key1)==hash(key2) if X// equiv(key1,key2) is true X// init(VOID *) Initializes an entry which has not been X// previously referenced X// X// See definitions of Assoc and Name_table in Assoc.C and Assoc.H X// for the default function definitions of these virtual functions. X// X// The macro NAME_TABLE defines three derived classes of Name_table. X// For example, X// X// NAME_TABLE(Identifier, struct id_rec ) X// X// defines classes "Identifier_table", "Identifier_entry", and X// "Identifier_iterator". X// X// You can use an object of type "Identifier_table" as though it were X// an array of struct id_rec's : X// X// table["foo"] = some_struct_id_rec; X// X// Identifier_entry has methods index() and value() which return the X// key associated with an entry, and the contents of the entry, respectively. X// X// Identifier_iterator will iterate through all the entries in the table. X// X// See assoc_demo.C. X// END_OF_Assoc.doc if test 1716 -ne `wc -c <Assoc.doc`; then echo shar: \"Assoc.doc\" unpacked with wrong size! fi # end of overwriting check fi if test -f assoc_demo.C -a "${1}" != "-c" ; then echo shar: Will not over-write existing file \"assoc_demo.C\" else echo shar: Extracting \"assoc_demo.C\" \(1112 characters\) sed "s/^X//" >assoc_demo.C <<'END_OF_assoc_demo.C' X#include "Assoc.H" X#include <stream.h> X Xchar* copy(char* str) X{ X return strcpy(new char[strlen(str)], str); X} X XNAME_TABLE(String, char*) X XString_table table; X Xmain() X{ X cout << "\nTo put an entry, type p <key> <contents>\n" X << "To get an entry, type g <key>\n" X << "To show all entries, type s\n" X << "To delete an entry, type d\n" X << "To count the entries, type c\n\n" X ; X X char key[128]; X char cont[128]; X char command; X X do X { X cout << ". "; X cin >> command; X switch(command) X { X break;case 'q': {} X X break;case 'p': X cin >> key >> cont; X table[key] = copy(cont); X X break;case 'g': X cin >> key; X cout << table[key] << "\n"; X X break;case 's': X { X String_entry *p; X String_iterator next(table); X while(p = next()) X cout << p->index() << " " << *(p->value()) << "\n"; X } X X break;case 'd': X cin >> key; X delete table[key]; X table.remove(key); X X break;case 'c': X cout << table.entries() << "\n"; X X break; default: X cout << "Huh?\n"; X X break; X } X } X while(command != 'q'); X X} X END_OF_assoc_demo.C if test 1112 -ne `wc -c <assoc_demo.C`; then echo shar: \"assoc_demo.C\" unpacked with wrong size! fi # end of overwriting check fi if test -f heap.C -a "${1}" != "-c" ; then echo shar: Will not over-write existing file \"heap.C\" else echo shar: Extracting \"heap.C\" \(1209 characters\) sed "s/^X//" >heap.C <<'END_OF_heap.C' X#include "heap.H" X#include "Assoc.H" X X// Everything is calculated in "heap-units", which is the amount of space X// required for an arbitrary pointer. X// X// Each block of packets is linked to the previous ones through the X// first "heap-unit". The head of the list is "cache". X// X// All free packets are similarly linked. The head of the list is X// "next_free". X// X X XHeap::Heap(unsigned int size, unsigned int init_size ) X{ X // calculate size of element in heap-units. X packet_size = (size + (sizeof(heap_unit) - 1)) / sizeof(heap_unit); X cache = 0; X num_elements = init_size; X next_free = 0; X} X Xvoid Heap::underflow() X{ X heap_unit* more = cache; X cache = new heap_unit[packet_size*num_elements + 1]; // one extra for link X cache->next = more; // This is the link. X heap_unit* freebee = cache + 1; // First free element is 1 beyond link. X X int i; X for(i=0; i<num_elements; i++) X { free(freebee); // This is "our" free, not C-library's free. X freebee += packet_size; X } X X // If we run out again, we'll get this many more. X num_elements = next_load(num_elements); X X} X XHeap::~Heap() X{ X while(cache) X { heap_unit* more = cache->next; X delete cache; X cache = more; X } X} X END_OF_heap.C if test 1209 -ne `wc -c <heap.C`; then echo shar: \"heap.C\" unpacked with wrong size! fi # end of overwriting check fi if test -f heap.H -a "${1}" != "-c" ; then echo shar: Will not over-write existing file \"heap.H\" else echo shar: Extracting \"heap.H\" \(1133 characters\) sed "s/^X//" >heap.H <<'END_OF_heap.H' X#ifndef HEAP_DEFD X#define HEAP_DEFD X X Xclass heap_unit X{ friend class Heap; X heap_unit* next; X}; X Xclass Heap // implements heap of packets of a fixed size X{ X virtual int next_load(int n) // Strategy determines size of subsequent X { return n*2; } // block, calculated from size of X // previous block. X X int packet_size; // in units of sizeof(heap_unit) X int num_elements; // number of packets to allocate in next block X heap_unit* cache; // queue of all blocks of packets X heap_unit* next_free; // free store of packets X void underflow(); // called when free store is depleted X Xpublic: X X Heap(unsigned int /* packet-size in bytes */, X unsigned int = 4 /* number of packets in first block */); X X ~Heap(); X X inline void* X alloc() X { heap_unit* retval; X if(next_free == 0) underflow(); X retval=next_free; next_free = next_free->next; X return (void*)retval; X } X X inline void X free(void* packet) X { X heap_unit *more = next_free; X next_free = (heap_unit*)packet; X ((heap_unit*)packet)->next = more; X } X}; X X X#endif HEAP_DEFD END_OF_heap.H if test 1133 -ne `wc -c <heap.H`; then echo shar: \"heap.H\" unpacked with wrong size! fi # end of overwriting check fi if test -f heap.doc -a "${1}" != "-c" ; then echo shar: Will not over-write existing file \"heap.doc\" else echo shar: Extracting \"heap.doc\" \(252 characters\) sed "s/^X//" >heap.doc <<'END_OF_heap.doc' X// X// Class Heap implements a free-store (heap) of packets of one size. X// It can be significantly faster than default methods, if X// packets are repeatedly allocated and freed. Uses C++ built-in X// function "new" to allocate blocks of packets. X// X X END_OF_heap.doc if test 252 -ne `wc -c <heap.doc`; then echo shar: \"heap.doc\" unpacked with wrong size! fi # end of overwriting check fi if test -f idents.C -a "${1}" != "-c" ; then echo shar: Will not over-write existing file \"idents.C\" else echo shar: Extracting \"idents.C\" \(2304 characters\) sed "s/^X//" >idents.C <<'END_OF_idents.C' X// COPYRIGHT 1988 Megatest Corp. X X#include <ctype.h> X#include "idents.H" X Xinline unsigned char XLOWER(unsigned char ch) { return (isupper(ch)?tolower(ch):(ch)); } X XIdents_UC::equiv(register char* str1, register char* str2) X{ X while( isalnum(*str1) || *str1 == '_' ) X if(*str1++ != *str2++) X return 0; X return !(isalnum(*str2) || *str2 == '_'); X} X XIdents_uC::equiv(register char* str1, register char* str2) X{ X while( isalnum(*str1)) X if(*str1++ != *str2++) X return 0; X return !isalnum(*str2); X} X XIdents_uc::equiv(register char* str1, register char* str2) X{ X while( isalnum(*str1)) X if( LOWER(*str1) != LOWER( *str2)) X return 0; X else X { str1++; str2++; } X return !isalnum(*str2); X} X XIdents_Uc::equiv(register char* str1, register char* str2) X{ X while(isalnum(*str1) || *str1 == '_' ) X if( LOWER(*str1) != LOWER( *str2)) X return 0; X else X { str1++; str2++; } X return !(isalnum(*str2) || *str2 == '_'); X} X XIdents_Tc::equiv(register char* str1, register char* str2) X{ X do X if( LOWER(*str1) != LOWER(*str2)) X return 0; X else X { do str1++; while (*str1 == '-'); X do str2++; while (*str2 == '-'); X } X while(isalnum(*str1)); X return !(isalnum(*str2)); X} X XIdents_UC::hash(register char* str) X{ X register int hash = 0; X while( isalnum(*str) || *str == '_' ) X hash += hash + (unsigned char)(*str++); X return hash; X} X X/* underscores not valid. case matters. */ XIdents_uC::hash(register char* str) X{ X register int hash = 0; X while( isalnum(*str)) X { hash += hash + (unsigned char)(*str++); } X return hash; X} X X X/* underscores are not valid. case does not matter. (Standard Pascal) */ XIdents_uc::hash(register char* str) X{ X register int hash = 0; X while( isalnum(*str)) X { hash += hash + LOWER((unsigned char)(*str++)); X } X return hash; X} X X/* Telephone scheme */ XIdents_Tc::hash(register char* str) X{ X register int hash = 0; X do X { hash += hash + LOWER((unsigned char)(*str)); X do str++; while (*str == '-'); X } X while( isalnum(*str)); X return hash; X} X X/* underscores are valid. case does not matter. */ XIdents_Uc::hash(register char* str) X{ X register int hash = 0; X while(isalnum(*str) || *str == '_' ) X { hash += hash + LOWER((unsigned char)(*str++)); X } X return hash; X} END_OF_idents.C if test 2304 -ne `wc -c <idents.C`; then echo shar: \"idents.C\" unpacked with wrong size! fi # end of overwriting check fi if test -f idents.H -a "${1}" != "-c" ; then echo shar: Will not over-write existing file \"idents.H\" else echo shar: Extracting \"idents.H\" \(1652 characters\) sed "s/^X//" >idents.H <<'END_OF_idents.H' X#ifndef IDENTS_INCLUDED X#define IDENTS_INCLUDED X X#include "Assoc.H" X X X/* Underscores are valid, Case matters. (Mega-Pascal/C/C++) */ Xclass Idents_UC: public Assoc X{ int hash( char*); X int equiv( char*, char*); X X public: X Idents_UC(int struct_size, int initial_size = 4):(struct_size,initial_size) X {} X}; X X X/* Underscores are not valid. Case does not matter. (Standard Pascal) */ Xclass Idents_uc: public Assoc X{ int hash( char*); X int equiv( char*, char*); X X public: X Idents_uc(int struct_size, int initial_size = 4):(struct_size,initial_size) X {} X}; X X/* Underscores are valid, but "transparent glue" when not in the first X** position. Case does not matter. E.g. "foo_bar" is equivalent to X** "FooBar". But "_foo" is not equivalent to "foo". X** X** This scheme passes the "telephone test": You can read programs to X** people over the telephone, without constantly saying, "all caps", X** "underscore", etc.. I wish this were standard practice. X*/ Xclass Idents_Tc: public Assoc // T for telephone. X{ int hash( char*); X int equiv( char*, char*); X X public: X Idents_Tc(int struct_size, int initial_size = 4):(struct_size,initial_size) X {} X}; X X X/* Underscores are not valid. Case matters. (Modula II) */ Xclass Idents_uC: public Assoc X{ int hash( char*); X int equiv( char*, char*); X X public: X Idents_uC(int struct_size, int initial_size = 4):(struct_size,initial_size) X {} X}; X X X/* Underscores are valid. Case does not matter. */ Xclass Idents_Uc: public Assoc X{ int hash( char*); X int equiv( char*, char*); X X public: X Idents_Uc(int struct_size, int initial_size = 4):(struct_size,initial_size) X {} X}; X X#endif IDENTS_INCLUDED END_OF_idents.H if test 1652 -ne `wc -c <idents.H`; then echo shar: \"idents.H\" unpacked with wrong size! fi # end of overwriting check fi if test -f idents.doc -a "${1}" != "-c" ; then echo shar: Will not over-write existing file \"idents.doc\" else echo shar: Extracting \"idents.doc\" \(668 characters\) sed "s/^X//" >idents.doc <<'END_OF_idents.doc' X// X// These are classes derived from class Assoc, for indexing with identifiers X// (as defined by various languages). They are intended for use in compilers X// and such. X// X// The keys need not be null-terminated, so you can just read an entire X// source file into a buffer, then use pointers into the buffer as keys. X// The redefined virtual functions hash() and equiv() recongize the X// ends of the keys by context. X// X// The virtual functions copy_key() and delete_key() are not redefined. X// The default version does not actually make a copy, it just returns X// the char* it is given. So the keys must remain untouched so long X// as the lookup table is active. END_OF_idents.doc if test 668 -ne `wc -c <idents.doc`; then echo shar: \"idents.doc\" unpacked with wrong size! fi # end of overwriting check fi if test -f queue.C -a "${1}" != "-c" ; then echo shar: Will not over-write existing file \"queue.C\" else echo shar: Extracting \"queue.C\" \(1049 characters\) sed "s/^X//" >queue.C <<'END_OF_queue.C' X#include "queue.H" X X X// All Queue_lists will come from this private heap. MUCH quicker. X Xstatic Heap heap(sizeof(Queue_list)); X X Xvoid* Queue::push(void* cont) X{ X size++; X Queue_list* old_head = head; X head = (struct Queue_list*)heap.alloc(); X head->cdr = old_head; X head->car = cont; X if(!tail) tail = head; X return cont; X} X Xvoid* Queue::pop() X{ X if(size == 0) return 0; X size--; X void* cont = head->car; X Queue_list* condemned = head; X head = head->cdr; X heap.free(condemned); X if(size==0) tail = 0; X return cont; X} X Xvoid* Queue::append(void *cont) X{ X size++; X Queue_list* old_tail = tail; X tail = (struct Queue_list*)heap.alloc(); X tail->cdr=0; X tail->car=cont; X if(old_tail) X old_tail->cdr = tail; X if(head==0)head = tail; X return cont; X} X X Xvoid* Queue_iterator::operator() () X{ X if(done()) { rest = lq->head; return 0; } X void* retval = rest->car; X rest=rest->cdr; X return retval; X} X XQueue::~Queue() X{ X while(head) X { Queue_list* condemned = head; X head = head->cdr; X heap.free(condemned); X } X} END_OF_queue.C if test 1049 -ne `wc -c <queue.C`; then echo shar: \"queue.C\" unpacked with wrong size! fi # end of overwriting check fi if test -f queue.H -a "${1}" != "-c" ; then echo shar: Will not over-write existing file \"queue.H\" else echo shar: Extracting \"queue.H\" \(779 characters\) sed "s/^X//" >queue.H <<'END_OF_queue.H' X#ifndef QUEUE_DEFD X#define QUEUE_DEFD X X#include "heap.H" X X Xclass Queue_list X{ X Xfriend class Queue_iterator; Xfriend class Queue; X X Queue_list* cdr; X void* car; X X}; X X Xclass Queue X{ X X Queue_list* tail; X Queue_list* head; X int size; X friend class Queue_iterator; X X public: X Queue() { size = 0; tail=0; head=0; } X ~Queue(); X X void* push(void*); X void* pop(); X void* append(void*); X X inline X void* first() X { return(size==0)? 0 : head->car; } X X inline X void* last() X { return (size==0) ? 0: tail->car; } X X inline int length() { return size; } X}; X Xclass Queue_iterator X{ X Queue* lq; X Queue_list* rest; X X public: X Queue_iterator(Queue& q) { lq=&q; rest = q.head; } X void* operator() (); X int done() { return rest==0; } X X}; X X#endif QUEUE_DEFD END_OF_queue.H if test 779 -ne `wc -c <queue.H`; then echo shar: \"queue.H\" unpacked with wrong size! fi # end of overwriting check fi if test -f queue.doc -a "${1}" != "-c" ; then echo shar: Will not over-write existing file \"queue.doc\" else echo shar: Extracting \"queue.doc\" \(267 characters\) sed "s/^X//" >queue.doc <<'END_OF_queue.doc' X X// Class Queue impelements queues of pointers. Can be used X// as FIFO (use method "append") or LIFO (use method "push"). X// X// Public methods are X// X X void* push(void*); X void* pop(); X void* append(void*); X void* first(); X void* last(); X inline int length(); END_OF_queue.doc if test 267 -ne `wc -c <queue.doc`; then echo shar: \"queue.doc\" unpacked with wrong size! fi # end of overwriting check fi echo shar: End of shell archive. exit 0
agnew@trwrc.UUCP (R.A. Agnew) (03/01/88)
In article <8802231548.AA16212@ajpo.sei.cmu.edu> "EDWARD CRAGG" <ecragg@gmuvax.gmu.edu> writes: >Does anyone have either ideas or experience related to a hashing >function in Ada for converting a string of unspecified length >the type conversion using attributes (to keep totaly machine >independent) is eating me alive. Does anybody know what kind of > If using character'pos(s(i)) is generating slow code, you need a new compiler. This should only be overhead for the compiler but should generate very little code. I generally define my own enumeration type, say Identifier_Char and type convert with Identifier_Char'Pos(s(i)). I just benched a minimally perfect hash code for Ada reserved words using this technique on a Sun3/75 with the Telesoft Telegen2 compiler and got 133 microseconds worst case (case where tokens are always Ada reserved words). Just to check the above, I put 10000 character'pos conversions in a loop and timed them under Unix and got 0.0 seconds on all three times. In order to get a better comparison, I wrote the following two subprograms: procedure test_mod is i: positive; n: natural := 0; c: Character; begin for i in 1..10000 loop for c in 'A' .. 'Z' loop n := n mod 16381; end loop; end loop; end test_mod; procedure test_pos is i: positive; n: natural := 0; c: Character; begin for i in 1..10000 loop for c in 'A' .. 'Z' loop n := (n + Character'Pos(c)) mod 16381; end loop; end loop; end test_pos; The only difference between the two is the character to integer conversion and so the difference in execution times between them is the cost of ten thousand conversions. I did not use the optimizer. They both timed out between 1.9 and 2.0 seconds elapsed time on the Sun3/75 so I still claim that the compiler generates no code due to the type conversion. Perhaps your hash agolrithm is inherently slow. I have developed a very efficient hashcode algorithm for arbitrary length strings which is based on Fermat and Mersenne primes and uses only shifts and integer addition. And no, that's not the reason why the Ada compilers are so slow. The language that they are compiling is just a couple orders of magnitude more complex than most others. To appreciate this, visualize writing a compiler to optimize code generated by nested generics instantiated on a data type like a complex tensor!