[comp.lang.ada] Hashing Function wanted

ecragg@GMUVAX.GMU.EDU ("EDWARD CRAGG") (02/23/88)

Does anyone have either ideas or experience related to a hashing
function in Ada for converting a string of unspecified length
(probably less than 256) to an integer that doesn't eat you alive
trying to get around all the type checking?  Ideally I would like 
to use something like the old Fortran trick of equivalencing the
string to an LOGICAL*4, XORing the component 4 byte units, and
taking the absolute value of the result as an integer to be divided
by the table size to give the hash code as the remainder.  However
the type conversion using attributes (to keep totaly machine 
independent) is eating me alive.  Does anybody know what kind of
hashing function the compiler writers are using?  (Or is that the
real reason Ada compilers are so slow :-)

-------------------------------------------------------------------

Edward E Cragg                Bitnet:   ECRAGG@GMUVAX
                              Internet: ECRAGG@GMUVAX.GMU.EDU
------

djones@megatest.UUCP (Dave Jones) (02/27/88)

in article <8802231548.AA16212@ajpo.sei.cmu.edu>, ecragg@GMUVAX.GMU.EDU ("EDWARD CRAGG") says:
> 
> Does anyone have either ideas or experience related to a hashing
> function in Ada ...
>

I don't speak Ada. So I don't know if my input will help at all,
but I'll give it a try. (If you are wondering how I came to find
this posting: a cohort told me about it, and suggested I might be able to 
help.)

Like I said, I don't speak Ada, but if conversion of characters into
unsigned integers is efficient (it should be), you can use the following
hash function, which I will write in my native dialect, C:

	int hash(str)
	char *str;
	{
	  int result;
	  while(*str) result += result + (unsigned int)(*str++);
	  return (result % table_size);
	}

If you don't speak C, we are going to need an interpreter. 

I have discovered a rehashing technique which will save time on the 
mod-by-table-size part.  It is based on the rather remarkable fact,
which I have very tediously proven, that the range of the sequence
defined below is the set of all numbers in 0..table_size-1  which
are congruent to 0 or 3 mod 4, provided that table_size is a power of two.

	S(0) = 0
	S(n+1) = ((S(n) + 1)*3) % table_size

You use the function R(n) = ((n+1)*3) % table_size as a rehash function
for collision resolution. You always use a table_size which is an even power 
of two, so that modding it out can be strength-reduced to masking with 
table_size-1. (Will the Ada strong-typing let you do that?)

I am including a sharfile which may be of some use.  It is a hash-table
package written in C++.  If you can translate it into Ada you are in
business.  But the type-checking may make that impossible.  In C++
you have strong type-checking when you want it, but if it gets in the
way, you can chuck it.

Good luck,

		Dave Jones
		Megatest Corp.
		880 Fox Lane
		San Jose, CA.
		95131

		(408) 437-9700 Ext 3227
		UUCP: ucbvax!sun!megatest!djones
		ARPA: megatest!djones@riacs.ARPA

#! /bin/sh
# This is a shell archive.  Remove anything before this line, then unpack
# it by saving it into a file and typing "sh file".  To overwrite existing
# files, type "sh file -c".  You can also feed this as standard input via
# unshar, or by typing "sh <file", e.g..  If this archive is complete, you
# will see the following message at the end:
#		"End of shell archive."
# Contents:  DISCLAIMER Assoc.C Assoc.H Assoc.doc assoc_demo.C heap.C
#   heap.H heap.doc idents.C idents.H idents.doc queue.C queue.H
#   queue.doc
# Wrapped by djones@goofy on Fri Feb 26 15:35:22 1988
PATH=/bin:/usr/bin:/usr/ucb ; export PATH
if test -f DISCLAIMER -a "${1}" != "-c" ; then 
  echo shar: Will not over-write existing file \"DISCLAIMER\"
else
echo shar: Extracting \"DISCLAIMER\" \(243 characters\)
sed "s/^X//" >DISCLAIMER <<'END_OF_DISCLAIMER'
X
X
X//
X//  DISCLAIMER.  
X// 
X//  Neither the author nor Megatest Corp. issues any waranty of the
X//  contents of these files.  No claim is made that they are adequate to any
X//  purpose, nor that any is error-free, nor that any works at all.
X//
END_OF_DISCLAIMER
if test 243 -ne `wc -c <DISCLAIMER`; then
    echo shar: \"DISCLAIMER\" unpacked with wrong size!
fi
# end of overwriting check
fi
if test -f Assoc.C -a "${1}" != "-c" ; then 
  echo shar: Will not over-write existing file \"Assoc.C\"
else
echo shar: Extracting \"Assoc.C\" \(6850 characters\)
sed "s/^X//" >Assoc.C <<'END_OF_Assoc.C'
X
X#include "Assoc.H"
X
X//  This is a ASSOCIATIVE ARRAY class, which maps arbitrary things to
X//  structures of arbitary size.  There is also an ITERATOR CLASS for 
X//  the associative arrays.
X//
X//  C.f.  C++ Programming Language book, sections 2.3.10 and 6.7.
X//
X//            
X
X// Author: Dave Jones
X
X//
X//  NOTES:
X//
X//  Here's how it works.  We keep an array of Pairs, 
X//     { char* key; VOID* contents }
X//
X//  A Pair is valid if and only if the contents is not null.
X//  If contents is not null, it points to a struct of the given size.
X//  We hash into the array to find keys, using a rehash function for
X//  collision avoidance.  If you look up a previously invalid key,
X//  we create a valid Pair for it.  By default, the first long int of a 
X//  new field is zeroed, so a user may use it to determine whether or not an
X//  array element is a new one or not.  Users may specify some other
X//  initialization routine. (It's "virtual".)
X//
X//  To save space, the user may invalidate entries with the procedure remove().
X//
X//  When the array approaches half full, we double its size.
X//
X//  WARNING:  These algorithms are quite tricky. They are built for speed,
X//  not clarity.  If you change something, you may very well introduce a bug 
X//  which will not manifest itself except in very infrequent situations,
X//  and which will be impossible to track down.  Fore-warned is skeptical,
X//  but fore-armed.
X//
X//  For example, the algorithms will not work unless the table size is a 
X//  power of two, and the table is kept strictly less than half full.
X//  In particular, the lookup routine only provably terminates
X//  because for a given pos. int. M, the sequence
X//
X//        s(0) = 0
X//
X//       s(i+1) = ((s(i)+1)*3) mod 2**M
X//
X//  repeats after exactly 2**(M-1) steps.  I have proved this true
X//  for all pos. ints. M, but my proof is very inelegant.  You can
X//  prove it true for small M, say less than 33, with a simple computer
X//  program.  I have done that also, just to be on the safe side.
X//
X
X
X
Xstatic Assoc_entry* new_table(int);
Xstatic int round_up_to_pwr2(int);
X
X
XAssoc::Assoc( int struct_size, int initial_size)
X     : heap(struct_size)
X{
X  size = round_up_to_pwr2(initial_size);
X  num_entries = 0;
X  max_entries = size/2 - 1;
X  mask = size - 1;  // Binary arithmetic. x mod size == x & (size-1)
X  hash_table = new_table(size);
X}
X
X
X
X
X
XAssoc::~Assoc()
X{
X  Assoc_iterator next(*this);
X  Assoc_entry* bucket;
X
X  while(bucket=next())
X    delete_key(bucket->key);
X
X  delete hash_table;
X}
X
X
X
X
XVOID*
XAssoc::lookup( char* key )
X{
X  // The lookup may add an entry, so..
X  if ( num_entries >= max_entries ) overflow();
X  
X  register int bucket_number = HASH(key);
X  register Assoc_entry * bucket;
X
X  while(1)
X    {
X
X      bucket = hash_table + bucket_number;
X
X      if ( bucket->contents == 0 )
X	{ 
X	  bucket->key = copy_key(key);
X	  bucket->contents = (VOID*)heap.alloc();
X	  init(bucket->contents);
X	  num_entries++;
X	  break;    // <====== created new entry
X	}
X      
X      if ( !equiv( bucket->key, key) )
X	{ 
X	  bucket_number = REHASH(bucket_number);
X	  continue; // <====== search some more (collision)
X	}
X
X      break;        // <====== found old Assoc_entry
X
X    }
X  
X  return bucket->contents;
X
X}
X
X
X
Xvoid
XAssoc::overflow()
X{
X  Assoc_entry* old_hash = hash_table;
X  int old_size = size;
X
X  max_entries = size - 1;
X  size = size * 2;
X  mask = size - 1;
X  hash_table = new_table(size);
X  
X  /* Take everything that was in the old table, and put it
X  ** into the new one.
X  */
X
X  register int recno;
X  for (recno = 0; recno < old_size; recno++)
X    { Assoc_entry *mem = old_hash + recno;
X      if ( mem->key )
X	{ 
X	  register int bucket_number = HASH(mem->key);
X	  while(1)
X	    {
X	      register Assoc_entry* bucket = hash_table + bucket_number;
X	      if ( bucket->contents == 0 )
X		{ 
X		  bucket->key = mem->key;
X		  bucket->contents = mem->contents;
X		  break; // <==== copied it
X		}
X
X	      // search some more
X	      bucket_number = REHASH(bucket_number);
X	      
X	    }
X	}
X      
X    }
X
X  delete old_hash;
X}
X
X
X
X
X
Xvoid
XAssoc::remove( char* key )
X{
X  register int  bucket_number  = HASH(key);
X  register struct Assoc_entry * bucket;
X
X  // Find old entry and remove it.
X  while(1)
X    { bucket = hash_table+bucket_number;
X       
X       if ( bucket->contents == 0 )
X	 return;  // <===== nothing to delete
X
X       if ( equiv(bucket->key, key ))
X	 { // found the condemned item
X	   delete_key(bucket->key);
X	   heap.free(bucket->contents);
X	   bucket->contents = 0;
X	   num_entries-=1;
X	   break; // <====== clobbered it
X	 }
X       // collision
X       bucket_number = REHASH(bucket_number);
X     }
X  
X  // Entries that might have been "bumped down the rehash chain"
X  // when they collided with the now defunct Assoc_entry, may now be misplaced.
X  // So we remove them, then reinsert them.   This is the trickiest part.
X  while(
X        bucket_number = REHASH(bucket_number),
X	bucket = hash_table + bucket_number,
X	bucket->contents != 0
X       )
X    { // check for bumpee
X      register char* key = bucket->key;
X      register int new_bucket_number = HASH(key);
X
X      if(new_bucket_number != bucket_number) // unnecessary test (accelerator)
X	{ // remove and reinsert
X	  VOID* contents = bucket->contents;
X	  bucket->contents = 0;  // remove it.
X	  
X	  while(1)
X	    {
X	      Assoc_entry* new_bucket = hash_table + new_bucket_number;
X	      
X	      if ( new_bucket->contents == 0 )
X		{ 
X		  new_bucket->key = key;
X		  new_bucket->contents = contents;
X		  break;    // <====== reinserted entry
X		}
X	      
X	      new_bucket_number = REHASH(new_bucket_number);
X	    }
X
X	}// end remove and reinsert
X    } // end check for bumpee
X
X  return;
X
X} // end Assoc::remove()
X  
X
X
X
XAssoc_entry*
XAssoc_iterator::operator()()
X{
X  for (; i < cs->size; i++)
X    if ( cs->hash_table[i].contents)
X      { 
X	return &cs->hash_table[i++];
X      }
X  i = 0;  // Table is exhausted. We will start the sequence anew next time.
X  return 0;
X}
X
X
Xint
XName_table::hash(register  char* key)
X{
X  /* I looked at the assembly language generated for various hashing
X  ** algorithms, (Sun-3, release 3.4), and this one won.
X  */
X  register int retval = 0;
X  while(*key) retval += retval + (unsigned char)(*key++);
X  return retval;
X}
X
Xvoid
XName_table::delete_key(char* key)
X{ delete key; }
X
Xchar*
XName_table::copy_key(char* key)
X{ return strcpy( new char[strlen(key)], key ); }
X
Xint
XName_table::equiv(char* key1, char* key2)
X{ return strcmp(key1, key2) == 0; }
X
X
X
X
XAssoc_entry* Assoc::new_table(int size)
X{
X  Assoc_entry* table = new Assoc_entry [size];
X  int i = 0;
X  for(i = 0; i < size; i++) table[i].contents = 0;
X  return table;
X}
X
Xstatic int round_up_to_pwr2(int initial_size)
X{
X  const int SMALLEST = 4;
X  int size = SMALLEST;
X
X  while(initial_size > SMALLEST)
X    { size *= 2;
X      initial_size = ( initial_size + 1)/2;
X    }
X  return size*2;
X}
END_OF_Assoc.C
if test 6850 -ne `wc -c <Assoc.C`; then
    echo shar: \"Assoc.C\" unpacked with wrong size!
fi
# end of overwriting check
fi
if test -f Assoc.H -a "${1}" != "-c" ; then 
  echo shar: Will not over-write existing file \"Assoc.H\"
else
echo shar: Extracting \"Assoc.H\" \(2495 characters\)
sed "s/^X//" >Assoc.H <<'END_OF_Assoc.H'
X#ifndef NAME_TABLE_INCLUDED
X#define NAME_TABLE_INCLUDED
X
X#include <string.h>
X#include "heap.H"
X#include <generic.h>
X
X// Author: Dave Jones
X
Xstruct VOID   // structure of unknown size and content
X{ char* pointer;  // Assoc's make this null
X};
X
Xclass Assoc_entry
X{  
X  friend class Assoc;
X  friend class Assoc_iterator;
X  char* key;
X  VOID* contents;
X public:
X  inline const char* index() { return key; }
X  inline const VOID* value() { return contents; }
X};
X
X
Xclass Assoc  // generic associative array
X{
X  virtual int hash( char* whatever ) { return (int) whatever; }
X
X  friend class Assoc_iterator;
X  Heap heap;
X  Assoc_entry* hash_table;  
X  int size;
X  int max_entries;
X  int mask;
X  int num_entries;
X  inline int HASH( char* key) { return (hash(key)) & mask;}
X  inline int REHASH(int num)  { return ((((num)+1)*3) & mask); }
X  void overflow();
X  Assoc_entry* new_table(int);
X
X public:
X  
X  virtual char* copy_key(char* key) { return key; }
X  virtual void delete_key(char* key) {}
X  virtual int equiv( char* key1,  char* key2) { return key1 == key2; }
X  virtual void init( VOID *member ) { member->pointer = 0; }
X
X  Assoc( int struct_size, int initial_size = 4 );
X
X  ~Assoc();
X
X  VOID* lookup(char*);
X  VOID& operator[] (char* key) { return *Assoc::lookup(key); }
X  void remove( char * );
X
X  inline int entries( ) { return num_entries; }
X
X};
X
Xclass Name_table : public Assoc
X{
X  int hash( char* );
X  
Xpublic:
X  char* copy_key(char *);
X  void delete_key(char*);
X  int equiv(char*, char*);
X  Name_table(int struct_size, int initial_size = 4) : 
X     (struct_size, initial_size) {}
X
X};
X
X
X// CAVEAT
X// Do not use assoc::operator[] or assoc::remove() while an assoc_iterator
X// is active, or risk having the iterator miss some entries.
X
Xclass Assoc_iterator  
X{
X  Assoc* cs;
X  int i;
X public:
X  Assoc_iterator(Assoc& s) { cs = &s; i = 0; }
X  Assoc_entry* operator() ();
X};
X
X
X#define NAME_TABLE(_name_,_type_) \
Xclass name2(_name_,_table) : public Name_table \
X{ \
X  public: \
X    name2(_name_,_table) () : (sizeof(_type_)) {} \
X    _type_& operator[] (char* key) \
X      { return *((_type_*)Name_table::lookup(key));} \
X}; \
X\
Xclass name2(_name_,_entry) : public Assoc_entry \
X{ \
X  public: \
X    inline const _type_* value() { return (_type_*)(Assoc_entry::value()); } \
X};  \
X\
Xclass name2(_name_,_iterator): public Assoc_iterator \
X{ \
X  public: \
X    name2(_name_,_entry)* operator() () \
X      { return (name2(_name_,_entry)*) Assoc_iterator::operator() (); } \
X}; 
X#endif NAME_TABLE_INCLUDED
END_OF_Assoc.H
if test 2495 -ne `wc -c <Assoc.H`; then
    echo shar: \"Assoc.H\" unpacked with wrong size!
fi
# end of overwriting check
fi
if test -f Assoc.doc -a "${1}" != "-c" ; then 
  echo shar: Will not over-write existing file \"Assoc.doc\"
else
echo shar: Extracting \"Assoc.doc\" \(1716 characters\)
sed "s/^X//" >Assoc.doc <<'END_OF_Assoc.doc'
X//
X// Class Assoc is a generic associative-array class.
X// Class Name_table is a somewhat less generic derived class which
X// maps null-terminated strings to arbitrary things.
X//
X// There is also an iterator class for the arrays.
X//
X// These classes use open hash-tables.   The "keys" are typed a char*, but 
X// as far as Assoc is concerned, they could be anything.
X// 
X// Virutal functions:
X//
X//    copy_key(char*)     Makes a copy of the key, to be stored in the hash-
X//                          table for future comparison
X//    delete_key(char*)   Deletes the key when no longer needed
X//    equiv(char*,char*)  Returns 1 iff the keys are equivalent.
X//    hash(char*)         Used by hashing algorithm. 
X//                          Requirement: hash(key1)==hash(key2) if
X//                          equiv(key1,key2) is true
X//    init(VOID *)        Initializes an entry which has not been
X//                          previously referenced
X//
X// See definitions of Assoc and Name_table in Assoc.C and Assoc.H
X// for the default function definitions of these virtual functions.
X//
X// The macro NAME_TABLE defines three derived classes of Name_table.  
X// For example,
X//
X// NAME_TABLE(Identifier, struct id_rec )
X// 
X// defines classes "Identifier_table", "Identifier_entry", and 
X// "Identifier_iterator".
X//
X// You can use an object of type "Identifier_table" as though it were 
X// an array of struct id_rec's :
X//
X// table["foo"] = some_struct_id_rec;
X//
X// Identifier_entry has methods index() and value() which return the
X// key associated with an entry, and the contents of the entry, respectively.
X//
X// Identifier_iterator will iterate through all the entries in the table.
X//
X// See assoc_demo.C.
X// 
END_OF_Assoc.doc
if test 1716 -ne `wc -c <Assoc.doc`; then
    echo shar: \"Assoc.doc\" unpacked with wrong size!
fi
# end of overwriting check
fi
if test -f assoc_demo.C -a "${1}" != "-c" ; then 
  echo shar: Will not over-write existing file \"assoc_demo.C\"
else
echo shar: Extracting \"assoc_demo.C\" \(1112 characters\)
sed "s/^X//" >assoc_demo.C <<'END_OF_assoc_demo.C'
X#include "Assoc.H"
X#include <stream.h>
X
Xchar* copy(char* str)
X{
X  return strcpy(new char[strlen(str)], str);
X}
X
XNAME_TABLE(String, char*)
X
XString_table table;
X
Xmain()
X{
X  cout << "\nTo put an entry, type p <key> <contents>\n"
X       << "To get an entry, type g <key>\n"
X       << "To show all entries, type s\n"
X       << "To delete an entry, type d\n"
X       << "To count the entries, type c\n\n"
X  ;
X 
X  char key[128];
X  char cont[128];
X  char command;
X
X  do
X    {
X      cout << ". ";
X      cin >> command;
X      switch(command)
X	{
X	  break;case 'q': {}
X
X	  break;case 'p':
X            cin >> key >> cont;
X	    table[key] = copy(cont);
X
X	  break;case 'g':
X	    cin >> key;
X	    cout << table[key] << "\n";
X
X	  break;case 's':
X	  { 
X	    String_entry *p;
X	    String_iterator next(table);
X	    while(p = next())
X	      cout << p->index() << " " << *(p->value()) << "\n";
X	  }
X
X	  break;case 'd':
X	    cin >> key;
X	    delete table[key];
X	    table.remove(key);
X
X	  break;case 'c':
X	    cout << table.entries() << "\n";
X
X	  break; default:
X	    cout << "Huh?\n";
X
X	  break;
X	}
X    }
X  while(command != 'q');
X
X}
X
END_OF_assoc_demo.C
if test 1112 -ne `wc -c <assoc_demo.C`; then
    echo shar: \"assoc_demo.C\" unpacked with wrong size!
fi
# end of overwriting check
fi
if test -f heap.C -a "${1}" != "-c" ; then 
  echo shar: Will not over-write existing file \"heap.C\"
else
echo shar: Extracting \"heap.C\" \(1209 characters\)
sed "s/^X//" >heap.C <<'END_OF_heap.C'
X#include "heap.H"
X#include "Assoc.H"
X
X// Everything is calculated in "heap-units", which is the amount of space
X// required for an arbitrary pointer.
X//
X// Each block of packets is linked to the previous ones through the
X// first "heap-unit". The head of the list is "cache".
X//
X// All free packets are similarly linked. The head of the list is
X// "next_free".
X//
X
X
XHeap::Heap(unsigned int size, unsigned int init_size )
X{
X  // calculate size of element in heap-units.
X  packet_size = (size + (sizeof(heap_unit) - 1)) / sizeof(heap_unit);
X  cache = 0;
X  num_elements = init_size;
X  next_free = 0;
X}
X
Xvoid Heap::underflow()
X{
X  heap_unit* more = cache;
X  cache = new heap_unit[packet_size*num_elements + 1]; // one extra for link
X  cache->next = more;  // This is the link.
X  heap_unit* freebee = cache + 1;  // First free element is 1 beyond link.
X
X  int i;
X  for(i=0; i<num_elements; i++)
X    { free(freebee);  // This is "our" free, not C-library's free.
X      freebee += packet_size;
X    }
X
X  // If we run out again, we'll get this many more.
X  num_elements = next_load(num_elements);
X
X}
X
XHeap::~Heap()
X{
X  while(cache)
X    { heap_unit* more = cache->next;
X      delete cache;
X      cache = more;
X    }
X}
X
END_OF_heap.C
if test 1209 -ne `wc -c <heap.C`; then
    echo shar: \"heap.C\" unpacked with wrong size!
fi
# end of overwriting check
fi
if test -f heap.H -a "${1}" != "-c" ; then 
  echo shar: Will not over-write existing file \"heap.H\"
else
echo shar: Extracting \"heap.H\" \(1133 characters\)
sed "s/^X//" >heap.H <<'END_OF_heap.H'
X#ifndef HEAP_DEFD
X#define HEAP_DEFD
X
X
Xclass heap_unit  
X{ friend class Heap;
X  heap_unit* next;
X};
X
Xclass Heap  // implements heap of packets of a fixed size
X{
X  virtual int next_load(int n) // Strategy determines size of subsequent
X    { return n*2; }            //   block, calculated from size of 
X			       //   previous block.
X
X  int packet_size;      // in units of sizeof(heap_unit)
X  int num_elements;     // number of packets to allocate in next block
X  heap_unit* cache;     // queue of all blocks of packets
X  heap_unit* next_free; // free store of packets
X  void underflow();     // called when free store is depleted
X
Xpublic:
X
X  Heap(unsigned int     /* packet-size in bytes */, 
X       unsigned int = 4 /* number of packets in first block */);
X
X  ~Heap();
X
X  inline void* 
X  alloc()
X    { heap_unit* retval;
X      if(next_free == 0) underflow();
X      retval=next_free;  next_free = next_free->next;
X      return (void*)retval;
X    }
X
X  inline void
X  free(void* packet)
X    {
X      heap_unit *more = next_free;
X      next_free = (heap_unit*)packet;
X      ((heap_unit*)packet)->next = more;
X    }
X}; 
X
X
X#endif HEAP_DEFD
END_OF_heap.H
if test 1133 -ne `wc -c <heap.H`; then
    echo shar: \"heap.H\" unpacked with wrong size!
fi
# end of overwriting check
fi
if test -f heap.doc -a "${1}" != "-c" ; then 
  echo shar: Will not over-write existing file \"heap.doc\"
else
echo shar: Extracting \"heap.doc\" \(252 characters\)
sed "s/^X//" >heap.doc <<'END_OF_heap.doc'
X//
X// Class Heap implements a free-store (heap) of packets of one size.
X// It can be significantly faster than default methods, if
X// packets are repeatedly allocated and freed.  Uses C++ built-in 
X// function "new" to allocate blocks of packets.
X//
X
X
END_OF_heap.doc
if test 252 -ne `wc -c <heap.doc`; then
    echo shar: \"heap.doc\" unpacked with wrong size!
fi
# end of overwriting check
fi
if test -f idents.C -a "${1}" != "-c" ; then 
  echo shar: Will not over-write existing file \"idents.C\"
else
echo shar: Extracting \"idents.C\" \(2304 characters\)
sed "s/^X//" >idents.C <<'END_OF_idents.C'
X//  COPYRIGHT 1988 Megatest Corp.
X
X#include <ctype.h>
X#include "idents.H"
X
Xinline unsigned char
XLOWER(unsigned char ch)  { return (isupper(ch)?tolower(ch):(ch)); }
X
XIdents_UC::equiv(register  char* str1, register  char* str2)
X{
X  while( isalnum(*str1) || *str1 == '_' )
X    if(*str1++ != *str2++)
X      return 0;
X  return !(isalnum(*str2) || *str2 == '_');
X}
X
XIdents_uC::equiv(register  char* str1, register  char* str2)
X{
X  while( isalnum(*str1))
X    if(*str1++ != *str2++)
X      return 0;
X  return !isalnum(*str2);
X}
X
XIdents_uc::equiv(register  char* str1, register  char* str2)
X{
X  while( isalnum(*str1))
X    if( LOWER(*str1) != LOWER( *str2))
X      return 0;
X    else
X      { str1++; str2++; }
X  return !isalnum(*str2);
X}
X
XIdents_Uc::equiv(register  char* str1, register  char* str2)
X{
X  while(isalnum(*str1) || *str1 == '_' )
X    if( LOWER(*str1) != LOWER( *str2))
X      return 0;
X    else
X      { str1++; str2++; }
X  return !(isalnum(*str2) || *str2 == '_');
X}
X
XIdents_Tc::equiv(register  char* str1, register  char* str2)
X{
X  do
X    if( LOWER(*str1) != LOWER(*str2))
X      return 0;
X    else
X      { do str1++; while (*str1 == '-');
X	do str2++; while (*str2 == '-');
X      }
X  while(isalnum(*str1));
X  return !(isalnum(*str2));
X}
X
XIdents_UC::hash(register  char* str)
X{
X  register int hash = 0;
X  while( isalnum(*str) || *str == '_' )
X    hash += hash + (unsigned char)(*str++);
X  return hash;
X}
X
X/* underscores not valid. case matters. */
XIdents_uC::hash(register  char* str)
X{
X  register int hash = 0;
X  while( isalnum(*str))
X      { hash += hash + (unsigned char)(*str++); }
X  return hash;
X}
X
X
X/* underscores are not valid. case does not matter. (Standard Pascal) */
XIdents_uc::hash(register  char* str)
X{
X  register int hash = 0;
X  while( isalnum(*str))
X      { hash += hash + LOWER((unsigned char)(*str++));
X      }
X  return hash;
X}
X
X/* Telephone scheme */
XIdents_Tc::hash(register  char* str)
X{
X  register int hash = 0;
X  do
X      { hash += hash + LOWER((unsigned char)(*str));
X	do str++; while (*str == '-');
X      }
X  while( isalnum(*str));
X  return hash;
X}
X
X/* underscores are valid. case does not matter. */
XIdents_Uc::hash(register  char* str)
X{
X  register int hash = 0;
X  while(isalnum(*str) || *str == '_' )
X    { hash += hash + LOWER((unsigned char)(*str++));
X    }
X  return hash;
X}
END_OF_idents.C
if test 2304 -ne `wc -c <idents.C`; then
    echo shar: \"idents.C\" unpacked with wrong size!
fi
# end of overwriting check
fi
if test -f idents.H -a "${1}" != "-c" ; then 
  echo shar: Will not over-write existing file \"idents.H\"
else
echo shar: Extracting \"idents.H\" \(1652 characters\)
sed "s/^X//" >idents.H <<'END_OF_idents.H'
X#ifndef IDENTS_INCLUDED
X#define IDENTS_INCLUDED
X
X#include "Assoc.H"
X
X
X/* Underscores are valid, Case matters. (Mega-Pascal/C/C++) */
Xclass Idents_UC: public Assoc
X{ int hash( char*);
X  int equiv( char*,  char*);
X
X public:
X  Idents_UC(int struct_size, int initial_size = 4):(struct_size,initial_size)
X    {}
X};
X
X
X/* Underscores are not valid. Case does not matter.  (Standard Pascal) */
Xclass Idents_uc: public Assoc
X{ int hash( char*);
X  int equiv( char*,  char*);
X
X public:
X  Idents_uc(int struct_size, int initial_size = 4):(struct_size,initial_size)
X    {}
X};
X
X/* Underscores are valid, but "transparent glue" when not in the first
X** position. Case does not matter.  E.g. "foo_bar" is equivalent to
X** "FooBar". But "_foo" is not equivalent to "foo".
X**
X** This scheme passes the "telephone test":  You can read programs to
X** people over the telephone, without constantly saying, "all caps",
X** "underscore", etc..  I wish this were standard practice.
X*/
Xclass Idents_Tc: public Assoc  // T for telephone.
X{ int hash( char*);
X  int equiv( char*,  char*);
X
X public:
X  Idents_Tc(int struct_size, int initial_size = 4):(struct_size,initial_size)
X    {}
X};
X
X
X/* Underscores are not valid.  Case matters.  (Modula II) */
Xclass Idents_uC: public Assoc
X{ int hash( char*);
X  int equiv( char*,  char*);
X
X public:
X  Idents_uC(int struct_size, int initial_size = 4):(struct_size,initial_size)
X    {}
X};
X
X
X/* Underscores are valid. Case does not matter. */
Xclass Idents_Uc: public Assoc
X{ int hash( char*);
X  int equiv( char*,  char*);
X
X public:
X  Idents_Uc(int struct_size, int initial_size = 4):(struct_size,initial_size)
X    {}
X};
X
X#endif IDENTS_INCLUDED
END_OF_idents.H
if test 1652 -ne `wc -c <idents.H`; then
    echo shar: \"idents.H\" unpacked with wrong size!
fi
# end of overwriting check
fi
if test -f idents.doc -a "${1}" != "-c" ; then 
  echo shar: Will not over-write existing file \"idents.doc\"
else
echo shar: Extracting \"idents.doc\" \(668 characters\)
sed "s/^X//" >idents.doc <<'END_OF_idents.doc'
X//
X// These are classes derived from class Assoc, for indexing with identifiers
X// (as defined by various languages).  They are intended for use in compilers
X// and such.
X//
X// The keys need not be null-terminated, so you can just read an entire 
X// source file into a buffer, then use pointers into the buffer as keys.
X// The redefined virtual functions hash() and equiv() recongize the
X// ends of the keys by context.
X//
X// The virtual functions copy_key() and delete_key() are not redefined.
X// The default version does not actually make a copy, it just returns
X// the char* it is given.  So the keys must remain untouched so long
X// as the lookup table is active.
END_OF_idents.doc
if test 668 -ne `wc -c <idents.doc`; then
    echo shar: \"idents.doc\" unpacked with wrong size!
fi
# end of overwriting check
fi
if test -f queue.C -a "${1}" != "-c" ; then 
  echo shar: Will not over-write existing file \"queue.C\"
else
echo shar: Extracting \"queue.C\" \(1049 characters\)
sed "s/^X//" >queue.C <<'END_OF_queue.C'
X#include "queue.H"
X
X
X// All Queue_lists will come from this private heap.  MUCH quicker.
X
Xstatic Heap heap(sizeof(Queue_list));
X
X
Xvoid* Queue::push(void* cont)
X{
X  size++;
X  Queue_list* old_head = head;
X  head = (struct Queue_list*)heap.alloc(); 
X  head->cdr = old_head;
X  head->car = cont;
X  if(!tail) tail = head;
X  return cont;
X}
X
Xvoid* Queue::pop()
X{
X  if(size == 0) return 0;
X  size--;
X  void* cont = head->car;
X  Queue_list* condemned = head;
X  head = head->cdr;
X  heap.free(condemned);
X  if(size==0) tail = 0;
X  return cont;
X}
X
Xvoid* Queue::append(void *cont)
X{
X  size++;
X  Queue_list* old_tail = tail;
X  tail = (struct Queue_list*)heap.alloc();
X  tail->cdr=0;
X  tail->car=cont;
X  if(old_tail)
X    old_tail->cdr = tail;
X  if(head==0)head = tail;
X  return cont;
X}
X
X
Xvoid* Queue_iterator::operator() ()
X{
X  if(done()) { rest = lq->head;  return 0; }
X  void* retval = rest->car;
X  rest=rest->cdr;
X  return retval;
X}
X
XQueue::~Queue()
X{
X  while(head)
X    { Queue_list* condemned = head;
X      head = head->cdr;
X      heap.free(condemned);
X    }
X}
END_OF_queue.C
if test 1049 -ne `wc -c <queue.C`; then
    echo shar: \"queue.C\" unpacked with wrong size!
fi
# end of overwriting check
fi
if test -f queue.H -a "${1}" != "-c" ; then 
  echo shar: Will not over-write existing file \"queue.H\"
else
echo shar: Extracting \"queue.H\" \(779 characters\)
sed "s/^X//" >queue.H <<'END_OF_queue.H'
X#ifndef QUEUE_DEFD
X#define QUEUE_DEFD
X
X#include "heap.H"
X
X
Xclass Queue_list
X{
X  
Xfriend class Queue_iterator;
Xfriend class Queue;
X  
X  Queue_list* cdr;
X  void* car;
X  
X};
X
X
Xclass Queue
X{
X
X  Queue_list* tail;
X  Queue_list* head;
X  int size;
X  friend class Queue_iterator;
X
X public:
X  Queue() { size = 0; tail=0; head=0; }
X  ~Queue();
X
X  void* push(void*);
X  void* pop();
X  void* append(void*);
X
X  inline  
X  void* first()
X    { return(size==0)? 0 : head->car; }
X
X  inline  
X  void* last()
X    { return (size==0) ? 0:  tail->car; }
X  
X  inline int length() { return size; }
X};
X
Xclass Queue_iterator
X{
X  Queue* lq;
X  Queue_list* rest;
X
X public:
X  Queue_iterator(Queue& q) { lq=&q; rest = q.head; }
X  void* operator() ();
X  int done() { return rest==0; }
X				 
X};
X
X#endif QUEUE_DEFD
END_OF_queue.H
if test 779 -ne `wc -c <queue.H`; then
    echo shar: \"queue.H\" unpacked with wrong size!
fi
# end of overwriting check
fi
if test -f queue.doc -a "${1}" != "-c" ; then 
  echo shar: Will not over-write existing file \"queue.doc\"
else
echo shar: Extracting \"queue.doc\" \(267 characters\)
sed "s/^X//" >queue.doc <<'END_OF_queue.doc'
X
X// Class Queue impelements queues of pointers.  Can be used
X// as FIFO (use method "append") or LIFO (use method "push").
X//
X// Public methods are
X//
X
X  void* push(void*);
X  void* pop();
X  void* append(void*);
X  void* first();
X  void* last();
X  inline int length();
END_OF_queue.doc
if test 267 -ne `wc -c <queue.doc`; then
    echo shar: \"queue.doc\" unpacked with wrong size!
fi
# end of overwriting check
fi
echo shar: End of shell archive.
exit 0

agnew@trwrc.UUCP (R.A. Agnew) (03/01/88)

In article <8802231548.AA16212@ajpo.sei.cmu.edu> "EDWARD CRAGG" <ecragg@gmuvax.gmu.edu> writes:
>Does anyone have either ideas or experience related to a hashing
>function in Ada for converting a string of unspecified length
>the type conversion using attributes (to keep totaly machine 
>independent) is eating me alive.  Does anybody know what kind of
>
    If using character'pos(s(i)) is generating slow code, you need a new compiler.
This should only be overhead for the compiler but should generate very little code.
I generally define my own enumeration type, say Identifier_Char and type convert
with Identifier_Char'Pos(s(i)). I just benched a minimally perfect hash code for 
Ada reserved words using this technique on a Sun3/75 with the Telesoft Telegen2
compiler and got 133 microseconds worst case (case where tokens are always Ada
reserved words).

	Just to check the above, I put 10000 character'pos conversions in a loop
and timed them under Unix and got 0.0 seconds on all three times. In order to
get a better comparison, I wrote the following two subprograms:


procedure test_mod is

	i: positive;
	n: natural := 0;
	c: Character;

begin
	for i in 1..10000 loop
		for c in 'A' .. 'Z' loop
			n := n mod 16381;
		end loop;
	end loop;
end test_mod;

procedure test_pos is

	i: positive;
	n: natural := 0;
	c: Character;

begin
	for i in 1..10000 loop
		for c in 'A' .. 'Z' loop
			n := (n + Character'Pos(c)) mod 16381;
		end loop;
	end loop;
end test_pos;

The only difference between the two is the character to integer conversion and
so the difference in execution times between them is the cost of ten thousand
conversions. I did not use the optimizer. They both timed out between 1.9 and
2.0 seconds elapsed time on the Sun3/75 so I still claim that the compiler
generates no code due to the type conversion.  Perhaps your hash agolrithm is
inherently slow. I have developed a very efficient hashcode algorithm for arbitrary
length strings which is based on Fermat and Mersenne primes and uses only shifts
and integer addition.

    And no, that's not the reason why the Ada compilers are so slow. The language
that they are compiling is just a couple orders of magnitude more complex than 
most others. To appreciate this, visualize writing a compiler to optimize code
generated by nested generics instantiated on a data type like a complex tensor!