[comp.ai] Repository of AI source code

mnr@daisy.learning.cs.cmu.edu (Marc Ringuette) (12/08/90)

It would be extremely useful to have access to an archive of source code
for common AI problems.  Such an archive could contain simple planners, 
parsers, frame-based representations, and commonly used algorithms.  This
would encourage sharing and discourage reinventing the wheel.

A second emphasis of such an archive could be as a research resource.  It
could contain implementations of published work, experimental results and
challenge problems, and domains for testing (for instance) robot agents.
I would put a version of my Tileworld domain in such an archive, if I knew
of one.


Does such a repository exist?  If not, I'm sure the AAAI would be willing
to sponsor such an effort.  Do you think it would be worthwhile, and if so
do you have any ideas for additional material it should contain?

Please share your comments.

\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
 \\\ Marc Ringuette \\\ Carnegie Mellon University, Comp. Sci. Dept. \\\
  \\\ mnr@cs.cmu.edu \\\ Pittsburgh, PA 15213.  Phone 412-268-3728(w) \\\
   \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\

forbus@ils.nwu.edu (Kenneth Forbus) (12/09/90)

AAAI is indeed sponsoring such an effort.  

There are two important purposes for such a library.  First, as a field
we have done a terrible job at record-keeping.  Programs die, both from
bit-decay (i.e., the language they are written in evolving out from
under them) and because their authors simply do not keep copies around.
The existence of Common Lisp makes bit-decay easier to prevent.
Keeping copies around, however, should be made easier.  In other fields,
being unable to duplicate one's experiments easily is considered very
shoddy.  I'm told that in psychology, for example, some journals require that
authors keep the data on which their articles are based for at least X years,
where X varies with the journal.

The second purpose is communication and education.  Programs are our
main experimental apparatus, and sharing programs can help us make
better progress.  How many times have you read about some interesting
technique, and really wanted to try it on some example, but been stymied by
the effort it would take to re-implement the technique?  Having a set of
well-developed, portable, well-documented programs, with examples, would
help overcome such problems.

Clearly, these two goals conflict: Asking someone to produce a high-quality,
bullet-proof program before archiving it would simply mean that few would
archive their programs.  So, the idea is that the Program Library will
have two kinds of programs:

1. Archival systems, such as thesis programs, which are being deposited
purely for purposes of scientific replication and inspection.  

2. "Vetted" systems, which have passed the inspection of an Editorial
Board, to make sure they are adequately documented, run on the supplied
examples, are reasonably portable, etc.  Included with the system will be
reviews of it.

We are still investigating the right way to handle the legal issues,
so that AAAI doesn't get sued if someone misuses programs, or tries to deposit
their company's trade secrets.  The model we are looking at right now is the
Free Software Foundation's Copyleft.  Access will be via anonymous ftp and
other media; details are still being worked out.

While a lot has been settled, many things remain to be worked out.
Progress has been somewhat slowed by my recent move, but I hope to have
an initial version of the Library up and running by the middle of next year.
I'll be posting more details as soon as things are better worked out.  In
the meantime, I'd be happy to hear any comments, questions, or suggestions
anyone has.

	Ken Forbus
	The Institute for the Learning Sciences
	Northwestern University
	1890 Maple Avenue
	Evanston, IL, 60201, USA

P.S. Please be forewarned that my email response time varies wildly with 
my other duties, so patience may be required :-)

joshua@athertn.Atherton.COM (Flame Bait) (12/09/90)

mnr@daisy.learning.cs.cmu.edu (Marc Ringuette) writes:
>It would be extremely useful to have access to an archive of source code
>for common AI problems.  Such an archive could contain simple planners, 
>parsers, frame-based representations, and commonly used algorithms.  This
>would encourage sharing and discourage reinventing the wheel.

Instead of a central archive, have a central index.  That means that
one machine does not have to store all the source, etc.  It just needs
to store an index and "how to get" instructions from all the various
sites listed in the index.  

This has been successfully done on rec.games.frp, where someone keeps a 
list of electronic resources and how to use them.  It is posted every month, 
I think.

Joshua Levy
joshua@atherton.com
(408) 734-9822

theo@cs.fau.edu (Theo Heavey) (12/09/90)

> >It would be extremely useful to have access to an archive of source code
> >for common AI problems.  Such an archive could contain simple planners, 
> >parsers, frame-based representations, and commonly used algorithms.  This
> >would encourage sharing and discourage reinventing the wheel.
> 
> Instead of a central archive, have a central index.  That means that
> one machine does not have to store all the source, etc.  It just needs
> to store an index and "how to get" instructions from all the various
> sites listed in the index.  
> 
> This has been successfully done on rec.games.frp, where someone keeps a 
> list of electronic resources and how to use them.  It is posted every month, 

Wouldn't it be more efficient to keep this "index" at an anon ftp site?
This would reduce the number of repostings of the entire "index".
If the moderator (if there is one) on rec.games.frp just listed the
new sites for information via anon ftp as they are introduced, I think
it would be a lot more helpful.

Theo Heavey
Florida Atlantic University
Dept. of Computer Science
Boca Raton, FL

Internet: theo@cs.fau.edu

pat@cs.strath.ac.uk (Pat Prosser) (12/13/90)

In article <11331@pt.cs.cmu.edu> mnr@daisy.learning.cs.cmu.edu (Marc Ringuette) writes:
>It would be extremely useful to have access to an archive of source code
>for common AI problems.  Such an archive could contain simple planners, 
>parsers, frame-based representations, and commonly used algorithms.  This
>would encourage sharing and discourage reinventing the wheel.

Extremely

>
>Do you have any ideas for additional material it should contain?
>

Recently there was such a request posted to this group for 
public domain (PD) algorithms for the constraint satisfaction 
problem. I would like to see this, along with a set of standard 
csp problems. Also, for those among us that are interested in scheduling,
a number of scheduling problems could be made available.
These algorithms/problems would not only discourage reinvention, but
would also allow us to compare algorithms on given data sets. This would 
be progress!

POPX@vax.oxford.ac.uk (Jocelyn Paine) (12/14/90)

Newsgroups: comp.ai
Subject: Re: Repository of AI source code
Summary:
Expires:
References: <11331@pt.cs.cmu.edu>
Sender:
Reply-To: popx@vax.ox.ac.uk (Jocelyn Paine)
Followup-To:
Distribution:
Organization: Experimental Psychology, Oxford University, GB                 
Keywords:

In article <11331@pt.cs.cmu.edu> mnr@daisy.learning.cs.cmu.edu (Marc Ringuette) writes:
>It would be extremely useful to have access to an archive of source code
>for common AI problems.  Such an archive could contain simple planners,
>parsers, frame-based representations, and commonly used algorithms.  This
>would encourage sharing and discourage reinventing the wheel.
>

In 1987 I set up such a library for Prolog, for the very reasons you
describe. I'd be willing to extend it to cover AI software in general.
In fact, why don't I decide to do so now? So here's what I have to
offer:

``I teach AI to psychology undergraduates, using Pop-11 and Prolog (the
course used to be entirely Prolog, but I'm moving towards Pop-11).
During the course, I talk about topics like scripts, mathematical
creativity, planning, natural language analysis, and expert systems; I
exemplify them by mentioning well-known programs like GPS, Sam, and AM.
I hope before too long (May 1991) to integrate these into a
computer-simulated animal.

I would like my students to be able to run existing AI programs, from
GPS to Mycin and up, and to investigate their mechanism and limitations.
For students to incorporate into their own programs, I'd also like to
provide a library of tools such as chart parsers, inference engines,
search routines, and planners. Unfortunately, published descriptions of
the famous programs give much less information than is necessary to
re-implement them (it would be easier to re-implement cold fusion than
the average AI program). As for the tools: some are reproduced in
textbooks. But the published code has to be kept small to satisfy
publishers, and it is often not available in machine-readable form.

I therefore decided in 1987 to start up a library of Prolog code. I
shall now extend it to cover any AI language in which people want to
send programs.

Sending contributions.
----------------------

  Please E-mail them to user POPX at Janet address UK.AC.OX.VAX (the
  Vax-Cluster at Oxford University Computing Service). Only send text,
  not object or binary files (I will not accept programs in any form
  other than source text).

  If a file occupies more than a megabyte, please E-mail me about it
  first, and don't send the big file itself until I reply requesting it.
  This will avoid the problem we sometimes have where our mailer rejects
  big files because there isn't room for them.

  I accept all entries on the understanding that they will be
  distributed to anyone who asks for them. I intend that the contents of
  the library be treated in the same way as proofs in the maths
  literature and algorithms in computer science textbooks: publicly
  available ideas which everyone can experiment with, criticise, and
  improve.

  I'll try to put entries into the library within two weeks of arrival,
  and to test those entries for which I have a suitable language
  implementation.

Catalogue.
----------

  I keep a catalogue of entries. It contains for each entry: the name
  and geographical address of the entry's contributor (to prevent
  contributors from receiving unwanted E-mail, I don't include their
  E-mail addresses unless they ask me to); a description of the entry,
  usually with examples of use; and an approximate size in kilobytes (to
  help those whose mailers can't receive large files easily).

  For those entries which I can run, I also include my evaluations of
  ease of use, portability, standardness, and documentation.

Quality of entries.
-------------------

  Any contribution may be useful to someone out there, so I'll accept
  anything. I'm not just looking for elegant code and declarative
  respectability.

  However, it would be nice if entries were adequately documented
  (with literature references if appropriate, plus respectable
  documentation for both users and programmers).

Requesting entries.
-------------------

  I prefer to send by E-mail, and can do so into any network that's
  connected cost-free to the UK academic network Janet. I can also send
  files as DOS text on IBM-PC discs, or on VAX tapes. In that case, I
  will ask you to send either the media or payment for them in
  advance. We hope eventually to get a mail server running.

  You may request the catalogue, or a particular entry in it, or (for
  example) "all the expert system shells written in LISP you have".

  I'll try to answer all requests within two weeks. If you get no reply,
  please send a message by paper mail to my address. Give full details
  of where your E-mail was sent from, the time, etc.; this may help us
  trace lost messages.

Jocelyn Paine,
Experimental Psychology,
South Parks Road,
Oxford OX1 3UD.

POPX @ UK.AC.OX.VAX
''

joshua@athertn.Atherton.COM (Flame Bait) (12/14/90)

This is the current context:
>> >It would be extremely useful to have access to an archive of source code
>> >for common AI problems.  Such an archive could contain simple planners, 
>> >parsers, frame-based representations, and commonly used algorithms.  This
>> >would encourage sharing and discourage reinventing the wheel.
>> 
>> Instead of a central archive, have a central index.  That means that
>> one machine does not have to store all the source, etc.  It just needs
>> to store an index and "how to get" instructions from all the various
>> sites listed in the index.  
>> 
>> This has been successfully done on rec.games.frp, where someone keeps a 
>> list of electronic resources and how to use them.  It is posted every month,

To which theo@cs.fau.edu (Theo Heavey) replied:
>Wouldn't it be more efficient to keep this "index" at an anon ftp site?
>This would reduce the amount of repostings of the entire "index".
>If the moderator (if there is one) on rec.games.frp just listed the
>new sites for information via anon ftp as they are introduced I think 
>it would be a lot more helpful.

In theory you're right, but in practice, posting the list is better.
The index can be posted automatically every month, so no one has 
to worry about it.  Also, the people most likely to use it are new 
to the newsgroup, and may not even know of the index's existence.
Posting it regularly also serves to remind the "old timers" of its
existence and of the various sources of sources.  At a minimum you
should post the location of the index every month (or every two weeks).

If the index is not posted regularly, then it should be available via
an email archive-server (not just FTP).  Remember, most of the people who
get newsgroups cannot FTP things.  They have only UUCP connections to
the net, so they can use email but not FTP.

Joshua Levy (joshua@atherton.com)

reece@enuxha.eas.asu.edu (Glen A. Reece) (12/15/90)

In article <5282@baird.cs.strath.ac.uk>, pat@cs.strath.ac.uk (Pat Prosser) writes:
> In article <11331@pt.cs.cmu.edu> mnr@daisy.learning.cs.cmu.edu (Marc Ringuette) writes:
> >It would be extremely useful to have access to an archive of source code
> >for common AI problems.  Such an archive could contain simple planners, 
> >parsers, frame-based representations, and commonly used algorithms.  This
> >would encourage sharing and discourage reinventing the wheel.
> 
> Extremely
> 
> >
> >Do you have any ideas for additional material it should contain?
> >
> 
> Recently there was such a request posted to this group for 
> public domain (PD) algorithms for the constraint satisfaction 
> problem. I would like to see this, along with a set of standard 
> csp problems. Also, for those among us that are interested in scheduling,
> a number of scheduling problems could be made available.
> These algorithms/problems would not only discourage reinvention, but
> would also allow us to compare algorithms on given data sets. This would 
> be progress!

  This is an excellent idea, and it seems many people are interested
in doing such a thing (e.g., Ken Forbus).  I would like to second Pat's
call for including scheduling problems and algorithms/techniques.  I'm
currently working in the area of job shop scheduling for my thesis in
AI, and I'm running into the very problem of reinventing work that I
know for a fact was done in the past.  In fact, I'm working with
Karl Kempf from the Intel AI Lab in Santa Clara, California, and his
position is that the results of the work must be made available so
people don't keep bumping their heads against the same walls.

    - Glen


------------------------------------------------------------------------
= Glen A. Reece             =  Arizona State University                =
= Industrial Fellow         =  Artificial Intelligence Lab.            =
=                           =  Dept. of Computer Science & Engineering =
= (602) 965-2735            =  Tempe, Arizona  85287-5406              =
=                           =                                          =
= reece@enuxha.eas.asu.edu  =  What's another word for Thesaurus?      =
------------------------------------------------------------------------

dmocsny@minerva.che.uc.edu (Daniel Mocsny) (12/25/90)

In article <1933@enuxha.eas.asu.edu> reece@enuxha.eas.asu.edu (Glen A. Reece) writes:
>[...] I'm
>currently working in the area of job shop scheduling for my thesis in
>AI, and I'm running into the very problem of reinventing work that I
>know for a fact was done in the past.  In fact, I'm working with
>Karl Kempf from the Intel AI Lab in Santa Clara, California, and his
>position is that the results of the work must be made available so
>people don't keep bumping their heads against the same walls.

This is a problem endemic to most areas of science and engineering.
Science and engineering advance only when communities of investigators
share their findings with each other, and build on the results of 
previous work.

The traditional vehicle for sharing findings is, of course, the
printed literature. This vehicle was adequate in ancient times
when most scientists and engineers worked on comparatively simple
problems. When your results consisted of a few concise equations,
maybe a few plots and nomographs, or a manageable table of data,
your paper was a complete summary of your work. Any of your peers
with comparable skills could read your paper and immediately begin
building on your results.

Today, computer technology has enabled scientists and engineers to
embark on complex research that tends to defy concise verbal 
explanation. Most significant results today can't be *functionally*
expressed in words, since their real expression is now in computer 
code. That doesn't render words obsolete---we still need those
high-level descriptions to organize our approach to the low-level
details. However, merely reading a high-level description no longer
enables the reader to reproduce the original results, nor to build 
on them productively and efficiently.

The traditional literature is now faltering in its mission as a vehicle 
for sharing ideas. Technical readers once expected to read a paper, 
and find something immediately useful. Today, many technical papers 
read more like advertisements, their functional content emasculated, 
and the reader no more capable after finishing the paper than before.

This sad trend appears to be the result of two forces: 
(1) traditionalism, and (2) hucksterism. 

Historically, science developed as a hobby of the idle rich. Scientific
results also tended to be too simple to have much commercial potential.
Since hucksterism was not a necessary or practical choice for
scientists most of the time, they had the luxury of establishing a
rather lofty tradition of excluding it. However, scientists still had 
a scarce commodity to ration---peer recognition. Instead of competing 
economically, they competed on the basis of the quantity and quality 
of their contributions to the literature. But the exact nature
of those "contributions" became intimately entwined with the
particular technological basis for that literature: the printing press.
When science was simple, this was not a problem. Today, science is no
longer simple, but the definition of "contribution" still follows
from the technology of Gutenberg.

The massive expansion of science and technology after the Second World
War overwhelmed the breeding capacity of the idle rich. The only way
to sustain such expansion has been to recruit people from the middle
class, by turning science and technology into a set of professions.
For most scientists and engineers, peer recognition is more than 
something to feel good about while relaxing in the den. It is the key 
to sustaining and advancing careers.

At the same time, scientific results have become much more complex and
economically valuable. The scientist today, upon discovering something
useful, must consider its commercial potential before reporting it.
A useful result can now become the basis for a major new industry in
just a few years. This is a profound temptation for a salaried employee.

What is the answer? I don't know. Reward systems and productivity
in science and technology today need some serious investigation. 
Scientists and engineers need a vehicle for publishing their *complete*
results, not just advertisements about their results. They also need
incentives for doing so. We need some sort of "productivity index" to
attach to scientific publications (of all types). Does the publication
increase the capability of the reader in any measurable way?

--
Dan Mocsny				Snail:
Internet: dmocsny@minerva.che.uc.edu	Dept. of Chemical Engng. M.L. 171
	  dmocsny@uceng.uc.edu		University of Cincinnati
513/751-6824 (home) 513/556-2007 (lab)	Cincinnati, Ohio 45221-0171