emv@msen.com (Ed Vielmetti) (06/17/91)
do you ever get the feeling that all this NREN stuff is going to build us a network that's extraordinarily fast but impossible to use? mitch kapor, in his "building the open road: policies for the national public network" [*], compares the current state of the art in cataloging and describing the services available on the net today to "a giant library with no card catalog". who is going to provide the moral equivalent of the rand-mcnally road atlas, the texaco road maps, the aaa trip-tiks? what we have now is much more like the old Lincoln Highway, with painted markings on trees and an oral tradition that helps you get through the rough spots on the road.

efforts by existing commercial internet providers have been mediocre at best. none appear to be much interested in mapping out the network beyond the immediate needs of their customers. if you consider that one of the roles of a commercial internet provider is to provide access to software archives, and then you take a look at the state of the software archives on uunet.uu.net and uu.psi.com, you see enormous duplication, strange and hard to understand organizations of files, no aids in finding materials beyond a cryptic "ls-lR" file, and dozens if not hundreds of files which are stale and out of date compared with the One True Version maintained by the author of the documents. [&] Visiting these places is like reading magazines at a dentist's office: you know that what you're reading was new once, a few weeks or months ago.

efforts by nsf-funded network information centers have been similarly muddled and half-useful. if you read the Merit proposal to NSFnet closely, you saw plans for GRASP (Grand interface to SPIRES), which was going to be the ideal delivery mechanism for information about the NSFnet to users of the net. Promises, promises. What you do have from nis.nsf.net is a stale collection of out-of-date maps [%], a bunch of aggregate traffic measurement numbers [#], and some newsletters [=]. the work at nnsc.nsf.net isn't all that much better. part of the problem is reliance on volunteered information -- the general approach to network information gathering appears to be not much more than send out a survey, wait, and tabulate the responses. very little of this work is what you would call "pro-active"; that's why chapter 3 (archives) lists just 26 of the over 1000 anonymous FTP sites and mail-based archive servers available on the net. [?] (Think of it as a road atlas that shows fewer than 1 road in 40 and you'll get the right idea.)

that's not to say that there aren't skilled people out there; it's just that they're generally not supplied with resources adequate to the task they're facing. you aren't seeing organizations like ANS, which seems to be flush with cash and hiring skilled people left and right, hiring anyone with the archivist skills of a (say) Keith Peterson. you aren't seeing innovative applications like "archie", a union list catalog of FTP sites around the globe, funded as part and parcel of NSF infrastructure; it's being done in Canada, with no guarantee of continued existence if it starts to swamp their already soggy and slow USA-Canada link or if they need the machine back. [+] you don't see nic.ddn.mil hosting the arpanet "list of lists" anymore; they didn't like the contents, so it's gone.
[@] the internet library guides are run as best they can be by individuals, and they're in the form of long ascii lists of instructions on how to connect rather than an interactive front-end that would make the connections for you -- not that the technology isn't there, just that no one has the mission and the resources to provide it. [!]

so what do we end up with? a very fast net (in spots) with a "savage user interface" [*]. multi-megabit file transfers, so you can get anything you want in seconds, but no way to find it. regional networks spending large amounts of federal dollars on bandwidth but very little on ways to use it effectively. a vast, largely uncharted network, with isolated pockets of understanding here and there, and no one who has yet appeared with the proper incentives and resources to map it out.

--
Edward Vielmetti, MSEN Inc. moderator, comp.archives emv@msen.com

references for further study:

[*] eff.org:/npn/. discussion in comp.org.eff.talk.
[@] ftp.nisc.sri.com:/netinfo/interest-groups. see also dartcms1.dartmouth.edu:siglists and vm1.nodak.edu:listarch:new-list.* discussion in bit.listserv.new-list.
[!] vaxb.acs.unt.edu:[.library], also nic.cerf.net:/cerfnet/cerfnet_info/internet-catalogs* discussion in comp.misc and bit.listserv.pacs-l.
[+] see discussion in comp.archives.admin. archie information can be found in quiche.cs.mcgill.ca:/archie/doc/
[%] in nis.nsf.net:maps. note that several are as old as 1988. no readily apparent newsgroup for discussion.
[#] in nis.nsf.net:stats. no readily apparent newsgroup for discussion.
[=] in nis.nsf.net:linklttr. no convenient way to search through them short of downloading the whole set.
[&] for instance, see uunet.uu.net:/sitelists/ (empty), uunet.uu.net:/citi-macip/ (CITI has withdrawn this code), uu.psi.com:/pub/named.ca (out-of-date named cache file still shows nic.ddn.mil as a root nameserver). discussion in comp.archives.admin.
[?] nnsc.nsf.net:/resource-guide/chapter.3/. note that many entries have not been updated since 1989. discussion in comp.archives.admin.
srctran@world.std.com (Gregory Aharonian) (06/18/91)
I agree with Ed completely. For the past six years, I have been building a database of information on the location of computer software available in source code form from around the world. Currently I have information on over 15,000 programs. What I do is very time consuming, and intellectually demanding in that I have to know a little bit about everything to help separate the good stuff from the bad. To date, I have received ZERO attention and funding from the US government, even though most of the software I track is government funded.

The Don't Transfer Research Projects Agency epitomizes the incompetence in the government, spending hundreds of millions of dollars on software development, and ZERO on any effective transfer. NASA likes to think it's competent with its 1200-program COSMIC collection, even though I have records on over 4000 programs available at NASA sites. Despite receiving some attention in the press, and many letters on my part, no government agency has shown any interest in doing anything with existing source code resources (and universities don't do any better).

The Congressional bills to promote critical technologies, information highways, and CIA involvement in technology espionage are all a waste of tax dollars. There is so much technology already available that can be transferred with low technology solutions. My observation is that there is gross misunderstanding of the economics of information and information transfer, leading to proposals that, if they could be evaluated, would have negative cost-benefit. Unfortunately, I no longer believe (or care) that any solutions will come out of the government. There has been so little criticism of government information technology activities inside the DoD, DoE, NASA and NSF that they would not recognize a good idea if it hit them. The only way these problems will be solved will be through people willing to understand the economics of information and software, and offer solutions through the market.

(By the way, I forgot to flame my favorite waste project, the Software Thats Alreadybeen Rejected Somewherelse project, which seeks to improve software productivity ten fold without spending a cent proving that they achieved their goals.)

I'll probably get flamed for this posting (just in case, other words that come to mind include incompetent, self-serving, tax-dollar waste, impotent, fraudulent, repetitive, duplicative (I have seen 200 federally funded FFT routines), and most other pejoratives). All I know is that there are over 15,000 programs available publicly in source code form in this great computer/software country of ours, and I'm the only one who knows where.

Gregory Aharonian
Source Translation & Optimization
emv@msen.com (Ed Vielmetti) (06/18/91)
In article <9106171612.AA01441@mazatzal.merit.edu> clw@MERIT.EDU writes:
The Directory Group at MERIT, Chris Weider and Mark Knopper, are starting
to address some of these issues. I do think that Directory Services are
a good medium term answer, and we're starting to put everything which
fits the X.500 philosophy into X.500....
All due respects, Chris, but X.500 doesn't address many of these
issues at all, and the ones it does sort of fit into can be more
easily addressed with other tools.
X.500 Directory services assume a neat, structured, hierarchical name
space and a clear line of authority running from the root all the way
to the leaves. Indeed, most X.500 services in place on the internet
today that work well enough to be useful run off of centrally
organized, centrally verified, and bureaucratically administered
information -- the campus phone book. For what this is, it's great --
i'm happy that I can finger user@host.edu at any number of sites and
get something back. But that is of little relevance to the archives
problem.
X.500 services are hard to run -- the technology is big, bulky,
ossified. So the people who are most interested in running it are the
"computer center" folks. If you look for the innovative, interesting,
and desirable applications that you'd want to find on the net, you'll
see that many of them are being done out in the field in departmental
computing environments or increasingly in small focused private
commercial or non-commercial efforts. There's not a terribly good
reason for these two groups to communicate, and so most X.500 projects
have much more structure than substance.
X.500 services are directory oriented. The data in them is relatively
small, of known value, and highly structured. Information about
archive sources is just about completely counter to these basic
principles. The amount of information about any particular service
which you'd like to have on hand can be quite considerable; perhaps at
minimum access instructions, but more likely some text describing the
service, who its intended audience is, sample output, etc. In
addition it would be valuable to keep information on user reactions to
the system close to the officially provided directory notice; these
reviews (a la the michelin guide) are often more valuable than the
official propaganda put out by the designer. To search this mass of
information, you'll want something much more expressive than the
relatively pitiful X.500 directory access tools -- full text
searching, at the very minimum, with a way to sensibly deal both with
structured data and with more fuzzy matches on "similar" items.
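
A rough sketch of what full text searching with fuzzy matching on "similar" items might look like in miniature; this is a toy in Python, not WAIS, Z39.50, or any real system, and the entries and descriptions below are invented.

    # Toy full-text index over free-form archive descriptions, with a
    # crude nearest-word fallback for fuzzy matching.  Purely
    # illustrative; the entries below are made up.
    from collections import defaultdict
    from difflib import get_close_matches

    DESCRIPTIONS = {
        "wais":     "full text search over indexed databases of documents",
        "archie":   "union list catalog of anonymous FTP sites worldwide",
        "prospero": "virtual file system views over remote archives",
    }

    index = defaultdict(set)                  # word -> entries using it
    for name, text in DESCRIPTIONS.items():
        for word in text.lower().split():
            index[word].add(name)

    def lookup(word):
        """Exact hit if possible, otherwise the closest indexed word."""
        if word in index:
            return index[word]
        near = get_close_matches(word, list(index), n=1)
        return index[near[0]] if near else set()

    def search(query):
        """Entries whose description matches every word of the query."""
        hits = None
        for word in query.lower().split():
            hits = lookup(word) if hits is None else hits & lookup(word)
        return sorted(hits or [])

    print(search("catalogue of ftp sites"))   # near-miss on "catalogue" -> ['archie']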
X.500 is a holy grail; a lot of money is being thrown at it these
days in the hope of making it useful. Good luck, I
wish you well. But please, don't try to cram all the world's data
into it, because it doesn't all fit. It's a shame that equivalent
amounts of effort aren't being spent on developing other protocols
more suited to the task. I'm thinking in particular of the Z39.50
implementation in WAIS [*] which holds a lot of potential for
providing a reasonable structure for searching and sifting through
databases which have rich textual information. Perhaps it's just as
well that federal subsidy hasn't intruded here and clouded people's
judgments on the applicability of a particular technology to a
certain task.
--
Edward Vielmetti, MSEN Inc. moderator, comp.archives emv@msen.com
"often those with the power to appoint will be on one side of a
controversial issue and find it convenient to use their opponent's
momentary stridency as a pretext to squelch them"
[*] think.com:/public/wais/,
also quake.think.com:/pub/wais/
worley@compass.com (Dale Worley) (06/18/91)
In article <EMV.91Jun18000345@bronte.aa.ox.com> emv@msen.com (Ed Vielmetti) writes:
X.500 services are directory oriented. The data in them is relatively
small, of known value, and highly structured. Information about
archive sources is just about completely counter to these basic
principles.
X.500 is a holy grail; a lot of money is being thrown at it these
days in the hope of making it useful.
What can be done to produce good catalogs? As Ed notes, archive
information is likely to be bulky, chaotic, and of unknown (probably
small) value. Given how much money is needed to get a directory
system for information without these problems running, it will
probably take much more to get a good system for archive information
working.
Perhaps the analogy to road maps can be a guide -- roads have been
around for thousands of years, but road maps have only been available
for fifty(?) years. What happened? One thing is that it is now
possible to make a map and then sell thousands (hundreds of
thousands?) of copies, thus making each copy reasonably inexpensive.
Until the development of the automobile this was not possible; there
were too few potential users. (Nor was it even necessary, since a horse
cart is slow enough that stopping to ask directions in each town isn't
a burden.)
One possibility is to make a service that charges you for use. A good
archive information system should see enough use that each query can
be quite inexpensive. And the authorization and billing should be
easy enough to automate!
Dale Worley Compass, Inc. worley@compass.com
--
Perhaps this excerpt from the pamphlet, "So You've Decided to
Steal Cable" (as featured in a recent episode of _The_Simpsons_)
will help:
Myth: Cable piracy is wrong.
Fact: Cable companies are big faceless corporations,
which makes it okay.
eachus@largo.mitre.org (Robert I. Eachus) (06/18/91)
I was at a Hypertext meeting a year or so ago, and after listening to all the talks, I commented to a friend: "You know, we had librarians for thousands of years before the invention of movable type made them necessary. In Hypertext, everyone is trying to do it the other way round."

What I see on the net makes the Hypertext people sound like forward thinkers. The net is even more chaotic than the (often static) environments that have been used in Hypertext prototypes. In the Hypertext arena the problem is that they are developing the tools without considering how the necessary databases will be created. On the net, we have much more data than anyone can comprehend, but no support for even developing the tools.

What the world and the net need is a new type of organization: a software library. Given funding, such an institution could provide disk space (cheap), net access (not so cheap, but arguably billable to actual users), software developers to provide the necessary tools (no big deal), and actual software LIBRARIANS to develop a cataloging system and actually organize all this stuff. That last item will be by far the biggest expense. There is as yet no Dewey Decimal system for software, but we desperately need it.

Incidentally, all the fancy software in the world with multiple keys, multiple views, etc. won't address that need. What makes the Dewey system (or Library of Congress) useful is that once I have it in my head, I know where books on, say, Cryptography are to be found, and I can find related books that I didn't know about. A keyword probe will miss closely related--but different--subjects.

--
Robert I. Eachus

with STANDARD_DISCLAIMER; use STANDARD_DISCLAIMER; function MESSAGE (TEXT: in CLEVER_IDEAS) return BETTER_IDEAS is...
scs@iti.org (Steve Simmons) (06/19/91)
worley@compass.com (Dale Worley) writes:

>What can be done to produce good catalogs? As Ed notes, archive
>information is likely to be bulky, chaotic, and of unknown (probably
>small) value. Given how much money is needed to get a directory
>system for information without these problems running, it will
>probably take much more to get a good system for archive information
>working.

Arguing with an analogy is silly, but I'm gonna do it . . . :-)

In the middle ages, maps were often critical trade secrets. A chart of waters was worth significantly more than its weight in gold, as it revealed both what places existed and how to get there and back safely. The Portuguese managed to keep the "safe route" to Japan secret for an incredibly long time.

Trivially yours,

Steve
--
"If we don't provide support to our users someone is bound to confuse us with Microsoft." -- Charles "Chip" Yamasaki
ajw@manta.mel.dit.CSIRO.AU (Andrew Waugh) (06/19/91)
In article <EMV.91Jun18000345@bronte.aa.ox.com> emv@msen.com (Ed Vielmetti) writes:

> X.500 Directory services assume a neat, structured, hierarchical name
> space and a clear line of authority running from the root all the way
> to the leaves.

While this is certainly true, it is important to understand why this is so. X.500 is intended to support a distributed directory service. It is assumed that there will be thousands, if not millions, of repositories of data (DSAs). These will co-operate to provide the illusion of a single large directory.

The problem with this model is how you return a negative answer in a timely fashion. Say you ask your local DSA for a piece of information. If the local DSA holds the information you want, it will return it. But what if it doesn't hold the information? Well, the DSA could ask another DSA, but what if this second DSA also doesn't hold the information? How many DSAs do you contact before you return the answer "No, that piece of information does not exist"? All of them?

X.500 solves this problem by structuring the stored data hierarchically and using this hierarchy as the basis for distributing the data amongst DSAs. Using a straightforward navigation algorithm, a query for information can always progress towards the DSA which should hold the information. If the information does not exist, that DSA can authoritatively answer "No such information exists." You don't have to visit all - or even a large proportion - of the DSAs in the world.

It is important to realise that this is a generic problem with highly distributed databases. The X.500 designers chose to solve it by structuring the data. This means that X.500 is suitable for storing data which can be represented hierarchically and is less suitable for storing data which cannot. Exactly what data will be suitable for storing in X.500 is currently an open question - there is simply not sufficient experience. The proposed archive database which started this thread will have exactly the same problem. The solution chosen, if different from the one X.500 uses, will have problems of its own. There is no such thing as a perfect networking solution!

>X.500 services are hard to run -- the technology is big, bulky,
>ossified. So the people who are most interested in running it are the
>"computer center" folks. If you look for the innovative, interesting,
>and desirable applications that you'd want to find on the net, you'll
>see that many of them are being done out in the field in departmental
>computing environments or increasingly in small focused private
>commercial or non-commercial efforts. There's not a terribly good
>reason for these two groups to communicate, and so most X.500 projects
>have much more structure than substance.
>
>X.500 services are directory oriented. The data in them is relatively
>small, of known value, and highly structured. Information about
>archive sources is just about completely counter to these basic
>principles. The amount of information about any particular service
>which you'd like to have on hand can be quite considerable; perhaps at
>minimum access instructions, but more likely some text describing the
>service, who its intended audience is, sample output, etc. In
>addition it would be valuable to keep information on user reactions to
>the system close to the officially provided directory notice; these
>reviews (a la the michelin guide) are often more valuable than the
>official propaganda put out by the designer. To search this mass of
>information, you'll want something much more expressive than the
>relatively pitiful X.500 directory access tools -- full text
>searching, at the very minimum, with a way to sensibly deal both with
>structured data and with more fuzzy matches on "similar" items.
>
>X.500 is a holy grail; a lot of money is being thrown at it these
>days in the hope of making it useful. Good luck, I wish you well.
>But please, don't try to cram all the world's data into it, because
>it doesn't all fit. It's a shame that equivalent amounts of effort
>aren't being spent on developing other protocols more suited to the
>task. I'm thinking in particular of the Z39.50 implementation in WAIS
>[*] which holds a lot of potential for providing a reasonable
>structure for searching and sifting through databases which have rich
>textual information. Perhaps it's just as well that federal subsidy
>hasn't intruded here and clouded people's judgments on the
>applicability of a particular technology to a certain task.

As for the rest of the posting, all I can say is that it must be great to know so much about the costs and benefits of using X.500. From my perspective, it is obvious that X.500 will not solve all the world's problems (nothing ever does :-), but it is way too early to be so dogmatic. We can be dogmatic once we have had 1) the necessary experience of implementing X.500, running X.500 databases and storing different types of data in such a database; and 2) experience with alternative highly distributed databases. (X.500 might prove to be extremely poor for storing certain types of data - but the alternatives might be even worse.)

andrew waugh
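
A minimal sketch of the navigation idea Waugh describes: each server owns one node of the name hierarchy, a lookup walks down the tree, and the server responsible for the relevant part of the tree can return a negative answer authoritatively without consulting anyone else. This is a toy in Python, not X.500; the names and entries are invented.

    # Toy model of hierarchical navigation with authoritative negative
    # answers.  Each "DSA" holds one node of the tree; no global search
    # is ever needed to say "no such entry".
    class DSA:
        def __init__(self, name):
            self.name = name
            self.entries = {}      # leaf entries held by this DSA
            self.children = {}     # subordinate DSAs, by name component

        def lookup(self, path):
            head, *rest = path
            if rest:
                child = self.children.get(head)
                if child is not None:
                    return child.lookup(rest)      # navigate downwards
                return "%s: no such subtree '%s' (authoritative)" % (self.name, head)
            return self.entries.get(
                head, "%s: no such entry '%s' (authoritative)" % (self.name, head))

    root = DSA("root")
    root.children["c=AU"] = au = DSA("c=AU")
    au.children["o=Some University"] = org = DSA("o=Some University")
    org.entries["cn=Archive Contact"] = {"mail": "contact@example.edu.au"}

    print(root.lookup(["c=AU", "o=Some University", "cn=Archive Contact"]))
    print(root.lookup(["c=AU", "o=Some University", "cn=Nobody"]))  # negative, but authoritative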
rhys@cs.uq.oz.au (Rhys Weatherley) (06/19/91)
In <EACHUS.91Jun18164709@largo.mitre.org> eachus@largo.mitre.org (Robert I. Eachus) writes:

> Incidentally, all the fancy software in the world with multiple
>keys, multiple views, etc. won't address that need. What makes the
>Dewey system (or Library of Congress) useful is that once I have it in
>my head, I know where books on say Cryptography are to be found, and I
>can find related books that I didn't know about. A keyword probe will
>miss closely related--but different--subjects.

I agree that something like this is needed, but how is it going to be organised? There's a big difference between books and computer programs. If I go into a library and walk up to the shelf marked "Mathematical Logic" (marked in Dewey Decimal or whatever), then the books I find there will be about the various aspects of "Mathematical Logic" and just that. However, if I walk into a computer store and walk up to the shelf marked "Spreadsheets" I'll also find programs that double up as wordprocessors, databases, desktop publishers, comms programs, ... in addition to being spreadsheets.

So if the "Compy Decimal" system (or whatever) was used, we'd find such programs under lots of different numbers, and sooner or later some librarian is going to forget to enter a program under all the necessary headings, or a programmer is not going to tell the librarian all the headings, and we are back to square one. Similarly, using identifiers for programs like "spreadsheet,database,wordprocessor,unix,xwindows:123.8" isn't going to be much better, and we'll get back to the keyword search problem eventually.

Some central control would be needed (as with any library system) and that would be a good idea (and I agree with this), but with "creeping featurism" being the favourite pastime of upgrades these days, it's only going to get worse. When a book is published, further editions don't stray much from the original topic - but program users are always screaming for more features over and above what a program was initially intended for, meaning extra identifiers for every new version of a program. Distributed database technology is not the answer, just the means. Better information is the answer.

Maybe it's time we retrained programmers to write programs to perform a single task, not control the world! :-)

We'll come up with something eventually, but I don't think it will fit into the library/archive framework we are used to: there's so much more information in computing than humans are used to. It will have to be something new. Any ideas?

Cheers,

Rhys.

+=====================+==================================+
|| Rhys Weatherley | The University of Queensland, ||
|| rhys@cs.uq.oz.au | Australia. G'day!! ||
|| "I'm a FAQ nut - what's your problem?" ||
+=====================+==================================+
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (06/19/91)
Instead of complaining about how inappropriate X.500 is for all but the simplest problems, why don't we identify the problems we're really trying to solve? I think that the Internet People Problem---make a *usable* database of people on the Internet---embodies most, if not all, of the technical and political difficulties that an archive service has to overcome.

You want to find that vocrpt package? I want to find Don Shearson Jr. You want to find a SLIP-over-the-phone package? I want to find a SLIP-over-the-phone expert. You want to know where you got this collection of poems? I want to know where I got this phone number. You want to see what everyone thinks of vocrpt now that you've found it? DISCO wants to get references for Shearson from anyone who's willing to admit he's worked with him.

One advantage of starting from the Internet People Problem is that it has a lot more prior art than the archive problem, from telephone directories on up. Once we've solved it we can see how well the same mechanisms handle data retrieval.

---Dan
eachus@largo.mitre.org (Robert I. Eachus) (06/19/91)
In article <2013@uqcspe.cs.uq.oz.au> rhys@cs.uq.oz.au (Rhys Weatherley) writes:

> I agree that something like this is needed, but how is it going to be
> organised? There's a big difference between books and computer programs...

We're violently agreeing. Anyone can do the repository bit; it is organizing a software collection in a meaningful way that will be the tough job. Ed Vielmetti is trying to do one part of the job, but I am saying that the real need is for the other $150 (or whatever) worth of work on that Library of Congress card.

> Maybe it's time we retrained programmers to write programs to perform
> a single task, not control the world! :-)

We used to joke that every program in the MIT AI lab grew until it could be used to read mail. Now we know they don't stop there...

> We'll come up with something eventually, but I don't think it will fit
> into the library/archive framework we are used to: there's so much
> more information in computing than humans are used to. It will have
> to be something new. Any ideas?

Some ideas, but this is in the class of very hard problems. Even if you have a database program which is designed only to be a fancy phone dialer, it may implement an algorithm which is exactly what I am looking for in my radar application. Or I may not want the program, but I am looking for the Minneapolis telephone directory which is provided with it, and I'll also need the program so I can use the directory...

It seems to me that we will need an indexing scheme that looks hierarchical to the user, but which is actually implemented with fuzzy logic. When I go looking for a database program it would originally exclude the phone dialer programs, but when I get down to database programs with data on addresses in Minnesota, the example I used above is back in.

--
Robert I. Eachus

with STANDARD_DISCLAIMER; use STANDARD_DISCLAIMER; function MESSAGE (TEXT: in CLEVER_IDEAS) return BETTER_IDEAS is...
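
Eachus's "looks hierarchical to the user, but is actually implemented with fuzzy logic" idea can be made concrete with a small sketch: items keep a primary category plus secondary facets, a strict category browse excludes mismatches, and refinement terms pull an item back in when they match its secondary facets. The categories and program names below are invented; this is an illustration, not a proposal for a real system.

    # Sketch of hierarchical browsing with fuzzy re-inclusion.
    PROGRAMS = [
        {"name": "pgdial", "primary": "phone dialer",
         "facets": {"addresses", "minneapolis", "directory"}},
        {"name": "dbtool", "primary": "database",
         "facets": {"records", "reports"}},
    ]

    def browse(primary, refinements=()):
        """Strict category match first; refinement terms can pull in
        items whose secondary facets match even though their primary
        category would normally exclude them."""
        terms = {t.lower() for t in refinements}
        hits = []
        for p in PROGRAMS:
            if p["primary"] == primary:
                hits.append(p["name"])
            elif terms & p["facets"]:
                hits.append(p["name"] + " (via secondary facets)")
        return hits

    print(browse("database"))                                # dialer excluded
    print(browse("database", ["addresses", "minneapolis"]))  # dialer comes back in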
lars@spectrum.CMC.COM (Lars Poulsen) (06/20/91)
In article <EMV.91Jun18000345@bronte.aa.ox.com> emv@msen.com (Ed Vielmetti) writes:

> X.500 services are directory oriented. The data in them is relatively
> small, of known value, and highly structured. Information about
> archive sources is just about completely counter to these basic
> principles.

In article <WORLEY.91Jun18094957@sn1987a.compass.com> worley@compass.com (Dale Worley) writes:

>What can be done to produce good catalogs? As Ed notes, archive
>information is likely to be bulky, chaotic, and of unknown (probably
>small) value. Given how much money is needed to get a directory
>system for information without these problems running, it will
>probably take much more to get a good system for archive information
>working.

Actually, we know quite well what it takes to raise the signal-to-noise ratio: administration and moderation.

One possible option would be for the Internet Society to sponsor an archive registration facility. Maybe each of the IETF task forces can identify valuable programs that need to be archived, with mirrored servers on each continent, available for NFS mounting as well as anonymous FTP. It should be worth $50 for each site to have access to good easily accessible archives instead of having to keep disk space for everything in our own space. (I know; not every "hobby site" can afford $50, but there are many commercial sites, including my own, that would be happy to help feed such a beast; I'm sure many academic sites would be able to help, too.)
--
/ Lars Poulsen, SMTS Software Engineer CMC Rockwell lars@CMC.COM
worley@compass.com (Dale Worley) (06/20/91)
In article <1991Jun20.070516.683@spectrum.CMC.COM> lars@spectrum.CMC.COM (Lars Poulsen) writes:
One possible option would be for the Internet Society to sponsor an
archive registration facility. Maybe each of the IETF task forces can
identify valuable programs that need to be archived, with mirrored
servers on each continent, available for NFS mounting as well as
anonymous FTP. It should be worth $50 for each site to have access to
good easily accessible archives instead of having to keep disk space
for everything in our own space.
Let me do some calculation. (Of course, some of these numbers may be
off -- I'd like to see how other people think it can be organized.)
First off, it's going to take at least 6 people to run the
organization. For the first few years, it will take at least 3
programmers and 3 administrators. Remember, there are 15,000 (to
quote somebody) programs out there, and each one needs to be
catalogued, at least minimally. Also, since it is a for-pay service,
somebody has to handle payment and bookkeeping. That will cost
something like $600,000 per year.
And then there's advertising costs -- and it's going to be hard to
advertise it over the Internet, because the Internet doesn't like
money-grubbing.
And there's the cost of maintaining the system's computer, with its
connection to the Internet.
And there has to be a way to limit access to the archives to those
people who have paid for the service -- otherwise there's no incentive
for people to subscribe.
OK, so maybe the total budget is $700,000 per year.
Now, how many sites can we get to sign up? If we're extremely lucky,
and spend a lot on advertising, maybe 1000 will sign up the first
year. That puts subscriptions at $700/year. If you start with 100
sites, subscriptions have to be $7000/year.
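
Worley's arithmetic, spelled out (all the dollar figures are his rough guesses, not real quotes):

    # Back-of-the-envelope restatement of the calculation above.
    staff_cost  = 600000     # ~6 people (3 programmers, 3 administrators)
    other_costs = 100000     # advertising, machine, connectivity, billing
    budget      = staff_cost + other_costs    # ~$700,000 per year

    for subscribers in (1000, 100):
        print(subscribers, "sites ->", budget // subscribers, "per site per year")
    # 1000 sites -> 700 per site per year; 100 sites -> 7000 per site per year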
Dale Worley Compass, Inc. worley@compass.com
--
I'm a politician -- that means I'm a liar and a cheat. And when I'm not
kissing babies, I'm stealing their lollypops. -- "The Hunt for Red October"
caa@Unify.Com (Chris A. Anderson) (06/21/91)
In article <2013@uqcspe.cs.uq.oz.au> rhys@cs.uq.oz.au writes:

>In <EACHUS.91Jun18164709@largo.mitre.org> eachus@largo.mitre.org (Robert I. Eachus) writes:
>However, if I walk into a computer store and walk up to the shelf marked
>"Spreadsheets" I'll also find programs that double up as wordprocessors,
>databases, desktop publishers, comms programs, ... in addition to being
>a spreadsheet.

One way around this is to have "Main Category" and "Sub-Category" headings for the software. That way, the primary function of the software would be listed, and any other features could be placed under sub-categories. And Emacs could still be the kitchen sink. :-)

>So if the "Compy Decimal" system (or whatever) was used, we'd find such
>programs under lots of different numbers and sooner or later some librarian
>is going to forget to enter a program under all necessary headings, or
>a programmer is not going to tell the librarian all the headings and
>we are back to square one. Similarly, using identifiers for programs like
>"spreadsheet,database,wordprocessor,unix,xwindows:123.8" isn't going
>to be much better, and we'll get back to the keyword search problem
>eventually.

I think that a perfect system is unrealistic. The idea is to make it better than it is. If a program is not entered under all of its relevant headings, then so be it. So long as the main purpose of the program is found, I'd be a lot happier.

>Some central control would be needed (as with any library system) and that
>would be a good idea (and I agree with this), but with "creeping featurism"
>being the favourite pastime of upgrades these days, it's only going to
>get worse. When a book is published, further editions don't stray much
>from the original topic - but program users are always screaming for more
>features over and above what a program was initially intended for, meaning
>extra identifiers for every new version of a program. Distributed database
>technology is not the answer, just the means. Better information is the
>answer.

Why not have the authors of the program provide the categories that a system or program should be entered under? They know the software best, and probably wouldn't leave out too many headings. The problem with the central librarian concept is that you require those people to be authorities on a vast amount of information. Not only what has gone before, but every new technology that comes out. That's a terrific burden.

>Maybe it's time we retrained programmers to write programs to perform
>a single task, not control the world! :-)

Reality, my friend, reality! :-) And let's train managers and marketroids to not ask for just "one more thing" while we're at it.

Chris
--
+------------------------------------------------------------+
| Chris Anderson, Unify Corp. caa@unify.com |
+------------------------------------------------------------+
| Do not meddle in the affairs of wizards ... for you |
peter@ficc.ferranti.com (Peter da Silva) (06/22/91)
In article <2013@uqcspe.cs.uq.oz.au> rhys@cs.uq.oz.au writes:

> If I go into a library and walk up to the shelf marked "Mathematical Logic"
> (marked in Dewey Decimal or whatever), then the books I find there will
> be about the various aspects of "Mathematical Logic" and just that.

Most will. Many will have digressions into other aspects of mathematics, logic, Zen, etcetera...

> However, if I walk into a computer store and walk up to the shelf marked
> "Spreadsheets" I'll also find programs that double up as wordprocessors,
> databases, desktop publishers, comms programs, ... in addition to being
> a spreadsheet.

Yes, but they are basically spreadsheets. All these "integrated programs" have a central model that describes their behaviour, and a bunch of extra tools that are stuck on the side. They're also a dying fad. The only point to things like Lotus is to make up for limitations in MS-DOS (single tasking, no IPC, etc). Better operating environments will replace the swiss-army-knife program.

> Maybe it's time we retrained programmers to write programs to perform
> a single task, not control the world! :-)

Start by boycotting MS-DOS.
--
Peter da Silva; Ferranti International Controls Corporation; +1 713 274 5180;
Sugar Land, TX 77487-5012; `-_-' "Have you hugged your wolf, today?"
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (06/24/91)
I think the Mathematics Subject Classification model would apply quite well to archived files (and netnews!). A central authority defines a three-level hierarchy of codes, each covering some subject area; in the MSC, for instance, 11 is number theory, 11J is approximation, and 11J70 is continued fraction approximation. Every article published is given (by the author) a primary five-digit code and any number of secondary five-digit codes. Mathematical Reviews then lists articles by code. Anyone who doesn't find his subject listed can use a ``None of the above but in this section'' classification, then ask the AMS to add that subject in the next MSC revision.

Of course, the MSC (which is available for anonymous ftp on e-math.ams.com as mathrev/asciiclass.new) wouldn't apply directly to software; we'd have to draft a whole new set of categories. But the model will work.

---Dan
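
A sketch of how an MSC-style scheme might carry over to software: a central authority publishes a small hierarchy of codes, each package gets one primary code and any number of secondary codes, and listings are generated per code. The codes and package names below are invented stand-ins, not a real classification.

    # Toy three-level classification with primary and secondary codes.
    SCHEME = {
        "30":    "numerical software",
        "30F":   "transforms",
        "30F10": "fast Fourier transforms",
        "30F99": "none of the above, but in this section",
    }

    PACKAGES = [
        {"name": "fftpack", "primary": "30F10", "secondary": []},
        {"name": "sigtool", "primary": "30F99", "secondary": ["30F10"]},
        {"name": "numlib",  "primary": "30",    "secondary": []},
    ]

    def listed_under(code):
        """Everything filed under a code or any of its subdivisions,
        the way Mathematical Reviews lists articles by code."""
        return [p["name"] for p in PACKAGES
                if p["primary"].startswith(code)
                or any(s.startswith(code) for s in p["secondary"])]

    print(listed_under("30F"))    # ['fftpack', 'sigtool']
    print(listed_under("30"))     # ['fftpack', 'sigtool', 'numlib']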
worley@compass.com (Dale Worley) (06/24/91)
There already *is* a computer science classification system (the ACM Computing Surveys classifications), although it's oriented toward academic CS research rather than practical software.

Dale Worley Compass, Inc. worley@compass.com
--
Vietnam was only a police action so why do we have a War on Drugs?!? -- P.B. Horton
cmf851@anu.oz.au (Albert Langer) (06/25/91)
In article <11900.Jun2322.59.2491@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:

>I think the Mathematics Subject Classification model would apply quite
>well to archived files (and netnews!).

Sounds like a useful model to start from - especially:

1. Use of more than one level.
2. Codes defined by a central authority.
3. Assignment of a primary and any number of secondary codes.

I doubt that there will be much success with self-assignment by authors of software packages, since unlike mathematicians they are not used to relying on literature searches for prior art anyway. However, there is no way to find out how viable that fourth feature of the maths system is until we have the codes assigned by a central authority. If it also turns out to be viable, fine; otherwise I propose the "cooperative cataloging" model used by libraries - i.e. the first major archive site that stocks the package does the classifying and others copy. That distributes the work among people who understand the classification scheme, even though not as widely as distributing it to authors as well. (Once it has caught on, and people actually USE the catalog classifications, one could THEN hope for some self-cataloging by authors.)

By "major archive site" I really mean "cataloging site" - i.e. one that is willing to do far more than the typical ftp site in actually maintaining organized cataloging information. This need not actually be a site that has disk space available on the internet, though considering that disk space is now only $2 per MB I don't see why not. Another set of possible catalogers are the moderators and indexers of the *sources* groups. (There was some discussion of a classification scheme in comp.sources.d recently.)

>Of course, the MSC (which is available for anonymous ftp on
>e-math.ams.com as mathrev/asciiclass.new) wouldn't apply directly to
>software; we'd have to draft a whole new set of categories. But the
>model will work.

As well as new categories, I think we would have to add quite a lot of features to the model, e.g.:

1. Version numbers, for whole packages and component parts.
2. *sources* message-id/subject headings/archive names.
3. File sizes for source and object code, docs, test and other data, abstracts (README, HISTORY etc.) and various combinations, with "standard" filenames.
4. Refinement of 3 to include postscript/dvi and "source" forms of documentation, compressed and uncompressed versions with various packaging methods, etc.
5. Patches, and what they apply to and result in.
6. Languages used (perhaps merely one of many classifications, but could add file sizes and numbers for each).
7. Pre-requisite software. (Not a classification but a reference to other cataloged packages with specific version numbers.)
8. Pre-requisite hardware.
9. Release status (alpha, beta, gamma etc.).
10. Copyright information (whether "freely available" etc.).
11. Systems tested on.
12. Systems it is believed to work on.
13. Systems it is believed not to work on.

Only the most important information need be provided initially, but it should be possible to add other material later, including review comments or pointers to discussion in newsgroups. This could be provided for at the same time as setting up a system for cooperative cataloging, since coop cataloging implies being able to take an existing or non-existent catalog record, add to it, and have that then available for others to use or add to. Adding "review comments" would be particularly useful.
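
One possible shape for the kind of catalog record Langer lists, with only a few of the thirteen features filled in; the field names and the sample package are invented for illustration, and the merge rule is just one way a cooperative-cataloging site might fold in additions from another site.

    # Sketch of a cooperative catalog record and a simple merge rule.
    record = {
        "package":        "exampletool",
        "version":        "1.4",
        "sources_msgid":  None,               # *sources* posting, if any
        "sizes_kb":       {"source": 310, "docs": 45, "test_data": 12},
        "languages":      ["C"],
        "prerequisites":  [{"package": "examplelib", "version": ">=2.0"}],
        "release_status": "beta",
        "copyright":      "freely redistributable; see COPYING",
        "tested_on":      ["SunOS 4.1", "Ultrix 4.2"],
        "believed_to_work_on": ["4.3BSD"],
        "reviews":        [],                 # pointers to newsgroup discussion
    }

    def merge(existing, addition):
        """Keep existing values, fill in blanks, and accumulate reviews,
        so later cataloging sites add to a record rather than replace it."""
        merged = dict(existing)
        for key, value in addition.items():
            if key == "reviews":
                merged["reviews"] = merged.get("reviews", []) + value
            elif merged.get(key) in (None, [], {}):
                merged[key] = value
        return merged

    updated = merge(record, {"sources_msgid": "<example@site>",
                             "reviews": ["see comp.archives.admin"]})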
It still strikes me that libraries are the institutions that should be doing this. One thought: if they aren't prepared to take it on yet, perhaps they could make the software they use available at no charge? There are some very powerful systems in use for cooperative cataloging, and MARC records, which cover everything from audio tapes to maps, are just as complex as anything we will need for software packages.

How about just submitting a couple of packages as "publisher" to the LC and asking for the "Cataloging In Publication" data to be returned overnight, as is done for book manuscripts? Should produce some discussion :-). U.S. copyright law clearly defines computer programs as "literary works", and I can't see anybody claiming that something like "c news" or X windows is "merely ephemeral", so I guess they would HAVE to catalog it. The Library of Congress IS on the internet (loc.gov) - but if they won't accept submissions by email or ftp, somebody could just start up a "publisher" to issue a series of tapes and diskettes for physical delivery to them, with each volume a separate monograph (not part of a single serial) containing one software package. I'm quite serious about this; proper cataloging DOES cost about $200 per item and it IS THEIR JOB. We should just be helping with specialist advice.

P.S. For anyone wanting to follow up - I just don't have time - a contact at the LC is:

Sally H. McCallum, Chief
Network Development and MARC Standards Office
Library of Congress
smcc@seq1.loc.gov
(202) 707-6273
--
Opinions disclaimed (Authoritative answer from opinion server)
Header reply address wrong. Use cmf851@csc2.anu.edu.au