jt@nrcvax.UUCP (Jerry Toporek) (09/20/85)
This newsgroup has been rather quiet. If there are folks out there, I'd like to hear what you are doing, or would like to be doing, with regard to a number of topics related to statistical software. This isn't a survey, so don't feel obliged to answer everything. Pick out whatever seems important to you. Here goes:

Are you generally happy with the available statistical software in your computing environment? Are UNIX people using S? Is it really what you want, and, if so, for what types of applications? What else is being used in the UNIX world?

Are your data management tools adequate? Do they provide the kind of operating environment you want? Do data analysts still basically prepare commands and submit them to a background process, or do they prefer some kind of interactive operation?

How are the statistical packages performing on the IBM PC? Do you prefer the older, bigger, major statistical packages which have been made to run on the PC, or the newer packages produced specifically to run in the small machine environment? Is there a package which combines the best features of both types of package? What are those features?

Are people starting to use smaller machines for local computing and large machines for data storage? Are there tools available to support distributed computing and data management? Do you want them?

Let me interrupt this line of questioning to say that my interest in all this stems from the fact that most of my professional career has been spent developing statistical software, but the past year has been spent entirely in development of networking software. The switch came, in part, from a belief that statistical software of the future will be built on top of tools providing access to resources within a network environment. Data storage on the machine with the big disks, number crunching from the array processor, data collection direct from the lab equipment or production line sensors, print service down the hall on the machine with the laser printer, etc. etc., all at my disposal on my little machine under my desk which couldn't hope to do all that by itself. Anyone else think this is the way to go?

Are we still recovering from the dramatic shortages of card readers?

Enough for now?

--
Jerry Toporek
{sdcsvax,hplabs}!sdcrdcf!psivax!nrcvax!jt
ucbvax!calma!nrcvax!jt
perlman@wanginst.UUCP (Gary Perlman) (09/25/85)
I am compelled by unknown forces to do this every year, I guess because people thank me for it. Since 1980, I have been distributing a small statistics package called UNIX|STAT, so called because it was developed on UNIX and uses pipelines a lot; it is a very UNIX-style package. Thanks to a lot of grungy work by Fred Horan at Cornell, the Lattice C compiler, and continuing education in portability, most of the programs have been ported to MSDOS on the IBM PC. I am not yet ready to distribute the programs on floppies for MSDOS, but more than one site has been able to take the sources I distribute and compile them for MSDOS with other C compilers. Over the next few months, I will be doing V&V work on the MSDOS versions and finding some floppy-copy house to make copies.

So, what is UNIX|STAT? Well, it's not comprehensive, but there are a lot of good programs in it. They are described below. More programs are likely in the next year. Some people have sent me code (that I have not yet had time to incorporate) for nonparametrics, and I am working on a multi-factor crosstabs/chi-square.

People seem to like UNIX|STAT because it integrates with UNIX naturally, reading the standard input and writing the standard output. It even has documentation: tutorials, manual entries, and I have even made a video tape introduction (although the tape has not been distributed with the package). It is also cheap: $20 gets you a mag tape, or you can send me a 600 foot mag tape and prepaid return mailer and get it free. This, obviously, is public domain software. If you send me your postal address, I can send you more documentation.

Now for details. Note: if you are using UNIX|STAT 5.0, there is nothing new here.

UNIX|STAT 5.0
COMPACT DATA ANALYSIS PROGRAMS

UNIX|STAT is a set of UNIX System data manipulation and analysis programs developed at the University of California, San Diego by Gary Perlman (now teaching at the Wang Institute of Graduate Studies).
The programs are designed with the UNIX System philosophy that individual programs should be designed as tools that do one task well and produce output suitable for input via pipes to other programs. Interactive use is supported in the UNIX System shell which also provides a programming language for complex analyses. Typical usage involves a pipeline of transformations of data followed by input to an analysis program, summarized schematically by:

    INPUT DATA | TRANSFORM | ANALYSIS | OUTPUT RESULTS

Functionality often built into statistical packages (e.g., graphics, sorting and other data manipulation) is not re-invented in UNIX|STAT which delegates such responsibility to standard UNIX System tools.

FEATURES
    easy to use (negligible training period)
    simple input formats (free format field oriented)
    used in pipelines with other UNIX System utilities (sort, vi)
    flexible data manipulation
    data validation provided (range and type checking)
    full documentation support (manual entries, tutorials)
    extensible (many modular C functions)
    faster than most packages (usually less than a second per analysis)
    small enough for micros (10-25K byte programs)
    runs on any UNIX System (V6, V7, 2.8BSD, 4BSD, III.0, System V, others)
    public domain software (can't be distributed for gain)
    in use at more than 300 UNIX System sites for five years

CHANGES FOR RELEASE 5.0 (March 5, 1985)
    reworked to increase portability, reliability, and usability
    all commands now use a standard option parser (getopt)
    all calculations are now done in double precision
    diagnostic error messages have been improved
    regress now does a partial correlation analysis
    colex and trans were added as alternatives for dm
    F ratio probabilities are now better approximated
    some inefficient input was optimized
    some non-portable features of C were replaced so that the
        programs now run under MSDOS on the IBM PC
    the random number seeding has been improved
    all programs now use a zero exit status on success
    version control was added--we are now at release 5.0

UNIX|STAT is Public Domain

The programs have been released to the public and are distributed to anyone who wants them. Persons wanting to get a copy of the package should contact me directly. You can get the package for free if you send me a tape and a self-addressed prepaid return mailer. Or you can send me personally $20 US to cover the costs of a tape and mailing.

The distribution includes:
    The C source files for all the programs.
    The documentation source files.
    A collection of test examples.

Contact:
    Gary Perlman
    Wang Institute of Graduate Studies
    Tyng Road
    Tyngsboro, MA 01879 USA
    (617) 649-9731
    uucp:  decvax!wanginst!perlman
           sdcsvax!sdcsla!perlman
    csnet: perlman@wanginst
    arpa:  sdcsla!perlman@nprdc

NOTES:
    UNIX|STAT is unsupported, though known bugs have been removed.
    UNIX|STAT may not be distributed for profit.
    UNIX|STAT is NOT a product of any company or organization.
    UNIX|STAT is distributed on a "use-at-your-own-risk" basis.

UNIX|STAT(1)             UNIX User's Manual             UNIX|STAT(1)

NAME
    UNIX | STAT - compact data analysis programs

DESCRIPTION
    UNIX | STAT is a set of data manipulation and analysis programs developed at the University of California, San Diego. The programs are designed with the UNIX System philosophy that individual programs should be designed as tools that do one task well and produce output suitable for input via pipes to other programs. Interactive use is supported in the UNIX System shell which also provides a programming language for complex analyses. Functionality often built into statistical packages (e.g., graphics, sorting and other data manipulation) is not re-invented in UNIX | STAT which delegates such responsibility to standard UNIX System tools.
DATA TRANSFORMATION PROGRAMS
    abut        join data files
    colex       column extraction
    dm          column oriented data manipulator
    io          control and monitor input and output
    maketrix    create matrix type file from free-form file
    perm        randomly permute lines in a file
    repeat      repeat a pattern or file
    reverse     reverse lines and characters
    series      print a series of numbers
    transpose   transpose matrix type file

ANALYSIS PROGRAMS
    anova       multi-factor anova with repeated measures
    calc        interactive algebraic modeling calculator
    critf/pof   F-ratio/probability conversion functions
    dataplot    flexible data plotting
    desc        descriptions, histograms, frequency tables
    dprime      signal detection d' and beta calculations
    oneway      one-way anova and t-test
    pair        paired data statistics, regression, plots
    regress     multivariate linear regression
    ts          time series analysis and plots
    validata    verify data file consistency
    vincent     time-series comparison

AUTHOR
    Gary Perlman (with the help of several others)

SEE ALSO
    sh(1), sort(1), uniq(1), sed(1), awk(1), grep(1), rm(1), cp(1), pr(1), ls(1), mv(1)

--
Gary Perlman  Wang Institute  Tyngsboro, MA 01879  (617) 649-9731
UUCP: decvax!wanginst!perlman  CSNET: perlman@wanginst
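
The pipeline scheme in the post above (INPUT DATA | TRANSFORM | ANALYSIS | OUTPUT RESULTS) can be sketched in a few lines of Python. This is only an illustration of the filter style, under stated assumptions, and is not UNIX|STAT code: the names extract_columns, describe, and run are hypothetical, the sample data is made up, and the real colex and desc programs are written in C with their own options and output formats.

```python
import io
import statistics


def extract_columns(lines, columns):
    """TRANSFORM stage (hypothetical): keep the given 1-based columns
    of each free-format, whitespace-separated input line."""
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        yield " ".join(fields[c - 1] for c in columns)


def describe(lines):
    """ANALYSIS stage (hypothetical): summarize the first field of each line."""
    values = [float(line.split()[0]) for line in lines if line.strip()]
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        # sample standard deviation needs at least two values
        "sd": statistics.stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }


def run(stream_in, stream_out, columns=(2,)):
    """Chain the stages: read lines, extract columns, analyze, write results."""
    summary = describe(extract_columns(stream_in, columns))
    for name, value in summary.items():
        stream_out.write(f"{name}\t{value:g}\n")


# Made-up sample data: a label in column 1, a measurement in column 2.
sample = io.StringIO("a 1.0\nb 2.0\nc 3.0\nd 4.0\n")
result = io.StringIO()
run(sample, result)
print(result.getvalue(), end="")
```

Because each stage only consumes lines and produces lines or a small summary, any stage can be swapped for a standard tool or another filter, which is the point of the one-task-per-program design described in the post.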
ronb@natmlab.OZ (Ron Baxter) (09/26/85)
In article <277@nrcvax.UUCP> jt@nrcvax.UUCP (Jerry Toporek) writes:
>
>Are you generally happy with the available statistical software in your
>computing environment? Are UNIX people using S? Is it really what you
>want, and, if so, for what types of applications? What else is being used
>in the UNIX world?
>

On our system (4.2 BSD on a Vax 750) the main statistical packages available are:

o GLIM - an old favourite with good notation and abilities for fitting models, but somewhat messy output and a quirky syntax (e.g. sometimes you need to have a $ at the end of a line to provoke action, and sometimes you don't). Only needs a low once-off fee.

o MINITAB - users like it because it is easy to use. It has an annual fee (so is more expensive than GLIM) and is distributed as a binary (so tough if you don't like some of the decisions that have been made for you).

o GENSTAT - for getting ANOVAs for data from complex designed experiments, it is better than the rest. It also has an annual fee (similar price to Minitab). It is not seen as "easy to use", but it is quite powerful. I have done a UNIX conversion of this package and it is available from NAG.

o S - need I say more in this group? It has the best graphics facilities that we have for data analysts. It is this that often gets users started on S, but then they discover it can do more. The fact that it really is a practical proposition to add your own algorithms in Fortran puts it way ahead of the others, which have limits that are more solidly defined.

These are the main ones; we do have other more specialized packages, and libraries such as IMSL.

>Are your data management tools adequate? Do they provide the kind of
>operating environment you want? Do data analysts still basically prepare
>commands and submit them to a background process, or do they prefer some kind
>of interactive operation?
>

S and MINITAB are largely used interactively. GENSTAT can be but usually isn't (people grew up using this in batch mode on CDC machines so ...). GLIM is somewhere in between, being used interactively some of the time.

>Are people starting to use smaller machines for local computing and large
>machines for data storage? Are there tools available to support distributed
>computing and data management? Do you want them?
>

I can see lots of scope for machines like the Microvax but we haven't moved far down this path yet.

>...................................... The switch came, in part, from a
>belief that statistical software of the future will be built on top of tools
>providing access to resources within a network environment. ............

I agree that different facilities will be brought together by networks. I also like the idea of this personal workstation being my window onto all this. However, at this stage I don't see a powerful enough workstation at a low enough price to start pushing the low-cost VDUs off everyone's desks.

--
Ron Baxter,                 ACSNET: ronb@natmlab
CSIRO Div Maths & Stats,    ARPA:   munnari!natmlab.oz!ronb@SEISMO.ARPA
National Measurement Lab.,  UUCP:   ...!seismo!munnari!natmlab.oz!ronb
PO Box 218, Lindfield, NSW, Australia, 2070.  PHONE: +61 2 467 6059
hes@ecsvax.UUCP (Henry Schaffer) (09/26/85)
> > This newsgroup has been rather quiet.
> > How are the statistical packages performing on the IBM PC?
> --
> Jerry Toporek

The big news around here is awaiting SAS for the PC. It should run on IBM and compatibles - requires 512k, and should take up about 5 Mb on a hard disk. It really sounds like a PC/AT is the desired configuration. It will only be available on a site license basis, with their usual sizeable discount for educational institutions. It will come out in several parts, and should be very compatible with the mainframe version.

--henry schaffer