[comp.sys.dec] data analysis packages

jkelly@inland.com (04/20/91)

This posting is probably not suited to the 'fortran' newsgroup, but suggestions
as to appropriate newsgroups for the posting are welcome.

Please email responses directly to me, jkelly@inland.com; I will post a summary
response to the appropriate newsgroups.

I want to establish a fairly complete list of commercial packages suitable for
database and data analysis.  The application is analysis of data from a pilot
scale industrial process, and will involve plotting the values of various variables
over time, generation of summary statistics, pareto charts and other statistical
information.  I need to be able to zoom in and out on particular portions of
the data.  It would be nice to be able to box a certain portion of the data
with a mouse, and compute information local to that box such as average value,
derivative wrt time, smoothed curve, etc.  Various signal processing type
abilities might well be valuable.  For example, various signals might get
destroyed by 60 Hz noise.  Being able to try various digital filtering
techniques might well help solve such problems.  The volume of data is likely
to be on the order of 100 variables with on the order of 20000 data points for
each variable.  I would expect the package to be able to plot 20000 data points
in less than 10 seconds.  (Believe it or not, rs1 on a VAX 8800 cannot meet this
requirement.  I know, I know: 9600 baud to a dumb terminal, but that is the least
of its performance problems.)
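The box-and-summarize operation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not any package's actual feature: `box_stats` is a hypothetical name, the derivative with respect to time is a simple forward difference between consecutive samples, and the "smoothed curve" is assumed to be a trailing moving average (a real package might use splines or another smoother).

```python
from statistics import mean

def box_stats(t, y, t_lo, t_hi, window=5):
    """Hypothetical helper: summarize samples whose time lies in [t_lo, t_hi]."""
    pts = [(ti, yi) for ti, yi in zip(t, y) if t_lo <= ti <= t_hi]
    if len(pts) < 2:
        raise ValueError("need at least two samples inside the box")
    ts = [p[0] for p in pts]
    ys = [p[1] for p in pts]

    # Forward-difference estimate of dy/dt between consecutive samples.
    deriv = [(ys[i + 1] - ys[i]) / (ts[i + 1] - ts[i])
             for i in range(len(ys) - 1)]

    # Trailing moving average: each output is the mean of `window` samples.
    # Empty if the box holds fewer than `window` samples.
    smooth = [mean(ys[i - window + 1:i + 1])
              for i in range(window - 1, len(ys))]

    return {"average": mean(ys), "derivative": deriv, "smoothed": smooth}
```

For example, `box_stats(times, values, 1.0, 4.0, window=2)` summarizes only the samples between t = 1 and t = 4, which is what dragging a box over that stretch of the plot would select.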

The likely platforms for such software are: a 386/486 PC, some Unix box, some
VAX box, or some other box that meets the cost and performance requirements.
(Here is an opportunity for vendors to assail me with proposals.)  The preferred
interface would be some type of window environment, e.g. Macintosh, OPEN LOOK,
Motif.  SunView or DECwindows might do in a pinch, depending on how well the
package has adapted to those systems' flaws.

Systems that I know about include the following; these may or may not fit the above wish list.

rs1:  Seems unnecessarily slow; I don't know if it has a more pleasant windowed
interface on some other box.  Doesn't really provide an environment for
playing with the data.  Has no signal processing capabilities.  Probably far
too expensive.

SAS:  Last time I used it was in 1983; it seemed okay then, but I've become fussier
since.  I believe they have a workstation version of it.  I don't know how
pleasant it is to use.  It has good statistical capabilities, I don't know if
it has any signal processing capabilities.  Probably far too expensive.

IDX:  Last used it on a microcrap in 1988.  I liked the sort of interactive
Fortran-style programming environment.  At that point it had only
character-based input.  It has signal processing capabilities.  I don't know
what its statistical capabilities are.

PI: Written specifically for industrial process control.  Its capabilities for
'playing' with data might be limited.
Given its intended market, I would imagine it is priced at 5 times what it
should be. (If I am being unnecessarily obnoxious,
please send me figures to prove me wrong).

Sherpa: A steel company product for process control.  I don't know any of
its capabilities.

Thanks,
Jim Kelly 

jkelly@inland.com (06/11/91)

Finally, the summary that I promised on information regarding data
analysis packages.  It looks like I need to talk to SAS and RSI.
We got 'gnuplot' from the net, it is pretty convenient for plotting
raw data and mathematical functions, but not really what I had in
mind.  Thanks to all who responded.

J. Kelly

------------
From:	UUCP%"rfinch@locke.water.ca.gov" 23-APR-1991 23:17:25.43
To:	jkelly@inland.com
Subj:	Re: Data analysis packages

We use PV-Wave, which is almost 100% derived from IDL.  It's pretty
good, will do much of what you want.  For more info contact RSI:
303-786-9900

-- 
Ralph Finch			916-445-0088
rfinch@water.ca.gov		...ucbvax!ucdavis!caldwr!rfinch
Any opinions expressed are my own; they do not represent the DWR

----------------
From:	UUCP%"macq@miguel.llnl.gov" 22-APR-1991 23:18:41.02
To:	jkelly@inland.com
Subj:	Re: Data analysis packages

I'm a statistician using SAS version 6.03 on a Sun Sparcstation IPC under
Openwindows 2.0.  I've used SAS since about 1987.  I would hazard a
guess that the improvements since 1983 are substantial.
SAS now runs on a wide range of hardware and operating systems, including
DEC's latest and fastest ultrix machines (8360 or something like that,
a 3 cpu machine), Sun and DEC workstations, and I don't know what-all.

They have two specialized products perhaps relevant to you, SAS/QC
(quality control) and SAS/ETS (econometrics and time series), neither of which
I have used.  The next version, 6.07, will be an X windows application,
and will have an interactive graphics and exploratory statistics module
called SAS/INSIGHT.  I've seen and tried this; it is very good and Mac-like,
but I doubt that it has the time series analysis capabilities you need.  Actually,
it is already out on some of the unix platforms (HP for one) and supposed
to ship in June for others (Sun).

It is probably worth your while to talk to SAS, but I'd guess that they don't
have the fancy mac-like interactivity you describe in ETS or QC.

Current prices for Sun workstations are on the order of $150 per module
first year, more like $100 for renewal, with discounts for licensing
multiple copies.  This pricing appears to be due to SAS assuming there
is one user per license.  There has been some talk on the SAS
newsgroup (bit.listserv.sas-l) about big price increases for certain
high-end Sun machines.  That is, SAS is assuming that many users on a network
are using SAS, thus the high price.
-- 
--------------------
Don MacQueen
macq@miguel.llnl.gov
--------------------

-------------------
From:	UUCP%"marty%atmos.ogi.edu@RELAY.CS.NET" 22-APR-1991 23:17:42.17
To:	jkelly%inland.com@RELAY.CS.NET
Subj:	"data analysis"

One software package that will do some of what you want is Quality Analyst,
sold by NW Analytical, Portland, OR.  I can look up phone no. and address if
you're interested.  I believe their package will handle as many data points
as you have RAM to store it in (PC based only).  It's a statistical quality
control package and does x-bar, R, Pareto, and other standard process
control charts; also has FFT and some other time series analysis features
which I haven't used.  I don't think it has a mouse-driven interface.  I
haven't used the most recent version a great deal so can't answer specific
questions, though I have a copy and could check a few things.  Good luck
with your search!  Marty S.
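For readers unfamiliar with the x-bar and R charts mentioned above, the usual Shewhart control limits can be computed as follows. This sketch assumes subgroups of exactly five measurements and uses the standard tabulated constants for that subgroup size (A2 = 0.577, D3 = 0, D4 = 2.114); `xbar_r_limits` is a hypothetical helper, not Quality Analyst's interface.

```python
# Standard Shewhart control-chart constants for subgroups of size 5.
A2, D3, D4 = 0.577, 0.0, 2.114

def xbar_r_limits(subgroups):
    """Hypothetical helper: (LCL, center, UCL) for the x-bar and R charts."""
    if not subgroups or any(len(g) != 5 for g in subgroups):
        raise ValueError("constants above assume subgroups of exactly 5")
    xbars = [sum(g) / len(g) for g in subgroups]      # subgroup means
    ranges = [max(g) - min(g) for g in subgroups]     # subgroup ranges
    grand_mean = sum(xbars) / len(xbars)
    rbar = sum(ranges) / len(ranges)                  # average range
    xbar_chart = (grand_mean - A2 * rbar, grand_mean, grand_mean + A2 * rbar)
    r_chart = (D3 * rbar, rbar, D4 * rbar)
    return xbar_chart, r_chart
```

Points falling outside either pair of limits are the usual signal that the process has shifted; for other subgroup sizes the three constants must be taken from a standard table.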

-----------------
From:	UUCP%"klassen@sol.UVic.CA" 23-APR-1991 23:16:40.39
To:	jkelly@inland.com
Subject: Re: Data analysis packages

If you have a SUN workstation, with a colour monitor,
then one new package to look at is PV-WAVE,
from Precision Visuals, Inc., of Boulder, Colorado.
It's been designed as an interactive, data-visualization tool.
It contains many features, plus a macro-writing language,
which lets you massage your data, and even do pop-up windows.

-------------------
From:	UUCP%"opto!glen@gatech.edu"  5-MAY-1991 23:17:38.04
To:	dscatl!inland.com!jkelly
Subject: Data analysis packages

Jim:

	The following is in response to your recent posting regarding
a graphics visualization package.

	(You indicated concern that 'comp.lang.fortran' might not be
a proper place to post same. While your inquiry did not directly pertain
to FORTRAN, the type of problem you indicate is typical of the problems
encountered regularly by newsgroup members and closely allied with the
skills they have developed. I doubt anyone will fault you for your 
choice of forum.)

	I believe your inquiry actually implied three distinct questions:

	1) Visualization software.
	2) Hardware.
	3) Data analysis and filtering methods.


VISUALIZATION SOFTWARE
 
	Depending on your budget, I believe the two extremes of the
spectrum are 'gnuplot' and 'PV-wave'.

	'gnuplot' is public domain, available from most archiving
network nodes, and has drivers for most screen environments and many 
hardcopy output devices (HP plotters and laserprinters). It has the 
ability to arbitrarily size the windows and assign the range of each 
axis. If you don't specify parameter limits, the 'autoscale' function 
usually makes intelligent decisions based on the immediate dataset. The 
big plus is it's free. The downside is that it won't do 3-D (but it does
2-D quite well). It compiles from C source, so if your C compiler
has good floating-point performance, this will run very quickly. If 
your C compiler doesn't implement hardware floating point calls, the 
performance will suffer enough to make you find one that does. If funds 
are very tight, this may be the answer. Don't let the fact that it's 
free lead you to believe it's a toy. We've used it on major projects 
for Westinghouse and Hearst.

	I believe the high end of the market is presently occupied by
a package called PV-Wave. It runs on most SUN/DEC/SGI platforms and
costs about $2,000 for a single-CPU license. I can't put my finger on
an address without digging, but I remember it comes out of Utah.
Shouldn't be too hard to find. I can look further if you can't find 
other references easily. It does about everything you can ask a graphics
package to do (except DSP... more on this below). It's mouse-driven and
allows you to look at any dataset in just about any way imaginable.

HARDWARE

Hardware is hardware and will be of secondary importance in your decision
process. We make it a point to keep abreast of who has how much horse-
power for the dollar in this area. Some random thoughts follow:

1) If you *must* use a PC/INTEL platform, get one that has the Weitek
   floating-point co-processor and make sure the software is able to
   use it. For our applications, the Weitek is 200% to 300% faster than
   the Intel coprocessors (80387, etc.) at the same clock speed.

2) In the UNIX world, we have been pleased with the benchmarks we've run
   on the SUN SPARC II and IBM RS6000, which seem to have about the same
   horsepower for about the same dollars. Both draw fast pictures. If
   you like DEC, I understand the DECstation 5100 has similar performance.

3) Without going into machines having price tags that are 6-digits wide, 
   the top of the heap right now appears to be the Hewlett-Packard RISC 
   machines released about 2 weeks ago. (Nicknamed the 'snake' machines.)
   They claim 57 MIPS for a $12,000 box that sits on a desktop. 
   Benchmarks I've seen on the net indicate that these numbers are real, 
   not marketing hype. 

DATA ANALYSIS AND FILTERING SOFTWARE

	Our experience has been that few graphics packages have any
useful filtering (DSP) capability. What smoothing or other statistical
capabilities they possess are usually elementary textbook functions
with no real usefulness or easily accessed applicability to
engineering-based data analysis. Perhaps this is because few comp-sci types have an
in-depth engineering background, and few practicing engineers have 
taken the time to write a graphics package.

	If the filtering capability is essential (notching out the
60 Hz noise or removing some other undesired artifact in the dataset), 
we may be able to be of help. Our firm specializes in optimizing 
industrial designs and processes. We have many in-house DSP techniques
to remove unwanted data contamination and accentuate desired trends in 
data. If we can be of service, please feel free to contact us at the
addresses below.
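As an illustration of notching out 60 Hz noise, here is a standard second-order IIR notch (biquad) using the coefficient formulas from Robert Bristow-Johnson's Audio EQ Cookbook. This is a generic textbook technique, not one of the in-house methods mentioned above; the function name `notch_60hz`, the default Q of 5, and the assumption of uniformly spaced samples at a rate above 120 Hz are all choices made for this sketch.

```python
import math

def notch_60hz(x, fs, f0=60.0, q=5.0):
    """Second-order IIR notch centred on f0 Hz (biquad, direct form I)."""
    if not 0.0 < f0 < fs / 2.0:
        raise ValueError("f0 must lie strictly between 0 and fs/2")
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    cosw = math.cos(w0)
    # Numerator zeros sit exactly on the unit circle at +/- w0, so a steady
    # f0 sinusoid is cancelled; q sets how narrow the notch is.
    b0, b1, b2 = 1.0, -2.0 * cosw, 1.0
    a0, a1, a2 = 1.0 + alpha, -2.0 * cosw, 1.0 - alpha

    y = []
    x1 = x2 = y1 = y2 = 0.0  # previous inputs/outputs; filter starts at rest
    for xn in x:
        yn = (b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y
```

A higher `q` gives a narrower notch that disturbs neighbouring frequencies less but takes longer to settle. If the sample rate happens to be an exact multiple of 60 Hz, a moving average over exactly one 60 Hz period is a simpler alternative that also rejects the harmonics.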

	Glen Clark, P.E.

	Glen Clark & Associates
	Consulting Engineers
	1150 Alpha Drive
	Suite 150
	Alpharetta, GA 30201-7168

	(404) 740-0178

	gatech!dscatl!opto!glen