[bit.listserv.sas-l] Inter-Rater/Inter-Time Reliability

ted@CARDIO.UCSF.EDU (Ted Venet) (02/06/90)

Hello out there:

I'm (not really) sure this issue has been thrashed through in previous
correspondence (but if it has, I don't know where it went), so my
apologies for raising it this time around.

Suppose there is some clinical phenomenon of interest and one wants to
acquire some index of its reliability - both with respect to the individual
making the observation on the patient and the temporal stability of the
observation without respect to the identity of the observer.

Any ideas how this might be tackled?

Some questions come to mind:

     1 - How many observations per case would be needed, and how many cases
         would be needed?  Which is more important?  Is it somehow better
         to go with more than two observations per patient (each made twice
         by each observer)?

     2 - Has anyone seen any SAS code that could help?

Thanks in advance,

Ted Venet
UCSF School of Medicine

alderton@NPRDC.NAVY.MIL (David Alderton) (02/08/90)

Ted,
   You asked which is most important, observer reliability or the
stability of the thing being observed.  Let me recast this a little.
Reliability is reduced by three separate sources of error variance.
One stems from the properties of the measurement scale.  Although it is
not often discussed, many metrics are arbitrary transformations of the
"true" underlying metric and these transformations are frequently
non-linear albeit usually monotone.  A second source of error is from
the measurement instrument (not to be confused with the metric itself).
Instruments themselves ALWAYS have error variance which is manifest
in the metric but not due to it.  The third source of "error" is the
stability of the thing being measured.  I put "error" in quotes because not
all things worth measuring are stable and the fluctuations (i.e., instability)
may only be error in the sense of a reliability statistic attempting to
determine stability.
    Given all this, now ask yourself where you expect most of the error?
If you have a good instrument (like an air pressure gage) then its
error contribution will be minimal (and unimportant if you have everyone
use the same one) and thus the observers' contribution will be minimized.
If you have a good metric its error contribution will be
minimal (and perhaps inestimable if it's the only metric -- just try to
make observations that employ the whole scale) and the observers'
contribution will be minimal.  If these two things are "tight" then the
only thing that will affect reliability will be the stability of the
thing being measured.
      Now for reality.  Many psychological and clinical "measures" are
barely distinguishable from the person doing the measurement (i.e., observer).
Rating scales for schizophrenia, beauty pageants, child motility, and social
aggressiveness are some examples.  Here the presence and degree of presence
are assigned by people, the scale is arbitrary (from "none" to "a lot"),
and the measurement "instrument" is little more than the consensual
understanding among the people doing the rating of the thing being
measured.  If what you are dealing with is something along these lines then
a large source of error variance will be from the raters -- the scale and
printed instrument have little to do with it.  If the inter-rater variance
is not controlled for then this error variance will be attributed to the
"thing" being measured.  Here you would want multiple raters for each
occurrence of the "thing" to assess their agreement and also have multiple
"observations" of the thing to assess its stability.
       Enough.  If you have something like this, then SAS can be used to
partition the variance into shares due to raters and shares due to the
instability of the thing.  This can be done with PROC GLM or something else.
If you want to share a little more about your specific problem I'll be happy
to make some more specific recommendations about a design and SAS analysis
plan.
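
       To make the idea of "shares" concrete, here is a rough sketch in
Python rather than SAS.  It is not the GLM analysis itself, only a
descriptive split of the total sum of squares.  It assumes a fully crossed,
balanced design (every rater rates every subject on every occasion, one
rating per cell); the function name, the array layout, and the simulated
data are all made up for illustration.

```python
import numpy as np

def partition_sums_of_squares(x):
    """Split the total sum of squares of a ratings array into shares.

    x is assumed to have shape (subjects, raters, occasions) with exactly
    one rating per cell (balanced, fully crossed design).  Returns each
    source's fraction of the total sum of squares.  Interactions and pure
    error are lumped together as "residual".  Undefined if all ratings
    are identical (total sum of squares is zero).
    """
    x = np.asarray(x, dtype=float)
    n_s, n_r, n_t = x.shape
    grand = x.mean()

    # Marginal means for each factor, averaged over the other two.
    subj = x.mean(axis=(1, 2))
    rater = x.mean(axis=(0, 2))
    occ = x.mean(axis=(0, 1))

    ss = {
        "subject": n_r * n_t * np.sum((subj - grand) ** 2),
        "rater": n_s * n_t * np.sum((rater - grand) ** 2),
        "occasion": n_s * n_r * np.sum((occ - grand) ** 2),
    }
    ss_total = np.sum((x - grand) ** 2)
    ss["residual"] = ss_total - sum(ss.values())
    return {name: value / ss_total for name, value in ss.items()}

# Simulated example (hypothetical numbers): 20 subjects, 3 raters, 2 occasions.
rng = np.random.default_rng(0)
true_score = rng.normal(0.0, 2.0, size=(20, 1, 1))   # stable subject differences
rater_bias = rng.normal(0.0, 1.0, size=(1, 3, 1))    # some raters score high/low
drift = rng.normal(0.0, 0.5, size=(1, 1, 2))         # shift between occasions
noise = rng.normal(0.0, 1.0, size=(20, 3, 2))        # everything else
ratings = true_score + rater_bias + drift + noise

for source, share in partition_sums_of_squares(ratings).items():
    print(f"{source:9s} {share:.3f}")
```

A large "rater" share relative to "subject" would point at the raters as
the main problem; a large "residual" share says the ratings are not
reproducible cell to cell.  Turning these shares into proper variance
component estimates needs the expected mean squares for the design, which
is what the SAS procedures work out for you.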
       I don't mean to sound pedantic but THIS (testing and measurement)
is my (professional) life!

   Sincerely,
      David.

                          David L. Alderton, Ph.D.
               Navy Personnel Research and Development Center
                         Aptitude Research Division
                                  Code 131
                          San Diego, CA  92152-6800
                     (619 553-7647 or AUTOVON 553-7647)
                      arpanet: alderton@nprdc.navy.mil

 *===========================================================================*
 | The opinions expressed or implied are mine, are not official, and do not  |
 | necessarily reflect the views of the Navy Department or the US Government |
 *===========================================================================*