Exploratory Variable Clustering for Integrating Analyses
David J Pasta, Technology Assessment Group, Inc.
Lori Potter, Technology Assessment Group, Inc.
ABSTRACT
In a cross-sectional study comparing treatments for pain, we demonstrate how PROC VARCLUS was used to strengthen and simplify results and to discover the underlying structure in the data. Analysis began with standard estimation procedures (PROC GLM) applied to a large number of outcome measures. This paper traces the steps involved in developing composite scores and discusses how to apply clustering techniques to data that include single-item, scale, and subscale scores. A proposed approach to integrating and summarizing clustering results is described.

INTRODUCTION
Comparing two treatments using a large set of variables often yields results that are difficult to interpret. Using clustering techniques to reduce the variables to a manageable set of measurements often represents a fair tradeoff: an acceptable loss in the amount of variance explained in exchange for an increased understanding of the data and a clearer picture of how the variables relate.

The VARCLUS procedure in SAS was used to evaluate the validity of the pre-specified composite scales and to explore other possible scales. VARCLUS uses an iterative splitting technique to divide a group of variables into non-overlapping subgroups, each of which is approximately unidimensional. For a group of variables with a great deal of structure, a small number of clusters can explain a large proportion of the variance in the original group of variables. With variables that have little relationship to one another, many clusters, each composed of only a handful of variables, might be needed to explain the same proportion of variance. The first k principal components will always explain at least as much variance as k cluster scores. However, the cluster scores are easier to describe and to interpret, and often the reduction in variance explained compared with principal components is not dramatic.

DATA ANALYSIS
In a cross-sectional, non-randomized study of pain management, 504 cancer patients were surveyed regarding their current mode of pain medication delivery and its impact on their quality of life. Two treatment groups (209 Group 1 versus 295 Group 2) were compared on 36 summary outcome measures. Measurements were analyzed as single question items and as scales and subscales (weighted combinations of items); the total number of original items represented is 100 (Table 1).

TABLE 1
Instruments: FACT-G, BPI, MOS, MSAS, and TAG items (other surviving entries: Reactions Towards Medication; Q12 a, b)
FACT-G: Functional Assessment of Cancer Therapy-General
BPI: Brief Pain Index
MOS: Medical Outcomes Study
MSAS: Memorial Symptom Assessment Scale
TAG: Technology Assessment Group

Analysis of covariance (ANCOVA), using PROC GLM, was performed, controlling for treatment site, demographics, and cancer stage. The initial differences found between the two treatment groups were in outcome measures that, while not specifically linked to pain medication, were indicative of the patient's overall physical condition. Accordingly, we repeated the analysis using physical functioning as a covariate instead of as an outcome measure. Interactions of treatment with the other main effects were also tested.
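ANCOVA of the kind described above is a linear model containing a treatment indicator plus covariates. The sketch below, written in Python rather than SAS, fits such a model by ordinary least squares on hypothetical data. The function names, the single covariate standing in for physical functioning, and the data are illustrative assumptions; unlike PROC GLM, it returns only point estimates, with no F-tests or standard errors.

```python
def solve(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def ancova_design(group, covariate, interaction=False):
    """Columns: intercept, treatment (0/1), covariate, optionally treatment x covariate."""
    rows = []
    for g, c in zip(group, covariate):
        row = [1.0, float(g), float(c)]
        if interaction:
            row.append(float(g) * float(c))
        rows.append(row)
    return rows


# Hypothetical data: outcome = 5 + 3*treatment + 0.5*physical functioning, no noise.
group = [0, 0, 0, 1, 1, 1]
phys = [10, 20, 30, 15, 25, 35]
outcome = [5 + 3 * g + 0.5 * c for g, c in zip(group, phys)]

coef = ols(ancova_design(group, phys), outcome)
coef_int = ols(ancova_design(group, phys, interaction=True), outcome)
print([round(b, 6) for b in coef])  # intercept, adjusted treatment effect, covariate slope
print(round(coef_int[3], 6))        # treatment x covariate interaction
```

Moving physical functioning from the outcomes to the covariates, as the analysis did, simply means it becomes a column of the design matrix rather than a response.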
Group 1 patients had lower social, emotional,
and functional well-being. There were strong
differences between the two treatments on side
effects. Group 1 patients experienced fewer
medication side effects (p=.004) and less impact
from side effects (p < .001). Willingness to continue the medication was greater for men in Group 1 than in Group 2 (p < .01); no difference was found among women. Gender and age affected whether or not the medication met a patient's expectations: Group 1 men had higher scores, and older Group 2 patients were disappointed in how well their medication met expectations. For the various pain and sleep measures, and for the symptom scores and subscales, there were few differences, and some presented interpretation difficulties. Results are summarized in Table 2.
TABLE 2
Variable Clustering
Exploratory evaluation of the outcome measures
to determine which variables could be combined
into scales revealed a fairly simple structure.
We found that six underlying scales explained a
substantial proportion of the variability. The
cluster scores themselves can be the first
principal component of the variables in the
cluster or they can be the average of the raw (or
standardized) variables. Where it was
reasonable, we used the average of the raw
variables in order to keep the composites as
simple as possible.
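The difference between the two kinds of cluster score can be made concrete. For standardized variables with correlation matrix R, the share of total variance explained by one component score is the sum of its squared correlations with the variables divided by the number of variables; for the first principal component that sum equals the largest eigenvalue of R. The Python sketch below compares that share with the share explained by the simple average of the standardized variables. It works on correlations, which is an assumption (running VARCLUS with the COV option uses covariances instead), and the example matrix is hypothetical.

```python
import math


def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def largest_eigenvalue(r, iters=1000):
    """Power iteration for the largest eigenvalue of a correlation matrix.
    Starting from all ones works when correlations are positive, because the
    leading eigenvector then has all-positive entries."""
    v = [1.0] * len(r)
    for _ in range(iters):
        w = mat_vec(r, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return sum(a * b for a, b in zip(v, mat_vec(r, v)))  # Rayleigh quotient


def first_pc_share(r):
    """Share of total standardized variance explained by the first principal component."""
    return largest_eigenvalue(r) / len(r)


def centroid_share(r):
    """Share explained by the unweighted average of the standardized variables.
    corr(x_j, average) = (R 1)_j / sqrt(1' R 1); square, sum, and divide by p."""
    r1 = [sum(row) for row in r]
    return sum(x * x for x in r1) / sum(r1) / len(r)


# Hypothetical correlations among three items of one scale.
R = [[1.0, 0.6, 0.3],
     [0.6, 1.0, 0.2],
     [0.3, 0.2, 1.0]]
pc, cen = first_pc_share(R), centroid_share(R)
print(f"first PC {pc:.3f}, average {cen:.3f}, ratio {cen / pc:.3f}")
```

The ratio printed last is the kind of "percent of the principal component's explanatory power retained" figure used to justify equal weights; it can never exceed 1, and it equals 1 when all correlations are equal.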
As a preliminary step, we applied PROC
VARCLUS to large groups of variables,
including individual items, predefined scales, and predefined subscales of those scales. This meant that some variables might be included two or more times: as individual items and as part of one or more of the scales and subscales. Although such multiple inclusion of variables would horrify purists, and would not be recommended for confirmatory clustering, in practice it was very useful.
When the dust settled from the clustering, we
could see which variables ended up in the same
cluster as the overall scale score, and which
subscales wound up together and which apart.
This allowed us to define six broad scales.
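A single binary split, loosely in the spirit of the iterative splitting VARCLUS performs, can be sketched in Python: find the second eigenvalue of the cluster's correlation matrix; if it exceeds a threshold (1 here, an assumption), rotate the first two principal-component loadings with a quartimax criterion and assign each variable to the rotated component with which it has the larger squared correlation. This is a simplified illustration, not the SAS implementation: as we understand the SAS documentation, VARCLUS rotates the eigenvectors rather than the loadings and also reassigns variables iteratively after each split. The helper names and example matrices are hypothetical.

```python
import math


def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def top_eigenpair(m, iters=2000):
    """Power iteration on a symmetric PSD matrix; returns (eigenvalue, unit vector).
    The uneven start vector avoids orthogonality in symmetric examples."""
    v = [1.0 + 0.5 * i for i in range(len(m))]
    for _ in range(iters):
        w = mat_vec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    return sum(a * b for a, b in zip(v, mat_vec(m, v))), v


def split_cluster(r, max_eigen=1.0, steps=360):
    """Return two lists of variable indices, or None if the cluster is not split."""
    n = len(r)
    lam1, v1 = top_eigenpair(r)
    # Deflate: remove the first component, then find the second eigenpair.
    deflated = [[r[i][j] - lam1 * v1[i] * v1[j] for j in range(n)] for i in range(n)]
    lam2, v2 = top_eigenpair(deflated)
    if lam2 <= max_eigen:
        return None
    a = [x * math.sqrt(lam1) for x in v1]  # correlations with component 1
    b = [x * math.sqrt(lam2) for x in v2]  # correlations with component 2
    best_theta, best_crit = 0.0, -1.0
    for k in range(steps):  # the quartimax criterion repeats every pi/2
        t = (math.pi / 2) * k / steps
        c, s = math.cos(t), math.sin(t)
        crit = sum((x * c + y * s) ** 4 + (-x * s + y * c) ** 4 for x, y in zip(a, b))
        if crit > best_crit:
            best_theta, best_crit = t, crit
    c, s = math.cos(best_theta), math.sin(best_theta)
    first, second = [], []
    for j, (x, y) in enumerate(zip(a, b)):
        # Larger squared rotated loading decides the variable's new cluster.
        if (x * c + y * s) ** 2 >= (-x * s + y * c) ** 2:
            first.append(j)
        else:
            second.append(j)
    return first, second


def block_matrix(within, between, sizes=(3, 3)):
    labels = [g for g, size in enumerate(sizes) for _ in range(size)]
    return [[1.0 if i == j else (within if labels[i] == labels[j] else between)
             for j in range(len(labels))] for i in range(len(labels))]


print(split_cluster(block_matrix(0.7, 0.1)))  # two groups of three variables
print(split_cluster(block_matrix(0.5, 0.5)))  # None: essentially one dimension
```

Applied repeatedly to whichever cluster has the largest second eigenvalue, a rule like this keeps splitting until every cluster is approximately unidimensional.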
The six scales are summarized as follows. The Functioning scale consists of the social, emotional, and functional scores from the FACT. (The physical functioning score from the FACT was not included among the outcome measures because it was being used as a control variable.) The Symptom scale consists of the 24 frequency-times-bothersomeness MSAS items plus the 8 bothersomeness items. The Pain scale consists of the three pain items from the BPI (worst pain in the last week, pain on average, present pain) and the three items on pain disruption during sleep. The Sleep scale consists of five MOS sleep items (two adequacy items, plus disturbance-initiation, maintenance, and respiratory problems), omitting the MOS sleep somnolence item, "Have trouble staying awake during the day?" The Satisfaction scale consists of the pain relief, ease of use, delivery satisfaction, willingness to continue medication, would recommend, met expectations, and similar relief items. Finally, the Side Effects scale consists of the side effect frequency and impact items.

SIDE EFFECTS
A "side effects" composite was created from the simple average of Q10b, "How often do you have side effects with <medication>?", and Q10c, "How bothered are you by the side effects with <medication>?" These variables had a correlation of .85 and a Cronbach's alpha of .92. They did not correlate highly enough with any other variables to warrant extending the composite beyond these two items.

FACT
Exploratory variable clustering was performed on the items comprising the FACT scales. This clustering, using the VARCLUS procedure in SAS, tended to confirm the scales provided by the developers. The deviations are potentially instructive. Three items were assigned to different scales. Question 2a, "I feel distant from my friends," was more closely associated with the Physical Well-being scale than with the Social Well-being scale. Question 3b, "I am proud of how I'm coping with my illness," was more closely tied to Social than to Emotional Well-being. Question 4e, "I am sleeping well," fit better with the Emotional Well-being scale than with the Functional Well-being scale. Finally, one item, Question 2h, "I am satisfied with my sex life," was found to have relatively low association with the other items in the Social Well-being scale.

A comparison of the variation explained by a simple average of the items in each scale with the variation explained by the first principal component revealed that the simple average explained at least 93% as much variation as the first principal component on each of the four scales. That is, at most 7% of the explanatory power of a composite was lost by making the simplifying assumption of equal weights for the individual variables. This was determined by comparing the variance explained by the first principal component with the variance explained by the first centroid component (by specifying the CENTROID and COV options in PROC VARCLUS). The four FACT scales generally explained from 42% to 50% of the variance of the individual items and had internal consistency (as measured by Cronbach's alpha) of .75 to .78. However, the Social score explained only 28% of the variance in the seven items comprising the scale and had a Cronbach's alpha of .58.

Creating a single composite of the four FACT scales was supported by the finding that 55% of the variance of the four scales was explained by the average of those scales. This single FACT composite explained about 23% of the variance in the original 26 items.
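Cronbach's alpha, the internal-consistency measure quoted throughout these sections, can be computed from item-level data as alpha = k/(k-1) * (1 - sum of item variances / variance of the total score). The Python sketch below also includes the standardized form, k*rbar / (1 + (k-1)*rbar), where rbar is the mean inter-item correlation. Assuming the Side Effects alpha was standardized (or that the two items had similar variances), the two-item case reproduces the reported figures: a correlation of .85 gives 2(.85)/1.85, about .92. The function names and the example responses are illustrative.

```python
import statistics


def cronbach_alpha(items):
    """items: list of item columns, each a list of scores for the same respondents."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(statistics.variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / statistics.variance(totals))


def standardized_alpha(mean_r, k):
    """Alpha implied by k items with average inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


# Two-item Side Effects composite: r = .85 between Q10b and Q10c.
print(round(standardized_alpha(0.85, 2), 2))  # 0.92

# Hypothetical responses from five patients to three items.
items = [[1, 2, 3, 4, 5],
         [2, 2, 3, 5, 4],
         [1, 3, 2, 4, 5]]
print(round(cronbach_alpha(items), 3))
```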
SYMPTOMS
Symptoms were assessed by 24 items in Q14
and an additional 8 items in Q15; these items
comprise the MSAS. For Q14, patients
indicated how often they had a symptom and
how much it distressed or bothered them. We
created individual combinations of frequency
and bothersomeness as the product of the two
responses; these scores were then analyzed. For
Q15, only the bothersomeness of the symptom
was elicited and these scores were used directly.
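The scoring just described can be sketched in Python. The item names, the numeric response codings, and the rule that a symptom the patient does not have scores 0 are all assumptions for illustration; the paper states only that each Q14 score is the product of the frequency and bothersomeness responses and that the Q15 bothersomeness responses were used directly.

```python
from typing import Dict, Optional

Response = Optional[int]  # None = symptom not present (assumed coding)


def q14_product_scores(frequency: Dict[str, Response],
                       bother: Dict[str, Response]) -> Dict[str, int]:
    """Frequency times bothersomeness for each Q14 symptom; absent symptoms score 0."""
    scores = {}
    for item, freq in frequency.items():
        b = bother.get(item)
        scores[item] = 0 if freq is None or b is None else freq * b
    return scores


def q15_scores(bother: Dict[str, Response]) -> Dict[str, int]:
    """Q15 records bothersomeness only, so it is used directly (absent = 0)."""
    return {item: 0 if b is None else b for item, b in bother.items()}


# Hypothetical responses for two Q14 symptoms and one Q15 symptom.
q14 = q14_product_scores({"q14_item1": 3, "q14_item2": None},
                         {"q14_item1": 2, "q14_item2": None})
q15 = q15_scores({"q15_item1": 1})
print(q14, q15)
```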
The MSAS scales include some overlap, and
preliminary analyses of those scales were not
especially revealing. Therefore, exploratory
clustering was undertaken using VARCLUS.
We conjectured that some of the clusters of
symptoms would be related to the cancer itself,
some to chemotherapy and other cancer
treatments, and some to the pain medication.
We found that the symptoms were not easy to
classify and the empirical clusters were not
entirely of one type or another. In the assignment process, we found that, due in part to the wide range of cancer types, most of the 32 items could be attributed to the disease itself, but only some were likely to be drug- or treatment-related. Nonetheless, the symptom clusters do have different emphases.
Cluster 1 includes symptoms that are primarily
psychological. They are associated with the
disease, and, to a lesser extent, either treatment
or pain medication. Only 2 of the 12 symptoms
in this cluster are specifically physical
symptoms (mouth sores and constipation), and
both have low cluster correlation scores. Of the
remaining 10 symptoms, only one (feeling
drowsy) is not related to cancer. Cluster 2
includes symptoms that appear to be primarily
gastrointestinal and chiefly treatment-related.
Cluster 3 is composed of symptoms that are
physical and mostly disease-related. Also,
while we thought that many of the symptoms
could be linked to treatment or drugs, these are
not the primary side effects one might associate
with either. Cluster 4 contains 4 items, mostly
related to appearance, and all attributable to
treatment.
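Whether a symptom sits loosely in its cluster, like the two physical symptoms with low correlations noted above, can be judged from the kind of per-variable summary VARCLUS prints: the squared correlation of each variable with its own cluster score, with the nearest other cluster score, and the ratio (1 - R-squared own)/(1 - R-squared next). The Python sketch below computes these quantities using centroid cluster scores (the average of standardized variables), an assumption closer to the CENTROID option than to VARCLUS's default principal-component scores. The variable names and data are hypothetical.

```python
import math
import statistics


def pearson(x, y):
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def standardize(x):
    m, s = statistics.mean(x), statistics.stdev(x)
    return [(a - m) / s for a in x]


def rsquare_report(data, clusters):
    """data: name -> values; clusters: list (at least two) of lists of names.
    Returns name -> (R2 own cluster, R2 nearest other cluster, 1-R2 ratio)."""
    z = {name: standardize(values) for name, values in data.items()}
    n = len(next(iter(data.values())))
    scores = [[statistics.mean(z[name][i] for name in members) for i in range(n)]
              for members in clusters]
    report = {}
    for ci, members in enumerate(clusters):
        for name in members:
            r2 = [pearson(data[name], s) ** 2 for s in scores]
            own = r2[ci]
            nearest = max(r for j, r in enumerate(r2) if j != ci)
            report[name] = (own, nearest, (1 - own) / (1 - nearest))
    return report


# Hypothetical data: x1 and x2 form one cluster, y1 and y2 another.
data = {
    "x1": [1, 1, 1, 1, -1, -1, -1, -1],
    "x2": [1, 1, 1, -1, -1, -1, -1, 1],
    "y1": [1, -1, 1, -1, 1, -1, 1, -1],
    "y2": [1, -1, 1, -1, 1, -1, -1, 1],
}
rep = rsquare_report(data, [["x1", "x2"], ["y1", "y2"]])
for name, (own, near, ratio) in rep.items():
    print(f"{name}: own {own:.2f}, next {near:.2f}, 1-R2 ratio {ratio:.2f}")
```

A small ratio means the variable is much better explained by its own cluster than by any other; ratios near or above 1 mark variables whose assignment is doubtful.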
TABLE 3
Surviving entries: Initiation, Maintenance, Respiratory problems; Pain

PAIN
Initial analysis showed that the three pain measures from the BPI, Q5 "... your pain at its worst in the past week," Q6 "... your pain on average," and Q7 "pain you have right now," formed a consistent scale (Cronbach's alpha .87). As noted above, exploratory variable clustering revealed that the sleep pain disruption variables were closely associated with the BPI pain measures. Those items, all developed specifically for this study, were Q13g "Wake up during the night to take your pain medication?", Q13h "Feel your sleep was interrupted because of your pain?", and Q13i "Feel frustrated at having your sleep interrupted due to your pain?"

SATISFACTION
Several of the original questionnaire items pertaining to medication satisfaction were closely associated in the initial exploratory variable clustering. Internal consistency was strong (.78) among these items: pain relief (Q9), ease of use (Q10a), satisfaction with the treatment delivery system (Q10d), whether the patient would recommend the medication (Q10e), whether the medication met the patient's expectations (Q10f), whether the medication provided relief similar to previous pain medications (Q10g), and whether the patient would be willing to continue using the current medication (Q11). Initial clustering was performed on these items as originally measured, each on a different scale, and the results were poor. These items were all rescaled to 0-100, and the composite Satisfaction score was created as a simple average of the seven items.

SLEEP
Sleep was assessed by 6 sleep measures from the MOS, Q13a through Q13f, and by 3 measures of sleep pain disruption developed specifically for this study. Preliminary results revealed that the sleep pain disruption items functioned more as pain items than as sleep items; accordingly, those items are included with pain. Of the 6 MOS sleep items, Q13d, "Have trouble staying awake during the day?", was found to have a low correlation (.24) with the other items. Further, the internal consistency was higher without the item (.80) than with it (.77). The other 5 items relate to the quality and quantity of sleep at night and therefore represent a reasonable scale of "nighttime sleep." Accordingly, we created a sleep composite consisting of the remaining five questions, Q13a, b, c, e, and f.
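The item-deletion check used for Q13d can be sketched in Python: compute alpha for the full item set, recompute it with each item left out, and flag any item whose removal raises alpha. To keep the sketch short it uses the standardized alpha computed from a correlation matrix, which is an assumption (the paper does not say which form of alpha it reports), and the correlation matrix is hypothetical, with the last item playing the role of a weakly related item.

```python
from itertools import combinations


def standardized_alpha(r, idx):
    """Standardized Cronbach's alpha for the items in idx, from correlation matrix r."""
    idx = list(idx)
    k = len(idx)
    pairs = list(combinations(idx, 2))
    mean_r = sum(r[i][j] for i, j in pairs) / len(pairs)
    return k * mean_r / (1 + (k - 1) * mean_r)


def alpha_if_deleted(r):
    """Alpha recomputed with each item left out in turn."""
    items = list(range(len(r)))
    return {i: standardized_alpha(r, [j for j in items if j != i]) for i in items}


# Hypothetical: items 0-2 correlate .6 with each other; item 3 correlates .1 with the rest.
R = [[1.0, 0.6, 0.6, 0.1],
     [0.6, 1.0, 0.6, 0.1],
     [0.6, 0.6, 1.0, 0.1],
     [0.1, 0.1, 0.1, 1.0]]
overall = standardized_alpha(R, range(len(R)))
deleted = alpha_if_deleted(R)
for item, a in deleted.items():
    flag = "  <- alpha rises without this item" if a > overall else ""
    print(f"item {item}: alpha without it = {a:.3f}{flag}")
print(f"all items: {overall:.3f}")
```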
RESULTS
We repeated the ANCOVA using these six measures to simplify and unify the results. The results confirmed and strengthened the conclusions based on the initial analysis of 36 items. After controlling for site and the demographic variables, there were no significant differences between Group 1 and Group 2 patients on Symptoms or Sleep. Group 1 patients reported lower scores on the Functioning scale. Men receiving the Group 1 treatment reported higher Satisfaction than other men, but no significant difference emerged for women. The most dramatic difference was found for the frequency and impact of Side Effects, where Group 1 patients reported much less difficulty. Finally, the findings for Pain included an interaction with cancer stage: Group 1 patients reported higher pain for Stages 1 and 2, lower pain for Stage 3, and similar pain for Stage 4. Note that the number of patients with Stage 1 or Stage 2 cancers is small (roughly eight percent).
CONCLUSION
We found that applying variable clustering to
the data allowed us to develop a more coherent
and straightforward presentation to our client.
Instead of having to analyze 32 separate
symptom measures, for instance, we could
gauge the impact of the treatment on symptoms
overall. For clients who are statistically
unsophisticated, details of the exploration need
not be explained; we have found that most
clients are comfortable with the basic
description of the process. Most clients,
regardless of statistical background, may grasp
results more easily when dealing with a few
integrated measures rather than a large and
varied group of variables.