DETERMINING THE ROLES OF THE SAS® SYSTEM AND A DATABASE MANAGEMENT SYSTEM IN CLINICAL RESEARCH

Mark J. Holdbrook, Genentech, Inc.

Abstract

Although the SAS System has a great deal of breadth, it lacks depth in many areas. In Clinical Research the demand for a full featured database management system is sufficiently great that frequently both the SAS System and a database management system are utilized. While most database management system products lack the SAS System's analytical and graphical capabilities they overlap greatly in terms of their data manipulation and report generation capabilities. The question then becomes which product, or possibly both products, should occupy the middle ground that can be served well by either.

A review of the two products shows that a significant overlap usually exists in the areas of data manipulation, data management and report generation. To decide which product to use for these purposes you will need to look at the specific capabilities of your database management system relative to the SAS System, the structure of your organization, the interface between the SAS System and your database management system, how many versions of the data you want to have at one time and possibly which product is more efficient in terms of CPU resources. Finally, after you evaluate these criteria you should consider the future enhancements that have been announced by the SAS Institute. With the screen control language, multiple engine architecture, PROC SQL and indexed SAS datasets the overlap between your database management system and the SAS System will be increased. These enhancements may also alleviate the need for a separate database management system altogether.

Introduction

Frequently within clinical research in the pharmaceutical industry both the SAS System and a database management system are employed for data processing. Because there is a significant overlap in the capabilities of the products a decision needs to be made as to which product, or possibly both, will be used for a given purpose.

This paper presents a discussion of the capabilities of the SAS System and a database management system as they apply to clinical research. Next, the criteria that are important in evaluating this problem are presented and discussed. Finally, there is a discussion of the impact of the future enhancements which have been announced to the SAS System.

This paper does not present the solution to the problem. The solution will vary according to your particular situation and will not necessarily be the same at any two locations, even if they are using the same database management system. What is presented here are the criteria to evaluate in arriving at a decision.

Although I have attempted to make this paper as unbiased and as generic as possible, in terms of the database management system, it will still be biased by my own background and limited by my range of experience. I have been programming with the SAS System for several years and my experience and knowledge of database management systems is much more limited. Currently within Clinical Research at Genentech we use a VAX 8800 which runs VMS. We use the SAS System and System 1032 for the database management system. We have also used INFORMIX as our database management system and the SAS System under MVS on an IBM mainframe. Previously I have used the SAS System on an IBM mainframe running MVS with IDMS as the database management system.

Description of a "Generic" Database Management System

Databases come in many "flavors". There are relational database systems, a term which seems to be in vogue currently and is used rather loosely by some vendors, hierarchical database management systems, relational databases with hierarchical capabilities and so on. Described here are some of the features that are commonly found in many of the database management systems.

Data dictionary
Many database management systems come with a built in data dictionary. The data dictionary helps implement standardization within and between databases. It assists in keeping the names and attributes of similar data fields similar.

Full screen entry with field checking
Database management systems typically come with a full-screen data entry capability that can check the data as it is entered. This allows the data to go in as quickly and cleanly as possible. The earlier in the process that errors are detected the better.
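To make the idea of checking data as it is entered concrete, here is a minimal sketch in Python. It does not use the syntax of any particular database product. The field names, allowed values and ranges are hypothetical and chosen only for illustration.

```python
from datetime import date
from typing import Callable, Dict, List


def _is_iso_date(value: str) -> bool:
    """True if the value parses as a calendar date such as 1989-03-01."""
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


# Hypothetical per-field rules for one case report form (assumptions,
# not taken from any real study or data entry product).
FIELD_CHECKS: Dict[str, Callable[[str], bool]] = {
    "patient_id": lambda v: v.isdigit() and len(v) == 4,
    "sex": lambda v: v in {"M", "F"},
    "age": lambda v: v.isdigit() and 18 <= int(v) <= 90,
    "visit_date": _is_iso_date,
}


def check_record(record: Dict[str, str]) -> List[str]:
    """Return the names of fields that are missing or fail their check."""
    problems = []
    for field, is_valid in FIELD_CHECKS.items():
        value = record.get(field, "").strip()
        if not value or not is_valid(value):
            problems.append(field)
    return problems
```

An entry screen built on such rules would refuse to store a record until `check_record` returns an empty list, so the operator can fix a mistake while the paper form is still in hand.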
HLI
A high level language interface allows for the development of extensions to the database management system, in languages such as C or Pascal, for features not provided in the product by the vendor.

Batch entry
The ability to read data from other non-database files into the database in a batch mode is usually available.

Concurrency
Concurrency allows data entry to be conducted by more than one data entry operator at the same time. They can work in the same database and frequently the same file.

Security
Security is a feature which is frequently built in to database management systems. It allows the access of each user or a group of users to be specified at the file, record or even the field level.

Audit trails
Audit trails are also frequently found in database management systems. They can be helpful for validation, roll-back and error recovery.

Indexed file structures
Hierarchical databases and even some relational databases have indexed file structures which allow for efficient retrieval and processing of information. This can be particularly helpful to speed retrieval with large databases.

Interactive browse and query (Non-Procedural, SQL)
Most databases have an ad hoc query capability that allows non-programmers to browse and query the data without the intervention of a programmer. SQL is becoming a standard among the relational databases.

Data manipulation
All database management systems should have some sort of a facility for the manipulation of data fields that will allow the creation of new variables and modification of existing variables. Usually an array of mathematical, lexical and other functions is supported. Many of the systems have support for missing values.

Data management
Database management systems require a method for manipulation of files and records. Capabilities such as inner and outer joins, interleaving, concatenation and subsetting are provided to varying degrees.

Report Generation (Procedural)
Procedural type report generators can be useful for generating hard copy displays of the data for inclusion into reports which will be sent to regulatory agencies.

Description of the SAS System

The SAS System is a full-featured product that is very broad in scope. The product has both third and fourth generation programming capabilities, but is used most effectively as a fourth generation language. Although most readers are probably well aware of the features of the SAS System they are presented here in order to contrast them with database management systems and illustrate how the product meets some of the requirements of Clinical Research.

Full screen entry
SAS/FSP* can be used for data entry. Within Version 5 only a minimal amount of error checking is provided during data entry.

Batch entry
Batch entry of data from external files has long been a feature of the SAS System.

Concurrency (SAS/SHARE*)
Concurrency is available with the SAS System in some operating environments with the SAS/SHARE product.

Browse/Query data
The SAS System provides limited abilities to browse and perform ad hoc queries of SAS data sets with SAS/FSP.

Custom menu interface (SAS/AF*)
Alternatively, SAS/AF allows programmers to build menu systems that allow more elaborate queries by non-programmers.

Data manipulation
The SAS System provides the usual third generation type programming capabilities. A complete array of mathematical, statistical, financial, lexical, date and time functions is supported. The SAS System provides support for more missing value types than most of us will ever require.

Data management
The SAS System provides a complete set of data management features allowing you to join, interleave, concatenate and subset data with just a few statements. Many of the operations however do require the data to be sorted first.
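The four data management operations named above (join, interleave, concatenate and subset) can be illustrated with a small, language-neutral sketch in Python. This is not SAS syntax, and the table and field names are hypothetical. Note that the interleave step assumes each input is already sorted by the key, which mirrors the sorting requirement mentioned above.

```python
import heapq
from operator import itemgetter

# Hypothetical study tables (illustrative data only).
demog = [{"patient": 1, "sex": "F"}, {"patient": 2, "sex": "M"}]
labs_a = [{"patient": 1, "visit": 1, "alt": 30},
          {"patient": 2, "visit": 1, "alt": 55}]
labs_b = [{"patient": 1, "visit": 2, "alt": 28},
          {"patient": 3, "visit": 1, "alt": 41}]


def concatenate(*tables):
    """Stack tables end to end; no ordering is required."""
    return [row for table in tables for row in table]


def interleave(key, *tables):
    """Merge tables that are each ALREADY sorted by `key`, preserving order."""
    return list(heapq.merge(*tables, key=itemgetter(*key)))


def join(left, right, key):
    """Inner join on one key field; `right` must have unique key values."""
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup[row[key]]} for row in left if row[key] in lookup]


def subset(table, predicate):
    """Keep only the rows for which the predicate is true."""
    return [row for row in table if predicate(row)]


all_labs = interleave(("patient", "visit"), labs_a, labs_b)
joined = join(all_labs, demog, "patient")       # patient 3 has no demog row
elevated = subset(joined, lambda r: r["alt"] > 50)
```

The join here uses a lookup table rather than a sorted match-merge, so it does not need sorted input. That is a design choice of this sketch, not a statement about how either product works.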
Report generation (third and fourth generation)
The SAS System provides complete report generation capabilities with fourth generation capabilities via the PROCs such as PRINT and TABULATE as well as full third generation type programming features when necessary.

HLI
The SAS System is extensible with other high level languages via user written functions, procedures and custom infile interfaces.

Statistical Analysis
The area of statistical analysis has always been one of the strengths of the SAS System. There are procedures available for regression, correlation, analysis of variance, general linear modeling and non-parametric analysis to name just a few.

Graphics (SAS/GRAPH*)
The SAS System has a comprehensive array of procedural graphics capabilities. There are procedures to produce bar charts, pie charts, maps, scattergrams and 3-D plots. In addition the annotate facility allows direct access to the graphics image via a set of low level graphic functions.

Description of Clinical Research

Background
Clinical Research is a relatively unique data processing application in comparison to many other business applications such as banking and public utilities. Usually the data is collected on hardcopy documents called case report forms. The case report forms are completed by clinical investigators, M.D.s and their staff, who usually are not employees of the pharmaceutical company.

Frequently you are dealing with 100 or more studies at a time. Studies are similar but usually each one does or can have unique features. Each study may be a separate database with 20-30 files or tables of a few hundred records each. The databases tend to be small in comparison to many other applications in terms of the number of records but the number of databases and files is quite large relative to the quantity of the data.

Similar to many other industries clinical research is regulated by government agencies, most notably the FDA. Recently there has been an increased awareness of issues such as computer systems validation in the clinical area.

Data processing goals within clinical research
The goals on which to evaluate a data processing system in clinical research should include:

Flexibility
The system must be flexible in order to accommodate the variations in study design. Time constraints require that the flexibility must be accomplished easily. At the same time there is a need for consistency of the common elements between databases. Ultimately the data from all the studies on a given substance will have to be combined.

Data Quality
The results must be accurate. People's health and possibly lives can be at stake. Systems which allow errors to be caught as early as possible are preferable. The system also needs to maintain the integrity of the data while providing access to a wide range of individuals.

Speed
Clinical research is one of the final stages of a multi-year process to develop new pharmaceuticals. Each month that a compound is delayed from coming to the market can deny a company millions of dollars in revenues. Obviously you need to process the information as quickly as possible.

Safety Monitoring
One of the moral and legal obligations in clinical research is to monitor the safety data from the trial. This requires access to the data by the clinicians while the trial is still ongoing. Frequently you may want to limit the fields to which they have access to prevent them from seeing the treatment code for the patient. They need to have the ability to formulate ad hoc queries of the safety data, optimally without the assistance of a programmer. Graphics can also be a helpful tool for assessment of safety data.

Report
The results of the trial will ultimately be reported to regulatory agencies. The preparation of the report will require statistical analysis of the data and the data will be displayed in tabular and graphic form.

Minimize manpower
Besides the obvious advantage of reduced costs, systems which help to reduce the manpower required avoid the overhead and inefficiency that seems to result when the number of people working on a project increases.
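Returning to the Safety Monitoring goal above, the combination of field-level restriction and ad hoc query can be sketched as follows. This is a minimal illustration in Python, not the security mechanism of any particular product. The field name `treatment_code` and the sample records are hypothetical.

```python
# Fields a clinician must not see while the trial is ongoing (assumed name).
BLINDED_FIELDS = {"treatment_code"}


def blinded_view(records, hidden=BLINDED_FIELDS):
    """Return copies of the records with the hidden fields removed."""
    return [{k: v for k, v in rec.items() if k not in hidden}
            for rec in records]


def query(records, **criteria):
    """Ad hoc equality query: keep records matching every field=value pair."""
    return [rec for rec in records
            if all(rec.get(field) == value
                   for field, value in criteria.items())]


# Hypothetical adverse event data as stored in the database.
adverse_events = [
    {"patient": 1, "event": "headache", "severity": "mild", "treatment_code": "A"},
    {"patient": 2, "event": "rash", "severity": "severe", "treatment_code": "B"},
]

clinician_data = blinded_view(adverse_events)
severe = query(clinician_data, severity="severe")
```

The clinician queries only the blinded view, so the treatment code cannot appear in any result, while the stored records remain unchanged.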
The SAS System and Database Management System in Clinical Research

Database Management Systems
Database management systems are employed frequently in Clinical Research because they tend to be the best for entering, checking, browsing and storing the data. Their ability to check the data at entry time helps improve the quality of the data. Concurrency helps expedite the processing of the data in a crunch because more than one operator can enter the data simultaneously. The security combined with the browse and query capability of database management systems allows clinicians to access the data on an ongoing basis as soon as it is entered into the system and still maintain the integrity of the data.

The SAS System
The SAS System is commonly used for performing the statistical analyses and generating graphical presentations of the results for the study report.

Overlap
There is still a large and important area in the middle which involves the data manipulation, data management and report generation that frequently can be accomplished well by either the database management system or the SAS System.

Issues to Assess

To decide which of the products should be used for these activities there are several issues which need to be considered.

Capabilities of each product
Each product has certain features that allow it to perform some of the required tasks well. Although these systems frequently are extensible and can be programmed to provide almost any feature, it is preferable to select the product which contains more of the features you require within the native product and supported by the software vendor. Using the product with the most complete set of required features will reduce the development time of your system. The reliability of the system will be improved because the dependence on custom in-house code is decreased. You will also experience the additional benefit that the system will be easier to enhance and modify because it is not as complex. Finally, the validation of a simpler system should be less difficult.

Organizational structure
Your organizational structure can be another very important consideration. Usually the development of reporting software and the production of the displays involves at least two groups, data management and programming, or sometimes three groups, data management, applications programming and a MIS/systems programming group. Preferably the software used should be familiar to the programmers and data managers who are within the clinical organization. This helps maintain control and ensures accountability.

Whatever your structure you need to consider which group will be responsible for the report generation software and which group will be responsible for generating the displays. What is the software expertise of the groups involved? Which software does the group who is responsible for developing the report generation software use most? The group which is responsible for generating the displays?

For example, if the corporate MIS group is responsible for the development of the database system and there is an applications programming group within clinical that works primarily with the SAS System, it may be better if the report generation software was written in the SAS System.

In another situation it may be preferable to use the database management system for report generation, for instance if there is a single programming group, which develops software with both the SAS System and the database management system, that is responsible for software development and the data management group, which primarily uses the database management system software, is responsible for the report generation.

Interface
The interface between the database management system and the SAS System is another issue which needs to be considered.

The SAS System divides the universe of data files into two major categories, theirs and everybody else's. To obtain the functionality that the SAS System has to offer the data needs to be in the form of a SAS dataset. I view database interfaces to the SAS System as coming in three varieties, flat file interface, infile interface and direct access. The direct access interface is the most effective. While both the flat file interface and the infile interface can be done reasonably well the infile interface is typically somewhat more elegant. For the interface to be most effective you will need to be able to invoke it from within your SAS job. If the interface is effective and efficient then the SAS System is viable for report generation. Otherwise the database management system is preferable.
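Of the three interface varieties, the flat file interface is the simplest to picture: the database writes an extract file and the analysis side reads it back into typed records. The sketch below shows the reading half in Python rather than SAS. The comma-delimited layout, the column names and the use of "." as a missing value marker are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical layout of an extract written by the database system:
# column name mapped to the type each value should be converted to.
COLUMN_TYPES = {"patient": int, "visit": int, "alt": float}


def read_extract(stream, column_types=COLUMN_TYPES, missing="."):
    """Read a comma-delimited extract with a header row into typed records.

    Empty fields and the `missing` marker are stored as None.
    """
    records = []
    for row in csv.DictReader(stream):
        record = {}
        for name, convert in column_types.items():
            raw = (row.get(name) or "").strip()
            record[name] = None if raw in ("", missing) else convert(raw)
        records.append(record)
    return records


# An in-memory stand-in for the extract file.
extract = io.StringIO("patient,visit,alt\n1,1,30.5\n2,1,.\n")
records = read_extract(extract)
```

The weakness of this approach is visible even in the sketch: the column layout has to be described twice, once where the extract is written and once where it is read, and the two descriptions must be kept in step by hand.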
Multiple versions of the data
Usually it is preferable to keep the data extracted from the database management system as permanent SAS datasets to avoid the overhead of the interface and the I/O required if the SAS System is to be used for report generation. This will result in two copies of the data. There are disadvantages and advantages of having two copies. Frequently there are errors in the data that are found at the time of report generation and analysis. With two copies you need to have careful error correction procedures to ensure that the correction is made to both copies or you need to correct the database and re-extract the data. You also double your disk space requirement with two copies of the data. On the other hand you may need two copies anyway, since you will need SAS datasets for the statistical analysis. Also, it is frequently preferable to have report generation performed on a "frozen" copy of the data. If the database management system is used for report generation this can result in three copies of the data. By using the SAS System for report generation of data which has been gathered and stored by a database management system you have a "frozen" copy of the data by definition.

Conserve CPU resources
Consideration of the solution which is most efficient in terms of CPU resources should be the lowest priority. I would only consider it a factor when all other factors are equal or computing resources are cramped with no relief in sight.

Summary of issues to assess
There are a number of factors to consider and they frequently lead to conflicting conclusions. The highest priority should be given to those items which relate directly to the personnel who use and support the system. Keeping the system simple by choosing the product which inherently has the most features required is probably the next most important consideration.

Future

Database management systems and the SAS System are constantly evolving and adding new features. The net result is that the future may possibly be even cloudier than the current picture. I would expect that if there aren't already databases which have begun to add graphics and analytical features there soon will be. Interfaces and integration seem to be the focus at all levels, both hardware and software. In the case of the SAS System some of the new features that have been announced or are already available in version 6 will extend the product even further into the domain of database management systems.

Screen Control Language (SCL)
The Screen Control Language is already available in version 6 on the PC and UNIX hardware platforms. It will allow much more extensive data checking and cross validation of data at entry time.

Multiple Engine Architecture
With the multiple engine architecture non-SAS data sets will be elevated to the same status as SAS datasets. SAS procedures will be able to access the data that resides in a database which has an engine directly. It will remove the concern of the interface between the SAS System and the database management system from consideration and alleviate the need for two copies of the data. At the same time it will further increase the overlap by allowing you to use the SAS System to browse, query and possibly even edit the data.

PROC SQL
The SAS Institute has announced that they will be providing SQL in the form of a SAS procedure. This will greatly simplify querying by non-programmers. The requirement to develop a special front-end with SAS/AF will be alleviated. With PROC SQL and a SAS engine for your database, it will be possible to use SQL from within the SAS System on a database which does not provide SQL.

Indexed SAS files
Indexed SAS files should alleviate the need to sort the data prior to data manipulation and improve the speed of access and retrieval.

Conclusion

The decision for which product to use when the SAS System and a database management system overlap is a difficult one. With time the decisions that need to be made will become even more difficult. As the SAS Institute continues to add database management system type features it will become possible to use the SAS System as the database management system with very little sacrifice. Conversely, as database management systems continue to add features it may become possible to use them exclusively to meet your needs. There will also be even more overlap when both a database management system and the SAS System are employed.

* SAS, SAS/AF, SAS/FSP, SAS/GRAPH and SAS/SHARE are registered trademarks of SAS Institute Inc., Cary, NC, USA.