Determining the Roles of the SAS® System and a Database Management System in Clinical Research
Mark J. Ho~dbrook, Genentech, Inc.

Abstract

Although the SAS System has a great deal of breadth, it lacks depth in many areas. In Clinical Research the demand for a full featured database management system is sufficiently great that frequently both the SAS System and a database management system are utilized. While most database management system products lack the SAS System's analytical and graphical capabilities, they overlap greatly in terms of their data manipulation and report generation capabilities. The question then becomes which product, or possibly both products, should occupy the middle ground that can be served well by either.

A review of the two products shows that a significant overlap usually exists in the areas of data manipulation, data management and report generation. To decide which product to use for these purposes you will need to look at the specific capabilities of your database management system relative to the SAS System, the structure of your organization, the interface between the SAS System and your database management system, how many versions of the data you want to have at one time and possibly which product is more efficient in terms of CPU resources.

Finally, after you evaluate these criteria you should consider the future enhancements that have been announced by the SAS Institute. With the screen control language, multiple engine architecture, PROC SQL and indexed SAS datasets the overlap between your database management system and the SAS System will be increased. These enhancements may also alleviate the need for a separate database management system altogether.

Introduction

Frequently within clinical research in the pharmaceutical industry both the SAS System and a database management system are employed for data processing. Because there is a significant overlap in the capabilities of the products, a decision needs to be made as to which product, or possibly both, will be used for a given purpose.

This paper presents a discussion of the capabilities of the SAS System and a database management system as they apply to clinical research. Next, the criteria that are important in evaluating this problem are presented and discussed. Finally, there is a discussion of the impact of the future enhancements which have been announced to the SAS System.

This paper does not present the solution to the problem. The solution will vary according to your particular situation and will not necessarily be the same at any two locations, even if they are using the same database management system. What is presented here are the criteria to evaluate in arriving at a decision.

Although I have attempted to make this paper as unbiased and as generic as possible in terms of the database management system, it will still be biased by my own background and limited by my range of experience. I have been programming with the SAS System for several years, and my experience and knowledge of database management systems is much more limited. Currently within Clinical Research at Genentech we use a VAX 8800 which runs VMS. We use the SAS System, and System 1032 for the database management system. We have also used INFORMIX as our database management system and the SAS System under MVS on an IBM mainframe. Previously I have used the SAS System on an IBM mainframe running MVS with IDMS as the database management system.

Description of a "Generic" Database Management System

Databases come in many "flavors". There are relational database systems, a term which seems to be in vogue currently and is used rather loosely by some vendors, hierarchical database management systems, relational databases with hierarchical capabilities and so on. Described here are some of the features that are commonly found in many of the database management systems.

Data dictionary
Many database management systems come with a built in data dictionary. The data dictionary helps implement standardization within and between databases. It assists in keeping the names and attributes of similar data fields similar.

Full screen entry with field checking
Database management systems typically come with a full-screen data entry capability that can check the data as it is entered. This allows the data to go in as quickly and cleanly as possible. The earlier in the process that errors are detected the better.

Batch entry
The ability to read data from other nondatabase files into the database in a batch mode is usually available.

Concurrency
Concurrency allows data entry to be conducted by more than one data entry operator at the same time. They can work in the same database and frequently the same file.

Security
Security is a feature which is frequently built in to the database management systems. It allows the access of each user or a group of users to be specified at the file, record or even the field level.

Audit trails
Audit trails are also frequently found in database management systems. They can be helpful for validation, roll-back and error recovery.

Indexed file structures
Hierarchical databases and even some relational databases have indexed file structures which allow for efficient retrieval and processing of information. This can be particularly helpful to speed retrieval with large databases.

Interactive browse and query (Nonprocedural, SQL)
Most databases have an ad hoc query capability that allows non-programmers to browse and query the data without the intervention of a programmer. SQL is becoming a standard among the relational databases.

Data manipulation
All database management systems should have some sort of a facility for the manipulation of data fields that will allow the creation of new variables and modification of existing variables. Usually an array of mathematical, lexical and other functions are supported. Many of the systems have support for missing values.

Data management
Database management systems require a method for manipulation of files and records. Capabilities such as inside and outside joins, interleaving, concatenation and subsetting are provided to varying degrees.

Report generation (Procedural)
Procedural type report generators can be useful for generating hard copy displays of the data for inclusion into reports which will be sent to regulatory agencies.

HLI
A high level language interface allows for the development of extensions to the database management systems, in languages such as C or Pascal, for features not provided in the product by the vendor.

Description of the SAS System

The SAS System is a full-featured product that is very broad in scope. The product has both third and fourth generation programming capabilities, but is used most effectively as a fourth generation language. Although most readers are probably well aware of the features of the SAS System, they are presented here in order to contrast them with database management systems and illustrate how the product meets some of the requirements of Clinical Research.

Full screen entry
SAS/FSP* can be used for data entry. Within Version 5 only a minimal amount of error checking is provided during data entry.

Batch entry
Batch entry of data from external files has long been a feature of the SAS System.

Concurrency (SAS/SHARE*)
Concurrency is available with the SAS System in some operating environments with the SAS/SHARE product.

Browse/Query data
The SAS System provides limited abilities to browse and perform ad hoc queries of SAS data sets with SAS/FSP.

Custom menu interface (SAS/AF*)
Alternatively, SAS/AF allows programmers to build menu systems that allow more elaborate queries by non-programmers.

Data manipulation
The SAS System provides the usual third generation type programming capabilities. A complete array of mathematical, statistical, financial, lexical, date and time functions are supported. The SAS System provides support for more missing value types than most of us will ever require.

Data management
The SAS System provides a complete set of data management features allowing you to join, interleave, concatenate and subset data with just a few statements. Many of the operations, however, do require the data to be sorted first.

Report generation (third and fourth generation)
The SAS System provides complete report generation capabilities, with fourth generation capabilities via PROCs such as PRINT and TABULATE as well as full third generation type programming features when necessary.

HLI
The SAS System is extensible with other high level languages via user written functions, procedures and custom infile interfaces.

Statistical analysis
The area of statistical analysis has always been one of the strengths of the SAS System. There are procedures available for regression, correlation, analysis of variance, general linear modeling and non-parametric analysis, to name just a few.

Graphics (SAS/GRAPH*)
The SAS System has a comprehensive array of procedural graphics capabilities. There are procedures to produce bar charts, pie charts, maps, scattergrams and 3-D plots. In addition the annotate facility allows direct access to the graphics image via a set of low level graphic functions.

Description of Clinical Research

Background
Clinical Research is a relatively unique data processing application in comparison to many other business applications such as banking and public utilities. Usually the data is collected on hardcopy documents called case report forms. The case report forms are completed by clinical investigators, M.D.s and their staff, who usually are not employees of the pharmaceutical company. Frequently you are dealing with 100 or more studies at a time. Studies are similar, but usually each one does or can have unique features. Each study may be a separate database with 20-30 files or tables of a few hundred records each. The databases tend to be small in comparison to many other applications in terms of the number of records, but the number of databases and files is quite large relative to the quantity of the data.

Similar to many other industries, clinical research is regulated by government agencies, most notably the FDA. Recently there has been an increased awareness of issues such as computer systems validation in the clinical area.

Data processing goals within clinical research
The goals on which to evaluate a data processing system in clinical research should include:

Flexibility
The system must be flexible in order to accommodate the variations in study design. Time constraints require that the flexibility must be accomplished easily. At the same time there is a need for consistency of the common elements between databases. Ultimately the data from all the studies on a given substance will have to be combined.

Data quality
The results must be accurate. People's health and possibly lives can be at stake. Systems which allow errors to be caught as early as possible are preferable. The system also needs to maintain the integrity of the data while providing access to a wide range of individuals.

Speed
Clinical research is one of the final stages of a multi-year process to develop new pharmaceuticals. Each month that a compound is delayed from coming to the market can deny a company millions of dollars in revenues. Obviously you need to process the information as quickly as possible.

Safety monitoring
One of the moral and legal obligations in clinical research is to monitor the safety data from the trial. This requires access to the data by the clinicians while the trial is still ongoing. Frequently you may want to limit the fields to which they have access to prevent them from seeing the treatment code for the patient. They need to have the ability to formulate ad hoc queries of the safety data, optimally without the assistance of a programmer. Graphics can also be a helpful tool for assessment of safety data.

Report
The results of the trial will ultimately be reported to regulatory agencies. The preparation of the report will require statistical analysis of the data, and the data will be displayed in tabular and graphic form.

Minimize manpower
Besides the obvious advantage of reduced costs, systems which help to reduce the manpower required avoid the overhead and inefficiency that seems to result when the number of people working on a project increases.

The SAS System and Database Management Systems in Clinical Research

Database management systems
Database management systems are employed frequently in Clinical Research because they tend to be the best for entering, checking, browsing and storing the data. Their ability to check the data at entry time helps improve the quality of the data. Concurrency helps expedite the processing of the data in a crunch because more than one operator can enter the data simultaneously. The security combined with the browse and query capability of database management systems allows clinicians to access the data on an ongoing basis as soon as it is entered into the system and still maintain the integrity of the data.

The SAS System
The SAS System is commonly used for performing the statistical analyses and generating graphical presentations of the results for the study report.

Overlap
There is still a large and important area in the middle, involving data manipulation, data management and report generation, that frequently can be accomplished well by either the database management system or the SAS System.

Issues to Assess

To decide which of the products should be used for these activities there are several issues which need to be considered.

Capabilities of each product
Each product has certain features that allow it to perform some of the required tasks well. Although these systems frequently are extensible and can be programmed to provide almost any feature, it is preferable to select the product which contains more of the features you require within the native product and supported by the software vendor.

Using the product with the most complete set of required features will reduce your development time of the system. The reliability of the system will be improved because the dependence on custom in-house code is decreased. You will also experience the additional benefit that the system will be easier to enhance and modify because it is not as complex. Finally, the validation of a simpler system should be less difficult.

Organizational structure
Your organizational structure can be another very important consideration. Usually the development of reporting software and the production of the displays involves at least two groups, data management and programming, or sometimes three groups: data management, applications programming and an MIS/systems programming group. Preferably the software used should be familiar to the programmers and data managers who are within the clinical organization. This helps maintain control and ensures accountability.

Whatever your structure, you need to consider which group will be responsible for the report generation software and which group will be responsible for generating the displays. What is the software expertise of the groups involved? Which software does the group that is responsible for developing the report generation software use most? The group which is responsible for generating the displays?

For example, if the corporate MIS group is responsible for the development of the database system and there is an applications programming group within clinical that works primarily with the SAS System, it may be better if the report generation software is written in the SAS System.

In another situation it may be preferable to use the database management system for report generation. For instance, if there is a single programming group, which develops software with both the SAS System and the database management system, that is responsible for software development, and the data management group, which primarily uses the database management system software, is responsible for the report generation.

Interface
The interface between the database management system and the SAS System is another issue which needs to be considered. The SAS System divides the universe of data files into two major categories, theirs and everybody else's. To obtain the functionality that the SAS System has to offer, the data needs to be in the form of a SAS dataset.

I view database interfaces to the SAS System as coming in three varieties: flat file interface, infile interface and direct access. The direct access interface is the most effective. While both the flat file interface and the infile interface can be done reasonably well, the infile interface is typically somewhat more elegant. For the interface to be most effective you will need to be able to invoke it from within your SAS job.

If the interface is effective and efficient then the SAS System is viable for report generation. Otherwise the database management system is preferable.

Multiple versions of the data
Usually it is preferable to keep the data extracted from the database management system as permanent SAS datasets to avoid the overhead of the interface and the I/O required if the SAS System is to be used for report generation. This will result in two copies of the data.

There are disadvantages and advantages of having two copies. Frequently there are errors in the data that are found at the time of report generation and analysis. With two copies you need to have careful error correction procedures to ensure that the correction is made to both copies, or you need to correct the database and re-extract the data. You also double your disk space requirement with two copies of the data.

On the other hand you may need two copies anyway, since you will need SAS datasets for the statistical analysis. Also, it is frequently preferable to have report generation performed on a "frozen" copy of the data. If the database management system is used for report generation this can result in three copies of the data. By using the SAS System for report generation of data which has been gathered and stored by a database management system you have a "frozen" copy of the data by definition.

Conserve CPU resources
Consideration of the solution which is most efficient in terms of CPU resources should be the lowest priority. I would only consider it a factor when all other factors are equal or computing resources are cramped with no relief in sight.

Summary of issues to assess
There are a number of factors to consider and they frequently lead to conflicting conclusions. The highest priority should be given to those items which relate directly to the personnel who use and support the system. Keeping the system simple by choosing the product which inherently has the most features required is probably the next most important consideration.

Future

Database management systems and the SAS System are constantly evolving and adding new features. The net result is that the future may possibly be even cloudier than the current picture. I would expect that if there aren't already databases which have begun to add graphics and analytical features there soon will be. Interfaces and integration seem to be the focus at all levels, both hardware and software. In the case of the SAS System some of the new features that have been announced or are already available in Version 6 will extend the product even further into the domain of database management systems.

Screen Control Language (SCL)
The Screen Control Language is already available in Version 6 on the PC and UNIX hardware platforms. It will allow much more extensive data checking and cross validation of data at entry time.

Multiple Engine Architecture
With the multiple engine architecture non-SAS data sets will be elevated to the same status as SAS datasets. SAS procedures will be able to directly access the data that resides in a database which has an engine. It will remove the concern of the interface between the SAS System and the database management system from consideration and alleviate the need for two copies of the data. At the same time it will further increase the overlap by allowing you to use the SAS System to browse, query and possibly even edit the data.

PROC SQL
The SAS Institute has announced that they will be providing SQL in the form of a SAS procedure. This will greatly simplify querying by non-programmers. The requirement to develop a special front-end with SAS/AF will be alleviated. With PROC SQL and a SAS engine for your database, it will be possible to use SQL from within the SAS System on a database which does not provide SQL.

Indexed SAS files
Indexed SAS files should alleviate the need to sort the data prior to data manipulation and improve the speed of access and retrieval.

Conclusion

The decision of which product to use when the SAS System and a database management system overlap is a difficult one. With time the decisions that need to be made will become even more difficult. As the SAS Institute continues to add database management system type features it will become possible to use the SAS System as the database management system with very little sacrifice. Conversely, as database management systems continue to add features it may become possible to use one exclusively to meet your needs. There will also be even more overlap when both a database management system and the SAS System are employed.

* SAS, SAS/AF, SAS/FSP, SAS/GRAPH and SAS/SHARE are registered trademarks of SAS Institute Inc., Cary, NC, USA.
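
Illustrative sketch (not part of the original paper). The data management operations named in the paper for both products (interleave, concatenate, subset), and the remark that many of them require the data to be sorted first, can be made concrete with a small example. It is written in Python rather than SAS, the records are made up, and the function names `interleave`, `concatenate` and `subset` are hypothetical. It only assumes that an interleave combines several datasets into one sequence ordered by a key.

```python
import heapq
from operator import itemgetter

# Made-up records from two studies: (patient_id, visit, value).
study_a = [(103, 1, 5.1), (101, 1, 4.8), (102, 1, 5.0)]
study_b = [(202, 1, 6.2), (101, 2, 4.9), (150, 1, 5.5)]

by_patient = itemgetter(0)


def interleave(*datasets, key):
    """Combine datasets into one sequence ordered by key.

    Each input is sorted first: the merge step only compares the current
    head of every input, so unsorted inputs would not come out in order.
    """
    sorted_inputs = [sorted(d, key=key) for d in datasets]
    return list(heapq.merge(*sorted_inputs, key=key))


def concatenate(*datasets):
    """Append datasets end to end; no sort is needed for this operation."""
    return [row for d in datasets for row in d]


def subset(dataset, predicate):
    """Keep only the rows for which predicate(row) is true."""
    return [row for row in dataset if predicate(row)]


combined = interleave(study_a, study_b, key=by_patient)
stacked = concatenate(study_a, study_b)
first_visits = subset(combined, lambda row: row[1] == 1)
```

Concatenation and subsetting work on the data in whatever order it arrives. Only the interleave depends on key order, which is the cost the paper expects indexed files to reduce.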