Application Mapping on IBM i
Robert Cancilla

Chapter 1: Application Mapping on IBM i: Present and Future

There is a great deal of uncertainty about the future of IBM i, including mixed messages from IBM following the consolidation of System i and System p into the new Power Systems brand. Although IBM may have weakened the public perception of the brand, the hardware and software still deliver what they always have—rock-solid reliability and dependable applications. So although the world has forgotten about the “AS/400” and green screens, there are still huge code bases written over the past 10 to 40 years (RPG celebrated its 40th birthday in 2009) powering corporations of all sizes. The investment this technology represents can’t simply be replaced with packaged ERP software or quickly rewritten in a new language or framework.

The fact that these systems are still running is a testament to the success of the platform and its development ecosystem in general. This is a point that seems lost on the wider development and business community. There is simply no other system that supports—in their original form—applications written more than 40 years ago, without source-code modification.

The challenge for today’s IBM i sites is how to retain sufficient development resources to maintain and develop the applications as the number of active RPG people diminishes through promotions, retirement, and natural attrition. There has to be a way of enabling new people to understand quickly and accurately the complexities and subtleties of these sometimes vast systems and give them the confidence to make changes and extend these systems, even though they will never have developed anything like them themselves. This chapter describes this growing challenge in some detail and also explains how new technologies and concepts are evolving to provide solutions and bolster IBM i development.

A typical application on IBM i could be anything from a few thousand to many millions of lines of code, with all the complexity, design inconsistencies, languages, syntaxes, and semantics that go with years of ongoing development. Mission-critical applications consist of a great many physical files or tables, and programs. The interdependencies of program-to-file and file-to-program alone can easily reach hundreds of thousands. We’re not talking about the abstracted or esoteric nature of individual pieces of technology here, but entire business systems.

As with any successful management system, the key is information about your systems. The level of detail and availability of this information is another critical factor, which has already been proven in business by the success of Enterprise Resource Planning (ERP) and business systems in general. The requirement is not a new one but is becoming more universal as systems continue to grow and mature. A key question is how to manage the cost and risk of maintaining and modernizing these systems. Let’s examine how application mapping has become a core solution to the problem. Application mapping means analyzing and extracting a database of information about the resources that constitute a business application system.

Making Informed Decisions

Mapping an entire application provides a baseline of information for all sorts of metrics and analysis. Counting objects and source lines is generally the most common practice used for obtaining system-wide metrics.
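On current IBM i releases, even this baseline inventory can be pulled with a couple of SQL queries. A minimal sketch, assuming the application lives in a library named APPLIB (the library name is a placeholder, and the SQL services and column names shown should be verified against your OS release):

```sql
-- Count *PGM, *FILE, and other objects in the application library.
-- QSYS2.OBJECT_STATISTICS is an IBM i SQL service.
SELECT OBJTYPE, COUNT(*) AS OBJECTS
  FROM TABLE(QSYS2.OBJECT_STATISTICS('APPLIB', '*ALL')) AS T
 GROUP BY OBJTYPE
 ORDER BY OBJECTS DESC;

-- Approximate total source lines per source file by summing member row counts.
SELECT TABLE_NAME AS SOURCE_FILE, SUM(NUMBER_ROWS) AS SOURCE_LINES
  FROM QSYS2.SYSPARTITIONSTAT
 WHERE TABLE_SCHEMA = 'APPLIB'
   AND SOURCE_TYPE IS NOT NULL          -- source members only
 GROUP BY TABLE_NAME;
```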
Many companies carry out software project estimations and budgeting using only this type of information. To some degree, the level of experience and technical knowledge of a manager and his staff might help in getting accurate numbers, but more often than not, it’s mostly guesswork.

A slightly more advanced approach used with RPG or Cobol applications is to dig more deeply into the application and count design elements within the programs themselves. These elements include
•files
•displays
•subfiles
•source lines
•subroutines
•called programs
•calling programs

By using a simple formula to allocate significance to the count of each element, you can categorize programs by their respective counts into low, medium, and high complexities. This type of matrix-based assessment, which Figure 1 shows, is still fairly crude but adds enough detail to make estimations and budgeting much more accurate without too much additional effort.

Another common practice is to take small representative samples, such as those selected for a proof-of-concept (POC), do project estimations, and then extrapolate this information in a simplistic linear way across the entire system or for an entire project. This method naturally relies upon the assumption that design, style, and syntax for the entire application are consistent with the samples used for the POC. The reality is that samples are most often selected for POCs based on functionality rather than complexity. Sometimes the opposite is true, whereby the most complex example is selected on the basis of “if it works for that, it’ll work for anything.”

Calculations that use comprehensive and accurate metrics data for an entire application, versus data from a sample, will dramatically improve the reliability of time and cost estimation. Risk is not entirely removed, but plans, estimates, and budgets can be more accurately quantified, audited, and even reused to measure performance of a project or process. Some more advanced techniques to measure application complexity are worth mentioning. If such techniques are used over an application map, a number of very useful statistics and metrics can be calculated, including detailed testing requirements and a “maintainability index” for entire systems or parts thereof.

Building Application Maps

As application knowledge is lost and not replaced, the cost of ownership of these large, complex IBM i applications increases, and maintenance becomes more risky. The CL command Display Program References (DSPPGMREF) provides information about how a program object relates to other objects in the system. Figure 2 shows an example of DSPPGMREF’s output. The information is useful in determining how a program relates to other objects. It is possible to extract this information and store it in a file, as Figure 3 shows, and then carry out searches on this file during analysis work.

A much more efficient way of presenting the same information, however, is to show it graphically. Additional information, such as the directional flow of data, can be added to diagrams easily. Systems design and architecture are best served by diagrams. Color coding within these constructs is also important because it helps people assimilate structure and logically significant information more quickly.
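The extract-and-search step itself is easy to script. A minimal sketch, assuming the DSPPGMREF output is written to an outfile named PGMREFS in a work library MAPLIB (library and outfile names are placeholders; the column names follow the QWHDRPPR record format of DSPPGMREF’s outfile and should be verified on your release):

```sql
-- Run DSPPGMREF for every program in the application library and
-- capture the references in an outfile.
CALL QSYS2.QCMDEXC(
  'DSPPGMREF PGM(APPLIB/*ALL) OUTPUT(*OUTFILE) OUTFILE(MAPLIB/PGMREFS)');

-- List every object each program references. WHFUSG indicates how a
-- referenced file is used (input/output/update); check the QWHDRPPR
-- format description for the exact codes.
SELECT WHPNAM AS PROGRAM,
       WHFNAM AS REF_OBJECT,
       WHOBJT AS OBJ_TYPE,
       WHFUSG AS FILE_USAGE
  FROM MAPLIB.PGMREFS
 ORDER BY WHPNAM, WHFNAM;
```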
A good example of using a diagram for more effective communication is to use it to show where program updates take place, for example, by using the color pink, as Figure 4 shows (SLMEN and CUSTS are the two updated tables in this program). Embedding other important textual information, such as an object’s text, into or along with diagrams is another way of presenting information effectively and efficiently. In Figure 4, you see how graphical and textual information combine to provide rich information about the program references. The diagram also uses arrows to show the flow of data between the program and the other objects. Tom DeMarco, a pioneer of the data flow diagram concept, stated that what is critical is the flow of data through a system.

Application mapping information can be extended, as Figure 5 shows, to simultaneously include details about individual variables associated with each of the referenced objects. In the case of a program-to-program relationship, the method used to extract this level of precise variable detail is to scan the source code of the programs and establish which entry parameters are used. In a program-to-file relationship, the diagramming job is somewhat more tedious because you must look for instances in which database fields and corresponding variables are used throughout the entire program. Also useful is seeing where individual variables are updated as opposed to being used just as input. The diagram now presents a rich set of information in a simple and intuitive way. The amount of work to extract and present this level of detail in a diagram can quickly become prohibitive, so the task is better suited to a tools-based approach than to manual extraction.

Figure 6 shows a program-centric diagram. The same diagram in which the file is the central object being referenced is also useful in understanding and analyzing complex applications. The same diagrammatic concepts can be used: color coding for updates, arrows for data flow, and simultaneous display of detailed variables. By using the same diagram types for different types of objects in this way, the same skills and methods can be reused, doubling their effectiveness. Figure 5 shows how additional information, such as related logical files (displayed as database shapes), can be added and easily recognized by using different shapes to depict different object types. Formal metric analysis of program complexity is generally attributed to Thomas J. McCabe Sr. in 1976 and Maurice Howard Halstead in 1977 (see the sidebar, “Calculating Complexity”).

Functionally Organizing an Application

Single-level information about an RPG or Cobol program is obviously not enough to understand a business system’s design. You need to be able to follow the logical flow downward through the application. You can use the DSPPGMREF output to do this. If you start at program A and see that it calls program B, you can then look at the DSPPGMREF information for program B, and so on. Additionally, you can deduce precisely in this structure where and how data, print, and display files are being used in the call stack, which is very useful for testing and finding bugs that produce erroneous data. For large, complicated systems, this can be a slow and tedious process if done manually using the display output of DSPPGMREF.
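With the references captured in a file, the downward drill can be automated as a single recursive query. A minimal sketch, again assuming the hypothetical MAPLIB/PGMREFS outfile from the earlier sketch and a placeholder entry program named ORDENTRY:

```sql
-- Expand everything referenced below a chosen entry program using a
-- recursive common table expression. Referenced files simply stop the
-- recursion because they never appear as the referencing program.
WITH CALLS (CALLER, REFOBJ, DEPTH) AS (
  SELECT WHPNAM, WHFNAM, 1
    FROM MAPLIB.PGMREFS
   WHERE WHPNAM = 'ORDENTRY'
  UNION ALL
  SELECT R.WHPNAM, R.WHFNAM, C.DEPTH + 1
    FROM MAPLIB.PGMREFS R
    JOIN CALLS C ON R.WHPNAM = C.REFOBJ
   WHERE C.DEPTH < 15                 -- guard against cycles / runaway depth
)
SELECT DEPTH, CALLER, REFOBJ
  FROM CALLS
 ORDER BY DEPTH, CALLER, REFOBJ;
```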
Extracting all programs’ DSPPGMREF information out to a single file makes it possible to recursively query the file and follow the calls down successive levels, starting at a given program. This process can then show the entire call stack or structure chart for all levels, starting at a given program or entry point. A given program’s call stack or call structure can be represented much more effectively diagrammatically than with any textual description alone. Quite often, these call stacks may go down as many as 15 levels from a single starting point. Therefore, being able to display or hide details according to the information required at the time is important, along with having search facilities built into the application map that supports the diagrams. As with other diagrams, color coding plays an important role in classifying objects in the stack by their general use, such as update, display, input only, and so on. Figure 7 shows the structure of a program as seen graphically. Additional information, such as what data files, displays, and data areas are used by each object, can be added to enrich the information provided.

This diagram alone, however, doesn’t tell you where you are in relation to the overall hierarchical structure of the application. You don’t know whether the program is an entry point into the system or is buried in the lower levels of the application. For better understanding of an entire system, therefore, objects need to be organized into functional groups or areas. This can be achieved by using naming conventions, provided that they exist and are consistent across the application. The entry points into the application need to be established. Sometimes a user menu system is useful for this but is not necessarily complete or concise enough. One way to establish which programs are potential entry points is to determine each program’s call index. If a program isn’t called anywhere but does call other programs, it can essentially be classed as an entry point into the system. If a program is called, and in turn calls other programs itself, it’s not an entry point.

A functional area can be mapped by selecting an entry point (or a group of them) and then using the underlying application map to include all objects (everything, including programs, files, and displays) in the call stack. Figure 8 shows a diagram of a series of entry points and their relative call stacks grouped as a functional area. To more accurately describe an entire system’s architecture, functional application areas might need to be grouped into other functional application areas. These hierarchical application areas can then be diagrammed, showing how they interrelate with each other. This interrelation can be hierarchical but also programmatic, because some objects might be found in more than one application area simultaneously. Figure 9 is a diagram showing how application areas interrelate. For the sake of clarity, the diagram includes only those programmatic interrelations from entry-level objects. The diagrams show how the accounting Main application area has other application areas (e.g., B, A1) embedded in it. The red lines show the programmatic links between objects within the application. In this example, the level of interrelation has been limited to programmatic links between entry-point programs and the programs they call in other application areas.
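The entry-point test described above (calls other programs but is never called itself) is also easy to express over the same reference data. A minimal sketch, still over the hypothetical MAPLIB/PGMREFS outfile:

```sql
-- Potential entry points: programs that reference other objects but are
-- never themselves referenced. Programs reached only through menus or
-- dynamic, parameter-driven calls will also appear here and need review.
SELECT DISTINCT R.WHPNAM AS ENTRY_POINT
  FROM MAPLIB.PGMREFS R
 WHERE NOT EXISTS (SELECT 1
                     FROM MAPLIB.PGMREFS X
                    WHERE X.WHFNAM = R.WHPNAM)
 ORDER BY ENTRY_POINT;
```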
Diagramming application areas in this way is a good way of mapping business functional areas to application architecture in a simple diagram. Logical subdivisions of an entire application are also being employed in other areas of application management. Some of these include
•clear and concise allocation of responsibility for maintenance/support of a set of objects
•integration with source change management tools for check-in and check-out processes during development
•production of user documentation for support, training, and testing staff

Mapping Databases

An IBM i business application is primarily an application written over a relational database. Therefore, no map of an enterprise application would be complete without the database architecture explicitly specified—not just the physical specifications and attributes but the logical or relational constraints, too. With the possible exception of CA 2E systems, virtually all RPG or Cobol applications running on IBM i have no explicit relational data model or schema defined. This means that millions of lines of RPG or Cobol code must be read in order to recover an explicit version of the relational model. What you need to know is which keys constitute the links or relationships between physical files or tables in the database.

The first task is to produce a key map of all the primary keys and fields for all physical files, tables, logical files, access paths, and views in the database. By using a simple algorithm and looking at the DDS or DDL, you can often determine whether foreign-key relationships exist between files. Figure 10 shows a diagram of this simple algorithm using the database definitions themselves.

A more advanced and comprehensive approach for determining foreign key relationships is to analyze the program source code for the system. If you look at the source code of a program and see that more than one file or table is used, there’s a possibility that these files are related by foreign key constraints. By finding instances in the program in which one of the files is accessed for any reason, and determining the keys used to do so, you can then trace these variables back through the code to keys in another file in the program. If at least one of the key fields matches in attribute and size with the other file and is also part of the unique identifier of that file, there is a strong likelihood of a relationship between the two files. By then looking at the data using these key matches, you can test for the truth of the relationship. By cycling through all the files in the system one by one and testing for these matches with each and every other file, you can establish all the relationships.

This task is generally complicated by the fact that the same field in different files will usually have a different mnemonic name. When analyzing the program source, you’ll have to deal with data structures, renames, prefixes, and multiple variables. If you have the program variable mapping information at your fingertips beforehand, the analysis process will be a lot quicker. The vast majority of this type of repetitive but structured analysis can be handled programmatically, enabling completion of the task in a few hours rather than several months. Such automation naturally allows for keeping the relational model current at all times without huge overhead on resources.
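The first, definition-based pass can be sketched as a query over a key map. The sketch below assumes a hypothetical table MAPLIB/KEYMAP (one row per key field, with the file name, field name, key position, data type, and length) populated from DSPFD output or the SQL catalogs; every name in it is a placeholder:

```sql
-- Candidate foreign keys: a field in one file that matches the leading
-- unique-key field of another file in data type and length, with either
-- the same name or the same name apart from a two-character prefix.
SELECT P.FILENAME  AS PARENT_FILE,
       P.FIELDNAME AS PARENT_KEY,
       C.FILENAME  AS CHILD_FILE,
       C.FIELDNAME AS CHILD_FIELD
  FROM MAPLIB.KEYMAP P
  JOIN MAPLIB.KEYMAP C
       ON  C.FILENAME  <> P.FILENAME
       AND C.DATATYPE   = P.DATATYPE
       AND C.FIELDLEN   = P.FIELDLEN
       AND (C.FIELDNAME = P.FIELDNAME
            OR SUBSTR(C.FIELDNAME, 3) = SUBSTR(P.FIELDNAME, 3))
 WHERE P.KEYPOS = 1            -- leading field of the parent's unique key
 ORDER BY PARENT_FILE, CHILD_FILE;
```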
Once explicitly defined, the relational model or architecture of the database can be reused in a number of scenarios, including
•understanding application architecture
•testing data quality for referential integrity
•extracting test data
•scrambling and aging test data
•building BI applications and data warehouses
•mapping data for system migrations
•building object relational maps for modernization

Database access in all modern languages today is primarily driven by embedded SQL. IBM i legacy databases are typified by transaction-based table design with many columns and foreign key joins. This makes the task of writing SQL statements much more difficult and error prone unless the design of the database is clearly understood. It also creates an environment in which it’s relatively easy for inexperienced developers or users to write I/O routines or reports that have an extremely negative performance impact. One way to combat this problem is to provide detailed design information about the database being accessed. Figure 11 shows a typical entity relationship diagram, and this can be accompanied by the underlying foreign key details, as Figure 12 shows.

Another, more generic approach to ensuring integrity of the database, guaranteeing productivity for modern-technology developers, and limiting negative I/O performance impacts is to build a framework of I/O modules as stored procedures. The explicitly defined data model is a key source of information that will greatly simplify building such a framework and can even be used to automate generation of the framework itself. It’s also worth mentioning that products such as IBM’s DB2 Web Query for i can become exponentially more useful and productive if the metadata layer is properly implemented. The derived data model can be used to build this metadata instantly for the entire system.

Hard-Coding Application Knowledge

The output of DSPPGMREF is a great starting point for the type of mapping I’ve described so far. To produce such details and abstractions, the application source code needs to be read and analyzed. From a design perspective, application software is made up of discrete layers or levels of detail. In an IBM i application, for example, libraries contain programs, physical files, logical files, data areas, commands, and many more object types, and programs might contain file specs, variables, subroutines, procedures, display definitions, arrays, and various other language constructs. Data files have fields, text descriptions, keys, and other attributes. Having an inventory of all these elements is useful—but only in a limited way, from a management perspective. What’s needed is context. For example, mapping which files and displays are specified in a program helps you understand the impact of change at an object level. This rudimentary mapping provided by most program comprehension tools is limited in its usefulness because it still provides information at only a single level. Mapping all levels of detail and how they interrelate with all other elements at all levels is the ultimate objective. The only way to achieve this is to read the source code itself line by line and infer all relationships implicit in each statement or specification. Naturally, the mapping process must allow for variants of RPG, Cobol, and CL going back 20 years, if it is to be useful for the vast number of companies that have code written 20 years ago in their mix.
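Line-by-line source scanning can itself be prototyped with plain SQL, because a source physical file is just a table of source lines. A minimal sketch, assuming an RPG source file APPLIB/QRPGLESRC with a member ORDENTRY (all names are placeholders):

```sql
-- Expose one source member to SQL. SRCSEQ and SRCDTA are the standard
-- sequence-number and source-data columns of a source physical file.
CREATE ALIAS MAPLIB.ORDENTRY_SRC FOR APPLIB.QRPGLESRC (ORDENTRY);

-- Pull out the lines that imply program-to-program relationships.
-- Crude filter: catches CALL, CALLB, CALLP and free-form calls, plus
-- comments; a real scanner parses the specifications properly.
SELECT SRCSEQ, SRCDTA
  FROM MAPLIB.ORDENTRY_SRC
 WHERE UPPER(SRCDTA) LIKE '%CALL%'
 ORDER BY SRCSEQ;
```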
Relatively few humans have such knowledge or skill and, as I’ve mentioned, few people could keep up with the workload required for even the most modest of IBM i applications. Computer programs can be “taught” such knowledge and retain it permanently. Such programs can also be rerun as often as necessary to keep abreast of any code changes that occur. Prebuilding the application map and storing it in an open and accessible format, such as a spreadsheet in Google Docs, is also an important aspect of the overall usefulness of such information. Figure 13 shows the output of DSPPGMREF uploaded into a Google Docs spreadsheet and being filtered. Having the map available provides for any number of complex, system-wide abstractions or inquiries at acceptable speeds.

For a complete and accurate application map, you have to follow the trail of inferred references described in the programs themselves. This is obviously a labor-intensive task made all the more difficult by common coding practices, such as
•overriding the database field name in a CL program
•prefixing fields from a file being used in an RPG program
•moving values from database fields into program variables before passing them as parameters to called programs
•changing key field names between different database files
•passing the name of the program to be called as a parameter to a generic calling program rather than making a direct call

If the prebuilt application map includes all these inferred logical references, measurement of impact can be complete and, more important, instant. It also means that higher-level analysis of rules and model-type designs is easier by virtue of the easy availability of variable- and object-level mapping.

Moving Forward with Confidence

Application mapping provides a new way to manage and modernize complex business applications. It’s also a way to facilitate collaboration between modern and legacy developers. Think about what computerized mapping has done for navigational and guidance systems in our day-to-day lives and travels. Similarly, application mapping provides a strong platform for a number of benefits and technologies that will continue to evolve for many years. I’ll discuss these subjects further in the following chapters.

Calculating Complexity

Halstead Complexity Metrics
Halstead complexity metrics were developed by the late Maurice Halstead as a means of determining a quantitative measure of a program module’s complexity directly from the operators and operands in its source code. These metrics are among the earliest formal software metrics. Because they analyze actual source code, they’re strong indicators of code complexity and are most often used as maintenance metrics. See en.wikipedia.org/wiki/Halstead_complexity_measures for more information.

Cyclomatic Complexity
Cyclomatic complexity is a software metric developed by Thomas McCabe. It measures the amount of decision logic in a single software module. Cyclomatic complexity is used for two related purposes. First, it gives the number of recommended tests for software. Second, it is used during all phases of the software life cycle, beginning with design, to keep software reliable, testable, and manageable. Cyclomatic complexity is based entirely on the structure of the software’s control flow graph.
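A rough, source-level approximation of cyclomatic complexity (decision count plus one) can even be computed straight from a source member. A minimal sketch, reusing the hypothetical source alias created earlier and counting fixed-form RPG decision opcodes by pattern matching (a real metrics tool parses the code properly):

```sql
-- Very rough cyclomatic complexity estimate for one member:
-- count decision statements and add 1 for the single entry path.
SELECT COUNT(*) + 1 AS APPROX_CYCLOMATIC
  FROM MAPLIB.ORDENTRY_SRC
 WHERE UPPER(SRCDTA) LIKE '%IF %'
    OR UPPER(SRCDTA) LIKE '%WHEN %'
    OR UPPER(SRCDTA) LIKE '%DOW %'
    OR UPPER(SRCDTA) LIKE '%DOU %';
```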
See en.wikipedia.org/wiki/Cyclomatic_complexity for more information.

Chapter 2: Writing Programs to Update Your Programs

As a follow-on to Chapter 1, let’s look at how you can use application mapping to actively change an entire system programmatically. Using the application map as a primary input, some simple reengineering concepts, and a fair amount of time to perfect the process, you can write programs to update application programs. This approach has saved many companies literally thousands of man-hours and millions of dollars.

Writing programs to update your programs is typically used as a way to make structural changes to the application source, not functional changes. When a system enhancement produces a large number of fairly simple system-wide changes, programmatic automation of these changes begins to make sense. The most obvious example of this is Y2K. Some companies spent as much as five million dollars to change their systems for Y2K compliance. Other companies used programs to carry out the same amount of work on similar-sized systems for five percent of the cost. How did they do that, and why is this relevant nearly 10 years later?

After an application’s life of 20 to 30 years, it’s fairly safe to assume that there might be a business demand to change important and well-used fields in the database. This demand might be driven by industry standardization, system integration, upgrades, internationalization, or commercial growth (e.g., you run out of invoice numbers or even customer numbers). Y2K affected almost every RPG application in existence. It also affected just about the entire application in each case. Since 2000, most systems have grown at a rate of 10 percent per year. It’s a widely acknowledged fact that RPG resources haven’t kept pace with this growth. In reality, they’ve probably shrunk by about the same amount each year. So although database changes are now generally industry or company specific, the problems and their related solutions remain the same—but with more code affected and fewer people to fix it. There are several applications for automated reengineering of a system, which I briefly mention later in this chapter. Solving a field-expansion problem is, however, relevant for many companies, so I use it to flesh out the subject of this chapter in more detail.

An Engineered Approach

A more conventional approach to solving a field-expansion problem is to get a feel for the scope and size of the problem, understand clearly the requirements for the change, and then send one or many developers off to fix the problem one program at a time. Figure 1 illustrates this manual approach. Many problems are associated with this approach. Here are a few:
•labor-intensive
•vague (at best) scope and timelines
•not repeatable
•prone to human error and inconsistencies, and therefore risky

The upside, of course, is that such an approach requires little preparation and little initial investment in time or money. It’s also generally flexible and therefore useful for small projects. There is, however, a risk that the developers will fail to identify all required changes or to make them consistently across all the programs they change. As the size of the system increases, the risk of failure increases exponentially.
The basis of an engineered approach is to break the process down into a set of discrete, repeatable, and automated steps. Each step is then applied across the entire system or project and repeated until an optimum result is achieved. Figure 2 shows diagrammatically how this approach compares to a conventional manual approach. Many benefits are associated with a structured and engineered approach. Some of these include:
•each step is repeatable and so can be perfected
•the outcome is more predictable
•scope and approach can be changed without a loss of expended effort
•the latest code version can be introduced at the last minute
•far fewer resources are required
•the process is much quicker
•potentially less testing is required because changes are consistent

Without an explicit, detailed, and very precise measurement of the impact of a database change across the system, automating the required changes would be impossible. Let’s start by looking at this task in more detail.

Establishing the Precise Scope of a Task

Even in well-designed and well-documented systems, the impact of changing the database of an integrated and complex application on IBM i can be huge. Just the recompilation and data-copying tasks can create logistical nightmares. The most significant and difficult task is of course measuring the impact on source code across the entire system. If the analysis is done right, subsequent work will be highly predictable and measurable. If the analysis is done incorrectly, the results could be catastrophic. Overruns in project timelines are just one possible impact, and I don’t think I need to elucidate the potential outcome of having “missed” something in a production system.

Specifying fields to be changed. The first task in the analysis stage is to specify which fields need changing in the database. This task should be straightforward but may be complicated by virtue of integrated systems, poor documentation, or often a combination of both. The next step, which I describe in a moment, may actually produce results that warrant additional fields being added and included in the process.

Finding where fields are used. The next step is to establish precisely where these fields are used throughout the system. This is where things start to get tricky. Establishing the explicit use of a given field by its name can be achieved with a simple Find String Using PDM (FNDSTRPDM) command. You then need to start at these specific points and establish where these fields are associated with any other variable or data construct by virtue of a compute or definition statement. There’s only one way to do this, and that’s to read the source code of every single instance in which the field being changed is used. RPG applications have many technical constructs that make this type of analysis complex and time consuming. For example:
•the use of variable names that don’t match or resemble database field names
•the use of Prefix or Rename keywords in the programs
•the need to trace input and calling parameters
•the existence of CL programs that have no file definitions
•the use of data structures and arrays
•undefined input and return parameters in procedure prototypes

Legacy cross-reference tools can help with this analysis up to a point. That point ends at each level or instance of a variable, so many individual queries—sometimes thousands—need to be run and amalgamated when using these older technologies.
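A prebuilt application map collapses this effort: if every field-to-variable and variable-to-variable association has already been extracted into data, one recursive query returns the full trace. A minimal sketch over a hypothetical association table MAPLIB/VARLINKS (one row per association, holding the source member, statement number, and the "from" and "to" names):

```sql
-- Trace every variable that is directly or indirectly associated with
-- the database field CUSNO anywhere in the system. A production trace
-- would also follow associations in the reverse direction.
WITH TRACE (NAME, MEMBER, STMT, DEPTH) AS (
  SELECT TO_NAME, MEMBER, STMT, 1
    FROM MAPLIB.VARLINKS
   WHERE FROM_NAME = 'CUSNO'
  UNION ALL
  SELECT V.TO_NAME, V.MEMBER, V.STMT, T.DEPTH + 1
    FROM MAPLIB.VARLINKS V
    JOIN TRACE T ON V.FROM_NAME = T.NAME
   WHERE T.DEPTH < 10                 -- stop runaway chains
)
SELECT DISTINCT MEMBER, STMT, NAME, DEPTH
  FROM TRACE
 ORDER BY MEMBER, STMT;
```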
Figure 3 shows a simple example of conventional approaches being used to trace the CUSNO field. The obvious answer to this problem is to prebuild an application map of the entire system being analyzed, where variable-field-variable associations are instantly available. Using this map, you can write a program that traces a field through all its iterations and variants across the entire system in a single query, as sketched above. Some of the trace work is accomplished in a previous stage, in the form of prebuilding the application map.

Let’s look at an example of this at work. Figure 4 shows the source of a CLP named CUSLET. If I were to carry out a traditional analysis on a system with this program in it, looking for the impact of a change to the field CUSNO, this program wouldn’t show up in the results. Figure 5, however, shows a snippet of the source of an RPG program that calls CLP CUSLET, passing the parameter CUSNO. Figure 6 shows a spreadsheet of the results of our extraction program written over the application map, and we can see that CUSLET has been included in the analysis results. This is because the parameter CUSNO was passed to CUSLET from the RPG program displayed in Figure 5. The output of this analysis is a specific list of all source members, and the lines therein, that are affected by the proposed field changes.

Making the required changes programmatically. Changes that can be made without causing any conflicts can be done programmatically. The percentage of these against the total changes required may vary from project to project, but essentially this task can be fully automated with a carefully written program. The tedious and time-consuming part of writing a program to do this is accounting for all instances or specific types of change. Nevertheless, these programmatic changes can provide a significant productivity gain in any project. There are different standards that can be used to notate and make the changes, such as making comments in margins, commenting out replaced code, or just overwriting existing code. This can be done one way during iterative trial conversions and then changed for a production conversion with little effort. Note: It may be desirable to retain the original code as comments during the project but remove it prior to the final production implementation.

These programmatic changes can be categorized into two types:

Direct Definition Changes: Direct definition changes can be made where database fields, or variables that can be traced back to database fields, are defined. This includes files, displays, reports, and programs (RPG or CL) and refers to D-specs, arrays, and in-line calc specs, amongst others. This type of change is straightforward and is the most obvious candidate for programmatic change. Figure 7 shows the source of a physical file that has been programmatically updated and has had the original code commented out. Columns 1–5 have had the programmer’s name added for audit purposes.

Indirect Definition Changes: In some cases, direct definition changes have a “knock-on” effect. For example, if a field is expanded by two digits, and this field is used before the end of an internal data structure in an RPG program, the other elements in the data structure must be adjusted to accommodate this change. Similarly, in a print file format, a column-size increase may require columns to the right to be shifted to make space.
In some cases, this “knock-on” effect may actually cause conflicts of various types. These conflicts might be resolved by using clever algorithms in the programs that make the changes, but usually conflicts require human intervention. Figure 8 shows an example of how a data structure definition is adjusted: the second element is expanded, and subsequent elements are moved to accommodate this change. This type of change is fairly straightforward to program into the automated process. The time-consuming part is finding and allowing for all the different patterns of instances in a system. As such, the repeated use and fine-tuning of programs that make changes to programs makes them naturally more useful with each successive project.

Managing design conflicts and manual intervention. In virtually every field-expansion project, there will be design problems that arise from the proposed changes. These might vary from a simple overlay or overflow on a report to embedded business logic based on a field substring. Although it may be impossible to automatically make changes to these constructs, it’s possible to programmatically identify where they occur. Again, the role of the prebuilt application map is critical to this process as a primary input to the search algorithms. These conflicts can be clearly identified by subtracting the changes made programmatically from the total required changes. The conflicts can be generally categorized as follows:

Device Problems: Device problems are those in which any direct change or shuffling of affected columns runs out of space.

Program Problems: An example of a program problem is a line where there may be a conversion problem because a resized field (or a field dependent upon it) is a subfield in a structure that can’t be resized. Another example is when a work field is used in a program by two fields at various stages; one field is being resized, and the other isn’t. Again, this requires design logic to resolve.

Database Problems: The whole process of solving a field-expansion problem starts by specifying which fields will be changed. The where-used analysis, when run on resized database fields, might trace to fields not included in the resize exercise. This may or may not be a problem but generally must be assessed manually.

Some of these problems might be resolved by making manual changes before rerunning the analysis and the programmatic changes. In certain cases, this iterative process can rapidly eliminate problems from a conversion project. In other cases, it will be necessary to make these design decisions and changes after the completion of the programmatic changes. The objective of this stage is an optimum result combining programmatic changes with whatever manual intervention is deemed necessary.

Final Conversion and Production Integration

The automated nature of this process allows for the latest version of the source code to be brought in and run through the first three stages. It’s also only at this stage that formal software configuration management (SCM) policies and procedures need to be implemented. In many cases, no conversion or change will take place, but a recompile will be needed. Again, the application map can be used to good effect here. Simply building recompile lists based on the converted source code and all related objects from the where-used information will help ensure that nothing is missed.
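As a sketch of that recompile-list step, assuming the hypothetical MAPLIB/PGMREFS reference file from Chapter 1 and a small work table MAPLIB/CHGFILES listing the files that were changed (both names are placeholders):

```sql
-- Every program that references any changed file must be recompiled.
SELECT DISTINCT R.WHPNAM AS PROGRAM_TO_RECOMPILE
  FROM MAPLIB.PGMREFS R
  JOIN MAPLIB.CHGFILES C ON C.FILENAME = R.WHFNAM
 ORDER BY PROGRAM_TO_RECOMPILE;
```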
Having such lists also means that simple CL programs can be written to bulk recompile the affected objects and incorporate any required compilation parameters in the compile commands.

Application Modernization

Structural changes to an application can be a key part of a company’s modernization strategy. Some of these structural changes are motivated by more strategic objectives, such as agile development, reusable architecture, and functional redesign. Other modernization projects are driven by more commercial demands, such as internationalization.

Unicode conversions. An increasingly popular modernization requirement on IBM i is Unicode conversion. The principle of a Unicode conversion is largely the same as that of a field-expansion project: changing the attributes of database and display fields and updating all affected logic in the programs. There are some differences in the process and requirements, but the same general approach can be followed. Indeed, the same programs used for field expansion can be enhanced to accommodate Unicode conversions without too much additional work.

Let’s look at some simple examples of what could be changed programmatically in a Unicode conversion. The first aspect is updating the fields in the files and displays. This sort of change is consistent with the field-expansion algorithms mentioned earlier in this chapter. Figure 9 shows how the field definition for the COMPANY field has been updated to type G, and the desired Unicode Coded Character Set Identifier (CCSID) has been specified in the function column for this field. Figure 10 shows how the H-spec of an RPGLE program has been automatically updated with the requisite CCSID keyword. In this instance, the CCSID H-spec keyword is used to set the default UCS-2 CCSID for RPGLE modules and programs. These defaults are used for literals, compile-time data, program-described input and output fields, and data definitions that don’t have the CCSID keyword coded. Figure 11 shows how, by using a fairly straightforward algorithm, your automated program can intervene in your C-specs and automatically convert statements to include the %UCS2 built-in function (BIF) where required. In this example, as with the field-expansion samples, old lines have been commented out to show how the programmatically created new line has been changed.

There are two important points to make regarding Unicode conversions:
•Unicode data isn’t supported in non-ILE versions of RPG. If you want to implement Unicode support in non-ILE RPG programs, you must convert them to RPG IV (ILE RPG) source code and recompile beforehand.
•IBM is actively enhancing Unicode support on IBM i through the release of PTFs, both for the DB2 for i database and for the ILE RPG compiler.

Externalizing database I/O. Another increasing trend in the IBM i application space is the need to separate I/O logic out of legacy programs. One primary motivation for this trend is the need to make significant changes to database architecture without interrupting proven process and business logic. Another business driver comes from companies replacing legacy custom software with off-the-shelf applications but wanting to keep certain core functions running as is, at least for a period of time.
In this scenario, mapping to the replacement database architecture can be carried out without interruption to critical legacy functions, provided of course that the database I/O has been externalized from the legacy programs first. The algorithms used by programs that automatically make such a change are different from those of a field-expansion process, but once again the core asset is the application map used for the initial analysis. These reengineering programs can then be designed to identify and convert all source code instructions needed to transfer file I/O into external modules, giving identical functionality. Thus, the code in Figure 12 shows how an I/O statement is replaced with a procedure call. Another requirement of the reengineering programs is to automatically build fully functional I/O modules, which can then be adapted to a radically changed database with no impact on the reengineered RPG code—the module returns a buffer identical to the original file. So if you wanted to switch to a completely new customer file, you could simply change the I/O module code (as shown in Figure 13), and the hundreds of RPG programs using the CUSTS file would require no source changes whatsoever!

Refactoring monolithic code into services. Another important way of using programs to update programs is in the area of building services from legacy application code. There are many articles and guidelines from leading thinkers, such as Jon Paris, Susan Gantner, and others, on the subject of using subprocedures instead of subroutines. This is fine for new applications, but most interactive legacy programs are written in a monolithic style, which can severely limit long-term modernization opportunities, not to mention add significant stress and complexity to ongoing maintenance and development tasks in general. By advancing the algorithms of replacement and code regeneration described in all three areas here, it’s possible to refactor monolithic programs by externalizing the subroutines into procedures automatically. Breaking up the program into two components like this makes the rewrite of the user interface layer easier and simultaneously makes the externalized subprocedures available as callable services. This is a great way to start a staged application reengineering while realizing immediate benefit.

Figure 14 shows two subroutines, VALID1 and VALID2, being invoked in a monolithic legacy program called WWCUSTS. A “rework” program was written, using logic similar to the field-expansion and I/O externalization programs mentioned earlier, to create a new business logic module containing procedures created from all the legacy subroutines in the original programs. Figure 15 shows the definition of the wwcustsvalid1 procedure in this new ILE module, WWCUSTSB. The rework program updated the original program to use the service program WWCUSTSB, invoking the appropriate procedure instead of the subroutine and passing the correct parameters. It also created the necessary prototypes in the updated WWCUSTS program, as Figure 16 shows.

A Way Forward

Using programs to update programs is not a new or even an unusual technique. Combined with a very detailed application map of an entire system, this approach to system engineering can help solve the problem of modernizing and enhancing large and complex legacy applications using limited resources in shorter timeframes.
For many companies, this approach has saved millions of dollars in development costs and has also provided a means to bring legacy application code into the world of modern architectures and techniques. In the next chapter, I look at how to extract design model assets from legacy systems. I cover areas such as relational data models and business rules and how these make legacy applications relevant in a modern context.

Chapter 3: Auditing Legacy Application Assets

Chances are that you own, support, develop, test, or use a large, complicated application written in RPG, Cobol, or CA 2E on IBM i. You have a vested interest in the designs and assets that make the application useful to its users. In the two previous chapters, I looked at how application maps are built, how they are used to reveal granular architecture and function, and how they can be deployed to programmatically reengineer those applications. In this chapter, I look at a higher abstraction of an application’s design and how it can be used to extend ROI in modernization for many years.

Architectural Erosion

When business applications are first designed and written, well-thought-out application architecture contributes to their success and resulting life span. Nothing demonstrates this more conclusively than the success and longevity over the last 40 years of thousands of IBM i applications, many of which are still in daily use. Continual enhancement, variations in syntax and programming style, and general maintenance, along with time and budget pressures, all conspire to compromise application architecture. As with geological erosion, architectural erosion is often not noticeable or problematic until many years have passed. In some cases, and given enough time, the quality, efficiency, and maintainability of the application will begin to suffer from this natural evolution of the code base. This problem will vary in significance from company to company and application to application. It’s not uncommon to hear of cases in which years of continued enhancements to an IBM i application have rendered the application virtually unmaintainable, especially when matched against the delivery-time expectations of users and development budgets.

The Modern Technology Tease

The last decade has seen the introduction of many powerful enhancements to IBM i, the RPG and Cobol languages and compilers, and DB2 for i. The benefit of these enhancements has remained tantalizingly out of reach for most current applications, for the simple reason that most of the current application code is written using monolithic procedural methods. Integration with other systems, modernizing the user interface, implementing SOA strategies—all expect a distributed application design if the task is to be done in a sustainable and optimum way. The task of rewriting these entire systems to take advantage of modern technologies has, for most companies, been too expensive and risky. The optimum approach is to establish what code is useful and relevant and therefore should be rewritten or refactored into modern constructs and program designs. Even with application mapping technologies, this is still a significant task on any complicated legacy application.

Optimizing an Application with the Business

Generally, there will be varying degrees of consensus about the relevance of a company’s legacy application, depending on whom you ask in the organization.
The specific touch points between application function and business process are rarely known in their entirety and even more rarely documented and, therefore, auditable. It’s also not uncommon for applications to outlive their users at a company by many years. If the original application designers are no longer with the company, it stands to reason that potentially large parts of the application design assets are known only by the application itself. With the application designs explicitly defined, documented, and ready to hand, analysts and architects can map these designs to business architecture and process accordingly. This can also form the basis of subsequent renewal or replacement strategies, if applicable. In addition, a number of technology project types become feasible even with limited resources. I examine some of those in detail a bit later in the chapter. Let’s now look at the two most important design assets: the referential integrity (RI) data model and the business logic.

Deriving a Referential Integrity Data Model

The most fundamental design asset of an IBM i application is the data model. The data model of an application is not just the design of the files, tables, views, and access paths; it also includes the foreign key relationships, or RI, between database tables. The simplest definition of RI is that it defines a relationship between two files in which one is the parent and one is the dependent or child. Records in the dependent file are joined to a unique key in the parent file. For example, the contract header file is a dependent of the customer master file, and records in the contract header file must always contain a valid customer number. Figure 1 shows a diagram displaying the detail of this example relationship.

For large and complex applications, this task needs to be approached in a structured manner and can be broken down into a few discrete steps. The first is to establish the physical model by extracting the table, file, view, and logical file definitions. This provides both a data dictionary of the database and a map of all important keys, such as primary or unique identifiers. Taking one table at a time, the unique key of the file is compared with all the keys of the other files in turn. Where there is a match between the primary key of the file and any key, or at least partial key, of the other file, a relationship can be derived. In most cases, the analysis is further complicated by field names being different in different files. In certain cases, the difference is simple, such as the first two characters being different, whereas in other cases, the names are completely different. Figure 2 shows a simple example of how two files are joined by a foreign key that is similar in name.

Even though DB2 for i has been capable of implementing RI in the database itself since OS version 3.1, virtually no applications use this approach, even today. In the absence of referential constraints on the tables, RI is managed by program logic. There’s nothing wrong with this as a practice, but to use or visualize the referential data model of such an application, the program source of the system must be analyzed.
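Once a candidate relationship has been derived, it can also be checked against the data itself. A minimal sketch for the contract header/customer master example above, using hypothetical file and field names (a contract header file CONHDR whose CUSNO column should reference CUSTS.CUSNO):

```sql
-- List contract header customer numbers that have no matching customer
-- master record (orphans). An empty result supports the derived
-- relationship; any rows returned are referential integrity errors.
SELECT H.CUSNO, COUNT(*) AS ORPHAN_ROWS
  FROM APPLIB.CONHDR H
  LEFT JOIN APPLIB.CUSTS C ON C.CUSNO = H.CUSNO
 WHERE C.CUSNO IS NULL
 GROUP BY H.CUSNO;
```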
Analysis of the program source serves to validate relationships derived by analyzing the file and field structures, and it can be used to derive relationships that would otherwise not be obvious at all because of file and field name differences. Programs that use each of the files are analyzed, and fields and variables are traced through the source code, looking for clues that indicate a relationship between the files. Figure 3 shows a source snippet of the field SINIT from the file CNTACS; the text description gives us a clue that this field means Salesperson. Figure 4 shows the source of the file SLMEN and clearly shows that the key of the file is PERSON. Figure 5 shows a snippet of code from an RPG program that uses the field SINIT, read in from the file CNTACS, to read the file SLMEN. This is validation that the SINIT field equals the PERSON field and, as such, that the two files CNTACS and SLMEN have a foreign key relationship.

Admittedly, this is a very simple example, and I knew what I was looking for. In even modest-sized IBM i applications, the task of analyzing an entire system is tedious and time-consuming and requires fairly good analytical skills. In Chapter 1, I demonstrated how an application map can be used to accelerate analysis tasks by providing mapping between variables and database fields for an entire system. Deriving foreign keys by analyzing source code is a classic use of application mapping technology. It’s also possible to write programs that use the application map to analyze the source code and look for clues and proof of foreign key relationships between application files. On large applications, doing this sort of analysis programmatically is virtually a prerequisite. The added benefit is that a repeatable, automated extraction process makes it very easy to keep the model up-to-date.

Extracting Business Logic

Over a 20-year period, a company might invest tens of millions of dollars in adding, fine-tuning, and fixing the business logic in its legacy code. Business rule extraction is the process of isolating the code segments directly related to business processes—for example, ensuring that when a customer is added to the system, a valid telephone number is provided by the user. Figure 6 shows a sample of RPG code used to do this. The challenge has always been to identify, isolate, and reuse only those designs relevant in the new context in which they’re desirable. The sheer volume of code, its complexity, and the general lack of resources to understand legacy languages, specifically RPG, represent a potentially tragic waste of valuable business assets. The problem is that in the vast majority of legacy RPG and Cobol programs, the business-rule logic is mixed in with screen handling, database I/O, and flow control. So harvesting these business rules from legacy applications requires knowledge of both the application and the language used to implement it, and both are steadily diminishing resources. Once harvested, these rules need to be narrated and indexed, thus providing crucial information for any analyst, architect, or developer charged with renewing or maintaining the legacy application. Figure 7 shows the same piece of code as in Figure 6 but with some narrative about the business logic added, along with an index on line 169.99.
Indexing the business logic in a systematic and structured way, by providing a reference to its source, the field and file it refers to, and some form of logic-type classification, provides additional benefits. The first and most obvious is that it provides a mechanism to programmatically extract and document the rules in various ways. Figure 8 shows the business logic of the preceding program documented in a Microsoft Word table. The second benefit is the ability to filter cross-reference information about a field in such a way as to show only where business logic is executed against it. Figure 9 shows a spreadsheet of instances across a system where business logic has been applied to the field TELNO. These features can then be put to use by developers wanting to centralize business logic across the system, by rapidly and accurately accelerating the analysis work required to do this. In addition to these uses for indexed business logic, it’s possible to write programs that extract the indexed logic to create web service modules, provide documentation for redevelopment in a modern language such as Java or C#, or even populate business-rule management systems, such as JBoss Drools or IBM’s ILOG.

Unlocking the Power of the Database

As I mentioned earlier, few IBM i application databases have any form of data model or schema explicitly defined. As many companies have discovered, this can significantly hinder development initiatives that use direct access to the application data. Here are four of the most obvious areas that benefit from an explicitly defined data model:

Using modern input/output methods in programs. One of the key aspects of RPG is its native I/O access. The terse and simple syntax for this also gives RPG significant development productivity over other languages. In modern languages such as Java and C#, the most common practice is to handle database I/O by using embedded SQL statements. For simple, single-table reads, most developers can create SQL statements that meet this requirement. Things get more complicated when tables and files must be joined for reads, updates, or deletes. In this context, most developers need to understand the data model and must have access to foreign key relationship information in order to build the Join statements correctly in SQL. Figure 10 shows an entity relationship or data model diagram extracted from a legacy RPG application, which is a common way to visualize how the files and tables in the application database relate to each other. Figure 11 in turn shows a spreadsheet with all the foreign keys that determine the relationships shown in Figure 10.

An explicitly defined application data model described in DDL can be imported directly into persistence frameworks such as Hibernate and NHibernate for .NET. These open-source object relational mapping (ORM) solutions let Java and C# developers access DB2 for i databases without needing to know the architecture of the database or JDBC or ODBC technologies, thus greatly simplifying their work. DDL can also be imported by all popular application modeling tools, such as Rational Software Modeler, Borland Together, and Eclipse. Microsoft Visual Studio also allows the import of DDL for building Data Projects.

Data quality analysis—referential integrity testing. Over many years of application use, enhancement, upgrades, and fixes, it’s only natural that RI will suffer.
This is untrue of systems that implement RI in the database itself, but, as I mentioned, very few, if any, IBM i applications use this facility. With an explicitly defined data model of an IBM i application database, database records can be tested for referential integrity programmatically, producing a report of orphaned records as output. Simply explained, the program starts at the bottom level of the hierarchical data model—in other words, those files/tables that have no children/dependents—and looks for corresponding records in owning files/tables by using the foreign keys provided by the data model. It carries out this test for all files/tables in the application, one after the other.
Automated test data extraction. The need for accurate and representative test data is a requirement as old as application development. Most companies use copied production data to fulfill this need. There are a few problems associated with this approach, such as the need to keep test data current, the length of time required to copy production data, the disk space requirements, and the length of time for testing over complete data sets versus limited data. Another well-used method for creating test data is to use simple copy statements in the OS for the required files only. This approach works fine but is still labor intensive and can be error prone. An increasingly popular approach is to select specific master or control records from the production database and then write a program to copy the related records from the other files, using the foreign keys provided by the explicitly defined data model. This approach produces small and current test data quickly and with guaranteed RI. It's often used by ISVs and support organizations to assist with customer testing of changes and enhancements of base systems.
Building BI applications or data warehouses. There has been a big push over the last couple of years in the area of business intelligence (BI). By using the data in the application database more effectively, companies expect to attain and sustain a real competitive edge. The technologies available to facilitate this aren't new or even that complicated for the most part. They all have a fundamental requirement for use: that the application database design be defined or described to them in detail. This requirement is not in and of itself a problem, but when you consider that virtually no IBM i legacy applications have an explicitly described relational data model design, it can become a problem for IBM i users. With access to an explicitly defined data model of the legacy application, these tools can be used much more productively and help provide increased and ongoing ROI from the legacy application, even with relatively small development and support teams.
A good example of this situation is IBM's DB2 Web Query for i. A great tool and natural successor to IBM's Query/400 product, it comes with many powerful BI-type features. To get the full benefit of all this rich functionality in DB2 Web Query, you need to populate the meta-data repository with the DB2 application database design, including the foreign key information. The explicitly defined database supplies this information and can be used to create an entire meta-data layer in DB2 Web Query.
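Here is a minimal sketch of the orphan-record test described above, driven by a single recovered foreign key; a real tool would loop over every foreign key in the extracted data model, and the file, field, and library names are again the illustrative ones from the earlier example.

// Sketch of an RI check for one recovered foreign key:
// every CNTACS.SINIT value should exist as SLMEN.PERSON.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class OrphanCheck {
    public static void main(String[] args) throws Exception {
        String sql =
            "SELECT c.SINIT, COUNT(*) AS ORPHANS " +
            "FROM APPLIB.CNTACS c " +
            "LEFT JOIN APPLIB.SLMEN s ON c.SINIT = s.PERSON " +
            "WHERE s.PERSON IS NULL " +      // no owning record found
            "GROUP BY c.SINIT";
        try (Connection con = DriverManager.getConnection("jdbc:as400://myibmi", "user", "pwd");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            while (rs.next()) {
                // each row is a salesperson code used by contacts but missing from SLMEN
                System.out.println(rs.getString("SINIT") + ": "
                        + rs.getInt("ORPHANS") + " orphaned record(s)");
            }
        }
    }
}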
Figure 12 shows an example of the source and model views of the meta-data of IBM's DB2 Web Query. The model view shows how a file is joined to the other files in the database. This meta-data was created automatically from an explicit data model derived from an IBM i DB2 database.
Reducing Risk and Maximizing ROI
IT departments throughout the world are struggling to balance day-to-day support with a backlog of new user requirements—often operating under severe headcount and cost restrictions. Compliance and regulatory pressures have increased over the last 10 years or so, making large "build it from scratch" projects too risky to contemplate. From this unpromising starting point, it's possible to make headway against the storm of conflicting demands by using the proven business processes contained in your existing systems.
If we can extract the business logic and data model, a world of new possibilities opens up, and a layer of risk and uncertainty around potential projects is reduced, because you have access to what your systems really do, as opposed to what people think they do or what the outdated documentation says they do. If the business-logic and data-model extraction processes can be automated, it follows that many project types become feasible with limited resources. These can range from quick solutions, such as making an order enquiry process available as a web service so customers can integrate it into their own processes, to longer-term solutions, such as full ERP systems or other large-scale systems. Recovered business rules can be recycled and reused in business-rule and process-management systems, such as JBoss Drools, or in workflow systems.
Automated business-logic and data-model extraction processes also serve well in smaller-scale developments, such as using the business rules and processes of a CRM system to build a new Java-based web application—offering peace of mind in the knowledge that the processes contained in the application are proven, and accordingly reducing the risk and development time. Ultimately, this process translates to large-scale reuse, precisely because it can be taken step by step in manageable chunks and delivers working prototypes quickly from the recovered business rules and model; there's no need for development to be done in a vacuum, starved of the oxygen of user feedback.
The next chapter covers in more detail design recovery and application renewal using extracted logic. Application development and modernization is all about designs—not pure code philosophy.
From a compliance or audit point of view, compare the confidence levels of two IT directors: one has used an automated business-rule and data-model recovery system to reuse and rebuild systems, and the other has used a conventional approach of manually designing and building new functionality to match the functionality of an existing system by looking solely at the user requirements documentation and source code of the existing systems. The first IT director has an auditable and repeatable process to recover rules and processes and build new systems. The second is solely dependent on the skill and experience of the people building the new system and on those people's interpretation of the existing system's scope and eccentricities.
Chapter 4: Modernizing Legacy Applications Using Design Recovery
The concept of reusing existing code or logic isn't a new one.
The challenge has always been to identify, isolate, and reuse only those designs that are relevant in the new context in which they're desirable. In the case of IBM i, the sheer volume of code, its complexity, and the general lack of resources to understand legacy languages, specifically RPG, represent a tragic potential waste of valuable business assets for hundreds of thousands of companies. In many cases, these expensive and well-established legacy designs have little chance of even having their relevance assessed, let alone being reused.
To fully understand and appreciate the problem domain, consider for a minute two common approaches to these problems: screen scraping and code conversion. Simply screen scraping the user interface with a GUI or web emulation product doesn't improve the situation. The application may appear slightly more "modern," but the cosmetic changes still leave it with all the same maintenance and enhancement problems, and it may not be much easier for new users to use. The same applies to building web services around wrapper programs written to interpret the interactive data stream from 5250 applications.
Another common approach is code conversion—line-by-line syntax conversion of a legacy application. This approach typically transfers the same problems from one environment/language to another. Indeed, it often produces source code that is less maintainable, canceling out the benefit of using modern technologies and architectures in the first place. Syntax conversions are still being done by some companies and are often promoted by vendors of proprietary development tools, for obvious reasons. This approach has never, to my knowledge, produced an optimum long-term result, despite many attempts over the last two decades.
The objective in a true modernization project, therefore, is to extract the essence, or design, from the legacy application and reuse these designs as appropriate in rebuilding the application, using modern languages, development tools, and techniques, and tapping into more widely available skills and resources. In the previous three chapters, I describe how to recover legacy application design assets in a structured and proven manner. In this chapter, I detail how to use these recovered designs to create a modern application.
Modern Application Architecture
Modern applications are implemented with a distributed architecture. A popular standard for this architecture is Model-View-Controller (MVC). Figure 1 shows the architecture of a typical legacy application and the MVC architecture side by side. MVC allows for independent implementation and development of each layer and facilitates object-oriented (OO) techniques and code reusability rarely found in legacy applications. All these characteristics of a modern application radically improve its maintainability and agility. Legacy applications do have these same elements, but they tend to be embedded and mixed up in large monolithic programs, with vast amounts of redundancy and duplication throughout.
Using MVC to implement an RPG application requires that the business logic be separated from the user interface and controller logic. Figure 2 shows a schematic of the code implementation in a typical modern application.
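To make the layering concrete, here is a minimal, hypothetical Java sketch of the MVC split shown schematically in Figure 2. All class and method names are invented for illustration, and the view itself (a JSF page or similar) is described only in the comments.

// A hypothetical sketch of the MVC split; names are invented for illustration.
public class SalespersonModel {              // Model: application data and state
    private String person;                   // key field
    private String name;                     // descriptive field
    public String getPerson() { return person; }
    public void setPerson(String person) { this.person = person; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}

// Controller: reacts to events raised by the view, calls the business logic,
// and decides which view is shown next.
class SalespersonController {
    private final SalespersonModel model = new SalespersonModel();

    public SalespersonModel getModel() { return model; }

    public String save() {
        // validation and database I/O would be delegated to separate classes here
        return "salespersonList";            // navigation outcome consumed by the view layer
    }
}
// View: a JSF page (or similar) binds its input fields to the model
// and its buttons to controller methods such as save().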
This architecture can be implemented using 5250 and pure RPG, but it's more likely and more common for the view to use a web interface and for the controller logic to be written in a modern language that supports web interfaces, such as Java, EGL, or C#. The optimum modernization result is to reduce dependency on legacy and proprietary languages as much as possible. To achieve this, recovered design assets are reused as input to redevelop the appropriate layer. Figure 3 shows an overview of the overall process of modernizing the legacy code by using the recovered designs.
In Chapter 3, I discussed how to extract the data model and business-rule logic from legacy code. If these extracted designs can be articulated in a language or programmatic format, such as Unified Modeling Language (UML), SQL's Data Definition Language (DDL), or XML, or even in structured database statements, it's possible to use them programmatically to generate the basis of a new application skeleton. This can save companies millions of dollars and significantly reduce timelines. It also means that the designs can be perfected before code is written in the new application. Another benefit is that the generation process can be run repeatedly until the optimum starting point for the new application development is achieved, rapidly and with little effort.
This programmatic reuse of recovered application designs requires a certain amount of restructuring of the designs. The legacy designs of the interactive logic and flow of the legacy application can be used to build a modern application skeleton, and thereafter the extracted business-rule logic can be added to this skeleton. Modern resources, tools, and methods can then be used independently to enhance and complete the modernization as required. Let's look at these steps in more detail.
Building a Modern Application Skeleton
The most fundamental change, and the biggest challenge, in modernizing a legacy application is moving from a procedural programming model to an event-driven one. This is one of the primary reasons that line-by-line syntax conversions to modern languages produce results that are often less maintainable than the original code. One legacy-application design element that's almost directly transferable to a modern, event-driven programming model is the individual screen format. Legacy screen formats largely, if not explicitly, correspond to individual steps in a transaction or business process, and an individual web page or application form corresponds to the same kind of step. By simple deduction, therefore, all the design detail relating to the rendering of a specific legacy screen format can be used to specify and build a modern UI component. I refer to the design information that forms this intersection as a "function definition," as Figure 4 shows.
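As an illustration of how a recovered data model can feed an ORM-based skeleton, here is a minimal sketch of an entity class that a generator might produce for the SLMEN file used in the earlier examples. The JPA annotations are the standard ones consumed by Hibernate, but the table, column, and Java type choices are assumptions rather than output from any particular tool.

// Sketch of an entity that could be generated from the recovered data model;
// table, column, and type choices are illustrative.
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;

@Entity
@Table(name = "SLMEN")
public class Salesperson {

    @Id
    @Column(name = "PERSON")            // key recovered from the legacy file definition
    private String person;

    @Column(name = "SNAME")             // descriptive field; the name is assumed
    private String name;

    public String getPerson() { return person; }
    public void setPerson(String person) { this.person = person; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}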
To rebuild a modern application skeleton from the legacy designs, a function definition should consist of the following elements:
• Screen fields - work fields and fields directly traceable to database fields
• Screen field types and attributes - fields used as dates, foreign keys, descriptors, size, type, and so forth
• Screen constants - column headings, field prompts, function names, command-key descriptions, and so forth
• Screen layout - column and row positions, which can be converted later using relative pixel ratios
• Screen field database mapping - where the data for the screen comes from, including join rules for foreign keys
• Screen actions - command keys, default entry, and subfile options
This design information is entangled in the DDS, program logic, and database of the legacy application. With a reasonable level of skill in legacy languages, developers can extract this information by analyzing the source code manually. With larger systems, the best course is to use tools for the analysis and extraction process. Beyond productivity, consistency, and accuracy, the added benefit of using an analysis and extraction tool is that the results can more easily be stored programmatically and thereby used to automate the next step of writing the code.
Using UML is one way to achieve this. The function definitions can be generated as a UML model for an application, with a number of specific UML constructs that will also assist in modeling and documenting the new application for modern developers. Some of these constructs include Activity Diagrams, Use Cases, and Class Diagrams. Figure 5 shows a UML Activity Diagram that represents the users' flow through a series of legacy programs having multiple screen formats.
DDL and XML can be used to efficiently specify the detailed aspects of the function definitions. DDL created from the legacy application data model can be imported into a persistence framework or object relational mapper (ORM), such as Hibernate for Java or NHibernate for .NET. An ORM greatly simplifies the subsequent coding required in Java or C# by subcontracting all the complicated SQL programming required in an enterprise business application. An additional approach is to create a single database I/O class for each table (a sketch follows below). This removes the need to have I/O logic embedded in every program in the system, immediately making the application more maintainable and agile.
The function definitions are then used to create the UIs and controller beans in the language and standard of choice (one JSF page and corresponding Java bean per legacy screen format). Using XML to store the function definition provides input for documented specifications for manual rebuilds and serves as input to programs that can create the view and controller components. This approach is applicable to Java, EGL, C#, and PHP implementations and can be used for web, mobile web, and Rich Client Platform (RCP) targets alike. This is an important factor for enterprise applications, which often require a mix of device types and even technology implementation options for a single system.
Figure 6 shows a JSF page generated from a function definition extracted from a legacy program. The important factor here is not so much the look and feel but rather that each button is now associated with an event handler in the underlying JSF bean, triggered by the HTML code itself.
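Returning to the single database I/O class per table mentioned above, here is a minimal sketch of what such a class might look like in Java using plain JDBC. The library name APPLIB, the SLMEN/PERSON/SNAME names, and the method set are illustrative assumptions, not the output of any specific tool.

// Sketch of a single database I/O class for one table; names are illustrative.
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class SalespersonIO {

    private final Connection connection;

    public SalespersonIO(Connection connection) {
        this.connection = connection;
    }

    /** Read the descriptive name for one salesperson by its recovered primary key. */
    public String getName(String person) throws SQLException {
        String sql = "SELECT SNAME FROM APPLIB.SLMEN WHERE PERSON = ?";
        try (PreparedStatement ps = connection.prepareStatement(sql)) {
            ps.setString(1, person);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString("SNAME") : null;   // null if no such record
            }
        }
    }

    /** Update the descriptive name for an existing salesperson. */
    public int updateName(String person, String newName) throws SQLException {
        String sql = "UPDATE APPLIB.SLMEN SET SNAME = ? WHERE PERSON = ?";
        try (PreparedStatement ps = connection.prepareStatement(sql)) {
            ps.setString(1, newName);
            ps.setString(2, person);
            return ps.executeUpdate();                              // rows affected
        }
    }
}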
The data in the grid was retrieved from the DB2 for i database by using the SQL in the bean created for the underlying database table. This was invoked by the JSF bean when the user requested the JSF page (in this instance, from a menu on a previous page). The HTML and CSS layout were created by using the information from the function definition, and the buttons put on the JSF page were all derived from the options, command keys, and default entry extracted from the legacy program. In this instance, the design was extracted from legacy RPG/DDS, and the JSF page and Java beans were created automatically, using a tool, in a few minutes. The style was implemented with a standard CSS file and supporting images. This is all industry-standard, best-practice, modern stuff.
Figure 7 shows a snippet of the underlying HTML code that triggers an event in the bean to invoke the orders page, passing the key of the row selected by the user. The underlying Java bean knows what to do because the parameter being passed tells it where to go next. In this way, the JSF beans can be kept small and simple—another good industry standard and best practice.
Figure 8 shows the record-type page that the user is taken to when selecting the Change button in Figure 6. The drop-down combo boxes and date controls were added because of the presence of foreign key information and date field types, respectively, in the extracted function definitions. This simple algorithm can save thousands of hours of configuration and editing of web pages in a modern application with hundreds or thousands of screen formats.
Adding Business-Rule Logic
In Chapter 3, I describe how business-rule logic can be extracted, indexed, and documented from legacy RPG code. One approach is to add these documented rules manually to the appropriate business-logic class in the modern application. This approach should be reserved for cases in which very little of the legacy business logic is to be reused, including, of course, smaller programs that have little or no specific business logic beyond what has already been created in the JSF page, JSF bean, and database I/O beans. I want to reiterate that the same principles that I describe here, using JSF and Java examples, are applicable in .NET and even in modern RPG applications.
Another, more practical approach that has already been automated is to essentially refactor the original interactive program so that only the business-logic processing is reused. Naturally, the refactoring must include restructuring to turn it from a procedural design into an event-driven one. Again, this process is applicable whether creating Java, .NET, EGL, or even RPGLE business-logic components. During the initial modernization effort, the business-logic bean should be created as a single class/module/program that services each of the modern, event-driven JSF pages that came from the original legacy program. This is a maintainable architecture and follows modern coding practices but retains at least some reference to the legacy transactions. Figure 9 shows a schematic representation of the architectural mapping between legacy and modern designs for a single legacy program.
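The following is a minimal sketch of what the JSF backing bean behind such a generated page might look like. It uses the JSF 2 javax.faces.bean annotations, and the class, property, and navigation-outcome names are assumptions rather than output from any particular generator.

// Sketch of a generated JSF backing bean; names and outcomes are hypothetical.
import javax.faces.bean.ManagedBean;
import javax.faces.bean.ViewScoped;
import java.io.Serializable;

@ManagedBean
@ViewScoped
public class ContactListBean implements Serializable {

    private String selectedKey;               // key of the row the user clicked

    public String getSelectedKey() { return selectedKey; }
    public void setSelectedKey(String selectedKey) { this.selectedKey = selectedKey; }

    /** Event handler wired to the button or link rendered for each grid row. */
    public String showOrders() {
        // The key set from the page tells the bean which record to work with;
        // loading the grid itself would be delegated to the table's I/O bean.
        return "orders?faces-redirect=true";  // navigation outcome: go to the orders page
    }
}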
The original legacy program (File A) had three screen formats, which become three discrete, free-standing JSF pages with corresponding Java beans, two database I/O classes (determined by the number of unique tables or physical files), and a single business-logic bean/module/program for the business logic. I used JSF technology and Java for this diagram, but the same architecture is applicable for any modern language, and it would be equally consistent with using the Spring and Hibernate frameworks.
The business-logic bean now contains a restructured version of all the relevant business logic from the original program. This restructuring process turns the business logic from procedural into event-driven code and, in doing so, maps the relevant business-rule processing to the relevant JSF page. The first step in achieving this restructuring is to recover the logic executed before screen 1 is rendered. This logic is essentially placed in the pre-entry method or procedure of a new business-logic bean and is invoked by the JSF bean before the JSF page is displayed. Any legacy UI logic, such as interactive indicators, is removed during this extraction.
The next step is to map the business logic to each JSF page. This is done by identifying the business logic executed after the legacy Format 1 but before legacy Format 2. We ignore the interactive logic and legacy structures such as indicators, which were turned into variables where applicable. This logic is then created as a new method (called JSF1validation, for example) in the new business-logic bean. This business logic is invoked by the JSF bean that corresponds to legacy Format 1, when triggered by the validation event in the new JSF page. The trigger is usually the Submit button on the JSF page itself. This stage is repeated for each of the legacy screen formats/JSF pages. The documented, indexed business rules—which I describe in previous chapters—can be used as a reference for auditing the applicable logic during this refactoring and extraction exercise.
Finally, the legacy subroutines can, by simple logic, be considered business logic and, as such, have a scope that's potentially applicable to any of the newly created validation methods/procedures. Therefore, only the legacy-specific code or redundant interactive lines need to be removed before these subroutines are coded or copied into the new business-logic bean.
Figure 10 shows an example program outline documented in a form of platform-independent pseudo code. I've included only the method outline, with one method expanded, and I added some simple color coding. The pre-entry method will be executed before JSF1 is rendered. Validation for JSF1 is executed when the user selects the Submit button on the web page, and so on. GETREC through ZGETNAMES are subroutines that have had the interactive logic removed and have been verified to contain valid business logic. It is not possible in such a short chapter to show the complete detail, but I can provide detailed examples upon request.
Sustainable Reuse
The harvesting of valuable designs is now complete, and the application can be enhanced and refactored. It's worth noting that tools are available to automate each step described in this process. Staged automation of the recovery and rebuild process can reduce a system-rewrite effort by at least 50 percent.
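As a rough illustration of the restructuring just described, here is a hypothetical business-logic bean for one legacy program. The preEntry and jsf1Validation names follow the naming used above, while the fields, the rules, and the getRec stub are invented for the example.

// Sketch of a restructured business-logic bean for one legacy program;
// method names follow the text, everything else is illustrative.
import java.util.ArrayList;
import java.util.List;

public class OrderMaintenanceLogic {

    private final List<String> errors = new ArrayList<>();

    /** Logic that ran before the first legacy screen format was written. */
    public void preEntry() {
        // e.g., load control values and set defaults recovered from the legacy code
    }

    /** Logic that ran after Format 1 but before Format 2, minus interactive code. */
    public List<String> jsf1Validation(String customerNo, String telno) {
        errors.clear();
        if (customerNo == null || customerNo.isEmpty()) {
            errors.add("Customer number is required");
        }
        if (telno == null || telno.replaceAll("\\D", "").length() < 7) {
            errors.add("A valid telephone number is required");   // harvested rule
        }
        return errors;                 // the JSF bean decides whether to move on
    }

    /** Former subroutine (e.g., GETREC) with the interactive logic removed. */
    private void getRec(String key) {
        // database access would go through the table's I/O class
    }
}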
Even executed manually, this approach provides for iterative and parallel use of resources and is applicable to individual programs, to application areas consisting of multiple programs, and even to entire systems. It allows for sustained reuse of legacy technology—but isn't bound by it—while simultaneously producing a real, modern application, not an emulated one.
About the Author
Robert Cancilla spent the past four years as market manager for IBM's Rational Enterprise Modernization tools and compilers group for IBM i. Prior to that, Robert spent 34 years as an IT executive for three major insurance companies and an insurance software house. He has written four books on e-business for the AS/400 and iSeries and founded and operated the electronic user group IGNITe/400. Robert is currently retired and does independent consulting work.