* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download ODB Training 2005
Oracle Database wikipedia , lookup
Microsoft SQL Server wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Open Database Connectivity wikipedia , lookup
Concurrency control wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Functional Database Model wikipedia , lookup
Relational model wikipedia , lookup
ContactPoint wikipedia , lookup
Introduction to Observational DataBase (ODB) Sami Saarinen, Paul Burton ECMWF 22-Mar-2006 ODB Training 2006 slide 1 ECMWF Overview  Introduction to ODB  Creating a simple database  Use of simulobs2odb –program  Visualizing data using odbviewer, ODBTk  The bigger picture  ODB within IFS/4DVAR-system  A more complex database  Manipulating ODB from Fortran90  Tools: odbless, odbdiff, odbcompress, odbdup, odb2netcdf ODB Training 2006 slide 2 ECMWF Overview  Introduction to ODB  Creating a simple database  Use of simulobs2odb –program  Visualizing data using odbviewer, ODBTk  The bigger picture  ODB within IFS/4DVAR-system  A more complex database  Manipulating ODB from Fortran90  Tools: odbless, odbdiff, odbcompress, odbdup, odb2netcdf ODB Training 2006 slide 3 ECMWF Introduction to ODB  ODB is a tailor made database software developed at ECMWF to manage very large observational data volumes through the IFS/4DVAR-system, and to enable flexible postprocessing of observational data  Observational database usually contains following items:  Observation identification, position and time coordinates  Observation value, pressure levels, channel numbers  Various quality control flags  Obs. departures from background and analysis fields  Satellite specific information  Other closely related information ODB Training 2006 slide 4 ECMWF AMSU-A data before screening ODB Training 2006 slide 5 ECMWF Basic components of ODB  ODB/SQL-language  Data Definition Language: To describe what data items belong to database, what are their data types and how they are related (if any) to each other  Data Query Language: To query and return a subset of data which satisfies certain user specified conditions. This is the key feature of the ODB software !!  Fortran90 interface layer  Data manipulation : create, update & remove data  Execute ODB/SQL-queries and retrieve filtered data  To control MPI and OpenMP-parallelization ODB Training 2006 slide 6 ECMWF ODB/SQL compilation system ODB Training 2006 slide 7 ECMWF Typical ODB usage patterns  Database can be created interactively or in batch mode  We usually run our in-house BUFR2ODB in batch  New observation types can also be fed in via text file  Complete database manipulation currently prefers using Fortran90-interface, but read/only database can also be accessed via rudimentary client-server –interface (C/C++)  When database has been created, the application program normally queries data and places the result (also known as view) into a data matrix allocated by the user  There can be virtually any number of active views at any given time. These can be updated and fed back to database ODB Training 2006 slide 8 ECMWF Overview  Introduction to ODB  Creating a simple database  Use of simulobs2odb –program  Visualizing data using odbviewer, ODBTk  The bigger picture  ODB within IFS/4DVAR-system  A more complex database  Manipulating ODB from For  Tools: odbless, odbdif, odbcompress, odbdup, odb2netcdf ODB Training 2006 slide 9 ECMWF Creating a simple database  We will create a very simple database using text files  The 3 text files describe  Data layout i.e. what data items comprise this ODB  Location and time information of observations  Actual observation measurement information for each location at the given pressure levels  Feed these files into simulobs2odb-program  Discover the data values in database by using odbviewer ODB Training 2006 slide 10 ECMWF Data definition layout : MYDB.ddl CREATE TABLE hdr AS ( CREATE TABLE body AS ( seqno pk1int, entryno pk1int, obstype pk1int, varno pk1int, codetype pk1int, vertco_type pk1int, lat pk9real, press pk9real, lon pk9real, obsvalue pk9real, date yyyymmdd, time hhmmss, body @LINK, ); ); ODB Training 2006 slide 11 ECMWF Input file#2 : hdr.txt #hdr obstype = 2 codetype = 141 seqno lat lon 1 45 -15 ODB Training 2006 date time body.len 20041101 000000 1 slide 12 ECMWF Input file#3 : body.txt #body entryno varno vertco_type press obsvalue 2 1 50000 251.0 1 ODB Training 2006 slide 13 ECMWF Running simulobs2odb  Initialize ODB interactive environment :  use odb  Create database using the following simple command :  simulobs2odb –l MYDB –i hdr.txt –i body.txt  As a result of these commands, a small database called MYDB has been created and it contains one data pool with two tables hdr and body, which are linked (related) to each other via special @LINK data type  It is now easy to extend database by providing more data, or specifying more data items, or adding more tables, or all above at the same time ODB Training 2006 slide 14 ECMWF Visualizing with odbviewer  History: odbviewer was originally written to be used as a debugging tool for ODB software development  Linked with ECMWF graphics package MAGICS/MAGICS++ it displays coverage plots  Also a textual report generator  Displays output of data queries  “Sensitive” to ODB/SQL-language : tries automatically produce both coverage plot and textual report for the user  Textual report itself can be invaluable source of information for further post-processing tasks ODB Training 2006 slide 15 ECMWF Running odbviewer  Go to database directory  cd MYDB  Run  odbviewer –q ‘SELECT lat,lon,press,obsvalue\ FROM hdr, body \ WHERE obstype = 2’ ODB Training 2006 slide 16 ECMWF odbviewer coverage plot Our observation !! ODB Training 2006 slide 17 ECMWF Some odbviewer [options] -h List of options (gimme some “help” !) -q ‘SQL-stmt’ Provide ODB/SQL-statement inline -v viewname/poolno Choose SQL name (& optionally pool number) -p “1-10,12,15” Choose from a subset of pools -R No radians-to-degrees conversion for (lat,lon) -r Enforce radians-to-degrees conversion -c Clean start (i.e. recompile all) -e editor Choose preferred editor -e batch Run in batch mode (same as –e pipe) -N Do not produce a report at all -I Do not show plot immediately -P projection Change projection -C file.cmap Supply a color map file -A plot_area Choose plotting area ODB Training 2006 slide 18 ECMWF ODBTk : The ODB Toolkit GUI based ODB visualisation tool Easy way for non-experts to build SQL Interactive viewing of observational data Can refine SQL “WHERE” statement as you view the data Portable, lightweight application  Requires ODB, perl, Fortran90 & C compilers ODB Training 2006 slide 19 ECMWF ODBTk : Building an SQL Twin views on structure  Hierarchical structure  Allows relationship between tables/columns to be seen  “Flat structure”  Easy to find a given column/member or table  Allows user to sort structure SQL library  Both local & shared ODB Training 2006 slide 20 ECMWF Visualising Coverage ODB Training 2006 slide 21 ECMWF Visualising X-Y plots ODB Training 2006 slide 22 ECMWF Overview  Introduction to ODB  Creating a simple database  Use of simulobs2odb –program  Visualizing data using odbviewer, ODBTk  The bigger picture  ODB within IFS/4DVAR-system  A more complex database  Manipulating ODB from Fortran90  Tools: odbless, odbdiff, odbcompress, odbdup, odb2netcdf ODB Training 2006 slide 23 ECMWF AMSU-A data after screening Under 10% left active !! ODB Training 2006 slide 24 ECMWF ODB within IFS/4DVAR-system ECMA/ODB CCMA/ODB Output BUFRs ODB Training 2006 slide 25 ECMWF A more complex database  In the real world a database may contain many more tables (>>5) than in the simple example earlier  Each table can contain 10—50 data columns  There can also be a sophisticated data hierarchy (next slide) to describe potentially complex relationships between tables  In order to provide a good parallel performance on supercomputers, data tables are furthermore divided into data pools  They behave like sub-databases within a database  Allows much bigger data sets than otherwise possible ODB Training 2006 slide 26 ECMWF Comprehensive data hierarchy ODB Training 2006 slide 27 ECMWF ECMWF BUFR to ODB conversion  ODBs at ECMWF are normally created by using bufr2odb  Enables MPI-parallel database creation  efficient  Allows retrospective inspection of Feedback BUFR data by converting it into ODB  bufr2odb can also be used interactively, for example: bufr2odb –i bufr_input_file –I 1-20 –n 4  The preceding example creates 4 pools of ECMA database from the given BUFR input file, but includes only BUFR subtypes from 1 to 20 (inclusive)  Feedback BUFR to ODB works similarly: fb2odb –i feedback_bufr_file –n 8 –u 2 ODB Training 2006 slide 28 ECMWF Overview  Introduction to ODB  Creating a simple database  Use of simulobs2odb –program  Visualizing data using odbviewer, ODBTk  The bigger picture  ODB within IFS/4DVAR-system  A more complex database  Manipulating ODB from Fortran90  Tools: odbless, odbdiff, odbcompress, odbdup, odb2netcdf ODB Training 2006 slide 29 ECMWF Manipulating ODB from Fortran90  Currently Fortran90 is the only way to fill an ODB database  simulobs2odb is also a Fortran90-program underneath  likewise odbviewer or practically any other ODB-tool  Also: to fetch and update data, Fortran90 is necessary  ODB Fortran90 interface layer offers a comprehensive set of functions to  Open & close database  Attach to & execute precompiled ODB/SQL queries  Load, update & store queried data ODB Training 2006 slide 30 ECMWF An example ODB program program main use odb_module implicit none integer(4) :: h, rc, nra, nrows, ncols, npools, j, jp real(8), allocatable :: x(:,:) npools = 0 h = ODB_open(‘MYDB’, ’OLD’, npools=npools) < data manipulation loop ; see next page > rc = ODB_close(h, save=.TRUE.) end program main ODB Training 2006 slide 31 ECMWF Data manipulation loop DO jp=1,npools ! Execute SQL, allocate space, get data into matrix rc = ODB_select(h,’sqlview’,nrows,ncols,poolno=jp) allocate(x(nrows,0:ncols)) rc = ODB_get(h,’sqlview’,x,nrows,ncols,poolno=jp) ! Update data, put back to DB, deallocate space call update(x,nrows,ncols) ! Not an ODB-routine rc = ODB_put(h,’sqlview’,x,nrows,ncols,poolno=jp) deallocate(x) rc = ODB_cancel(h,’sqlview’,poolno=jp) ! Use the following only with READONLY-databases ! rc = ODB_release(h,poolno=jp) ENDDO ODB Training 2006 slide 32 ECMWF Compile, link and run (1) use odb # once per session (2) odbcomp MYDB.ddl # once only;often from file MYDB.sch (3) odbcomp sqlview.sql # recompile only when changed (4) odbf90 main.F90 update.F90 –lMYDB –o main.x # link (5) ./main.x ODB Training 2006 # run slide 33 ECMWF Overview  Introduction to ODB  Creating a simple database  Use of simulobs2odb –program  Visualizing data using odbviewer, ODBTk  The bigger picture  ODB within IFS/4DVAR-system  A more complex database  Manipulating ODB from Fortran90  Tools: odbless, odbdiff, odbcompress, odbdup, odb2netcdf ODB Training 2006 slide 34 ECMWF odbless  A textual browser that allows to look at ODB data page-bypage –basis (a little like Unix less-command):  By default calculates statistical summary for each retrieved data column  Cheap with near-optimal ODB data access pattern  User has a choice of specifying starting row  Usage: odbless –q ‘SELECT column(s) FROM table(s) WHERE …’ \ –s starting_row –n number_of_rows_to_display \ [–b buffer_size –X] ODB Training 2006 slide 35 ECMWF odbdiff  Enables to compare two ODB databases for differences  Very useful tool when trying to identify errors/differences between operational and experimental 4DVAR runs  Usage: odbdiff –q ‘SELECT …’ DATABASE1 DATABASE2  By default brings up an xdiff-window with respect to diffs  If latitude and longitude were given in the data query, then also produces a difference plot using odbviewer-tool ODB Training 2006 slide 36 ECMWF odbcompress  Enables creation of very compact database from the existing one for  archiving purposes, or for smaller footprint  Makes post-processing considerably faster  At this point the user has choices of both  Truncating the data precision  Leaving out columns that are less of importance  Early tests show that this new tool achieves compression factors from 2.5X to 11X  the higher compression being for satellite data !! ODB Training 2006 slide 37 ECMWF odbdup  Duplicates database(s) by copying metadata (low volume), but shares the actual data (high volume)  Allows database sharing between multiple users  Over shared (e.g. NFS mounted) disk  Enables creation of time-series database, for example: odbdup –i “200601*/ECMA.conv” –o USERDB  The previous example creates a new database labelled as USERDB, which presumably spans over all the conventional observations during January 2006  The heureka is : user has now access to a whole month of data as if it was situated in one single database !! ODB Training 2006 slide 38 ECMWF odb2netcdf  Translates the given ODB-query (or whole ODB-table) into a series of NetCDF-files, by default one file for each ODB data pool  Usage: odb2netcdf –q ‘SELECT …’  The result files can be viewed with standard NetCDF tools like ncdump and ncview  The files can also be produced in NetCDF packed format (with a caveat of truncated precision) ODB Training 2006 slide 39 ECMWF Also … Some interesting facts  Written mainly in C-language  Except Fortran90-interface and IFS/4DVAR interface  Except BUFRODB (by Milan Dragosavac)  ODB/SQL is currently converted into C-code  10 lines of SQL generates >> 100 lines of C-code  Standalone ODB installation (w/o IFS) is also available  Can be built in about 30 minutes for Linux/laptop  Tested at least on the following machines  SGI/Altix, IBM Power3/4, Linux Intel/AMD, VPP, …  Automatic binary data conversion guarantees database portability between different machines ODB Training 2006 slide 40 ECMWF … and some ODB “limitations”  ODB software is clearly meant for large scale computation since – given lots of memory and disk space, fast CPUs:  A single program can handle up to 2^31 ODB databases  A single database can have up to 2^31 data pools  A single database can have any number of tables  A single table in a data pool can have up to 2^31 rows and (by default) 9999 columns  A single ODB/SQL-query over active data pools can retrieve up to 2^31 rows in one go  These really big numbers show that ODBs potential is on parallel computers, but we haven’t forgotten desktop PCs! ODB Training 2006 slide 41 ECMWF Finally…  ODB software is developed to allow unprecedented amounts of satellite data through the IFS/4DVAR system  Software has been operational at ECMWF since June’2000, but is still evolving  Emphasis is now on graphical post-processing and how to enable fast access to very large amounts of data  Other ECMWF member states and co-operating countries that are also using or just becoming users of ODB  MeteoFrance, DWD, Hungary, Aladin/HIRLAM-nations  MetOffice is considering via collaboration with BoM  University of Vienna via re-analysis ERA40 collaboration ODB Training 2006 slide 42 ECMWF
 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
                                             
                                             
                                             
                                             
                                             
                                             
                                             
                                             
                                             
                                             
                                            