Using XML files as real corpora
Making an XML database with the dbXML program
http://www.dbxml.com
The dbXML program
• The dbXML program is one of a range of programs that let you use a set of XML files as a database.
• The program is free and can be downloaded from the web.
• Many more programs like this are likely to appear over the next couple of years.
Basic concepts
• Using a database involves the following basic concepts:
– the set of files you are working with is called a collection
– a collection of files must be indexed so that the program can find things quickly
– you ask questions by posting queries to the database manager
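A minimal Python sketch of these three concepts, using only the standard library. It is not dbXML's implementation or API: the collection is just a dictionary of hypothetical documents, the index maps each element name to the documents that contain it, and a query consults the index so that documents which cannot match are skipped.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical collection: document name -> XML text.
collection = {
    "dialogue1.xml": "<dialogue><turn speaker='A'>Hello</turn><turn speaker='B'>Hi</turn></dialogue>",
    "dialogue2.xml": "<dialogue><turn speaker='A'>Bye</turn></dialogue>",
    "notes.xml": "<notes><note>Recording details</note></notes>",
}

def build_index(docs):
    """Map each element name to the set of documents containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for elem in ET.fromstring(text).iter():
            index[elem.tag].add(name)
    return index

def query(docs, index, tag):
    """Return (document, element) pairs for every element with this tag,
    parsing only the documents that the index says contain it."""
    results = []
    for name in sorted(index.get(tag, ())):
        for elem in ET.fromstring(docs[name]).iter(tag):
            results.append((name, elem))
    return results

index = build_index(collection)
hits = query(collection, index, "turn")  # notes.xml is never re-parsed
print([(doc, e.text) for doc, e in hits])
# [('dialogue1.xml', 'Hello'), ('dialogue1.xml', 'Hi'), ('dialogue2.xml', 'Bye')]
```

The index is built once; each later query pays only for the documents that can actually match, which is the point of indexing before asking questions.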
Using the dbXML program to manage an XML database
• We assume as our starting point that we have a set of marked-up XML files that we want to manage.
• We first set up these files as a database.
• We then use the dbXML tool to extract information from this database.
Example XML files in our data set
Steps…
• Now we will see:
– how to add a collection of files to a database
– how to index those files
– how to ask queries to get information about the content of those files
Getting started… (1)
• First, we need to start up the dbXML server program. This is the program that does all the actual work.
To do this:
– Make sure you know where the dbxml folder is.
– Run the program startup-server.bat in that folder (e.g., by double-clicking on it).
– This should start the dbXML server with a message like:
dbXML 2.0 (Dragonfly)
Logging to E:\junk\logging\dbXML.out
Getting started… (2)
• Next, we turn a set of XML files into an XML database. To do this, we must start the dbXML administration program and tell it which files to use.
– Start a DOS command window.
– Make sure you know where the dbxml folder is.
– Run the command ‘startup-command-line.bat’ in the dbxml folder.
– This should start the dbXML program, and you should see something like the window on the next slide…
The program when it starts…
The dbXML administration actions
• Now you can tell the program which files you want to include in your database.
– To do this, you first have to log in to the program:
connect user= scott pass= tiger
You must use exactly this name and password for the moment!
– Next, make a collection:
mkcol myXMLfiles
– Finally, go to the collection, say that everyone is allowed to look at it, and exit:
col myXMLfiles
grant admin READ WRITE EXECUTE CREATE
exit
The dbXML program proper
• With the administrative details out of the way, we can start the main program.
• Find the dbXML item in the Windows Start menu and click on it.
• This should bring up the following window:
If it does not, or if you cannot find the item, you will need to ask for help.
Finding your collection
Expand the items in the list under “localhost” until you find the collection that you made in the previous step.
Adding files to your collection
Expand your collection to find the ‘documents’ item.
Select ‘Documents > Import Documents’ from the menu bar and click on it.
You will then be asked which files are to be added to the collection.
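Importing amounts to gathering well-formed XML files into the collection. The Python sketch below shows the idea outside dbXML; the folder, file names, and the decision to report and skip malformed files are assumptions for illustration, not a description of dbXML's import behavior.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

def import_documents(folder):
    """Load every *.xml file in a folder into a collection dict
    (file name -> root element). Malformed files are skipped and listed."""
    collection, rejected = {}, []
    for path in sorted(Path(folder).glob("*.xml")):
        try:
            collection[path.name] = ET.parse(path).getroot()
        except ET.ParseError:
            rejected.append(path.name)
    return collection, rejected

# Demo with a temporary folder (hypothetical file names and contents).
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.xml").write_text("<dialogue><turn>Hi</turn></dialogue>", encoding="utf-8")
    Path(tmp, "b.xml").write_text("<dialogue><turn>Unclosed</dialogue>", encoding="utf-8")
    Path(tmp, "notes.txt").write_text("not XML", encoding="utf-8")
    collection, rejected = import_documents(tmp)

print(sorted(collection), rejected)  # ['a.xml'] ['b.xml']
```

Taking the whole folder in one call mirrors the advice to select all the files at one go.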
When you have added your documents (select them all at one go if possible), you then have to index them…
Select the indexes folder in your collection, then define an index as follows:
1. Give the index a name.
2. Type “pattern=*@*” to index all elements and attributes.
3. Click on Create.
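What an index over all elements and attributes might record can be sketched in Python. The key format (‘element’ and ‘element@attribute’) echoes the pattern syntax on this slide, but the data structure is a hypothetical illustration, not dbXML's internal index format.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def index_all(docs):
    """Index every element and every attribute in a collection.

    Keys are 'element' for elements and 'element@attribute' for attributes
    (a hypothetical key format); values are (document, text) pairs.
    """
    index = defaultdict(list)
    for name in sorted(docs):
        for elem in ET.fromstring(docs[name]).iter():
            # Element entry: the element's own leading text, trimmed.
            index[elem.tag].append((name, (elem.text or "").strip()))
            # One entry per attribute on this element.
            for attr, value in elem.attrib.items():
                index[f"{elem.tag}@{attr}"].append((name, value))
    return index

# Hypothetical two-document collection.
docs = {
    "d1.xml": "<dialogue><turn speaker='A'>Hello</turn><turn speaker='B'>Hi</turn></dialogue>",
    "d2.xml": "<dialogue><turn speaker='A'>Bye</turn></dialogue>",
}
idx = index_all(docs)

# Which documents contain a turn by speaker A?
print(sorted({doc for doc, value in idx["turn@speaker"] if value == "A"}))
# ['d1.xml', 'd2.xml']
```

A lookup then reads only the recorded entries for one key instead of re-reading every file.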
… you can now ask questions about their content using:
• XPath
• XSLT
• full-text search
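As an illustration of full-text search (the slides do not show dbXML's full-text query syntax), this Python sketch matches a whole word, ignoring case, against the text content of each document with the markup stripped. The sample documents are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical collection: document name -> XML text.
docs = {
    "d1.xml": "<dialogue><turn>Where is the station?</turn><turn>Second left.</turn></dialogue>",
    "d2.xml": "<dialogue><turn>The station is closed.</turn></dialogue>",
    "d3.xml": "<dialogue><turn>Thanks.</turn></dialogue>",
}

def full_text_search(docs, word):
    """Return names of documents whose text content contains `word`
    as a whole word, ignoring case and markup."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    hits = []
    for name in sorted(docs):
        # itertext() yields only character data, so tags never match.
        text = " ".join(ET.fromstring(docs[name]).itertext())
        if pattern.search(text):
            hits.append(name)
    return hits

print(full_text_search(docs, "station"))  # ['d1.xml', 'd2.xml']
```

Unlike an XPath query, this ignores document structure entirely: it asks only whether the words occur.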
[Screenshot: query window and result window]
Selecting all ‘turns’ in the corpus
Selecting all ‘attrib’ in the corpus
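The slides do not reproduce the query text or the data set. The sketch below assumes that ‘turn’ is an element name and ‘attrib’ an attribute name in a hypothetical corpus, and uses Python's ElementTree, which supports only a subset of XPath: attribute tests may appear inside predicates, but a path cannot return attribute nodes, so the values are read with get().

```python
import xml.etree.ElementTree as ET

# Hypothetical corpus file; the real data set is not shown in the slides.
corpus = """
<corpus>
  <dialogue id="d1">
    <turn attrib="question">Where is the station?</turn>
    <turn attrib="answer">Second left.</turn>
  </dialogue>
  <dialogue id="d2">
    <turn>Thanks.</turn>
  </dialogue>
</corpus>
"""

root = ET.fromstring(corpus.strip())

# Counterpart of the XPath query //turn: every turn element at any depth.
turns = root.findall(".//turn")

# Counterpart of //@attrib: select elements carrying the attribute,
# then read each attribute value.
attribs = [e.get("attrib") for e in root.findall(".//*[@attrib]")]

print(len(turns))  # 3
print(attribs)     # ['question', 'answer']
```

A full XPath engine would accept //@attrib directly; the two-step form is a workaround for ElementTree's restricted syntax.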
The results…
• are presented as XML,
• so you can pass them straight to a style sheet to display them…
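Because the results are XML, they can be wrapped in a single result document and handed to a style sheet. This Python sketch collects matching elements into a results element and prepends an xml-stylesheet processing instruction; the element names and the style sheet file turns.xsl are hypothetical. Applying the XSLT itself needs a processor, which Python's standard library lacks (the third-party lxml package provides one).

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical source document.
source = "<dialogue><turn who='A'>Hi</turn><note>aside</note><turn who='B'>Hello</turn></dialogue>"
root = ET.fromstring(source)

# Wrap every matching element in one <results> element, so the result
# set is itself a well-formed XML document. Deep copies keep the
# result independent of the source tree.
results = ET.Element("results")
for hit in root.iter("turn"):
    results.append(copy.deepcopy(hit))

body = ET.tostring(results, encoding="unicode")

# Associate a style sheet for XSLT-aware viewers;
# turns.xsl is a hypothetical file name.
document = '<?xml-stylesheet type="text/xsl" href="turns.xsl"?>\n' + body
print(document)
```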