Semantic Roles and Ontologies
pala@fi.muni.cz
Ontologies
• Growing interest in the data structures known as ontologies
• Sets of language expressions covering individual domains, hierarchically organized
• E.g. the Top Ontology (TO) from EuroWordNet, designed for English as a whole
• Others: SUMO/MILO, CYC, etc.
• How they are built: introspectively, usually designed from the ‘top’ down
• Large ontologies (e.g. the EWN TO) are not based on evidence, i.e. on language data coming from corpora
• In the Semantic Web area, ontologies are developed quite pragmatically, so their mutual compatibility is questionable
Semantic roles
• In NLP, much attention is paid to the ‘valency frames’ of verbs
• Data structures that describe the relational properties of verbs
• Thus we speak about the predicate-argument structure of verbs, e.g. for drink there is someone who drinks (AGENT) and the respective beverage (e.g. beer), labeled as PATIENT
• Semantic roles are labels used to describe the arguments; there are inventories of them (usually about 40–50 roles)
• Inventories of semantic roles are also built mainly from the ‘top’ (i.e. introspectively)
How are Ontologies and SRs related?
• Inventories of semantic roles can, in fact, be viewed as a kind of ontology
• The number of verbs in Czech is about 35 000, and we want semantic roles for all of them
• We also need them for sense discrimination, which is a critical problem
• How can an adequate inventory of semantic roles be obtained? Certainly not from the ‘top’
• It is necessary to look at the language data found in corpora
• There are some tools that can help us with that
Verballex – Valency Frames for Czech
An example:
Pít:1/drink:1, impf
AG<person:1|animal:1>obl(kdo1)
VERB
SUBS<beverage:1>obl(co4)
- The lexicon of such frames is being built (presently about 8 000 Czech verbs)
- Approx. 3 000 of them are linked to their English equivalents in Czech WordNet
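The frame notation above can be read as one small record per argument slot. The following Python sketch parses the pít/drink frame under our own reading of the notation: `obl` marks an obligatory slot (we assume `opt` for optional ones), and `kdo1`/`co4` are a Czech question word plus a case number (1 = nominative, 4 = accusative). The `Slot` class, the `parse_slot`/`parse_frame` functions and the regular expression are hypothetical illustrations, not part of any Verballex tool.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumed shape of one slot, e.g. "AG<person:1|animal:1>obl(kdo1)":
# ROLE<restriction|restriction>flag(question-word + case number)
SLOT_RE = re.compile(
    r"^(?P<role>[A-Z]+)<(?P<restr>[^>]*)>(?P<flag>obl|opt)\((?P<form>[^\d)]+)(?P<case>\d)\)$"
)

@dataclass
class Slot:
    role: str                # first-level label, e.g. AG, SUBS
    restrictions: List[str]  # second-level labels: WordNet literals such as "person:1"
    obligatory: bool         # True for "obl", False for the assumed "opt"
    question: str            # Czech question word, e.g. "kdo" (who), "co" (what)
    case: int                # Czech case number: 1 = nominative, 4 = accusative

def parse_slot(text: str) -> Slot:
    m = SLOT_RE.match(text.strip())
    if m is None:
        raise ValueError(f"unrecognised slot: {text!r}")
    return Slot(
        role=m["role"],
        restrictions=m["restr"].split("|"),
        obligatory=m["flag"] == "obl",
        question=m["form"],
        case=int(m["case"]),
    )

def parse_frame(lines):
    """Parse a frame written one element per line: header, slots and the VERB marker."""
    header, *body = [ln.strip() for ln in lines if ln.strip()]
    lemma, aspect = [part.strip() for part in header.split(",")]
    slots = [parse_slot(ln) for ln in body if ln != "VERB"]
    return {"lemma": lemma, "aspect": aspect, "slots": slots}

frame = parse_frame([
    "Pít:1/drink:1, impf",
    "AG<person:1|animal:1>obl(kdo1)",
    "VERB",
    "SUBS<beverage:1>obl(co4)",
])
for slot in frame["slots"]:
    print(slot.role, slot.restrictions, slot.question, slot.case)
# AG ['person:1', 'animal:1'] kdo 1
# SUBS ['beverage:1'] co 4
```

Keeping the two label levels as separate fields (`role` vs. `restrictions`) mirrors the two-level design of the Verballex labels.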
Verballex – cont.
• The semantic roles within Verballex are two-level labels
• General labels such as AGENT, PATIENT, ADDRESSEE, SUBSTANCE, … (about 50), taken from the EWN TO
• Lower-level labels such as human:1, animal:1: literals from WordNet, approx. 200 (the list)
• The frames can be used to obtain semantic classes of verbs
• Our inventory of roles is based on WordNet, and this brings some problems
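To make the two-level labels concrete, here is a hedged Python sketch of how a second-level label such as person:1 might be checked against an argument filler, assuming that a label also admits its hyponyms. The hypernym links are a tiny hand-written stand-in for WordNet (simplified, not real WordNet data), and `HYPERNYM`, `ancestors` and `satisfies` are hypothetical names.

```python
# Toy hypernym links standing in for WordNet: simplified, NOT real WordNet data.
HYPERNYM = {
    "student:1": "person:1",
    "dog:1": "animal:1",
    "beer:1": "beverage:1",
    "person:1": "organism:1",
    "animal:1": "organism:1",
}

def ancestors(literal):
    """Yield the literal itself followed by every hypernym above it."""
    while literal is not None:
        yield literal
        literal = HYPERNYM.get(literal)

def satisfies(filler, restrictions):
    """Assumed semantics: a filler fits if a label is the filler or one of its hypernyms."""
    chain = set(ancestors(filler))
    return any(label in chain for label in restrictions)

# First-level label AG with second-level labels person:1 | animal:1 (the pít/drink frame)
ag_role, ag_restrictions = "AG", ["person:1", "animal:1"]
print(satisfies("student:1", ag_restrictions))  # True
print(satisfies("dog:1", ag_restrictions))      # True
print(satisfies("beer:1", ag_restrictions))     # False
```

The choice of second-level literal decides how much the slot admits: a label such as organism:1 would accept both fillers above, which is the kind of granularity question the WordNet-based inventory raises.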
What if we have a look at the real data?
• Verbs like vidět/see: we can see ANYTHING, so the right-hand argument is then simply ENTITY
• This does not help us much; we want to describe what one can really see
• Corpora and Word Sketches: the Word Sketch table for vidět; what follows from it?
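• AG(osoba/person|zvíře/animal|organizace/organization) – vidět/see –
PAT(SITUAT{situace/situation, problém/problem, věc/thing, svět/world, rozdíl/difference}|
CAUSE{důvod/reason, příčina/cause, smysl/sense, chyba/mistake, nebezpečí/danger}|
STARTPOINT{východisko/starting point, perspektiva/perspective, budoucnost/future, možnost/possibility}|
OBJECT{film/film, karta/card, tvář/face, silueta/silhouette, obrys/outline, světlo/light, spousta/a lot, svět/world})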
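One way to see why this is more informative than a single ENTITY label is to store the pattern as data and invert it. The sketch below uses only the Czech lemmas of the vidět/see pattern; the dictionary layout and the `invert` helper are our own illustration. It also shows that svět (world) is listed under two classes, SITUAT and OBJECT, so the class alone does not settle its reading.

```python
from collections import defaultdict

# PAT classes of vidět/see, Czech lemmas only, copied from the pattern above.
VIDET_PAT = {
    "SITUAT": ["situace", "problém", "věc", "svět", "rozdíl"],
    "CAUSE": ["důvod", "příčina", "smysl", "chyba", "nebezpečí"],
    "STARTPOINT": ["východisko", "perspektiva", "budoucnost", "možnost"],
    "OBJECT": ["film", "karta", "tvář", "silueta", "obrys", "světlo", "spousta", "svět"],
}

def invert(pattern):
    """Map each collocate lemma to the set of role classes it is listed under."""
    index = defaultdict(set)
    for role_class, lemmas in pattern.items():
        for lemma in lemmas:
            index[lemma].add(role_class)
    return index

index = invert(VIDET_PAT)
# With a single ENTITY label every lemma would get the same answer; here classes differ.
print(sorted(index["chyba"]))  # ['CAUSE']
# Lemmas listed under more than one class still need sense-level disambiguation.
ambiguous = {lemma: sorted(classes) for lemma, classes in index.items() if len(classes) > 1}
print(ambiguous)  # {'svět': ['OBJECT', 'SITUAT']}
```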
What if we have a look at the real data? (cont.)
• The verb slyšet/hear – Word Sketch table
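• AG(osoba/person|zvíře/animal|organizace/organization|ucho/ear) – slyšet/hear –
PAT(SOUND{hluk/noise, rachot/rumble, hukot/drone, hučení/humming, klapot/clatter}|
SHOOT{výstřel/shot, střelba/shooting, rána/bang}|
VOICE{křik/shouting, výkřik/scream, řev/roar, zpěv/singing, pláč/crying, nářek/wailing, smích/laughter, volání/calling}|
WORD{slovo/word}|
NOISE{bzukot/buzzing<hmyz/insects>, šplouchání/splashing<voda/water>, pleskání/lapping<voda/water>}|
IDIOM1{trávu růst/grass grow<neodůvodněné podezření/unfounded suspicion>})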
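Patterns in this bracketed notation can be turned into data semi-automatically. A minimal Python parser might look like this, assuming the notation is exactly `LABEL{lemma, lemma<restriction>, …}` with `|` between classes (our reading of the slide, not a documented format). It uses only the Czech lemmas of the slyšet/hear PAT argument and ignores the AG part; `parse_pattern` and the regular expressions are hypothetical.

```python
import re

# PAT part of the slyšet/hear pattern, Czech lemmas only.
SLYSET_PAT = (
    "PAT(SOUND{hluk, rachot, hukot, hučení, klapot}|"
    "SHOOT{výstřel, střelba, rána}|"
    "VOICE{křik, výkřik, řev, zpěv, pláč, nářek, smích, volání}|"
    "WORD{slovo}|"
    "NOISE{bzukot<hmyz>, šplouchání<voda>, pleskání<voda>}|"
    "IDIOM1{trávu růst<neodůvodněné podezření>})"
)

CLASS_RE = re.compile(r"(\w+)\{([^}]*)\}")   # LABEL{item, item, ...}
ITEM_RE = re.compile(r"^(.*?)(?:<(.*)>)?$")  # lemma with an optional <restriction>

def parse_pattern(text):
    """Return {class label: [(lemma, restriction or None), ...]} in source order."""
    classes = {}
    for label, body in CLASS_RE.findall(text):
        items = []
        for raw in body.split(","):
            lemma, restriction = ITEM_RE.match(raw.strip()).groups()
            items.append((lemma, restriction))
        classes[label] = items
    return classes

pat = parse_pattern(SLYSET_PAT)
print(list(pat))        # ['SOUND', 'SHOOT', 'VOICE', 'WORD', 'NOISE', 'IDIOM1']
print(pat["NOISE"][0])  # ('bzukot', 'hmyz')
```

Once patterns are data, class inventories for different verbs can be compared or merged, which is a precondition for building a role inventory bottom-up.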
What is to be done?
• The inventory of semantic roles we are using in Verballex cannot capture the semantic nature of some verb arguments
• This concerns frequent verbs
• We obviously need a better inventory/ontology for Czech/English verbs (and for other languages as well: universality of the VFs)
• The task: how can this be done using Word Sketches, possibly semi-automatically?
One Solution – CPA (Corpus Pattern Analysis)?
• Do meanings (senses) exist?
• Can they be empirically justified?
• Can corpus data help us?
• An example: how many meanings of the verb držet/hold are there?
• Meaning potentials (Hanks, Pustejovsky): meanings are realized through contexts
• Learning from the failure of WSD (accuracy only up to about 80 %)
• What is the cause of the WSD failure?
One Solution – cont.
• WSD researchers make a wrong assumption: that meanings exist and can be enumerated (when dictionaries are used)
• But dictionaries differ and give different answers to this question; see držet/hold, or get in English
• What is the remedy?
• Look at the verb contexts, sort them, and design an adequate list of their semantic roles, i.e. prepare an adequate ontology of semantic roles
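The ‘sort them’ step can be sketched in Python as grouping observed argument lemmas by a candidate role class and setting aside whatever the current classes do not cover, so that a human can decide whether a new class is needed. This is not an implementation of CPA: it assumes the argument lemmas have already been extracted from concordance lines, and the observation list is invented for illustration, not real corpus counts. The function `sort_contexts` and the seed mapping are hypothetical; the seed reuses classes from the slyšet/hear pattern.

```python
from collections import Counter, defaultdict

def sort_contexts(observations, lemma_to_class):
    """Group observed argument lemmas by candidate role class.

    observations: argument lemmas, one per corpus context (assumed pre-extracted).
    lemma_to_class: seed mapping from lemma to a candidate role class.
    Returns (classes ranked by total frequency, lemmas left for manual inspection).
    """
    by_class = defaultdict(Counter)
    unassigned = Counter()
    for lemma in observations:
        role_class = lemma_to_class.get(lemma)
        if role_class is None:
            unassigned[lemma] += 1
        else:
            by_class[role_class][lemma] += 1
    # sorted() is stable, so classes with equal totals keep first-seen order
    ranked = sorted(by_class.items(), key=lambda kv: sum(kv[1].values()), reverse=True)
    return ranked, unassigned

# Invented toy observations for the object of slyšet/hear; NOT real corpus data.
observed = ["výstřel", "křik", "hluk", "výstřel", "smích", "slovo", "rána", "řev"]
seed = {"hluk": "SOUND", "výstřel": "SHOOT", "rána": "SHOOT",
        "křik": "VOICE", "smích": "VOICE", "slovo": "WORD"}

ranked, to_inspect = sort_contexts(observed, seed)
for role_class, lemmas in ranked:
    print(role_class, dict(lemmas))
print("to inspect:", dict(to_inspect))
```

The unassigned bucket is where the inventory grows: a linguist either adds a lemma such as řev to an existing class or opens a new one.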
CPA solution