KONVENS Wien, 15 Sep 2004

EXMARaLDA – A modeling and visualization framework for the computer-assisted transcription of spoken language
Thomas Schmidt
SFB 538 „Mehrsprachigkeit“ (Multilingualism), University of Hamburg

Background
• Multilingual Database, SFB 538 „Mehrsprachigkeit“, University of Hamburg
• EXMARaLDA (Extensible Markup Language for Discourse Annotation)
• Dissertation project „Computer-based transcription of spoken language as a modelling and visualisation process“ (Supervisor: Angelika Storrer)

Background
• Transcription of spoken language
  – Interviewer / child interaction
  – Classroom interaction
  – Interpreted doctor-patient discourse
  – for discourse / conversation analysis
  – for (child) language acquisition studies

Background
• Problem: Diversity of Transcription Data
  – Theoretical diversity:
    • Entities of transcription (utterances, turns, non-verbal activities etc.)
    • Relations between entities (temporal, hierarchical, features, ...)
    • Presentation formats (partitur notation, column notation, ...)
  – Technological diversity:
    • Storage formats (text, binary, RDB)
    • Software (syncWriter, HIAT-DOS, DBM-Systems, word processors, ...)
    • Operating Systems (Windows, Mac OS)
• Aim: A common platform for computer-assisted transcription
  – Exchange, reuse, archive transcription data
  – Merge corpora
  – Use different software tools with one piece of data
• (Elements of a) Solution
  – XML technology
  – Three level architecture
  – Separate form from content
  – Separate logical from physical structure

Topics of this talk
1. Some methodological considerations
   – Linguistic methods ↔ Computer science methods
   – „Computing in the humanities“
   – Interdisciplinary communication
2. Components of the developed system

Methodological considerations
[Slide table; the layout was lost in extraction. Recoverable terms, grouped by the view they belong to:]
• Established view: transcription as „Verschriftlichung“ (rendering speech in writing) and as theory; quality criteria: readability, adequacy
• Modified view: computer transcription as visualisation and as modelling
• Model theory view: analogue model, symbolic model
• Database view: application, view, logical layer, E/R model
• Text technology view: document form, content

Methodological considerations
Transcription as Modeling and Visualization of spoken language
• Accordance with text-technological concepts
• One model, different visualizations
• No tradeoff between readability and adequacy
• No tradeoff between human and computer processability
• No “Standardization” of models
  – a common modelling framework, not a common model
  – no ontological specifications
  – XML = Standardization of physical representation

Visualization to Model
Structural relations:
1. Temporal sequence
2. Simultaneity
3. Equivalence (Entity ↔ Feature)
4. Hierarchy (Containment)

Modeling framework
• Relational? → Sequence? Simultaneity?
• OHCO? → Simultaneity?
• DAG: Annotation Graphs? → Complexity? → Transcription Graphs

System architecture
[Figure not preserved in the transcript.]

Application: Input tools
• EXMARaLDA Partitur-Editor
• Simple EXMARaLDA Text file
• TASX annotator
• PRAAT
• EUDICO Linguistic Annotator (ELAN)

Application: Visualization
• ... as a wrapped partitur
• ... as a line transcript
• ... in column notation

Application: Corpus management
• EXMARaLDA Corpus Manager (COMA)

Application: Query/Analysis
• Search and Query Instrument for EXMARaLDA (SQUIRREL)

Project status
• Software past beta stage
• Five projects at our own institution use EXMARaLDA for their corpus work
• Around 800 users in research and teaching outside SFB
• Used at the IDS in Mannheim
• Submitted a suggestion for integration of data model into P5 of the TEI guidelines

Summary
• Transcription as theory and „Verschriftlichung“ → Computer-assisted transcription as modelling and visualisation
• Interdisciplinary bridge / Methodology of computational techniques in „classical“ linguistics → Concrete practical improvements for work with transcription data
• EXMARaLDA and Database „Multilingualism“: Data model, formats and tools building on the separation of model and visualisation

Fin.
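
Illustration (not part of the talk): the "one model, different visualizations" idea can be sketched in a few lines of Python. This is a toy under stated assumptions, not EXMARaLDA's actual data model or file format. The `Event` class, the tier names "A" and "B", the sample utterances, and the simplification that every event spans exactly one interval of a shared timeline are all hypothetical. The sketch covers only two of the structural relations named in the talk, temporal sequence and simultaneity, and renders the same events once as a partitur (score) and once as a line transcript.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    tier: str   # hypothetical tier id, e.g. one speaker's verbal tier
    start: int  # index of the start point on the shared timeline
    end: int    # index of the end point (exclusive)
    text: str


# Hypothetical toy data: speakers A and B overlap in interval 1-2.
# Assumption for this sketch: each event spans exactly one interval.
EVENTS = [
    Event("A", 0, 1, "Hello "),
    Event("A", 1, 2, "there."),
    Event("B", 1, 2, "Hi! "),
    Event("B", 2, 3, "How are you?"),
]


def simultaneous(a: Event, b: Event) -> bool:
    """Two events on different tiers are simultaneous if their intervals overlap."""
    return a.tier != b.tier and a.start < b.end and b.start < a.end


def as_partitur(events: list[Event]) -> str:
    """Score layout: one row per tier, columns aligned by timeline interval."""
    tiers = sorted({e.tier for e in events})
    n_intervals = max(e.end for e in events)
    cells = {(e.tier, e.start): e.text for e in events}
    # Each column is as wide as the longest text placed in that interval.
    widths = [
        max(len(cells.get((t, i), "")) for t in tiers)
        for i in range(n_intervals)
    ]
    rows = []
    for t in tiers:
        row = "|".join(
            cells.get((t, i), "").ljust(widths[i]) for i in range(n_intervals)
        )
        rows.append(f"{t}: {row}".rstrip())
    return "\n".join(rows)


def as_line_transcript(events: list[Event]) -> str:
    """Line layout: events in start order, consecutive events of a tier merged."""
    lines: list[list[str]] = []
    for e in sorted(events, key=lambda e: (e.start, e.tier)):
        if lines and lines[-1][0] == e.tier:
            lines[-1][1] += e.text
        else:
            lines.append([e.tier, e.text])
    return "\n".join(f"{tier}: {text}" for tier, text in lines)


if __name__ == "__main__":
    print(as_partitur(EVENTS))
    print()
    print(as_line_transcript(EVENTS))
```

Both functions read the same `EVENTS` list, so the choice of layout is made only at output time. The partitur keeps the overlap between A and B visible as a shared column, while the line transcript trades that information for a simpler reading order.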