400 Language, Linguistics
Publisher
- Association for Computational Linguistics (36)
MULLE is a tool for language learning that focuses on teaching Latin as a foreign language. It is designed for easy integration into the traditional classroom setting and syllabus, which distinguishes it from other language learning tools that provide a standalone learning experience. It uses grammar-based lessons and embraces gamification methods to improve learner motivation. The main type of exercise provided by our application is translation practice, but the focus can also be shifted to vocabulary or morphology training.
In this paper, we investigate the practical applicability of Co-Training for the task of building a classifier for reference resolution. We are concerned with the question of whether Co-Training can significantly reduce the amount of manual labeling work and still produce a classifier with acceptable performance.
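As a rough illustration of the general Co-Training idea mentioned in this abstract, the following minimal sketch trains one classifier per feature "view" and lets each add its most confident predictions to a shared labeled pool. Everything here is an assumption made for illustration: the one-dimensional views, the nearest-centroid classifier, and the names `train_centroids`, `predict`, and `co_train` are hypothetical and are not taken from the paper, whose features and learners are not described in the abstract.

```python
from typing import Dict, List, Tuple

Example = Tuple[float, float]  # (view 0 feature, view 1 feature); toy data only


def train_centroids(values: List[float], labels: List[int]) -> Dict[int, float]:
    """Compute the mean feature value per class (0 and 1) for one view."""
    means = {}
    for cls in (0, 1):
        vals = [v for v, y in zip(values, labels) if y == cls]
        means[cls] = sum(vals) / len(vals)
    return means


def predict(means: Dict[int, float], value: float) -> Tuple[int, float]:
    """Return (label, confidence); confidence is the gap between centroid distances."""
    d0, d1 = abs(value - means[0]), abs(value - means[1])
    return (0 if d0 < d1 else 1), abs(d0 - d1)


def co_train(labeled, labels, unlabeled, rounds=2, per_round=1):
    """Hypothetical co-training loop: the two views take turns labeling
    their most confident unlabeled examples for the shared pool."""
    labeled, labels, unlabeled = list(labeled), list(labels), list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not unlabeled:
                break
            # Train this view's classifier on everything labeled so far.
            means = train_centroids([x[view] for x in labeled], labels)
            # Rank unlabeled examples by this view's confidence.
            ranked = sorted(
                unlabeled, key=lambda x: predict(means, x[view])[1], reverse=True
            )
            for x in ranked[:per_round]:
                labeled.append(x)
                labels.append(predict(means, x[view])[0])
                unlabeled.remove(x)
    return labeled, labels


seed = [(0.0, 0.0), (1.0, 1.0)]          # two manually labeled examples
seed_labels = [0, 1]
pool = [(0.2, 0.3), (0.8, 0.7), (0.3, 0.1), (0.7, 0.9)]  # unlabeled examples
final_examples, final_labels = co_train(seed, seed_labels, pool)
```

In this toy setup only two examples are labeled by hand and the remaining four are labeled automatically, which is the kind of reduction in manual labeling effort the abstract asks about.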
We present a light-weight tool for the annotation of linguistic data on multiple levels. It is based on the simplification of annotations to sets of markables that have attributes and stand in certain relations to each other. We describe the main features of the tool, emphasizing its simplicity, customizability, and versatility.
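The notion of annotations as "sets of markables having attributes and standing in certain relations to each other" could be modeled roughly as follows. This is only a sketch of the idea, not the tool's actual data format: the class names `Markable` and `Relation`, the level name `coref`, and the helper functions are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Markable:
    """A token span on one annotation level, carrying free-form attributes."""
    id: str
    level: str
    span: Tuple[int, int]  # token offsets, end exclusive
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class Relation:
    """A named, directed link from one markable to another."""
    name: str
    source: str
    target: str


tokens = "Anna bought a book . She liked it .".split()

markables = {
    "m1": Markable("m1", "coref", (0, 1), {"type": "name"}),
    "m2": Markable("m2", "coref", (2, 4), {"type": "np"}),
    "m3": Markable("m3", "coref", (5, 6), {"type": "pronoun"}),
    "m4": Markable("m4", "coref", (7, 8), {"type": "pronoun"}),
}

relations = [
    Relation("antecedent", "m3", "m1"),
    Relation("antecedent", "m4", "m2"),
]


def text_of(markable: Markable) -> str:
    """Return the surface string covered by a markable."""
    start, end = markable.span
    return " ".join(tokens[start:end])


def antecedent_text(markable_id: str) -> Optional[str]:
    """Follow the 'antecedent' relation from a markable, if one exists."""
    for rel in relations:
        if rel.name == "antecedent" and rel.source == markable_id:
            return text_of(markables[rel.target])
    return None
```

Because every level reduces to the same two record types, adding a new annotation level in this sketch only means choosing a new `level` name and attribute set.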
We apply a decision-tree-based approach to pronoun resolution in spoken dialogue. Our system deals with pronouns with NP- and non-NP-antecedents. We present a set of features designed for pronoun resolution in spoken dialogue and determine the most promising features. We evaluate the system on twenty Switchboard dialogues and show that it compares well to Byron’s (2002) manually tuned system.
We present an implemented XML data model and a new, simplified query language for multi-level annotated corpora. The new query language involves automatic conversion of queries into the underlying, more complicated MMAXQL query language. It supports queries for sequential and hierarchical, but also associative (e.g. coreferential) relations. The simplified query language has been designed with non-expert users in mind.
We present an implemented machine learning system for the automatic detection of nonreferential it in spoken dialog. The system builds on shallow features extracted from dialog transcripts. Our experiments indicate a level of performance that makes the system usable as a preprocessing filter for a coreference resolution system. We also report results of an annotation study dealing with the classification of it by naive subjects.
We present an implemented system for the resolution of it, this, and that in transcribed multi-party dialog. The system handles NP-anaphoric as well as discourse-deictic anaphors, i.e. pronouns with VP antecedents. Selectional preferences for NP or VP antecedents are determined on the basis of corpus counts. Our results show that the system performs significantly better than a recency-based baseline.
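For readers unfamiliar with the term, a recency-based baseline typically resolves a pronoun to the closest preceding candidate. The sketch below shows one plausible reading of that idea; the abstract does not specify how the paper's baseline is defined, so the function name `recency_baseline` and the `(position, text)` candidate representation are assumptions.

```python
from typing import List, Optional, Tuple

Candidate = Tuple[int, str]  # (token position, candidate text); hypothetical format


def recency_baseline(
    pronoun_pos: int, candidates: List[Candidate]
) -> Optional[Candidate]:
    """Pick the candidate antecedent that most closely precedes the pronoun."""
    preceding = [c for c in candidates if c[0] < pronoun_pos]
    if not preceding:
        return None  # no candidate occurs before the pronoun
    return max(preceding, key=lambda c: c[0])


candidates = [(0, "the report"), (4, "the meeting"), (12, "the agenda")]
chosen = recency_baseline(9, candidates)
```

Such a baseline ignores whether the pronoun prefers an NP or a VP antecedent, which is the information the described system derives from corpus counts.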
We present WOMBAT, a Python tool which supports NLP practitioners in accessing word embeddings from code. WOMBAT addresses common research problems, including unified access, scaling, and robust and reproducible preprocessing. Code that uses WOMBAT for accessing word embeddings is not only cleaner, more readable, and easier to reuse, but also much more efficient than code using standard in-memory methods: a Python script using WOMBAT for evaluating seven large word embedding collections (8.7M embedding vectors in total) on a simple SemEval sentence similarity task involving 250 raw sentence pairs completes in under ten seconds end-to-end on a standard notebook computer.
We describe a simple procedure for the automatic creation of word-level alignments between printed documents and their respective full-text versions. The procedure is unsupervised, uses standard, off-the-shelf components only, and reaches an F-score of 85.01 in the basic setup and up to 86.63 when using pre- and post-processing. Potential areas of application are manual database curation (incl. document triage) and biomedical expression OCR.
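To make the task concrete, the sketch below aligns a noisy OCR token sequence with its full-text counterpart and scores the result with an F-score over alignment pairs. It uses Python's standard `difflib.SequenceMatcher` purely as a stand-in: the abstract does not name the off-the-shelf components used, and the functions `align_tokens` and `f_score` as well as the sample tokens are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Iterable, List, Tuple

Pair = Tuple[int, int]  # (index in OCR tokens, index in full-text tokens)


def align_tokens(ocr_tokens: List[str], text_tokens: List[str]) -> List[Pair]:
    """Pair up tokens that fall inside identical matching blocks."""
    matcher = SequenceMatcher(a=ocr_tokens, b=text_tokens, autojunk=False)
    pairs = []
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            pairs.append((block.a + offset, block.b + offset))
    return pairs


def f_score(predicted: Iterable[Pair], gold: Iterable[Pair]) -> float:
    """Harmonic mean of precision and recall over alignment pairs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy example: the OCR output misreads "enzyme" as "enzvme".
ocr = ["The", "enzvme", "catalyzes", "the", "reaction"]
text = ["The", "enzyme", "catalyzes", "the", "reaction"]
gold = [(i, i) for i in range(len(text))]

pairs = align_tokens(ocr, text)
score = f_score(pairs, gold)
```

Exact matching leaves the misrecognized token unaligned, which lowers recall; handling such cases is presumably where pre- and post-processing of the kind the abstract mentions would help.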