Ungoliant: An optimized pipeline for the generation of a very large-scale multilingual web corpus
(2021)
Since the introduction of large language models in Natural Language Processing, large raw corpora have played a crucial role in Computational Linguistics. Most of these corpora, however, are either available only for English or not available to the general public due to copyright issues. Some freely available multilingual corpora for training Deep Learning NLP models do exist, such as the OSCAR and Paracrawl corpora, but they have quality issues, especially for low-resource languages, and recreating or updating them is very complex. In this work, we try to reproduce and improve the goclassy pipeline used to create the OSCAR corpus. We propose a new pipeline that is faster, modular, parameterizable, and well documented. We use it to create a corpus similar to OSCAR but larger and based on recent data. Also, unlike OSCAR, the metadata information is at the document level. We release our pipeline under an open source license and publish the corpus under a research-only license.
Preface
(2019)
Preface
(2020)
The automatic recognition of idioms poses a challenging problem for NLP applications. Whereas native speakers can intuitively handle multiword expressions whose compositional meanings are hard to trace back to individual word semantics, there is still ample scope for improvement regarding computational approaches. We assume that idiomatic constructions can be characterized by gradual intensities of semantic non-compositionality, formal fixedness, and unusual usage context, and introduce a number of measures for these characteristics, comprising count-based and predictive collocation measures together with measures of context (un)similarity. We evaluate our approach on a manually labelled gold standard, derived from a corpus of German pop lyrics. To this end, we apply a Random Forest classifier to analyze the individual contribution of features for automatically detecting idioms, and study the trade-off between recall and precision. Finally, we evaluate the classifier on an independent dataset of idioms extracted from a list of Wikipedia idioms, achieving state-of-the-art accuracy.
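As a rough illustration of what a count-based collocation measure can look like, the sketch below computes pointwise mutual information (PMI) for adjacent word pairs. PMI is a common measure of this kind, but the abstract does not say which measures the authors use, so the choice of PMI, the function name `pmi_scores`, and the toy token list are all assumptions made for this example.

```python
import math
from collections import Counter


def pmi_scores(tokens):
    """Return PMI (log base 2) for every adjacent word pair in a token list.

    Hypothetical helper for illustration only; not the authors' feature set.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)  # avoid division by zero on tiny inputs

    scores = {}
    for (w1, w2), count in bigrams.items():
        p_pair = count / n_bi            # relative frequency of the pair
        p_w1 = unigrams[w1] / n_uni      # relative frequency of each word
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


if __name__ == "__main__":
    toy = "a b a b c d".split()
    for pair, score in sorted(pmi_scores(toy).items(), key=lambda kv: -kv[1]):
        print(pair, round(score, 3))
```

A pair whose words rarely occur apart receives a high score, which is one way of approximating formal fixedness. Scores of this kind could then serve as input features for a classifier.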
In order to differentiate between figurative and literal usage of verb-noun combinations for the shared task on the disambiguation of German Verbal Idioms issued for KONVENS 2021, we apply and extend an approach originally developed for detecting idioms in a dataset consisting of random ngram samples. The classification is done by implementing a rather shallow, statistics-based pipeline without intensive preprocessing and examinations on the morphosyntactic and semantic level. We describe the overall approach, the differences between the original dataset and the dataset of the KONVENS task, provide experimental classification results, and analyse the individual contributions of our feature sets.
This study investigates cross-language differences in pitch range and variation in four languages from two language groups: English and German (Germanic) and Bulgarian and Polish (Slavic). The analysis is based on large multi-speaker corpora (48 speakers for Polish, 60 for each of the other three languages). Linear mixed models were computed that include various distributional measures of pitch level, span and variation, revealing characteristic differences across languages and between language groups. A classification experiment based on the relevant parameter measures (span, kurtosis and skewness values for pitch distributions for each speaker) succeeded in separating the language groups.
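The per-speaker parameters named above (span, kurtosis and skewness of the pitch distribution) can be sketched as follows. The abstract does not give exact definitions, so this example assumes span as the maximum minus the minimum f0 value and uses population moments for skewness and excess kurtosis. The function name `pitch_profile` and the input values are hypothetical.

```python
import math


def pitch_profile(f0_values):
    """Summarise one speaker's f0 samples (e.g. in Hz, voiced frames only).

    Assumed definitions, for illustration:
      span     = max - min
      skewness = m3 / m2**1.5      (population moments)
      kurtosis = m4 / m2**2 - 3    (excess kurtosis)
    """
    n = len(f0_values)
    if n < 2:
        raise ValueError("need at least two f0 samples")

    mean = sum(f0_values) / n
    # Central moments of order 2, 3 and 4.
    m2 = sum((x - mean) ** 2 for x in f0_values) / n
    m3 = sum((x - mean) ** 3 for x in f0_values) / n
    m4 = sum((x - mean) ** 4 for x in f0_values) / n
    if m2 == 0:
        raise ValueError("f0 samples are constant; shape measures undefined")

    return {
        "level": mean,
        "span": max(f0_values) - min(f0_values),
        "skewness": m3 / math.pow(m2, 1.5),
        "kurtosis": m4 / (m2 ** 2) - 3.0,
    }


if __name__ == "__main__":
    print(pitch_profile([100.0, 100.0, 200.0, 200.0]))
```

One such dictionary per speaker would give the feature vector for a classification experiment. A real analysis might well use a percentile-based span to reduce sensitivity to pitch-tracking errors.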
This study presents the results of a large-scale comparison of various measures of pitch range and pitch variation in two Slavic (Bulgarian and Polish) and two Germanic (German and British English) languages. The productions of twenty-two speakers per language (eleven male and eleven female) in two different tasks (read passages and number sets) are compared. Significant differences between the language groups are found: German and English speakers use lower pitch maxima, narrower pitch span, and generally less variable pitch than Bulgarian and Polish speakers. These findings support the hypothesis that linguistic communities tend to be characterized by particular pitch profiles.
Based on specific linguistic landmarks in the speech signal, this study investigates pitch level and pitch span differences in English, German, Bulgarian and Polish. The analysis is based on 22 speakers per language (11 males and 11 females). Linear mixed models were computed that include various linguistic measures of pitch level and span, revealing characteristic differences across languages and between language groups. Pitch level was significantly higher for the female speakers in the Slavic group than for those in the Germanic group. The male speakers showed slightly different results, with only the Polish speakers displaying significantly higher mean values for pitch level than the German males. Overall, the results show that the Slavic speakers tend to have a wider pitch span than the German speakers. For the linguistic measure of span, however, namely the span between the initial peaks and the non-prominent valleys, we find a difference only between Polish and German speakers. We found a flatter intonation contour for German speakers than for Polish, Bulgarian and English speakers, both male and female, as well as differences in the frequency of the landmarks between languages. Concerning “speaker liveliness”, we found that the speakers from the Slavic group are significantly livelier than the speakers from the Germanic group.
In this paper, we describe a data processing pipeline used for annotated spoken corpora of Uralic languages created in the INEL (Indigenous Northern Eurasian Languages) project. With this processing pipeline we convert the data into a lossless standard format (ISO/TEI) for long-term preservation while simultaneously enabling a powerful search in this version of the data. For each corpus, the input we are working with is a set of files in EXMARaLDA XML format, which contain transcriptions, multimedia alignment, morpheme segmentation and other kinds of annotation. The first step of processing is the conversion of the data into a certain subset of TEI following the ISO standard “Transcription of spoken language” with the help of an XSL transformation. The primary purpose of this step is to obtain a representation of our data in a standard format, which will ensure its long-term accessibility. The second step is the conversion of the ISO/TEI files to a JSON format used by the “Tsakorpus” search platform. This step allows us to make the corpora available through a web-based search interface. In addition, the existence of such a converter allows other spoken corpora with ISO/TEI annotation to be made accessible online in the future.
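The second step described above, turning ISO/TEI into JSON, might look roughly like the following minimal sketch. It is not the INEL converter: the sample document is a heavily simplified TEI fragment invented for this example, and the output field names (`id`, `speaker`, `words`, `text`) are hypothetical and do not reproduce the actual Tsakorpus JSON schema.

```python
import json
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # expanded form of xml:id

# Invented, simplified input; real ISO/TEI files carry far more annotation.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <annotationBlock who="#SPK0">
      <u xml:id="u1"><seg><w>hello</w><w>world</w></seg></u>
    </annotationBlock>
  </body></text>
</TEI>"""


def utterances_to_json(tei_xml):
    """Flatten each <u> element into one JSON record (hypothetical schema)."""
    root = ET.fromstring(tei_xml)
    records = []
    for block in root.iterfind(".//tei:annotationBlock", TEI_NS):
        speaker = block.get("who", "").lstrip("#")
        for u in block.iterfind("tei:u", TEI_NS):
            words = [w.text or "" for w in u.iterfind(".//tei:w", TEI_NS)]
            records.append({
                "id": u.get(XML_ID),
                "speaker": speaker,
                "words": words,
                "text": " ".join(words),
            })
    return json.dumps(records, ensure_ascii=False)


if __name__ == "__main__":
    print(utterances_to_json(SAMPLE))
```

Because the converter reads the standard format and not the EXMARaLDA source files, any corpus already available as ISO/TEI could in principle be passed through the same step.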
This paper presents the QUEST project and describes concepts and tools that are being developed within its framework. The goal of the project is to establish quality criteria and curation criteria for annotated audiovisual language data. Building on existing resources developed earlier by the participating institutions, QUEST develops tools that can be used to facilitate and verify adherence to these criteria. An important focus of the project is making these tools accessible to researchers without a substantial technical background and helping them produce high-quality data. The main tools we intend to provide are the depositors’ questionnaire and automatic quality assurance, both developed as web applications. They are accompanied by a knowledge base, which will contain recommendations and descriptions of best practices established in the course of the project. Conceptually, we split linguistic data into three resource classes (data deposits, collections and corpora). The class of a resource defines the strictness of the quality assurance it should undergo. This division is introduced so that overly strict quality criteria do not prevent researchers from depositing their data.