Author: Benko, Vladimír
Year of publication: 2019
Document type: Conference Proceeding
Language: English
Has full text: yes
Is part of the Bibliography: no
Keywords: corpus processing
Publication state: Published version
Review state: Peer-reviewed
Publisher: Leibniz-Institut für Deutsche Sprache
1 search hit
Deduplication in large web corpora
(2019)
Benko, Vladimír
Our paper seeks to answer several questions concerning the deduplication process in large-scale web-crawled corpora. An experiment based on eight corpora from the Aranea family is introduced, and first results are presented.