{{Extra license|[https://creativecommons.org/publicdomain/zero/1.0/ CC0 1.0]|it mostly consists of
{{Hatnote|This page was previously located at [https://pad.snopyta.org/lDb2EfOZQpmleu-ZktbDzg pad.snopyta.org].}}
== davidar’s metacorpus ==
* type: varied
* quality: varied (i’ve heard jan Kipo tampered with his corpus to conform to his idea of toki pona grammar)
* dialect: mostly old/pu
* size: 4675k, ''but'' a lot of it is english or duplicated, so it still needs a proper count
* license: none overall; some parts are under various cc licenses (247k, mostly by jan Kipo)
* preprocessing: probably a lot
* where: https://github.com/davidar/nltk-tp/tree/master/Corpus
note that this corpus contains the entirety of:
* the [https://github.com/matthewdeanmartin/tokipona.parser/tree/master/TokiPonaTools/TokiPona/corpus/forums TokiPonaTools] corpus (104k)
* the jan Kipo corpus [with english] (3281k)
* the Little Prince translation [with english if you wanna look for it] (54k)