Machine translation

From sona pona, the Toki Pona wiki

Machine translation from Toki Pona into another language is relatively easy because of its simple grammar. The other direction is exceptionally difficult: most words in the source language have no single Toki Pona equivalent and must be rendered as ad-hoc descriptions rather than found by a simple dictionary lookup. This would require a semantic decomposition algorithm.

Word-level translation

The simplest way to automatically turn Toki Pona into something resembling another language is to replace each word with its definition. This process may be called glossing or relexing. While it doesn't result in grammatical English text, it can still be "readable enough".

The earliest known public implementation is TokiPana by Sean B. Palmer, with archives reaching as far back as March 2006. It picks a random definition from the Classic Word List, resulting in a different translation each time.

tenpo mute la mi tu li lon poka.

time many it's said I pair double is be present side.
time very it's said we my two duo is be in/at/on be there hip.[note 1]
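
The random-definition approach can be sketched in a few lines of Python. This is only an illustration, not TokiPana's actual code: the word list is a hypothetical stand-in assembled from the glosses in the sample output above (not the Classic Word List itself), and the names `WORD_LIST` and `gloss_random` are invented for this example.

```python
import random
import string

# Hypothetical mini word list, built from the sample glosses shown above.
WORD_LIST = {
    "tenpo": ["time"],
    "mute": ["many", "very"],
    "la": ["it's said"],
    "mi": ["I", "we", "my"],
    "tu": ["pair", "double", "two", "duo"],
    "li": ["is"],
    "lon": ["be present", "be in/at/on", "be there"],
    "poka": ["side", "hip"],
}


def gloss_random(sentence, word_list=WORD_LIST, rng=random):
    """Replace each known word with one randomly chosen definition."""
    glossed = []
    for token in sentence.split():
        # Separate the word from surrounding punctuation such as a final period.
        word = token.strip(string.punctuation)
        options = word_list.get(word.lower())
        if word and options:
            glossed.append(token.replace(word, rng.choice(options), 1))
        else:
            # Unknown words are passed through unchanged.
            glossed.append(token)
    return " ".join(glossed)


print(gloss_random("tenpo mute la mi tu li lon poka."))
```

Because the definition is picked at random, repeated calls generally give different output; passing a seeded `random.Random` instance as `rng` makes a run repeatable.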

A more modern implementation is the /relex command of ilo Linku. For each word, it uses the most popular translation from the freely available subset of Toki Pona Dictionary data; if the word has no such translation, it falls back to the first word of the definition.[1]

tenpo mute la mi tu li lon poka.

time plenty if me two is at side.
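
The "most popular translation, else first word of the definition" rule can be sketched as follows. This is not ilo Linku's code or data format: the `ENTRIES` structure, its usage counts, its definition strings, and the function names are all assumptions made up for illustration.

```python
import re

# Hypothetical data: "popular" maps candidate translations to invented usage
# counts; "definition" is an invented free-text definition used as a fallback.
ENTRIES = {
    "tenpo": {"popular": {"time": 3, "moment": 1}},
    "mute": {"popular": {"many": 1, "plenty": 2}},
    "la": {"popular": {}, "definition": "if, when"},
    "mi": {"popular": {"me": 2, "I": 1}},
    "tu": {"popular": {"two": 1}},
    "li": {"definition": "is (marks the predicate)"},
    "lon": {"definition": "at, present, real"},
    "poka": {"popular": {"side": 2, "hip": 1}},
}


def relex_word(word, entries=ENTRIES):
    """Pick the most popular translation, else the definition's first word."""
    entry = entries.get(word)
    if entry is None:
        return word  # unknown word: leave untouched
    popular = entry.get("popular") or {}
    if popular:
        return max(popular, key=popular.get)
    first = re.split(r"[\s,;]+", entry.get("definition", "").strip())[0]
    return first or word


def relex(sentence, entries=ENTRIES):
    """Relex every run of letters, keeping punctuation and spacing as-is."""
    return re.sub(
        r"[A-Za-z]+",
        lambda match: relex_word(match.group(0).lower(), entries),
        sentence,
    )


print(relex("tenpo mute la mi tu li lon poka."))
# With the invented data above: time plenty if me two is at side.
```

Unlike the random approach, this lookup is deterministic, so the same sentence always yields the same gloss for a given data set.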

  1. Notably, some words receive two definitions each, perhaps because the line break in the word list is not interpreted as a separator.

Rule-based translation

A more advanced approach is to parse the grammar of the sentence and then reconstruct it in the target language. Likely the first such tool was implemented as "English Gloss" in jan Mato's Toki Pona Parser, once hosted on the now-defunct tokipona.net website (archive link).

ilo Token, another rule-based translator, is being developed by jan Koko.
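
The parse-then-reconstruct idea described in this section can be illustrated with a deliberately tiny sketch. It does not reflect how Toki Pona Parser or ilo Token work. It assumes a sentence with at most one "la" context phrase and one "li" predicate, and the `GLOSS` table, `PREPOSITIONS` set, and function names are all hypothetical.

```python
# Hypothetical one-word glosses and preposition list, for illustration only.
GLOSS = {"tenpo": "time", "mute": "many", "mi": "we", "tu": "two",
         "lon": "at", "poka": "side"}
PREPOSITIONS = {"lon", "tawa", "tan", "kepeken", "sama"}


def parse(sentence):
    """Split a simple sentence into context, subject and predicate phrases."""
    words = sentence.strip().rstrip(".!?").lower().split()
    context = []
    if "la" in words:
        i = words.index("la")
        context, words = words[:i], words[i + 1:]
    if "li" in words:
        i = words.index("li")
        subject, predicate = words[:i], words[i + 1:]
    else:
        # A bare "mi" or "sina" subject takes no "li".
        subject, predicate = words[:1], words[1:]
    return {"context": context, "subject": subject, "predicate": predicate}


def noun_phrase(words):
    """Toki Pona modifiers follow the head; English puts them before it."""
    return " ".join(GLOSS.get(w, w) for w in reversed(words))


def to_english(sentence):
    """Reassemble the parsed phrases in a rough English order."""
    tree = parse(sentence)
    predicate = tree["predicate"]
    if predicate and predicate[0] in PREPOSITIONS:
        prep = GLOSS.get(predicate[0], predicate[0])
        rendered = f"be {prep} {noun_phrase(predicate[1:])}".strip()
    else:
        rendered = " ".join(GLOSS.get(w, w) for w in predicate)
    clause = f"{noun_phrase(tree['subject'])} {rendered}".strip()
    if tree["context"]:
        clause = f"{noun_phrase(tree['context'])}, {clause}"
    return clause + "."


print(parse("tenpo mute la mi tu li lon poka."))
print(to_english("tenpo mute la mi tu li lon poka."))
```

The result is still far from natural English, since there is no agreement, pluralisation, or handling of "e" objects and "pi" groupings. It does show the step that word-level glossing lacks: phrases are identified first and reordered afterwards.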

Machine learning

There have been several attempts to create translators for Toki Pona based on neural networks, often relying on the Tatoeba corpus. In theory, this approach would make translations from Toki Pona more natural and would also allow translating other languages into it. In practice, these models often produce incorrect output because of the small amount of training data available.

Publicly available attempts include:

Example translations from ckb's models:

tenpo mute la mi tu li lon poka.

We often go hand in hand.

You and I have been through a lot together.

sina en mi li awen lon poka pi mute mute.[note 1]

Some people have attempted to use ChatGPT for translation, with poor results.

tenpo mute la mi tu li lon poka.

Most of the time, my two exist together.

You and I have been through a lot together.

mi en sina li tawa e pona mute lon poka.[note 2]

  1. Literally: You and I stay many-muchly together.
  2. Literally: Me and you move much good nearby.

References

  1. (25 October 2021). "ilo/src/ilo/relexer.py at main · lipu-linku/ilo". GitHub. Retrieved 16 January 2024.