From sona pona, the English–Toki Pona wiki

Unicode (often tokiponized as nasin Juniko) is a text encoding standard designed to support every major writing system, avoiding the incompatibilities between earlier character sets. Each character in Unicode is assigned a codepoint, usually written as "U+" followed by its number in hexadecimal. Most text on the Internet is encoded in Unicode.
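
The "U+" notation can be illustrated with a short Python sketch. The helper names format_codepoint and parse_codepoint are made up for this example; only the built-in ord, chr, and int are relied on.

```python
def format_codepoint(ch: str) -> str:
    """Return the conventional "U+XXXX" label for a single character."""
    # ord() gives the codepoint as an integer; by convention the
    # hexadecimal form is padded to at least four digits.
    return f"U+{ord(ch):04X}"


def parse_codepoint(label: str) -> str:
    """Turn a "U+XXXX" label back into the character it names."""
    if not label.upper().startswith("U+"):
        raise ValueError(f"not a codepoint label: {label!r}")
    return chr(int(label[2:], 16))


print(format_codepoint("A"))       # U+0041
print(format_codepoint("\u3000"))  # U+3000
print(parse_codepoint("U+0041"))   # A
```

Codepoints above U+FFFF simply use more digits, so the same helper labels chr(0x1F600) as U+1F600.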

Toki Pona

As of 2023, Unicode does not include any Toki Pona writing systems. Many tokiponists hope for sitelen pona and sitelen sitelen to eventually receive Unicode support. In the meantime, sitelen pona has been specified for the UCSUR, adding unofficial support within a Private Use Area of Unicode.
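
What "unofficial support within a Private Use Area" means in practice can be sketched in Python. The block range U+F1900 to U+F19FF used below is an assumption taken from the UCSUR registry as commonly cited, not from this page, and should be checked against the registry itself; the function names are illustrative only.

```python
import unicodedata

# Assumed range of the UCSUR sitelen pona block (verify against the UCSUR registry).
SITELEN_PONA_START = 0xF1900
SITELEN_PONA_END = 0xF19FF


def is_private_use(ch: str) -> bool:
    """True if Unicode classifies the character as private use (general category "Co")."""
    return unicodedata.category(ch) == "Co"


def in_sitelen_pona_block(ch: str) -> bool:
    """True if the character falls inside the assumed UCSUR sitelen pona range."""
    return SITELEN_PONA_START <= ord(ch) <= SITELEN_PONA_END


ch = chr(0xF1900)
print(is_private_use(ch))                 # True
print(in_sitelen_pona_block(ch))          # True
# Private-use codepoints have no official name, so a default must be supplied.
print(unicodedata.name(ch, "<no name>"))  # <no name>
```

Because Unicode itself assigns no meaning to these codepoints, the text only displays as sitelen pona when a font that follows the UCSUR assignment is installed.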

Proposal roadmap

Several issues may have to be resolved before Toki Pona is proposed for Unicode. These have been discussed in the #nasin-Juniko thread (under the #pali-musi channel) of the ma pona pi toki pona Discord server. Potential issues include:

  • Recency. While the ISO 639-3 code ought to relieve concerns that Toki Pona itself is transient, the words and features used to write it are still in flux, and might need time to stabilize before a Unicode proposal is feasible. For comparison, the Deseret and Shavian alphabets have a longer history and seem to be completely stable. Shavian dates only to around 1960 and was encoded in 2003. A naïve comparison places Toki Pona encoding in the 2040s or 2050s. However, its scripts will probably continue to be used online much more than comparable constructed scripts in Unicode, which could bring demand sooner, perhaps as long as usage continues into the 2030s.
  • Font standardization. Many fonts support different features and sets of characters, and implement them in different ways.
    • In sitelen pona, the UCSUR has resolved this to an extent, yet some common features remain unstandardized, such as directional ni, te and to, and alternative glyphs (possibly handled in OpenType?). It is also unclear whether spaces should be written with the ordinary halfwidth space (U+0020) or the ideographic space (U+3000). Also, there may be established UCSUR codepoints for features that should be handled in OpenType instead of Unicode.
    • sitelen sitelen has not been properly implemented in a font because of its nonlinear rendering. Supporting it would require more involved research and discussion with the Unicode Consortium, and the script may be considered too unusual to support.
  • nimi sin. Which words ought to be encoded beyond those in the UCSUR is a point ripe for debate.
    • While the Linku usage data seems useful for determining a cutoff point, many sub-widespread words have seen notable use and may be necessary to encode existing documents.
    • The prospect of adding nimi sin over time, as they meet some criteria for inclusion, is evocative of new emoji being added to account for the limitations of the original set. The Unicode Consortium may be motivated to avoid a similar situation by avoiding or postponing support for Toki Pona. If not, there could be a system outside of Unicode for encoding nimi sin, possibly using a new UCSUR block.[1] This would come at the expense of complicating the encoding situation rather than submitting one proposal and being done with it.
    • The encoding of different scripts may become inconsistent if their nimi sin glyphs are encoded in different orders.
  • Commerciality. While the Toki Pona logo would not receive its own codepoint, it occurs as part of the sitelen pona glyph for pu. Moreover, pu and ku refer to and graphically represent commercial products that are not fully in the public domain. (This is not an issue in sitelen sitelen, which writes pu and ku as generic syllable glyphs instead.)
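
The space question raised under font standardization above can be made concrete with a small Python sketch. It uses only the standard unicodedata module; the function normalize_spaces and the Latin-script sample text are illustrative stand-ins, not part of any existing font or tool.

```python
import unicodedata

HALFWIDTH_SPACE = "\u0020"    # SPACE
IDEOGRAPHIC_SPACE = "\u3000"  # IDEOGRAPHIC SPACE

# Both are space separators, but they differ in East Asian width class.
for ch in (HALFWIDTH_SPACE, IDEOGRAPHIC_SPACE):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.east_asian_width(ch))


def normalize_spaces(text: str, target: str = HALFWIDTH_SPACE) -> str:
    """Rewrite both kinds of space to a single chosen one."""
    table = {ord(HALFWIDTH_SPACE): target, ord(IDEOGRAPHIC_SPACE): target}
    return text.translate(table)


mixed = "toki\u3000pona li pona"
print(normalize_spaces(mixed) == "toki pona li pona")  # True
# NFKC compatibility normalization also folds U+3000 into U+0020.
print(unicodedata.normalize("NFKC", IDEOGRAPHIC_SPACE) == HALFWIDTH_SPACE)  # True
```

One consequence worth noting: any pipeline that applies NFKC normalization would erase the distinction, which may matter if the ideographic space is chosen for its width.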

References

  1. jan Pensa [@jpensa]. (2 November 2023). [Pinned message posted in the #nasin-Juniko thread in the #pali-musi channel in the ma pona pi toki pona Discord server]. Discord.

    My idea for encoding rare nimi sin that fonts still want to support, was to basically have two separate standards. Here's how I'd want that to look like in (hopefully) 1 or 2 years:

    In the actual Unicode standard we want the 137 nimi ku suli, and as many other common words as we and the Unicode Consortium feel comfortable giving a permanent codepoint. ("majuna" and "apeja" probably yes, "Pingo" almost definitely no)

    Then in addition to the "core" standard, we can have a UCSUR block of "Extended Sitelen Pona", where we can dump all the experimental features and all obscure nimi sin that font makers want to support. This could include Pingo, sutopatikuna, molusa, extended preposition glyphs, and the cool diacritic-like things linja sike supports (like writing a little o under a verb instead of using o in front)

    Once a new word or other feature becomes common enough, we can apply to permanently add it to actual Unicode, and deprecate its old UCSUR codepoint (i.e. recommend people to stop using the UCSUR codepoint when a proper Unicode codepoint has been added)

    I think this way we can have the best of both worlds. Words and features that are widely used are widely supported according to Unicode standards, but font makers and users are still free to experiment and be creative using the Extended encodings.

    (And as for which nimi sin to add to the Extended block, I think for the time being we could add any word that any fontmaker wants to support in their font. Once SP font making becomes more common, perhaps when 2 or 3 font makers promise to add it to a font, or something like that.)