Unicode

From sona pona, the Toki Pona wiki
Caution: The subject of this article is undergoing change. Specific and technical details here might not be up-to-date. This article will need to be updated once the subject is no longer in flux.
Logo of the Unicode Consortium

Unicode (often tokiponized as nasin Juniko) is a text encoding standard designed to support every major writing system, avoiding the incompatibilities between different character sets. Each character in Unicode is assigned a codepoint, which is often written as "U+" followed by its index in hexadecimal. Most text on the Internet is encoded in Unicode.
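The "U+" notation can be illustrated with a minimal Python sketch. The function names format_codepoint and parse_codepoint are hypothetical and exist only for this example. Padding to at least four hexadecimal digits follows the usual convention.

```python
def format_codepoint(char: str) -> str:
    """Return the conventional "U+XXXX" notation for a single character."""
    if len(char) != 1:
        raise ValueError("expected exactly one character")
    # ord() gives the codepoint as an integer; it is printed in uppercase
    # hexadecimal, padded to at least four digits.
    return f"U+{ord(char):04X}"


def parse_codepoint(notation: str) -> str:
    """Turn a string such as "U+0041" back into the character it names."""
    if not notation.upper().startswith("U+"):
        raise ValueError("expected a string starting with 'U+'")
    # The part after "U+" is the index in hexadecimal.
    return chr(int(notation[2:], 16))
```

For example, format_codepoint("A") gives "U+0041", and parse_codepoint("U+0041") gives "A" again.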

Toki Pona

As of January 2024, Unicode does not include any Toki Pona writing systems. Many tokiponists hope for sitelen pona and sitelen sitelen to eventually receive Unicode support. In the meantime, sitelen pona has been specified for the UCSUR, adding unofficial support within a Private Use Area of Unicode.
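Whether a codepoint falls in a Private Use Area can be checked programmatically. The following is a rough sketch using Python's standard unicodedata module. The three ranges are the Private Use Areas defined by Unicode. The value U+F1900 is an assumption made for this example (it is believed to be the start of the UCSUR sitelen pona block, but this article does not state it), and the function names are hypothetical.

```python
import unicodedata

# ASSUMPTION: U+F1900 is used as an example UCSUR sitelen pona codepoint.
# The value is not taken from this article and should be verified against
# the UCSUR itself.
ASSUMED_UCSUR_EXAMPLE = 0xF1900

# The three Private Use Areas defined by Unicode.
PRIVATE_USE_AREAS = [
    (0xE000, 0xF8FF, "Private Use Area"),
    (0xF0000, 0xFFFFD, "Supplementary Private Use Area-A"),
    (0x100000, 0x10FFFD, "Supplementary Private Use Area-B"),
]


def is_private_use(codepoint: int) -> bool:
    """True if Unicode assigns the codepoint the general category "Co"."""
    return unicodedata.category(chr(codepoint)) == "Co"


def private_use_area_name(codepoint: int):
    """Return the name of the Private Use Area containing the codepoint, or None."""
    for start, end, name in PRIVATE_USE_AREAS:
        if start <= codepoint <= end:
            return name
    return None
```

Because Unicode itself assigns no meaning to these codepoints, text using them only displays as sitelen pona when the font in use follows the same unofficial assignment.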

Tokiponists have also created several original scripts written using preexisting, unrelated Unicode characters. These writing systems are referred to as sitelen Juniko.

Proposals

There have been a few past proposals for a sitelen pona Unicode block. In 2021, shortly after the publication of Toki Pona Dictionary, Gabriel Tellez submitted a proposal consisting only of the glyph chart for linja pona.[1] It was rejected by the Script Ad Hoc Group in its recommendations.[2] As of January 2024, work is underway on a proposal for sitelen pona, cowritten by Under-ConScript Unicode Registry maintainer Rebecca Bettencourt and many important figures in the Toki Pona community.[3] A preliminary proposal was submitted on 16 April 2024.

sitelen sitelen does not seem to be under consideration, as it has far fewer users[4] and lacks font implementation of its nonlinear writing direction. Additionally, there is currently little existing architecture for scripts similar to sitelen sitelen, such as Mayan hieroglyphs.

Concerns

Several issues may have to be resolved before Toki Pona is proposed for Unicode. For sitelen pona, the size of the community and public interchange might not be concerns.[5] Potential issues for the sitelen pona proposal include:

  • Recency. While the ISO 639-3 code ought to relieve concerns that Toki Pona itself is transient, the words and features used to write it have been in flux. However, the UCSUR and the font sitelen seli kiwen (both also by jan Lepeka) may provide a working standard.
  • Font standardization. Many fonts support different features and sets of characters, and implement them in different ways. While the UCSUR has largely resolved this, some common features remain unstandardized, such as directional ni, te to, and whether to use the halfwidth or ideographic space. Also, there may be established UCSUR codepoints for features that should be handled in OpenType instead of Unicode, chiefly cartouche extension.
  • nimi sin. Which words ought to be encoded beyond those in the UCSUR is a point ripe for debate. A word's usage may not be proportional to the usage of its corresponding glyphs, since less used words have less recognized glyphs and nimi sin usage may differ between sitelen pona and Toki Pona in general, so Linku usage data may not be sufficient. The proposal team is planning to conduct research throughout 2024 to obtain better data for which words to encode.
    • The prospect of adding nimi sin over time, as they meet some criteria for inclusion, is evocative of new emoji being added to account for the limitations of the original set. The Unicode Consortium may be motivated to avoid a similar situation by avoiding or postponing support for Toki Pona. If not, there could be a system outside of Unicode for encoding nimi sin, possibly using a new UCSUR block.[6] This would come at the expense of complicating the encoding situation rather than submitting one proposal and being done with it.
  • Commerciality. While the Toki Pona "logo" toki-pona would not receive its own codepoint, it occurs as part of sitelen pona pu. Moreover, pu and ku refer to and graphically represent commercial products that are not fully in the public domain. If this is an issue, one idea is to reserve the codepoints for these glyphs, not officially defining them, but allowing de facto use. Otherwise, fonts would have to create workarounds to encode them, which may create another standardization problem.

References

  1. Gabriel Tellez. (27 April 2021). "Toki Pona for Unicode". The Unicode Consortium. Retrieved 29 January 2024.
  2. Deborah Anderson, Ken Whistler, Roozbeh Pournader, Liang Ha. (26 July 2021). "Recommendations to UTC #168 July 2021 on Script Proposals". The Unicode Consortium. Retrieved 29 January 2024.
  3. jan Kekan San. (24 January 2024). "nasin Juniko" (in Toki Pona). Toki Pona VR [@TokiPonaVR]. YouTube. Retrieved 29 January 2024.
  4. jan Tamalu. (3 October 2022). "Results of the 2022 Toki Pona census". Toki Pona census. Retrieved 29 January 2024.
  5. jan Kekan San [@gregdan3]. (3 January 2024). [Message posted in the #nasin-Juniko thread in the #pali-musi channel in the ma pona pi toki pona Discord server]. Discord. Retrieved 15 January 2024.
  6. jan Pensa [@jpensa]. (2 November 2023). [Pinned message posted in the #nasin-Juniko thread in the #pali-musi channel in the ma pona pi toki pona Discord server]. Discord.

    My idea for encoding rare nimi sin that fonts still want to support, was to basically have two separate standards. Here's how I'd want that to look like in (hopefully) 1 or 2 years:

    In the actual Unicode standard we want the 137 nimi ku suli, and as many other common words as we and the Unicode Consortium feel comfortable giving a permanent codepoint. ("majuna" and "apeja" probably yes, "Pingo" almost definitely no)

    Then in addition to the "core" standard, we can have a UCSUR block of "Extended Sitelen Pona", where we can dump all the experimental features and all obscure nimi sin that font makers want to support. This could include Pingo, sutopatikuna, molusa, extended preposition glyphs, and the cool diacritic-like things linja sike supports (like writing a little o under a verb instead of using o in front)

    Once a new word or other feature becomes common enough, we can apply to permanently add it to actual Unicode, and deprecate its old UCSUR codepoint (i.e. recommend people to stop using the UCSUR codepoint when a proper Unicode codepoint has been added)

    I think this way we can have the best of both worlds. Words and features that are widely used are widely supported according to Unicode standards, but font makers and users are still free to experiment and be creative using the Extended encodings.

    (And as for which nimi sin to add to the Extended block, I think for the time being we could add any word that any fontmaker wants to support in their font. Once SP font making becomes more common, perhaps when 2 or 3 font makers promise to add it to a font, or something like that.)