Text compression

From sona pona, the Toki Pona wiki

Text compression is the process of encoding information using fewer characters or bits than the original representation. The small size of Toki Pona, in both its vocabulary and its syllable inventory, has attracted interest in a variety of compression techniques, and some writing systems have been created expressly for this purpose.

In March 2010, wanting to compress Toki Pona text so that it would use fewer characters on Twitter, jan Mato collated several potential lossy and lossless compression schemes. Of the options presented, Toki Pona Script, which is lossless, was noted as having the best compression ratio.[1] Owing to poor Unicode support for Toki Pona Script at the time, jan Josan and jan Mato created a sitelen Kansi character set in July of that year.[2] Later equivalents to Toki Pona Script include the Sitelen Pona UCSUR block and the sitelen Emosi writing system, both of which also use only one Unicode character per word.
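As a rough illustration of the one-character-per-word idea, here is a minimal Python sketch. The word list is a small hypothetical sample, and the code points are placeholders taken from the Basic Multilingual Plane Private Use Area; they are not the actual UCSUR Sitelen Pona, sitelen Emosi, or sitelen Kansi assignments. The function names are likewise invented for this example.

```python
# Hypothetical sample vocabulary; a real table would cover every word.
SAMPLE_WORDS = ["e", "kili", "li", "mi", "moku", "pona", "sina", "toki", "wile"]

# Placeholder code points from the Private Use Area (assumption: these are
# NOT the real UCSUR Sitelen Pona code points).
PUA_START = 0xE000
WORD_TO_CHAR = {word: chr(PUA_START + i) for i, word in enumerate(SAMPLE_WORDS)}
CHAR_TO_WORD = {char: word for word, char in WORD_TO_CHAR.items()}


def encode_words(text):
    """Replace each space-separated word with its single character."""
    return "".join(WORD_TO_CHAR[word] for word in text.split())


def decode_words(encoded):
    """Map each character back to its word and restore the spaces."""
    return " ".join(CHAR_TO_WORD[char] for char in encoded)


def char_ratio(text):
    """Characters in the Latin text divided by characters in the encoding."""
    return len(text) / len(encode_words(text))


if __name__ == "__main__":
    sentence = "mi wile moku e kili"
    packed = encode_words(sentence)
    print(len(sentence), "characters ->", len(packed), "characters")
    print(len(sentence.encode("utf-8")), "bytes ->", len(packed.encode("utf-8")), "bytes")
    print(decode_words(packed) == sentence)
```

The saving is measured in characters, which is what a per-character limit counts. In bytes the gain is smaller, because each of these placeholder code points takes three bytes in UTF-8.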

jan Misali's ASCII syllabary allows each syllable to be reduced to 7 bits, the size of a single ASCII character. Most punctuation is lost upon conversion into this system, and there is no recommendation for how to mark proper names. A major limiting factor for the compression ratio is the need to separate words, which is generally done with the ASCII space.
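The arithmetic behind a 7-bit syllabary can be sketched in Python. This is not jan Misali's actual character assignment: the mapping below is a hypothetical one that simply numbers the syllables from "!" upward. It assumes the standard (C)V(n) syllable shape with the sequences ji, ti, wo and wu disallowed, which gives 92 syllables, few enough to fit in printable ASCII with the space left free as a word separator.

```python
CONSONANTS = "jklmnpstw"
VOWELS = "aeiou"
FORBIDDEN = {"ji", "ti", "wo", "wu"}  # assumed disallowed onset+vowel pairs


def legal_syllables():
    """List every (C)V(n) syllable that avoids the forbidden pairs."""
    result = []
    for onset in [""] + list(CONSONANTS):
        for vowel in VOWELS:
            if onset + vowel in FORBIDDEN:
                continue
            result.append(onset + vowel)
            result.append(onset + vowel + "n")
    return result


SYLLABLES = legal_syllables()
# Hypothetical assignment: consecutive printable ASCII characters from "!".
SYL_TO_CHAR = {syl: chr(0x21 + i) for i, syl in enumerate(SYLLABLES)}
CHAR_TO_SYL = {char: syl for syl, char in SYL_TO_CHAR.items()}


def split_syllables(word):
    """Split a lowercase word into (C)V(n) syllables."""
    syllables, i = [], 0
    while i < len(word):
        start = i
        if word[i] in CONSONANTS:
            i += 1  # optional onset consonant
        if i >= len(word) or word[i] not in VOWELS:
            raise ValueError(f"cannot syllabify {word!r}")
        i += 1  # the vowel
        # A following "n" closes this syllable unless a vowel comes after it.
        if i < len(word) and word[i] == "n" and (
            i + 1 == len(word) or word[i + 1] not in VOWELS
        ):
            i += 1
        syllables.append(word[start:i])
    return syllables


def encode_syllabary(text):
    """One ASCII character per syllable, words separated by spaces."""
    return " ".join(
        "".join(SYL_TO_CHAR[s] for s in split_syllables(word))
        for word in text.split()
    )


def decode_syllabary(encoded):
    """Invert encode_syllabary."""
    return " ".join(
        "".join(CHAR_TO_SYL[c] for c in word) for word in encoded.split(" ")
    )


if __name__ == "__main__":
    sentence = "mi wile moku e kili"
    packed = encode_syllabary(sentence)
    print(len(sentence), "characters ->", len(packed), "characters")
    print(decode_syllabary(packed) == sentence)
```

In this sketch the four spaces in the sample sentence account for a third of the encoded length, which illustrates why word separation limits the compression ratio. Punctuation and capitalised proper names are rejected rather than handled.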

References

  1. [janMato (original poster), zeme]. (19 March 2010). "Best compression for toki lili?". Toki Pona Forums. Retrieved 10 January 2024.
  2. [janMato (original poster), janKipo, jan Josan, et al.]. (18 July 2010). "toki pona in chinese/kanji?". Toki Pona Forums. Retrieved 10 January 2024.