Usage categories

From sona pona, the Toki Pona wiki
Revision as of 13:11, 16 August 2024 by Menasewi.

A usage category classifies a word by the share of respondents to Linku's annual surveys who report using it. Usage categories are a more granular, more frequently updated replacement for the book presence categories.[1]

Table of categories

In the following tables, a bold line marks the cutoff for the categories that are selected by default. Numbers are rounded to the nearest percentage point. For words below these categories, Linku used the term non-notable, while nimi.li used the term marginal.[2] The term marginal has been adopted on sona pona to avoid implying that these words are not notable enough to be covered on the wiki.

On 22 February 2024, waso Keli proposed simplifying the Linku categories to make them easier to adopt. Teachers in the kama sona community discussed the proposal and reached a consensus to reduce the number of categories from six to four. This change was rolled out alongside the lipu Linku redesign on 30 March 2024.[3]

2024 redesign
Category          Users
Core              [90%, 100%]
Common            [60%, 90%)
Uncommon          [30%, 60%)
Obscure[a]        [5%, 30%)
Sandbox[a]        [0%, 5%)

2023 survey results (n = 868)
Category          Users
Core              [90%, 100%]
Widespread        [70%, 90%)
Common            [50%, 70%)
Uncommon[b]       [20%, 50%)
Rare[b]           [10%, 20%)
Obscure[c][d][e]  [2%, 10%)

2022 survey results (n = 345)
Category          Users
Core              [90%, 100%]
Widespread        [70%, 90%)
Common            [50%, 70%)
Uncommon          [20%, 50%)
Rare              [10%, 20%)
Obscure           [1%, 10%)
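
Each table amounts to a lookup on lower bounds: a category covers the half-open interval from its own bound up to the next higher one, and Core also includes 100%. The following is a minimal Python sketch of that lookup. It assumes that a word's usage is simply the percentage of survey respondents who report using it, and that the thresholds are applied to the unrounded value. The names usage_percent, categorize, CATEGORIES_2024 and CATEGORIES_2023 are hypothetical and are not part of any Linku tool.

```python
from typing import List, Optional, Tuple

# Hypothetical threshold tables: lower bounds (percent of respondents),
# highest first, copied from the 2024 and 2023 tables above.
# Each category covers [its bound, the next higher bound); Core also includes 100%.
CATEGORIES_2024: List[Tuple[float, str]] = [
    (90, "Core"),
    (60, "Common"),
    (30, "Uncommon"),
    (5, "Obscure"),
    (0, "Sandbox"),
]
CATEGORIES_2023: List[Tuple[float, str]] = [
    (90, "Core"),
    (70, "Widespread"),
    (50, "Common"),
    (20, "Uncommon"),
    (10, "Rare"),
    (2, "Obscure"),
]


def usage_percent(users: int, respondents: int) -> float:
    """Percentage of survey respondents who report using a word (assumed definition)."""
    if respondents <= 0:
        raise ValueError("respondents must be positive")
    return 100 * users / respondents


def categorize(percent: float, table: List[Tuple[float, str]]) -> Optional[str]:
    """Return the first category whose lower bound the usage meets.

    Returns None when the usage is below every bound in the table
    (assumption: these are the words this article calls marginal).
    """
    for lower_bound, name in table:
        if percent >= lower_bound:
            return name
    return None
```

For a hypothetical word reported by 434 of the 868 respondents in 2023, usage_percent gives exactly 50%, which categorize places in "Common" under the 2023 thresholds but in "Uncommon" under the 2024 thresholds. Under the 2023 table, any usage below 2% returns None, since no category covers it.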

Notes

  a. The sandbox threshold was at 2% for a few days after being implemented, then raised to 5% by consensus.
  b. On 12 February 2024, a message was added clarifying that most speakers don't use uncommon or rare words.
  c. In the 2023 results post, the obscure category is split into a high end [5%, 10%) and a low end [2%, 5%) purely for readability.
  d. New words below 2% usage are considered not notable for inclusion in the dictionary. Words below this threshold that were already included were planned to be moved into a separate sandbox resource; this was completed on 9 April 2024, shortly after the redesign launched.
  e. On 12 February 2024, a message was added clarifying that most speakers don't use or understand obscure words.

Survey results

Correlations

According to a June 2024 study by jan Kekan San, less-used words tend to be discussed in other languages (such as English) more often than they are used in Toki Pona.[4] Almost every word below the common category (under 60% usage) appears in other languages in at least 30% of its occurrences, and usually far more often. The trend becomes "more pronounced" for sandboxed words (under 5% usage).[5]

References

  1. kala Asi. (7 August 2023). "wile sona nimi". kala Asi [@kala_asi]. YouTube. Archived from the original on 18 October 2023. Retrieved 18 October 2023.
  2. jan Tani. "about". nimi.li. Retrieved 17 January 2024.
  3. (30 March 2024). "lipu Linku". lipu Linku. Archived from the original on 30 March 2024. Retrieved 30 March 2024.
  4. jan Kekan San [@gregdan3]. (12 June 2024). Message in the #Word frequency in Toki Pona thread in #toki-suli. ma pona pi toki pona. Discord. Retrieved 14 June 2024.
  5. jan Kekan San [@gregdan3]. (12 June 2024). Message in the #Word frequency in Toki Pona thread in #toki-suli. ma pona pi toki pona. Discord. Retrieved 14 June 2024.