Usage categories: Difference between revisions

From sona pona, the Toki Pona wiki
 
(14 intermediate revisions by 4 users not shown)
Line 1: Line 1:
A '''usage category''' is a criteria for [[word usage]] based on the annual surveys by {{tok|[[Linku]]}}. These are a more granular, more frequently updated replacement for the [[book presence]] categories.<ref>{{cite YouTube|url=https://www.youtube.com/watch?v=wrFB1ETL1Hg|title={{tp|wile sona nimi}}|name={{tok|kala Asi}}|channel={{tok|kala Asi}}|handle=kala_asi|date=2023-08-07|archive-url=https://web.archive.org/web/20231018173305/https://www.youtube.com/watch?v=wrFB1ETL1Hg|archive-date=2023-10-18|access-date=2023-10-18}}</ref>
A '''usage category''' is a criterion for [[word usage]] based on the annual surveys by {{tok|[[Linku]]}}. These are a more granular, more frequently updated replacement for the [[book presence]] categories.<ref>{{cite YouTube|url=https://www.youtube.com/watch?v=wrFB1ETL1Hg|title={{tp|wile sona nimi}}|name={{tok|kala Asi}}|channel={{tok|kala Asi}}|handle=kala_asi|date=2023-08-07|archive-url=https://web.archive.org/web/20231018173305/https://www.youtube.com/watch?v=wrFB1ETL1Hg|archive-date=2023-10-18|access-date=2023-10-18}}</ref>


==Table of categories==
==Table of categories==
In the following tables, a bold line represents the cutoff for the categories that are selected by default. Numbers are rounded to the nearest percentage point. For words that are below these categories, {{tok|Linku}} uses the term ''non-notable''; {{tok|[[nimi.li]]}} uses the term ''marginal'',<ref>{{cite web|url=https://nimi.li/about|title=about|website=nimi.li|author={{tok|jan Tani}}|access-date=2024-01-17}}</ref> which has been adopted on {{tp|[[sona pona]]}} to avoid the interpretation that these words are not notable for the wiki.
In the following tables, a bold line represents the cutoff for the categories that are selected by default. Numbers are rounded to the nearest percentage point. For words that were below these categories, {{tok|Linku}} used the term ''non-notable''; {{tok|[[nimi.li]]}} used the term ''marginal'',<ref>{{cite web|url=https://nimi.li/about|title=about|website=nimi.li|author={{tok|jan Tani}}|access-date=2024-01-17}}</ref> which has been adopted on {{tp|[[sona pona]]}} to avoid the interpretation that these words are not [[Project:Notability|notable for the wiki]].


On 22 February 2024, {{tok|waso Keli}} brought up the idea of simplifying {{tok|Linku}} categories to aid their adoption. Teachers on the {{tp|kama sona}} community discussed and reached a consensus, reducing the number of categories from six to four. This change was rolled out alongside the {{tok|lipu Linku}} redesign on 30 March 2024.<ref>{{cite web|url=https://linku.la|title={{tok|lipu Linku}}|author=|username=|date=20240330|website={{tok|lipu Linku}}|publisher=|archive-url=https://web.archive.org/web/20240330232258/https://linku.la|archive-date=20240330|access-date=20240330|quote=}}</ref>
{| style="vertical-align: top;"

|
{|style="overflow-x:auto;"
|style="vertical-align:top;"|
{|class="wikitable" style="text-align:center;"
{|class="wikitable" style="text-align:center;"
|+2024 redesign
|+2023
|-
|-
!Category
!Category
!Users
!Users<br/><small>{{abbr|<var>n</var>|Sample size}} = 868</small>
|-
|-
!Core
!Core
|[90%, 100%]
|[90%, 100%]
|-
|-
!style="border-bottom:2px solid currentColor;"|Widespread
! style="border-bottom:2px solid currentColor;" |Common
|style="border-bottom:2px solid currentColor;"|[70%, 90%)
| style="border-bottom:2px solid currentColor;" |[60%, 90%)
|-
!Uncommon
|[30%, 60%)
|-
!Obscure<ref group="lower-alpha" name="sandbox 2% to 5%">The sandbox threshold was at 2% for a few days after being implemented, then raised to 5% by consensus.</ref>
|[5%, 30%)
|-
|colspan="2" style="background:#fff;border:solid #fff;border-width:1px 1px 0;"|
|-
!Sandbox<ref group="lower-alpha" name="sandbox 2% to 5%" />
|[0%, 5%)
|}
|style="vertical-align:top;"|
{|class="wikitable" style="text-align:center;"
|+2023 survey results
|-
!Category
!Users<br /><small>{{abbr|<var>n</var>|Sample size}} = 868</small>
|-
!Core
|[90%, 100%]
|-
! style="border-bottom:2px solid currentColor;" |Widespread
| style="border-bottom:2px solid currentColor;" |[70%, 90%)
|-
|-
!Common
!Common
Line 27: Line 53:
|[10%, 20%)
|[10%, 20%)
|-
|-
!Obscure<ref group="lower-alpha">In the 2023 results post, the obscure category is split into a high end [5%, 10%) and low end [2%, 5%) purely for readability.</ref><ref group="lower-alpha">New words below 2% usage are considered not notable for inclusion in the dictionary. Words below this threshold that are already included are planned to be moved into a separate sandbox resource. As of the publication of the 2023 results, this is yet to be done.</ref><ref group="lower-alpha">On 12 February 2024, a message was added clarifying that most speakers don't use or understand obscure words.</ref>
!Obscure<ref group="lower-alpha">In the 2023 results post, the obscure category is split into a high end [5%, 10%) and low end [2%, 5%) purely for readability.</ref><ref group="lower-alpha">New words below 2% usage are considered not notable for inclusion in the dictionary. Words below this threshold that are already included were planned to be moved into a separate sandbox resource. This was completed on 9 April 2024, shortly after the redesign was launched.</ref><ref group="lower-alpha">On 12 February 2024, a message was added clarifying that most speakers don't use or understand obscure words.</ref>
|[2%, 10%)
|[2%, 10%)
|}
|}
|style="vertical-align:top;"|
|
{|class="wikitable" style="text-align:center;"
{|class="wikitable" style="text-align:center;"
|+2022
|+2022 survey results
|-
|-
!Category
!Category
!Users<br/><small>{{abbr|<var>n</var>|Sample size}} = 345</small>
!Users<br /><small>{{abbr|<var>n</var>|Sample size}} = 345</small>
|-
|-
!Core
!Core
|[90%, 100%]
|[90%, 100%]
|-
|-
!style="border-bottom:2px solid currentColor;"|Widespread
! style="border-bottom:2px solid currentColor;" |Widespread
|style="border-bottom:2px solid currentColor;"|[70%, 90%)
| style="border-bottom:2px solid currentColor;" |[70%, 90%)
|-
|-
!Common
!Common
Line 57: Line 83:
|}
|}


===Survey results===
===Notes===
<references group="lower-alpha" />

==Survey results==
* [https://github.com/lipu-linku/ijo/blob/main/survey/2023/README.md 2023 survey results] (868 responses)
* [https://github.com/lipu-linku/ijo/blob/main/survey/2023/README.md 2023 survey results] (868 responses)
* [https://reddit.com/r/tokipona/comments/wqyczo/survey_results_how_many_people_use_words_in_2022 2022 survey results] (345 responses) {{Indent|The 2022 survey changed the methodology. It asks "do you use this word". Previous years asked "do you consider this word real". Because of this, results from 2022 and after cannot be directly compared to 2021 and before.}}
* [https://reddit.com/r/tokipona/comments/wqyczo/survey_results_how_many_people_use_words_in_2022 2022 survey results] (345 responses) {{Indent|The 2022 survey changed the methodology. It asks "do you use this word". Previous years asked "do you consider this word real". Because of this, results from 2022 and after cannot be directly compared to 2021 and before.}}
Line 63: Line 92:
* [https://reddit.com/r/tokipona/comments/g9ne0s/survey_results_heres_how_real_these_tp_words_are 2020 survey results] (86 responses)
* [https://reddit.com/r/tokipona/comments/g9ne0s/survey_results_heres_how_real_these_tp_words_are 2020 survey results] (86 responses)


==Notes==
==Correlations==
According to a June 2024 study by {{tok|[[jan Kekan San]]}}, less-used words tend to be discussed in other languages (such as English) more often than being used in Toki Pona.<ref>{{cite Discord|url=//discord.com/channels/301377942062366741/1250592437085274152/1250592437085274152|thread=Word frequency in Toki Pona|channel={{tok|toki-suli}}|server={{tp|ma pona pi toki pona}}|author={{tok|[[jan Kekan San]]}}|username=gregdan3|access-date=2024-06-14|quote=}}</ref> Almost every sub-common word (below 60% usage) is used in other languages at least 30% of the time, and usually far more often; and the trend becomes "more pronounced" for sandboxed words (below 5% usage).<ref>{{cite Discord|url=//discord.com/channels/301377942062366741/1250592437085274152/1250592474842267730|thread=Word frequency in Toki Pona|channel={{tok|toki-suli}}|server={{tp|ma pona pi toki pona}}|author={{tok|[[jan Kekan San]]}}|username=gregdan3|access-date=2024-06-14|quote=}}</ref>
<references group="lower-alpha" />


==References==
==References==

Latest revision as of 21:54, 14 June 2024

A usage category is a criterion for word usage based on the annual surveys by Linku. These are a more granular, more frequently updated replacement for the book presence categories.[1]

Table of categories

In the following tables, a bold line represents the cutoff for the categories that are selected by default. Numbers are rounded to the nearest percentage point. For words that were below these categories, Linku used the term non-notable; nimi.li used the term marginal,[2] which has been adopted on sona pona to avoid the interpretation that these words are not notable for the wiki.

On 22 February 2024, waso Keli brought up the idea of simplifying Linku categories to aid their adoption. Teachers in the kama sona community discussed the idea and reached a consensus, reducing the number of categories from six to four. This change was rolled out alongside the lipu Linku redesign on 30 March 2024.[3]

2024 redesign
Category           Users
Core               [90%, 100%]
Common             [60%, 90%)
==============================
Uncommon           [30%, 60%)
Obscure[a]         [5%, 30%)

Sandbox[a]         [0%, 5%)

2023 survey results
Category           Users (n = 868)
Core               [90%, 100%]
Widespread         [70%, 90%)
==============================
Common             [50%, 70%)
Uncommon[b]        [20%, 50%)
Rare[b]            [10%, 20%)
Obscure[c][d][e]   [2%, 10%)

2022 survey results
Category           Users (n = 345)
Core               [90%, 100%]
Widespread         [70%, 90%)
==============================
Common             [50%, 70%)
Uncommon           [20%, 50%)
Rare               [10%, 20%)
Obscure            [1%, 10%)
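
The half-open intervals in the 2024 redesign table can be read as a simple lookup. The following is a minimal Python sketch of that reading, not Linku's actual implementation: the names LOWER_BOUNDS, CATEGORIES and usage_category are hypothetical, and the sketch assumes the usage share is supplied as a percentage. Whether Linku rounds a share before assigning a category is not stated here, so no rounding is applied.

```python
from bisect import bisect_right

# Lower bounds (in percent) and category names, copied from the
# "2024 redesign" table. All identifiers here are hypothetical.
LOWER_BOUNDS = [0, 5, 30, 60, 90]
CATEGORIES = ["sandbox", "obscure", "uncommon", "common", "core"]


def usage_category(percent: float) -> str:
    """Return the 2024 category for a usage share given in percent (0 to 100)."""
    if not 0 <= percent <= 100:
        raise ValueError(f"percent must be within [0, 100], got {percent}")
    # bisect_right counts how many lower bounds are <= percent, so a value
    # exactly on a boundary lands in the higher category, matching the
    # half-open intervals such as [60%, 90%).
    return CATEGORIES[bisect_right(LOWER_BOUNDS, percent) - 1]
```

For example, usage_category(60) falls in "common" because 60% is the inclusive lower bound of that interval, while usage_category(4.9) falls in "sandbox". Other years' tables could be handled the same way by swapping in their bounds and names.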

Notes

  a. The sandbox threshold was at 2% for a few days after being implemented, then raised to 5% by consensus.
  b. On 12 February 2024, a message was added clarifying that most speakers don't use uncommon or rare words.
  c. In the 2023 results post, the obscure category is split into a high end [5%, 10%) and low end [2%, 5%) purely for readability.
  d. New words below 2% usage are considered not notable for inclusion in the dictionary. Words below this threshold that are already included were planned to be moved into a separate sandbox resource. This was completed on 9 April 2024, shortly after the redesign was launched.
  e. On 12 February 2024, a message was added clarifying that most speakers don't use or understand obscure words.

Survey results

Correlations

According to a June 2024 study by jan Kekan San, less-used words tend to be discussed in other languages (such as English) more often than they are used in Toki Pona.[4] Almost every sub-common word (below 60% usage) is used in other languages at least 30% of the time, and usually far more often; the trend becomes "more pronounced" for sandboxed words (below 5% usage).[5]

References

  1. kala Asi. (7 August 2023). "wile sona nimi". kala Asi [@kala_asi]. YouTube. Archived from the original on 18 October 2023. Retrieved 18 October 2023.
  2. jan Tani. "about". nimi.li. Retrieved 17 January 2024.
  3. (30 March 2024). "lipu Linku". lipu Linku. Archived from the original on 30 March 2024. Retrieved 30 March 2024.
  4. jan Kekan San [@gregdan3]. (12 June 2024). [Message posted in the #Word frequency in Toki Pona thread in the #toki-suli channel in the ma pona pi toki pona Discord server]. Discord. Retrieved 14 June 2024.
  5. jan Kekan San [@gregdan3]. (12 June 2024). [Message posted in the #Word frequency in Toki Pona thread in the #toki-suli channel in the ma pona pi toki pona Discord server]. Discord. Retrieved 14 June 2024.