Probabilistic Finite-State Morphological Segmenter for Wixarika (Huichol) Language

In this scholarly article, linguists present a morphological segmenter for the Wixarika (Huichol) language. Segmentation is fundamental for morphologically rich languages, a group that includes many Indigenous languages of the Americas, because it improves downstream tasks such as machine translation, dialogue systems, and summarization. Beyond the agglutinative nature of the language, the scarcity of resources and the lack of an orthographic standard across dialects add to the challenge. Their proposal is based on a probabilistic finite-state approach that exploits regular agglutinative patterns and requires little linguistic knowledge. They seek to show that this approach outperforms unsupervised and semi-supervised methods in a low-resource setting. The dataset used in this work has been openly released for future use by the community.
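To make the idea of a probabilistic finite-state segmenter concrete, the following is a minimal sketch and not the authors' implementation. It assumes a two-state automaton (prefix zone, then suffix zone after exactly one stem) and picks the most probable path with a Viterbi-style search. The morphemes, their probabilities, and the names `PREFIXES`, `STEMS`, `SUFFIXES`, and `segment` are all invented for illustration; the strings are not Wixarika data.

```python
import math

# Hypothetical toy lexicon: invented strings with made-up probabilities.
# None of this is taken from the article or from Wixarika.
PREFIXES = {"ka": 0.6, "ti": 0.4}
STEMS = {"mora": 0.5, "kati": 0.3, "timo": 0.2}
SUFFIXES = {"ni": 0.7, "ra": 0.3}

# Automaton arcs: state -> list of (morpheme table, next state).
# State 0 = still reading prefixes; state 1 = stem already read.
ARCS = {
    0: [(PREFIXES, 0), (STEMS, 1)],
    1: [(SUFFIXES, 1)],
}
FINAL_STATE = 1  # a word is accepted only once a stem has been consumed


def segment(word):
    """Return the most probable morpheme sequence, or None if no path exists."""
    n = len(word)
    # best[(position, state)] = (log probability, backpointer)
    best = {(0, 0): (0.0, None)}
    for pos in range(n):
        for state in (0, 1):
            if (pos, state) not in best:
                continue
            score = best[(pos, state)][0]
            for table, next_state in ARCS[state]:
                for morph, prob in table.items():
                    if word.startswith(morph, pos):
                        key = (pos + len(morph), next_state)
                        candidate = score + math.log(prob)
                        if key not in best or candidate > best[key][0]:
                            best[key] = (candidate, (pos, state, morph))
    if (n, FINAL_STATE) not in best:
        return None
    # Follow backpointers from the end of the word to recover the path.
    morphs = []
    key = (n, FINAL_STATE)
    while best[key][1] is not None:
        pos, state, morph = best[key][1]
        morphs.append(morph)
        key = (pos, state)
    return morphs[::-1]


if __name__ == "__main__":
    print(segment("katimorani"))
```

In this toy setup the slot structure (prefixes, one stem, suffixes) stands in for the regular agglutinative patterns mentioned above, and the probabilities resolve ambiguity between competing segmentations. A real system would estimate those probabilities from annotated data rather than hard-coding them.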

