Unsupervised Stemmer for Hindi and Marathi

Jan 1, 1970 · 1 min read

In this project, I developed an unsupervised stemming algorithm for Hindi and Marathi. It uses word embeddings and similarity measures to reduce inflected words to their root forms, which improves text preprocessing for Natural Language Processing (NLP) tasks without requiring labeled datasets.

Key Features:

Unsupervised Learning: Needs no annotated corpora, so the stemmer can be built from unlabeled text.
Word Embeddings: Employs vector representations of words to capture semantic similarities, aiding in accurate stemming.
Language Focus: Addresses the morphological complexities inherent in Hindi and Marathi.
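
To make the idea concrete, here is a minimal sketch of one way embeddings and a similarity measure could drive stemming. It is not the project's actual algorithm: the function names, the thresholds `min_prefix` and `min_sim`, and the three-dimensional toy vectors are all assumptions made for illustration. The sketch treats two words as variants of the same root when they share a prefix and their vectors have high cosine similarity, and it returns the shortest such shared prefix as the stem.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def common_prefix(a, b):
    """Longest common prefix of two strings, compared code point by code point."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return a[:n]


def stem(word, embeddings, min_prefix=2, min_sim=0.8):
    """Return the shortest prefix that `word` shares with a similar word.

    Hypothetical heuristic: a candidate must share at least `min_prefix`
    code points with `word` and have cosine similarity >= `min_sim`.
    Words missing from `embeddings` are returned unchanged.
    """
    if word not in embeddings:
        return word
    best = word
    for other, vec in embeddings.items():
        if other == word:
            continue
        prefix = common_prefix(word, other)
        if len(prefix) < min_prefix:
            continue
        if cosine(embeddings[word], vec) < min_sim:
            continue
        if len(prefix) < len(best):
            best = prefix
    return best


# Toy vectors invented for this example; real ones would come from a model
# trained on unlabeled Hindi or Marathi text.
TOY_EMBEDDINGS = {
    "किताब": [0.90, 0.10, 0.00],    # book
    "किताबें": [0.85, 0.15, 0.05],  # books
    "किताबों": [0.80, 0.20, 0.00],  # books (oblique plural)
    "किनारा": [0.00, 0.20, 0.90],   # shore: shares "कि" but is unrelated
}

if __name__ == "__main__":
    for w in TOY_EMBEDDINGS:
        print(w, "->", stem(w, TOY_EMBEDDINGS))
```

With these toy vectors, both inflected forms reduce to किताब, while किनारा is left unchanged because its shared prefix is not backed by semantic similarity. One caveat of this sketch is that it compares Unicode code points, so a shared prefix can end in the middle of a Devanagari syllable; a more careful version would work on grapheme clusters.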
