Filtering Non-Devanagari Words: A Heuristic-based Approach

2020-04-21

When building a Nepali text corpus, we usually collect text from various online sources such as Wikipedia, news portals, and other websites. These sources introduce many errors because of imperfect online tools such as translators, font converters, and spelling checkers. The errors include typos, spelling mistakes, foreign words, and incorrect symbols, and dealing with them is challenging. In this post, we will look at a simple heuristic-based algorithm to filter non-Devanagari words from a Nepali corpus.
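As a rough illustration of what such a heuristic can look like, here is a minimal sketch in Python. It is not necessarily the algorithm described in the full post. It assumes that a word counts as Devanagari when a large enough share of its characters fall in the Unicode Devanagari block (U+0900 to U+097F). The function names and the `min_ratio` parameter are hypothetical and chosen only for this example.

```python
from typing import List

# Unicode Devanagari block: U+0900 to U+097F (inclusive).
DEVANAGARI_START = 0x0900
DEVANAGARI_END = 0x097F


def is_devanagari_char(ch: str) -> bool:
    """Return True if the character lies in the Devanagari block."""
    return DEVANAGARI_START <= ord(ch) <= DEVANAGARI_END


def devanagari_ratio(word: str) -> float:
    """Fraction of a word's code points that are Devanagari."""
    if not word:
        return 0.0
    return sum(is_devanagari_char(c) for c in word) / len(word)


def filter_devanagari_words(text: str, min_ratio: float = 1.0) -> List[str]:
    """Keep whitespace-separated words whose Devanagari ratio meets min_ratio.

    min_ratio is an assumed tuning knob: 1.0 keeps only words written
    entirely in Devanagari, lower values tolerate some foreign characters.
    """
    return [w for w in text.split() if devanagari_ratio(w) >= min_ratio]


if __name__ == "__main__":
    sample = "नेपाल is सुन्दर देश"
    print(filter_devanagari_words(sample))
```

With the default `min_ratio` of 1.0, any word containing even one character outside the Devanagari block is dropped. Lowering the threshold keeps mixed words, which may or may not be desirable depending on how noisy the corpus is.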

Read More

Nepali NLP Research Papers

2019-02-04

Research on Natural Language Processing for the Nepali language began in 2005 with the Bhasa Sanchar project and is now actively carried out in universities and research institutes in Nepal and India. I have curated a list of Nepali NLP papers published in different journals and categorized them by NLP task, in descending order of publication date.

Read More