When building a Nepali text corpus, we usually collect text from various online sources such as Wikipedia, news portals, and other websites. These sources introduce many errors because of imperfect online tools such as translators, font converters, and spell checkers. The errors include typos, spelling mistakes, foreign words, and incorrect symbols, and dealing with them is a challenging task. In this post, we will look at a simple heuristic-based algorithm to filter non-Devanagari words from a Nepali corpus.
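To give a flavour of what such a heuristic might look like, here is a minimal sketch in Python. It is not necessarily the algorithm from the post: it simply assumes that a "Devanagari word" is a whitespace-separated token made up only of code points from the Unicode Devanagari block (U+0900 to U+097F). The names `is_devanagari_word` and `split_corpus` are hypothetical, chosen for illustration.

```python
import re

# Assumption: a word counts as Devanagari when every character falls in the
# Unicode Devanagari block (U+0900 to U+097F). This block also contains the
# danda and the Devanagari digits.
DEVANAGARI_ONLY = re.compile(r"[\u0900-\u097F]+")


def is_devanagari_word(token):
    """Return True if the token consists solely of Devanagari code points."""
    return DEVANAGARI_ONLY.fullmatch(token) is not None


def split_corpus(text):
    """Split text on whitespace into (devanagari_words, other_words)."""
    kept, rejected = [], []
    for token in text.split():
        (kept if is_devanagari_word(token) else rejected).append(token)
    return kept, rejected


kept, rejected = split_corpus("नेपाल Nepal भाषा 2005 hello")
print(kept)      # tokens made up only of Devanagari characters
print(rejected)  # everything else
```

One limitation of this sketch: real Nepali text sometimes contains zero-width joiners (U+200D, U+200C), which sit outside the Devanagari block, so a token containing them would be rejected unless the pattern is widened.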
Bridges are vital infrastructure for development. In a mountainous country like Nepal, with many thousands of rivers and streams, bridges play a vital role in connecting roads, villages, and cities, and hence in moving the economy.
FFmpeg is a popular open-source tool for handling media files. It can be used to change audio and video file formats, as well as other characteristics of media files.
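As a rough illustration of a format conversion, the sketch below builds and runs an `ffmpeg` command from Python. It assumes the `ffmpeg` binary is installed and on the `PATH`, and it relies on ffmpeg choosing the output format from the output file's extension. The helper names and the file names are hypothetical.

```python
import subprocess


def build_convert_command(src, dst, overwrite=False):
    """Build an ffmpeg command that converts src to the format implied by dst."""
    cmd = ["ffmpeg"]
    if overwrite:
        cmd.append("-y")  # overwrite the output file without prompting
    cmd += ["-i", src, dst]
    return cmd


def convert(src, dst, overwrite=False):
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_convert_command(src, dst, overwrite), check=True)


# Hypothetical file names, shown only to illustrate the resulting command.
print(build_convert_command("input.mp4", "output.mp3"))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.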
Research on Natural Language Processing for the Nepali language began in 2005 with the Bhasa Sanchar project, and it is now being actively carried out in universities and research institutes in Nepal and India. I have curated a list of NLP papers for the Nepali language published in different journals and categorized them by NLP task, in descending order of publication date.
I’m constantly building my skills and knowledge in Machine Learning. This usually means spinning up new project directories and installing different packages for each project. When you are working on multiple projects in parallel, managing their dependencies becomes a hassle.