Paraphrase Detection in a Low Resourced Language: Kannada

Published in 2023 IEEE 8th International Conference for Convergence in Technology (I2CT), 2023

Paraphrase detection is a Natural Language Processing classification problem whose goal is to determine whether two sentences convey the same meaning. This paper implements paraphrase detection for Kannada, a low-resource language. Kannada is a South Indian language belonging to the Dravidian language family. Among the several state-of-the-art classifiers compared for this problem, the decision tree gave the best results, owing to the non-linear nature of the corpora. Such a system would be well suited to plagiarism detection for Kannada documents written in the official dialect of the language.
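
To make the classification framing concrete, the following is a minimal sketch of paraphrase detection as a binary decision over a sentence pair. It is not the paper's method: the abstract does not state which features or data were used, so the two overlap features, the toy English sentences standing in for Kannada text, and all function names (`pair_features`, `fit_stump`, `predict`) are assumptions made for illustration. The classifier here is a single-split decision stump, the simplest form of decision tree, chosen by Gini impurity.

```python
def char_ngrams(text, n=3):
    """Set of character n-grams after lowercasing and whitespace normalisation."""
    s = " ".join(text.lower().split())
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pair_features(s1, s2):
    """Assumed features: word-level and character-trigram overlap."""
    words = jaccard(set(s1.lower().split()), set(s2.lower().split()))
    chars = jaccard(char_ngrams(s1), char_ngrams(s2))
    return [words, chars]


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def fit_stump(X, y):
    """Pick the (feature, threshold) split with the lowest weighted Gini impurity."""
    best = None
    n = len(y)
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        # Candidate thresholds are midpoints between consecutive feature values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                # Each side predicts its majority label.
                left_label = int(sum(left) * 2 >= len(left))
                right_label = int(sum(right) * 2 >= len(right))
                best = (score, f, t, left_label, right_label)
    if best is None:
        raise ValueError("no split possible: every feature is constant")
    return best[1:]


def predict(stump, s1, s2):
    """Return 1 if the pair is classified as a paraphrase, else 0."""
    f, t, left_label, right_label = stump
    return right_label if pair_features(s1, s2)[f] > t else left_label


# Toy labelled pairs (1 = paraphrase, 0 = not); placeholders, not real corpus data.
pairs = [
    ("the cat sat on the mat", "the cat is sitting on the mat", 1),
    ("he bought a new car", "he purchased a new car", 1),
    ("the weather is cold today", "today the weather is cold", 1),
    ("the cat sat on the mat", "stock prices fell sharply", 0),
    ("he bought a new car", "the river floods every year", 0),
    ("the weather is cold today", "she teaches music at school", 0),
]
X = [pair_features(a, b) for a, b, _ in pairs]
y = [label for _, _, label in pairs]
stump = fit_stump(X, y)
```

Calling `predict(stump, sentence_a, sentence_b)` then labels a new pair. A full decision tree would apply such splits recursively, which is what lets it model non-linear boundaries between paraphrase and non-paraphrase pairs.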