MULTI-STAGE FILTERING ALGORITHM FOR DETECTING OBFUSCATED SPAM MESSAGES IN THE UZBEK LANGUAGE
DOI: https://doi.org/10.47390/ts-v4i4y2026N01
Keywords: spam, SMS spam, obfuscated spam, spam filtering, machine learning, text classification, Uzbek language.
Abstract
This paper addresses the problem of detecting obfuscated spam messages in the Uzbek language. It proposes a multi-stage filtering algorithm comprising text preprocessing, obfuscation normalization, feature extraction, and classification. Experimental results show that the proposed algorithm improves spam detection performance.
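The four stages named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the normalization rules, the feature set, or the classifier, so everything below is an assumption made for illustration: the look-alike table `LOOKALIKES`, the separator rule, character trigram features, the small `NaiveBayes` class, and the toy training messages are all hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter

# Hypothetical look-alike table (assumption): symbols often typed instead of letters.
LOOKALIKES = {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "@": "a", "$": "s"}
# Substitute a look-alike only when it touches a letter, so "100" stays a number.
LOOKALIKE_RE = re.compile(r"(?<=[a-z])[01345@$]|[01345@$](?=[a-z])")
# Separators inserted between letters, e.g. "y.u.t.u.q". Apostrophes are kept
# because they are part of Uzbek Latin spelling (o', g').
SEPARATOR_RE = re.compile(r"(?<=[a-z])[.\-_*]+(?=[a-z])")


def preprocess(text):
    """Stage 1: lowercase and collapse whitespace."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def normalize_obfuscation(text):
    """Stage 2: undo simple look-alike substitutions and separator padding."""
    text = LOOKALIKE_RE.sub(lambda m: LOOKALIKES[m.group()], text)
    return SEPARATOR_RE.sub("", text)


def extract_features(text, n=3):
    """Stage 3: character n-gram counts over the space-padded text."""
    padded = f" {text} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


class NaiveBayes:
    """Stage 4: multinomial Naive Bayes with add-one smoothing (assumed classifier)."""

    def fit(self, texts, labels):
        self.docs = Counter(labels)
        self.counts = {}
        for text, label in zip(texts, labels):
            self.counts.setdefault(label, Counter()).update(extract_features(text))
        self.vocab = set().union(*self.counts.values())
        return self

    def predict(self, text):
        feats = extract_features(text)

        def score(label):
            total = sum(self.counts[label].values()) + len(self.vocab)
            prior = math.log(self.docs[label] / sum(self.docs.values()))
            return prior + sum(
                c * math.log((self.counts[label][g] + 1) / total)
                for g, c in feats.items()
            )

        return max(self.counts, key=score)


def clean(text):
    """Run stages 1 and 2."""
    return normalize_obfuscation(preprocess(text))


def classify(model, text):
    """Run the full pipeline on one message."""
    return model.predict(clean(text))


# Toy training data, invented for this sketch only.
TRAIN = [
    ("bonus yutuq oling", "spam"),
    ("tekin sovg'a yutib oling", "spam"),
    ("ertaga dars soat nechada", "ham"),
    ("uyga kelganda qo'ng'iroq qil", "ham"),
]
model = NaiveBayes().fit([clean(t) for t, _ in TRAIN], [y for _, y in TRAIN])
```

With this sketch, `clean("B0NU5  y.u.t.u.q")` returns `"bonus yutuq"`, so an obfuscated message reaches the classifier in the same form as its plain counterpart. The separator rule is deliberately crude: it would also join two words separated only by a period, which a fuller normalizer would need to handle.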


