AN ADAPTIVE HYBRID ENSEMBLE FRAMEWORK FOR REAL-TIME ANOMALY DETECTION IN HIGH-VOLUME DATA STREAMS
https://doi.org/10.47390/ts-v3i12y2025N09

Keywords
anomaly detection, data streams, ensemble learning, concept drift, real-time processing, adaptive algorithms, machine learning.

Abstract
This paper presents an adaptive ensemble framework for real-time anomaly detection in high-volume data streams. It addresses the challenges of concept drift, high-velocity data processing, and computational efficiency in modern distributed systems. The authors propose the Hybrid Statistical–Machine Learning Anomaly Detection (HSML-AD) algorithm, which combines sliding-window statistical analysis with incremental machine learning. The framework has a three-tier architecture: (1) lightweight statistical pre-filtering based on the modified Z-score and interquartile range (IQR) methods, (2) adaptive feature extraction via exponential moving averages, and (3) ensemble classification with an Online Random Forest whose component weights are adjusted dynamically according to recent prediction accuracy. Experiments on five benchmark datasets (KDD Cup 99, NSL-KDD, CICIDS2017, Yahoo S5, and the Numenta Anomaly Benchmark) show that HSML-AD achieves an average F1-score of 94.3%, precision of 93.8%, and recall of 94.7%, outperforming baseline methods such as Isolation Forest (F1: 87.2%), LSTM-Autoencoder (F1: 89.6%), and SPOT (F1: 86.4%). The algorithm processes 127,000 records per second on commodity hardware with an average latency of 7.8 milliseconds. The novelty of the proposed approach lies in a mechanism that dynamically adapts the ensemble component weights to the current characteristics of the data stream and to recent performance, and in a memory-efficient incremental learning strategy that preserves accuracy while bounding the model size at 45 MB.
The proposed framework can be applied effectively to network intrusion detection, IoT sensor monitoring, financial fraud detection, and industrial condition monitoring, particularly in resource-constrained environments that require real-time operation.
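To make the three-tier pipeline concrete, the following Python sketch approximates it for a univariate stream under stated assumptions: tier 1 gates records with a modified Z-score (median/MAD) and an IQR fence over a sliding window, tier 2 derives a deviation-from-EMA feature, and tier 3 stands in for the paper's Online Random Forest with two simple threshold detectors whose weights are updated from recent correctness. All class and parameter names, thresholds, and the weighting rule are illustrative assumptions, not the authors' exact formulation.

```python
# Minimal sketch of a three-tier streaming anomaly detector in the spirit
# of HSML-AD. Hypothetical names/thresholds; not the paper's implementation.
import numpy as np
from collections import deque


class HSMLADSketch:
    def __init__(self, window=256, ema_alpha=0.1, z_thresh=3.5):
        self.window = deque(maxlen=window)   # Tier 1: sliding window
        self.ema = None                      # Tier 2: exponential moving average
        self.alpha = ema_alpha
        self.z_thresh = z_thresh
        # Tier 3: two toy detectors stand in for the ensemble members.
        self.weights = np.array([0.5, 0.5])

    def _prefilter(self, x):
        """Tier 1: flag candidates via modified Z-score (median/MAD) and IQR fence."""
        if len(self.window) < 30:            # warm-up: no verdicts yet
            return False
        arr = np.asarray(self.window)
        med = np.median(arr)
        mad = np.median(np.abs(arr - med)) or 1e-9
        mod_z = 0.6745 * (x - med) / mad     # standard modified Z-score
        q1, q3 = np.percentile(arr, [25, 75])
        iqr = q3 - q1
        inside_fence = q1 - 1.5 * iqr <= x <= q3 + 1.5 * iqr
        return abs(mod_z) > self.z_thresh or not inside_fence

    def _feature(self, x):
        """Tier 2: adaptive feature = deviation from the exponential moving average."""
        self.ema = x if self.ema is None else self.alpha * x + (1 - self.alpha) * self.ema
        return x - self.ema

    def update(self, x, label=None):
        """Process one record; optionally adapt ensemble weights from feedback."""
        candidate = self._prefilter(x)
        feat = self._feature(x)              # EMA is updated on every record
        self.window.append(x)
        if not candidate:
            return False                     # cheap path: most records stop here
        # Tier 3 stand-in: weighted vote of two threshold detectors.
        votes = np.array([abs(feat) > 2.0, abs(feat) > 4.0], dtype=float)
        score = float(self.weights @ votes) / self.weights.sum()
        verdict = score >= 0.5
        if label is not None:                # reward detectors that matched the label
            correct = votes == float(label)
            self.weights = 0.9 * self.weights + 0.1 * correct
        return verdict
```

Fed a stream of readings (and occasional ground-truth labels), the exponentially decayed weight update lets the ensemble drift toward whichever detector has recently been more accurate, mirroring the dynamic weight-adaptation idea described in the abstract.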


