AN APPROACH TO FEATURE SPACE FORMATION FOR MACHINE LEARNING MODELS
DOI:
https://doi.org/10.47390/ts-v3i9y2025No2
Keywords:
One-Hot Encoding, Label Encoding, Binning, Empirical Rule, MinMax Normalization, z-Normalization, StandardScaler, Normalization, Scaling.
Abstract
In this study, various approaches to feature space formation for solving data processing tasks are examined, and their implementation mechanisms in a programming environment are investigated. The paper describes methods for establishing internal relationships based on data characteristics, categorizing information, and handling categorical variables. Furthermore, techniques for grouping, scaling, and normalizing numerical data are presented, including practical approaches to data preprocessing in software environments.
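As an illustration of the techniques named in the keywords, the following is a minimal sketch in plain Python, not the paper's own implementation. The function names (`label_encode`, `one_hot_encode`, `bin_equal_width`, `min_max_scale`, `z_score`) and the sample data are assumptions introduced here for demonstration only. The z-score uses the population standard deviation, which is also the convention followed by scikit-learn's StandardScaler.

```python
from statistics import mean, pstdev


def label_encode(values):
    """Map each distinct category to an integer code (alphabetical order)."""
    codes = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [codes[v] for v in values], codes


def one_hot_encode(values):
    """Return one 0/1 indicator column per distinct category."""
    cats = sorted(set(values))
    return [[1 if v == c else 0 for c in cats] for v in values], cats


def bin_equal_width(values, n_bins):
    """Assign each value to one of n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:  # all values identical: everything falls in bin 0
        return [0 for _ in values]
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def min_max_scale(values):
    """Rescale values linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def z_score(values):
    """Center to zero mean and scale to unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


# Hypothetical sample data, for illustration only.
colors = ["red", "green", "blue", "green"]
ages = [20, 30, 40, 60]

labels, mapping = label_encode(colors)
one_hot, categories = one_hot_encode(colors)
bins = bin_equal_width(ages, 2)
scaled = min_max_scale(ages)
standardized = z_score(ages)
```

Label encoding imposes an arbitrary order on the categories, which suits tree-based models better than distance-based ones; one-hot encoding avoids that implied order at the cost of one extra column per category.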