Automated Fuzzy Weighted Multi-Semantic Label Extraction of Persian News

Document Type: Original Article

Authors

1 Department of Management, Faculty of Social Sciences and Economics, Alzahra University, Tehran, Iran

2 Associate Prof., Department of Management, Faculty of Social Sciences and Economics, Alzahra University, Tehran, Iran

Abstract
 
The vast number of online text documents and Semantic Web trends have increased researchers’ interest in semantic multi-label extraction. Research on the semantic extraction of multiple labels in Western and Eastern European languages is already well established. Machine reading and web-based knowledge extraction require a scalable system that can extract diverse information from large, heterogeneous collections. This study therefore developed a multi-semantic fuzzy weighted labeling system using natural language processing and supervised deep learning techniques. A long short-term memory (LSTM) network was used to extract the labels, and LSTM2, introduced by Yan, Wang, Gao, Zhang, Yang & Yin (2018), was used to extract the label weights. To assess the degree to which each document belongs to each label, the resulting weights were adjusted according to whether the label appeared in the document’s subject or in the meta section of the web page, and the weights were then normalized and fuzzified. Finally, the fuzzy C-means clustering algorithm was applied to the documents to assign each data point a degree of membership in the relevant clusters. The model achieved an accuracy of 59.8%, suggesting that supervised methods can improve the extraction of weighted key phrases and the semantic labeling of text.
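A minimal Python sketch of the post-processing steps described above, assuming the per-document label weights have already been produced by the LSTM/LSTM2 stage. It is not the authors' implementation: the boost factors `TITLE_BOOST` and `META_BOOST`, the substring test for title and meta matches, the divide-by-maximum fuzzification, the fuzzifier `m = 2`, and the example labels and documents are all hypothetical. Only the clustering step follows a standard published algorithm, fuzzy C-means as formulated by Bezdek (1981).

```python
import numpy as np

# HYPOTHETICAL boost factors: the abstract does not report the real values.
TITLE_BOOST = 1.5  # label found in the news subject/title
META_BOOST = 1.2   # label found in the page's meta section


def adjust_weights(weights, labels, title, meta):
    """Boost a label's weight when it occurs in the title or meta text.

    Assumed rule (naive substring match); not the authors' exact procedure.
    """
    adjusted = np.asarray(weights, dtype=float).copy()
    for i, label in enumerate(labels):
        if label in title:
            adjusted[i] *= TITLE_BOOST
        if label in meta:
            adjusted[i] *= META_BOOST
    return adjusted


def to_membership(weights):
    """Fuzzify by dividing by the largest weight, so the top label gets 1.0."""
    w = np.asarray(weights, dtype=float)
    top = w.max()
    return np.zeros_like(w) if top <= 0 else w / top


def fuzzy_c_means(X, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Standard fuzzy C-means (Bezdek, 1981).

    X has shape (n_docs, n_features). Returns (centers, U), where U[i, k]
    is the membership of document i in cluster k and each row of U sums to 1.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)  # random initial memberships
    for _ in range(max_iter):
        Um = U ** m
        # Cluster centers: membership-weighted means of the documents.
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Euclidean distance of every document to every center, shape (n, c).
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # guard against division by zero
        # u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1))
        ratio = (dist[:, :, None] / dist[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


if __name__ == "__main__":
    labels = ["economy", "oil", "sanctions"]  # illustrative labels only
    w = adjust_weights([0.8, 0.4, 0.2], labels,
                       title="oil prices rise", meta="economy sanctions")
    print(to_membership(w))

    # Toy fuzzified label-weight vectors for four documents.
    docs = np.array([[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.9]])
    centers, U = fuzzy_c_means(docs, c=2)
    print(np.round(U, 3))
```

Dividing by the largest weight is just one way to map weights into [0, 1]; min-max scaling would serve equally well for illustration. In fuzzy C-means, the fuzzifier m controls how soft the memberships are: as m approaches 1 the assignments approach hard k-means, while larger values spread each document's membership more evenly across clusters.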
 
 

References


Aghighi, R. & Bashiri, H. (2025). Text classification of Persian documents with deep learning. In Advanced Interdisciplinary Applications of Deep Learning for Data Science (pp. 143–170). IGI Global. https://doi.org/10.4018/979-8-3693-4759-1.ch006
Ahmadi, P., Tabandeh, M., & Gholampour, I. (2016, May). Persian text classification based on topic models. In 2016 24th Iranian Conference on Electrical Engineering (ICEE) (pp. 86-91). IEEE. https://doi.org/10.1109/IranianCEE.2016.7585495
Altınel, B., Ganiz, M. C. & Diri, B. (2015). A corpus-based semantic kernel for text classification by using the meaning values of terms. Engineering Applications of Artificial Intelligence, 43, 54-66. https://doi.org/10.1016/j.engappai.2015.03.015
Ashtiani, M. N. & Raahemi, B. (2023). News-based intelligent prediction of financial markets using text mining and machine learning: A systematic literature review. Expert Systems with Applications, 217, 119509. https://doi.org/10.1016/j.eswa.2023.119509
Azarafza, M., Feizi-Derakhshi, M. R. & Shendi, M. B. (2020, March). TextRank-based microblogs keyword extraction method for the Persian language. In Proceedings of the 3rd International Congress on Science and Engineering (Hamburg, Germany). Retrieved from https://www.researchgate.net/publication/338533840_TextRank-based_Microblogs_Keyword_Extraction_Method_for_Persian_Language
Bezdek, J. C. (1981). Pattern recognition with fuzzy objective function algorithms. Plenum Press. https://doi.org/10.1007/978-1-4757-0450-1
Brin, S. & Page, L. (1998). The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems, 30(1–7), 107–117. https://doi.org/10.1016/S0169-7552(98)00110-X
Cejuela, J. M., Bojchevski, A., Uhlig, C., Bekmukhametov, R., Kumar Karn, S., Mahmuti, S., Baghudana, A., Dubey, A., Satagopam, V. P., & Rost, B. (2017). nala: Text mining natural language mutation mentions. Bioinformatics, 33(12), 1852–1858. https://doi.org/10.1093/bioinformatics/btx083
Chen, X., Zou, D., Cheng, G., Liu, Y. & Xie, H. (2024). Deep neural networks for the automatic understanding of the semantic content of online course reviews. Education and Information Technologies, 29(4), 3953–3991. https://doi.org/10.1007/s10639-023-11980-6
Cheng, Q. & Shi, W. (2025). Hierarchical multi-label text classification of tourism resources using a label-aware dual graph attention network. Information Processing & Management, 62(1), 103952. https://doi.org/10.1016/j.ipm.2024.103952
Chollet, F. (2017). Deep learning with Python. Manning Publications.
Chopra, S., Auli, M. & Rush, A. M. (2016, June). Abstractive sentence summarization with attentive recurrent neural networks. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 93-98). https://doi.org/10.18653/v1/N16-1012
Dang, J., Kalender, M., Toklu, C. & Hampel, K. (2017). Semantic search tool for document tagging, indexing, and search. US Patent 9,684,683.
Dastgheib, M. B. & Koleini, S. (2019). Persian text classification enhancement by latent semantic space. International Journal of Information Science and Management (IJISM), 17(1), 33-46. Retrieved from https://ijism.isc.ac/article_698289_45b04d125578f24c0ca1b3f1e06e346e.pdf
Dastgheib, M. B., Koleini, S., & Rasti, F. (2020). The application of deep learning in Persian document sentiment analysis. International Journal of Information Science and Management (IJISM), 18(1), 1-15. https://dorl.net/dor/20.1001.1.20088302.2020.18.1.1.0
Davari, N., Mahdian, M., Akhavanpour, A. & Daneshpour, N. (2020, August). Persian Document Classification Using Deep Learning Methods. In 2020 28th Iranian Conference on Electrical Engineering (ICEE) (pp. 1-5). IEEE. https://doi.org/10.1109/ICEE50131.2020.9260650
Deng, L., Tur, G., He, X. & Hakkani-Tur, D. (2012, December). Use of kernel deep convex networks and end-to-end learning for spoken language understanding. In 2012 IEEE Spoken Language Technology Workshop (SLT) (pp. 210-215). IEEE. https://doi.org/10.1109/SLT.2012.6424224
Farahani, B. D., Fatemi, S. O., & Ghorbani, M. (2019, April). Automatic keyphrase extraction from Persian scientific documents using semantic relations. In 2019 27th Iranian Conference on Electrical Engineering (ICEE) (pp. 1972-1978). IEEE. https://doi.org/10.1109/iraniancee.2019.8786696
Gattiker, J. R., Hamada, M. S., Higdon, D. M., Schonlau, M. & Welch, W. J. (2016). Using a Gaussian process as a nonparametric regression model. Quality and Reliability Engineering International, 32(2), 673-680. https://doi.org/10.1002/qre.1782
Ghasemi, S. & Jadidinejad, A. H. (2018, April). Persian text classification via character-level convolutional neural networks. In 2018 8th Conference of AI & Robotics and 10th RoboCup Iran Open International Symposium (IRANOPEN) (pp. 1-6). IEEE. https://doi.org/10.1109/RIOS.2018.8406623
Hadifar, A. & Momtazi, S. (2018). The impact of corpus domain on word representation: a study on Persian word embeddings. Language Resources and Evaluation, 52(4), 997-1019. https://doi.org/10.1007/s10579-018-9419-x
Johnson, R. & Zhang, T. (2014). Effective use of word order for text categorization with convolutional neural networks. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 103–112). Denver, Colorado. Association for Computational Linguistics. https://doi.org/10.3115/v1/N15-1011
Khuntia, M. & Gupta, D. (2023). Indian news headlines classification using word embedding techniques and an LSTM model. Procedia Computer Science, 218, 899-907. https://doi.org/10.1016/j.procs.2023.01.070
Khurana, D., Koli, A., Khatter, K. & Singh, S. (2023). Natural language processing: state of the art, current trends and challenges. Multimedia Tools and Applications, 82(3), 3713-3744. https://doi.org/10.1007/s11042-022-13428-4
Kim, J. & Lee, M. (2014, November). Robust lane detection based on a convolutional neural network and random sample consensus. In International Conference on Neural Information Processing (pp. 454-461). Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-12637-1_57
Kingma, D. P. & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980. https://doi.org/10.48550/arXiv.1412.6980
Kou, W., Li, F. & Baldwin, T. (2015). Automatic labelling of topic models using word vectors and letter trigram vectors. In Information Retrieval Technology: 11th Asia Information Retrieval Societies Conference, AIRS 2015 (pp. 253–264). Springer. https://doi.org/10.1007/978-3-319-28940-3_20
Lazemi, S., Ebrahimpour-Komleh, H. & Noroozi, N. (2019). PAKE: A supervised approach for Persian automatic keyword extraction using statistical features. SN Applied Sciences, 1, 1574. https://doi.org/10.1007/s42452-019-1627-5
Leeson, W., Resnick, A., Alexander, D. & Rovers, J. (2019). Natural Language Processing (NLP) in Qualitative Public Health Research: A Proof of Concept Study. International Journal of Qualitative Methods, 18. https://doi.org/10.1177/1609406919887021 
Li, X., Liu, J., Wang, X. & Chen, S. (2024). A survey on incomplete multi-label learning: Recent advances and future trends. https://doi.org/10.48550/arXiv.2406.06119
Lin, B. (2022). Knowledge management system with NLP-assisted annotations: A brief survey and outlook. https://doi.org/10.48550/arXiv.2206.07304
Liu, J., Chang, W. C., Wu, Y. & Yang, Y. (2017, August). Deep learning for extreme multi-label text classification. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 115-124). https://doi.org/10.1145/3077136.3080834
Liu, Z., Li, P., Zheng, Y. & Sun, M. (2009, August). Clustering to find exemplar terms for key phrase extraction. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (pp. 257-266). Retrieved from https://aclanthology.org/D09-1027.pdf
Masoudian, S., Derhami, V. & Zarifzadeh, S. (2019, April). Hierarchical Persian text categorization in the absence of labeled data. In 2019, the 27th Iranian Conference on Electrical Engineering (ICEE) (pp. 1951-1955). IEEE.
Meesad, P. & Li, J. (2014, December). Stock trend prediction relying on text mining and sentiment analysis with tweets. In 2014 4th World Congress on Information and Communication Technologies (WICT 2014) (pp. 257-262). IEEE. https://doi.org/10.1109/wict.2014.7077275
Mei, Q., Shen, X. & Zhai, C. (2007, August). Automatic labeling of multinomial topic models. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 490-499). https://doi.org/10.1145/1281192.1281246
Mikolov, T., Yih, W. T. & Zweig, G. (2013, June). Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 746-751). Retrieved from https://aclanthology.org/N13-1090.pdf
Paik, J. H. (2013, July). A novel TF-IDF weighting scheme for effective ranking. In Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 343-352). https://doi.org/10.1145/2484028.2484070
Pan, P., Swaroop, S., Immer, A., Eschenhagen, R., Turner, R., & Khan, M. E. (2020). Continual deep learning by functional regularisation of memorable past. Advances in Neural Information Processing Systems, 33, 4453-4464.
Pawar, D., Phansalkar, S., Sharma, A., Sahu, G. K., Ang, C. K. & Lim, W. H. (2023). Survey on the biomedical text summarization techniques with an emphasis on databases, techniques, semantic approaches, classification techniques, and similarity measures. Sustainability, 15(5), 4216. https://doi.org/10.3390/su15054216
Rajabi, E., Sahebari, M. & Thomas, T. (2022). Analyzing systemic lupus erythematosus publications using neural network–based multi-label classification algorithms. Lupus, 31(7), 820-827. https://doi.org/10.1177/09612033221093548
Roy, S., Das, N., Kundu, M. & Nasipuri, M. (2017). Handwritten isolated Bangla compound character recognition: A new benchmark using a novel deep learning approach. Pattern Recognition Letters, 90, 15-21. https://doi.org/10.1016/j.patrec.2017.03.004
Saaty, T. L. (1988). What is the analytic hierarchy process? In Mitra, G., Greenberg, H. J., Lootsma, F. A., Rijkaert, M. J., Zimmermann, H. J. (eds) Mathematical Models for Decision Support. NATO ASI Series, vol 48. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-83555-1_5
Schoene, A. M., Basinas, I., van Tongeren, M. & Ananiadou, S. (2022). A Narrative Literature Review of Natural Language Processing Applied to the Occupational Exposome. International Journal of Environmental Research and Public Health, 19(14), 8544. https://doi.org/10.3390/ijerph19148544
Sharifi, A. & Mahdavi, M. A. (2019). Supervised approach for keyword extraction from Persian documents using lexical chains. Signal and Data Processing, 15(4), 95-110. http://dx.doi.org/10.29252/jsdp.15.4.95 [in Persian]
Soloshenko, A. N., Orlova, Y. A., Rozaliev, V. L. & Zaboleeva-Zotova, A. V. (2015). Establishing the semantic similarity of the cluster documents and extracting key entities in the problem of the semantic analysis of news texts. Modern Applied Science, 9(5), 246-268. http://dx.doi.org/10.5539/mas.v9n5p246
Sorodoc, I., Lau, J. H., Aletras, N. & Baldwin, T. (2017, April). Multimodal topic labelling. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers (pp. 701-706). Retrieved from https://aclanthology.org/E17-2111.pdf
Sullivan, F. R. & Keith, P. K. (2019). Exploring the potential of natural language processing to support microgenetic analysis of collaborative learning discussions. British Journal of Educational Technology, 50(6), 3047–3063. https://doi.org/10.1111/bjet.12875
Sun, Y., Sun, H. & Cheng, R. (2016, April). Fast and semantic measurements on collaborative tagging quality. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (pp. 363-375). Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-31750-2_29
Swayamdipta, S., Thomson, S., Lee, K., Zettlemoyer, L., Dyer, C., & Smith, N. A. (2018). Syntactic scaffolds for semantic structures. arXiv preprint arXiv:1808.10485
Tai, K. S., Socher, R. & Manning, C. D. (2015). Improved semantic representations from tree-structured long short-term memory networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (pp. 1556-1566). Beijing, China. Association for Computational Linguistics. https://doi.org/10.3115/v1/P15-1150
Tarekegn, A. N., Ullah, M. & Cheikh, F. A. (2024). Deep learning for multi-label learning: A comprehensive survey. https://doi.org/10.48550/arXiv.2401.16549
W3Techs (2023). Usage statistics of content languages for websites. Retrieved from https://w3techs.com/technologies/overview/content_language
Wang, J., Chen, Y., Hao, S., Peng, X. & Hu, L. (2019). Deep learning for sensor-based activity recognition: A survey. Pattern Recognition Letters, 119, 3-11. https://doi.org/10.1016/j.patrec.2018.02.010
Wu, C. H., Chuang, Z. J. & Lin, Y. C. (2006). Emotion recognition from text using semantic labels and separable mixture models. ACM Transactions on Asian Language Information Processing (TALIP), 5(2), 165-183. https://doi.org/10.1145/1165255.1165259
Xie, J., Deng, Q., Xia, S., Zhao, Y., Wang, G., & Gao, X. (2023). Research on an Efficient Fuzzy Clustering Method Based on Local Fuzzy Granules. https://doi.org/10.48550/arXiv.2303.03590
Xiong, H., Jin, K., Liu, J., Cai, J. & Xiao, L. (2023, May). Deep learning-based image text processing research. In 2023 IEEE 9th Intl Conference on Big Data Security on Cloud (BigDataSecurity), IEEE Intl Conference on High Performance and Smart Computing (HPSC), and IEEE Intl Conference on Intelligent Data and Security (IDS) (pp. 163-168). IEEE. https://doi.org/10.1109/BigDataSecurity-HPSC-IDS58521.2023.00037
Yacob, F. & Semere, D. (2021). A multilayer shallow learning approach to variation prediction and variation source identification in multistage machining processes. Journal of Intelligent Manufacturing, 32(4), 1173-1187. https://doi.org/10.1007/s10845-020-01649-z
Yan, Y., Wang, Y., Gao, W. C., Zhang, B. W., Yang, C. & Yin, X. C. (2018). LSTM2: Multi-label ranking for document classification. Neural Processing Letters, 47, 117-138. https://doi.org/10.1007/s11063-017-9636-0
Yu, S., Li, X., Zhao, X., Zhang, Z. & Wu, F. (2015). Tracking news article evolution by dense subgraph learning. Neurocomputing, 168, 1076–1084. https://doi.org/10.1016/j.neucom.2015.05.077 
Za’in, C., Pratama, M., Lughofer, E. & Anavatti, S. G. (2017). Evolving type-2 web news mining. Applied Soft Computing, 54, 200-220. https://doi.org/10.1016/J.ASOC.2016.11.034
Zha, D. & Li, C. (2019). Multi-label dataless text classification with topic modeling. Knowledge and Information Systems, 61(1), 137-160. https://doi.org/10.1007/s10115-018-1280-0
Zhang, R., Lee, H. & Radev, D. (2016). Dependency-sensitive convolutional neural networks for modeling sentences and documents. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 1512–1521). San Diego, California. Association for Computational Linguistics. https://doi.org/10.18653/v1/N16-1177
Zheng, J. & Zheng, L. (2019). A hybrid bidirectional recurrent convolutional neural network attention-based model for text classification. IEEE Access, 7, 106673-106685. https://doi.org/10.1109/ACCESS.2019.2932619
Volume 23, Issue 4
Autumn 2025
Pages 167-188

  • Received: 19 April 2023
  • Revised: 05 July 2023
  • Accepted: 30 September 2025