Automated Fuzzy Weighted Multi-Semantic Label Extraction of Persian News

Esmaeili Shayan, Sahar; Abdolvand, Neda; Rajaee Harandi, Saeedeh

doi:10.22034/ijism.2025.1990502.1057


Document Type: Original Article

Authors

Sahar Esmaeili Shayan ¹

Neda Abdolvand ²

Saeedeh Rajaee Harandi ¹

¹ Department of Management, Faculty of Social Sciences and Economics, Alzahra University, Tehran, Iran

² Associate Prof., Department of Management, Faculty of Social Sciences and Economics, Alzahra University, Tehran, Iran


Abstract

The vast number of online text documents and the growth of the Semantic Web have increased researchers' interest in semantic multi-label extraction. Research on the semantic extraction of multiple labels in Western and Eastern European languages is already well established. Machine reading and web-based knowledge extraction require a scalable system that can extract diverse information from large, heterogeneous collections. Hence, this study developed a fuzzy weighted multi-semantic labeling system for Persian news using natural language processing and supervised deep learning techniques. A long short-term memory (LSTM) network was used to extract labels, and the LSTM² model introduced by Yan, Wang, Gao, Zhang, Yang & Yin (2018) was used to extract the label weights. To assess the degree to which each document belongs to each label, the resulting weights were adjusted according to whether the label appeared in the document's subject or in the meta section of the web page; the weights were then normalized and fuzzified. Finally, the fuzzy C-means clustering algorithm was applied to the documents to assign each data point a degree of membership in the relevant clusters. The model achieved an accuracy of 59.8%, indicating that supervised methods could improve the extraction of weighted key phrases and the semantic labeling of text.
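The post-processing stages described in the abstract (weight adjustment, normalization, fuzzification, and fuzzy C-means clustering) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the boost factors, the normalization rule, or the fuzzy sets, so `SUBJECT_BOOST`, `META_BOOST`, max-normalization, and the low/medium/high triangular membership functions are hypothetical placeholders. The clustering step uses the standard fuzzy C-means updates (Bezdek, 1981), with each document represented by its vector of normalized label weights, which is also an assumption.

```python
import math
import random

# Hypothetical boost factors: the source says weights were adjusted when a label
# appears in the subject or meta section, but does not state the factors.
SUBJECT_BOOST = 1.5
META_BOOST = 1.25


def adjust_weights(weights, subject, meta):
    """Boost a label's weight if it occurs in the subject or meta text.

    Plain substring matching is a simplification; real Persian text would
    need tokenization and normalization first.
    """
    adjusted = {}
    for label, w in weights.items():
        if label in subject:
            w *= SUBJECT_BOOST
        if label in meta:
            w *= META_BOOST
        adjusted[label] = w
    return adjusted


def normalize(weights):
    """Scale weights so the largest becomes 1.0 (max-normalization, assumed)."""
    top = max(weights.values())
    if top <= 0:
        return {label: 0.0 for label in weights}
    return {label: w / top for label, w in weights.items()}


def fuzzify(x):
    """Triangular memberships of x in [0, 1] for hypothetical low/medium/high sets."""
    return {
        "low": max(0.0, 1.0 - 2.0 * x),
        "medium": max(0.0, 1.0 - abs(2.0 * x - 1.0)),
        "high": max(0.0, 2.0 * x - 1.0),
    }


def fuzzy_c_means(points, c, m=2.0, iters=100, seed=0):
    """Standard fuzzy C-means (Bezdek, 1981); returns (centers, memberships)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Centers are means weighted by membership raised to the power m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            sw = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / sw
                            for d in range(dim)])
        # Memberships: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        for i in range(n):
            dist = [max(1e-12, math.dist(points[i], centers[j])) for j in range(c)]
            for j in range(c):
                u[i][j] = 1.0 / sum((dist[j] / dist[k]) ** (2.0 / (m - 1.0))
                                    for k in range(c))
    return centers, u


if __name__ == "__main__":
    doc = adjust_weights({"economy": 0.8, "oil": 0.4, "sports": 0.1},
                         subject="oil prices and the economy", meta="economy")
    print({label: fuzzify(w) for label, w in normalize(doc).items()})
```

The fuzzifier `m = 2.0` is the conventional default for fuzzy C-means; larger values spread each document's membership more evenly across clusters, which may suit news items that legitimately carry several labels.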

Keywords

Text Mining

Multi-Label Extraction

Persian Natural Language Processing

LSTM

DOR: 20.1001.1.20088302.2025.23.4.8.8

Subjects

Natural Language Processing (NLP)

References

Aghighi, R. & Bashiri, H. (2025). Text classification of Persian documents with deep learning. In Advanced Interdisciplinary Applications of Deep Learning for Data Science (pp. 143–170). IGI Global. https://doi.org/10.4018/979-8-3693-4759-1.ch006

Ahmadi, P., Tabandeh, M. & Gholampour, I. (2016, May). Persian text classification based on topic models. In 2016 24th Iranian Conference on Electrical Engineering (ICEE) (pp. 86-91). IEEE. https://doi.org/10.1109/IranianCEE.2016.7585495

Altınel, B., Ganiz, M. C. & Diri, B. (2015). A corpus-based semantic kernel for text classification by using the meaning values of terms. Engineering Applications of Artificial Intelligence, 43, 54-66. https://doi.org/10.1016/j.engappai.2015.03.015

Ashtiani, M. N. & Raahemi, B. (2023). News-based intelligent prediction of financial markets using text mining and machine learning: A systematic literature review. Expert Systems with Applications, 217, 119509. https://doi.org/10.1016/j.eswa.2023.119509

Azarafza, M., Feizi-Derakhshi, M. R. & Shendi, M. B. (2020, March). TextRank-based microblogs keyword extraction method for the Persian language. In Proceedings of the 3rd International Congress on Science and Engineering (Hamburg, Germany). Retrieved from https://www.researchgate.net/publication/338533840_TextRank-based_Microblogs_Keyword_Extraction_Method_for_Persian_Language

Bezdek, J. C. (1981). Pattern recognition with fuzzy objective function algorithms. Plenum Press. https://doi.org/10.1007/978-1-4757-0450-1

Brin, S. & Page, L. (1998). The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems, 30(1–7), 107–117. https://doi.org/10.1016/S0169-7552(98)00110-X

Cejuela, J. M., Bojchevski, A., Uhlig, C., Bekmukhametov, R., Kumar Karn, S., Mahmuti, S., Baghudana, A., Dubey, A., Satagopam, V. P., & Rost, B. (2017). nala: Text mining natural language mutation mentions. Bioinformatics, 33(12), 1852–1858. https://doi.org/10.1093/bioinformatics/btx083

Chen, X., Zou, D., Cheng, G., Liu, Y. & Xie, H. (2024). Deep neural networks for the automatic understanding of the semantic content of online course reviews. Education and Information Technologies, 29(4), 3953–3991. https://doi.org/10.1007/s10639-023-11980-6

Cheng, Q. & Shi, W. (2025). Hierarchical multi-label text classification of tourism resources using a label-aware dual graph attention network. Information Processing & Management, 62(1), 103952. https://doi.org/10.1016/j.ipm.2024.103952

Chollet, F. (2017). Deep learning with Python. Simon and Schuster.

Chopra, S., Auli, M. & Rush, A. M. (2016, June). Abstractive sentence summarization with attentive recurrent neural networks. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 93-98). https://doi.org/10.18653/v1/N16-1012

Dang, J., Kalender, M., Toklu, C. & Hampel, K. (2017). Semantic search tool for document tagging, indexing, and search. US Patent 9,684,683.

Dastgheib, M. B. & Koleini, S. (2019). Persian text classification enhancement by latent semantic space. International Journal of Information Science and Management (IJISM), 17(1), 33-46. Retrieved from https://ijism.isc.ac/article_698289_45b04d125578f24c0ca1b3f1e06e346e.pdf

Dastgheib, M. B., Koleini, S., & Rasti, F. (2020). The application of deep learning in Persian document sentiment analysis. International Journal of Information Science and Management (IJISM), 18(1), 1-15. https://dor.org/20.1001.1.20088302.2020.18.1.1.0

Davari, N., Mahdian, M., Akhavanpour, A. & Daneshpour, N. (2020, August). Persian document classification using deep learning methods. In 2020 28th Iranian Conference on Electrical Engineering (ICEE) (pp. 1-5). IEEE. https://doi.org/10.1109/ICEE50131.2020.9260650

Deng, L., Tur, G., He, X. & Hakkani-Tur, D. (2012, December). Use of kernel deep convex networks and end-to-end learning for spoken language understanding. In 2012 IEEE Spoken Language Technology Workshop (SLT) (pp. 210-215). IEEE. https://doi.org/10.1109/SLT.2012.6424224

Farahani, B. D., Fatemi, S. O. & Ghorbani, M. (2019, April). Automatic keyphrase extraction from Persian scientific documents using semantic relations. In 2019 27th Iranian Conference on Electrical Engineering (ICEE) (pp. 1972-1978). IEEE. https://doi.org/10.1109/iraniancee.2019.8786696

Gattiker, J. R., Hamada, M. S., Higdon, D. M., Schonlau, M. & Welch, W. J. (2016). Using a Gaussian process as a nonparametric regression model. Quality and Reliability Engineering International, 32(2), 673-680. https://doi.org/10.1002/qre.1782

Ghasemi, S. & Jadidinejad, A. H. (2018, April). Persian text classification via character-level convolutional neural networks. In 2018 8th Conference of AI & Robotics and 10th RoboCup Iran Open International Symposium (IRANOPEN) (pp. 1-6). IEEE. https://doi.org/10.1109/RIOS.2018.8406623

Hadifar, A. & Momtazi, S. (2018). The impact of corpus domain on word representation: a study on Persian word embeddings. Language Resources and Evaluation, 52(4), 997-1019. https://doi.org/10.1007/s10579-018-9419-x

Johnson, R. & Zhang, T. (2014). Effective use of word order for text categorization with convolutional neural networks. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 103-112). Denver, Colorado: Association for Computational Linguistics. https://doi.org/10.3115/v1/N15-1011

Khuntia, M. & Gupta, D. (2023). Indian news headlines classification using word embedding techniques and an LSTM model. Procedia Computer Science, 218, 899-907. https://doi.org/10.1016/j.procs.2023.01.070

Khurana, D., Koli, A., Khatter, K. & Singh, S. (2023). Natural language processing: state of the art, current trends and challenges. Multimedia Tools and Applications, 82(3), 3713-3744. https://doi.org/10.1007/s11042-022-13428-4

Kim, J. & Lee, M. (2014, November). Robust lane detection based on a convolutional neural network and random sample consensus. In International Conference on Neural Information Processing (pp. 454-461). Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-12637-1_57

Kingma, D. P. & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980. https://doi.org/10.48550/arXiv.1412.6980

Kou, W., Li, F. & Baldwin, T. (2015). Automatic labelling of topic models using word vectors and letter trigram vectors. In Information Retrieval Technology: 11th Asia Information Retrieval Societies Conference, AIRS 2015 (pp. 253–264). Springer. https://doi.org/10.1007/978-3-319-28940-3_20

Lazemi, S., Ebrahimpour-Komleh, H. & Noroozi, N. (2019). PAKE: A supervised approach for Persian automatic keyword extraction using statistical features. SN Applied Sciences, 1, 1574. https://doi.org/10.1007/s42452-019-1627-5

Leeson, W., Resnick, A., Alexander, D. & Rovers, J. (2019). Natural Language Processing (NLP) in Qualitative Public Health Research: A Proof of Concept Study. International Journal of Qualitative Methods, 18. https://doi.org/10.1177/1609406919887021

Li, X., Liu, J., Wang, X. & Chen, S. (2024). A survey on incomplete multi-label learning: Recent advances and future trends. https://doi.org/10.48550/arXiv.2406.06119

Lin, B. (2022). Knowledge management system with NLP-assisted annotations: A brief survey and outlook. https://doi.org/10.48550/arXiv.2206.07304

Liu, J., Chang, W. C., Wu, Y. & Yang, Y. (2017, August). Deep learning for extreme multi-label text classification. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 115-124). https://doi.org/10.1145/3077136.3080834

Liu, Z., Li, P., Zheng, Y. & Sun, M. (2009, August). Clustering to find exemplar terms for keyphrase extraction. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (pp. 257-266). Retrieved from https://aclanthology.org/D09-1027.pdf

Masoudian, S., Derhami, V. & Zarifzadeh, S. (2019, April). Hierarchical Persian text categorization in the absence of labeled data. In 2019 27th Iranian Conference on Electrical Engineering (ICEE) (pp. 1951-1955). IEEE.

Meesad, P. & Li, J. (2014, December). Stock trend prediction relying on text mining and sentiment analysis with tweets. In 2014 4th World Congress on Information and Communication Technologies (WICT 2014) (pp. 257-262). IEEE. https://doi.org/10.1109/wict.2014.7077275

Mei, Q., Shen, X. & Zhai, C. (2007, August). Automatic labeling of multinomial topic models. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 490-499). https://doi.org/10.1145/1281192.1281246

Mikolov, T., Yih, W. T. & Zweig, G. (2013, June). Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 746-751). Retrieved from https://aclanthology.org/N13-1090.pdf

Paik, J. H. (2013, July). A novel TF-IDF weighting scheme for effective ranking. In Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 343-352). https://doi.org/10.1145/2484028.2484070

Pan, P., Swaroop, S., Immer, A., Eschenhagen, R., Turner, R., & Khan, M. E. E. (2020). Continual deep learning by functional regularisation of memorable past. Advances in Neural Information Processing Systems, 33, 4453-4464.

Pawar, D., Phansalkar, S., Sharma, A., Sahu, G. K., Ang, C. K. & Lim, W. H. (2023). Survey on the biomedical text summarization techniques with an emphasis on databases, techniques, semantic approaches, classification techniques, and similarity measures. Sustainability, 15(5), 4216. https://doi.org/10.3390/su15054216

Rajabi, E., Sahebari, M. & Thomas, T. (2022). Analyzing systemic lupus erythematosus publications using neural network–based multi-label classification algorithms. Lupus, 31(7), 820-827. https://doi.org/10.1177/09612033221093548

Roy, S., Das, N., Kundu, M. & Nasipuri, M. (2017). Handwritten isolated Bangla compound character recognition: A new benchmark using a novel deep learning approach. Pattern Recognition Letters, 90, 15-21. https://doi.org/10.1016/j.patrec.2017.03.004

Saaty, T. L. (1988). What is the analytic hierarchy process? In Mitra, G., Greenberg, H. J., Lootsma, F. A., Rijkaert, M. J., Zimmermann, H. J. (eds) Mathematical Models for Decision Support. NATO ASI Series, vol 48. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-83555-1_5

Schoene, A. M., Basinas, I., van Tongeren, M. & Ananiadou, S. (2022). A Narrative Literature Review of Natural Language Processing Applied to the Occupational Exposome. International Journal of Environmental Research and Public Health, 19(14), 8544. https://doi.org/10.3390/ijerph19148544

Sharifi, A. & Mahdavi, M. A. (2019). Supervised approach for keyword extraction from Persian documents using lexical chains. Signal and Data Processing, 15(4), 95-110. http://dx.doi.org/10.29252/jsdp.15.4.95 [in Persian]

Soloshenko, A. N., Orlova, Y. A., Rozaliev, V. L. & Zaboleeva-Zotova, A. V. (2015). Establishing the semantic similarity of the cluster documents and extracting key entities in the problem of the semantic analysis of news texts. Modern Applied Science, 9(5), 246-268. http://dx.doi.org/10.5539/mas.v9n5p246

Sorodoc, I., Lau, J. H., Aletras, N. & Baldwin, T. (2017, April). Multimodal topic labelling. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers (pp. 701-706). Retrieved from https://aclanthology.org/E17-2111.pdf

Sullivan, F. R. & Keith, P. K. (2019). Exploring the potential of natural language processing to support microgenetic analysis of collaborative learning discussions. British Journal of Educational Technology, 50(6), 3047–3063. https://doi.org/10.1111/bjet.12875

Sun, Y., Sun, H. & Cheng, R. (2016, April). Fast and semantic measurements on collaborative tagging quality. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (pp. 363-375). Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-31750-2_29

Swayamdipta, S., Thomson, S., Lee, K., Zettlemoyer, L., Dyer, C. & Smith, N. A. (2018). Syntactic scaffolds for semantic structures. arXiv preprint arXiv:1808.10485.

Tai, K. S., Socher, R. & Manning, C. D. (2015). Improved semantic representations from tree-structured long short-term memory networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (pp. 1556-1566). Beijing, China: Association for Computational Linguistics. https://doi.org/10.3115/v1/P15-1150

Tarekegn, A. N., Ullah, M. & Cheikh, F. A. (2024). Deep learning for multi-label learning: A comprehensive survey. https://doi.org/10.48550/arXiv.2401.16549

W3Techs (2023). Usage statistics of content languages for websites. Retrieved from https://w3techs.com/technologies/overview/content_language

Wang, J., Chen, Y., Hao, S., Peng, X. & Hu, L. (2019). Deep learning for sensor-based activity recognition: A survey. Pattern Recognition Letters, 119, 3-11. https://doi.org/10.1016/j.patrec.2018.02.010

Wu, C. H., Chuang, Z. J. & Lin, Y. C. (2006). Emotion recognition from text using semantic labels and separable mixture models. ACM Transactions on Asian Language Information Processing (TALIP), 5(2), 165-183. https://doi.org/10.1145/1165255.1165259

Xie, J., Deng, Q., Xia, S., Zhao, Y., Wang, G., & Gao, X. (2023). Research on an Efficient Fuzzy Clustering Method Based on Local Fuzzy Granules. https://doi.org/10.48550/arXiv.2303.03590

Xiong, H., Jin, K., Liu, J., Cai, J. & Xiao, L. (2023, May). Deep learning-based image text processing research. In 2023 IEEE 9th Intl Conference on Big Data Security on Cloud (BigDataSecurity), IEEE Intl Conference on High Performance and Smart Computing (HPSC), and IEEE Intl Conference on Intelligent Data and Security (IDS) (pp. 163-168). IEEE. https://doi.org/10.1109/BigDataSecurity-HPSC-IDS58521.2023.00037

Yacob, F. & Semere, D. (2021). A multilayer shallow learning approach to variation prediction and variation source identification in multistage machining processes. Journal of Intelligent Manufacturing, 32(4), 1173-1187. https://doi.org/10.1007/s10845-020-01649-z

Yan, Y., Wang, Y., Gao, W. C., Zhang, B. W., Yang, C. & Yin, X. C. (2018). LSTM²: Multi-label ranking for document classification. Neural Processing Letters, 47, 117-138. https://doi.org/10.1007/s11063-017-9636-0

Yu, S., Li, X., Zhao, X., Zhang, Z. & Wu, F. (2015). Tracking news article evolution by dense subgraph learning. Neurocomputing, 168, 1076–1084. https://doi.org/10.1016/j.neucom.2015.05.077

Za’in, C., Pratama, M., Lughofer, E. & Anavatti, S. G. (2017). Evolving type-2 web news mining. Applied Soft Computing, 54, 200-220. https://doi.org/10.1016/J.ASOC.2016.11.034

Zha, D. & Li, C. (2019). Multi-label dataless text classification with topic modeling. Knowledge and Information Systems, 61(1), 137-160. https://doi.org/10.1007/s10115-018-1280-0

Zhang, R., Lee, H. & Radev, D. (2016). Dependency-sensitive convolutional neural networks for modeling sentences and documents. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 1512–1521). San Diego, California: Association for Computational Linguistics. https://doi.org/10.18653/v1/N16-1177

Zheng, J. & Zheng, L. (2019). A hybrid bidirectional recurrent convolutional neural network attention-based model for text classification. IEEE Access, 7, 106673-106685. https://doi.org/10.1109/ACCESS.2019.2932619

International Journal of Information Science and Management (IJISM)

Volume 23, Issue 4
Autumn 2025
Pages 167-188

Received: 19 April 2023
Revised: 05 July 2023
Accepted: 30 September 2025
