Predicting the Economic Impact of Scientific Publications in Biotechnology Using Machine Learning

Document Type: Original Article

Authors

1 Assistant Prof., Policy Evaluation and Monitoring of Science, Technology, and Innovation Department, National Research Institute for Science Policy (NRISP), Tehran, Iran

2 Assistant Prof., Department of Computer Engineering, Hamedan University of Technology, Hamedan, Iran

Abstract
The economic impact of research papers reflects how knowledge diffuses and how applicable it is to other technical fields. This study aims to predict the number of citations that academic papers receive in patents. The dataset comprises papers from Iran's biotechnology field indexed in the Scopus database from 2003 to 2024. For these articles, 15 indicators were extracted in five categories: Journal, Altmetrics, Impact, Open Access, and Collaboration. Data processing, exploratory data analysis (EDA), machine learning modeling, and prediction were performed in Python using libraries such as Pandas, NumPy, Matplotlib, Seaborn, and Scikit-Learn. The findings show a strong positive correlation between the CiteScore and SJR indices, reflecting their related roles in evaluating journal impact. The Impact category shows the strongest positive correlation with "patent information." The Journal and Altmetrics categories also show significant correlations, though weaker ones, indicating their complementary role in predicting economic impact. Journal category indices, including SNIP, CiteScore, CiteScore percentile, SJR, and SJR percentile, exhibit a range of correlations with patent citations. Altmetrics indices correlate positively with patent citations, suggesting that articles with higher visibility and engagement have a greater impact on the patent literature. The results suggest that although machine learning is a powerful tool for predicting economic impact, further model refinement, feature selection, and more advanced techniques are needed to achieve more accurate predictions. Given the large gap between scientific papers and applied research in Iran's biotechnology field, it is essential for managers and policymakers to identify and remove obstacles to the commercialization of scientific advancements.
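The workflow summarized in the abstract (category-level correlation analysis followed by a predictive model) could look roughly like the sketch below. It is an illustration only and not the authors' code. The column names, the grouping of only three of the five categories, the use of Pearson correlation, and the choice of a random forest regressor are all assumptions, since the abstract names neither the individual indicators in full nor the algorithms used. The data is synthetic placeholder data, not the Scopus dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Hypothetical indicator columns grouped by category (illustrative names only).
CATEGORIES = {
    "journal": ["snip", "citescore", "sjr"],
    "altmetrics": ["mendeley_readers", "news_mentions"],
    "impact": ["scholarly_citations", "fwci"],
}
TARGET = "patent_citations"


def make_synthetic_papers(n=500, seed=0):
    """Generate placeholder records; this is NOT the study's dataset."""
    rng = np.random.default_rng(seed)
    quality = rng.gamma(shape=2.0, scale=1.0, size=n)  # latent driver of all indicators
    df = pd.DataFrame({
        "snip": 0.5 * quality + rng.normal(0, 0.5, n),
        "citescore": 2.0 * quality + rng.normal(0, 1.0, n),
        "sjr": 0.4 * quality + rng.normal(0, 0.3, n),
        "mendeley_readers": rng.poisson(10 * quality),
        "news_mentions": rng.poisson(0.5 * quality),
        "scholarly_citations": rng.poisson(8 * quality),
        "fwci": quality + rng.normal(0, 0.4, n),
    })
    df[TARGET] = rng.poisson(0.3 * quality)
    return df


def category_correlations(df):
    """Mean Pearson correlation of each category's indicators with the target."""
    corr_with_target = df.corr()[TARGET]
    return pd.Series(
        {cat: corr_with_target[cols].mean() for cat, cols in CATEGORIES.items()}
    )


def fit_and_evaluate(df, seed=0):
    """Fit a random forest on all indicators and report held-out error."""
    features = [col for cols in CATEGORIES.values() for col in cols]
    X_train, X_test, y_train, y_test = train_test_split(
        df[features], df[TARGET], test_size=0.2, random_state=seed
    )
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    mae = mean_absolute_error(y_test, model.predict(X_test))
    importances = pd.Series(model.feature_importances_, index=features)
    return model, mae, importances


if __name__ == "__main__":
    papers = make_synthetic_papers()
    print(category_correlations(papers).sort_values(ascending=False))
    _, mae, importances = fit_and_evaluate(papers)
    print(f"Held-out MAE: {mae:.3f}")
    print(importances.sort_values(ascending=False))
```

Patent citation counts are typically sparse and skewed, so in practice a count-oriented model or a transformed target may be worth comparing against this baseline.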
 
 
 



Volume 23, Issue 4
Autumn 2025
Pages 63-87

  • Receive Date 25 October 2024
  • Revise Date 18 September 2025
  • Accept Date 18 September 2025