Document Type : Articles

Authors

1 Professor, Knowledge and Information Science, National Library & Archive of Iran, Tehran, Iran.

2 Ph.D. student in Knowledge and Information Science, Shahid Beheshti University, Tehran, Iran.

Abstract

Patents are a significant competitive asset whose commercial value rests on the technological information they disclose, and researchers use patent analysis as a practical tool to infer various types of information. Effective retrieval of and access to patents is therefore important. Clustering is used in many fields to group items of a similar nature. Citations are commonly used to cluster documents, and two approaches are compared here: the first uses bibliographic coupling, and the second uses the words in the titles of the cited documents. It remains to be investigated which approach gives better patent clustering and retrieval results. This study examines whether the content of citations (the title words of cited patents), rather than the citations themselves, can be used to build groups of related patents. Experimental research was carried out on a set of US patents in three phases. In phase I, databases appropriate to the subject and objective of the study were chosen for the patent searches, and the basic inventions and the experimental set were selected. In phase II, to develop a patent clustering system based on patent similarities and to capture the relationships among categories, we used fuzzy c-means (FCM) clustering, an algorithm similar to k-means that can handle overlapping clusters. Because fuzzy clustering is a kind of overlapping clustering, extended BCubed precision and recall (measures for evaluating overlapping clustering) were used. In phase III, because patents can belong to multiple technology domains, a Perl program was written to manage the matching process. Two clusterings of the patents were created, one using bibliographic coupling and one using citation title words. The results indicated that bibliographic coupling produced better clustering performance than citation title words. Moreover, its cluster structure was more extensive in terms of exhaustivity than that produced by citation title words.
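As an illustration of the evaluation measures named above, the following is a minimal sketch of extended BCubed precision and recall for overlapping clusterings. It assumes the commonly used pairwise "multiplicity" formulation, in which each pair of items that shares at least one cluster (for precision) or at least one category (for recall) is scored by how well the number of shared clusters matches the number of shared categories. The function name, the toy patents, and the labels are hypothetical, and this is not the Perl program used in the study. Fuzzy c-means memberships would first have to be turned into cluster sets, for example with a membership threshold, which is also an assumption here.

```python
def extended_bcubed(clusters, categories):
    """Return (precision, recall) for an overlapping clustering.

    clusters:   dict mapping each item to the set of clusters it was assigned to
    categories: dict mapping each item to the set of reference categories it belongs to
    """
    items = list(clusters)

    def averaged(primary, secondary):
        # For every item, average the pairwise score over all items that
        # share at least one label with it in `primary` (the item itself included).
        per_item = []
        for e in items:
            scores = []
            for other in items:
                shared_primary = len(primary[e] & primary[other])
                if shared_primary == 0:
                    continue  # this pair is not related under `primary`
                shared_secondary = len(secondary[e] & secondary[other])
                scores.append(min(shared_primary, shared_secondary) / shared_primary)
            if scores:
                per_item.append(sum(scores) / len(scores))
        return sum(per_item) / len(per_item) if per_item else 0.0

    precision = averaged(clusters, categories)  # pairs that share a cluster
    recall = averaged(categories, clusters)     # pairs that share a category
    return precision, recall


# Hypothetical toy example: three patents placed in one cluster,
# although the third belongs to a different reference category.
toy_clusters = {"P1": {"A"}, "P2": {"A"}, "P3": {"A"}}
toy_categories = {"P1": {"x"}, "P2": {"x"}, "P3": {"y"}}
precision, recall = extended_bcubed(toy_clusters, toy_categories)
print(precision, recall)
```

In this toy case recall is 1.0, because every pair that shares a category also shares a cluster, while precision is 5/9, because P3 is grouped with two patents from another category.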
Using cited patent title words reduced the number of attributes by nearly 40%. In addition, at high exhaustivity, the cited title words method achieved recall nearly equal to that of clustering by cited patents (bibliographic coupling). Using cited title words may therefore be preferable when a high-exhaustivity approach is chosen for patent clustering and retrieval.
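To make the two representations concrete, the sketch below builds both attribute sets for a handful of patents: the identifiers of the cited patents (the basis of bibliographic coupling) and the words in the cited patents' titles. All patent identifiers and titles are hypothetical toy data, the helper names are illustrative, and the counts do not reproduce the study's figures. The sketch only shows why the word vocabulary can be smaller than the set of cited patents and how title words can link patents that share no citation.

```python
import re

# Hypothetical toy data: citing patent -> list of (cited patent id, cited patent title)
citations = {
    "P1": [("C1", "Fuzzy clustering"), ("C2", "Document clustering")],
    "P2": [("C3", "Fuzzy document clustering"), ("C4", "Document retrieval")],
    "P3": [("C5", "Fuzzy retrieval"), ("C2", "Document clustering")],
}


def cited_ids(patent):
    """Attributes for bibliographic coupling: the set of cited patent ids."""
    return {cited_id for cited_id, _ in citations[patent]}


def cited_title_words(patent):
    """Attributes for the title-word method: lowercased words from cited titles."""
    words = set()
    for _, title in citations[patent]:
        words.update(re.findall(r"[a-z]+", title.lower()))
    return words


def coupling_strength(a, b):
    """Number of cited patents that two citing patents have in common."""
    return len(cited_ids(a) & cited_ids(b))


def title_word_overlap(a, b):
    """Number of cited-title words that two citing patents have in common."""
    return len(cited_title_words(a) & cited_title_words(b))


# Size of each attribute space over the whole toy collection.
id_attributes = set().union(*(cited_ids(p) for p in citations))
word_attributes = set().union(*(cited_title_words(p) for p in citations))

print(len(id_attributes), len(word_attributes))
print(coupling_strength("P1", "P3"), coupling_strength("P1", "P2"))
print(title_word_overlap("P1", "P2"))
```

In this toy collection, P1 and P2 share no cited patent, so their coupling strength is zero, yet their cited titles have three words in common. Either attribute set could then be turned into patent vectors for a clustering algorithm such as fuzzy c-means.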

Keywords
