A Review of Research on Financial Time Series Clustering: A Knowledge Mapping Approach

Article type: Research article

Authors

1 Ph.D. student in Financial Engineering, Yazd University, Iran

2 Department of Financial Management, Faculty of Economics, Management and Accounting, Yazd University, Iran

3 Assistant Professor of Financial Management, Yazd University, Iran

10.30495/afi.2021.1919857.1002

Abstract

The volume of information that we retrieve and use has grown rapidly. Data mining is the process of extracting relevant data from large volumes of data and a method for discovering suitable patterns in large data sets. Clustering is one of the common methods of statistical data analysis and also one of the best data mining approaches. As an unsupervised learning method, this approach applies algorithms to group time series data according to various criteria. The aim of the present study is to examine the applications of clustering and network construction in different areas of finance, including risk, algorithmic trading, banking, and other widely used topics in this field. In this study, the bibliometrix package is used to review all research conducted on clustering. Along with extracting the various clustering criteria and approaches, the study examines their applications. By comprehensively reviewing all research in this area, this study can serve as a toolbox that presents the range of clustering methods and helps researchers generate ideas and select a suitable method for classifying and analyzing financial data.

Article title [English]

A Review of Research on Financial Time Series Clustering: A Bibliometrics Approach

Authors [English]

  • Marziyeh Nourahmadi 1
  • Fatemeh Rasti 2
  • Hojjatollah Sadeqi 3
1 Ph.D. Student in Financial Engineering, Faculty of Economics, Management and Accounting, Yazd University, Yazd, Iran
2 Department of Economics, Management and Accounting, Faculty of Humanities and Social Sciences, Yazd University, Iran
3 Assistant Professor of Finance, Yazd University
Abstract [English]

The amount of information and data we retrieve and use is growing rapidly. Data mining is the process of extracting relevant data from large volumes of data and of discovering suitable patterns in large data sets. Clustering is one of the most common methods of statistical data analysis and also one of the best data mining approaches. As a method of unsupervised learning, it uses algorithms to group time series data according to different criteria. The purpose of this study is to investigate the applications of clustering and network construction in various financial fields, including risk, algorithmic trading, banking, and other widely used topics in this field. In this research, all the research on clustering is reviewed using the bibliometrix package in R. Along with extracting the various clustering criteria and approaches, the study examines their applications. Through a comprehensive review of all research in this field, this study can serve as a toolbox of clustering methods that helps researchers generate ideas and select appropriate methods for classifying and analyzing financial data.
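To make the idea of grouping time series by a distance criterion concrete, the following is a minimal sketch, not the procedure used in this study. It assumes one commonly used correlation-based distance, d = sqrt(2(1 - rho)), and a simple single-linkage agglomerative scheme. The function names and the toy return series are hypothetical and chosen only for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_distance(x, y):
    """Map correlation rho in [-1, 1] to the distance sqrt(2 * (1 - rho))."""
    return math.sqrt(max(0.0, 2.0 * (1.0 - pearson(x, y))))

def single_linkage(series, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters."""
    names = list(series)
    dist = {(a, b): correlation_distance(series[a], series[b])
            for a in names for b in names if a != b}
    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: cluster distance is the closest member pair.
                d = min(dist[(a, b)] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Hypothetical toy return series (made-up numbers, not data from the paper).
returns = {
    "bank_a": [0.01, -0.02, 0.015, 0.03, -0.01],
    "bank_b": [0.012, -0.018, 0.01, 0.028, -0.012],
    "tech_a": [-0.02, 0.03, -0.01, -0.025, 0.02],
    "tech_b": [-0.018, 0.025, -0.012, -0.03, 0.022],
}
clusters = single_linkage(returns, n_clusters=2)
print(clusters)
```

In this toy input the two "bank" series move together and the two "tech" series move together, so the sketch groups them into two clusters. Other distance measures or linkage rules would slot into the same structure by replacing `correlation_distance` or the `min` in the merge step.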

Keywords [English]

  • Clustering
  • Financial time series
  • Financial networks
  • Distance metrics
  • Bibliometrics Approach