Tianchen Gao, Yan Zhang, Rui Pan
2024, 3(4): 237-260.
Relational structures consisting of different types of interactions among several groups of entities are now very common. As a useful tool for analyzing this type of data, multi-layer networks have gained increasing attention in recent years because they can capture the complexity of real-world systems. Multi-layer academic networks are a specific type of multi-layer network consisting of multiple layers of relationships among academic entities, such as researchers, institutions, papers, or journals. Typical examples include the collaboration network, which represents co-authorship relationships among researchers; the citation network, which represents citation relationships among papers; and the journal citation network, which represents citation relationships among journals. Such networks have been used for various purposes, such as identifying research areas, evaluating research impact, predicting scientific trends, studying the diffusion of scientific knowledge, and supporting science policy and decision-making. Overall, multi-layer academic networks provide a powerful tool for understanding and analyzing the complex relationships that underlie academic communities and their impact on the production and dissemination of scientific knowledge.
In this work, we collect data on papers published in 42 statistical journals between 1981 and 2021 from the Web of Science (www.webofscience.com). Our LMANStat dataset includes basic information on 97,436 papers, including their title, abstract, keywords, publisher, publication date, volume and pages, document type, citation counts, author information (name, ORCID, address, region, and institution), and reference lists. Based on this information, we construct multi-layer academic networks, including the collaboration network, co-institution network, citation network, co-citation network, journal citation network, author citation network, author-paper network, and keyword co-occurrence network. These networks change over time, which supports dynamic analysis. We also include rich nodal attributes of authors, such as their research interests, to enhance the usefulness of the dataset. The LMANStat dataset is publicly available on GitHub at https://github.com/Gaotianchen97/LMANStat.
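Pairwise layers such as the collaboration and keyword co-occurrence networks can be derived from per-paper records by counting co-occurrences. The following is a minimal Python sketch, assuming a hypothetical record layout (one dict per paper with "authors" and "keywords" lists); it is neither the LMANStat file format nor the authors' construction code. Each unordered pair of co-authors (or keywords) appearing on the same paper becomes an undirected edge weighted by the number of papers they share.

```python
from collections import Counter
from itertools import combinations


def co_occurrence_edges(papers, field):
    """Count weighted undirected edges between items that appear on the same paper.

    field="authors" gives a collaboration network; field="keywords" gives a
    keyword co-occurrence network. Keys are sorted (u, v) tuples.
    """
    edges = Counter()
    for paper in papers:
        # Deduplicate and sort so each unordered pair has a single canonical key.
        items = sorted(set(paper.get(field, [])))
        for u, v in combinations(items, 2):
            edges[(u, v)] += 1
    return edges


# Hypothetical records, for illustration only.
papers = [
    {"id": "P1", "authors": ["A", "B", "C"], "keywords": ["lasso", "variable selection"]},
    {"id": "P2", "authors": ["A", "B"], "keywords": ["lasso", "bootstrap"]},
    {"id": "P3", "authors": ["D"], "keywords": ["bootstrap"]},
]

collab = co_occurrence_edges(papers, "authors")
print(collab[("A", "B")])  # → 2 (A and B co-author P1 and P2)
```

Single-author papers such as P3 contribute no edges. The same counting would apply to any other per-paper list, for example institutions for a co-institution layer.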
We present a comprehensive overview of our methodology, covering the complete workflow from data collection and data cleaning to the construction of multi-layer academic networks. We then explain in detail how authors and papers are identified, how author attributes are extracted, and how the multi-layer academic networks are constructed. Next, we validate the dataset through several potential scenarios for exploring and analyzing the networks. To demonstrate the usability of the dataset, we also provide key insights into the characteristics of the data, which align with historical research findings and the consensus among statisticians. Moreover, the LMANStat dataset has been used extensively by our research team, which further supports its usability. Among the multi-layer academic networks, the collaboration network and the citation network are the most commonly used, so we use them for validation. We also consider the journal citation network, with journals as nodes, and the keyword co-occurrence network, with keywords as nodes, to validate the LMANStat dataset.
For the collaboration network, a log-log plot of the degree distribution reveals a scale-free pattern, consistent with previous research on collaboration networks. The average number of authors per paper increases year by year, reflecting a growing trend toward collaboration in statistical research. By visualizing a sub-network of the collaboration network, we identify the four authors with the highest degrees, whose innovative methods and insights have had a profound impact on the field of statistics. For the citation network, the in-degree of a paper is crucial because it equals the number of times the paper has been cited within the network; a higher in-degree means more citations. In our dataset, the average in-degree per paper is 5.31, which is consistent with the Impact Factors (IF) of the selected journals.
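The quantities used in this validation can be sketched in a few lines of Python. The helper names and input shapes below are hypothetical: edges are (u, v) pairs, citations are (citing, cited) pairs, and each paper record carries "year" and "authors" fields. The least-squares slope on log-log axes is only a rough numerical companion to the visual check, not a rigorous power-law fit. The average in-degree of a citation network is simply the number of internal citation edges divided by the number of papers.

```python
import math
from collections import Counter


def degree_distribution(edges):
    """Map each degree k to the number of nodes with degree k (undirected edges)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())


def loglog_slope(dist):
    """Least-squares slope of log(count) against log(degree).

    A rough diagnostic only: an approximately straight, negatively sloped cloud
    on the log-log plot is consistent with a power law. Needs >= 2 distinct degrees.
    """
    xs = [math.log(k) for k in dist]
    ys = [math.log(dist[k]) for k in dist]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def mean_authors_by_year(papers):
    """Average number of authors per paper, grouped by publication year."""
    totals, counts = Counter(), Counter()
    for paper in papers:
        totals[paper["year"]] += len(paper["authors"])
        counts[paper["year"]] += 1
    return {year: totals[year] / counts[year] for year in sorted(counts)}


def mean_in_degree(citations, papers):
    """Within-network citations received per paper.

    Only distinct (citing, cited) pairs with both endpoints in `papers` are
    edges, so the result is (number of internal edges) / (number of papers).
    """
    paper_set = set(papers)
    edges = {(citing, cited) for citing, cited in citations
             if citing in paper_set and cited in paper_set}
    return len(edges) / len(paper_set)
```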
In addition, journal citation networks are often used to rank journals, and journal rankings are considered an important indicator for evaluating the quality and impact of publications in specific research fields. We therefore validate the usability of the journal citation network through journal ranking. By calculating the PageRank centrality of each node (journal), we rank journals by their importance. Notably, the ranking based on PageRank centrality closely matches the expectations and intuitions of statisticians, suggesting that the PageRank-based approach produces a ranking that resonates well with expert perceptions in the field.
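As an illustration, PageRank on a journal citation network can be computed by power iteration. This is a minimal sketch, not the authors' implementation: the damping factor 0.85, weighting edges by citation counts, and spreading the score of journals with no outgoing citations uniformly are all assumptions. An edge (citing, cited) passes score from the citing journal to the cited one in proportion to its weight.

```python
def pagerank(weighted_edges, damping=0.85, tol=1e-12, max_iter=1000):
    """Power-iteration PageRank on a weighted directed graph.

    weighted_edges maps (citing, cited) -> citation count. Nodes without
    positive outgoing weight are treated as dangling and spread their score
    uniformly over all nodes.
    """
    nodes = sorted({node for edge in weighted_edges for node in edge})
    n = len(nodes)
    # Keep only positive weights so every kept edge has a positive source total.
    edges = {e: w for e, w in weighted_edges.items() if w > 0}
    out_weight = {u: 0.0 for u in nodes}
    for (u, _), w in edges.items():
        out_weight[u] += w
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[u] for u in nodes if out_weight[u] == 0)
        # Teleportation term plus the redistributed dangling mass.
        new = {u: (1 - damping) / n + damping * dangling / n for u in nodes}
        for (u, v), w in edges.items():
            new[v] += damping * rank[u] * w / out_weight[u]
        delta = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical journals and citation counts, for illustration only.
citations = {("J1", "J2"): 5, ("J3", "J2"): 5, ("J2", "J1"): 1}
rank = pagerank(citations)
for journal in sorted(rank, key=rank.get, reverse=True):
    print(journal, round(rank[journal], 4))
```

In this toy graph, J2 receives citations from both other journals and ranks first, while J3, which is never cited, receives only the baseline score (1 − 0.85)/3 = 0.05.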
It is important to note that all multi-layer academic networks presented in this paper are dynamic. Using the citation network as an example, we visualize snapshots of the network for three cumulative periods: 1980–2006, 1980–2010, and 1980–2020. The visualization shows that the citation network has a community structure that changes continually over time. In particular, the community associated with variable selection has grown steadily over the years, as indicated by the increasing number of papers on this topic. Because every network in the dataset is dynamic, researchers can explore how each of them evolves over time.
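The cumulative snapshots can be reproduced by keeping only the papers published within each time window and the citations among them. The sketch below assumes hypothetical inputs, a list of (citing, cited) pairs and a dict mapping paper IDs to publication years, and reuses the window endpoints of the visualization; the helper name cumulative_snapshot is illustrative.

```python
def cumulative_snapshot(citations, pub_year, start, end):
    """Nodes and edges of the citation network restricted to papers published in [start, end].

    citations: iterable of (citing, cited) paper IDs; pub_year: {paper_id: year}.
    Papers missing from pub_year never enter any snapshot.
    """
    nodes = {p for p, year in pub_year.items() if start <= year <= end}
    edges = [(u, v) for u, v in citations if u in nodes and v in nodes]
    return nodes, edges


# Hypothetical papers and citations, for illustration only.
pub_year = {"P1": 2000, "P2": 2005, "P3": 2008, "P4": 2015}
citations = [("P2", "P1"), ("P3", "P1"), ("P4", "P3"), ("P4", "P2")]
for end in (2006, 2010, 2020):
    nodes, edges = cumulative_snapshot(citations, pub_year, 1980, end)
    print(f"1980-{end}: {len(nodes)} papers, {len(edges)} citation edges")
```

Because each window starts in 1980, every snapshot contains the previous one, so community detection run on successive snapshots shows how groups such as the variable-selection community grow.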
In summary, this paper uses statistical publication data collected from the Web of Science to provide a large-scale, high-quality multi-layer academic network dataset (the LMANStat dataset). The study further validates the quality and usability of the constructed multi-layer academic networks from multiple perspectives. It also discusses feasible research directions and application scenarios, including but not limited to exploring community structures within academic networks, tracing the development and evolution of research topics, investigating the mechanisms behind paper citation counts, discussing the impact of international and inter-institutional collaborations, exploring the career planning and development of researchers, and establishing more diversified journal ranking systems.