Social Network Analysis of 207 teachers on bilibili.com with Python and R
date
Apr 30, 2023
slug
social-network-analysis
status
Published
tags
Research
summary
The web of influence: teacher slash influencers’ social sphere
type
Post
I. Research Background and Research Questions
Bilibili, commonly referred to as "Station B", is a prominent cultural community and video platform in China, predominantly frequented by the younger generation. Established on June 26, 2009, the platform has, over more than a decade, expanded to encompass more than 7,000 interest circles. It has cultivated a robust content production system and a diverse cultural ecosystem centered around its users, creators, and content.
Currently, Bilibili boasts a roster of over 5,000 educators who have registered under their individual names. Notable figures among them include Professors Luo Xiang and Dai Jinhua. As of September 9, 2022, Bilibili's official "Treasure Teacher" list (https://www.BiliBili.com/BlackBoard/theteachersatBiliBili-pc.html) features 207 educators. This list comprises 7 members of the Chinese Academy of Sciences and 50 educators in each of four categories: social science, law, and psychology; literature, history, and philosophy; science, engineering, and agriculture; and workplace and interests (7 + 4 × 50 = 207). The significant influence and popularity of these educators on Bilibili underscore the platform's evolving dynamics.
Drawing an analogy, if traditional educational institutions represent curated knowledge galleries, new internet media such as Bilibili can be likened to vibrant marketplaces of knowledge. On Bilibili, content creators maintain distinct accounts and homepages, making the "teacher" identity as prominent as the knowledge they share. Furthermore, the platform's interactive nature fosters a symbiotic relationship between knowledge dissemination and community engagement.
This research aims to delve into the social network dynamics among the 207 educators listed in Bilibili's "Treasure Teacher" roster. Our objective is to delineate and analyze the structural nuances of this network, offering insights into how knowledge disseminators navigate online social interactions and mass communication in the contemporary platform economy.
Additionally, we seek to juxtapose perceptions of real-world social dynamics with those manifested online. A prevalent assumption, for instance, is that training institution educators and university faculty represent distinct, non-overlapping cohorts. Can such assumptions withstand empirical scrutiny in the digital realm? This study endeavors to address such queries.
Our specific research questions include:
- What constitutes the relational structure of the 207 educators' network? What foundational attributes, such as network density and centrality, characterize this network?
- Do content creators (UPs) with similar attributes tend to follow one another, producing discernible network segmentation? Is there a measurable correlation between an educator node's in-degree and out-degree and that educator's follower count and LV rank?
- Do instances of emulation, citation, and collaboration exist among the 207 educators?
We hypothesize that the network is primarily characterized by segmentation and core-periphery dynamics, potentially mirroring broader societal stratifications, such as gender disparities in academia or institutional hierarchies stemming from the social division of labor. In essence, online platforms, contrary to utopian ideals, might merely reflect societal realities.
II. Research Design
2.1 Research Methods
Social Network Analysis (SNA) offers a comprehensive toolset for examining intricate social relationships, enhancing the exploration of social network structures both methodologically and in terms of specific techniques. This study leverages the foundational principles of SNA, employing Python and R programming languages to extract and analyze the followership network data of Bilibili's "Treasure Teacher" UPs. By discerning key participants from both the overarching network structure and individual positional structures, we aim to elucidate the characteristics of the Bilibili "Treasure Teacher" network community.
2.2 Data Collection
The data for this research is sourced from Bilibili's network data. We utilized a Python-based "Spider" (incorporating the sqlite3, json, requests, and pandas packages) to retrieve user details of the 207 teachers (refer to Appendix I for code) and the associated followership network data (refer to Appendix II for code). The extracted node data is archived in the "207userData.csv" file, while the network connectivity data is housed in the "207relation.csv" file. The network connectivity data records the directed follow relationships among the 207 teachers. The node data covers eight variables for each of the 207 educators: Bilibili user identification, UID, gender, LV ranking, fan count, number of followings, subject domain, and teacher identity.
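As a rough illustration of how the two files fit together, the sketch below reads a directed edge list and a node table and keeps only ties whose endpoints are both on the roster. The column names (`source`, `target`, `uid`, and so on) and the toy rows are assumptions made for this example; the actual headers of "207relation.csv" and "207userData.csv" are not given in this post.

```python
import csv
import io

# Toy stand-ins for "207relation.csv" and "207userData.csv".
# Column names and rows are hypothetical, not taken from the real files.
RELATION_CSV = """source,target
101,102
102,101
103,101
"""
USER_CSV = """uid,name,gender,level,follower,following,subject,college
101,A,male,6,2500000,40,law,yes
102,B,female,6,300000,120,history,yes
103,C,male,5,80000,300,physics,no
104,D,male,5,1200,15,biology,no
"""

def load_edges(text):
    """Return directed (follower_uid, followed_uid) pairs."""
    return [(row["source"], row["target"])
            for row in csv.DictReader(io.StringIO(text))]

def load_nodes(text):
    """Return {uid: attribute dict} for every teacher, isolates included."""
    return {row["uid"]: row for row in csv.DictReader(io.StringIO(text))}

nodes = load_nodes(USER_CSV)
# Keep only ties whose two endpoints both appear in the node table.
edges = [(u, v) for u, v in load_edges(RELATION_CSV)
         if u in nodes and v in nodes]

print(len(nodes), "nodes,", len(edges), "directed ties")
```

Keeping the node table separate from the edge list matters here because teachers with no ties never appear in the edge list, yet they still belong to the network.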
2.3 Data Processing
Post data acquisition, we employed the R programming language within RStudio to execute social network analysis and associated statistical evaluations on the data.
For data import, the R network package was utilized to construct the directed, non-comprehensive network relationship dataset, termed net207. The user data from "207userData.csv" served as the node data. Concurrently, we established the igraph network dataset, denoted as g, to facilitate analyses using diverse packages.
The data analysis is bifurcated into two segments: descriptive analysis and correlation analysis. Within the descriptive analysis, we assess the network density of the followership network formed by the 207 educators and delve into the distribution patterns of degree centrality, betweenness centrality, and closeness centrality. The correlation analysis segment employs node data to apply color labels to distinct nodes, enabling us to discern whether UPs with analogous individual attributes exhibit a propensity for mutual followership, thereby leading to internal network segmentation. Simultaneously, we explore the correlation between node in-degrees and out-degrees, follower count, and LV ranking.
For data visualization, we predominantly employ histograms, scatter plots, and network graphs to present our findings.
III. Analysis and Results
3.1 Network Visualization
To show how attention varies across UPs on the platform, we sized each node by its number of fans. Initially, node size was set to the fan count divided by one million. The resulting visualization is presented below:

It is evident that certain teachers command a disproportionately high level of attention, overshadowing others in the network. For instance, a prominent node in the visualization represents Bilibili's top streamer, Luo Xiang. To achieve a more balanced representation, we applied a cube root transformation to the fan counts and then divided by 100. The setting used was:
vertex.cex = ((network::get.vertex.attribute(net207, "follower"))^(1/3))/100
This resulted in a clearer visualization of the network of 207 teachers, in which differences in fan numbers remain visible without the largest accounts dwarfing all other nodes.
207 teacher network
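To see why the cube root helps, the following sketch compares the two node-size rules described above in Python. The follower counts are hypothetical round numbers chosen for illustration, not values from the dataset.

```python
def linear_size(followers):
    """First attempt: follower count divided by one million."""
    return followers / 1_000_000

def cube_root_size(followers):
    """Second attempt: cube root of the follower count, divided by 100."""
    return followers ** (1 / 3) / 100

# Hypothetical follower counts spanning several orders of magnitude.
small, large = 1_000, 20_000_000

for f in (small, 100_000, large):
    print(f, round(linear_size(f), 4), round(cube_root_size(f), 4))

# Ratio between the largest and smallest node under each rule.
print("linear ratio:", linear_size(large) / linear_size(small))
print("cube-root ratio:", cube_root_size(large) / cube_root_size(small))
```

Under the linear rule the size ratio equals the follower ratio, so small accounts all but vanish. The cube root compresses that ratio while preserving the ordering of nodes.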
From the visualization, it is apparent that 68 teachers do not have any connections with their peers. For a more streamlined visualization, we excluded these 68 teachers, resulting in a network of the remaining 139 teachers.

139 teacher network
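The step from 207 to 139 teachers amounts to dropping every node that appears in no tie. A minimal Python sketch of that filter, on a made-up four-node example (the node names and ties are hypothetical), could look like this:

```python
def drop_isolates(nodes, edges):
    """Split nodes into those with at least one tie and those with none."""
    connected = {u for u, _ in edges} | {v for _, v in edges}
    kept = [n for n in nodes if n in connected]
    isolates = [n for n in nodes if n not in connected]
    return kept, isolates

# Hypothetical toy network: "d" neither follows nor is followed by anyone.
toy_nodes = ["a", "b", "c", "d"]
toy_edges = [("a", "b"), ("c", "b")]

kept, isolates = drop_isolates(toy_nodes, toy_edges)
print("kept:", kept)
print("isolates:", isolates)
```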
In this network, distinct colors were employed to categorize individuals based on specific attributes:

Coloring nodes in the network according to LV level (purple-LV6; teal-LV5)
- Node Coloration by LV Level: Purple represents LV6, and Teal denotes LV5. Bilibili's levels are determined by experience points, with various activities contributing to these points. It is observed that the majority of UPs in this network are at LV6, followed by those at LV5, with fewer at LV4 and below.

Coloring nodes in the network by gender (black - male; red - female)
- Node Coloration by Gender: Black signifies male, and Red indicates female. The majority of nodes in this network represent males.

Coloring nodes in the network according to whether they are in-service teachers in colleges and universities (gray - colleges; yellow - non-colleges)
- Node Coloration by Professional Affiliation: Gray represents college-affiliated teachers, while Yellow denotes those not affiliated with colleges. A significant portion of the network comprises college-affiliated teachers.
3.2 Description and analysis
3.2.1 Network Density Analysis
Network density provides insights into the interconnectedness of nodes, highlighting the extent of interactions. Based on R's base operations, the network density for the 207 teachers stands at 0.005229586. This suggests a sparse network with minimal connections. Excluding the 68 isolated nodes, the network density for the remaining 139 teachers is 0.01162548, which remains relatively sparse.
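For a directed network without self-loops, density is the number of ties divided by n × (n − 1). The sketch below applies that standard formula and also inverts it to see which tie count the two reported densities imply. That both figures were computed this way is an assumption, although the two results agree with each other.

```python
def directed_density(n_nodes, n_ties):
    """Density of a directed network without self-loops: m / (n * (n - 1))."""
    return n_ties / (n_nodes * (n_nodes - 1))

def implied_ties(n_nodes, density):
    """Invert the density formula to recover the underlying tie count."""
    return density * n_nodes * (n_nodes - 1)

# Tie counts implied by the two densities reported in the text.
ties_207 = implied_ties(207, 0.005229586)
ties_139 = implied_ties(139, 0.01162548)
print(round(ties_207, 2), round(ties_139, 2))

# Recompute both densities from the rounded tie count.
m = round(ties_207)
print(directed_density(207, m), directed_density(139, m))
```

Both densities point to the same number of directed ties, about 223. This is expected: removing isolated nodes shrinks the denominator but leaves the ties unchanged.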
3.2.2 Centrality Analysis
Centrality serves as a crucial metric in social network analysis, signifying the prominence and influence of a node within the network. A node's position at the network's core directly correlates with its centrality value, denoting its significance and sway within the network. Key metrics for assessing centrality encompass degree centrality, betweenness centrality, and closeness centrality.

In terms of degree centrality within the network of 139 teachers, there is a notable disparity. A significant proportion of teachers have a degree below 4, and most teachers are connected to no more than six peers. One particular teacher, identified as "Wu Ya Lin who plays art," has connections with 25 peers, while "Luo Xiang all about criminal law" is followed by 19 peers.

Degree centrality offers a straightforward measure of a node's significance within the network. A higher degree centrality indicates a node's elevated importance.
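In a directed follow network, degree splits into in-degree (how many peers follow a teacher) and out-degree (how many peers a teacher follows). The sketch below counts both from an edge list. The toy nodes and ties are hypothetical.

```python
from collections import Counter

def degrees(nodes, edges):
    """Return {node: (in_degree, out_degree)} for a directed edge list."""
    out_deg = Counter(u for u, _ in edges)   # ties sent (follows)
    in_deg = Counter(v for _, v in edges)    # ties received (is followed)
    return {n: (in_deg[n], out_deg[n]) for n in nodes}

# Hypothetical toy network; an edge (u, v) means "u follows v".
toy_nodes = ["a", "b", "c", "d"]
toy_edges = [("a", "b"), ("c", "b"), ("b", "a")]

deg = degrees(toy_nodes, toy_edges)
for node, (indeg, outdeg) in deg.items():
    print(node, "in:", indeg, "out:", outdeg)
```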
The third visualization from the left depicts the distribution of standardized betweenness centrality measurements. Betweenness centrality gauges a node's importance based on the frequency it appears on the shortest paths between other nodes. As illustrated, in the network formed by the 139 teachers, a majority of nodes have a betweenness value less than 0.001. This suggests that only a few individuals are likely to be positioned on the shortest path between two other individuals. Consequently, most nodes occupy non-central positions, indicating a relatively dispersed network structure.
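For readers who want to see how betweenness is obtained, here is a compact Python sketch of Brandes' algorithm for an unweighted directed graph. Dividing by (n − 1)(n − 2) is one common standardization for directed networks. Whether the analysis used exactly this normalization is an assumption, and the three-node example is hypothetical.

```python
from collections import deque

def betweenness(nodes, edges, normalized=True):
    """Brandes' algorithm for directed, unweighted betweenness centrality."""
    adj = {v: [] for v in nodes}
    for u, v in edges:
        adj[u].append(v)
    bc = {v: 0.0 for v in nodes}
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in nodes}
        sigma = {v: 0 for v in nodes}
        dist = {v: -1 for v in nodes}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, visiting nodes farthest from s first.
        delta = {v: 0.0 for v in nodes}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(nodes)
    if normalized and n > 2:
        bc = {v: b / ((n - 1) * (n - 2)) for v, b in bc.items()}
    return bc

# Hypothetical chain a -> b -> c: only b lies between two other nodes.
raw = betweenness(["a", "b", "c"], [("a", "b"), ("b", "c")], normalized=False)
std = betweenness(["a", "b", "c"], [("a", "b"), ("b", "c")])
print(raw, std)
```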
The fourth visualization from the left presents the distribution of normalized closeness centrality measurements. Closeness centrality measures the proximity of a node to all other nodes in the network, and is conventionally calculated as the reciprocal of the sum of the shortest-path distances from that node to all others. Essentially, the closer a node is to its peers, the higher its closeness centrality. The depicted distribution is consistent with a relatively sparse network: most nodes have a closeness centrality value not exceeding 0.5, predominantly ranging between 0 and 0.4.
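Closeness is awkward in a sparse directed network because many nodes cannot reach one another. The sketch below uses one common workaround: the number of reachable nodes divided by the sum of the distances to them, with 0.0 for nodes that reach nobody. How unreachable pairs were handled in the analysis above is not stated, so this choice is an assumption, and the example graph is hypothetical.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path lengths from source to every node it can reach."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    del dist[source]  # exclude the zero distance to itself
    return dist

def closeness(nodes, edges):
    """Out-closeness: reachable-node count over summed distances."""
    adj = {v: [] for v in nodes}
    for u, v in edges:
        adj[u].append(v)
    result = {}
    for v in nodes:
        dist = bfs_distances(adj, v)
        total = sum(dist.values())
        # Nodes that reach nobody get 0.0 by convention in this sketch.
        result[v] = len(dist) / total if total else 0.0
    return result

# Hypothetical chain a -> b -> c.
scores = closeness(["a", "b", "c"], [("a", "b"), ("b", "c")])
print(scores)
```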
3.3 Correlation analysis
3.3.1 Internal Group Differentiation
We next investigate potential group variations within the network based on gender and professional identity. This is substantiated by calculating the mixing matrix (briefly presented) and employing the z-score test (tabulated).
(1) Gender
Our analysis reveals that only the male → female interaction is statistically significant (where the absolute z-value exceeds 1.96). This suggests that within this UP network, males tend to interact less with females. This pattern is not merely a result of random association. However, we cannot conclusively state that females in the network interact less with males. This observation might be influenced by the male-dominated composition of the network (168/207, or 81.16%). It also implies that female UPs are more peripheral compared to their male counterparts, rather than merely forming gender-based clusters.
(2) Professional Identity
We categorize UP teachers into two groups: college-affiliated and non-college-affiliated. Using the z-score as previously, we observe significant group clustering within the network. Specifically, interactions are more prevalent within members of the same group, while cross-group interactions are less frequent. It's worth noting that categorizing teachers' professional identity as binary is an oversimplification. The non-college group encompasses educators from business training institutions, primary and secondary schools, and researchers from advanced research institutions. Given the limited number of non-college teachers within the 207 UPs (66 in total), a more granular classification was not pursued.
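The idea behind a mixing analysis can be sketched as follows: count ties by the sender's and receiver's group, then compare one cell of that table with what random assignment of group labels would produce. The permutation-based z-score below is one simple way to make that comparison. It is not necessarily the test used in this study, and the node labels and ties are hypothetical.

```python
import random
from collections import Counter
from statistics import mean, pstdev

def mixing_matrix(edges, attr):
    """Count directed ties by (sender group, receiver group)."""
    return Counter((attr[u], attr[v]) for u, v in edges)

def permutation_z(edges, attr, cell, n_perm=1000, seed=0):
    """z-score of one mixing-matrix cell against shuffled group labels."""
    rng = random.Random(seed)
    observed = mixing_matrix(edges, attr)[cell]
    nodes = list(attr)
    labels = [attr[n] for n in nodes]
    simulated = []
    for _ in range(n_perm):
        rng.shuffle(labels)  # reassign labels, keep the ties fixed
        shuffled = dict(zip(nodes, labels))
        simulated.append(mixing_matrix(edges, shuffled)[cell])
    sd = pstdev(simulated)
    return (observed - mean(simulated)) / sd if sd > 0 else 0.0

# Hypothetical toy data: four "M" nodes, two "F" nodes, no M -> F ties.
attr = {"a": "M", "b": "M", "c": "M", "d": "M", "e": "F", "f": "F"}
edges = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d"),
         ("d", "a"), ("e", "f"), ("b", "c")]

mm = mixing_matrix(edges, attr)
z = permutation_z(edges, attr, ("M", "F"))
print(dict(mm))
print("z for M -> F:", round(z, 2))
```

A cell whose absolute z-value exceeds 1.96 would be read as departing from random mixing at the conventional 5% level, which is the threshold used in the text.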
3.3.2 Linear Regression Analysis

The above visualization depicts scatter plots relating individual LV rank and follower count to the respective network degrees. The red line represents a linear fit. Notably, in-degree exhibits a significant positive correlation with LV rank and follower count. However, out-degree does not show a significant correlation with either LV rank or follower count.
Utilizing the cor.test() function in R, we derived the following correlations:
- In-degree and LV rank: p-value = 0.001, cor = 0.2259.
- Out-degree and LV rank: p-value = 0.632, cor = -0.0335.
- Follower count and in-degree: p-value < 2.2e-16, cor = 0.7441.
- Follower count and out-degree: p-value = 0.2669, cor = 0.0775.
These results suggest that UPs with more followers and higher LV ranks are more likely to be followed by other educators within the network of 207 selected teachers.
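A Pearson correlation test of this kind rests on two quantities: the coefficient r and the statistic t = r × sqrt((n − 2) / (1 − r²)), which is compared against a t distribution with n − 2 degrees of freedom. The Python sketch below computes both. Taking n = 207 (all teachers, isolates included) is an assumption, and the short data vectors are invented purely to exercise the function.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# Invented example vectors (not data from the study).
in_degree = [0, 1, 1, 3, 8]
followers = [2_000, 15_000, 9_000, 120_000, 900_000]
print("r =", round(pearson_r(in_degree, followers), 3))

# t statistics implied by the reported coefficients, assuming n = 207.
for label, r in [("in-degree vs LV rank", 0.2259),
                 ("out-degree vs LV rank", -0.0335),
                 ("followers vs in-degree", 0.7441),
                 ("followers vs out-degree", 0.0775)]:
    print(label, "t =", round(t_statistic(r, 207), 2))
```

Converting t into a p-value needs the t distribution's cumulative distribution function, which the Python standard library does not provide, so this sketch stops at the t statistic.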
3.3.3 Regression analysis of ERGM model
To further quantify the impact of gender and professional status on network connections, we fitted an exponential random graph model (ERGM). The model includes a reciprocity term, and our findings indicate that accounting for reciprocity improves its predictive performance.
Based on the ERGM results, within the network of 207 selected teachers, a tie between two college-affiliated UP teachers has approximately 1.726 times the odds of a tie between two UP teachers without college affiliations.
Furthermore, our analysis reveals that female UPs are more inclined to interact with other UPs within the same academic discipline.
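ERGM coefficients are on the log-odds scale, so a figure such as 1.726 is typically obtained by exponentiating a coefficient. The sketch below shows that conversion in Python. The coefficient is back-computed from the reported 1.726 and the baseline log-odds is invented, since the fitted model table is not reproduced here.

```python
import math

def odds_ratio(coefficient):
    """Convert an ERGM coefficient (log-odds) into an odds ratio."""
    return math.exp(coefficient)

def tie_probability(log_odds):
    """Conditional probability of a tie given its total log-odds."""
    return 1 / (1 + math.exp(-log_odds))

# Coefficient implied by the reported odds ratio (an inference, not a
# value copied from the model output).
theta_college = math.log(1.726)
print("implied coefficient:", round(theta_college, 4))
print("odds ratio:", round(odds_ratio(theta_college), 3))

# Hypothetical baseline log-odds, to show the effect on probabilities.
baseline = -5.0
p_other = tie_probability(baseline)
p_college = tie_probability(baseline + theta_college)
print("tie probability without / with the term:", p_other, p_college)
```

In a sparse network the baseline odds are small, so an odds ratio of about 1.7 multiplies a small probability by roughly the same factor.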
3.4 Other observations
3.4.1 UP Imitation
We observed a trend where lesser-known UPs with lower ranks and fewer followers emulate prominent UPs by mimicking their nicknames and content. For instance, "Professor Guo who doesn't brush the question" in biology appears to be inspired by "Grandma Wu who doesn't brush the question" in physics.
3.4.2 Mutual UP Interactions
Examples include collaborations between Liang Yongan and Luo Yuming from Fudan University and Dong Chenyu and Liu Hailong from Renmin University's Department of Journalism and Communication. These collaborations often manifest as LIVE sessions on Bilibili.
Specifically:
- Among the 139 UPs, mutual interactions are limited, with only 25 pairs.
- These mutual interactions span diverse social relationships, such as affiliations with universities, off-campus institutions, and academic disciplines. Notably, there are more mutual interactions within the humanities than in science and engineering.
- Three foreign Nobel laureates exhibit mutual interactions, with notable similarities in their profile designs, possibly curated by Bilibili's Overseas Operations Department.
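Counting mutual pairs in a directed follow network means finding every tie whose reverse tie also exists and counting each such pair once. A minimal Python sketch, using hypothetical node names:

```python
def mutual_pairs(edges):
    """Return the set of unordered pairs {u, v} that follow each other."""
    edge_set = set(edges)
    return {frozenset((u, v)) for u, v in edge_set
            if u != v and (v, u) in edge_set}

# Hypothetical toy network; an edge (u, v) means "u follows v".
toy_edges = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d"), ("d", "c")]

pairs = mutual_pairs(toy_edges)
print(len(pairs), "mutual pairs:", sorted(sorted(p) for p in pairs))

# Share of directed ties that are reciprocated (each pair uses two ties).
print("reciprocated share:", 2 * len(pairs) / len(set(toy_edges)))
```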
3.4.3 Collaborative Projects and Their Influence
Beyond academic collaborations, many educators also partner with official departments, publishers, or participate in events, variety shows, and documentaries organized by Bilibili.
IV. Summary
4.1 Research Conclusions
- The relational network among educators is notably sparse, with a network density of 0.005229586 for 207 teachers and 0.01162548 for 139 teachers.
- Within the network of 207 selected teachers, individuals with similar academic disciplines and institutional affiliations are more likely to form connections.
- The likelihood of being noticed by other educators in the network increases with the number of followers and higher LV rank.
- Mutual interactions, collaborations, and imitation do occur among educators. However, the overall number of such interactions is limited, with only 25 mutual pairs among 223 network links (the link count implied by the reported densities, since 223 / (207 × 206) ≈ 0.005229586).
4.2 Research Limitations
This study, while providing a foundational descriptive analysis and correlation exploration of the network data of 207 educators, has its limitations:
- The sample may not be fully representative. The list of "officially certified" UP educators by Bilibili was used, but the criteria for this list are not transparent.
- The data collected is not comprehensive. Only basic data was scraped, and additional data like age, onboarding time, number of works, and total video views were not obtained.
- A significant portion of the network consists of isolated nodes, making data correlation and subgroup analysis challenging.
- The study contains numerous variables, and establishing clear causal relationships is challenging. Many insights cannot be quantified through network analysis and remain speculative.
References
Liu Lu, 2020-06-29, Research on Cultural Communication of Interest Groups Based on Social Network Analysis: Taking the Chinese-style Cultural Interest Groups in Station B as an Example, Master's thesis, Jinan University
Huo Lili, 2021, Research on the Network Connectivity Relationship of Takeaway Riders from the Perspective of Social Network Analysis, Theoretical Research, DOI: 10.16604/j.cnki.issn2096-0360.2021.24.003
Bilibili, 2022-09-09, Bilibili Teachers' Day and Mid-Autumn Festival special project "The Man Who Sends the Moon" | One inch of moonlight and thousands of miles away, no scroll of life, https://www.bilibili.com/video/BV1Bd4y1X7Ej/
The Paper (thepaper.cn), 2022-09-25, When college teachers become Internet celebrity "UP", https://www.thepaper.cn/newsDetail_forward_20033089