Abstract
Community detection is a crucial task in network analysis, and it can be significantly improved by incorporating subject-level information, i.e., covariates. However, current methods often struggle with selecting tuning parameters and with analyzing low-degree nodes. In this paper, we introduce a novel method that addresses these challenges by constructing network-adjusted covariates, which combine the network connections and the covariates, assigning each node its own weight based on the node's degree. Spectral clustering on the network-adjusted covariates yields exact recovery of the community labels under certain conditions, and the procedure is tuning-free and computationally efficient. We present novel theoretical results on the strong consistency of our method under degree-corrected stochastic blockmodels with covariates, even in the presence of mis-specification and of sparse communities with bounded degrees. Additionally, we establish a general lower bound for the community detection problem when both network and covariates are present, which shows that our method is optimal up to a constant factor. Our method outperforms existing approaches in simulations and on a LastFM app user network, and it provides interpretable community structures in a statistics publication citation network where $30\%$ of nodes are isolated.
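To make the construction in the abstract concrete, here is a minimal Python sketch of spectral clustering on network-adjusted covariates. It is an illustration under stated assumptions, not the authors' implementation. The abstract only says that each node receives a weight based on its degree, so the specific weight `alpha` below (larger for low-degree nodes) is an assumed form chosen for illustration. The function names, the plain k-means routine, and the toy network are likewise hypothetical.

```python
import numpy as np


def network_adjusted_covariates(A, X):
    """Return Y = A X + diag(alpha) X for adjacency A (n x n) and covariates X (n x p)."""
    n = A.shape[0]
    degrees = A.sum(axis=1)
    # ASSUMPTION: illustrative degree-based weight, not taken from the abstract.
    # It is largest for isolated nodes, so their own covariates still contribute.
    alpha = (degrees.mean() / 2.0) / (degrees / np.log(n) + 1.0)
    return A @ X + alpha[:, None] * X


def kmeans(points, k, n_iter=50):
    """Plain Lloyd's algorithm with deterministic farthest-point initialisation."""
    centers = [points[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels


def cluster_nac(A, X, k):
    """Cluster nodes using the leading left singular vectors of Y."""
    Y = network_adjusted_covariates(A, X)
    U, _, _ = np.linalg.svd(Y, full_matrices=False)
    return kmeans(U[:, :k], k)


# Hypothetical toy network: two 5-node cliques plus one isolated node
# per community (nodes 5 and 11 have degree zero).
n = 12
A = np.zeros((n, n))
for block in (range(0, 5), range(6, 11)):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1.0

# Two-dimensional covariates that indicate the community of each node.
X = np.zeros((n, 2))
X[:6, 0] = 1.0
X[6:, 1] = 1.0

labels = cluster_nac(A, X, k=2)
print(labels)
```

In this toy example the isolated nodes have an all-zero row in `A @ X`, so the network alone carries no information about them. The weighted covariate term keeps their rows of `Y` nonzero, which is what lets the clustering step place them. No tuning parameter is chosen by the user anywhere in the sketch.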
About the Speaker
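Wanjie Wang (王婉洁) received a bachelor's degree from the Department of Mathematics at Peking University and then earned a PhD in statistics at Carnegie Mellon University, working on high-dimensional clustering under the supervision of Prof. Jin Jiashun. After the PhD, Wang spent a period as a postdoctoral researcher at UPenn with Prof. Tony Cai and Prof. Li Hongzhe, and then joined the National University of Singapore, where Wang has worked ever since. Research interests include high-dimensional statistics (genetic analysis), social network analysis, and statistical analysis in psychology.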
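The 狗熊会 (Clubear) online academic seminar series is open to scholars and practitioners in data science and related fields, and we warmly welcome followers to sign up as speakers or to recommend speakers. For inquiries, please contact Chang Ying (常莹) at ying.chang@clubear.org.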