Chinese Journal of Intelligent Science and Technology (智能科学与技术学报), 2025, Vol. 7, Issue (4): 505-516, 12. DOI: 10.11959/j.issn.2096-6652.202542
Research on popularity stage prediction methods for open-source community projects: a case study of large language model projects
Abstract
In recent years, the cumulative number of registered developers on the open-source community GitHub has exceeded 100 million, leading to increasingly fierce competition for user attention among projects. Understanding the dynamics of project popularity in open-source communities not only helps researchers and developers grasp technological trends, but also provides valuable references for resource allocation, collaboration decisions, and research investment. However, existing studies often treat popularity as a continuous numerical regression or binary classification task, overlooking the stage-wise evolutionary patterns and gradual transitions exhibited in real project popularity dynamics. Open-source large language model (LLM) projects were taken as the research object. An integrated framework for popularity stage segmentation and prediction was proposed, enabling the automatic characterization and forecasting of project popularity stages. Specifically, feature vectors based on growth rate and cumulative scale of project attention within fixed windows were extracted, and a Gaussian mixture model (GMM) for soft clustering was employed to obtain four interpretable stages of popularity with probabilistic labels. Subsequently, a prediction model based on the temporal fusion transformer (TFT) was designed, which leveraged sequential data of five behavioral indicators, such as attention, number of forks, and number of commits, to forecast the stage probability distribution in future windows. Experimental results demonstrate that the proposed method outperforms existing baselines in terms of cross-entropy, mean squared error, mean absolute error, and accuracy, while maintaining high precision across all four patterns of popularity evolution, thereby validating its generalization capability. The proposed approach provides a practical and scalable framework for popularity stage prediction of open-source projects, offering strong utility and potential for applications.
Keywords
open-source community / large language model / machine learning
Classification
Information technology and security science
Cite this article
LU Xiaoyi (卢小艺), CHEN Yang (陈阳), FU Xiaoming (傅晓明), LI Cong (李聪). Research on popularity stage prediction methods for open-source community projects: a case study of large language model projects[J]. Chinese Journal of Intelligent Science and Technology, 2025, 7(4): 505-516, 12.
Funding
The National Natural Science Foundation of China (No.62173095, No.U23A20331, No.62573133, No.62072115)
Shanghai Science and Technology Innovation Action Plan Project (No.22510713600)
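Illustrative sketch
The stage-segmentation step summarized in the abstract (window features built from growth rate and cumulative scale, then GMM soft clustering into four stages with probabilistic labels) might look roughly like the following. This is a minimal sketch and not the authors' implementation: the 30-day window, the relative-growth and log-scale definitions, the helper names `window_features` and `soft_stage_labels`, and the synthetic star counts are all assumptions made for illustration. Only the use of a Gaussian mixture with four components comes from the abstract.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def window_features(cum_attention, window):
    """Return one (growth_rate, log_scale) row per non-overlapping window.

    cum_attention: 1-D cumulative attention counts (e.g. stars) per day.
    window: window length in days (assumed; the abstract does not state it).
    """
    cum_attention = np.asarray(cum_attention, dtype=float)
    n_windows = len(cum_attention) // window
    rows = []
    for k in range(n_windows):
        start = cum_attention[k * window]
        end = cum_attention[(k + 1) * window - 1]
        growth = (end - start) / max(start, 1.0)  # relative growth in the window (assumed definition)
        scale = np.log1p(end)                     # cumulative scale, log-compressed (assumed)
        rows.append((growth, scale))
    return np.array(rows)


def soft_stage_labels(features, n_stages=4, seed=0):
    """Fit a GMM and return per-window stage probabilities (rows sum to 1)."""
    gmm = GaussianMixture(n_components=n_stages, covariance_type="full",
                          random_state=seed)
    gmm.fit(features)
    return gmm.predict_proba(features)


# Synthetic data only: 20 hypothetical projects with different daily star rates.
rng = np.random.default_rng(0)
series = [np.cumsum(rng.poisson(lam, size=120))
          for lam in (1, 5, 20, 80) for _ in range(5)]
features = np.vstack([window_features(s, window=30) for s in series])
probs = soft_stage_labels(features)
```

Each row of `probs` is a probability distribution over the four stages for one window, which is the kind of soft label a downstream sequence model could be trained to forecast. The component indices returned by the mixture carry no inherent order, so mapping them to named stages would need a separate, interpretable rule.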