Transactions of the Chinese Society for Agricultural Machinery, 2026, Vol. 57, Issue (1): 51-61, 11. DOI: 10.6041/j.issn.1000-1298.2026.01.005
Distributed Access Method for Multimodal Crop Phenotypic Data
Abstract
The rapid development of high-throughput crop phenotyping equipment has given breeding and cultivation research modern means of data collection, but it has also produced massive volumes of multimodal, unstructured phenotypic data. Traditional structured storage models can no longer provide efficient access to such data. This study proposed a hybrid access framework based on distributed technology. The framework used HBase and HDFS to build a storage engine that fuses structured and unstructured data, combined a client-side cache with a Redis cache to form an efficient retrieval mechanism, and addressed two core problems. First, to overcome the inherent shortcomings of native HDFS in storing phenotypic data, a modality-aggregation-based storage framework, MCH, was designed. MCH classified phenotypic data by modality, merged data of the same modality, and built local indexes with a double-layer hashing technique; this reduced NameNode memory pressure while improving access efficiency and storage space utilization for single-modality data. Second, for high-concurrency read scenarios, a double-layer cache mechanism based on data popularity was constructed. It accelerated reads of hot data through hierarchical metadata caching and introduced a new popularity evaluation model that combines access frequency with temporal characteristics, which improved the cache hit rate. Experimental results showed that at a data scale of 1.0×10⁵, the proposed method reduced NameNode memory occupancy by 31.2% compared with the best-performing native solution (SequenceFile) and reduced retrieval time by 25.4% compared with the best-performing native solution (MapFile). The method provides technical support for the storage and retrieval of massive multimodal phenotypic data.
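The modality-aggregation idea behind MCH can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not specify: each merged container is modeled as an in-memory `bytearray` standing in for one large HDFS file, the first hash layer maps a file key to one of a fixed number of buckets with CRC32, and the second layer is a per-bucket dictionary from key to `(offset, length)`. The names `ModalContainer` and `merge_by_modality`, the bucket count, and the sample keys are hypothetical and do not represent the authors' MCH implementation.

```python
import zlib
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ModalContainer:
    """One merged container per modality (an in-memory stand-in for a large HDFS file)."""

    num_buckets: int = 64  # hypothetical first-layer bucket count
    data: bytearray = field(default_factory=bytearray)
    # Layer 1: bucket id -> Layer 2: {file key -> (offset, length)}
    index: dict = field(default_factory=lambda: defaultdict(dict))

    def _bucket(self, key: str) -> int:
        # First-layer hash; CRC32 is stable across runs, unlike Python's salted hash(str).
        return zlib.crc32(key.encode("utf-8")) % self.num_buckets

    def append(self, key: str, payload: bytes) -> None:
        offset = len(self.data)
        self.data.extend(payload)
        # Second-layer hash: a dict inside the selected bucket.
        self.index[self._bucket(key)][key] = (offset, len(payload))

    def read(self, key: str) -> bytes:
        offset, length = self.index[self._bucket(key)][key]  # KeyError if absent
        return bytes(self.data[offset:offset + length])


def merge_by_modality(files):
    """Group (key, modality, payload) records into one ModalContainer per modality."""
    containers = defaultdict(ModalContainer)
    for key, modality, payload in files:
        containers[modality].append(key, payload)
    return dict(containers)


# Usage: three small files of two modalities become two merged containers.
files = [
    ("plot01/rgb_001.jpg", "image", b"\xff\xd8jpeg-bytes"),
    ("plot01/hsi_001.raw", "spectral", b"\x00\x01\x02"),
    ("plot02/rgb_002.jpg", "image", b"\xff\xd8more"),
]
containers = merge_by_modality(files)
print(len(containers))  # 2
print(containers["image"].read("plot02/rgb_002.jpg"))
```

In HDFS terms, the NameNode would then track one file per modality instead of one per original file, which is the kind of metadata reduction the abstract describes; the two hash layers keep the lookup of an individual file to a bucket selection plus a dictionary probe.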
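The popularity-driven two-level cache can likewise be sketched in Python. The abstract states only that the popularity model combines access frequency and time characteristics; the exponentially decayed access count used here (`score = score * exp(-decay * Δt) + 1`) is an assumed stand-in, not the paper's formula. Level 1 is a small client-side dictionary and level 2 is a larger dictionary standing in for Redis; in a real deployment level 2 would be a Redis client and `loader` would query HBase/HDFS for metadata or data. The class names, capacities, and decay rate are hypothetical.

```python
import math


class PopularityTracker:
    """Hypothetical popularity score: an access count with exponential time decay.

    On each access: score <- score * exp(-decay * (now - last_access)) + 1.
    Frequent accesses push the score up; idle time lets it decay.
    """

    def __init__(self, decay: float = 0.1):
        self.decay = decay
        self._state = {}  # key -> (score, last_access_time)

    def touch(self, key, now: float) -> float:
        score, last = self._state.get(key, (0.0, now))
        score = score * math.exp(-self.decay * (now - last)) + 1.0
        self._state[key] = (score, now)
        return score

    def score(self, key, now: float) -> float:
        score, last = self._state.get(key, (0.0, now))
        return score * math.exp(-self.decay * (now - last))


class TwoLevelCache:
    """L1: small client-side dict. L2: larger shared cache (a dict standing in for Redis)."""

    def __init__(self, loader, l1_capacity=2, l2_capacity=8, decay=0.1):
        self.loader = loader  # called on a miss in both levels (e.g., an HBase/HDFS lookup)
        self.l1, self.l2 = {}, {}
        self.l1_capacity, self.l2_capacity = l1_capacity, l2_capacity
        self.pop = PopularityTracker(decay)
        self.hits = self.misses = 0

    def _put(self, level, capacity, key, value, now):
        if key in level or len(level) < capacity:
            level[key] = value
            return
        coldest = min(level, key=lambda k: self.pop.score(k, now))
        # Admit the new key only if it is at least as popular as the coldest resident.
        if self.pop.score(key, now) >= self.pop.score(coldest, now):
            del level[coldest]
            level[key] = value

    def get(self, key, now: float):
        self.pop.touch(key, now)
        if key in self.l1:
            self.hits += 1
            return self.l1[key]
        if key in self.l2:
            self.hits += 1
            value = self.l2[key]
        else:
            self.misses += 1
            value = self.loader(key)
            self._put(self.l2, self.l2_capacity, key, value, now)
        # Try to promote the entry to the client-side level.
        self._put(self.l1, self.l1_capacity, key, value, now)
        return value


# Usage: a toy backing store and an access trace with one hot key ("img0").
store = {f"img{i}": f"bytes-{i}" for i in range(5)}
loads = []


def load_from_store(key):
    loads.append(key)
    return store[key]


cache = TwoLevelCache(load_from_store, l1_capacity=1, l2_capacity=2)
for t, key in enumerate(["img0", "img0", "img1", "img0", "img2", "img0"]):
    cache.get(key, now=float(t))
print(f"hit rate: {cache.hits / (cache.hits + cache.misses):.2f}")  # hit rate: 0.50
```

Admission here is popularity-gated: a newly fetched key displaces the coldest resident only when its decayed score is at least as high, so a burst of one-off reads does not flush frequently used entries. The cache hit rate mentioned in the abstract corresponds to `hits / (hits + misses)` in this sketch.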
Keywords
multimodal crop phenotypic data / distributed access / file merging / two-level cache mechanism
Classification
Information Technology and Security Science
Cite this article
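HAO Zichao, ZHAO Xiangyu, PAN Shouhui, LIU Dongming, WANG Kaiyi. Distributed access method for multimodal crop phenotypic data[J]. Transactions of the Chinese Society for Agricultural Machinery, 2026, 57(1): 51-61, 11.
Funding
National Key Research and Development Program of China (2022YFD2002303-03) and Beijing Rural Revitalization Project (NY2401040425)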