柯炜

Associate Professor

Basic Information

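  • Email:
  • Affiliation: School of Software
  • Education: Combined master's and doctoral program
  • Office location:
  • Gender: Male
  • Contact:
  • Degree: Ph.D.
  • Doctoral supervisor: Yes
  • Master's supervisor: Yes
  • Department: School of Software
  • Discipline: Computer Science and Technology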

My News

Two papers accepted to ICCV. Congratulations to 师玮 and 张鹏!

Published: 2025-06-27

CountSE: Soft Exemplar Open-set Object Counting

Abstract:

Open-set counting is attracting increasing attention because it can count objects of arbitrary categories. Existing methods fall into two groups: text-guided zero-shot counting and exemplar-guided few-shot counting. Previous text-guided zero-shot methods provide only limited object information through text, resulting in poor performance. Exemplar-guided few-shot approaches achieve better results, but they rely heavily on manually annotated visual exemplars, which makes them inefficient and labor-intensive. We therefore propose CountSE, which achieves high efficiency and high performance simultaneously. CountSE is a new text-guided zero-shot object counting algorithm that generates multiple precise soft exemplars at different scales to enhance counting models driven solely by semantics. Specifically, to obtain richer object information and address the diversity in object scales, we introduce Semantic-guided Exemplar Selection, a module that generates candidate soft exemplars at various scales and selects those with high similarity scores. Then, to ensure accuracy and representativeness, Clustering-based Exemplar Filtering refines the candidate exemplars by eliminating inaccurate ones through clustering analysis. In the text-guided zero-shot setting, CountSE outperforms all state-of-the-art methods on the FSC-147 benchmark by at least 15%. Additionally, experiments on two other widely used datasets demonstrate that CountSE significantly outperforms all previous text-guided zero-shot counting methods and is competitive with the most advanced exemplar-guided few-shot methods. Code will be made available.
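
The two-stage idea in the abstract (select candidates by similarity to the text, then discard outliers) can be illustrated with a small sketch. This is not the authors' implementation: the function names, the cosine-similarity scoring, the `top_k` and `min_sim` parameters, and the centroid test used as a stand-in for clustering analysis are all assumptions made for illustration. Embeddings for candidate regions at different scales are assumed to be precomputed.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def select_candidates(text_emb, region_embs, top_k):
    """Hypothetical selection step: rank candidate regions (from any scale)
    by similarity to the text embedding and keep the top_k indices."""
    scored = sorted(
        ((cosine(text_emb, emb), idx) for idx, emb in enumerate(region_embs)),
        reverse=True,
    )
    return [idx for _, idx in scored[:top_k]]

def filter_by_centroid(region_embs, candidates, min_sim):
    """Hypothetical filtering step: keep candidates whose embedding is close
    to the mean of all candidates (a single-cluster stand-in for clustering)."""
    dim = len(region_embs[0])
    centroid = [
        sum(region_embs[i][d] for i in candidates) / len(candidates)
        for d in range(dim)
    ]
    return [i for i in candidates if cosine(region_embs[i], centroid) >= min_sim]

# Toy 2-D embeddings; real features would come from a vision-language model.
text_emb = [1.0, 0.0]
region_embs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.95, 0.0], [0.5, 0.5]]

candidates = select_candidates(text_emb, region_embs, top_k=4)   # [3, 0, 1, 4]
exemplars = filter_by_centroid(region_embs, candidates, 0.9)     # [3, 0, 1]
```

In this toy run, region 4 passes the similarity ranking but is removed by the filtering step because it lies far from the other candidates.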

Enhancing Zero-shot Object Counting via Text-guided Local Ranking and Number-evoked Global Attention

Text-guided zero-shot object counting leverages vision-language models (VLMs) to count objects of an arbitrary class given by a text prompt. Existing approaches for this challenging task fuse only local patch-level features with the text feature, ignoring the important influence of the global image-level feature. In this paper, we propose a universal strategy that exploits local patch-level features and the global image-level feature simultaneously. Specifically, to improve the localization ability of VLMs, we propose Text-guided Local Ranking. Based on the prior knowledge that foreground patches have higher similarity with the text prompt, a new local-text rank loss is designed to increase the difference between the similarity scores of foreground and background patches, which pushes foreground and background patches apart. To enhance the counting ability of VLMs, Number-evoked Global Attention is introduced to first align the global image-level feature with multiple number-conditioned text prompts. Then, the prompt with the highest similarity is selected to compute cross-attention with the global image-level feature. Through extensive experiments on widely used datasets and methods, the proposed approach has demonstrated superior performance, generalization, and scalability. Furthermore, to better evaluate text-guided zero-shot object counting methods, we propose a dataset named ZSC-8K, which is larger and more challenging, to establish a new benchmark. Code and the ZSC-8K dataset will be made available.
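
The two components described in the abstract can be sketched in a few lines. The abstract does not give the exact form of the local-text rank loss, so the pairwise hinge with a `margin` parameter below is an assumption, as are the function names and the toy embeddings. The second function only shows the prompt-selection step; the subsequent cross-attention is omitted.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def local_text_rank_loss(fg_sims, bg_sims, margin=0.2):
    """Assumed form of a rank loss: mean hinge penalty over every
    (foreground, background) pair whose similarity gap is below margin."""
    pairs = [(f, b) for f in fg_sims for b in bg_sims]
    return sum(max(0.0, margin - (f - b)) for f, b in pairs) / len(pairs)

def pick_number_prompt(image_emb, prompt_embs):
    """Return the index of the number-conditioned prompt embedding that is
    most similar to the global image-level feature."""
    sims = [cosine(image_emb, p) for p in prompt_embs]
    return max(range(len(sims)), key=sims.__getitem__)

# Patch-to-text similarities (toy values): one foreground, two background.
loss = local_text_rank_loss(fg_sims=[0.9], bg_sims=[0.8, 0.1], margin=0.2)

# Toy embeddings standing in for prompts conditioned on different counts.
best = pick_number_prompt([0.2, 0.9], [[1.0, 0.0], [0.6, 0.6], [0.0, 1.0]])
```

Here only the background patch scoring 0.8 is penalized, because its gap to the foreground patch (0.1) is below the margin. The loss falls to zero once every foreground patch outscores every background patch by at least `margin`.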