Query 3

[24.08.19 / CVPR 24'] BoQ: A Place is Worth a Bag of Learnable Queries

https://openaccess.thecvf.com/content/CVPR2024/papers/Ali-bey_BoQ_A_Place_is_Worth_a_Bag_of_Learnable_Queries_CVPR_2024_paper.pdf
Abstract: In visual place recognition, accurately identifying and matching images of locations under varying environmental conditions and viewpoints remains a significant challenge. In this paper, we introduce a new technique, called Bag-of-Queries (BoQ), which learns a ..

[24.08.05 / ICCV23'] Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration

https://openaccess.thecvf.com/content/ICCV2023/papers/Cao_Attention_Where_It_Matters_Rethinking_Visual_Document_Understanding_with_Selective_ICCV_2023_paper.pdf
Abstract: We propose a novel end-to-end document understanding model called SeRum (SElective Region Understanding Model) for extracting meaningful information from document images, including document analysis, retrieval, and office automati..

[24.07.28 / CVPR24'] Querying as Prompt: Parameter-Efficient Learning for Multimodal Language Model

https://openaccess.thecvf.com/content/CVPR2024/papers/Liang_Querying_as_Prompt_Parameter-Efficient_Learning_for_Multimodal_Language_Model_CVPR_2024_paper.pdf
Abstract: Recent advancements in language models pre-trained on large-scale corpora have significantly propelled developments in the NLP domain and advanced progress in multimodal tasks. In this paper, we propose a Parameter Efficient multimodal..