Abstract
Recent advancements in language models pre-trained on large-scale corpora have significantly propelled developments in the NLP domain and advanced progress in multimodal tasks. In this paper, we propose a Parameter-Efficient multimodal language model learning strategy, named QaP (Querying as Prompt). Its core innovation is a novel modality-bridging method that allows a set of modality-specific queries to be input as soft prompts into a frozen pre-trained language model. Specifically, we introduce an efficient Text-Conditioned Resampler that is easy to incorporate into language models and enables adaptive injection of text-related multimodal information at different levels of the model through query learning. This approach effectively bridges multimodal information to the language model while fully leveraging its token fusion and representation potential. We validated our method on four datasets across three distinct multimodal tasks. The results demonstrate that our QaP multimodal language model achieves state-of-the-art performance in various tasks while training only 4.6% of the parameters.
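The abstract doesn't spell out the resampler's architecture, but the core idea — learnable, modality-specific queries that cross-attend to multimodal features under text conditioning, with the outputs injected as soft prompts into a frozen LM — can be sketched as below. This is a hedged illustration assuming a Perceiver/Q-Former-style cross-attention resampler; all names, dimensions, and the mean-pooled text conditioning are my assumptions, not the paper's specification:

```python
import torch
import torch.nn as nn

class TextConditionedResampler(nn.Module):
    """Illustrative sketch (not the paper's exact design): learnable
    queries cross-attend to modality features, conditioned on text;
    the outputs serve as soft prompts for a frozen language model."""

    def __init__(self, num_queries: int = 32, d_model: int = 768, n_heads: int = 8):
        super().__init__()
        # Modality-specific learnable queries (these become the soft prompts).
        self.queries = nn.Parameter(torch.randn(num_queries, d_model) * 0.02)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, modality_feats: torch.Tensor, text_feats: torch.Tensor) -> torch.Tensor:
        # modality_feats: (batch, num_patches, d_model), e.g. visual features
        # text_feats:     (batch, seq_len, d_model), text-side representations
        b = modality_feats.size(0)
        # Assumed conditioning: bias the queries with pooled text features.
        q = self.queries.unsqueeze(0).expand(b, -1, -1) + text_feats.mean(dim=1, keepdim=True)
        # Queries attend over the modality features to extract text-relevant info.
        prompts, _ = self.cross_attn(q, modality_feats, modality_feats)
        return prompts  # (batch, num_queries, d_model), injected as soft prompts
```

Under this reading, only the resampler (and the query tensors) would be trained, which is consistent with the reported 4.6% trainable-parameter figure; the pre-trained language model stays frozen.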

- Task: Multimodal Language Model
- Problem Definition: (not defined in the abstract) computation cost, parameter inefficiency
- Approach: Allowing a set of modality-specific queries to be input as soft prompts into a frozen pre-trained language model
*Soft Prompts: learnable tensors concatenated with the input embeddings and optimized on a dataset; the downside is that they aren't human-readable, because these "virtual tokens" don't correspond to the embeddings of real words
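For a concrete picture of vanilla soft-prompt tuning, here is a minimal sketch with a frozen Hugging Face causal LM, where the prompt tensor is the only trainable parameter. GPT-2 and the 20-token prompt length are arbitrary choices for illustration, not the paper's setup:

```python
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
for p in model.parameters():
    p.requires_grad = False  # freeze the pre-trained LM

# Learnable "virtual tokens": not tied to any real word embedding.
num_virtual_tokens, d_model = 20, model.config.n_embd
soft_prompt = nn.Parameter(torch.randn(num_virtual_tokens, d_model) * 0.02)

inputs = tokenizer("A photo of", return_tensors="pt")
tok_embeds = model.get_input_embeddings()(inputs["input_ids"])
# Prepend the soft prompt to the real token embeddings.
embeds = torch.cat([soft_prompt.unsqueeze(0), tok_embeds], dim=1)
attn = torch.cat(
    [torch.ones(1, num_virtual_tokens, dtype=torch.long), inputs["attention_mask"]],
    dim=1,
)
out = model(inputs_embeds=embeds, attention_mask=attn)
# During training, gradients flow only into `soft_prompt`.
```

QaP's twist, per the abstract, is that these prompts are not free-floating tensors but the outputs of modality-specific queries (via the Text-Conditioned Resampler), so the injected "virtual tokens" carry multimodal information.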