All Categories (29)

[24.08.24 / CVPR 24'] BadCLIP: Trigger-Aware Prompt Learning for Backdoor Attacks on CLIP

https://openaccess.thecvf.com/content/CVPR2024/papers/Bai_BadCLIP_Trigger-Aware_Prompt_Learning_for_Backdoor_Attacks_on_CLIP_CVPR_2024_paper.pdf
Abstract: Contrastive Vision-Language Pre-training, known as CLIP, has shown promising effectiveness in addressing downstream image recognition tasks. However, recent works revealed that the CLIP model can be implanted with a downstream-oriented backdoor. On ..

[24.08.22 / CVPR 24'] LTM: Lightweight Textured Mesh Extraction and Refinement of Large Unbounded Scenes for Efficient Storage and Real-time Rendering

https://openaccess.thecvf.com/content/CVPR2024/papers/Choi_LTM_Lightweight_Textured_Mesh_Extraction_and_Refinement_of_Large_Unbounded_CVPR_2024_paper.pdf
Abstract: Advancements in neural signed distance fields (SDFs) have enabled modeling 3D surface geometry from a set of 2D images of real-world scenes. Baking neural SDFs can extract an explicit mesh with appearance baked into texture maps as neural fe..
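The excerpt rests on two ideas: an SDF represents a surface as the set of points where the signed distance is zero, and an explicit mesh can be extracted from it. A minimal sketch, assuming an analytic sphere SDF stands in for a trained neural SDF (a network would be queried at the same grid points). It only locates the grid cells that straddle the zero level set, the step a marching-cubes style extractor starts from. It does not reproduce LTM's extraction or refinement, and `sphere_sdf` / `surface_cells` are hypothetical names.

```python
import numpy as np

def sphere_sdf(points, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance to a sphere: negative inside, zero on the surface, positive outside.
    Stand-in for a neural SDF, which would be a network evaluated at the same points."""
    points = np.asarray(points, dtype=float)
    return np.linalg.norm(points - np.asarray(center), axis=-1) - radius

def surface_cells(sdf_grid):
    """Mask of grid cells whose 8 corner samples do not all share the same sign.
    These are the cells a mesh extractor such as marching cubes would triangulate."""
    inside = sdf_grid < 0
    nx, ny, nz = inside.shape
    # Each cell spans two adjacent samples per axis; collect its 8 corners.
    corners = np.stack([
        inside[dx:nx - 1 + dx, dy:ny - 1 + dy, dz:nz - 1 + dz]
        for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)
    ])
    # Mixed signs: at least one corner inside and at least one outside.
    return corners.any(axis=0) & ~corners.all(axis=0)

# Sample the SDF on a 32^3 grid and find the cells the surface passes through.
xs = np.linspace(-1.5, 1.5, 32)
X, Y, Z = np.meshgrid(xs, xs, xs, indexing="ij")
grid_sdf = sphere_sdf(np.stack([X, Y, Z], axis=-1))
mask = surface_cells(grid_sdf)
print("surface cells:", int(mask.sum()), "of", mask.size)
```

Only the thin shell of cells around the zero crossing is marked; cells fully inside or fully outside the sphere are skipped, which is why mesh extraction from an SDF scales with surface area rather than volume.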

[24.08.21 / CVPR 24'] Brush2Prompt: Contextual Prompt Generator for Object Inpainting

https://openaccess.thecvf.com/content/CVPR2024/papers/Chiu_Brush2Prompt_Contextual_Prompt_Generator_for_Object_Inpainting_CVPR_2024_paper.pdf
Object inpainting is a task that involves adding objects to real images and seamlessly compositing them. With the recent commercialization of products like Stable Diffusion and Generative Fill, inserting objects into images by using prompts has achieved imp..

[24.08.20 / CVPR 24'] Improved Zero-Shot Classification by Adapting VLMs with Text Descriptions

https://openaccess.thecvf.com/content/CVPR2024/papers/Saha_Improved_Zero-Shot_Classification_by_Adapting_VLMs_with_Text_Descriptions_CVPR_2024_paper.pdf
Abstract: The zero-shot performance of existing vision-language models (VLMs) such as CLIP is limited by the availability of large-scale, aligned image and text datasets in specific domains. In this work, we leverage two complementary sources o..
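For context, CLIP-style zero-shot classification compares an image embedding against one text embedding per class and picks the most similar class. A minimal sketch, assuming each class is described by several text descriptions whose embeddings are averaged into one prototype; this averaging is a common baseline and an assumption here, not necessarily the paper's method. The embeddings are toy vectors rather than real CLIP outputs, and `class_embeddings` / `zero_shot_predict` are hypothetical names.

```python
import numpy as np

def normalize(x):
    """L2-normalize along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def class_embeddings(description_embeddings):
    """Build one unit-length prototype per class by averaging the normalized
    embeddings of its descriptions. Input: dict class_name -> (n_descriptions, d)."""
    names = list(description_embeddings)
    protos = np.stack([
        normalize(normalize(description_embeddings[n]).mean(axis=0)) for n in names
    ])
    return names, protos

def zero_shot_predict(image_embedding, names, protos):
    """Return the class whose prototype has the highest cosine similarity."""
    sims = protos @ normalize(image_embedding)
    return names[int(np.argmax(sims))], sims

# Toy 3-d embeddings; a real pipeline would use a VLM's text and image encoders.
names, protos = class_embeddings({
    "cat": np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]),
    "dog": np.array([[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]]),
})
label, sims = zero_shot_predict(np.array([0.95, 0.05, 0.0]), names, protos)
print(label, sims)
```

Because no classifier weights are trained, the quality of the prediction depends entirely on how well the text side describes each class in the target domain, which is the limitation the excerpt points to.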

[24.08.19 / CVPR 24'] BoQ: A Place is Worth a Bag of Learnable Queries

https://openaccess.thecvf.com/content/CVPR2024/papers/Ali-bey_BoQ_A_Place_is_Worth_a_Bag_of_Learnable_Queries_CVPR_2024_paper.pdf
Abstract: In visual place recognition, accurately identifying and matching images of locations under varying environmental conditions and viewpoints remains a significant challenge. In this paper, we introduce a new technique, called Bag-of-Queries (BoQ), which learns a ..

[24.08.18 / CVPR 24'] HyperSDFusion: Bridging Hierarchical Structures in Language and Geometry for Enhanced 3D Text2Shape Generation

https://openaccess.thecvf.com/content/CVPR2024/papers/Leng_HyperSDFusion_Bridging_Hierarchical_Structures_in_Language_and_Geometry_for_Enhanced_CVPR_2024_paper.pdf
Abstract: 3D shape generation from text is a fundamental task in 3D representation learning. The text-shape pairs exhibit a hierarchical structure, where a general text like “chair” covers all 3D shapes of the chair, while more detailed ..

[24.08.17 / CVPR 24'] Bridging the Synthetic-to-Authentic Gap: Distortion-Guided Unsupervised Domain Adaptation for Blind Image Quality Assessment

https://openaccess.thecvf.com/content/CVPR2024/papers/Li_Bridging_the_Synthetic-to-Authentic_Gap_Distortion-Guided_Unsupervised_Domain_Adaptation_for_Blind_CVPR_2024_paper.pdf
Abstract: The annotation of blind image quality assessment (BIQA) is labor-intensive and time-consuming, especially for authentic images. Training on synthetic data is expected to be beneficial but synthetically trained models..

[24.08.16 / ECCV 22'] Fast and High Quality Image Denoising via Malleable Convolution

https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136780420.pdf
Abstract: Most image denoising networks apply a single set of static convolutional kernels across the entire input image. This is sub-optimal for natural images, as they often consist of heterogeneous visual patterns. Dynamic convolution tries to address this issue by using per-pixel convolution kernels, but this greatly increas..
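The excerpt contrasts one set of static kernels shared across the whole image with per-pixel dynamic kernels. A minimal NumPy sketch of that difference on a single-channel image with zero padding. It illustrates plain static versus per-pixel convolution only, not the paper's malleable convolution, and the function names are hypothetical.

```python
import numpy as np

def static_conv2d(image, kernel):
    """One shared k x k kernel applied at every pixel (zero padding, same-size output).
    Computed as cross-correlation, as most deep-learning frameworks do."""
    k = kernel.shape[0]
    padded = np.pad(image, k // 2)  # zero padding keeps the output the input's size
    h, w = image.shape
    out = np.empty((h, w), dtype=float)
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[i:i + k, j:j + k] * kernel)
    return out

def per_pixel_conv2d(image, kernels):
    """A separate k x k kernel for every pixel; kernels has shape (h, w, k, k).
    In dynamic convolution these kernels would be predicted by a network."""
    h, w, k, _ = kernels.shape
    padded = np.pad(image, k // 2)
    out = np.empty((h, w), dtype=float)
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[i:i + k, j:j + k] * kernels[i, j])
    return out

# Kernel weights each variant needs for a 3x3 filter on a 256x256 image.
k, h, w = 3, 256, 256
print("static:", k * k)             # 9
print("per-pixel:", h * w * k * k)  # 589824
```

Passing the same kernel at every location to `per_pixel_conv2d` reproduces `static_conv2d`; the extra freedom to vary the kernel per pixel is what lets dynamic convolution adapt to heterogeneous patterns, and the gap in kernel count shows why it is expensive.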

[24.08.15 / ECCV 22'] KeypointNeRF: Generalizing Image-based Volumetric Avatars using Relative Spatial Encoding of Keypoints

https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136750176.pdf
Abstract: Image-based volumetric humans using pixel-aligned features promise generalization to unseen poses and identities. Prior work leverages global spatial encodings and multi-view geometric consistency to reduce spatial ambiguity. However, global encodings often suffer from overfitting to the distribution of the training da..