ICCV 7

[24.08.09 / ICCV23'] Efficient 3D Semantic Segmentation with Superpoint Transformer

https://openaccess.thecvf.com/content/ICCV2023/papers/Robert_Efficient_3D_Semantic_Segmentation_with_Superpoint_Transformer_ICCV_2023_paper.pdf
Abstract: We introduce a novel superpoint-based transformer architecture for efficient semantic segmentation of large-scale 3D scenes. Our method incorporates a fast algorithm to partition point clouds into a hierarchical superpoint structure, which makes ..

[24.08.08 / ICCV23'] Multi-Task Learning with Knowledge Distillation for Dense Prediction

https://openaccess.thecvf.com/content/ICCV2023/papers/Xu_Multi-Task_Learning_with_Knowledge_Distillation_for_Dense_Prediction_ICCV_2023_paper.pdf
Abstract: While multi-task learning (MTL) has become an attractive topic, its training usually poses more difficulties than the single-task case. How to successfully apply knowledge distillation into MTL to improve training efficiency and model performan..

[24.08.06 / ICCV23'] Leveraging Inpainting for Single-Image Shadow Removal

https://openaccess.thecvf.com/content/ICCV2023/papers/Li_Leveraging_Inpainting_for_Single-Image_Shadow_Removal_ICCV_2023_paper.pdf
Abstract: Fully-supervised shadow removal methods achieve the best restoration qualities on public datasets but still generate some shadow remnants. One of the reasons is the lack of large-scale shadow & shadow-free image pairs. Unsupervised methods can alleviate the is..

[24.08.05 / ICCV23'] Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration

https://openaccess.thecvf.com/content/ICCV2023/papers/Cao_Attention_Where_It_Matters_Rethinking_Visual_Document_Understanding_with_Selective_ICCV_2023_paper.pdf
Abstract: We propose a novel end-to-end document understanding model called SeRum (SElective Region Understanding Model) for extracting meaningful information from document images, including document analysis, retrieval, and office automati..

[24.08.04 / ICCV23'] Towards Open-Vocabulary Video Instance Segmentation

https://openaccess.thecvf.com/content/ICCV2023/papers/Wang_Towards_Open-Vocabulary_Video_Instance_Segmentation_ICCV_2023_paper.pdf
Abstract: Video Instance Segmentation (VIS) aims at segmenting and categorizing objects in videos from a closed set of training categories, lacking the generalization ability to handle novel categories in real-world videos. To address this limitation, we make the follo..

[24.08.03 / ICCV23'] Towards Open-Set Test-Time Adaptation Utilizing the Wisdom of Crowds in Entropy Minimization

https://openaccess.thecvf.com/content/ICCV2023/papers/Lee_Towards_Open-Set_Test-Time_Adaptation_Utilizing_the_Wisdom_of_Crowds_in_ICCV_2023_paper.pdf
Abstract: Test-time adaptation (TTA) methods, which generally rely on the model's predictions (e.g., entropy minimization) to adapt the source pretrained model to the unlabeled target domain, suffer from noisy signals originating from 1) incorrect or ..
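The excerpt names entropy minimization as the self-supervised signal many TTA methods adapt with. As a minimal sketch of that generic objective only (not this paper's method, whose details are cut off in the excerpt), the quantity being minimized is the mean Shannon entropy of the model's softmax predictions on an unlabeled target batch. The function names below are hypothetical, and the actual parameter update (e.g., a gradient step on normalization-layer parameters) is omitted.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over one sample's class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def prediction_entropy(logits: Sequence[float]) -> float:
    """Shannon entropy H(p) = -sum_c p_c * log(p_c) of the softmax prediction, in nats."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_entropy_loss(batch_logits: Sequence[Sequence[float]]) -> float:
    """Average prediction entropy over an unlabeled target batch.

    Entropy-minimization TTA lowers this value using only the model's own
    predictions, so no target labels are required. Confidently wrong
    predictions also have low entropy, which is one source of the noisy
    signal the excerpt refers to.
    """
    return sum(prediction_entropy(l) for l in batch_logits) / len(batch_logits)


if __name__ == "__main__":
    confident = [8.0, 0.0, 0.0]  # nearly one-hot prediction: entropy close to 0
    uncertain = [1.0, 1.0, 1.0]  # uniform prediction: maximal entropy, log(3)
    print(prediction_entropy(confident))
    print(prediction_entropy(uncertain))
    print(mean_entropy_loss([confident, uncertain]))
```

Minimizing this loss pushes predictions toward confident (low-entropy) outputs; because it cannot tell confident-correct from confident-wrong, it is sensitive to exactly the kind of noisy or out-of-distribution target samples an open-set setting introduces.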

[24.08.02 / ICCV23'] Distribution-Aligned Diffusion for Human Mesh Recovery

https://openaccess.thecvf.com/content/ICCV2023/papers/Foo_Distribution-Aligned_Diffusion_for_Human_Mesh_Recovery_ICCV_2023_paper.pdf
Abstract: Recovering a 3D human mesh from a single RGB image is a challenging task due to depth ambiguity and self-occlusion, resulting in a high degree of uncertainty. Meanwhile, diffusion models have recently seen much success in generating high-quality outputs by p..