Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation
Zhuoyang Zhang*, Luke J. Huang*, Chengyue Wu, Shang Yang, Kelly Peng, Yao Lu, Song Han
arXiv 2025
[paper]
[code]
|
|
|
VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation
Yecheng Wu*, Zhuoyang Zhang*, Junyu Chen, Haotian Tang, Dacheng Li, Yunhao Fang, Ligeng Zhu, Enze Xie, Hongxu Yin, Li Yi, Song Han, Yao Lu
ICLR 2025
[paper]
[code]
[online demo]
|
|
|
EfficientViT-SAM: Accelerated Segment Anything Model Without Accuracy Loss
Zhuoyang Zhang, Han Cai, Song Han
CVPR 2024 ELVM Workshop
[paper]
[code]
[online demo]
|
|
|
One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion
Minghua Liu*, Ruoxi Shi*, Linghao Chen*, Zhuoyang Zhang*, Chao Xu*, Xinyue Wei, Hansheng Chen, Chong Zeng, Jiayuan Gu, Hao Su
CVPR 2024
[paper]
[website]
|
|
|
Complete-to-Partial 4D Distillation for Self-Supervised Point Cloud Sequence Representation Learning
Zhuoyang Zhang*, Yuhao Dong*, Yunze Liu, Li Yi
CVPR 2023
[paper]
[code]
|
|
|
DC-AR: Efficient Masked Autoregressive Image Generation with Deep Compression Hybrid Tokenizer
Yecheng Wu, Junyu Chen, Zhuoyang Zhang, Enze Xie, Jincheng Yu, Junsong Chen, Jinyi Hu, Yao Lu, Song Han, Han Cai
ICCV 2025
[paper]
[code]
|
|
|
CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models
Qingqing Zhao, Yao Lu, Moo Jin Kim, Zipeng Fu, Zhuoyang Zhang, Yecheng Wu, Zhaoshuo Li, Qianli Ma, Song Han, Chelsea Finn, Ankur Handa, Ming-Yu Liu, Donglai Xiang, Gordon Wetzstein, Tsung-Yi Lin
CVPR 2025
[paper]
[website]
|
|
|
NVILA: Efficient Frontier Visual Language Models
Zhijian Liu*, Ligeng Zhu*, Baifeng Shi, Zhuoyang Zhang, Yuming Lou, Shang Yang, Haocheng Xi, Shiyi Cao, Yuxian Gu, Dacheng Li, Xiuyu Li, Yunhao Fang, Yukang Chen, Cheng-Yu Hsieh, De-An Huang, An-Chieh Cheng, Vishwesh Nath, Jinyi Hu, Sifei Liu, Ranjay Krishna, Daguang Xu, Xiaolong Wang, Pavlo Molchanov, Jan Kautz, Hongxu Yin, Song Han, Yao Lu
CVPR 2025
[paper]
[code]
[demo]
[website]
|
|
|
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer
Haotian Tang, Yecheng Wu, Shang Yang, Enze Xie, Junsong Chen, Junyu Chen, Zhuoyang Zhang, Han Cai, Yao Lu, Song Han
ICLR 2025
[paper]
[code]
[online demo]
|
|
|
Sparse Refinement for Efficient High-resolution Semantic Segmentation
Zhijian Liu*, Zhuoyang Zhang*, Samir Khaki, Shang Yang, Haotian Tang, Chenfeng Xu, Kurt Keutzer, Song Han
ECCV 2024
[paper]
[code]
|
|
|
Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model
Ruoxi Shi, Hansheng Chen, Zhuoyang Zhang, Minghua Liu, Chao Xu, Xinyue Wei, Linghao Chen, Chong Zeng, Hao Su
Technical Report
[paper]
[code]
|
- Conference reviewer: ICLR, ICML, NeurIPS, CVPR, ICCV, ECCV, etc.