Jinrui Yang

I am a second-year Ph.D. student (since 2023) at the University of California, Santa Cruz, advised by Prof. Cihang Xie and Prof. Yuyin Zhou. Before joining UCSC, I was a researcher at Tencent. I received my Master's degree from Sun Yat-sen University in 2021, advised by Prof. Wei-Shi Zheng, and my B.Eng. from Sichuan University in 2019.

My research interests lie in deep learning and computer vision. I have published several works on image and video perception as well as multimodal learning. Currently, I focus on generative AI and foundation model training.

Feel free to reach out to me at jinruiyang.ray@gmail.com for discussions or opportunities. I'm looking for research intern positions for Summer 2025.

CV  /  Google Scholar  /  GitHub

Work Experiences
Adobe Research
Research Intern, 2024.6 - 2025.5 (Expected)
Advisors: Qing Liu and Zhe Lin
Tencent YouTu Lab
Researcher, 2021.7 - 2023.8
News
  • [Feb. 2025] One paper accepted by CVPR 2025!
  • [Oct. 2024] One paper accepted by NeurIPS 2024!
  • [Jun. 2024] I joined Adobe Research as a research intern.
  • [Sep. 2023] I joined UCSC CSE as a Ph.D. student.
Selected Research

Google Scholar

(*: Equal contribution)

Generative Image Layer Decomposition with Visual Effects
Jinrui Yang, Qing Liu, Yijun Li, Soo Ye Kim, Shilong Zhang, Daniil Pakhomov, Mengwei Ren, Jianming Zhang, Zhe Lin, Cihang Xie, Yuyin Zhou
CVPR, 2025
paper / page

LayerDecomp outputs photorealistic clean backgrounds and high-quality transparent foregrounds with faithfully preserved visual effects.

Scaling White-Box Transformers for Vision
Jinrui Yang, Xianhang Li, Druv Pai, Yuyin Zhou, Yi Ma, Yaodong Yu, Cihang Xie
NeurIPS, 2024
paper / page / code / 新智元

We scale the white-box transformer architecture, resulting in the CRATE-α model, which significantly improves upon the vanilla white-box transformer while preserving interpretability. This advancement nearly closes the gap between white-box transformers and ViTs.

MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models
Chaoyou Fu, Peixian Chen, Yunhang Shen, Yulei Qin, Mengdan Zhang, Xu Lin, Jinrui Yang, Xiawu Zheng, Ke Li, Xing Sun, Yunsheng Wu, Rongrong Ji
Preprint, 2023
paper / page / code / 新智元

The paper introduces the first comprehensive Multimodal Large Language Model (MLLM) evaluation benchmark, MME. It measures both perception and cognition abilities across 14 subtasks, with 30 advanced MLLMs evaluated comprehensively on MME.

Learning To Know Where To See: A Visibility-Aware Approach for Occluded Person Re-Identification
Jinrui Yang, Jiawei Zhang, Fufu Yu, Xinyang Jiang, Mengdan Zhang, Xing Sun, Ying-Cong Chen, Wei-Shi Zheng
ICCV, 2021
paper

We propose a novel method to discretize pose information into visibility labels for body parts, ensuring robustness against sparse and noisy pose data in Occluded Person Re-ID.

Spatial-Temporal Graph Convolutional Network for Video-Based Person Re-Identification
Jinrui Yang, Wei-Shi Zheng, Qize Yang, Ying-Cong Chen, Qi Tian
CVPR, 2020
paper

We propose a novel Spatial-Temporal Graph Convolutional Network (STGCN) to address the occlusion problem and the visual ambiguity issue caused by visually similar negative samples in video-based person Re-ID.

Academic Service

Reviewer

  • Conferences: ICML 2024, CVPR 2024, WACV 2023
  • Journals: TIP, TCSVT, TMM, etc.
Teaching
  • CSE 144 - Applied Machine Learning: Deep Learning, Winter 2024.
  • CSE 290D - Neural Computation, Fall 2025.

Design and source code from Jon Barron's website