Look Every Frame All at Once: Video-Ma²mba for Efficient Long-form Video Understanding with Multi-Axis Gradient Checkpointing
Hosu Lee*, Junho Kim*, Hyunjun Kim, Yong Man Ro
Integrated Vision and Language Lab, KAIST
* Indicates equal contribution