Video-Only ToM: Enhancing Theory of Mind in Multimodal Large Language Models