Meta has released a new AI model called V-JEPA that improves machines' understanding of the world by analyzing interactions between objects in videos.
This model continues the vision of Yann LeCun, the company's vice president and chief artificial intelligence scientist, to create artificial intelligence that learns in the same way that humans learn.
V-JEPA builds on I-JEPA, the image model Meta released in the middle of last year, which made progress by comparing abstract representations of images rather than the pixels themselves. The new model extends that idea to videos.
V-JEPA carries these predictive learning methods over from images to video, which adds the complexity of temporal dynamics and time-dependent spatial information.
V-JEPA can predict missing parts of a video without having to recreate all the details. It also learns from unlabeled videos and therefore does not need human-labeled data to start learning.
This method improves the efficiency of V-JEPA and reduces the training resources it requires. The model learns from smaller amounts of information and is faster and less resource intensive than older models.
During training, large portions of each video were masked out. This approach forces V-JEPA to make guesses based on limited context, which helps it understand complex scenes without needing detailed data.
V-JEPA focuses on the general idea of what's happening in the video, rather than specific details, such as the movement of individual leaves on a tree.
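The masking and prediction described above can be illustrated with a toy sketch. It is not Meta's implementation: the function names, the grid sizes, the choice to hide the same spatial block in every frame, and the use of an absolute-error loss are all assumptions made for illustration. The point it shows is that the error is measured between abstract feature vectors at the hidden positions, never between pixels.

```python
import random


def make_block_mask(num_frames, grid_h, grid_w, block_h, block_w, seed=0):
    """Return the set of (frame, row, col) patch positions hidden from the model.

    Assumption for this sketch: one spatial block is hidden in every frame,
    so the model cannot simply copy the answer from a neighbouring frame.
    """
    rng = random.Random(seed)
    top = rng.randrange(grid_h - block_h + 1)
    left = rng.randrange(grid_w - block_w + 1)
    return {
        (t, y, x)
        for t in range(num_frames)
        for y in range(top, top + block_h)
        for x in range(left, left + block_w)
    }


def masked_latent_l1(predicted, target, mask):
    """Mean absolute error between predicted and target feature vectors,
    computed only at the masked positions.

    `predicted` and `target` map a patch position to a list of floats
    (a hypothetical stand-in for learned representations).
    """
    total = 0.0
    count = 0
    for pos in mask:
        for p, t in zip(predicted[pos], target[pos]):
            total += abs(p - t)
            count += 1
    return total / count


# Hide a 5x5 block of a 6x6 patch grid in each of 4 frames.
mask = make_block_mask(num_frames=4, grid_h=6, grid_w=6, block_h=5, block_w=5)
hidden_fraction = len(mask) / (4 * 6 * 6)
print(f"hidden fraction of patches: {hidden_fraction:.2f}")
```

In a real system both sets of feature vectors would come from neural networks; here they are plain dictionaries so the loss can be checked by hand.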
V-JEPA showed promising results in testing, outperforming other video analysis models while using a fraction of the data typically required.
This efficiency is considered a breakthrough in the field of artificial intelligence and allows the model to be used in various tasks without extensive retraining.
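One way to picture reuse without extensive retraining is to keep the pretrained model fixed and fit only a small component on top of its features. The sketch below is a deliberately simplified illustration of that idea, not the procedure Meta used: `frozen_encoder` is a hypothetical stand-in that computes two hand-made features, and the nearest-centroid classifier is a swapped-in substitute for whatever lightweight head a real system would train.

```python
def frozen_encoder(clip):
    """Hypothetical stand-in for a pretrained, frozen video encoder.

    A clip is a list of frames, each frame a list of numbers.
    Returns [mean value, mean absolute frame-to-frame change].
    Nothing in this function is ever updated for a new task.
    """
    flat = [v for frame in clip for v in frame]
    mean = sum(flat) / len(flat)
    diffs = [
        abs(a - b)
        for prev, nxt in zip(clip, clip[1:])
        for a, b in zip(prev, nxt)
    ]
    motion = sum(diffs) / len(diffs) if diffs else 0.0
    return [mean, motion]


def fit_centroids(features, labels):
    """Fit the only task-specific part: one average feature vector per label."""
    sums, counts = {}, {}
    for feat, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(feat))
        for i, v in enumerate(feat):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, feature):
    """Return the label whose centroid is closest to the feature vector."""
    def sqdist(label):
        return sum((a - b) ** 2 for a, b in zip(centroids[label], feature))
    return min(centroids, key=sqdist)


# Two labeled example clips are enough to fit this toy head.
static_clip = [[1, 1], [1, 1], [1, 1]]
moving_clip = [[0, 0], [2, 2], [0, 0]]
train_feats = [frozen_encoder(static_clip), frozen_encoder(moving_clip)]
centroids = fit_centroids(train_feats, ["static", "moving"])
print(predict(centroids, frozen_encoder([[0, 0], [3, 3], [0, 0]])))
```

Switching to a different task would mean calling `fit_centroids` again with new labels while `frozen_encoder` stays untouched.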
In the future, Meta plans to expand V-JEPA's capabilities, including adding audio analysis and improving its ability to understand longer videos.
This work supports Meta's ambitious goal of developing artificial intelligence that can perform complex tasks the way humans do.
V-JEPA is available under a non-commercial Creative Commons license, allowing researchers around the world to explore and use the technology.