Catching Up on Yann LeCun: JEPA, World Models, AMI Labs, and the War Against LLMs

Everything he's been researching, saying, and building — and why it matters


If you've been watching the AI space closely this past year, you know that while the rest of Silicon Valley races to build bigger language models, Yann LeCun — Turing Award winner, co-inventor of convolutional neural networks, and until recently Meta's Chief AI Scientist — has been banging a very different drum. He thinks we're all going in the wrong direction. He's been saying so loudly for years. And now he's put his career where his mouth is.

This is a deep-dive primer on LeCun's full vision: what JEPA actually is, why he thinks LLMs are architecturally broken for reaching human-level intelligence, what his papers show, how it all applies to robotics, and what his new company Advanced Machine Intelligence (AMI) Labs is setting out to build.


Who Is Yann LeCun?

Born in France in 1960, LeCun is one of the most decorated figures in the history of AI. Alongside Geoffrey Hinton and Yoshua Bengio, he won the 2018 ACM Turing Award — computing's Nobel Prize — for foundational work on deep learning. [1]

His most celebrated invention is the convolutional neural network (CNN), developed at AT&T Bell Labs in the late 1980s. His early CNN system for reading handwritten digits became so effective that NCR deployed it in bank check-reading machines, at one point processing 10–20% of all checks written in the United States. [2]

He joined Facebook in 2013 as founding director of its Fundamental AI Research (FAIR) lab, later became Meta's Chief AI Scientist, and simultaneously maintained his professorship as a Silver Professor at NYU. In November 2025, after 12 years, he announced his departure from Meta to found his own AI startup. [2]

"Before 'urgently figuring out how to control AI systems much smarter than us' we need to have the beginning of a hint of a design for a system smarter than a house cat."
— Yann LeCun, on Twitter/X

Why He Thinks LLMs Are a Dead End

To understand JEPA, you first need to understand why LeCun rejects the mainstream. His critique of large language models like GPT-4 or Llama is not a quibble — it's a fundamental philosophical rejection.

LLMs are trained by predicting the next word in a sequence, on vast datasets of internet text. The result is impressive fluency but no grounded understanding of the physical world. LeCun argues these models hallucinate, cannot plan reliably, lack multi-step consistency, and have none of the commonsense background knowledge a two-year-old has acquired just by existing in the world. [3]

"We are not going to get to human-level AI just by scaling LLMs. They cannot achieve that milestone because they simply predict text rather than truly understand the world."
— Yann LeCun, Big Technology Podcast, 2025

He told researchers at a 2025 conference to "absolutely not work on LLMs." He called them "useful but fundamentally limited" — and in one Financial Times interview, called them a "dead end" on the road to superintelligence. [4][5]

The Moravec Paradox and the Real Challenge

LeCun frequently invokes the Moravec Paradox: tasks that are trivially easy for humans — picking up a coffee cup, understanding that a ball will bounce — are extraordinarily difficult for current AI. The reason is that current AI has no model of how the physical world actually works. A crow can solve puzzles as well as a five-year-old. An infant builds intuition for gravity and object permanence just by watching the world. LLMs, trained on text, have no equivalent grounding. [6]


The Blueprint: "A Path Towards Autonomous Machine Intelligence" (2022)

In June 2022, LeCun published what is effectively his foundational manifesto — part philosophical treatise, part technical proposal — titled "A Path Towards Autonomous Machine Intelligence." [7]

His proposed architecture for a truly intelligent agent has five modular components:

  • Perception module — maps raw sensory input into abstract representations
  • World model — predicts how the world will evolve, including consequences of the agent's actions. This is the heart of the system, and JEPA is the architecture he proposes to implement it.
  • Cost module — measures discomfort, encoding both hard constraints (danger) and trainable objectives (task goals)
  • Short-term memory — stores the current state of the world as perceived
  • Actor/Effector — generates actions to minimize cost over time, using the world model to simulate consequences before committing

The key insight: intelligence requires a world model — an internal simulation of reality that an agent can consult to predict the consequences of its actions before acting. The agent doesn't rely on traditional reinforcement learning. It uses its world model to plan, the way humans mentally simulate outcomes before deciding what to do. [7]
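The modular design above can be sketched as a minimal control loop. Everything here is our own illustration, not code from the paper: each function is a trivial stand-in for a learned model, and the numbers are arbitrary.

```python
# Illustrative sketch of the five-module agent (names and stubs are ours).
# The actor plans by querying the world model, rather than learning a
# policy by trial and error as in traditional reinforcement learning.

def perceive(observation):
    # Perception module: map raw input to an abstract state representation.
    return ("state", observation)

def world_model(state, action):
    # World model: predict the next abstract state given a candidate action.
    return ("state", state[1] + action)

def cost(state):
    # Cost module: scalar "discomfort" — here, distance from a goal value of 10.
    return abs(state[1] - 10)

def act(state, candidate_actions):
    # Actor: simulate each candidate with the world model, then pick the
    # action whose *predicted* outcome has the lowest cost.
    return min(candidate_actions, key=lambda a: cost(world_model(state, a)))

state = perceive(3)                  # short-term memory holds the current state
action = act(state, [-1, 0, 1, 2])   # plan one step ahead before committing
```

The point of the sketch is the data flow: the cost is evaluated on imagined futures produced by the world model, so the agent "thinks before acting" instead of learning from rewards after the fact.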


What Exactly Is JEPA?

JEPA — Joint Embedding Predictive Architecture — is a self-supervised learning framework that learns by predicting parts of the world from other observed parts, entirely in abstract representation space rather than at the pixel level. [8]

How It Works

JEPA takes a pair of related inputs — two image patches, or two video frames. Both are passed through an encoder that produces abstract representations. A predictor module then tries to predict the representation of the target input from the context input.

The critical distinction: JEPA does not predict raw pixels. It predicts an abstract embedding. This means the model can discard irrelevant, unpredictable details — exact surface textures, fine-grained lighting — and focus on high-level structure: object shapes, trajectories, spatial relationships. [8]

Formally, JEPA is an Energy-Based Model (EBM) operating on representations. It assigns low "energy" (low prediction error) when the predicted representation matches the actual target, and high energy when they mismatch. Training shapes an energy landscape where compatible pairs have low energy and incompatible pairs have high energy. [9]
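In code, the core objective is simply prediction error in embedding space. A toy numpy sketch of that energy — with a random linear map standing in for a trained encoder and the identity matrix standing in for a trained predictor (both are our own stand-ins, not JEPA's actual networks):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hand-set stand-ins for trained networks.
encoder = rng.normal(size=(16, 8))   # maps inputs to an 8-d embedding space
predictor = np.eye(8)                # identity, standing in for a trained predictor

def embed(x):
    return x @ encoder

def energy(context, target):
    # Energy = prediction error between predicted and actual target
    # embeddings — computed in representation space, never in pixel space.
    pred = embed(context) @ predictor
    return float(np.mean((pred - embed(target)) ** 2))

x = rng.normal(size=16)                       # a "context" input
compatible = x + 0.01 * rng.normal(size=16)   # a nearby, related "target"
incompatible = rng.normal(size=16)            # an unrelated input

# Compatible pairs sit low on the energy landscape, incompatible pairs high.
low, high = energy(x, compatible), energy(x, incompatible)
assert low < high
```

Training shapes `encoder` and `predictor` so that this gap holds for semantically related pairs, not just numerically similar ones.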

JEPA vs. Generative Models vs. Contrastive Models

LeCun situates JEPA against two dominant self-supervised learning paradigms:

Generative models (diffusion, masked autoencoders) try to reconstruct every missing pixel or token. The world is stochastic — predicting exact pixel values is noisy, expensive, and wastes capacity on irrelevant details. [10]

Contrastive/invariant models (CLIP, SimCLR) learn to produce similar embeddings for different "views" of the same input. They work well but can be biased by the specific augmentations chosen, and don't explicitly model directional relationships across space or time. [10]

JEPA sits in the middle: it operates in abstract embedding space (not pixel-generative) and explicitly predicts from one location to another (directionally predictive, not just invariant). [10]

"The world is unpredictable. If you try to build a generative model that predicts every detail of the future, it will fail. JEPA learns the underlying rules of the world from observation, like a baby learning about gravity."
— LeCun, MIT Technology Review, January 2026

The Research Papers

I-JEPA (2023): Starting with Images

The first major JEPA implementation applied it to images. Published as "Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture" (Assran et al., 2023, arXiv:2301.08243), I-JEPA trains by masking image patches and predicting their representations from context patches. [11]

Key results: trains a ViT-Huge/14 on ImageNet in under 72 hours on 16 A100 GPUs, achieving state-of-the-art low-shot classification with only 12 labeled examples per class — outperforming methods requiring 2–10x more compute. No hand-crafted data augmentations required. Representations transfer well to object counting, depth prediction, and more. [11]
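The masking scheme can be illustrated in a few lines. This is a simplified sketch (the actual I-JEPA recipe samples several target blocks plus a large context block and removes overlaps; grid and block sizes here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# On a 14x14 grid of patch indices, sample one rectangular target block;
# the context is every remaining patch.
GRID = 14

def sample_target_block(h=4, w=4):
    top = int(rng.integers(0, GRID - h + 1))
    left = int(rng.integers(0, GRID - w + 1))
    return {(r, c) for r in range(top, top + h) for c in range(left, left + w)}

target = sample_target_block()
context = {(r, c) for r in range(GRID) for c in range(GRID)} - target

# The encoder sees only `context`; the predictor must output the
# *representations* of the `target` patches — no pixels are reconstructed.
assert len(target) == 16
assert len(context) == GRID * GRID - 16
```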

V-JEPA (2024): Moving to Video

"Revisiting Feature Prediction for Learning Visual Representations from Video" (Bardes et al., 2024, arXiv:2404.08471) extended JEPA to video. Trained on over 2 million unlabeled videos — no text labels, no pretrained encoder, pure feature prediction. [12]

Results: 81.9% on Kinetics-400 action recognition, 72.2% on Something-Something v2, with a frozen backbone. V-JEPA was the first video model to perform well on "frozen evaluations" — where the pretrained encoder is completely locked and only a lightweight probe is trained on top. [4]

V-JEPA 2 (June 2025): World Models for Robotics

Released June 11, 2025 at Viva Tech in Paris, V-JEPA 2 is a 1.2-billion-parameter world model and the most significant practical application of JEPA to date. [13]

It trains in two phases:

Phase 1 — Self-supervised pretraining: Over 1 million hours of web video and 1 million images. No action labels, no annotations. The model learns to predict masked spatio-temporal regions in latent space. [14]

Phase 2 — Action-conditioned post-training: Using only 62 hours of robot data from the open-source DROID dataset, the model learns to predict how the world evolves under specific robot actions — converting a passive world model into an active planning engine. [14]

"We believe world models will usher a new era for robotics, enabling real-world AI agents to help with chores and physical tasks without needing astronomical amounts of robotic training data."
— Yann LeCun, Viva Tech, 2025

Robotics results: zero-shot planning in new lab environments with objects never seen during training, 65–80% success on pick-and-place tasks, reported 30x faster than Nvidia's Cosmos model on equivalent evaluations. State-of-the-art on Epic-Kitchens-100 action anticipation. [13][15][16]

VL-JEPA (December 2025): Vision + Language

Chen, Shukor, LeCun et al. (arXiv:2512.10942) introduced a vision-language model that predicts continuous embeddings of target texts rather than autoregressively generating tokens. In controlled comparisons against standard VLM training with identical vision encoders and data: stronger performance with 50% fewer trainable parameters, 2.85x reduction in decoding operations via selective decoding, and state-of-the-art on video classification and retrieval benchmarks. [17]

LLM-JEPA (September 2025): Bringing JEPA to Language Models

Huang, LeCun, and Balestriero (arXiv:2509.14252) asked: if JEPA-style objectives outperform reconstruction objectives in vision, can they help LLMs too? Their LLM-JEPA framework applies embedding-space prediction to both LLM pretraining and finetuning.

Results: improvements by a significant margin over standard LLM training on GSM8K, Spider, RottenTomatoes, and NL-RX, across the Llama3, Gemma2, OpenELM, and Olmo model families, while being more robust to overfitting. [21]
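The shape of the combined objective can be sketched as follows. Everything here is a toy stand-in — random vectors for embeddings, an identity predictor, a placeholder cross-entropy value, and an arbitrary weighting — intended only to show how a JEPA-style term attaches to the usual next-token loss:

```python
import numpy as np

rng = np.random.default_rng(0)

# The idea: alongside next-token prediction, predict the embedding of one
# "view" of an example (e.g. code) from the embedding of another view
# (e.g. its natural-language description), and penalize the mismatch.

def cosine_distance(u, v):
    return 1.0 - (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

next_token_loss = 2.31              # placeholder cross-entropy value
emb_text = rng.normal(size=8)       # Enc(natural-language view), toy vector
emb_code = rng.normal(size=8)       # Enc(code view), toy vector
predictor = np.eye(8)               # toy predictor in embedding space

jepa_loss = cosine_distance(emb_text @ predictor, emb_code)
lam = 1.0                           # weighting between the two terms
total_loss = next_token_loss + lam * jepa_loss
```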


Robotics: The Real-World Test

LeCun has been clear: the ultimate test of JEPA-style world models isn't benchmark performance — it's whether machines can navigate and manipulate the physical world.

V-JEPA 2's control loop for robotics works as follows:

  1. Goal specification — the robot is given a goal image of the desired end state
  2. Action simulation — using V-JEPA 2's predictor, the robot internally simulates candidate action sequences
  3. Action selection — a Cross-Entropy Method algorithm evaluates each simulated action and selects those that bring the world state closest to the goal
  4. Receding horizon control — the robot executes only the first action, observes the new state, and replans. This makes the system robust to real-world perturbations. [15]

For longer-horizon tasks, V-JEPA 2 uses a sequence of visual subgoals to break complex tasks into manageable steps.
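The planning loop above can be sketched in one dimension. The "world model" here is a trivial stand-in (next state = state + action), whereas V-JEPA 2 rolls forward in learned embedding space; all hyperparameters are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

def world_model(state, action):
    # Trivial stand-in for a learned dynamics model.
    return state + action

def cem_plan(state, goal, horizon=3, samples=64, elites=8, iters=5):
    # Cross-Entropy Method: sample action sequences, score them by
    # simulating with the world model, refit the sampling distribution
    # to the best-scoring ("elite") sequences, and repeat.
    mean, std = np.zeros(horizon), np.ones(horizon)
    for _ in range(iters):
        seqs = rng.normal(mean, std, size=(samples, horizon))
        costs = []
        for seq in seqs:
            s, c = state, 0.0
            for a in seq:
                s = world_model(s, a)
                c += abs(s - goal)    # penalize distance to goal along the rollout
            costs.append(c)
        elite = seqs[np.argsort(costs)[:elites]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean[0]                    # receding horizon: execute only the first action

# Plan, execute one action, observe the new state, replan.
state, goal = 0.0, 3.0
for _ in range(10):
    state = world_model(state, cem_plan(state, goal))
```

Executing only the first planned action and then replanning is what makes the loop robust: any mismatch between the model's prediction and reality is corrected at the next step.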

The economic case is compelling too: at 1.2 billion parameters, V-JEPA 2 fits on a single high-end GPU. Its abstract prediction targets reduce inference load. Teams can run closed-loop control on-prem or at the edge without streaming video off-site. [15]

"A world model is like an abstract digital twin of reality that an AI can reference to understand the world and predict consequences of its actions — and therefore be able to plan a course of action."
— Yann LeCun, MIT Technology Review, January 2026

The Child vs. the LLM: LeCun's Most Memorable Argument

In early March 2026, a viral thread summarizing a recent LeCun lecture captured one of his sharpest arguments against LLM scaling:

"The biggest LLM is trained on about 30 trillion words — roughly 10 to the power 14 bytes of text. That sounds huge. But a 4-year-old who has been awake about 16,000 hours has also taken in about 10 to the power 14 bytes through their eyes alone. So a small child has already seen as much raw data as the largest LLM has read."
— Yann LeCun, Pioneer Works lecture [22]

The point: same quantity, completely different type. The child's data is visual, continuous, noisy, and tied to actions — gravity, objects falling, hands grabbing, cause and effect. From this, the child builds a world model and intuitive physics, and can learn to load a dishwasher from a handful of demonstrations. LLMs see disconnected text and are trained to predict the next token. They get very good at symbol patterns and code. They lack grounded physical understanding. [22]
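The arithmetic in the quote is easy to check. The bandwidth and bytes-per-word figures below are rough estimates in the spirit of LeCun's back-of-envelope claim, not measured constants:

```python
# ~2 MB/s is an approximate combined optic-nerve bandwidth for both eyes;
# ~3.3 bytes is an approximate average per English word of text.
visual_bytes = 16_000 * 3600 * 2e6   # seconds awake x visual bandwidth ≈ 1.2e14
text_bytes = 30e12 * 3.3             # 30 trillion words x bytes per word ≈ 1e14

# Both land at roughly 10^14 bytes — the same order of magnitude.
ratio = visual_bytes / text_bytes
assert 0.5 < ratio < 2.0
```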

This is not just a data efficiency argument. It's an argument about data structure. JEPA is LeCun's attempt to build AI systems that learn from the right kind of signal.


The SAI Paper: Replacing AGI with a Better Target (February 2026)

Published February 27, 2026 — just days before his official Meta departure — "AI Must Embrace Specialization via Superhuman Adaptable Intelligence" (Goldfeder, Wyder, LeCun, Shwartz-Ziv, arXiv:2602.23643) may be his most provocative intellectual statement yet. [23]

The Core Argument: AGI Is Conceptually Broken

The paper argues that AGI — as commonly defined — uses human intelligence as its benchmark for "generality." But humans are not actually general. We are survival specialists. The paper identifies two ways people claim humans are general and argues both are circular:

  • Task range generality: humans can do a wide range of tasks — but this range is defined by what's important for human survival, not true generality
  • Specialization capacity: humans can specialize in any task — but this is bounded by human biology

Both claims define generality in human terms and then assert humans as its paradigm. [23]

"We struggle to perceive our own blind spots; this leads to the illusion of generality. In truth, we are only good at the specific subset of tasks that are important to our existence."
— Goldfeder, Wyder, LeCun, Shwartz-Ziv, arXiv:2602.23643

The Proposed Alternative: SAI

The paper introduces Superhuman Adaptable Intelligence (SAI): intelligence that can learn to exceed humans at anything important that we can do, and that can fill in skill gaps where humans are incapable.

The reframe: instead of asking "can it do what a human can do?", SAI asks "how fast can it learn something new?"

The paper argues this goal naturally points toward self-supervised learning, world models, and specialized expert systems — exactly what LeCun has been advocating. One widely-shared summary put it cleanly: "One giant model mimicking human limits isn't the ceiling. It's the trap." [23][24]

The timing matters. LeCun reposted the viral thread about this paper on March 5, 2026 — one of his last public acts before transitioning to AMI Labs. It reads as a final theoretical manifesto.


AMI Labs: The New Company

The Departure

LeCun confirmed his exit from Meta in a LinkedIn post on November 18, 2025. The friction had been building:

  • Meta's $14.3 billion investment in Scale AI made 28-year-old Alexandr Wang LeCun's effective boss. LeCun: "You don't tell a researcher what to do. You certainly don't tell a researcher like me what to do." [5]
  • Meta cut ~600 positions from its AI division in October 2025, including from FAIR [19]
  • LeCun says he had difficulty getting resources for JEPA research as Meta prioritized LLM products [4]
  • He admitted Llama 4 benchmarks were "fudged a little bit" — and says Mark Zuckerberg "was really upset and basically lost confidence in everyone who was involved" [5]

What He's Building

AMI — which means "friend" in French — stands for Advanced Machine Intelligence. The company is headquartered in Paris, with LeCun as executive chairman. French President Emmanuel Macron texted LeCun after the announcement, and AMI's Paris headquarters aligns with France's broader push to become a European AI hub.

"The goal of the startup is to bring about the next big revolution in AI: systems that understand the physical world, have persistent memory, can reason, and can plan complex action sequences."
— Yann LeCun, LinkedIn, November 2025

The Funding

On March 9, 2026, AMI Labs officially launched and closed its seed round: $1.03 billion (€890M) at a $3.5 billion pre-money valuation — the largest seed round in European startup history. [20]

The round was co-led by Cathay Innovation, Greycroft, Hiro Capital, HV Capital, and Bezos Expeditions. Strategic investors include Nvidia, Samsung, Temasek, and Toyota Ventures. Individual backers include Jeff Bezos, Mark Cuban, Eric Schmidt, and Tim Berners-Lee.

The Team

Beyond LeCun as executive chairman and Alex LeBrun as CEO, AMI has assembled a senior research leadership team: Saining Xie (chief science officer, formerly Meta FAIR), Pascale Fung (chief research and innovation officer), Laurent Solly (COO, former Meta VP for Europe), and Michael Rabbat (VP of world models, formerly Meta FAIR).

LeBrun has been clear about the timeline: no revenue in the near term. The first year is devoted to research and talent acquisition across AMI's four hubs in Paris, New York, Montreal, and Singapore. Healthcare AI company Nabla — LeBrun's former startup — is AMI's first disclosed partner and will receive early access to world model technology. LeCun's horizon is roughly a decade to build the kind of autonomous, physically grounded AI agents he envisions. He is not promising AGI next year. He's promising a principled research program aimed at the right target. [18][20]


The Bottom Line

The dispute between LeCun and the LLM mainstream is a genuine fork in the road. Billions of dollars and thousands of researchers are betting on one path. LeCun is betting his next chapter on another.

He may be wrong. The scaling hypothesis has surprised its critics before. But the JEPA program has produced real, empirically verifiable results — from I-JEPA's compute efficiency to V-JEPA 2's zero-shot robot manipulation. The SAI paper gives his critique a formal theoretical framework. And AMI Labs gives it a home.

"I'm sure there's a lot of people at Meta, including perhaps Alex, who would like me to not tell the world that LLMs basically are a dead end when it comes to superintelligence. But I'm not gonna change my mind because some dude thinks I'm wrong. I'm not wrong. My integrity as a scientist cannot allow me to do this."
— Yann LeCun, The Decoder, January 2026


References

[1] ACM Turing Award 2018 — Yoshua Bengio, Geoffrey Hinton, Yann LeCun
[2] TechCrunch. "Who's Behind AMI Labs, Yann LeCun's 'World Model' Startup." January 23, 2026
[3] Bandaru, Rohit. "Deep Dive into Yann LeCun's JEPA." July 2024
[4] Meta AI Blog. "V-JEPA: The Next Step Toward Advanced Machine Intelligence." 2024
[5] The Decoder. "You Certainly Don't Tell a Researcher Like Me What to Do." January 3, 2026
[6] Turing Post. "What Is Joint Embedding Predictive Architecture (JEPA)?"
[7] LeCun, Yann. "A Path Towards Autonomous Machine Intelligence." OpenReview, 2022
[8] The Singularity Project. "Yann LeCun's JEPA and the General Theory of Intelligence." March 2025
[9] Bandaru, Rohit. "Deep Dive into Yann LeCun's JEPA — Energy-Based Models section."
[10] Meta AI Blog. "I-JEPA: The First AI Model Based on Yann LeCun's Vision for More Human-Like AI." 2023
[11] Assran, M., Duval, Q., Misra, I., et al. "Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture." arXiv:2301.08243, 2023
[12] Bardes, A., et al. "Revisiting Feature Prediction for Learning Visual Representations from Video." arXiv:2404.08471, 2024
[13] Meta AI Blog. "Introducing the V-JEPA 2 World Model and New Benchmarks for Physical Reasoning." June 11, 2025
[14] Meta AI. "Introducing V-JEPA 2." June 2025
[15] VentureBeat. "Meta's New World Model Lets Robots Manipulate Objects in Environments They've Never Encountered Before." August 24, 2025
[16] TechCrunch. "Meta's V-JEPA 2 Model Teaches AI to Understand Its Surroundings." June 11, 2025
[17] Chen, D., Shukor, M., Moutakanni, T., LeCun, Y., et al. "VL-JEPA: Joint Embedding Predictive Architecture for Vision-Language." arXiv:2512.10942, December 2025
[18] MIT Technology Review. "Yann LeCun's New Venture Is a Contrarian Bet Against Large Language Models." January 22, 2026
[19] CNBC. "Meta Lays Off 600 from 'Bloated' AI Unit as Wang Cements Leadership." October 22, 2025
[20] TechCrunch. "Yann LeCun's AMI Labs Raises $1.03 Billion to Build World Models." March 9, 2026
[21] Huang, H., LeCun, Y., Balestriero, R. "LLM-JEPA: Large Language Models Meet Joint Embedding Predictive Architectures." arXiv:2509.14252, September 2025
[22] Rohan Paul (@rohanpaul_ai). Thread summarizing LeCun lecture on child vs. LLM data argument. Twitter/X, March 4, 2026 (Twitter/X links are unstable — search "@rohanpaul_ai LeCun data" to locate)
[23] Goldfeder, J., Wyder, P., LeCun, Y., Shwartz Ziv, R. "AI Must Embrace Specialization via Superhuman Adaptable Intelligence." arXiv:2602.23643, February 27, 2026
[24] EvoAI Labs. "Stop Waiting for AGI: Why 'Superhuman Adaptable Intelligence' Is AI's True North Star." March 2026

