Tech

The hegemony of specialized silicon: why the TPU "come-back" is redefining AI in 2026

  • Date de l’événement Jan. 07 2026
  • Temps de lecture min.

In 2026, Google's new TPU chips are optimized for reasoning and agentic AI, enabling fast inference and a reduction in TCO thanks to OCS.

The AI industry is going through a phase of hardware crystallization. While 2023 and 2024 were the years of GPU (Graphics Processing Unit) omnipresence, 2026 marks the triumphant return of the ASIC (Application-Specific Integrated Circuit) to the top of the technology stack. For us, this is no surprise. As early as 2019, with our Warhol prototype, we were already experimenting with the limits and virtues of these dedicated chips.

▶️ Project Warhol: The Ultimate Edge Computing AI Prototype for Cooking & Wine Pairing

 

Specialization: a logical, but non-linear path

It would be tempting to see the ASIC as the inevitable culmination of all computing technology. The analogy with Bitcoin mining is striking: the shift from CPU to GPU, then to ASIC, allowed for efficiency gains of the order of $10^3$ in terms of hash/watt.

However, unlike Bitcoin where the algorithm (SHA-256) is immutable, AI is a moving target. The shift to specialized silicon carries major risks:

  • Architectural rigidity: A TPU is optimized for large-scale matrix multiplications. If tomorrow a radically different architecture from Transformers (like State Space Models or Liquid Neural Networks) were to become dominant, part of the hardware advantage could evaporate.
  • The software gap: Where NVIDIA has built an unbridgeable gap with CUDA, specialized chips often require more complex compilation chains (XLA, MLIR).

 

Flashback 2019: when Smile bet on the edge TPU

When we designed Warhol in 2019, the choice of the Google Coral Edge TPU was not dictated by a simple question of cost. It met two strategic imperatives which, seven years later, remain at the heart of industrial concerns:

  1. The Dogma of confidentiality (Edge-First): Our goal was to process the entire video and audio stream locally. At a time when "Cloud AI" was the norm, Warhol proved that a fluid experience could be offered without ever exfiltrating biometric or behavioral data to third-party servers.
  2. Operational resilience in a "Hostile Environment": Warhol was intended for trade shows and field use. Anyone who has tried to run a bandwidth-intensive AI demo on a trade show's Wi-Fi knows it's a recipe for failure. The local inference guaranteed by the Coral TPU offered us sub-20ms latency, independent of any network infrastructure.

 

The 2026 landscape: NVIDIA Vera Rubin vs Google Ironwood

CES 2026 has just confirmed the new situation. NVIDIA, with its Vera Rubin platform, is no longer content to sell chips, but "rack-mounted supercomputers" (NVL72) capable of 10x cheaper inference than the Blackwell generation. NVIDIA is massively pivoting towards "Physical AI" (robotics and simulation) with its Cosmos model.

Facing this, Google is not trying to beat NVIDIA on the versatility front, but on that of Vertical Hypercomputing.

  • Trillium (TPU v6) and Ironwood (TPU v7): These chips are the first to natively integrate optimizations for Agentic AI. Unlike classic chips, they are designed to manage fast feedback loops and multi-step reasoning capabilities.
  • The TCO advantage: For players like Anthropic, the total cost of ownership (TCO) of a TPU v7 cluster is estimated to be 30% less than an equivalent infrastructure under NVIDIA, thanks to the use of the Optical Circuit Switch (OCS) which allows for an ultra-flexible network topology.

 

Sustainability: has Google anticipated e-waste?

The risk of turning millions of chips into electronic bricks is real. For 2025/2026, Google is shifting its strategy:

  • Datacenter Circularity: over 8.8 million components recovered in 2024 via the reverse supply chain for reuse or resale.
  • Coral NPU and RISC-V: the new Coral NPU (successor to the Edge TPU) is based on RISC-V. The opening of specifications increases modularity: the silicon is no longer a disposable black box, but a block that can be reintegrated by other foundries.

 

Thought experiment: Warhol v2.0

If we were to update Warhol today, "re-use" would be our watchword. Our 2019 Coral key is not waste, by the way: it still works perfectly for "low-level" vision tasks (presence detection, object segmentation).

In 2026, we would imagine an asymmetric architecture:

  • The 2019 Coral would retain its role as a sentinel (low consumption, system wake-up).
  • A new Coral NPU based on RISC-V would manage a local language model (like Gemini Nano) to transform the terminal into a real agent capable of reasoning: "I see you have tomatoes and basil, but your cheese is two days past its expiration date, should I adjust the recipe?"
  • Gemini Nano replaces the late Snips (acquired by Sonos then withdrawn), for a more fluid and contextual voice.

 

Conclusion: maturity of efficiency

The return of TPUs is not a marketing cycle. It signals the transition of AI out of the experimental era towards precision engineering. Warhol, in 2019, was a precursor. In 2026, silicon specialization is emerging as a viable path for a performing, sovereign, and finally economically responsible AI.

Thibault Milan

Thibault Milan

Directeur de l'Innovation