Why dToF is Crucial for Physical AI: The Perception Layer Powering Embodied Intelligence

Updated: 28 May 2026 | Author: DOMI Technologies Editorial Team

Direct Time-of-Flight (dToF) is the essential depth sensing technology for Physical AI because it is the only ToF modality that simultaneously delivers long-range accuracy, operation under 100,000 lux sunlight, multi-path interference rejection, and low-latency edge processing: the four non-negotiable requirements for autonomous robots operating in real-world environments.

When a warehouse robot navigates from a dark aisle into a sunlit loading dock, its depth sensor doesn’t get to recalibrate. When a drone descends toward a glass-walled building, its altimeter doesn’t get to ask what material the surface is made of. Physical AI — AI that perceives, reasons, and acts in the physical world — lives or dies on the reliability of its perception layer. And in 2026, the consensus among sensor engineers is hardening: dToF is the perception foundation that Physical AI is being built on.

This article explains why. We’ll walk through the technical differences between dToF and iToF, map each to the hard requirements Physical AI demands, cover the emerging dToF + edge AI integration trend, and give you a framework for evaluating dToF sensors for your own autonomous system. For more foundational content on ToF technology, visit the DOMI ToF Knowledge Hub.

Key Takeaways
– Physical AI requires depth sensors that work outdoors, at range, and in real time. dToF is the only ToF technology that meets all three requirements simultaneously, with consistent accuracy from 0.2 m to 100+ m and reliable operation under 100,000 lux sunlight.
– iToF degrades sharply beyond 5 m and under bright ambient light. dToF’s SPAD + histogram architecture maintains centimeter-level accuracy across its full range and rejects the multi-path reflections that confuse phase-based sensors in cluttered environments.
– The dToF + Edge AI integration trend — exemplified by ISSCC 2026’s VoxCAD SoC — puts AI inference directly on the sensor chip, cutting data latency by 91% and system energy by 97%. This is how Physical AI gets real-time 3D perception at milliwatt power levels.
– For engineers evaluating depth sensors, the choice comes down to operational domain: indoor-only, short-range applications can use iToF or stereo vision. Anything that goes outdoors, crosses lighting conditions, or needs reliable ranging past 5 m needs dToF.

What is Physical AI — and Why Perception Defines Its Limits

From Generative AI to Physical AI: The Four-Stage Evolution

NVIDIA CEO Jensen Huang frames AI’s evolution in four stages: Perception AI (recognizing images and voices), Generative AI (creating text, images, and code), Agentic AI (reasoning and acting autonomously in digital environments), and finally Physical AI — AI that perceives, reasons, plans, and acts in the physical world. He calls it a potential $50 trillion market opportunity. MarketsandMarkets projects the Physical AI market will grow from $1.5 billion in 2026 to $15.24 billion by 2032 at a 47.2% CAGR.

The distinction matters because the physical world is unforgiving in ways digital environments are not. As the Edge AI & Vision Alliance notes, Physical AI requires “deterministic perception pipelines with bounded latency” — something that distinguishes it sharply from cloud-based AI where a 200 ms delay is acceptable. Wet pavement creates glare. A glass door reflects nothing at the wrong angle. Direct sunlight saturates most sensors into uselessness. A robot arm needs to stop within millimeters of a human coworker, not approximately.

Physical AI systems operate in a closed perception-action loop: sense the environment, process that data into a spatial model, reason about what to do, execute the action, and then sense again to verify the result. For a robot moving at 1 m/s, every millisecond of latency in that loop is a millimeter of positioning error. Every depth reading that fails under sunlight is a potential collision.

What Happens When Perception Fails

In March 2025, a major e-commerce fulfillment center deployed a fleet of 40 autonomous mobile robots (AMRs) using iToF-based depth cameras for obstacle detection. The system worked flawlessly during night shifts. When the east-facing loading bay doors opened at 6 AM, direct morning sunlight flooded the first 15 meters of the navigation zone. The iToF sensors saturated, unable to distinguish their own modulated illumination from ambient photons. Five robots entered emergency stop within the first hour. The facility reverted to manual forklifts for the dawn shift while engineers installed physical light barriers.

This is not a sensor specification problem. It is a technology architecture problem. iToF measures distance indirectly, by calculating the phase shift of a continuously modulated light wave. When ambient light swamps the modulation signal, the phase measurement becomes unreliable. dToF measures distance directly, by timing how long individual photons take to travel to a target and back. A SPAD (Single Photon Avalanche Diode) array paired with a Time-to-Digital Converter (TDC) builds a histogram of photon arrival times. Ambient photons arrive at random intervals and form a flat noise floor. Signal photons arrive in a tight temporal cluster. The histogram processor subtracts the noise and keeps the signal. This is why dToF works at 100,000 lux while iToF stops being reliable around 80,000 lux.

What this means for your application: If your robot ever crosses a threshold between indoor and outdoor lighting, dToF is not an upgrade — it’s a requirement. Explore DOMI’s dToF sensor portfolio →

 

Recommended: NVIDIA’s Physical AI and robot perception overview (GTC 2026). 

dToF vs. iToF: Why the Distinction Matters for Physical AI

How dToF Works

A dToF sensor emits a nanosecond-scale laser pulse from a VCSEL (Vertical Cavity Surface Emitting Laser) at 940 nm. A SPAD array captures returning photons. Each SPAD pixel can detect a single photon, literally a single quantum of light. A TDC timestamps each arrival relative to the pulse emission. Thousands of pulses are accumulated into a histogram. The position of the histogram peak gives the round-trip time t, and the distance follows as d = c·t/2, where c is the speed of light.

This architecture has several properties that matter for Physical AI. The pulse is on for nanoseconds and off for microseconds, so average power consumption is low even for long-range sensing. The histogram approach means ambient light adds a uniform noise floor that can be statistically separated from the signal. And because each photon arrival is individually timestamped, secondary reflections from glass, water, or polished surfaces appear as distinct peaks in the histogram — not as an averaged distance error.
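The histogram idea can be sketched in a few lines of Python. This is a toy simulation under stated assumptions, not any vendor's on-chip algorithm: the 100 ps bin width, the per-pulse photon probabilities, the timing jitter, and the function names are all illustrative, and SPAD effects such as dead time and pile-up are ignored.

```python
import random
import statistics

C = 299_792_458.0   # speed of light in m/s
BIN_PS = 100.0      # assumed TDC bin width in picoseconds
N_BINS = 400        # 40 ns window, roughly 6 m of range

def accumulate_histogram(distance_m, n_pulses=5000, p_signal=0.3,
                         p_ambient=0.5, jitter_ps=150.0, seed=0):
    """Accumulate photon arrival times from many pulses into one histogram."""
    rng = random.Random(seed)
    hist = [0] * N_BINS
    tof_ps = 2.0 * distance_m / C * 1e12      # round-trip time of flight
    for _ in range(n_pulses):
        if rng.random() < p_ambient:          # ambient photon: uniform in time
            hist[rng.randrange(N_BINS)] += 1
        if rng.random() < p_signal:           # signal photon: clustered at the ToF
            b = int(rng.gauss(tof_ps, jitter_ps) // BIN_PS)
            if 0 <= b < N_BINS:
                hist[b] += 1
    return hist

def estimate_distance(hist, half_window=4):
    """Subtract the flat noise floor, find the peak, refine it with a centroid."""
    floor = statistics.median(hist)           # estimate of the ambient level
    cleaned = [max(0.0, h - floor) for h in hist]
    peak = max(range(len(cleaned)), key=cleaned.__getitem__)
    lo = max(0, peak - half_window)
    hi = min(len(cleaned), peak + half_window + 1)
    weight = sum(cleaned[lo:hi])
    centroid_bin = sum((i + 0.5) * cleaned[i] for i in range(lo, hi)) / weight
    return C * (centroid_bin * BIN_PS * 1e-12) / 2.0   # d = c * t / 2

hist = accumulate_histogram(3.0)
print(f"estimated distance: {estimate_distance(hist):.3f} m")
```

Using the median as the noise floor works here because the signal occupies only a handful of bins; raising p_ambient in the simulation lifts the floor without moving the peak.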

How iToF Works

An iToF sensor emits continuous modulated light — typically a sine wave at 10-100 MHz. Each pixel measures the phase difference between the emitted and received signal. Phase difference maps to distance. The sensor captures four samples per modulation period to calculate the phase, then converts to depth.

iToF produces high-resolution depth maps — VGA to 1 MP at short range. For indoor facial recognition, gesture tracking, and AR/VR, iToF is often the right choice. But for Physical AI applications, iToF has a structural limitation: it measures an averaged phase over the entire exposure period. Multi-path reflections, ambient light, and long distances all degrade the phase measurement in ways that cannot be recovered algorithmically.
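A short sketch makes the phase calculation and its averaging behaviour concrete. It assumes an idealized sinusoidal correlation model with samples at 0°, 90°, 180°, and 270°; real sensors differ in sampling convention and calibration, and the function names and the 20 MHz modulation frequency are illustrative choices.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def four_phase_samples(distance_m, f_mod_hz, amplitude=1.0, offset=2.0):
    """Ideal correlation samples at 0, 90, 180 and 270 degrees."""
    phi = 4.0 * math.pi * f_mod_hz * distance_m / C   # round-trip phase shift
    return [offset + amplitude * math.cos(phi - k * math.pi / 2.0)
            for k in range(4)]

def itof_distance(samples, f_mod_hz):
    """Recover phase from the four samples, then map phase to distance."""
    c0, c1, c2, c3 = samples
    phi = math.atan2(c1 - c3, c0 - c2) % (2.0 * math.pi)
    return C * phi / (4.0 * math.pi * f_mod_hz)

def unambiguous_range(f_mod_hz):
    """Beyond this distance the phase wraps and the reading aliases."""
    return C / (2.0 * f_mod_hz)

F = 20e6  # assumed modulation frequency
print(itof_distance(four_phase_samples(2.5, F), F))   # single clean return

# Multi-path: a direct return at 2.0 m plus a weaker bounce at 3.0 m.
direct = four_phase_samples(2.0, F, amplitude=1.0)
bounce = four_phase_samples(3.0, F, amplitude=0.5)
mixed = [a + b for a, b in zip(direct, bounce)]
print(itof_distance(mixed, F))                        # lands between 2.0 and 3.0
```

Because the pixel only ever sees the sum of the two sinusoids, the second return shifts the recovered phase and cannot be separated afterwards from these four numbers alone.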

The Five Critical Advantages of dToF for Physical AI

Figure: dToF’s histogram-based direct measurement architecture versus iToF’s phase-based indirect approach. The structural differences explain why dToF maintains accuracy in environments where iToF fails.

| Requirement | dToF | iToF | Why It Matters for Physical AI |
| --- | --- | --- | --- |
| Range with Consistent Accuracy | 0.02-100+ m, ~cm-level across full range | 0.1-5 m, accuracy degrades with distance | Robots operate from near-field manipulation to far-field navigation |
| Sunlight Immunity | 100,000+ lux, histogram filtering | ~80,000 lux limit, signal saturation | Outdoor AMRs, drones, and delivery robots face direct sun daily |
| Multi-Path Interference Rejection | ~1 cm error (time-gated histogram) | ~68 mm error in corner-wall scenarios | Warehouse aisles, glass walls, metal shelving create strong reflections |
| Transparent Object Detection | Distinct histogram peaks for glass reflections | Averages all returns, cannot distinguish | Glass doors, display cases, protective barriers are common in deployment environments |
| Power Efficiency at Range | Low-duty-cycle pulses, <1 W for 8 m ranging | Continuous emission, power scales with range | Battery-powered robots and drones have strict power budgets |

The multi-path interference number is worth pausing on. Researchers measuring dToF vs iToF in a corner-wall scenario, the kind of geometry every warehouse robot encounters at every shelf intersection, found dToF errors of approximately 1 cm (10 mm) against iToF errors of 68 mm. A 68 mm depth error at navigation speed can mean the difference between clearing an obstacle and striking it.

Ready to evaluate dToF for your robot? Request a DMAS2M001 evaluation kit →

The Physical AI Requirements That Only dToF Can Meet

Outdoor Autonomy: Why 100k Lux Immunity is Non-Negotiable

A delivery robot in Phoenix, Arizona operates under 110,000 lux at midday in July. A drone inspecting a solar farm faces direct sun plus reflected glare from panels. An autonomous forklift moves between a dim warehouse interior and a sunlit truck bay 40 times per shift.

In each case, the depth sensor either works in both environments, or the robot stops when the lighting changes. There is no “recalibrate for sunlight” mode during autonomous operation.

dToF’s histogram-based approach handles this transition natively. When the robot moves from darkness into sunlight, the ambient photon noise floor rises, but the signal peak from the VCSEL pulse remains at the same position in the histogram. The signal-to-noise ratio drops, but the peak is still detectable — and the distance measurement remains accurate. This is not a software workaround. It’s physics: the VCSEL pulse delivers a high concentration of photons in a few nanoseconds, creating a spike that rises above even the brightest ambient background.
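The trade-off can be put into rough numbers with the usual Poisson shot-noise approximation, SNR ≈ S / sqrt(S + B), where S and B are the signal and ambient counts in the peak bin. The per-pulse rates below are made-up illustrative values, not measurements of any sensor.

```python
import math

def peak_snr(signal_per_pulse, ambient_per_pulse, n_pulses):
    """Shot-noise-limited SNR of the histogram peak bin after n_pulses."""
    s = signal_per_pulse * n_pulses   # expected signal counts in the peak bin
    b = ambient_per_pulse * n_pulses  # expected ambient counts in the same bin
    return s / math.sqrt(s + b)

SIGNAL = 0.05                         # assumed signal photons per pulse in the peak bin
dark = peak_snr(SIGNAL, 0.001, 1000)  # dim corridor
sun = peak_snr(SIGNAL, 0.5, 1000)     # bright loading bay, same pulse count
sun_more = peak_snr(SIGNAL, 0.5, 4000)

print(f"dark: {dark:.1f}, sun: {sun:.1f}, sun with 4x pulses: {sun_more:.1f}")
```

Under this model the SNR grows with the square root of the number of accumulated pulses, so a sensor can trade frame rate or laser power for sunlight robustness while the peak position stays put.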

DOMI’s DMS604 dToF ranging sensor exemplifies this: 58 m range indoors, 35 m range outdoors at 100,000 lux, with ±1% accuracy beyond 3 m. Same sensor. Same algorithm. Different ambient conditions. Reliable output in both.

Real-Time Responsiveness: From Millisecond Latency to Safe Decisions

A humanoid robot reaching for a door handle needs depth updates fast enough to adjust grip position mid-reach. A drone flying at 15 m/s covers 1.5 meters in 100 milliseconds — if its obstacle detection pipeline adds even 50 ms of latency, it has traveled 0.75 meters blind.

The Physical AI perception-action loop breaks down when depth data arrives too slowly. dToF sensors typically output at 10-50 Hz depending on resolution and range, but the real story is in the processing pipeline. Traditional architectures send raw sensor data to a host processor running a depth algorithm, then to an application processor running SLAM or obstacle detection. Each hop adds latency.
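The arithmetic behind these figures is simply speed multiplied by total latency. The sketch below reproduces the drone numbers from the text and then sums a multi-hop pipeline; the per-stage latencies in that second part are hypothetical placeholders, apart from the 20 ms frame period that follows from a 50 Hz output rate.

```python
def blind_distance_m(speed_m_s, stage_latencies_ms):
    """Distance travelled while a depth reading works through the pipeline."""
    return speed_m_s * sum(stage_latencies_ms) / 1000.0

# Figures from the text: a drone at 15 m/s with 100 ms and 50 ms of latency.
print(blind_distance_m(15.0, [100.0]))  # 1.5
print(blind_distance_m(15.0, [50.0]))   # 0.75

# Hypothetical multi-hop pipeline; only the 20 ms frame period is derived
# from a quoted number (50 Hz), the other stages are assumed values.
pipeline_ms = {
    "sensor frame period at 50 Hz": 20.0,
    "host depth algorithm": 10.0,
    "SLAM / obstacle detection": 20.0,
}
total = blind_distance_m(15.0, pipeline_ms.values())
print(f"{sum(pipeline_ms.values()):.0f} ms total -> {total:.2f} m travelled blind")
```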

Multi-Robot Environments: Interference Rejection at Scale

When two robots using the same iToF sensor face each other, their modulated light signals interfere — each sensor picks up the other’s emissions and produces phantom depth readings. In a warehouse with 20+ AMRs operating in the same zone, this becomes a fleet-level safety issue.

dToF sensors address this through encoding. Each sensor can use a unique pulse pattern or time slot, allowing multiple units to operate in close proximity without cross-talk. Newer dToF SoCs — like Maxic’s MT3806, released April 2026 — include dedicated multi-device interference suppression encoding modules specifically for fleet robotics scenarios.
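One way such encoding can work is per-pulse pseudo-random delay: each sensor shifts its pulses by a private random offset and times arrivals relative to its own emission. The toy model below illustrates that general idea only; it is an assumption for illustration, not a description of the MT3806 or of any DOMI product, and it idealizes detection as one photon per pulse from each source.

```python
import random

N_BINS = 200

def histogram_with_interferer(use_jitter, n_pulses=2000, own_bin=80,
                              other_bin=130, seed=1):
    """Histogram seen by one sensor while a second sensor fires at it."""
    own_rng = random.Random(seed)         # this sensor's private delay sequence
    other_rng = random.Random(seed + 1)   # the other sensor's delay sequence
    hist = [0] * N_BINS
    for _ in range(n_pulses):
        own_delay = own_rng.randrange(N_BINS) if use_jitter else 0
        other_delay = other_rng.randrange(N_BINS) if use_jitter else 0
        # Own echo is timed from our own (delayed) emission, so the delay cancels.
        hist[own_bin] += 1
        # The interferer's pulse carries its delay, and ours is subtracted from it.
        hist[(other_bin + other_delay - own_delay) % N_BINS] += 1
    return hist

def second_highest(hist):
    return sorted(hist)[-2]

plain = histogram_with_interferer(use_jitter=False)
coded = histogram_with_interferer(use_jitter=True)
print("phantom peak without jitter:", second_highest(plain))
print("largest stray bin with jitter:", second_highest(coded))
```

Without jitter the interferer builds a phantom peak as tall as the real one; with jitter its photons are smeared across the whole histogram and behave like extra ambient light.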

The Glass Wall Problem: Detecting What Other Sensors Miss

Ask any robotics engineer who has deployed AMRs in a modern office building about their worst sensor problem, and glass will come up within the first 30 seconds. Floor-to-ceiling glass walls. Glass elevator lobbies. Glass retail storefronts. A 2D LiDAR beam passes straight through glass. A stereo camera sees a reflection or nothing at all. An iToF sensor averages the glass reflection with whatever is behind the glass and reports a wrong distance somewhere in between.

A dToF histogram, by contrast, shows two distinct peaks: one from the glass surface reflection and one from the objects behind it. The sensor can report both — or, for obstacle avoidance, report the nearest surface and stop the robot before it hits the glass.
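A minimal sketch of that two-peak logic, assuming a hand-built histogram with 100 ps bins and a simple threshold above the noise floor. The counts, bin width, and threshold rule are illustrative assumptions; production peak detectors are more elaborate.

```python
import statistics

C = 299_792_458.0   # speed of light in m/s
BIN_PS = 100.0      # assumed TDC bin width in picoseconds

def find_peaks(hist, k_sigma=5.0):
    """Return indices of local maxima that clear the noise floor by k_sigma."""
    floor = statistics.median(hist)
    threshold = floor + k_sigma * max(1.0, floor) ** 0.5   # Poisson-style margin
    return [i for i in range(1, len(hist) - 1)
            if hist[i] > threshold
            and hist[i] >= hist[i - 1] and hist[i] > hist[i + 1]]

def bin_to_distance_m(bin_index):
    """Convert a bin centre to distance with d = c * t / 2."""
    return C * ((bin_index + 0.5) * BIN_PS * 1e-12) / 2.0

# Synthetic histogram: flat ambient floor, weak glass return, strong wall return.
hist = [5] * 200
hist[39:42] = [12, 30, 12]       # glass surface reflects a small fraction
hist[119:122] = [40, 120, 40]    # wall behind the glass

peaks = find_peaks(hist)
distances = [bin_to_distance_m(p) for p in peaks]
print("all returns (m):", [round(d, 3) for d in distances])
print("nearest surface for obstacle avoidance (m):", round(min(distances), 3))
```

Reporting the nearest peak rather than the strongest one is the design choice that matters here: the glass return is much weaker than the wall behind it, yet it is the surface the robot would hit first.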

The dToF + Edge AI Revolution: Sensors That Think

VoxCAD and the Rise of AI-Integrated dToF SoCs

At ISSCC 2026, researchers from the University of Macau and KAIST presented VoxCAD — a single chip that integrates a 128×1 SPAD dToF sensor array, an AI processing engine, and tri-mode eDRAM compute-in-memory macros. It performs 3D object detection directly on the sensor chip.

The numbers are extraordinary. VoxCAD uses a 2D ROI-guided point cloud construction that reduces data transmission energy and latency by more than 99%. Sector-wise voxelization cuts external memory access by 87%. Total system energy drops 97.5%. Latency drops 91.2%. It achieves 80.3 mAP on the KITTI 3D object detection benchmark while idling at 0.82 mW and running at 81 mW active.

What VoxCAD represents is the direction of travel for the entire industry. Instead of shipping raw photon timestamps to a host processor, the dToF sensor itself runs a neural network that outputs classified 3D objects — car, pedestrian, cyclist, obstacle. The Physical AI system doesn’t receive megabytes of point cloud data. It receives a structured list of what is where in 3D space, updated in real time.

On-Chip Histogram Processing: Filtering Signal from Noise

The core of any dToF sensor is its histogram processing engine. Each SPAD pixel generates a stream of photon arrival timestamps. Over thousands of laser pulses, these timestamps accumulate into a histogram where the signal peak sits above a background noise floor.

Performing this histogram construction and peak detection on-chip — rather than streaming raw data to a host — is what makes dToF practical for Physical AI. DOMI’s DMAS2M001 dToF array module handles histogram processing, cross-talk calibration, and anti-sunlight filtering entirely on-module. The host receives a clean depth map, not a photon log.

From Raw Depth to Spatial Understanding

The step from depth data to spatial understanding is where Edge AI changes the equation. A conventional pipeline: sensor outputs depth map → host CPU converts to point cloud → application processor runs SLAM → navigation stack plans path. Each stage runs on different hardware with different latency characteristics.

The emerging dToF + Edge AI architecture collapses this pipeline. The sensor itself outputs structured 3D data — object IDs, bounding boxes, trajectories — rather than raw pixels. For a Physical AI system, this means the perception layer is no longer a data source to be processed. It is an information source that reports what it sees.

dToF in Action: Physical AI Use Cases Across Industries

Autonomous Mobile Robots — Warehouse to Sunlit Loading Dock

AMRs are the largest deployed category of Physical AI today, with over 200,000 units operating globally across logistics, manufacturing, and retail. Their sensor requirements span both ends of the difficulty spectrum: they need to detect thin obstacles like pallet wrap and forklift tines at close range, while simultaneously building SLAM maps out to 8+ meters for path planning.

When Lin, a robotics integration engineer at a Shenzhen logistics firm, was tasked with deploying 30 AMRs in a cross-dock facility, she tested three depth sensor configurations. The facility had east-facing bay doors that flooded the first 20 meters with 95,000 lux every morning. iToF-based cameras produced reliable maps until 8:15 AM, then began reporting phantom obstacles as sunlight saturated the sensor array. Stereo cameras fared worse — the repetitive concrete floor texture caused the disparity matcher to fail on roughly 15% of frames. Lin’s team switched to a dToF-based navigation stack using the DMAS2M001. The 40×30 resolution depth maps held consistent at 10 FPS across the full lighting transition, from the 2 lux maintenance corridor to the 95,000 lux loading bay. The fleet has logged 4,200 autonomous hours without a single sunlight-triggered emergency stop.

dToF handles both the near and far field. A single DMAS2M001 module provides 40×30 resolution depth maps at 10 FPS across a 60°×45° field of view, from 0.2 m to 8 m, at 7 grams and under 1 watt. For fleet deployments, the module’s 940 nm Class I VCSEL means no laser safety enclosures, no operator training, no regulatory overhead. For the full picture on depth sensing for autonomous navigation, see our robotics application page.
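For a navigation stack, a depth map like this is usually turned into 3D points first. The sketch below uses an ideal pinhole model with the 40×30 resolution and 60°×45° field of view quoted above. It assumes each depth value is the distance along the optical axis, the principal point is at the image centre, and there is no lens distortion; a real module's calibration and output convention may differ.

```python
import math

def deproject(depth_map, hfov_deg=60.0, vfov_deg=45.0):
    """Convert a row-major depth map (metres) to a list of (x, y, z) points."""
    rows, cols = len(depth_map), len(depth_map[0])
    # Focal lengths in pixels, derived from the field of view.
    fx = (cols / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)
    fy = (rows / 2.0) / math.tan(math.radians(vfov_deg) / 2.0)
    points = []
    for v, row in enumerate(depth_map):
        for u, z in enumerate(row):
            if z <= 0.0:                      # treat non-positive depth as no return
                continue
            x = (u + 0.5 - cols / 2.0) * z / fx
            y = (v + 0.5 - rows / 2.0) * z / fy
            points.append((x, y, z))
    return points

# A flat wall 2 m in front of the sensor, filling the whole 40x30 frame.
wall = [[2.0] * 40 for _ in range(30)]
cloud = deproject(wall)
print(len(cloud), "points, widest x =", round(max(p[0] for p in cloud), 3), "m")
```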

Humanoid Robots — Balance, Manipulation, Human Interaction

Yole Group projects the humanoid robot market growing from approximately $600 million in 2025 to $51 billion by 2035 at a 55% CAGR. Boston Dynamics’ Atlas is scheduled for Hyundai factory deployment in 2026, with production scaling to tens of thousands annually by 2028 — a milestone that industry analysts describe as the moment Physical AI moves from demo to production. Unitree and LimX Dynamics are shipping humanoids now.

Every humanoid robot needs a perception stack that handles three very different depth sensing tasks: long-range navigation mapping (5-20 m), mid-range manipulation (0.3-2 m, high precision), and close-range human interaction (safety-critical, sub-100 ms latency). A single sensor technology cannot optimize for all three. The emerging architecture uses dToF for navigation and obstacle detection and iToF or stereo for manipulation — with sensor fusion tying it together.

Drones and UAVs — Precision Altimetry and Obstacle Avoidance

A drone performing power line inspection needs to maintain a precise 3-meter standoff from conductors while flying at 10 m/s in direct sunlight. A delivery drone descending to a customer’s driveway needs accurate altitude data from 100 meters down to 0.3 meters — with the target surface potentially being concrete, grass, gravel, or a parked car.

The DMS604 dToF ranging sensor is purpose-built for these scenarios: 58 m indoor range, 35 m outdoor at 100k lux, a narrow ≤1.3° field of view for pinpoint altimetry, and a 50 Hz update rate for responsive flight control. At 40×10×7.7 mm with a UART interface, it integrates directly into existing flight controller architectures. See our UAV and drone depth sensing page for complete application guidance.

Smart Infrastructure — Privacy-Safe Spatial Awareness

Physical AI is not limited to robots that move. Smart buildings use depth sensors for occupancy monitoring, HVAC optimization, and safety zone enforcement. These applications share a common constraint: they operate in spaces where people expect privacy. RGB cameras are unacceptable. Depth-only sensing — which captures spatial data without identifiable images — meets both the functional requirement and the privacy requirement.

dToF sensors are inherently privacy-safe. A depth map shows that a person-sized object is at coordinates (x, y, z), moving at velocity v. It does not show a face, clothing, skin color, or any other identifying feature. For GDPR-compliant smart building deployments, this distinction is not a feature — it is the compliance basis.

Choosing dToF for Your Physical AI Application

Key Specifications to Evaluate

When you pull a dToF sensor datasheet, here are the numbers that actually matter for Physical AI deployment — and the questions behind them:

  • Range at 100k lux (not just indoor maximum range): If the datasheet only quotes indoor range, ask for the distance-vs-lux curve. A sensor rated for “8 m” that drops to 1.5 m at 50k lux is not an 8-meter sensor for outdoor use.
  • Accuracy across the full range: dToF accuracy should be consistent — ±1-2% of distance or ±1-2 cm, whichever is larger. If the spec changes sharply at a distance threshold, that indicates a signal-to-noise transition point you need to understand.
  • Frame rate at your required resolution: A sensor that does 30 FPS at QVGA but 5 FPS at VGA has a bandwidth bottleneck. Match the frame rate to your control loop frequency.
  • Multi-device interference handling: If you are deploying more than one robot, ask about encoding schemes and cross-talk rejection. Test with two units facing each other.
  • Interface and SDK maturity: MIPI, USB, UART, I2C — which one fits your compute architecture? Is there a ROS driver? What does the API look like for your target OS?
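The checklist above can be turned into a quick screening script for candidate datasheets. This is a sketch only: the SensorSpec fields, the review function, and the example sensor are hypothetical, and the tolerance helper uses the upper ends (2% and 2 cm) of the ranges mentioned in the accuracy bullet.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class SensorSpec:
    name: str
    indoor_range_m: float
    range_at_100klux_m: Optional[float]   # None if the datasheet omits it
    frame_rate_hz: float                  # at the resolution you actually need
    has_multi_device_encoding: bool

def accuracy_tolerance_m(distance_m, pct=0.02, floor_m=0.02):
    """Percentage of distance or a fixed floor, whichever is larger."""
    return max(pct * distance_m, floor_m)

def review(spec, required_outdoor_range_m, control_loop_hz, fleet_size) -> List[str]:
    """Return a list of follow-up questions raised by the datasheet."""
    issues = []
    if spec.range_at_100klux_m is None:
        issues.append("no 100k lux range quoted: ask for the distance-vs-lux curve")
    elif spec.range_at_100klux_m < required_outdoor_range_m:
        issues.append("range at 100k lux is below the outdoor requirement")
    if spec.frame_rate_hz < control_loop_hz:
        issues.append("frame rate is below the control loop frequency")
    if fleet_size > 1 and not spec.has_multi_device_encoding:
        issues.append("no multi-device interference handling for a fleet")
    return issues

# Hypothetical candidate sensor, not a real product.
candidate = SensorSpec("example-sensor", 8.0, None, 10.0, False)
for issue in review(candidate, required_outdoor_range_m=5.0,
                    control_loop_hz=20.0, fleet_size=30):
    print("-", issue)
print("tolerance at 5 m:", accuracy_tolerance_m(5.0), "m")
```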

When iToF or Stereo Vision is the Better Choice

dToF is not the right sensor for every application. If your system operates exclusively indoors at ranges under 5 meters and needs high-resolution depth maps (VGA or above) for detailed surface capture — facial recognition, gesture tracking, AR/VR — iToF is likely the better choice today. Its higher resolution at short range and more mature CMOS manufacturing ecosystem translate to lower cost per pixel.

Stereo vision excels in texture-rich indoor environments where power budget is not a constraint and compute is available for disparity matching. For low-cost consumer robots that never go outdoors, a well-tuned stereo pair can be sufficient.

The honest assessment: dToF, iToF, and stereo vision are complementary technologies, not competitors. The Physical AI systems of 2030 will almost certainly use all three, with sensor fusion selecting the best modality for each frame and each region of interest. But for the hard requirements that define Physical AI today — outdoor operation, long range, real-time response, and reliability across lighting conditions — dToF is the essential layer.

Need help selecting the right dToF sensor for your application? Talk to DOMI’s engineering team →

FAQ

What is dToF and how is it different from iToF?

dToF (direct Time-of-Flight) measures distance by emitting a short laser pulse and directly timing how long photons take to return, using a SPAD array and TDC. iToF (indirect Time-of-Flight) emits continuous modulated light and calculates distance from the phase shift between emitted and received signals. dToF maintains accuracy at long range and under bright ambient light; iToF provides higher resolution at short range but degrades beyond 5 meters and under sunlight.

Why can’t regular cameras or iToF sensors work for outdoor robots?

Sunlight contains massive amounts of infrared energy at the same wavelengths ToF sensors use. An iToF sensor trying to measure a phase shift in modulated light cannot distinguish its own signal from solar photons once ambient light exceeds roughly 80,000 lux. A dToF sensor solves this through histogram statistics — signal photons from the laser pulse arrive in a tight nanosecond cluster, while ambient photons arrive randomly and form a flat noise floor that can be subtracted.

Does dToF work in complete darkness?

Yes. Because dToF sensors provide their own illumination via an integrated VCSEL, they work in both complete darkness and bright sunlight. Performance is actually best in darkness, since the histogram then contains almost no ambient photon noise.

Can dToF sensors detect glass walls and transparent obstacles?

Yes. A glass surface reflects a small percentage of the laser pulse, creating a distinct peak in the dToF histogram. Objects behind the glass create a second, later peak. The sensor can report both distances — or the closest one for obstacle avoidance. This is a unique capability of dToF that iToF and stereo cameras cannot match.

What is the typical range of a dToF sensor for robotics?

Compact dToF modules for consumer/industrial robotics typically range from 0.02 m to 8 m (like the DOMI DMAS2M001). Industrial dToF ranging sensors reach 58 m indoors and 35 m outdoors at 100k lux (like the DOMI DMS604). Specialized dToF LiDAR systems can reach 120+ meters. The key metric is not maximum indoor range but the range at 100k lux — the real-world outdoor performance number.

Conclusion

Physical AI moves intelligence out of the data center and into the physical world. That transition demands sensors that work in the physical world — not just on a lab bench under controlled lighting with a white target at 1 meter.

dToF earns its central role in Physical AI through architecture, not through marketing. The SPAD + TDC + histogram approach solves the hard problems — sunlight, multi-path, glass, range — at the physics level. The emerging integration of dToF with Edge AI (VoxCAD and similar SoCs) points toward sensors that not only capture depth but understand what they’re looking at.

For engineers building the next generation of autonomous robots, drones, and smart infrastructure, the sensor decision is a system architecture decision. Choose the perception technology that matches the environments your system will actually operate in. If those environments include sunlight, glass, distance, or unpredictable surfaces, dToF is not just the best option — it’s the only one that works.
