Deep learning is a cornerstone technology in modern AI. Startup Aria Networks is now applying that same layered intelligence approach to the network with its Deep Networking platform.
Aria Networks was founded in January 2025 by Mansour Karam, who previously founded intent-based networking vendor Apstra, which Juniper Networks acquired in 2020. Aria has been developing a path-centric approach built around microsecond telemetry rather than the switch-centric model of incumbent vendors. That work culminates in this week's general availability of Deep Networking. The platform combines purpose-built switching hardware, hardened SONiC, fine-grained telemetry collected across switches, transceivers and host NICs, and intelligent agents that operate at each layer of the stack. The company is also disclosing $125 million in total funding from Sutter Hill Ventures, Atreides Management, Valor Equity Partners and Eclipse Ventures.
“In order for AI to become effective, you really need to specialize it for this domain, meaning building an architecture from the ground up that is optimized for AI,” Karam told Network World.
How Deep Networking works
Deep Networking is designed to treat the network as an active participant in AI cluster performance rather than passive infrastructure. It does that through fine-grained telemetry collected at the ASIC level, intelligent agents at each layer of the stack and continuous cloud-delivered software updates.
The telemetry layer is where Aria claims its primary technical differentiation. Traditional network monitoring tools such as NetFlow collect data after the fact, at coarse resolution. Aria collects telemetry in real time at microsecond granularity from the switching ASIC itself.
“We have embedded code sitting right inside the ASIC, right on the ARM processors in the ASIC, that is extracting telemetry,” Karam said.
That embedded telemetry feeds adaptive tuning of Dynamic Load Balancing (DLB) parameters, Data Center Quantized Congestion Notification (DCQCN) settings and failover logic without waiting for a threshold breach or a manual intervention.
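For readers who want a feel for what that kind of closed loop might look like, the short Python sketch below mimics one. It is illustrative only: the function names, field names and thresholds are hypothetical placeholders, not Aria’s published interfaces.

```python
# Purely illustrative sketch of a telemetry-driven tuning loop; all names and
# thresholds here are hypothetical placeholders, not Aria's actual interfaces.
import time

def read_asic_telemetry():
    """Stand-in for microsecond-granularity counters pulled from the switching ASIC."""
    return {
        "queue_depth_pct": 62.0,                 # egress queue occupancy
        "ecn_marked_pct": 0.8,                   # share of packets ECN-marked
        "path_utilization": [0.91, 0.40, 0.44],  # per-uplink load
    }

def tune_dcqcn(ecn_min_pct, ecn_max_pct):
    """Placeholder for pushing new DCQCN ECN-marking thresholds to the switch."""
    print(f"DCQCN thresholds -> min {ecn_min_pct}%, max {ecn_max_pct}%")

def rebalance_dlb(underused_paths):
    """Placeholder for steering new flows onto lightly loaded paths."""
    print(f"DLB: steering flows to paths {underused_paths}")

def control_loop(iterations=5, poll_interval_s=0.001):
    for _ in range(iterations):
        t = read_asic_telemetry()
        # Tighten ECN marking before queues build, rather than after a threshold breach.
        if t["queue_depth_pct"] > 60:
            tune_dcqcn(ecn_min_pct=5, ecn_max_pct=30)
        # Rebalance when one path carries far more traffic than its peers.
        util = t["path_utilization"]
        if max(util) - min(util) > 0.3:
            rebalance_dlb([i for i, u in enumerate(util) if u < 0.5])
        time.sleep(poll_interval_s)

control_loop()
```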
The platform architecture is layered. At the lowest layers, agents react within microseconds to link-level events such as transceiver flaps and reroute leaf-spine traffic in milliseconds. At higher layers, agents make more strategic decisions about flow placement across the cluster. At the cloud layer, a large language model-based agent surfaces correlated insights to operators in natural language, allowing them to ask questions about specific jobs or alert conditions and receive context-aware responses.
Karam argued that simply bolting an LLM onto an existing architecture does not deliver the same result. “If you ask it to do anything, it could hallucinate and bring down the network,” he said. “It doesn’t have any of the context or the data that’s required for this approach to be made safe.”
Aria also exposes a Model Context Protocol (MCP) server, allowing external systems such as job schedulers and LLM routers to query network state directly and integrate it into their own decision-making.
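Because MCP requests are JSON-RPC messages, an external system such as a scheduler could in principle query network state with a few lines of code. The sketch below is a generic illustration: the endpoint URL and the tool name get_link_health are assumptions, since Aria has not published its MCP tool catalog.

```python
# Generic illustration of calling an MCP tool over HTTP. MCP uses JSON-RPC 2.0,
# and "tools/call" invokes a named tool. The endpoint URL and the tool name
# "get_link_health" below are hypothetical, not Aria's published interface.
import requests

MCP_ENDPOINT = "https://aria.example.com/mcp"  # placeholder URL

def call_tool(name, arguments):
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    resp = requests.post(MCP_ENDPOINT, json=payload, timeout=10)
    resp.raise_for_status()
    return resp.json()["result"]

# A scheduler might check link health for candidate hosts before placing a job.
result = call_tool("get_link_health", {"hosts": ["gpu-node-017", "gpu-node-018"]})
print(result)
```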
MFU and token efficiency as the target metrics
Traditional networking is often evaluated in terms of bandwidth and latency. Aria is centering its platform on two metrics: Model FLOPS Utilization (MFU) and token efficiency. MFU is defined as the ratio of achieved FLOPS per accelerator to the theoretical peak. In practice, Karam said, MFU for training workloads typically runs between 33% and 45%, and inference often comes in below 30%.
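As a quick illustration of that definition, the snippet below computes MFU from made-up accelerator numbers; the FLOPS figures are examples, not measurements from Aria.

```python
# MFU = achieved FLOPS per accelerator / theoretical peak FLOPS per accelerator.
# The numbers below are illustrative only.
peak_flops_per_accelerator = 989e12      # e.g., a ~989 TFLOPS (dense) accelerator
achieved_flops_per_accelerator = 380e12  # sustained throughput observed during training

mfu = achieved_flops_per_accelerator / peak_flops_per_accelerator
print(f"MFU = {mfu:.1%}")  # ~38.4%, inside the 33-45% training range cited above
```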
“The network has a major impact on the MFU, and therefore the token efficiency, because the network touches every aspect, every other component in your cluster,” Karam said.
Token efficiency is expressed as either tokens consumed per dollar or tokens produced per unit of time. Aria’s position is that both metrics are directly affected by network performance.
Karam explained the connection through specific failure modes. A single bad NIC in a 10,000-XPU cluster can drop MFU by 1.7% during an all-reduce operation. A bad transceiver can trigger persistent traffic rerouting that erodes MFU and wastes a significant share of infrastructure spend. Congestion settings that were never tuned to a specific workload create sustained underperformance.
Aria’s own modeling puts the business case in revenue terms. A 3% MFU improvement across a 10,000-XPU cluster translates to approximately $49.8 million in annual revenue gain, or 7.9% revenue improvement, at prevailing token pricing.
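Taken together, those two figures imply a baseline revenue for such a cluster. The quick calculation below uses only the numbers Aria cites.

```python
# Back-of-the-envelope check on the figures Aria cites: a 3% MFU improvement
# worth $49.8M per year is also quoted as a 7.9% revenue improvement.
revenue_gain = 49.8e6   # dollars per year, as cited
relative_gain = 0.079   # 7.9% revenue improvement, as cited

implied_baseline = revenue_gain / relative_gain
print(f"Implied baseline annual token revenue: ${implied_baseline / 1e6:.0f}M")  # ~$630M
```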
Switch portfolio
Aria’s hardware line is built on Broadcom ASICs and runs a standards-based, hardened SONiC implementation. The portfolio includes three switch models.
- Aria Switch 800G. Based on the 51.2T Broadcom Tomahawk 5 ASIC, it provides 64 x 800G OSFP ports with support for DSP, LRO and LPO optics.
- Aria Switch 1.6T High Radix. A 4RU air-cooled unit based on the 102.4T Broadcom Tomahawk 6 (TH6) ASIC, with 128 x 800G OSFP ports.
- Aria Switch 1.6T. A 2RU unit in EIA 19-inch and Open Rack v3 (ORV3) form factors supporting both air and full liquid cooling, with 64 x 1.6T OSFP ports.
Forward deployed engineers and the road ahead
Aria is embedding what it calls forward-deployed engineers (FDEs) with customers from deployment onward. Karam said this model is structurally different from professional services.
“Everything the forward-deployed engineers do ultimately gets engineered back into the products,” he said. “They are totally aligned directionally with the product. They are not a separate business.”
The distinction matters for how Aria thinks about product iteration. FDEs feed real customer environment data back into the platform continuously. That data drives both the agent improvements and the software update cadence Aria is targeting: weekly releases rather than the semi-annual or annual cycles typical of incumbent networking vendors.
“Bringing in all that intelligence so that we can increase the breadth of the solution, the capabilities of the solution, while keeping it super safe to use — that’s going to be a big, continued area of investment,” Karam said. “Job number one is to make sure your network is always up.”