Tuesday, December 04, 2018

AI Expo 2019: Emilio Billi (CTO, A3Cube) - Part 3 of 4

Why and how computational power influences the rate of progress in technology

Emilio Billi, CTO, A3Cube Inc.

Background: ML, Big Data & Analytics, AI, HPC.


This was a talk focused on big-data infrastructure. The speaker had a background in systems infrastructure and past DoD experience. Not the most engaging delivery, but some really nice takeaways:


Moving 128 bytes over 100 Gbit Ethernet: the CPU waits 8,900 ns doing nothing (~7.1M compute ops lost).

Moving the same 128 bytes over optimized intra-cluster RDMA costs 1,200 ns of CPU time (~0.96M compute ops lost).

You get roughly 6M extra ops for ML on every 128-byte transfer (7.1M - 0.96M ≈ 6.2M). That's a great acceleration for ML workloads.
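A quick sanity check of these figures: both pairs imply the same rate of about 800 compute ops per nanosecond (7.1M / 8,900 ns ≈ 0.96M / 1,200 ns ≈ 800). The talk didn't state the hardware, so the sketch below simply takes that back-solved rate as an assumption and recomputes the ops lost per 128-byte transfer on each path, plus the difference between them.

```python
# Assumption: ~800 compute ops per ns, back-solved from the talk's figures
# (7.1M ops / 8900 ns and 0.96M ops / 1200 ns both give ~800). Not a stated spec.
OPS_PER_NS = 800


def ops_lost(wait_ns: float, ops_per_ns: float = OPS_PER_NS) -> float:
    """Compute operations forgone while the CPU stalls for wait_ns nanoseconds."""
    return wait_ns * ops_per_ns


ethernet_ns = 8900  # 128 B over 100 Gbit Ethernet (figure from the talk)
rdma_ns = 1200      # same 128 B over optimized intra-cluster RDMA (figure from the talk)

lost_eth = ops_lost(ethernet_ns)
lost_rdma = ops_lost(rdma_ns)
saved = lost_eth - lost_rdma  # extra ops available per transfer, not per second

print(f"Ethernet: {lost_eth / 1e6:.2f}M ops lost")          # 7.12M
print(f"RDMA:     {lost_rdma / 1e6:.2f}M ops lost")         # 0.96M
print(f"Saved per 128-byte transfer: {saved / 1e6:.2f}M")   # 6.16M
```

Because the saving is per message, the total benefit scales with how many small transfers a workload makes, which is why communication-heavy distributed ML stands to gain the most.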


Basic contention: Ethernet, TCP, and slow storage are legacy technology.

The clusters of the future will look like the supercomputer systems of today:

1. Low-latency converged parallel file systems (think S3 for the cluster).

2. A built-in distributed resource scheduler (think Kubernetes for the cluster).

3. Cooperative RAM over the network fabric (RDMA over InfiniBand).

4. Cluster-wide sharing of accelerators (e.g., GPUs/FPGAs).
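To make items 2 and 4 concrete, here is a toy sketch of what "cluster-wide sharing of accelerators" means from a scheduler's point of view: every GPU on every node goes into one pool, and a job gets any free device regardless of which node hosts it. The `AcceleratorPool` class and its methods are hypothetical illustrations, not A3Cube's design; the remote-access mechanism (e.g., RDMA) that would make a device on another node usable is not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AcceleratorPool:
    """Hypothetical cluster-wide pool: all accelerators on all nodes form one resource."""

    free: list[tuple[str, str]] = field(default_factory=list)         # (node, device)
    in_use: dict[str, tuple[str, str]] = field(default_factory=dict)  # job -> (node, device)

    def add_node(self, node: str, devices: list[str]) -> None:
        """Register a node's accelerators with the shared pool."""
        self.free.extend((node, d) for d in devices)

    def allocate(self, job: str) -> tuple[str, str] | None:
        """Hand the job the oldest free device, regardless of hosting node; None if none free."""
        if not self.free:
            return None
        device = self.free.pop(0)
        self.in_use[job] = device
        return device

    def release(self, job: str) -> None:
        """Return the job's device to the back of the free list."""
        self.free.append(self.in_use.pop(job))


pool = AcceleratorPool()
pool.add_node("node-a", ["gpu0"])
pool.add_node("node-b", ["gpu0", "gpu1"])
print(pool.allocate("train-1"))  # ('node-a', 'gpu0')
print(pool.allocate("train-2"))  # ('node-b', 'gpu0')
pool.release("train-1")
print(pool.allocate("train-3"))  # ('node-b', 'gpu1')
```

A simple FIFO free list keeps the example short; a real scheduler would also weigh data locality and interconnect latency when choosing a device.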


Do all of the above without changing the server setup.


Case Study:

Optimizing the stack with these features got 64 nodes to the same compute capacity as 360 AWS nodes (360/64 ≈ 5.6x per node, roughly a 6x speedup).
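The node counts alone pin down what the speedup means. Assuming, as stated, that both clusters deliver the same total compute, the per-node gain and the share of the AWS node count needed work out as follows (only the two node counts come from the talk):

```python
# Node counts reported for the case study.
AWS_NODES = 360
OPTIMIZED_NODES = 64

# Per-node capacity of the optimized cluster relative to one AWS node,
# assuming both clusters deliver equal total compute.
per_node_gain = AWS_NODES / OPTIMIZED_NODES

# Fraction of the AWS node count the optimized cluster needs.
node_fraction = OPTIMIZED_NODES / AWS_NODES

print(f"Per-node gain: {per_node_gain:.3f}x")                  # 5.625x
print(f"Nodes needed:  {node_fraction:.1%} of the AWS count")  # 17.8%
```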

The work was done for a DoD project.