Tensor Processing Primitives: A Programming Abstraction for Efficiency and Portability in Deep Learning and HPC Workloads
During the past decade, novel Deep Learning (DL) algorithms, workloads and hardware have been developed to tackle a wide range of problems. Despite the advances in the workload and hardware ecosystems, the programming methodology of DL systems is stagnant. DL workloads leverage either highly optimized, yet platform-specific and inflexible kernel