Performing machine learning inference closer to the data sources, i.e., at the edge, rather than in the cloud, has several benefits. Thanks to the reduced data movement, it can lead to lower latency, higher privacy and security, and reduced dependency on the continuous availability of a powerful network connection. While the frameworks and methods for machine learning inference in the cloud are well understood and increasingly homogenized, the opposite is true at the edge, where hardware and software are more specialized and more heterogeneous. This makes knowledge transfer across edge devices difficult. Furthermore, edge devices are increasingly expected to perform multiple machine learning tasks concurrently, which can lead to resource contention and performance degradation without careful management.
In this thesis, we characterize the hardware landscape at the edge with a special focus on machine learning workloads, and we use the findings to guide the design of a smart resource manager for edge machine learning systems. Furthermore, we identify common inefficiencies in these workloads and develop optimization techniques to combat them.