Consider an AI application running on a host machine. The application needs to run inference of an AI model on some input. There are two ways to run AI inference:
- The application can run the AI model on the host machine using libraries that directly access the hardware on the host machine. We refer to this option as local inference.
- The application can use a client-server protocol to communicate with a server that serves the inference requests for AI models. In this case, the AI server can be running:
  - In the AI Hub: the AI server in this case is identified by its IP address or URL. We refer to this option as hosted inference.
  - On a machine located in the same local area network (LAN) as the application host machine: the AI server in this case is identified by its local IP address. The AI server can also run on the application host itself, in which case it is identified as `localhost`. We refer to this option as AI server inference.
## One Code, Three Types of Inference
PySDK has been designed to work with the above three types of inference with a single-line change in the code. The entry point to the DeGirum PySDK package is the `connect` function, which takes the type of inference as an argument. Once the application connects to the model zoo using `connect` with the desired type of inference, the rest of the application code is the same for all types of inference. This allows users to develop the application on the host and deploy it at the edge with the same code.
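As a minimal sketch of this pattern, the snippet below switches between the three inference types by changing only the first argument to `connect`. The zoo name, token, server address, and model name are placeholders, and the exact argument names and constants (such as `dg.CLOUD` and `dg.LOCAL`) are assumptions that may differ between PySDK versions; consult the PySDK documentation for the current API.

```python
import degirum as dg  # DeGirum PySDK

token = "<AI Hub API token>"     # placeholder: your access token
zoo_name = "<organization/zoo>"  # placeholder: model zoo to connect to

# Hosted inference: connect to an AI server in the AI Hub.
zoo = dg.connect(dg.CLOUD, zoo_name, token)

# AI server inference: connect to an AI server on the LAN (or on this host).
# zoo = dg.connect("192.168.1.42", zoo_name, token)  # or "localhost"

# Local inference: run the model directly on the host machine's hardware.
# zoo = dg.connect(dg.LOCAL, zoo_name, token)

# The rest of the application code is identical for all three inference types.
model = zoo.load_model("<model name from the zoo>")  # placeholder model name
result = model("<path to input image>")              # run inference on an input
print(result)
```

Only the `connect` call changes between development on the host and deployment at the edge; the model loading and inference code stays the same.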