Azure AI Foundry Local

Azure AI Foundry Local is a runtime and development toolkit that lets you run generative AI models directly on your own device, without an Azure subscription or any cloud access. It is part of Microsoft’s broader Azure AI Foundry ecosystem but is designed specifically for on-device inference.

Core Capabilities

  • Local Model Execution: Run models like Phi-3.5, Qwen 2.5, Mistral, and DeepSeek R1 on Windows (x64/ARM) and macOS (Apple Silicon).
  • Hardware Optimization: Automatically selects the best model variant for your system: CPU, GPU, NPU, or Apple Neural Engine.
  • OpenAI-Compatible API: Seamlessly integrates with existing apps using familiar SDKs (Python, JavaScript).
  • CLI & SDK Tools: Includes a command-line interface (foundry) and SDKs for managing models, services, and cache.
  • Privacy & Performance: Keeps all data processing on-device, reducing latency and eliminating cloud costs.
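Because the local service exposes an OpenAI-compatible REST surface, any HTTP client can talk to it. The sketch below assembles a standard chat-completions request using only the Python standard library; the endpoint URL and model name here are placeholder assumptions (the real values come from the CLI or SDK at runtime):

```python
import json
import urllib.request

def build_chat_request(endpoint, model_id, prompt):
    """Assemble an OpenAI-style /chat/completions request for the local service."""
    payload = {
        "model": model_id,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{endpoint}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical default endpoint; query the SDK or CLI for the real one at runtime.
req = build_chat_request("http://localhost:5273/v1", "phi-3.5-mini", "Why is the sky blue?")
```

Sending `req` with `urllib.request.urlopen` (while the service is running) returns a standard OpenAI-shaped JSON response.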

Installation

  • Windows:
    winget install Microsoft.FoundryLocal
  • macOS:
    brew tap microsoft/foundrylocal  
    brew install foundrylocal

Example Usage

To run a model:

foundry model run phi-3.5-mini

This downloads the model (if it is not already cached) and starts an interactive chat session in your terminal.

Integration

You can use the Python SDK to interact with models:

from foundry_local import FoundryLocalManager
import openai

# Starts the local service (if needed) and downloads the model on first use.
manager = FoundryLocalManager("phi-3.5-mini")

# The service exposes an OpenAI-compatible endpoint, so the standard client works.
client = openai.OpenAI(base_url=manager.endpoint, api_key=manager.api_key)

response = client.chat.completions.create(
    model=manager.get_model_info("phi-3.5-mini").id,  # resolve the alias to the concrete model id
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response.choices[0].message.content)
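Chat models are stateless per request, so a multi-turn conversation is just the accumulated message history resent on each call. A minimal sketch of tracking that history in plain Python (independent of any SDK; the class name is illustrative):

```python
class Conversation:
    """Accumulates OpenAI-style chat messages so each request carries full context."""

    def __init__(self, system_prompt=None):
        self.messages = []
        if system_prompt:
            self.messages.append({"role": "system", "content": system_prompt})

    def add_user(self, content):
        self.messages.append({"role": "user", "content": content})
        return self.messages  # pass this list as messages= in the API call

    def add_assistant(self, content):
        self.messages.append({"role": "assistant", "content": content})

chat = Conversation("You answer concisely.")
chat.add_user("Why is the sky blue?")
# After each response, record it so the next turn sees it:
# chat.add_assistant(response.choices[0].message.content)
```

On each turn you would pass `chat.messages` to `client.chat.completions.create(...)` and append the model's reply before the next user message.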

Foundry Local is especially useful for developers building AI-powered apps that require offline capability, data privacy, or low-latency performance. It allows you to leverage powerful AI models without relying on cloud infrastructure, making it ideal for scenarios where internet access is limited or data sensitivity is a concern.