Azure AI Foundry Local

Azure AI Foundry Local is a runtime and development toolkit that lets you run generative AI models directly on your own device—without needing an Azure subscription or cloud access. It’s part of Microsoft’s broader Azure AI Foundry ecosystem, but designed specifically for on-device inference.

Core Capabilities

  • Local Model Execution: Run models like Phi-3.5, Qwen 2.5, Mistral, and DeepSeek R1 on Windows (x64/ARM) and macOS (Apple Silicon).
  • Hardware Optimization: Automatically selects the best model variant for your system—CPU, GPU, NPU, or Apple Neural Engine.
  • OpenAI-Compatible API: Seamlessly integrates with existing apps using familiar SDKs (Python, JavaScript).
  • CLI & SDK Tools: Includes a command-line interface (foundry) and SDKs for managing models, services, and cache.
  • Privacy & Performance: Keeps all data processing on-device, reducing latency and eliminating cloud costs.
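
Because the local service speaks the OpenAI wire protocol, you can call it with nothing but the standard library. The sketch below assumes an endpoint of http://localhost:5273/v1 and the phi-3.5-mini alias purely for illustration; in practice you should discover the real endpoint from the SDK (manager.endpoint, shown later) rather than hard-coding it.

```python
# Minimal sketch: calling Foundry Local's OpenAI-compatible REST API with the
# standard library only. Endpoint URL and model id below are assumptions;
# discover the real endpoint via the SDK (manager.endpoint).
import json
import urllib.request


def build_chat_request(model_id: str, prompt: str) -> bytes:
    """Build an OpenAI-style chat-completions payload."""
    payload = {
        "model": model_id,
        "messages": [{"role": "user", "content": prompt}],
    }
    return json.dumps(payload).encode("utf-8")


def chat(endpoint: str, model_id: str, prompt: str) -> str:
    """POST one chat turn and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{endpoint}/chat/completions",  # standard OpenAI-compatible route
        data=build_chat_request(model_id, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    try:
        # Assumed local endpoint and alias; adjust to your machine.
        print(chat("http://localhost:5273/v1", "phi-3.5-mini", "Why is the sky blue?"))
    except OSError:
        print("Foundry Local service not reachable at the assumed endpoint.")
```

Any OpenAI-compatible client works the same way; the Python SDK shown below simply automates endpoint and key discovery.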

Installation

  • Windows:
    winget install Microsoft.FoundryLocal
  • macOS:
    brew tap microsoft/foundrylocal  
    brew install foundrylocal

Example Usage

To run a model:

foundry model run phi-3.5-mini

This downloads the model to the local cache on first run, then starts an interactive chat session in your terminal.

Integration

You can use the Python SDK to interact with models:

from foundry_local import FoundryLocalManager
import openai

# Start the Foundry Local service (if needed) and load the model by alias.
manager = FoundryLocalManager("phi-3.5-mini")

# The local endpoint speaks the OpenAI protocol, so the standard client works.
client = openai.OpenAI(base_url=manager.endpoint, api_key=manager.api_key)

# Resolve the alias to the concrete model variant chosen for this hardware.
response = client.chat.completions.create(
    model=manager.get_model_info("phi-3.5-mini").id,
    messages=[{"role": "user", "content": "Why is the sky blue?"}]
)
print(response.choices[0].message.content)
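
For chat-style UIs you usually want tokens as they are generated rather than one final block; the OpenAI client supports this via stream=True. A sketch under the same assumptions as above (the collect_stream helper is illustrative, not part of either SDK, and the client setup is repeated so the snippet stands alone):

```python
# Hedged sketch: streaming a reply token-by-token through the OpenAI client.
# The model alias and setup mirror the non-streaming example above.

def collect_stream(chunks):
    """Join the text deltas of an OpenAI-style streaming response."""
    parts = []
    for chunk in chunks:
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
    return "".join(parts)


if __name__ == "__main__":
    try:
        # Third-party imports kept here so the helper above is usable
        # even when the SDKs or the local service are unavailable.
        import openai
        from foundry_local import FoundryLocalManager

        manager = FoundryLocalManager("phi-3.5-mini")
        client = openai.OpenAI(base_url=manager.endpoint, api_key=manager.api_key)
        stream = client.chat.completions.create(
            model=manager.get_model_info("phi-3.5-mini").id,
            messages=[{"role": "user", "content": "Why is the sky blue?"}],
            stream=True,
        )
        # Print each delta as it arrives instead of waiting for the full reply.
        for chunk in stream:
            if chunk.choices and chunk.choices[0].delta.content:
                print(chunk.choices[0].delta.content, end="", flush=True)
        print()
    except Exception as exc:  # SDK missing or service not running
        print(f"Streaming demo skipped: {exc}")
```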

Foundry Local is especially useful for developers building AI-powered apps that need offline operation, on-device data privacy, or low-latency inference. Because all processing stays on the machine, it suits scenarios where internet access is limited or the data is too sensitive to send to the cloud.