Azure AI Foundry Local
Azure AI Foundry Local is a runtime and development toolkit that lets you run generative AI models directly on your own device, without needing an Azure subscription or cloud access. It is part of Microsoft’s broader Azure AI Foundry ecosystem but is designed specifically for on-device inference.
Core Capabilities
- Local Model Execution: Run models like Phi-3.5, Qwen 2.5, Mistral, and DeepSeek R1 on Windows (x64/ARM) and macOS (Apple Silicon).
- Hardware Optimization: Automatically selects the best model variant for your hardware: CPU, GPU, NPU, or Apple Neural Engine.
- OpenAI-Compatible API: Exposes an OpenAI-compatible endpoint, so existing apps can connect through the familiar OpenAI SDKs (Python, JavaScript).
- CLI & SDK Tools: Includes a command-line interface (foundry) and SDKs for managing models, services, and the cache.
- Privacy & Performance: Keeps all data processing on-device, which avoids network round trips and cloud inference costs.
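To make the OpenAI-compatible API point concrete, here is a minimal sketch of a chat completion request built with only the Python standard library. It assumes the local service follows the usual OpenAI REST layout, with a /chat/completions route under the base URL. The base URL, port, model ID, and API key below are placeholders rather than values taken from Foundry Local, and build_chat_request is a hypothetical helper.

```python
import json
import urllib.request


def build_chat_request(base_url, model_id, prompt, api_key="placeholder-key"):
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model_id,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Placeholder values (assumptions): substitute the endpoint and model ID
# that your local service actually reports.
request = build_chat_request(
    "http://localhost:8080/v1", "example-model-id", "Why is the sky blue?"
)
print(request.get_method(), request.full_url)

# Sending needs a running service, so it is left commented out:
# with urllib.request.urlopen(request) as reply:
#     print(json.load(reply)["choices"][0]["message"]["content"])
```

Because the request shape is the same one the OpenAI SDKs produce, an app that already talks to an OpenAI-style endpoint should only need a different base URL to target the local service.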
Installation
- Windows:
    winget install Microsoft.FoundryLocal
- macOS:
    brew tap microsoft/foundrylocal
    brew install foundrylocal
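After installing, a script can confirm that the CLI is reachable before it tries to use it. The sketch below only checks whether an executable named foundry is on the PATH; find_foundry_cli is a hypothetical helper, and the check says nothing about whether the service itself is running.

```python
import shutil


def find_foundry_cli(command="foundry"):
    """Return the full path of the CLI executable, or None if it is not on PATH."""
    return shutil.which(command)


path = find_foundry_cli()
if path is None:
    print("foundry not found on PATH; check the installation steps above")
else:
    print(f"foundry found at {path}")
```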
Example Usage
To run a model:
    foundry model run phi-3.5-mini
This downloads the model and starts an interactive session in your terminal.
Integration
You can use the Python SDK to interact with models:
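    import openai
    from foundry_local import FoundryLocalManager

    # Model alias; Foundry Local picks the variant that suits your hardware.
    alias = "phi-3.5-mini"

    # Creating the manager prepares the local service and the requested model.
    manager = FoundryLocalManager(alias)

    # Point the OpenAI client at the local endpoint instead of the cloud.
    client = openai.OpenAI(base_url=manager.endpoint, api_key=manager.api_key)

    response = client.chat.completions.create(
        model=manager.get_model_info(alias).id,
        messages=[{"role": "user", "content": "Why is the sky blue?"}],
    )
    print(response.choices[0].message.content)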
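For longer answers, the OpenAI Python SDK can stream text as it is generated when you pass stream=True. The sketch below assumes the local endpoint supports streaming the same way the OpenAI API does, which is an assumption and not something the text above confirms. stream_reply is a hypothetical helper; it takes the client as a parameter so it can reuse the client and model ID from the example above.

```python
def stream_reply(client, model_id, prompt):
    """Print a streamed chat completion piece by piece and return the full text."""
    stream = client.chat.completions.create(
        model=model_id,
        messages=[{"role": "user", "content": prompt}],
        stream=True,  # ask for incremental chunks instead of one final message
    )
    parts = []
    for chunk in stream:
        # Some chunks carry no choices or no text (e.g. role-only or final chunks).
        if not chunk.choices:
            continue
        text = chunk.choices[0].delta.content
        if text:
            print(text, end="", flush=True)
            parts.append(text)
    print()
    return "".join(parts)


# Usage with the objects created earlier (requires a running local service):
# answer = stream_reply(
#     client, manager.get_model_info("phi-3.5-mini").id, "Why is the sky blue?"
# )
```

Passing the client in, instead of creating it inside the function, keeps the helper independent of how the endpoint was obtained and makes it easy to exercise with a stub.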
Foundry Local is especially useful for developers building AI-powered apps that need offline capability, data privacy, or low latency. Because inference runs without cloud infrastructure, it suits scenarios where internet access is limited or the data is sensitive.