FoundationModels: On-Device LLM (iOS 26)
Patterns for integrating Apple's on-device language model into apps using the FoundationModels framework. Covers text generation, structured output with @Generable, custom tool calling, and snapshot streaming — all running on-device for privacy and offline support.
When to Activate
- Building AI-powered features using Apple Intelligence on-device
- Generating or summarizing text without cloud dependency
- Extracting structured data from natural language input
- Implementing custom tool calling for domain-specific AI actions
- Streaming structured responses for real-time UI updates
- Requiring privacy-preserving AI (no data leaves the device)
Core Pattern — Availability Check
Always check model availability before creating a session:
```swift
import SwiftUI
import FoundationModels

struct GenerativeView: View {
    private var model = SystemLanguageModel.default

    var body: some View {
        switch model.availability {
        case .available:
            ContentView()
        case .unavailable(.deviceNotEligible):
            Text("Device not eligible for Apple Intelligence")
        case .unavailable(.appleIntelligenceNotEnabled):
            Text("Please enable Apple Intelligence in Settings")
        case .unavailable(.modelNotReady):
            Text("Model is downloading or not ready")
        case .unavailable(let other):
            Text("Model unavailable: \(other)")
        }
    }
}
```
Core Pattern — Basic Session
```swift
import FoundationModels

// Single-turn: create a new session each time
let session = LanguageModelSession()
let response = try await session.respond(to: "What's a good month to visit Paris?")
print(response.content)

// Multi-turn: reuse a session so the conversation keeps its context
let cookingSession = LanguageModelSession(instructions: """
    You are a cooking assistant.
    Provide recipe suggestions based on ingredients.
    Keep suggestions brief and practical.
    """)

let first = try await cookingSession.respond(to: "I have chicken and rice")
let followUp = try await cookingSession.respond(to: "What about a vegetarian option?")
```
Key points for instructions:
- Define the model's role ("You are a mentor")
- Specify what to do ("Help extract calendar events")
- Set style preferences ("Respond as briefly as possible")
- Add safety measures ("Respond with 'I can't help with that' for dangerous requests")
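Taken together, the four guidelines above can be folded into a single instructions string. The wording below is illustrative, not an Apple-recommended template:

```swift
import FoundationModels

// Role, task, style, and safety guidance combined in one instructions string.
let session = LanguageModelSession(instructions: """
    You are a mentor for new iOS developers.
    Help the user extract calendar events from text.
    Respond as briefly as possible.
    If a request is dangerous, respond with "I can't help with that."
    """)
let response = try await session.respond(to: "Lunch with Sam next Tuesday at noon")
```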
Core Pattern — Guided Generation with @Generable
Generate structured Swift types instead of raw strings:
1. Define a Generable Type
```swift
import FoundationModels

@Generable(description: "Basic profile information about a cat")
struct CatProfile {
    var name: String

    @Guide(description: "The age of the cat", .range(0...20))
    var age: Int

    @Guide(description: "A one sentence profile about the cat's personality")
    var profile: String
}
```
2. Request Structured Output
```swift
let response = try await session.respond(
    to: "Generate a cute rescue cat",
    generating: CatProfile.self
)

// Access structured fields directly
print("Name: \(response.content.name)")
print("Age: \(response.content.age)")
print("Profile: \(response.content.profile)")
```
Supported @Guide Constraints
- `.range(0...20)` — numeric range
- `.count(3)` — array element count
- `description:` — semantic guidance for generation
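A sketch of the `.count` constraint on an array property — the `Itinerary` type and its wording are hypothetical:

```swift
import FoundationModels

@Generable(description: "A short travel itinerary")
struct Itinerary {
    // .count(3) constrains the generated array to exactly three elements
    @Guide(description: "Exactly three activities for the trip", .count(3))
    var activities: [String]
}
```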
Core Pattern — Tool Calling
Let the model invoke custom code for domain-specific tasks:
```swift
import FoundationModels

struct RecipeSearchTool: Tool {
    let name = "recipe_search"
    let description = "Search for recipes matching a given term and return a list of results."

    @Generable
    struct Arguments {
        var searchTerm: String
        var numberOfResults: Int
    }

    func call(arguments: Arguments) async throws -> ToolOutput {
        // searchRecipes is your app's own lookup, shown here as a placeholder
        let recipes = await searchRecipes(
            term: arguments.searchTerm,
            limit: arguments.numberOfResults
        )
        return ToolOutput(recipes.map { "- \($0.name): \($0.description)" }.joined(separator: "\n"))
    }
}
```
Register the tool when creating the session; the model decides when to call it:
```swift
let session = LanguageModelSession(tools: [RecipeSearchTool()])
let response = try await session.respond(to: "Find me some pasta recipes")
```
Tool calls can fail; catch `LanguageModelSession.ToolCallError` to identify which tool failed and inspect the underlying error:
```swift
do {
    let answer = try await session.respond(to: "Find a recipe for tomato soup.")
} catch let error as LanguageModelSession.ToolCallError {
    print(error.tool.name)
    // RecipeSearchToolError is an error type your tool defines and throws
    if case .databaseIsEmpty = error.underlyingError as? RecipeSearchToolError {
        // Handle specific tool error
    }
}
```
Core Pattern — Snapshot Streaming
Stream structured responses for real-time UI with PartiallyGenerated types:
```swift
@Generable
struct TripIdeas {
    @Guide(description: "Ideas for upcoming trips")
    var ideas: [String]
}

let stream = session.streamResponse(
    to: "What are some exciting trip ideas?",
    generating: TripIdeas.self
)

for try await partial in stream {
    // partial: TripIdeas.PartiallyGenerated (all properties Optional)
    print(partial)
}
```
SwiftUI Integration
```swift
// `session` and `prompt` are properties defined elsewhere in the view
@State private var partialResult: TripIdeas.PartiallyGenerated?
@State private var errorMessage: String?

var body: some View {
    List {
        ForEach(partialResult?.ideas ?? [], id: \.self) { idea in
            Text(idea)
        }
    }
    .overlay {
        if let errorMessage { Text(errorMessage).foregroundStyle(.red) }
    }
    .task {
        do {
            let stream = session.streamResponse(to: prompt, generating: TripIdeas.self)
            for try await partial in stream {
                partialResult = partial
            }
        } catch {
            errorMessage = error.localizedDescription
        }
    }
}
```
Key Design Decisions
| Decision | Rationale |
|---|---|
| On-device execution | Privacy — no data leaves the device; works offline |
| 4,096-token context limit | On-device model constraint; chunk large data across sessions |
| Snapshot streaming (not deltas) | Structured-output friendly; each snapshot is a complete partial state |
| `@Generable` macro | Compile-time safety for structured generation; auto-generates the `PartiallyGenerated` type |
| Single request per session | `isResponding` prevents concurrent requests; create multiple sessions if needed |
| `response.content` (not `.output`) | Correct API — always access results via the `content` property |
Best Practices
- Always check `model.availability` before creating a session — handle all unavailability cases
- Use `instructions` to guide model behavior — they take priority over prompts
- Check `isResponding` before sending a new request — sessions handle one request at a time
- Access `response.content` for results — not `.output`
- Break large inputs into chunks — the 4,096-token limit applies to instructions + prompt + output combined
- Use `@Generable` for structured output — stronger guarantees than parsing raw strings
- Use `GenerationOptions(temperature:)` to tune creativity (higher = more creative)
- Monitor with Instruments — use Xcode Instruments to profile request performance
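A sketch combining the `isResponding` and temperature practices — the function, prompt, and temperature value are illustrative:

```swift
import FoundationModels

func generateName(using session: LanguageModelSession) async throws -> String? {
    // Sessions handle one request at a time; skip if one is already in flight.
    guard !session.isResponding else { return nil }

    // Higher temperature = more creative output; 1.5 is an illustrative value.
    let options = GenerationOptions(temperature: 1.5)
    let response = try await session.respond(
        to: "Suggest a playful name for a rescue cat",
        options: options
    )
    return response.content
}
```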
Anti-Patterns to Avoid
- Creating sessions without checking `model.availability` first
- Sending inputs exceeding the 4,096-token context window
- Attempting concurrent requests on a single session
- Using `.output` instead of `.content` to access response data
- Parsing raw string responses when `@Generable` structured output would work
- Building complex multi-step logic in a single prompt — break it into multiple focused prompts
- Assuming the model is always available — device eligibility and settings vary
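One way to stay under the context window is to process long input in pieces, one fresh session per chunk. This sketch approximates tokens by character count rather than a real tokenizer, and the chunk size and prompt are illustrative:

```swift
import FoundationModels

// Summarize a long document chunk by chunk so no single request
// approaches the 4,096-token limit. 8,000 characters is a rough
// heuristic, not a measured token count.
func summarize(_ document: String, chunkSize: Int = 8_000) async throws -> [String] {
    var summaries: [String] = []
    var remaining = Substring(document)
    while !remaining.isEmpty {
        let chunk = String(remaining.prefix(chunkSize))
        remaining = remaining.dropFirst(chunk.count)
        let session = LanguageModelSession()
        let response = try await session.respond(to: "Summarize briefly:\n\(chunk)")
        summaries.append(response.content)
    }
    return summaries
}
```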
When to Use
- On-device text generation for privacy-sensitive apps
- Structured data extraction from user input (forms, natural language commands)
- AI-assisted features that must work offline
- Streaming UI that progressively shows generated content
- Domain-specific AI actions via tool calling (search, compute, lookup)