This guide shows how to create an iOS Shortcut that records audio, sends it to your Agent CLI web service, and puts the cleaned transcription in your clipboard.
Prerequisites:
- Agent CLI Server Running: Your Agent CLI server must be running and reachable from your iPhone
- FFmpeg Installed: Required for local ASR, which converts the audio (iOS records in m4a format). Install it with `brew install ffmpeg` (macOS) or `sudo apt-get install ffmpeg` (Ubuntu/Debian), or download it from ffmpeg.org (Windows)
- OpenAI API Key: Configure your OpenAI API key in Agent CLI (only if using OpenAI)
- Network Access: Your iPhone needs network access to reach the server
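Server Setup:
- Install dependencies (quote `uvicorn[standard]` so shells like zsh don't treat the brackets as a glob pattern):
`pip install fastapi "uvicorn[standard]"`
- Start the server (binding to `0.0.0.0` makes it reachable from other devices on your network):
`agent-cli server --host 0.0.0.0 --port 61337`
- Test that the server is working:
`curl http://your-server-ip:61337/health`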
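If you'd rather check the server from a script than with `curl`, the following is a minimal Python sketch using only the standard library. It assumes `/health` returns JSON containing `"status": "healthy"`, as documented for the health endpoint later in this guide; `SERVER_URL`, `parse_health` and `check_health` are illustrative names, and the URL is a placeholder you must replace.

```python
import json
import urllib.request

# Placeholder: replace with your server's LAN IP or hostname.
SERVER_URL = "http://your-server-ip:61337"


def parse_health(payload: bytes) -> bool:
    """Return True if a /health response body reports "status": "healthy"."""
    try:
        data = json.loads(payload)
    except ValueError:
        # Covers invalid JSON (JSONDecodeError) and undecodable bytes.
        return False
    return isinstance(data, dict) and data.get("status") == "healthy"


def check_health(base_url: str = SERVER_URL, timeout: float = 5.0) -> bool:
    """GET {base_url}/health; False if unreachable, erroring, or unhealthy."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return parse_health(resp.read())
    except OSError:
        # urllib.error.URLError/HTTPError and socket timeouts all subclass OSError.
        return False
```

Call `check_health()` after substituting your own server address; it returns `True` once the server answers as healthy and `False` for connection errors, timeouts, or HTTP errors.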
Create the Shortcut:
- Open the Shortcuts app on your iPhone
- Tap the + button to create a new shortcut
Action 1: Record Audio
- Search for and add "Record Audio" action
- Configure:
- Start Recording: Immediately
- Stop Recording: When shortcut is run again (or set a time limit)
- Audio Quality: Choose based on your preference (higher quality produces larger files)
Action 2: Get Contents of URL
- Search for and add "Get Contents of URL" action
- Configure:
- URL: `http://YOUR_SERVER_IP:61337/transcribe`
- Method: POST
- Headers: Leave empty
- Request Body: Form
Action 3: Get Dictionary Value
- Search for and add "Get Dictionary Value" action
- Configure:
- Dictionary: Output from Get Contents of URL
- Get Value for: `cleaned_transcript` (or `raw_transcript` if you prefer the unprocessed text)
Action 4: Copy to Clipboard
- Search for and add "Copy to Clipboard" action
- Input: Use the text from the previous step
Action 5 (Optional): Show Notification
- Search for and add "Show Notification" action
- Configure:
- Title: "Transcription Complete"
- Body: Use the transcribed text
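If you script against the service instead of using the shortcut, the Get Dictionary Value step can be reproduced in code. This Python sketch assumes the `/transcribe` response shape documented in this guide (`raw_transcript`, `cleaned_transcript`, `success`, `error`). Unlike the shortcut, it also falls back to the other transcript key when the preferred one is missing or empty, and raises a hypothetical `TranscriptionError` when `success` is false.

```python
from typing import Any, Dict


class TranscriptionError(RuntimeError):
    """Hypothetical error type for a response with "success": false."""


def pick_transcript(response: Dict[str, Any], prefer_cleaned: bool = True) -> str:
    """Return the transcript text, like the Get Dictionary Value action.

    Tries the preferred key first, then falls back to the other one if the
    preferred value is missing or empty. Returns "" if both are empty.
    """
    if not response.get("success", False):
        raise TranscriptionError(response.get("error") or "unknown error")
    keys = ["cleaned_transcript", "raw_transcript"]
    if not prefer_cleaned:
        keys.reverse()
    for key in keys:
        value = response.get(key)
        if value:
            return str(value)
    return ""
```

In the shortcut itself, the equivalent of `prefer_cleaned=False` is simply entering `raw_transcript` as the key.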
Critical: Configure Form Data
- In the Get Contents of URL action, tap "Show More"
- Set Request Body to "Form"
- Tap "Add new field" to add the audio file:
- Key: `audio` (exactly, lowercase)
- Value: Select the "Audio" output from your Record Audio action
- Type: Make sure it's set to "File" (not "Text")
Optional Form Fields: Add these fields if needed by tapping "Add new field":
- Key: `cleanup`, Value: `true` or `false` (turns AI text cleanup on or off; the server defaults to `true`)
- Key: `extra_instructions`, Value: Custom instructions for processing
Important:
- The audio field name must be exactly `audio` (lowercase; the name is case-sensitive)
- The audio field type must be set to "File" (not "Text")
- Form fields must be configured manually; iOS doesn't add them automatically
Common Issues:
- ❌ Field named "Audio" (uppercase) - won't work
- ❌ Field type set to "Text" - won't work
- ❌ No form fields configured - returns a 422 error
- ✅ Field named "audio" with type "File" - works correctly
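To rule out problems on the iPhone side, you can send the same form from a computer. This Python sketch (standard library only) builds the multipart body by hand, with a file field named exactly `audio` plus the optional `cleanup` and `extra_instructions` text fields. The `audio/mp4` content type for m4a files, the 120-second timeout, and the helper names `build_multipart` and `transcribe_file` are illustrative assumptions, not part of Agent CLI.

```python
import json
import urllib.request
import uuid
from pathlib import Path
from typing import Any, Dict, Optional, Tuple


def build_multipart(
    audio_bytes: bytes,
    filename: str = "recording.m4a",
    cleanup: Optional[bool] = None,
    extra_instructions: Optional[str] = None,
) -> Tuple[bytes, str]:
    """Encode the form the shortcut sends; returns (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    parts = []

    # Optional text fields, equivalent to "Text" form fields in Shortcuts.
    text_fields = {}
    if cleanup is not None:
        text_fields["cleanup"] = "true" if cleanup else "false"
    if extra_instructions:
        text_fields["extra_instructions"] = extra_instructions
    for name, value in text_fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )

    # The file field: named exactly "audio" (lowercase) and carrying a
    # filename, which is what marks it as a file upload rather than text.
    header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="audio"; filename="{filename}"\r\n'
        "Content-Type: audio/mp4\r\n\r\n"
    )
    parts.append(header.encode("utf-8") + audio_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe_file(path: str, base_url: str, **fields: Any) -> Dict[str, Any]:
    """POST a local recording to {base_url}/transcribe and return the JSON reply."""
    audio = Path(path)
    body, content_type = build_multipart(audio.read_bytes(), audio.name, **fields)
    request = urllib.request.Request(
        f"{base_url}/transcribe",
        data=body,
        method="POST",
        headers={"Content-Type": content_type},
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())
```

For example, `transcribe_file("recording.m4a", "http://your-server-ip:61337", cleanup=True)` should behave like the shortcut; if it works from a computer but the shortcut does not, the problem is in the shortcut's form configuration.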
Save and Test:
- Name your shortcut (e.g., "Voice to Text")
- Tap "Done" to save
- Run the shortcut to test it
- Grant microphone permissions when prompted
Add to Home Screen:
- In the Shortcuts app, open your shortcut
- Tap the shortcut's name at the top of the editor (or the Share button) to open its menu
- Tap "Add to Home Screen"
Add to Control Center:
- Go to Settings > Control Center
- Add "Shortcuts" if not already added
- Your shortcut will be available in Control Center
Troubleshooting:
"Could not connect to server"
- Verify the server is running: `curl http://your-server-ip:61337/health`
- Check the firewall settings on the server (port 61337 must accept incoming connections)
- Ensure the iPhone and server are on the same network (or the server is publicly accessible)
"No audio recorded"
- Grant microphone permissions to Shortcuts app
- Check audio recording settings in the Record Audio action
"Get Contents of File not available"
- This action was removed in newer iOS versions
- The recorded audio is automatically passed between actions as a variable
- Simply use the output from "Record Audio" directly in "Get Contents of URL"
"Transcription failed"
- If you use OpenAI, verify that your API key is configured in Agent CLI
- Check server logs for error messages
- Ensure audio file format is supported (wav, mp3, m4a, etc.)
"Empty response"
- Check if the audio was too short or silent
- Verify the Get Dictionary Value action is looking up the right key (`cleaned_transcript` or `raw_transcript`)
"422 Unprocessable Content" Error
- This means the form fields are not configured correctly
- Make sure you've added the `audio` field in the Request Body's Form section
- The `audio` field must be of type "File", not "Text"
- The field name must be exactly `audio` (lowercase)
- Check the server logs for specific error details
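Because the server is built on FastAPI, a 422 response body most likely uses FastAPI's default validation format: a JSON `detail` list whose entries carry a `loc` path and a `msg`. Assuming Agent CLI keeps that default handler, a missing audio field typically appears with `loc` `["body", "audio"]`. This Python sketch (the name `describe_422` is illustrative) turns such a body into readable lines.

```python
import json
from typing import List


def describe_422(body: bytes) -> List[str]:
    """Format a FastAPI-style 422 body as lines like 'body.audio: Field required'."""
    try:
        detail = json.loads(body).get("detail", [])
    except (ValueError, AttributeError):
        # Not a JSON object: show the raw body instead.
        return [body.decode("utf-8", errors="replace")]
    if not isinstance(detail, list):
        return [str(detail)]
    lines = []
    for err in detail:
        if not isinstance(err, dict):
            lines.append(str(err))
            continue
        loc = ".".join(str(part) for part in err.get("loc", []))
        lines.append(f"{loc}: {err.get('msg', '')}")
    return lines
```

With `urllib`, catch `urllib.error.HTTPError`, check `exc.code == 422`, and pass `exc.read()` to `describe_422`.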
"FFmpeg not found" Error
- Install FFmpeg on the machine running the server; local ASR needs it to convert the audio
- macOS: `brew install ffmpeg`
- Linux (Ubuntu/Debian): `sudo apt-get install ffmpeg`
- Alternative: Use OpenAI ASR instead (set `asr-provider = "openai"` in the config)
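To confirm that FFmpeg is on the `PATH` and can decode an iPhone recording, you can run a quick check from the same environment as the server. This Python sketch uses `shutil.which` to locate the binary and builds a standard ffmpeg command that converts `.m4a` to 16 kHz mono WAV. That target format is an assumption (common for Whisper-style ASR), not a description of what Agent CLI does internally, and the function names are illustrative.

```python
import shutil
import subprocess
from typing import List, Optional


def find_ffmpeg() -> Optional[str]:
    """Return the full path of the ffmpeg binary on PATH, or None if missing."""
    return shutil.which("ffmpeg")


def m4a_to_wav_command(src: str, dst: str) -> List[str]:
    """Build an ffmpeg command: overwrite output, resample to 16 kHz, downmix to mono."""
    return ["ffmpeg", "-y", "-i", src, "-ar", "16000", "-ac", "1", dst]


def convert(src: str, dst: str) -> None:
    """Run the conversion; raises if ffmpeg is missing or fails to decode src."""
    if find_ffmpeg() is None:
        raise FileNotFoundError("ffmpeg not found on PATH; install it first")
    # check=True raises CalledProcessError if ffmpeg exits with an error.
    subprocess.run(m4a_to_wav_command(src, dst), check=True, capture_output=True)
```

Running `convert("recording.m4a", "recording.wav")` on a file recorded by the shortcut is a quick way to tell a missing or broken FFmpeg install apart from a server-side problem.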
Config File Example (`~/.config/agent-cli/config.toml`):
```toml
[defaults]
# For transcription with Wyoming/FasterWhisper (local)
asr-provider = "wyoming"
asr-wyoming-ip = "localhost"
asr-wyoming-port = 10300
# For LLM cleanup (can use Ollama, OpenAI, or Gemini)
llm-provider = "ollama"
llm-ollama-model = "llama3"
llm-ollama-host = "http://localhost:11434"
# If using OpenAI for transcription or LLM:
# openai-api-key = "your-api-key-here"

[transcribe]
llm = true
clipboard = false  # Disabled for web service
extra-instructions = "Your custom cleanup instructions here"
```
Voice Activation:
- Siri can run the shortcut by its name: say "Hey Siri, Voice to Text" (or whatever you named it)
- To use a different phrase, such as "Transcribe this", rename the shortcut
Conditional Processing:
- Add "If" actions to handle different response cases
- Show different notifications based on success/failure
Text Processing:
- Add text manipulation actions after transcription
- Format text, convert case, etc.
Alternative: Save Recording First
If you want to save the audio file:
- After "Record Audio", add "Save to Files" action
- Choose location (e.g., iCloud Drive/Recordings/)
- Name: `Recording-{Current Date}`
- Add "Get File" action to retrieve the saved file
- Use this file in "Get Contents of URL"
API Reference:
Endpoint: `POST /transcribe`
Request:
```
Content-Type: multipart/form-data
audio: <audio file>
cleanup: true/false (optional, default: true)
extra_instructions: <string> (optional)
```
Response:
```json
{
  "raw_transcript": "original transcription",
  "cleaned_transcript": "cleaned and formatted text",
  "success": true,
  "error": null
}
```
Endpoint: `GET /health`
Response:
```json
{
  "status": "healthy",
  "version": "1.0.0"
}
```
Security Considerations:
- Network Security: Use HTTPS in production
- API Key Protection: Keep OpenAI API key secure
- Access Control: Consider adding authentication to your API
- Firewall: Only expose necessary ports
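The setup in this guide sends no credentials (the shortcut's Headers are left empty). If you add authentication, for example in a small wrapper or reverse proxy you control, a shared bearer token is one simple option; the shortcut would then send an `Authorization` header from Get Contents of URL. This Python sketch of the token check alone assumes the `Bearer <token>` format and uses the illustrative name `is_authorized`; it compares with `hmac.compare_digest` to avoid leaking timing information.

```python
import hmac
from typing import Optional


def is_authorized(auth_header: Optional[str], expected_token: str) -> bool:
    """Check an 'Authorization: Bearer <token>' header against a shared secret."""
    if not auth_header or not expected_token:
        return False
    scheme, _, token = auth_header.partition(" ")
    if scheme.lower() != "bearer" or not token:
        return False
    # Constant-time comparison so response timing doesn't reveal the token.
    return hmac.compare_digest(token.encode("utf-8"), expected_token.encode("utf-8"))
```

In a FastAPI app you could call this from a dependency that reads the `Authorization` header and raises `HTTPException(status_code=401)` when it returns `False`.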
Next Steps:
- Set up HTTPS with SSL/TLS certificates for production use
- Add authentication to the API endpoint
- Configure automatic server startup
- Create multiple shortcuts for different transcription scenarios