Audio Playback & Streaming
Learn how to play sounds and stream microphone audio using Swift and GStreamer on WendyOS
Source Code: The complete source code for this example is available at github.com/wendylabsinc/samples/swift/audio
In this guide, we'll build an audio application that demonstrates two key capabilities:
- Audio Playback: Triggering sound effects on the device from a web interface.
- Microphone Streaming: Capturing live audio from the device's microphone and streaming it to a web client for visualization and playback.
The example shows how to use GStreamer with Swift to build multimedia pipelines on embedded Linux.
Prerequisites
- Wendy CLI installed
- Swift 6.2 or later installed via swiftly (Xcode's Swift is not supported)
- A WendyOS device with a speaker and microphone (or a USB audio interface)
Recommended Hardware: For the best experience, we recommend using a USB speakerphone like the Anker PowerConf plugged into your NVIDIA Jetson via USB. It provides high-quality audio capture and playback in a single device.
Setting Up Your Project
Initialize the Project
wendy init audio --target wendyos --language swift --template audio --var APP_ID=audio --var PORT=6004 --var SWIFT_VERSION=6.3 --assistant skip --git-init no
cd audio
The template creates the Wendy config, Dockerfile, frontend, and Swift backend files with the audio entitlement already wired. The sections below explain the generated project.
Run on WendyOS
wendy run
Wendy will build the app, ask you to select a device if one is not already configured, deploy the app, and print the app URL.
Code Breakdown
Project Structure
audio/
├── Dockerfile
├── wendy.json
├── frontend/              # React + Vite frontend
│   └── src/
│       └── App.tsx        # Audio visualizer & controls
└── server/                # Swift backend
    ├── Package.swift
    └── Sources/
        └── audio-server/
            ├── main.swift
            └── sounds/    # WAV files
Setting Up the Backend
The backend uses Hummingbird for the HTTP/WebSocket server and a Swift wrapper around GStreamer for audio processing.
1. Package Dependencies
In server/Package.swift, we include the GStreamer Swift wrapper:
dependencies: [
    .package(url: "https://github.com/hummingbird-project/hummingbird.git", from: "2.0.0"),
    .package(url: "https://github.com/hummingbird-project/hummingbird-websocket.git", from: "2.0.0"),
    .package(url: "https://github.com/wendylabsinc/gstreamer.git", from: "0.0.3"),
],
2. Audio Playback Pipeline
To play a sound, we construct a GStreamer pipeline that reads a file, parses the WAV format, converts it to the correct audio format, and sends it to the default audio sink (speaker).
func playSound(_ soundName: String, soundsPath: String) async -> PlayResponse {
    let soundFile = "\(soundsPath)/\(soundName).wav"
    // GStreamer pipeline description
    let pipelineDesc = """
        filesrc location=\(soundFile) ! \
        wavparse ! \
        audioconvert ! \
        audioresample ! \
        autoaudiosink
        """
    do {
        let pipeline = try Pipeline(pipelineDesc)
        try pipeline.play()
        // Wait for End of Stream (EOS)
        for await message in pipeline.bus.messages() {
            if case .eos = message {
                pipeline.stop()
                return PlayResponse(success: true, sound: soundName, error: nil)
            }
        }
    } catch {
        return PlayResponse(success: false, sound: nil, error: "\(error)")
    }
    return PlayResponse(success: false, sound: nil, error: "Unknown error")
}
3. Microphone Streaming Pipeline
To stream audio, we capture from the microphone (using ALSA, PulseAudio, or PipeWire), convert it to raw PCM data, and send it to an appsink where our Swift code can read the buffers.
func handleMicrophoneWebSocket(inbound: WebSocketInboundStream, outbound: WebSocketOutboundWriter) async {
    // Pipeline: Capture -> Convert -> Resample -> Raw PCM (16kHz, Mono) -> AppSink
    let pipelineDesc = """
        autoaudiosrc ! \
        audioconvert ! \
        audioresample ! \
        audio/x-raw,format=S16LE,rate=16000,channels=1 ! \
        appsink name=sink
        """
    guard let pipeline = try? Pipeline(pipelineDesc),
          let sink = try? pipeline.audioSink(named: "sink") else {
        return
    }
    try? pipeline.play()
    defer { pipeline.stop() }
    // Stream buffers to the client
    for await buffer in sink.buffers() {
        let data = extractAudioBytes(from: buffer)
        let base64Data = data.base64EncodedString()
        // Send JSON message to client
        let json = """
            {\"type\":\"audio\",\"data\":\"\(base64Data)\",\"sampleRate\":16000,\"channels\":1}
            """
        try? await outbound.write(.text(json))
    }
}
Frontend Implementation
The frontend is a React application that connects to the WebSocket to receive audio data. It uses the Web Audio API to play the streamed audio and draws a visualization on a canvas.
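The backend sends each chunk as base64-encoded 16-bit little-endian PCM, so the client has to turn it into floating-point samples before the Web Audio API can play it. The following is a minimal sketch of that conversion and of back-to-back scheduling, not the sample's actual code: `AudioMessage`, `decodePcm16`, and `scheduleChunk` are hypothetical names introduced here for illustration. Only the message fields (`type`, `data`, `sampleRate`, `channels`) come from the JSON shown in the backend.

```typescript
// Shape of each WebSocket message, based on the JSON the backend sends.
interface AudioMessage {
  type: "audio";
  data: string;        // base64-encoded S16LE PCM
  sampleRate: number;  // 16000 in this sample
  channels: number;    // 1 in this sample
}

// Hypothetical helper: decode base64 S16LE PCM into floats in [-1, 1).
function decodePcm16(base64: string): Float32Array {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  // DataView reads little-endian explicitly, independent of host byte order.
  const view = new DataView(bytes.buffer);
  const sampleCount = Math.floor(bytes.length / 2);
  const samples = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    samples[i] = view.getInt16(i * 2, true) / 32768;
  }
  return samples;
}

// Hypothetical helper: choose a start time so chunks play back to back.
// If playback has fallen behind (nextPlayTime is in the past), restart from now.
function scheduleChunk(
  currentTime: number,
  nextPlayTime: number,
  durationSec: number
): { startAt: number; nextPlayTime: number } {
  const startAt = Math.max(currentTime, nextPlayTime);
  return { startAt, nextPlayTime: startAt + durationSec };
}
```

In the browser, the decoded samples could be copied into a buffer created with `audioContext.createBuffer(1, samples.length, message.sampleRate)` using `copyToChannel(samples, 0)`, and the source started at `startAt` with `audioContext.currentTime` passed in as `currentTime`.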
// Connect to WebSocket
const ws = new WebSocket(`ws://${window.location.host}/ws/microphone`);
ws.onmessage = async (event) => {
  const message = JSON.parse(event.data);
  if (message.type === "audio") {
    // Decode base64
    const binaryString = atob(message.data);
    // Convert to Int16 samples
    // ...
    // Play using Web Audio API
    const source = audioContext.createBufferSource();
    source.buffer = audioBuffer;
    source.connect(audioContext.destination);
    source.start(nextPlayTime);
  }
};
Docker Configuration
Working with audio requires system-level dependencies. The Dockerfile installs GStreamer development files for building and runtime libraries for the final image.
# Build Stage
FROM swift:6.2.3-noble AS swift-builder
RUN apt-get update && apt-get install -y \
    libgstreamer1.0-dev \
    libgstreamer-plugins-base1.0-dev \
    # ... other plugins
# Runtime Stage
FROM swift:6.2.3-noble-slim
RUN apt-get update && apt-get install -y \
    libgstreamer1.0-0 \
    gstreamer1.0-plugins-base \
    gstreamer1.0-plugins-good \
    gstreamer1.0-alsa \
    gstreamer1.0-pulseaudio \
    alsa-utils
# Copy sounds
COPY server/Sources/audio-server/sounds ./sounds
Entitlements
To access the microphone and speaker, the application needs the audio entitlement in wendy.json:
{
  "appId": "com.example.swift-audio",
  "version": "0.0.1",
  "entitlements": [
    {
      "type": "network",
      "mode": "host"
    },
    {
      "type": "audio"
    }
  ],
  "readiness": {
    "tcpSocket": { "port": 3005 },
    "timeoutSeconds": 30
  },
  "hooks": {
    "postStart": {
      "cli": "wendy utils open-browser http://${WENDY_HOSTNAME}:3005"
    }
  }
}
The readiness probe waits for port 3005 to accept connections. The postStart hook automatically opens the web interface in your browser.
Run Again on WendyOS
- Connect your WendyOS device.
- Run the application:
wendy run
- Your browser will open automatically once the app is ready. If it doesn't, navigate to http://<device-hostname>.local:3005.
You should be able to click buttons to play sounds on the device and toggle the microphone to see the waveform of the audio captured by the device.
Troubleshooting Audio
If audio isn't working:
- Check Hardware: Ensure your microphone/speaker is selected in the system settings or properly connected via USB.
- Check Logs: The container logs will show any GStreamer errors:
wendy device logs
- ALSA Devices: The app attempts to auto-detect ALSA devices. You can override this by setting the AUDIO_DEVICE environment variable (e.g., hw:1,0).
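In ALSA's `hw:<card>,<device>` naming, `hw:1,0` refers to device 0 on card 1. Where alsa-utils is available, `aplay -l` and `arecord -l` list the card and device indices. As a small illustration of the format, the sketch below validates a value before it is used. `parseHwDevice` is a hypothetical helper, not part of the sample, and it only handles the numeric form, so named cards and other ALSA device strings return `null`.

```typescript
// Hypothetical helper: parse the numeric "hw:<card>,<device>" form of an ALSA device name.
// Returns null for anything else (named cards, "plughw:...", "default", and so on).
function parseHwDevice(value: string): { card: number; device: number } | null {
  const match = /^hw:(\d+),(\d+)$/.exec(value.trim());
  if (!match) return null;
  return { card: Number(match[1]), device: Number(match[2]) };
}
```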