November 12, 2024 · 3 min read · Web Audio API · WebSockets · Node.js · Real-time

How I synced audio playback across players using the Web Audio API

When I started building SoundTheGame, I assumed audio playback would be the easy part. Load a clip, play it, done. It took about three days of debugging to realise I was completely wrong.

The problem

In a multiplayer music guessing game, all players in a room need to hear the clip at the same time. If one player hears it 300ms before another, the game is unfair. That 300ms is the difference between winning and losing a round.

The naive approach — emit a WebSocket event and call audio.play() on each client when the event arrives — doesn't work. Network latency varies between clients. Player A might be on a 10ms connection, player B on a 200ms one. They won't hear the clip simultaneously.

The solution: schedule playback, don't trigger it

The Web Audio API exposes AudioContext.currentTime, a high-resolution clock that starts when the context is created. The key insight is:

Instead of saying "play now", say "play at time T".

Here's the approach:

// Server emits a scheduled play time (in server clock ms)
io.to(roomId).emit("play", { serverPlayAt: Date.now() + 500 });
 
// Client converts server time to AudioContext time
function schedulePlayback(serverPlayAt: number, offset: number) {
  // offset = estimated (server clock - client clock) difference in ms,
  // measured at room join; subtracting it maps the server timestamp
  // onto this client's clock
  const localPlayAt = serverPlayAt - offset;
  const delayMs = localPlayAt - Date.now();
  const audioContextPlayAt = audioContext.currentTime + delayMs / 1000;
 
  const source = audioContext.createBufferSource();
  source.buffer = audioBuffer;
  source.connect(audioContext.destination);
  source.start(audioContextPlayAt);
}

The offset parameter accounts for the clock difference between server and client, which I measure with a simple ping/pong handshake at room join.
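Since this conversion is the piece that's easiest to get sign-wrong, here's the arithmetic isolated as a pure function (a hypothetical helper for illustration, not from the game's code — the clock readings are passed in explicitly so nothing depends on browser APIs):

```typescript
// Map a server-clock timestamp (ms) to an AudioContext time (seconds).
// offset is the estimated (server clock - client clock) difference in ms.
function toContextTime(
  serverPlayAt: number,
  offset: number,
  clientNowMs: number,   // Date.now() on the client
  contextNowSec: number, // audioContext.currentTime
): number {
  const delayMs = serverPlayAt - offset - clientNowMs;
  return contextNowSec + delayMs / 1000;
}
```

For example, with a server play time 500ms ahead, an offset of 100ms, and a client clock at 1000ms, a context clock at 2s schedules playback at 2.4s.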

Clock synchronisation

The tricky part is that Date.now() on the client is not the same as Date.now() on the server. Clocks drift. To compensate:

// Client sends a ping with its local timestamp
socket.emit("ping", { clientSentAt: Date.now() });
 
// Server echoes it back with server time
socket.on("ping", ({ clientSentAt }) => {
  socket.emit("pong", { clientSentAt, serverNow: Date.now() });
});
 
// Client estimates offset
socket.on("pong", ({ clientSentAt, serverNow }) => {
  const rtt = Date.now() - clientSentAt;
  const serverOffset = serverNow - (clientSentAt + rtt / 2);
  // serverOffset is added to all subsequent server timestamps
});

This is a simplified version of the NTP algorithm. For a game, accuracy within ±50ms is more than enough.
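One refinement worth considering (an extension of the handshake above, not something the original code does): take several ping/pong samples and keep the offset from the sample with the lowest RTT, since the rtt/2 symmetry assumption is least distorted on the fastest round trip. A sketch, with hypothetical names:

```typescript
interface OffsetSample {
  rtt: number;    // measured round-trip time in ms
  offset: number; // serverNow - (clientSentAt + rtt / 2), as above
}

// Pick the offset estimated from the fastest round trip: the less time
// the packets spent queued in the network, the closer rtt / 2 is to the
// true one-way latency, so that sample's offset is the most trustworthy.
function bestOffset(samples: OffsetSample[]): number {
  if (samples.length === 0) throw new Error("no samples");
  let best = samples[0];
  for (const s of samples) {
    if (s.rtt < best.rtt) best = s;
  }
  return best.offset;
}
```

A handful of samples at room join (say, five pings spaced 100ms apart) is usually enough to land well inside the ±50ms budget.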

Pre-buffering

There's one more issue: if the audio isn't loaded when the play event arrives, you still get a delay while the browser fetches and decodes it.

The fix is to pre-fetch and decode all clips at room join time using fetch + AudioContext.decodeAudioData:

async function preloadClip(url: string): Promise<AudioBuffer> {
  const response = await fetch(url);
  if (!response.ok) throw new Error(`Failed to fetch clip: ${response.status}`);
  const arrayBuffer = await response.arrayBuffer();
  return audioContext.decodeAudioData(arrayBuffer);
}

Once decoded, the buffer lives in memory and plays instantly when scheduled.
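At room join, the preloads can run in parallel, with the decoded buffers kept in a map keyed by clip URL so the "play" handler can look them up instantly. A sketch (the `decode` parameter is a stand-in for `preloadClip` above, kept generic here so the bookkeeping is separate from the browser APIs):

```typescript
// Preload every clip up front and index the results by URL.
async function preloadAll<T>(
  urls: string[],
  decode: (url: string) => Promise<T>,
): Promise<Map<string, T>> {
  // Promise.all preserves input order, so urls[i] pairs with buffers[i]
  const buffers = await Promise.all(urls.map(decode));
  return new Map(urls.map((url, i) => [url, buffers[i]]));
}
```

In the game this would be something like `await preloadAll(clipUrls, preloadClip)`. Note that one failed fetch rejects the whole `Promise.all`, so per-clip error handling (retry, or drop the clip from the round) is worth adding in practice.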

What I learned

The Web Audio API is powerful but low-level. It doesn't abstract timing for you — that's intentional. The currentTime clock is designed precisely for sample-accurate scheduling, which is exactly what you need for synchronised playback.

The pattern of "schedule in the future instead of triggering immediately" is also useful beyond audio: it's how video players sync subtitles, how games sync animations across clients, and how collaborative editors handle conflict resolution.

If you're building anything real-time in the browser, learn this API. It'll change how you think about time.