A year or two ago I bought a pair of Sony WH-1000XM3 noise-cancelling headphones, and as an introvert I found them absolutely life-changing. The noise cancellation is fantastic, and they're also quite nifty, with touch controls built into the right cup: stroke up and down to raise and lower the volume, double-tap to toggle play/pause, and stroke forwards or backwards to skip to the next or previous track.
At work I usually have some mindless background noise playing from SoundCloud in a browser tab. That does the job against office noise, but it's slightly inconvenient to have to switch to that tab just to pause it.
Recently I learned that browsers should support media keys out of the box, just like a regular media player, without you having to switch to the tab! I'm a hair-shirt-wearing Linux user, however, and occasionally you don't get the just-works integration with every new bit of kit that users of other OSes may enjoy. So I wasn't too surprised that it didn't work for me, and forgot about it. Until, that is, I was booted into Windows and suddenly realised that there it did just work, which gave me some motivation to investigate.
Anyway, the short version is: it more or less does just work™, but you need to decide how to wire it up. There are two parts to the puzzle: MPRIS and AVRCP.
MPRIS is the standard interface for media players, including, in this instance, web browsers. It stands for "Media Player Remote Interfacing Specification" and is a D-Bus interface. The Arch Linux docs are, once again, a useful resource.
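To make the D-Bus side a little more concrete, here is a minimal sketch of the kind of raw method call an MPRIS client sends, using dbus-send. The function name mpris_call is hypothetical, and the player name you pass is an assumption: it is whatever follows "org.mpris.MediaPlayer2." on your session bus, which varies by browser and instance (playerctl -l, below, will list them).

```shell
#!/bin/sh
# Minimal sketch: call a method on one player's MPRIS interface via dbus-send.
#   $1  player name, i.e. the part of the bus name after "org.mpris.MediaPlayer2."
#       (an assumption here; `playerctl -l` lists what is actually registered)
#   $2  one of the org.mpris.MediaPlayer2.Player methods PlayPause, Next, Previous
#       (this sketch deliberately accepts only those three)
mpris_call() {
    case "$2" in
        PlayPause|Next|Previous) ;;
        *) echo "mpris_call: unsupported method '$2'" >&2; return 2 ;;
    esac
    # Every MPRIS player exposes the same object path and interface;
    # only the destination bus name differs from player to player.
    dbus-send --session --type=method_call \
        --dest="org.mpris.MediaPlayer2.$1" \
        /org/mpris/MediaPlayer2 \
        "org.mpris.MediaPlayer2.Player.$2"
}
```

After sourcing this, something like mpris_call firefox.instance1234 PlayPause (a made-up instance name) should toggle that player, whether or not its tab is focused.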
To see it in action, you just need something that speaks the client side of the protocol. I picked playerctl, which is available in the Ubuntu repositories.
Start some music playing in a browser tab (it doesn't have to be the selected tab) and run playerctl play-pause to toggle between playing and paused.
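When more than one player is running (two browsers, say), playerctl picks one for you by default. A minimal sketch of targeting a specific one with playerctl's --player option; the helper name toggle_player is hypothetical, and it assumes the prefix you pass (e.g. "firefox") matches a name printed by playerctl -l.

```shell
#!/bin/sh
# Toggle play/pause on one specific MPRIS player.
#   $1  prefix of the player name as printed by `playerctl -l`
toggle_player() {
    # `playerctl -l` prints one player name per line; keep the first match.
    name=$(playerctl -l 2>/dev/null | grep -m1 "^$1") || {
        echo "toggle_player: no player matching '$1'" >&2
        return 1
    }
    playerctl --player="$name" play-pause
}
```

playerctl next and playerctl previous take the same --player option, so the same pattern covers skipping tracks.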
The other half of the equation is the headphones themselves. I spent a bit of time expecting to have to dig into dark corners to hook them up, but it turns out that they implement something called AVRCP, another standard (a Bluetooth profile, in this case), which stands for "Audio/Video Remote Control Profile".
I hadn't heard of it, but it turns out that the headphones' gesture controls I mentioned in the intro already generate media key events! You can test this by running xev (with the headphones connected, of course) and double-tapping, stroking, etc. For instance, double-tapping produces a key press and release for a media key, and stroking forward does the same for the next-track key; the keycodes that show up are the ones used in my config below.
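xev is very chatty (pointer motion, focus changes and so on), so it can help to filter its output. A minimal sketch, assuming xev's usual layout where each "KeyPress event" header is followed by an indented line containing "keycode N (keysym 0x..., Name)"; media_keys is a hypothetical helper name.

```shell
#!/bin/sh
# Reduce xev output to "keycode keysym" lines, one per key press.
# Usage (headphones connected): xev -event keyboard | media_keys
media_keys() {
    awk '
        /^KeyPress event/   { pressed = 1; next }   # start of a press event
        /^KeyRelease event/ { pressed = 0; next }   # ignore releases
        pressed && /keycode/ {
            sub(/.*keycode /, "")    # -> "171 (keysym 0x1008ff17, XF86AudioNext), ..."
            code = $1                # -> "171"
            sub(/^[^,]*, /, "")      # -> "XF86AudioNext), same_screen YES,"
            sub(/[)].*/, "")         # -> "XF86AudioNext"
            print code, $0
            fflush()                 # flush so each press appears immediately
            pressed = 0
        }
    '
}
```

Only press events are printed, so each tap or stroke shows up once rather than twice.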
Wiring it all up
With those in place, it pretty much does just work: you just need to decide what to do with those keys. You have plenty of options here, but I went with xbindkeys, which is simple to set up and even offers a GUI configuration tool if you want one. I bind play/pause and the next/previous events (the headphones have their own internal volume control, which is fine for my purposes).
The relevant bit from my config:
m:0x0 + c:209
m:0x0 + c:171
m:0x0 + c:173
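For reference, each entry in ~/.xbindkeysrc is a command in double quotes, followed on the next line by the key it is bound to (m:0x0 is the modifier mask, here no modifiers, and c: is the keycode). A minimal sketch that generates stanzas for the three keycodes above; the playerctl commands are my assumption about what to bind (any command works), and binding and media_bindings are hypothetical helper names.

```shell
#!/bin/sh
# Print one xbindkeys stanza: the quoted command, then the indented key spec.
#   $1  command to run
#   $2  keycode (no modifiers)
binding() {
    printf '"%s"\n    m:0x0 + c:%s\n' "$1" "$2"
}

# Stanzas for the headphone keycodes; the playerctl commands are an assumption.
media_bindings() {
    binding "playerctl play-pause" 209
    binding "playerctl next" 171
    binding "playerctl previous" 173
}
```

Something like media_bindings >> ~/.xbindkeysrc, followed by restarting xbindkeys, would then pick the bindings up.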