Why hasn't this been done?
One aspect of old-fashioned broadcasting that hasn't been transferred to the Web is volume normalization. Since the 1930s, radio and TV have used analog compression systems that continuously keep the volume within a constant range. You can set your volume to a comfortable level on one station, and you can be confident that you won't get blasted out of your seat. You can tune to other local stations without changing the level.
Those volume equalization circuits are fairly simple, and real-time digital processing is also possible with FIR filter techniques. So why hasn't any of this moved into computer sound systems? On the Web, all stations are local. There's no such thing as DX. Yet all stations, all MP3 clips, and all YouTube clips have totally different and independent level settings, with no equalization at all. Some MP3 players claim to have a sort of normalization, but it doesn't work like broadcast compressors. It merely sets a single volume level that compensates for the RMS average of the ENTIRE CLIP. Nearly pointless.
In the newer realm of HTML5, this should be a job for media queries. The CSS specs do include a limited set of volume properties, which might be enough for your 'receiver' to tell the 'transmitter' about your preferred volume. Pretty much the same as the normalizer on MP3s.
Again nearly pointless, unless there's an agreed-on standard level.
= = = = =
The difference may not be familiar. Here's an attempt to show it, using the processing facilities of Audacity.
Wave 1 is the original.
Wave 2 shows how the 'usual' normalizer might see the RMS average of the whole clip.
Wave 3 shows the result of 'usual' normalizing, multiplying the whole clip by a constant to bring the RMS average up to some desired number. Note that everything moves up together; the soft parts are still too soft, and the loud parts are now clipped. There's still no way to set your final volume control for comfort.
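To make the 'usual' method concrete, here is a minimal Python sketch of whole-clip RMS normalizing. It is only an illustration, not the code of any particular MP3 player; the function names, the target level of 0.75, and the tiny made-up clip are all assumptions chosen to show the effect.

```python
import math

def rms(samples):
    """Root-mean-square level of float samples in the range [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def normalize_whole_clip(samples, target_rms=0.75):
    """Multiply the whole clip by ONE constant so its RMS reaches target_rms.

    Anything pushed past full scale is hard-clipped to [-1.0, 1.0].
    (Hypothetical helper; target_rms is an arbitrary illustrative value.)
    """
    level = rms(samples)
    if level == 0.0:
        return list(samples)  # silence: nothing to scale
    gain = target_rms / level
    return [max(-1.0, min(1.0, s * gain)) for s in samples]

# Made-up clip: a soft passage followed by a loud one.
soft = [0.02, -0.02] * 4
loud = [0.8, -0.8] * 4
out = normalize_whole_clip(soft + loud)

# The single gain works out to about 1.33. The loud samples would land
# near 1.06 and get clipped to 1.0; the soft ones only reach about 0.027.
print(abs(out[0]), abs(out[-1]))
```

The soft passage is still roughly 40 times quieter than the loud one, which is the problem Wave 3 shows.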
Wave 4 shows the result of REAL normalizing, with a curve similar to broadcast normalizing. Each syllable or beat is brought up or down, closing in on a median level from both directions. The DELTA between loud and soft, which is the only thing that really matters to our perception, is smaller than before.
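The idea behind Wave 4 can be sketched roughly in Python as well. This is not how a broadcast compressor or Audacity actually does it; it is a simplified stand-in that gives each short window its own gain. The function name, the window of 4 samples, the target of 0.3, and the gain cap of 10 are assumptions for illustration. A real compressor would also smooth the gain with attack and release times rather than jumping at window edges.

```python
import math

def level_short_term(samples, target=0.3, window=4, max_gain=10.0):
    """Pull each short window toward `target` RMS using its own gain.

    Hypothetical sketch: gain changes abruptly between windows, and
    max_gain keeps near-silence from being boosted without limit.
    """
    out = []
    for start in range(0, len(samples), window):
        block = samples[start:start + window]
        level = math.sqrt(sum(s * s for s in block) / len(block))
        gain = min(max_gain, target / level) if level > 0.0 else 1.0
        out.extend(max(-1.0, min(1.0, s * gain)) for s in block)
    return out

# Same made-up clip as a whole-clip normalizer would struggle with:
# a soft passage followed by a loud one.
soft = [0.02, -0.02] * 4
loud = [0.8, -0.8] * 4
leveled = level_short_term(soft + loud)

# Soft windows are boosted by the capped gain of 10 (0.02 becomes about 0.2);
# loud windows are cut to the target (0.8 becomes about 0.3).
print(abs(leveled[0]), abs(leveled[-1]))
```

Here the loud-to-soft ratio drops from 40 to about 1.5, so one setting of the final volume control is comfortable for both passages.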