Musings on 16-bit audio

Post by **marurun** » Mon Nov 06, 2023 3:50 pm

I have once again been ruminating on the varied and interesting approaches to game audio taken by the 3 major 16-bit contenders. It's quite interesting just how different all their approaches are. Here are some thoughts I've accrued over the years.

Genesis/Mega Drive
FM synth chip with 6 channels, the sixth of which can also play digital samples. It also includes the Master System audio chip, a 4-channel PSG consisting of 3 square waves (with limited tone range) and a noise channel. The Master System CPU, a Z80, is also in the mix to coordinate and handle sound programs. Stereo panning is limited to left, center, and right. Too many different FM chip revisions to count. The original unit could only output stereo through the headphone jack. Later models moved to an integrated ASIC without all the distortion the original chip had at high and low volumes, but workarounds some devs used to deal with that distortion don't translate well to the later ASIC, despite the later chip being better at what it's supposed to do.
Analysis
The FM chip is not one of the beefier ones, so there are no built-in percussion samples, but it can still do some really great, smooth tones, especially classic synth sounds, synth brass, and synth organ, as well as bell and chime tones. It can also produce some really crunchy sounding junk in inexpert hands. Usually a combination of very short, rough sampled drums and generated noise makes up the percussion lines. The PSG chip was usually kept to just sound effects, though some composers made good use of it to add depth to music. The Sonic series was known to use this fairly well. While there are some notable exceptions, most sampled audio on the Genesis sounded very poor because Sega left out the ability to monitor the timer interrupt with the Z80 in order to preserve backwards compatibility with Master System games. This means that it takes a skilled and/or attentive programmer to actually get good sample quality during active gameplay, either for a sound effect or in the music. Practically, it means a lot of games have awful sounding samples, and when it's done well it really stands out. This doesn't hurt sampled percussion much because the samples used are very brief. Most high quality samples on Genesis (SEGAAAAA! at startup, for example) are played back when the system is otherwise not occupied with much else to do. Actual audio output varies wildly depending on which specific Genesis unit you have. Some revisions, like the Genesis 1 VA2 "High Definition" model and the Genesis 2 VA4 model, are considered far superior to other revisions. Standout Genesis tracks are either very synth-heavy or embrace the crunch, and the Genesis largely trounces the competition on low-end basslines. Poor tracks often feature distorted and noisy or just bad instrument choices and even shaky tempos, because the audio timer cannot generate interrupts to the sound CPU and has to be watched closely instead.
While many standout tracks use samples for more than just percussion, it's rare for tracks to use non-percussion samples and have them not sound pretty rough by comparison. There are LOTS of bad tracks that use samples badly, however.

Super Nintendo Entertainment System/Super Famicom
Custom Sony sample-based SPU which has an S-DSP component for sample playback and other features and an S-SMP component which is a basic 8-bit CPU (SPC700). Outputs up to 8 channels of 16-bit, 32 kHz audio (resampled). 64 KB total sound memory for samples and audio programs and data. Has noise generation capabilities but no tone generator capabilities. All pitched audio must be sample-based. Delay (used for echo effects) and filtering thanks to the DSP. Full stereo panning.
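To put that 64k pool in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the whole pool is available for samples (it isn't, since audio programs and data share it) and compares raw 16-bit PCM against the S-DSP's BRR sample format, which packs 16 samples into a 9-byte block. The function and constant names are just illustrative.

```python
ARAM_BYTES = 64 * 1024    # total sound memory, shared with programs and data
SAMPLE_RATE_HZ = 32_000   # output rate quoted above


def seconds_of_raw_pcm(num_bytes, rate_hz=SAMPLE_RATE_HZ, bytes_per_sample=2):
    """Seconds of uncompressed 16-bit mono audio that fit in num_bytes."""
    return num_bytes / bytes_per_sample / rate_hz


def seconds_of_brr(num_bytes, rate_hz=SAMPLE_RATE_HZ):
    """Seconds of BRR audio that fit in num_bytes (9 bytes hold 16 samples)."""
    whole_blocks = num_bytes // 9
    return whole_blocks * 16 / rate_hz


if __name__ == "__main__":
    print(f"raw 16-bit PCM: {seconds_of_raw_pcm(ARAM_BYTES):.2f} s")
    print(f"BRR:            {seconds_of_brr(ARAM_BYTES):.2f} s")
```

Even in the best case that is only a few seconds of full-rate audio for every instrument and sound effect combined, which is why samples end up short and are often stored at lower rates.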
Analysis
Being sample-based opens a lot of doors for music, but the SNES implementation also closes some doors. The limited memory pool means most samples are very short and low quality, so long notes often have obvious warble or other loop indications. This also means the filter and delay features of the DSP will be doing more work to compensate for poor sample quality. While this is often better than having overly rough or crunchy samples, it does mean many samples sound too soft or smudged. The more a pitch is shifted away from the base sample, the more distortion is introduced through the sample being sped up (higher pitch) or slowed down (lower pitch), so good sampled instruments have samples at several pitches to ensure there isn't too much distortion from pitch changes. Limited memory means SNES instruments often have too few samples to ensure seamless pitch adjustment, leaving very high and very low notes feeling sluggish or hyperactive. Actraiser was the first game to use a sound engine that streamed samples into memory to overcome the 64k memory size (not 100% sure it was the very first, but I've seen the game referenced often as inspiring other composers and developers, including Square's Uematsu, to adopt the same approach). Instrument streaming is imperfect, however, as DMA transfers halt the CPU (the SNES CPU, not the audio CPU), and relying too heavily on external memory access for audio can significantly and noticeably rob the entire system of performance due to a slow bus and slow memory access. In-game sound effect quality often suffers compared to music instrument quality. Sound effects typically require longer samples than music instruments do, because instrument samples can not only have ADSR envelopes applied but can also be manually chopped up into relevant components, like an attack sample and a sustain sample. This means sound effects are often sampled at lesser quality to keep memory use down.
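The pitch-shifting tradeoff can be illustrated with a small Python sketch. It assumes equal-tempered tuning and MIDI-style note numbers, neither of which is specific to the SNES, and the helper names are made up for the example. The idea is just that each semitone away from the recorded pitch multiplies the playback speed by 2^(1/12), so having more base samples keeps that ratio closer to 1.

```python
def playback_ratio(semitones):
    """Speed multiplier needed to shift a sample by the given semitones."""
    return 2.0 ** (semitones / 12.0)


def nearest_base_sample(target_note, base_notes):
    """Pick the recorded pitch closest to the note we want to play."""
    return min(base_notes, key=lambda base: abs(target_note - base))


def ratio_for(target_note, base_notes):
    """Playback ratio when using the closest available base sample."""
    base = nearest_base_sample(target_note, base_notes)
    return playback_ratio(target_note - base)


if __name__ == "__main__":
    high_note = 84                      # two octaves above note 60
    one_sample = [60]                   # instrument recorded at one pitch
    three_samples = [48, 60, 72]        # same instrument, three recordings
    print(ratio_for(high_note, one_sample))     # sample played 4x too fast
    print(ratio_for(high_note, three_samples))  # only 2x with a closer base
```

Every extra base sample costs memory, though, which is exactly what the 64k pool is short on.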
Overall, the SNES performs beautifully in the hands of a talented team with good programmers and sample crafters, but requires a certain minimum competency not to sound bland and inarticulate. This is the natural outcome of a more modern sound system which makes it easier to deliver "acceptable" results by reducing the challenge floor (though still not easier than simply recording Red Book) but still requires talent and skill to create and implement strong compositions.

TurboGrafx-16/PC Engine
6 channels of custom 5-bit waveform PSG (wavetable synthesis, effectively) integrated into and driven by the CPU. The last two channels can generate noise. Samples can be played back at 5-bit quality either by constantly changing the waveform of a channel or by driving the channel directly with the CPU, and in direct-drive mode channels can be paired to improve quality (2 channels = 10-bit). Despite the ability to play back samples, the primary purpose of this waveform functionality is to create basic waveforms, like square waves, triangle waves, saw waves, and any and all weird variations in between, though the tiny waveform space does limit even this in some ways. Stereo panning for every channel (4 bits of volume, or 15 steps, each for left and right). Optional CD-ROM attachment adds Red Book CD audio playback and a single 8-bit ADPCM channel with its own 64 KB memory pool. (Why is the CD-ROM mentioned here and not for the Genesis/Mega Drive? Because the CD-ROM was more central to the system's market and commercial life and represented a much larger proportion of total sales, comparatively. But don't worry, the CD-ROM attachment won't really figure much into the analysis.)
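As a rough illustration of both ideas, here is a Python sketch that quantizes a waveform shape into 5-bit steps and splits a 10-bit sample across two 5-bit channels. It assumes a 32-entry waveform table per channel, and for the pairing it assumes an idealized mix where the second channel plays at exactly 1/32 the level of the first; real hardware volume steps may not land on that ratio. All names are hypothetical.

```python
WAVE_LEN = 32    # assumption: entries in one channel's waveform table
MAX_LEVEL = 31   # 5 bits give levels 0..31


def quantize_wave(shape):
    """Sample shape(phase), phase in [0, 1), amplitude in [-1, 1], to 5 bits."""
    table = []
    for i in range(WAVE_LEN):
        amplitude = shape(i / WAVE_LEN)
        level = round((amplitude + 1.0) / 2.0 * MAX_LEVEL)
        table.append(min(MAX_LEVEL, max(0, level)))
    return table


def split_10bit(sample):
    """Split a 10-bit value into (coarse, fine) 5-bit channel values."""
    if not 0 <= sample <= 1023:
        raise ValueError("sample must fit in 10 bits")
    return sample >> 5, sample & 0x1F


def recombine(coarse, fine):
    """Idealized mix: the fine channel contributes 1/32 of the coarse one."""
    return coarse * 32 + fine


if __name__ == "__main__":
    square = quantize_wave(lambda p: 1.0 if p < 0.5 else -1.0)
    saw = quantize_wave(lambda p: 2.0 * p - 1.0)
    print(square)
    print(saw)
    print(split_10bit(700))
```

With only 32 steps per entry, smooth shapes like a sine pick up visible stair-stepping, which is where a lot of the crunch comes from.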
Analysis
This system has the most "8-bit" like sound, comparatively, despite having relatively good audio flexibility. Being able to set custom waveforms means the system is capable of generating a broad swath of instrument sounds. Limited waveform memory means a lot of them can sound pretty crunchy or noisy due to inadequate bit-depth for smooth waveforms. Because the waveform is slowed or sped up to alter pitch, lower notes lose additional quality, so bassy instruments must be created carefully or played as fixed-pitch samples (Sunsoft did this for Batman). Instruments are generally not as smooth as NES tones but much more varied (NES has 2 fixed square waves, a fixed triangle wave with no volume adjustment, and a noise channel). In practice, many developers used pre-set engines with instrument presets. Since the CPU drives the audio, a complex audio engine or song can put a drag on system performance, though the impact is generally minimal. Typical audio drag on the CPU is probably around 5% to 15% of CPU performance, with engines that use more samples using more CPU time. An extreme commercial example is Air Zonk, with as much as 30% CPU overhead when decompressing and playing samples. The game still manages fast action and multi-line scrolling all over the place, so this clearly isn't debilitating. A homebrew MOD-style player (4 XM-style channels, 2 PCM channels) developed for the PC Engine (release not yet public, sadly) only has a 30% CPU overhead for the 4 XM channels and hits 37% with the two PCM channels also engaged, according to the developer. This level of impact isn't quite into DMA overuse territory on the SNES but does need to be accounted for. Many CD games, particularly RPGs, also featured generated audio in places due to the ~60 minute limit on Red Book audio.
So action stages and animated cinematics will typically have Red Book audio, sound effects and non-cinematic voiceovers will be 8-bit ADPCM, and PSG generated audio can then be used for towns or slower or more somber tunes which don't need electric guitars or wailing synths. Using samples for percussion and sound effects was common. Compile was notable for this, relying heavily on short, sharp samples for sound effects and percussion even in early titles on the system. Stereo was also used heavily, as this was a relatively new feature at the time and offered true panning rather than simple left-center-right like the Genesis or Game Boy. While some samples could be very scratchy at times, most routine sample usage was higher quality than on Genesis, and in some cases sound effects could be even more crisp than on SNES, though without the benefit of filtering and delay effects. One complication on sample use is an unfortunate bug in the CPU which generates noise when an audio channel is set to 0, either to mute/deactivate it or to change the waveform. This bug can be counteracted by using a second channel to cancel out the artifact noise. The bug was fixed in a later CPU revision, but that revision saw limited release, probably due to a surplus of original revision CPUs. This means that some games suffer from excessive noise during playback when instrument waveforms are changed mid-tune, or when waveforms are used for sample playback without either a second sound channel for noise cancellation or direct CPU-driven playback.

Post by **Ziggy** » Tue Nov 07, 2023 6:10 pm

Thanks for posting this! I haven't had a chance to read the entire thing yet, just skimmed it so far. But I love this topic!

Sun Nov 12, 2023 12:40 pm

Fantastic writing!
I've always wondered how the Genesis version of NBA Jam was able to make fairly good announcer samples.
