Sunsoft's real audio trademark, though I can't explain it in the technical sense (we'll leave that for muteKi), was their 'slap bass' sound. The one curious thing about it was that it always had to finish its process even when a game was paused (at least, it seemed that way in Battle Formula/Super Spy Hunter), never cutting off in the middle of a bass note.
This is all opinion, so feel free to bash and mock as you like.
This post is pretty long, and starts from a fairly basic standpoint. If you're already pretty clear on how the NES sound hardware works, I've put another bold headline where the interesting stuff is. For everyone else, I have this introduction.
Brief overview of the NES sound hardware time! This is dealing with the stock hardware in the NES, that is, the 2A03 chip.
The sound hardware is technically expandable, but it was only through the bottom port in the US (this expansion was in the famicom cart pins in Japan, hence the expanded audio in stuff like Gimmick!, CV3, and Lagrange Point; apparently they changed it around so that the FDS could fit better without needing its own cartridge to be plugged in -- oops).
So, anyway, the 2A03. Biggest thing to note about it is its general similarity to most other chips at the time. 3 sound channels and one noise channel are present. What makes it different is that one of those sound channels is a triangle wave rather than a square wave, and it's tuned down an octave.
The other two channels are more or less your average square wave, but with a twist. Compared to some of the other common PSG chips of the time, like the AY-3-8910 (ZX Spectrum, Atari ST, MSX among several others) or the SN76489 (Sega System 1 and 2, all Sega consoles up to the Master System and Genesis, Neo Geo Pocket Color, IBM PCJr and a few others), it's got one advantage: a configurable duty cycle on the square wave.
So what does that mean? A square wave basically has two states: low and high. On the SN- and AY- chips this is fixed at 50%, so the square wave is "high" for a certain amount of time and "low" for the same amount of time. On the NES this can also be set to 25% or 12.5% (or 75%, but this is usually perceived to be the same as the 25% in terms of sound), which means that during a single cycle between high and low the square wave only stays at high for 25% or 12.5% (or 75%) of the time.
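To make the duty cycle idea concrete, here's a rough sketch that builds one period of a pulse wave as a list of samples. The function name, the 16-sample period and the 0/15 levels are just choices for illustration, not anything taken from the hardware.

```python
# Hypothetical helper: build a pulse wave that is "high" for `duty` of each period.
def pulse_wave(duty, period=16, cycles=2, low=0, high=15):
    """Return `cycles` periods of a pulse wave with the given duty cycle (0..1)."""
    high_len = round(duty * period)          # samples spent in the high state
    one_cycle = [high] * high_len + [low] * (period - high_len)
    return one_cycle * cycles

# Print one period for each duty setting the post mentions.
for duty in (0.125, 0.25, 0.5, 0.75):
    wave = pulse_wave(duty, cycles=1)
    print(f"{duty:>5.1%}", "".join("#" if s else "_" for s in wave))
```

The 75% wave is just the 25% wave flipped upside down, which is why the two tend to sound alike.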
What a 12.5% duty cycle square wave sounds like.
What a 25% duty cycle square wave sounds like.
What a 50% duty cycle square wave sounds like. This is, understandably, the sound most people usually associate with a square wave, even though technically the other two are square waves as well -- or, at least, "pulse waves".
Note that this is still less powerful than the SID chip in the C64, which could do pulse, triangle, or saw, configurable per channel in much the same way.
One thing the SID doesn't have, though, and which the 2A03 does (as well as the SN- and AY- chips mentioned previously) is a dedicated noise channel. It can be played back at a few different frequencies. Somewhat like the square waves' duty cycle setting, it can be played in one of two modes, which changes the way the pseudo-random waveform is generated.
Here's the probably more common first mode of the noise channel. Note that it's similar to static. That's intentional. The waveform is generated from a pseudorandom pattern, which means the waveform can vary wildly from one moment to the next. This means that the sound doesn't have anything like a perceived fundamental frequency the way the tone channels do.
Some of the higher pitches on this noise channel can sound a bit more like a cymbal, and so it's usually used for a hi-hat part, although it can also (as on the other two chips) handle main percussion duties. This was more common in older games, though even more than that it was a function of the size of the cartridge memory -- more on that later.
I should probably also note that the envelope on the noise channel is configurable, which changes the way it fades in and out. This is actually fairly important to getting a lot of sounds to seem rather cymbal-like, rather than just flat static. Usually that means a gradual decay of the volume, although you could also fade it in (such as if you wanted to do a 'reverse cymbal' sound). This sort of configurability is available to the square wave channels as well.
Here's the second mode. The sound is a bit buzzier in this mode. I can't think of any cases where this mode was used for music in any games, though it's been used for sound effects sometimes.
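For the curious, here's a rough sketch of how the two noise modes differ, going by the publicly documented behaviour as I understand it: a 15-bit shift register whose feedback bit is bit 0 XOR bit 1 in the normal mode, or bit 0 XOR bit 6 in the second mode. The function names and the seed value are my own picks for illustration.

```python
# Sketch of a 15-bit linear-feedback shift register with two tap choices.
def lfsr_step(state, short_mode=False):
    """Advance the 15-bit register by one step and return the new state."""
    tap = 6 if short_mode else 1               # which bit is XORed with bit 0
    feedback = (state ^ (state >> tap)) & 1
    return (state >> 1) | (feedback << 14)     # shift right, feed back into bit 14

def lfsr_period(seed=1, short_mode=False):
    """Count how many steps it takes for the register to return to `seed`."""
    state = lfsr_step(seed, short_mode)
    steps = 1
    while state != seed:
        state = lfsr_step(state, short_mode)
        steps += 1
    return steps

print("normal mode period:", lfsr_period(short_mode=False))
print("second mode period:", lfsr_period(short_mode=True))
```

The second mode repeats after a far shorter run of steps, which is why it picks up a buzzy, almost pitched quality instead of sounding like static.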
Anyway, a while back up there I mentioned the triangle wave channel. It's a bit less flexible than the other two tone channels, both because it's always a triangle wave (no additional modes on it), and because, unlike the square wave and noise channels, it's fixed volume. It does have, as noted above, the benefit of being tuned an octave down. It's very commonly used as a bass channel for that reason, though it was sometimes used for melody as well.
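Here's a small sketch of where that octave comes from, assuming the commonly documented timer formulas: the pulse channels step through an 8-step sequence and the triangle through a 32-step one, so the same timer value lands an octave lower on the triangle. The clock constant is the approximate NTSC figure, and the function names are mine.

```python
CPU_HZ = 1_789_773  # approximate NTSC CPU clock (assumed for illustration)

def pulse_freq(timer):
    """Output frequency of a pulse channel for an 11-bit timer value."""
    return CPU_HZ / (16 * (timer + 1))

def triangle_freq(timer):
    """Output frequency of the triangle channel for the same timer value."""
    return CPU_HZ / (32 * (timer + 1))

timer = 253  # example timer value, close to concert A on the pulse channel
print(round(pulse_freq(timer), 1), round(triangle_freq(timer), 1))
```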
This is what the triangle channel sounds like.
In addition to all this, there's also a relatively unusual sample channel. It's a DPCM (differential pulse-code modulation) playback channel. DPCM works in the following way: a PCM (i.e., wave) file is read into a converter, which translates from points measured on the input sound wave to a measurement of the change in the height of the wave per sample instead. Since this usually tends to be fairly regular for most sounds, it means that some good compression can be applied; a DPCM file can be 2-4 times smaller in most cases than the PCM file it's based on.
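As a rough illustration of the idea, here's a sketch of a 1-bit delta encoder and decoder. It assumes the 2A03 flavour as I understand it: each bit nudges a 7-bit output level up or down by 2, clamped to 0..127. The function names and starting level are made up for the example, and a real converter would also pack the bits into bytes, which I've left out.

```python
def _step(level, bit):
    """Move the 7-bit output level by +2 (bit 1) or -2 (bit 0), staying in 0..127."""
    if bit and level <= 125:
        return level + 2
    if not bit and level >= 2:
        return level - 2
    return level

def dpcm_encode(samples, start_level=64):
    """Turn 7-bit PCM samples (0..127) into a list of up/down bits."""
    level, bits = start_level, []
    for sample in samples:
        bit = 1 if sample > level else 0   # chase the input one step at a time
        bits.append(bit)
        level = _step(level, bit)
    return bits

def dpcm_decode(bits, start_level=64):
    """Rebuild the approximate waveform by replaying the up/down bits."""
    level, out = start_level, []
    for bit in bits:
        level = _step(level, bit)
        out.append(level)
    return out

ramp = [64 + i for i in range(40)]         # a slowly rising test signal
print(dpcm_decode(dpcm_encode(ramp))[:8])
```

Since the level can only move by 2 per sample, anything that changes faster than that gets smeared, which goes some way toward explaining the muffled sound.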
That doesn't mean the DPCM data is small, exactly. The NES only has so much memory to work with, so there's only so much stuff you can do with the channel in a game, and it's also still going to sound muffled and a little scratchy. Usually it's just used for speech samples, or percussion. A great example of what that can do for a song can be heard in Super Mario 3's Underworld music (which also features the noise channel as a choppy hi-hat).
Of course, you're not limited to drums at all. You could play back any wave you wanted, though higher-pitched stuff won't come through as well due to the low sampling rate. This is basically a consequence of what's known as the Shannon-Nyquist sampling theorem, which more-or-less says that to store a waveform exactly as a sampled sound, you need to sample at double the frequency of the highest component of the sound. As you might guess from this, that also means that an arbitrary sampling rate can only capture components of a sound whose frequency is at most half that rate. So if we're stuck with a really low sampling rate we can't do much stuff that's upper-range (good luck trying to play back a piccolo melody, for example), but on the other hand a 44.1 kHz sampling rate should be able to capture just about everything in human hearing range (note that the highest detectable frequency for most people is around 20 kHz).
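A quick sketch of that arithmetic: the highest frequency a sampling rate can hold is half the rate, and anything above that "folds" back down to a lower frequency (aliasing). Both function names are hypothetical and exist only for this example.

```python
def nyquist_limit(sample_rate_hz):
    """Highest frequency a given sampling rate can represent."""
    return sample_rate_hz / 2

def alias_frequency(tone_hz, sample_rate_hz):
    """Frequency a pure tone appears at after sampling (folded into 0..rate/2)."""
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

print(nyquist_limit(44_100))            # limit for CD-quality audio
print(alias_frequency(3_000, 4_000))    # a 3 kHz tone sampled at only 4 kHz
```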
If you're just looking to get to the good part, here it is:
Now the DPCM channel has a few different sampling rates, which can be applied to a single sample to change its pitch. As it turns out, according to here, these values range from about 4 kHz to 33 kHz. That's not bad at all, but there's still some filtering and compression to deal with. Another thing to note is that the 16 available frequencies together span 3 octaves, and that for the higher notes it's not a complete scale (the range from C9 to C10 is a full C-major scale). Because the frequency divider for the DPCM playback rate doesn't have the freedom to scale to any frequency (it's based on the system's clock frequency), most of the higher notes are way more detuned from the note they're supposed to be close to than the lower ones. So we may not want to use the upper set of values past the rate matching C10 in that diagram.
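To put numbers on the detuning, here's a sketch that turns the rate settings into playback frequencies and measures how far each one sits from the nearest semitone above the slowest rate. The divider table is the NTSC one as commonly documented, and the clock value is approximate; treat both as assumptions rather than something taken from this post.

```python
import math

CPU_HZ = 1_789_773  # approximate NTSC CPU clock (assumed)
# CPU cycles per output bit for each of the 16 rate settings (NTSC, as documented).
DPCM_PERIODS = [428, 380, 340, 320, 286, 254, 226, 214,
                190, 160, 142, 128, 106, 84, 72, 54]

def dpcm_rate_hz(index):
    """Playback rate in Hz for a given rate setting."""
    return CPU_HZ / DPCM_PERIODS[index]

def pitch_vs_slowest(index):
    """Return (nearest semitone offset, detune in cents) relative to rate 0."""
    cents = 1200 * math.log2(DPCM_PERIODS[0] / DPCM_PERIODS[index])
    semitones = round(cents / 100)
    return semitones, cents - 100 * semitones

for i in range(len(DPCM_PERIODS)):
    semis, detune = pitch_vs_slowest(i)
    print(f"rate {i:2d}: {dpcm_rate_hz(i):8.1f} Hz  +{semis:2d} semitones  {detune:+6.1f} cents")
```

Because the dividers have to be whole numbers of clock cycles, the small dividers at the top of the table have the least room to land on an exact pitch.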
So we end up with two things the DPCM channel can do pretty well: percussion, and bass samples -- and it can play back a single bass sample so that it comes really close to being the same pitch as any note in the C-major scale, though it only has about an octave depending on the detune you're willing to put up with. Still, there's no reason we can't have a couple extra samples to cover the notes (i.e., those black keys) that we can't reach with a single sample.
Anyway, that's what Sunsoft did with most of their music; they exploited the DPCM channel in order to play bass samples rather than percussion (or ignore it like a lot of games did). How well did it work? Quite damn well, certainly!
Here's the bassline extracted from the title theme.
Of course, we can't change the volume of the pitches so there's no way to really differentiate between one note and the next, but then again we wouldn't be able to do that with the triangle wave either, and this certainly does sound a bit deeper, or at least fuller.
But what about those drums? The music has pretty solid percussion overall! Well, as you might guess, the percussion uses the noise channel. So did a lot of games, and they all tended to sound pretty good. So what's the noise channel sound like, then?
This is the noise track extracted from the title theme.
Well, that's certainly tinny, isn't it? So, you might be wondering, how did they get the deep bass sound in that snare? Those of you who've heard how the NES Mega Man games do their tom-toms might have an idea -- it's under a similar principle.
The triangle wave is playing its part concurrently with the noise channel, and has a downward pitch bend applied to each note. This is actually a pretty common trick for beefing up percussion on a lot of the older chips that don't support wave out (or don't support it without taking up most of their system's CPU power). Oftentimes just a square or triangle wave is played like that on its own to simulate drums (such as in the Mega Man example above). Since the NES does drums pretty well, it's relatively rare to have a game do percussion this way, using both a triangle and noise channel -- and if the game doesn't use the DPCM at all, usually the music can't spare the triangle wave to duplicate the noise channel's pattern.
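Here's a rough sketch of that pitch-bend trick: a triangle wave whose timer value is bumped up every frame so the pitch drops over the length of the hit. The timer values, frame count and function names are invented for illustration; they aren't taken from any Sunsoft driver.

```python
CPU_HZ = 1_789_773  # approximate NTSC CPU clock (assumed)
TRI_STEPS = list(range(15, -1, -1)) + list(range(16))  # 32-step triangle shape

def triangle_freq(timer):
    """Triangle channel frequency for a timer value (commonly documented formula)."""
    return CPU_HZ / (32 * (timer + 1))

def sweep_timers(start=200, step=40, frames=6):
    """One timer value per video frame; a larger timer means a lower pitch."""
    return [start + i * step for i in range(frames)]

def render_triangle(timers, sample_rate=44_100, frame_rate=60):
    """Render the sweep to a list of 4-bit samples (0..15)."""
    samples, phase = [], 0.0
    per_frame = sample_rate // frame_rate
    for timer in timers:
        advance = 32 * triangle_freq(timer) / sample_rate  # sequence steps per sample
        for _ in range(per_frame):
            samples.append(TRI_STEPS[int(phase) % 32])
            phase += advance
    return samples

timers = sweep_timers()
print([round(triangle_freq(t)) for t in timers])  # pitch falls frame by frame
drum = render_triangle(timers)
```

Layer a short burst of noise on top of something like this and you get the low "thump" under the tinny snare.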
Anyway, here's the triangle wave channel extracted.
Put them together and you get this:
This is what you get mixing both channels. Sounds good, but it might not seem as deep/full as with the other parts in. That's largely because a lot of the audio spectrum gets covered up by the other melodic channels; with those playing, you can't tell that the drums don't actually reach into those ranges, so they seem to cover a lot more of the spectrum than they really do.
For comparison, here's the full title screen music:
And there you have it.