Recording the microphone

Recording the microphone using AS3 is quite easy. Storing the data, or compressing it and sending it to a server, is where it gets complicated. Normally, a Google search helps a lot when trying to find good information, but on this particular subject there seems to be a lot of misinformation. There are also a lot of libraries and frameworks available, but so far they either didn’t do what I wanted or did it in a far too complicated way. I’ve provided a list of the audio libraries I tried out at the bottom.

Clearing things up

When you record data from the microphone using SampleDataEvents, the data you receive is part of a 32-bit floating point (Big Endian) 44.1 kHz mono PCM audio stream (assuming that you have set the rate of the Microphone to 44; the documented default is 8). To store the microphone input, all you need to do is append this data to a buffer.
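As a minimal sketch of that buffering step, the snippet below appends every incoming sample to a ByteArray. The names recordingBuffer and micSampleDataHandler are made up for illustration, and the code is written as timeline (frame script) code rather than as part of a class.

```actionscript
import flash.events.SampleDataEvent;
import flash.media.Microphone;
import flash.utils.ByteArray;

// Hypothetical buffer that accumulates the raw 32-bit float mono samples.
var recordingBuffer:ByteArray = new ByteArray();

var mic:Microphone = Microphone.getMicrophone();
if (mic != null) {
    mic.rate = 44;          // request 44.1 kHz
    mic.setSilenceLevel(0); // keep receiving samples during quiet passages
    mic.addEventListener(SampleDataEvent.SAMPLE_DATA, micSampleDataHandler);
}

function micSampleDataHandler(event:SampleDataEvent):void {
    // Each float takes 4 bytes; append all of them to the buffer.
    while (event.data.bytesAvailable >= 4) {
        recordingBuffer.writeFloat(event.data.readFloat());
    }
}
```

Remember to set recordingBuffer.position back to 0 before reading from the buffer, otherwise bytesAvailable will report 0.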

To play back the microphone input you can use the Sound class. The Sound class can only deal with a 32-bit floating point 44.1 kHz stereo PCM audio stream as input. Luckily, we’re almost there: we just need to convert mono into stereo. As you can see in the example below, creating (fake) stereo out of a mono signal is a matter of writing each sample twice.

The documentation for the Sound class shows an example of how to merge several Sounds into one using the extract method. We can apply the same technique to play back our ByteArray using the Sound class:

    import flash.events.SampleDataEvent;
    import flash.media.Sound;
    import flash.media.SoundChannel;
    import flash.utils.ByteArray;

    // Written as timeline (frame script) code; inside a class,
    // move the statements below into a method.

    // The buffer where we stored the microphone input
    // (rewind it with position = 0 before playback).
    var playbackBuffer:ByteArray;
    // The number of channels the buffer contains,
    // 1 means mono, 2 means stereo.
    var channels:int = 1;
    var outputSound:Sound = new Sound();

    outputSound.addEventListener(SampleDataEvent.SAMPLE_DATA,
        outputSampleDataHandler);
    var outputChannel:SoundChannel = outputSound.play();

    function outputSampleDataHandler(event:SampleDataEvent):void {
        // Read up to 8192 floats per event; a float takes 4 bytes.
        for (var i:int = 0; i < 8192 &&
            playbackBuffer.bytesAvailable >= 4; i++) {

            var sample:Number = playbackBuffer.readFloat();
            event.data.writeFloat(sample);

            // Fake stereo as dual mono when the
            // original data is only 1 channel.
            if (channels == 1) {
                event.data.writeFloat(sample);
            }
        }
    }

The above seems simple enough, but when you want to save the data to a file, things get complicated. First, you need to choose a file format in which to store the output. Second, if you want to reduce the amount of data to be saved, you need to choose a codec to compress the audio stream. When I was experimenting with MP3 encoding of my recorded data, I found out that the codec required its input to be a stream of 16-bit signed integers with a sampling rate of 44.1 kHz. When you get into this territory, it helps to read up a bit about the techniques behind digital audio. I’ll try to summarize some of the most useful things I learned.

Pulse-code modulation (PCM)

Pulse-code modulation is a way of representing an analog audio signal on a digital system by sampling the signal at a fixed interval. See 1) in the image I created to explain some PCM examples.

If you keep the same playback frequency but add more samples (using some form of interpolation), you slow down playback and lower the pitch (blue curve). The opposite is also true: remove samples and playback speeds up and the pitch rises (red curve). See 2) in the PCM examples image.

If you multiply or divide each sample by a fixed value, you increase or decrease the signal strength (green curve). See 3) in the PCM examples image. Keep in mind that scaling could result in clipping (if you overflow the range of the bit depth) or in complete cancellation of the signal (if you underflow its precision).
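A rough sketch of such a gain change, with a guard against clipping, could look like this. The function name applyGain is hypothetical, and the input is assumed to contain 32-bit float samples in the -1.0 to 1.0 range.

```actionscript
import flash.utils.ByteArray;

// Hypothetical helper: scales every sample by a fixed gain factor.
function applyGain(input:ByteArray, gain:Number):ByteArray {
    var output:ByteArray = new ByteArray();
    output.endian = input.endian;
    input.position = 0;

    while (input.bytesAvailable >= 4) {
        var sample:Number = input.readFloat() * gain;

        // Clamp to the valid float range instead of letting the value overflow.
        if (sample > 1.0) {
            sample = 1.0;
        } else if (sample < -1.0) {
            sample = -1.0;
        }
        output.writeFloat(sample);
    }

    output.position = 0;
    return output;
}
```

A gain above 1 makes the signal louder and a gain between 0 and 1 makes it quieter. Clamping avoids wrap-around, but a clamped peak is still audible as distortion.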

Sample rate conversion

The sample rate defines how many samples are taken of the analogue sound per second and is expressed in Hertz; CDs, for example, have a sample rate of 44.1 kHz. The higher the sample rate, the higher the frequencies that can be captured (up to half the sample rate), and so the better the quality. Other typical sample rates are 8, 11.025, 22.05 and 48 kHz.

There are two types of sample rate conversion: upsampling and downsampling.

It is usually easier (whether you are upsampling or downsampling) to first upsample (using linear interpolation) to the least common multiple (LCM) of the two sample rates and then downsample to the required sample rate.

Generally speaking, when you downsample an audio signal you can get away with duplicating samples when upsampling to the LCM, since most of the data will be dropped anyway as you downsample to the required sample rate.
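The sketch below illustrates the linear interpolation idea without building the intermediate LCM-rate stream. It computes the interpolated value directly at each output position, which gives the same values as interpolating up to the LCM and then keeping every Nth sample. The function name resampleLinear and the use of a Vector of samples are assumptions for illustration. No low-pass filtering is applied, so downsampling this way can introduce aliasing.

```actionscript
// Hypothetical helper: resamples mono samples from one rate to another
// using linear interpolation between neighbouring input samples.
function resampleLinear(input:Vector.<Number>, fromRate:Number,
                        toRate:Number):Vector.<Number> {
    var output:Vector.<Number> = new Vector.<Number>();
    if (input.length == 0) {
        return output;
    }

    // How far we advance in the input for every output sample.
    var step:Number = fromRate / toRate;
    var outLength:int = int(Math.floor(input.length / step));

    for (var i:int = 0; i < outLength; i++) {
        var pos:Number = i * step;
        var index:int = int(pos);        // sample to the left of pos
        var frac:Number = pos - index;   // distance past that sample (0..1)

        var a:Number = input[index];
        // Repeat the last sample when there is no right-hand neighbour.
        var b:Number = (index + 1 < input.length) ? input[index + 1] : a;

        output.push(a + (b - a) * frac);
    }
    return output;
}
```

For example, resampleLinear(samples, 22050, 44100) doubles the number of samples, placing an averaged value between each original pair.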

Since sample rate conversion is quite CPU intensive, there are some shortcuts you can take to convert from one sample rate to another. Say you want to convert 44.1 kHz into 8 kHz: it seems a bit redundant to first upsample to 3528000 Hz (which is the LCM) and then downsample to 8 kHz. What you could do instead is alternate between using every 5th and 6th sample (since 44.1 / 8 is roughly 5.5).
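One way to sketch this shortcut is to advance a fractional position by the rate ratio and always take the sample at the rounded-down index. With a step of 44100 / 8000 = 5.5125 this picks indices 0, 5, 11, 16, 22 and so on, which is the alternation between every 5th and 6th sample described above. The function name decimate is hypothetical, and the input is assumed to be a mono stream of 32-bit floats.

```actionscript
import flash.utils.ByteArray;

// Hypothetical helper: downsamples by picking existing samples only,
// without interpolation or filtering.
function decimate(input:ByteArray, fromRate:Number, toRate:Number):ByteArray {
    var output:ByteArray = new ByteArray();
    output.endian = input.endian;

    var step:Number = fromRate / toRate;          // e.g. 5.5125
    var totalSamples:int = int(input.length / 4); // 4 bytes per float

    for (var pos:Number = 0; int(pos) < totalSamples; pos += step) {
        // Jump to the chosen sample and copy it to the output.
        input.position = int(pos) * 4;
        output.writeFloat(input.readFloat());
    }

    output.position = 0;
    return output;
}
```

This is cheap, but because it discards samples without filtering, frequencies above half the new sample rate will fold back into the result as aliasing.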

Bit depth conversion

The bit depth determines the granularity of a single sample: the higher the bit depth, the more accurately the sound can be represented. CDs have a bit depth of 16 bits; other typical bit depths are 8, 24 and 32.

Usually 8-bit audio streams use unsigned integers, 16-bit ones use signed integers and 32-bit streams use signed floating point. Knowing this, converting from one bit depth to another is quite straightforward: it’s a matter of multiplication (plus an offset for unsigned 8-bit data).

Note: You may need to do some additional checks to make sure you aren’t overflowing the available bits when doing the conversion.
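For the 32-bit float to 16-bit signed case, the multiplication and the overflow check could be sketched as follows. The function name floatTo16Bit is hypothetical, the input is assumed to hold floats in the -1.0 to 1.0 range, and writing the output as little endian is an assumption that suits WAV data; change it if your consumer expects something else.

```actionscript
import flash.utils.ByteArray;
import flash.utils.Endian;

// Hypothetical helper: converts 32-bit float samples to 16-bit signed integers.
function floatTo16Bit(input:ByteArray):ByteArray {
    var output:ByteArray = new ByteArray();
    output.endian = Endian.LITTLE_ENDIAN; // assumption, see the note above
    input.position = 0;

    while (input.bytesAvailable >= 4) {
        var sample:Number = input.readFloat();

        // Clamp first so the scaled value always fits in 16 bits.
        if (sample > 1.0) {
            sample = 1.0;
        } else if (sample < -1.0) {
            sample = -1.0;
        }

        // Scale -1.0..1.0 to -32767..32767 and write 2 bytes.
        output.writeShort(int(Math.round(sample * 32767)));
    }

    output.position = 0;
    return output;
}
```

Going the other way is the same idea in reverse: read each value with readShort and divide it by 32767.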

When you understand the theory I just talked about, it becomes a lot easier to write the tools you need to convert audio from one bit depth to another as well as to change between sample rates. It also gave me some ideas about filters to implement, like a normalization filter. To help you get started I’ve included some tips and references below.
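As an illustration of one such filter, here is a sketch of simple peak normalization: find the loudest sample, then scale the whole stream so that this peak reaches a target level. This is only one possible interpretation of a normalization filter, and the names normalize and targetPeak are made up for the example. The input is assumed to be 32-bit float samples.

```actionscript
import flash.utils.ByteArray;

// Hypothetical helper: scales the stream so its loudest sample hits targetPeak.
function normalize(input:ByteArray, targetPeak:Number = 1.0):ByteArray {
    // First pass: find the largest absolute sample value.
    var peak:Number = 0;
    input.position = 0;
    while (input.bytesAvailable >= 4) {
        var magnitude:Number = Math.abs(input.readFloat());
        if (magnitude > peak) {
            peak = magnitude;
        }
    }

    // A silent stream has no peak to scale, so leave it unchanged.
    var gain:Number = (peak > 0) ? targetPeak / peak : 1.0;

    // Second pass: apply the same gain to every sample.
    var output:ByteArray = new ByteArray();
    output.endian = input.endian;
    input.position = 0;
    while (input.bytesAvailable >= 4) {
        output.writeFloat(input.readFloat() * gain);
    }

    output.position = 0;
    return output;
}
```

Because the gain is derived from the actual peak, the result cannot clip as long as targetPeak stays at or below 1.0.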

Tips

  • The Microphone.rate property accepts values of 44, 22, 11 and 8. These values actually represent 44100, 22050, 11025 and 8000 Hz.
  • The WaveEncoder class of the MicRecorder library comes in quite handy when you want to save your recorded audio. Keep in mind that the encode method expects the passed in data to be 32-bit and will convert it to 16-bit on the fly. This tripped me up a couple of times as I passed in a 16-bit audio stream and was surprised to find out that the saved WAV file just produced a lot of white noise. In my copy of this class, I added an additional check to the create method, to make sure that the 16-bit space wasn’t being overflowed when converting and multiplying by a _volume (multiplier) value.
  • When you want to output your recorded audio, make sure to set the Endianness of your ByteArrays to Little Endian. I wondered countless times why playback of my ByteArray resulted in just noise, only to find out that I had forgotten about its Endianness.
  • Forget about the loadCompressedDataFromByteArray and loadPCMFromByteArray methods in the Sound class for Flash Player 11. They just don’t work: you’ll get a sound playing, but the last set of bytes seems to loop forever when the audio stream should be finished.
  • If you want to see what the sound you produced actually looks like, you may want to download software like Audacity. This tool can give you a visual representation of the wave form as well as a lot of information about the file that you opened. When you open a raw audio stream, you can also flick the settings around to figure out what sampling rate and bit depth the stream is using.
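To make the points about WAV output and Endianness concrete, here is a rough sketch of wrapping raw audio in the standard 44-byte PCM WAV header. This is not the MicRecorder WaveEncoder, just an illustration. The function name buildWav is hypothetical, and the input is assumed to already contain 16-bit signed, little-endian samples.

```actionscript
import flash.utils.ByteArray;
import flash.utils.Endian;

// Hypothetical helper: prepends a PCM WAV header to 16-bit sample data.
function buildWav(pcm16:ByteArray, sampleRate:int, channels:int):ByteArray {
    var bytesPerSample:int = 2; // 16-bit samples

    var wav:ByteArray = new ByteArray();
    wav.endian = Endian.LITTLE_ENDIAN; // WAV stores its numbers little endian

    // RIFF chunk descriptor.
    wav.writeUTFBytes("RIFF");
    wav.writeUnsignedInt(36 + pcm16.length); // file size minus these 8 bytes
    wav.writeUTFBytes("WAVE");

    // "fmt " sub-chunk describing the audio format.
    wav.writeUTFBytes("fmt ");
    wav.writeUnsignedInt(16);                // size of this sub-chunk
    wav.writeShort(1);                       // 1 = uncompressed PCM
    wav.writeShort(channels);
    wav.writeUnsignedInt(sampleRate);
    wav.writeUnsignedInt(sampleRate * channels * bytesPerSample); // byte rate
    wav.writeShort(channels * bytesPerSample);                    // block align
    wav.writeShort(bytesPerSample * 8);      // bits per sample

    // "data" sub-chunk holding the samples themselves.
    wav.writeUTFBytes("data");
    wav.writeUnsignedInt(pcm16.length);
    wav.writeBytes(pcm16);

    wav.position = 0;
    return wav;
}
```

Opening the resulting bytes in a tool like Audacity is a quick way to check that the sample rate, channel count and byte order in the header match the data.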

Further reading

AS3 audio libraries