GIANT ROBOTS SMASHING INTO OTHER GIANT ROBOTS

Written by thoughtbot

Streaming Audio to Multiple Listeners via iOS' Multipeer Connectivity

Music has always been a very important part of iPhones and all Apple devices. With the advent of iOS 7, Apple introduced a new technology called Multipeer Connectivity which allows us to stream data with NSOutputStream and NSInputStream. I wanted to use this great new framework to stream audio to many listeners. However, there is no easy way to play audio from an NSInputStream. So I set out on an adventure through CoreAudio to make this possible.

Overview

Multipeer Connectivity uses NSOutputStream to stream data to a connected peer. This is what we’ll use to send the audio data. On the receiving end, Multipeer Connectivity gives us an NSInputStream, which we’ll use to read the incoming data. Using Apple’s Audio Queue Services, we’ll send the data to the device’s audio hardware. With Audio Queue Services, we can fill buffers with audio data and then play them. That would be all we need if the data arrived as ready-to-play packets, but most audio files, such as MP3 and AAC files, are encoded to reduce file size. Apple provides Audio File Stream Services, which can parse the encoded stream and hand back the audio format and the audio packets we need. The picture below shows the flow of data and initial state of the proposed solution.

[Figure: AudioPlayer Initial State]

First, we start the audio stream and, as we receive data, pass it into the stream parser, which splits it into audio packets. There are three audio buffers in the audio queue, which are filled one by one with the packets received from the parser. When full, a buffer is enqueued to the system. When the system has finished playing a buffer, it is returned, refilled, and enqueued again, in a loop, until there is no more data to play. The GIF below demonstrates how the audio data flows from the code to the system hardware. The red and green squares represent the empty and full buffers respectively.

[Figure: AudioPlayer Cycle]
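The buffer cycle described above can be sketched in plain C, without any Audio Queue calls. The names here (BufferPool, pool_take_free, pool_return) are hypothetical and only model the bookkeeping: which of the three buffers are free to fill and which are currently held by the system.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BUFFER_COUNT 3  /* three buffers, as in the diagram */

/* Hypothetical bookkeeping for the pool; not part of Audio Queue Services. */
typedef struct {
    bool in_use[BUFFER_COUNT];  /* true while a buffer is being filled or played */
} BufferPool;

/* Returns the index of a free buffer and marks it as taken, or -1 when
   every buffer is busy and we must wait for the system to return one. */
static int pool_take_free(BufferPool *pool)
{
    assert(pool != NULL);
    for (int i = 0; i < BUFFER_COUNT; i++) {
        if (!pool->in_use[i]) {
            pool->in_use[i] = true;
            return i;
        }
    }
    return -1;
}

/* Called when the system is done playing a buffer so it can be refilled. */
static void pool_return(BufferPool *pool, int index)
{
    assert(pool != NULL && index >= 0 && index < BUFFER_COUNT);
    pool->in_use[index] = false;
}
```

In a real player, pool_return would be called from the audio queue's output callback, and a result of -1 from pool_take_free means the parser has to wait until that callback fires.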

Sending the Audio Data

Now that we have some background on how streaming works, let’s play a song from our iTunes library. We can use an MPMediaPickerController to allow the user to pick a song to play. The picker’s delegate method mediaPicker:didPickMediaItems: hands us an MPMediaItemCollection whose items array holds the chosen MPMediaItems.

An MPMediaItem has many properties we can look at, such as the song title or artist, but we’re only interested in the MPMediaItemPropertyAssetURL property. We use that to create an AVURLAsset from which we can read the file data by using AVAssetReader and AVAssetReaderTrackOutput.

// Build an asset from the song's URL
NSURL *url = [myMediaItem valueForProperty:MPMediaItemPropertyAssetURL];
AVURLAsset *asset = [AVURLAsset URLAssetWithURL:url options:nil];
AVAssetReader *assetReader = [AVAssetReader assetReaderWithAsset:asset error:nil];
// nil outputSettings vends samples in the format stored in the track
AVAssetReaderTrackOutput *assetOutput = [AVAssetReaderTrackOutput assetReaderTrackOutputWithTrack:asset.tracks[0] outputSettings:nil];

// Attach the output to the reader and prepare it for reading
[assetReader addOutput:assetOutput];
[assetReader startReading];

Here, we create the AVURLAsset from the media item. Then we use it to create an AVAssetReader and AVAssetReaderTrackOutput. Finally, we add the output to the reader and start reading. The method startReading will only open the reader and make it ready for later when we request data from it.

Next, we’ll open our NSOutputStream and send the reader output data to it when its delegate method is invoked with the event NSStreamEventHasSpaceAvailable.

// outputStream is the NSOutputStream from Multipeer Connectivity
CMSampleBufferRef sampleBuffer = [assetOutput copyNextSampleBuffer];
if (sampleBuffer == NULL) {
  // No more samples: the reader has finished or failed
  return;
}

CMBlockBufferRef blockBuffer = NULL;
AudioBufferList audioBufferList;

CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(sampleBuffer, NULL, &audioBufferList, sizeof(AudioBufferList), NULL, NULL, kCMSampleBufferFlag_AudioBufferList_Assure16ByteAlignment, &blockBuffer);

// Write each audio buffer to the stream
for (NSUInteger i = 0; i < audioBufferList.mNumberBuffers; i++) {
  AudioBuffer audioBuffer = audioBufferList.mBuffers[i];
  [outputStream write:audioBuffer.mData maxLength:audioBuffer.mDataByteSize];
}

CFRelease(blockBuffer);
CFRelease(sampleBuffer);

First, we get the sample buffer from the reader output. Then we call the function CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer to get a list of audio buffers. Finally, we write each audio buffer to the output stream.
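One detail worth handling when writing to the stream: NSOutputStream’s write:maxLength: returns the number of bytes it actually wrote, which can be fewer than we asked for. Below is a minimal sketch in plain C of how the unsent tail could be tracked. WriteFn, write_available, DemoSink and demo_write are hypothetical names used only for illustration; the demo writer stands in for the real stream and accepts at most 4 bytes per call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical writer: returns bytes accepted (possibly fewer than len),
   0 when no space is available right now, or -1 on error. */
typedef long (*WriteFn)(void *context, const uint8_t *bytes, size_t len);

/* Pushes as much of `bytes` as the writer accepts and returns how many
   bytes went out, or -1 on error. The caller keeps the unsent tail and
   retries it on the next "space available" event. */
static long write_available(WriteFn write_fn, void *context,
                            const uint8_t *bytes, size_t len)
{
    assert(write_fn != NULL && bytes != NULL);
    size_t sent = 0;
    while (sent < len) {
        long n = write_fn(context, bytes + sent, len - sent);
        if (n < 0) return -1;  /* stream error */
        if (n == 0) break;     /* stream is full for now */
        sent += (size_t)n;
    }
    return (long)sent;
}

/* Demo writer for the sketch: a fixed 10-byte sink that accepts at most
   4 bytes per call. */
typedef struct {
    uint8_t data[10];
    size_t used;
} DemoSink;

static long demo_write(void *context, const uint8_t *bytes, size_t len)
{
    DemoSink *sink = context;
    size_t room = sizeof(sink->data) - sink->used;
    size_t n = len < 4 ? len : 4;
    if (n > room) n = room;
    memcpy(sink->data + sink->used, bytes, n);
    sink->used += n;
    return (long)n;
}
```

With a 12-byte payload and the 10-byte demo sink, write_available reports 10 bytes sent, and the remaining 2 bytes stay with the caller for the next attempt.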

We’re now streaming a song from our iTunes library. Now, let’s look at how to receive this stream and play the audio.

The Data Stream

Since we are using Multipeer Connectivity, the NSInputStream is already made for us. First, we need to start the stream to receive the data.

// Start receiving data
inputStream.delegate = self;
[inputStream scheduleInRunLoop:[NSRunLoop currentRunLoop] forMode:NSDefaultRunLoopMode];
[inputStream open];

Here, we set the delegate to the class so we can handle the stream events. Next, we tell the stream to run in the current run loop, which could be on a separate thread, and to use the default run loop mode. It is important to use the default mode or our delegate methods will not be called. Finally, open the stream to start receiving data.

Our class should conform to the NSStreamDelegate protocol so we can handle events from the NSInputStream.

@interface MyCustomClass () <NSStreamDelegate>
//...
@end

@implementation MyCustomClass
//...

- (void)stream:(NSStream *)aStream handleEvent:(NSStreamEvent)eventCode
{
    if (eventCode == NSStreamEventHasBytesAvailable) {
        // handle incoming data
    } else if (eventCode == NSStreamEventEndEncountered) {
        // notify application that stream has ended
    } else if (eventCode == NSStreamEventErrorOccurred) {
        // notify application that stream has encountered an error
    }
}

//...
@end

Above, we use the delegate method to handle events received from the stream. When the stream ends or has an error, we should notify the application so it can decide what to do next. For now, we are only interested in the event that the stream has data for us to process. We need to take this data and pass it into the Audio File Stream Services.

The Stream Parser

The stream parser is an AudioFileStreamID, an opaque handle from Audio File Stream Services, that we can push encoded audio data into and get back the audio format and audio packets. First, let’s create it.

AudioFileStreamID audioFileStreamID;
AudioFileStreamOpen((__bridge void *)self, AudioFileStreamPropertyListener, AudioFileStreamPacketsListener, 0, &audioFileStreamID);

We create the parser by passing it a reference to our object, a property changed callback function, and a packets received callback function. Because these callbacks are plain C functions, they use that reference to call methods on our object.

void AudioFileStreamPropertyListener(void *inClientData, AudioFileStreamID inAudioFileStreamID, AudioFileStreamPropertyID inPropertyID, UInt32 *ioFlags)
{
    MyCustomClass *myClass = (__bridge MyCustomClass *)inClientData;
    [myClass didChangeProperty:inPropertyID flags:ioFlags];
}

void AudioFileStreamPacketsListener(void *inClientData, UInt32 inNumberBytes, UInt32 inNumberPackets, const void *inInputData, AudioStreamPacketDescription *inPacketDescriptions)
{
    MyCustomClass *myClass = (__bridge MyCustomClass *)inClientData;
    [myClass didReceivePackets:inInputData packetDescriptions:inPacketDescriptions numberOfPackets:inNumberPackets numberOfBytes:inNumberBytes];
}

Inside the didChangeProperty:flags: method, we are looking for the kAudioFileStreamProperty_ReadyToProducePackets property which tells us that all other properties have been set. Now we can retrieve the AudioStreamBasicDescription from the parser. The AudioStreamBasicDescription contains information about the audio such as sample rate, channels, and bytes per packet and is necessary for creating our audio queue.

AudioStreamBasicDescription basicDescription;
UInt32 basicDescriptionSize = sizeof(basicDescription);
AudioFileStreamGetProperty(audioFileStreamID, kAudioFileStreamProperty_DataFormat, &basicDescriptionSize, &basicDescription);

The other callback, for packets received, hands us the parsed audio packets that we will add to the audio queue buffers later.

Now, it’s time to pass the encoded data into the parser from the stream’s NSStreamEventHasBytesAvailable event.

uint8_t bytes[512];
NSInteger length = [inputStream read:bytes maxLength:sizeof(bytes)];
if (length > 0) {
  // Only parse when the read actually returned data
  AudioFileStreamParseBytes(audioFileStreamID, (UInt32)length, bytes, 0);
}

The file stream will continue to parse bytes until it has enough to decipher the type of file. At this point, it invokes its property changed callback with the property kAudioFileStreamProperty_ReadyToProducePackets. After this, it will invoke its packets received callback with nicely packaged packets of audio data for us to use.

The Audio Queue

The audio queue is an AudioQueueRef, an opaque type from Audio Queue Services, that allows us to create audio buffers, fill them, and then enqueue them. It also gives us playback controls such as play, pause, and stop. Now, let’s create the queue and its buffers.

AudioQueueRef audioQueue;
AudioQueueNewOutput(&basicDescription, AudioQueueOutputCallback, (__bridge void *)self, NULL, NULL, 0, &audioQueue);

AudioQueueBufferRef audioQueueBuffer;
AudioQueueAllocateBuffer(audioQueue, 2048, &audioQueueBuffer);

To create the audio queue, we pass the AudioQueueNewOutput function the AudioStreamBasicDescription we received from the parser, a callback function that is invoked when the system is done with a buffer, and a reference to our object. Next, we create an audio buffer using the AudioQueueAllocateBuffer function, giving it the audio queue and the number of bytes the buffer can hold. We repeat this for each of our three buffers.

Now, we wait for the parser to invoke its packets received callback. Then, we fill an empty buffer with the packets. There are two possible formats for the data received from the parser: VBR or CBR. Variable Bitrate (VBR) means that the bit rate can change from packet to packet, whereas Constant Bitrate (CBR) means that it stays constant.

With VBR, we can only fill the buffer with whole packets, each of which may be a different size. This means that the buffer may not be completely full when we have to send it to the system. With CBR, we fill the buffer to the brim and then send it along.
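To decide which path to take, we can look at the format description. In Core Audio, a variable packet size is reported as an mBytesPerPacket of 0 in the AudioStreamBasicDescription, and a variable frame count as an mFramesPerPacket of 0. The sketch below shows that check in plain C. FormatInfo and format_is_vbr are hypothetical names, and FormatInfo is only a stand-in for those two fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two AudioStreamBasicDescription fields
   that matter here (mBytesPerPacket and mFramesPerPacket). */
typedef struct {
    uint32_t bytes_per_packet;
    uint32_t frames_per_packet;
} FormatInfo;

/* A format is treated as VBR when either field is reported as 0,
   meaning the value varies from packet to packet. */
static bool format_is_vbr(const FormatInfo *format)
{
    assert(format != NULL);
    return format->bytes_per_packet == 0 || format->frames_per_packet == 0;
}
```

For example, 16-bit stereo PCM reports 4 bytes and 1 frame per packet and takes the CBR path, while a format that reports 0 bytes per packet takes the VBR path.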

CBR

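AudioQueueBufferRef audioQueueBuffer = [self aFreeBuffer];
memcpy((char *)audioQueueBuffer->mAudioData, (const char *)data, length);
// Tell the queue how many bytes of the buffer hold valid audio
audioQueueBuffer->mAudioDataByteSize = length;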

We also need some logic to make sure we don’t overfill the buffer and, if we don’t completely fill it, to wait for more data before enqueueing it.
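That fill logic might look something like the following sketch in plain C. FillBuffer and fill_buffer are hypothetical names; FillBuffer stands in for an audio queue buffer, with fill playing the role of mAudioDataByteSize and capacity the role of mAudioDataBytesCapacity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an audio queue buffer. */
typedef struct {
    uint8_t *data;    /* storage, like mAudioData */
    size_t capacity;  /* like mAudioDataBytesCapacity */
    size_t fill;      /* bytes used so far, like mAudioDataByteSize */
} FillBuffer;

/* Copies as many bytes as fit and returns how many were consumed.
   The caller enqueues the buffer once fill == capacity and passes the
   leftover bytes to the next free buffer. */
static size_t fill_buffer(FillBuffer *buffer, const uint8_t *bytes, size_t length)
{
    assert(buffer != NULL && bytes != NULL);
    size_t room = buffer->capacity - buffer->fill;
    size_t count = length < room ? length : room;
    memcpy(buffer->data + buffer->fill, bytes, count);
    buffer->fill += count;
    return count;
}
```

If fill_buffer consumes fewer bytes than it was given, the buffer is full: we enqueue it and call fill_buffer again with the remaining bytes and a fresh buffer.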

VBR

AudioQueueBufferRef audioQueueBuffer = [self aFreeBuffer];
memcpy((char *)audioQueueBuffer->mAudioData, (const char *)data + packetDescription.mStartOffset, packetDescription.mDataByteSize);

Here, we need to check if the packet will fit in the leftover space in the buffer. If it can’t fit another packet of mDataByteSize then we will have to get another buffer. We also need to hold on to our packet descriptions for enqueueing.
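As a rough sketch of that packing logic in plain C: the names below (PacketInfo, PacketBuffer, append_packet, MAX_PACKETS) are hypothetical, with PacketInfo mirroring the mStartOffset and mDataByteSize fields of AudioStreamPacketDescription. Note that the stored description gets a new start offset, because the offsets handed to the audio queue are relative to the start of the buffer rather than to the parser’s data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_PACKETS 16  /* assumed limit on descriptions per buffer */

/* Hypothetical stand-in for AudioStreamPacketDescription. */
typedef struct {
    int64_t start_offset;  /* like mStartOffset */
    uint32_t byte_size;    /* like mDataByteSize */
} PacketInfo;

/* Hypothetical stand-in for an audio queue buffer plus its descriptions. */
typedef struct {
    uint8_t *data;
    size_t capacity;
    size_t fill;
    PacketInfo packets[MAX_PACKETS];
    size_t packet_count;
} PacketBuffer;

/* Appends one whole packet. Returns false, leaving the buffer untouched,
   when the packet does not fit; the caller then enqueues this buffer and
   retries with a fresh one. */
static bool append_packet(PacketBuffer *buffer, const uint8_t *source,
                          PacketInfo packet)
{
    assert(buffer != NULL && source != NULL);
    if (packet.byte_size > buffer->capacity - buffer->fill ||
        buffer->packet_count == MAX_PACKETS) {
        return false;
    }
    memcpy(buffer->data + buffer->fill, source + packet.start_offset,
           packet.byte_size);
    /* Store the description with its offset relative to this buffer. */
    buffer->packets[buffer->packet_count].start_offset = (int64_t)buffer->fill;
    buffer->packets[buffer->packet_count].byte_size = packet.byte_size;
    buffer->packet_count++;
    buffer->fill += packet.byte_size;
    return true;
}
```

When append_packet returns false, packets and packet_count are what we would pass along with the buffer when enqueueing it.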

When the buffer is full, we enqueue it to the system with AudioQueueEnqueueBuffer.

AudioQueueEnqueueBuffer(audioQueue, audioQueueBuffer, numberOfPacketDescriptions, packetDescriptions);

Now we’re ready to play audio. When all the buffers have been filled and enqueued, we can start playing sound with AudioQueuePrime and then AudioQueueStart.

AudioQueuePrime(audioQueue, 0, NULL);
AudioQueueStart(audioQueue, NULL);

AudioQueueStart allows us to pass a second parameter, instead of NULL, that represents a time to start playing. We will ignore this for now but it could be useful if audio synchronization is needed.

The End

Those are the basics for streaming audio through Multipeer Connectivity. At the end of this adventure I created an open source library that brings together everything described here in a more organized and structured fashion. If you’d like more detail, the complete code and examples are on GitHub at tonyd256/TDAudioStreamer.