I would like to build a streaming video player for iOS that can play .mp4 files hosted on a remote server, using only low-level APIs like those found in AudioToolbox and VideoToolbox. I do NOT want to use AVPlayer for this. I want to do this mostly as a learning exercise, but also because I've noticed that AVPlayer and its related classes block the main thread quite a bit, which makes it really difficult to embed a video in a scroll view.

From what I know at the moment, this breaks down into four major challenges that I'll have to solve:

Downloading an MP4 file from the internet and extracting audio/video samples

This is probably the area I understand the least. Let's say I initiate a network download for an MP4 file and get back a data stream.
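The download mechanics at least seem straightforward: a plain URLSession with a data delegate should hand me the bytes incrementally as they arrive. This is just a sketch I put together (the class name and structure are mine):

```swift
import Foundation

// Receives the file incrementally instead of waiting for the whole download;
// URLSessionDataDelegate delivers chunks of bytes as they come off the wire.
final class MP4Downloader: NSObject, URLSessionDataDelegate {
    private var received = Data()

    func start(url: URL) {
        let session = URLSession(configuration: .default,
                                 delegate: self, delegateQueue: nil)
        session.dataTask(with: url).resume()
    }

    func urlSession(_ session: URLSession, dataTask: URLSessionDataTask,
                    didReceive data: Data) {
        received.append(data)
        // Whatever parses the container would consume `received` here.
    }
}
```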

What's the format of this data stream?

Are there any documents I can read to learn more about that format?

How do I extract the audio and video samples from that stream?
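From the reading I've done so far, the answer to the first question seems to be that the stream is just the raw bytes of an ISO Base Media File Format file (ISO/IEC 14496-12, with MP4 specifics in ISO/IEC 14496-14): a sequence of length-prefixed "boxes", where the `moov` box holds the sample tables describing where every audio and video sample lives, and the `mdat` box holds the actual compressed frames. As a starting point, here's a sketch of walking the top-level boxes, assuming the whole file is already in a `Data` buffer (names are mine, untested against a real stream):

```swift
import Foundation

// Reads a big-endian unsigned integer of `count` bytes at `offset`.
func readUInt(_ data: Data, at offset: Int, count: Int) -> UInt64 {
    data.subdata(in: offset..<offset + count)
        .reduce(UInt64(0)) { ($0 << 8) | UInt64($1) }
}

// Walks the top-level boxes of an ISO BMFF (.mp4) file held in memory.
// Each box starts with a 4-byte big-endian size and a 4-byte ASCII type;
// size == 1 means a 64-bit size follows the type, size == 0 means
// "extends to end of file".
func listTopLevelBoxes(in data: Data) {
    var offset = 0
    while offset + 8 <= data.count {
        var boxSize = readUInt(data, at: offset, count: 4)
        let type = String(data: data.subdata(in: offset + 4..<offset + 8),
                          encoding: .ascii) ?? "????"
        if boxSize == 1, offset + 16 <= data.count {
            boxSize = readUInt(data, at: offset + 8, count: 8)
        } else if boxSize == 0 {
            boxSize = UInt64(data.count - offset)
        }
        print("box '\(type)': \(boxSize) bytes")   // expect ftyp, moov, mdat, ...
        guard boxSize >= 8 else { return }          // malformed header; bail out
        offset += Int(boxSize)
    }
}
```

One wrinkle I've read about: the `moov` box can come after `mdat`, so for progressive playback the file apparently needs to be written "fast start" with `moov` up front.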

Decompressing and playing audio samples from the stream

From poking around Apple's documentation, it looks like I can use the AudioQueue API for this. It seems to have everything I'll need to play a stream of audio data. It probably won't be easy to get everything set up correctly to play compressed audio data, but it seems doable.
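To make that concrete, here's roughly the shape of the setup as I understand it. The AAC format values are placeholders; the real AudioStreamBasicDescription would come out of the file's audio sample description:

```swift
import AudioToolbox

// Placeholder format; the real values come from the MP4's sample description.
var format = AudioStreamBasicDescription(
    mSampleRate: 44100,
    mFormatID: kAudioFormatMPEG4AAC,
    mFormatFlags: 0,
    mBytesPerPacket: 0,       // 0 = variable, as for compressed audio
    mFramesPerPacket: 1024,   // AAC uses 1024 frames per packet
    mBytesPerFrame: 0,
    mChannelsPerFrame: 2,
    mBitsPerChannel: 0,
    mReserved: 0)

// Refill callback: the queue hands back a spent buffer; copy the next
// compressed packets into it and re-enqueue.
let refill: AudioQueueOutputCallback = { _, queue, buffer in
    // Copy packets into buffer.pointee.mAudioData, set
    // buffer.pointee.mAudioDataByteSize and the packet descriptions,
    // then call AudioQueueEnqueueBuffer(queue, buffer, ...).
}

var queue: AudioQueueRef?
AudioQueueNewOutput(&format, refill, nil, nil, nil, 0, &queue)
// ...allocate a few buffers with AudioQueueAllocateBuffer, prime them, then:
if let queue = queue { AudioQueueStart(queue, nil) }
```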

Decompressing and displaying video samples from the stream

For decompressing and displaying video samples, I've so far looked into AVSampleBufferDisplayLayer. I've played around with it a bit and it seems very promising for what I want to do. Would AVSampleBufferDisplayLayer be the recommended choice here, or is there something else that's more suitable?
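For reference, this is roughly what I've been experimenting with; `nextVideoSampleBuffer()` is a hypothetical stand-in for whatever produces CMSampleBuffers of H.264 data with presentation timestamps:

```swift
import AVFoundation
import CoreMedia

// Hypothetical stand-in for the demuxer: returns the next compressed H.264
// sample wrapped in a CMSampleBuffer with timing attached, or nil.
func nextVideoSampleBuffer() -> CMSampleBuffer? { nil }

let displayLayer = AVSampleBufferDisplayLayer()
displayLayer.videoGravity = .resizeAspect
// ...add displayLayer as a sublayer of the hosting view's layer...

// The layer pulls samples whenever it has room, and it decodes compressed
// H.264 itself, so no explicit VTDecompressionSession seems to be needed
// just to get frames on screen.
let videoQueue = DispatchQueue(label: "video.enqueue")
displayLayer.requestMediaDataWhenReady(on: videoQueue) {
    while displayLayer.isReadyForMoreMediaData {
        guard let sample = nextVideoSampleBuffer() else {
            displayLayer.stopRequestingMediaData()
            return
        }
        displayLayer.enqueue(sample)
    }
}
```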

Synchronizing audio and video playback

This is another one of those areas that I don't understand too well. From reading the documentation and header files, it looks like I need to create an audio CMClock (via CMAudioClockCreate) and then create a CMTimebase using that clock, which can then be used to drive the AVSampleBufferDisplayLayer. There are a few things about this setup that I don't understand:

How quickly does an audio CMClock advance? Does it advance constantly at a fixed rate, or does it advance a little bit every time a sound is played?

If I use an audio CMClock with an associated CMTimebase, do I need to connect it to my audio pipeline at all? How do those two things stay in sync?
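For concreteness, this is the wiring I think the headers are describing, reusing the display layer from the earlier sketch; whether the timebase then has to be nudged to track the audio queue's actual playback position is exactly the part I'm unsure about:

```swift
import AVFoundation
import CoreMedia

let displayLayer = AVSampleBufferDisplayLayer()  // the layer from the earlier sketch

// A clock driven by the audio device, and a timebase layered on top of it.
var audioClock: CMClock?
CMAudioClockCreate(allocator: kCFAllocatorDefault, clockOut: &audioClock)

var timebase: CMTimebase?
if let audioClock = audioClock {
    // (On older SDKs this function is named CMTimebaseCreateWithMasterClock.)
    CMTimebaseCreateWithSourceClock(allocator: kCFAllocatorDefault,
                                    sourceClock: audioClock,
                                    timebaseOut: &timebase)
}

if let timebase = timebase {
    // The layer presents each enqueued sample when the timebase reaches that
    // sample's presentation timestamp.
    displayLayer.controlTimebase = timebase
    CMTimebaseSetTime(timebase, time: .zero)  // position on the movie's timeline
    CMTimebaseSetRate(timebase, rate: 1.0)    // 1.0 = normal playback speed
}
```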

You are on the right track. I have a Linux box with a camera (a Raspberry Pi). It opens a socket and writes an H.264 stream into it. I wrote an initial working app on iOS that receives that stream and displays it. For me, the most helpful thing for making sense of it all was watching 'Direct Access to Video Encoding and Decoding' from WWDC 2014. That session can give you some idea, but not all the details, of how to approach things. If you have some code, I will gladly take a look at it. I am by no means a guru, but I do have working code :). By the way, my implementation is audio-agnostic.
– Joride Jul 13 '15 at 19:28

@Joride I would love to take a look at your example.
– user3344977 May 17 '16 at 3:30

Antonio, did you ever make progress on this? I'm thinking of doing the same thing. My scroll performance is less than ideal due to main-thread blocking issues.
– user3344977 May 17 '16 at 3:30