Download YouTube 4K Videos with PHP and SlimerJS

Among friends, let’s agree that we’ll be privately caching[1] videos rather than permanently saving them, or using them for Fair Use, and that we’ll certainly neither upload nor share these videos outside of the originating platform (e.g. YouTube.com).

Existing downloader scripts all have limitations with cipher signatures or copyrighted videos, or they need continual updating. We won’t have those problems with my approach. But first, TOS boilerplate.

YouTube TOS (Terms of Service)

I’ll actually be staying within the TOS of YouTube. Here is what it states:

“You may access Content for your information and personal use solely as intended through the provided functionality of the Service and as permitted under these Terms of Service.”

Check. We’ll be using a web browser (the “provided functionality”) to access the videos directly from the hosting web site, and we’ll be caching them.

“You shall not download any Content unless you see a ‘download’ or similar link displayed by YouTube on the Service for that Content.”

Check. We’re not going to download anything. YouTube is going to autoplay videos and save the data to our disk. YouTube is actually forcing data to our disk when you think about it. If we do download a video, then it will be for Fair Use.

Using PHP to download videos

There are desktop programs, CLI programs (e.g. youtube-dl) and phone apps as well as sites (e.g. savetube.com, keepvid.com) that let you save videos. My goal here is to demonstrate a simple way to download even the newest YouTube videos (and other HTML5 videos) via network inspection.

I wrote an iOS app that hooks into network requests in a UIWebView to get the direct video URL for offline caching. It’s been humming along for years without code modification. I’ll demonstrate the same technique using PHP. Here I’ll describe how I apply that network-request-hooking method from the iOS UIWebView to a headless browser (a browser with no GUI) to build an automated video downloader.

PhantomJS video downloading capability

“Because PhantomJS permits the inspection of network traffic, it is suitable to build various analysis on the network behavior and performance. All the resource requests and responses can be sniffed using onResourceRequested and onResourceReceived.”

Perfect, almost. PhantomJS is able to do my network trick, and ultimately I’d only need it to visit an autoplaying YouTube video link, sit back, and monitor network traffic. However, there is still the matter of HTML5 video and Flash support.
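
As a rough illustration of that network trick, here is a minimal sketch of a request collector. It assumes media requests can be recognized by "videoplayback?" in the URL (as in the example URLs later in this article); isMediaRequest and collectMediaUrls are hypothetical helper names, and page stands for any PhantomJS/SlimerJS-style page object with an assignable onResourceRequested callback.

```javascript
// Assumption: media chunk requests contain "videoplayback?" in their URL.
function isMediaRequest(url) {
    return /videoplayback\?/.test(url);
}

// Hypothetical helper: hooks the page's request callback and returns an
// array that fills up with media URLs as the page issues requests.
function collectMediaUrls(page) {
    var found = [];
    page.onResourceRequested = function (requestData) {
        if (isMediaRequest(requestData.url)) {
            found.push(requestData.url);
        }
    };
    return found;
}
```

In a real script, page would come from require('webpage').create(), and the array would only fill up once a page that actually plays media has been opened.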

Unsupported features: Support for plugins (such as Flash) was dropped a long time ago. … Video and Audio would require shipping a variety of different codecs.

PhantomJS supports neither Flash nor HTML5 video. If you think the way I do, you will try to inject some JavaScript to make websites believe Flash and/or HTML5 video is available, and so get at the src properties.

This works with Flash because you can get the URL of the Flash object (i.e. the <embed> tag) and the encoded video URL, which together pull video data from the direct URL.

It would be up to you to make the YouTube Flash object and the encoded video URL work together with a 3rd-party Flash solution as PhantomJS doesn’t support Flash.

As for HTML5 video mocking, it is possible to make web sites believe the <video> tag and any video format is supported. However, it isn’t possible to get the direct video URL from the encoded URL because the mocked video tag doesn’t actually play.

Try as we might, we cannot mock the <video> tag enough to get at the direct video URL.

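    page.onInitialized = function () {
        page.evaluate(function () {
            // Keep a reference to the native createElement.
            var create = document.createElement;
            document.createElement = function (tag) {
                var elem = create.call(document, tag);
                // Pretend every <video> element can play any format.
                if (tag === "video") {
                    elem.canPlayType = function () { return "probably"; };
                }
                return elem;
            };
        });
    };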

We need a different solution than PhantomJS.

SlimerJS/Firefox can handle HTML5 video and Flash

We need an actual browser to make this work. Enter SlimerJS, which has an API similar to PhantomJS but uses an actual Firefox browser to render web pages, so it naturally supports HTML5 video and Flash.

Sample request

A script on the above page crafted or retrieved this URL with a signature, the requesting IP, an expiration, the content length and a slew of other parameters. This configuration changes regularly. The URL is short-lived, it can only be accessed from the same IP that is embedded in it, and there may be a cipher signature as well. That means copying it and manually entering it into a client-side browser will most likely fail with a 403 Forbidden error.
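
To make the idea of an expiring, IP-bound URL concrete, here is a small sketch that reads the query parameters of such a URL. The parameter names expire (Unix seconds) and ip are assumptions for illustration, as are the helper names parseQuery and isExpired and the placeholder URL; only key, mime and range appear in this article's examples.

```javascript
// Split a URL's query string into a plain { name: value } object.
function parseQuery(url) {
    var query = url.split("?")[1] || "";
    var params = {};
    query.split("&").forEach(function (pair) {
        if (!pair) { return; }
        var idx = pair.indexOf("=");
        var key = idx === -1 ? pair : pair.slice(0, idx);
        var value = idx === -1 ? "" : pair.slice(idx + 1);
        params[decodeURIComponent(key)] = decodeURIComponent(value);
    });
    return params;
}

// Assumption: the expiration is carried in an "expire" parameter as Unix seconds.
// Returns false when the parameter is missing or not a number.
function isExpired(url, nowSeconds) {
    var expire = parseInt(parseQuery(url).expire, 10);
    return !isNaN(expire) && nowSeconds >= expire;
}

// Placeholder URL, not a real request.
var sample = "https://example.invalid/videoplayback?key=yt6&expire=1500000000" +
    "&ip=203.0.113.7&mime=video%2Fwebm&range=0-127214";
var sampleParams = parseQuery(sample);
```

Reading the parameters this way is only for inspection; it does not alter the URL in any way.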

Trying to circumvent this protection by manually editing the URL, reverse-engineering and modifying the script that crafted it, or otherwise editing the calling page is against the TOS. It would also be time better spent on something else as the protection and scripts continually change. This is where the cat-and-mouse game played by “downloader” web sites begins. Fortunately, the hosting server will happily return the legitimate video data without any intervention on the user’s part.

Sample response

The request URL contains the parameter &range=0-127214, and the sample response contains "bodySize": 127215 with a matching Content-Length of 127215 bytes (~124 KB). This confirms that the video data is indeed downloading to the cache on the server (because we are using SlimerJS and a headless Firefox browser on the server).
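
The match between the range and the body size follows from byte ranges being inclusive at both ends. A tiny sketch of that arithmetic (rangeLength is a hypothetical helper name):

```javascript
// Number of bytes covered by an inclusive "first-last" byte range.
function rangeLength(range) {
    var parts = range.split("-");
    var first = parseInt(parts[0], 10);
    var last = parseInt(parts[1], 10);
    return last - first + 1;
}

var requestedBytes = rangeLength("0-127214");   // 127215
var matchesBody = requestedBytes === 127215;    // true: equals bodySize and Content-Length
```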

Intercepting requests

From here one could either retrieve and assemble the cache chunks in the Firefox profile cache folder on the server manually, or intercept the GET requests and curl the video and audio data to a predetermined cache location instead. Here is the matching cURL command for the above example request.

The User-Agent string contains “SlimerJS/0.10.1” which is appended to the actual UA string reported by the Firefox browser. This is a telltale sign that an automated browser is accessing a given web site. It’s worth modifying.
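
One way to modify it is to strip the appended token and keep the rest of the Firefox string. The sketch below assumes the token always has the form "SlimerJS/<version>"; stripSlimerToken is a hypothetical helper, and the sample UA string is illustrative rather than copied from a real session.

```javascript
// Remove the "SlimerJS/x.y.z" token (and the whitespace before it) from a UA string.
function stripSlimerToken(userAgent) {
    return userAgent.replace(/\s*SlimerJS\/[\d.]+/, "").trim();
}

// Illustrative UA string only.
var reported = "Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0 SlimerJS/0.10.1";
var cleaned = stripSlimerToken(reported);
```

In a SlimerJS script the cleaned string would then be assigned to page.settings.userAgent before the page is opened.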

Selecting video quality

By default, the quality setting is “Auto”, and YouTube picks the highest quality that suits your video player size.

To nudge YouTube toward higher-quality video (up to 1080p), we can increase the viewport size in the SlimerJS script:

    var page = require('webpage').create();
    page.viewportSize = { width: 1920, height: 1080 };

This renders a very large HTML5 player and in turn results in a better video quality being selected (up to 1080p) for the best user experience possible. This is again without any hacking or reverse engineering. That is very nice of YouTube.

4K Ultra HD video quality

Now, to get even higher quality videos like 4K Ultra HD videos, we can take advantage of Window.localStorage. YouTube remembers the quality you “manually” select and stores that setting in local storage. Here is an example from a video where I selected 1440p.

It looks like the yt-player-quality value is stored for a month and then expires.

Using a SlimerJS script we can inject the desired video quality into the local storage of the Firefox browser and update the expiration with this minimal snippet below. Before the URL is loaded the local storage data will be set or replaced:
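
What follows is not the original snippet, only a sketch of what such an injection could look like. The key name yt-player-quality comes from the observation above; the value shape ({"data", "expiration", "creation"}), the quality label "hd2160" and the 30-day lifetime are assumptions, and buildQualityEntry and injectQuality are hypothetical helpers.

```javascript
// Assumed value shape: {"data": "<label>", "expiration": <ms>, "creation": <ms>}.
function buildQualityEntry(quality, nowMs) {
    var THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000; // "a month", assumed to be 30 days
    return JSON.stringify({
        data: quality,
        expiration: nowMs + THIRTY_DAYS_MS,
        creation: nowMs
    });
}

// Write the entry into any Storage-like object (one exposing setItem).
function injectQuality(storage, quality, nowMs) {
    storage.setItem("yt-player-quality", buildQualityEntry(quality, nowMs));
}
```

Because local storage is scoped per origin, in a browser this would have to run in the context of the video site itself, for example as injectQuality(window.localStorage, "hd2160", Date.now()) inside a page.evaluate callback.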

Downloading the complete file

The example request we’ve seen so far will actually only download a ~124 KB chunk of the video file. To get the whole video in one download, modify the cURL request slightly: remove &range=0-127214 from the request URL and the whole video file will be downloaded instead.
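
Dropping the parameter can be automated with a small string helper. This is a minimal sketch; removeRange is a hypothetical name and the URL is a placeholder, not a real request.

```javascript
// Remove the "range" query parameter wherever it appears in the query string,
// then tidy up a dangling "?" or "&" left at the end of the URL.
function removeRange(url) {
    return url
        .replace(/([?&])range=[^&]*&?/, "$1")
        .replace(/[?&]$/, "");
}

var chunkUrl = "https://example.invalid/videoplayback?key=yt6&range=0-127214&mime=video%2Fwebm";
var fullUrl = removeRange(chunkUrl);
```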

Merging the audio and video streams

In the latest browsers YouTube uses MPEG-DASH (Dynamic Adaptive Streaming over HTTP) which means the data is downloaded in chunks (we eliminated that problem just above), and the video and audio streams are most likely separate for most videos. Remember to obtain both the video and audio files. They can be recognized by the mime type parameter in the URL.

Video – videoplayback?key=yt6&…&mime=video%2Fwebm&…

Audio – videoplayback?key=yt6&…&mime=audio%2Fwebm&…
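
Telling the two streams apart can be sketched as a lookup on the mime parameter. streamKind is a hypothetical helper, and it assumes the parameter is URL-encoded as in the two examples above.

```javascript
// Classify an intercepted URL as "video", "audio" or "unknown" by its mime parameter.
function streamKind(url) {
    var match = /[?&]mime=([^&]*)/.exec(url);
    if (!match) { return "unknown"; }
    var mime = decodeURIComponent(match[1]);   // e.g. "video/webm"
    var type = mime.split("/")[0];
    return (type === "video" || type === "audio") ? type : "unknown";
}
```

A downloader could use this to remember one URL of each kind before fetching both files.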

One solution is to feed the separate audio and video MP4 or WebM files to avconv or ffmpeg – something along the lines of ffmpeg -i video.mp4 -i audio.m4a -c copy combined.mp4 – to combine (mux) them into a single media file.

Discussion

I’ve demonstrated that it is possible and straightforward to download a 4K YouTube video (or a video at any available resolution) with just SlimerJS and a bit of JavaScript. It can all be controlled by PHP; the controller, however, is just a wrapper around CLI commands.

Next: Shortly I’ll put these functions together in a complete script with explanations.

Notes:

[1] I’ll be using the terms ‘download’ and ‘cache’ interchangeably to mean “temporarily store”.