Advantages of Youtube Caching !!!

In most parts of the world bandwidth is very expensive, so in some scenarios it is very useful to cache YouTube (or other Flash) videos. If one user downloads a video or Flash file, why shouldn't the same user, or another user, be able to fetch the same file from the CACHE instead of sucking the Internet pipe for the same content again and again?
People on the same LAN often watch similar videos. If I post a YouTube link on FACEBOOK, TWITTER, or a similar site, all my friends will watch that video, and it gets viewed many times within a few hours. Since videos are usually shared over Facebook and other social networking sites, the chances of multiple hits per popular video are high among my LAN users / friends. [syed.jahanzaib]

This is the reason why I wrote this article.

Disadvantages of Youtube Caching !!!

The chance that another user will watch the same video is really slim. If I search for something specific on YouTube, I get hundreds of results for the same video; what is the chance that another user will search for the same thing and click on the same link / result? YouTube hosts more than 10 million videos, which is too much to cache anyway. You need a lot of space to cache videos, and accordingly you will need ultra-modern, fast hardware with tons of SPACE to handle a cache giant of this kind. Anyhow, try it.

AFAIK you are not supposed to cache YouTube videos; YouTube doesn't like it. I don't fully understand why. Probably because their ranking mechanism relies on views, and possibly completed views, which wouldn't be measurable if the content were served from a local cache.

After struggling unsuccessfully with the storeurl.pl method, I searched for an alternative way to cache YouTube videos. Finally I found a Ruby-based method that uses Nginx to cache YT. Using this method I was able to cache YouTube videos almost perfectly (not 100%, but it works fine in most cases with some modification; I am sure there will be improvements in the near future).

5) Install RUBY

What is RUBY?
Ruby is a dynamic, open source programming language with a focus on simplicity and productivity. It has an elegant syntax that is natural to read and easy to write. [syed.jahanzaib]

Now install Ruby with the following command:

apt-get install ruby

6) Configure Squid Cache DIR and Permissions

Now create the cache directory and assign the proper permissions to the proxy user:

mkdir /cache1
chown proxy:proxy /cache1
chmod -R 777 /cache1

Now initialize the Squid cache directories with:

squid -z

You should see the following message:

Creating Swap Directories

7) Finally Start/restart SQUID & Nginx

service squid start
service nginx restart

Now, from a test PC, open YouTube and play any video. After it downloads completely, delete the browser cache and play the same video again; this time it will be served from the cache. You can verify this by monitoring your WAN link utilization while playing the cached file.

Look at the WAN utilization graph below; it was taken while watching a clip that is not in the cache.

WAN utilization of Proxy, While watching New Clip (Not in cache)

Now look at the WAN utilization graph below; it was taken while watching a clip that is now in the CACHE.

WAN utilization of Proxy, While watching already cached Clip

Playing Video, loaded from the cache chunk by chunk

It will load the first chunk from the cache; if the user keeps watching the clip, it loads the next chunk as the first one ends, and continues this way.

Video cache files can be found in the following location: /usr/local/www/nginx_cache/files

e.g:

ls -lh /usr/local/www/nginx_cache/files

The file listing above shows that the clip is in 360p quality and that its length is 5:54.
itag=34 indicates that the video quality is 360p.

I used 240p, 480p, and 720p; they all work fine.
I use the SmartVideo Firefox plugin, which forces playback in the desired quality every time: it automatically selects your preferred quality on each and every video.
http://youtu.be/5BejfxzXqMw

Thanks for the new YT cache,
but in squid.conf there are no ZPH directives to mark cached content so that it can later be picked up by Mikrotik.
Did you find a solution?
Does the Nginx method cache Windows Updates, antivirus updates, and (Google, MSN, CNN, Metacafe, etc.) videos?

This method is just an example of caching YouTube videos; you can integrate it to cache other dynamic content alongside Squid as well. I will do some more testing and will post updates whenever I get time. The problem is that I have a job in a Microsoft environment and I only get one free day every 10-15 days for my personal testing, so it is hard to find time for R&D.
🙂

This is where you set the directories you will be using. You should have already mkreiserfs'd your cache directory partitions, so you'll have an easy time deciding the values here. First, you will want to use about 60% or less of each cache directory for the web cache; if you use any more than that, you will begin to see a slight degradation in performance. Remember that cache size is not as important as cache speed, since for maximum effectiveness your cache only needs to store about a week's worth of traffic. You'll also need to define the number of directories and subdirectories. The formula for deciding that is this:
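As a sketch of the commonly cited Squid sizing guideline (my reconstruction with illustrative numbers, not the author's exact formula): estimate the object count from the usable cache size and an average object size of about 13 KB, then derive the first-level directory count so each second-level directory holds a modest number of objects.

```shell
# Reconstructed cache_dir sizing guideline (illustrative numbers, not from the post):
#   objects = usable cache size in KB / average object size (~13 KB)
#   L1 dirs = objects / 256 / 256 * 2   (each L2 dir then holds ~128 objects)
cache_kb=60000000                     # e.g. ~60% of a 100 GB partition
avg_obj_kb=13
objects=$((cache_kb / avg_obj_kb))
l1=$((objects / 256 / 256 * 2))
# Emit the resulting squid.conf line (size argument is in MB)
echo "cache_dir aufs /cache1 $((cache_kb / 1024)) $l1 256"
```

The echoed line is what you would paste into squid.conf; adjust cache_kb to your own partition.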

I have the exact same problem. It was working 100% with all 4 drives, but I made a boo-boo and needed to reinstall the OS.
I have 4 x 2 TB drives, each with 1.7 TB available, and 8 GB RAM. When I use only 3 drives, it works.

The ACL which is used to send YT requests to nginx will only allow the first segment of a video to be cached. 240p works because when we select 240p the &range tag is removed and the video is requested as a full file, not as a range request.

Hello sir, nice work, but only 1.7 MB of each YT file gets cached. Why not combine your previous storeurl.pl script with this new one? Thanks for your Lusca dynamic YouTube cache for Squid. Please finish your project; we are looking forward to using it too. You are great.

Mangling based on DSCP (TOS) slows cache-hit content down to 17 Mbps (2.2 MB/sec). I have checked this on both MT v3.3 and v5.18.
Is it a bug or a Mikrotik limitation? Sir Jahanzaib and all blog members, kindly take a look into this matter. Thanks

Hello Syed,
please can you tell me why, when I change cache_dir to point to another HDD, Squid stops working?
I mounted it and Squid made the swap directories, but after that Squid does not work and says:
root@CACHE:~# /etc/init.d/squid3 restart
* Restarting Squid HTTP Proxy 3.x squid3 [ OK ]

Squid is rebuilding the storage after an unclean shutdown of squid
(crash).

It means that the previous time you ran Squid you did not let it terminate in a clean manner, and Squid needs to verify the consistency of the cache a little harder while rebuilding the internal index of what is cached.

One more thing I have to inform you of: I am using Lusca/Squid with nginx on CentOS. What is user www-data; in nginx.conf? I have not created any user except root.
I changed www-data; to user squid;
but I still have the same issue; please resolve this as soon as possible.
Thank you,

salam
Finally, I found some errors when starting Squid on CentOS with the nginx method; it seems Ruby cannot be accessed by Squid. I tried changing the permissions of /usr/bin/env but nothing happened. Please check access.log and correct my conf. Thanks.

The most common reason for “FATAL: The url_rewriter helpers are crashing too rapidly, need help!” is a copy-paste error in nginx.rb; it sometimes happens when pasting the code into a WordPress blog. I will test it on Monday and let you know. I will post it on Pastebin; also, send me your email and I will send you the raw code.

chown www-data /usr/local/www/nginx_cache/files/ -Rf
What is www-data? Is it a user? I have CentOS with Lusca and I never created any user named www-data, so should I change www-data to squid or nginx? Please clear this up. My e-mail is big.bang.now@gmail.com; please send me the raw nginx.rb code, maybe that will resolve it.

In UBUNTU, www-data is a user/group created specifically for web servers. It should be listed in /etc/passwd as a user, and Apache can be configured to run as another user in /etc/apache2/apache2.conf.
Basically, it's just a user with stripped permissions, so if someone managed to find a security hole in one of your web applications they wouldn't be able to do much. Without a lower-privileged user like www-data, apache2 would run as root, which would be a Bad Thing, since it would be able to do anything and everything to your system.

When starting Squid I get an error that looks like Ruby has a permission problem. I tried changing the permissions of /usr/bin/env to squid, nginx, and apache, but still the same issue. Please help me resolve this; I am using CentOS with Lusca. Thanks for your reply.

2- No, the nginx cache will not be removed by Squid's policy. The workaround is to create a bash script that removes objects that have not been accessed in X days; for example, if an object has not been accessed by a user / the web server for one month, it may be deleted.
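A minimal sketch of such a cleanup script (the path matches the cache location used earlier in this guide; the 30-day threshold is an example, and relying on access time assumes the filesystem is not mounted with noatime):

```shell
#!/bin/sh
# Sketch: delete nginx-cached videos that have not been accessed for N days.
prune_cache() {   # usage: prune_cache <dir> <days>
    find "$1" -type f -atime +"$2" -delete
}

# Example: run this nightly from cron against the nginx video cache
if [ -d /usr/local/www/nginx_cache/files ]; then
    prune_cache /usr/local/www/nginx_cache/files 30
fi
```

Schedule it daily via cron, e.g. a line in /etc/crontab pointing at the script.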

Dear Syed,
I got it working like the tutorial, but can you advise how to use the ZPH feature for YouTube video hits? It seems that videos which are HITs are not recognized by Squid and therefore show as TCP_MISS in the access log; as a result, the ZPH feature won't work. I believe the Ruby script needs some improvement so it can tell Squid whether a particular video is a hit or not, and Squid can then mark the TOS while delivering it to the client.

Dear Jahanzaib,
I have to inform you that I have a storeurl.pl pattern to cache YouTube, and it only caches a 51-second segment of the video. But after setting quick_abort_min = 1 MB and range_offset_limit = 10 MB, when I download any YouTube video once with IDM and re-download it from another PC with IDM, it serves the full video from the cache. So I think the storeurl.pl script is missing something for caching the full video in the Flash player. If I am wrong, please correct me. Thanks.
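For reference, the two directives mentioned above look like this in squid.conf (the values are the ones the commenter quotes; treat them as an example, not a recommendation):

```
# Keep fetching an aborted object when less than 1 MB remains to transfer
quick_abort_min 1024 KB
# For ranged requests, fetch the object from the start, up to a 10 MB offset
range_offset_limit 10 MB
```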

I followed your tutorial and I have a Squid box caching every file extension, and now I have added your code.
The video plays normally but is not cached; it is still fetching the video from YouTube, not from my cache.
But look here, please:

I have a big problem with watching YouTube videos, not caching them; right now I don't even want to cache, just watching is the problem.
I tried 3 different Squid servers, same issue: YouTube plays the first segment only, then it keeps waiting forever!
Without Squid there is no problem at all. I know YouTube divides files into segments, but how do you guys play YouTube videos normally through Squid?
Forget caching; I just need to play the whole video file directly from YouTube.

Some additional info: using Squid 3 on Ubuntu 12.04, I had to make the following changes to your squid.conf to get it to work (bits snipped out of the diff; - are removed lines, + are added lines):
# This directive was removed
-server_http11 on

Hello friends. I have Squid 3 on Ubuntu. I wonder: do you only have to modify these lines of squid.conf, or do you also change something in nginx.rb and nginx.conf? Mine runs, but it does not cache videos, and YouTube videos will not load; if I load another YouTube video it plays, but it is not cached.

Just mark the packets for YouTube destination IPs and then create a queue to allow unlimited (or the desired) bandwidth for those marked packets. You can use a script that marks IPs for youtube.com; search the wiki and forum, there are lots of examples out there.
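As a rough sketch of that advice (RouterOS parameter names vary between versions, and the address range below is just a placeholder, not a real list of YouTube IPs; populate the address list from one of the collector scripts mentioned above):

```
/ip firewall address-list add list=youtube address=208.65.152.0/22 comment="placeholder range"
/ip firewall mangle add chain=forward dst-address-list=youtube action=mark-packet new-packet-mark=yt_pkt passthrough=no
/queue simple add name=yt-full-speed packet-marks=yt_pkt max-limit=0/0
```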

I have been running your setup on my company's network and it is working well, except for one issue I cannot fix:
nginx is hogging my network even when users are not watching YouTube. Even after setting quick_abort_min and quick_abort_max to 0, nginx keeps hogging my network bandwidth.

I used the following command to show ports that are hogging my network
sudo iftop -P

and the following command to show which process the connection belongs to
sudo netstat -tpn | grep #port number

I found a huge number of ports, and most of them belong to the nginx process.

I have not faced such an issue. I implemented the nginx-based YouTube caching method on a friend's network and it has long been working great, caching YouTube videos nearly perfectly. It also stops downloading the video if the user stops or aborts playback. As YouTube has changed its method to delivering videos in chunks (1.7 MB each), it has become easier and more manageable for cache admins.
There must be some mistake in your Squid configuration.

Can I use Ubuntu 12 with Squid 3, or is Ubuntu 12 with Squid 2.7 better? Is there a standard server one can buy for caching YouTube or any movies, or is this the only way? Also, if I want a server with the maximum size for data, can you give me the server details, please?

This seems to be working well for me, but only for 360p and 480p video: I see files with names containing itag=34 and itag=35.
But when I watch a 720p or 1080p video, no files are saved in the nginx folder.

Does it have to be transparent?
For testing purposes, I've just configured my web browser to use the caching server as a forward proxy, on port 8080.
I'm having a really hard time figuring out whether it's actually working properly.
I see videos in the /usr/local/www/nginx_cache/files folder, yet if I watch the same video after clearing my browser cache, it doesn't seem to play back properly.

Hello,
there is a problem with HTML5 (video/webm). I think this could be because of the newer HTTP behaviour (206 Partial Content and Range requests). Videos are loaded by the proxy and can be watched, but nginx just doesn't save the downloaded files from tmp into the cache.
Any suggestion on how to solve this problem?

Dear Jahanzaib,
I use Squid with the tproxy (divert) feature and YT does not show video. If I do mangle -F (clear the tproxy settings), it works fine. If I use port 3128 from a browser it works fine, too. If I use tproxy everything works, but there is no video. Please help.

If I connect through the proxy in a browser I see logs in nginx/access.log. When I use tproxy I see nginx.rb working in syslog but nothing in nginx/access.log. Any ideas? I changed nginx.conf to “listen 8081” instead of “listen 127.0.0.1:8081”. I think the point is the IP the request comes from: in the case of a pure proxy it is the external address of the proxy; in the case of tproxy it is the client's address.

I solved this problem by NAT-ing Google's IPs instead of using tproxy interception; kind of a workaround.
Dear Jahanzaib, what is the procedure for cleaning up nginx's files? As far as I can see, the rb script only stores them. Is there a script that deletes files not used for some time?

Actually, there is no built-in method to clean the files stored by nginx, but you can create a script that checks for files that have not been accessed in the last X days, deletes them accordingly, and is scheduled to run every night. I used this method on Windows to remove SAP logs older than 10 days to save space; on Linux it is much easier and more customizable.

Dear Jahanzaib,
Are you saying that nginx does not keep a database of the files stored in its directory and looks into it every time a request comes in? In that case there should be a definite number of files beyond which it becomes too slow.

Please do.
Of course the script needs to be adapted with the right values. 1000 files equals about 1.7 GB, so if you run it every 10 minutes, the YouTube videos received in those 10 minutes should account for less than 1.7 GB.

Sir, thanks for the Squid configuration. Everything works fine, including the YouTube caching, but when I enable nginx the cache consumes all the bandwidth and the net goes down, so I need to stop it again. What can I do?


Syed, I use nginx with Squid. I followed all the steps and it caches everything fine with YouTube, but my nginx/Squid server takes all the bandwidth, so I have to stop nginx. Sir, is there a setting so that when I close the YouTube video the session closes as well? It is a big problem; please help me.

Assalam. I installed Ubuntu 10.04 on my 3 TB hard disk and did not create any partitions. Where is the setting in squid.conf or storeurl to make my cached data stay longer on the 3 TB disk, even if a YouTube video is not watched for 4-8 months? I run this Squid cache server for more than 50 users in my office, so I want the cache to use the 3 TB disk fully, and once the cache is more than 95% full, to automatically delete the oldest cache.

Assalamualaikum Wr. Wb. Thank you for this tutorial; Squid cache + nginx + Ruby now runs on my server.
However, the bandwidth cannot be limited when a YouTube video is opened for the first time
(the delay parameters in squid.conf do not work).
Why, and what is the solution?
Before installing nginx and Ruby, I used the delay parameters to restrict YouTube streaming.

Hello friend, you do not know me; I am from Argentina. I have been trying your tutorial for days, but I am doing it on Debian 6.0.6 and I always hit a problem. I also tried Ubuntu 10.10; since PuTTY does not work there, I cannot copy and paste. I am using Oracle VM VirtualBox. What could the error be on Debian? Can you give me advice regarding my problem? Many thanks.

Thank you very much to those who helped me! I installed everything as root; apparently it went well, but something still fails: I have played several YT videos but I cannot get them to cache.

For testing, the browser is configured to connect to 127.0.0.1 on port 3128 by default, and then connects through the Mikrotik.

So it does not cache everything unless I restart or something. Also, do Squid and nginx start automatically when I reboot or power off? Awaiting your answers.
“I leave a sample of the terminal.” Thank you again for such input and answers.

It would be nice if someone could tell us whether it still works. Your current method is far better; I wonder if there is another method that is known to be functional, since I think Lusca no longer is. I am very sorry to have to pay for a third-party cache when I can do these things myself. Thanks for answering!

alsalamu alikum
please, I need help.
I installed Squid 2.7 + Nginx.
I did all the config but have problems:
1- Videos are cached but do not bypass the queue in Mikrotik, while all other files (exe, zip, …) do bypass it.
2- When anyone requests a video and cancels it, the video keeps downloading. Is there any way to stop it? It takes a lot of bandwidth from my poor connection.
3- I installed Ubuntu Server 64-bit 12.04; when I try to install Squid, the default now is 3.1.x. Is there any option to download 2.7?
4- Which is better, Nginx or Lusca?
thank you

In Pakistan, a better way to access YouTube without using proxies is https://youtube.com. Just hit it and enjoy; PTCL forgot to block the HTTPS tunnel of YouTube, and all videos are cached at all resolutions, even 360p.

Salam Alikum Syed Jahanzaib,
please help me; I have the same four problems as above, plus one more question: is there a way, from Squid or Nginx, to make only YouTube use another connection?
thanks

1) Nginx-served files do not pass through Squid; they go through nginx. That is why they are not marked as cache HITs and thus cannot be exempted from the queue limit. Possibly there is a workaround, but I am not aware of one yet; as YouTube is still banned in Pakistan, I cannot test it.
2) You cannot abort a video download even if it is not viewed fully by the client; it is an nginx-related issue with no workaround yet.
3) Ubuntu 12 has Squid 3.x by default; you cannot install Squid 2.7 with apt-get, but you can download the Squid 2.7 source and compile it, and that will work.
4) NGINX is the better way.
5) Yes, you can route specific ports / destination web requests to a specific WAN; use the mark-and-route method.
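For point 3, a rough sketch of building 2.7 from source on Ubuntu 12.04 (the version number and configure flags are examples, not the author's; check the squid-cache.org archive for the current 2.7 tarball and choose flags to suit your setup):

```shell
# Sketch only: fetch and build Squid 2.7, since apt-get now ships 3.1.x
apt-get install -y build-essential
wget http://www.squid-cache.org/Versions/v2/2.7/squid-2.7.STABLE9.tar.gz
tar xzf squid-2.7.STABLE9.tar.gz
cd squid-2.7.STABLE9
./configure --prefix=/usr --sysconfdir=/etc/squid --enable-storeio=aufs,ufs --enable-delay-pools
make && make install
```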

bro, thank you for your reply.
Please can you tell me how to route YouTube to use another service IP?
If you or anybody has tested a way to bypass the video through the queue, that would be great.
On Ubuntu Server 12, apt-get install squid always gives me version 3.1; I removed the old Squid and tried many times.
I am now using Debian 64-bit.
thank you again

2 GB is very low for running any proxy server.
You should have at least 8-16 GB, as acquiring this amount of RAM is not a big deal nowadays.
Anyhow, try lowering your cache_mem to 8 MB or less.

Salam Alikum,
thank you bro.
I tried decreasing cache_mem to 8 MB but still have the same problem; anyway, I am working on installing another server with 32 GB of RAM.
I want to add two 2 TB hard disks, one for YouTube and one for Squid. Will I face the same problem, and what cache_mem size should I set?
Which is better, one cache dir for Squid or two? And what should be done when the cache dir becomes full?
About Nginx: what happens if the HD fills up? Can it delete old videos and cache new ones?
thank you again,
best wishes

# After configuring cache_mem, did you restart the Squid service? How many active users access the proxy box?
# 32 GB of RAM will give a good performance boost by using more and more RAM for cache. Make sure you use a 64-bit Linux OS, whether it is Ubuntu or any other flavor of *nix.
# Yes, you can use 2 HDs: one for the default Squid cache and the second for YouTube.
# It will take a long time to fill 2 TB of space just with YouTube cache, unless you have a large number of users. Anyhow, when the drive fills up, nginx will not clear it automatically. You can create a simple bash script that deletes files older than X days, or any file that has not been accessed in the past 2 months, and schedule it to run daily or weekly.
Read the following; it was designed for Windows, but using the same logic you can create your own bash script. In that guide, look for the section "To delete files older than X days using FORFILES": https://aacable.wordpress.com/tag/batch-file-to-delete-files-older-then-n-days/

Hello bro,
thank you for your advice. I have another issue: when a user downloads any mp3, mpeg, or Windows update, Squid continues downloading even if the user cancels the request. My settings:
quick_abort_min 256 KB
quick_abort_max 512 KB
# quick_abort_pct 95
refresh_pattern -i \.mp3$ 10080 100% 10080 override-expire override-lastmod reload-into-ims ignore-reload ignore-no-cache ignore-private ignore-auth
(I set the percentage to 100% because it was 80%.)
Do you prefer one cache directory or two for a 2 TB HD?
What cache_mem is enough for 32 GB of RAM so that it runs without the memory filling up and needing a reboot every 1-2 days?
I still have the problem that the cache dir fills up; I also added 2 GB of RAM to the old box, but every time the memory fills I have to reboot. Will this problem disappear with 32 GB of RAM, or will I face it again?
I have just 100 users; I changed max_filedescriptors to 4096, as I saw on your website, to allow more requests.
nano /etc/sysctl.conf

Sorry, but the old 2.7 (incl. the 300-patch to avoid loops) works like a charm for YT. Much easier to understand and to implement.
As an old proverb in my country says: “Sometimes, old methods are still good” 🙂 In production for several months, with about 20 GB/day of YT traffic and a 25%-30% daily byte hit rate. A Squid-only setup (multiple instances, though, for performance reasons); no need for nginx etc.

Could you post the URL of the how-to you followed in order to get it done the “old 2.7 way”?

I set things up using Squid + nginx and, generally speaking, it works, but there are a lot of YT videos that won't play due to “An error occurred. Try again later.” (It has nothing to do with resolution.) The error message in nginx is: “sendfile failed (32: Broken pipe) while sending to client”.

As I'm running Squid 2.7 from the Debian 6.0.6 repository, I would give “the old way” a try.

I tested this. The setup itself seems to work: while streaming a video I can see two files in the nginx /tmp folder. But the second time, the video is still TCP_MISS:DIRECT and there are no files under /files. Any idea what could be changed so that caching would work again?

Amazing post! Mine is working fine!!! Two questions: how can I identify the files that are most “shared” once cached? And when the chunks come from the cache, does Squid count this as a cache HIT, or, the way this cache works, does Squid not register a HIT?

Well, Squid knows nothing about the nginx files; they will not be marked as cached.
You can create a script that checks the status of NGINX's cached files and deletes files older than X days, or files that have not been opened for X days.

Sayed, please help:
nginx keeps downloading even if I stop serving videos;
I mean, if I give it 100M, it takes all of it.
I have 2 Ethernet links, fake and real,
and even after I removed the fake cable from the cache box, it is still taking bandwidth on the real interface.
Why? Thank you.

But I tried something:
I viewed a video, closed it in the middle,
and came back to it again after a period of time.
The first part that I watched was cached, but the second one was not!

Another problem: did you find a solution for marking packets in Mikrotik so it works in ZPH mode? I found a solution but have not tried it yet, because you must build nginx from source; it cannot be from apt-get.
thank you 🙂

Nginx-served files do not pass through Squid; they go through nginx. That is why they are not marked as cache HITs and thus cannot be exempted from the queue limit; possibly there is a workaround.
You can use apt-get install nginx.

Videocache: when a user watches any video, Squid downloads the video and, at the same time, the video is downloaded by the VC plugin (a Python script plus Apache) in parallel, so it consumes double bandwidth. For example, if 10 users are watching 10 different videos, the bandwidth of 20 video downloads is consumed.
So in this case, VC sucks.

storeurl.pl is a bit different: it does not download videos in parallel. Squid downloads the video and caches it without using extra bandwidth for a parallel download.