This thread is long, but please read the howto on the first page and then check the last page or two for updates...

Have A Couple (or Couple Hundred) Boxes On A LAN?

Be a good Gentoo netizen and speed up your updates with Http-Replicator! Http-Replicator is a proxy that works with Portage to serve packages from a cache. It saves you bandwidth and helps Gentoo grow by reducing the load on the public mirrors. Downloads from the cache are limited only by your disk and LAN speeds!

Here is how it works. Http-Replicator runs on one of your machines and listens for connections from your other Gentoo boxes. If the requested package is in the cache, it sends it out at LAN speed! If not, Http-Replicator downloads the file and simultaneously streams it to every client requesting it! No matter how many machines request the package, only one copy comes down the Internet pipe, while multiple copies can stream out over the LAN.

This is the easiest and most reliable way to share both source and binary packages.

Http-Replicator has been designed with speed and security in mind. Give it a try!!

1. Emerge http-replicator.

NOTE: If you are upgrading an http-replicator that was installed from a Portage overlay, remove the overlay files and unmerge http-replicator first. Don't worry: your config files will not be touched and will work fine with the new official ebuild.

Code:

# emerge http-replicator

2. Modify /etc/make.conf on both the server and your other gentoo boxes.

Add the "http_proxy" to /etc/make.conf:

Code:

http_proxy="http://YourProxyHere.com:8080"

replacing YourProxyHere.com with the hostname or IP address of the box running http-replicator.
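Because the same line goes into make.conf on every box, it can be scripted. Below is a minimal sketch in POSIX sh; `add_http_proxy` is a hypothetical helper (not part of http-replicator), and it only checks for an existing uncommented `http_proxy=` line rather than fully parsing make.conf.

```shell
# Hypothetical helper: append http_proxy to a make.conf-style file,
# unless an uncommented http_proxy= line is already present.
add_http_proxy() {
    conf="$1"    # e.g. /etc/make.conf
    proxy="$2"   # e.g. http://YourProxyHere.com:8080
    if grep -q '^[[:space:]]*http_proxy=' "$conf" 2>/dev/null; then
        echo "http_proxy already set in $conf, not changing it" >&2
        return 1
    fi
    printf 'http_proxy="%s"\n' "$proxy" >> "$conf"
}
```

For example, run `add_http_proxy /etc/make.conf http://192.168.0.10:8080` as root on each box, where 192.168.0.10 is only a placeholder for your replicator's address.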

NOTE: Http-Replicator 3.0 no longer needs any special RESUMECOMMAND!! Please comment out (place # at the start of the line) the previous RESUMECOMMAND changes if upgrading from a version prior to 3.0!!

3. Check the config file /etc/conf.d/http-replicator and then run repcacheman to create the cache dir and transfer files to Http-Replicator's cache.

Most people can just run 'repcacheman' which will create the default cache dir /var/cache/http-replicator and complete the setup. You can change the defaults if you have special needs. repcacheman will set up the cache dir and user according to the /etc/conf.d/http-replicator config file and doesn't need any special command line.

Code:

/usr/bin/repcacheman

NOTE: I created repcacheman to automate the install and maintenance of Http-Replicator's cache dir. If repcacheman doesn't work for you for some reason, send me bug reports. Http-Replicator will work without repcacheman, but you'll have to create the cache dir and chown it yourself to complete the install. repcacheman also checksums the existing files in /usr/portage/distfiles before moving them to the cache dir. Portage leaves incomplete and corrupt files in the distfiles directory, and repcacheman will not move those files to the cache.
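If you skip repcacheman, the manual step amounts to creating the directory and handing it to the user Http-Replicator runs as. A rough sketch, assuming the default cache path from this howto and assuming the daemon runs as `portage`; check the user actually named in /etc/conf.d/http-replicator first. `setup_cache_dir` is a hypothetical helper name, and unlike repcacheman it does not import or checksum any existing distfiles.

```shell
# Hypothetical helper: create the cache dir by hand when not using repcacheman.
# Both defaults are assumptions; match them to /etc/conf.d/http-replicator.
setup_cache_dir() {
    dir="${1:-/var/cache/http-replicator}"
    owner="${2:-portage}"   # assumed user; use whatever user the conf.d file names
    mkdir -p "$dir" || return 1
    # The replicator daemon must be able to write new files here
    chown "$owner" "$dir" || return 1
    chmod 755 "$dir"
}
```

Run it as root, e.g. `setup_cache_dir /var/cache/http-replicator portage`.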

4. Next, start Http-Replicator on the server:

Code:

/etc/init.d/http-replicator start

5. You should add http-replicator to your default runlevel:

Code:

rc-update add http-replicator default

Don't forget that Portage needs mirrors! Edit GENTOO_MIRRORS in /etc/make.conf to add more http mirrors and place any ftp mirrors LAST. The default mirrors in Gentoo leave something to be desired. Use mirrorselect if you need help selecting mirrors.
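Reordering by hand is easy, but a tiny helper makes the rule explicit. This sketch (hypothetical `order_mirrors`, POSIX sh) puts http:// mirrors first and ftp:// mirrors last, keeps the relative order within each group, and assumes the mirror URLs contain no shell glob characters.

```shell
# Hypothetical helper: reorder a GENTOO_MIRRORS list so http:// mirrors
# come first and ftp:// mirrors come last, keeping the order within each group.
order_mirrors() (
    first="" middle="" last=""
    for m in $1; do          # unquoted on purpose: split on whitespace
        case "$m" in
            http://*) first="$first $m" ;;
            ftp://*)  last="$last $m" ;;
            *)        middle="$middle $m" ;;
        esac
    done
    # Unquoted expansion drops the leading spaces and joins with single spaces
    echo $first $middle $last
)
```

For example, `order_mirrors "ftp://ftp.example.org/gentoo http://mirror.example.org/gentoo"` prints `http://mirror.example.org/gentoo ftp://ftp.example.org/gentoo`; paste the result back into GENTOO_MIRRORS. The example.org hosts are placeholders.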

Also, some packages in portage have a RESTRICT="nomirror" option which will prevent portage from checking replicator for those packages. The following will override this behavior. Create the file "/etc/portage/mirrors" containing:

Code:

# Http-Replicator override for FTP and RESTRICT="nomirror" packages
local http://gentoo.osuosl.org

You can replace gentoo.osuosl.org with your favorite http:// mirror. If you already have a local setting, don't worry: as long as it is an http mirror, this will still be effective.

Then update something and watch http-replicator go!! It doesn't matter which box or how many boxes you update at the same time!! Http-Replicator can handle it!!

I recommend running repcacheman after running emerges on the server (except emerge sync). It deletes duplicates after the server box fetches any files and also imports FTP'd or other files into the cache. The easiest way to do this is to run emerges on the server like this:

Code:

emerge -uDva world && repcacheman

This runs the emerge then repcacheman when the emerge is complete.

To keep repcacheman fast and efficient, you should consider deleting any files that remain in your distfiles directory after repcacheman runs. They are either:
1. No longer in Portage
2. Incomplete or corrupt
3. Just plain junk
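One way to review and then remove those leftovers is sketched below. `clean_distfiles` and its `--delete` argument are hypothetical; the sketch only looks at regular files at the top level of the directory (subdirectories are left alone) and assumes the default DISTDIR of /usr/portage/distfiles. Run it without `--delete` first and read the list.

```shell
# Hypothetical helper: list the files still left in DISTDIR after repcacheman
# has run; pass --delete as the second argument to remove them.
clean_distfiles() (
    distdir="${1:-/usr/portage/distfiles}"
    if [ "$2" = "--delete" ]; then
        # Remove only top-level regular files, never subdirectories
        find "$distdir" -maxdepth 1 -type f -exec rm -f {} +
    else
        find "$distdir" -maxdepth 1 -type f -print
    fi
)
```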

Http-Replicator will serve binary packages to clients!! Build once and use binary packages for the rest of your Gentoo herd!! Great for mass installs!! No other http or ftp daemon is required!!

This needs a whole HOWTO, but here is the quick version...
Add -b to your emerges on the server, or add buildpkg to FEATURES in /etc/make.conf. This will make a binary package after compiling. Then just set your clients' PORTAGE_BINHOST (in /etc/make.conf) to http://YourProxyHere:8080/All for a default Gentoo install.
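The two make.conf settings described above might look like this. This is a sketch: the server line and the client line go in different machines' make.conf files, YourProxyHere is the same placeholder as above, and if you already have a FEATURES line, add buildpkg to it instead of replacing it.

```shell
# /etc/make.conf on the server: also build a binary package after each compile
FEATURES="buildpkg"

# /etc/make.conf on each client: fetch binaries through Http-Replicator
# (replace YourProxyHere with the server's hostname or IP address)
PORTAGE_BINHOST="http://YourProxyHere:8080/All"
```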

Example: a new install of xmms

Code:

emerge -vab xmms

Then run an emerge with the binary option -g on a client (see man emerge for more options):

Code:

emerge -uvag xmms

-g will cause portage to use binary packages from the server if available, and compile if not on the server. Huge time saver for similar machines!!!

You don't have to re-emerge packages to make binaries.

Code:

quickpkg xmms

will create a binary package for the already installed xmms. quickpkg uses the installed config files for the package, so any custom configs will be part of the new package!!

Version 1.9
No longer masked in portage
Version 1.8
Now an official package!!
http://packages.gentoo.org/ebuilds/?http-replicator-3.0
Version 1.7
http-replicator 3.0!!
Deleted reference to old versions
Added BINHOST info
Version 1.6c
Added note to delete old ebuilds for upgrades
Version 1.6b
Updated to any http://mirror and clarify conf
Version 1.6a
Updated http://gentoo.osuosl.org/ link
Version 1.6
Added repcacheman example
Some changes for 2.1
Version 1.5
Clarify repcacheman note
Version 1.4
Added repcacheman note
added /etc/portage/mirrors note
Version 1.3
Added repcacheman
Changed default cache dir
Version 1.2
Updated RESUMECOMMAND
Version 1.1
Added group permissions on /usr/portage/distfiles
Added mirror reminder
Simplify activation
Version 1.0

Last edited by flybynite on Sat Jan 28, 2006 3:21 am; edited 37 times in total

Hi flybynite,
This ebuild should definitely be in the official portage list of ebuilds.
It works a treat, is easy to setup and fixes a complicated (or people have made it complicated) problem. I think this is what most people are after instead of the other 5 or 6 solutions I've found on the forums.

Just a couple of questions for you tho...

1- Scenario: I have 2 workstations and a little server. The Http-Replicator server is running on my beefy workstation, where most of the packages are downloaded to. If I emerge a package onto the server and the workstation didn't have it, does that mean I have to download it again when I go to emerge the same package onto the workstation? Or will the package be stored on the workstation too?

2- Is there a way to automatically bypass Http-Replicator if the source for the emerge is on an ftp server? Currently the emerge fails if the only sources for the ebuild are on ftp servers (e.g. kdebase-3.2.2.tar.bz2, kdegraphics-3.2.2.tar.bz2; the rest of the KDE stuff is OK). It comes up with an error saying the proxy port is invalid...

Thanks


Darkaxe wrote:

1- Scenario: I have 2 workstations and a little server. The Http-Replicator server is running on my beefy workstation, where most of the packages are downloaded to. If I emerge a package onto the server and the workstation didn't have it, does that mean I have to download it again when I go to emerge the same package onto the workstation? Or will the package be stored on the workstation too?

No package is ever downloaded twice, no matter who starts the download or when it is started. Start the same emerge on all 3 boxes at once and only 1 copy is downloaded, while all 3 boxes receive the file from http-replicator!! Whichever box triggers the download, the file ends up in the replicator's cache, so it is never fetched from the Internet again. (One exception: ftp mirrors aren't supported by Http-Replicator, but there are plenty of good http mirrors!)

Darkaxe wrote:

2- Is there a way to automatically bypass Http-Replicator if the source for the emerge is on an ftp server? Currently the emerge fails if the only sources for the ebuild are on ftp servers (e.g. kdebase-3.2.2.tar.bz2, kdegraphics-3.2.2.tar.bz2; the rest of the KDE stuff is OK). It comes up with an error saying the proxy port is invalid...


This happens automatically if you follow my setup directions. You must have an ftp_proxy set somewhere in your environment, left over from another program; check for this. If you did try to set http-replicator as an ftp_proxy, the error would be different! You would get "ERROR -1: No data received", not "proxy port invalid".

In http-replicator, we only set an http_proxy, not an ftp_proxy, so wget should not use the proxy for ftp transfers.
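A quick way to look for a stray ftp_proxy is sketched below. `check_ftp_proxy` is a hypothetical helper that checks the current environment and a make.conf-style file, and returns success only if neither sets one. It does not read wget's own configuration files (/etc/wgetrc, ~/.wgetrc), which can also set ftp_proxy, so check those by hand.

```shell
# Hypothetical helper: report a leftover ftp_proxy in the current environment
# or in a make.conf-style file. Returns 0 only when none is found.
check_ftp_proxy() (
    conf="${1:-/etc/make.conf}"
    found=0
    if [ -n "${ftp_proxy}${FTP_PROXY}" ]; then
        echo "environment: ftp_proxy=${ftp_proxy:-$FTP_PROXY}"
        found=1
    fi
    # grep prints any matching lines it finds in the file
    if [ -f "$conf" ] && grep -E '^[[:space:]]*(ftp_proxy|FTP_PROXY)=' "$conf"; then
        found=1
    fi
    [ "$found" -eq 0 ]
)
```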

Make sure you have plenty of good http mirrors in GENTOO_MIRRORS in /etc/make.conf, not really for http-replicator, just for Gentoo in general. Use mirrorselect if you need help selecting more mirrors. The default mirrors in Gentoo aren't the best. I got kdebase etc. from gentoo.oregonstate.edu when the ebuilds first came out!


I think you have to use an extra directory on your proxy for the files, not the distfiles dir, because Portage and the replicator will both save partial files there. A little bit confusing for both of them.

After emerging, at least on my system, the proxy didn't cache anything because "distfiles" didn't have the right permissions either.
So I changed the dir in /etc/http-replicator.conf.

The changes you made to make.conf are simpler, so I've changed the howto. Thanks! Actually, you don't even have to uncomment the FETCHCOMMAND, as that is the default anyway; only the RESUMECOMMAND is non-standard. Neither the old nor the new way should cause ftp requests to go through http-replicator, though. Ftp works just fine on my system following the howto.

I also added changes to the group/permissions to make it clear that http-replicator can write files in that directory. I don't know if some installs are different or what but this ensures everything works just fine.

Http-Replicator uses a special tmp file so there is no conflict between portage and http-replicator using the same directory. There is nothing wrong with separate directories though, using the same directory just makes things simple. It also means that if you have to use ftp to get some special file on the server, replicator can still cache that file to other machines using http!

After *heavily* interrupting wget (using Ctrl+C) multiple times during transfers of the same files on the client and server, my client started creating numbered copies (.1, .2, ...) of the files and emerge failed. I used --fetchonly for testing.

It seems the replicator started serving incomplete tmp files as complete files; wget saw them as new files and started renaming...

I haven't seen this with separate directories.

I used two machines: one running the replicator and a second one as a client.

Replicator works very well in normal use. The issue is a possible annoyance, not a safety or security issue at all.

The thing is, I would like replicator to be able to use /usr/portage/distfiles as a cache. Only on the server do replicator and Portage have to play well together using the same directory. The issue is that when you interrupt a fetch on the server, Portage leaves an incomplete file and replicator doesn't know it is incomplete.

If for some reason you interrupt a Portage fetch on the server, you might have to manually delete the incomplete file, that's all.

I use replicator every day and haven't had any problems in normal use because I don't interrupt portage downloads.

If you are paranoid, change the cache directory in /etc/http-replicator.conf to another directory. Then just move all the files from distfiles to the new directory to prime the cache. This will avoid the problem till I test some fixes.
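Priming a separate cache directory can be done with a short loop. This is a minimal sketch with the hypothetical name `prime_cache`, assuming the default paths used in this thread; files that are already in the cache are left where they are. Unlike repcacheman it does not verify checksums, so an incomplete download left by an interrupted fetch would be moved too; delete those first. Afterwards, make sure the cache directory is still owned by the user http-replicator runs as.

```shell
# Hypothetical helper: prime a separate replicator cache by moving files
# out of Portage's distfiles. No checksum verification is done.
prime_cache() (
    src="${1:-/usr/portage/distfiles}"
    dest="${2:-/var/cache/http-replicator}"
    mkdir -p "$dest" || exit 1
    for f in "$src"/*; do
        [ -f "$f" ] || continue          # skip subdirectories and an empty glob
        name=$(basename "$f")
        if [ -e "$dest/$name" ]; then
            echo "skipping $name: already in cache" >&2
        else
            mv "$f" "$dest/" || exit 1
        fi
    done
)
```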

Okay, I emerged Http-Replicator and followed your setup instructions for my Gentoo server and one Gentoo client. But I am not quite sure if Http-Replicator is being used. How can I check this? It doesn't seem to leave any entries in the log file.

I should have made it obvious that repcacheman is not strictly needed. Http-Replicator will work fine without it. Right now, any new emerges you do will be cached!!

I wrote repcacheman to help new users create the cache directory and move your existing files from /usr/portage/distfiles to the cache directory. It also checks those files for corruption, just in case. Portage leaves incomplete downloads in /usr/portage/distfiles, repcacheman won't copy those files to the replicator cache.

Now back to your error. This is caused by a Gentoo developer forgetting to include a digest file for a new or changed ebuild. My guess is this has been fixed by now. Just sync and try it again. If that doesn't work, just copy all the packages (/usr/portage/distfiles) to the cache dir yourself (/var/cache/http-replicator), and tell me your Portage version. I'll add code to catch this error as soon as I get the chance....

Thanks for the excellent program. I went through the exact same chain of programs as senectus. Works like a charm!

OK, you're killing me knowing there is some outdated info out there. The rsync mirror described by the HOWTO: Central Gentoo Mirror for Internal Network at https://forums.gentoo.org/viewtopic.php?t=59134 is currently outdated (as of June 1, 2004) and uses insecure options. Although I believe there will be a fix in a later version of rsync, until then, following that old howto is insecure!!

Now that will be one more post to go through in the chain of different options.

In the above setup, is there any problem with using the cache directory as the source directory for server to get the files it needs?

So for instance setting in /etc/make.conf

Code:

PKGDIR="/var/cache/http-replicator"

I was wondering if this would cause write issues as the proxy is downloading the file to cache directory while giving it to the localhost which is in turn saving it to the same place. I guess this could be overcome by turning off the http_proxy for the localhost? Is there a problem with this configuration, something that I am not seeing?

Thanks for the program


Well, PKGDIR is for the .tbz2 binary packages Portage creates. From your description, I think you meant DISTDIR, which is where Portage stores its downloaded tarballs.
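For reference, here are the two variables side by side, with what I believe were the usual defaults for a Portage install of this era (check `emerge --info` on your own system rather than trusting these values):

```shell
# Binary .tbz2 packages made by emerge -b / FEATURES="buildpkg"
PKGDIR="/usr/portage/packages"
# Source tarballs fetched by Portage
DISTDIR="/usr/portage/distfiles"
```

Http-Replicator's cache (/var/cache/http-replicator in this howto) stays separate from both.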

Using DISTDIR (/usr/portage/distfiles) for replicator's cache is not recommended, for now.

That is the reason I created the 'repcacheman' script. It manages Portage's DISTDIR and replicator's cache directory very intelligently. With repcacheman, there really aren't any drawbacks to following my howto exactly and keeping DISTDIR separate from replicator's cache.

I originally intended replicator to share the directory with Portage, but Portage needs to learn how to share first.

I've just started to play with various portage caching ideas and have setup the http-replicator and the local rsync mirror. Great!

I'm slightly lost with all the options and suggestions so I have a quick question about binary packages if that's ok.

What's the best/simplest way to also share the binary packages that are in the packages directory of the machine hosting the replicator? This machine has already built a large number of packages and I ultimately want this machine to build and serve all the packages for my network as they get updated.

I assume that the replicator only serves the files that (nominally) live in distfiles? The option in make.conf for binary packages seems to suggest I need http or ftp access to a server - serving the packages directory?

Let's say that I want to build packages on lots of machines of different specs, then host the binaries on one PC so that I have a whole host of binaries on the network for network installations (the idea is to make Gentoo a feasible distro to install at installfests)...