Wow - that is quite a change that I was not aware of. Does this obsolete http-replicator?

um... I don't see how it would. http-replicator is a means of proxy-caching Gentoo packages. It doesn't alter, nor particularly care what's in, any of the packages it caches and proxies. The need that http-replicator satisfies isn't going away. The 'undesirable change' mentioned in the previous post (and related thread) sounds like a hiccup in the transitional phase of the GLEP 44 proposal. GLEP 44 itself says this:

Quote:

It is important to note that this proposal only deals with a change of the format of the digest and Manifest system.

So, no, once the dust settles, we'll all appreciate having http-replicator as much as we currently do. Unless I'm really missing something about all of this.

The problem isn't so much http-replicator but perhaps its indexing tool, repcacheman, which now seems to fail a lot because it can't find MD5s for many packages - at least, that's what I've observed from my own experience within the past month, and from what I've read in previous posts in this thread.

repcacheman uses portage functions to do its job. It just needs to call the new portage functions and all is well. I'm working on the new changes now. The only real change is that it needs to use the SHA1 checksum rather than the MD5 checksum.

Also, repcacheman isn't really required at all. Truth is I don't even run it all the time.

It does several things for you that aren't strictly needed to make http-replicator work.

1. It creates the cache dir upon install. It does this only on the initial install so I don't have to make the ebuild or users do it. If you had to do it manually: mkdir /var/cache/http-replicator, then chown portage:portage /var/cache/http-replicator.

3. It deletes duplicate files, but only on the http-replicator server. I just delete all the distdir files on the server. The most I lose is one or two files that were only available through FTP. With a good mirror I can go months without using FTP at all, so I don't lose anything. Check your logs: how many times has repcacheman actually added new files?
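For reference, the manual setup from item 1 looks like this (the path and the portage:portage ownership are the defaults mentioned in this thread; run it as root on the server):

```shell
# One-time manual setup of the replicator cache, as described in item 1.
# CACHE_DIR defaults to the path used throughout this thread.
CACHE_DIR="${CACHE_DIR:-/var/cache/http-replicator}"
mkdir -p "$CACHE_DIR" || echo "mkdir failed - are you root?"
# http-replicator runs as the portage user, so hand the dir over to it.
chown portage:portage "$CACHE_DIR" 2>/dev/null || echo "chown needs root"
```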

So repcacheman will soon work with the small but increasing number of manifest2 packages. Until then, http-replicator works just fine.

For a while I have had a problem with http-replicator not staying up after a reboot on my central LAN server.
I would have to SSH into the system and restart it manually for it to remain running.

Then I noticed once, when SSHing into the system before it was completely up, that the http-replicator process (a Python script) was running on tty1. A short time later the process was gone, and the /sbin/agetty process from /etc/inittab was running on tty1 (per a ps -ef command). I know that the /etc/init.d/http-replicator script is supposed to run the Python script as a daemon, but it is possible that the script is not forking into the background fast enough before being "smacked down" by the /sbin/agetty process and terminated. (I do not know for sure, but that is what seems to be happening to me.)

So I changed the following in /etc/init.d/http-replicator which seemed to resolve the problem for me:

I would also like to add something to the mix, since you are already under the hood fixing things. I'm on a really slow dial-up connection, and sometimes I have to stop replicator for various reasons. I notice that when I restart it and restart the emerge process, which tries to continue the download, emerge has trouble reconnecting to replicator. Replicator starts to download right away, as it should, but emerge can't seem to get back in sync.

I run into things like this on the really large packages. If you need me to I can post what I get so you can see it.

Maybe one of these days I will get DSL or something. That would solve a lot of problems.

Just thought I would share this with the rest of you in case you have encountered the same problem...

You rule, boy

I've been searching for this solution without results for several months!

I'll open a bug report ASAP to fix this issue!

Thanks a lot again!

I've read your e-mail; I'll send you more information ASAP.

I do use the init scripts to stop it: /etc/init.d/http-replicator stop. The mirror varies.

OK, that eliminates some possible problems. Then I'm afraid I'll have to say you've found a "feature", not a bug. Well, actually a lack of a feature.

Scen and Griffon26, I'll save you some time chasing this down!

The feature is that replicator supports resuming on the client end, but not on the internet end. If replicator is stopped or killed, it will delete the partial download from the cache! Think about that for a second and your problem will be clear.

Here is how to test it. Start a long fetch (on dial-up any file will probably do; openoffice-bin for the rest of us), then shut down http-replicator before the download is finished. Then look and see if the partial file is in the cache. It won't be there, but portage will keep the partial download in the distfiles dir!

Then you restart replicator and restart your long fetch. Replicator will try to honor portage's request to resume the download, but it has to start the download over from the beginning on the net side! This means portage will probably time out, or you will give up, before the download from the net catches up with the partial download portage still has.
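A concrete way to run that test (package name and paths are just this thread's examples; this needs a live Gentoo box):

```shell
# Reproduce the cache-deletion behaviour described above.
emerge --fetchonly openoffice-bin &    # long download going through the proxy
sleep 60                               # let it get partway through
/etc/init.d/http-replicator stop       # stop the proxy mid-transfer
ls /var/cache/http-replicator/         # the partial file is gone from the cache...
ls /usr/portage/distfiles/             # ...but portage kept its partial copy
```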

So, in its current state of development, http-replicator is not designed for, nor can it handle, all situations; shutting it down in the middle of a download is not a supported use at this time (developers needed). Resuming on the net side, along with FTP support, was a planned part of the next release of http-replicator. Unfortunately, that release got bogged down in a total rewrite.

dalek, to suit your dial-up needs it would be better to bypass replicator for the interrupted long download, then move the file into the cache when complete. Just do this
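The original commands aren't shown in this copy of the thread, but a sketch of the approach being described (the file name and mirror URL here are hypothetical) might look like:

```shell
# Bypass the proxy for the interrupted download, then seed the cache with it.
cd /usr/portage/distfiles
# wget -c resumes from the partial file portage kept; no http_proxy is set
# here, so this goes straight to the mirror instead of through replicator.
wget -c http://distfiles.gentoo.org/distfiles/somebigfile.tar.bz2
# Once complete, move it into replicator's cache so LAN clients get it locally.
mv somebigfile.tar.bz2 /var/cache/http-replicator/
```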

OK. The solution you posted is what I have been doing, sort of. My biggest problem is that my ISP puts a 6-hour limit on unlimited access. Go figure that one out. I have been editing make.conf, but it does the same thing, I guess.

Maybe one day this will be fixed. I did see some phone trucks the other day, so we may be getting DSL soon. Pardon me while I go jump around like an idiot at even the thought of broadband.

I've waded through the depths of portage, and returned with a new version of repcacheman.

I'm continuing to develop and add new features, but now is a good time to test and get some feedback from users. Since I still have new features in development, I chose an older revision to test, but it should work much better than the previous version and has all the old features.

My tests show this 4.0 beta uses only 10% of the resident memory of the previous version and runs four times faster!

It can be run from any dir as root, or dropped in place of the old /usr/bin/repcacheman.py (not /usr/bin/repcacheman).

SUMMARY:
Found 0 duplicate file(s).
Deleted 0 dupe(s).
Found 1103 new file(s).
Added 1040 of those file(s) to the cache.
Rejected 60 File(s) not in Portage.

Oops, I did that on the wrong machine.

The second set of results is the same. The first attempt fails: rep4.py ignores my desired cache dir, and it picked up a few more bad files. Now, I'd be happy if it would just use my FTP pub dir instead of /var/cache/http-replicator.

--update
I also have to set the conf back to /var/cache/http-replicator for the time being, or http-replicator fails.

I've uploaded beta 4.1 with two typos fixed. There is still something going on in the core code; right now I think it is a filename collision in portage itself. I didn't change the download link, but you will see rep41.py inside.

Okay, that worked, with verbose output of the portage tree. Is there a trigger to avoid the verbosity? My server performs slower when outputting scrolling text.

I'm wondering how files in the cache are treated. Are they assumed to be good, and thus unchecked? For example, if I have an overlay on another machine that my server does not, the files it grabs are stored in the cache as they are fetched, and they remain there indefinitely? So, can I put any files in the cache that my client machines might fetch, such as livecd .iso's, and so long as the client used wget with http_proxy specified it can fetch that locally?

Quote:

Okay, that worked, with verbose output of the portage tree. Is there a trigger to avoid the verbosity? My server performs slower when outputting scrolling text.

The verbose output is just my debugging going on, it won't be in the final version. I've uploaded beta revision 4.3 that removes the scrolling and fixes the problem in the core code I mentioned earlier.

mkzelda wrote:

I'm wondering how files in the cache are treated. Are they assumed to be good, and thus unchecked? For example, if I have an overlay on another machine that my server does not, the files it grabs are stored in the cache as they are fetched, and they remain there indefinitely? So, can I put any files in the cache that my client machines might fetch, such as livecd .iso's, and so long as the client used wget with http_proxy specified it can fetch that locally?

Thanks for asking! I've been trying to decide which options to add and who might need them. I also want to give users the widest possible range of options.

Replicator is a general-purpose proxy at heart; it will serve and cache anything that goes through it, even web browsing. There is an "alias" option to serve files from a dir of your choice in addition to the cache. It defaults to serving binary packages from Gentoo's default location, but you can add to or replace that default.

/etc/conf.d/http-replicator

Code:

## Local dir to serve clients. Great for serving binary packages
## See PKGDIR and PORTAGE_BINHOST settings in 'man make.conf'
## --alias /path/to/serve:location will make /path/to/serve
## browsable at http://http-replicator.com:port/location
DAEMON_OPTS="$DAEMON_OPTS --alias /var/tmp/packages/All:All"

So if you want to serve random files, you can keep them in a separate dir for easy management and fetch them with the alias URL, or keep them in the cache and fetch them with the http_proxy setting. Multiple alias options are allowed. Http-replicator was designed to be a secure, high-performance web server with a cache.
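For example (the hostname and port here are assumptions; substitute your own server's values):

```shell
# Fetch through the cache: any HTTP client that honors http_proxy will do.
http_proxy=http://replicator-host:8080 wget http://distfiles.gentoo.org/distfiles/somefile.tar.gz

# Fetch from an --alias dir directly, using the example alias above
# (/var/tmp/packages/All exposed as /All):
wget http://replicator-host:8080/All/somepackage.tbz2
```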

Replicator doesn't check its own cache for this reason: it won't touch anything in its cache because the cache may contain user files.

The question is: should replicator check its cache?

I say no for now, because it can be done better by other means. But would adding that feature be convenient for many users?

1. If replicator is a Gentoo-only cache, there are other distfile-checking scripts that will delete files based on many tests, such as: not in portage, not the most current version, older than a certain date, exceeding a maximum cache size, not accessed in the last 3 months, and so on.

2. If replicator is used for other files, I can't even guess how to prune the cache.

What I do is this. It could be a cron script but I do it manually by choice.

1. Move the cache files to the distfiles dir. This is fast because it only renames the files; it doesn't move anything on disk.
2. Run repcacheman, which moves all good files back to the cache.
3. Delete all the remaining files, which are either not in portage or corrupt/incomplete.

You could also move the files, run the distfile cleaning script to prune based on your desires, then run repcacheman!
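That cycle, as a sketch (paths are this thread's defaults; it assumes the cache and distfiles dirs are on the same filesystem, so the mv is a cheap rename):

```shell
# Manual cache maintenance, per the steps described above.
CACHE=/var/cache/http-replicator
DISTDIR=/usr/portage/distfiles

mv "$CACHE"/* "$DISTDIR"/   # 1. rename cache files into the distfiles dir
repcacheman                 # 2. moves all verified-good files back to the cache
rm -f "$DISTDIR"/*          # 3. whatever is left is not in portage or is corrupt
```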

There was a time when distfile-cleaning scripts were hard to find; now eclean is part of gentoolkit.
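A typical invocation, if you want to go that route (check 'man eclean' for your gentoolkit version's exact flags):

```shell
# Install gentoolkit if needed, then prune distfiles no ebuild still references.
emerge --ask gentoolkit
eclean distfiles
```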

I know that was probably more than you wanted to know, but I hope it helped you and some lurkers.

Some time ago (early '06, I think) I posted here that http-replicator would be started by the rc init scripts, but when I went to emerge anything I had to restart it. This behaviour persisted until yesterday.

Until then I had been using login managers of varying types, from gdm to xdm and even the Enlightenment greeter, but yesterday I decided I had had enough and wanted to properly secure my LAN by using proper console login procedures.

Surprise, surprise! Suddenly http-replicator did not have to be restarted after login; now it works without that annoying restart before I emerge anything.

I do not know if this is a bug; however, I thought you might like to know.

Regards, Robert
