Introducing Charcoal - Centralised URL Filter for squid

We are excited to invite early users to test drive Charcoal
(http://charcoal.io) - a Squid URL rewriter for distributed proxies.

Charcoal is designed to help administrators manage access rules for
their proxies in one place through a GUI, instead of editing the
configuration of each proxy server individually.

It grew out of our need to manage ACLs for 100+ proxy servers on
embedded devices (OpenWRT/LEDE) running at our customers' offices
across India. We are releasing it in the hope that it will be useful
for Squid users who manage multiple proxy servers every day.

The architecture is API-key-driven client-server: a Squid URL-rewrite
helper contacts the server to query access controls for incoming
requests.
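As a rough sketch of that flow, a minimal url-rewrite helper in Go could look like the following. The blocked domain, the block page, and the local decision function are invented stand-ins; a real helper would send each URL (with its API key) to the Charcoal server and relay the verdict.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// checkURL stands in for the query to the central Charcoal server.
// The domains and the block page below are made up for the example.
func checkURL(rawURL string) string {
	if strings.Contains(rawURL, "blocked.example") {
		// Redirect denied requests to a block page.
		return "OK status=302 url=http://blockpage.example/denied"
	}
	return "OK" // no rewrite: Squid fetches the original URL
}

func main() {
	in := bufio.NewScanner(os.Stdin)
	out := bufio.NewWriter(os.Stdout)
	for in.Scan() {
		// Squid sends: URL client_ip/fqdn ident method [kv-pairs]
		fields := strings.Fields(in.Text())
		if len(fields) == 0 {
			continue
		}
		fmt.Fprintln(out, checkURL(fields[0]))
		out.Flush() // Squid waits for the reply line per request
	}
}
```

The reply keywords shown are the newer Squid helper response format; older Squid versions expect the rewritten URL (or a blank line) instead.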

Re: Introducing Charcoal - Centralised URL Filter for squid

This sounds great. Would you mind specifying the source of the
blacklist data at the core of your service?

In other words - and I'm sure others will want to know as well - are
you using the blacklists from Shalla, UT1, or urlblacklist? Or have you
developed your own domain management technology?

Re: Introducing Charcoal - Centralised URL Filter for squid

Hi Benjamin,

On Wednesday 14 June 2017 08:22 PM, Benjamin E. Nichols wrote:
> This sounds great. Would you mind specifying the source of the
> blacklist data at the core of your service?
>
> In other words - and I'm sure others will want to know as well - are
> you using the blacklists from Shalla, UT1, or urlblacklist? Or have you
> developed your own domain management technology?
>

Thanks for the kind words.

For the test run, we are using Shalla.

I understand that the quality of blacklists matters. It is also
possible to mix and match multiple blacklists, which would be the ideal
scenario with most of the bases covered. That choice depends on the
user base and the financial aspects of sourcing the blacklists.

Right now, our first priority is to fix a handful of bugs reported just
after the announcement.

Re: Introducing Charcoal - Centralised URL Filter for squid

I want to offer you a more advanced helper that supports actual concurrency, compared to the current Perl helper on GitHub,
which understands the protocol but does not use threads or any other method of concurrency.

Re: Introducing Charcoal - Centralised URL Filter for squid

>I want to offer you a more advanced helper that supports actual
>concurrency compared to the current perl helper on github,
>which understands the protocol but does not use threads or any other
>method of concurrency.
>
>Let me know if it's of any interest for you.
>The skeleton is at:
>http://wiki.squid-cache.org/EliezerCroitoru/GolangFakeHelper

Thanks a lot for the offer. It surely is interesting.

The current state of the helper is due to the fact that it was written for embedded/low-powered devices running Linux. OpenWrt doesn't cross-compile Go as of now, so we had to go with Perl. It is good enough for low-traffic proxies at small offices.

We are modifying it as per Amos's recommendations and will check in the updated code soon.

>I am willing to take my time and write the code for you. So..

Glad to hear of your willingness to write it in Go. It will help the community at large run it on more powerful machines that serve a lot of requests.

Another version of the helper that we are writing will use memcached on the local proxy to cache the access verdicts granted by the cloud server, which should greatly increase speed.

Re: Introducing Charcoal - Centralised URL Filter for squid

I wanted to be sure I am not day-dreaming, but from the code it seems that every request is given its own TCP connection.
Am I right?
If so, there is much to improve.
You can use the same TCP connection for more than a single request, and also have a reconnect option for the very-far-from-reality case of a closed connection.

Re: Introducing Charcoal - Centralised URL Filter for squid

On 17/06/17 19:07, Eliezer Croitoru wrote:
> I wanted to be sure I am not day-dreaming, but from the code it seems that every request is given its own TCP connection.
> Am I right?
> If so, there is much to improve.

You are seeing correctly. That is one of the things I brought up, and
it is already being worked on; see issue #3 in their tracker.

Re: Introducing Charcoal - Centralised URL Filter for squid

On Saturday 17 June 2017 12:37 PM, Eliezer Croitoru wrote:
> I wanted to be sure I am not day-dreaming, but from the code it seems that every request is given its own TCP connection.
> Am I right?
> If so, there is much to improve.
> You can use the same TCP connection for more than a single request, and also have a reconnect option for the very-far-from-reality case of a closed connection.

It tries to address the issues you mentioned, but is not yet ideal.
Since it is invoked by Squid, the number of children started depends on
Squid. The total number of sockets in the ESTABLISHED and CLOSE_WAIT
states equals the number of helper children started by Squid.

Maybe the helper architecture could be changed so that a parent
process creates a pool of network connections that the children use,
thus limiting the number of sockets in use at any moment, while Squid
controls the number of those parent processes.

Re: Introducing Charcoal - Centralised URL Filter for squid

On 17/06/17 21:59, Nishant Sharma wrote:
> Maybe the helper architecture could be changed so that a parent
> process creates a pool of network connections that the children use,
> thus limiting the number of sockets in use at any moment, while Squid
> controls the number of those parent processes.

That would mean making Squid aware of the internal workings of the
helper. Namely that it uses connections to a specific server, port and
which transport. One of the major points of flexibility with helpers is
that this kind of thing is kept completely separate from Squid.

The URL-rewrite API being used by charcoal has the purpose of altering
the URI which Squid fetches content for a client from. Doing access
control through it instead of the access control API (external ACL
helper) is kind of borked from the start.

Re: Introducing Charcoal - Centralised URL Filter for squid

>That would mean making Squid aware of the internal workings of the
>helper. Namely that it uses connections to a specific server, port and
>which transport. One of the major points of flexibility with helpers is
>that this kind of thing is kept completely separate from Squid.

Re-reading my mail, I realised it suggested that Squid's helper architecture be modified. Instead, I meant that we can modify the architecture of our own helper so that it internally manages its own children, which may speed up the URL-rewrite process.

>The URL-rewrite API being used by charcoal has the purpose of altering
>the URI which Squid fetches content for a client from. Doing access
>control through it instead of the access control API (external ACL
>helper) is kind of borked from the start.

I agree. An external ACL helper will also allow access to additional information, like the user agent, reply content type, etc., for more granular control.
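For illustration, a hypothetical external-ACL wiring in squid.conf might look like the fragment below. The helper path and ACL names are made up, and the exact format codes vary by Squid version, so check the external_acl_type documentation for your release.

```
# Pass the URL and User-Agent header to a hypothetical Charcoal
# external ACL helper; the helper answers OK (allow) or ERR (deny).
external_acl_type charcoal_check ttl=60 children-max=5 %URI %{User-Agent} /usr/local/bin/charcoal_ext_acl
acl charcoal_allowed external charcoal_check
http_access allow charcoal_allowed
http_access deny all
```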

Re: Introducing Charcoal - Centralised URL Filter for squid

Hey Nishant,

Responding to your idea and the whole concept of the helper, and also comparing it to GoLang binaries.

About software designed to run on top of embedded hardware versus a GoLang binary:
a GoLang helper can be compiled for almost any modern embedded device (except MIPS-based ones).
It is also more efficient than anything you will write in Perl.
The only "limits" are CPU compatibility and binary size versus free space on the device (in any case, a Perl helper would use more than a GoLang one).

The idea of writing software that implements concurrency in Perl or Python is nice and noble, but I believe that
for small embedded devices you probably won't need a "robust" helper that supports concurrency or other more complex solutions.

I believe that you should aim for the more standard hardware devices on which Squid can be built, such as:
- x86
- x86_64
- arm64
- arm5
- arm8

The above will benefit from a good and robust helper that supports concurrency.
Now that it's clear that your socket can handle more than one request, I will write a helper in GoLang that works with:
- concurrency
- better connection handling (being able to handle responses whenever they are received)
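For reference, Squid's helper concurrency protocol prefixes each request line with a numeric channel-ID when concurrency=N is set on url_rewrite_children, and replies may come back in any order as long as each echoes its ID. A rough Go sketch of that shape, with the remote lookup stubbed out (domains invented for the example):

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
	"sync"
)

// lookup stands in for the remote query to the filtering server;
// the domain and block page are invented for the example.
func lookup(url string) string {
	if strings.Contains(url, "blocked.example") {
		return "OK status=302 url=http://blockpage.example/denied"
	}
	return "OK"
}

func main() {
	replies := make(chan string)
	done := make(chan struct{})
	go func() { // sole writer goroutine serializes replies onto stdout
		w := bufio.NewWriter(os.Stdout)
		for r := range replies {
			fmt.Fprintln(w, r)
			w.Flush()
		}
		close(done)
	}()

	var wg sync.WaitGroup
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		// With concurrency=N, each line is: "<channel-id> <URL> ..."
		parts := strings.SplitN(sc.Text(), " ", 3)
		if len(parts) < 2 {
			continue
		}
		id, url := parts[0], parts[1]
		wg.Add(1)
		go func() { // one in-flight lookup per request
			defer wg.Done()
			replies <- id + " " + lookup(url)
		}()
	}
	wg.Wait()
	close(replies)
	<-done
}
```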

I have already written most of the code, so I believe it's a matter of days until the helper is ready.
Would I be able to receive a testing API key/token once the helper is ready?
