Archive

I’ve stumbled upon an interesting article this morning called “I hated php back when it was cool”, which kind of vocalises some of the gripes I recently developed with PHP.

Namely, the language has grown exponentially, but without a structure to tie it together effectively. I just love the description quoted in the article: “Aristotle Pagaltzis makes an interesting point when he says how PHP suffers from a lack of initial design. … Basically PHP seems to have started out as a pet project, and had its features battered on with a staple gun, rather than included in the design.” Everything is done with a series of functions, most of which have different patterns of behaviour or different parameters depending on what version of PHP you’re running. This makes coding effective sites an absolute nightmare!

My other chief gripe is classes. Despite the recent changes to the whole class handling system in PHP, it’s still rubbish, and most of the time, I can’t use the improvements that PHP 5 brings because no bugger supports it!

To me, when the PHP people get to writing version 6, they need to stop, and quite literally start again. I really mean that – it doesn’t matter if the “newer” PHP code is not backwards compatible – it isn’t very BC at the moment, and so they should take more time, consider their direction, and plan ahead.

I’m thinking of building an adaptive firewall on my Linux router at home.

I’ve noticed that people are scanning the ports on my machine, and running HTTP requests to see if they can trip several known security flaws (e.g. in AwStats).

I did a little reading up on how to build an IPTables based adaptive firewall, and I’m beginning to concoct some ideas in my brain.

Basically, what I want to do, is constantly scan the requests that are made to Apache, and maybe some other server apps, and build some rules to pick out naughty behaviour. Once I’ve done that, I can get the IP address of the offender, and build a list of banned IP addresses. I’ll only want to “ban” (i.e. block at the firewall) those IPs for a set amount of time (e.g. 24 hours), but the response time of the firewall must be quick in order to catch these people in the act, and so I must rebuild my IPTables rules in reasonable time. After the 24 hours is up, I then need to clear any expired IP addresses down again whilst still keeping blocked IPs and my other firewall rules in place.
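The log-scanning half of this could be sketched roughly as follows. This is a minimal sketch under my own assumptions: the patterns are illustrative (an AwStats configdir probe and a generic directory traversal), the function name `offending_ip` is mine, and the field layout assumes Apache’s common log format:

```python
import re

# Apache common log format: the client IP is the first field,
# and the request line sits between the first pair of double quotes.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "([^"]*)"')

# Illustrative "naughty behaviour" patterns, e.g. probes for the
# AwStats configdir remote-command flaw, and directory traversal.
NAUGHTY = [
    re.compile(r'awstats\.pl\?.*configdir=', re.I),
    re.compile(r'\.\./\.\./'),
]

def offending_ip(line):
    """Return the client IP if the request matches a naughty pattern, else None."""
    m = LOG_LINE.match(line)
    if not m:
        return None
    ip, request = m.groups()
    if any(p.search(request) for p in NAUGHTY):
        return ip
    return None
```

Anything this returns would be fed straight into the ban list; ordinary page requests fall through harmlessly.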

I’m therefore thinking that producing a series of scripts based around Cron is not suitable – you can’t schedule a job to run more than once a minute. It might mean that I need to produce some server program (either using a UDP socket, or a UNIX-type pipe) to receive IP addresses as soon as possible, and to store the data for 24 hours.
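The 24-hour ban bookkeeping could look something like this. It is only a sketch under my own assumptions: the `BanList` name and its methods are mine, and the apply/remove callbacks are pluggable so that the real version can shell out to iptables (e.g. append and delete a DROP rule) while tests just record the calls:

```python
import time

BAN_SECONDS = 24 * 60 * 60  # ban duration: 24 hours

class BanList:
    """Tracks banned IPs with expiry times. apply_ban/remove_ban are
    callbacks, so the real thing can run iptables commands while the
    bookkeeping stays testable."""

    def __init__(self, apply_ban, remove_ban, duration=BAN_SECONDS):
        self.apply_ban = apply_ban    # e.g. add a DROP rule for this IP
        self.remove_ban = remove_ban  # e.g. delete that DROP rule again
        self.duration = duration
        self.expiry = {}              # ip -> unix time at which the ban lapses

    def ban(self, ip, now=None):
        now = time.time() if now is None else now
        if ip not in self.expiry:
            self.apply_ban(ip)        # only add one firewall rule per IP
        self.expiry[ip] = now + self.duration  # re-offending extends the ban

    def sweep(self, now=None):
        """Remove any bans that have expired; call this every few seconds."""
        now = time.time() if now is None else now
        for ip in [ip for ip, t in self.expiry.items() if t <= now]:
            self.remove_ban(ip)
            del self.expiry[ip]
```

A daemon built around this would sit on the UDP socket or pipe reading IP addresses from the log-watching scripts, call ban() immediately, and sweep() in a tight loop – which is exactly what gets around Cron’s once-a-minute floor.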

OK, so iWeb was launched the other day, as part of the new iLife ’06 package, and as soon as it was released people started using it and testing the output.

It appears, according to this post, that it produces some pretty nasty output – or rather, that producing the same output as iWeb would not be considered good engineering practice!

However, I think that someone has nailed the situation pretty much on the head in the comments:

“Well, iWeb was built so that you didn’t have to write any code at all, so why care? So what if there is some unnecessary code? It is a brand new application, and it gets the job done without the user having to write a line of code.”

Posted by: Appleologist at January 12, 2006 09:42 PM

My point is this: Yes, it generates some pretty nasty code (although, full credit to Apple, it does parse correctly and conforms to standards), some of which could be cleaned up easily. But, as quoted, the people that will want to use this application are those people who don’t have enough web experience to develop this kind of site on their own, and so don’t care about what it generates – just as long as it looks good on their screen. The site will use up more web space, and more bandwidth, in order to host it, but not that much more.

The people that are commenting on the quality of the code, are going to be those people that know how to produce websites efficiently and correctly, and are people that care about the amount of bandwidth that their company uses, because bandwidth costs money.

Not having used iWeb, I can’t really comment on the use of the application, or its intended market, but from the evidence that I have seen, and my perceived target user base for the app, I would say that it does a pretty good job.

(Note To Apple: Please just tidy up the code a little, and get rid of that damn “Generator” tag – that thing just stinks of “Frontpage”!)

I’ve just come across this website: Stuff On My Cat, which I had initially assumed was some woman’s weak blog about her cat “Honey”, but as it turns out, it’s a bloody hilarious site.

Basically the idea is this: Put stuff on a cat – it can be anything. Take picture. Post to website.

Now, from my experience of putting things on my pet cats over the years, it doesn’t take long for cats to “shrug” these things off, with a mean time of about 5 seconds before they end up on the floor. Some of them must have been sooo hard to take pictures of – I can only admire their dedication.

OK, so I now get quite a lot of comment spam. It’s not exactly a torrent, but none of it ever gets through my 100% effective filtering system (i.e. Me).

However, I did think about automating at least some of the process, and so at the weekend I started recording IP addresses and User-Agent strings, in a futile attempt to at least get some kind of handle on this.

What I’ve seen disturbs me. Effectively, the spammers are “spoofing” their User-Agents, and while it would be possible to exclude them based on their User-Agent string, I could also end up excluding a large proportion of my legitimate audience.

For example, one of the strings is “Mozilla/4.0 (compatible; MSIE 4.01; Windows NT Windows CE)”, but looking up this string here (fifth entry down), you can’t really exclude it outright. I’ve seen some examples of .htaccess files that seem to limit some of these things, but even these get it wrong – in both my quoted examples, you will see that they completely exclude “Maxthon”, which is a legitimate tabbed shell around the Internet Explorer ActiveX control, and you can also see an exclusion for “AtHome021” – an extension added to the end of Internet Explorer’s User-Agent string by the “At Home” ISP.
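To make the false-positive problem concrete, here is a hypothetical sketch. The blocklist and the function name are mine, not taken from any real .htaccess file; it just shows how a naive substring blocklist catches legitimate browsers while letting the spoofed string through:

```python
# Hypothetical blocklist in the spirit of the .htaccess examples above.
NAIVE_BLOCKLIST = ["Maxthon", "AtHome021"]

def is_blocked(user_agent, patterns=NAIVE_BLOCKLIST):
    """Naive .htaccess-style check: block if any pattern appears in the UA."""
    return any(p in user_agent for p in patterns)

# A legitimate visitor running the Maxthon shell around the IE ActiveX control:
legit = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Maxthon)"

# The spoofed string from my comment-spam logs:
spoofed = "Mozilla/4.0 (compatible; MSIE 4.01; Windows NT Windows CE)"
```

is_blocked(legit) comes back True, which is exactly the problem: the real visitor is turned away, while the spoofed spammer string sails straight through.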

I’m thinking about integrating some kind of pattern recogniser to work out the “routes” through my website that legitimate users take, and the routes that spammers take, in order to tell the two apart.

Something has just made me think how much I appreciate the BBC, and the way that it is funded (for those not in the know, it’s funded by an annual Licence Fee, paid by all residences with a television set).

The BBC has an amazing news site at http://news.bbc.co.uk where anyone in the world can access it for free. The BBC are currently in a technology/Internet push, and so they continue to embrace new ideas and new technologies. Because of this, I am able to sign up for as many RSS feeds from their site as I like, which are updated “every minute of every day” and include about 30 top stories per feed – all for free. I don’t need to register, I don’t need to tell them who or where I am; it’s all anonymous.

Compare this to the Spanish newspaper “El Mundo”.

I learnt Spanish to A-Level standard, and the other day I decided that I would like to get some more practice with the language. To do this, I decided the best way would be to read Spanish newspapers. So this morning, I went to the “El Mundo” site (http://www.elmundo.es), and tried to find an RSS feed. After sitting through an advert that I could see no way to skip (N.B. None of this on the BBC site – they don’t need to, thank god), I was confronted with the front page.

I tried to use automatic feed discovery in my RSS reader (the Sage plugin for Firefox), but it could not find any. I hunted high and low for an orange “RSS” button, and checked to see if one had been identified by Firefox (they are usually shown in the address bar, or in the status bar) – none was present. I even viewed the source code to see if they had messed up the “embedding” in the page. Nope, not there.
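For reference, feed autodiscovery just looks for a <link rel="alternate"> element in the page head – roughly what Sage and Firefox were doing on my behalf. A minimal sketch (the class and function names are mine):

```python
from html.parser import HTMLParser

class FeedLinkFinder(HTMLParser):
    """Collects hrefs of <link rel="alternate"> tags with a feed MIME type,
    i.e. the element a site must embed for feed autodiscovery to work."""

    FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if (tag == "link"
                and a.get("rel", "").lower() == "alternate"
                and a.get("type", "").lower() in self.FEED_TYPES
                and "href" in a):
            self.feeds.append(a["href"])

def discover_feeds(html):
    """Return the feed URLs advertised in a page's markup."""
    finder = FeedLinkFinder()
    finder.feed(html)
    return finder.feeds
```

Run over the El Mundo front page, this would come back empty – exactly what Sage reported – whereas a page that supports autodiscovery returns its feed URL.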

Eventually, I noticed that there was a link (not the standard orange button) at the bottom of the page (rather amusingly, it advertised that it uses IPv4 – wow! – and is a member of the W3C – why?). Anyway, it turns out that to view RSS feeds, you need to register (admittedly for free), telling them who and where you are. I needed to provide an email address, so that they could send me a PIN to access the feeds. So I registered, received my PIN and logged in. It appears that I can subscribe to 2 of their 6 or so feeds at any one time. Not only that, but each feed is updated only once every 2 hours for each user, and I only get 6 stories per feed.

That’s absolutely terrible. The whole idea of RSS is that content can be syndicated easily – I shouldn’t have to wait 2 hours to get the latest information on just 6 stories.

I don’t bloody believe it! It looks like they have already reduced it so I can only get five stories at a time!

I need to do something very specific using XHTML and CSS (this also applies to HTML), and I am sure that I am not the only person who has ever wanted to do this. It is also a very simple concept, but it appears absolutely impossible to achieve. Here’s the deal:

I have a table on a page – there is only a class name attribute on the table, it is formatted using <thead>, <th> and <tbody> elements, and it has 2 columns. The table, using CSS, is set to consume 100% of the available width, and each column has a class name that is applied to all the cells in that column. I want the first column to be a fixed width (but with a difference), and I would like the second column to pick up the slack space. The first column’s width should be based on the width of the widest piece of data in that column – as the table resizes (e.g. with a change in window size, or a different screen resolution) this column should not change size.

I am fully aware that I could specify this with a fixed pixel width but this does not solve a problem:

Each browser renders differently (even now!).

Each machine may substitute fonts for those that it has installed.

I have different pieces of text in this column type depending on the web page viewed (and even the status of data on that screen), so I could never know the correct number of pixels.

I am also aware that I could specify a percentage, but again this does not solve the problem:

At very thin resolutions, the cell size would be too small.

At very wide resolutions there could be acres of screen space used by absolutely nothing (which could be better served by not wrapping the possibly large contents in the adjacent cell).

I cannot see a way to do what I want to do, either using standards-compliant techniques (i.e. CSS and XHTML Strict), or even by going off-piste and using techniques that would work in only the most popular browsers.

It seems to me now that, although inroads have been made in recent years towards getting browsers to display content in a relatively similar manner (we are still a long way from that), it is the layout specification itself that is at fault. This is not the first run-in I have had with this sort of problem – back in May, I experienced something else that I could not do from CSS.

If anyone has any clues as to how to achieve this, then please share by posting it here!

BBC Four have just broadcast a very nice interview with Sir Tim Berners-Lee (I think it might have had something to do with this new “World Wide Web” thingy that people keep harking on about!)

It was quite revealing, as I had not had much interest before in his morals or his “drive”. It seemed to focus more on what he thinks of the way that the web has evolved, and whether “he had any sleepless nights over the perverted images available”.

Sir Tim never seemed to defend the other side of the web, though. He mentioned “the ‘Greater Good’” in several instances, and mentioned receiving email from people whose lives the Web had literally saved, but he never defended his position (and his brainchild) using examples such as the Open Source Software movement (I accept that that name is a gross generalisation, but it will pass for now!), or the vast, well-researched information projects such as Wikipedia.

He touched briefly on the Semantic Web, but his example only seemed to promote it as a commercial aggregation tool (the example he gave was for hotel prices, but that kind of site is already available using existing Web technologies), and did not promote it as a vast information source.

More interestingly though, when questioned over any regrets he had about making the Web a commercial entity before releasing it, he gave a very strange and slightly contradictory answer. His reply stated that he did not have sleepless nights thinking that he could have amassed an incredibly large amount of wealth through the technology, but he did state that, after initially developing the technologies, he realised that for the system to reach “a point of critical mass”, “it would have to be royalty free”. It is fair to conclude, then, that monetary benefit was considered but rejected – for me, that would occasionally make me wonder (especially given the scope of what he has achieved) how different my life could have been.

His views for the future were quite interesting, where he envisions the World Wide Web becoming “an assumption” as much as the light bulb or paper, and he expects the Semantic Web to grow further. He would not be drawn into an exact position though, merely stating that “Computer Science is only limited by people’s imagination”.