Advogato blog for robogato
http://www.advogato.org/person/robogato/
en-us
mod_virgule
Tue, 3 Mar 2015 22:39:29 GMT

Sun, 5 Feb 2012 08:30:24 GMT
5 Feb 2012
http://www.advogato.org/person/robogato/diary.html?start=36
As you probably noticed, we're under attack by spammers again. Heavy account creation and blog spamming wiped out the recentlog. It's partially recovered and should be back to normal in another few hours. Account creation is off for now, so that should prevent further spamming, but the site may be slow due to the heavy traffic generated by the spammers. Looks like a botnet or multiple proxies being used. If anyone's interested in doing a little research on their own, here are a few of the many IPs from which the spam is originating: 173.208.47.67, 218.186.17.251, 190.212.92.132, 99.129.227.221, 86.122.20.133, 61.140.173.221, 67.72.247.233, 176.9.33.251, 110.4.89.20, 122.177.153.205, 66.56.158.67, 72.64.98.16, 79.141.172.14.

Thu, 26 Jan 2012 23:25:13 GMT
26 Jan 2012
http://www.advogato.org/person/robogato/diary.html?start=35
<p><b>More Minor Security Updates</b></p><p>I declared an Advogato hacking day today and got a little more work done on our security ToDo list. I've added a set of cryptographic nonce functions to generate tokens for email verification and CSRF prevention. The tokens have configurable expiration times. The new code replaces the hard-coded token generation used by the original cookie functions.</p><p>I also added a generic email function that can be used for account verification. This replaced the hard-coded part of the password recovery email function.</p><p>I was able to get the CSRF token code integrated with the account creation forms. It's tested and live. Hopefully this will knock out a few more of our automated account spammers including the commercial Incansoft spamming tools. I've still got a little more work to do before I can turn on the email verification but we're nearly there.</p>

Mon, 12 Sep 2011 16:00:12 GMT
12 Sep 2011
http://www.advogato.org/person/robogato/diary.html?start=34
<p><b>Status Update</b></p><p>Advogato has been under a sustained attack from spammers since 11:00 UTC Sunday. The attack is originating from a botnet of at least several hundred nodes with worldwide distribution. The attack is automated and creates 10 to 20 new user accounts with large, spam-filled blog posts every minute. I discovered the attack around two hours after it started and immediately turned off new account creation.</p><p>Mod_virgule buffers the 100 most recent new accounts for display in the "recent people joining" box on the front page. The attackers blew past that number pretty quickly, requiring me to use the web server logs to track down and remove the bad accounts. Removing them left the recent accounts buffer completely empty. It will fill up again once I'm able to turn new account creation back on.</p><p>I spent a while Sunday logging and blocking IPs for individual nodes of the attacking botnet but basically gave up after blocking the first hundred or so. With account creation off, the attackers fail to create accounts and what we're left with is a low-level DDoS attack. The bandwidth being used isn't disabling, and hopefully the attacker will give up once they realize no new accounts are being created.</p><p><b>Other Fun</b></p><p>The switch to the libxml2 HTML parser solved a lot of internal problems but, as some of you have noticed, it introduced a new one. Libxml2 "thinks" in XML and when it comes across a set of HTML tags with no content, such as &lt;em&gt;&lt;/em&gt;, it turns that into a self-closing tag: &lt;em /&gt;. That's great if you're viewing the result with an XML parser, but most browser HTML parsers can't parse certain tags as self-closing and see the tag as an open with no corresponding close. 
This has the effect of including all the subsequent markup on the page inside the offending tag, usually terminating display of the page.</p><p>It looks like only a handful of tags produce this effect, so it should be possible to filter them out. It may be possible to drop empty tag pairs before parsing or convert them back to open/close pairs.</p><p><b><a href="http://www.advogato.org/person/redi/diary/249.html" >Redi</a></b>: in theory yes, but the mod_virgule codebase is a scary mix of HTML 4 (and earlier), XHTML, and XML. Throw in the random markup coming in from syndicated blogs and the resulting tag soup is very difficult to normalize without breaking something. However, incoming blog markup was previously being normalized to XHTML by libxml2 and I'm now thinking we may have to switch that to HTML 4 to force the open/close tags. The function you mention produces different output depending on what markup type is specified on the tree (or on the individual node). So, parse the blog, walk the tree forcing it all to HTML 4, then ask libxml2 to export it. Maybe... I'm doing some work on the code today, so I'll let you know.</p><p><b>Another Update</b>: I've got some code changes in that might (or might not) help with the broken tag problem. We'll have to see if any incoming blog posts break anything over the next day or so. Nothing new on the spam attack; it's still going strong. I'm going to look at implementing a few more security features in the code that might allow us to turn account creation back on without waiting for the attack to subside.</p>

Thu, 2 Jun 2011 23:15:35 GMT
2 Jun 2011
http://www.advogato.org/person/robogato/diary.html?start=33
<p><b>Robogato Returns</b></p><p>We had a bad hardware crash recently and, as I was restoring Advogato to new hardware, I realized that it's been too long since I've devoted any significant time to improving the code around here. I took advantage of the downtime caused by the crash to make some final tweaks to the long-awaited libxml2 based HTML parser and made it live. It fixes a lot of the rendering problems already and will fix more once I make a few more tweaks.</p><p>I'm also working on improving security in general and making account creation by spammers harder in particular. I had a nice email exchange with <a href="http://www.advogato.org/person/dkg/" >dkg</a> about the subject a while back. He took a look at the code and provided a laundry list of things that needed fixing or improving. I'm working on those now. The first change just went live this week: mod_virgule now requires the POST method for submitted forms. This minor change already stopped a couple of our automated account spammers who were creating accounts with GETs. Only the dumbest spammers were doing that, I'd think. Using POST isn't much harder. More changes to come.</p><p>If you're wondering what caused the increase in spam accounts we've been seeing for the last year, here's a possible contributor: Incansoft, apparently a purveyor of web-based spam tools, added an Advogato attack to a spamming tool they sell called Web20Bot (sorry, not going to link to it but you can google it). Web20Bot will create phony account profiles containing your backlink spam on 20 websites including Advogato.org, squidoo.com, wordpress.com, blogger.com, tumblr.com, and livejournal.com. They claim Web20Bot handles email verification and captchas, so working out a defense may be interesting. I doubt any of their spam lasts more than 48 hours around here anyway but it would be nice to make life harder for them. 
(Incidentally, if someone were to come up with a copy of this thing so we could analyze it, that might be cool - maybe we could help other sites being attacked by it too.)</p><p><b>Update:</b> Thanks for <a href="http://advogato.org/person/redi/diary/243.html" >pointing out those issues, Redi</a>. I've fixed the diary edit problem; it should not have been checking for a POST. The &lt;person&gt;, &lt;project&gt;, and &lt;wiki&gt; tags were special cases in the old HTML handler. If one is broken, all three probably are. I'll get on that now. It will take me a little while to track down the problem. &lt;proj&gt; was deprecated in favor of &lt;project&gt; way back in the Raph days but the code checking for &lt;proj&gt; wasn't dropped until this most recent update. I didn't realize anyone still used it. I can add it back in.</p><p><b>Update 2:</b> Ok, found the problem. The old tag handlers output directly to the apache buffer while the new handlers modify the XML tree, which is rendered to the buffer later. I need to modify or replace the handlers for those three tags. I'll try to get to it today if time allows.</p><p><b>Update 3:</b> I think the special tag issue is fixed now; let's try this code for a day or so and see if any problems show up.</p>
&lt;person&gt; test: <person>redi</person><br/><br/>
&lt;proj&gt; test: <proj>mod_virgule</proj><br/><br/>
&lt;project&gt; test: <project>mod_virgule</project><br/><br/>
&lt;wiki&gt; test: <wiki>WikiPedia:Advogato.org</wiki><br/><br/>
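<p>For anyone curious about the difference described in Update 2, here is a minimal sketch of a handler that rewrites the parsed tree and leaves serialization for later, instead of printing straight to the output buffer. It is illustrative Python using the standard library's ElementTree, not the actual mod_virgule C code. The HANDLERS table, the function names, and the link patterns (borrowed from URLs that appear elsewhere on this blog) are all assumptions, and &lt;wiki&gt; is left out because its link target isn't described here.</p>

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

SITE = "http://www.advogato.org"

# Assumed link patterns, copied from URLs that appear elsewhere on this blog.
HANDLERS = {
    "person": SITE + "/person/{}/",
    "project": SITE + "/proj/{}/",
    "proj": SITE + "/proj/{}/",  # deprecated alias for <project>
}


def rewrite_special_tags(root):
    """Replace special tags with ordinary links by editing the tree in place."""
    for elem in root.iter():
        pattern = HANDLERS.get(elem.tag)
        if pattern is None:
            continue
        name = (elem.text or "").strip()
        elem.tag = "a"
        elem.set("href", pattern.format(quote(name)))
    return root


def render(markup):
    """Parse, rewrite the tree, then serialize once at the very end."""
    root = ET.fromstring(markup)
    rewrite_special_tags(root)
    return ET.tostring(root, encoding="unicode", method="html")


print(render("<div><person>redi</person> hacks on <proj>mod_virgule</proj></div>"))
```

<p>Because nothing is written until the final serialization step, a handler like this can't emit output ahead of the rest of the tree, which is the kind of mismatch the old direct-to-buffer handlers ran into.</p>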

Wed, 21 Jan 2009 19:33:28 GMT
21 Jan 2009
http://www.advogato.org/person/robogato/diary.html?start=32
<p><b>Watch for Spammers</b>
<p>If you're wondering about the source of the recent
increase in phony users signing up for Advogato accounts, I
think I've found it. A number of Russian
SEO/spammer blogs are discussing a list of websites that
seem to be highly trusted by Google based on the ratio of
pages in the main Google index to the <a href="http://www.mattcutts.com/blog/indexing-timeline/" >supplemental
Google
index</a>. Advogato is #16 on the list. (I'd provide some
links but giving them links
from Advogato is the last thing we should do. If you're
curious you should be able to find them using a site like
<a href="http://technorati.com/" >Technorati</a> to find
blogs that have linked to Advogato in the
last few weeks.)
<p>A side effect has been a big bandwidth hit. I thought at
first we'd been slashdotted. But the main result is a rash
of SEO spammers signing up for Advogato accounts and trying
to find some way to get backlinks to their link farms and spam
sites. Average survival time for their profiles has been
less than 48 hours so probably nothing to worry about but
everyone should take a look at the "recent people joining"
list and flag anyone who looks like spam. Hopefully it will
die down in a week or two.

Sun, 24 Feb 2008 00:15:50 GMT
24 Feb 2008
http://www.advogato.org/person/robogato/diary.html?start=31
Test post for the libxml2 HTML parser
<p> <p> In theory, the libxml2 HTML parser should make best
guesses on how to fix screwed up, illegal HTML and all tags
should get closed at the end of this diary entry, preventing
problems in diary entries that follow or elsewhere on the page.
<p> <p> <b>bold tag with no close
<p> <p> <i>italics tag with no close
<p> <p> <strike>strike tag with no close</strike></i></b>
<p> Update Jan 2009: after a long downtime, I'm finally working
on the HTML parser again. Should have it live this month!

Thu, 10 Jan 2008 17:36:03 GMT
10 Jan 2008
http://www.advogato.org/person/robogato/diary.html?start=30
<p><b>Advogato Status Report</b>
<p>
My New Year's resolution is to start doing monthly
status reports again! Here's the first one.
<p> Even though I haven't posted a status update in a while,
minor code updates have continued. To find out what's
changed in the live <a href="http://www.advogato.org/proj/mod_virgule/" >mod_virgule
code</a> running Advogato, see the <a href="http://svn.dprg.org/viewvc/mod_virgule/trunk/ChangeLog?view=markup" >changelog</a>.
It's always there and nearly always up to date.
<p> The biggest change has been in the XML file store locking
code. The previous system relied on a site-wide read/write
lock that locked out access to the entire database when
writes were happening. This was getting to be a problem
because of trust recalculations and diary syndication that
happens at the top of the hour. Write locks were often
clogging things up for 10 to 15 minutes per hour.
<p> But it's all good now. All the locking code has been totally
ripped out and replaced with file-level locking. There
should almost never be any detectable site delays caused by
locking now. Besides fixing the hourly slowdowns, this
also gives us a little more breathing room to continue growing.
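<p> To illustrate the idea, here is a minimal sketch of per-file locking. It is Python using POSIX flock (so Unix only), not the mod_virgule C code, and the helper names locked, write_record, and read_record are made up for this example. Readers take a shared lock and writers an exclusive one, but only on the single file they touch.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def locked(path, exclusive):
    """Hold a lock on one file only; every other file stays accessible."""
    flags = (os.O_RDWR | os.O_CREAT) if exclusive else os.O_RDONLY
    f = os.fdopen(os.open(path, flags, 0o644), "r+" if exclusive else "r")
    try:
        # Writers block other users of this file; readers can share it.
        fcntl.flock(f, fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH)
        yield f
    finally:
        f.close()  # closing the descriptor also releases the lock


def write_record(path, text):
    """Replace the contents of one record while holding its write lock."""
    with locked(path, exclusive=True) as f:
        f.seek(0)
        f.truncate()
        f.write(text)


def read_record(path):
    """Read one record while holding a shared lock on it."""
    with locked(path, exclusive=False) as f:
        return f.read()
```

<p> With a scheme like this, a long-running write to one profile holds up only readers of that profile, instead of stalling every request the way a site-wide lock does.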
<p> Another recent change is a patch from <a href="http://www.advogato.org/person/fzort/" >fzort</a> that
improves the HTML parsing code to eliminate undesirable tag
attributes. The long-term plan is still to switch to
libxml2's HTML parser and junk the one in mod_virgule
but, until then, this should make things a little more secure.
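<p> As a rough illustration of attribute filtering (not fzort's actual patch, which is C inside mod_virgule), here is a small Python sketch built on the standard library's html.parser. The ALLOWED_ATTRS whitelist is hypothetical, since the real list of permitted attributes isn't given here. Anything not on the whitelist, such as onclick or style, is dropped when the tag is written back out. The sketch does not inspect attribute values, so checking href schemes would be a separate step.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: tag name -> attributes allowed on that tag.
ALLOWED_ATTRS = {"a": {"href"}, "img": {"src", "alt"}}


class AttrFilter(HTMLParser):
    """Re-emit markup, keeping only whitelisted attributes on each tag."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        allowed = ALLOWED_ATTRS.get(tag, set())
        kept = "".join(
            ' {}="{}"'.format(name, escape(value or "", quote=True))
            for name, value in attrs
            if name in allowed
        )
        self.out.append("<{}{}>".format(tag, kept))

    def handle_startendtag(self, tag, attrs):
        # Emit tags written as <br/> as a single start tag.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        self.out.append("</{}>".format(tag))

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&{};".format(name))

    def handle_charref(self, name):
        self.out.append("&#{};".format(name))


def strip_attributes(markup):
    """Return markup with every non-whitelisted attribute removed."""
    parser = AttrFilter()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


print(strip_attributes('<a href="http://example.org/" onclick="evil()">x</a>'))
```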
<p> A few other fixes and improvements:
<p> The GUID of syndicated blog posts is now preserved when they
go out on the
Advogato diary RSS feed.
<p> Mod_virgule now has built in support for Google Analytics.
Drop your GA ID code into the config.xml and the appropriate
GA markup appears on every page throughout the site.
<p> <a href="http://www.advogato.org/person/presbrey/" >Joe
Presbrey</a> of MIT contributed a patch for an external FOAF
URI on the user profile. This allows you to link your
Advogato FOAF to any other existing FOAF profile you may
have, helping to consolidate your online identity.
<p> The computed trust level for each user is now exported via
FOAF, referencing a local RDF schema that describes the
trust levels. This mechanism was suggested by Sean B. Palmer
and <a href="http://www.advogato.org/person/connolly/" >Dan
Connolly</a> on the W3C <a href="irc://irc.freenode.net/#swig" >#swig IRC channel</a>.

Fri, 31 Aug 2007 23:29:23 GMT
31 Aug 2007
http://www.advogato.org/person/robogato/diary.html?start=29
<p><b>Advogato Status Report</b>
<p>A new rev of <a
href="http://www.advogato.org/proj/mod_virgule/">mod_virgule
code</a> is live on Advogato. See the <a
href="http://svn.dprg.org/viewvc/mod_virgule/trunk/ChangeLog?view=markup">changelog</a>
for the details. Here are a few highlights.
<p>
A discussion between <a
href="http://www.advogato.org/person/ncm/">ncm</a>, <a
href="http://www.advogato.org/person/raph/">raph</a>, and <a
href="http://www.advogato.org/person/chrisd/">chrisd</a>
speculated on why there seemed to be a decline in Google
rankings for individual blog content on Advogato lately. It
was suggested that a change in the Google ranking algorithm
may be placing less value on pages with dynamic URLs like
<a
href="http://www.advogato.org/person/ncm/diary.html?start=191">http://www.advogato.org/person/ncm/diary.html?start=191</a>.
Advogato has long had static URLs for individual articles,
so I've added similar support for each individual blog post.
If you click the permalink marker beside one of your blog
posts, you'll see it now goes to a static URL with just that
one post on the page instead of to a dynamic URL that
includes a range of posts. For example: <a
href="http://www.advogato.org/person/ncm/diary/190.html">http://www.advogato.org/person/ncm/diary/190.html</a>.
The old, dynamic system is still in place so search engines
and existing links will get to the right place, of
course. There's another advantage to having the static URLs
to individual blog entries. These will be used for comment
pages eventually. Yes, blog comments are really coming. I
promise. Some day.
<p>
There's also a fix for a minor foaf:mbox_sha1sum bug that was
noticed by <a href="http://harth.org/andreas/" >Andreas
Harth</a>.
<p>
You may have noticed that our Italian cittaditorino spammers
were back with a vengeance the last couple of weeks. The
community spam flagging system seems to be controlling them.
Most of the bogus accounts are being deleted within a few
days of creation. At ncm's suggestion, I've added
rel="nofollow" attributes to all links to untrusted users
in the recentlog,
recent people joining list, and Advogato People index.
There were already nofollows on all links created by
untrusted users but this new addition should prevent search
engines from even indexing their profile and blog pages.
With all these spam control measures in place, keep in mind
it's a little harder than it used to be for real users
to create an Advogato account and get certified. Well-known
users aren't having much trouble and the new trust injected
by adding <a
href="http://www.advogato.org/person/mako/">mako</a> as a
seed has helped tremendously. But there
are users here and there who haven't collected enough
certs to become trusted, like <a
href="http://www.advogato.org/person/pabs3/">pabs3</a>.
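<p> The nofollow rule described above boils down to one conditional when a profile link is rendered. Here is a minimal Python sketch of it; the profile_link function and its trusted flag are hypothetical names for illustration, and the real check lives in mod_virgule's C code.

```python
from html import escape
from urllib.parse import quote


def profile_link(username, trusted):
    """Render a link to a user profile, adding rel="nofollow" if untrusted."""
    href = "http://www.advogato.org/person/{}/".format(quote(username))
    rel = "" if trusted else ' rel="nofollow"'
    return '<a href="{}"{}>{}</a>'.format(
        escape(href, quote=True), rel, escape(username)
    )


print(profile_link("mako", trusted=True))
print(profile_link("newaccount", trusted=False))
```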
<p>
<p>That's all the news for now but more new features are on
the way.

Wed, 1 Aug 2007 15:46:30 GMT
1 Aug 2007
http://www.advogato.org/person/robogato/diary.html?start=28
The URL rendering bug that <a
href="http://www.advogato.org/person/redi/diary.html?start=102">redi
spotted</a> has been fixed, I think. Looks like it was an
artifact of the Apache APR 1.3 to 2.0 upgrade that had gone
unnoticed for quite a while. If anyone spots any other URL
issues in the project section, let me know.

Mon, 30 Jul 2007 18:29:13 GMT
30 Jul 2007
http://www.advogato.org/person/robogato/diary.html?start=27
<p><b>Advogato Status Report</b>
<p>A new rev of <a
href="http://www.advogato.org/proj/mod_virgule/">mod_virgule
code</a> is live on Advogato. See the <a
href="http://svn.dprg.org/viewvc/mod_virgule/trunk/ChangeLog?view=markup">changelog</a>
for the details.
<p>Aside from the usual minor bugfixes and tweaks, there are
two new features you may have noticed already.
<p><b>New certification indicators:</b> A visual
indication is now added to trust certifications that are
less than
30 days old. This should make it easier to spot new certs on
the user profiles. You can check this out on your own user
profile if you've certified anyone, or been certified by
anyone, in the last 30 days.
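<p> The "new" test is just a date comparison. A minimal Python sketch, assuming each cert carries the date it was issued (the is_new_cert name and the way dates are passed in are assumptions, not the actual mod_virgule code):

```python
from datetime import date, timedelta

NEW_CERT_WINDOW = timedelta(days=30)


def is_new_cert(issued, today):
    """True when a cert is less than 30 days old as of `today`."""
    age = today - issued
    return timedelta(0) <= age < NEW_CERT_WINDOW


print(is_new_cert(date(2007, 7, 10), date(2007, 7, 30)))
```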
<p><b>Article lists:</b> Ever wonder how many Advogato
articles you've posted? Or wanted to read other articles by
a particular poster? Each user profile now includes a
reverse chronological list of the 10 most recent articles
posted by that user. For <a
href="http://www.advogato.org/person/lkcl/">users who are
more prolific</a>, there is a link to a separate page that
includes a <a
href="http://www.advogato.org/person/lkcl/articles.html">complete
listing</a>
of all articles posted by that user.
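<p> The selection logic for the profile list can be sketched in a few lines of Python. The (date, title) tuple layout and the profile_articles name are assumptions made for this example only.

```python
from datetime import date

MAX_ON_PROFILE = 10


def profile_articles(articles):
    """Return (articles to show, whether to link to the complete listing).

    `articles` is a list of (posted_date, title) tuples in any order.
    """
    newest_first = sorted(articles, key=lambda a: a[0], reverse=True)
    return newest_first[:MAX_ON_PROFILE], len(newest_first) > MAX_ON_PROFILE


posts = [(date(2007, 1, day), "Article %d" % day) for day in range(1, 13)]
shown, needs_link = profile_articles(posts)
print(len(shown), needs_link)
```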
<p>In addition to providing a new way to explore Advogato's
articles, this should provide another direct route for
search engine robots to find the static links to the
articles.