Earlier, I had posted an excel sheet that contained information about the ending cut off marks for Tamil Nadu engineering colleges, per branch, per college. This was the greasemonkey script that I used to parse the web site and get information.Firstly, I think that the original site is not very usable; most queries made would be of the form "give me possible colleges ( and possible branches)" for my cut off marks. In the original web site, you had to visit individual colleges and check out the details.The script that I wrote was a five minute hack that scraped the site and came up with the data. IT was a two step scraping process, and I know that I could have written it better. I just wanted to get the job done, and spend as minimum effort writing code as possible.The first job was to get the identification numbers of individual colleges. This included selecting each district and picking up the college link. At line 202, the function addList added the colleges to a list that I was maintaining in a cookie. I did not use GM_setValue as I initially intended to distribute the script as a bookmarklet along with the datasheet.Once the cookie was created, I manually pasted it in the script with the "tnColleges" variable, as indicated in line 13.The second task was to visit individual college pages and pick up data from there. By this time, I was pretty convinced that people would be fine with just the data sheet, so I wrote the departments and the cut off marks into GM_setValue. The script also checked if the page was not loaded; and waited some time till the page loaded with the data correctly. Every time a collge was visited and the data picked up, the page was redirected to the next data. Thus, all the data for every single college was available in the greasemonkey storage.The final job was to publish the datasheet; this is done in line numbers 53 to 103. The function reads data using GM_getValue and writes it to a table. It also classified the departments, and all departments that were not listed were shown on the firebug console. Copying the table to MSExcel gave us the data sheet ready.The entire process was complete in about 15 minutes of coding and 10 minutes of scraping. I hope it did save a lot of time for people looking for this information. I did the scraping at night so as to have minimal impact of the servers.

The Tamil Nadu Engineering results are out and I know a lot of people trying to figure out the colleges that suit their marks. Though not a very accurate method, I have seen people using the previous year's cut off to get an idea of the college that shall fall in their mark range. The TNEA website provides this data, but the interface is not really useful for someone doing analysis.I had visited my wife's home town this weekend and saw her wasting time navigating the bad-to-use site, trying to find the college for her younger brother. I offered to help, and came up with a spreadsheet that had all the details. Simply sorting the website would give an idea of the cutoff range per department, per college, per category (oc, bc, sc, st) etc.You can find the spreadsheet hosted at Google docs here. I also have the MS Excel version that I have hosted in a zipped format here. The greasemonkey script that did this is here.Would be posting the technical details on how I screen scraped (trivial, but worth documenting !! ) the entire site here. Watch out this space for updates, etc.

An appeal : Just thought of putting a personal note here. Can we please do away with the caste system in India ? Why have caste based reservation, why not economic status based ? Should economically weak OC candidates with mediocre marks suffer ?

A couple of days ago, I had posted on an OpenSocial Orkut hack. For the hack, I had hosted the phishing page on Tripod. The problem with Tripod is that it packs the normal page with an unreasonable number of advertisements. Not only popups, it also puts in advertisements right into the page. Tripod inserts a whole lot of HTML for the advertisements including scripts and divs. Though we cannot really stop the scripts, we definitely can prevent them from executing so that our page shows up without the ads.Adding a few of lines of script at the end of the page effectively hides the divs that show the ads.

The tb_container is the div with the ads at the top. At the bottom, the ads are generated using inline javascript. That is removed using the second line that executes when the script is loaded. The popups can also be disabled by initializing the AdManager to null;Sometimes however, the initial ads as show as they are embedded right into the page; I am still looking at ways to remove that also.Just insert this script, and see the page as want it to look like - without those bugging ads.

This is a follow on post for these posts.I know that the implementation was unnecessary, buy I stumbled upon another application that has a script injection, and I seized the opportunity to show the monstrous proportions this simple problem can grow to. Also, this time, since I am in IST, I could also show off the hack to friends :)The application under the scanner this time is TooStep Biz, an application that claims to deal with virtual business cards. The injection was simple and typing the target of the iFrame inserted the frame. The target has a frame killer, and redirected the page to a phished google login page. An unsuspecting user may typically enter the credentials on this page, specially if the URL is funkified.The page also exploits a cross site request forgery on iRead. Whenever a user adds a book to the bookshelf, a simple URL is fetched using a HTTP GET. The URL has all the parameters required to add a book or change its status. Opening it in an iFrame in the phished page would simple add the book. This can be extended to exploit any applications that have CSRF.

Note: The application may correct this error soon, so take a look as soon as you can !! Here is the link

I had earlier written about my attempts to break into OpenSocial on Orkut. The hack basically exploited script injection into an application called emote. The hack has since been fixed, but I thought it would make sense to write about the potential problems that such hacks in application can have.The applications open in iFrames from a different source and hence, script injection cannot really steal orkut cookies. However, since they definitely are embedded in the page, people could exploit it for phishing Google accounts. All that the iFrame is required to do is call a kill frames script and redirect the user to an account page resembling the Orkut login page. If the user is tricked into believing that Orkut did indeed log him out, credentials may be entered, making the phisher collect "GOOGLE CREDENTIALS". These could in turn be used to log into gmail, docs, etc.Another possible attack is that of a cross site request forgery. Many other opensocial application are not guarded against this. Inserting an iFrame would be a classic way to sent request to those other applications for tasks that a user may never intend. This could typically include placing a bet on a wrong team in those betting applications to deleting books from a bookshelf application. I am currently working on demonstrating this hack and need to find an injection in some other application to make requests. Watch this space for updates.