Friday, April 4, 2014

I used to have a goal: to build a website that collects all the premium accounts available across the internet. Unfortunately, I didn't have enough knowledge or skill to code my own bot. Thanks to Berkeley, I've learned enough to build my own crawler and web bot. Read the description on the website and you'll know what it does :) I don't need to explain. Please remember it's [18+] only, and read the Terms of Use before using it.

The idea of the website is to have my bots crawl the available resources on the internet, analyze the data and make use of it. Using a Markov decision process (CS188 - Artificial Intelligence), the bot learns a belief about which websites tend to have, meaning have a better chance of containing, the combination (username/password). The bots crawl and find all websites whose links relate to a list of desired websites or terms, excluding links that contain bit.ly, bc.cv, adf.ly, etc., or any other kind of advertisement link. Yes, no bullshit: no click links and no surveys needed.
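The link-exclusion step could look roughly like the sketch below. It is only an illustration: the names `AD_HOSTS`, `is_ad_link` and `drop_ad_links` are made up for this example, and the host list contains just the three hosts named above.

```python
from urllib.parse import urlparse

# Only the hosts named in the post; a real list would likely be longer (assumption).
AD_HOSTS = {"bit.ly", "bc.cv", "adf.ly"}


def is_ad_link(url):
    """Return True if the URL's host is one of AD_HOSTS or a subdomain of one."""
    if "//" not in url:
        # urlparse only recognizes a host after "//", so add it for bare links.
        url = "//" + url
    host = (urlparse(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in AD_HOSTS)


def drop_ad_links(urls):
    """Keep only the links that do not point at an ad/shortener host."""
    return [u for u in urls if not is_ad_link(u)]
```

Matching on the parsed host rather than on the raw string means a link such as `https://example.com/bit.ly` is kept, since only its path mentions the shortener.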

Algorithms / Technologies used:
- Flask, as a web framework.
- Python, as a main language.
- jQuery, as part of Front-End design.
- Markov Decision Process, main algorithm to search.
- Quicksort/Bubblesort, to sort the account lists.
- User-Agent-Generator, to prevent being blocked by some websites (https://github.com/nguyenph88/User-Agent-Generator)
- Proxy/SOCKS, to prevent being blocked by some websites (removed because it was too time-consuming)

Challenges that I had to solve:
- Decoding and analyzing the received data, as not all websites present the combination in the same format.
- The app is written in Python, so Heroku is pretty much the free platform I can use (hopefully, I still have $200 in credits).
- Since the crawling process takes time and I'm limited to 1 dyno (like a virtual box on Heroku), I have to stop the bots at some point to avoid exceeding my limit. Also, the longer it crawls the more results it collects, and we don't want that, since most of the results are garbage.
- So how do I know which data is garbage? Well, I tested the results from some of the websites myself, and hard-coded which websites provide good data and which provide garbage.
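
Stopping the bots "at some point" can be expressed as a crawl loop with a time and page budget. The sketch below is an assumption about how that might be done, not the code running on the site: `crawl_with_budget`, its parameters and the caller-supplied `fetch_links` function are all hypothetical names.

```python
import time
from collections import deque


def crawl_with_budget(seeds, fetch_links, max_seconds=60.0, max_pages=200,
                      clock=time.monotonic):
    """Breadth-first crawl that stops once the time or page budget runs out.

    fetch_links(url) is a hypothetical callable returning the links found on url.
    """
    deadline = clock() + max_seconds
    start = list(dict.fromkeys(seeds))  # de-duplicate seeds, keep their order
    queue = deque(start)
    seen = set(start)
    visited = []
    while queue and len(visited) < max_pages and clock() < deadline:
        url = queue.popleft()
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:  # never queue the same page twice
                seen.add(link)
                queue.append(link)
    return visited
```

Checking the budget before each page keeps a single dyno from running indefinitely, and capping the page count also limits how much low-quality data piles up.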

Limitations:
- Since I'm making this app as part of my "childhood goal", it's for FRIENDS ONLY; accounts are created manually.
- I made it to apply my knowledge, not to take advantage of or harm anything; the provided info is already on the internet.
- I don't intend to earn any money from this, so I'll just keep it that way. Plus, the more people join, the more dynos I need, and it will fry my bots (they get stuck or return NULL data).
- The code is not complete, so the data is not fully utilized.
- Some websites have security measures; trying to log in to the same accounts repeatedly will get those accounts blocked.

I also made this for educational purposes only, so I hope you enjoy it, as my friend :) PM me for the username/password.