On this blog I regularly publish articles with tips and tricks for the programming language C# and .NET.
C# is a modern, object-oriented programming language that makes full use of the possibilities of the .NET Framework. I also write about Android app development with C#, as well as PHP and Matlab. The difficulty of the posts varies, so I hope there is always something for both beginners and experts.
If you have questions or suggestions, I am happy to receive your emails.

Friday, March 11, 2016

Randomly Browse through the Internet with C#

In this post I want to show how one can surf the Internet with C# by following random links. This is not only exciting in itself (it is pretty interesting to see where one ends up after a couple of links), but also has practical applications: for example, Google's PageRank algorithm for rating the popularity of websites uses a similar model.

The following C# program contains a WebBrowser control and a button. When the user clicks the button, the program searches the current website for a random link and displays the target website in the WebBrowser control.
The code should be relatively self-explanatory: the class Browser handles the search for new links. It stores the current page as well as the source code of the current page. When the method GoNext() is called, it calls FindLink() to find a random link. To do so, it picks a random starting position in the source code of the current page (the source code is obtained via a WebClient) and then takes the first link after that position. I find this more efficient than scanning the whole document first and then choosing a random link. But watch out: this way certain links are preferred, because we no longer sample uniformly from the links themselves. Instead we sample a position in the source text, and since the links are probably not spread evenly over the source code (for example, there is often a large header at the beginning), the selection is biased: a link that follows a long stretch of link-free text is chosen more often.
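The random-offset idea can be sketched as follows. This is only a minimal illustration, not the FindLink() from the full program: the class name RandomLinkPicker and both method names are made up for this example, links are assumed to appear as double-quoted href attributes, and the wrap-around to the start of the page is my own addition for the case that no link follows the chosen position.

```csharp
using System;

public static class RandomLinkPicker
{
    // Returns the target of the first href="..." found at or after a random
    // offset in html. If no link follows the offset, the search wraps around
    // to the start of the page. Returns null if the page has no such link.
    public static string FindLinkFromRandomOffset(string html, Random rng)
    {
        if (string.IsNullOrEmpty(html)) return null;
        int start = rng.Next(html.Length); // uniform position in the source text
        return FindLinkFrom(html, start) ?? FindLinkFrom(html, 0);
    }

    // Scans forward from 'start' for the next href="..." and returns its value,
    // or null if there is none.
    public static string FindLinkFrom(string html, int start)
    {
        const string marker = "href=\"";
        int pos = html.IndexOf(marker, start, StringComparison.OrdinalIgnoreCase);
        if (pos < 0) return null;
        int valueStart = pos + marker.Length;
        int valueEnd = html.IndexOf('"', valueStart);
        if (valueEnd < 0) return null; // unterminated attribute
        return html.Substring(valueStart, valueEnd - valueStart);
    }
}
```

In this sketch the chance of picking a link is proportional to the amount of text between it and the previous link, which is exactly the bias described above.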
When we have found a link, we use the method from the previous post to convert relative links to absolute ones, if necessary, and follow it.
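For readers who do not have the previous post at hand, here is one possible way to do the conversion with the built-in System.Uri class. This is a sketch and not necessarily the method from that post; the names LinkResolver and ToAbsolute are hypothetical, and filtering out everything that is not http or https is an assumption on my part.

```csharp
using System;

public static class LinkResolver
{
    // Resolves 'link' against the page it was found on. Absolute links are
    // returned unchanged (normalized). Returns null if the result is not an
    // http(s) URL, for example a mailto: or javascript: link.
    public static string ToAbsolute(string currentPage, string link)
    {
        if (!Uri.TryCreate(currentPage, UriKind.Absolute, out Uri baseUri)) return null;
        if (!Uri.TryCreate(baseUri, link, out Uri resolved)) return null;
        bool isWeb = resolved.Scheme == Uri.UriSchemeHttp
                  || resolved.Scheme == Uri.UriSchemeHttps;
        return isWeb ? resolved.AbsoluteUri : null;
    }
}
```

A null result is a convenient signal to simply look for another link.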
This program works but is still relatively basic: for example, it can run into dead ends, and as noted above, the link selection is not truly uniform. I am posting it in this form because I think any real application will customize it anyway, and applications differ widely.
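If uniform selection matters for your application, one alternative is to collect all links first and then pick one of them with equal probability. The following is only a sketch of that alternative, not part of the program from this post: UniformLinkPicker, ExtractLinks and PickUniform are hypothetical names, and the simple regular expression only catches double-quoted href attributes.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class UniformLinkPicker
{
    // Matches href="..." and captures the link target in group 1.
    private static readonly Regex HrefPattern =
        new Regex("href\\s*=\\s*\"([^\"]+)\"", RegexOptions.IgnoreCase);

    // Collects the targets of all double-quoted href attributes in html.
    public static List<string> ExtractLinks(string html)
    {
        var links = new List<string>();
        foreach (Match m in HrefPattern.Matches(html))
            links.Add(m.Groups[1].Value);
        return links;
    }

    // Picks each link occurrence with the same probability.
    // Returns null for a dead end (a page without links).
    public static string PickUniform(string html, Random rng)
    {
        List<string> links = ExtractLinks(html);
        return links.Count == 0 ? null : links[rng.Next(links.Count)];
    }
}
```

The null return value also gives the caller a way to detect a dead end, for example to go back to the previous page. The price is that the whole document is scanned on every step.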

So have fun trying this out and leave me interesting link chains in the comments!

public string GoNext()
{
    // randomly go to a new website
    CurrentPage = FindLink(); // for this, find a random link
    CurrentContent = GetText(CurrentPage); // and for the next search, store the source code of the current webpage
    return CurrentPage;
}