For my diploma project, I chose to build an “advanced text editor”, something along the lines of an IDE. I’m writing it in Ruby. At this point I have a GUI that provides almost everything I need. One feature I thought would be cool for my IDE to have is automatic language detection: you paste some source code into the editor, and it gets highlighted BEFORE you save the file to disk. For this purpose I created the following class:

It’s still very incomplete (to say the least), but I’ll continue to work on it and improve it. Here is how I envision something like this working: you split the code into tokens (actual tokens, not whitespace-separated chunks as I did here), and you assign each token to a language. Each language has a “score” associated with it, which goes up whenever a token is assigned to that language. When the language detector finishes with the last token, all that remains is to pick the key with the highest score from the score hash. Here is a snippet of how you could use it:

require "language_detector"

language = LanguageDetector.new
language.detect_language("this is a test")
# this will output text
puts language.get_language(language.get_score)

# because I'm tokenizing based on whitespace, I have to put spaces between tokens
# this will change in a future version
language.detect_language("public static void main ( String [] args )")
# this will output java
puts language.get_language(language.get_score)
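
The whitespace limitation mentioned in the comments could probably be lifted with a small regex-based tokenizer. What follows is only a rough sketch of that idea, not the tokenizer I will end up with: the `tokenize` helper and the `TOKEN_PATTERN` constant are names made up for this example, and the pattern only knows about identifiers, integers and single punctuation characters (no string literals, comments or multi-character operators).

```ruby
# Hypothetical helper for illustration; not part of the detector's actual code.
# Matches, in order of preference: an identifier, a run of digits,
# or any single non-whitespace character (punctuation, brackets, ...).
TOKEN_PATTERN = /[A-Za-z_]\w*|\d+|\S/

# Splits source text into tokens without requiring spaces between them.
def tokenize(source)
  source.scan(TOKEN_PATTERN)
end

p tokenize("main(String[] args)")
# prints ["main", "(", "String", "[", "]", "args", ")"]
```

With something like this, "main(String[] args)" and "main ( String [ ] args )" would produce the same tokens, so the input would no longer need the extra spaces.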

I’ll be updating this class really soon to provide better support for more programming languages.
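
To make the scoring idea more concrete, here is a minimal sketch of how token-based scoring could look. It is an illustration under assumptions, not the actual class: `SketchDetector` and its `KEYWORDS` table are invented for this example, the keyword list is tiny and arbitrary, and only the method names are borrowed from the usage snippet above. It still tokenizes on whitespace.

```ruby
# Illustrative sketch only; SketchDetector and KEYWORDS are made-up names.
class SketchDetector
  # Hypothetical table: each known token points at the language it hints at.
  KEYWORDS = {
    "public" => "java", "static" => "java", "void" => "java", "String" => "java",
    "def" => "ruby", "end" => "ruby", "puts" => "ruby", "require" => "ruby"
  }.freeze

  # Splits the source on whitespace and gives one point to the language
  # of every token found in the keyword table.
  def detect_language(source)
    @score = Hash.new(0)
    source.split.each do |token|
      language = KEYWORDS[token]
      @score[language] += 1 if language
    end
    self
  end

  # Returns the score hash built by the last call to detect_language.
  def get_score
    @score
  end

  # Returns the key with the highest score, or "text" if nothing matched.
  def get_language(score)
    return "text" if score.empty?
    score.max_by { |_language, points| points }.first
  end
end

detector = SketchDetector.new
detector.detect_language("this is a test")
puts detector.get_language(detector.get_score)   # prints text
detector.detect_language("public static void main ( String [] args )")
puts detector.get_language(detector.get_score)   # prints java
```

Falling back to "text" when no token matches is a choice made for this sketch; a real detector would likely also need weights, since a word like "end" is far less telling than "public static void".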