Blog Stats

Tag: .netcore

For all the REST resources we deploy out every day, one of the most common scenario a developer ends up handling is designing a nice and effective search/filter/query endpoint for these aforementioned resources. Usually in these scenarios conventions play a big part and that starts from query filter parameters in the query string to using a full blown search engine like ElasticSearch, Lucene or Algolia. Vinay Sahni did a phenomenal job for putting out the gist for basic REST practices here. If you are new into REST and want to get yourself started talking REST, it’s worth a look.

Today, we will do a simple yet fun experiment. Our goal today is to create a simple nifty looking query language for a sample REST resource. We will try to mimic a couple of features Twitter Search Api provides and our sample data storage today will be Azure Cosmos Db: Document Db which previously was known as just Azure Document db. Recently it moved to the Azure Cosmos Db family and if you have missed it, here is the full details.

Therefore, the first thing we need is a set of sample data we can use as our REST resource. Since we are going to mimic the twitter public search api, we definitely need some sample data first. And of course github comes to the rescue. Download the sample data and put it to your own document db instance. If you don’t know how to create a document db instance, follow the quick start here. I’m referencing the link to the .net quick start, but there are node.js, Python, Java and Xamarin resources right by the side of it. For those of you who do not want to test your work against an Azure hosted document db and want to test Azure document db locally, you can always opt for the azure document db emulator.

Now, for the sake of this experiment I ported the github twitter sample data to an azure document db instance. It has 10 entries in that small data set and hosted here. If you want to browse the data, azure portal does allow it. But I suggest looking at this open source app here named Azure DocumentDB Studio. You can use the endpoint and the key written in the sample code in github to connect. The database is public just for you guys so you can test the sample code I have hosted in github and will list out in the end of this article. TL;DR people, this is your cue to go to the end of the article, but I highly suggest to stick around.

Features to be built:

Out of all the features twitters search api provides, I will port the to:UserAccount, from:UserAccount and “exact text” search capabilities. That means, We will be able to search tweets sent from one account to other and we will also be able to search tweets by mentioning a string we like to be present in the tweets. For an example if we want to find tweets

If you have an eye for detail, you will notice that this is not exactly like the twitter search api since doesn’t use an and operator. But for the sake of the simplicity in this example this should be enough.

Tools we are going to use:

Our api stack will be written in asp.net core. We will use ANTLR as our query language lexer-parser. If you need an ANTLR primer and another experiment I did with it, please have a look here where I tried to gobble up a simple scripting language. If you are not really accustomed to any of .net stacks, fret not. All of these are totally doable in any other tech stack you will possibly prefer to use. Since ANTLR has targets for multiple languages, api can be built in basically any language and even for our sample storage solution azure document db here, you have access to multiple client SDKs.

The data:

As I mentioned above, we have a little data set of 10 entries. I’m posting a sample entry here so the rest of the tutorial makes sense. One entry points to a single tweet made by a sample user.

It’s a small and simple grammar like the one we used in our scripting language tutorial. Since our REST resource query is essentially an expression, our entry rule will be expr. The railroad diagram for expr looks like:

Rule expr

This means we can have a single search term or multiple search term chained by a op rule. The op rule is nothing but the two relational operator we support: AND and OR.

Rule op

Along side of the op rule we also need to know what the term rule stands for. The term rule stands for the search term format we are allowed to use. In our sample query stated above, we have terms like from:terminator, to:robocop, “I’m back” or a hashtag like #HastaLaVista. That’s why we have 4 rules defining all of these cases and the term rule is a OR relationship between them.

rule term

I’m not going to post the railroad diagram for toText, fromText, hashText and exactText rules since they are pretty self-explanatory if you have a cursory look at the grammar.

So, what are we waiting for? Let’s start writing our little codebase that will parse this query string and translate it to an azure document db SQL that we can use to fetch the tweets. For that, we need a small repository that will connect to our desired database and collection in document db and will let us fetch some items. I only added methods that will allow us to read the tweets and connect to the database. I ignored the rest since you can always have a look at those in the quick start for azure document db.

Here’s our small rough database repository. Please remember to keep your endpoint and key strings somewhere secret and safe in production environment, since this is a tutorial, I went with what is easy and fast to go for a proof of concept.

I opted for executing a SQL instead of a linq expression since constructing SQL is easier and simpler for a tutorial. Plus it decouples the query structure from compile time POCOs that we use as our models too.

I created a DocumentDbListener class based off the SearchBaseListener which was auto-generated from our ANTLR grammar. The sole purpose of this class is to generate a simple SQL against our search expression. To search inside nested arrays, I used a user defined function for azure document db. All of these are very crudely written, so forgive my indecency. Since this is just a tutorial, I tried to keep it as simple as possible.

All this method does is it tries to find nested array elements based on the match we send back. You can achieve the same result thorough JOINs in Azure Document Db or Array method ARRAY_CONTAINS, but I preferred a user defined function since it serves my purpose easily.

Constructing SQL from the query expression:

To understand how the SQL is generated from the query expression, let’s begin with the to:UserAccount expression. Since we start with the rule expr, let’s override the SearchBaseListener method EnterExpr first.

The approach I took here is essentially the simplest. I handle the events fired the moment ANTLR enters a specific rule and I keep appending the SQL string to the whereClause. Since, entering the expr rule means that I will need a where SQL clause, I initialized it with “WHERE”. The thing to notice here is instead of concatenating I chose to initialize it because I expect this event to be fired exactly once since that is how the grammar is designed.

Following the same trail the next thing to handle will be the EnterTerm event. But, term is nothing but an OR relationship between 4 other rules. Handling them specifically gives me the edge since they produce simpler and smaller readable methods. For example, if we want to handle the to:UserAccount expression, a simple method like following should be sufficient for our use case.

We got our listener ready! Now, all we need is a context class that will bootstrap the lexer and parser and tokens so the input expression is transpiled and the output SQL is generated. Just like our last work on ANTLR, the TweetQueryContext class will look like the following:

I reused the db repository we created earlier and as you see that is dependency injected in the controller which you have to configure in the ConfigureServices method in your Startup class. I’m not adding that specific code here since it is already in the sample code and doesn’t belong to the scope of this tutorial. Same goes for the model class Tweet and the classes it uses inside.

Time to test!

The project is hosted here in github. Clone the code, and build and run it from your visual studio. As a sample query try the following:

http://localhost:5000/api/tweets/search?q=to:hatena_sugoi AND from:maeta_tw OR %23HopesUp OR "Surely June is a summer"

I url-encoded the hashtag here just to be nice on the REST client you might use. I highly suggest Postman if you don’t want anything heavy.

I only took the minimalists way of using ANTLR here, you can build your own expression tree based on the auto-generated listener and can do so much more if you want. A perfect example will be LinqToQuerystring written by Pete Smith. It generates the necessary LINQ expression for any IQueryable and thus allows you to write database agnostic Odata driven queries and it’s tons faster and lighter than the one Microsoft ships.

For everyone of us who wakes and codes everyday somewhere in this world, we often find ourselves pretty attached to the programming language we love most. All of us feels that point of idiosyncrasy when we try to break our boundaries and try to learn something new since this field is always expanding and evolving like a universe on steroids.

For those who still are newbies and is trying to find ways to understand how a programming language works, what could be better than trying to make one of your own? 🙂 This post is essentially a small proof of concept where we will go for a small nice fun looking esoteric programming language.

Before we jump into the very tidbits inside, allow me to explain what we will do here. We will write a small programming language that is parsed and lexed by ANTLR, transpiles to C# and the transpiled code is fed into Roslyn C# script api.

If the aforementioned words are looking too wordy for you, let me clear it right up for you. Like every language we use to talk every day, programming language comes with a grammar itself. It should be clear as a daylight to you if you have written a single line of code in your life. Every programming language follows a pretty defined structure and a dancing parade of words following that. So, since we are creating one, the first thing we need is that grammar for our language. Instead of doing it from scratch, our tool of choice is ANTLR, which stands for “Another Tool for Language Recognition”. ANTLR will help us to define the grammar, generate the lexer and parser for it and we will eventually be able to reuse those components to transpile our code to C#. Remember ANTLR do support for javascript, java, python too. So, if you want to have your lexer and parser defined in those languages, please don’t refrain yourself from using this.

Now, even before we start talking about languages and everything else, we need to understand a bit about compiler theory. Now, I can go into gory details of lexer and parser but since this is an age of google, I will let that responsibility fall in your hand. We will take somewhat of an exploratory way to understand how everything works and take it from there.

The first thing we need is a grammar. To understand the programming language we need something that understands all the words here. And that guy is the lexer. A lexer eats up the whole code block we send to the compiler, chops that into little lexemes (read strings/words here). So we can understand when we have hit a specific keyword. Like for our dummy language here, we used the popular internet lingo derp to initialize a variable.

The next thing in line to do is making some meaningful excerpt from the words. Like when we write a if block, the compiler needs to know what strings of characters or words should stand together to construct a meaningful statement or expression. That is the job of a parser. If you see closely, you will see that every block of code can be expressed as a tree. For example, the initialization block here looks like the following:

This is definitely a single branch tree, it says you need to write the word derp to start the statement and you should end it with a smiley ( 🙂 ) to finish the statement. In the middle you use an identifier (name for your variable) and an ASSIGN operator (=) to define the flow of assignment from right to left. That tree is actually generated straight from the grammar we are going to use today.

So, why wait? Let’s have a look at the grammar we are going to be using today:

Now, my first advice here will be to not get confused by the size of the grammar and start taking out bits we do understand at first. If you refer to the first image in this and have a look at the grammar you will see that the rule assignstmt is defined the exact same way depicted in the picture. If you find the words ID, ASSIGN and SMILEY, you will also see that all of them are defined in the grammar below where they have their literal form. These tokenshelps the parser to understand what you have written. And the assignstmt is called a rule, which define relations among different tokens or rules. See? this is how we build rules for our grammar. Because a grammar is nothing but a set of well defined rules.

By the very first rule named compilationUnit,our language is nothing but a set of statements ending with an EOF. Every statement can either be a printstmt (for printing), ifstmt (if-else) or the aforementioned assignstmt. Neat huh? You can even go down and find out how the rest of the tree is built.

I strongly suggest using Visual Studio Code and its ANTLR4 extension for writing ANTLR grammar files. You will have nice looking railroad diagram like the one I posted for every rule you write!

Time to get our hands dirty. Fret not since the whole project is publicly available over github so I will only point out excerpts of the code that we need to visit. And I’m going back to the same assignstmt. First thing we need to do is download ANTLR . Go to the download page and download the latest .jar. ANTLR is written in Java so you will need to have java installed in your machine. We will use .net core here for the rest of the work so you can download that for your OS too. When you are done downloading ANTLR you can generate the target code for C# from your grammar file. In our case the grammar file name was Profane.g4 and to generate the C# lexer and parser based off the code all we had to do is invoke :

java -jar antlr-4.7-complete.jar -Dlanguage=CSharp Profane.g4

It will generate the C# classes you need for walking through the grammar. You will see a lexer and a listener class being created. You don’t really need to touch the lexer class. The listener class is used to handle events that are fired when ANTLR encounters a rule. So, to make use of our assignment statement we need to handle the event that is fired when ANTLR encounters our assignstmt rule. But how will we ever understand which event is that? Here comes the ANTLR magic.

ANTLR will generate the event you need for all your rules every time you create a lexer and parser from your grammar. That means every time you change your grammar you need to regenerate the C# targets again, the lexer and listener class. I added the command in the build target for the sample project so it will automatically do it on build.

So, lets create a class inherited from ProfaneBaseListener class which was auto generated by ANTLR and name it ProfaneListener. Since our rule was named assignstmt ANTLR will generate a method called EnterAssignstmt inside it which will be invoked every time ANTLR encounters that type of statement or that rule. Our target here is to generate equivalent C# code from it so

derp some = 10 ':)'

will be transpiled in C# equivalent code like the following.

dynamic some = 10;

And we are going to do that in the method EnterAssignstmt which ANTLR has built out for us. Let’s have a look at it.

This is really simple work here. We are asking the context class to give us the ID and the expression so we can generate the equivalent C# code string. The nice thing you can notice here is ANTLR has generated class for our assignment statement context itself. But what is that ResolveExpression method. If we remember the rule it was :

Rule assignstmt

Here expr is another rule. Which looks like.

Rule expr

That means, expr can either be a rule named term or a rule named opExpression. Let’s have a look at both of them.

First is the rule called term. A term can be an identifier (the name for your variable), a number (int or float) or a string. Either one of these.

Rule term

That means that term can also be a valid value for rule expr. Since expr is either term or opExpression

The opExpression on the other hand is a tad complex one. This is an example where you can reuse multiple rules to create a complex rule.

The only thing that we don’t know here is the op rule. the opExpression rule says it is structured as term followed by a rule named op and there has to be another term in the end.

Rule op

The op rule defines an OR relationship between PLUS, MINUS and ASSIGN which stands respectively for ‘+’, ‘-‘ and ‘=’. That means this rule says we can write things like

derp some = 10 + 2 + someOtherDerp 🙂

Amazing! Isn’t it?

Before we look into the ResolveExpression method, let’s see a nice looking tree on how that previous statement actually gets parsed by all these rules.

This will hopefully help you to understand how things are happening here. Now, lets see the ResolveExpression method which transpiles the rule expr.

We also see two new methods the ResolveExpression method uses named ResolveTerm and ResolveOpExpression. They generate the transpiled C# code for term rule and opExpression rule. We keep adding the generated code in a string named output and when it’s done we have our C# transpiled code ready to be executed.

Executing C# as a script using Roslyn:

Now that we have our C# code to be executed, we will use another tool called Roslyn. It is a compiler tool for .net that gives you rich set of features regarding code analysis and compilation. We will specifically be using the C# scripting api.

First we need to generate the AST for our code. If you are scratching head on what an abstract syntax tree is, you already saw something like it in the tree we posted before. To generate the tree, you need your lexer. Your lexer will generate tokens. Your parser will use those tokens to start the execution unit you need to compile your code which is the root node for your tree. This is still done using ANTLR.

When you have your tree, you need to walk on it. When you walk on the AST, the listener fires the events we need to generate the transpiled code like we did minutes ago. So the listener output is our C# code which we need to feed to roslyn.

Roslyn C# script engine comes with a nifty class named CSharpScript which will do the trick for you. All we have to do is to feed the C# code and load the assemblies we will need. Then, it will do its magic and we will have our own scripting language talking to us.

I uploaded the full sample code in github here. You need to build and run the Profane project which is a console app. It will host a small web api in port 5000. If you POST your code to the endpoint as plain text in the POST body, you will get back the output of your code. Postman can be a nice client to do so.

Hope this was fun. Knowing the internals of your daily programming language essentially boosts up the confidence while you write it. So make your own esoteric language if you have time. It’s always fun to make em.