Compiled Experience

Guarding against N+1 issues in GraphQL

13 Mar 2019 by Nigel Sampson

The N+1 query problem is a common one to encounter during software development, particularly with ORMs (Object Relational Mappers) and their capabilities around lazy loading. A quick example looks like

varorders=GetOrders();foreach(varorderinorders){Console.WriteLine($"{order.Customer.Name} made an order on {order.CreatedOn}");}

Unless you’re careful, the lazy loading of the Customer of the Order results in a database query, for 100 orders this would result in 101 database calls, one per Customer and one extra for the orders (hence the name N+1).

I won’t go into the ways we can solve this in ORMs as this is a well documented problem with a myriad of approaches to avoid it.

GraphQL and N+1

When we look at a typical GraphQL query we can see the makings of the same problem.

query {
orders {
createdOn
customer {
name
}
}
}

Whether the above query results in N+1 database queries is completely up to the the implementation of the resolvers. Most GraphQL server frameworks support the idea of a DataLoader. A Dataloader allows you to batch the Customer requests into a single query and put the correct Customer into the correct location of the results. For better examples you can view the documentation for Hot Chocolate, a popular .NET GraphQL server framework. This only works when the underlying data source supports a performant batch operation (in these case getting a list of customers based on a list of ids).

When DataLoader won’t work

Sometimes however this batch operation isn’t supported and we’re left with our N+1 performance problem, what can we do?

Let’s assume our Orders and Customers data live in different microservices and that Customers microservice doesn’t support a batch operation for requesting a list of Customers.

In that case a query like the above one wouldn’t be performant, resulting in N requests to the underlying Customer microservice, however a query like the following would only result in a single call.

query {
order("b3JkZXI6NDI=") {
createdOn
customer {
name
}
}
}

So how do we ensure one is valid, but one is invalid? The easiest way would be to define different Order types in our schema, one that has customer field and one doesn’t and never expose the former in a list field. While it works it would cause a long term maintenance problem as we struggle to keep these types in sync and doesn’t create a particularly nice graph.

Instead we’re going to play with the validation rules feature of Hot Chocolate. We’ll create a rule that the customer field can only be included in a query where none of its parent fields are a list. This would ensure the latter query is valid while the former is invalid.

One really nice feature about validation rules are that they’re done during the parsing of the query document. This means the validation result can be cached along with the parsing result. This way if you’re sending constant queries (where only the variables are different) then the validation rule won’t be executed that often.

Creating the Directive

First we need to create a directive, these function in much the same way as C# attributes, a way to attach metadata to our GraphQL schema that other code can introspect and make use of. In this case we’ll define an includeOnce.

Creating the validation rule

The implementation of the validation rule can be quite complex, the code below illustrates the concept but doesn’t cover some of the more complex cases (such as mutations). For a more fully featured example I suggest the MaxComplexityRule and MaxComplexityVisitor from the core repository.

What we need to do is create a QuerySyntaxWalker that supports the traversal of the AST (abstract syntax tree), if you’ve done anything with Roslyn some of this will look familiar.

It can look complicated, but we’re visiting each field within the query and maintaining some context from the parent fields we’ve visited. If we find a field whose definition has the includeOnce directive then we examine the list of parent fields, if any are a list type then we report an error.

publicclassIncludeOnceVisitor:QuerySyntaxWalker<IncludeOnceContext>{protectedoverrideboolVisitFragmentDefinitions=>false;protectedoverridevoidVisitField(FieldNodenode,IncludeOnceContextcontext){varnewContext=context;if(context.TypeContextisIComplexOutputTypetype&&type.Fields.TryGetField(node.Name.Value,outIOutputFieldfieldDefinition)){newContext=newContext.WithField(fieldDefinition);if(fieldDefinition.Type.NamedType()isIComplexOutputTypect){newContext=newContext.SetTypeContext(ct);if(fieldDefinition.Directives.Contains("includeOnce")){varincludedMoreThanOnce=context.FieldPath.Any(f=>f.Type.IsListType());if(includedMoreThanOnce){newContext.ReportError(newValidationError($"The field {node.Name} can only be includes once in a query",node));}}}base.VisitField(node,newContext);}}}publicclassIncludeOnceContext{publicIncludeOnceContext(ISchemaschema,Action<IError>reportError){Schema=schema;FieldPath=newList<IOutputField>();ReportError=reportError;}protectedIncludeOnceContext(ISchemaschema,IEnumerable<IOutputField>fieldPath,Action<IError>reportError){Schema=schema;FieldPath=fieldPath.ToList();ReportError=reportError;}publicAction<IError>ReportError{get;}publicIList<IOutputField>FieldPath{get;}publicISchemaSchema{get;}publicINamedOutputTypeTypeContext{get;protectedset;}publicIncludeOnceContextWithField(IOutputFieldfield){returnnewIncludeOnceContext(Schema,FieldPath.Concat(new[]{field}),ReportError);}publicIncludeOnceContextSetTypeContext(INamedOutputTypetypeContext){returnnewIncludeOnceContext(Schema,FieldPath,ReportError){TypeContext=typeContext};}}

After all this, the validation rule itself is pretty straightforward. Create a visitor, explore the document, then report if there are any errors.