How Ripper Parses Variables (2018-05-21)

Ruby has a few different kinds of variables, and Ripper expresses them with a few different nodes.

:vcall

```ruby
# a
[:vcall, [:@ident, "a", [1, 0]]]
```

A :vcall is a bareword: either a local variable lookup or a method call on self. Used alone, this can only be determined at runtime, depending on the binding: if there’s a local variable with that name, it will be used. My guess is that :vcall is short for “variable/call”.

Interestingly, there is a single-expression case which could be disambiguated statically, but Ripper still uses :vcall:

```ruby
# a b
[:command,
 [:@ident, "a", [1, 0]],
 [:args_add_block,
  [:args_add, [:args_new], [:vcall, [:@ident, "b", [1, 2]]]],
  false]]
```

:var_ref

:var_ref (presumably “variable reference”) is shared by many of these examples, and can always be resolved to a variable lookup, never a method call.
Its argument tells what kind of lookup to do (global, constant, instance, class), and what name to look up.
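For example, here's a quick sketch in IRB; each result is the first statement node from Ripper.sexp:

```ruby
require "ripper"

# Each kind of variable gets a different inner node under :var_ref:
Ripper.sexp("$x")[1][0] # => [:var_ref, [:@gvar, "$x", [1, 0]]]
Ripper.sexp("@x")[1][0] # => [:var_ref, [:@ivar, "@x", [1, 0]]]
Ripper.sexp("X")[1][0]  # => [:var_ref, [:@const, "X", [1, 0]]]
```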

Method calls

Some Ruby can be statically known to be a method call, not a variable lookup:

In these cases, :fcall, :call and :command are used to represent definite method sends.
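Here's a sketch comparing the node types Ripper assigns (the node_type helper is mine, not Ripper's):

```ruby
require "ripper"

# Node type of the first statement in the parse tree
def node_type(src)
  Ripper.sexp(src)[1][0][0]
end

node_type("a")   # => :vcall          (bareword: variable or call, unknown until runtime)
node_type("a b") # => :command        (a bareword with an argument is a definite call)
node_type("a.b") # => :call           (an explicit receiver is a definite call)
node_type("a()") # => :method_add_arg (wraps an :fcall -- parentheses make a definite call)
```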

Interestingly, :var_ref is used for self, too.
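A quick check:

```ruby
require "ripper"

# `self` parses as a :var_ref wrapping a keyword token:
Ripper.sexp("self")[1][0] # => [:var_ref, [:@kw, "self", [1, 0]]]
```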

Updating GitHub to GraphQL 1.8.0 (2018-04-09)

GraphQL 1.8.0 was designed and built largely as a part of my work at GitHub. Besides designing the new Schema definition API, I migrated our codebase to use it. Here are some field notes from my migration.

If you want to know more about the motivations behind this work, check out this previous post.

Below, I’ll cover:

The Process: in general, how I went about migrating our code

The Upgrader: how to run it and roughly how it’s organized

Custom Transforms: extensions I made for the upgrader to work on GitHub-specific code

Fixes By Hand: bits of code that needed more work (some of these could be automated, but aren’t yet!)

Porting Relay Types: using the class-based API for connections and edges

Migrating DSL extensions: how to support custom GraphQL extensions in the new API

The Process

GitHub’s type definitions are separated into folders by type, for example: objects/, unions/, enums/ (and mutations/). I worked through them one folder at a time. The objects/ folder was big, so I did it twenty or thirty files at a time.

I had to do interfaces/ last because of the nature of the new class-based schema: interface modules’ methods can’t be added to legacy-style GraphQL object types. So, by doing interfaces last, I didn’t have to worry about this compatibility issue.

Now that I remember it, I did the schema first, and by hand. It was a pretty easy upgrade.

When I started each section, I created a base class by hand. (There is some automated support for this, but I didn’t use it.) Then, I ran the upgrader on some files and tried to run the test suite. There were usually two kinds of errors:

Parse- or load-time errors which prevented the app from booting

Runtime errors which resulted in unexpected behavior or raised errors

More on these errors below.

After upgrading a section of the schema, I opened a PR for review from the team. This was crucial: since I was working at such a large scale, it was easy for me to miss the trees for the forest. My teammates caught a lot of things during the process!

After a review, the PR would be merged into master. Since GraphQL 1.8.0 supports incremental migration, I could work through the code in chunks without a long-running branch or feature flags.

About the Upgrader

Here’s an overview of how the upgrader works. After reading the overview, if you want some specific examples, check out the source code.

Running The Upgrader

The gem includes an auto-upgrader, spearheaded by the folks at HackerOne and refined during my use of it. It’s encapsulated in a class, GraphQL::Upgrader::Member.

To use the upgrader, I added a Ruby script to the code base called graphql-update.rb:

```ruby
# Usage:
# ruby graphql-update.rb path/to/type_definition.rb
#
# Example:
# # Upgrade `BlameRange`
# ruby graphql-update.rb lib/platform/objects/blame_range.rb
#
# # Upgrade based on a pattern (use quotes)
# ruby graphql-update.rb "lib/platform/objects/blob_*.rb"
#
# # Upgrade one more file in this pattern (use quotes)
# ruby graphql-update.rb 1 "lib/platform/objects/**.rb"

# Load the upgrader from local code, for easier trial-and-error development:
# require "~/code/graphql-ruby/lib/graphql/upgrader/member"
# Load the upgrader from the gem:
require "graphql/upgrader/member"

# Accept two arguments: next_files (optional), file_pattern (required)
file_pattern = ARGV[0]
if file_pattern =~ /\d+/
  next_files = file_pattern.to_i
  next_files_pattern = ARGV[1]
  puts "Upgrading #{next_files} more files in #{next_files_pattern}"
  filenames = Dir.glob(next_files_pattern)
else
  filenames = Dir.glob(file_pattern)
  next_files = nil
  puts "Upgrading #{filenames.join(", ")}"
end

# ...
# Lots of custom rules here, see below
# ...
CUSTOM_TRANSFORMS = {
  type_transforms: type_transforms,
  field_transforms: field_transforms,
  clean_up_transforms: clean_up_transforms,
  skip: CustomSkip,
}

upgraded = []
filenames.each do |filename|
  puts "Begin (#{filename})"
  # Read the file into a string
  original_text = File.read(filename)
  # Create an Upgrader with the set of custom transforms
  upgrader = GraphQL::Upgrader::Member.new(original_text, **CUSTOM_TRANSFORMS)
  # Generate updated text
  transformed_text = upgrader.upgrade
  if transformed_text == original_text
    # No upgrade was performed
  else
    # If the upgrade was successful, update the source file
    File.write(filename, transformed_text)
    upgraded << filename
  end
  puts "Done (#{filename})"
  if next_files && upgraded.size >= next_files
    # We've upgraded as many as we said we would
    break
  end
end
puts "Upgraded #{upgraded.size} files: \n#{upgraded.join("\n")}"
```

This script has two basic parts:

Using GraphQL::Upgrader::Member with a set of custom transformations

Supporting code: accepting input, counting files, logging, etc.

In your own script, you can write whatever supporting code you want. The key part from GraphQL-Ruby is:

```ruby
# Create an Upgrader with the set of custom transforms
upgrader = GraphQL::Upgrader::Member.new(original_text, **CUSTOM_TRANSFORMS)
# Generate updated text
transformed_text = upgrader.upgrade
```

The Pipeline

The upgrader is structured as a pipeline: each step accepts a big string of input and returns a big string of output. Sometimes, a step does nothing and so its returned string is the same as the input string. In general, the transforms consist of two steps:

Check whether the transform applies to the given input

If it does, copy the string and apply a find-and-replace to it (sometimes using RegExp, other times using the excellent parser gem.)

You have a few options for customizing the transformation pipeline:

Write new transforms and add them to the pipeline

Remove transforms from the pipeline

Re-use the built-in transforms, but give them different parameters, then replace the built-in one with your custom instance

(The “pipeline” is just an array of instances or subclasses of GraphQL::Upgrader::Transform.)

type_transforms are run first, on the entire file. field_transforms are run second, and they receive only parts of the type definition: calls to field, connection, return_field, input_field, and argument. Fine-grained changes to field definitions or argument definitions go here.

clean_up_transforms are run last, on the entire file. For example, there’s a built-in RemoveExcessWhitespaceTransform which cleans up trailing spaces after other transforms have run.

skip: has a special function: its #skip?(input) method is called and if it returns true, the text is not transformed at all. This allows the transformer to be idempotent: by default, if you run it on the same file over and over, it will update the file only once.

Custom Transforms

Here are some custom transforms applied to our codebase.

Handle a custom type-definition DSL

We had a wrapper around ObjectType.define which attached metadata, linking the object type to a specific Rails model. The helper was called define_active_record_type, and I wanted to convert its calls into class-based definitions.

Also, for this to work, I added the def self.model_name(name) helper to the base class.
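The helper's code didn't survive in this post; a minimal sketch might look like this (Repository is just an illustrative type name, and the real base class inherits from GraphQL::Schema::Object):

```ruby
module Platform
  module Objects
    class Base # < GraphQL::Schema::Object, in the real code base
      # Store the linked Rails model's name as class-level metadata
      def self.model_name(name)
        @model_name = name
      end
    end
  end
end

class Platform::Objects::Repository < Platform::Objects::Base
  model_name "Repository"
end
```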

Renaming a Custom Field Method

We have a helper for adding URL fields called define_url_field. I decided to rename this to url_fields, since these days it creates two fields.

The arguments are the same, so it was a simple substitution:

```ruby
class UrlFieldTransform < GraphQL::Upgrader::Transform
  def apply(input_text)
    # Capture the leading whitespace and the rest of the line,
    # then insert the new name where the old name used to be
    input_text.gsub(/^( +)define_url_field( |\()/, "\\1url_fields\\2")
  end
end
```

This transform didn’t interact with any other transforms, so I added it to clean_up_transforms, so it would run last:

```ruby
# Make a copy of the built-in array
clean_up_transforms = GraphQL::Upgrader::Member::DEFAULT_CLEAN_UP_TRANSFORMS.dup
# Add my custom transform to the end of the array
clean_up_transforms.push(UrlFieldTransform)
```

Moving DSL methods to keywords

We have a few DSL methods that, at the time, were easier to implement as keyword arguments. (Since then, the API has changed a bit. You can implement DSL methods on your fields by extending GraphQL::Schema::Field and setting that class as field_class on your base Object, Interface and Mutation classes.)

I wanted to transform:

```ruby
field :secretStuff, types.String do
  visibility :secret
end
```

To:

```ruby
field :secretStuff, types.String, visibility: :secret
```

(Later, a built-in upgrader would change secretStuff to secret_stuff and types.String to String, null: true.)

To accomplish this, I reused a built-in transform, ConfigurationToKwargTransform, adding it to field_transforms:

```ruby
# Make a copy of the built-in list of defaults
field_transforms = GraphQL::Upgrader::Member::DEFAULT_FIELD_TRANSFORMS.dup
# Put my custom transform at the beginning of the list
field_transforms.unshift(GraphQL::Upgrader::ConfigurationToKwargTransform.new(kwarg: "visibility"))
```

In fact, there were several configuration methods moved this way.

Custom Skip

As I was working through the code, some files were tougher than others, so I decided to skip them and come back later. A magic comment:

```ruby
# @skip-auto-upgrade
```

would cause a file to be skipped. To implement this, I made a custom skip class:
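The class itself was omitted above; a minimal version might look like this (checking only for the magic comment):

```ruby
class CustomSkip
  # The upgrader calls `#skip?(input)`; returning true leaves the file untouched
  def skip?(input_text)
    input_text.include?("# @skip-auto-upgrade")
  end
end
```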

And passed it as skip: to the upgrader. Then, later, I removed the comment and tried again. (Fortunately, my procrastination paid off because the upgrader was improved in the meantime!)

Fixes by Hand

As I worked, I improved the upgrader to cover as many cases as I could, but there are still a few cases that I had to upgrade by hand. I’ll list them here. If you’re really dragged down by them, consider opening an issue on GraphQL-Ruby to talk about fixing them. I’m sure they can be fixed, I just didn’t get to it!

If you want to fix one of these issues, try to replicate the issue by adding to an example spec/fixtures/upgrader and then getting a failing test. Then, you could update the upgrader code to fix that broken test.

Accessing Arguments By Method

Arguments could be accessed by method to avoid typos. However, now, since arguments are a Ruby keyword hash, they don’t have methods corresponding to their keys.

Unfortunately, the upgrader doesn’t do anything about this; it just leaves those calls in place, and you get a NoMethodError on Hash at runtime.

This could almost certainly be fixed by improving the find-and-replace in ResolveProcToMethodTransform.

Adding Execution Errors

(If you don’t have to return a value, use raise instead; then you can stop reading this part!)

The problem is that @context is not a field-specific context anymore. Instead, it’s the query-level context. (This is a downside of the new API: we don’t have a great way to pass in the field context anymore.)

To address this kind of issue, field accepts a keyword called extras:, which takes an array of symbols. In the case above, we could use :execution_errors:

So, execution_errors was injected into the field as a keyword. It is field-level, so adding errors there works as before.
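A definition using it might look like this (a sketch; the field and method names are invented for illustration):

```ruby
field :save_stuff, Types::SaveStuffResult, null: true, extras: [:execution_errors]

def save_stuff(execution_errors:)
  # `execution_errors` is injected because of `extras: [:execution_errors]`;
  # errors added here are attached to this field, as before
  if something_invalid?
    execution_errors.add("Couldn't save the stuff")
    nil
  else
    # ...
  end
end
```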

Other extras are :irep_node, :parent, :ast_node, and :arguments. It’s a bit of a hack, but we need something for this!

Accessing Connection Arguments

By default, connection arguments (like first, after, last, before) are not passed to the Ruby methods for implementing fields. This is because they’re generally used by the automagical (😖) connection wrappers, not the resolve functions.

But, sometimes you just need those old arguments!

If you use extras: [:arguments], the legacy-style arguments will be injected as a keyword:
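For example (a sketch; the names are invented):

```ruby
field :comments, Types::CommentType.connection_type, null: true, extras: [:arguments]

def comments(arguments:)
  # `arguments` is the legacy-style arguments hash,
  # including `first` / `after` / `last` / `before`
  cached_comments(page_size: arguments[:first])
end
```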

Fancy String Descriptions

The upgrader does fine when the description is a "..." or '...' string. But in other cases, it was a bit wacky.

Strings built up with + or \ always broke. I had to go back by hand and join them into one string.

Heredoc strings often worked, but only by chance. For example:

```ruby
field :stuff, types.Int do
  description <<~MD
    Here's the stuff
  MD
end
```

Would be transformed to:

```ruby
field :stuff, Integer, description: <<~MD, null: true
  Here's the stuff
MD
```

This is valid Ruby, but a bit tricky. This could definitely be improved: since I started my project, GraphQL 1.8 was extended to support description as a method as well as a keyword. So, the upgrader could be improved to leave descriptions in place if they’re fancy strings.

Removed Comments From the Start of Resolve Proc

I hacked around with the parser gem to transform resolve procs into instance methods, but there’s a bug. A proc like this:

```ruby
resolve ->(obj, args, ctx) {
  # Do stuff
  obj.do_stuff { stuff }
}
```

Will be transformed to:

```ruby
def stuff
  @object.do_stuff { stuff }
end
```

Did you see how the comment was removed? I think I’ve somehow wrongly detected the start of the proc body, so that the comment was left out.

In my case, I re-added those comments by hand. But it could probably be fixed in GraphQL::Upgrader::ResolveProcToMethodTransform.

Hash Reformatting?

I’m not sure why, but sometimes a hash of arguments like:

```ruby
obj.do_stuff(
  a: 1,
  b: 2,
  c: 3,
  d: 4,
)
```

would be reorganized to

```ruby
obj.do_stuff(
  a: 1, b: 2,
  c: 3, d: 4,
)
```

I have no idea why, and I didn’t look into it; I just fixed it by hand.

Issues with Connection DSL

We have a DSL for making connections, like:

```ruby
Connections.define(Objects::Issue)
```

Sometimes, when this connection was inside a proc, it would be wrongly transformed to:

```ruby
field :issues, Connections.define(Objects::Issue)},, null: true
```

This was invalid Ruby, so the app wouldn’t boot, and I would fix it by hand.

Porting Relay Types

Generating connection and edge types with the .connection_type/.define_connection and .edge_type/.define_edge methods will work fine with the new API, but if you want to migrate them to classes, you can do it.

```ruby
module Platform
  module Connections
    class Base < Platform::Objects::Base
      # For some reason, these are needed, they call through to the underlying connection wrapper.
      extend Forwardable
      def_delegators :@object, :cursor_from_node, :parent

      # When this class is extended, add the default connection behaviors.
      # This adds a new `graphql_name` and description, and searches
      # for a corresponding edge type.
      # See `.edge_type` for how the fields are added.
      def self.inherited(child_class)
        # We have a convention that connection classes _don't_ end in `Connection`, which
        # is a bit confusing and results in naming conflicts.
        # To avoid a GraphQL conflict, override `graphql_name` to end in `Connection`.
        type_name = child_class.name.split("::").last
        child_class.graphql_name("#{type_name}Connection")

        # Use `require_dependency` so that the types will be loaded, if they exist.
        # Otherwise, `const_get` may reach a top-level constant (eg, `::Issue` model instead of `Platform::Objects::Issue`).
        # That behavior is removed in Ruby 2.5, then we can remove these require_dependency calls too.
        begin
          # Look for a custom edge whose name matches this connection's name
          require_dependency "lib/platform/edges/#{type_name.underscore}"
          wrapped_edge_class = Platform::Edges.const_get(type_name)
          wrapped_node_class = wrapped_edge_class.fields["node"].type
        rescue LoadError => err
          # If the custom edge file doesn't exist, look for an object
          begin
            require_dependency "lib/platform/objects/#{type_name.underscore}"
            wrapped_node_class = Platform::Objects.const_get(type_name)
            wrapped_edge_class = wrapped_node_class.edge_type
          rescue LoadError => err
            # Assume that `edge_type` will be called later
          end
        end

        # If a default could be found using constant lookups, generate the fields for it.
        if wrapped_edge_class
          if wrapped_edge_class.is_a?(GraphQL::ObjectType) ||
              (wrapped_edge_class.is_a?(Class) && wrapped_edge_class < Platform::Edges::Base)
            child_class.edge_type(wrapped_edge_class, node_type: wrapped_node_class)
          else
            raise TypeError, "Missed edge type lookup, didn't find a type definition: #{type_name.inspect} => #{wrapped_edge_class.inspect}"
          end
        end
      end

      # Configure this connection to return `edges` and `nodes` based on `edge_type_class`.
      #
      # This method will use the inputs to create:
      # - `edges` field
      # - `nodes` field
      # - description
      #
      # It's called when you subclass this base connection, trying to use the
      # class name to set defaults. You can call it again in the class definition
      # to override the default (or provide a value, if the default lookup failed).
      def self.edge_type(edge_type_class, edge_class: GraphQL::Relay::Edge, node_type: nil)
        # Add the edges field, can be overridden later
        field :edges, [edge_type_class, null: true], null: true,
          description: "A list of edges.",
          method: :edge_nodes,
          edge_class: edge_class

        # Try to figure out what the node type is, if it wasn't provided:
        if node_type.nil?
          if edge_type_class.is_a?(Class)
            node_type = edge_type_class.fields["node"].type
          elsif edge_type_class.is_a?(GraphQL::ObjectType)
            # This was created with `.edge_type`
            node_type = Platform::Objects.const_get(edge_type_class.name.sub("Edge", ""))
          else
            raise ArgumentError, "Can't get node type from edge type: #{edge_type_class}"
          end
        end
        # If it's a non-null type, remove the wrapper
        if node_type.respond_to?(:of_type)
          node_type = node_type.of_type
        end
        # Make the `nodes` shortcut field, which can be overridden later
        field :nodes, [node_type, null: true], null: true,
          description: "A list of nodes."
        # Make a nice description
        description("The connection type for #{node_type.graphql_name}.")
      end

      field :page_info, GraphQL::Relay::PageInfo, null: false,
        description: "Information to aid in pagination."

      # By default this calls through to the ConnectionWrapper's edge nodes method,
      # but sometimes you need to override it to support the `nodes` field
      def nodes
        @object.edge_nodes
      end
    end
  end
end
```

Base edge class:

```ruby
module Platform
  module Edges
    class Base < Platform::Objects::Base
      # A description which is inherited and may be overridden
      description "An edge in a connection."

      def self.inherited(child_class)
        # We have a convention that edge classes _don't_ end in `Edge`,
        # which is a little bit confusing, and would result in a naming conflict by default.
        # Avoid the naming conflict by overriding `graphql_name` to include `Edge`
        wrapped_type_name = child_class.name.split("::").last
        child_class.graphql_name("#{wrapped_type_name}Edge")
        # Add a default `node` field, assuming the object type name matches.
        # If it doesn't match, you can override this in subclasses
        child_class.field :node, "Platform::Objects::#{wrapped_type_name}", null: true,
          description: "The item at the end of the edge."
      end

      # A cursor field which is inherited
      field :cursor, String, null: false,
        description: "A cursor for use in pagination."
    end
  end
end
```

Migrating DSL Extensions

We have several extensions to the GraphQL-Ruby .define DSL, for example, visibility controls who can see certain types and fields and scopes maps OAuth scopes to GraphQL types.

The difficulty in porting extensions comes from the implementation details of the new API. For now, definition classes are factories for legacy-style type instances. Each class has a .to_graphql method which is called once to return a legacy-style definition. To maintain compatibility, you have to either:

Modify the derived legacy-style definition to reflect configurations on the class-based definition; OR

Update your runtime code to stop checking for configurations on the legacy-style definition and start checking for configurations on the class-based definition.

Eventually, legacy-style definitions will be phased out of GraphQL-Ruby, but for now, they both exist in this way in order to maintain backwards compatibility and gradual adoptability.

In the meantime, you can go between class-based and legacy-style definitions using .graphql_definition and .metadata[:type_class], for example:

```ruby
class BaseObject < GraphQL::Schema::Object
  # Add a configuration method
  def self.visibility(level)
    @visibility = level
  end

  # Re-apply the configuration
  def self.to_graphql
    type_defn = super
    # Call through to the old extension:
    type_defn = type_defn.redefine(visibility: @visibility)
    # Return the redefined type:
    type_defn
  end
end

# Then, use it in type definitions:
class Post < BaseObject
  visibility(:secret)
end
```

The Hard Way: .metadata[:type_class]

An approach I haven’t tried yet, but I will soon, is to move the “source of truth” to the class-based definition. The challenge here is that class-based definitions are not really used during validation and execution, so how can you reach configuration values on those classes?

The answer is that if a legacy-style type was derived from a class, that class is stored as metadata[:type_class]. For example:

```ruby
class Project < BaseObject
  # ...
end
legacy_defn = Project.graphql_definition # Instance of GraphQL::ObjectType, just like `.define`
legacy_defn.metadata[:type_class]        # `Project` class from above
```

So, you could update runtime code to read configurations from type_defn.metadata[:type_class].

Importantly, metadata[:type_class] will be nil if the type wasn’t derived from a class, so this approach is tough to use if some definitions are still using the .define API.

I haven’t implemented this yet, but I will be doing it in the next few weeks so we can simplify our extensions and improve boot time.

The End

I’m still wrapping up some loose ends in the codebase, but I thought I’d share these notes in case they help you in your upgrade. If you run into trouble on anything mentioned here, please open an issue on GraphQL-Ruby! I really want to support a smooth transition to this new API.

Why a New Schema Definition API (2018-03-25)

GraphQL-Ruby 1.8.0 will have a new class-based API for defining your schema. Let’s investigate the design choices in the new API.

The new API is backwards-compatible and can coexist with type definitions in the old format. See the docs for details. 1.8.0.pre versions are available on RubyGems now and are very stable – that’s what we’re running at GitHub!

Problems Worth Fixing

Since starting at GitHub last May, I’ve entered into the experience of a huge-scale GraphQL system. Huge scale in lots of ways: huge schema, huge volume, and huge developer base. One of the problems that stood out to me (and to lots of us) was that GraphQL-Ruby simply didn’t help us be productive. Elements of schema definition hindered us rather than helped us.

So, our team set out on remaking the GraphQL-Ruby schema definition API. We wanted to address a few specific issues:

Familiarity. GraphQL-Ruby’s schema definition API reflected GraphQL and JavaScript more than it reflected Ruby. (The JavaScript influence comes from graphql-js, the reference implementation.) Ruby developers couldn’t bring their usual practices into schema development; instead, they had to learn a bunch of new APIs and figure out how to work them together.

Rails Compatibility, especially constant loading. A good API would work seamlessly with Rails development configurations, but the current API has some gotchas regarding circular dependencies and reloading.

Hackability. Library code is fine until it isn’t, and one of the best (and worst) things about Ruby is that all code is open to extension (or monkey-patching 🙈). At best, this means that library users can customize the library code in straightforward ways to better suit their use cases. However, GraphQL-Ruby didn’t support this well: to support special use cases, customizations had to be hacked in in odd ways that were hard to maintain and prone to breaking during gem updates.

Besides all that, we needed a safe transition, so it had to support a gradual adoption.

After trying a few different possibilities, the team decided to take a class-based approach to defining GraphQL schemas. I’m really thankful for their support in the design process, and I’m indebted to the folks at Shopify, who used a class-based schema definition system from the start (as a layer on top of GraphQL-Ruby) and presented their work early on.

The new API, from 10,000 feet

In short, GraphQL types used to be singleton instances, built with a block-based API:

```ruby
Types::Post = GraphQL::ObjectType.define {
  # ...
}
```

Now, GraphQL types are classes, with a DSL implemented as class methods:
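The code sample didn't survive here, but a minimal class-based definition looks roughly like this (the Post fields are illustrative):

```ruby
class Types::Post < GraphQL::Schema::Object
  description "A blog post"
  # Nullability is expressed with `null:` instead of `!`
  field :id, ID, null: false
  field :author, Types::User, null: true

  # Field resolution is just an instance method
  def author
    object.author
  end
end
```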

More Familiarity

First, using classes reduces the “WTF” factor of GraphQL definition code. A seasoned Ruby developer might (rightly) smell foul play and reject GraphQL-Ruby on principle. (I was not seasoned enough to detect this when I designed the API!)

Proc literals are rare in Ruby, but common in GraphQL-Ruby’s .define { ... } API. Their lexical scoping rules are different than method scoping rules, making it hard to remember what was and wasn’t in scope during field resolution (for example, what was self?). To make matters worse, some of the blocks in the .define API were instance_eval’d, so their self would be overridden. Practically, this meant that typos in development resulted in strange NoMethodErrors.

Proc literals also have performance downsides: they’re not optimized by CRuby, so they’re slower than method calls. Since they capture a lexical scope, they may also have unexpected impacts on memory footprint (any local variable may be retained, since it might be accessed by the proc). The solutions here are simple: just use methods, the way Ruby wants you to! 😬

In the new class-based API, there are no proc literals (although they’re supported for compatibility’s sake). There are some instance_eval’d blocks (field(...) { }, for example), but field resolution is just an instance method and the type definition is a normal class, so module scoping works normally. (Contrast that with the constant assignment in Types::Post = GraphQL::ObjectType.define { ... }, where no module scope is used). Several hooks that were previously specified as procs are now class methods, such as resolve_type and coerce_input (for scalars).

Overriding ! is another particular no-no I’m correcting. At the time, I thought, “what a cool way to bring a GraphQL concept into Ruby!” This is because GraphQL non-null types are expressed with !:

```graphql
# This field always returns a User, never `null`
author: User!
```

So, why not express the concept with Ruby’s ! method (which is usually used for negation)?

```ruby
field :author, !User
```

As it turns out, there are several good reasons for why not!

Overriding ! breaks the negation operator. ActiveSupport’s .present? didn’t work with type objects, because ! didn’t return false, it returned a non-null type.

Overriding the ! operator throws people off. When a newcomer sees GraphQL-Ruby sample code, they have a WTF moment, followed by the dreadful memory (or discovery) that Ruby allows you to override !.

There’s very little value in importing GraphQL concepts into Ruby. GraphQL-Ruby developers are generally seasoned Ruby developers who are just learning GraphQL, so they don’t gain anything by the similarity to GraphQL.

So, overriding ! didn’t deliver any value, but it did present a roadblock to developers and break some really essential code.

In the new API, nullability is expressed with the options null: and required: instead of with !. (But, you can re-activate that override for compatibility while you transition to the new API.)

By switching to Ruby’s happy path of classes and methods, we can help Ruby developers feel more at home in GraphQL definitions. Additionally, we avoid some unfamiliar gotchas of procs and clear a path for removing the ! override.

Rails Compatibility

Rails’ automatic constant loading is wonderful … until it’s not! GraphQL-Ruby didn’t play well with Rails’ constant loading, especially when it came to cyclical dependencies, and here’s why.

Imagine a typical .define-style type definition, like this:

```ruby
Types::T = GraphQL::ObjectType.define { ... }
```

We’re assigning the constant Types::T to the return value of .define { ... }. Consequently, the constant is not defined until .define returns.

If T1 depends on T2, and T2 depends on T1, how can this work? (For example, imagine a Post type whose author field returns a User, and a User type whose posts field returns a list of Posts. This kind of cyclical dependency is common!) GraphQL-Ruby’s solution was to adopt a JavaScriptism, a thunk. (Technically, I guess it’s a functional programming-ism, but I got it from graphql-js.) A thunk is an anonymous function used to defer the resolution of a value. For example, if we have code like this:
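The example didn't survive here, but a .define-style version of that circular dependency looks roughly like this, with thunks deferring the constant lookups:

```ruby
Types::Post = GraphQL::ObjectType.define do
  name "Post"
  # The thunk (`->`) defers the lookup of `Types::User`,
  # which may not be assigned yet:
  field :author, -> { Types::User }
end

Types::User = GraphQL::ObjectType.define do
  name "User"
  field :posts, -> { types[Types::Post] }
end
```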

Notice that Post depends on User, and User depends on Post. The difference is how these lines are evaluated, and when the constants become defined. Here’s the same code, with numbering to indicate the order that lines are evaluated:
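The numbered sample is also missing, but the idea can be sketched like this (assuming Rails-style constant autoloading): with classes, the constant exists as soon as the `class` keyword is evaluated, before its body runs:

```ruby
class Types::Post < GraphQL::Schema::Object # (1) the constant `Types::Post` now exists
  field :author, Types::User, null: true    # (2) referencing `Types::User` triggers its load ...
end                                         # (5)

class Types::User < GraphQL::Schema::Object # (3) ... which evaluates this definition
  field :posts, [Types::Post], null: true   # (4) `Types::Post` was already defined at (1)
end
```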

Since Types::Post is initialized first, then built-up by the following lines of code, it’s available to Types::User in the case of a circular dependency. As a result, the thunk is not necessary.

This approach isn’t a silver bullet – Types::Post is not fully initialized by the time Types::User needs it – but it reduces visual friction and generally plays nice with Rails out of the box.

Hackability

I’ve used a naughty word here, but in fact, I’m talking about something very good. Have you ever been stuck with some dependency that didn’t quite fit your application? (Or, maybe you were stuck on an old version, or your app needed a new feature that wasn’t quite supported by the library.) Like it or not, sometimes the only way forward in a case like that is to hack it: reopen classes, redefine methods, mess with the inheritance chain, etc. Yes, those choices come with maintenance downsides, but sometimes they’re really the best way forward.

On the other hand, really flexible libraries are ready for you to come and extend them. For example, they might provide base classes for you to extend, with the assumption that you’ll override and implement certain methods. In that case, the same hacking techniques listed above have found their time to shine.

ActiveRecord::Base is a great example of both cases: plenty of libraries hack methods right into the built-in class (for example, acts_as_{whatever}), and also, lots of Rails apps use an ApplicationRecord class for their application-specific customizations.

Since GraphQL-Ruby didn’t use the familiar arrangement of classes and methods, it was closed to this kind of extension. (Ok, you could do it, but it was a lot of work! And who wants to do that!?) In place of this, GraphQL-Ruby had yet-another-API for extending its DSL. Yet another thing to learn, with more Proc literals 😪.

Using classes simplifies this process because you can use familiar Ruby techniques to build your GraphQL schema. For example, if you want to share code between field resolvers, you can include a module and call its methods. If you want to make shorthands for common cases in your app, you can use your Base type classes. If you want to add special configuration to your types, you can use class methods. And, whenever that day should come, when you need to monkey-patch GraphQL-Ruby internals, I hope you’ll be able to find the right spot to do it!

Stay Classy

GraphQL-Ruby is three years old now, and I’ve learned a LOT during that time! I’m really thankful for the opportunity to focus on developer productivity in the last few months, learning how I’ve prevented it and working on ways to improve it. I hope to keep working on topics like this – how to make GraphQL more productive for Ruby developers – in the next year, especially, so if you have feedback on this new API, please open an issue to share it!

I’m excited to see how this new API changes the way people think about GraphQL in Ruby, and I hope it will foster more creativity and stability.

]]>http://rmosolgo.github.io/blog/2018/03/25/why-a-new-schema-definition-api/2017-10-06T09:00:00-04:00http://rmosolgo.github.io/blog/2017/10/06/ruby-type-checking-roundup

This fall, several people presented their work on Ruby type checkers. So let’s take a look: what’s the big deal, and what have they been up to?

Why Type Check?

Part of Ruby’s appeal is to be free of the cruft of its predecessors. So why is there so much interest in adding types to Ruby?

Large, sprawling projects are becoming more common. At Ruby’s inception, there were no 10-year-old Rails apps which people struggled to maintain, only greenfield Ruby scripts for toy projects.

Programmers have experienced excellent type systems in other languages, and want those benefits in Ruby.

Optional, gradual type systems have been introduced to Python and JavaScript and they’re big successes.

What are the benefits?

Correctness: Type checking, like testing, is a way to be confident that your codebase is functioning properly. Employing a type checker can help you find bugs during development and prevent those bugs from going to production.

Confidence: Since an incorrect program won’t pass type checking, developers can refactor with more confidence. Common errors such as typos and argument errors can be caught by the type checker.

Design: The type system gives you a way to think about the program. Specifically, types document and define the boundaries between parts of code, like methods, classes and modules.

To experience a great type system in a Ruby-like language, I recommend Crystal.

In this approach, type information is gathered while the program runs, but the type check is deferred until the method is called. At that point, RDL checks the source code (static information) using the runtime data (dynamic information). For this reason, the technique is called “just-in-time static type checking.”

Valentin Fondaratov, RubyKaigi 2017

Valentin works at JetBrains (creators of RubyMine) and presented his work on type-checking based on runtime data. His presentation, Automated Type Contracts Generation for Ruby, was really fascinating and offered a promising glimpse of what a Ruby type ecosystem could be.

RubyMine uses this to support autocomplete, error prediction, and rename refactorings.

He also pointed out that even code coverage is not enough: 100% code coverage does not guarantee that all possible codepaths were run. For example, any composition of if branches requires a cross-product of codepaths, not only that each line is executed once. Besides that, code coverage does not analyze the coverage of your dependencies’ code (ie, RubyGems).
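A tiny example of that cross-product: two independent branches give four paths, but two tests can cover every line.

```ruby
# Two tests give 100% line coverage here, but exercise only
# 2 of the 4 possible paths through the two branches:
def pick(a, b)
  x = a ? 1 : 2
  y = b ? 3 : 4
  x + y
end

pick(true, true)   # => 4  (both "true" lines run)
pick(false, false) # => 6  (both "false" lines run)
# pick(true, false) and pick(false, true) never ran,
# even though every line has been covered.
```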

So, Valentin suggests getting more from our unit tests: what if we observed the running program, and kept notes about what values were passed around and how they were used? In this arrangement, that runtime data could be accumulated, then used for type checking.

Impressively, he introduced the implementation of this, first using a TracePoint, then digging into the Ruby VM to get even more granular data.

However, the gathered data can be very complicated. For example, how can we understand the input type of String#split?
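For example, String#split accepts several argument shapes, so runtime observations record many different signatures for a single method:

```ruby
# One method, many input shapes — a type checker inferring from
# runtime data sees all of these for String#split:
"a,b,c".split(",")    # => ["a", "b", "c"]  String pattern
"a,b,c".split(/,/)    # => ["a", "b", "c"]  Regexp pattern
"a b c".split         # => ["a", "b", "c"]  no argument: whitespace
"a,b,c".split(",", 2) # => ["a", "b,c"]     pattern plus Integer limit
```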

Summary

There’s a lot of technically-savvy and academically-informed work on type checking Ruby! Many of the techniques preserve Ruby’s productivity and dynamism while improving the developer experience and confidence. What makes them unique is their use of runtime data, to observe the program in action, then make assertions about the source code.

Webpacker support

Webpacker was great to work with. react-rails now supports webpacker for:

Mounting components with <%= react_component(...) %> via require

Server rendering from a webpacker pack (server_rendering.js)

Loading the unobtrusive JavaScript (UJS)

Installation and component generators

A nice advantage of using webpacker is that you can load React.js from NPM instead of the react-rails gem. This way, you aren’t bound to the React.js version which is included with the Ruby gem. You can pick any version you want!

UJS on npm

To support frontends built with Node.js, react-rails’s UJS driver is available on NPM as react_ujs. It performs setup during require, so these two are equal:

// Sprockets:
//= require react_ujs

// Node, etc:
require("react_ujs")

Request-based prerender context

If you’re prerendering your React components on the server, you can perform setup and teardown in your Rails controller. For example, you might use these hooks to populate a flux store.

Other Takeaways

See the changelog for bug fixes and a new default server rendering configuration.

Webpacker is great! Setup was smooth and the APIs were clear and convenient. I’m looking forward to using it more.

🍻 Here’s to another major version of react-rails!

]]>http://rmosolgo.github.io/blog/2017/04/13/whats-new-in-react-rails-2-dot-0/2017-04-12T14:09:00-04:00http://rmosolgo.github.io/blog/2017/04/12/watching-files-during-rails-development

You can tell Ruby on Rails to respond to changes in certain files during development.

Rails knows to watch config/routes.rb for changes and reload them when the files change. You can use the same mechanism to watch other files and take action when they change.

All Together Now

react-rails maintains a pool of V8 instances for server rendering React components. These instances are initialized with a bunch of JavaScript code, and whenever a developer changes a JavaScript file, we need to reload them with the new code. This requires two steps:

Adding a new watcher to app.reloaders to detect changes to JavaScript files

]]>http://rmosolgo.github.io/blog/2017/04/12/watching-files-during-rails-development/2017-03-17T15:49:00-04:00http://rmosolgo.github.io/blog/2017/03/17/prototyping-a-graphql-schema-from-definition-with-ruby

GraphQL 1.5.0 includes a new way to define a schema: from a GraphQL definition.

In fact, loading a schema this way has been supported for a while, but 1.5.0 adds the ability to specify field resolution behavior.

GraphQL IDL

Besides queries, GraphQL has an interface definition language (IDL) for expressing a schema’s structure. For example:

But you can also reduce a lot of boilerplate by using a hash with default values:

# This hash will fall back to default implementation if another value isn't provided:
type_hash = Hash.new do |h, type_name|
  # Each type gets a hash of fields:
  h[type_name] = Hash.new do |h2, field_name|
    # Default resolve behavior is `obj.public_send(field_name, args, ctx)`
    h2[field_name] = ->(obj, args, ctx) { obj.public_send(field_name, args, ctx) }
  end
end

type_hash["Query"]["post"] = ->(obj, args, ctx) { Post.find(args[:id]) }

schema = GraphQL::Schema.from_definition(schema_defn, default_resolve: type_hash)

Isn’t that a nice way to set up a simple schema?

Resolving with a Single Function

You can provide a single callable that responds to #call(type, field, obj, args, ctx). What a mouthful!

The advantage of that hefty method signature is that it’s enough to specify any resolution behavior you can imagine. For example, you could create a system where type modules were found by name, then methods were called by name:

module ExecuteGraphQLByConvention
  module_function

  # Find a Ruby module corresponding to `type`,
  # then call its method corresponding to `field`.
  def call(type, field, obj, args, ctx)
    type_module = Object.const_get(type.name)
    type_module.public_send(field.name, obj, args, ctx)
  end
end

schema = GraphQL::Schema.from_definition(schema_defn, default_resolve: ExecuteGraphQLByConvention)

So, a single function combined with Ruby’s flexibility and power opens a lot of doors!

# Extend the schema with new definitions:
schema = schema.redefine {
  resolve_type ->(obj, ctx) { ... }
  monitoring(:appsignal)
}

What’s Next?

Rails has proven that “Convention over Configuration” can be a very productive way to start new projects, so I’m interested in exploring convention-based APIs on top of this feature.

In the future, I’d like to add support for schema annotations in the form of directives, for example:

type Post {
  comments: [Comment!] @relation(hasMany: "comments")
}

These could be used to customize resolution behavior. Cool!

]]>http://rmosolgo.github.io/blog/2017/03/17/prototyping-a-graphql-schema-from-definition-with-ruby/2017-03-16T20:16:00-04:00http://rmosolgo.github.io/blog/2017/03/16/tracking-schema-changes-with-graphql-ruby

One way to keep an eye on your GraphQL schema is to check the definition into source control.

When modifying shared code or reconfiguring, it can be hard to tell how the schema will really change. To help with this, set up a snapshot test for your GraphQL schema! This way:

Check It In

Write a Rake task to get your schema’s definition and write it to a file:

# lib/tasks/graphql.rake
task dump_schema: :environment do
  # Get a string containing the definition in GraphQL IDL:
  schema_defn = MyAppSchema.to_definition
  # Choose a place to write the schema dump:
  schema_path = "app/graphql/schema.graphql"
  # Write the schema dump to that file:
  File.write(Rails.root.join(schema_path), schema_defn)
  puts "Updated #{schema_path}"
end
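Then a test can compare the checked-in dump against the live schema, failing CI whenever they drift apart. A sketch (`MyAppSchema` and the dump path come from the rake task above):

```ruby
# A sketch of the comparison — a Minitest or RSpec test would wrap this:
def schema_up_to_date?(schema_defn, dump_path)
  File.read(dump_path) == schema_defn
end

# In a test:
#   assert schema_up_to_date?(MyAppSchema.to_definition, "app/graphql/schema.graphql"),
#     "Schema changed! Update the dump with `bin/rake dump_schema`"
```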

]]>http://rmosolgo.github.io/blog/2017/03/16/tracking-schema-changes-with-graphql-ruby/2017-03-08T08:02:00-05:00http://rmosolgo.github.io/blog/2017/03/08/optimizing-graphql-ruby

Soon, graphql-ruby 1.5.0 will be released. Query execution will be ~70% faster than 1.3.0!

Let’s look at how we reduced the execution time between those two versions. Thanks to @theorygeek, who optimized the middleware chain and helped me pinpoint several other bottlenecks!

~5% of time was spent during ~7k calls to Class#new: this is time spent initializing new objects. I think initialization can also trigger garbage collection (if there’s not a spot on the free list), so this may include GC time.

~4% of time was spent during ~9k calls to InstanceDefinable#ensure_defined, which is part of graphql-ruby’s definition API. It’s all overhead to support the definition API, 😿.

Several methods are called 1748 times. Turns out, this is once per field in the response.

With that in mind, 25,403 seems like a lot of calls to Module#===!

Reduce GC Pressure

Since Class#new was the call with the most self time, I thought I’d start there. What kind of objects are being allocated? We can filter the profile output:

Lots of GraphQL internals! That’s good news though: those are within scope for optimization.

MiddlewareChain was ripe for a refactor. In the old implementation, each field resolution created a middleware chain, then used it and discarded it. However, this was a waste of objects. Middlewares don’t change during query execution, so we should be able to reuse the same list of middlewares for each field.

This required a bit of refactoring, since the old implementation modified the array (with shift) as it worked through middlewares. In the end, this improvement was added in 5549e0cf. As a bonus, the number of created Arrays (shown by Array#initialize_copy) also declined tremendously since they were used for MiddlewareChain’s internal state. Also, calls to Array#shift were removed, since the array was no longer modified:

The number of FieldResult objects was also reduced. FieldResult is used for execution bookkeeping in some edge cases, but is often unneeded. So, we could optimize by removing the FieldResult object when we had a plain value (and therefore no bookkeeping was needed): 07cbfa89

A very modest optimization was also applied to GraphQL::Arguments, reusing the same object for empty argument lists (4b07c9b4) and reusing the argument default values on a field-level basis (4956149d).

Avoid Duplicate Calculations

Some elements of a GraphQL schema don’t change during execution. As long as this holds true, we can cache the results of some calculations and avoid recalculating them.

A simple caching approach is to use a hash whose keys are the inputs and whose values are the cached outputs:

# Read-through cache for summing two numbers
#
# The first layer of the cache is the left-hand number:
read_through_sum = Hash.new do |hash1, left_num|
  # The second layer of the cache is the right-hand number:
  hash1[left_num] = Hash.new do |hash2, right_num|
    # And finally, the result is stored as a value in the second hash:
    puts "Adding #{left_num} + #{right_num}"
    hash2[right_num] = left_num + right_num
  end
end

read_through_sum[1][2]
# "Adding 1 + 2"
# => 3

read_through_sum[1][2]
# => 3

The first lookup printed a message and returned a value, but the second lookup did not print a message. This is because the block wasn’t called the second time. Instead, the cached value was returned immediately.

This approach was applied aggressively to GraphQL::Schema::Warden, an object which manages schema visibility on a query-by-query basis. Since the visibility of a schema member would remain constant during the query, we could cache the results of visibility checks: first 1a28b104, then 27b36e89.

This was also applied to field lookup in 133ed1b1e and to lazy_resolve handler lookup in 283fc19d.

Use yield Instead of &block
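The idea behind this optimization: capturing a block with `&block` materializes a Proc object on every call, while `yield` invokes the block without that allocation. A minimal sketch of the two forms:

```ruby
# Capturing the block reifies it as a Proc (one allocation per call):
def run_twice_with_proc(&block)
  2.times { block.call }
end

# `yield` invokes the block without materializing a Proc:
def run_twice_with_yield
  2.times { yield }
end

calls = 0
run_twice_with_yield { calls += 1 }
calls # => 2
```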

Remove Overhead from Lazy Definition API (warning: terrible hack)

In order to handle circular definitions, graphql-ruby’s .define { ... } blocks aren’t executed immediately. Instead, they’re stored and evaluated only when a definition-dependent value is required. To achieve this, all definition-dependent methods were preceded by a call to ensure_defined.
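The general shape of that mechanism, as a sketch (graphql-ruby's real InstanceDefinable is more involved than this, and the names here are illustrative):

```ruby
class LazilyDefined
  # Each reader first forces any pending `.define { ... }` block:
  def self.lazy_defined_attr_reader(*names)
    names.each do |name|
      define_method(name) do
        ensure_defined
        instance_variable_get(:"@#{name}")
      end
    end
  end

  lazy_defined_attr_reader :name

  def initialize(&block)
    @pending_definition = block
  end

  private

  # Run the stored definition block exactly once:
  def ensure_defined
    if @pending_definition
      block, @pending_definition = @pending_definition, nil
      instance_eval(&block)
    end
  end
end

type = LazilyDefined.new { @name = "Query" }
type.name # => "Query" (the block ran just now, on first access)
```

Every definition-dependent read pays for that `ensure_defined` call, which is where the overhead in the profile comes from.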

Maybe you remember that method from the very top of the profiler output above:

There’s one caveat: these optimizations apply to the GraphQL runtime only. Real GraphQL performance depends on more than that. It includes application-specific details like database access, remote API calls and application code performance.

What’s a “repository?”

A GraphQL::Pro::Repository works like a single, large GraphQL document with many different operations (ie, queries, mutations, or subscriptions) and fragments inside it. These operations are validated and analyzed as a single unit, as if they came in a single query string.

From a client’s perspective, the server has a fixed set of operations it can perform. Each one can be executed by sending its operation name.

The repository approach allows us to use pre-existing GraphQL concepts:

Document: A GraphQL document is a set of operations and fragments. The semantics of a valid document are well-specified and broadly implemented. A repository is an extension of this concept.

Operation name: GraphQL includes a way to specify which operation to run in a document. Repositories build on this by separating the set of operations (which lives on the server) from the identifier (which comes from the client).

By employing these concepts, we make full use of the battle-tested graphql-ruby runtime without deviating from the spec.

This way, a reader can skim the app/graphql/documents directory to take a quick inventory of operations. Also, this one-to-one mapping mimics the Ruby convention of putting constants in identically-named files.

In the end, GraphQL::Pro::Repository will accept files with any name, as long as they match #{path}/**/*.graphql.

Sharing Fragments

Since a repository functions as one big GraphQL document, fragments are shared by default.

You can put fragments in their own files, then reference them from each operation that needs them. This way, operations with common data responsibilities can share code, ensuring that they stay in sync.

For example, consider a list of comments with a box to create a new comment. We’d make three .graphql files:

For me, I’m hoping to improve client support (eg, Apollo Client) and server tooling (eg, query diffing) to make repositories even more useful!

]]>http://rmosolgo.github.io/blog/2017/03/07/persisted-graphql-queries-with-ruby/2017-01-22T10:23:00-05:00http://rmosolgo.github.io/blog/2017/01/22/parallelism-in-graphql-ruby

It’s possible to get IO operations running in parallel with the graphql gem.

I haven’t tried this extensively, but I had to satisfy my curiosity!

Setup: Long-Running IO

Let’s say we have a GraphQL schema which has long-running IO- or system-bound tasks. Here’s a silly example where the long-running task is sleep:

QueryType = GraphQL::ObjectType.define do
  name "Query"
  field :sleep, !types.Int, "Sleep for the specified number of seconds" do
    argument :for, !types.Int
    resolve ->(o, a, c) {
      sleep(a["for"])
      a["for"]
    }
  end
end

Schema = GraphQL::Schema.define do
  query(QueryType)
end

🎉 Three seconds! Since the sleep(3) calls were in different threads, they were executed in parallel.

Real Uses

Ruby can run IO operations in parallel. This includes filesystem operations and socket reads (eg, HTTP requests and database operations).

So, you could make external requests inside a Concurrent::Future, for example:

Concurrent::Future.execute { open("http://wikipedia.org") }

Or, make a long-running database call inside a Concurrent::Future:

Concurrent::Future.execute { DB.exec(long_running_sql_query) }

Caveats

Switching threads incurs some overhead, so multithreading won’t be worth it for very fast IO operations.

GraphQL doesn’t know which resolvers will finish first. Instead, it starts each one, then blocks until the first one is finished. This means that subsequent long-running fields may have to wait longer than they “really” need to. For example, consider this query:

{
  sleep(for: 5)
  nestedSleep(for: 2) {
    sleep(for: 2)
  }
}

Even with multithreading, this would take about 7 seconds to execute. First, GraphQL would wait for sleep(for: 5), then it would get to nestedSleep(for: 2), which would have already finished, then it would execute sleep(for: 2).

Conclusion

If your GraphQL schema is wrapping pre-existing HTTP APIs, using a technique like this could reduce your GraphQL response time.

]]>http://rmosolgo.github.io/blog/2017/01/22/parallelism-in-graphql-ruby/2017-01-09T09:47:00-05:00http://rmosolgo.github.io/blog/2017/01/09/introducing-graphql-pro

graphql-ruby is almost two years old! Today, I’m adding a new element to the project, GraphQL::Pro.

As time goes on, I’ll keep an eye out for other integrations that could be included in GraphQL::Pro. (If you have a suggestion, I’d love to hear it!)

Feedback Loop

Some teams adopt GraphQL as a foundational element of their application. I’d like to provide them service (and peace of mind) as they build on that investment. GraphQL::Pro customers have my ear for any performance issues, bugs or feature requests. They also have an assurance that I’ll continue to maintain and improve graphql-ruby.

Prevent Burnout

I really enjoy working on graphql-ruby and I’m excited about the work to be done in 2017. But it’s no secret that open-source work can become an unrewarding, thankless grind. Charging money for GraphQL::Pro provides me with a simple, concrete “reward” to continue the work. I hope this will be good for me, for the project, and for others who are invested in the project.

Buying GraphQL::Pro

]]>http://rmosolgo.github.io/blog/2017/01/09/introducing-graphql-pro/2016-11-23T10:34:00-05:00http://rmosolgo.github.io/blog/2016/11/23/raising-exceptions-is-bad

In general, raising exceptions for control flow makes code hard to understand. However, there are cases when an exception is the right choice.

Raise vs Return

raise is return’s evil twin.

They both stop the execution of the current method. After a return, nothing else is executed. After a raise, nothing else is executed … maybe. The method may have a rescue or ensure clause which is executed after the raise, so a reader must check for those.

They both change flow of control. return gives control back to the caller. raise may give control anywhere on the call stack, depending on the specific error and rescue clauses. If all you see is a raise, you can’t guess where it will be rescued!

They both send values to their new destination. return provides the given value to the caller, who may capture the return value in a local variable. raise provides the error object to the rescue-er. return can send any kind of value, but raise can only send error objects.

They both create coupling across call stack frames. return couples two adjacent call stack frames: caller depends on the return value. raise → rescue couples far-removed stack frames: they may be adjacent, or they may be several frames removed from one another.
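A small example of that difference in coupling: `return` hands a value to the immediate caller, while `raise` can jump several frames at once, to whichever frame has a matching `rescue`.

```ruby
def inner
  raise ArgumentError, "oops"
end

def middle
  inner
  "never reached" # skipped: the error flies past this frame entirely
end

def outer
  middle
rescue ArgumentError => err
  "rescued: #{err.message}"
end

outer # => "rescued: oops"
```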

Raise → Rescue is Unpredictable

Sending values through a program by calling methods and return-ing values is very predictable. If you return a different value, the caller will get a different value. To see where return values “go”, simply search for calls to that method.

Finding where raise’d errors go is a bit more challenging. For example, this change:

How can you tell if this is a safe refactor? Here are some considerations:

Instead of looking for callers of this method, you have to find entire call stacks which include this method, since any upstream calls may also have expectations about this error.

When searching for rescues, you have to keep the error’s ancestry in mind, finding bare rescues, superclass-tagged rescues and class-tagged rescues.

Some rescues may consume the error object itself. For example, they may read its #message or other attached data. If you change any properties of the error object, you may break the assumptions of those rescues.

If you find that the new error will be rescue’d differently, you must also consider how execution flow will change in other methods. For example, some methods may be cut short because previously-rescue’d errors now propagate through them. Other methods which used to be cut short may now continue running, since errors are rescued in child method calls.

If your raise is located in a Ruby gem, these problems are even harder, because rescue clauses may exist in your users’ code.

If your error patterns are well documented, ༼ つ ◕_◕ ༽つ 🏆. Bravo, just don’t break your public API. Users might still make assumptions beyond the documentation, such as error ancestry or message values. Additionally, they could be monkey-patching library methods and applying rescue-related assumptions to those patches.

If your error patterns aren’t documented, 💩 ノ༼ ◕_◕ ノ ༽. You have no idea what assumptions users make about those errors! You can’t be sure your changes won’t break their code.

Use Return Instead

raise can be replaced by return. However, if you’re using raise to traverse many levels of the call stack, the refactor will be intense. Take heart: previously you were hacking your way back up the call stack, now you’re creating a predictable, explicit flow through your program!

Return errors instead of raising them. Ruby errors are objects, like everything else. You can return them to the caller and let the caller check whether the returned value is an error or not. For example, to return an error:

def do_something
  calculation = SomeCalculation.new
  # ...
  if calculation.something_went_wrong?
    # Let the caller handle this error
    MyCustomError.new("oops!")
  else
    # Return the result to the caller
    calculation.result
  end
end

Use success and failure objects. Instead of returning a raw StandardError instance to the caller, use a Failure class to communicate failure. Additionally, use a Success class to communicate success. (This is similar to the “monad” technique, eg dry-monads gem.)

class ConvertSuccess
  attr_reader :old_file, :new_file
  def initialize(old_file:, new_file:)
    # ...
  end
end

class ConvertFailure
  attr_reader :old_file, :error
  def initialize(old_file:, error:)
    # ...
  end
end

# Try to convert this file, returning either a
# ConvertSuccess or ConvertFailure:
def convert_file(file)
  # ...
  if error_message.nil?
    ConvertSuccess.new(old_file: file, new_file: converted_file)
  else
    ConvertFailure.new(old_file: file, error: error_message)
  end
end

# Try to convert a file,
# then specify behavior
# for failure case & success case:
conversion = convert_file(File.read(file_path))
case conversion
when ConvertSuccess
  # Do something with the new file
when ConvertFailure
  # Notify the user of the failure
end

As a last resort, return nil. Using nil as an expression of failure has some downsides:

nil can’t hold a message or any extra data

sometimes, nil is a valid value

But, for simple operations, using nil may be sufficient. Since it will be communicated via return, refactoring it will be straightforward in the future!

Sometimes, Raise is Okay

raise has its purposes.

raise is a great way to signal that the program has reached a completely unexpected state and that it should exit. For example, in the convert_file example above, we could use raise to assert that we don’t receive an unexpected value from convert_file:

conversion = convert_file(File.read(file_path))
case conversion
when ConvertSuccess
  # Do something with the new file
when ConvertFailure
  # Notify the user of the failure
else
  raise("convert_file didn't return a ConvertSuccess or ConvertFailure, it returned: #{conversion.inspect}")
end

Now, if the method ever returns some unexpected value, we’ll receive a loud failure. Some people use fail in this case, which is also fine. However, the need to disambiguate raise and fail is a code smell: stop using raise for non-emergencies!

raise is also helpful for re-raising other errors. For example, if your library needs to log something when an error happens, it might need to capture the error, then re-raise it. For example:

# This method yields to a user-provided block, eg
# `handle_converted_file(old_file) { |f| push_to_s3(f) }`
def handle_converted_file(old_file)
  conversion = convert_file(old_file)
  if conversion.is_a?(ConvertSuccess)
    yield(conversion.new_file)
  end
rescue StandardError => err
  # Make a log entry for the library:
  logger.log("User error from handle_converted_file", err)
  # Let the user handle this error:
  raise(err)
end

This way, you can respond to the error without disrupting user code.

raise SharpKnifeError

In my own work, I’m transitioning away from raising errors and towards communicating failure by return values. This pattern is ubiquitous in languages like Go and Elixir. In Node.js, callbacks communicate errors in a similar way (callback arguments). I think Ruby code can benefit from this practice as well.

]]>http://rmosolgo.github.io/blog/2016/11/23/raising-exceptions-is-bad/2016-11-12T13:07:00-05:00http://rmosolgo.github.io/blog/2016/11/12/graphql-query-as-a-state-machine

State machines are applied to a wide variety of programming problems. I found it useful to think of a GraphQL query as a state machine.

Part 0: Introduction to State Machines

Practically speaking, a state machine is a unit of code with these properties:

It has a set of states

It is in one state at a time

Transitions connect one state to another

Transitions can be triggered by outside activity and/or make changes to code on the “outside”

One state is the starting state

One or more states may be valid ending states

State machines are also called “finite automata”.

To see why code like this is useful, let’s examine a couple of applications of state machines:

Some ORMs use a state machine to track the lifecycle of persisted objects. For example, the set of states may be: new, persisted and destroyed. A new object begins in the new state. Calling save() initiates a transition to the persisted state. Calling destroy() moves the machine to the destroyed state. Moving from destroyed to new is impossible; there is no transition between these states.

Regular expressions are often implemented with state machines. The regular expression’s various patterns are transformed into states. While the expression is tested against a string, matching characters cause transitions from one valid state to another. Non-matching characters cause a transition to the “failed” state. After the string has been completely tested, the machine checks its state: if it’s in a valid ending state, then the string was a match. If it isn’t, then the string was not a match.

In summary, state machines provide a model for well-defined progression through a many-stepped process.

In the diagram above, each state is represented by a box. Between states, transitions are represented by arrows. Since this is a regular expression, the transitions are named after the strings which they match. For example, if the machine is in the start state and it observes an "a", it moves to Matching State 1 (MS1). As the regular expression matches the string "abc", it progresses through the states, finally reaching end.
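That machine can be written down as a transition table (a sketch; `:eos` is a pseudo-input marking the end of the string, and a missing entry plays the role of the "failed" state):

```ruby
TRANSITIONS = {
  start: { "a" => :ms1 },
  ms1:   { "b" => :ms2 },
  ms2:   { "c" => :ms3 },
  ms3:   { :eos => :end },
}

def match_abc?(str)
  state = :start
  (str.chars + [:eos]).each do |input|
    state = TRANSITIONS.fetch(state, {})[input]
    return false if state.nil? # no transition: the "failed" state
  end
  state == :end
end

match_abc?("abc") # => true
match_abc?("abd") # => false
```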

Let’s see another regular expression, /^a(bc|bd)$/. It matches two strings, "abc" and "abd". Here’s a naive machine for this expression:

Contrasting this machine to the previous one, we can see a difference: this machine has a branch. To make matters “worse”, the branch is ambiguous: from MS1, when a "b" appears in the string, should the machine move to MS2 or MS4? It can only tell by looking ahead, and possibly backtracking, which is inefficient.

This difference is called deterministic vs non-deterministic. The first machine is deterministic: for each state, each input character can lead to exactly one state (failed state is not pictured). The second machine is non-deterministic: for some states, an input character may lead to multiple states.

Solving Non-Determinism

It turns out, you can transform a non-deterministic machine into a deterministic machine. The process works like this:

Inspect the non-deterministic machine:

For each state, gather the possible inputs for that state.

For each possible input, find the one-or-more destination states which it leads to.

Take those destination states and create a new state in the deterministic machine

For a set of destination states S, the new state represents “any of S”

Repeat the process from this new state (find possible inputs, derive a new state for its possible destinations)

The result is a deterministic machine, some of whose states represent a set of states in the non-deterministic machine.

Let’s apply the transformation to the non-deterministic machine above:

The non-deterministic transitions on "b" have been replaced by a single transition to a newly-created state. The new state represents MS2 or MS4, and it has transitions from both of those states. It may transition on EOS, or it may transition on "c".

The result is a deterministic machine: for each input, we have exactly one transition, so we never need to backtrack.

What about GraphQL?

Consider a GraphQL query:


{
parent {
child {
field1
field2
field3
}
}
}

Executing this query can be articulated in terms of a state machine:

Each selection ({ ... }) is a state, which has a type (a GraphQL type) and a value (a value in the host language, eg, Ruby object)

Fields are transitions: they move execution from one state to another (that is, from one selection to a child selection)

When each field in a selection has been executed, the machine moves “back” to the parent state

The starting state is the root-level selection (eg, query { ... })

The ending state is also the root-level selection, after traversing all selections in the query

Some transitions are invalid: for example, if the value is nil, the machine can’t move into a state whose type is non-null.

Non-Determinism in GraphQL

Depending on the runtime type of node(id: $nodeId), 0, 1, 2, or 3 of those typed selections may be executed. This is a kind of non-determinism.

A simple solution for a GraphQL AST interpreter is:

Test each condition and gather the selections which apply

For each unique field in the set of selections, evaluate it

Evaluate sub-selections for each field which matches that unique field

Concretely, that boils down to:

Get the runtime type (RT) of the object (O) returned by node(id: $nodeId)

For each interface type, if RT implements that interface, gather that selection

Find uniquely-named fields in the set of selections (child is the only one)

Resolve child field on O

For each selection in the set, find fields named child and gather them up (a subset of field1, field2, field3)

Repeat
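In Ruby terms, the gathering steps above might look like this sketch, using Ruby modules to stand in for GraphQL interfaces (the names and data shapes are hypothetical, not graphql-ruby's):

```ruby
# typed_selections: Array of [type_condition, fields] pairs,
# where each field is a Hash like { name: ... }
def gather_selections(runtime_type, typed_selections)
  typed_selections
    .select { |condition, _fields| runtime_type.ancestors.include?(condition) }
    .flat_map { |_condition, fields| fields }
    .group_by { |field| field[:name] } # unique names, ready to resolve once each
end
```

Each uniquely-named field is resolved once, and its grouped selections are merged for the next round of gathering.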

This solution is not a good fit for graphql-ruby because we have a pre-execution phase for analyzing incoming queries. This flow requires runtime types which are not available until fields are actually executed.

Solving Non-Determinism in GraphQL

To streamline execution, we can apply a similar transformation to a GraphQL query. Before executing, we can check each field:

Identify each possible return type

Identify each type condition

Build a new “state” for each valid combination of return type and type conditions

The state contains a set of selections. This machine can be the basis of both pre-execution and execution, since all possible transitions have been identified. Execution will follow a subset of those transitions, depending on the runtime type of returned objects.

Now, each runtime type transitions to exactly one selection. This information simplifies pre-execution analysis and execution. Additionally, this computed state can cache field-level values like coerced arguments and field resolve functions.

In graphql-ruby, these transformations are implemented in GraphQL::InternalRepresentation. At time of writing, the multi-selection state is implemented as an array of InternalRepresentation::Nodes, but an incoming PR will formalize them as InternalRepresentation::Selections.

Ceci n’est pas un state machine

Although this mindset helps solve a problem with GraphQL fragment merging, there are also some key differences between a GraphQL query and a state machine:

A GraphQL query cannot be cyclical. That is, the same state may not be reached more than once. (Even “returning” from a selection is not actually the same state: you’re in a different place because some fields have been resolved.)

Because a GraphQL query can’t form a cycle, “precompiling” it to a deterministic machine doesn’t yield a big payoff. Instead, you can build states as you reach them, saving some work in case some branches are not reached at runtime (or if an error occurs during execution).

]]>http://rmosolgo.github.io/blog/2016/11/12/graphql-query-as-a-state-machine/2016-10-24T10:30:00-04:00http://rmosolgo.github.io/blog/2016/10/24/hash-key-vs-hash-get

I read that Hash#key? was slower than Hash#[] and it made me sad. Shouldn’t Hash#key? generally require less work?

Besides that, there are cases where only Hash#key? will do the trick. For example, if you need to distinguish between these two cases:
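A key that is present but maps to nil looks exactly like a missing key through Hash#[]:

```ruby
h = { a: nil }

h[:a]       # => nil (the stored value)
h[:b]       # => nil (the default for a missing key -- indistinguishable!)

h.key?(:a)  # => true
h.key?(:b)  # => false
```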

But Hash#key? does something a bit unusual: it doesn’t capture the value of the key in self. Instead, it uses the return value of st_lookup
to detect whether the lookup was a hit or a miss. In the case of a hit, it returns Qtrue (the C name for Ruby’s true).

Digging deeper: st_lookup

st.c provides a general purpose hash table implementation. It is widely used by Ruby. st_lookup looks up a key in a table. On a hit, it writes the value to a pointer and returns 1. On a miss, it returns 0.

st_lookup accepts 0 as input for the value pointer. And in that case, it does nothing with value. For example, here’s a snippet from the hit case:

if (value != 0) *value = ptr->record;
return 1;

Referring back to Hash#[] and Hash#key?, that’s the most notable distinction:

Hash#[] sends a st_data_t* to st_lookup

Hash#key? sends 0 to st_lookup

But … why would it be slower to use 0? 😿

VM Optimization

I was going to report my failure to the twitter thread where I first saw this, but I noticed a new response from @schneems:

“It’s optimized by the interpreter to skip the usually more expensive method lookup”

Ok, let’s check that! We can see the Ruby bytecode by using the RubyVM module.
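For instance (exact instruction names may vary between Ruby versions):

```ruby
# Hash#[] compiles to a specialized instruction:
puts RubyVM::InstructionSequence.compile("h = {}; h[:a]").disasm
# look for `opt_aref` in the output

# Hash#key? compiles to a generic method send:
puts RubyVM::InstructionSequence.compile("h = {}; h.key?(:a)").disasm
# no specialized instruction here, just a send
```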

The earlier cases check if the receiver is an Array or Hash, and that the method hasn’t been redefined. In that case, it directly calls the C function for lookup. If any of those checks fail, it uses normal_dispatch to execute the instruction. Hash#key?, on the other hand, always uses a full method lookup.

Conclusion

Hash#[] gets an optimized VM instruction, so it runs faster than Hash#key?. But sometimes only Hash#key? will do the trick!

]]>http://rmosolgo.github.io/blog/2016/10/24/hash-key-vs-hash-get/2016-10-18T15:05:00-04:00http://rmosolgo.github.io/blog/2016/10/18/parameterized-styles-with-react-rails-and-sprockets

css_modules provides an approach to styling UI components in a local-first way.

CSS Modules

Since each context has a DetailPane, let’s define a mixin and share it between the two:

// in shared/detail_pane.scss
@mixin detail-pane {
  .detail-pane {
    margin: 5px;
    border-radius: 5px;
    border: 1px solid #777;
    .description {
      font-size: 1.2rem;
    }
  }
}

// in views/resources.scss
:module(resources) {
  @include detail-pane;
}

// in views/rooms.scss
:module(rooms) {
  @include detail-pane;
}

Why a mixin?

Using a mixin makes it easier to track usage within the application: you only need to search for @includes, rather than class names.

It also enforces a clear separation between base styles and custom styles. Base styles are hard-coded in the mixins. Custom styles are implemented as overrides within the module or as parameters to the mixin (using $-variables).

Applying Styles

To apply the modulized styles to a component, provide the component with a CSS module prop:

var resourcesModule = CSSModules("resources")

<div className="resources">
  <DetailPane cssModule={resourcesModule} />
</div>

// later ...
var roomsModule = CSSModules("rooms")

<div className="rooms">
  <DetailPane cssModule={roomsModule} />
</div>

Then, update DetailPane so that it gets class names from this.props.cssModule.

Mixin parameters (the $-variables mentioned above) let each module customize the shared styles:

// in views/resources.scss
:module(resources) {
  @include detail-pane(10px); // .detail-pane will have 10px margin
}

// in views/rooms.scss
:module(rooms) {
  @include detail-pane(5px); // .detail-pane will have 5px margin
}

Bare class names?

Perhaps you need to support bare class names (no module). For example, if you extract @mixin detail-pane but your app still contains bare .detail-pane class names, you might apply the mixin to the global scope.

To use bare class names in your <DetailPane /> component, use a null module:

// This module has no name, it renders bare selectors:
var nullModule = CSSModules(null)
nullModule("detail-pane") // "detail-pane"

You can pass that in for the cssModule prop:

<DetailPane cssModule={CSSModules(null)} />

Then, the rendered output will contain bare class names:

<div class="detail-pane">
  <p class="description"></p>
</div>

Use with Rails Partials

You can also parameterize the class names in Rails partials.

Get a module with the view helper, then pass it to a partial:

<% resources_module = css_module("resources") %>
<%= render partial: "detail_pane", locals: { style_module: resources_module } %>

<!-- later -->
<% rooms_module = css_module("rooms") %>
<%= render partial: "detail_pane", locals: { style_module: rooms_module } %>

]]>http://rmosolgo.github.io/blog/2016/10/18/parameterized-styles-with-react-rails-and-sprockets/2016-09-04T22:13:00-04:00http://rmosolgo.github.io/blog/2016/09/04/trip-report-rubyconf-colombia-2016I just got back from RubyConf Colombia. The content was great, the community was great, and the venue was great!

The Content

This was a single-track conference, but there were no dud talks! I prefer code-driven talks, and for that reason, these were a few of my favorites:

David Pelaez demoed some functional concepts in Ruby, including Right and Left (as in an Either type) and None objects.

Oscar Rendon described an approach to combining business rules which allows you to simplify conditional branching and reuse code more effectively.

Nick Sutterer shared some insights into busting up “god objects” into more specialized, context-specific units of code.

Sebastian Arcila-Valensuela and Frederico Builes gave my favorite non-code talk. They told a funny story of making tradeoffs while working on a tough problem: building a human-friendly path from a disjoint set of points.

The Community

My favorite part of this event was getting to know the local Ruby community. In fact, I should say “software development community”, as many attendees were not full-time Ruby developers.

Many things were familiar to me: these folks are building interesting projects (like Wesura, peer-to-peer insurance, and Vlip, a mobile payment platform). They’re excited about doing great work and learning new things. Also, there’s a big focus on training and education: many attendees were students at local universities or participants in web dev bootcamps.

Other things were different. I saw many more female attendees at this conference than at others. I know that this was a goal for the organizers; bravo to them for succeeding! Also, the community is relatively new. RubyConf Colombia is the only Ruby conference in Spanish-speaking Latin America (the only other one in the region is in Brazil), and it requires a lot of work to plan and execute, because there isn’t much momentum in place yet. For example, there’s not a culture of companies sending their employees to conferences, so many attendees paid out of pocket.

I also realized something I take for granted: as a native English speaker, basically all technical information is available in my mother tongue, no matter where in the world it came from. What a blessing! It makes me think how valuable the translation service must have been for some native Spanish speakers. Of course, it was hugely valuable to me too – I would have been lost without it during the Spanish-language talks!

The Venue

El Teatrico was a great spot for this event. We were close to max capacity, and that gave the conference a very intimate feeling. During breaks, you couldn’t help but start a conversation with your neighbor.

Besides that, the translation was amazing! There were two guys taking turns, and they did a great job keeping up with the talks and translating technical vocabulary. I felt like I understood almost 100% of the Spanish-language talks.

I really enjoyed Medellin, too. It checked all the boxes for me: great environment, great people and great food!

Photos

In Summary

This was a wonderful trip! I loved making some new friends and visiting this fantastic city. I really appreciate the work of the organizers, and that their primary goal is really to benefit their community. I’m excited to see what next year holds for them!

Specializing Ruby describes Chris Seaton’s work on JRuby+Truffle. It seems to be aimed at an unfamiliar audience, so it’s loaded with background information and careful explanations. Those were a big benefit to me! I’ll describe a few things that I enjoyed the most:

Introduction to Truffle and Graal

Optimizing Metaprogramming with Dispatch Chains

Zero-Overhead Debugging

Interpreting Native Extensions

Introduction to Truffle and Graal

Seaton’s work is built on top of two existing Java projects: Truffle and Graal (pronunciation: 😖❓).

Truffle is a language implementation framework for self-optimizing AST interpreters. This means:

Truffle is for implementing languages. People have used Truffle to implement many languages, including Ruby, C, and Python.

Truffle languages are AST interpreters. A Truffle language parses its source code into a tree of nodes (the abstract syntax tree, AST), which represents the program. Then, it executes the program by traversing the tree, taking actions at each node.

Truffle languages can self-optimize. Nodes can observe their execution and replace themselves with optimized versions of themselves.

Graal is a dynamic compiler for the JVM, written in Java. A few points about Graal:

It’s a just-in-time compiler, so it improves a program’s performance while the program runs.

Graal is written in Java, which means it can expose its own APIs to other Java programs (like Truffle).

Graal includes a powerful system for de-optimizing. This is especially important for Ruby, since Ruby’s metaprogramming constructs allow programs to define new behavior for themselves while running.

Truffle has a “Graal backend,” which supports close cooperation between the two. Together, they make a great team for language implementation: Truffle provides a simple approach to language design and Graal offers a means to optimize all the way to machine code.

Optimizing Metaprogramming with Dispatch Chains

This is a novel optimization technique for Ruby, described in section 5.

Since Ruby is dynamic, method lookups must happen at runtime. In CRuby, call sites have caches which store the result of method lookups and may short-circuit the lookup next time the call happens.

some_object.some_method(arg1, arg2)
# ^- here's the call site
# the _actual_ method definition to use
# depends on `some_object`'s class, which is unknown
# until the program is actually running

One such cache is a polymorphic inline cache, which is roughly a map of Class => method pairs. When CRuby starts the call, it checks the cache for the current receiver’s class. On a cache hit, it uses the cached method definition. On a cache miss, it looks up a definition and adds it to the cache.
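In Ruby pseudocode, such a cache behaves roughly like this (a sketch of the idea, not CRuby's actual implementation):

```ruby
class CallSite
  def initialize(method_name)
    @method_name = method_name
    @cache = {} # Class => UnboundMethod
  end

  def call(receiver, *args)
    klass = receiver.class
    # Cache hit: reuse the stored definition; miss: look it up and store it
    method = (@cache[klass] ||= klass.instance_method(@method_name))
    method.bind(receiver).call(*args)
  end
end
```

The first call for a given receiver class pays for a lookup; subsequent calls with the same class reuse the cached definition.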

In some cases, CRuby declares bankruptcy. Dynamic method calls (.send) are not cached!

some_object.send(method_name, arg1, arg2)
# ^- who knows what method to call!?!?

JRuby+Truffle’s solution to this challenge is dispatch chains. Each call site (including .send) gets a dispatch chain, which is like a two-layer cache: first, it stores the name of the method; then, it stores the class of the receiver. For a “static” method call, it looks like this:
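Roughly, and ignoring invalidation, the two-layer idea could be sketched like this (hypothetical Ruby, not the actual Truffle node structure):

```ruby
class DispatchChain
  def initialize
    # Layer 1: method name; layer 2: receiver class
    @chain = Hash.new { |h, name| h[name] = {} }
  end

  # Works identically for static calls and for .send:
  def dispatch(receiver, name, *args)
    klass = receiver.class
    method = (@chain[name][klass] ||= klass.instance_method(name))
    method.bind(receiver).call(*args)
  end
end
```

Because the method name is simply the first cache layer, a dynamic `.send` caches just as well as a statically-written call.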

In this respect, JRuby+Truffle treats every method call like a .send(...). This cache is implemented with Truffle nodes, so it’s optimized as much as the rest of the program.

I wonder if this kind of method cache could be implemented for CRuby!

Zero-Overhead Debugging

Debugging in JRuby+Truffle (described in section 6) is a tour de force for the Truffle-Graal combo. Other Rubies incur big performance penalties for debugging. Some require a special “debug” flag. But Seaton implements zero-overhead, always-available debugging by applying Truffle concepts in a new way.

Debugging hooks (such as the beginning of a new line) are added as “transparent” Truffle AST nodes, analogous to CRuby’s trace instruction. By default, they don’t do anything – they just call through to their child nodes. Since they’re “just” Truffle nodes, they’re optimized like the rest of the program (and since they’re transparent, they’re optimized away completely). When those nodes are targeted for debugging, they’re de-optimized, updated with the appropriate debug code, and the program continues running (and self-optimizing). When the debugger is detached, the node de-optimizes again, replaces itself with transparent nodes again, and the program resumes.

This chapter included a good description of Graal’s Assumption concept. Assumptions are attached to optimized code. As long as isValid() is true, optimized code is executed. However, when an assumption is marked as invalid, Graal transfers execution back to the interpreter. Debugging takes advantage of this construct: debug nodes are transparent under the assumption that no debugger is attached to them. But when a developer attaches a debugger, then that assumption is invalidated and Graal de-optimizes and starts interpreting with the new debug nodes. Removing a debugger does the same thing: it invalidates an assumption, automatically de-optimizing the compiled code.

Interpreting Native Extensions

Truffle: if it’s not solving your problems, you’re not using enough of it!

Throughout the paper, Seaton points out the “real-world” challenge of any new Ruby implementation: it simply must support all existing code, including C extensions! If you require developers to rewrite code for a new implementation, they probably won’t bother with it.

He also points out that CRuby’s C API is an implementer’s nightmare (my words, not his). It’s tightly coupled to CRuby’s implementation and provides direct access to CRuby’s memory (eg, string pointers).

Truffle’s design offers a solution to this problem. Truffle languages implement common interfaces for AST nodes and objects, meaning that they can be shared between languages! With this technique, JRuby+Truffle can implement Ruby’s C API by interpreting C with Truffle. Since it’s “just Truffle”, C and Ruby ASTs can be seamlessly merged. They are even optimized together, just like a pure-Ruby program.

Conclusion

The only remaining question I have is, how bad is warm-up cost in practice? All of JRuby+Truffle’s benchmarks are at “peak performance”, but the system is “cold” at start-up, and many triggers in the program can cause the system to de-optimize. Is JIT warm-up a real issue?

“Specializing Ruby” was a great read. Although I found the subject matter quite challenging, the writing style and occasional illustrations helped me keep up. Practically speaking, I can’t use JRuby+Truffle until it runs all of Ruby on Rails, which isn’t the case yet. I’m eager to see how this project matures!

]]>http://rmosolgo.github.io/blog/2016/08/06/summer-reading-specializing-ruby/2016-05-19T22:00:00-04:00http://rmosolgo.github.io/blog/2016/05/19/finding-a-browser-ready-file-for-sprockets

I like using Sprockets, but sometimes it’s hard to find a file to include in the asset pipeline. Here are some methods I use to find browser-ready JavaScript files.

There are a few good options for getting browser-ready files for JavaScript libraries:

Download a file from the project’s website

Download a file from the project’s source code repository

Download a file from a CDN (npmcdn is great for cases where files are only “compiled” for releases)

Build the file yourself, following the project’s documentation

Don’t get a minified version. Sprockets will minify it for us later. In the meantime, the unminified version will help us during development.

From a Website

This is the good ol’ way of getting JavaScript files. Because we still use browsers, you can still download these files.

Here are some examples:

From the Repo

Many projects maintain a browser build in the project’s source. You may have to poke around a bit, but likely places are the project’s root folder, the dist/ folder, or the build/ folder.

As you explore the repo, remember to examine a stable ref, such as a release or a stable branch.

From a CDN

Sometimes, an author only compiles browser-ready files for releases to NPM. You can get these files from npmcdn.

Since npmcdn is serving NodeJS projects, employ a similar technique to searching the project repo for a file:

Check the “main” file

Check the “dist” or “build” directories

Build it from Source

If a pre-built, browser-ready file is not available, you may have to build it yourself! The project’s readme will contain instructions to do so. If it doesn’t … you may want to reconsider adding this dependency! (Even if it’s well-maintained, it’s not a good match for this asset bundling approach.)

Summary

Hopefully these will work well for you!

You may have to learn a bit of RequireJS, jspm, Grunt, Browserify, Gulp, Webpack or Rollup along the way. (Ok, probably not Rollup, sadly.) But at least you don’t have to use them day-in and day-out!

]]>http://rmosolgo.github.io/blog/2016/05/19/finding-a-browser-ready-file-for-sprockets/2016-05-19T08:44:00-04:00http://rmosolgo.github.io/blog/2016/05/19/how-i-use-sprockets

When reviewing issues for react-rails, I see many questions about how to gather JavaScript dependencies with Sprockets. Here is how I use Sprockets to manage JavaScript dependencies.

I’m looking for a few things in a JavaScript bundler:

Stability: I don’t want any changes to my dependencies unless I explicitly make them.

Clarity: I want to be able to quickly tell what dependencies I have (library and version).

Insulation: I don’t want to rely on external services during development, deployment or runtime (except for downloading new dependencies, of course)

Feature-completeness: I want to concatenate and minify my assets and serve them with cache headers

Finding a browser-ready file

Adding the file to vendor/

Use an unminified version of the library. It will help in debugging development and viewing diffs when you update the dependency. Have no fear, Sprockets will minify it for you for production.

Include the version number in the file name. This will give you more confidence in updating the library, since you’ll know what version you’re coming from.

Integrating with Sprockets

The //= require ./vendor/{library}-v{version} directive is your friend. Like an entry in package.json, it tells the reader what dependency you have.

Now, your library will be accessible by its global name, such as React, d3 or Immutable.

Consuming a library via global variable is not ideal. But it does help you remember that, at the end of the day, the browser is one giant, mutable namespace, so you must be a good citizen! At least global variables can be grepped like any other dependency.

Consider isolating your dependency. For example, you could wrap Pusher in an application-specific event emitter. This way, when you update Pusher, you only have to check one file for its usages. (Some libraries are poor candidates for isolation. My app will never be isolated from React!)

Caveats

There are some things Sprockets doesn’t provide for me, which I wish it did:

Named imports: I wish there was a good alternative to global namespacing with Sprockets, but not yet. (It’s not a deal breaker – it doesn’t hurt to be familiar with this constraint because it’s the reality of the browser, anyways.)

Tree shaking: I wish I could transmit only the parts of Underscore.js I actually use!

Perhaps I should read up on Sprockets and submit a patch 😎

Also, there’s one case where copy-pasting isn’t a great solution. Some libraries (like React.js) have separate “development” and “production” builds. The production build has fewer runtime checks than the development build, making it smaller and faster. There are a few solutions to this problem:

Use a gem which provides the proper file for each environment (like react-rails)

Add environment-specific folders to the asset pipeline (like react-rails does, I can write more on this if need be)

Use the development build in production (weigh the costs first: what’s the difference in behavior, performance and file size?)