Do Or Do Not, There Is No Try – Object#try Considered Harmful

Object#try is quite a commonly used method in Rails applications to cover cases where there is a possibility of dealing with a nil value or to provide a flexible interface for handling cases where some kind of object doesn't necessarily implement a given method. Thanks to try, we may avoid getting NoMethodError. So it seems like it's perfect, right? No NoMethodError exception, no problem?

Well, not really. There are some severe problems with using Object#try, and usually, it’s quite easy to implement a solution that would be much better.

Object#try – how does it work?

The idea behind Object#try is simple: instead of raising a NoMethodError exception when calling some method on nil, or when calling a method on a non-nil object that doesn't implement it, it just returns nil.

Imagine that you want to grab the email of the first user. To make sure it won’t blow up when there are no users, you could write it the following way:

User.first.try(:email)

What if you implemented some generic service where you can pass many types of objects and, e.g., after saving the object it attempts to send a notification if the object happens to implement a proper method for that? With Object#try it could be done like this:
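# a sketch – MyService and send_success_notification are the names used later in this post
class MyService
  def call(object)
    object.save!
    object.try(:send_success_notification, "saved from MyService")
  end
end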

As you can see, it is also possible to provide arguments to the method.

What if you need to chain methods where you can get nil at each intermediate step? No problem, you can use Object#try:

payment.client.try(:addresses).try(:first).try(:country).try(:name)

What is the problem then?

Apparently, Object#try is capable of handling multiple cases, so what is the problem with using it?

Well, there are many. The biggest issue with Object#try is that in many cases it solves problems that should never happen in the first place – and that problem is nil. Another one is that the intention of using it is not clear. What does the following code try to say?

payment.client.try(:address)

Is it a legit case that some payment might not have a client, and indeed it could be nil? Or is it added "just in case", so that we don't blow up with a NoMethodError exception if client happens to be nil? Or even worse, does client happen to be a polymorphic relationship where some models implement the address method and others don't? Or maybe there is a problem with data integrity, and for a few payments the client was deleted for some reason and it's no longer there?

Just by looking at this code it is impossible to tell what the intention behind Object#try is – there are just too many possibilities.

Fortunately, there are plenty of alternative solutions that you can apply to get rid of Object#try and make your code clear and expressive – thanks to that, it will be much more maintainable, more readable and less prone to bugs as the intention will no longer be ambiguous.

Alternative solutions

Here are a few "patterns" you could apply depending on the context where Object#try is used.

Respecting Law of Demeter

Law of Demeter is a handy rule (I wouldn't go so far as to call it a "law" though) which helps avoid structural coupling. What it states is that a hypothetical object A should only be interested in its own immediate surroundings and should not be aware of the internal structure of its collaborators or associations. In many cases, it means having only one "dot" in method calls. However, Law of Demeter is not really about the number of "dots" (method calls), it's only about the coupling between objects, so chained operations and transformations are perfectly fine, e.g., the following example doesn't violate the law:

input.to_s.strip.split(" ").map(&:capitalize).join(" ")

but the following one does:

payment.client.address

Respecting Law of Demeter usually results in clean and maintainable code, so unless you have a good reason to violate it, you should stick to the law and avoid tight coupling.

Let’s get back to the example with payment, client and address. How could we refactor the following code?

payment.client.try(:address)

The first thing would be to reduce structural coupling and implement a Payment#client_address method:

class Payment
  def client_address
    client.try(:address)
  end
end

It’s much better now – instead of referring to the address via payment.client.try(:address) we can simply do payment.client_address, which is already an improvement as Object#try happens only in one place. Let’s refactor it further.

We are left now with two options: either client being nil is a legit case or not. If it is, we can make the code look confident and explicitly return early, which clearly shows that having no client is a valid use case:
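# a sketch of the confident, early-returning version
class Payment
  def client_address
    return nil if client.nil?

    client.address
  end
end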

Such delegations are pretty generic; maybe Rails has some nice solution to this problem? The answer is "yes"! ActiveSupport offers a very nice solution to this exact issue: the delegate macro. Thanks to that macro, you can define delegations and even handle nil in the exact way we did it.

The first example, where nil is a legit use case, could be rewritten the following way:
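# a sketch using the delegate macro
class Payment
  delegate :address, to: :client, prefix: true, allow_nil: true
end

payment.client_address # => nil when there is no client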

Much cleaner, less coupled – and we've managed to achieve the same final result without using Object#try, just in a much more elegant way.

However, it is still possible that in some cases – e.g., when displaying data for the payments with completed transactions – we don't expect payment to have a nil client (which could be valid, e.g., for a payment whose transaction is not completed yet), but somehow we are getting the dreaded NoMethodError exception. It doesn't necessarily mean that we need to add the allow_nil: true option in the delegate macro, and for sure it doesn't mean that we should use Object#try. The solution here would be:

Operating on the scoped data

If we want to deal with payments with completed transactions, which are guaranteed to have a client, why not simply make sure that we are dealing with the right set of data? In Rails apps that would probably mean applying some ActiveRecord scope to the Payment collection, like with_completed_transactions:

Payment.with_completed_transactions.find_each do |payment|
  do_something_with_address(payment.client_address)
end

Since we never plan to do anything with the client's address for payments with uncompleted transactions, we don't need to explicitly handle nils here.

Nevertheless, even if client were always required for creating a payment, it would still be possible that such code might result in NoMethodError. One example where that might happen would be an associated client record deleted by mistake. In that case, we would need to fix:

Data integrity

Ensuring data integrity, especially with an RDBMS like PostgreSQL, is quite simple – we just need to remember to add the right constraints when creating new tables. Keep in mind that this needs to be handled at the database level; validations in models are never enough, as they can easily be bypassed. To avoid the issue where client turns out to be nil despite a presence validation, we should add NOT NULL and FOREIGN KEY constraints when creating the payments table, which will prevent us both from not having a client assigned at all and from deleting a client record that is still associated with some payment:
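# a sketch of such a migration – the column names are illustrative
class CreatePayments < ActiveRecord::Migration[5.1]
  def change
    create_table :payments do |t|
      # NOT NULL + FOREIGN KEY constraints on client
      t.references :client, null: false, foreign_key: true

      t.timestamps
    end
  end
end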

And that’s it! By remembering about those constraints, you can avoid a lot of unexpected use cases with nils.

Ensuring types via explicit conversion

I've seen Object#try used a few times in quite an exotic way, which looked similar to this:

params[:name].try(:upcase)

Well, this code clearly shows that some string is expected to be found under the name key in params, so why not just ensure it is a string by applying an explicit conversion using the to_s method?

params[:name].to_s.upcase

Much cleaner that way!

However, those two snippets are not equivalent. The former returns a string if params[:name] is a string, but if it is nil, it will return nil. The latter always returns a string. It is not entirely clear if nil is expected in such a case (which is the obvious problem with Object#try), so we are left with two options:

nil is the expected return value if params[:name] is nil – this might not be the best idea, as dealing with nils instead of strings might be quite inconvenient; however, in some cases it might be necessary to have nils. If that's the case, we can make it clear that we expect params[:name] to be nil by adding a guard statement:

return if params[:name].nil?
params[:name].to_s.upcase

a string is the expected return type – we don’t need to bother with guard statements, and we can just keep the explicit conversion:

params[:name].to_s.upcase

In more complex scenarios, it might be a better idea to use form objects and/or have more robust type management, e.g. by using dry-types, but the idea would still be the same as for explicit conversions – it would just be better as far as the design goes.

Using the right methods

Dealing with nested hashes is quite a common use case, especially when building APIs and dealing with user-provided payloads. Imagine you are dealing with a JSONAPI-compliant API and want to grab the client's name when updating. The expected payload might look like this:
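{
  "data": {
    "id": "1",
    "type": "clients",
    "attributes": {
      "name": "Some Client"
    }
  }
}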

However, since we never know if the API consumer provided a proper payload or not, it would make sense to assume that the structure won’t be right.

One terrible way to handle it would be using… guess what? Obviously Object#try:

params[:data].try(:[], :attributes).try(:[], :name)

It’s certainly hard to say that this code looks pleasant. And the funny thing is that it is really easy to rewrite cleanly.

One solution would be applying explicit conversions on each intermediate step:

params[:data].to_h[:attributes].to_h[:name]

That’s better, but not really expressive. Ideally, we would use some dedicated method. One of those potentially dedicated methods is Hash#fetch which allows you to provide a value that should be returned if the given key is not present in the hash:

params.fetch(:data, {}).fetch(:attributes, {}).fetch(:name, nil)

It looks even better, but it would be nice to have something even more dedicated for digging through nested hashes. Fortunately, since Ruby 2.3.0, we can take advantage of Hash#dig, which was implemented for exactly this purpose – digging through nested hashes and not raising exceptions if some intermediate key turns out not to be there:

params.dig(:data, :attributes, :name)

Having proper interfaces / duck typing

Let’s get back to the example that was mentioned in the beginning with sending a potential notification:

Thanks to this refactoring, the code is much cleaner, and we easily got rid of Object#try. However, now we need to know that for one type of object we need to use MyServiceA and for another type MyServiceB. It might make sense, but it might also be a problem. In such a case, the second option would be better:

Duck typing: simply add a send_success_notification method to all objects that are passed to MyService, and if it's supposed to do nothing, just leave the method body empty:

class MyService
  def call(object)
    object.save!
    object.send_success_notification("saved from MyService")
  end
end

The extra benefit of this option is that it helps to identify some common behaviors of the objects and to make them explicit. As you can see, in the case of Object#try a lot of domain concepts might stay implicit and unclear. It doesn't mean they are not there; they are just not clearly identified. This is yet another important thing to keep in mind – Object#try also hurts your domain.

Null Object Pattern

Let’s reuse the example above with sending notifications after persisting some models and do a little modification – we will make mailer an argument of the method and call send_success_notification on it:

But you’ve probably already guessed this solution is a no-go. Fortunately, we can apply Null Object Pattern and pass an instance of some NullMailer which implements send_success_notification method that simply does nothing:

What about &. a.k.a. lonely/safe navigation operator?

&., the lonely/safe navigation operator, is a pretty new thing, introduced in Ruby 2.3.0. It's quite similar to Object#try, but it's less ambiguous – if you call a method on an object other than nil, and this method is not implemented by that object, NoMethodError will still be raised, which is not the case for Object#try. Check the following examples:
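nil&.upcase                        # => nil
"string"&.upcase                   # => "STRING"
"string"&.some_missing_method      # => NoMethodError
"string".try(:some_missing_method) # => nil
nil.try(:upcase)                   # => nil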

But I think neither of those alternatives is more readable than the first example with the &. operator, so it might be worth trading a bit of clarity for more readability.

Wrapping Up

I believe there is not a single valid use case for Object#try – due to the ambiguity of its intentions, its negative impact on the domain model, and simply the fact that there are many other ways to solve the problems that Object#try "solves" in a clumsy way: starting from respecting Law of Demeter and delegations, through operating on properly scoped data, applying the right database constraints, ensuring types using explicit conversions, using the right methods, having proper interfaces, taking advantage of duck typing, and ending with the Null Object Pattern or even the safe navigation operator (&.), which is much safer to use and might be applied in limited cases.

5 Years of Professional Ruby and Rails Development – My Reflections

As hard as it is for me to believe, I already have over 5 years of professional experience in Ruby and Rails. Throughout all these years my attitude towards Rails has fluctuated, going from blind love to harsh criticism (ActiveRecord, I'm looking at you), ending with a more balanced but certainly positive approach. Such a time is long enough to have a meaningful opinion about the overall experience of using any framework, so here are a few points about Rails that I would particularly like to focus on in my reflections.

ActiveRecord and model layer

ActiveRecord is arguably the biggest and the most important part of Rails. Not only is it quite complex itself, but following the "skinny controllers, fat models" mantra often leads to creating huge models which extend ActiveRecord::Base, making up a huge part of virtually the majority of applications. So what has been my experience with this layer over the last 5 years?

When I was starting with Rails, I naturally followed the default "Rails Way", which meant moving logic from the controllers to models, handling the entire business logic in models' classes and adding callbacks here and there for the logic around persistence. And it was awesome initially! I was able to progress with all the features really fast, even in spite of lacking meaningful Rails experience.

However, I quite soon started to experience some serious issues: I had to handle a big part of the logic with a lot of conditionals depending on the context; some methods were handling logic for both creation of the records and the updates, but with only a slight difference, which added even more conditionals. The validation logic started to become complex, which required conditional validations as well. And one day, when running some data migrations, I used update_attributes instead of update_columns and tons of email notifications were sent to the users due to some callbacks that were responsible for sending notifications…

At that point I pretty much lost control over the application logic, as I was no longer able to tell what something as fundamental as calling update or save could lead to. That was the time when my default policy regarding models started to be "no callbacks ever, no conditional validations ever, ideally no logic at all." This approach worked for a while, but eventually it led to some other issues, like distributing similar logic between many objects (service objects and form objects mostly), duplication of the logic and feature-envy code smells, even though it was quite clear that the logic belonged to the models. Apparently, the anemic domain model approach didn't work as well as I had thought it would. Another case was that I was doing whatever it took to avoid callbacks and lost quite a lot of time fighting some gems that were coupled to the models via callbacks. That could have been the right thing to do from the "purity" perspective, but it wasn't the smartest decision business-wise – the purpose of the code is to serve the business and provide the required functionality, not to be the purest possible solution. Maintainability is one thing, but in most applications it's easy to reach a point of diminishing returns where more purity and better design don't necessarily lead to greater business value, but definitely take a lot of time.

All those events led to the more balanced attitude that I now have towards ActiveRecord and the model layer. Callbacks, complex conditional validations and other typical Rails Way techniques are far from being my preferred way of handling business logic, and in general I consider those approaches harmful in the long-term perspective, but I clearly see how they can be beneficial in the short term, when developing an MVP where rapid speed of development is required and maintainability is secondary, or when something can be cleverly handled even in more complex applications with minimum effort, like e.g. in the case of Carrierwave callbacks, using touch and dependent associations' options, etc.

I also tend to put model-related logic in, well, models. Does it lead to fat models? Sometimes yes – in bigger applications that can easily lead to models with 200-300 lines of code. But if the logic is cohesive and not really context-dependent, I don't find it a big issue – the clarity is most often preserved and the maintainability is not negatively impacted. The important thing is to put there only generic domain model logic, ideally not related to the persistence itself.

Following that approach has been working pretty great for me and I don't really complain about ActiveRecord anymore. Maybe the architecture is a bit limiting and something like the data mapper pattern would be more flexible. Or some methods like update_attribute / update_columns can be really confusing if used without a good reason, and even more exotic features like ActiveRecord.suppress can lead to code that is hard to reason about. Nevertheless, it is still possible to maintain models in good shape using ActiveRecord, and just the fact that something can do a lot of harm doesn't mean that it should not be there at all – it's a developer's responsibility to choose the tools and design for solving the problem wisely and in a way that's maintainable in the long run.

Lack of Higher Level Architecture

Rails is sometimes criticized for not providing higher-level architecture, and there are a lot of solutions that are supposed to fill that hole (e.g. Trailblazer, which is a mini-framework providing form objects, operation classes and more). However, I don't necessarily think it's a bad decision.

Models and controllers are generic enough that to some extent they can be pretty much similar in most of the applications. What about some higher level layers?

There are plenty of gems implementing form objects, service objects and other layers, and most often they are significantly different from each other. And just adding service objects or form objects might not be the best design decision ever. Maybe going full CQRS / Event Sourcing with write models and read models is better? And how would you know what the structure of service objects or operations or form objects or any other abstraction should be?

It would probably be extremely difficult to find a solution that would satisfy most Rails developers, and any attempt to add those layers to Rails could end up with conflicts about the implementation details and/or interfaces, which wouldn't be really productive.

The current approach of focusing on the existing layers is in my opinion the right one, and any reasonably experienced developer should be able to figure out what kind of architectural approach would be the best fit for a given application.

ActiveSupport

Another layer that is arguably widely considered to be problematic is ActiveSupport, especially the monkeypatching part. I'm not a fan of monkeypatching myself and I almost never do it; however, the core extensions provided by ActiveSupport are extremely useful and very convenient, and I don't really remember having any major problems with them. I can agree that it might not be the most "elegant" solution from the purity perspective, but it gets the job done and does it well without causing problems for the application in the long run – this is ultimately the most important factor when it comes to software engineering. I think the overall critique of ActiveSupport is a bit far-fetched: the practical negative implications of what ActiveSupport provides are negligible, and there are quite a lot of positive outcomes which cannot be overlooked that easily.

What About Other Frameworks?

I’ve had a chance to try some different frameworks than Rails throughout all these years – including Django (Python), Play (Java), Phoenix (Elixir), Meteor (JavaScript) or other Ruby frameworks – Sinatra and Hanami. They were quite fun to work with, but the productivity and the enjoyment of development couldn’t possibly match the Rails experience. Obviously, the maturity of the ecosystem plays a huge role here and that’s why some of the newer frameworks have a much harder time competing with Rails, nevertheless, even Rails out-of-box without any extra gems offers a great productivity which is significantly higher comparing to the other frameworks.

Future

Currently I don’t see any framework that could possibly replace Rails in the near future, at least not for the generic webdevelopment. Phoenix, which somehow resembles Rails, might be the closest one, but in my opinion Elixir language and functional paradigm are much harder to learn than Ruby and Object Oriented Programming. Also, due to the much bigger community, maturity and overall ease of development in Rails, it might take quite a long time until Phoenix catches up, despite having some clear advantages over Ruby and Rails like speed and concurrency (thanks to Erlang virtual machine).

Wrapping Up

Ruby on Rails has definitely made my professional life amazing and it's been a great joy to develop all the applications I've had a chance to work on, despite the few times when I had a somewhat negative attitude towards it. Even though there are some imperfections, Rails is still the number one choice for me for the majority of cases, and I don't see that changing in the near future.

Ember Tips: Testing Outgoing HTTP Requests

Ember.js is a web frontend framework, and it's no surprise that the majority of applications deal with a lot of HTTP requests. But this fact has a lot of implications for the process of developing Ember apps, especially when it comes to testing. For basic GET requests which don't include any query params or don't deal with pagination, it's quite straightforward – for those we just want to fetch some data, so we can check if the proper objects are present as a side effect of these requests. What about POST, PATCH or DELETE requests, where we can't easily test the side effects?

Scenario #1: Testing if the request body sent in the outgoing request is right

Imagine that you are writing a classic sign-up for users. It would be quite useful to ensure that the right params are indeed sent to the /api/users endpoint (if that’s the case).

For dealing with HTTP requests and/or implementing a backend mock, the ember-cli-mirage addon is a great choice. The setup is beyond the scope of this article, but if you happen to not be familiar with ember-cli-mirage, I highly recommend reading the docs, which are very clear about the setup and its features.

Let’s assume that we have a proper route generated for the signup, let it be a signup route, a corresponding signup controller already handling a logic for the registration in one of its actions and that we have a User model with email and password attributes. Our scenario will be pretty simple: we want to make sure that after filling in email and password fields and clicking the submit button the request will be performed to /api/users with the right params. Here’s our test for the signup feature:

In this acceptance test we visit the signup page, provide the email and password combo and click on the submit button. There is only one simple assertion here: comparing the expected attributes against the normalized attributes from the request to the /api/users endpoint – we use normalized attributes to avoid dealing with the JSONAPI format. To achieve that, we provide a custom action handler which is very close to the default implementation for POST actions from ember-cli-mirage. The only extra step here is comparing the attributes.

What if we want to just make sure that the request was performed to the given endpoint, but we don’t care about the request body?

Scenario #2: Testing if the request was performed to the given endpoint

For this scenario, imagine that we want to have a feature of deleting some tasks from the to-do list. The simplest way to make sure that the task will be removed would be checking if a DELETE request was performed to the /api/tasks/:id endpoint. Again, let's assume that we already have the right implementation for this feature (too bad we didn't practice strict TDD to develop it properly).

For this use case we will do something a bit different than last time. First, let's add the right config for ember-cli-mirage to handle CRUD actions for tasks using the resource helper:
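// a sketch of the Mirage config – the /api namespace is an assumption
export default function() {
  this.namespace = '/api';

  this.resource('tasks');
}

And here is a sketch of the test itself (selector names and setup details are assumptions):

test('deleting a task performs a DELETE request to /api/tasks/:id', function(assert) {
  assert.expect(1);

  // the task created in the test's setup
  const task = server.create('task');

  visit('/tasks');
  click(`[data-test=delete-task-${task.id}]`);

  andThen(() => {
    const deleteRequest = server.pretender.handledRequests.find((request) => {
      return request.url === `/api/tasks/${task.id}` && request.method === 'DELETE';
    });
    assert.ok(deleteRequest, 'DELETE request should be performed to the right endpoint');
  });
});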

Again, our test has a very simple structure: we visit the tasks route where all the tasks are displayed and delete the one we created in the test's setup. To make sure that the request was performed to the right endpoint, we take advantage of the fact that ember-cli-mirage uses pretender under the hood, which keeps track of all handled requests in the handledRequests property. Thanks to this feature, we can identify our request based on the URL and the request method.

Wrapping Up

Testing outgoing requests in Ember might not be the most obvious thing to do. Fortunately, thanks to pretender and ember-cli-mirage, we can easily verify both the URLs of the endpoints the requests were performed to and the request body that was sent with the request.

P.S. I’ve just started writing a book about test-driving Ember applications. If you found this article useful, you are going to love it :). Subscribe to my newsletter to get updates and promotion code once it’s released.

Ruby Memoization: ||= vs. defined? Syntax

In the majority of Rails applications, or even Ruby gems, you can find a lot of use cases where you need to memoize the result of some computation for performance benefits, so it won't be computed again if it has already been computed once. Doing the assignment to some instance variable with the ||= operator seems to be the most commonly used solution for this purpose, e.g. @result ||= do_some_heavy_computation. However, there are some cases where it might not produce the expected outcome, and you should actually use the defined? operator instead.

What Is the ||= Operator?

Let’s get back to the example from the introduction: @result ||= do_some_heavy_computation. What is this ||= operator and how does it work? It’s nothing more than a syntactic shortcut and it’s an equivalent of @result || @result = do_some_heavy_computation Edit: It’s very close to @result || @result = do_some_heavy_computation, but not exactly the same which translates to: “return the value of @result if the value is truthy or assign the result of do_some_heavy_computation to @result”. Clearly, the shortcut version looks more appealing. Keep in mind though that it’s not really about already assigning some value to the instance variable, but rather if the value of it is truthy or not. How do we check then if the instance variable has already been set knowing that referring to undefined instance variable will simply result in nil without any exceptions?

What Is the defined? Operator?

We can do that by using the defined? operator, which returns nil if its argument is not defined or, if it is defined, a description of that argument. Thanks to that behaviour, we can easily check if some instance variable has already been set or not:
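defined?(@result) # => nil
@result = nil
defined?(@result) # => "instance-variable"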

Memoization gotcha

Ok, now that we understand the difference between the ||= and defined? operators, why should we bother in the context of memoization?

Imagine that you have the following method in some object:

def heavy_computation_result
  @result ||= do_some_heavy_computation
end

and you are calling the heavy_computation_result method multiple times to reuse the result of the computation. Certainly, this computation is heavy (as the name suggests) and ideally it should be computed only once, for performance reasons. What if this computation returns nil or false?

As @result ||= do_some_heavy_computation works in a pretty similar way to the @result || @result = do_some_heavy_computation expression, the left side will be falsey in such a case and the computation will be performed every time you call the heavy_computation_result method, making this syntax useless here!

For proper memoization, this method should be rewritten using the defined? operator:
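def heavy_computation_result
  return @result if defined?(@result)

  @result = do_some_heavy_computation
end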

Wrapping Up

Even though the ||= operator is commonly used for memoization, it isn't necessarily the best solution to this problem. It is certainly quite convenient to use; nevertheless, when there is a possibility of having falsey values such as false and nil, it is much safer to use the defined? operator instead.

Ember Quick Tips: Managing Timeouts and Delays

Timeouts and delays are used quite extensively in many applications, when deferring execution of some action via Ember.run.later or debouncing via Ember.run.debounce. Having small amounts of tests executing such methods might not be a problem initially, but obviously, as the application grows, this can easily lead to a slow test suite which takes minutes to finish due to the waiting for all the timeouts and delays in many places. Let's try to find the best solution to this problem.

Anatomy of The Problem

Imagine you are implementing a todo list and want to add a destroy-item feature. The obvious solution would be adding a button which would trigger some destroy action once a user clicks it. But the problem with such a solution is that it doesn't offer the best UX, as a user could easily destroy items by accident. A nicer way for such use cases is making the user hold the button for a certain period of time – only after this delay is the action invoked; otherwise it won't be executed.

The great news is that there is already an addon solving this problem: ember-hold-button. Let's create a very simple component handling the logic of displaying the item and deleting it after holding a button for 3 seconds using ember-hold-button:
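{{! a sketch – the action name is an assumption; delay is the addon's option mentioned below }}
{{#hold-button delay=3000 action="destroyItem"}}
  Delete
{{/hold-button}}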

Ok, cool, so the feature is done. What about integration tests verifying that this feature works? Currently, it would take at least 3 seconds due to the waiting time plus the runtime of the test itself, which is definitely too slow.

Solving The Problem

One way to fix this problem would be moving the delay to a configurable property with a default value of 3 seconds. The component would look like this in such a case:
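// a sketch of the component – the destroy action itself is an assumption
import Ember from 'ember';

export default Ember.Component.extend({
  destroyActionDelay: 3000,

  actions: {
    destroyItem() {
      this.get('item').destroyRecord();
    }
  }
});

// and in the template:
// {{#hold-button delay=destroyActionDelay action="destroyItem"}}Delete{{/hold-button}}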

To make integration tests fast, we would simply override the default value of destroyActionDelay and render the component in the test the following way:

tests/integration/components/display-todo-item-test.js

// the rest of the test
this.render(hbs`{{display-todo-item item=item destroyActionDelay=0}}`);
// the rest of the test

This surely solves the problem for integration tests, but what about the acceptance ones? It would still take at least 3 seconds of waiting for this delay.

For this purpose we could add a special function which would return the value for the delay based on the environment. For non-test environments we may want to return the provided value, and for the test environment some other value, which by default would be equal to 0 to make the tests fast. Let's add such a utility function and call it timeoutForEnv:
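// a sketch – the config import path depends on the application name
import config from 'my-app/config/environment';

export default function timeoutForEnv(timeout, testTimeout = 0) {
  if (config.environment === 'test') {
    return testTimeout;
  }

  return timeout;
}

// usage in the component:
// destroyActionDelay: timeoutForEnv(3000)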

And that’s it! It will work for both integration and acceptance tests.

Wrapping Up

Using a lot of timeouts and delays without special adjustments for tests can easily lead to a very slow test suite as the application grows. Fortunately, it's quite easy to solve such a problem by using an environment-dependent config and setting the values to 0 for tests.

P.S. I’ve just started writing a book about test-driving Ember applications. If you found this article useful, you are going to love it :). Subscribe to my newsletter to get updates and promotion code once it’s released.

Test-Driven Ember: Testing Holding a Button

Thanks to the awesome tools in the Ember ecosystem, such as ember-cli-mirage, ember-qunit and ember-test-helpers, writing the majority of tests is pretty straightforward. Nevertheless, there are quite a few cases where simulating a user's interaction is not that simple. An example of such a use case would be holding a button for a particular period of time to trigger some side effect.

Anatomy of The Problem

Imagine you are implementing a feature of destroying some records in your application, e.g. the todo items from the list. It would be a bit unfortunate to destroy an item if a user accidentally clicked on the destroy button, so it might be a good idea to somehow make it harder to execute such an action. A simple approach would be displaying an alert asking the user to confirm whether this item should be removed or not. This approach would get the job done, but it doesn't offer the best UX. What are the better options here?

A pretty cool solution to this problem would be making the user hold a delete button for a particular period of time, e.g. for 3 seconds. Holding this button for less than 3 seconds wouldn't destroy the item, so it would be impossible to accidentally delete anything.

There is an addon which solves exactly this problem: ember-hold-button, so there is no need to reinvent the wheel. Let’s add this to our application.

Adding Destroy Action

Let’s start by installing ember-hold-button addon:

ember install ember-hold-button

and assume that we already have some component for displaying a single item with a destroy action:
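{{! a sketch of the component's template – names other than delay are assumptions }}
{{#hold-button delay=3000 action="destroyItem" data-test="destroy-item-btn"}}
  Delete
{{/hold-button}}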

The delay option will get the job done here, making the button holdable for 3 seconds before triggering the destroy action.

The button is working great, but our test is obviously failing now! How can we simulate the holding action in our integration tests?

Testing Holding Interaction

To solve it, we should break the problem down into single events. On desktop, pressing a button simply means triggering the mouseDown event, and releasing means triggering the mouseUp event. On mobile those would be the touchStart and touchEnd events respectively.

Based on how the hold-button component works, we may suspect that there is some internal timer which starts counting time after the mouseDown (touchStart) event is triggered, or a scheduler which executes the action if the button was held for the required period of time and cancels it if it was released before that – which would mean cancelling the timer on the mouseUp event.

After checking the internals, it turns out this is exactly the case! Let’s rewrite our test by triggering these events. We will also need two extra things as we are dealing with asynchronous actions:

async() / done() – to make sure QUnit will wait for an asynchronous operation to be finished, we need to use the async() function. That way QUnit will wait until done() is called. We will call done() after triggering the mouseUp event. But we also need to wait until the action is executed. We will need the wait() helper for that.

wait() – it forces the run loop to process all the pending events. That way we ensure that the asynchronous operations have been executed (like calling the destroy action after 3 seconds).

import Ember from 'ember';
import { moduleForComponent, test } from 'ember-qunit';
import hbs from 'htmlbars-inline-precompile';
import wait from 'ember-test-helpers/wait';

const {
  set,
  RSVP,
} = Ember;

moduleForComponent('display-todo-item', 'Integration | Component | display todo item', {
  integration: true
});

test('item can be destroyed', function(assert) {
  assert.expect(1);
  const done = assert.async();
  const { $ } = this;
  const item = Ember.Object.extend({
    destroyRecord() {
      assert.ok(true, 'item should be destroyed');
      return RSVP.resolve(this);
    },
  }).create();
  set(this, 'item', item);

  this.render(hbs`{{display-todo-item item=item}}`);

  const $destroyBtn = $('[data-test=destroy-item-btn]');
  $destroyBtn.mousedown();

  wait().then(() => {
    $destroyBtn.mouseup();
    done();
  });
});

Nice! Our test is passing again. However, there is one serious problem: this test is quite slow, as it waits 3 seconds for the action to finish. Can we somehow make it faster?

Making Our Test Faster

The answer is: yes. We just need to provide a way to make the delay configurable from the outside. This can simply be done by introducing a destroyActionDelay property with a default value of 3000 and allowing it to be modified. Let's start with applying this little change to the test:

tests/integration/components/display-todo-item-test.js

// the rest of the test
this.render(hbs`{{display-todo-item item=item destroyActionDelay=0}}`);

We don’t care about waiting for 3 seconds in the tests, we just want to test if it works and to make it fast. 0 sounds like the most reasonable value in such case.

Wrapping Up

Testing holding a button for a particular period of time doesn't sound like an obvious thing to do. Fortunately, with a proper design and an understanding of the interaction from the browser's perspective, it isn't that hard to do and doesn't necessarily make your tests slower.

P.S. I’ve just started writing a book about test-driving Ember applications. If you found this article useful, you are going to love it :). Subscribe to my newsletter to get updates and promotion code once it’s released.

JavaScript Tips: Redefining the userAgent Property

Imagine a use case where you are trying to check if a user accessed your app from a mobile device or not. Most likely you will need to use the navigator.userAgent property and craft some smart regular expression to test for the presence of a particular expression, like /Mobi/.test(navigator.userAgent), which seems to be the recommended way to do it. Ok, so we're almost done with our feature; we just need to add some tests to make sure it works as expected. But there's a problem – you can't redefine the userAgent property just by using a setter! Fortunately, there is a way to solve this problem.

Anatomy of the problem

Let’s check what happens when we try to override navigator.userAgent property with a setter in a browser.

Well, that’s not exactly what we wanted to be returned. But we need to override this value somehow to test both behaviours – when the device is a mobile one and not a mobile one. Fortunately, JavaScript is quite powerful at this point and it’s possibly to redefine such property using Object.defineProperty.

Object.defineProperty to the Rescue

Object.defineProperty allows you to define a new property or redefine an existing one on a given object. The syntax is the following:

Object.defineProperty(obj, prop, descriptor)

The descriptor argument is a particularly interesting one – it allows you to define the value of the property, a getter, a setter, whether the property should be enumerable (if it's going to be included when iterating over the properties), whether it's writable (if the value can be changed with an assignment operator) and whether it's configurable (if the property can be changed and deleted from the object's properties).
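// a sketch – redefining userAgent with the writable flag:
Object.defineProperty(navigator, 'userAgent', {
  value: 'My Custom Mobile Agent',
  writable: true,
});

// or, alternatively, with a getter and the configurable flag:
Object.defineProperty(navigator, 'userAgent', {
  get() { return 'My Custom Mobile Agent'; },
  configurable: true,
});

navigator.userAgent; // => "My Custom Mobile Agent"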

Both of these ways work just fine; however, writable is a bit more flexible, as it allows you to change the value returned by a given property either by redefining this property or by using a simple setter. In the case of configurable, you can only redefine the property.

Wrapping Up

Maybe JavaScript has some odd parts; nevertheless, it's quite a powerful language. Changing the value of read-only properties is probably not something you will do often, but if you really need to do it, Object.defineProperty will be your friend.

JavaScript: The Surprising Parts

Do you think you know all the surprising parts of JavaScript? Some of these "features" may look as if the language were broken, but that's not necessarily the case. Things like variable hoisting, variable scope and the behaviour of this are quite intentional and, besides just being different from most other programming languages, there is nothing particularly wrong with them. However, there are still some things that are quite surprising about JavaScript. Let's take a look at some of them.

Surprise #1 - parseInt function

Imagine you have some numbers as strings and you want to convert them to integers. You could probably use the Number() function to do that, but let's assume you are used to the parseInt() function. Let's do some conversions then:
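["1", 2, "3", 4].map(parseInt);         // => [1, NaN, NaN, NaN] // WUT?
["1.5", "2.5", "3.5"].map(parseFloat);  // => [1.5, 2.5, 3.5]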

Something is definitely wrong here. How could parseFloat() possibly work fine here and parseInt() not? Obviously JavaScript is broken, right?

Not really. This is actually the expected behaviour. The difference between parseFloat() and parseInt() is that parseFloat() takes only one argument (a string), but parseInt() takes two arguments – a string and… a radix. To verify it, let's rewrite the mapping using an anonymous function:

["1", 2, "3", 4].map((number) => parseInt(number)); // => [1, 2, 3, 4]

When you simply pass the parseInt() function as an argument to map(), the second argument (which is the current index) is going to be passed as the radix to parseInt, which explains why it returns NaN. The equivalent of just passing parseInt looks like this:
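["1", 2, "3", 4].map((number, index) => parseInt(number, index)); // => [1, NaN, NaN, NaN]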

As "odd" as it may look, this is a perfectly valid behaviour and there is nothing wrong with JavaScript ;).

Surprise #2 - sorting

Now that we’ve learned how to parse integers in JavaScript like a boss, let’s do some sorting:

[1, 20, 2, 100].sort(); // => [1, 100, 2, 20] // WUT again...

Again, something odd is going on here. However, this is the intended behavior – after consulting the docs, we can learn that sort() converts all elements into strings and compares them in Unicode code point order. I think this might be a big surprise for the majority of developers performing sorting and seeing the result, but this behaviour is clearly documented. Due to the necessity of maintaining backwards compatibility, I wouldn't expect this behavior to change, so it's worth keeping in mind.

To perform sorting on integers you need to provide a compare function:

[1, 20, 2, 100].sort((a, b) => a - b); // => [1, 2, 20, 100]

Surprise #3 - == vs. ===

You've probably heard that you should never use double equals (loose equality) and just stick to triple equals (strict equality). But following some rules without understanding the reasons behind them is never a good solution to a problem. Let's try to understand how these operators work.

Loose equality (==) compares two values after converting them to a common type. After the conversions (both of the values can be converted), the comparison is performed by strict equality (===). So what is a common type in this case?

The easiest way to get an idea of what happens when using == would be checking the table of conversion rules here, as it's not really obvious (the details of how the conversion works are described later in this article).

So basically this table says that the following expressions will be truthy:
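1 == "1";                    // => true
0 == false;                  // => true
null == undefined;           // => true
new String("abc") == "abc";  // => true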

The first scenario would be comparing integers coming from forms, when you don't really care about strict equality and types:

if ($('.my-awesome-input').val() == 100) {
  // do something here
}

In such a case it may turn out that we don't really care if we compare strings or integers; either "100" or 100 is fine, and we don't need to perform any explicit conversions.

The second use case would be treating both undefined and null as the same thing meaning lack of some value. With strict equality we would need to check for both values:

x = getSomeValue();

if (x !== undefined && x !== null) {
  // some logic
}

Doesn’t look that nice. We could clean it up with loose equality and simply check if something is not null-ish:

x = getSomeValue();

if (x != null) {
  // some logic
}

The last use case would be comparing primitives and objects. It's especially useful when dealing with both primitive strings ("simple string") and strings as objects (new String("string as object")):

x = getSomeValue();

if (x != "some special value") {
  // some logic
}

With strict equality we would probably need to explicitly convert objects to strings using toString(), which is not that bad, but loose equality looks arguably cleaner.

Surprise #4 - equality gotcha #1: NaN

Do you know how to identify NaN in JavaScript? Sounds like a silly question, right? Well, not really – both of the following expressions are falsey:

NaN == NaN;  // => false
NaN === NaN; // => false

Fortunately, there is still a way to check for NaN: it is the only value in JS that is not equal to itself:

NaN != NaN;  // => true
NaN !== NaN; // => true

You could either take advantage of this behaviour or use isNaN function:

isNaN(NaN); // => true

There is one more possibility to test for NaN: the Object.is function, which is very similar to strict equality, but with a few exceptions. One of those is comparing NaN values:

NaN === NaN;         // => false
Object.is(NaN, NaN); // => true

Surprise #5 - equality gotcha #2: comparing objects

There is one more gotcha besides NaN when it comes to testing for equality: comparing objects. If you think you can easily compare arrays with the same elements or objects with the same keys and values, you might be quite surprised:
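[1, 2, 3] === [1, 2, 3];   // => false
[1, 2, 3] == [1, 2, 3];    // => false as well
({ a: 1 }) === ({ a: 1 }); // => false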

The reason behind it is quite simple though: strict equality doesn’t compare the values, but identities instead. And two different objects are, well, different, unless they are referring to the exactly same thing.

How about loose equality? As already discussed, if the types are the same, the values are compared using strict equality. It doesn’t work with Object.is either. The only option for objects is to compare each key and associated value with the ones from the other object.

Surprise #6 - instanceof and typeof

There seems to be a lot of confusion regarding those two and how to use them in different contexts. Basically, typeof should be used for getting the basic JavaScript type of a given expression (i.e. undefined, object, boolean, string, number, function or symbol), and instanceof should be used for checking if the prototype of a given constructor is present in the expression's prototype chain. Even if they may seem similar at times, they should be used in very different use cases; check the following examples:

typeof "basic string" // => "string", it's a primitive so looks good so far
typeof new String("basic string") // => "object", because it's no longer a primitive!

"basic string" instanceof String // => false, because "basic string" is a primitive
1 instanceof Number // => false, same reason, 1 is a primitive

[] instanceof Array // => true
[] instanceof Object // => true, array is not a primitive
typeof [] // => "object", there is no array primitive, it's still an object

Unfortunately, it's not that easy in all cases. There are 2 exceptions regarding the usage of typeof that are quite surprising.

There is an undefined type which would be returned for an undefined expression, but what about null? It turns out that its type is object! There were some attempts to remove this confusion – like this proposal for introducing a null type – but they were eventually rejected.

And another surprise: NaN. What is the type of something that is not a number? Well, it's number of course ;). As funny as it sounds, it is in accordance with the IEEE Standard for Floating-Point Arithmetic, and the concept of NaN is kind of number-ish, so this behaviour is somehow justified.
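typeof null // => "object"
typeof NaN  // => "number"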

Surprise #7 - Number.toFixed() returning strings

Imagine you want to round some number in JavaScript and do some math with it. Apparently Math.round() is capable only of rounding to the nearest integer, so we need to find some better solution. There is the Number.toFixed() function, which seems to do the job. Let's try it out:

123.789.toFixed(1) + 2 // => "123.82", huh?

Is math broken in JS? Not really. It's just the fact that Number.toFixed() returns a string, not a numeric type! And its intention is not really to perform rounding for math operations – it's only for formatting! Too bad there is no built-in function to do such a simple operation, but if you expect a numeric type, you can just handle it with the + unary prefix operator, which won't be used as an addition operator but will perform a conversion to a number in such a case:

const number = +123.789.toFixed(1);
number // => 123.8

Surprise #8 - Plus (+) operator and results of addition

“Adding stuff in JavaScript is simple, obvious and not surprising” - No one ever

Have you ever watched Wat by Gary Bernhardt? If not, I highly encourage you to do it now – it's absolutely hilarious and concerns a lot of the "odd" parts of JavaScript.

Let's try to explain most of those odd results when using the + operator. Beware: once you finish reading this, you will actually not find most of these results that surprising – they will just be "different". I'm not sure yet if that's a good or a bad thing :).

All of these results may seem somehow exotic, but only one of them, maybe two at most, are exceptional. The basic thing for figuring out the result of those expressions is understanding what is happening under the hood. In JavaScript you can only add numbers and strings; all other types must be converted to one of those first. The + operator basically converts each value to a primitive (the primitives being: undefined, null, booleans, numbers and strings). This conversion is handled by the internal operation called ToPrimitive, which has the following signature: ToPrimitive(input, PreferredType). The PreferredType can be either number or string. The algorithm of this operation is quite simple; here are the steps if string is the preferred type:

return input if it's already a primitive

if it's not a primitive, call the toString() method on input and return the result if it's a primitive value

if it's still not a primitive, call the valueOf() method on input and return the result if it's a primitive value

if it's still not a primitive, throw TypeError

For number as the preferred type, the only difference is the sequence of steps 2 and 3: the valueOf method will be called first, and if it doesn't return a primitive, then the toString method will be called. In most cases number will be the preferred type; string will be used only when dealing with instances of Date.

Now that we know what is going on under the hood let’s explain the results from the examples above.

The result of calling the valueOf method on objects ({}) and arrays (which technically are also objects) is simply the object itself, so it's not a primitive. However, for objects the toString() method will return "[object Object]" and for arrays it will return an empty string – "". Now we have primitives that can be added. From this point we can predict the results of operations like {} + {}, [] + {} or even:

1 + { valueOf: function() { return 10; }, toString: function() { return 5; } };

and:

1 + { toString: function() { return 10; } };

If you remember that string is the preferred type for operations involving dates, the result of ({ toString: function() { return "surprise!"; } }) + new Date(2016, 0, 1) is not really surprising anymore. But how is it possible that {} + [] returns 0, not "[object Object]"?

Most likely the {} at the beginning is interpreted as an empty block and is ignored. You can verify it by putting the empty object inside parentheses (({}) + []) – the result will be "[object Object]". So in fact that expression is interpreted as +[], which is very different from addition! As I've already mentioned before, it's the unary prefix operator which performs a conversion to a number. For arrays, the result of such a conversion is simply 0.

And why does 1 + undefined return NaN? We can add only numbers and strings; undefined is neither of them, so it must be converted to a number in this case. The result of such an operation is simply NaN, and 1 + NaN is still NaN.

Surprise #9 - No integers and floats - just numbers

In most programming languages there are different types of numbers, like integers and floats. What is again surprising about JavaScript is that all numbers are simply double-precision floating point numbers! This has a huge impact on anything related to math, even on such things as precision. Take a look at the following example:

9999999999999999 === 10000000000000000 // => true

This is definitely not something that would be expected here. If you are planning to do any serious math in JavaScript, make sure you won’t run into any issues caused by the implementation of the numbers.

Wrapping up

JavaScript may sometimes seem like it's "broken" somehow, especially compared to other programming languages. However, many of these features are quite intentional, and others are consequences of some decisions. There are still a few things that seem to be really odd, but after digging deeper they start to make sense (or at least don't look like some voodoo magic), so to avoid unfortunate surprises it's definitely worth learning about those odd parts of JavaScript.

Ember Tips: Computed Properties and Arrow Functions – Not a Good Idea

Arrow function expressions were definitely a great addition in ES6, and thanks to tools like Babel the new syntax has been quite widely adopted. Besides the more concise syntax, an interesting thing about arrow function expressions is that they preserve the context, i.e. they don't define their own this, which used to be annoying and resulted in assigning that or self variables to keep the outer context that could be referred to inside functions. As great as it sounds, arrow function expressions cannot be used in all cases. One example would be Ember computed properties.

Arrow Function Expressions - A Quick Introduction

Let's start with a quick introduction to arrow functions. Before ES6, anytime we were using function expressions and wanted to refer to this from the outer context, we had to do some workarounds which are (arguably) a bit unnatural, especially compared to other major programming languages.

Let’s do some pseudo-object-oriented programming with JavaScript (ES5) to illustrate a possible issue with function expressions:

function Order() {
  this.id = Math.floor((Math.random() * 10000000) + 1); // don't do it in a production code ;)
  this.items = [];
}

Order.prototype.addItem = function(item) {
  this.items.push(item);
}

Order.prototype.logItems = function() {
  this.items.forEach(function(item) {
    console.log("item description: " + item.description + " for order with id: " + this.id);
  });
}

var order = new Order();
order.addItem({ description: 'Glimmer 2 rockzzz' });
order.logItems(); // whooops

We have simple class-like functionality using a constructor function and the prototype to implement Order, with a questionable ( ;) ) way of assigning id and some items. We can add more items with the Order.prototype.addItem function and we can log them with the Order.prototype.logItems function.

Function expressions create their own context and define their own this, so it no longer refers to the outer context, which is the order instance. There are several ways to solve this problem.

The most obvious one is to assign the outer this to some other variable, like that or self:

Order.prototype.logItems = function() {
  var self = this;
  this.items.forEach(function(item) {
    console.log("item description: " + item.description + " for order with id: " + self.id);
  });
}

You can also pass the outer this as a second argument to the forEach function:

Order.prototype.logItems = function() {
  this.items.forEach(function(item) {
    console.log("item description: " + item.description + " for order with id: " + this.id);
  }, this);
}

You can even explicitly bind the outer this to the callback argument inside the forEach function:

Order.prototype.logItems = function() {
  this.items.forEach(function(item) {
    console.log("item description: " + item.description + " for order with id: " + this.id);
  }.bind(this));
}

All these solutions work, but they aren’t really that clean. Fortunately, since ES6 we can use arrow function expressions, which preserve the outer context and don’t define their own this. After a little refactoring, Order.prototype.logItems could look like this:

Order.prototype.logItems = function() {
  this.items.forEach((item) => {
    console.log("item description: " + item.description + " for order with id: " + this.id);
  });
};

Much Better!

As great as it looks, it may not be a good idea to apply arrow function expressions everywhere, and Ember computed properties are a notable example.

Ember Computed Properties And Arrow Functions? - Not A Good Idea

Recently I was doing some refactoring in one Ember app. The syntax in one of the models was a bit mixed: there were both function expressions and arrow function expressions, and the code looked a bit like this:

app/models/user.js

import Ember from "ember";
import Model from 'ember-data/model';

export default Model.extend({
  fullName: Ember.computed('firstName', 'lastName', function() {
    return `${this.get('firstName')} ${this.get('lastName')}`;
  }),

  doThis: function() {
    // some logic goes here
  },

  doThat: function() {
    // even more logic
  },

  doYetAnotherThing(args) {
    // more logic
  }
});

So I decided to ES6-ify the entire syntax here and ended up with the following code:

app/models/user.js

import Ember from "ember";
import Model from 'ember-data/model';

export default Model.extend({
  fullName: Ember.computed('firstName', 'lastName', () => {
    return `${this.get('firstName')} ${this.get('lastName')}`;
  }),

  doThis() {
    // some logic goes here
  },

  doThat() {
    // even more logic
  },

  doYetAnotherThing(args) {
    // more logic
  }
});

And how did this refactoring end up? Well, instead of a proper fullName I was getting undefined undefined! That was surprising at first, but then I looked at the changes and saw that I was using an arrow function expression in the computed property and referring to this there, which obviously won’t work for the reasons mentioned before: this no longer refers to the model instance. So what are the options for computed properties? Simply stick to a regular function expression there, as in the sketch below.
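A minimal sketch of the fix (assuming the same model as above) - the computed property goes back to a regular function expression, while the other methods can safely keep the ES6 shorthand:

import Ember from "ember";
import Model from 'ember-data/model';

export default Model.extend({
  // a regular function expression, so `this` is the model instance
  fullName: Ember.computed('firstName', 'lastName', function() {
    return `${this.get('firstName')} ${this.get('lastName')}`;
  })
});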

Wrapping Up

Even though arrow function expressions are very convenient to use, they can’t be used interchangeably with function expressions. Sometimes you do want a function to define its own this instead of preserving the outer context, which is exactly the case with Ember computed properties.

]]>2016-12-04T22:00:00+01:00http://karolgalanciak.com/blog/2016/12/04/introduction-to-activerecord-and-activemodel-attributes-apiRails 5.0 is without a doubt a great release with plenty of useful changes and additions. The most notable change was probably ActionCable - the layer responsible for integrating your app with websockets. However, there were also other additions that could bring some substantial improvements to your Rails apps but were a bit outshined by the bigger changes. One of such features is the Attributes API.

ActiveRecord Attributes And Defaults - The Old Way

Imagine that you are in the vacation rental industry and you are adding a new model for handling reservations for rentals; let’s call it Reservation. To keep it simple for the purpose of this example, let’s assume that we need start_date and end_date date fields for handling the duration of the reservations and a price field, which is pretty useful unless you are developing an app for a charity organization ;). Let’s say we want the defaults for the start_date and end_date attributes to be 1 day from now and 8 days from now respectively when initializing a new instance of Reservation, and the price should be converted to an integer, so in fact it is going to be the price in cents, with the expected input format looking like “$1000.12”. How could we handle it inside ActiveRecord models?

For default values, one option would be to add after_initialize callbacks which would assign the given defaults unless the values were already set in the initializer. For price we can simply override the attribute writer, which is the Reservation#price= method. We would most likely end up with something looking like this:
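The original listing didn’t survive here; a minimal sketch of that callback-based approach (assuming a reservations table with start_date, end_date and an integer price column):

class Reservation < ActiveRecord::Base
  after_initialize :set_default_start_date, :set_default_end_date

  # override the attribute writer to store the price in cents
  def price=(value)
    price_in_dollars = value.to_s.gsub("$", "").to_d
    super((price_in_dollars * 100).to_i)
  end

  private

  def set_default_start_date
    self.start_date ||= 1.day.from_now
  end

  def set_default_end_date
    self.end_date ||= 8.days.from_now
  end
end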

Well, the above code works, but it can get repetitive across many models and doesn’t read that well; it would be much better to handle it with a more declarative approach. But is there any built-in solution for that problem in ActiveRecord?

The answer is yes! Time to meet your new friend in the Rails world: the ActiveRecord Attributes API.

ActiveRecord Attributes And Defaults - The New Way - Attributes API

Since Rails 5.0 we can use the awesome Attributes API in our models. Just declare the attribute with the attribute class method, specify its type and provide an optional default (either a raw value or a lambda). The great thing is that you are not limited to attributes backed by the database - you can use it for virtual attributes as well!

For our Reservation model, we could apply the following refactoring with the Attributes API:
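The refactored model was lost from this dump; a sketch of what it could look like:

class Reservation < ApplicationRecord
  attribute :start_date, :date, default: -> { 1.day.from_now }
  attribute :end_date, :date, default: -> { 8.days.from_now }
end

Reservation.new.start_date # => the date 1 day from now
Reservation.new(start_date: Date.tomorrow).start_date # => tomorrow's date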

That’s exactly what we needed. What about our conversion for price? As we can specify the type for a given attribute, we may expect that it would also be possible to define our own types. It turns out it is possible and actually quite simple. Just create a class inheriting from ActiveRecord::Type::Value or an already existing type, e.g. ActiveRecord::Type::Integer, define a cast method and register the new type. In our use case let’s register a new price type:
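A sketch of such a type (the exact original listing was lost; the cast logic below is an assumption based on the described “$1000.12” format):

class PriceType < ActiveRecord::Type::Integer
  def cast(value)
    return super if value.is_a?(Numeric)

    price_in_dollars = value.to_s.gsub("$", "").to_d
    super(price_in_dollars * 100)
  end
end

ActiveRecord::Type.register(:price, PriceType)

class Reservation < ApplicationRecord
  attribute :price, :price
end

Reservation.new(price: "$1000.12").price # => 100012
Reservation.where(price: "$1000.12").to_sql
# => SELECT "reservations".* FROM "reservations" WHERE "reservations"."price" = 100012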

As expected, the price used for query was the one after serialization.

If you want to check the list of built-in types or learn more, check the official docs.

What About ActiveModel?

So far I’ve discussed only the ActiveRecord Attributes API, but the title clearly mentions an ActiveModel part, so what about it? There is bad news and good news.

The bad news is that it is not yet supported in Rails core, but most likely it is going to be a part of ActiveModel eventually.

The good news is that you can use it today, even though it’s not a part of Rails! I’ve released the ActiveModelAttributes gem, which provides an Attributes API for ActiveModel, and it works in a very similar way to ActiveRecord Attributes.

Just define your ActiveModel model, include the ActiveModel::Model and ActiveModelAttributes modules and define attributes and their types using the attribute class method:
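A sketch of such a model (the class and attribute names are just for illustration):

class Registration
  include ActiveModel::Model
  include ActiveModelAttributes

  attribute :email, :string
  attribute :confirmed, :boolean, default: false
end

registration = Registration.new(email: "user@example.com")
registration.confirmed # => false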

You can also add your custom types. Just create a class inheriting from ActiveModel::Type::Value or an already existing type, e.g. ActiveModel::Type::Integer, define a cast method and register the new type:
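For example, a money type mirroring the ActiveRecord one above could be registered like this (again, a sketch):

class MoneyType < ActiveModel::Type::Integer
  def cast(value)
    return super if value.is_a?(Numeric)

    super(value.to_s.gsub("$", "").to_d * 100)
  end
end

ActiveModel::Type.register(:money, MoneyType)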

Wrapping up

The ActiveRecord Attributes API is definitely a great feature introduced in Rails 5.0. Even though it is not yet supported in ActiveModel in Rails core, ActiveModelAttributes can easily be added to your Rails apps to provide almost the same functionality.

]]>2016-11-01T00:30:00+01:00http://karolgalanciak.com/blog/2016/11/01/keeping-data-integrity-in-check-conditional-unique-indexes-for-soft-deleteSoft delete is a pretty common feature in most applications. It may increase the complexity of the queries; nevertheless, not deleting anything might be the right default, as the data might prove to be useful in the future: for restoring a record removed by mistake, for deriving conclusions from statistics and for plenty of other purposes. It may seem like a pretty trivial thing: just add a column like deleted_at and filter out records that have this value present. But what happens when you need to do proper uniqueness validation on both the model layer and the database level? Let’s take a look at what kind of problem can easily be overlooked and how it can be solved with a conditional index.

Case study: daily prices for vacation rentals

Let’s imagine we are developing vacation rental software. Most likely the pricing for each day will depend on some complex set of rules, but we may want to have some denormalized representation of base prices for each day to make things more obvious and to have a possibility of sharing this kind of data with other applications, which is quite common in this domain. We may start with adding a DailyPrice model with a reference to a rental, a price value and of course a date for which the price is applicable.

Ensuring uniqueness

Obviously, we don’t want to have any duplicated daily_prices for any rental, so we need to add a uniqueness validation for rental_id and date attributes:

app/models/daily_price.rb

validates :rental_id, presence: true, uniqueness: { scope: :date }

To ensure the integrity of the data and to protect against race conditions and potential validation bypassing, we need to add a unique index at the database level:

db/migrate/20161030120000_add_unique_index_for_daily_prices.rb

add_index :daily_prices, [:rental_id, :date], unique: true

Adding soft delete functionality

We have a nice setup already. But it turned out that when some rules or values influencing the price change, it’s much more convenient to remove all the daily_prices and recalculate them from scratch than to check whether the price for a given date needs to be recalculated. To be on the safe side, we may decide not to hard delete these prices, but do a soft delete instead.

To implement this feature we could add a deleted_at column, drop the previous index and add a new one which respects the new column. We should also update the validation in the model in such a case, as shown in the sketch below.
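The original listing was lost here; a sketch of the migration and the updated validation (the PostgreSQL conditional index is created with the where option; the migration name is hypothetical):

# db/migrate/..._make_daily_prices_unique_index_conditional.rb
remove_index :daily_prices, [:rental_id, :date]
add_index :daily_prices, [:rental_id, :date], unique: true, where: "deleted_at IS NULL"

# app/models/daily_price.rb
validates :rental_id, presence: true,
                      uniqueness: { scope: :date, conditions: -> { where(deleted_at: nil) } }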

Wrapping Up

Keeping data integrity in check is essential for most applications to avoid serious problems, especially when implementing soft delete. Fortunately, simply by adding PostgreSQL conditional unique indexes we can protect ourselves from such issues.

]]>2016-09-25T23:45:00+02:00http://karolgalanciak.com/blog/2016/09/25/decoding-rails-magic-how-does-activejob-workExecuting background jobs is quite a common feature in many web applications. Switching between different background processing frameworks used to be quite painful, as most of them had different APIs for enqueuing jobs, enqueuing mailers and scheduling jobs. One of the great additions in Rails 4.2 was a solution to this problem: ActiveJob, which provides an extra layer on top of the background jobs framework and unifies the API regardless of the queue adapter you use. But how exactly does it work? What are the requirements for adding new queue adapters? What kind of API does ActiveJob provide? Let’s dive deep into the codebase and answer these and some other questions.

Anatomy of the job

To enqueue a job we could simply write MyAwesomeJob.perform_later(some_user); if we wanted to schedule a job at some time in the future, we could write MyAwesomeJob.set(wait: 12.hours).perform_later(some_user), or MyAwesomeJob.perform_now(some_user) for executing the job immediately without enqueuing. But we never defined these methods, so what kind of extra work does ActiveJob perform to make it happen?

There are some interesting modules included in this class, which we will get to know in more detail later, but let’s focus on the core API for now. Most likely this kind of logic would be defined in, well, the Core module. Indeed, the set method is there:
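A slightly simplified sketch of the relevant Rails source, including the ConfiguredJob class that set returns:

# active_job/core.rb (trimmed for brevity)
module ActiveJob
  module Core
    module ClassMethods
      def set(options = {})
        ConfiguredJob.new(self, options)
      end
    end
  end
end

# active_job/configured_job.rb
module ActiveJob
  class ConfiguredJob
    def initialize(job_class, options = {})
      @options = options
      @job_class = job_class
    end

    def perform_now(*args)
      @job_class.new(*args).perform_now
    end

    def perform_later(*args)
      @job_class.new(*args).enqueue @options
    end
  end
end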

We have 2 methods available on the returned ConfiguredJob: perform_now and perform_later. Both of them create a new job instance with the arguments passed to the method and then either call perform_now on the job instance or call enqueue, passing the options which are the arguments from the set method.

Let’s go deeper and start with the perform_now method: it’s defined inside the Execution module and basically comes down to deserializing arguments if needed (there is nothing to deserialize when calling perform_now directly) and calling our perform method, which we defined in the job class. This logic is wrapped in a run_callbacks block, which lets you define callbacks before, around and after the execution of the perform method.

active_job/execution.rb

module ActiveJob
  module Execution
    module ClassMethods
      def perform_now(*args)
        job_or_instantiate(*args).perform_now
      end
    end

    # rest of the code which was removed for brevity

    def perform_now
      deserialize_arguments_if_needed
      run_callbacks :perform do
        perform(*arguments)
      end
    rescue => exception
      rescue_with_handler(exception) || raise
    end
  end
end

These callbacks are defined inside the Callbacks module, whose only responsibility is defining callbacks for the perform and enqueue methods, which help extend the behaviour of the jobs in a pretty unobtrusive manner. For example, if we wanted to log when the job is finished, we could add the following after_perform callback:
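A sketch of such a callback (the job name and log message are just for illustration):

class MyAwesomeJob < ActiveJob::Base
  after_perform do |job|
    Rails.logger.info "finished #{job.class} with arguments: #{job.arguments.inspect}"
  end

  def perform(user)
    # the actual work goes here
  end
end

Let’s move on to enqueuing, which is handled by the enqueue method defined in the Enqueuing module; slightly simplified, it looks like this:

def enqueue(options = {})
  self.scheduled_at = options[:wait].seconds.from_now.to_f if options[:wait]
  self.scheduled_at = options[:wait_until].to_f if options[:wait_until]
  self.queue_name   = self.class.queue_name_from_part(options[:queue]) if options[:queue]
  self.priority     = options[:priority].to_i if options[:priority]
  run_callbacks :enqueue do
    if self.scheduled_at
      self.class.queue_adapter.enqueue_at self, self.scheduled_at
    else
      self.class.queue_adapter.enqueue self
    end
  end
  self
end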

We can pass several options here: the scheduled_at attribute can be configured with wait (which will schedule a job the specified number of seconds from the current time) or wait_until (which will schedule a job at the exact specified time). We can also enforce the queue used for the job execution and set the priority. In the end, the method call is delegated to the queue_adapter. This logic is again wrapped in a run_callbacks block, which lets you define callbacks before, around and after the execution of this code.

In the Enqueueing module we can also find the perform_later method, which is a part of the most basic API of ActiveJob, and it basically comes down to calling the enqueue method without any extra options.

Queue Adapters

What is this queue_adapter to which we delegate the enqueueing? Let’s take a look at the QueueAdapter module. Its responsibility is exposing a reader and writer for the queue_adapter accessor, which by default is the async adapter. Assigning an adapter is quite flexible: we can pass a string or a symbol (which will be used for the lookup of the proper adapter), an instance of the adapter itself or the class of the adapter (which is deprecated).

All the supported queue adapters are defined in the queue_adapters directory. There are quite a lot of adapters there, so let’s pick some of them.

Async Adapter

Let’s start with AsyncAdapter, which is the default one. What is really interesting about this queue adapter is that it doesn’t use any extra services but runs jobs with an in-process thread pool. Under the hood it uses Concurrent Ruby, which is a collection of modern tools for writing concurrent code - I highly recommend checking it out. We can pass executor_options to the constructor, which are then used to create a new instance of Scheduler.

Remember how we could assign the queue adapter for ActiveJob in multiple ways? That’s exactly the use case for assigning a specific instance of the queue adapter, besides just passing a string / symbol (or a class, but that way is deprecated). The Scheduler instance in fact acts like a queue backend, but the specifics of how it works are beyond the scope of this article. Nevertheless, the thing to keep in mind is that it exposes two important methods: enqueue and enqueue_at:
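A sketch of those methods (simplified; the real implementation delegates to Concurrent Ruby’s thread pool executor):

def enqueue(job, queue_name:)
  executor.post(job, &:perform)
end

def enqueue_at(job, timestamp, queue_name:)
  delay = timestamp - Time.current.to_f
  if delay > 0
    Concurrent::ScheduledTask.execute(delay, args: [job], executor: executor, &:perform)
  else
    enqueue(job, queue_name: queue_name)
  end
end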

The main difference between these two methods is a timestamp (or lack of it) used for executing the job later.

Let’s get back to the top-level AsyncAdapter class. The primary interface that all queue adapters are required to implement consists of two methods: enqueue and enqueue_at. For the Async adapter, these methods simply pass an instance of JobWrapper to the scheduler, along with queue_name and a timestamp (only for enqueue_at):
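A sketch of the adapter and its JobWrapper (simplified from the Rails source):

def enqueue(job) #:nodoc:
  @scheduler.enqueue(JobWrapper.new(job), queue_name: job.queue_name)
end

def enqueue_at(job, timestamp) #:nodoc:
  @scheduler.enqueue_at(JobWrapper.new(job), timestamp, queue_name: job.queue_name)
end

class JobWrapper #:nodoc:
  def initialize(job)
    job.provider_job_id = SecureRandom.uuid
    @job_data = job.serialize
  end

  def perform
    Base.execute @job_data
  end
end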

Serialization and deserialization

Let’s take a closer look at how it works: the execute method is defined in the Execution module and basically comes down to deserializing the job data (which was serialized in JobWrapper so that it could be enqueued) and calling perform_now. This logic is wrapped in a run_callbacks block so we can extend it by performing some action before, around or after the execution logic:
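A sketch of that class method (simplified):

# active_job/execution.rb
def execute(job_data) #:nodoc:
  ActiveJob::Callbacks.run_callbacks(:execute) do
    job = deserialize(job_data)
    job.perform_now
  end
end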

The deserialize class method is defined inside the Core module, and what it does is create a new instance of the job, deserialize the data and return the job:

active_job/core.rb

module ActiveJob
  module Core
    module ClassMethods
      # Creates a new job instance from a hash created with +serialize+
      def deserialize(job_data)
        job = job_data['job_class'].constantize.new
        job.deserialize(job_data)
        job
      end
    end
  end
end

Before explaining what happens during the deserialization, we should know what the serialized data looks like - it’s a hash containing the name of the job class, job id, queue name, priority, locale and serialized arguments:
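For example, for a job taking a single User argument, the serialized hash could look roughly like this (the values are illustrative):

{
  "job_class"  => "MyAwesomeJob",
  "job_id"     => "ba2741a4-8e26-4a29-b303-73e546c2c1b0",
  "queue_name" => "default",
  "priority"   => nil,
  "arguments"  => ["gid://app-name/User/3"],
  "locale"     => "en"
}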

This format can easily be used for enqueuing jobs in different queues.

Just before the execution of the job, the data needs to be deserialized. Like the serialize method, deserialize is defined in the Core module, and it assigns the job id, queue name, priority, locale and serialized arguments to the job using its accessors. But the arguments are not deserialized just yet, so how does the execution with perform_now work?

Remember how I mentioned before that there is nothing to deserialize when using perform_now directly? In this case it’s a bit different, as we operate on serialized arguments. Deserialization happens just before executing the perform method, in deserialize_arguments_if_needed.

Again, the deserialization is delegated to the Arguments module, whose primary responsibility is turning global ids into real models, so gid://app-name/User/3 would in fact become a User record with an id equal to 3.

Exploring more queue adapters

Inline Adapter

Let’s explore some more adapters. Most likely you were using InlineAdapter in integration tests for testing the side effects of executing some job. Its logic is very limited: since it’s meant for inline execution, it doesn’t support enqueueing jobs for future execution, and the enqueue method merely calls the execute method with serialized arguments:

activejob/queue_adapters/inline_adapter.rb

class InlineAdapter
  def enqueue(job) #:nodoc:
    Base.execute(job.serialize)
  end

  def enqueue_at(*) #:nodoc:
    raise NotImplementedError, "Use a queueing backend to enqueue jobs in the future. Read more at http://guides.rubyonrails.org/active_job_basics.html"
  end
end

Sidekiq Adapter

Let’s check the queue adapter for one of the most commonly used frameworks for background processing - Sidekiq. Sidekiq requires a class that implements a perform instance method executing the logic of the job and includes the Sidekiq::Worker module so that it can be enqueued in its queue. Just like AsyncAdapter, SidekiqAdapter uses an internal JobWrapper class, which includes Sidekiq::Worker and implements a perform method taking job_data as an argument; its logic is limited to delegating the execution to the ActiveJob::Base.execute method:

activejob/queue_adapters/sidekiq_adapter.rb

class SidekiqAdapter
  def enqueue(job) #:nodoc:
    # Sidekiq::Client does not support symbols as keys
    job.provider_job_id = Sidekiq::Client.push \
      'class'   => JobWrapper,
      'wrapped' => job.class.to_s,
      'queue'   => job.queue_name,
      'args'    => [job.serialize]
  end

  def enqueue_at(job, timestamp) #:nodoc:
    job.provider_job_id = Sidekiq::Client.push \
      'class'   => JobWrapper,
      'wrapped' => job.class.to_s,
      'queue'   => job.queue_name,
      'args'    => [job.serialize],
      'at'      => timestamp
  end

  class JobWrapper #:nodoc:
    include Sidekiq::Worker

    def perform(job_data)
      Base.execute job_data
    end
  end
end

Again, like every other adapter, SidekiqAdapter implements the enqueue and enqueue_at methods, and both of them push jobs to Sidekiq’s queue, passing some meta info that is later used for identifying the proper job class and executing it in the specific queue, along with the serialized arguments of course. As an extra argument, enqueue_at passes a timestamp for executing the job at a specific time. Pushing a job to the Sidekiq queue returns an internal job id, which is then assigned to the provider_job_id attribute.

DelayedJob Adapter

Let’s take a look at the adapter for arguably the most common choice backed by the application’s database - DelayedJob. The pattern is exactly the same as for the Sidekiq adapter: we have enqueue and enqueue_at methods, and both of them push the job to the queue with extra info about the queue name, priority and, for the enqueue_at method, the time to run the job at. Just like SidekiqAdapter, it wraps the serialized job with an internal JobWrapper instance which delegates the execution of the logic to ActiveJob::Base.execute. At the end, the internal job id from DelayedJob’s queue is assigned to the provider_job_id attribute:
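A sketch of the adapter (simplified from the Rails source):

class DelayedJobAdapter
  def enqueue(job) #:nodoc:
    delayed_job = Delayed::Job.enqueue(JobWrapper.new(job.serialize), queue: job.queue_name, priority: job.priority)
    job.provider_job_id = delayed_job.id
    delayed_job
  end

  def enqueue_at(job, timestamp) #:nodoc:
    delayed_job = Delayed::Job.enqueue(JobWrapper.new(job.serialize), queue: job.queue_name, priority: job.priority, run_at: Time.at(timestamp))
    job.provider_job_id = delayed_job.id
    delayed_job
  end

  class JobWrapper #:nodoc:
    attr_accessor :job_data

    def initialize(job_data)
      @job_data = job_data
    end

    def perform
      Base.execute(job_data)
    end
  end
end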

TestAdapter

Have you ever needed to test which jobs were enqueued or performed when executing some specs? There’s a good chance you were using the test helpers provided by ActiveJob or rspec-activejob for that. All these assertions are quite easy to handle thanks to TestAdapter, which exposes some extra API for keeping track of enqueued and performed jobs via the enqueued_jobs and performed_jobs attributes, populated when calling the enqueue and enqueue_at methods. You can also configure whether the jobs should actually be executed by changing the perform_enqueued_jobs and perform_enqueued_at_jobs flags, and you can whitelist which jobs can be enqueued with the filter attribute.

class TestAdapter
  attr_accessor(:perform_enqueued_jobs, :perform_enqueued_at_jobs, :filter)
  attr_writer(:enqueued_jobs, :performed_jobs)

  # Provides a store of all the enqueued jobs with the TestAdapter so you can check them.
  def enqueued_jobs
    @enqueued_jobs ||= []
  end

  # Provides a store of all the performed jobs with the TestAdapter so you can check them.
  def performed_jobs
    @performed_jobs ||= []
  end

  def enqueue(job) #:nodoc:
    return if filtered?(job)

    job_data = job_to_hash(job)
    enqueue_or_perform(perform_enqueued_jobs, job, job_data)
  end

  def enqueue_at(job, timestamp) #:nodoc:
    return if filtered?(job)

    job_data = job_to_hash(job, at: timestamp)
    enqueue_or_perform(perform_enqueued_at_jobs, job, job_data)
  end

  private

  def job_to_hash(job, extras = {})
    { job: job.class, args: job.serialize.fetch('arguments'), queue: job.queue_name }.merge!(extras)
  end

  def enqueue_or_perform(perform, job, job_data)
    if perform
      performed_jobs << job_data
      Base.execute job.serialize
    else
      enqueued_jobs << job_data
    end
  end

  def filtered?(job)
    filter && !Array(filter).include?(job.class)
  end
end

Wrapping up

We’ve learned quite a lot about how ActiveJob works under the hood - what kind of public API is available and how to extend it with custom queue adapters. Even though understanding the internals of Rails may require some effort and time, it’s worth going deeper and exploring the architecture of the framework we use for everyday development. Here are some key takeaways:

You can provide an exact instance of the queue adapter for ActiveJob, not only a string or symbol, which lets you pass some extra configuration options

The adapter pattern is a great choice when we have several services with different interfaces but want one unified interface for using all of them

Most of ActiveJob’s logic is divided into modules (which seems to be a common pattern in other layers of Rails), but the benefits of doing so are unclear: why is Execution a separate module from Core? What kind of benefit does splitting queue-related logic into QueuePriority, QueueName and QueueAdapter give? I don’t really see it as a way to decouple code, as e.g. the Enqueuing module depends on logic from QueueName, yet it’s not required explicitly - it just depends on the existence of the queue_adapter attribute. It would be clearer if the Base or Core module acted like a facade and delegated responsibilities to some other classes. If anyone knows any reason behind this kind of design, please write it in a comment, I’m really curious about it.

To support another background job execution framework, you just need to add a queue adapter class implementing enqueue and enqueue_at methods which, under the hood, push the job to the queue and delegate the execution of the logic to the ActiveJob::Base.execute method, passing the serialized job as an argument.

Rails internals are not that scary :)

If there’s any particular part of Rails that seems “magical” and you would like to see it decoded, let me know in the comments, I want to make sure I cover the needs of my readers.

]]>2016-08-30T12:00:00+02:00http://karolgalanciak.com/blog/2016/08/30/little-known-but-useful-rails-features-activerecord-querymethods-dot-extendingEvery now and then I discover some features in Rails that are (arguably) not that commonly used, but there are some use cases when they turn out to be super useful and the best tool for the job. One of them would definitely be a nice addition to ActiveRecord::QueryMethods - the extending method. Let’s see how it could be used in Rails apps.

ActiveRecord::QueryMethods.extending - a great tool for managing common scopes

Imagine you are developing an API in your Rails application from which you will be fetching data periodically. To avoid getting all the records every time (which may end up with tons of unnecessary requests) and to return only the records changed since they were last fetched, you may want to implement a scope that returns records updated from a given date; it may look like this:
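A sketch of such a scope defined for all models in a common base class:

class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true

  scope :updated_from, ->(date) { where("updated_at >= ?", date) }
end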

The problem with this solution is that the updated_from scope would be available for all the models, even the ones that won’t really need it. Another way would be extracting updated_from to a HasUpdatedFrom concern:
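A sketch of that concern:

module HasUpdatedFrom
  extend ActiveSupport::Concern

  included do
    scope :updated_from, ->(date) { where("updated_at >= ?", date) }
  end
end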

and including it in all the models that will be using that scope, but it’s a bit cumbersome. Fortunately, there’s a perfect solution to this problem in Rails: ActiveRecord::QueryMethods.extending, which lets you extend a collection with additional methods. In this case, we could simply define an updated_from method in the HasUpdatedFrom module:
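A sketch of the module and its usage with extending (the params key is illustrative):

module HasUpdatedFrom
  def updated_from(date)
    where("updated_at >= ?", date)
  end
end

# usage, e.g. in an API controller:
User.all.extending(HasUpdatedFrom).updated_from(params[:last_fetched_at])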

and that’s it! You won’t need to remember about including the proper concern in every model used in such an API, or define scopes in ApplicationRecord and inherit them in models that will never use them - just use ActiveRecord::QueryMethods.extending and extend your collection with extra methods only when you need them.

Wrapping up

ActiveRecord::QueryMethods.extending is not a commonly used Rails feature, but it’s definitely a useful one for managing common scopes in your models.

]]>2016-07-31T21:15:00+02:00http://karolgalanciak.com/blog/2016/07/31/decoding-rails-magic-how-does-calling-class-methods-on-mailers-workHave you ever wondered how it is possible that calling class methods on mailers works in Rails, even though you only define instance methods in those classes? It seems to be quite a common question, especially when you see mailers in action for the first time. Apparently, there is some Ruby “magic” involved here, so let’s try to decode it and check what happens under the hood.

Anatomy of ActionMailer mailers

If we wanted to send a welcome email to some user, we would write the following code:


WelcomeMailer.welcome(user).deliver_now

It’s quite interesting that it works just like that, even though we have never defined any class method in WelcomeMailer. Most likely it’s handled with method_missing magic in ActionMailer::Base. To verify that, let’s dive into the Rails source code. In the ActionMailer::Base class we indeed have method_missing defined for class methods:
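Slightly simplified, it looks like this:

# inside ActionMailer::Base (simplified)
def self.method_missing(method_name, *args)
  if action_methods.include?(method_name.to_s)
    MessageDelivery.new(self, method_name, *args)
  else
    super
  end
end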

Basically, a call to any action method defined in the mailer class will be intercepted by method_missing and return an instance of MessageDelivery; otherwise the default implementation is run. And where do action methods come from? ActionMailer::Base inherits from AbstractController::Base, so it works exactly the same as for controllers - it returns the set of public instance methods of a given class.

We now have a better idea of what actually happens when we call class methods on mailers, but it still doesn’t answer the questions: how is the mailer instantiated and how is the instance method called on it? To investigate further, we need to check the MessageDelivery class. We are particularly interested in the deliver_now method (it could be any other delivery method, but let’s stick to this one) with the following body:
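A sketch based on the Rails 5.0 source:

# within ActionMailer::MessageDelivery (simplified)
def deliver_now
  processed_mailer.handle_exceptions do
    message.deliver
  end
end

private

# lazily processes the mailer: instantiates it and calls the action method
def processed_mailer
  @processed_mailer ||= @mailer_class.new.tap do |mailer|
    mailer.process @action, *@args
  end
end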

This method creates an instance of the mailer, calls the process method with the @action argument (which is the name of the instance method) and with @args (the arguments passed to the class method), and in the end returns the created instance of the mailer. Inside handle_exceptions the deliver method is called. And where does this one come from? MessageDelivery inherits from the Delegator class and delegates all calls to methods not implemented by MessageDelivery to processed_mailer.message, which is an attribute defined on the mailer instance itself.

And that’s it! It took a bit of switching between different methods and classes to understand the entire flow and what happens under the hood, but it’s clear that such an interface hiding all the complexity is quite convenient.

Wrapping up

Some parts of Rails may contain a lot of “magic” which makes understanding the details more difficult. However, thanks to that magic, the usage of these parts is greatly simplified by a nice abstraction and an easy to use interface, which is exactly the case with Rails mailers.

]]>2016-07-10T23:50:00+02:00http://karolgalanciak.com/blog/2016/07/10/scaling-up-rails-applications-with-postgresql-table-partitioning-part-3After publishing the recent blog posts about table partitioning - the SQL basics part and how to use it in a Rails application - I was asked quite a few times what the real performance gain is when using table partitioning. This is a great question, so let’s answer it by performing some benchmarks.

Setting up data for benchmarking

As table partitioning is intended to be used in a Rails application, it makes the most sense to perform the benchmark with ActiveRecord’s overhead included - we want to have a real-world comparison.

In the two previous parts we were discussing orders example, so let’s do the same here. We can start with generating model:

rails generate model Order

For benchmarking the use case without table partitioning we don’t really need to change much, just add an index on the created_at column:
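A sketch of that migration step (the migration name is hypothetical):

# db/migrate/..._add_index_on_orders_created_at.rb
add_index :orders, :created_at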

For sample data, let’s start with creating 1 million orders for every year from 2016 to 2020. This should be enough to make the tables moderately big for a real-world example and to perform some meaningful benchmarking. Here’s the code to create these records with a random date from a given year:
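The original snippet didn’t survive here; a sketch of how such seed data could be generated:

(2016..2020).each do |year|
  beginning_of_year = Time.zone.local(year, 1, 1)
  1_000_000.times do
    # a random timestamp within the given year
    Order.create!(created_at: beginning_of_year + rand(365 * 24 * 60 * 60))
  end
end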

To get an even better idea about the performance difference, we could also test the queries with different amounts of data. After each benchmark we could create an additional 250,000 records for each year and rerun the benchmarks. This could be repeated until we reach 2 million records in each table (10 million in total), so that way we would have data for 5 different order counts.

Benchmark methodology

To have a meaningful benchmark that is applicable to a real-world app, we need to test the queries that are likely to happen. For that purpose we can try selecting all orders, orders from a particular year, from several years, from the past few months, and we could also try finding orders by random ids. We should also limit the number of records we return, well, unless we want to kill the local machine ;). Counting the number of orders for different date ranges would also be a nice addition. For partitioned tables we could also throw in some additional benchmarks comparing the performance between selecting orders from a given partitioned child table directly and from the master table, letting PostgreSQL figure out how to handle the query (i.e. using constraint exclusion for filtering tables).

In this case we don’t really care about the exact time of each query, but rather the ratio of query times (or iterations per second) between the partitioned and non-partitioned tables. Counting iterations per second for every query with benchmark-ips will be perfect for that. To calculate this ratio (let’s call it the Partitioned To Not Partitioned Ratio) we just need to divide the result for the partitioned table by the result for the non-partitioned table.

The amount of data is quite an important factor for this benchmark, especially in relation to some PostgreSQL config settings. The size of the orders table for the different amounts is the following:

From our benchmark’s perspective, the shared_buffers and constraint_exclusion parameters are the crucial ones - shared_buffers determines how much memory can be used for caching tables and constraint_exclusion will prevent scanning all child tables if the query conditions make it clear it is not required.

That way we’ve obtained iterations per second for the different queries. Let’s now calculate the ratio of query time for the partitioned and non-partitioned tables. To get a better idea about the relation between the ratios for different order counts, I put the results on graphs (due to the proportions you may want to see them in better quality by clicking the link below each graph).

For selecting all records and counting them, the obvious conclusion is that for partitioned tables the queries are slower: slightly slower for just selecting them and noticeably slower for counting them. For selecting all orders, the ratio of partitioned to non-partitioned query times most likely decreases as the tables’ size grows; for counting it is not clear: the data doesn’t show any regular correlation. We could expect the ratio to also decrease for larger amounts of data, but we can’t really tell based on this benchmark, which even suggests it could be a constant value. Nevertheless, table partitioning isn’t the best idea if the queries are primarily run across all the tables.

The results for selecting orders (queries for orders from the years 2016, 2018 and 2020) only from a specific date range matching the constraints of the partitioned child tables look quite interesting: the more records we have, the better the ratio (in favour of table partitioning) we get. Table partitioning doesn’t always yield better results when the child table is not specified (see: Partitioned (master table) To Not Partitioned Ratio), but there’s a certain table size at which we get a ratio above 1, which means queries against partitioned tables (even without specifying a child table) are faster. When selecting from a partitioned child table (see: Partitioned (child table) To Not Partitioned Ratio), the queries are faster for partitioned tables regardless of the size, which was expected. Depending on the size of the table, the difference can be quite significant, up to 28%. The data is not clear enough to be certain about the correlation with the number of orders / table size (there is substantial irregularity for orders from the year 2020), but probably the bigger the tables are, the bigger the difference in query time between partitioned and non-partitioned tables, similarly to the case when the child table is not specified. The difference between explicitly running queries against a specific child table and running them against the master table while relying on constraint exclusion is quite surprising: I was expecting only a slight difference; however, specifying a child table can make the queries up to 35% faster. There is a possibility that this difference decreases slowly as the tables grow; however, we would need more benchmarks to prove or disprove this hypothesis, as there is another possibility of a constant ratio.

The general correlation for counting the records that fit specific partitioned child tables is the same as for selecting the orders: the more records in the tables, the better the performance table partitioning yields. When the child table is not specified, counting records can be faster for non-partitioned tables until we reach a certain size, at which table partitioning seems to become the better choice performance-wise. For counting orders from all the years (2016, 2018, 2020) with 5 million orders, the ratios are significantly different compared to the values for higher amounts of orders, which can’t be easily explained. It’s quite interesting that it happened for the queries against all the tables used in the benchmark, which might be worth investigating further; however, I would treat these results as irrelevant in this case and not consider them at all. When the partitioned table is specified, the results are always better in favour of table partitioning - we can expect queries to be more than 2 times faster, even up to almost 3 times faster. Similarly to selecting the orders, the ratio between running queries against a child table and against the master table is either constant or slightly decreases as the number of orders grows.

For selecting records from multiple child tables, the Partitioned To Not Partitioned Ratio is lower than 1, which means that table partitioning would be an inferior choice for such queries, regardless of the size. However, the difference is not that significant. On the other hand, for counting the records it looks like table partitioning yields better performance, which is especially clear for the 2018-2020 range, but we can’t tell what the correlation with the number of orders is based on the obtained results.

The performance of selecting records from the last N months (here 3 and 6 respectively) is quite similar for both partitioned and non-partitioned strategies (a ratio close to 1), which doesn’t change as the number of orders grows. However, counting records is significantly slower, though the ratio most likely remains constant as the table gets bigger.

Initially, there seems to be no difference in query time between the partitioned and non-partitioned tables for the queries finding orders by random id, but as the number of orders grows, the performance keeps getting considerably worse, which is expected, as constraint exclusion can’t be applied in such a case.

Wrapping up

Even though table partitioning requires some extra overhead and may be tricky to get started with, it is clear that the performance benefits may outweigh the costs when applied to the right queries. However, we need to be aware that it is not the perfect choice for all queries, and in some cases it can even deteriorate performance.

]]>2016-06-12T22:45:00+02:00http://karolgalanciak.com/blog/2016/06/12/scaling-up-rails-applications-with-postgresql-table-partitioning-part-2In the previous blog post we learned some basics about table partitioning: how it works and what kind of problems it solves. So far we’ve been discussing mostly basic concepts with raw SQL examples. But the essential question in our case is: how do we make it work inside a Rails application? Let’s see what we can do about it.

Partitioning tables with partitioned gem

It turns out there’s no built-in support for table partitioning in ActiveRecord. Fortunately, there’s a gem that makes it pretty straightforward to apply this concept to your models: partitioned. Not only does it have several strategies for partitioning (e.g. by foreign key or yearly / weekly / monthly - and you can easily create custom ones by subclassing a base class and defining the proper methods), making it easy to perform CRUD operations when dealing with multiple tables, but it also provides some methods to create and destroy the infrastructure (a separate schema for partitioned tables) and some helper methods for generating tables based on the partitioning criteria, even with indexes and constraints! Let’s get back to the orders example from the previous blog post. Firstly, add the partitioned gem to the Gemfile. Unfortunately, there were some compatibility issues with Rails 4.2 at the time I was experimenting with it, so it might be necessary to use some forks; a combination of forks exists that works with Rails 4.2.6. Next, we define an abstract base class for the yearly partitioning strategy, as sketched below.
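The Gemfile snippet with the exact forks didn’t survive here. The abstract model class could look roughly like this (a sketch based on the partitioned gem’s API; the index inside the partitioned block is illustrative):

# app/models/partitioned_by_created_at_yearly.rb
class PartitionedByCreatedAtYearly < Partitioned::ByYearlyTimeField
  self.abstract_class = true

  def self.partition_time_field
    :created_at
  end

  partitioned do |partition|
    partition.index :id, unique: true
  end
end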

This class inherits from Partitioned::ByYearlyTimeField to handle exactly the strategy we need for orders. We set this class to be an abstract one to make it clear it’s not related to any table in the database. We also need to provide partition_time_field - in our case it’s the created_at column. In the partitioned block we can define some extra constraints and indexes that will be used when creating the child tables. The next thing would be to make it the parent class for the Order model:

app/models/order.rb

class Order < PartitionedByCreatedAtYearly
end

Creating migration for partitioned tables

Let’s get back to our migration. What we want to do is create the orders table, a schema for the partitioned child tables of orders and the tables themselves for the next several years. We could do it the following way:
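A sketch of such a migration (the column list is illustrative; create_infrastructure, partition_generate_range and create_new_partition_tables come from the partitioned gem):

class CreateOrders < ActiveRecord::Migration
  def up
    create_table :orders do |t|
      t.integer :client_id, null: false
      t.decimal :amount
      t.datetime :created_at, null: false
    end

    Order.create_infrastructure
    dates = Order.partition_generate_range(Date.parse("2016-01-01"), Date.parse("2021-12-31"))
    Order.create_new_partition_tables(dates)
  end

  def down
    Order.delete_infrastructure
    drop_table :orders
  end
end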

The gem also provides an excellent helper method, partition_generate_range, to help with setting up new partition tables. That way we will generate tables handling orders from 2016 to 2021. Now you can simply run rake db:migrate.

CRUD operations on partitioned tables

So far we’ve managed to set up the database for handling table partitioning. But the essential question is: can our app handle the management of these tables? Will it insert / update / delete records to and from the proper tables? Let’s play with some operations to find out:
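A sketch of such a console session (output omitted; from_partition comes from the partitioned gem):

order = Order.create!(client_id: 100, created_at: Time.zone.parse("2016-06-01 12:00"))
# the row ends up in the child table for 2016

order.update!(amount: 100.0)
order.destroy

# scope queries to the specific child table:
Order.from_partition(Date.parse("2016-01-01")).count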

Awesome! It looks like all the CRUD operations work without any problems! We even have the extremely helpful query method from_partition to scope queries to a specific child table.

Wrapping up

Table partitioning might be a great solution to database performance issues. Even though it’s not supported out of the box by Rails, you can easily integrate it with your app thanks to the partitioned gem.

]]>2016-06-05T22:00:00+02:00http://karolgalanciak.com/blog/2016/06/05/scaling-up-rails-applications-with-postgresql-table-partitioning-part-1You’ve probably heard many times that the database is the bottleneck of many web applications. This isn’t necessarily true. Often some heavy queries can be substantially optimized, making them really efficient and fast. As time passes, however, the data can grow remarkably in size, especially over several years, which can indeed make the database a bottleneck - the tables are huge and no longer fit into memory, the indexes are even bigger, making queries much slower, and you can’t really optimize any query further. In many cases deleting old records that are no longer used is not an option - they can still have some value, e.g. for data analysis or statistical purposes. Fortunately, it’s not a dead end. You can make your database performance great again with a new secret weapon: table partitioning.

Table partitioning and inheritance 101

To put it simply, table partitioning is about splitting one big table into several smaller units. In PostgreSQL it can be done by creating a master table serving as a template for the other child tables that will inherit from it. The master table contains no data, and you shouldn’t add any indexes or unique constraints to it. However, if you need some CHECK CONSTRAINTS in all the child tables, it is a good idea to add them to the master table, as they will be inherited. Due to the inheritance mechanism, there is no point in adding any columns to the child tables either. Creating tables inheriting from other tables is pretty straightforward - you just need to use the INHERITS clause:
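For example (the orders columns are illustrative):

CREATE TABLE orders (
  id serial NOT NULL,
  client_id integer NOT NULL,
  amount numeric,
  created_at timestamp without time zone NOT NULL
);

CREATE TABLE other_orders () INHERITS (orders);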

and that’s it - all the columns and extra constraints will be defined on other_orders thanks to the inheritance from the orders table.

Defining constraints for partitioning criteria

As we are going to split one big table into smaller ones, we need some criterion to decide which table to put the data in. We can do it either by range (e.g. created_at between 01.01.2016 and 31.12.2016) or by value (client_id equal to 100). To ensure we have only valid data which always satisfies our partitioning criterion, we should add proper CHECK CONSTRAINTS as guard statements. To make sure the orders in a particular table were e.g. created in 2016, we could add the following constraint:

CHECK (created_at >= DATE '2016-01-01' AND created_at <= DATE '2016-12-31')

If we were to create tables for orders for the upcoming few years (assuming that we want each of them to cover an entire year), we could do the following. Watch out for ranges with inclusive bounds overlapping on the same value though, like in the client_id example below.
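A sketch of the yearly tables, plus an example of overlapping constraints to avoid:

CREATE TABLE orders_2016 (
  CHECK (created_at >= DATE '2016-01-01' AND created_at <= DATE '2016-12-31')
) INHERITS (orders);

CREATE TABLE orders_2017 (
  CHECK (created_at >= DATE '2017-01-01' AND created_at <= DATE '2017-12-31')
) INHERITS (orders);

-- and so on for the remaining years; but beware of constraints like these:
CREATE TABLE orders_clients_batch_1 (
  CHECK (client_id >= 1 AND client_id <= 1000)
) INHERITS (orders);

CREATE TABLE orders_clients_batch_2 (
  CHECK (client_id >= 1000 AND client_id <= 2000)
) INHERITS (orders);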

In this case orders with client_id equal to 1000 could be inserted into either of the last two tables. Make sure the constraints are not inclusive on the same value when using ranges, to avoid such problems.

Performance optimization

To provide decent performance it is also important to enable constraint_exclusion in postgresql.conf. You can set it either to on or partition:

constraint_exclusion = on  # on, off, or partition

or

constraint_exclusion = partition  # on, off, or partition

That way we can avoid scanning all the child tables when the CHECK CONSTRAINTS exclude them based on the query conditions. Compare the query plans between queries with constraint_exclusion disabled and enabled:
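The original EXPLAIN output didn’t survive here; you can reproduce the comparison like this (with constraint_exclusion off, the plan includes scans of every child table; with partition, only orders and orders_2018 are scanned):

SET constraint_exclusion = off;
EXPLAIN SELECT * FROM orders
WHERE created_at >= DATE '2018-01-01' AND created_at <= DATE '2018-12-31';

SET constraint_exclusion = partition;
EXPLAIN SELECT * FROM orders
WHERE created_at >= DATE '2018-01-01' AND created_at <= DATE '2018-12-31';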

We wanted to select only the orders created in 2018, so there is no need to scan the other child tables that don’t contain such data.

The difference between the on and partition settings is that the former checks constraints for all tables and the latter only for child tables inheriting from a parent table (and also when using UNION ALL subqueries).

Create new records easily with triggers

Having multiple tables certainly makes creating new records more difficult to manage. Always remembering that they should be inserted into a specific table can be cumbersome. Fortunately, there’s an excellent solution to this problem: PostgreSQL triggers! If you are not familiar with them, you can read my previous blog post about database triggers.

We can automate the insertion process by checking the value of created_at and deciding which table the row should be put in. Here’s an example of how we could approach it:

CREATE OR REPLACE FUNCTION orders_create_function()
RETURNS TRIGGER AS $$
BEGIN
  IF (NEW.created_at >= DATE '2016-01-01' AND NEW.created_at <= DATE '2016-12-31') THEN
    INSERT INTO orders_2016 VALUES (NEW.*);
  ELSIF (NEW.created_at >= DATE '2017-01-01' AND NEW.created_at <= DATE '2017-12-31') THEN
    INSERT INTO orders_2017 VALUES (NEW.*);
  ELSIF (NEW.created_at >= DATE '2018-01-01' AND NEW.created_at <= DATE '2018-12-31') THEN
    INSERT INTO orders_2018 VALUES (NEW.*);
  ELSE
    RAISE EXCEPTION 'Date out of range, probably child table is missing';
  END IF;
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_create_trigger
BEFORE INSERT ON orders
FOR EACH ROW EXECUTE PROCEDURE orders_create_function();

There’s a potential gotcha though: as we are dealing with multiple tables, the id of the orders might not necessarily be unique across all of them. When creating new records we can ensure that the next id will be based on a “global” sequence for the partitioned tables, but it still leaves a possibility of duplicated ids in some tables, e.g. via an accidental update of the id. Probably the best way to make sure there are no duplicates across all partitioned tables would be using uuids, which will most likely be unique.

Wrapping up

Table partitioning might be a great solution for improving the performance of your application when the amount of data in the tables is huge (especially when they no longer fit into memory). In the first part, we learned some basics of table partitioning from a raw SQL perspective. In the next one we will apply this concept to a real-world Rails application.

]]>2016-05-06T19:45:00+02:00http://karolgalanciak.com/blog/2016/05/06/when-validation-is-not-enough-postgresql-triggers-for-data-integrityIs validation in your models or form objects enough to ensure the integrity of the data? Well, it seems you can’t really persist a record when the data is not valid, unless you intentionally bypass validation using the validate: false option when calling save or using update_column. What about uniqueness validation?
A classic example would be a unique email per user. To make sure the email is truly unique we could add a unique index in the database - not only would it prevent saving non-unique users when bypassing validation, but it would also raise an extra error when 2 concurrent requests attempt to save a user with the same email. However, some validations are more complex than ensuring a value is unique, and an index won’t really help much. Fortunately, PostgreSQL is powerful enough to provide a perfect solution to this problem. Time to meet your new friend: PostgreSQL triggers.

Anatomy of PostgreSQL triggers and procedures

A PostgreSQL trigger is like a callback: it’s a function that is called on a specific event: before or after an insert, update, delete or truncate in the case of tables and views, and for views you can also run a function instead of those events. Triggers can be run either for each row (tables only) or for each statement (both tables and views). The difference between them is quite simple: for each row is run for every modified row, and for each statement is run only once per statement. The important thing to keep in mind regarding for each row is that you have a reference to the row being modified, which will be essential in the upcoming example.

By running \h CREATE TRIGGER; from psql we can get a generalized syntax for creating triggers:
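The output looks roughly like this (abbreviated), followed by a dummy function we can dissect (some_column is a hypothetical column name):

CREATE [ CONSTRAINT ] TRIGGER name { BEFORE | AFTER | INSTEAD OF } { event [ OR ... ] }
    ON table_name
    [ FOR [ EACH ] { ROW | STATEMENT } ]
    [ WHEN ( condition ) ]
    EXECUTE PROCEDURE function_name ( arguments )

CREATE OR REPLACE FUNCTION some_trigger_function()
RETURNS TRIGGER AS $$
DECLARE
  some_integer_variable int;
BEGIN
  some_integer_variable := 1;
  NEW.some_column := some_integer_variable;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;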

We started with defining a dummy function taking no arguments and returning the trigger type. Next, we have a DECLARE block where we declare temporary variables - in our case it’s some_integer_variable of type int. Within the BEGIN / END block we define the actual function body: we assign a dummy value to the some_integer_variable variable using the := operator and then we do some assignment using the implicit NEW variable, which is basically the row referenced by the given statement. This variable is available only when running a trigger for each row; otherwise it will return NULL. Any trigger has to return either a row or NULL - in this example we return the NEW row. At the end we declare that the function is written in the plpgsql language.

Let’s take a look at some real world code to see triggers in action.

Using triggers for data integrity

A good example where we could use a trigger for more complex validation would be a calendar event: we need to ensure that no other event exists between some begins_at time and finishes_at time. We should also scope it to a given calendar and exclude the id of the event being updated - when creating new events it wouldn’t matter, but without excluding the id we wouldn’t be able to update any event. So what we actually want is a trigger that will be run before insert or update on the events table, for each row.

Let’s start with generating a Calendar model and an Event model with a reference to a calendar and begins_at and finishes_at attributes, and then defining the validation function:
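A sketch of the setup (the exact error message wording is illustrative):

rails generate model Calendar name:string
rails generate model Event calendar:references begins_at:datetime finishes_at:datetime

And the function itself, created in a migration:

CREATE OR REPLACE FUNCTION validate_event_availability()
RETURNS TRIGGER AS $$
DECLARE
  events_count int;
BEGIN
  events_count := (
    SELECT COUNT(*) FROM events
    WHERE events.begins_at < NEW.finishes_at
      AND events.finishes_at > NEW.begins_at
      AND events.calendar_id = NEW.calendar_id
      AND events.id != NEW.id
  );
  IF (events_count > 0) THEN
    RAISE EXCEPTION 'cannot create an event between % and %: another event already exists in this time period', NEW.begins_at, NEW.finishes_at;
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;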

Our validate_event_availability function performs a query counting all events within the given time period for the specified calendar, excluding its own id (so that the row being updated is not considered here, which would otherwise prevent updating any event). If any other event is found, an exception is raised with an error message - the % characters are used for interpolation of the begins_at and finishes_at attributes. If no other event is found, we simply return the row.

We want to define a trigger running this function before creating any new event or updating existing ones, so we need to run it BEFORE INSERT OR UPDATE, FOR EACH ROW:
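The trigger definition could look like this (the trigger name is illustrative):

CREATE TRIGGER validate_event_availability_trigger
BEFORE INSERT OR UPDATE ON events
FOR EACH ROW EXECUTE PROCEDURE validate_event_availability();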

It might be a good idea to also switch to the :sql schema format - the standard :ruby format can’t handle triggers at this point. Add this line in config/application.rb:

# config/application.rb
config.active_record.schema_format = :sql

Now we can run migrations:

rake db:migrate

After changing the schema format and running the migrations, a new structure.sql file should be created. It’s not going to look as nice as schema.rb, but at least it contains all the details. Let’s try creating some events from the Rails console:
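A sketch of such a session (output abbreviated):

calendar = Calendar.create!(name: "my calendar")
Event.create!(calendar: calendar, begins_at: "2016-05-10 12:00", finishes_at: "2016-05-10 14:00")
# => #<Event id: 1, ...>

Event.create!(calendar: calendar, begins_at: "2016-05-10 13:00", finishes_at: "2016-05-10 15:00")
# => ActiveRecord::StatementInvalid: PG::RaiseException: ERROR: cannot create an event
#    between 2016-05-10 13:00:00 and 2016-05-10 15:00:00: another event already exists in this time period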

Awesome (haha, it doesn’t happen that often that we’re happy when an error occurs ;)) - that’s exactly what we wanted to achieve: the trigger keeps our data safe, making sure that we won’t have any duplicated events or events covering the same time period. The last thing we should do is mimic such validation in the form object or model for better user experience, but that’s beyond the scope of this article. It’s going to duplicate some logic between the application code and the database, but in this case there’s no way to DRY it up.

Wrapping up

PostgreSQL triggers and procedures are not something you will often want to use in Rails applications, but sometimes there’s no other solution, especially when you have more complex rules for data integrity. In such cases, triggers and procedures are the right tool for the job.

]]>2016-04-24T20:00:00+02:00http://karolgalanciak.com/blog/2016/04/24/security-on-rails-hacking-sessions-with-insecure-secret-key-baseI was recently asked what the secret key base is used for in Rails applications and why an insecure value of it (or even worse - a public one!) creates a security issue. That was a really good question. I remember how it was a serious threat years ago, especially before introducing secrets.yml in Rails 4.1 - at that time a secret_token initializer was generated by default and the secret key was stored directly there. The result was that in many open source projects the secret key was publicly available, creating a great security risk. Let’s take a look at how an exposed secret key base could be exploited.

Anatomy of possible attack

Imagine that the current_user lookup in some application is performed like this:
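A sketch of a typical implementation:

def current_user
  @current_user ||= User.find_by(id: session[:user_id])
end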

Using user_id for storing the id of the logged in user seems like an obvious choice. If I knew the secret key base, I could try encrypting the following hash:

malicious_hash = { user_id: 10, session_id: "123abc" }

and send a carefully crafted cookie pretending that I’m logged in as the user with id 10! The question is: how would I do it? Ok, maybe I have the secret key base, but what exactly should be done with it? What are the necessary steps to generate a cookie that will later be successfully decrypted by the Rails application?

I was browsing through the Rails source code and here’s what I’ve come up with:

def generate_encrypted_cookie(data_hash, secret_key_base)
  # inspired by https://github.com/rails/rails/blob/v4.2.6/actionpack/test/dispatch/cookies_test.rb#L595
  # and https://github.com/rails/rails/blob/v4.2.6/actionpack/lib/action_dispatch/middleware/cookies.rb#L527
  salt = "encrypted cookie" # default value from Rails.application.config.action_dispatch.encrypted_cookie_salt
  signed_salt = "signed encrypted cookie" # default value from Rails.application.config.action_dispatch.encrypted_signed_cookie_salt
  # based on https://github.com/rails/rails/blob/v4.2.6/railties/lib/rails/application.rb#L179
  key_generator = ActiveSupport::KeyGenerator.new(secret_key_base, iterations: 1000)
  secret = key_generator.generate_key(salt)
  sign_secret = key_generator.generate_key(signed_salt)
  encryptor = ActiveSupport::MessageEncryptor.new(secret, sign_secret, serializer: JSON)
  encryptor.encrypt_and_sign(data_hash)
end

This is basically how Rails handles encrypting the session data. The last thing we are missing is the session store key, which is not really a secret value - you can get it simply by using curl:
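For example (the app name, and thus the cookie name, will differ):

curl -I http://localhost:3000/
# look for the Set-Cookie header in the response, e.g.:
# Set-Cookie: _my_app_session=...; path=/; HttpOnly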

Assuming that there is some page requiring authentication which redirects to the login page if the user is not logged in, there are 2 possible scenarios: either the user is authenticated and a 200 HTTP code is returned, or the user is redirected with a 302 status. To pretend that we are logged in, we could perform the following request:
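A sketch of such a request (the path and cookie name are illustrative; the cookie value comes from generate_encrypted_cookie above and must be URL-escaped):

curl -I --cookie "_my_app_session=GENERATED_COOKIE_VALUE" http://localhost:3000/admin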

As a result you should see HTTP/1.1 200 OK, which means you’ve successfully bypassed authentication :).

Wrapping up

As you now know, having a secure and securely stored secret key base is essential for the security of the app. Rails applications are now much more secure by default than they used to be, and it seems that accidentally exposing the secret key base is not likely to happen. Nevertheless, it is still a very important thing to be aware of.

]]>2016-02-20T21:20:00+01:00http://karolgalanciak.com/blog/2016/02/20/postgresql-in-action-sorting-with-nullif-expressionOrdering with ActiveRecord and PostgreSQL may seem like an obvious and simple thing: in most cases you just specify one or several criteria with the direction of ordering, like order(name: :asc), and that’s all; maybe in more complex scenarios you would need a CASE statement to provide some more customized conditions. But how could we approach sorting that forces blank values to be the last ones, regardless of the direction of the sort? You might be thinking about the NULLS LAST statement for this purpose, but that’s not going to handle empty strings. For this case you need something special: time to meet the NULLIF conditional expression.

NULLIF to the rescue

Imagine that you want to sort users by fullname ascending, and for whatever reason the records happen to contain both null and empty string values (besides some meaningful data, of course). As previously mentioned, the following expression: order("fullname ASC NULLS LAST") won’t be enough: it will work only for null values, and the blank strings will get in the way. Fortunately, PostgreSQL offers the NULLIF conditional expression, which takes 2 arguments: if argument 1 equals argument 2, it returns NULL; otherwise it returns argument 1. In our case we want to return NULL if fullname is a blank string, so the final ordering statement could look like this: order("NULLIF(fullname, '') ASC NULLS LAST"). That’s all you need for such sorting!

Wrapping up

The NULLIF conditional expression can help tremendously in sorting by nullifying empty strings and always placing them at the end when combined with the NULLS LAST statement.