
This year's YAPC (Yet Another Perl Conference) Europe was held in Kiev, Ukraine on August 12-14. There was a full schedule of three tracks of interesting talks at the conference, spread over three days.

I spoke just after lunch on the first day. My talk was Adventures in Perl Packaging, on our experience at End Point building a custom-compiled perl RPM for RHEL/CentOS and packaging many hundreds of CPAN modules as RPMs in a custom Yum repository. I also touched on similarities in Debian, alternative ways of getting a custom perl and CPAN modules using perlbrew, plenv, and Carton (akin to Ruby's rvm, rbenv, and bundler), and others' efforts at packaging Perl modules in RPMs. I had several good follow-up conversations about this later and have some plans for how we may do things better in RHEL/CentOS 7.

Larry Wall, creator of Perl (and also of patch!), was at this conference, and it was fun to talk with him and his wife Gloria again. Larry pretty definitively settled one question: in recent months an idea has been floated that the next version of Perl 5 might simply be renamed Perl 7, skipping over the seemingly endlessly-under-construction Perl 6. That would solve an annoying marketing problem: Perl 5 continues to grow and improve, but its younger, incomplete sibling Perl 6 still isn't usable in most production situations, so Perl can seem to be stalled. Anyway, Larry said no to that, and said Perl 6 simply needs to keep moving.

At the end of each day was about an hour of lightning talks (max. 5 minutes each). As always, these were enjoyable and interesting and the rapid pace kept everyone's interest and prevented any one topic from dragging on too long.

The social event on the night after the second day was a river cruise, which was a fun setting to get to know more people and see a bit more of Kiev at the same time. The photo to the right is of Andrei Shitov (who headed up the conference) and me, with one of his co-workers on the left. Andrei and the rest of the volunteers from YAPC Russia did a great job making the conference well-run and very affordable.

The conference venue was the Ukraine House convention center, which was spacious, interesting, and centrally located near Khreshchatyk street, with its many hotels, monuments, and people about at all hours of day and night.

The choice of Kiev was a big draw for me and a lot of other attendees, and it's a beautiful city that was a lot of fun to explore outside of the conference. Between metro, bus, and walking, it was easy to get around the city.

Next year's YAPC Europe will be held in Sofia, Bulgaria, so if you're interested, start making plans now!

All they really wanted to do was log any query that takes over 10 seconds. Most of their queries are very simple and fast, but the application generates a few complicated queries for some actions. Recording anything that took longer than 10 seconds allowed them to concentrate on optimizing those. Thus, the following line was set in postgresql.conf:

log_min_duration_statement = 10

Log Everything

A little while back, Greg wrote about configuring Postgres to log everything, and the really good reasons to do so. That isn't what they intended to do here, but it is effectively what happened. The integer in log_min_duration_statement represents milliseconds, not seconds. With a 10 ms threshold it wasn't quite logging everything the database server was doing, but enough that this performance graph happened:

Reconstructed I/O Utilization

That is, admittedly, a fabricated performance plot. But it's pretty close to what we were seeing at the time. The blue is the fast SAS array where all the Postgres data resides, showing lower than normal utilization before recovering after the configuration change. The maroon behind it is the SATA disk where the OS (and /var/log) resides, not normally all that active, showing 100% utilization and dropping off sharply as soon as we fixed log_min_duration_statement.
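The fix itself was one line. The intended 10-second threshold has to be expressed in milliseconds, or (on recent Postgres versions) with an explicit unit:

```
log_min_duration_statement = 10000    # 10 seconds, expressed in milliseconds
# or equivalently:
log_min_duration_statement = '10s'
```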

It took a few minutes to track down, as we were originally alerted to application performance problems, but once we spotted the disk I/O metrics it didn't take long to find the errant postgresql.conf setting. That the log disk jumped to 100% with so much log activity isn't surprising, but the database resides entirely on separate disks. So why was it affected so much?

syslog() and the surprise on the socket

If you're used to using syslog to send your log messages off to a separate server, you may be rather surprised by the above. At least I was; by default it'll use UDP to transmit the messages, so an overloaded log server will simply result in the messages being dropped. Not ideal from a logging perspective, but it keeps things running if there's a problem on that front. Locally, messages are submitted to a dgram UNIX socket at /dev/log for the syslog process to pick up and save to disk or relay off to an external system.

The AF_UNIX SOCK_DGRAM socket, it turns out, doesn't behave quite like its AF_INET UDP counterpart. Ordering of the datagrams is preserved and, more importantly here, a full buffer will block rather than drop the messages. As a result, in the case above, between syslog's file buffer and the log socket buffer, once the syslog() calls started blocking, each Postgres backend stopped handling traffic until its log messages made it out toward that slow SATA disk.
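The back-pressure is easy to demonstrate with a minimal sketch (a generic example, not using /dev/log or the system above): fill one end of an AF_UNIX datagram socket pair without ever reading the other, and sends start failing with EAGAIN — which, in blocking mode, means the caller simply stalls, exactly as the Postgres backends did.

```ruby
require 'socket'

# A connected pair of AF_UNIX SOCK_DGRAM sockets; we never read from the
# receiving end, simulating a syslog daemon stuck behind a slow disk.
sender, receiver = Socket.pair(:UNIX, :DGRAM)

sent = 0
begin
  loop do
    # Each datagram stands in for one log line.
    sender.sendmsg_nonblock('x' * 1024)
    sent += 1
  end
rescue IO::EAGAINWaitWritable, Errno::ENOBUFS
  # A blocking send() -- which is what syslog() does -- would hang right
  # here instead of dropping the datagram as AF_INET UDP would.
end

puts "socket buffers absorbed #{sent} datagrams; the next send would block"
```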

As of now, this system has the Postgres logs on the faster array, to mitigate any logging problems in the future. But if you're looking at leaning on syslog to help manage high volumes of log entries, just be aware that it doesn't solve everything.

We’re looking for a few more talented Ruby on Rails developers to consult with our clients and develop their web applications. Do you like to focus on solving business problems? Do you take responsibility for getting a job done well without intensive oversight? Then please read on!

End Point is an 18-year-old web consulting company based in New York City, with 38 full-time employees working mostly remotely from home offices in the United States, Canada, and Europe. Our team is made up of strong ecommerce, database, and system administration talent, working together using ssh, tmux and screen, IRC, phone, Google+ hangouts, and Skype.

We serve over 200 clients ranging from small family businesses to large corporations, using a variety of open source technologies including Ruby, Python, Perl, Git, PostgreSQL, MySQL, CouchDB, Redis, Elasticsearch, jQuery, and many more, on Linux.

What you will be doing:

Help clients determine their web application needs

Build, test, release, and maintain web applications for our clients

Work with open source tools and contribute back as opportunity arises

Use your desktop platform of choice: Linux, Mac OS X, or Windows

What you will need:

Professional experience building solid Ruby on Rails apps

Good front-end web skills with HTML, CSS, and JavaScript

Experience with PostgreSQL, MySQL, or other databases

A customer-centered focus

A passion for building flexible and, where needed, scalable web applications

Strong verbal and written communication skills

Experience directing your own work, and working from home

Ability to learn new technologies

Bonus points for experience:

Building and supporting ecommerce systems such as Spree

Working with other languages and web app frameworks

Contributing to gems or other open source projects

Handling system administration and deployment

What is in it for you?

Work from your home office (time zones UTC-10 to UTC+4 preferred)

Flexible full-time work hours

Annual bonus opportunity

Health insurance benefit (for U.S. employees)

401(k) retirement savings plan (for U.S. employees)

Ability to move without being tied to your job location

You may apply by emailing us an introduction to jobs@endpoint.com. Include a resume, your GitHub or LinkedIn URLs, and anything else that will help us get to know you. We look forward to hearing from you! Full-time job seekers only, please. No agencies or subcontractors.

I typed "Unicode" into an online translator, and it responded saying it had no idea what the language was but it roughly translates to "Surprise!"

Recently a client sent over a problem getting some of their Postgres data through an ASCII-only ETL process. They only needed to worry about some occasional accent marks, and not any of the more uncommon or odd Unicode characters, thankfully. ☺ Or so we thought. The unaccent extension was a great starting point, but the problem they sent over boiled down to this:

unaccent() worked, except for that odd ѐ, which then failed the ETL task. But that's exactly what unaccent is supposed to handle. The character è even appears in the unaccent.rules file. So what gives?

Well, if you're in the habit of piping blog posts through hexdump (and who isn't?) then you probably already know the answer. But even if not, you may already suspect that we're dealing with a different character that just looks the same. And you'd be right. Specifically, the è in the rules file is from the more common Latin set, and the ѐ that doesn't work is from the Cyrillic set. Pretty much visually identical, but completely separate characters.
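You can see this for yourself without hexdump. A quick generic sketch (nothing Postgres-specific): the two characters compare unequal because they are entirely different code points.

```ruby
latin    = "\u00E8"  # è -- LATIN SMALL LETTER E WITH GRAVE
cyrillic = "\u0450"  # ѐ -- CYRILLIC SMALL LETTER IE WITH GRAVE

# Visually near-identical, but completely separate characters:
puts latin == cyrillic               # false
puts format('U+%04X', latin.ord)     # U+00E8
puts format('U+%04X', cyrillic.ord)  # U+0450
```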

Augmenting the unaccent dictionary

Speaking more generally, a simple UPDATE statement with replace() will ideally correct it in the source data, and a trigger doing the same will keep it tidy from that point forward.
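As a sketch (the table and column names here are hypothetical, not the client's schema), the correction just maps the Cyrillic character onto its Latin lookalike so the existing unaccent rules match it:

```sql
-- Replace Cyrillic ѐ (U+0450) with Latin è (U+00E8) in the source data.
UPDATE items
   SET name = replace(name, 'ѐ', 'è')
 WHERE name LIKE '%ѐ%';
```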

But if you can't or just don't want to go down that path, the unaccent extension dictionary can be edited. On my system it's found in /usr/share/postgresql/9.3/tsearch_data/unaccent.rules. It has a very simple format.
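Each line maps one character to its replacement, separated by whitespace. A hypothetical addition for the Cyrillic lookalike (assuming you want it folded the same way as its Latin twin, which already has a rule) might look like:

```
è	e
ѐ	e
```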

1. Make a copy of the file before you edit it. Updated packages (or new deployments, if you're compiling from source) will wipe out any changes to the unaccent.rules file.

When switching from RHEL 5 to RHEL 6, everyone had hopes and fears about what would be gained and lost.

One of the things lost is /var/log/rpmpkgs, a nice feature that helps system administrators stay sane when a server rebuild or migration is needed, by giving them the list of packages installed as of the day before.

The feature is simple: a daily cron job dumps the list of installed packages, along with various information, into the log file /var/log/rpmpkgs for the sake of system maintainers.

What happened is that while this tool was included in the rpm package through RHEL 5 (and CentOS 5.x), with RHEL 6 (and CentOS 6.x) it was split out into a specific package called rpm-cron.

So if you're among the ones who miss this useful feature, please fire up your SSH connection and type:
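Presumably something like the following, given the package name above (availability may vary by distribution and repository setup):

```shell
yum install rpm-cron
```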

I assume, as in the original question, that there are some strict ranges of data in the tables. For the example I encoded the ranges in the table names, so table t_10_20 contains values from the range [10,20] and table t_10_16 has values from [10,16].

The above view will be used for getting all data.

For filling them up, I used a function I wrote a long time ago; it returns a random number with uniform distribution from the given range:
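Since the original SQL isn't shown here, this is a hedged reconstruction from the description above (the table names and the view name all_tables_2 come from the text; the column name, types, and function body are assumptions):

```sql
-- Child tables whose CHECK constraints encode the ranges in their names.
CREATE TABLE t_10_20 (i integer NOT NULL CHECK (i BETWEEN 10 AND 20));
CREATE TABLE t_10_16 (i integer NOT NULL CHECK (i BETWEEN 10 AND 16));

-- A view unioning all the tables, used for getting all data.
CREATE VIEW all_tables_2 AS
    SELECT i FROM t_10_20
    UNION ALL
    SELECT i FROM t_10_16;

-- Uniform random integer in [a, b], for filling the tables with test data.
CREATE OR REPLACE FUNCTION random_int(a integer, b integer)
RETURNS integer AS $$
    SELECT a + floor(random() * (b - a + 1))::integer;
$$ LANGUAGE sql;
```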

Fixing the Ugly Part

This works great; however, nothing comes for free. The description of the different values for this setting says:

Currently, constraint exclusion is enabled by default only for cases that are often used to implement table partitioning. Turning it on for all tables imposes extra planning overhead that is quite noticeable on simple queries, and most often will yield no benefit for simple queries. If you have no partitioned tables you might prefer to turn it off entirely.

So setting this for the whole database is generally not too wise. You can always set it for your specific query only, so it will help that one query without causing problems for others. This can be done like this:

BEGIN;
SET LOCAL constraint_exclusion TO 'on';
SELECT * FROM all_tables_2 WHERE i BETWEEN 10 AND 14;
END;

Implementing a "Buy One, Get One Free" promotion in
Spree requires implementation of a custom promotion action and
appropriate use of existing promotion rules. This article implements
the promotion by automatically adding and removing immutable "get one" line items whose
price is zero and whose quantity always mirrors its paid
"buy one" counterpart. Although written and tested with Spree's 1-3-stable branch, the core logic of this tutorial will work with any version of Spree.

Promotion Eligibility

Begin by creating a new promotion using a meaningful name. Set the
"event name" field to be "Order contents changed"
so the promotion's actions are updated as the order is updated. Save
this new promotion, so we can then configure Rules and Actions. In
the Rules section, select the "Product(s)" rule and click
Add. Now choose the products you'd like to be eligible for your
promotion. If you'd like to include broader sets such as entire
taxonomies (and have implemented the custom promotion rules to do
so), feel free to use them. The promotion action we implement
next will work from whatever set of eligible products your rules select.

You should now have a product rule that selects some subset of
products eligible for your promotion.

Adding a Custom Promotion Action

We'll now add a custom promotion action that will do the work of
creating the free line items for each eligible paid line item. Again, this implementation is specifically for the 1-3-stable branch, but the public interface for promotion actions has (amazingly) remained stable, and is supported from 0-7-0-stable all the way through 2-0-stable.

And although the guide doesn't instruct you to, it seems required
that you add an empty partial to be rendered in the Spree admin when
you select the rule.

<%# app/views/spree/admin/promotions/actions/_buy_one_get_one.html.erb %>
<%# Intentionally empty: Spree automatically renders the name and description %>
<%# you provide, but you could expand here if you'd like. %>

Pre-flight check

Before moving any further, it's best to make sure the new
promotion action has been wired up correctly. Restart your
development server so the initializer registers your new promotion
action and refresh your browser. You should now see a "Buy One
Get One" promotion action.

Buy One Get One Logic

Now that we've got the promotion action wired, we're ready to
implement the logic needed to create the new line items. Begin by
collecting the order from the options.

Next we need to determine which line items in the order are eligible
for a corresponding free line item. Because line items are for
variants of products, we must collect the variant ids from the
product rule we set up. If you've used something other than the default
Spree "Product(s)" rule, just make sure you end up with
equivalent output.
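The core of that scan can be sketched framework-free (the names here are illustrative, not Spree's API): for every paid line item with an eligible variant, the order must contain a zero-price twin with a matching quantity.

```ruby
# Illustrative stand-in for Spree's line item; 'free' marks a "get one" item.
LineItem = Struct.new(:variant_id, :quantity, :price, :free)

# Given the order's line items and the variant ids from the product rule,
# build the list of zero-price "get one" items the order should contain.
def free_items_for(line_items, eligible_variant_ids)
  line_items.reject(&:free)
            .select { |li| eligible_variant_ids.include?(li.variant_id) }
            .map { |li| LineItem.new(li.variant_id, li.quantity, 0, true) }
end

paid = [LineItem.new(101, 2, 19.99, false), LineItem.new(999, 1, 5.00, false)]
free = free_items_for(paid, [101])
puts free.length         # 1 -- only the eligible variant gets a twin
puts free.first.quantity # 2 -- mirrors the paid quantity
```

Running this logic on every order update is what keeps the free quantities in lockstep with their paid counterparts.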

We've now completed most of the implementation for the promotion
action. Every time the order is updated, all line items are scanned
for eligibility and the appropriate create/update/destroy actions are
taken. This, however, isn't the end of our implementation. Unlike
other items in our cart, we need to prevent users from changing the
quantity of "get one" line items or removing them from the
cart. We also need some way of indicating that these zero price line
items are complements of the Buy One Get One promotion.

Locked Line Items with Additional Text

While the concept of locked or immutable line items might rightly
deserve its own, separate Spree plugin, we'll roll a quick one here
to complete the implementation of our promotion. We'll need to add a
few attributes to the spree_line_items database table, and tweak our
implementation of create_matching_get_one_line_item.

Now that we know which line items aren't meant to be edited by users,
we can update our UI to not render the options to remove immutable
line items or update their quantity. We can also display the
additional text in the line item when it's available.

Missing from this implementation is a way to protect the immutable
line items from manipulation via POSTed parameters. While that might
be required in other uses of immutable line items, our promotion
action sets the "get one" line items' quantity with every order
update, so we don't need to worry about this issue here.

Test Driving

At this point, you can begin manual testing of your
implementation, but of course automated testing is best. Ideally, we
would have TDDed this against a failing integration test, but the
testing infrastructure setup required to do this is beyond the scope
of the article. What's worth sharing though, is the syntax of the
assertions I've developed to inspect line items, so that you can
implement something similar for your specific needs. Here's a snippet
from an integration test to give you a sense of the DSL we've built
up.

Of course you'd want to test all the cases we've implemented, but
what's worth focusing on is the ability to assert specific attributes
across many different line items. This is an extremely reusable tool
to have in your testing suite. Good luck implementing!

H12 timeout errors

The performance problems started when we migrated from Bamboo to Cedar on Heroku and replaced the Thin webserver with Unicorn. We started getting a lot of Heroku Request timeout (H12) errors:

The problems happened mostly when logging in to the admin dashboard or during checkout for certain orders. H12 errors occur when an HTTP request takes longer than 30 seconds to complete. For example, if a Rails app takes 35 seconds to render a page, the HTTP router returns a 503 after 30 seconds and abandons the incomplete Rails request for good. The Rails request keeps working, logging normal errorless execution, and after completion it hangs in the application dyno with nowhere to send its response.

We started debugging H12: we set the Unicorn timeout to 20 seconds to prevent runaway requests, and installed the rack-timeout gem with a timeout of 10 seconds to raise an error on slow requests. It all came down to a trivial database timeout!

The application source had not changed during the transition from Bamboo to Cedar, but apparently Cedar/Unicorn is much more sensitive to troubled code. Below is a list of performance bottlenecks and solutions in Spree. Some of them exist in version 0.60 only, but a lot are still present in Spree 1.x, which means that your application may have them too.

Issue #1: Real-time reports

Let's take a closer look at the earlier database timeout code. It came from Admin Dashboard and admin/overview_controller.rb.

The "Best Selling variants" report was being calculated real-time right in the web process:

Issue #2: Large numbers

It is an established fact that humans eat a lot! Think about an order of a thousand Heineken 6-pack cans...

Or even something like this:

or this:

Spree, both 0.60 and 1.x, proved to have a huge problem if an order has a lot of line items and/or a large quantity of single line items. The potentially dangerous code can be found all over the place. Consider the following example:

Now imagine what will happen with the order from the first screenshot. We have 15100 inventory units for that one. They will be meticulously destroyed from the inventory one by one in a loop after the checkout. This method was born to crash the application!

Solution

A simple mindful refactoring was enough to solve the problem for me. There is no need to call "destroy" in the loop for every single inventory unit because we can use the efficient "destroy_all" method. I'm sure this can be optimized further, but it was enough to get rid of the timeout:
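A hedged reconstruction of the change (the exact method bodies vary between Spree versions, so treat this as illustrative):

```ruby
# Before: one destroy call -- and one DELETE -- per inventory unit,
# 15,100 times in a loop for the order above.
#   inventory_units.each(&:destroy)
#
# After: a single scoped call lets ActiveRecord handle the batch.
inventory_units.destroy_all
```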

Issue #3: Slow emails

In my application "confirm_email" was overridden to generate a PDF invoice. The invoice listed all the products in the order and ran to about 200 lines. Again, it led to the H12 timeout error.

Solution

All emails should be sent in the background rather than in the web request: first, because network operations can take a long time, and second, because generating an email can also be slow. For example, sending in the background can be accomplished using the delayed_job gem:

OrderMailer.delay.confirm_email(self)

Issue #4: Lazy-loading

Ecommerce objects are usually complicated with a lot of associations. This is totally fine as long as you eager-load the associations that will be used most with the loaded object later on. In most cases, Spree does not preload associations for its orders. For example, in spree/base_controller.rb:

@order = Order.find_by_number! params[:order_id]

As a result, here is what I see in the server console while loading the order display page on the frontend:

...And five more screens like this! The associations of the order - line items, variants and products - generate an additional chain of queries to the database during lazy-loading.

Eager-loading did the trick. Of course, not everything needs to be loaded eagerly, and the solution varies from case to case, but it worked like a charm in mine.
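For instance (the association names here are assumed from the queries above, not necessarily Spree's exact schema):

```ruby
# Pull line items, variants, and products in a few queries up front,
# instead of one lazy query per association per row.
@order = Order.includes(line_items: { variant: :product })
              .find_by_number!(params[:order_id])
```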

Issue #5: Dangerous code all over the place

There are a lot of places in the Spree source code that are not optimized for performance. We don't need to look far for an example, because there is another killer method right next to the "destroy_units" one we inspected earlier!

Another example: every line item has after_create and after_save callbacks that invoke the Order.update! method, and Order.update! calls the update_totals method twice.

Now imagine the order from the second screenshot with a lot of line items. During the checkout "update_totals" will be called each time the line item is saved. Typically, this line would produce a timeout, because the "line_items" association was, of course, not preloaded:

line_items.map(&:amount).sum

I can't list every circumstance like that, because each would require a lot of context to explain the catch, but I can still repeat many times: "No long-running tasks in the web request!"

No long-running tasks in the web request!

Be it Spree, Heroku or any other context, environment or platform, please, never do the following in the web process:

(!!!) Heavy database usage (slow or numerous queries, N+1 queries)

Sending an email

Accessing a remote API (posting to Twitter, querying Flickr, etc.)

Rendering an image or PDF

Heavy computation (computing a fibonacci sequence, etc.)

Say "No" to all these things to ensure a much happier life for your application!

End Point recently had the pleasure of working with Disney and Google to bring the Liquid Galaxy to the Disney Expo in Sao Paulo, Brazil. Disney saw the Liquid Galaxy at the Google office in Sao Paulo and recognized the “WOW!” factor that the display platform can provide, and the Disney Expo organizers saw a great fit for promoting the release of the upcoming animated film PLANES. Disney engaged End Point to develop a custom menu of "fly to" locations featured in the movie. Attendees at the Expo experienced those locations in immersive high definition across 7 screens surrounding the viewer. End Point created a custom menu for the touch screen with one-touch buttons that “flew” users to locations featured in PLANES:

USA - Statue of Liberty

ICELAND - Reykjavik Botanical Garden

GERMANY - Deutsches Museum

INDIA - Taj Mahal

NEPAL - Himalayas

CHINA - Great Wall

MEXICO - Pyramids of Yucatan

The experience for the attendees at the expo was such that they could virtually fly to these locations just like the characters in the movie. Other options on the touch screen menu featured the Disney resort properties:

USA - Orlando - Walt Disney World (Magic Kingdom)

USA - Anaheim / California - Disneyland

FRANCE - Paris - Disneyland Paris

JAPAN - Tokyo Disneyland

HONG KONG - Hong Kong Disneyland

We at End Point have always believed in the ability of the immersive Liquid Galaxy experience to present and promote information and engage an audience. Before the Disney Expo staff took notice of the Liquid Galaxy in Sao Paulo, I must confess that promoting a major motion picture was not on our radar as a potential use case. In true show business fashion, the budget and timeline were tight. We could not have done it without our great crew who made it all happen behind the scenes. Our stars on this set are:

Marco Manchego - Our man in Brazil, who handled logistics arrangements with Disney and Google, coordinating with Gerard Drazba at End Point in the US. Marco also performed the hands-on work of setting up the frame, screens, and computers, and manned the Liquid Galaxy at the Expo.

Kiel Christofferson - Handed the custom programming requirements with only days to go before delivery, he built the touchscreen interface and many of the individual tours.

Josh Tolley - Pitched in with some work in Kamelopard, the fly-to authoring and management tool developed by End Point.

Adam Vollrath - Joined Marco and Gerard to clean up some slight network issues at the venue on the morning of the expo, and then it was SHOWTIME!

The Expo opened July 13 and runs through August 1 at the TransAmerica Center in Sao Paulo, and the Liquid Galaxy is dazzling and amazing as usual. Kids, adults, and everyone in between are seeing the magic of our planet as they zoom from location to location. We’re proud of this opportunity to continue working with Google and to add Disney to our list of Liquid Galaxy deployments!