Nested Maybe, Maybe No More

When I first started writing Haskell, my code had a tendency to shift to the right until it became really hard to read. This is mostly due to the use of Maybe or Either: I often need to unwrap values from these constructs and make a decision based on the unwrapped value.
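The original snippet isn't preserved in this extract, but the kind of rightward drift I'm talking about looks roughly like this (lookupUser and lookupOrder are made-up IO actions returning Maybe values):

-- Hypothetical stand-ins for the lookups; any IO (Maybe a) actions will do.
lookupUser :: Int -> IO (Maybe String)
lookupUser 1 = return (Just "alice")
lookupUser _ = return Nothing

lookupOrder :: String -> IO (Maybe Int)
lookupOrder "alice" = return (Just 42)
lookupOrder _       = return Nothing

-- The nesting problem: every Maybe adds another level of indentation.
report :: Int -> IO ()
report uid = do
  mUser <- lookupUser uid
  case mUser of
    Nothing -> putStrLn "user not found"
    Just user -> do
      mOrder <- lookupOrder user
      case mOrder of
        Nothing -> putStrLn "order not found"
        Just order -> print order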

You don't have to understand what it does to know that it's ugly. What is this? JavaScript?!

One thing to look for when you have this mess of nesting is whether every branch is returning the same kind of thing. For example, I am always returning an error message on Nothing. Although the type here is IO (), if you squint hard enough, what I really intend to return is the error string.

We also note that all this nesting comes from Maybe, and we know Maybe is a Monad. It would be nice if we could somehow take advantage of that.
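Here is a minimal sketch of that idea with plain values (names are illustrative):

-- Using Maybe's Monad instance: the first Nothing short-circuits everything.
combined :: Maybe Int
combined = do
  x <- Just 1      -- swap this for Nothing and `combined` is Nothing
  y <- Just 2
  z <- Just 3
  return (x + y + z)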

If x is Nothing, that short-circuits the entire chain of checks and returns Nothing, which is the same idea as case-splitting on Maybe in our terrible nesting example at the top.

With the help of monad transformers (since we are working in the IO monad at this point), we can make use of Control.Monad.Trans.Either to clean up our code. The version below works pretty much the same as the one above, except that we are operating in the Either monad.

Instead of running an action on Nothing, we simply return a Left String so that we can return the error message at the point of failure.
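The original listing isn't preserved, but a sketch along those lines using EitherT from the either package's Control.Monad.Trans.Either (later ecosystems use ExceptT from transformers instead) could look like this; the noteT helper and the lookups are assumptions carried over from the earlier sketch:

import Control.Monad.Trans.Class  (lift)
import Control.Monad.Trans.Either (EitherT, left, runEitherT)

-- Turn an IO (Maybe a) into an EitherT computation that fails with `msg`.
noteT :: String -> IO (Maybe a) -> EitherT String IO a
noteT msg action = lift action >>= maybe (left msg) return

report :: Int -> IO (Either String Int)
report uid = runEitherT $ do
  user  <- noteT "user not found"  (lookupUser uid)
  order <- noteT "order not found" (lookupOrder user)
  return order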

Polymorphic Association in Ecto, Part I

Polymorphic association in a database is when you have multiple tables that need to refer to the same table. For example, I have a table users that contains details like profile information. I then have two tables, guards and operators, which both have profile information. I need to link them to the users table.

In Django/Rails, polymorphic association is done with an approach where you have two fields, user_type and user_id (other approaches are also available in Django/Rails). But this breaks foreign-key references in the database.

Ecto's documentation explains three ways to approach this. I would say the documentation is a bit light on details, hence this blog post. I won't explain the first one, as it is easy to understand.

One thing to note is that Ecto is not an ORM, so you can create a schema that doesn't exist in the database, or a schema that is not a one-to-one projection of the columns in a table. Take our users schema, for example: we mark it as an abstract table because we are not really creating a users table; instead we are creating guards_user and operators_user tables with that schema.

Schema POV

From the schema's point of view, we only have a single schema, users, that maps to two different tables in the database, namely guards_user and operators_user.
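As a rough sketch of what that can look like (module and field names here are my own, not from the original post), the single schema is declared as an abstract table and the concrete table is picked when you touch the database:

defmodule MyApp.User do
  use Ecto.Schema

  # Abstract table: no real "users" table exists in the database.
  schema "abstract table: users" do
    field :profile, :string
  end
end

# The same schema can be written to either concrete table by overriding
# the struct's source metadata before inserting:
%MyApp.User{profile: "night shift"}
|> Ecto.put_meta(source: "guards_user")
|> MyApp.Repo.insert!()

%MyApp.User{profile: "dispatcher"}
|> Ecto.put_meta(source: "operators_user")
|> MyApp.Repo.insert!()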

Database POV

We have two tables, guards_user and operators_user, which have the same fields. Now, the guards table has a foreign key that points to guards_user, and the same goes for the operators table.

Besides the cons mentioned in the documentation, we essentially end up with two tables that have the same structure. If we ever need to change the structure of users, we have to apply the change to both tables.

In the next part, we will look at the next approach, which uses an intermediary table for the association. That solves the issue above.

Piping to Second Argument in Elixir

The pipe operator in Elixir is great: you can compose your functions naturally. However, there are times when you want to pipe the input into the second argument. How would you do that?

Desugar

iex> sum_number = fn x, y -> x - y end
iex> sum_number.(1, 2)
-1
iex> 1 |> sum_number.(2)   # the pipe puts 1 into the first argument slot
-1
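The post's original answer isn't preserved in this extract, but a couple of common ways to land the piped value in the second argument look like this (the helper names are mine):

sum_number = fn x, y -> x - y end

# Wrap the call in an anonymous function so the piped value becomes
# the second argument:
2 |> (&sum_number.(10, &1)).()   #=> 8

# Or name a tiny helper that flips the arguments:
flip = fn y, x -> sum_number.(x, y) end
2 |> flip.(10)                   #=> 8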


Multi-tenancy with Ecto, Part 1

With Ecto 2.0, there is support for Postgres schemas (or multiple databases on MySQL). What I am trying to do here is use Postgres schemas to achieve multi-tenancy, of course using Ecto.

Disclaimer: this is mostly for my personal notes as I try to understand Ecto/Elixir better. There is also a library called Apartmentex that you could use for multi-tenancy.

This assumes the following model in web/models/guard.ex

defmodule Tenancy.Guard do
  use Tenancy.Web, :model

  schema "guards" do
    field :name, :string
    field :position, :string

    timestamps()
  end

  @doc """
  Builds a changeset based on the `struct` and `params`.
  """
  def changeset(struct, params \\ %{}) do
    struct
    |> cast(params, [:name, :position])
    |> validate_required([:name, :position])
  end
end

Setup

You also need to create the schema yourself by issuing CREATE SCHEMA "tenant_1"; from the psql shell. I don't think Ecto has a built-in function to create a Postgres schema for you.

First

These are the things I do to actually insert/update records into the correct schema. There are also a few things that are still confusing to me.

When defining a model, you can set the module attribute @schema_prefix to tell Ecto that the table lives in the Postgres schema named by @schema_prefix. But for multi-tenancy, @schema_prefix needs to be set dynamically, usually from the subdomain, i.e. tenant-1.example.com should use the Postgres schema for tenant-1. However, Elixir module attributes are resolved at compile time, so you can't change them dynamically at run time.

Then I found out about Ecto.put_meta/2 after looking through the Ecto source code.

So what I did was extract changeset.data, because changeset.data is the Ecto schema struct. Then I update its __meta__ to include the :prefix option and build a new changeset around it. Then you can just insert the changeset and the record will be written to the correct Postgres schema.
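The original snippet isn't preserved here, but a sketch of that idea might look like the following (insert_for_tenant and Tenancy.Repo are my own names; the tenant would normally come from the subdomain):

# Build the changeset as usual, then point its data at the tenant's schema.
def insert_for_tenant(params, tenant) do
  changeset = Tenancy.Guard.changeset(%Tenancy.Guard{}, params)

  data = Ecto.put_meta(changeset.data, prefix: tenant)   # e.g. "tenant_1"

  %{changeset | data: data}
  |> Tenancy.Repo.insert()
end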

Then the record will be added to the correct table in the tenant_1 schema. But of course, the above doesn't work on the released version, because this feature is only on the master branch at this point in time.

Part 2

What I want to do next is automatically issue the query for creating the Postgres schema when a new tenant is created, and delete the schema when the tenant is deleted. Then, create a mix task to handle migrations.
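One way that part could be done, sketched with Ecto.Adapters.SQL (the Tenancy.Tenant module is my own, and it assumes the tenant name is validated elsewhere):

defmodule Tenancy.Tenant do
  def create_schema(tenant) do
    Ecto.Adapters.SQL.query!(Tenancy.Repo, ~s(CREATE SCHEMA "#{tenant}"), [])
  end

  def drop_schema(tenant) do
    Ecto.Adapters.SQL.query!(Tenancy.Repo, ~s(DROP SCHEMA "#{tenant}" CASCADE), [])
  end
end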

Django Multi-tenant Postgres Schema Issue

You can quickly build a multi-tenant site in Django with the help of the nifty package django-tenant-schemas. It works by assigning each tenant its own database schema and having one public schema for anything that needs to be shared.

It wraps Django's connection with its own wrapper that helps route requests to the right schema based on the hostname, so tenant1.domain.com maps to the schema tenant1. This works great when you're in the context of the Django application.

I have a Celery task that needs to run and access the database. I could just pass the model directly in the Celery task params, but that is not advisable.
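The original task isn't preserved in this extract, but a rough reconstruction of its shape looks like this (the model, field names, and task arguments are made up; the important part is switching the connection to the tenant's schema before touching the ORM):

from celery import shared_task
from django.db import connection

from myapp.models import Report  # hypothetical tenant-specific model


@shared_task
def generate_report(tenant_schema, report_id):
    # switch the wrapped connection to the tenant's schema;
    # without this the query hits the public schema
    connection.set_schema(tenant_schema)

    report = Report.objects.get(pk=report_id)
    report.process()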

If I don't have those schema-switching lines, it won't work, because the queries hit the public schema and the tenant tables do not exist there.

Note: we can use set_schema because that is the function the package's connection wrapper uses internally to handle switching schemas.

Class-based Celery Task

In the previous post, I showed how to implement a basic Celery task using the @task decorator, along with a pattern for removing circular dependencies when calling the task from a Flask view. Let's recall part of the code.

def run_task_async():
    # (reconstructed) chain the two tasks; long_map_task is the last job in the chain
    task = chain(long_run_task.s(), long_map_task.s()).apply_async()


Here, I am chaining two Celery tasks. Initially this worked for my case, but I had issues, such as how to check the task status. The variable task contains the task id of the last job in the chain, which is long_map_task. To find the status of every other task, I have to recursively query the parent id of the last task to collect all the statuses in the chain.

This might work if you're running a few simple tasks chained together. In my case, the task complexity is quite high: build a graph of companies, run a matrix inversion, save the result to the database, and generate some reports. I would like to have this inside its own module and have the module called from the view, so that the view stays clean. Let's look at how we can create this module by implementing a class-based Celery task.

Our custom task class inherits from celery.Task, and we override the run method to call the custom code that we would like to run. So when you call someTask.apply_async(), the run method is invoked. We override the bind method so that we can wrap the Flask context around our task. We then override on_success so that, for example, other services can be notified that this task has just finished running. You can check the documentation for the other methods you can override.
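The original class isn't preserved in this extract, so here is a minimal sketch of its shape (task name, stages, and broker URL are illustrative; the Flask-context wrapping via bind is omitted for brevity):

from celery import Celery, Task

celery = Celery(__name__, broker="redis://localhost:6379/0",
                backend="redis://localhost:6379/0")


class AnalysisTask(Task):
    """One task that runs the whole pipeline instead of a chain of tasks."""
    name = "tasks.analysis"

    def run(self, company_id):
        # invoked when someTask.apply_async() / .delay() is called
        self.update_state(state="PROGRESS", meta={"stage": "building graph"})
        # ... build the graph of companies ...
        self.update_state(state="PROGRESS", meta={"stage": "matrix inversion"})
        # ... run the inversion, save results, generate reports ...
        return {"company_id": company_id}

    def on_success(self, retval, task_id, args, kwargs):
        # hook point: notify other services that the task has finished
        super(AnalysisTask, self).on_success(retval, task_id, args, kwargs)


# Celery 4+ wants class-based tasks registered explicitly; on Celery 3.x
# (the version this post was written against) subclasses are picked up automatically.
analysis_task = celery.register_task(AnalysisTask())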

This is cleaner, since there is only one task id per actual task. We no longer need to recursively query its parent to find all the related tasks.

You might ask how to make the status more fine-grained, since now I have multiple jobs running inside a single Celery task. We can leverage the update_state method to achieve this: I can pass metadata (a dictionary) to the meta keyword argument of update_state. See the code above for how this is done.

To access this metadata from the view (maybe you exposed an endpoint to check the task status), you can do something like someTask.info.get('stage', None). One caveat: the return type of someTask.info depends on the task status/state. If someTask.state is SUCCESS, someTask.info will contain the result. If it is still running, then it is a dictionary containing the meta params. That's the reason I use the get method on someTask.info: instead of throwing an exception, we just fall back to whatever default value works for my case. Have a look at the documentation for more details.
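For example, a status endpoint along these lines would work (the route, the Flask app object, and the analysis_task from the sketch above are all assumptions):

from flask import Flask, jsonify

app = Flask(__name__)


@app.route("/tasks/<task_id>")
def task_status(task_id):
    result = analysis_task.AsyncResult(task_id)
    payload = {"state": result.state}
    if isinstance(result.info, dict):
        # while the task is running, .info is the meta dict from update_state
        payload["stage"] = result.info.get("stage", None)
    return jsonify(payload)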

Improvement

What I showed you here is the basic skeleton. On top of this, I also use a factory pattern to create the task workers. I have multiple task workers that do different kinds of tasks, and this just keeps the code much cleaner.

Celery Integration with Flask

As of Celery version 3.0 and above, Celery integration with Flask should no longer need to depend on a third-party extension. That's what they say. However, my experience integrating Celery with Flask, especially when using Flask blueprints, shows that it can be a little bit tricky.

Challenges

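For reference, the usual single-module setup looks roughly like this (a sketch based on the pattern in the Flask documentation; broker and backend URLs are placeholders):

from celery import Celery
from flask import Flask

app = Flask(__name__)
app.config.update(
    CELERY_BROKER_URL="redis://localhost:6379/0",
    CELERY_RESULT_BACKEND="redis://localhost:6379/0",
)


def make_celery(flask_app):
    celery = Celery(flask_app.import_name,
                    broker=flask_app.config["CELERY_BROKER_URL"],
                    backend=flask_app.config["CELERY_RESULT_BACKEND"])
    celery.conf.update(flask_app.config)

    class ContextTask(celery.Task):
        def __call__(self, *args, **kwargs):
            # run every task inside the Flask application context
            with flask_app.app_context():
                return super(ContextTask, self).__call__(*args, **kwargs)

    celery.Task = ContextTask
    return celery


celery = make_celery(app)


@celery.task
def long_running_task(x, y):
    return x + y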

Now, this works fine if your app is simple. The challenge is when you want to fire off a Celery task from the view: you'll notice that you end up with a circular import, and that does not fly in Python.

Notice the circular dependency: app.py -> views.py -> worker.py -> app.py. No matter how I tried to refactor the application, there would always be a circular reference like this, because of how I need to fire the worker task from the view.

One of the ways I tried was to move the creation of the Celery instance into extensions.py, where I keep all my extensions like flask-sqlalchemy, flask-login, etc., and have it lazily configured during app creation in app.py. This works, but it is not a real solution: although I am instantiating the Celery instance there, I cannot actually make it lazily reconfigure itself with the config from the Flask app instance. Celery will work, but it is not really configured for use with Flask.

If I approach it this way, I have to set all the backend and broker params there to make it work. Then later, during Flask app creation, I can import the Celery instance from extensions.py and reconfigure it by calling celery.conf.update(...). It turns out it will not pick up the changes. I think the reason is that when you run the worker with celery -A app.extensions.celery ..., the Celery instance picked up by that command only has the settings from extensions.py. Even if I update the Celery conf to set, for example, CELERY_TRACK_STARTED to True, you will see that it has no effect, since the Celery instance tied to the worker is the one from extensions.py.

What should happen during celery -A app.extensions.celery ... is: first create the Flask app, pass it to Celery for the Flask integration, then hand it to the worker. What happens now is totally different: create Celery, hand it to the worker, then create the Flask app and try to reconfigure Celery. So Celery is not aware of the changes, since they only happen when Flask is running.

Solution

Now, making it work is really simple, but it is a rather hackish solution. The way I solved it is by breaking the circular reference, which we can do with a local import instead of importing the package at the top level. A sketch of the arrangement is below.
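Module names below follow the post; create_app and the make_celery helper from the earlier sketch are assumptions, and the two files are shown in one listing for brevity:

# worker.py
from app import create_app
from extensions import make_celery

flask_app = create_app()
celery = make_celery(flask_app)


@celery.task
def long_running_task(x, y):
    return x + y


# views.py -- the local import inside the view breaks the
# app.py -> views.py -> worker.py -> app.py cycle
from flask import Blueprint, jsonify

bp = Blueprint("tasks", __name__)


@bp.route("/start")
def start():
    from worker import long_running_task  # local import, not top-level
    result = long_running_task.delay(1, 2)
    return jsonify({"task_id": result.id})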

This breaks the circular reference. In my worker.py, I can just instantiate the Flask app and pass it to my Celery instance to have it configured with Flask. Now I can continue using Flask blueprints like I used to.

There are still some questions I have yet to figure out. One is whether I should create one Celery instance per module or just share it. The documentation says to share it, but that only works when you're using the decorator approach (i.e. @task); if I use custom Task classes, it seems to break. I'm trying to figure that one out. I will share how to create custom Task classes in the next post.

Change IP Address for Existing Nodes in CDH 5.3

Today I discovered a way to handle IP address changes for a Hadoop cluster managed by CDH 5.3. I think this should also be applicable to other CDH 5.x releases. It took me a couple of hours to figure this one out :/

I've done this before, but that was on CDH 4.x. On CDH 4.x, you need to access the DB holding the hosts information; you can find that tutorial here. Since CDH 5.x, this is no longer valid: even if you change it through the DB, it will get overwritten by Cloudera Manager (CM).

How it works

It's best to get a little background on how CM works in CDH 5.x. If you went to the link mentioned above and followed the steps up to accessing the DB under CDH 5.x, you would notice that the host_identifier column in the hosts table is no longer the FQDN of each host, but some sort of hash.

This hash (or uuid) is actually a random string generated by the Cloudera Manager agent (run by the daemon cloudera-scm-agent). Now, instead of each host being identified by its IP like in CDH 4.x, each host is identified by its uuid. One advantage is that however you change the IP, it always gets propagated to the CM server (run by the daemon cloudera-scm-server).

When the IP address changes, the next time cloudera-scm-agent sends a heartbeat to its cloudera-scm-server (set in /etc/cloudera-scm-agent/config.ini), it sends its uuid along, and if the hostname or IP address has changed, the DB gets updated based on the uuid. Smart, right, compared to CDH 4.x?

Conflict

Say, for example, that some nodes end up with the same uuid. In my case this happened because, for development, I run a small cluster on ESX: I create one master copy and clone it to the other servers, so they all share the same uuid. If you have duplicate uuids, you will see duplicated hostnames/IPs for that particular uuid in Cloudera Manager under the Hosts tab (menu).

In order to solve this, you need to generate new uuids. But how, since these are automatically managed by CM?

The uuid is kept under /var/lib/cloudera-scm-agent. There are two files there, uuid and response.avro. If you cat uuid, you get the uuid for that particular node. To change it, follow these steps:

1. Stop the CM server: service cloudera-scm-server stop
2. On all nodes, stop the CM agent: service cloudera-scm-agent hard_stop. hard_stop is needed because we need to restart supervisord as well and flush all the settings pertaining to CM.
3. Remove both uuid and response.avro under /var/lib/cloudera-scm-agent, or just rename them, to be safe.
4. Make sure /etc/cloudera-scm-agent/config.ini is pointing to the right CM server.
5. Start the CM server: service cloudera-scm-server start
6. On all nodes, start the CM agent: service cloudera-scm-agent clean_start or service cloudera-scm-agent hard_start.
7. Check CM; you'll see all nodes are working now.

Creating a Brew External Command

Recently I had the need to create a Homebrew external command to extend Homebrew's default behavior. My goal is to extend the brew deps command by creating a new command, brew deps-group. Executing the command lists all the top-level Homebrew packages that are related to each other by common dependencies; doing so lets me know whether it is safe to delete a package without breaking other packages.

Structure

To achieve this, the overall file structure consists of:
1. Formula - this tells Homebrew where to get the files for your package and its dependencies.
2. Application - the actual implementation; I wrote my simple app in Python.

Formula

Explanation in comments

require "formula"
# this will translate to brew-deps-group as the formula name
class BrewDepsGroup < Formula
homepage "https://github.com/shulhi/homebrew-deps-group"
# this is the source of your package
# when you run brew install, it will be installed from this source
url "https://github.com/shulhi/homebrew-deps-group.git"
# my package depends on python
depends_on :python if MacOS.version <= :snow_leopard
# this is the dependency for my formula
resource "networkx" do
url "https://pypi.python.org/packages/source/n/networkx/networkx-1.9.1.tar.gz"
sha1 "ac2db3b29c7c4d16435f2a7ebe90fc8bd687b59c"
end
# skip cleaning path in formula
skip_clean "bin"
def install
# since my python application has other dependency, we need to install it
# and ensure it is installed in /vendor
# to avoid pollution with system packages
ENV.prepend_create_path "PYTHONPATH", libexec/"vendor/lib/python2.7/site-packages"
%w[networkx].each do |r|
resource(r).stage do
system "python", *Language::Python.setup_install_args(libexec/"vendor")
end
end
# if your python application has setup.py, then uncomment this line
# this will install your application in the correct location
# ENV.prepend_create_path "PYTHONPATH", libexec/"lib/python2.7/site-packages"
# system "python", *Language::Python.setup_install_args(libexec)
# copies your python application to bin
# since we don't have any, I don't need this
# instead, we are just going to copy the actual file (see next section below)
# bin.install Dir[libexec/"bin/*"]
# bin.env_script_all_files(libexec/"bin", :PYTHONPATH => ENV["PYTHONPATH"])
# copies all files to bin so we can execute the files from other scripts
bin.install 'brew-deps-group.py'
bin.install 'brew-deps-group.rb'
(bin + 'brew-deps-group.py').chmod 0755
(bin + 'brew-deps-group.rb').chmod 0755
end
end

Entry point

So our application/command needs an entry point, i.e. when I run brew deps-group, some file should be executed, right? In our case, that is brew-deps-group.rb, which in turn calls brew-deps-group.py.
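The original script isn't reproduced in this extract, but an entry point along these lines would do the job (Homebrew runs any executable named brew-<command> found on its PATH; the exact contents here are my own sketch):

#!/usr/bin/env ruby
# brew-deps-group.rb: called when the user runs `brew deps-group`;
# it simply hands off to the Python implementation installed next to it.
exec "python", File.expand_path("brew-deps-group.py", File.dirname(__FILE__)), *ARGV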

Kaggle Bike Sharing Demand, Part 1

I finally had a chance to play around with a Kaggle challenge, and Bike Sharing Demand seemed to be the easiest one to tackle: no domain expertise required, or at least very minimal.

This is going to be Part 1, where I'll go over how I applied minimal statistical knowledge to extract features. Well, extraction and selection, to be exact. I then fed these features into a Random Forest (from scikit-learn), which will hopefully be covered in Part 2. The results I got so far are mediocre; I plan to play around with the features and try different algorithms later when I have the time. My target is a score below 0.5, or around 0.4 at best, before I give up :( If some of the reasoning is wrong, let me know; although I have a background in Actuarial Science, it has been quite a while.

Feature selection

I will be using R to toy around with the data.

Since this is a regression problem (predicting a continuous value), I'd like to start simple with linear regression to see how things look first.
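The original R listing isn't preserved here, but the kind of thing I mean looks roughly like this (column names follow the Kaggle data set; the exact formula is illustrative):

# read the training data and pull a few features out of the timestamp
train <- read.csv("train.csv")
train$datetime <- as.POSIXct(train$datetime, format = "%Y-%m-%d %H:%M:%S")
train$time <- as.integer(format(train$datetime, "%H"))   # hour of day
train$day  <- weekdays(train$datetime)                   # day of week

fit <- lm(count ~ time + day + season + holiday + workingday +
            weather + temp + atemp + humidity + windspeed,
          data = train)
summary(fit)   # R-squared and the significance stars discussed below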

time and day are some of the variables I extracted from datetime; I could do the same for month as well. There is no statistical basis behind this, it's purely intuition. But we will let the stats decide whether these features are worthy.

Our R-squared is not that great. Maybe we can improve it by excluding some features. Features marked with * are usually considered important when evaluating the model. One thing to note is that atemp ("feels like" temperature) has three *'s while temp has none. This is quite misleading: temp should be equally important. However, because the correlation between temp and atemp is 0.9849481 (highly correlated), the information in one is already 'captured' by the other. So dropping one of these features results in minimal loss of information.

Let's take a look at humidity. The coefficient is negative, so the relationship between count and humidity is inverse: the higher the humidity, the fewer the bike rentals. Since I am not a weather expert, I'm not sure how true this is, but let's look at a graph to confirm.

Well, bike rentals do drop as humidity increases, although not smoothly (maybe there are other factors influencing bike rentals at certain humidity levels).

Another thing I would like to do is break the rentals down into registered and casual users and see whether timing affects them differently. Registered users might ride to work, while casual users might ride for leisure. Let's see this on a graph.
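A quick way to draw that comparison (base R graphics; the post's own plotting code isn't preserved):

# average hourly rentals for casual vs registered users
hourly <- aggregate(cbind(casual, registered) ~ time, data = train, FUN = mean)

plot(hourly$time, hourly$registered, type = "l", col = "blue",
     xlab = "hour of day", ylab = "average rentals")
lines(hourly$time, hourly$casual, col = "red")
legend("topleft", legend = c("registered", "casual"),
       col = c("blue", "red"), lty = 1)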

Looking at the graph, we can confirm our assumption. We could actually build separate models for casual and registered counts. Some of the features might remain the same, but we might model time differently for each. Once we have the counts for casual and registered, we can add them up to get the predicted total count.

This is some of my thought process when given data. I applied the same simple methodology to the other features, like windspeed and day (Sunday is somehow important), and so on: analyze the data, then feed it into a model.

Image Filtering with Convolution Matrix

I was reading about convolution matrices, which are brilliantly explained by Colah. If you're looking for a good blog to follow, you might want to add his to the list. One application of convolution matrices is image manipulation, like applying filters to an image.

The cool part is that the filters are just matrices that you apply to the image. If you want to know more about the inner workings of convolution matrices in image processing, check out this brief explanation from GIMP.

There are several packages with a convolution filter built in, like OpenCV, but just for fun I implemented image filtering from scratch.
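The original listing isn't preserved in this extract, so here is a from-scratch sketch of the idea (kernel names follow the post; file names are placeholders):

import numpy as np
from PIL import Image

edge_detect_kernel = np.array([[-1, -1, -1],
                               [-1,  8, -1],
                               [-1, -1, -1]])

motion_blur_kernel = np.eye(9) / 9.0

def convolve(image, kernel):
    """Apply `kernel` to a grayscale image, leaving the border pixels untouched."""
    h, w = image.shape
    kh, kw = kernel.shape
    pad_h, pad_w = kh // 2, kw // 2
    out = image.astype(float).copy()
    for y in range(pad_h, h - pad_h):
        for x in range(pad_w, w - pad_w):
            region = image[y - pad_h:y + pad_h + 1, x - pad_w:x + pad_w + 1]
            out[y, x] = np.sum(region * kernel)
    return np.clip(out, 0, 255).astype(np.uint8)

img = np.array(Image.open("input.png").convert("L"))
kernel = edge_detect_kernel   # swap in any of the pre-defined kernels
Image.fromarray(convolve(img, kernel)).save("output.png")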

Play around with the kernel variable and change it to any of the pre-defined kernels.

Results

Here are some results.

Original image

Using edge_detect_kernel

Using motion_blur_kernel

As you may notice, the edge of the image is left as in the original. This is expected. There are several methods to deal with the unprocessed border pixels; the easiest would be to crop the image :p

Face Tracking with OpenCV

I've been toying with OpenCV lately, and surprisingly it is pretty easy to implement face tracking with it. However, OpenCV's documentation is a bit lacking, especially for its Python binding.

Installing OpenCV

On OSX (yes, I've moved to OSX from Windows since I do a lot of *nix development now), this should be pretty easy.

$ brew install opencv

Note: a bit of a caveat here if you're using pyenv to manage your Python versions. I had to use the system Python in order to correctly build the OpenCV Python binding, i.e. in the terminal, issue pyenv global system to make pyenv use the Python provided by the system. On my machine, this sets pyenv to use my brewed Python.

Face-tracking

Haar-Cascade

We are going to use a Haar cascade to detect faces. You can read more about how it works in the OpenCV documentation, but in general it looks for regions that define a face. For example, it scans the image looking for features such as one area being darker than another, which might indicate the eyes, the nose bridge, and so on. To make it run faster, they build a cascade of classifiers (hence the name): features are grouped, and the groups are checked one by one. A region needs to pass all these classifiers before we can conclude it is a face.

It is actually possible to use a Haar cascade to detect other objects as well, but you need to train it first, and that takes hours to days. So I'm just going to use the ones provided by OpenCV.

The important piece is detectMultiScale(image[, scaleFactor[, minNeighbors[, flags[, minSize[, maxSize]]]]]). To understand what each parameter does, refer to this good SO thread.
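For reference, a minimal detection sketch using the stock frontal-face cascade (file paths and parameter values here are illustrative):

import cv2

cascade = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")

img = cv2.imread("photo.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

faces = cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5,
                                 minSize=(30, 30))
for (x, y, w, h) in faces:
    # draw a green box around each detected face
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("faces.jpg", img)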

Moving to Ghost

I haven't posted anything for the past couple of months. I have just finished moving all the content to the Ghost blogging platform.

Previously, I was hosting my blog on GitHub Pages, which means I had to use some sort of static site generator and push the content to GitHub Pages. I was using Hexo to generate the static files for my blog. Hexo is great, and so are GitHub Pages.

After a while, it just got tiring to go through the whole process: write in Markdown, generate static files, push to GitHub. I did give Ghost a try when it was first released, but it was not good enough at that point. Now it is getting better, so I decided to move to Ghost and focus on the content rather than the process.

Creating a Solr Plugin

While trying to implement a Solr indexer, I ran into the problem of how to normalize the date format to what Solr expects. Surprisingly, this question is asked quite often on Stack Overflow. So I decided to write a Solr plugin in the hope of solving the problem. However, at the end of the day it turned out that this approach does not actually solve my problem, due to one of Solr's peculiarities. Nonetheless, I hope this write-up helps anyone who is trying to develop a Solr plugin, as most examples out there are outdated. I will also explain why it doesn't work (in my case).

The Solr version I'm working on is 4.4. There are 2 + 2 steps in creating the plugin:

1. Create the plugin factory class.
2. Create the plugin's real implementation, which will be called later by (1).

Creating the .jar file

At first I was stuck on where to place this jar file. I tried placing it inside Solr's home directory as suggested, and also tried loading it through solrconfig.xml, but neither worked for me. So I ended up placing it in /var/lib/solr/lib/; whenever Solr loads its dependencies, it also loads my jar file.

The unformatted date is indexed into the isodate_t field. It then gets copied to ztisodate, where my analyzer parses the date, reformats it according to Solr's format, and copies it on to trieisodate.
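In schema.xml terms, the field chain described above looks something like this (the field type for ztisodate, which would carry the custom analyzer, is a hypothetical name):

<field name="isodate_t"   type="text_general"         indexed="true" stored="true"/>
<field name="ztisodate"   type="normalized_date_text" indexed="true" stored="true"/>
<field name="trieisodate" type="tdate"                indexed="true" stored="true"/>

<copyField source="isodate_t" dest="ztisodate"/>
<copyField source="ztisodate" dest="trieisodate"/>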

It doesn't work because copyField copies from the input (the original source value) rather than from the analyzed output. So when I try to copy the result of ztisodate into trieisodate, it is actually copying isodate_t into trieisodate. Face palm. A waste of effort, but I'm glad I learned something from it.

Cross-domain Request with Rails and AngularJS

I was playing around with AngularJS recently and decided to integrate it with Rails. The goal is a total separation between front-end and back-end: the back-end acts as a RESTful service that spits out JSON. The front-end and back-end are served from different servers and have different domains, so there are a few options to achieve this. Whenever you make a cross-domain request, you are subject to the same-origin policy.

Options

JSON-P

At first I did it using JSON-P, but I don't like having a callback in the URL. I prefer to keep it clean, and there are also security issues with JSON-P, like XSS. Fortunately, it was quite easy to implement JSON-P in both Rails and Angular, and I didn't need any special treatment for GET versus POST requests; both were handled the same way.

The Rails server must be able to receive the callback.

# assuming this is the controller action that will receive the request
def index
  render :json => @posts, :callback => params[:callback]
end

CORS

CORS is the way to go, but not all browsers support it, and like everything in web development, support is inconsistent across browsers.

Proxy-server

Don't know shit about the implementation.

CORS Implementation

In order to understand CORS better, Google it and read a lot. To summarize, CORS requests can be divided into two kinds: 1. simple requests and 2. not-so-simple/complex requests. There are criteria for which category a request falls into. In short, if you're dealing with standard headers, a text/plain content type, and a GET request, chances are it will be a simple request and things will be easy. Otherwise, it will be a complex request.

Complex Request

When making a complex request, the browser sends a preflight request to the server before sending the actual request. If the server is okay with it, the browser proceeds with the actual request. The preflight is sent as an HTTP OPTIONS request, so it is important to respond to that request.

Server Side

Because not every server responds to OPTIONS requests out of the box, you need to implement this yourself. In Rails, you usually do this in routes.rb:

resources :posts

and this creates routes that correspond to the RESTful verbs GET, POST, PUT, PATCH, and DELETE. But it doesn't cover OPTIONS; you need to add that manually. So, add something like the route below in routes.rb to support the OPTIONS verb.
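The exact route from the original post isn't preserved, but one way to express it (Rails 3/4 style) is:

match '/posts', :to => 'posts#options', :via => :options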

Basically, this tells Rails: "Hey, if someone sends an OPTIONS request to /posts, please go to the posts controller and execute the options action." The action in the controller must then respond to this request. Usually, you would check whether the requesting domain is whitelisted and then send an OK to proceed with the actual request. For simplicity's sake, I'll just allow everything.
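A permissive options action along those lines might look like this (header values are illustrative; you would lock the origin down in production):

class PostsController < ApplicationController
  def options
    headers['Access-Control-Allow-Origin']  = '*'
    headers['Access-Control-Allow-Methods'] = 'GET, POST, PUT, PATCH, DELETE, OPTIONS'
    headers['Access-Control-Allow-Headers'] = 'x-requested-with, Content-Type'
    render :text => '', :content_type => 'text/plain'
  end
end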

Access-Control-Allow-Origin is where we control who has access. Also note that Access-Control-Allow-Headers includes x-requested-with. That header is usually sent with AJAX requests, and we don't want the server to reject the request because of it. If we allow it here on the server, we won't need to strip the header when sending the request from the client. You'll see this in the client-side code in a bit.

Client-side

For your information, I'm just using the seed project template provided by Angular. Nothing fancy.
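The client-side call itself is nothing special; a sketch of it in an AngularJS controller (module name, URL, and controller name are placeholders):

var app = angular.module('myApp', []);

// No JSONP callback here: the browser handles the CORS preflight for us.
app.controller('PostsCtrl', function ($scope, $http) {
  $http.get('http://api.example.com/posts')
    .success(function (data) { $scope.posts = data; })
    .error(function () { $scope.error = 'Could not load posts'; });
});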

Notice that, compared to JSON-P, this method is much cleaner since there is no callback anymore, although it is a bit more involved.

Gems

Of course, like all things in Rails, there is a gem for that. You can use the Rack-Cors middleware to handle CORS in Rails. And if you're developing an API-only back-end in Rails, you can use Rails-API, which strips out everything unnecessary and focuses purely on API support.

I believe in understanding the basic inner workings of anything before using it. I'm not trying to re-invent the wheel here, but that's how I learn. YMMV.