I'm not sure whether I'm answering this correctly, since I'm a newbie myself, but the one thing I didn't understand is why you set `local(2)` just before executing the kernel. You seem to have 4 elements and a global work size of 4, so why is the local work size 2? Changing that to 0 might solve your problem. By 0 I mean a null local size, so the implementation picks the work-group size itself: `cl::NullRange` in the C++ bindings, or `NULL` for `local_work_size` in the C API. This is just a thought...
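To make the question concrete, here is a rough Python model of how a global size of 4 with a local size of 2 gets split up. It is not OpenCL code, and `work_item_layout` is a made-up helper. It assumes a 1-D range with zero global offset, where `get_global_id(0) = get_group_id(0) * local_size + get_local_id(0)`.

```python
# Hypothetical helper: models (does not call) OpenCL's 1-D NDRange index math,
# assuming a zero global offset.

def work_item_layout(global_size, local_size):
    """Return (group_id, local_id, global_id) tuples for a 1-D NDRange."""
    if local_size <= 0 or global_size % local_size != 0:
        # OpenCL 1.x requires the global size to be a multiple of the local size;
        # a literal local size of 0 is not valid (use a null local size instead).
        raise ValueError("global_size must be a positive multiple of local_size")
    num_groups = global_size // local_size
    return [
        (group, local, group * local_size + local)
        for group in range(num_groups)
        for local in range(local_size)
    ]


for group, local, gid in work_item_layout(4, 2):
    print(f"group {group}, local id {local} -> global id {gid}")
```

With `local(2)` you get 2 work-groups of 2 items each. With a local size of 4 you would get a single group of 4. The global ids are 0 to 3 in both cases. What changes is how they are grouped, which matters if the kernel uses `get_local_id`, barriers, or `__local` memory.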