Different recognition results from SURF on CUDA and OpenCL

I use three implementations of the SURF algorithm in my program. When I ran my tests, I noticed that the OpenCL implementation detects far fewer objects in the image than the CUDA one. What could be the reason? Has anybody noticed something similar? The implementations were written by different people; could they differ in how the algorithm itself is implemented? P.S. All parameters are the same, and so are the sets of reference images. I can provide code samples and processing results.
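
One way to quantify the difference is to compare the keypoint lists produced by the two back ends directly. The following is only a minimal sketch, assuming the keypoints from each implementation have already been exported as plain (x, y) tuples. The function name `compare_keypoints` and the pixel tolerance `tol` are hypothetical and are not part of either SURF implementation.

```python
import math

def compare_keypoints(kps_a, kps_b, tol=2.0):
    """Greedily pair each keypoint in kps_a with the nearest unused
    keypoint in kps_b that lies within `tol` pixels (hypothetical helper)."""
    unused = list(kps_b)   # keypoints in B not yet paired
    matched = 0
    for xa, ya in kps_a:
        best_i, best_d = None, tol
        for i, (xb, yb) in enumerate(unused):
            d = math.hypot(xa - xb, ya - yb)
            if d <= best_d:
                best_i, best_d = i, d
        if best_i is not None:
            unused.pop(best_i)   # each B keypoint is used at most once
            matched += 1
    return {
        "matched": matched,
        "only_a": len(kps_a) - matched,
        "only_b": len(unused),
    }

# Made-up coordinates for illustration only, not real measurements.
cuda_kps = [(10, 10), (50, 50), (100, 100)]
opencl_kps = [(10.5, 10.2), (100, 101)]
report = compare_keypoints(cuda_kps, opencl_kps)
print(report)
```

A large `only_a` count with a small `only_b` count would suggest that one implementation is simply filtering out more candidates (for example through a different effective threshold), rather than finding different features altogether.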