Can I use gpu::Stream for CascadeClassifier_GPU on OpenCV and how?

I use OpenCV 2.4.10 (the current stable version) and gpu::CascadeClassifier_GPU::detectMultiScale to detect faces. I want to run it asynchronously with my own CUDA kernel code, which I launch in a separate cudaStream_t. But by default CascadeClassifier_GPU launches in the default (zero) stream, which makes it impossible to run anything else on the GPU asynchronously with it.

As far as I can see, there is no way to use gpu::Stream with CascadeClassifier_GPU: OpenCV DOC link

Can I use gpu::Stream for CascadeClassifier_GPU and how?

If not, in which version of OpenCV can I do it?

UPDATE: So far the only way I've found is to call gpu::CascadeClassifier_GPU::detectMultiScale from a separate CPU thread and run it on a separate GPU. But for this I need at least 2 GPUs.

Best How To :

CascadeClassifier_GPU uses a mixed GPU/CPU implementation and performs extra synchronizations internally; that's why it doesn't support an asynchronous mode with a gpu::Stream parameter. To launch it asynchronously with your code, you need to use a separate CPU thread for it.
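A minimal sketch of the "separate CPU thread" idea, in plain C++11 so that it runs without OpenCV or CUDA. Both detectFacesBlocking and runOwnWork are hypothetical stand-ins (they are not OpenCV or CUDA functions): the first takes the place of the blocking detectMultiScale call, the second takes the place of the code that launches your kernels in your own cudaStream_t.

```cpp
#include <chrono>
#include <future>
#include <thread>
#include <vector>

// Hypothetical stand-in for a blocking detector call such as
// gpu::CascadeClassifier_GPU::detectMultiScale. It only sleeps and
// returns a dummy count, so the sketch runs without OpenCV.
int detectFacesBlocking(const std::vector<unsigned char>& frame) {
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    return frame.empty() ? 0 : 1;
}

// Hypothetical stand-in for your own work. In the real program this is
// where you would launch kernels in your own cudaStream_t.
int runOwnWork(int n) {
    int sum = 0;
    for (int i = 0; i < n; ++i) sum += i;
    return sum;
}

struct Results {
    int faces;  // value returned by the detector thread
    int own;    // value returned by the work done on the calling thread
};

Results runConcurrently(const std::vector<unsigned char>& frame) {
    // std::launch::async forces the detector onto a separate CPU thread.
    std::future<int> faces = std::async(
        std::launch::async, [&frame] { return detectFacesBlocking(frame); });

    // The calling thread is free to do its own work in the meantime.
    const int own = runOwnWork(100);

    // get() blocks until the detector thread has finished.
    return Results{faces.get(), own};
}
```

This only shows the host-side structure. Whether the GPU work of the two threads actually overlaps depends on the hardware and on the synchronizations inside the classifier, which this sketch does not model.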

The pointer has to be created (i.e. allocated) with cudaHostAlloc, even on integrated systems like Jetson. The reason for this is that the GPU requires (zero-copy) memory to be pinned, i.e. removed from the host demand-paging system. Ordinary allocations are subject to demand-paging, and may not be used as zero-copy...

You can get each point of the raster line using cv::LineIterator class, e.g.: // grabs pixels along the line (pt1, pt2) // from 8-bit 3-channel image to the buffer LineIterator it(img, pt1, pt2, 8); LineIterator it2 = it; vector<Vec3b> buf(it.count); for(int i = 0; i < it.count; i++, ++it) buf[i]...

You're using a Ptr<DescriptorMatcher> so you should dereference it in order to call the method... matcher.knnMatch(descriptorsLeft, descriptorsRight,3); //error matcher->knnMatch(descriptorsLeft, descriptorsRight,3); // should be better ...

The reason the error doesn't occur on this line: REAL tmp = unew_row[j]; // no error on this line is because the compiler is optimizing that line out. It doesn't do anything useful, and so the compiler completely eliminates it. The compiler warning: xxx.cu(87): warning: variable "tmp" was declared but...

The performance difference you observe is mostly due to the increased instruction overhead in the pitched memory indexing scheme. Because your array size is a large power of two in the major direction, it is very likely that the pitched array allocated with cudaMalloc3D is the same size as the...

Remove all references to the library. Somewhere that project is pointing at the path you give above and you need to remove that. Then add the library into the executable project. Right click->add->existing item, change the type to all files, then browse to the file location. ...

Answers in order: 1) "r" is the pixel's radius with respect to the distortion center. That is: r = sqrt((x - x_c)^2 + (y - y_c)^2) where (x_c, y_c) is the center of the nonlinear distortion (i.e. the point in the image that has zero nonlinear distortion. This is usually...
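As a small illustration of the radius formula above, here is a sketch in C++. The function name distortionRadius and its parameter names are chosen for this example only; the coordinates are assumed to be in the same units (for instance pixels).

```cpp
#include <cmath>

// Radius of the point (x, y) with respect to the distortion center
// (xc, yc): r = sqrt((x - xc)^2 + (y - yc)^2).
double distortionRadius(double x, double y, double xc, double yc) {
    const double dx = x - xc;
    const double dy = y - yc;
    return std::sqrt(dx * dx + dy * dy);
}
```

A point at the distortion center itself has r = 0, which matches the description of the center as the point with zero nonlinear distortion.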

Effective load throughput is not the only metric that determines the performance of your kernel! A kernel with perfectly coalesced loads will always have a lower effective load throughput than the equivalent, non coalesced kernel, but that alone says nothing about its execution time: in the end, the one metric...

Training a bag of words system goes as follows:
1. Compute the features for each image of the training set
2. Cluster those features
3. Label each cluster with the images that have features in that cluster
At this point the training is done and you can start with the testing as follows:...

This code will work for very specific dimensions but not for others. It will work for square matrix multiplication when width is exactly equal to the product of your block dimension (number of threads - 20 in the code you have shown) and your grid dimension (number of blocks -...

You need to know the camera's intrinsic parameters, so that you can also know the distance between pixels in the same units (mm). This distance between pixels is obviously true only for a certain distance from the camera (i.e. the value of the center pixel). If the camera matrix is K...

For 20000 random points with about 27 neighbors for each point, this function gave me a speed-up. It needed about 33% less time than your original method. std::vector<std::vector<cv::Point> > findNeighborsOptimized(std::vector<cv::Point> p, float maxDistance = 3.0f) { std::vector<std::vector<cv::Point> > centerbox(p.size()); // already create an output vector for each input point /*...

As per the documentation, the cv2.adaptiveThreshold() returns only 1 value that is the threshold image and in this case you are trying to receive 2 values from that method, that is why ValueError: too many values to unpack error is raised. After fixing the issue the code may look like:...

just do the obvious thing, and specify your c, c++ compiler and the make tool in question: cmake -G "MinGW Makefiles" -DCMAKE_MAKE_PROGRAM="D:/Programme/MinGW/bin/mingw32-make.exe" -DCMAKE_CXX_COMPILER="D:/Programme/MinGW/bin/mingw32-g++.exe" -DCMAKE_C_COMPILER="D:/Programme/MinGW/bin/mingw32-gcc.exe" -DWITH_IPP=OFF .. (ofc. your path will vary, but i hope, you get the idea) ((if you read between the lines - the opencv devs seem to...

As I suspected, it's the codec! I tried many of them, but then I found this question: Create Video from images using VideoCapture (OpenCV). I then used the MJPG codec in: outputVideo.open(name, CV_FOURCC('M', 'J', 'P', 'G'), 25, size, true); // create a new videoFile with 25fps and it worked! Here's...

There are various atomic functions which support atomic operations on unsigned long long int (ie. a 64-bit unsigned integer), such as atomicCAS, atomicExch and atomicAdd. And if you have a cc3.5 or higher GPU you have even more options. Referring to the documentation on clock64(): long long int clock64(); when...

You have not linked the executable against several libraries that are required by the program. Try using this: g++ -lpthread `pkg-config opencv --libs` -I/usr/local/include/ -lraspicam -lraspicam_cv -L/opt/vc/lib -lmmal -lmmal_core -lmmal_util -I/usr/include -lwiringPi test3.cpp -o test3 ...

So I tried different methods for this problem and the only way I could achieve a better performance than Matlab was using memcpy and directly copying the data myself. Mat out( index.cols, w2c.cols, w2c.type() ); for ( int i=0;i<index.cols;++i ){ int ind = index.at<int>(i)-1; const float *src = w2c.ptr<float> (ind);...

A few things: use sendall instead of send, since you're not guaranteed everything will be sent in one go; pickle is OK for data serialization, but you have to make a protocol of your own for the messages you exchange between the client and the server, this way you can know...

The main thing to take away is energy function used in this context is any function that is used for a maximization problem. Here, the energy function is the sum of gradients/derivatives/differences (i.e. "detected borders likelihood" in this case). Since you seem to have a non-algorithmic background, I suggest you...

The camera calibration process estimates the intrinsic camera parameters: the camera matrix, usually denoted K, and the lens distortion coefficients, D. (NB: the rotation translation matrices of the camera with respect to the pattern are also computed for each image used for the calibration, see "Extrinsic_Parameters", but they are generally...

Use cv2.fillConvexPoly so that you can specify a 2D array of points and define a mask which fills in the shape that is defined by these points to be white in the mask. Some fair warning should be made where the points that are defined in your polygon are convex...

No, this won't be possible. K20m can be used (with some effort) with OpenGL graphics on Linux, but at least up through Windows 8.x, you won't be able to use K20m as a D3D device in Windows. The K20m does not publish a VGA classcode in PCI configuration space, which...

Downloaded 2.4.11 version couple weeks ago, so I guess that's the latest stable 2x version. You should be fine learning stuff from whole 2.4 version, most of them are essentially the same, this newspost tells that 2.4.3 version was more a bug and performance update. Offtopic, learning via Youtube videos...

You can use cv2.resize . Documentation here: http://docs.opencv.org/modules/imgproc/doc/geometric_transformations.html#resize In your case, assuming the input image im is a numpy array: maxsize = (1024,1024) imRes = cv2.resize(im,maxsize,interpolation=cv2.CV_INTER_AREA) There are different types of interpolation available (INTER_CUBIC, INTER_NEAREST, INTER_AREA,...) but according to the documentation if you need to shrink the image, you should...

What you are observing is probably an artifact of running the code on a Windows WDDM platform. The WDDM subsystem has a lot of latency which other platforms are not hampered by, so to improve overall performance, the CUDA WDDM driver performs command batching. This can interfere with the expect...

You should always do things that improve the readability and understandability of your code when first learning a language. (And, in many cases, well beyond that point.) Readability of code should be your number one priority at this point. That being said, functions do not really cost any more time...

My suggestion is to save the Mat using the FileStorage class via JNI. The following code can be used to save a Mat to a file: FileStorage storage("image.xml", FileStorage::WRITE); storage << "img" << mat; storage.release(); Then send the file over a socket and retrieve the Mat back from the file: FileStorage fs("image.xml", FileStorage::READ); Mat...

As @JaredHoberock pointed out, probably the key issue is that you are trying to compile a .cpp file. You need to rename that file to .cu and also make sure it is being compiled by nvcc. After you fix that, you will probably run into another issue. This is not...

The only actual problem in your code is here: cudaMalloc( &d_x,sizeof(d_x) ); sizeof(d_x) is just the size of a pointer. You can fix it like this: cudaMalloc( &d_x,sizeof(x) ); If you want to find out if a CUBLAS API call is failing, then you should check the return code of...
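The difference between the two sizeof expressions can be checked on the host without CUDA. The sketch below assumes, as the suggested fix implies, that x is a statically sized array; the element type float and the element count kCount are made up for this example.

```cpp
#include <cstddef>

// Hypothetical element count, standing in for the size of the array x.
constexpr std::size_t kCount = 1000;

// sizeof applied to a pointer such as d_x yields the size of the pointer
// itself (typically 4 or 8 bytes), not the size of any buffer.
std::size_t pointerBytes() {
    float* d_x = nullptr;
    return sizeof(d_x);
}

// sizeof applied to a statically sized array such as x yields the size
// of the whole array in bytes.
std::size_t arrayBytes() {
    float x[kCount];
    return sizeof(x);
}
```

If x were itself a pointer (for example, memory obtained from new or malloc), sizeof(x) would again be only the pointer size, and the byte count would have to be computed as the element count times sizeof(float).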

This is not the right way to test for type conversion. OpenCV's data variable in cv::Mat is always of type uchar. It is basically a pointer to memory, but it doesn't mean that the data is uchar. To get the type of the image data use the type() function. Here...

Your code works for me. But you used cv::waitKey(0), which means that the program waits there until you press a keyboard key. So try pressing a key after drawing, or use cv::waitKey(30) instead. If this doesn't help you, please add some std::cout in your callback function to verify it is...

Not sure what's wrong with the original C-like code, but I managed to get it working with C++-like code: using OpenCvSharp; using OpenCvSharp.CPlusPlus; // ... var image = new Mat("Image.png"); var template = new Mat("Template.png"); double minVal, maxVal; Point minLoc, maxLoc; var result = image.MatchTemplate(template, MatchTemplateMethod.CCoeffNormed); result.MinMaxLoc(out minVal, out...

OpenCV is a framework written in C++. Apple's reference tells us that "You cannot import C++ code directly into Swift. Instead, create an Objective-C or C wrapper for C++ code." So you cannot directly import and use OpenCV in a Swift project, but this is actually not bad at all...

(The author of JCuda here (not "JCUDA", please)) As mentioned in the forum post linked from the comment: It is not impossible to use structs in CUDA kernels and fill them from JCuda side. It is just very complicated, and rarely beneficial. For the reason of why it is rarely...

You need to make your OpenCV jar available to both the IDE as well as the application server. I believe you've already made it available to your IDE by adding it to your web project's classpath. Now to satisfy the dependency when running on the application server too, just copy...

Plenty of solutions are possible. A geometric approach would detect that the one moving blob is too big to be a single passenger car. Still, this may indicate a car with a caravan. That leads us to another question: if you have two blobs moving close together, how do you...

As hinted by the commenter, I’ve tried creating a single instance of CudaDirectXInteropResource along with the D3D texture. It worked. It’s counter-intuitive and undocumented, but it looks like cuGraphicsUnregisterResource destroys the newly written data. At least on my machine with GeForce GTX 960, Cuda 7.0 and Windows 8.1 x64. So,...