Topic: Live vision (Read 13280 times)

If you have a moving robot, how do you want to handle its vision, given that it has to be processed by a program to find objects?

Do you just want a high speed camera to take a pic every sec, or is there a way to do it with live feed (like a video camera)? I know it's generally the same, but it's like .jpg vs. .wmv ... or does it just depend on whether your software can break a live feed into images and process it fast enough?

The webcams each have their own internal clocks, so they snap their frames independently. This makes the stereo vision more difficult because the left and right frames are not taken at the exact same time.

Increasing the frame rate will reduce the error, but I can't count on two independent webcams for any high speed tracking without some sort of frame/vector interpolation routine. If the bot is slow moving and the frame rate is high, then the data will be acceptable.

I'd like to take the webcams apart and wire them together somehow, but I hesitate without knowing what I'll find inside.

****

It's possible that the frames are being triggered from within the software drivers. I'm going to ask the manufacturer about it.

I guess my plan (when I add vision) is to couple a range finding sensor with my camera. Seems to me the more info you have about the picture you take, the better it can be used.

I'm not so interested in stereo vision I guess... after all, you can play a first person shooter (Half Life, Counter Strike, Halo) with only the illusion of 3d, right? I know stereo vision would help in terms of judging distances and everything, but I don't think it's strictly necessary... and I think processing would be better spent elsewhere...

Essentially, what you are doing to find or track objects is frame differencing, i.e. looking at how frames of the last picture are different from your current one. Video is just a bunch of pictures, so it really depends on the throughput of your system...
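A minimal sketch of frame differencing, assuming 8-bit grayscale frames stored as NumPy arrays (the threshold of 25 is an arbitrary choice, not anything from this thread):

```python
import numpy as np

def frame_difference(prev_frame, curr_frame, threshold=25):
    """Return a boolean mask of pixels that changed between two
    grayscale frames (uint8 arrays of the same shape)."""
    # Widen the dtype first so the subtraction can't wrap around.
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return diff > threshold

# Tiny example: a single pixel changes a lot, everything else is static.
prev = np.zeros((4, 4), dtype=np.uint8)
curr = prev.copy()
curr[1, 2] = 200
mask = frame_difference(prev, curr)
print(int(mask.sum()))  # -> 1
```

In practice you would clean the mask up (blur first, or discard isolated pixels) before treating a changed region as a moving object.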

One thing to consider is that JPGs are smaller but they take time to decode... you have to balance speed of transfer against speed of decoding...

You make a lot of good points. One can get by with a single camera. I'm trying to get to that next level of analysis.

You are correct. High speed tracking, in the projectile sense, can be done by differencing frame sequences at a high rate, but I want to track 3D objects in a way that allows me to more accurately calculate angles and compensate for lighting effects and distortions. I'm heading in the direction of a 3D modeling system for "spatial awareness," complete with "peripheral vision." Differencing between two frames that were taken at the same time will provide a more precise position for complex objects in both time and space. Imagine trying to track the full body motion of a martial artist in action.

Your point about JPG decompression is a good one, too. It would be better if a camera provided BMPs. I think they want to keep the transmission time down.

I've got a vision for this system. hehe. Pardon the pun. I do have a long way to go, so I can't answer every question about how it will work yet.

I don’t want to discourage anyone from experimenting with stereo vision in their robot – but I believe the use of dual cameras to develop a sense of depth perception has been way overrated. The sense of depth perception in human vision is a result of our very complex brain’s ability to combine the sensory inputs of motion, texture, shading, stereo vision, and learned knowledge into a judgment of perceived depth. The sense of depth that most people attribute to stereo vision is mostly derived from motion parallax. Put them into an unfamiliar environment – say a cave for instance – and all of their depth perception is lost, until a familiar object is placed into their field of view for a reference. I just measured the distance between my pupils for a quick calculation, and came up with roughly 2.10 inches. That means that at 10 feet away, the difference between the angles of perception of my eyes of a viewed object is only 1.003˚ (rectangular to polar conversion). That’s a very subtle difference of perspective. With a small robot, looking at an object at close range, and especially if the cameras are placed well apart – stereo vision may be useful. If you’re trying to get your robot to work at distance, start thinking along the lines of the construction of the Hammerhead Shark.
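That arithmetic checks out; here is the calculation spelled out (using the 2.10 in pupil spacing and 10 ft range from the paragraph above):

```python
import math

baseline_in = 2.10       # measured distance between pupils, inches
range_in = 10 * 12       # 10 feet, in inches

# Each eye sits half the baseline off the midline; the difference
# between the two lines of sight to the object is twice that angle.
angle_deg = math.degrees(2 * math.atan((baseline_in / 2) / range_in))
print(round(angle_deg, 3))  # -> 1.003
```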

One interesting idea for stereo vision depth perception that I had was to have the two cameras mounted on high-precision yaw servos. The servos would be able to point the eyes - basically, to go from looking straight ahead to looking cross-eyed. If you can do object recognition, then adjust the servos until the image is overlapped in both cameras, and the amount the eyes are turned in is a good indication of how far away the thing is.

Of course, this only works for fairly close objects, but that's the most important area as far as the robot is concerned...
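That vergence idea reduces to basic trigonometry. A sketch, assuming symmetric toe-in and a 2.1 in (~0.0533 m) camera baseline (the baseline figure is borrowed from the pupil-spacing measurement earlier in the thread, not from any real rig):

```python
import math

def distance_from_vergence(baseline_m, toe_in_deg):
    """Estimate target distance from how far each of two cameras has
    toed in to center the same object (symmetric vergence)."""
    # The target sits on the midline at depth d; each camera is
    # baseline/2 off the midline, so tan(toe_in) = (baseline/2) / d.
    return (baseline_m / 2) / math.tan(math.radians(toe_in_deg))

# Example: cameras 0.0533 m apart, each toed in by 0.5 degrees.
print(round(distance_from_vergence(0.0533, 0.5), 2))  # -> 3.05
```

The steep growth of range as the angle shrinks is exactly why this only works up close: at long range the servo resolution swamps the measurement.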

It says it can do 6 fps, which is OK by me. I plan on doing a 'world model' update at 4 Hz, mostly because my slowest sensor will be the GPS and it runs at 4 Hz.

But obviously I've got a long way to go... the camera is probably the last thing I'm going to work on, because for a small robot like mine I don't see it as necessary. So I expect you to have it all figured out by this summer!

Militoy - you are correct, but think about other phenomena that are angle sensitive, such as glare. I intend full sensory integration with cognitive function. My system is far less modular than you imagine.

JonHylands, you just described my concept for a 3-field, adjustable vision system. It allows one to change the balance between stereoscopic and peripheral motion sensitivity - thus optimizing processing power for any given situation.

I got the idea from looking at the difference between predator and prey - eyes forward or to the side. At first, I thought, why not have four eyes? Then I saw a way of using two cameras to form 3 fields of vision, the middle one being the overlapped portion for stereoscopy.

Notice that in your own field of vision, you can see a double image of your nose? That's kinda like the boundary of that middle field.

just to go back to dcole07's question for a second... there is no real difference between taking a frame "one every second" and a "live feed" other than the amount of delay you have to deal with. all live feed cameras are just taking a succession of single frames, so it's fairly trivial to break a live feed into single images.

obviously it would be better to process these images at the speed the camera sends them (say 50 every second), but if you start working at slower speeds first, you can speed things up once you have a working algorithm. the main thing you will need to run at those sorts of speeds is processing power. if you are working on a desktop PC you'll have no problems. from experience, the limiting factor on a small robot is how much processing power you can run off the batteries you have on board.

also relevant is what sort of image processing you are trying to do. everyone on here started talking about stereo vision because that's what a few of them are working on at the moment. there are far easier ways to make use of an onboard camera. movement recognition would be one of the simplest (read: least processor intensive), where you simply compare the current frame to the previous one and look for changes.

if you are interested in picking objects of a known shape out of their background, have a look at this library: http://opencvlibrary.sourceforge.net/. of particular interest to me were the routines for recognising a "chess board" pattern and determining the camera's position relative to it from the chess board's orientation and scale. again, this sort of technique would require something approaching a desktop PC to do in real time (ie, as quickly as the camera can provide the images).
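the "scale" half of that trick comes down to the pinhole camera model: if you know the real size of the pattern and the camera's focal length in pixels, the apparent size in the image gives you range. a toy sketch (all the numbers here are made up for illustration):

```python
def range_from_apparent_size(focal_px, real_width_m, width_px):
    """Pinhole model: a target of known width W appearing w pixels
    wide through a lens of focal length f pixels is at Z = f * W / w."""
    return focal_px * real_width_m / width_px

# Example: 800 px focal length, a 0.20 m wide board seen as 80 px wide.
print(range_from_apparent_size(800, 0.20, 80))  # -> 2.0
```

the library also recovers orientation (not just range) by fitting the full perspective projection of the grid's corners, which is well beyond this sketch.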

Do you just want a high speed camera to take a pic every sec, or is there a way to do it with live feed (like a video camera)?

So this really depends on what you need the camera to capture. If your robot is super slow, a low frame rate (say 3fps) will work fine. If you want your robot arm to catch baseballs going at 30mph, you probably need more like 80fps on your camera.
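To put numbers on that (using the 30 mph baseball figure above; the mph-to-m/s constant is the standard conversion):

```python
def travel_per_frame_m(speed_mph, fps):
    """Distance the target moves between consecutive frames."""
    speed_ms = speed_mph * 0.44704  # mph -> m/s
    return speed_ms / fps

# A 30 mph baseball moves ~17 cm per frame at 80 fps,
# but ~4.5 m per frame at 3 fps -- hopeless for tracking.
print(round(travel_per_frame_m(30, 80), 3))  # -> 0.168
print(round(travel_per_frame_m(30, 3), 2))   # -> 4.47
```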

Quote

The webcams each have their own internal clocks, so they snap their frames independently.

Some cameras have a 'trigger' option: when wired together, a single square wave commands both cameras to snap an image synchronously. In my lab I have two 10,000 fps capable cameras that I have linked by a low tech trigger for stereo vision. The issue, however, is that I need two PCs running simultaneously to keep up with the frame capture rate . . . So obviously if you want stereo vision, your processing rate will be halved. But you don't always need both cameras in sync. Assuming that the frame rate >> action being perceived, the error will be negligible.

But you will lose speed . . . might be better to attach a rangefinder to a single servo controlled camera, no?

That's my plan! I think a sonar would be best because of the similarities in what you view... i.e. the closest thing that gets ranged by the sonar will most likely be the thing that has edge overlap over everything else... seems reasonable... anyone come up with anything else?

Someone brought up sonar, which I was going to add but they beat me to it... that would work better than 2 cameras... and I would think it would be easier for object recognition and stuff...

Do you have any info on sonar? I read a little about it on the internet but it got me more confused... anything that would display it like an image, but with depth, not light? Something that's not single point sonar, so it tells you, say, 10 degrees to the left, 20 degrees up, and 10 meters away... but does that for the whole visible range.

Is there such a thing as 3d modeling, where the computer sets up a model of the world that it could understand but also allow people to view it? Like a map…

so the problem with sonar is you have to imagine it like a cone extending away from the sensor. it will tell you very precisely how far away the first thing it senses in that cone is. it will not tell you how far left, right, up or down the object it has sensed is.

as a result, sonar is great for sanity checking other sensors. ie, it's good for telling you that the area ahead is indeed empty. it is not particularly good at telling you exactly where an object is, though. it's just too hard to focus for that.

With vision, you could use referencing (like humans), but it requires a known object in view. Sonar can only do one point. 2 cameras are tricky unless you have a super fast frame rate, which then requires a lot of CPU. Just going with a camera could be risky with shadows and illusions…

I was thinking about vision, mixed with something like lidar to define borders and make it easier to pick out objects. Some way of turning a 2d image into a 3d world.

so you have me thinking about how this ultrasound imaging is done... if i was trying to reproduce it, i'd start with one sound source and 3 receivers. mount the receivers in a triangle and the source in the middle of that triangle. now when you send a ping from the source, record the echoes on all 3 receivers. there will be a slight difference in the time it takes the sound to travel out and back to the 3 different receivers.

let's simplify the situation and presume there is only one small object out there to reflect the sound. in this case you would see one reply ping on each of the receivers. apply some maths involving the speed of sound in air to the time it takes for the ping to return to each sensor. you now have the distance the object is away from each sensor. apply some basic trigonometry to these distances and you can tell which direction the object lies in.

now, if you pointed the same sensors at a busy room you would get more than just one reply ping on each sensor. working out the maths to turn these pings into a 3d scene holds similar problems to stereo vision, but it could definitely be done. at the very least you would get a solid range and direction for the first object to reflect the ping.
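a rough sketch of that trigonometry, with one big simplification: assume the echo times have already been converted into a straight-line distance from each receiver to the object (the real geometry has the ping travel source-object-receiver, which adds one more unknown). the receiver triangle here is just an example layout in the z=0 plane:

```python
import math
import numpy as np

def trilaterate(receivers, distances):
    """Locate a point from its distances to 3 receivers in the z=0
    plane. receivers: three (x, y) pairs; returns (x, y, z), z >= 0."""
    p1, p2, p3 = (np.asarray(p, dtype=float) for p in receivers)
    r1, r2, r3 = distances
    # Subtracting the sphere equations pairwise leaves two linear
    # equations in x and y.
    A = 2.0 * np.array([[p2[0] - p1[0], p2[1] - p1[1]],
                        [p3[0] - p1[0], p3[1] - p1[1]]])
    b = np.array([r1**2 - r2**2 + p2.dot(p2) - p1.dot(p1),
                  r1**2 - r3**2 + p3.dot(p3) - p1.dot(p1)])
    x, y = np.linalg.solve(A, b)
    # Height follows from the first sphere equation.
    z = math.sqrt(max(0.0, r1**2 - (x - p1[0])**2 - (y - p1[1])**2))
    return x, y, z

# Round trip: place an object, compute its distances, recover it.
obj = (0.5, 0.3, 2.0)
recv = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.866)]
dists = [math.dist(obj, (rx, ry, 0.0)) for rx, ry in recv]
x, y, z = trilaterate(recv, dists)
print(round(x, 3), round(y, 3), round(z, 3))  # -> 0.5 0.3 2.0
```

with real pings you'd feed it noisy distances, so a least-squares fit over more than 3 receivers would be the next step.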

hmm, i'd always presumed sonar was too imprecise for accurate positioning, but with a small rethink i see it's not. thanks for prodding the grey matter into life, guys!

I haven’t had too much of a chance to play with sonar in air. My efforts through the end of next year are concentrated on our Urban Challenge robot, and at the speeds we’re running and range we need to scan, ultrasonics are of limited use. I am however, planning to start working after the race, on a mine-exploring robot. Sonar seems to work pretty well at close range for bats – maybe I can make some use of it in combination with my IR camera and IMU. I did pick up a Harbor Freight portable fishfinder to play with and get a better feel for the technology - but I won't have an opportunity to break into it until after the New Year.

now when you send a ping from the source, record the echoes on all 3 receivers. there will be a slight difference in the time it takes the sound to travel out and back to the 3 different receivers.

I've seen this done before at the Bat Lab at the University of Maryland: http://www.bsos.umd.edu/psyc/batlab/movies.html (watch the movies, they are really cool!). They had to build a special room covered in specially shaped foam, and use like 20 microphones to track the bats. What is also neat is they can not only track the bat by its echolocation, but also track the insect, and the direction the echolocation is aimed at . . . if I remember correctly, the video is slowed down to 1/16th . . .

Sonar seems to work pretty well at close range for bats – maybe I can make some use of it in combination with my IR camera and IMU.

that page also has a lot of useful publications, although they tend more to understanding the bat and less toward making a robot from what they learn . . . new research has also found out that bats have an internal magnetic compass . . .

key points they gave me when i visited the bat lab:
- bats use sonar more often, and more directed, the closer they are to a target
- bat ears change shape and direction, but they have no data on it (maybe just like a barn owl?)
- when bats fly together, some bats never chirp, just using the echoes from other bats
- bats have 'voices,' so they can distinguish between their own chirp and another bat's chirp

ok, a little bit off track . . . i'm fairly sure the Didson device doesn't use any special microphone array . . . after all, it has 'sonar shadows' which wouldn't happen with an array . . . but otherwise, for cheap hobbyist sonar, what dunk says is correct - it's a single point reading.

The multiple receiver array is also used in both passive and active radar. There was a project proposed by and for SW radio ops to create that kind of array on the internet for tracking near earth objects. The military has an equivalent system. I'm just posting that for confirmation purposes.

One key feature about waves, be they audio or radio, is - the higher the frequency - the more beamlike the transmission.

If you think of a home speaker system, you can see how the tweeters are at ear level, the mids are also strategically positioned, but the subwoofer can be placed almost anywhere in the room and pointed in any direction. I won't bore you with the detailed explanation. Just remember that focus sharpens as frequency is increased.
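The rule of thumb follows from wavelength: diffraction spreads a beam by roughly λ/D radians for an emitter of size D, and λ = c/f, so higher frequency means a tighter beam from the same-sized transducer. A quick comparison (taking the speed of sound in air as 343 m/s):

```python
def wavelength_m(freq_hz, speed_ms=343.0):
    """Wavelength of a sound wave in air at roughly room temperature."""
    return speed_ms / freq_hz

# A 40 kHz ultrasonic ranger vs. a 100 Hz subwoofer:
print(round(wavelength_m(40_000) * 1000, 1))  # -> 8.6 (mm)
print(round(wavelength_m(100), 2))            # -> 3.43 (m)
```

An 8.6 mm wavelength is smaller than a typical transducer, so the beam can be focused; a 3.43 m wavelength dwarfs any home speaker, so the bass radiates in all directions.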

Hey Admin - I've wondered how the bats could identify their own click in such a crowd. Thanks. You've just answered that for me. I thought they must have some genetically imprinted pulse shape, but even that wouldn't help in a big swarm. The answer is social. Cool.

Since they can all be set to actively listen, you ping with one and then listen with all... this seems pretty doable, except it has some disadvantages...

your 3 sonars have to be far enough apart for it to make a difference... i.e. if they are within 5 cm or so it won't be as useful, because the resolution of the sonar isn't good enough for that small a change...

the second thing is, if the sonars are too far apart, 2 objects that are close to opposing sonars could confuse the system...

a normal camera takes in all the light in its range and makes a 2D image of it. An IR camera takes in all the heat rays and makes a 2D image of it.

So is there a camera that could do distance? (it would make an image like IR, but distance, not temp) I looked at that Lidar link that was posted, and in a pdf the site links to, it basically says the lidar thing is a 1D sensor (meaning it tells distance on 1 plane, horizontally)

2D meaning it scans distance horizontally and vertically, then the data retrieved is depth

or with this lidar sensor, do you have to make it rotate and have it just take slices? Then put the slices together to get a full image..

It all depends on your definition of 2d and 3d... a camera will tell you 3 things for each pixel it records (x position, y position, and color), but you can't use that for '3d' modeling without some processing.

the lidar (hokuyo) is a sensor that takes a sweep of readings and tells you the distance for each reading, which can be used to make a 2d view of the world at the level of the sensor

I’m not familiar with the Hokuyo LIDAR, but the way most users get “3D” from the SICK units is by recording each data “sweep” of the LIDAR as the robot is moving. Since the speed and direction of the robot are known, it’s a fairly straightforward step from there to build a “moving map” of the nearby terrain and obstacles into memory, and then to reconcile it with pre-stored route data. As the scan moves over each object from a different perspective, the shape of the object can be determined, to an extent. The data is in “slices”, but more than one LMS unit can be used – and their ranges can even be crossed and compared. The main use of them is for collision avoidance and edge detection – and they are at least decent for that at low speeds.
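The bookkeeping step of that "moving map" can be sketched simply, assuming a planar robot pose and a scan delivered as range/bearing pairs (the frame conventions here are my own choice, not any particular LIDAR's):

```python
import math

def scan_to_world(scan, pose):
    """Transform one LIDAR sweep into world coordinates.

    scan: (range_m, bearing_rad) pairs in the sensor frame
    pose: (x, y, heading_rad) of the robot in the world frame
    """
    x, y, heading = pose
    points = []
    for r, bearing in scan:
        theta = heading + bearing
        points.append((x + r * math.cos(theta), y + r * math.sin(theta)))
    return points

# Robot at (1, 0) facing +y; a reading 2 m dead ahead lands near (1, 2).
pts = scan_to_world([(2.0, 0.0)], (1.0, 0.0, math.pi / 2))
print(round(pts[0][0], 3), round(pts[0][1], 3))  # -> 1.0 2.0
```

Accumulating these points sweep after sweep, with the pose updated from odometry or GPS, is what builds the terrain map; the hard part in practice is how much pose error smears the map.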

You'll need a high sample rate. You'll need to use a pulse rather than a continuous tone. I think that the pulse should be one sample in length - but I'll defer to the experts. You'll need to isolate your pulse from ambient noise, possibly with a high pass filter.
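One common way to pull a known pulse out of ambient noise is cross-correlation with the pulse shape (a matched filter) rather than a plain high pass filter. A toy sketch with NumPy (the pulse shape, noise level, and arrival sample are all arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# A short, known pulse shape buried in noise starting at sample 300.
pulse = np.sin(np.linspace(0, 4 * np.pi, 32))
signal = 0.05 * rng.standard_normal(1000)
signal[300:332] += pulse

# Matched filter: cross-correlate the noisy signal with the known
# pulse; the correlation peak marks the pulse's arrival sample.
corr = np.correlate(signal, pulse, mode="valid")
arrival = int(np.argmax(corr))
print(arrival)  # -> 300
```

The arrival sample times the sample period gives the echo's time of flight, which is the quantity the trilateration math upthread needs.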

Options... What layout did you have in mind for your array? Will your receivers be in a line or a triangle? How directional are your sensors of choice?

What is the field of view that you would like to achieve? Is your goal to achieve this field of view with or without moving parts?