Topic: Universal Objects Database (Read 9489 times)

Eventually everybody meddling with robotics wants to be able to detect objects, such as cans of Coke, in order to do stuff with them.

This requires a lot of thought, such as storing known physical dimensions, colours, decals, weights, and all of the possible properties and attributes of the objects. As well as this, it is also a good idea to think about a way of storing methods for that object (like the method of opening a can and pouring it).
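A record like the one described could be sketched as a plain structure. The field names and values below are illustrative assumptions, not a proposed standard:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of one object record holding properties and methods;
# every field name and value here is an assumption for illustration.
@dataclass
class ObjectRecord:
    name: str
    dimensions_mm: tuple      # (width, height, depth)
    colours: list             # dominant colours as RGB tuples
    weight_g: float
    decals: list = field(default_factory=list)   # text/logos on the object
    methods: list = field(default_factory=list)  # e.g. "open", "pour"

coke_can = ObjectRecord(
    name="coke can",
    dimensions_mm=(66, 115, 66),
    colours=[(200, 16, 46)],   # approximate Coca-Cola red
    weight_g=383.0,            # full 330 ml can, approximate
    methods=["grasp", "open", "pour"],
)
```

The point is only that properties and usage methods live in the same record, so a robot that finds the object can also look up what to do with it.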

It is time-consuming work to devise a standard way of defining all of the above and to implement an easily searchable database structure for it all.

You also have to think about which sensor might be used to detect it, so the database has to work with cameras, rangefinders, etc. (rangefinders can extract object shapes, meaning they can search a database and get a shortlist; with no colour data the object search isn't going to be precise, but it is still a functional tool).

Once you have the database structure made, you've then got to populate it (this could be done autonomously by a robot programmed to detect things with as little intervention as possible, or wholly by the human user). This whole process is also going to be very time-consuming for any one person alone.

We also need to think of other storage methods, so as well as building a 3D Cartesian-style map of the object dimensions, maybe we can also add a 3D Kohonen feature map as an additional way of storing the data.

I'm suggesting that, to be a progressive community within robotics, we should all look to collectively create a robotics standard for object storage and retrieval, and also create a basic object library, so that when people do reach this stage they have a basic object set to work with.

There seems to be a lack of robots around the household to help with daily life, and I think that the whole issue of object definitions, methods, etc. is the primary reason there hasn't been much progress in domestic robotics.

A standardized camera would definitely be needed, and likely a standard MCU. I have thought about playing with camera vision for a while now, but have too much on my plate for the time being. However, my thought process was to use multiple MCUs for object detection. Basically something like this:

Basically a lot of separate functions passed off to a lot of MCUs to allow processing as close to real time as possible, letting image processing become very granular in nature. Another layer might be added between the current layer 2 and layer 3, to allow individual part breakdown lookup in a table, which would then get fed to the last layer. Basically a breakdown that says "we have objects a, b, c, and d that in our database correspond to an eye, a human-type nose, a human-type mouth, etc."; the last layer then says "we know all these object types and this layout are part of a human face, so this overall object must be a human face".
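The last two layers described above can be sketched as two lookup tables, one mapping raw detections to parts and one mapping sets of parts to a whole object. All labels and table contents here are invented for illustration:

```python
# Layer mapping raw detection labels to named parts (all entries invented).
PART_TABLE = {
    "a": "eye", "b": "nose", "c": "mouth", "d": "eye",
}

# Final layer mapping a set of parts to the overall object.
COMPOSITE_TABLE = {
    frozenset(["eye", "nose", "mouth"]): "human face",
}

def classify(detected_parts):
    """Map raw detections to part names, then the part set to a whole object."""
    parts = frozenset(PART_TABLE[p] for p in detected_parts)
    return COMPOSITE_TABLE.get(parts, "unknown")

print(classify(["a", "b", "c", "d"]))  # two eyes, a nose and a mouth -> "human face"
```

In a real system each table lookup would run on its own MCU, with only the small part labels passed between layers.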

Look at internet databases as an example. All they do is hold data. You can access them with any browser or a database utility, on any computer platform, with whatever processor, and using different programming languages.

This is why there is a need to create a standard system that can be used by many more setups. Otherwise, by the time you have your structure made and data in it, the cameras you based it around will have ceased to be manufactured.

What you do with the data is up to the end user. If you want to use an elaborate 7-MCU processing system then fair enough (I'm likely to use a system like that myself!), but a data repository isn't the complete system; it's only a part of a system that can be plugged into and used.

For example:

A rangefinder detects an object and extracts shape data from it (let's say a shampoo bottle). It can then search the database for a match on that data, and it is likely to find that it is, or resembles, a shampoo bottle.

At the same time, a system that uses a 640x480-pixel camera can get that exact same shape data and also search the database; this time the camera can additionally filter the results by colours or patterns on the bottle, so it can better refine its results.

As long as the shape data etc. is stored and searched for in a standard way, both types of sensors above can utilise the database. Maybe the rangefinder will come back with three types of objects that have the same shape, because it is limited to searching by shape alone, but it can still know that the object is a bottle or a container.
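The two searches above can be sketched against one toy database; the records, fields, and matching tolerance below are invented for illustration:

```python
# Toy shared database; shape is a (width, height) proportion pair and all
# entries are invented examples.
DB = [
    {"name": "shampoo bottle", "shape": (1.0, 3.2), "colour": "white"},
    {"name": "ketchup bottle", "shape": (1.0, 3.2), "colour": "red"},
    {"name": "coke can",       "shape": (1.0, 1.7), "colour": "red"},
]

def search(shape, colour=None, tol=0.1):
    """Shape-only search for a rangefinder; pass colour to refine (camera)."""
    hits = [r for r in DB if abs(r["shape"][1] - shape[1]) <= tol]
    if colour is not None:
        hits = [r for r in hits if r["colour"] == colour]
    return [r["name"] for r in hits]

# Rangefinder: shape only -> a shortlist of two bottle-shaped objects.
# Camera: the same shape plus colour -> a single refined match.
```

Both sensors go through the same `search` entry point; the camera simply supplies one more filter, which is the whole argument for a standard storage format.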

I hope there is a way to unify a data storage system like this, and I don't fancy tackling it alone! (All I get from colleagues when I explain the concept is "Good luck!". Everybody seems to want to do it, but it's such a big task that people shy away from it.)

I agree with Paul here. We don't need a standard computing platform because all we're doing is accessing information. I think by using standard web services and other real-time internet standards we can allow this information to be accessed by multiple systems.

Andrew, I do like your robust design for image processing... perhaps you'd be interested in just using a vision chip like the ARM9 CAP series to do that heavy lifting?

Ok. Now, unless otherwise directed, I am going to start planning the database schema for the information to be stored.

So far I'm thinking a 2D Cartesian-style map? My concern with 3D is that visual processing might not be able to detect the "depth" of an object. So what we typically end up with is the 2D representation and our "distance" to the object (assuming that we're facing the obstacle at a 0-degree angle).

Exactly the type of thing that I would like to see. I've not used RoboRealm, but it seems there is a database there to do this after all. Maybe we could elaborate on that system? See how it works and whether it could be improved, such as adding attributes for any sounds the objects make, known methods of using them, etc.

I would also like to see a database that everybody has on their own systems, which doesn't necessarily do a live internet search but instead connects to a shared repository maybe once a day, downloads any updates, and uploads any new objects it has discovered, or any new method or attribute it has found for an existing object. A system like this would be highly desirable and an extremely rich source of information.
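The once-a-day sync described above can be sketched with in-memory dictionaries standing in for the local database and the shared repository; the record format, revision counter, and all data are assumptions for illustration:

```python
# Simulated shared repository and one robot's local database. A "rev"
# counter marks newer versions of a record (an invented convention).
shared_repo = {"coke can": {"rev": 3}, "shampoo bottle": {"rev": 1}}
local_db    = {"coke can": {"rev": 2}, "mug": {"rev": 1}}

def daily_sync(local, shared):
    """Pull anything newer or missing, then push newly discovered objects."""
    # Download updates and new objects from the shared repository.
    for name, rec in shared.items():
        if name not in local or rec["rev"] > local[name]["rev"]:
            local[name] = dict(rec)
    # Upload objects the robot discovered that the repository lacks.
    for name, rec in local.items():
        if name not in shared:
            shared[name] = dict(rec)

daily_sync(local_db, shared_repo)
```

In a real system the two dictionaries would sit on different machines and the sync would be a network exchange, but the merge logic is the same.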

I don't think that the "database" you're thinking of exists as part of RoboRealm; it's a machine vision capability that can be part of a database.

I like the idea of a shared repository, but I'd be concerned about the size of the database for robotic applications. Maybe it's a good idea to have both 1) a central repository and 2) an internet-searchable version?

Not to be argumentative, but if you are not going to use a standard camera (not meaning a specific model, just a standard resolution), then you will have to store your object shape data using vectors. And while I haven't seen any hobby-level robot code that produces vector graphics from a robot-captured image, it would give you the best route to universal objects.

Forgetting about things like "this is a 1967 Cobra Streetjet" versus "this is a 1967 Cobra Streetjet 1/10th-scale model", you will have to have some type of standard object classification system, otherwise you just have a database of data that is only useful to the person who put it there. With standard resolutions you have "a coke can is 92px across and 143px tall from X distance". With vectoring you can say "a coke can has these proportions". I don't see any way you can have useful object shape classifications without using one or the other: standard resolution or vectored images.
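The difference between the two options can be sketched in a few lines: a pixel record is tied to one camera, while a proportional ("vectored") record is camera-independent. The numbers below are illustrative:

```python
# Sketch: convert a pixel bounding box into resolution-free proportions by
# scaling so the shortest side is 1 unit (an assumed convention).
def to_proportions(width_px, height_px):
    shortest = min(width_px, height_px)
    return (width_px / shortest, height_px / shortest)

# The same coke can seen as 92x143 px by one camera and 46x72 px by another:
a = to_proportions(92, 143)
b = to_proportions(46, 72)
# Both come out near (1.0, 1.55), so the records match despite the cameras
# having completely different resolutions.
```

This is the sense in which vectored data removes the need for a standard camera: the standard moves from the hardware into the stored representation.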

The reason I brought up specifying an MCU as a possibility is level of detail. An 8-bit processor uses, well, 8 bits. You could not realistically use an 8-bit processor for the same level of colours and resolution that you can with a 32-bit processor. Even internet databases have very specific formats (RFCs) that determine how each type of data is stored and delivered, as well as what that data may contain. HTML, RSS, XML, SQL, etc. all have some type of standard in place to prevent the data from becoming a trash heap.

Quote

Andrew, I do like your robust design for image processing... perhaps you'd be interested in just using a vision chip like the ARM9 CAP series to do that heavy lifting?

Sorry to stray off topic, but I just read the specs on the CAP; very robust. If I ever get around to a vision system, it would definitely warrant further investigation, especially with the ability to add custom logic. Nice.

One more thing before I head off to get some much-needed sleep. Before you get too far into this, I would recommend coming up with an object classification system very early on. It will make things much easier, not only for people to upload their objects into the database correctly, but also for people to easily find the types of objects they might need for their own robot project when they download from the db. Something like the following:

You make some very good points, and all should be considered. As I'm a Systems Engineer, I'll look at all of the aspects that everyone mentions, thinks of, contributes, or derives requirements from, and see how they can be put together. :-)

Nice discussion. I read it quickly, but I would like to mention a few things:

1) Such an image database would take PETAbytes of data. Each virtual object should have a 3D model with textures. I doubt the robot will see the object from the same angle every single time.

2) Searching through this database will require very fast processors. That means finding files and doing all the comparisons.

Also, I disagree with "standards"; I think you are talking about protocols. Each robot is a different robot and demands different hardware. If all robots had the same standard hardware, then all robots would be the same and would be limited to just a certain set of tasks. Protocols are like languages between machines: even radically different machines, if they communicate with the same protocol, will be able to transfer data. And this already exists: file formats are protocols, as are TCP/IP, USB, RS-232-C, and so on.

If such a searchable image database were possible with current technology, then Google would have already implemented it in its image search. And Google has several neural networks built on powerful server workstations, so if they can't do that, then good luck doing this on a single ARM.

I think this kind of stuff will only be plausible after: 1) current computer architectures get extremely fast and powerful (let's face it: binary logic is very inefficient when it comes to intelligent tasks); 2) revolutionary hardware that works like the human neuron is invented, and after we understand how the human brain works as well, especially the vision and visual memory centres.

Sorry if I sound pessimistic, but I really wish such technology existed. I already did my own research in the past, and that was my conclusion, which, unfortunately, is very sad.

Quote

1) Such an image database would take PETAbytes of data. Each virtual object should have a 3D model with textures. I doubt the robot will see the object from the same angle every single time.

2) Searching through this database will require very fast processors. That means finding files and doing all the comparisons.

While I think the master database might wind up being pretty huge, I don't think the intention is for a robot to contain all that data. I might be wrong about the intent, but I see it like this:

1) Office full of programmers like to drink Mountain Dew, but leave their cans floating around the office.
2) Ingenious programmer who is into robots decides to create a robot to go around the office and clean up the cans using a vision system.
3) Robot builder downloads the object information for a few items, such as Mountain Dew cans and a trashcan, and stores this data in a lookup table on the robot.
4) Robot builder saves a ton of time otherwise spent recording object information and classifying it themself with the robot.
5) Now that the cans are cleaned up, robot builder notices all the Snickers wrappers floating around.
6) Robot builder accesses the object database to download the object information for candy bar wrappers and adds it to the robot's lookup tables.
7) Management is happy that the programmers' area is now kept clean and gives the robot builder a huge raise.

I use the term "standard" because it is what the IETF calls such things.

Can visual processing by a robot currently determine that an object is 3D? IMO 2D is a good working assumption from visual imagery; please correct me if I am wrong.

Affordable 3D cameras are just around the corner (check out the ZCam by 3DV Systems, supposed to be released to the public sometime this year). It would probably pay off to do models, but simple ones; say, a simple cylinder for a coke can, which takes 170k for an Autodesk Inventor file.

Quote

Can visual processing by a robot currently determine that an object is 3D? IMO 2D is a good working assumption from visual imagery; please correct me if I am wrong.

Affordable 3D cameras are just around the corner (check out the ZCam by 3DV Systems, supposed to be released to the public sometime this year). It would probably pay off to do models, but simple ones; say, a simple cylinder for a coke can, which takes 170k for an Autodesk Inventor file.

According to their website:

Quote

3DV Systems has developed a unique video imaging technology and camera for sensing distance in real-time between an imaging sensor and the objects in its field of view (i.e. the objects' depth), at high speed and high resolution. The technology, which is based on the Time-of-Flight principle, is described thoroughly in several publications by the company's founders and engineers, and is well protected by international patents

This means that they don't do anything different from what machine vision does using a camera with an IR sensor, sonar, or an additional camera. The "depth" only represents the "distance" between your robot and the object. So, again, if you're looking at that coke can, you'll only be able to detect half of the can, or 180 degrees revolute about the can (i.e. a slice), at that instant in time.

A full model is 360 degrees. So at most the database might want to contain "sections" of objects?

Camera systems can determine 3D with more than one camera. It's also possible to make assumptions based on other factors (look at the vision tutorial on this site).

Maybe when objects are being catalogued by the robots, they can make the basic entry and add to it.

Also, searching even through a small sub-database like this will become very slow, which is why I suggested adding other systems, such as a Kohonen feature map for the objects, which is in itself an implementation of a software-based neural net.

Also, I think that implementing a good Kohonen-style map for the objects will make searches for small sections of objects a lot easier; for example, comparing a small section of the pattern rather than rotating an entire 3D object in every possible direction to see if it might fit.
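A minimal sketch of the Kohonen idea: a small 1-D self-organising map trained on 2-value shape signatures (width ratio, height ratio), so that similar shapes cluster onto neighbouring nodes and a query only has to visit a neighbourhood rather than the whole database. The node count, initial spread, learning rate, and radius are all illustrative assumptions, not tuned values:

```python
# Nodes of a 1-D Kohonen map, spread along the expected height-ratio range
# (an assumed initialisation for this toy example).
nodes = [[1.0, 0.5 + 0.5 * i] for i in range(8)]

def best_node(sig):
    """Index of the node closest to the signature (the 'winner')."""
    return min(range(len(nodes)),
               key=lambda i: sum((nodes[i][d] - sig[d]) ** 2 for d in range(2)))

def train(samples, epochs=50, rate=0.3, radius=1):
    for _ in range(epochs):
        for sig in samples:
            w = best_node(sig)
            # Pull the winner and its immediate neighbours toward the sample.
            for i in range(max(0, w - radius), min(len(nodes), w + radius + 1)):
                for d in range(2):
                    nodes[i][d] += rate * (sig[d] - nodes[i][d])

cans    = [[1.0, 1.7], [1.0, 1.8]]
bottles = [[1.0, 3.1], [1.0, 3.3]]
train(cans + bottles)
# Cans and bottles now win different, well-separated nodes, so a partial
# signature can be routed to a small neighbourhood of candidates.
```

This is the sense in which the map speeds up search: the winning node's neighbourhood is a shortlist, found without comparing against every stored object.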

Quote

Camera systems can determine 3D with more than one camera. It's also possible to make assumptions based on other factors (look at the vision tutorial on this site).

I'll only believe you on this one if the robot is looking at more than one viewing plane of the object. Your human eye can't see through objects either, and you can only visually capture 360 degrees of an object if you take multiple views.

A single view won't reveal anything more than a section, which may in fact be a 2D or 3D section of that object.

To prove this: look at the coke can from the front, and then, without moving yourself, your eyes, or the can, try to read the back of the can.

Either way, I think we're all mostly saying the same / similar thing.

Now back to building the database. Who wants to start providing raw visual sensor data that your robot sees? We need Content!

The 3D thing is not to do with seeing the back of an object. It's to do with seeing the depth of an object, and therefore determining whether an object exists depending on whether it protrudes from its environment, thereby gaining some kind of shape to search for.

If you just have a 2D image, it's very difficult to tell the difference between one object and another; this is where depth perception comes in.

Of course, the database doesn't need these perceptions; it just needs the overall shape, or, to be more precise, the dimensional graph. It's up to the end system or user to implement the detection systems.

It needs to be made clear now that this is just a database, and that it is up to the system or user to interpret their own data in order to use it.

Surely XML is a way forward. Databases are great for constantly changing data, but a Coke can doesn't change, although your view of it might. XML would also allow you to add all sorts of header tags that may help you scan across lots of DTD/XML schemas, i.e. "it's red", so it could be Coke, but it's "not a Pepsi".
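As a sketch of how such a tagged XML description might look and be filtered, using Python's standard parser. The tag names and attributes below are invented, not an agreed schema:

```python
import xml.etree.ElementTree as ET

# Hypothetical XML object description with searchable header tags.
xml_doc = """<object name="coke can">
  <tags>
    <colour>red</colour>
    <shape>cylinder</shape>
  </tags>
  <dimensions units="mm" width="66" height="115"/>
</object>"""

root = ET.fromstring(xml_doc)
colour = root.findtext("tags/colour")           # quick high-level filter
width = root.find("dimensions").get("width")    # read-only geometry data
```

The `<tags>` block is the part you would index for fast "it's red, so it could be Coke" filtering; the geometry underneath can stay as an unindexed, read-only payload.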

Quote

Can visual processing by a robot currently determine that an object is 3D? IMO 2D is a good working assumption from visual imagery; please correct me if I am wrong.

Affordable 3D cameras are just around the corner (check out the ZCam by 3DV Systems, supposed to be released to the public sometime this year). It would probably pay off to do models, but simple ones; say, a simple cylinder for a coke can, which takes 170k for an Autodesk Inventor file.

I knew you guys would misunderstand me. You don't need a 3D camera at all; 3D cameras are good when you want depth, and that is not the point. Without the 3D model database, you will need lots of pictures of the same object from different angles, otherwise your robot won't be able to identify it. For example, suppose your robot has a picture of a car in the database, showing the right side of the car. But then your robot approaches the car from the front, so the camera sees the front of the car. The right side and the front of a car are shaped very differently (unless you have a cube car). If you have a 3D model in the database, the computer can try rendering this model at different angles until it finds one 2D image that matches the actual view from the robot's camera. By the way, you guys should learn why and how 3D cameras are used before making an argument, otherwise your arguments will sound silly (that applies to any serious or academic discussion).

However, I would like to mention that I liked the idea of downloading just a set of items from a bigger database on some server. That is actually a nice idea; it is good to be able to dynamically change the settings of the machine. I had thought the database would live inside the robot.

Finally, I would like to mention that you guys are focusing too much on the database problem when that is, in my opinion, the smallest one. I think the actual image processing and recognition algorithm is the real issue here. Recognizing a ball or a line, or even a can of soda (talking only about the shape), is a relatively easy and very specific task. If you are planning on building a database, that is because you are going to have lots of objects, and most objects in the real world have colourful, complex, asymmetrical shapes. Programming for those shapes is going to be a nightmare, and you also have to program lots of filters, because the conditions will change from time to time. Finally, for a complete robot you would want to write the following three programs:

1) The easiest program, but still hard: selecting a (random) object from the database and searching for it.
2) The even (much) harder program: finding random objects in the real world and looking for them in the database.
3) The hardest: combining the above two in a single program.

So for problem (2), matching a real-world object against a database, this would benefit from some high-level indexing routines to quickly filter out wrong data, i.e. "it's red so it could be Coke", "it's purple so it could be Pepsi". Sorry to labour the soda gag.

This means taking some kind of sensor data, or many kinds, and using it as a quick high-level filter. This could be done on the "robot", then going to the database for the filtered results; e.g. there's no point checking whether it's a red hexagon, red pentagon, etc. if you know it ain't red.

It's a bit like using "keywords" for searching web pages. You could also make this adaptive, so that if you took a long time to find the actual "thing" in the database, you could automatically associate more "keywords" with it based on your initial sensor input. It would then be found quicker next time.
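The keyword filter and the adaptive part can be sketched together; the index contents and keyword names below are invented for illustration:

```python
# Toy keyword index: each object carries a set of sensor-derived keywords.
index = {
    "coke can":  {"red", "cylinder"},
    "pepsi can": {"purple", "cylinder"},
}

def filter_by_keywords(sensed):
    """High-level pass: keep only objects consistent with sensed keywords."""
    return [name for name, kw in index.items() if sensed <= kw]

def learn_keyword(name, keyword):
    """If the full search was slow, remember an extra cue for next time."""
    index[name].add(keyword)

# "It's red" rules the coke can in and the pepsi can out:
hits = filter_by_keywords({"red"})
# After a slow full search, associate a new keyword so the match is faster
# next time (the adaptive step):
learn_keyword("coke can", "shiny")
```

The cheap subset test runs on the robot; only the surviving shortlist goes to the database for expensive shape matching.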

Which brings me to Jesse's point:

Quote

I would think that XML can't be sifted through as fast as Relational or Object databases because of internal representation...

If the "answer" is something concrete and unchanging, say a CAD diagram of the Coke can, then that data isn't going to change (it's read-only). Equally, the individual points that make up the diagram probably aren't that useful for searching purposes. It's more likely that you want to find something that's a "red cylinder" rather than something that has a "point at (5,1,6) in 3D coords". In which case there's no point putting the CAD diagram into a database other than as a BLOB, in which case you might as well just store it as a file, and XML is great for that. XML is also a good format to run a process over to create the high-level indexes.

So the problem is how to build the high-level indexes that let you sift out the relevant results, and what sits on which machine. I would think the bulky, read-only data could sit elsewhere (e.g. on a server), but you could always load the indexes onto the robot.

I think you are missing the point that XML is text and needs to be parsed, every time. It's a strength and a weakness: it has ultimate portability and readability, but sucks for very, very large amounts of what could be binary data.

I'm imagining a 3D model, or 3D point cloud, stored in the database for each object; the model being looked at could then be matched to objects in the database with a Bayesian filter, but you would need access to almost your whole database to do that.

Quote

I think you are missing the point that XML is text and needs to be parsed, every time. It's a strength and a weakness: it has ultimate portability and readability, but sucks for very, very large amounts of what could be binary data.

I'm imagining a 3D model, or 3D point cloud, stored in the database for each object; the model being looked at could then be matched to objects in the database with a Bayesian filter, but you would need access to almost your whole database to do that.

Unless you thought of the system as a bipartite (two-part) system: the database does the heavy processing and object matching, while the robot sends in the "visual data" and just gets back "the answer". If the database kept updating every time it got new data, it would stay "current" and hopefully "fast enough" for the robot.
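The two-part split can be sketched with a local function standing in for the server side; the database contents and the nearest-match rule are assumptions for illustration:

```python
# "Server" side: the full database and the heavy matching live here.
# Keys are (width ratio, height ratio) shape signatures (invented data).
SERVER_DB = {
    (1.0, 1.7): "coke can",
    (1.0, 3.2): "shampoo bottle",
}

def server_match(observation):
    """Heavy matching stays server-side; the robot never sees the database."""
    best = min(SERVER_DB, key=lambda s: abs(s[1] - observation[1]))
    return SERVER_DB[best]

# "Robot" side: send a compact observation, get back only the answer.
def robot_ask(observation):
    # In a real system this would be a network call to the repository.
    return server_match(observation)

answer = robot_ask((1.0, 1.8))
```

The robot-side code stays tiny and hardware-agnostic; only the observation format crossing the boundary needs to be standardized.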

Well... as someone who is now doing an M.Sc. in computer science, and has taken both grad-level AI and vision courses...

You could have a database of images containing specific objects in focus, with their outlines marked in some file, and some description of the contents of the image at those coordinates, in a format of your choice. These images would not have to be taken with the same camera, or in the same environment; in fact, the more variability the better. Why would this be useful? To train learning algorithms for vision. Many algorithms in AI require training data (artificial neural networks being the most obvious example). Others require some data from which to build a database of characteristics common to certain types of objects (e.g. eigenvector-based approaches, such as eigenfaces for facial recognition).

Anyway, the whole point of such algorithms is that they learn by themselves, by seeing many examples of a coke can and many examples of a lamp, what the common characteristics of coke cans or lamps are. What you can hope to do is provide such examples, along with some useful accompanying information (such as which object is in focus and what its rough outline is).

You can't really hope to just look up a specific image you just filmed in a database of images taken with a "standardized camera". That is never going to work. Never has, never will. And no, you won't get everyone to use the same camera, especially because, you know, technology progresses, and whatever your "standard camera" is now will suck in 10 years.

Quote

You can't really hope to just look up a specific image you just filmed in a database of images taken with a "standardized camera". That is never going to work. Never has, never will. And no, you won't get everyone to use the same camera, especially because, you know, technology progresses, and whatever your "standard camera" is now will suck in 10 years.

Think about how a robotics hobbyist might go about sharing image-capture object data between robots. If you were to build two robots that you wanted to share the same type of data, you would probably want to ensure the cameras used on each robot supported the same kind of data; otherwise you are comparing apples to concrete blocks (metaphorically speaking).

That being said, I still believe the better approach is to have all captured data in some type of vector format, which would have to be... standardized. (I love that word.) One possible solution would be to require all vector data in the database to use a minimum point-to-point length of 1 unit, giving all robots that use the data a set reference for how the data will be presented. Here's an example:

We have two robots with two different cameras. Robot A has a camera with a resolution of 640x480. Robot B has a camera with a resolution of 320x320 (just to make it oddball).

Robot A captures an image of a rectangular white box that is 4 inches wide and 8 inches tall. The captured image is 200px across and 400px down. Robot A converts this to a vector where the width of the image is 1 unit and the height is 2 units. This gets sent to the database.

Robot B downloads this object into its lookup table. Robot B then comes across a white rectangular box. The captured image is 50px by 98px. Robot B converts this image into a vector, making the shortest side (the width) 1 unit; the height comes out to be 1.96 units. It looks for an object in its database and finds a probable match (around 98%) with the white rectangular box (based on dimensions alone).

Next might come colour recognition, or actual size determination based on distance, or internal composition recognition (the vector objects that make up the object as a whole), or any number of other things.

Without the standard minimum vector length of 1 unit, each robot will have to use up a lot more processing time to find matches as it upscales and downscales images. With the standard, the robot builder/programmer knows how the data will be presented from the global object database, and can write his code to match.
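The worked example above can be sketched directly; the match-score formula is an assumption chosen to reproduce the ~98% figure from the example:

```python
# The proposed 1-unit vector standard: shortest side becomes 1 unit.
def normalise(width_px, height_px):
    shortest = min(width_px, height_px)
    return (width_px / shortest, height_px / shortest)

def match_score(a, b):
    """Crude similarity: ratio of the smaller to the larger long side."""
    return min(a[1], b[1]) / max(a[1], b[1])

stored = normalise(200, 400)  # robot A's white box -> (1.0, 2.0)
seen   = normalise(50, 98)    # robot B's capture   -> (1.0, 1.96)
score  = match_score(stored, seen)  # 0.98, the ~98% match from the example
```

Neither camera's resolution appears anywhere in the comparison, which is the whole point of normalising before the data ever reaches the database.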

The vector standard would also allow easier integration of 3D objects into the database (not necessary, but it can be added in the future). Once a full 3D object is in the database using the vector standard, the robot builder can slice off sections of that 3D object to store in a lookup table at, say, every 15° of rotation, providing a higher-probability match. Or, with a fast enough MCU, he can dump the entire 3D object into the robot and do rotational comparisons on the fly.

The idea is to make the database useful for roboticists around the globe.