Two things - Seach for name exact or simialr to user input, and name recognition

Hey,

I have an assignment for school, not too hard just stuck on a couple of things.

One the teacher ecpects us to search for a name the user inputs and match it to the closest match, or couple of matches i guess, never done this, so just a little confused on how to do it.
And the other thing is he gave us a txt file to read and write info to, such as fname sname pnumber etc, anyway we have to input the text and store it in the array, which is not hard but we are to expect that the first name and surname can both include white space, for example two first names and one surname or any other combo, the thing is we cant make any markers in the txt file such as a _ between the names which we get rid of in output. Any ideas would be helpful

Thnx in advance.

03-27-2008

anon

So, what are the couple of things you are stuck with? What do you have so far?

Neither looks like a trivial task. The second seems quite impossible (if there are no rules, I can't see how a program should sort out where the surname starts).

03-27-2008

grimuth

Well ive started the code now going on the assumption that the person only has one first name and one surname, just so i could finish the rest of the functions but my problem is the algorithm, i just cant think of a way, i dont want any code, im just not sure about the logic behing both issues:

1)searching for a name equal too or similar to user input
2) input from txt without any markers etc, assuming there can be white space in both first and surname.

03-27-2008

matsp

if you have NO markers, there is no possibility to determine what is a first name or last name. It's just not possible:
Peter James Osboure 1234567890

What is first name and last name of that? It's not possible to tell the difference between "Peter James" "Osbourne" and "Peter" "James Osbourne".

I would query the teachers thoughts on this, because it is either not well thought out, or you have misunderstood something.

--
Mats

03-27-2008

grimuth

thats what i thought but im waiting for his reply, anyway any ideas about the searching for similar names ?

03-27-2008

laserlight

Quote:

any ideas about the searching for similar names ?

You could search the Web for "soundex", though I am not sure if such an algorithm is overkill and your teacher just wants you to search for matching substrings.

03-27-2008

Perspective

>1)searching for a name equal too or similar to user input

You could look up (and implement) edit distance. Then you just find the record with the smallest edit distance to the query.

03-27-2008

matsp

Aside from soundex, you can do "least difference matching", which would involve counting how many characters in a row matches, and display the "most number of matching in a row". Or how large a proportion (e.g. a percentage) of the characters that match the search-term.

These are both fairly simple algorithms to use, so shouldn't be too difficult to implement.

--
Mats

03-27-2008

grimuth

Thanks Heaps for help, that should keep me going :)

03-27-2008

Perspective

>Or how large a proportion (e.g. a percentage) of the characters that match the search-term.

Becareful of this one, &#37;100 precision says nothing about recall. (i.e. the entire alphabet as a query string will have a %100 match to every name, but likely isn't close to any of them).

Matching the longest prefix is probably the best simple thing to do, from there I'd try computing edit distance.

03-27-2008

matsp

Yes, single chars and percentage are not a good working model. But then editing distance or soundex aren't going to be HUGELY better when the user enters a single letter - sure they will be better, but the results (on a large phone directory) for entering a single letter will still be hopelessly useless.

--
Mats

03-27-2008

Dino

If the struct with the first name and last name are saved in "struct" format, there is no issue with determining what is the first or last name.

If a struct is not used, similarly, you could use fixed length fields, like the fname is 30 bytes and sname is 40 bytes.

As for a fuzzy match when searching, you could keep stats like how many characters matched, how many characters that matched were in order, etc.

Todd

03-27-2008

matsp

Since the input file is a TEXT file, you should not store structs in it. But using fixed width would of course be a possibility for storing first and last name.