Complex string matching based on list items

list2 = ['There1', '1 are 1', 'helloooo', 'you']  # list of random strings that are similar to list1's items, but not in the same order

list1 and list2 always have the same number of items.

I would like to fix the items in list2 by finding, for each item in list2, the closest match among the items in list1, and then either substituting that match into list2 or adding it to a new list and returning that list. Since list1 and list2 always have the same number of items, I would also like to prevent two items in list2 from matching the same item in list1, i.e. each item in list2 should correspond to a unique item in list1. list1 will never contain two or more identical words. A possible output (if item matches are added to a new list):

First you need to decide how exactly you define the closeness of two strings. One approach would be to use the Levenshtein distance.

Once you've decided on a metric and implemented it in Python, you can iterate over the items in list2 and, for each one, find the closest item in list1 according to that metric. After you choose an item, set it to None in list1 so that it isn't chosen again, and skip None entries in later searches. (If list1 should not be mutated, make a copy of it at the beginning of the function.)
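The loop described above could be sketched roughly as follows. Several things here are assumptions rather than part of the question: the names `match_unique` and `stand_in_distance` are made up for illustration, the contents of list1 are hypothetical (the asker's list1 is not shown), and the metric is a stand-in built on the standard library's `difflib.SequenceMatcher` rather than the Levenshtein distance, used only so the example runs on its own. Used items are removed from a copy of list1 instead of being set to None, which has the same effect because list1 contains no duplicates.

```python
from difflib import SequenceMatcher


def stand_in_distance(a, b):
    """Placeholder metric: 0.0 for identical strings, up to 1.0 for no overlap."""
    return 1.0 - SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_unique(list1, list2, distance=stand_in_distance):
    """Greedily pair each item of list2 with its closest unused item of list1."""
    remaining = list(list1)  # work on a copy so the caller's list1 is not mutated
    matched = []
    for item in list2:
        # Pick the unused list1 item with the smallest distance to this item.
        best = min(remaining, key=lambda candidate: distance(item, candidate))
        matched.append(best)
        remaining.remove(best)  # each list1 item may be used only once
    return matched


list1 = ['hello', 'there', 'are', 'you']  # hypothetical; not the asker's data
list2 = ['There1', '1 are 1', 'helloooo', 'you']
print(match_unique(list1, list2))  # ['there', 'are', 'hello', 'you']
```

Any function that takes two strings and returns a number (smaller meaning closer) can be passed as `distance`. Note that this approach is greedy: the result depends on the order of list2, so an early item can claim a list1 entry that a later item would have matched better.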

Hi, I would like to use the method you have suggested. How would I implement this in my script?

To implement the Levenshtein distance, there is pseudocode in the Wikipedia article I linked to (and I'm sure there's a Python implementation floating around somewhere on the web as well). The rest should be straightforward.
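For reference, here is a minimal sketch of one such implementation, using the usual row-by-row dynamic-programming formulation. The function name `levenshtein` is just an illustrative choice, and the comparison is case-sensitive as written; lowercase both strings first if 'There' and 'there' should count as identical.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character edits turning a into b."""
    # previous[j] holds the distance between the current prefix of a and b[:j];
    # it starts as the distance from the empty string to each prefix of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


print(levenshtein('kitten', 'sitting'))  # 3
print(levenshtein('helloooo', 'hello'))  # 3
```

To find the closest item for a given string, something like `min(candidates, key=lambda c: levenshtein(item, c))` would work, where `candidates` and `item` are placeholders for the not-yet-used items of list1 and the current item of list2.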