I could not geocode a subset of my dataset (around 140,000 observations) containing full postal addresses from Germany, i.e. zip code, city, and street. Apparently the geocoder failed to convert the addresses because of mistakes in the data. Eyeballing the failures suggests that the most prevalent reasons are the following:

minor spelling mistakes in the city name. Example: "Neunburg v. Wald" instead of "Neunburg vorm Wald", or "Hessheim" instead of "Heßheim"

a missing part of the name. Example: "Pohlheim" instead of "Pohlheim-Watzenborn"

a wrong match between zip code and city name (maybe because the address was recorded before a change in the zip code occurred, e.g. two zip codes were merged)

This is why I would like to "auto-complete" / "auto-correct" the city names. I could imagine doing the following: if I had access to a database that contains all zip codes and cities for Germany, I would build a list of "suggestions" from which I choose the one that is "closest" to the wrong city name. One could build the list of suggestions either by matching all cities that share the same zip code (not a one-to-one mapping), or by matching those cities from the database that start with the same first x characters (under the assumption that the error does not occur in that range). Then one could pick the closest string (city name) using an approximate string matching algorithm such as Levenshtein distance.
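The zip-code-based suggestion step can be sketched like this (in Python for brevity). The lookup table here is a hypothetical stand-in for a real zip-code/city database, and the edit-distance function is a plain Levenshtein implementation:

```python
# Sketch: correct a misspelled city name by picking, among the cities
# registered for its zip code, the one with the smallest edit distance.
# The plz_to_cities lookup below is hypothetical example data; in
# practice it would come from a full zip-code/city database.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def suggest_city(plz: str, city: str, plz_to_cities: dict) -> str:
    """Return the candidate city for this zip code that is closest
    (by edit distance) to the possibly misspelled city name."""
    candidates = plz_to_cities.get(plz, [])
    if not candidates:
        return city  # no suggestion available, keep the original
    return min(candidates, key=lambda c: levenshtein(city.lower(), c.lower()))

# Hypothetical lookup for illustration:
plz_to_cities = {"92431": ["Neunburg vorm Wald"], "67258": ["Heßheim"]}
print(suggest_city("92431", "Neunburg v. Wald", plz_to_cities))  # → Neunburg vorm Wald
```

The prefix-matching variant would simply replace the `plz_to_cities.get(...)` lookup with a filter over all cities sharing the first x characters.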

This leads me to two subquestions:

Is there a possibility to extract this information (zip codes and city names) from the OSM database (I do not have an installed instance, though)?

Thank you so much for your offer! I will get back to it if I cannot solve it with the engines you mention above. Photon seems to be very successful with the few addresses I tried; I am very impressed. I decided to give it a try, but unfortunately I do not have any experience with (geo) APIs. Is there a way to run Java code on my machine (using Eclipse or another IDE) that makes a query for each address entry in a CSV file? I couldn't find a straightforward tutorial.
– Jhonny Mar 8 '15 at 10:17
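Batch-querying Photon from a CSV does not require Java; a minimal sketch in Python is below. The endpoint URL (the public `photon.komoot.io/api` instance) and the CSV column names (`street`, `zip`, `city`) are assumptions — adjust them to your own Photon instance and data:

```python
# Sketch: geocode each address row of a CSV file against a Photon instance.
# PHOTON_URL and the column names are assumptions, not fixed values.

import csv
import json
import urllib.parse
import urllib.request

PHOTON_URL = "https://photon.komoot.io/api"  # assumed public instance

def build_query_url(street: str, zip_code: str, city: str, limit: int = 1) -> str:
    """Build the free-text search URL Photon expects (q= and limit= parameters)."""
    q = f"{street}, {zip_code} {city}"
    return PHOTON_URL + "?" + urllib.parse.urlencode({"q": q, "limit": limit})

def geocode_csv(path: str):
    """Yield (row, geojson) pairs for every address row in the CSV.
    Network errors and rate limiting are not handled in this sketch."""
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            url = build_query_url(row["street"], row["zip"], row["city"])
            with urllib.request.urlopen(url) as resp:
                yield row, json.load(resp)
```

Photon returns GeoJSON, so each result's coordinates sit under `features[0]["geometry"]["coordinates"]` when a match is found. If you prefer to stay in Java, the same two steps apply: URL-encode the query string and issue an HTTP GET per CSV row.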

Address parsing and standardization is not a trivial problem; it is especially hard for developing countries. We have been working on this problem for the past nine months, and I can say that there is no silver bullet. Our algorithm handles such mistakes, and even more complex ones, really well. It currently works for Turkey.

I've been testing our algorithm against online geocoders, and I can say that the HERE Maps API is really successful at fixing small mistakes in the data. Google, Yandex, Bing, and others perform poorly on wrong/misspelled addresses.