The first task is to take all 25 thousand HTML files and strip out (parse) the address sets they contain. This is a Perl task, sure thing!

That is an HTML parsing task, sure thing! But since you haven't shown any of the HTML, it is impossible to know how to extract the data. You will, however, need to use one of Perl's HTML processing modules, such as HTML::TreeBuilder, to extract the data you want from the HTML.
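
As a rough sketch of what that could look like with HTML::TreeBuilder: the snippet below pulls the visible text out of every table cell in a page. The assumption that the address lines sit in <td> cells, the helper name cell_texts, and the sample markup are all made up for illustration, since the real HTML was not posted. Adjust the tag (or add attribute filters to look_down) once you have looked at one of the saved files.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Return the visible text of every <td> cell in an HTML string.
# Assumption: the address lines live in table cells; the real pages may differ.
sub cell_texts {
    my ($html) = @_;
    my $tree  = HTML::TreeBuilder->new_from_content($html);
    my @texts = map { $_->as_text } $tree->look_down(_tag => 'td');
    $tree->delete;    # free the parse tree when done
    return @texts;
}

# Hypothetical sample markup, not taken from the real site.
my $sample = '<html><body><table><tr>'
           . '<td>Example School</td><td>ID-Number: 3122800</td>'
           . '</tr></table></body></html>';

print "$_\n" for cell_texts($sample);
```

To process a saved file instead of a string, HTML::TreeBuilder->new_from_file($path) can be used in place of new_from_content.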

Quote

How should I do this second task?

In the data you extract from the HTML page, look for a string that matches a regex beginning with 'ID-Number:', and then capture everything after the colon. For example:
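
A minimal sketch of that capture, assuming the extracted text contains a line of the form "ID-Number: 3122800". The exact label, the sample text, and the helper name extract_id are assumptions, because the real page content was not shown.

```perl
use strict;
use warnings;

# Capture whatever follows "ID-Number:" on the same line.
# Returns undef when the label is not present.
sub extract_id {
    my ($text) = @_;
    # Literal label, optional whitespace, then the rest of the line
    if ($text =~ /ID-Number:\s*(.+)/) {
        my $id = $1;
        $id =~ s/\s+\z//;    # drop trailing whitespace
        return $id;
    }
    return undef;
}

# Hypothetical text as it might come out of the HTML parser.
my $text = "Example School\nID-Number: 3122800\nExample Street 1\n";

my $id = extract_id($text);
print defined $id ? "Found id: $id\n" : "No id found\n";
```

Because "." does not match a newline by default, the capture stops at the end of the line that holds the label.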

See this site; this is the page where I gathered the information: http://schulweb.de/de/schulsuche/liste.html?trefferzahlauswahl=alle&x=29&y=9&kategorie=&region=de&auswahl_1=0&auswahl_2=0&auswahl_3=0&suchtext=

I have gathered all the results of this page: "Treffer 1 - 10517 von 10517" (hits 1 - 10517 of 10517). This means I have more than 10 000 files in a folder. I got them with httrack, a good tool.

So I have all the pages with the detailed information, for example: http://schulweb.de/de/schulsuche/einzelergebnis.html?Id=3122800&treffer=623&auswahl_1=0&auswahl_2=0&auswahl_3=0&suchtext=&kategorie=&region=de&trefferzahlauswahl=alle&trefferzahl=10517&list_anfang=0&sort=

This contains a set of information:

1) I have more than 10 000 files in a folder, and they all look the same. Each contains a set of information that I want to gather.

2) If I can parse one file, then I am able to do it with all the others.

3) How do I parse a file to get the information (the above-mentioned addresses with 5 lines of text [see also below])?

4) After having the addresses, I have to get the URL. It is built in combination with an ID number.

5) The address data set contains this ID number. I only have to add it to the URL and then I get the