Match a Location tag in Apache httpd.conf with python

Hi,
I need some help parsing an httpd.conf file using Python. For some reason I am unable to match a multiline block of text. I am trying to open the file, match the <Location /testing> </location> block, and print only the matching text. Can someone please provide some Python code to do this? Here is the sample text:

Firstly, the content of the httpd.conf is broken (from the XML point of view). The <Location /> at line 7 denotes an empty XML element (a single tag after which no closing </Location> is expected). The "/" must be given as an attribute of the element. The same holds for line 13. Line 16 is not paired with any opening <Limit> tag. Line 17 must be </Location>, as XML is case-sensitive.

I recommend using the standard xml.etree.ElementTree module for parsing XML files instead of regular expressions. Try the following as a starting point (docs.python.org/library/xml.etree.elementtree.html):
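A minimal sketch of that approach, assuming the configuration has first been rewritten as well-formed XML. The <httpd> root element, the path and methods attributes, and the sample content are all made up for illustration; a real httpd.conf will not parse this way as-is.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed XML version of the configuration.
# The root element and the "path"/"methods" attributes are assumptions.
XML_CONF = """\
<httpd>
  <Location path="/">
    <Limit methods="GET POST">allow</Limit>
  </Location>
  <Location path="/testing">
    <Limit methods="GET">deny</Limit>
  </Location>
</httpd>
"""

root = ET.fromstring(XML_CONF)

# Collect every <Location> element whose path attribute is "/testing".
matches = [loc for loc in root.iter("Location") if loc.get("path") == "/testing"]

for loc in matches:
    # Serialize the matched element back to text.
    print(ET.tostring(loc, encoding="unicode"))
```

Storing the path in an attribute is what makes the element addressable; the parser then takes care of pairing opening and closing tags.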

Thank you for the help. It seems to work, but I also want to match the <Location /testing> and </location> tags themselves. Also, even after looking at this code for a while, I still don't understand what it is doing. Is there any way you can summarize the logic here? It seems like you have somehow tagged the matched block, but I don't understand how.

Here f is the open file object. You can use it directly in a loop to iterate through the lines of the text file.

The status variable and the tests on it inside the loop are a simple implementation of a so-called "finite automaton". When processing the file "by hand", you think this way: read the lines in a loop and process them. While I am outside the interesting area, I ignore the lines. Once the start line is found, I start to be interested (status changes to 1). When the ending line is found, status changes again, and I ignore the rest of the lines.
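The automaton described above can be sketched roughly like this. It is only an illustration: the function name location_lines, the sample text, and the case-insensitive comparison are my assumptions, and io.StringIO stands in for the open file f.

```python
import io

def location_lines(f, path="/testing"):
    """Yield the lines inside the <Location path> ... </Location> block."""
    start = "<location {}>".format(path.lower())
    status = 0  # 0 = before the block, 1 = inside it, 2 = after it
    for line in f:
        stripped = line.strip().lower()
        if status == 0:
            # Outside the interesting area: wait for the start line.
            if stripped == start:
                status = 1
        elif status == 1:
            # Inside the block: collect lines until the closing tag.
            if stripped == "</location>":
                status = 2
            else:
                yield line
        # status == 2: the block is finished, the remaining lines are ignored.

# Made-up sample standing in for the real httpd.conf.
SAMPLE = """\
ServerName example.test
<Location /testing>
    Order deny,allow
    Deny from all
</location>
<Location /other>
    Allow from all
</Location>
"""

inner = list(location_lines(io.StringIO(SAMPLE)))
for line in inner:
    print(line, end="")
```

This version yields only the lines between the tags, not the tag lines themselves.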

The usual temptation is to use a boolean variable to express "inside the area of collected lines". But a boolean can express only two states. When two states are not enough, programmers often introduce another boolean variable. That way things get complicated, and future maintenance becomes more difficult.

With a single automaton variable, the code looks more complicated at first, but you can think about each section separately. In each section you can test and decide to switch to another section by assigning to the status variable. It is easy to find the part of the code that takes care of a given situation -- see how it can be modified for your purpose below.
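One way the modification might look, so that the opening and closing tag lines are returned as well. Again a sketch under assumptions: location_block and the sample text are hypothetical names and data, not taken from the original file.

```python
import io

def location_block(f, path="/testing"):
    """Yield the <Location path> block including its opening and closing tags."""
    start = "<location {}>".format(path.lower())
    status = 0  # 0 = before the block, 1 = inside it, 2 = after it
    for line in f:
        stripped = line.strip().lower()
        if status == 0:
            if stripped == start:
                yield line   # changed: emit the opening tag too
                status = 1
        elif status == 1:
            yield line       # changed: emit every line, closing tag included
            if stripped == "</location>":
                status = 2
        else:
            break            # nothing more to collect after the block

# Made-up sample standing in for the real httpd.conf.
SAMPLE = """\
ServerName example.test
<Location /testing>
    Deny from all
</location>
KeepAlive On
"""

block = list(location_block(io.StringIO(SAMPLE)))
for line in block:
    print(line, end="")
```

Only the sections for status 0 and status 1 had to be touched; the rest of the loop stays the same.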

The yield statement makes the function a generator that returns lines on the fly. You can use it to feed a loop, or you can process it by other means that expect an iterator -- see the end of the code, where a multiline string is constructed using the same generator (instead of the for loop):
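A small illustration of both uses, assuming a compact generator named block_lines and a made-up configuration string (both hypothetical, and io.StringIO again replaces the open file):

```python
import io

def block_lines(f):
    """Hypothetical generator: yield the <Location /testing> block, tags included."""
    status = 0  # 0 = before the block, 1 = inside it, 2 = after it
    for line in f:
        tag = line.strip().lower()
        if status == 0 and tag == "<location /testing>":
            status = 1
        if status == 1:
            yield line
            if tag == "</location>":
                status = 2

# Made-up sample standing in for the real httpd.conf.
CONF = "Listen 80\n<Location /testing>\n  Deny from all\n</location>\nKeepAlive On\n"

# 1) Feeding a for loop: each line is produced only when the loop asks for it.
for line in block_lines(io.StringIO(CONF)):
    print(line, end="")

# 2) Passing the generator to str.join, which accepts any iterable of strings.
text = "".join(block_lines(io.StringIO(CONF)))
print(text, end="")
```

Because the lines keep their trailing newlines, joining with an empty string reproduces the block exactly as it appears in the file.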