Ok, I downloaded it, but it's all too complex for me. I just need a simple preg_replace for one thing: stripping extra <tr>, <td>, <table> and </tr>, </td>, </table> tags. Any ideas? The thing is, I don't have issues with anything else, just tables.

The best way to do this is by parsing/tokenizing it - that means DOMDocument or similar, and HTMLpurifier is far easier than writing your own parser. The preg_replace route looks easier, but the regex you'd need would get very complicated, and you'd still run the risk of making things worse by accident.
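For what it's worth, here's a minimal sketch of the DOMDocument route (the sample input is invented): libxml repairs mismatched table tags on load, so stray closers are simply dropped.

```php
<?php
// Minimal sketch: DOMDocument repairs mismatched tags on load.
// The stray trailing </table> below is silently dropped.
$html = '<table><tr><td>cell</td></tr></table></table>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // suppress parser warnings for bad markup
$doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
libxml_clear_errors();

echo $doc->saveHTML(); // balanced markup: one <table>, one </table>
```

The two LIBXML flags just keep the parser from wrapping the fragment in <html>/<body> and adding a doctype.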

I've looked into this issue again, and what I actually need is to strip all extra </table> tags - just those. Once the extra closing table tags are gone, everything seems to be formed ok. Would that make a preg_match or regex script easier to write?

You'd still have to count them (both opening and closing tags) and make sure they're in the right order. That means lookaheads and sub-pattern matching - no, it's not any less complicated. In fact, once you implement it for one kind of tag, it's not really much more work to do it for all of them.
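To make the counting point concrete, here's a rough sketch (sample input invented) of what counting with preg_match_all looks like. Note the limitation: it can tell you the counts are unbalanced, but not *which* closer is the stray one.

```php
<?php
// Sketch: count opening vs closing table tags with preg_match_all.
$html = '<table><tr><td>a</td></tr></table></table>';

$opens  = preg_match_all('~<table\b[^>]*>~i', $html);
$closes = preg_match_all('~</table\s*>~i', $html);

// Naive repair: strip closers from the end until the counts balance.
// This guesses the *last* closers are the stray ones, which is wrong
// whenever the stray closer actually appears earlier in the document.
for ($i = $closes - $opens; $i > 0; $i--) {
    $pos  = strrpos($html, '</table>');
    $html = substr($html, 0, $pos) . substr($html, $pos + 8);
}

echo $html; // <table><tr><td>a</td></tr></table>
```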

Also consider that extra <table> tags aren't the only thing that can ruin your markup; and a ruined layout isn't the only risk of allowing users to input HTML. I *highly* recommend using HTMLpurifier if you allow user-submitted HTML, if only for the security benefits.
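In case it helps, typical HTMLpurifier usage is only a few lines. This sketch assumes the library is installed via Composer (ezyang/htmlpurifier); the allowed-tag list and the $userSubmittedHtml variable are just placeholders for your own.

```php
<?php
require_once 'vendor/autoload.php'; // Composer autoloader

$config = HTMLPurifier_Config::createDefault();
// Whitelist: only these elements/attributes survive purification.
$config->set('HTML.Allowed', 'p,b,i,a[href],table,tr,td,th');

$purifier = new HTMLPurifier($config);
// Output is well-formed: unclosed tags get closed, stray closers dropped.
$clean = $purifier->purify($userSubmittedHtml);
```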

You might ask at RegexAdvice if you really want to pursue a preg_match solution.

I *highly* recommend using HTMLpurifier if you allow user-submitted HTML, if only for the security benefits.

I AM putting security in place. I am "training" my markdown to filter out anything that could launch an attack or be used to take advantage of the site. I never thought I'd have an issue with tables. I'm seriously considering just stripping them all out if I can't resolve this. Why do you think something so simple is so hard to resolve? I've seen preg_match and preg_replace scripts that do AMAZING and complicated things - and here, all I need is a script to remove extra </table> tags, and yet it's such a hassle? I posted this same question on two other forums and everybody seems to have a "just forget it!" type of attitude... It's kinda frustrating.

You'd still have to count them (both opening and closing tags) and make sure they're in the right order.

That's what I was going to say, in response to the latest post here. It's not an easy task. It's possible-- browsers manage to do this. But you'd need to fully parse the HTML of the page. One option would be to limit the scope of what you're doing to something like single (or dual) level tables, so that you only have one table (or two) at most, and you don't need to worry so much about subpatterns, but this really is complicated.

The issue isn't that preg_match can't do this relatively well, but that to get a perfect script (with zero exceptions) it would be incredibly complicated-- as I said, you'd have to parse all of the HTML on the page to be certain nothing conflicts.

So your options:
1. Do nothing.
2. Fully parse all of the HTML.
3. Simplify the parameters (such as not allowing embedded tables).
4. Settle for an imperfect (but generally working) solution that covers maybe 75-95% of the possible problems, depending on how you write it.
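As a rough illustration of option 4, here's one way (function name invented) to drop closers that have no matching opener, using a single preg_replace_callback pass with a depth counter. It deliberately ignores comments, scripts, and '>' characters inside attributes - exactly the kind of exception that keeps it in the 75-95% bracket rather than 100%.

```php
<?php
// Sketch of option 4: remove </table> tags with no matching opener,
// tracking nesting depth with a counter. Imperfect by design: it
// does not understand comments, CDATA, or '>' inside attributes.
function strip_stray_table_closers($html) {
    $depth = 0;
    return preg_replace_callback(
        '~<(/?)table\b[^>]*>~i',
        function ($m) use (&$depth) {
            if ($m[1] === '') {   // opener: <table ...>
                $depth++;
                return $m[0];
            }
            if ($depth > 0) {     // closer with a matching opener
                $depth--;
                return $m[0];
            }
            return '';            // stray closer: remove it
        },
        $html
    );
}

echo strip_stray_table_closers('</table><table><tr><td>x</td></tr></table></table>');
// <table><tr><td>x</td></tr></table>
```

Unlike the count-and-trim approach, this drops the stray closer wherever it occurs, not just at the end.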

I am "training" my markdown to filter out anything that can launch an attack or anything that can be used to take advantage of the site.

The problem here is the difference between a whitelist and a blacklist. If you use a whitelist, then you only allow things that are approved and known to cause no problems (while blocking everything else, harmful or harmless). If you use a blacklist, as you are suggesting, then it blocks all known bad things while letting everything else -- good or bad -- through; the problem with that is that if you just don't know about something (or some new hacking technique is invented), then you have no defenses at all. There *are* ways to create a working blacklist by over-denying possibly good code, such as removing all HTML, but it doesn't sound like that's what you want either.
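The difference is easy to see in code. A tiny sketch (the tag lists and sample input are just examples):

```php
<?php
$input = '<p>hi</p><script>alert(1)</script><marquee>old</marquee>';

// Whitelist: keep ONLY approved tags; every unknown tag is stripped.
// (strip_tags removes the tags but keeps their text content.)
$safe = strip_tags($input, '<p><b><i><table><tr><td>');

// Blacklist: remove known-bad tags; anything you didn't think of
// (here, <marquee>) slips straight through untouched.
$risky = preg_replace('~</?(script|iframe|object)\b[^>]*>~i', '', $input);

echo $safe;  // <p>hi</p>alert(1)old
echo $risky; // <p>hi</p>alert(1)<marquee>old</marquee>
```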