Dionysis Zindros wrote:
> I'm developing an application using TidyLib and C++. I want to tidy up
> certain HTML code, but I want to whitelist certain tags. e.g. I would
> only like to allow <strong>, <ul>, and <li>, but not <table>, <tr>,
> <td>, <script>, and so forth.
>
> If that isn't possible, would it be possible to do some kind of
> blacklisting instead?
>
> Also, one last question: Is it possible to use a similar mechanism to
> whitelist/blacklist attributes on particular properties? For example,
> I might want to allow the attribute "name", but not the attribute
> "class" for "input" tags. How would one go about doing that?

No, Tidy is a "pretty printer", not an HTML stripper.
What you're looking for is something like the Perl "hstrip" script that
ships with the HTML-Parser examples:
http://search.cpan.org/src/GAAS/HTML-Parser-3.56/eg/hstrip
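
Since you are working in C++, here is a rough sketch of what a tag
whitelist filter of that kind might look like. This is not TidyLib code and
not a port of hstrip; strip_tags is a hypothetical helper I made up for
illustration. It assumes well-formed-ish input, drops every attribute rather
than filtering them per tag, and does not understand comments, CDATA, or a
'>' inside an attribute value, so treat it as a starting point only.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical helper (not part of TidyLib): keeps only the tags whose
// lower-cased name is in `allowed`, and drops all of their attributes.
// Text between tags is copied through unchanged.
std::string strip_tags(const std::string& html,
                       const std::set<std::string>& allowed) {
    std::string out;
    std::size_t i = 0;
    while (i < html.size()) {
        if (html[i] != '<') {           // ordinary text: copy as-is
            out += html[i++];
            continue;
        }
        std::size_t end = html.find('>', i);
        if (end == std::string::npos)   // unterminated tag: drop the rest
            break;
        std::size_t p = i + 1;
        bool closing = (p < end && html[p] == '/');
        if (closing) ++p;
        std::string name;               // tag name, lower-cased
        while (p < end && std::isalnum(static_cast<unsigned char>(html[p]))) {
            name += static_cast<char>(
                std::tolower(static_cast<unsigned char>(html[p])));
            ++p;
        }
        if (allowed.count(name)) {      // re-emit the bare tag, no attributes
            out += closing ? "</" : "<";
            out += name;
            out += '>';
        }
        i = end + 1;                    // skip past the whole tag
    }
    return out;
}
```

Called with the set {"strong", "ul", "li"}, this keeps those three tags and
removes <table>, <tr>, <td>, <script> and the rest. Note that the text inside
a removed element (including the body of a <script>) is still copied through,
so a real filter would need to handle that case separately. For attribute
whitelisting you would need to parse the attribute list instead of
discarding it.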