s9e\TextFormatter, a text formatting library that supports BBCodes and most other features found in forums, implemented as plugins. I'm its author. This RFC is about replacing the current BBCode/smilies/autolinking/censor routines (hereinafter referred to as legacy) with s9e\TextFormatter. I originally wanted to focus on providing support from the s9e\TextFormatter side of things while leaving the phpBB-centric code to somebody more proficient in it but I've recently come to the conlusion I should work on integration directly. This is why I'm currently implementing this RFC.

The goal is to close as many BBCodes-related bugs as possible and get to an equivalent level of functionality, backed with decent testing and within a reasonable time frame. As far as schedule goes, my goal is to be done before September 2013. I intentionally leave out other improvements (other than the more robust BBCode parser) for future RFCs because I don't want this RFC to die of inactivity or lack of consensus.

This RFC does not replace the legacy routines, it complements them. Old messages are still handled by the old code, but new messages are parsed and displayed by the new text formatter. Editing an old message will transparently convert it to the new format.

Altered functions/methods

acp_bbcodes::main()

acp_icons::main()

acp_words::main()

decode_message()

generate_text_for_display()

generate_text_for_storage()

strip_bbcode()

message_parser::parse()

Differences from 3.1

The markup inside quote's author is not parsed, e.g. [quote="[b]author[/b]"] -- One exception: up to 1 url tag is supported

Up to 1 blank line around block-level elements (quotes, lists, etc...) is ignored. In 3.1, if there's a blank line after a quote it is rendered as two <br>.

Censored words are marked at posting time. Changing the words list does not affect old posts. The censor can still be turned off at rendering time. Censoring does not apply to BBCode attributes. That means you can't prevent links to a given site by censoring its domain name.

No server-side syntax highlighting in code blocks.

Default limits are set on the number of occurence of individual BBCodes in text (1000 of each), global number of BBCodes in text (10000) and nesting level (10.) Same thing for smilies, the default is 1000 smilies per post. Past this limit, the markup is ignored.

There can't be BBCodes named E or CENSOR because those names are used for smilies and censored words respectively. This can be changed to something else, including namespaced names such as e:e or foo:bar. See the discussion.

✓ Legacy messages are preserved and displayed with the legacy routines✓ Legacy messages can be edited (but then they're parsed with s9e\TextFormatter)✓ New messages are parsed with s9e\TextFormatter✓ New messages are properly displayed in viewtopic✓ Editing BBCodes, smilies and censored words in the ACP regenerates the new parser✓ Default BBCodes are supported: B, CODE, COLOR, EMAIL, FLASH, I, IMG, LIST, *, QUOTE, SIZE, U, URL✓ Per-style BBCode templates✓ Custom BBCodes are supported, provided they are safe to use (e.g. no XSS)✓[attachment] BBCode✓ Posting limits

✓ max_*_font_size

✓ max_*_img_height / max_*_img_width applied to [img]

✓ max_*_img_height / max_*_img_width applied to [flash]

✓ max_*_smilies

✓ max_*_urls

✓ {LOCAL_URL} doesn't prepend the board's URL✓ Tabs in code blocks are preserved, legacy routines replaces them. This might be the occasion to change this behaviour, preserve tabs and use a JavaScript syntax highlighter✓ Handles viewcensors, viewflash, viewimg and viewsmilies✓ Takes care of toggling BBCodes, magic URLs, smilies, IMG BBCode, FLASH BBCode, QUOTE BBCode, and URLs in parse_message::parse()✓Cron job that removes old renderers from the cache Nope, piggyback on the "Purge all cache" button in the ACP instead.✓ Error message when using an unauthorized BBCode in poll options✓ strip_bbcode()✓ Polls✓ Look into how text is cleaned up for the search engine✓ Copy the tests from tests/bbcode/parser_test.php and instead of testing the parsed text, test the rendered test. Mostly done✓ That path_in_domain() thing Many other things I forgot to list and that you should remind me

Last edited by JoshyPHP on Mon Mar 02, 2015 11:08 pm, edited 47 times in total.

Great to see this is being worked on. I haven't seen a pull request for the work so far. Would you mind creating a pull request (labeled WIP if it's not yet complete) against develop? It would be nice to watch the progress on this.

I've created a WIP pull request. It might be noisy because I try to commit regularly and I often push so that Travis can keep an eye on it.

I'm currently looking into how phpBB's tests work, especially wrt database. Also, I'm not sure where which tests should go so for the time being I put them where it makes sense to me and I'll set aside some time to organize them properly.

Alright, I have a few questions for which I couldn't find a definite answer. It's about tests:

I'm dumping compiled templates to cache/ -- can I hardcode the .php extension or do I have to use %core.php_ext%/$phpEx ?

the normal code saves files to cache/ -- where should those files be saved during testing?

same as production

in the normal cache/ dir but with a different extension

sys_get_temp_dir()

use vfsStream instead

something else?

what's the proper type hint for $cache? Or rather, is there a proper type hint for the cache service that also work with the mock cache?

Also, I noticed that most files use dirname(__FILE__) instead of __DIR__. I assume this is legacy code predating PHP 5.3 and I use __DIR__ in my code.

Finally, status update: I'm currently working on refactoring the new code as a proper service. I don't intend to refactor the legacy code as a service, but I'm trying to account for that possibility so that it can be done later on.

If the library dumps the files itself and you only specify the directory, the extension is not important.

For testing, you can either store them in the cache directory or in tests/tmp/.

The mock cache is a driver, not the cache service. You can either setup the service with that as the driver or just pass the driver if that's all you need and use the driver interface or the service to typehint.

The old BBCode system can be removed from the system at a later time in a different PR. I wouldn't refactor any of that code. Most of the phpBB code that uses the BBCode parser should be using the generate_text_for_ functions (brunoais is working on this). I would modify those functions to parse with your system instead and not modify any areas that parse BBCode directly as those may be transitioned to the generate_text_for functions before your PR is merged (it would just cause conflicts).

Alright, I've changed the cache dir to point to tests/tmp/ and I'm typehinting the cache driver interface, as that's what other services seem to do. I don't intend to touch the legacy code, so no worries there.

I've just pushed a big update that implements the new text formatting as a service. Or rather, as services. When formatting text, the three big operations are parsing, unparsing, and rendering. In the legacy code, they're implemented as the global functions generate_text_for_(storage|edit|display). Here, they are implemented as 3 services: text_formatter.parser, text_formatter.unparser, text_formatter.renderer. Those 3 services correspond to 3 abstract classes detailed below, implemented by 3 concrete classes that act as adapters for the s9e\TextFormatter library. I used abstract classes rather than pure interfaces because I wanted to keep as much of the logic (e.g. what the default options are based on the user/config/auth) out of the adapters as possible.

I'm trying to keep any code specific to s9e\TextFormatter in the adapters, although there's a couple of operations that I haven't standardized: cache invalidation (e.g. whenever BBCodes change) and detecting whether a message should be displayed by the legacy code or the new one. (for the time being, I've hardcoded a simple regexp)

I've also created a new test case helper, to create those services in a couple of lines.

$phpbb_container = new phpbb_mock_container_builder;
// ...here you can optionally add other services such as user or auth...
$this->get_test_case_helpers()->set_s9e_services($phpbb_container);
$phpbb_container->get('text_formatter.parser')->parse($text)

It may not look like much, but I can understand why there were almost no existing tests covering text formatting, because setting up a working parser is actually quite tedious. Now that I can do that more easily I'll be able to move forward with my TODO list without spending as much time figuring out tests.

My next item is testing whether the posting limits (e.g. image dimensions, number of URLs and so on... there's a dozen of them) are enforced. And while I'm doing that, I'd like to discuss per-style templates. First off, does anybody even implement custom BBCode templates? The BBCodes admin panel doesn't offer a way to create multiple templates, so I guess that only the base BBCodes are styled with custom templates? I don't know much about phpBB styles, so I'd need more info about that.

Last edited by JoshyPHP on Wed Aug 14, 2013 2:36 am, edited 1 time in total.