Streaming multi-byte UTF8 characters not being parsed correctly
#13

Comments

When streaming data into jsonparse that consists of multi-byte utf8 characters, if a data chunk splits a multi-byte character, jsonparse does not properly reconcile the character between data events. I wrote a quick demo repo to show this behavior and started writing blog post to explain the issue in more detail (not finished). In the meantime check the demo repo out, it has the current implementation and proposed patch working. For more context on this issue see this thread with @mikeal discussing where the "proper" place to reconcile / parse mutli-byte utf8 characters is. I already have a proposed fix written up for jsonparse with test cases, but wanted to open an issue first and get your feedback before I made a PR.