A few questions have been itching me about perl6 grammars and raster data (and binary data in general). As far as I understand, the text approach is to work at the grapheme level through grammars. Could we approach raster data the same way? Can we define a custom grapheme, or some other basic unit of binary data, so that raster data can be parsed with grammars?

Since perl6 itself is defined by perl6 grammars, can we define similar grammars as a kind of "validation" test, the basic case being: if the grammar can parse the data, the data is well-formed and structurally valid? For text data this approach is fairly obvious, since the basic units of grammars are text-oriented. But can we customize those back-end definitions to bring the power of grammars into binary-data territory, in the same way that it's possible to override the `ws` token used by :sigspace so that rules parse with another separator?

Thanks!

Some background:

Over the past few weeks I've begun learning Perl6 out of personal interest. After seeing this talk at FOSDEM 2019, I began asking myself (and the people around me) about using grammars to inspect/parse binary data. My use case would be, for example, replicating the Cloud Optimized GeoTIFF validator without a GDAL binding (I haven't seen one for perl6 yet). It's clearly a learning project for me.

A short time ago, an article was published on using Perl 6 grammars for GFX3 files (blogs.perl.org/users/sylvain_colinet/2019/01/…), which I understand is a binary format. So it can be done, although of course you'd have to put the GeoTIFF format in grammar form to parse it.
– jjmerelo, Feb 3 at 16:34

@jjmerelo I hadn't found that link, thanks, I'll dig into it. But as I understand it, perl6.c grammars don't natively handle binary data. For example, the linked blog article says: "Since grammars does not really support pure binary data you have to pass your data that you store originally in a buf as latin1 encoded string." The logic in the article is what I'm looking for, thanks!
– notagoodidea, Feb 3 at 21:36

@notagoodidea here's an answer to "what's perl6's definition of a grapheme?": perl6 follows Unicode's algorithm for grapheme cluster boundaries, and the rules are laid out concisely here: unicode.org/reports/tr29/#Grapheme_Cluster_Boundary_Rules. I hope that makes things clear! As far as I know, there is no mechanism in perl6 to change how graphemes work for strings.
– timotimo, Feb 3 at 23:36
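To see what those rules mean in practice: `.chars` counts graphemes, `.codes` counts code points, and `.encode` gives the bytes back. The strings below are only examples; note that CR LF is a single grapheme (rule GB3 of UAX #29), which matters as soon as binary data is passed through a string.

```raku
# 'x' followed by U+0301 COMBINING ACUTE ACCENT: there is no precomposed
# form, so it stays two code points but forms one grapheme.
my $accented = "x\x[0301]";
say $accented.chars;   # 1 (graphemes)
say $accented.codes;   # 2 (code points)

# CR LF is one grapheme, yet two bytes once encoded.
my $crlf = "\r\n";
say $crlf.chars;                   # 1
say $crlf.encode('latin-1').elems; # 2
```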

Regarding the missing GDAL binding, there are two options:

Use or write a binding with NativeCall, which binds to dynamic libraries that follow the C calling convention;

Use or write a native perl6 module.
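A minimal NativeCall sketch. It binds C's `strlen`, which is already loaded in the Rakudo process on Linux/macOS, so it runs without extra libraries. The commented-out lines show the same pattern for GDAL's C function `GDALVersionInfo`, assuming `libgdal` is installed; they are not tested here.

```raku
use NativeCall;

# `is native` without a library name resolves the symbol in the current
# process, where the C standard library is already loaded.
sub strlen(Str --> size_t) is native {*}

say strlen("GeoTIFF");   # 7

# Same pattern against GDAL's C API, assuming libgdal is installed:
#   const char *GDALVersionInfo(const char *pszRequest);
# sub GDALVersionInfo(Str --> Str) is native('gdal') {*}
# say GDALVersionInfo('RELEASE_NAME');
```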

Concerning the parsing of binary data, I'll split the subject into two parts:

Generally speaking;

Leveraging grammars.

1. Generally speaking

Using the P5pack module, or Inline::Perl5 to call Perl 5's pack/unpack, is currently (with perl6.c) the best way to parse binary data structures; the former seems favored since it's a native module.
See the first comment from @raiph on an SO answer, which shows a basic use case.
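For comparison, a sketch that uses neither P5pack nor Inline::Perl5 but the `read-uint16`/`read-uint32` methods that core `Blob` gained in later Rakudo releases (2018.12, if I remember correctly), so it may not run on older versions. The helper `tiff-header` is hypothetical and only decodes the 8-byte TIFF header that a GeoTIFF starts with: a byte-order mark (`II` or `MM`), the magic number 42, and the offset of the first IFD.

```raku
# Byte-order mark -> Endian value understood by the Blob read-* methods.
my %byte-order = II => LittleEndian, MM => BigEndian;

# Hypothetical helper: decode the 8-byte TIFF header, or return Nil.
sub tiff-header(Blob $data) {
    return Nil if $data.elems < 8;
    my $endian = %byte-order{ $data.subbuf(0, 2).decode('latin-1') } // return Nil;
    return Nil unless $data.read-uint16(2, $endian) == 42;   # magic number
    %( endian => $endian, ifd-offset => $data.read-uint32(4, $endian) )
}

my %h = tiff-header(Blob.new(0x4D, 0x4D, 0x00, 0x2A, 0x00, 0x00, 0x00, 0x08));
say %h<ifd-offset>;   # 8
```

The byte order is looked up once and passed to every read, which mirrors what a `pack` template letter such as `n`/`v` encodes per field.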

2. Leveraging grammars

With perl6.c, grammars can only parse text.
However, parsing binary data seems to be a moderately hot topic (based on feedback seen on the #perl6 IRC channel), and a few documents describing it, not yet implemented, seem to pave the way, with the hope of seeing it happen in the future (near or distant?).
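Until then, the workaround from the GFX3 article quoted in the comments (decode the Buf as latin-1 and parse the resulting string) can be sketched as follows, here on the 8-byte TIFF header. The grammar `TiffHeader` and its helpers are hypothetical, written for illustration; `subparse` is used because the header is only the start of a file. One caveat, assuming current Rakudo behaviour: strings are stored as graphemes and a CR LF pair (bytes 0x0D 0x0A) is a single grapheme, so a `.` may consume two bytes there. Treat "one character per byte" as an approximation.

```raku
# Hypothetical grammar over a latin-1 decoded Blob: bytes 0x00..0xFF map to
# code points U+0000..U+00FF, so each byte is (normally) one character.
grammar TiffHeader {
    token TOP           { <little-endian> | <big-endian> }
    # 'II' = little-endian; magic number 42 stored as bytes 2A 00
    token little-endian { 'II' \x[2A] \x[00] <ifd-bytes> }
    # 'MM' = big-endian; magic number 42 stored as bytes 00 2A
    token big-endian    { 'MM' \x[00] \x[2A] <ifd-bytes> }
    # offset of the first IFD: any four bytes
    token ifd-bytes     { . ** 4 }
}

# Fold byte values (most significant first) into an unsigned integer.
sub be-uint(@bytes) { @bytes.reduce({ $^a * 256 + $^b }) }

# Returns the first IFD offset, or Nil if the data is not a TIFF header.
sub ifd-offset(Blob $data) {
    my $m = TiffHeader.subparse($data.decode('latin-1'));
    return Nil unless $m;
    with $m<little-endian> { return be-uint(.<ifd-bytes>.Str.ords.reverse) }
    with $m<big-endian>    { return be-uint(.<ifd-bytes>.Str.ords) }
}

say ifd-offset(Blob.new(0x49, 0x49, 0x2A, 0x00, 0x08, 0x00, 0x00, 0x00));   # 8
```

A successful parse doubles as the "validation" test from the question: if `subparse` fails, the data does not start with a well-formed TIFF header.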

One of the main points, IMO, is related to the grapheme definition, which is based on Unicode's. If we could override the grapheme definition with a custom one for a specialized grammar, just as we can already override the `ws` token that :sigspace uses to decide what separates the parts of a rule, we would gain a new way to work with data structures through grammars. For now, graphemes are defined at the string level, not at the grammar (or meta) level. See @timotimo's comment linking to the Unicode document (UAX #29) describing the Grapheme Cluster Boundary Rules.
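For reference, this is the mechanism the :sigspace comparison relies on: in a `rule`, whitespace in the pattern becomes a call to `<.ws>`, and a grammar may redefine `ws`. In this sketch (the grammar `Commas` is made up for illustration) the separator becomes a comma. There is no comparable hook for what `.` or a single character means, which is the missing piece discussed above.

```raku
# Hypothetical grammar: in a `rule`, whitespace in the pattern calls <.ws>,
# so overriding `ws` redefines what separates the parts.
grammar Commas {
    token ws    { ',' }      # separator is now a single comma
    token field { \w+ }
    rule  TOP   {<field> <field> <field>}
}

say so Commas.parse('a,b,c');   # True
say so Commas.parse('a b c');   # False: a space is no longer a separator
```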