Crowdsourcing the evolution of text parsing with unified

New projects and how you, like ZEIT and Gatsby, can help out

unified’s interface used to transform markdown to HTML while adding other features

unified is a tool for manipulating content using syntax trees. MDX is an extension that lets you use JSX in markdown. micromark is a new parser we’re planning, to make manipulation super fast. Today, we’re announcing the unified collective to fund the development of all three. And we’d love you to get involved.

First, what’s unified?

unified is a friendly interface backed by an ecosystem of plugins built for creating and manipulating content. unified does this by taking markdown, HTML, or plain text prose, turning it into structured data, and making it available to over 100 plugins. Tasks like text analysis, preprocessing, spellchecking, linting, and more can all be done through compatible tools, and even chained together.

This and more is possible thanks to unified’s plugin pipeline, which typically lets you chain a feature into the process with a single line of code. It’s also possible to stitch together content from different sources and output it as a single document.
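To make the idea of a plugin pipeline concrete, here is a minimal, self-contained sketch of the pattern. It is a toy model and not unified’s actual implementation: the names `Pipeline`, `Transformer`, `visit`, and both example plugins are hypothetical and exist only for illustration. The point it tries to show is that each feature is a small function working on a shared tree, and adding one is a single `.use(...)` call.

```typescript
// Toy model of a plugin pipeline. NOT unified's real implementation:
// every name below (Node, Transformer, Pipeline, visit) is hypothetical.

interface Node {
  type: string;
  value?: string;
  children?: Node[];
}

// A plugin is a function that receives the tree and may change it in place.
type Transformer = (tree: Node) => void;

class Pipeline {
  private transformers: Transformer[] = [];

  // Chaining a feature in is one line: pipeline.use(plugin).
  use(transformer: Transformer): this {
    this.transformers.push(transformer);
    return this;
  }

  // Run every transformer in the order it was added.
  run(tree: Node): Node {
    for (const transformer of this.transformers) {
      transformer(tree);
    }
    return tree;
  }
}

// Helper: call `fn` on every node in the tree, depth first.
function visit(tree: Node, fn: (node: Node) => void): void {
  fn(tree);
  for (const child of tree.children || []) {
    visit(child, fn);
  }
}

// Example plugin 1: trim whitespace around every text node.
const trimText: Transformer = (tree) =>
  visit(tree, (node) => {
    if (node.type === "text" && node.value !== undefined) {
      node.value = node.value.trim();
    }
  });

// Example plugin 2: upper-case the text directly inside headings.
const upperCaseHeadings: Transformer = (tree) =>
  visit(tree, (node) => {
    if (node.type !== "heading") return;
    for (const child of node.children || []) {
      if (child.type === "text" && child.value !== undefined) {
        child.value = child.value.toUpperCase();
      }
    }
  });

// A hand-written tree standing in for parsed markdown.
const tree: Node = {
  type: "root",
  children: [
    { type: "heading", children: [{ type: "text", value: " hello " }] },
    { type: "paragraph", children: [{ type: "text", value: " world " }] },
  ],
};

const result = new Pipeline().use(trimText).use(upperCaseHeadings).run(tree);
```

In this sketch the plugins know nothing about each other. They only agree on the shape of the tree, which is what lets tools be chained freely.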

From a maintainer’s perspective, all this new traction comes with an immense amount of customer support, with maintainers spending their evenings responding to questions posed as issues. The stress that comes with working on highly used open source ecosystems and the ever-increasing number of issues result in more time spent maintaining existing code instead of creating new things.

Organisations under the unified umbrella

Announcing the unified collective

Today, we are pleased to announce the creation of the unified collective. It’s an effort to bring together like-minded organisations to collaboratively work on the innovation of content through seamless, interchangeable, and extendible tooling. We build parsers, transformers, and utilities so that others don’t have to worry about syntax. We make it easier for developers to develop.

Of course, we also want to thank all the lovely contributors across the ecosystem who have helped us to even get to this point by reporting issues, writing utilities and plugins, and submitting all kinds of improvements!

To be able to deliver on our mission, we need to start maintaining unified in a sustainable way, create a better ecosystem, and grow by adding new projects. We’re doing just that today: unified is expanding, with MDX and micromark.

MDX joins forces with unified

Next to existing low-level organisations under unified — such as remark for markdown, rehype for HTML, retext for natural language — we’re excited to announce that we are partnering with high-level projects as well. MDX is joining unified 🎉

A large part of MDX’s success has been leveraging the unified and remark ecosystem. I was able to get a prototype working in a few hours because I didn’t have to worry about markdown parsing: remark gave it to me for free. It provided the primitives to build on. It makes sense for these projects to come together and make each other better.

MDX is powerful. It’s markdown for the component era. It lets you write JSX embedded inside markdown. That’s a great combination because it allows you to use markdown’s often terse syntax (such as # heading) for the little things and JSX for more advanced components. MDX is useful for a JAMStack application, injecting dynamic data into a document, or building slides in mdx-deck.

Introducing micromark

micromark is a new, tiny, and fast markdown parser under the unified umbrella, at micromark/micromark.

We believe evolving unified shouldn’t just be about new high-level features, like MDX, but also about rethinking core mechanisms. That’s where micromark comes in.

In March 2019 markdown will be turning 15. Over the years it has become ubiquitous, but as it was never formally specified, many flavours emerged. Most of these flavours continue to serve their purpose, but ever since GFM (GitHub Flavored Markdown) settled on using CommonMark as a base, CommonMark has become more or less the de facto style.

The original Markdown.pl, and CommonMark as well, focused on making writing websites as easy as writing an email. Nowadays, markdown is used to do all kinds of different things. It’s used to create slides or to generate man pages. It’s supported in major CMSs and is the language most developers document their code in. Things like Gatsby and MDX attest to the fact that this syntax is reaching a new era.

A new project is needed to support standards like CommonMark and GFM but also support extensions like MDX, while still being fast, small, and modern.

Something like remark, but on a lower level: a lexer (in nerdy terms 🤓). Syntax trees have many benefits, but they come with the downside of a big memory footprint, and they are sometimes more than you need.
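To illustrate the difference between a lexer and a tree builder, here is a small, self-contained sketch for a tiny subset of markdown (headings that start with `#` followed by a space, and plain text lines). It is purely hypothetical and says nothing about how micromark will be designed: the `Token` shape, the token names, and the `lex` function are all assumptions made up for this example. Instead of building a tree, it hands a flat stream of tokens to a callback, one at a time.

```typescript
// Hypothetical lexer for a tiny markdown subset.
// None of these names or shapes come from micromark; they are assumptions.

interface Token {
  type: "headingMarker" | "whitespace" | "text" | "lineEnding";
  value: string;
  offset: number; // index of the token's first character in the source
}

// Hand each token to `onToken` as soon as it is found,
// so nothing larger than the current line's tokens is kept around.
function lex(source: string, onToken: (token: Token) => void): void {
  const lines = source.split("\n");
  let offset = 0;

  for (let i = 0; i < lines.length; i++) {
    const line = lines[i];
    let rest = line;
    let restOffset = offset;

    // Heading marker: 1 to 6 "#" characters followed by a space.
    const match = /^#{1,6}(?= )/.exec(line);
    if (match) {
      const marker = match[0];
      onToken({ type: "headingMarker", value: marker, offset });
      onToken({ type: "whitespace", value: " ", offset: offset + marker.length });
      rest = line.slice(marker.length + 1);
      restOffset = offset + marker.length + 1;
    }

    if (rest.length > 0) {
      onToken({ type: "text", value: rest, offset: restOffset });
    }

    // Every line except the last one ends with a line ending.
    if (i < lines.length - 1) {
      onToken({ type: "lineEnding", value: "\n", offset: offset + line.length });
    }

    offset += line.length + 1;
  }
}

// Usage: collect the tokens only so we can look at them.
const source = "# Hello\nworld";
const tokens: Token[] = [];
lex(source, (token) => tokens.push(token));

// Because every character ends up in some token, joining the token
// values gives back the original source.
const roundTrip = tokens.map((token) => token.value).join("");
```

A tool that needs a syntax tree could build one from such a stream, while a tool that only needs, say, heading positions could stop at the tokens.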

We’re launching micromark as just an idea. The first line of code still needs to be written. But we imagine it to be:

small in file size, max 10 kB minzipped, and tiny in memory use

fast in speed, compared to existing parsers on real world documents

safe to use, it should safely work on untrusted content by default

compliant with CommonMark but extendible for GFM, MDX, etc.

complete, in that it should give access to all info in the source document

But it’s not:

something that creates HTML and the like: other projects will use micromark for that

something that creates a syntax tree: remark will use it to do just that

micromark will likely not be something you’d directly interact with, unless you’re interested in working on parsers, but it will make high-level tooling better.

Be part of the change

We’re invested in making unified and the ecosystem under it better. We believe micromark should exist. And we need your help.

For example, you could contribute in the following ways:

Use the projects, and let us know through spectrum or GitHub issues what was hard to figure out, so we can improve the docs

Discuss. Just excited but want to keep it simple for now? Head over to spectrum and start a conversation!

Being an open collective

Open Collective allows unified to collect money from backers and sponsors in a transparent way. We need your support…

to pay out core maintainers for project leadership

to finance non-coding work, like technical writing, community consulting, etc.

to get our remote team together in real life

to do fun things for the community, such as getting stickers to people that contribute

Both individuals and companies can back our mission. You can help make unified sustainable by becoming a backer, starting at $2 per month, or an official unified sponsor, starting at $100 per month. As our way of saying thanks, we list backers and sponsors on our main GitHub repositories. Sponsors will also appear on unified.js.org and get a shout-out on Twitter. 🥈 Silver ($500+) and 🥇 Gold ($1000+) sponsors additionally get access to help chats with core maintainers.

This is just the beginning

With our early sponsorship we’ll be able to make the ecosystem better starting today. micromark will go into development shortly and it should be ready on markdown’s birthday, March 15, 2019. In the meantime we hope to be as transparent as possible on what we will be doing and you can expect more blog posts to keep you in the loop. For more information, find us on GitHub and visit unified.js.org. If you have any questions already you can ask them on spectrum or tweet to us @unifiedjs.

These are exciting times for unified and open source in general. We strive to improve the quality and possibilities of the organisations that make up a sustainable unified collective. Rethinking its core with micromark and joining with high-level organisations like MDX are the first two steps we’re taking to do just that.

Together, thanks to sponsors, we can build the most friendly, secure, fast, and extensive bridges between content formats.