OnlineSchemaChange rebuilt in Python

In 2010, Facebook open-sourced OnlineSchemaChange.php, a tool to perform MySQL schema changes while minimizing downtime. We are happy to announce that an improved version written in Python is now available on GitHub.

Making schema change easier

OnlineSchemaChange.php was initially implemented to make DDL on MySQL less cumbersome. It covers more use cases that native Online DDL supports and provides more features.

As we continued to use the PHP version, we found design constraints that made it hard to add and test new features. So we decided to rebuild it with a more flexible architecture and rewrite it in Python, which is more widely used in the operations world.

To learn more about how OSC works and how to use it, please visit its GitHub wiki page.

What's new

Easier

The original open sourced OSC was more like an engine than a tool. Users needed to write PHP code wrapping to run the schema change, and, with PHP becoming less popular in the operations world, OSC.php wasn’t widely adopted by the community.

After taking feedback from that community, together with OSC.py we’ve created a standalone CLI. With all supported options provided as a parameter on the command line, a wrapper is no longer required. Users simply need to Git clone the repo, and they’re good to go.

Users can also integrate the logic deeply inside their infrastructure if the tech stack is based on Python. We’ve implemented OSC.py so the main logic is a portable module. With the support of the plugin, you'll be able to interact with your internal system before/after each stage/operation during OSC.

Testable

OSC.php wasn’t originally designed as an open source project, so we didn’t include a reliable way to test changes and accept pull requests efficiently. Unreliable testing also became an issue when we tried to add features ourselves. These constraints made us realize we’d need to rewrite OSC to make it easier to test.

Rewriting OSC for Python allowed us to carefully design the code structure in an easily testable form. The Python language allows us to do a thorough unit test for each code piece. We also borrow the idea from MySQL server's integration test case, and implemented a similar integration framework. Now if we want to ensure a change won't cause regression, we don't need to write Python code. We simply write SQL files and put arguments in a JSON configuration file then we are able to trigger the test against a real MySQL instance with a one-line command.

Reliable

OSC.py incorporates a data consistency check to avoid losing data or causing corruption. With the support of data consistency check we're much more confident when rolling out our changes to production. In our use case, this feature has proved itself being a good way for edge case detection and protection.

Future

While migrating to RBR replication for our production environments, we are also working on triggerless schema changes in OSC. We also plan to support utilize native online DDL in OSC to avoid unnecessary logical table rebuilds.

We are happy to work with the community to improve OSC and make it more efficient and cover more use cases. Feature requests and bug fixes from the community are more than welcome!