User:Ffledgling/Senbonzakura

This service, which I'm calling Senbonzakura (or 'SBZ' for those who prefer TL;DRing everything), will generate partial MAR (Mozilla ARchive) files for updates from Version A to Version B of Firefox on demand.

Benefits

Update generation becomes a service rather than a step that simply happens during the build process. This makes the updates available to a wider audience, although the consequences of doing so are a little unclear at the moment.

Open Issues

This is a list of 'issues' that have no definite solution at the moment but are important in some way and thus need to be kept track of.

Figure out tool versioning.

Integration with Releng API (need to talk to dustin after we have a concrete prototype)

Parallelizing the MAR build process further by using separate celery workers or subprocess calls to fetch the MARs and do diffs on larger files (ref: Level 2 caching)

Do we need end-to-end testing? Mozmill has a suite of tests called update tests that apply a MAR and check whether the update applied correctly. Can we, and do we want to, use this to test our prototype? How is QA affected when we change the way we generate our updates? Can they still test whether Firefox updates correctly? We might want to talk to Henrik (:whimboo) or Clint (:ctalbert) eventually. (See conversation snippet at the end)

What do we want to use for our Caching layer? Why is X better/preferred over Y?

There seems to be some confusion about whether all the required tooling will be available somewhere (even in-tree) for some of the older Firefox versions (talk to bhearsum & catlee)

Use SHA-512 or another hash function instead of MD5

Other open issues?

Pertinent Questions

A subset of Open Issues. I'm using this as a scratchpad to note down issues, polish them later and move them up to the Open Issues section.

Does the client require the request to be synchronous or asynchronous?

Caching

This service will be dealing with and generating a lot of files. It therefore makes sense to have an underlying caching layer that stores the generated and downloaded files/tools.

The caching layer can be implemented in a number of ways. Some of the initial ideas are:

- As storage on Amazon S3
- As a shared NFS file-system
- Local storage on the nodes (probably not the best way)

There are certain requirements that are imposed on the caching layer, and more might be added as the requirements for the caching layer clear up. Some of these requirements are as follows:

Must be agnostic to the file type being stored in the cache.

Accessing the cache must be much faster than fetching the files via a direct download.

The caching layer should provide an identifier that can be used to uniquely identify and reference the files in the cache.

The caching layer should ideally have fast read, write and lookup, but in a toss-up between the three, lookup and read need to be the faster operations (they will be used much more than anything else).

OPTIONAL: A method to access files via the identifier over the network, so that clients/users can directly access the files in the cache without Senbonzakura acting as middle man.
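
To make the requirements above a bit more concrete, here is a minimal sketch of what the caching interface could look like. Everything in it is an assumption for illustration: the class name `InMemoryCache`, the method names `save`, `exists` and `retrieve`, and the choice of SHA-512 as the identifier. A real backend (S3, NFS, local disk) would replace the dictionary.

```python
import hashlib


class InMemoryCache(object):
    """Hypothetical stand-in for a real backend (S3, NFS, local disk)."""

    def __init__(self):
        # Maps identifier -> raw bytes. The cache never inspects the
        # contents, so it stays agnostic to the file type.
        self._blobs = {}

    def save(self, data):
        """Store raw bytes and return the identifier that references them."""
        identifier = hashlib.sha512(data).hexdigest()
        self._blobs.setdefault(identifier, data)
        return identifier

    def exists(self, identifier):
        """Lookup: report whether the identifier is present in the cache."""
        return identifier in self._blobs

    def retrieve(self, identifier):
        """Read: return the cached bytes, raising KeyError if missing."""
        return self._blobs[identifier]
```

Because the identifier is derived from the content, saving the same bytes twice returns the same identifier, which is one way of getting the "unique identifier" requirement for free.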

There are two levels of caching that are planned for this service, detailed as follows:

Level 0

This level simply keeps track of the downloaded files and their hashes on the worker's local file system. This cache is not persistent and is not meant to be; it is simply a cache that exists for convenience.

This cache level has not been stubbed out yet and may or may not make it into the service.

Requires Discussion

Level 1 Caching

This level does caching at the MAR level. Downloaded complete MARs are cached to save bandwidth and improve speed during the Partial MAR generation phase.

Partial MARs are stored in the Cache after generation and are returned after a lookup in the Cache when requested for by the client.

Each file is identified by a unique identifier, which at the moment is the MD5 hash of the file, for lack of a better function.
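
As a rough illustration of how such an identifier could be computed, the sketch below hashes a file in chunks so that a large MAR never has to be loaded into memory in one go. The function name `file_digest` and its parameters are hypothetical. The algorithm is a parameter so that swapping MD5 for something like SHA-512 is a one-word change.

```python
import hashlib


def file_digest(path, algorithm="sha512", chunk_size=1024 * 1024):
    """Return the hex digest of the file at `path`.

    `algorithm` is any name accepted by hashlib.new(), e.g. "md5" or
    "sha512". The file is read in `chunk_size` byte pieces.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `file_digest(path, algorithm="md5")` would reproduce the current MD5-based identifier, assuming the identifier is the plain hex digest of the file contents.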

Level 2 Caching

A lot of the bigger stuff between releases, like the XUL libs on every platform, remains the same despite different locales. This locale-independent stuff should probably be cached and re-used. This level will cache the files inside the different MAR versions.

The idea is to avoid re-doing work: reuse diffs that have already been computed, and be aware of the files that don't need to be diff'd at all.

If we take the example of the XUL binary, it is an extremely large binary that accounts for a very large chunk of the total time it takes to generate a partial MAR. If we can recognize that the XUL binary has not changed, we can skip the binary diff'ing step, and this should theoretically save us a lot of compute time and resources. If we also manage to cache the binary diff of two different XUL binaries, that diff is worth keeping track of because it is likely to be common across all Firefox version updates regardless of locale. It should therefore speed up partial MAR generation once we have the diff'd binary, as long as we can recognize the duplicated effort.

The actual recognition logic will be separate from the caching layer and ideally a part of the partial MAR generation/diff'ing service.
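
The recognition step described above could be sketched roughly as follows. This is only an illustration under assumptions: the function name `plan_diff_work`, the shape of its inputs (dictionaries mapping a path inside the MAR to a content hash) and the `diff_cache` keyed on an `(old_hash, new_hash)` pair are all made up for the example.

```python
def plan_diff_work(old_files, new_files, diff_cache):
    """Classify each file in the new MAR by the work it requires.

    old_files / new_files: dict mapping path inside the MAR -> content hash.
    diff_cache: dict mapping (old_hash, new_hash) -> cached diff identifier.
    """
    plan = {"unchanged": [], "cached_diff": [], "needs_diff": [], "added": []}
    for path in sorted(new_files):
        new_hash = new_files[path]
        old_hash = old_files.get(path)
        if old_hash is None:
            # File only exists in the new version; nothing to diff against.
            plan["added"].append(path)
        elif old_hash == new_hash:
            # Identical content, so the binary diff can be skipped entirely.
            plan["unchanged"].append(path)
        elif (old_hash, new_hash) in diff_cache:
            # Same pair was diff'd before (e.g. for another locale).
            plan["cached_diff"].append(path)
        else:
            plan["needs_diff"].append(path)
    plan["removed"] = sorted(set(old_files) - set(new_files))
    return plan
```

Only the paths under `needs_diff` would then be handed to the actual diffing tool, which is where the time saving for large locale-independent binaries would come from.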

Implementation details

Dependencies

Nearly everything we use is pip installable for the application, but the host machine must provide a few things that might not be pip installable. The known ones are:

RabbitMQ (or any kind of message queue that can be used by Celery)

Virtualenv

Python 2.7

File Structure

api.py This file contains all the Flask related code for routing and handling the API call parameters.

cache.py This is currently a stub file that contains function prototypes for the caching layer.

core.py This file contains all the core logic for Building and generating MARs.

Things to take care about:

Use a resilient retry library while fetching (bhearsum's redo is a good one to look at)

Catch exceptions and raise the correct ones in different parts of the code. Currently a lot of places have a commented-out raise; these need actual custom exceptions, and those need to be raised. These and other exceptions need to be caught and handled properly so that the build does not fail midway, and if it does, there is enough traceback or logging to debug it.

Replace all the print statements with logging statements and LOG ALL THE THINGS ~!

Unit-test ALL of teh things!

Determine which version of the mar and mbsdiff tools to use, and use them. These probably need to be cached as well, maybe keyed on their own version, maybe on the Gecko version. Simply keep a function that decides which one to use and points you to the right one; use whatever that function gives you and assume abstraction. We might have to cache these based on the versions in the update paths we're given.

Cache the generated partial MAR file based on the update path or on a combination of the hashes of the input MAR files. Where and how the partial MARs are actually cached again depends on our caching strategy; we simply use our abstraction functions.
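
For the last point, one possible way of deriving a cache key from the hashes of the two input MARs is sketched below. The function name `partial_cache_key` and the use of SHA-512 over a colon-joined string are assumptions for illustration, not a decided scheme.

```python
import hashlib


def partial_cache_key(from_mar_hash, to_mar_hash):
    """Derive a cache key for the partial MAR that updates `from` to `to`.

    Both arguments are hex digests of the complete input MARs. The order
    matters: the partial for A -> B is a different file from B -> A.
    """
    # Hex digests never contain ":", so the joined string is unambiguous.
    combined = "{0}:{1}".format(from_mar_hash, to_mar_hash)
    return hashlib.sha512(combined.encode("ascii")).hexdigest()
```

Keying on the input hashes rather than on the update path means that two requests naming the same pair of files through different URLs would still hit the same cache entry.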

Tooling

We need to figure out which tools to use with any given combination of complete MAR files. There are at least three different versions of these tools and there is no central location for them.

Tools also fall into two categories:

The partial mar generation scripts.

The mar and mbsdiff binaries.

These live in separate locations and it might be in our best interest to consolidate them.

To be able to decide which tools to use with the targeted version of firefox, we need to figure out a Tool Version --> FF version mapping. To the best of my knowledge and based on feedback from Ben and Catlee such a mapping does not exist at the moment and will need to be built as part of the project going forward.
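
Since no such mapping exists yet, the sketch below only shows the shape a lookup could take once the data has been collected. The table contents are entirely made up: the version boundaries and the labels `tools-v1` to `tools-v3` are placeholders, and the function name `tool_version_for` is hypothetical.

```python
import bisect

# PLACEHOLDER DATA: (first Firefox major version, tool version label).
# The real boundaries are unknown and still need to be worked out.
TOOL_VERSIONS = [
    (0, "tools-v1"),
    (10, "tools-v2"),
    (20, "tools-v3"),
]


def tool_version_for(firefox_version, table=TOOL_VERSIONS):
    """Return the tool label for a version string such as "28.0a1"."""
    major = int(firefox_version.split(".")[0])
    starts = [start for start, _label in table]
    # Find the last table entry whose starting version is <= major.
    index = bisect.bisect_right(starts, major) - 1
    if index < 0:
        raise ValueError("no tools known for Firefox %s" % firefox_version)
    return table[index][1]
```

Keeping the decision behind one function means the rest of the service never needs to know how the mapping is stored or how many tool versions there are.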

How do we handle fetching/building/using the tools? Issues:

- Tools like mar and mbsdiff are built as part of a Firefox build. Their source code exists in Mozilla Central, but the compiled binaries are produced by the build and available on FTP.m.o after a build has completed. Do we pull the source in and compile them? Do we keep pre-compiled versions at hand?
- To move to a central repo or not to move to a central repo, that is the question.
- As ranted about above, versioning.

Note on Scaling, Resilience and Caching

It is probably best to design for scalability, resilience and caching from the ground up so things to keep in mind are:

Retry retry retry

Log more than enough to debug (see "Things to take care about" above)

Have our application/service start up from a config file

Do not trust your machine to store state, keep it on disk or on file? We now use an SQL database to do this.

abstraction abstraction abstraction?

How do we optimize our caching? It will depend on caching strategy and underlying caching layer in use.
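
For the "retry" item in the list above, a bare-bones retry helper with exponential backoff might look like the sketch below. This is a plain-Python illustration, not the API of bhearsum's redo library; the function name `retry` and all of its parameters are assumptions made for this example.

```python
import time


def retry(action, attempts=5, sleep_seconds=1.0, backoff=2.0,
          retry_exceptions=(Exception,), sleep=time.sleep):
    """Call `action()` until it succeeds or `attempts` calls have failed.

    The wait between calls starts at `sleep_seconds` and is multiplied by
    `backoff` after every failure. `sleep` is injectable to ease testing.
    """
    delay = sleep_seconds
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_exceptions:
            if attempt == attempts:
                # Out of attempts: let the last exception propagate.
                raise
            sleep(delay)
            delay *= backoff
```

A download step could then be wrapped as `retry(lambda: fetch(url), retry_exceptions=(IOError,))`, where `fetch` stands for whatever function actually downloads the MAR.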

Signing and Certs

Still very hazy on how this plugs into the rest of the system, where it's needed and how, if at all, it changes things. Feedback needed from catlee, nthomas, bhearsum.

Implementation questions

Does it make sense to modify the script? Probably not, because we have no control over the older scripts

How do we fetch the tools? Ideally just the ones we need, without cloning all of MC.

Deliverables

I do not have a concrete idea of the deliverables so everything below is subject to possibly radical change, but for now, this is what makes sense to me:

Prototype 0.1

The initial prototype will simply be a bunch of Python that takes the input MAR URLs, diffs the MARs and spits out the result.

Prototype 0.2

The second prototype starts to add the caching functions, resilience logic and mar/mbsdiff tool versioning logic, and generally attempts to map out the entire structure/flow of the code. We should probably have some ideas about the certs as well by this point.

Deliverable 1.0

Have all the basic services up and running, including our partial MAR (Level 1) caching. We should ideally try deployment on a machine in the cloud and let it run for a bit to see how things go.

Deliverable 1.x

Change things around based on feedback from various team members, fine tune the system, add features requested and most importantly iron out glitches and swat those bugs.

Unit Tests

Unit-Test as much code as possible

Docs

Keep documenting stuff as it gets done. Use this wiki for general documentation and Sphinx for API-level documentation.

Relevant Links

IRC Conversation snippets

Conversation with Henrik re: Browser update testing

22:15 < ffledgling> I was wondering if it's possible to use mozmill to test browser updates
22:15 < ffledgling> but with custom MAR files and make sure they applied correctly?
22:16 <@whimboo> browser updates? thats something we are doing for a long time
22:16 < ffledgling> I think I found some tests that do what I want with the actual updation from offical servers -- http://hg.mozilla.org/qa/mozmill-tests/file/tip/firefox/tests/update/testDirectUpdate/
22:16 <@whimboo> the only thing you would have to do is to set the right update url
22:16 < ffledgling> whimboo: yes, but I want to use a custom MAR
22:16 <@whimboo> right
22:17 < ffledgling> ah, can you point me to how I can configure that?
22:17 <@whimboo> as said you would have to modify the update server url
22:17 <@whimboo> app.update.url
22:18 <@whimboo> just change that pref and ensure to send correct update snippets