Re: Continuous benchmarking setup

Currently, there are 3 snowflakes :)
- Benchmark setup: https://github.com/TomAugspurger/asv-runner
+ Some setup to bootstrap a clean install with Airflow, conda, asv,
supervisor, etc.: all the infrastructure around running the benchmarks.
+ Each project adds itself to the list of benchmarks, as in
https://github.com/TomAugspurger/asv-runner/pull/3. Then everything is
re-deployed. Deployment requires Ansible and an SSH key for the benchmark
machine.
- Benchmark publishing: After running all the benchmarks, the results are
collected and pushed to https://github.com/tomaugspurger/asv-collection
- Benchmark hosting: A cron job on the server hosting pandas docs pulls
https://github.com/tomaugspurger/asv-collection and serves them from the
`/speed` directory.
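The publishing step above (collect each project's generated HTML, push it to
the asv-collection repo) could be sketched roughly like this. The directory
layout, function name, and per-project structure are my assumptions for
illustration, not the actual asv-runner scripts:

```python
import shutil
from pathlib import Path

def collect_results(project_html_dirs, collection_dir):
    """Copy each project's `asv publish` HTML output into one collection tree.

    project_html_dirs: mapping of project name -> path to the html/
    directory that `asv publish` wrote for that project. The layout here
    is a guess at a plausible scheme, not the real asv-runner code.
    """
    collection = Path(collection_dir)
    for name, html_dir in project_html_dirs.items():
        dest = collection / name
        if dest.exists():
            shutil.rmtree(dest)  # replace stale results wholesale
        shutil.copytree(html_dir, dest)
    return sorted(p.name for p in collection.iterdir())
```

From there, a git commit and push inside `collection_dir` would update the
asv-collection repo that the docs server pulls from.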
There are many things that could be improved here, but I personally
won't have time in the near term. Happy to assist, though.
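For reference, the nightly cycle per project boils down to a command
sequence along these lines. `asv run NEW` and `asv publish` are standard
asv commands; the driver function, the working-directory scheme, and the
collection-repo name are my guesses at a plausible setup, not the actual
deployment:

```python
def nightly_commands(projects, collection_repo="asv-collection"):
    """Return (working_dir, argv) pairs for one nightly benchmarking pass.

    Hypothetical sketch only: the real asv-runner deployment drives this
    via Airflow, not a flat command list.
    """
    cmds = []
    for project in projects:
        # `asv run NEW` benchmarks only commits not yet benchmarked;
        # `asv publish` regenerates the static HTML report.
        cmds.append((project, ["asv", "run", "NEW"]))
        cmds.append((project, ["asv", "publish"]))
    # Push the collected HTML so the docs server's cron job can pull it.
    cmds.append((collection_repo, ["git", "push"]))
    return cmds
```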
On Mon, Apr 23, 2018 at 10:15 AM, Wes McKinney <wesmckinn@xxxxxxxxx> wrote:
> hi Tom -- is the publishing workflow for this documented someplace, or
> available in a GitHub repo? We want to make sure we don't accumulate
> any "snowflakes" in the development process.
>
> thanks!
> Wes
>
> On Fri, Apr 13, 2018 at 8:36 AM, Tom Augspurger
> <tom.augspurger88@xxxxxxxxx> wrote:
> > They are run daily and published to http://pandas.pydata.org/speed/
> >
> >
> > ________________________________
> > From: Antoine Pitrou <antoine@xxxxxxxxxx>
> > Sent: Friday, April 13, 2018 4:28:11 AM
> > To: dev@xxxxxxxxxxxxxxxx
> > Subject: Re: Continuous benchmarking setup
> >
> >
> > Nice! Are the benchmark results published somewhere?
> >
> >
> >
> > Le 13/04/2018 à 02:50, Tom Augspurger a écrit :
> >> https://github.com/TomAugspurger/asv-runner/ is the setup for the
> >> projects currently running. Adding arrow to
> >> https://github.com/TomAugspurger/asv-runner/blob/master/tests/full.yml
> >> might work. I'll have to redeploy with the update.
> >>
> >> ________________________________
> >> From: Wes McKinney <wesmckinn@xxxxxxxxx>
> >> Sent: Thursday, April 12, 2018 7:24:20 PM
> >> To: dev@xxxxxxxxxxxxxxxx
> >> Subject: Re: Continuous benchmarking setup
> >>
> >> hi Antoine,
> >>
> >> I have a bare metal machine at home (affectionately known as the
> >> "pandabox") that's available via SSH that we've been using for
> >> continuous benchmarking for other projects. Arrow is welcome to use
> >> it. I can give you access to the machine if you would like. Hopefully,
> >> we can suitably document the process of setting up a continuous benchmarking
> >> machine so that if we need to migrate to a new machine, it is not too
> >> much of a hardship to do so.
> >>
> >> Thanks
> >> Wes
> >>
> >> On Wed, Apr 11, 2018 at 9:40 AM, Antoine Pitrou <antoine@xxxxxxxxxx>
> wrote:
> >>>
> >>> Hello
> >>>
> >>> With the following changes, it seems we might reach the point where
> >>> we're able to run the Python-based benchmark suite across multiple
> >>> commits (at least the ones not predating those changes):
> >>> https://github.com/apache/arrow/pull/1775
> >>>
> >>> To make this truly useful, we would need a dedicated host. Ideally a
> >>> (Linux) OS running on bare metal, with SMT/HyperThreading disabled.
> >>> If running virtualized, the VM should have dedicated physical CPU
> cores.
> >>>
> >>> That machine would run the benchmarks on a regular basis (perhaps once
> >>> per night) and publish the results in static HTML form somewhere.
> >>>
> >>> (note: nice to have in the future might be access to NVidia hardware,
> >>> but right now there are no CUDA benchmarks in the Python benchmarks)
> >>>
> >>> What should be the procedure here?
> >>>
> >>> Regards
> >>>
> >>> Antoine.
> >>
>