Download a file with Headless Chrome, Node.js and Puppeteer

I recently had a go with Headless Chrome and Puppeteer to download
bank account statements.
Browser scripting has never been this easy, this up to date, or this close to a modern development stack.

One thing has been harder to figure out though: handling the download of a file and handing it over to Node.js.
This blog post documents how to achieve it.

Some Context

The content I wanted to download automatically is tricky to obtain:

there is no direct or predictable download URL

it is placed behind a login screen

the download is bound to a multi-page process

each page writes something in a server session

The download eventually starts when one has submitted the various forms in the right order.

Puppeteer Page and Browser API

I found Puppeteer's implementation quite clever: the browser is manipulated directly from the Node.js app itself thanks to the DevTools Protocol.
I find this move interesting because it provides a better feedback loop to the software.
Our browser scripts are now closer to the headless browser.

Puppeteer has several concepts, but two of them are of particular interest when
automating browser actions:

Browser API: it's what happens at a browser level

Page API: it's what happens in a browser tab

We can navigate in a page, intercept browser requests before they even reach a page and click on elements.
The Promise-based flow makes it easy to script with async/await.
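
As a rough illustration of those three actions, here is a minimal sketch. The function name `openStatementsPage`, the URL and the selector are hypothetical placeholders; only the methods called on `page` belong to Puppeteer. It blocks image requests before they reach the page, navigates, then clicks an element and waits for the navigation it triggers.

```javascript
// Hypothetical helper: `page` is a Puppeteer Page; the URL and selector
// are placeholders, not the ones used by the bank website.
async function openStatementsPage(page, url, linkSelector) {
  // Intercept requests before they reach the page
  await page.setRequestInterception(true);
  page.on('request', (request) => {
    // Skip images, let everything else through
    if (request.resourceType() === 'image') {
      request.abort();
    } else {
      request.continue();
    }
  });

  // Navigate in the page
  await page.goto(url, { waitUntil: 'networkidle0' });

  // Click on an element and wait for the navigation it triggers
  await Promise.all([
    page.waitForNavigation({ waitUntil: 'networkidle0' }),
    page.click(linkSelector),
  ]);
}
```

Starting `waitForNavigation()` before `click()` inside the same `Promise.all()` avoids missing a navigation that begins as soon as the click lands.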

The Download Issue

One thing seemed quite different though: requesting the bank statement
triggered a file download.

I could not see the download starting by looking at the browser events.

My script would end without errors: the download would have been triggered, but no data was written to disk.
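
For reference, this is the kind of event inspection I mean, as a minimal sketch rather than the exact code I ran. The helper name `logNetworkEvents` and its `log` parameter are hypothetical; the `request`, `response` and `requestfailed` events are emitted by Puppeteer's Page.

```javascript
// Hypothetical debugging helper: `page` is a Puppeteer Page and `log`
// is any function accepting a string (console.log by default).
function logNetworkEvents(page, log = console.log) {
  page.on('request', (request) => {
    log(`request ${request.method()} ${request.url()}`);
  });

  page.on('response', (response) => {
    // A file download usually advertises itself with this header
    const disposition = response.headers()['content-disposition'] || 'none';
    log(`response ${response.status()} ${response.url()} disposition=${disposition}`);
  });

  page.on('requestfailed', (request) => {
    const failure = request.failure();
    log(`failed ${request.url()} ${failure ? failure.errorText : ''}`);
  });
}
```

Calling `logNetworkEvents(page)` before submitting the last form prints every request and response seen by the tab.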

Fetch Forrest, Fetch!

I saw other people reporting the same issue.
Downloads would not be triggered in headless mode, no matter what was attempted.

This comment led me to think the answer was… to not submit the form per se,
but rather to evaluate code in the page context and use fetch() to submit the form and pass the resulting response to Node.

So instead of banging my head against these lines:

    const [response] = await Promise.all([
      page.waitForNavigation({ waitUntil: 'networkidle0' }),
      page.click('form[name="telechargementForm"] input[name="btConfirmer"]'),
    ]);

    // I expected the statement to be the body of the navigated page
    return response.buffer();

I had to evaluate these instead:

    const result = await page.evaluate(async () => {
      const form = document.querySelector('form[name="telechargementForm"]');
      const data = new FormData(form);
      // if the button value is not part of the request
      // then the download is not prompted
      data.append('btConfirmer', 'Confirmer');

      return fetch(form.action, {
        method: 'POST',
        credentials: 'include',
        body: data,
      })
      // I'm expecting to download a CSV so it's "safe"
      // It is actually sent as latin1 instead of utf8…
      .then(response => response.text());
    });

    // CSV data as plain text
    return result;
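
About that latin1 remark: `response.text()` decodes the body as UTF-8, so accented characters in a latin1 file may come out garbled. One possible workaround, sketched below under the assumption that the file really is latin1, is to bring the raw bytes back to Node.js and decode them there. The names `decodeLatin1` and `downloadStatement` are hypothetical.

```javascript
// Turns raw bytes (a plain array of numbers, which survives the trip from
// the browser to Node.js) into a string, assuming a latin1-encoded file.
function decodeLatin1(bytes) {
  return Buffer.from(bytes).toString('latin1');
}

// Hypothetical variant of the snippet above: `page` is a Puppeteer Page.
async function downloadStatement(page) {
  const bytes = await page.evaluate(async () => {
    const form = document.querySelector('form[name="telechargementForm"]');
    const data = new FormData(form);
    data.append('btConfirmer', 'Confirmer');

    const response = await fetch(form.action, {
      method: 'POST',
      credentials: 'include',
      body: data,
    });

    // page.evaluate() can only return serializable values,
    // so the ArrayBuffer is converted to a plain array of bytes
    const buffer = await response.arrayBuffer();
    return Array.from(new Uint8Array(buffer));
  });

  return decodeLatin1(bytes);
}
```

The returned string can then be written to disk from Node.js, for instance with `fs.writeFile()`.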