We need to be able to identify the crawler so that we don’t count its visits. Until now that was easy, but now that it uses random user agents we’ll need to rely on an IP blacklist (we can’t use robots.txt, since we don’t want to block the crawler outright).
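For context, this is roughly the kind of counting filter we have in mind. The CIDR ranges below are just placeholders (RFC 5737 documentation blocks), not addresses we’ve confirmed for the crawler:

```python
import ipaddress

# Placeholder ranges (documentation blocks) -- replace with the
# crawler IPs actually observed in your logs.
CRAWLER_DENYLIST = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

def should_count_visit(remote_ip: str) -> bool:
    """Return True if this hit should be counted in analytics.

    The crawler is only filtered out of the counters; nothing is
    blocked at the network level.
    """
    addr = ipaddress.ip_address(remote_ip)
    return not any(addr in net for net in CRAWLER_DENYLIST)
```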

Is this new behaviour intended, or is it just some tests? Is there any reliable way we can identify the crawler now?

I’m not sure I come to the same conclusion - the second IP is not owned by the Twitter network, and I also note that in the three different posts above, the “extra” request has a different user-agent string every time. I’m still at a loss to explain how this is happening.
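For anyone who wants to double-check IP ownership themselves, this is roughly the approach: a reverse DNS lookup followed by a forward confirmation. Note that the `.twitter.com` suffix below is only an assumption for illustration; I’m not aware of a published hostname pattern for these fetchers:

```python
import socket

def ip_belongs_to(ip: str, expected_suffix: str = ".twitter.com") -> bool:
    """Reverse-resolve the IP, then forward-confirm the hostname.

    Both steps are needed: reverse DNS alone can be spoofed by whoever
    controls the PTR records for the address block.
    """
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)
    except socket.herror:
        return False  # no PTR record at all
    if not hostname.endswith(expected_suffix):
        return False
    try:
        return socket.gethostbyname(hostname) == ip
    except socket.gaierror:
        return False
```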

Hi,
To add some info to what @ivanguardado posted, I can confirm that in all the tests we have performed, the user-agent of the extra request appears to be completely randomized, which resembles the technique some crawlers use to bypass protections and pass themselves off as legitimate requests.
Since this reproduces with an external tool (requestbin), it seems to rule out an issue with our own infrastructure, at least.
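If it helps others confirm the randomization in their own logs, here’s the rough heuristic we used (the function name and threshold are ours, nothing official): group requests by IP and flag any address that presents an implausible number of distinct user-agent strings:

```python
from collections import defaultdict

def suspicious_ips(log_entries, threshold=5):
    """log_entries: iterable of (ip, user_agent) tuples from access logs.

    A normal client reuses one user-agent string, so an IP that shows
    up with many distinct strings in a short window is very likely
    randomizing them.
    """
    agents_by_ip = defaultdict(set)
    for ip, user_agent in log_entries:
        agents_by_ip[ip].add(user_agent)
    return {ip for ip, agents in agents_by_ip.items()
            if len(agents) >= threshold}
```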

Absolutely - to be clear, I’m not suggesting that this is an infrastructure issue on your side or that your tools are not correctly reporting the behaviour… but I do not believe that Twitter’s network or services are directly responsible for the additional requests. This leaves us with a mystery.

We’re seeing the same thing happen – we send links via Twitter DM and they are being requested, without any user action, by a bot using a morphing user-agent (initially it was Opera/9.80 (J2ME/MIDP; Opera Mini/9.80 (J2ME/22.478; U; en) Presto/2.5.25 Version/10.54), but it seems we can no longer rely on that).
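Since the user-agent can’t be trusted, the only workaround we’ve sketched so far (completely homegrown, names hypothetical) is to harvest the bot’s IPs with a bait link: DM a unique URL that is shared nowhere else, and treat any fetch that lands before the recipient opens the conversation as the bot:

```python
import time

# Hypothetical in-memory store; a real setup would persist this.
bait_hits = []

def record_bait_hit(remote_ip, user_agent):
    """Handler hook for a unique bait URL sent in exactly one DM.

    The URL is shared nowhere else, so a request arriving before the
    recipient opens the conversation can only come from the bot.
    """
    bait_hits.append({"ip": remote_ip, "ua": user_agent, "ts": time.time()})

def bot_ips(dm_opened_at):
    """IPs that fetched the bait before the recipient opened the DM."""
    return {hit["ip"] for hit in bait_hits if hit["ts"] < dm_opened_at}
```

The IPs collected this way could then seed the blacklist discussed at the top of the thread.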