On Pokemon GO Scanners and Mapping Systems

Written by andy on August 11, 2016

The Cat and Mouse Nature of Things

I've been thinking: this is quite a cat and mouse game we're playing with Niantic. While we are, as of this moment, winning against the Unknown6 wrench, it is going to be very much an uphill battle from here on. However, contrary to popular belief, I don't think new UnknownVars will be the big issue we need to deal with in the long run, but rather the accessibility of data.

As Niantic clamps down with IP bans on the various cloud service providers and rate limiting on residential connections, it will become increasingly difficult for services such as PokeVision and FastPokeMap to operate. Compound that with everyone wanting their own piece of the pie in the form of accurate scanners for their own areas of interest, and we get more and more people running multiple accounts on their own systems, covering overlapping areas and increasing the load on Niantic's servers. That forces Niantic to clamp down even harder, creating a never-ending vicious cycle.

There are several parties on the right track to address this issue. SkipLagged and various communities have proposed and even implemented centralized systems where people can contribute data. However, from my perspective, the correct long-term solution has not been implemented yet. These centralized systems all tend to limit how the centrally collected data can be accessed (typically only through their own website, with some ulterior motive, be it donations or promoting their other commercial services). Furthermore, apart from the first party that implemented each solution, no one can request on-demand map updates for a given region (which brings us back to the centralized display).

It is noble to think that "Maybe people will stop using scanners and maps once Niantic fixes the in-game tracking", but all it takes is one missed Snorlax, Lapras or Dragonite, and we're all back to square one. So, short of Niantic coming up with a read-only API with realtime spawn data, which they've already made very clear they don't want to do, the cat and mouse game continues.

A Possible Solution

I'd like to propose a possible solution. Due to the nature of this solution, despite my trying to simplify things, it might still be too technical for some readers. Please bear with it until it gets refined enough to be clear for all.

Part I. Establishing the Objective

The objective of this solution is to create a semi-decentralized system that users can join to contribute to the scanning process, as well as to consume newly collected data from the community. This allows people to create mapping solutions catered to their own areas of interest by reusing community data, without adding significant load to Niantic's servers or to other scanners' infrastructure. The collected data should also be fingerprinted, so that it is possible to avoid 'poisoning' of the collective database.
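As a rough illustration of what fingerprinting could mean here, the sketch below assumes each registered worker holds a secret issued by the operator and attaches a keyed SHA-256 digest (HMAC) to everything it submits. The field names (`workerId`, `payload`, `fingerprint`) and the secret-per-worker scheme are my own assumptions, not a settled design.

```python
import hashlib
import hmac
import json


def fingerprint(result: dict, worker_id: str, worker_secret: bytes) -> dict:
    """Wrap a scan result with the submitting worker's id and a keyed digest."""
    # Canonical JSON (sorted keys, no extra whitespace) so the same data
    # always serializes to the same bytes, and therefore the same digest.
    payload = json.dumps(result, sort_keys=True, separators=(",", ":"))
    message = f"{worker_id}.{payload}".encode("utf-8")
    digest = hmac.new(worker_secret, message, hashlib.sha256).hexdigest()
    return {"workerId": worker_id, "payload": payload, "fingerprint": digest}


def verify(envelope: dict, worker_secret: bytes) -> bool:
    """Recompute the digest and compare it in constant time."""
    message = f"{envelope['workerId']}.{envelope['payload']}".encode("utf-8")
    expected = hmac.new(worker_secret, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, envelope["fingerprint"])
```

With something like this, a result that was altered in transit, or submitted under someone else's worker id, fails verification, and every accepted result can be traced back to the worker that produced it.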

Part II. Identifying the Components / Separation of Concerns

Due to the nature of such a system, it is important to break it down into the smallest components possible, to help keep it flexible and scalable. Here's a non-exhaustive list of components that come to mind:

Centralized Refresh Request Queue: Software running in the cloud that collects refresh (scan) requests from the Centralized Refresh Request Rate-Limiter and turns them into Refresh Jobs, which it dispatches to a small subset of Scanner Workers in close geographical proximity (e.g. only routing requests for the Seattle, Washington area to Scanner Workers in Seattle, Washington).

Centralized Refresh Result Validator: Software running in the cloud that validates Refresh Job Results from the Centralized Refresh Result Intake Queue before publishing Validated Refresh Job Results to the Centralized Refresh Result Publishing Queue.

Display Software: Software capable of displaying the information in a Local Database. This could be a map display such as PokeVision, a list display such as Pokemon Go Radar on Pebble, or another form of display at the discretion of the author.

Local Data Collector: Software that subscribes to the Centralized Refresh Result Publishing Queue and captures new Validated Refresh Job Results into the Local Database.

Local Database: Local copy of the data gathered by the collective, received from the Centralized Refresh Result Publishing Queue.

Local Refresh Rate-Limiter: Software capable of rate limiting and clustering refresh requests before they are put into the Centralized Refresh Request Queue, so that a single instance of the Display Software used by many users does not put stress on the collective system.

Passive Scanner: MITM software that passively monitors for points of interest during gameplay and submits data to the Centralized Refresh Result Intake Queue.

Scanner Hardware: Hardware capable of running Scanner Software.

Scanner Worker: Software that receives a Refresh Job, scans for current and future points of interest in the game, and publishes the Refresh Job Result to the Centralized Refresh Result Intake Queue.
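To make the geographic routing done by the Centralized Refresh Request Queue a little more concrete, here is a minimal sketch of picking nearby Scanner Workers for a request. The `Worker` record, the 25 km radius and the limit of three workers per job are all placeholder assumptions; a real registry would also need to track worker health and load.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Worker:
    worker_id: str
    lat: float
    lng: float


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance between two points, in kilometres."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def pick_workers(workers, lat, lng, max_km=25.0, count=3):
    """Return up to `count` workers within `max_km` of the request, nearest first."""
    ranked = [(haversine_km(lat, lng, w.lat, w.lng), w) for w in workers]
    in_range = [(dist, w) for dist, w in ranked if dist <= max_km]
    in_range.sort(key=lambda pair: pair[0])
    return [w for _, w in in_range[:count]]
```

A request for downtown Seattle would then go to workers registered in Seattle or Bellevue, while a worker registered in Portland would be skipped for being out of range.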

Part III. Possible Technologies

On the 'decentralized' side, the components are technology agnostic; that is, the technology can differ between pieces, as long as the communications are standardized. It should therefore be entirely possible to build Mapping Software in PHP that accesses a Local Database built on PostgreSQL, populated by a Python-based Local Data Collector, with a Local Refresh Rate-Limiter implemented in Node.js. Similarly, it should be entirely possible to have a Scanner Worker implemented in C++. It is important to choose the right tool for the job, and that is an exercise left to the implementer.

For the 'centralized' queues, there are many suitable solutions available. MQTT or another subscription-based message queue system could be a suitable candidate for the front-facing portion. This is because, ideally, the system should have minimal knowledge of the decentralized Scanner Workers and Display Software, so that it doesn't have to maintain a centralized list of outgoing connections to each client, which would harm scalability. For the internal-facing queues, tools such as AWS SQS or equivalents may be suitable candidates because of the ecosystem already in place for scaling.
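One way a subscription-based queue could avoid keeping a client list is to publish each result under a topic derived from its location, and let every Local Data Collector subscribe only to the topics covering its area. The sketch below assumes a hypothetical `results/<row>/<col>` topic layout over a fixed 0.05 degree grid; neither the layout nor the cell size comes from any existing service.

```python
import math


def cell_topic(lat: float, lng: float, cell_deg: float = 0.05) -> str:
    """Map a coordinate to the topic of the grid cell that contains it."""
    row = math.floor(lat / cell_deg)
    col = math.floor(lng / cell_deg)
    return f"results/{row}/{col}"


def topics_for_area(south, west, north, east, cell_deg=0.05):
    """List every cell topic a collector would subscribe to for a bounding box."""
    rows = range(math.floor(south / cell_deg), math.floor(north / cell_deg) + 1)
    cols = range(math.floor(west / cell_deg), math.floor(east / cell_deg) + 1)
    return [f"results/{row}/{col}" for row in rows for col in cols]
```

The publisher only ever computes `cell_topic` for a result, and the broker's ordinary topic matching takes care of delivering it to whoever subscribed to that cell.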

Part IV. Exchange Protocols

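Recently, I've grown fond of YouTube's API... not so much the API itself, but rather its typed responses. In the response to a Channel List request on Pokemon GO's official channel, for example, every object carries a `kind` field naming its type, such as `youtube#channelListResponse` for the response as a whole and `youtube#channel` for each item in it.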

I would propose that the components exchange data as JSON with a clear Entity type and expiry tags, so that each component can understand and validate the Entities being exchanged.
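A minimal sketch of what typed, expiring Entities might look like follows. The `kind` field is modelled on YouTube's convention, but the `pogomap#...` kind names, the `expires` field and the required-field lists are placeholders I made up for illustration; the real schema would have to be agreed on.

```python
import json
from datetime import datetime, timedelta, timezone

# Hypothetical entity kinds and the fields each one must carry.
REQUIRED_FIELDS = {
    "pogomap#refreshJobResult": {"jobId", "workerId", "sightings"},
    "pogomap#sighting": {"pokemonId", "lat", "lng"},
}


def make_entity(kind, ttl_seconds, now=None, **fields):
    """Build an entity dict tagged with its kind and an ISO 8601 expiry time."""
    now = now or datetime.now(timezone.utc)
    expires = now + timedelta(seconds=ttl_seconds)
    return {"kind": kind, "expires": expires.isoformat(), **fields}


def validate_entity(raw, now=None):
    """Parse a JSON entity; raise ValueError if it is unknown, incomplete or expired."""
    now = now or datetime.now(timezone.utc)
    entity = json.loads(raw)
    if not isinstance(entity, dict):
        raise ValueError("entity must be a JSON object")
    kind = entity.get("kind")
    if kind not in REQUIRED_FIELDS:
        raise ValueError(f"unknown kind: {kind!r}")
    missing = (REQUIRED_FIELDS[kind] | {"expires"}) - set(entity)
    if missing:
        raise ValueError(f"{kind} is missing {sorted(missing)}")
    if datetime.fromisoformat(entity["expires"]) <= now:
        raise ValueError(f"{kind} has expired")
    return entity
```

Because every message names its own type and lifetime, a Local Data Collector written in one language can reject stale or malformed data from a Scanner Worker written in another without any shared code.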

Part V. Data Validity and Anti-Poisoning

A system will only be used if it is credible. This is why there needs to be some level of centralization (registry and processing queues), despite the decentralized nature of the design. It is the job of the operators of these systems to rate-limit Scanner Worker instances and vet Refresh Job Results, so that the system doesn't get poisoned by malicious users.

A possible approach is a per-user, per-IP rate limit on Scanner Worker registration, combined with a Refresh Job Result validation process to weed out malicious users. For example, the service operator may require registration before allowing users to register Scanner Worker or Local Data Collector instances, and further rate-limit Scanner Worker registration so that it is difficult for a single user to operate a large swarm of workers. Each Refresh Job is issued to several Scanner Workers, and the resulting Refresh Job Results are compared against each other.

I wanted to share this early draft to see if there is enough interest to move this further.
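For the curious, the comparison of Refresh Job Results across several Scanner Workers could be sketched roughly as below. The quorum of two, the sighting fields and the rounding of coordinates to four decimals are assumptions picked for the example, and rounding is a crude way of matching positions that a real validator would likely replace with a distance check.

```python
from collections import Counter


def sighting_key(sighting, precision=4):
    """Reduce a sighting to a hashable key. Coordinates are rounded so that
    small differences between workers (about 11 m at 4 decimals) still match."""
    return (
        sighting["pokemonId"],
        round(sighting["lat"], precision),
        round(sighting["lng"], precision),
    )


def validate_job(results_by_worker, quorum=2):
    """Accept sightings reported by at least `quorum` workers; flag the rest.

    `results_by_worker` maps a worker id to its list of sightings for one
    Refresh Job. Returns (accepted, suspects), where `suspects` maps a worker
    id to the keys it reported that did not reach the quorum.
    """
    votes = Counter()
    keys_by_worker = {}
    for worker_id, sightings in results_by_worker.items():
        keys = {sighting_key(s) for s in sightings}  # one vote per worker per key
        keys_by_worker[worker_id] = keys
        votes.update(keys)
    accepted = {key for key, count in votes.items() if count >= quorum}
    suspects = {
        worker_id: keys - accepted
        for worker_id, keys in keys_by_worker.items()
        if keys - accepted
    }
    return accepted, suspects
```

A worker that repeatedly shows up in `suspects` is a candidate for closer inspection or removal, although a single unconfirmed sighting may just as well be a spawn the other workers missed.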