These scripts record your keystrokes, mouse movements, and scrolling behavior, along with the entire contents of the pages you visit, and send them to third-party servers. Unlike typical analytics services that provide aggregate statistics, these scripts are intended for the recording and playback of individual browsing sessions, as if someone is looking over your shoulder.

The study looked at seven of the most popular session replay providers — Yandex, FullStory, Hotjar, UserReplay, Smartlook, Clicktale and SessionCam. Among the top 50,000 sites on Alexa, 482 employed session replay services.

Session replays are intended to highlight usability issues, show the relationship of one navigational choice to another and find out if sessions with similar types of goals take similar paths, among other objectives.

These replays show the screen exactly as the user would see it, with the cursor movements and clicks, pages shown, interactions undertaken and so on. Customer service departments, for instance, might use such replays to better understand where users get stuck when trying to answer certain questions on their own.

But the Princeton study, the first in a series of “No Boundaries” posts about how third-party scripts on sites extract personal info, shows that session replay scripts can reveal much more sensitive kinds of information.

Redaction tools

A given session, for instance, can reveal which medical condition a user is researching, or it can include confidential information, such as email addresses or credit card numbers, that is entered into online forms.

This confidential information can be transmitted to the third-party services. The Princeton study notes that the services often offer manual and automated redaction tools to exclude sensitive info from collection. But avoiding the capture of such data on dynamically generated webpages would require the site owner to analyze the server-side code, and to do so every time the code is updated.

To test the redaction tools, Princeton set up test pages and installed replay scripts from six of the seven providers.

It found that passwords were often included in the recordings, for instance when entered through mobile-friendly login forms, and that the kinds of fields redacted differed by provider. Yandex, for instance, doesn’t redact credit card number fields, while FullStory does.
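
To see how an automated redaction rule can let a password through, consider a rough sketch in Python. It is not any provider's actual code. It assumes a hypothetical rule that masks a field only when its input type is "password", and a hypothetical mobile-friendly login form that keeps the password in an ordinary text field so the user can see what was typed. The names FormField, REDACTED_TYPES and redact are made up for illustration.

```python
from dataclasses import dataclass

# Assumption: the replay script masks a field only if its input type is listed here.
REDACTED_TYPES = {"password"}


@dataclass
class FormField:
    name: str
    input_type: str  # the HTML input type, e.g. "text", "email" or "password"
    value: str


def redact(fields, redacted_types=REDACTED_TYPES):
    """Return {field name: value as it would be recorded}."""
    recorded = {}
    for field in fields:
        if field.input_type in redacted_types:
            # Masked before leaving the browser.
            recorded[field.name] = "*" * len(field.value)
        else:
            # Sent to the third party as typed.
            recorded[field.name] = field.value
    return recorded


# A conventional login form: the password sits in a type="password" field.
standard_login = [
    FormField("email", "email", "a@example.com"),
    FormField("pw", "password", "hunter2"),
]

# Hypothetical mobile-friendly form: the password is shown in a plain text field.
mobile_login = [
    FormField("email", "email", "a@example.com"),
    FormField("pw", "text", "hunter2"),
]

print(redact(standard_login))
print(redact(mobile_login))
```

Under these assumptions the password from the first form is masked, while the password from the second is recorded in the clear. A type-based rule only knows how a field is labeled, not what it holds.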

Brandon Dixon, VP of product at cybersecurity firm RiskIQ, told me that session replay scripts should be turned off by default, so that users in the post-GDPR age would have to actively give their consent each time before being recorded.

He also recommended that session replay providers be more transparent and uniform in the kinds of information they collect, how it is being used and how it is being stored.

About The Author

Barry Levine

Barry Levine covers marketing technology for Third Door Media. Previously, he covered this space as a Senior Writer for VentureBeat, and he has written about these and other tech subjects for such publications as CMSWire and NewsFactor. He founded and led the web site/unit at PBS station Thirteen/WNET; worked as an online Senior Producer/writer for Viacom; created a successful interactive game, PLAY IT BY EAR: The First CD Game; founded and led an independent film showcase, CENTER SCREEN, based at Harvard and M.I.T.; and served over five years as a consultant to the M.I.T. Media Lab. You can find him at LinkedIn, and on Twitter at xBarryLevine.