Boffins throw Amazon Alexa on the rack to extract hidden clues

Investigators can look forward to better thumbscrews for making digital assistants squeal

Last year, police in Bentonville, Arkansas, investigating the death of Victor Collins, demanded that Amazon turn over audio recordings that may have been made by an Amazon Echo device in his home.

Amazon initially resisted the warrant, but in March, James Bates, charged with Collins' murder, consented to the release of the data in the hope it would exonerate him. Bates has pleaded not guilty.

This appears to have been the first publicly reported case involving the interrogation of a digital personal assistant system. It won't be the last.

Fortunately for investigators, computer boffins believe they can make smart speaker forensics a bit easier.

In a paper to be presented at the Digital Forensic Research Workshop (DFRWS) in Austin, Texas, next month, researchers from the Center for Information Security Technologies at Korea University in Seoul, South Korea, describe software they developed for gathering cloud and client-side data from the Echo/Alexa ecosystem through undocumented APIs.

The software, dubbed CIFT – Cloud-based IoT Forensic Toolkit – is designed to find data on servers and in mobile apps that might not otherwise be available to investigators.

CIFT was developed in Python and the researchers say they plan to release the source code at a later date.

The researchers – Hyunji Chung, Jungheum Park, and Sangjin Lee – used the Charles web debugging proxy to discover the private API endpoints used by Amazon's system. They did so using a session created with valid credentials; CIFT isn't intended as a tool for hacking accounts.

"[W]e can acquire forensically meaningful native artifacts from the Alexa, such as registered user accounts, Alexa-enabled devices, saved Wi-Fi settings (including unencrypted passwords), linked Google calendars, and installed skill lists that may be used to interact with other cloud services," the paper says.

The researchers say that through the API they found a large amount of JSON data related to cards, activities, media, notifications, compatible devices, and to-do lists, all carrying UNIX timestamps.

"This may provide sources of evidence that allow reconstruction of user activities with a time zone identified by device-preference API," they state in the paper.

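To see why timestamped JSON plus a known time zone matters, here is a minimal Python sketch of the timeline idea. It is not CIFT's code, which has not been released. The record layout, the `timestamp` field name, the millisecond unit, and the fixed UTC-6 offset are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def build_timeline(records, tz, ts_key="timestamp", in_millis=True):
    """Return (local_time, record) pairs sorted oldest first.

    ts_key and in_millis are assumptions: the article only says the
    JSON carries UNIX timestamps, not their field name or unit.
    """
    timeline = []
    for rec in records:
        raw = rec.get(ts_key)
        if raw is None:
            continue  # skip records that carry no timestamp
        seconds = raw / 1000 if in_millis else raw
        # Interpret the value as UTC, then shift to the device's zone.
        local = datetime.fromtimestamp(seconds, tz=timezone.utc).astimezone(tz)
        timeline.append((local, rec))
    # Sort on the time only, so records themselves are never compared.
    timeline.sort(key=lambda pair: pair[0])
    return timeline


# Made-up sample records, not real Alexa output.
records = [
    {"type": "todo", "text": "buy milk", "timestamp": 1488380400000},
    {"type": "activity", "text": "play jazz", "timestamp": 1488376800000},
    {"type": "card", "text": "no timestamp here"},
]

# Stand-in for the zone a device-preference lookup might report.
device_tz = timezone(timedelta(hours=-6))

timeline = build_timeline(records, device_tz)
for when, rec in timeline:
    print(when.isoformat(), rec["type"], rec["text"])
```

Any `tzinfo` object can be passed as `tz`, so a named zone from Python's `zoneinfo` module would work in place of the fixed offset and would also handle daylight saving changes.
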
They also found data that includes a URL pointing to user voice recordings stored in the cloud, making it possible to download those voice files using the utterance API.

"In a situation where existing tools and procedures cannot meet the demand for this emerging IoT system, our findings and proof-of-concept tool will be helpful for investigators attempting to work in the Amazon Alexa environment," the paper says. ®