The increasing functionality of smart speakers also creates a growing attack surface for hackers. In 2019, research by SRLabs unveiled two scenarios in which hackers could abuse both Alexa and Google Home to spy on users. The vulnerabilities allow third parties to build voice apps that use vishing (voice phishing) or eavesdrop on the user. The researchers demonstrated the hacks by creating voice applications for both device platforms, turning the smart assistants into ‘Smart Spies’.

Third-party developers get access to user inputs

Both Skills (Alexa) and Actions (Google Home) can be activated by calling out the invocation name chosen by the developer: “Alexa, open Netflix”. Users can then call functions (Intents) within the application by speaking specific phrases: “Play Star Trek: The Next Generation”. Users can pass variable arguments as slot values in these phrases. The slot values are converted to text and sent to the application’s backend, which is typically operated by the developer, independently of Amazon or Google.
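To make the relationship between phrases, intents and slot values concrete, here is a minimal sketch in Python. It is not how Alexa or Google Home resolve speech (both rely on trained language models); the intent names, the INTENT_PHRASES table and the helper functions are hypothetical and exist only for illustration.

```python
import re
from typing import Optional

# Hypothetical interaction model: each intent name maps to a sample phrase
# whose {placeholders} mark slots. Real platforms do not use regular
# expressions; this only illustrates the concept.
INTENT_PHRASES = {
    "PlayIntent": "play {title}",
    "StopIntent": "stop",
}


def phrase_to_regex(phrase: str) -> "re.Pattern":
    """Turn 'play {title}' into a regex with one named group per slot."""
    parts = re.split(r"\{(\w+)\}", phrase)
    pattern = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pattern += re.escape(part)       # literal text of the phrase
        else:
            pattern += f"(?P<{part}>.+)"     # slot: capture whatever is said
    return re.compile(f"^{pattern}$", re.IGNORECASE)


def match_intent(utterance: str) -> Optional[dict]:
    """Return the matched intent and its slot values, or None."""
    for name, phrase in INTENT_PHRASES.items():
        found = phrase_to_regex(phrase).match(utterance.strip())
        if found:
            return {"intent": name, "slots": found.groupdict()}
    return None


# This dictionary stands in for what is sent to the application's backend.
request = match_intent("Play Star Trek: The Next Generation")
```

Here the slot named title carries the free-form text "Star Trek: The Next Generation". The point to note is that whatever the user says in a slot position reaches the backend as plain text.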

How hackers may abuse smart speaker development functionality

Through the standard development interfaces, the SRLabs researchers were able to compromise the data privacy of users in two ways:

  1. Request and collect personal data including user passwords
  2. Eavesdrop on users after they believe the smart speaker has stopped listening

The ‘Smart Spies’ hacks combine three building blocks:

  • The “fallback intent”, a voice app’s default response when it cannot assign the user’s last command to any other intent and should offer help: “I’m sorry, I did not understand that. Can you please repeat it?”
  • The built-in ‘stop intent’, which the team exploited to eavesdrop on Alexa users. Developers can modify its behavior even after the application has passed the platform’s review process.
  • A quirk in the Text-to-Speech engines that allows developers to add long pauses to the speech output.

Hack 1: Requesting a user’s password

It is possible to ask for sensitive data such as a password from any voice app. To create a password phishing Skill/Action, a hacker would follow these steps:

  1. Create a seemingly innocent application that includes an intent triggered by “start” and takes the next words as slot values. This intent behaves like the fallback intent.
  2. In the test, the researchers altered the welcome message to an error message, tricking the user into thinking the application hasn’t started: “This skill is currently not available in your country.” This change was made after the app had been reviewed and didn’t prompt a second review.
  3. After the error message, the team added a long audio pause by making the voice app “say” the character sequence “�. ” (U+D801, dot, space). Because this sequence cannot be pronounced, the speaker remains silent but is still active. Repeating the sequence increases the length of this silence.
  4. After a while, the silence ends with a phishing message: “An important security update is available for your device. Please say start update followed by your password.”

Now anything the user says after “start” is sent to the hacker’s backend. That’s because the intent, which previously acted like the fallback intent, saves the user input after “start” as a slot value.

Hack 2: Eavesdropping on users

In a second experiment, the researchers at SRLabs were also able to listen to conversations after users believed they had stopped the voice app. To accomplish this, the team had to adapt their strategy for each of the smart speaker platforms.

Eavesdropping with Amazon Alexa

For Alexa devices, the voice recording is started when the user says certain trigger words that are defined by the developer of the skill. These can include common words such as “I” or words indicating that some personal information will follow, like “email”, “password” or “address”.

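The researchers created an eavesdropping skill following these steps:

  1. Create a seemingly innocent skill that already contains two intents: one that is started by “stop” and copies the built-in stop intent, and one that is started by a certain commonly used word and saves the following words as slot values. The second intent behaves like the fallback intent.
  2. After Amazon’s review, change the first intent to say goodbye but keep the session open, and extend the eavesdropping time by adding the character sequence “�. ” (U+D801, dot, space) multiple times to the speech prompt.
  3. Change the second intent so that it does not react at all.

If the user tries to end the malicious skill, they hear a goodbye message, but the skill continues to run for a few more seconds. If the user starts a sentence with the word selected in step 1 within this time, the intent saves the sentence as slot values and sends them to the attacker.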

Eavesdropping with Google Home

Google Home devices provide a more potent hack, as specific trigger words are unnecessary, enabling unlimited monitoring of user conversations.

On this platform, the user can be trapped in a loop in which the device continuously sends recognized speech to the hacker’s server, with only short silences in between.

To create such an eavesdropping Action, a hacker follows these steps:

  1. Create an Action and submit it for review.
  2. After review, change the main intent so that it ends with the Bye earcon sound, played as a recording via the Speech Synthesis Markup Language (SSML), and set expectUserResponse to “true”. The earcon is typically understood as signaling that a voice app has finished, while expectUserResponse keeps the session open. Then add several noInputPrompts consisting only of a short silence, using the SSML <break> element or the unpronounceable Unicode character sequence “�. ”.
  3. Create a second intent that is started whenever an actions.intent.TEXT request is received. This intent returns a short silence and defines several silent noInputPrompts.

After outputting the requested information and playing the Bye earcon, the Google Home device waits for approximately 9 seconds. If it cannot detect any speech input, the device “plays” a short silence and waits for user input again. The Action only stops once three such iterations pass without any speech input.

If the device detects speech input, the second intent starts. This intent consists of only one silent output, again with multiple silent reprompt texts. Every time speech is detected, the reprompt count resets.

The hacker obtains a complete transcript of the user’s subsequent conversations until roughly 30 seconds pass without detected speech (three waiting periods of approximately 9 seconds each). Hackers can lengthen this window by extending the duration of the silent prompts, although eavesdropping is paused while a silence plays.
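The timing of this loop can be approximated with a small simulation. The sketch below is a simplified model and not Google’s implementation: it assumes a fixed 9-second wait per iteration, three silent iterations before the Action stops, and it ignores the duration of the silent prompts themselves. The function name and parameters are hypothetical.

```python
from typing import List, Tuple


def eavesdrop_window(
    speech_times: List[float],
    wait_seconds: float = 9.0,
    max_silent_rounds: int = 3,
) -> Tuple[List[float], float]:
    """Simulate the reprompt loop.

    speech_times are the moments (in seconds after the Bye earcon) at which
    the user starts speaking. Returns the utterances that would still be
    captured and the time at which the Action finally stops.
    """
    timeout = wait_seconds * max_silent_rounds
    captured = []
    last_activity = 0.0  # the Bye earcon is played at t = 0
    for t in sorted(speech_times):
        if t - last_activity >= timeout:
            break  # the Action already stopped before this utterance
        captured.append(t)
        last_activity = t  # detected speech resets the reprompt count
    return captured, last_activity + timeout


captured, stopped_at = eavesdrop_window([5, 20, 40, 80])
```

In this example the utterances at 5, 20 and 40 seconds are captured because each arrives before three silent iterations have elapsed. The one at 80 seconds is not, since the simulated Action already stopped at 67 seconds.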

In this state, the Google Home device will also forward all commands prefixed by “OK Google” (except “stop”) to the hacker. The hacker could therefore also use this hack to imitate other applications, intercept the user’s interaction with the spoofed Actions as a man in the middle, and start believable phishing attacks.

In addition to a password, the researchers could also have requested the corresponding email address and used both to try to gain access to the user’s Amazon or Google account.

Conclusion

Alexa and Google Home are powerful, and the smart devices can be very useful, especially in private environments. However, their privacy implications reach further than many users realize. Users need to be aware that hackers can use malicious voice apps to abuse their smart speakers. Approach a new voice app with the same level of caution as installing a new app on your smartphone.

Amazon and Google need to implement better protection, starting with a more thorough review process for third-party applications made available in their voice app stores. The voice app review needs to check explicitly for copies of built-in intents. Unpronounceable characters like “�. ” and silent SSML messages need to be removed to prevent arbitrarily long pauses in the speakers’ output. Texts that include “password” should receive special attention or be disallowed entirely due to their suspicious nature.
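As an illustration of what such review checks might look like, here is a minimal sketch in Python. It is not the review logic of Amazon or Google. The function names, the 3-second silence threshold and the word list are assumptions chosen for this example, and the SSML handling only recognizes simple <break time="…"/> elements.

```python
import re
from typing import List

SENSITIVE_WORDS = ("password",)  # assumed word list, extend as needed
MAX_BREAK_SECONDS = 3.0          # hypothetical limit for total silence


def break_seconds(ssml: str) -> float:
    """Sum the durations of all <break time="..."/> elements in the text."""
    total = 0.0
    for value, unit in re.findall(r'<break\s+time="([\d.]+)(ms|s)"\s*/>', ssml):
        total += float(value) / 1000 if unit == "ms" else float(value)
    return total


def review_prompt(text: str) -> List[str]:
    """Return a list of findings for one speech output of a voice app."""
    findings = []
    # Unpaired surrogates such as U+D801 cannot be pronounced.
    if any(0xD800 <= ord(ch) <= 0xDFFF for ch in text):
        findings.append("unpaired surrogate code point")
    if break_seconds(text) > MAX_BREAK_SECONDS:
        findings.append("long silence")
    # Strip markup and check whether anything speakable remains.
    spoken = re.sub(r"<[^>]+>", "", text)
    if text.strip() and not re.search(r"\w", spoken):
        findings.append("no pronounceable content")
    if any(word in spoken.lower() for word in SENSITIVE_WORDS):
        findings.append("asks about a sensitive term")
    return findings


silent_findings = review_prompt("\ud801. " * 3)
phishing_findings = review_prompt(
    "Please say start update followed by your password."
)
```

A check like this would have to run on every change to a voice app’s responses, not only on the first submission, since both hacks rely on modifications made after the initial review.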

The original research was done by Fabian Bräunlein (@breakingsystems) & Luise Frerichs and published on SRLabs.com.

The researchers shared their findings with Amazon and Google through the companies’ responsible disclosure processes.