Tristan Penman's Blog

Chromecast Deep Dive - Part 1

Protobufs and Device Discovery

Last Updated: 22 March 2023

Welcome to my series on the inner workings of Chromecast. The aim of this series is to do a deep dive into how Chromecast works, and into the underlying Google Cast protocol. This knowledge was gained while building my own Chromecast receiver software, and reverse engineering aspects of the protocol that weren’t officially documented.

For those who would like to play around with working code, I’ve used all of this to write my own Chromecast receiver in Golang, which has now been published on GitHub.

Introduction

If you’ve used a Chromecast before, you may be familiar with the Cast extension in Google Chrome (now built into the browser). Or you may have used the Cast functionality available on an Android device to mirror your screen or play a video on a Chromecast.

Maybe you’ve never seen a Chromecast. In that case, here’s what a boxed first-generation Chromecast looks like (I happen to have two of these):

First-generation Chromecasts

While the Chromecast design has been through a number of iterations, the basic concept remains the same. It is a small USB-powered device that you plug into your TV via HDMI. Basic capabilities include Wi-Fi, Bluetooth, and hardware-accelerated video decoding, all in a low-power embedded device.

Key features include screen mirroring (as mentioned above), as well as third-party app support via a stripped down Chromium browser running on the device.

Google Cast Protocol

The protocol underlying all of this is called Google Cast. This is Google’s proprietary protocol for launching and controlling applications on a Chromecast device, and for sharing video and audio content to Chromecast-compatible devices. While our main focus in this series is screen mirroring, we’ll also look at how the Google Cast protocol has been designed to accommodate custom Chromecast apps.

But first, we should talk about the difference between Chromecast Clients and Chromecast Receivers.

Clients and Receivers

As the naming convention suggests, a Chromecast Receiver serves primarily to play video/audio content, or to mirror the screen of a Chromecast Client.

Note, however, that this will not always take the form of video/audio content being streamed directly from Client to Receiver. A Chromecast Receiver is capable of hosting many different Applications, each of which can receive application-specific control messages. Such messages can be used to instruct an application to play video/audio from external sources, such as YouTube.

Device Discovery

The next topic we should cover is Device Discovery. This is the process by which a Chromecast Client (e.g. your web browser or Android device) is able to find the Chromecast Receivers on your local network.

Device Discovery relies on a protocol called Multicast DNS, or mDNS for short. As you may guess from the name, Multicast DNS is a variation on the traditional DNS protocol that powers the internet.

With traditional DNS it is possible for a host to query other services on a network using human-readable domain names (e.g. google.com). For example, when you enter tristanpenman.com into your web browser, DNS is used to find the IP address of the web server that contains the relevant content.

A limitation of DNS is that it is centralised, in the sense that we must know the IP address of a DNS server ahead of time. The DNS server must also know about devices on the network ahead of time, via manual configuration. Luckily, there is a standard we can use to overcome this limitation… Multicast DNS.

Multicast DNS

Multicast DNS, or mDNS, handles service lookup in a decentralised fashion. Instead of being directed to a particular server, queries are sent to a well-known multicast address, which means that they reach every device on the local network that is listening for mDNS traffic. Any device is free to respond to the query.

To discover a Chromecast, a client will broadcast a query for devices that offer the _googlecast._tcp service. Chromecast devices on the local network will then broadcast a response describing their capabilities, and how to connect. This is generally described as ‘advertising’ a service. mDNS responses are broadcast to all devices on the local network, so it is possible for clients to passively discover new Chromecast devices, even if they are not actively querying for them.
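To make the query concrete, here is a minimal sketch in Go of what a discovery question could look like on the wire. It hand-encodes a standard DNS question asking for PTR records and does not send anything. The function name buildPTRQuery is made up for this illustration, and the ".local" suffix is an assumption based on the usual mDNS domain rather than something taken from a real Chromecast capture. A real client would more likely use an mDNS library than build packets by hand.

```go
package main

import (
	"fmt"
	"strings"
)

// buildPTRQuery (hypothetical helper) encodes a DNS message containing a
// single question that asks for PTR records of the given service name.
// Layout: 12-byte header, the name as length-prefixed labels, then the
// 16-bit record type and class. Labels longer than 63 bytes are not
// handled in this sketch.
func buildPTRQuery(name string) []byte {
	msg := make([]byte, 12) // transaction ID and flags are left as zero
	msg[5] = 1              // QDCOUNT (bytes 4-5, big-endian): one question

	for _, label := range strings.Split(strings.TrimSuffix(name, "."), ".") {
		msg = append(msg, byte(len(label)))
		msg = append(msg, label...)
	}
	msg = append(msg, 0) // empty root label terminates the name

	msg = append(msg, 0, 12) // QTYPE: PTR
	msg = append(msg, 0, 1)  // QCLASS: IN
	return msg
}

func main() {
	// Assumption: the service is looked up under the mDNS ".local" domain.
	query := buildPTRQuery("_googlecast._tcp.local")
	fmt.Printf("%d bytes: % x\n", len(query), query)
}
```

To actually discover devices, a packet like this would be sent over UDP to the mDNS multicast group (224.0.0.251, port 5353), and the responses parsed for the address and port of each receiver.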

At startup, a Chromecast Receiver will advertise the existence of the _googlecast._tcp service, on port 8009. It will also respond to queries for the _googlecast._tcp service, so that new Chromecast Clients are able to find it when they are first started.

Once a Chromecast Client has discovered a Receiver, it is free to connect to it on port 8009. Connections are secured using TLS, which is an important detail, as it relates to the Device Authentication mechanism that is part of the Google Cast protocol.
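As a rough sketch, connecting to a discovered Receiver from Go might look like the following. The helper names castAddr and dialReceiver are invented for this example. The code also assumes that the Receiver's TLS certificate cannot be validated against the public certificate authorities, which is why ordinary verification is switched off here; trust would then have to come from Device Authentication instead. Treat that as an assumption to check against your own devices, not a recommendation.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net"
	"os"
	"time"
)

// castPort is the port that the receiver advertises for Cast connections.
const castPort = "8009"

// castAddr (hypothetical helper) joins a discovered host with the Cast port.
// net.JoinHostPort also takes care of bracketing IPv6 addresses.
func castAddr(host string) string {
	return net.JoinHostPort(host, castPort)
}

// dialReceiver (hypothetical helper) opens a TLS connection to a receiver.
func dialReceiver(host string) (*tls.Conn, error) {
	dialer := &net.Dialer{Timeout: 5 * time.Second}
	config := &tls.Config{
		// Assumption: the receiver's certificate is not signed by a public
		// CA, so standard verification is skipped in this sketch.
		InsecureSkipVerify: true,
	}
	return tls.DialWithDialer(dialer, "tcp", castAddr(host), config)
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: castdial <receiver-ip>")
		return
	}
	conn, err := dialReceiver(os.Args[1])
	if err != nil {
		fmt.Println("connect failed:", err)
		return
	}
	defer conn.Close()
	fmt.Println("connected to", conn.RemoteAddr())
}
```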

The Protocol

The Google Cast protocol is what enables users to remotely control a Chromecast Receiver. It allows users to stream media content from their mobile device or computer, play online media, or mirror their screen to the Receiver.

Although this protocol was developed specifically for Chromecast devices, it has since evolved to include a wider Google Cast ecosystem. These days it can also be used to control Chromecast-compatible devices such as smart TVs and speakers. One of the aims of this series is to show you how it can be used to implement a custom Chromecast Receiver using Golang.

Protobufs

The messages used by the Google Cast protocol are defined using Google’s Protocol Buffers data serialisation format, more commonly known as Protobuf. Protobuf takes a single file that describes the serialisation format for a protocol, and uses that to generate language-specific bindings that marshal and unmarshal data in that format.

At a high level, each Google Cast message includes the following fields:

  • Namespace
  • Source ID
  • Destination ID
  • Payload (can be binary or a UTF-8 string)

The payload can be binary or text (assumed to be JSON formatted). The namespace tells us how to interpret the contents of the payload. The source ID and destination ID are included because a single communication channel may carry messages that are intended for different Chromecast applications.

My main focus in this series is the ‘channel’ aspects of the Google Cast protocol. You can see how these are defined in Protobuf format in the file cast_channel.proto from the Chromium source code. Note that this file is now part of the Open Screen Library, which aims to implement the Open Screen Protocol, Multicast DNS, and the Google Cast protocol.

The Open Screen Library is now embedded in Google Chrome and, by default, handles Chromecast Client functionality.

Namespaces

The namespaces used by Google Cast are defined as Uniform Resource Names (or URNs). These are the namespaces that we need to support for screen mirroring:

  • urn:x-cast:com.google.cast.tp.connection - Messages exchanged while establishing a connection
  • urn:x-cast:com.google.cast.tp.deviceauth - Messages exchanged as part of Device Authentication
  • urn:x-cast:com.google.cast.tp.heartbeat - Pings and pongs
  • urn:x-cast:com.google.cast.receiver - Receiver control messages
  • urn:x-cast:com.google.cast.webrtc - Screen mirroring

There are many more namespaces, but screen mirroring depends only on those listed.
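In code, it can be convenient to keep these URNs as constants and look up a handler (or, in this sketch, just a description) by namespace. The constant and function names here are made up for illustration; only the URN strings come from the list above.

```go
package main

import "fmt"

// Namespaces needed for screen mirroring, as listed above.
const (
	NamespaceConnection = "urn:x-cast:com.google.cast.tp.connection"
	NamespaceDeviceAuth = "urn:x-cast:com.google.cast.tp.deviceauth"
	NamespaceHeartbeat  = "urn:x-cast:com.google.cast.tp.heartbeat"
	NamespaceReceiver   = "urn:x-cast:com.google.cast.receiver"
	NamespaceWebRTC     = "urn:x-cast:com.google.cast.webrtc"
)

// purposes maps each supported namespace to a short description. A real
// receiver would probably map to handler functions instead.
var purposes = map[string]string{
	NamespaceConnection: "connection setup",
	NamespaceDeviceAuth: "device authentication",
	NamespaceHeartbeat:  "pings and pongs",
	NamespaceReceiver:   "receiver control",
	NamespaceWebRTC:     "screen mirroring",
}

// describeNamespace reports what a namespace is used for, and whether this
// sketch knows about it at all.
func describeNamespace(namespace string) (string, bool) {
	purpose, ok := purposes[namespace]
	return purpose, ok
}

func main() {
	for _, ns := range []string{NamespaceHeartbeat, "urn:x-cast:com.example.unknown"} {
		if purpose, ok := describeNamespace(ns); ok {
			fmt.Printf("%s: %s\n", ns, purpose)
		} else {
			fmt.Printf("%s: not supported\n", ns)
		}
	}
}
```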

Transport IDs

The purpose of the Source ID and Destination ID fields is to route messages to an application (or session) that can handle them. For example, when a client first connects to a Chromecast (over TLS) it will send a device authentication message to receiver-0, which is a special destination used for device authentication and other Chromecast control messages.

Chromecast devices are also able to broadcast messages to interested clients. A client can subscribe to these messages by sending a CONNECT message (in the tp.connection namespace), containing its own unique Source ID (e.g. source-3991). Generally speaking, a client is expected to send a CONNECT message before sending any messages that are not in the tp.connection or tp.deviceauth namespaces.
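As a sketch of what that first CONNECT might look like, the Go code below builds one addressed to receiver-0. The envelope struct is a simplified stand-in for the Protobuf message, and the JSON payload shape ({"type":"CONNECT"}) is an assumption based on commonly observed Cast traffic rather than anything stated above. The names envelope and newConnect are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

const namespaceConnection = "urn:x-cast:com.google.cast.tp.connection"

// envelope is a simplified stand-in for a Cast channel message.
type envelope struct {
	Namespace     string
	SourceID      string
	DestinationID string
	Payload       string
}

// connectionPayload is the assumed JSON body of a tp.connection message.
type connectionPayload struct {
	Type string `json:"type"`
}

// newConnect (hypothetical helper) builds a CONNECT message from the given
// source ID to the given destination ID.
func newConnect(sourceID, destinationID string) (envelope, error) {
	payload, err := json.Marshal(connectionPayload{Type: "CONNECT"})
	if err != nil {
		return envelope{}, err
	}
	return envelope{
		Namespace:     namespaceConnection,
		SourceID:      sourceID,
		DestinationID: destinationID,
		Payload:       string(payload),
	}, nil
}

func main() {
	msg, err := newConnect("source-3991", "receiver-0")
	if err != nil {
		fmt.Println("could not build CONNECT:", err)
		return
	}
	fmt.Printf("%s -> %s: %s\n", msg.SourceID, msg.DestinationID, msg.Payload)
}
```

Note that the Source ID travels in the message envelope, not in the JSON payload, which is how the receiver knows where to send its replies and broadcasts.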

Applications

A benefit of this transport mechanism is that it allows individual Chromecast applications to handle particular messages. This leads to effective separation of concerns, and allows various third-party applications to be supported. We’ll explore this further later in the series.

To start an application, a client will send a LAUNCH message to receiver-0, including an AppId corresponding to the application that should be started. LAUNCH messages are part of the receiver namespace. Once launched, each app has its own transport ID (e.g. pid-992), so that messages can be sent to it directly.

After launching an app, the receiver will broadcast the status of its running applications, and the client will be able to connect to the app, by sending a CONNECT message containing its transport ID. The sender and receiver app can then exchange messages freely.
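The launch flow could be sketched in Go as follows. The JSON field names (appId, requestId, transportId, and the RECEIVER_STATUS structure) are assumptions based on how Cast payloads are commonly described, not on anything shown in this post, so treat them as illustrative. The helper names are hypothetical, and CC1AD845 is used only as an example AppId (it is the ID Google publishes for its Default Media Receiver).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// launchRequest is the assumed JSON body of a LAUNCH message.
type launchRequest struct {
	Type      string `json:"type"`
	AppID     string `json:"appId"`
	RequestID int    `json:"requestId"`
}

// receiverStatus models only the parts of the assumed status broadcast
// that are needed to find an app's transport ID.
type receiverStatus struct {
	Status struct {
		Applications []struct {
			AppID       string `json:"appId"`
			TransportID string `json:"transportId"`
		} `json:"applications"`
	} `json:"status"`
}

// launchPayload (hypothetical helper) builds the payload sent to receiver-0.
func launchPayload(appID string, requestID int) ([]byte, error) {
	return json.Marshal(launchRequest{Type: "LAUNCH", AppID: appID, RequestID: requestID})
}

// transportIDFor (hypothetical helper) extracts the transport ID of a
// running app from a status broadcast.
func transportIDFor(statusJSON []byte, appID string) (string, error) {
	var status receiverStatus
	if err := json.Unmarshal(statusJSON, &status); err != nil {
		return "", err
	}
	for _, app := range status.Status.Applications {
		if app.AppID == appID {
			return app.TransportID, nil
		}
	}
	return "", fmt.Errorf("app %s is not running", appID)
}

func main() {
	payload, err := launchPayload("CC1AD845", 1)
	if err != nil {
		fmt.Println("could not build LAUNCH:", err)
		return
	}
	fmt.Println("send to receiver-0:", string(payload))

	// A made-up status broadcast, shaped according to the assumptions above.
	status := []byte(`{"type":"RECEIVER_STATUS","status":{"applications":[{"appId":"CC1AD845","transportId":"pid-992"}]}}`)
	id, err := transportIDFor(status, "CC1AD845")
	if err != nil {
		fmt.Println("app not found:", err)
		return
	}
	fmt.Println("next, send CONNECT to:", id)
}
```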

Device Authentication

A key challenge in implementing a viable Chromecast Receiver is Device Authentication. This is the mechanism by which a Chromecast Client determines that a device is a genuine Chromecast, or an officially licensed Chromecast-compatible Receiver.

The way this works is that a Chromecast Client sends a Device Authentication challenge just after connecting to a Chromecast Receiver. The Receiver is expected to extract the challenge payload and sign it cryptographically, using a private key that is unique to the Receiver device, thus producing a valid Device Authentication response.
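The general challenge-response idea can be sketched in Go with a freshly generated key pair. This is not the actual Cast scheme: the choice of RSA with PKCS #1 v1.5 and SHA-256, the exact bytes that get signed, and the function names are all assumptions made for illustration. A genuine device also presents a certificate chain for its key, which is left out entirely here.

```go
package main

import (
	"crypto"
	"crypto/rand"
	"crypto/rsa"
	"crypto/sha256"
	"fmt"
)

// newDeviceKey generates a throwaway key pair that stands in for the
// private key that would be provisioned on a real device.
func newDeviceKey() (*rsa.PrivateKey, error) {
	return rsa.GenerateKey(rand.Reader, 2048)
}

// signChallenge is the receiver's side: hash the challenge and sign the
// digest with the device's private key.
func signChallenge(key *rsa.PrivateKey, challenge []byte) ([]byte, error) {
	digest := sha256.Sum256(challenge)
	return rsa.SignPKCS1v15(rand.Reader, key, crypto.SHA256, digest[:])
}

// verifyResponse is the client's side: check the signature against the
// challenge it sent, using the device's public key. It returns nil when
// the signature is valid.
func verifyResponse(pub *rsa.PublicKey, challenge, signature []byte) error {
	digest := sha256.Sum256(challenge)
	return rsa.VerifyPKCS1v15(pub, crypto.SHA256, digest[:], signature)
}

func main() {
	key, err := newDeviceKey()
	if err != nil {
		fmt.Println("key generation failed:", err)
		return
	}

	challenge := []byte("example challenge bytes") // made-up challenge
	signature, err := signChallenge(key, challenge)
	if err != nil {
		fmt.Println("signing failed:", err)
		return
	}

	if err := verifyResponse(&key.PublicKey, challenge, signature); err != nil {
		fmt.Println("response rejected:", err)
		return
	}
	fmt.Println("response accepted")
}
```

The sketch also shows why this is a challenge for a custom Receiver: a client only accepts the response if it trusts the key that produced it, and a key generated on the spot has nothing linking it back to Google.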

Next Up

In the next post in this series, we’ll go further into how Chromecast Apps work. This will be released soon, and will be linked here once ready.

References

There are many resources available to learn more about the Google Cast protocol. These are just a few that I used while building my receiver implementation: