Introduction
While working on a mobile application security assessment we noticed an unusual traffic flow: a DTLS handshake coming from a remote server to the mobile application's listener. As we always pay close attention to the transport security implementation in the applications we test, we set out to verify whether certificates were properly validated in the observed case. However, it was not that simple.
The desire to test client-side TLS certificate validation led us to develop custom tools, intercept and dig through various types of network traffic, read RFC documents and even patch binary files. We now understand that we were facing a DTLS handshake initiated to derive keys that secure media communications between parties, following WebRTC recommendations.
In this article we describe what we did, how we did it and what obstacles we had to overcome. This exercise helped us formulate a methodology for testing some aspects of WebRTC-capable applications from a security standpoint.
TL;DR
Readers who want a short summary of this article can jump to the conclusion section. However, the most valuable part is the description itself, as it provides a methodology applicable to similar cases. We also developed some tools which helped us during this assessment. They are only suited to similar exercises and are of little use as standalone software.
Twilio Platform
Preparing Custom Setup
We noted that the application we were initially testing uses the Twilio platform to make voice calls. The Android build uses Twilio's Java module and the corresponding native library. To avoid hitting our customer's infrastructure while performing our tests, we decided to deploy our own solution. This way we could minimise the study surface and have full control over the software behaviour.
Twilio has great introductory articles: getting started for voice applications, getting started with Android client. They also allow you to create a trial account for ~€10, which can be used to make voice calls between test applications. After completing the "Getting started" guides, your Twilio account will be properly configured, the various tokens created and Google Firebase messaging set up. You will also have a working Android mobile application and a web server that routes calls and feeds Twilio with your tokens. You can get an iOS application the same way, but in this article we cover Android only.
Note that we were not assessing the whole Twilio solution, we were just interested in voice calls implementation. In that particular case it was enough to get an example Android application configured to use in our setup and install it on two devices. We used the suggested Android quickstart example with the following minor updates:
- TWILIO_ACCESS_TOKEN_SERVER_URL pointed to our web server (which was based on another example);
- the identity string was set to different values on each device we used for testing;
- the app/google-services.json file was updated after integration with the Google Firebase service;
- we enabled debug logging by adding a Voice.setLogLevel(LogLevel.DEBUG); call to the onCreate() method of the app/src/main/java/com/twilio/voice/quickstart/VoiceActivity.java file.
As we had two mobile applications able to set up voice calls between each other, we started to investigate the network communications they produce.
Traffic Interception and Analysis
Information Gathering
We connected one of the test devices to a WiFi access point set up on our test laptop. This way we observed all communications made by the Twilio example application when placing a phone call.
The application was contacting the following domains:
- eventgw.twilio.com
- ers.twilio.com
- our test web server holding Twilio tokens
- global.stun.twilio.com
- chunderm.gll.twilio.com
We noted that traffic towards eventgw.twilio.com, ers.twilio.com and the access token server follows Android proxy rules. Thus, we observed the corresponding communication using our Burp proxy instance. Its certificate authority was installed on our test device in advance.
ers.twilio.com was accessed using HTTPS to check and update the Twilio registration token. The application informed Twilio about events happening on the device by sending HTTPS POST requests to the ers.twilio.com server. There was no interesting data in these communication channels.
TLS-protected packets were exchanged with the chunderm.gll.twilio.com server over TCP/443. This communication ignored global proxy settings.
Then the application sent a STUN binding request to global.stun.twilio.com over UDP/3478.
After more exchanges with chunderm.gll.twilio.com the application started UDP STUN communication with a server in the 3.122.181.0/24 subnet (approximately). The server IP address and port were different each time we started a call. After STUN registration and binding requests **in the same channel** the application received an **incoming** DTLS Client Hello message. You can see this in the following screenshot from the Wireshark capture:
DTLS handshake was quickly established, no data packets were sent and traffic with RTP data continued to flow in the very same UDP channel.
Our objective was to verify that the DTLS client (the Twilio server in this case) properly validates the DTLS server's certificate. In this scenario the role of the DTLS server was played by our mobile client.
Searching the Internet for the traces we had already collected revealed that what we observed is the DTLS-SRTP protocol, defined in RFC5764. The idea is to use the DTLS protocol to derive key material that secures RTP data. DTLS is already defined and implemented; it authenticates the remote party and secures sensitive data. When using key exchange algorithms based on elliptic curves, it works fast even on low-powered devices. The corresponding packet exchange is described in RFC5764 as follows:
Client                                               Server

ClientHello + use_srtp       -------->
                                     ServerHello + use_srtp
                                               Certificate*
                                         ServerKeyExchange*
                                        CertificateRequest*
                             <--------      ServerHelloDone
Certificate*
ClientKeyExchange
CertificateVerify*
[ChangeCipherSpec]
Finished                     -------->
                                         [ChangeCipherSpec]
                             <--------             Finished
SRTP packets                 <------->         SRTP packets
The SRTP keys are derived with the help of a specific DTLS extension, use_srtp. The details are not important to us, except that these keys stay inside the DTLS library implementation and are not transferred as Application Data. For this reason we need specific software to terminate such connections.
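To make the key derivation less abstract, here is a minimal sketch of how the keying material exported by a DTLS library is split into SRTP master keys and salts, following the layout given in RFC5764 (client key, server key, client salt, server salt). It assumes an AES-128 based SRTP profile (16-byte keys, 14-byte salts); the function and type names are ours and purely illustrative.

```python
from typing import NamedTuple

KEY_LEN = 16   # master key length, assuming an AES-128 based SRTP profile
SALT_LEN = 14  # master salt length for the same profiles


class SrtpKeys(NamedTuple):
    client_key: bytes
    server_key: bytes
    client_salt: bytes
    server_salt: bytes


def split_srtp_keying_material(material: bytes) -> SrtpKeys:
    """Split exported DTLS-SRTP keying material in the RFC5764 order."""
    expected = 2 * (KEY_LEN + SALT_LEN)
    if len(material) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(material)}")
    client_key = material[:KEY_LEN]
    server_key = material[KEY_LEN:2 * KEY_LEN]
    client_salt = material[2 * KEY_LEN:2 * KEY_LEN + SALT_LEN]
    server_salt = material[2 * KEY_LEN + SALT_LEN:]
    return SrtpKeys(client_key, server_key, client_salt, server_salt)
```

"Client" and "server" here are the DTLS roles, which do not have to match the roles of the application endpoints.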
The security outcome here is that if DTLS certificates are not properly verified then a suitably positioned network-based attacker can obtain keys for encrypted media traffic.
The important remaining question is how such certificates can be verified. There is no particular "domain name" here. The client needs to rely on something else that we do not see yet. RFC5764 mentions this in the following passage:
protocol like SIP. When the signaling exchange is integrity-
protected (e.g., when SIP Identity protection via digital signatures
is used), DTLS-SRTP can leverage this integrity guarantee to provide
complete security of the media stream. A description of how to
indicate DTLS-SRTP sessions in SIP and SDP [RFC4566], and how to
authenticate the endpoints using fingerprints can be found in
[RFC5763].
Obscure, as often happens with such documents. But it gives us a hint that we have to look into other channels, and we have a candidate: chunderm.gll.twilio.com. Nevertheless, we have to redirect the DTLS handshake to our DTLS server while keeping other protocols untouched. Let's solve this problem first.
DTLS Demultiplexing
From our observations and the RFC5764 specification, DTLS-SRTP traffic is a mix of STUN, DTLS and RTP protocols, all in the same UDP channel. Thus, we cannot just forward this traffic to some listener and handle only DTLS packets: this would break the application logic.
We need a proxy tool which normally just relays traffic between two endpoints. When it catches DTLS packets, it forwards them to our fake listener. The responses have to be delivered back to their intended destination. Thus, the remote side and the mobile application will exchange STUN and RTP packets as usual, but the DTLS channel will be handled by us.
Determining the packet type is simple. It is based on the value of the first byte (B) and is described in RFC5764:
            +----------------+
            | 127 < B < 192 -|--> forward to RTP
            |                |
packet -->  |  19 < B < 64  -|--> forward to DTLS
            |                |
            |       B < 2   -|--> forward to STUN
            +----------------+
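The rule above can be sketched in a few lines of Python. This is only an illustration of the first-byte check from RFC5764, not the code of our tool; the function name and the "unknown" fallback are our own choices.

```python
def classify_packet(packet: bytes) -> str:
    """Classify a UDP payload by its first byte, as described in RFC5764."""
    if not packet:
        return "unknown"
    first = packet[0]
    if 127 < first < 192:
        return "rtp"
    if 19 < first < 64:
        return "dtls"
    if first < 2:
        return "stun"
    # Values outside the three ranges are not covered by the rule
    return "unknown"
```

For example, a DTLS handshake record starts with content type 22, so it falls into the DTLS range, while a STUN binding request starts with a zero byte.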
The following describes the traffic flows our proxy has to handle:
            +---------------+
            |  demux tool   |
            |               |
mobile <--> +- STUN/RTP --*-+ <--> original
app         |            /  |      destination
            |    DTLS    |  |
            |            +--+ <--> fake DTLS
            |               |      server
            +---------------+
This was implemented in quick-and-dirty style using Go and can be grabbed from this repo.
This proxy requires the following parameters:
- Socket to listen on. This can be anything, we just redirect traffic to it using firewall.
- Fake DTLS server address. We fully control this parameter.
- Original destination. Well, here we have a problem. In our case it is different each time, so we have to figure out this parameter at runtime. This is covered in the next section.
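The forwarding decision such a proxy has to make can be sketched as a pure function. This is a simplified illustration and not the actual implementation of our tool: it assumes that the DTLS handshake is initiated by the remote side (as in the capture described earlier), and all names and addresses are hypothetical.

```python
from typing import Tuple

Addr = Tuple[str, int]


def is_dtls(packet: bytes) -> bool:
    # First-byte range for DTLS records, as described in RFC5764
    return bool(packet) and 19 < packet[0] < 64


def next_hop(packet: bytes, source: Addr, app: Addr,
             remote: Addr, fake_dtls: Addr) -> Addr:
    """Return the address a datagram should be relayed to."""
    if source == fake_dtls:
        # Handshake replies go back to the peer whose DTLS we terminate
        return remote
    if source == remote:
        # DTLS from the remote side is diverted, everything else is relayed
        return fake_dtls if is_dtls(packet) else app
    # Anything coming from the mobile app goes to the original destination
    return remote
```

A real proxy additionally needs the UDP sockets and a receive loop around this function; those parts are omitted here.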
Intercepting Traffic to chunderm.gll.twilio.com
From prior observations it became clear that essential information is transmitted inside the TLS-protected channel with the chunderm.gll.twilio.com server (port 443). Unfortunately for us, the application properly verified TLS certificates. We were not able to bypass this with Xposed modules or Frida with the objection framework.
The application uses the libtwilio_voice_android_so.so native library, which includes (but is not linked to) the resiprocate and boringssl sources. Thus, we cannot simply hook the relevant calls with Frida. In this situation we had to go the binary patching way.
We used Ghidra project for decompiling and patching. Binary patching is not well supported by Ghidra (yet?) but it is possible to overcome it with SavePatch external script.
We patched the calls (five in total) to the SSL_CTX_set_verify function (address 0x0026786c), which sets the certificate verification mode via its second argument. To disable verification this parameter has to be zero. You can see this in the following illustrations.
SSL_CTX_set_verify, the entry #1, original:
SSL_CTX_set_verify, the entry #1, patched:
SSL_CTX_set_verify, the entry #2, original:
SSL_CTX_set_verify, the entry #2, patched:
SSL_CTX_set_verify, the entry #3, original:
SSL_CTX_set_verify, the entry #3, patched:
SSL_CTX_set_verify, the entry #4, original:
SSL_CTX_set_verify, the entry #4, patched:
SSL_CTX_set_verify, the entry #5, original:
SSL_CTX_set_verify, the entry #5, patched:
SSL_CTX_set_verify (offset 0x0016163c) also sets a callback, which has to always return 1 to indicate successful verification.
Verification callback, original:
Verification callback, patched:
Even if verification succeeds, the library validates if domain name matches the expected one.
Common name verification, decompiled:
Common name verification, source code:
Common name verification, patched:
Common name verification, patched, decompiled:
On top of that, the library checks whether the common name ends with .twilio.com. This is of course an addition made to the resiprocate library. We left it untouched and just took this behaviour into account when configuring the intercepting proxies.
Parent domain validation:
The final difference between the original file and the patched version:
85248c85248
< 0014cff0: 7a44 0121 0af1 3afc 6668 19a8 4946 7ff7 zD.!..:.fh..IF..
---
> 0014cff0: 7a44 0021 0af1 3afc 6668 19a8 4946 7ff7 zD.!..:.fh..IF..
85252c85252
< 0014d030: a068 dff8 3426 7a44 0121 0af1 17fc a668 .h..4&zD.!.....h
---
> 0014d030: a068 dff8 3426 7a44 0021 0af1 17fc a668 .h..4&zD.!.....h
86212c86212
< 00150c30: dff8 042a 4046 0121 7a44 06f1 17fe 0af1 ...*@F.!zD......
---
> 00150c30: dff8 042a 4046 0021 7a44 06f1 17fe 0af1 ...*@F.!zD......
86401c86401
< 00151800: 0830 1a90 4846 17f7 a6e9 5846 0df5 657d .0..HF....XF..e}
---
> 00151800: 0830 1a90 4846 17f7 a6e9 0120 0df5 657d .0..HF..... ..e}
122442c122442
< 001de490: c8f8 2c01 0968 0029 00f0 bc80 bb48 52ad ..,..h.).....HR.
---
> 001de490: c8f8 2c01 0968 0229 40f0 bc80 bb48 52ad ..,..h.)@....HR.
186554c186554
< 002d8b90: 334a 2046 0121 7a44 7ef7 68fe 2046 0421 3J F.!zD~.h. F.!
---
> 002d8b90: 334a 2046 0021 7a44 7ef7 68fe 2046 0421 3J F.!zD~.h. F.!
187441,187442c187441,187442
< 002dc300: edfb 38b3 95f8 6100 0321 0022 0028 08bf ..8...a..!.".(..
< 002dc310: 0121 2046 7bf7 aafa 1349 2046 0022 7944 .! F{....I F."yD
---
> 002dc300: edfb 38b3 95f8 6100 0021 0022 0028 08bf ..8...a..!.".(..
> 002dc310: 0021 2046 7bf7 aafa 1349 2046 0022 7944 .! F{....I F."yD
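Byte-level changes like the ones in this diff can also be applied with a small script instead of a disassembler. The sketch below is not the procedure we used (we patched with Ghidra); it only illustrates the idea of overwriting bytes at known file offsets while checking that the original bytes are what we expect. The function name and the patch tuple layout are assumptions made for this example.

```python
from typing import Iterable, Tuple

# (file offset, expected original bytes, replacement bytes)
Patch = Tuple[int, bytes, bytes]


def apply_patches(image: bytes, patches: Iterable[Patch]) -> bytes:
    """Return a copy of image with every patch applied, or raise ValueError."""
    data = bytearray(image)
    for offset, original, replacement in patches:
        if len(original) != len(replacement):
            raise ValueError("a patch must not change the file size")
        end = offset + len(original)
        if bytes(data[offset:end]) != original:
            raise ValueError(f"unexpected bytes at offset {offset:#x}")
        data[offset:end] = replacement
    return bytes(data)


# The first hunk of the diff above, expressed as a patch: bytes 01 21 at
# offset 0x14cff2 become 00 21 (this looks like a Thumb "movs r1, #1"
# changed to "movs r1, #0", but that reading is our interpretation).
FIRST_HUNK: Patch = (0x14CFF2, bytes.fromhex("0121"), bytes.fromhex("0021"))
```

Checking the original bytes before writing protects against patching a different build of the library by mistake.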
This was still not enough. The library restricts the trusted certificate authorities to the following list:
subject= /C=US/O=DigiCert Inc/OU=www.digicert.com/CN=DigiCert High Assurance EV Root CA
subject= /C=US/O=DigiCert Inc/OU=www.digicert.com/CN=DigiCert Global Root CA
subject= /C=US/O=Amazon/CN=Amazon Root CA 4
subject= /C=US/O=Amazon/CN=Amazon Root CA 3
subject= /C=US/O=DigiCert Inc/OU=www.digicert.com/CN=DigiCert Assured ID Root G3
subject= /C=US/O=Starfield Technologies, Inc./OU=Starfield Class 2 Certification Authority
subject= /C=US/O=thawte, Inc./OU=Certification Services Division/OU=(c) 2006 thawte, Inc. - For authorized use only/CN=thawte Primary Root CA
subject= /C=US/ST=Arizona/L=Scottsdale/O=Starfield Technologies, Inc./CN=Starfield Services Root Certificate Authority - G2
subject= /C=US/O=DigiCert Inc/OU=www.digicert.com/CN=DigiCert Global Root G3
subject= /C=US/O=thawte, Inc./OU=(c) 2007 thawte, Inc. - For authorized use only/CN=thawte Primary Root CA - G2
To overcome this we created our own CA certificate which, when formatted as PEM, matches the length of one of the certificates listed above.
We replaced Amazon's root CA:
Certificate:
Data:
Version: 3 (0x2)
Serial Number:
06:6c:9f:d2:96:35:86:9f:0a:0f:e5:86:78:f8:5b:26:bb:8a:37
Signature Algorithm: sha384WithRSAEncryption
Issuer: C = US, O = Amazon, CN = Amazon Root CA 2
Validity
Not Before: May 26 00:00:00 2015 GMT
Not After : May 26 00:00:00 2040 GMT
Subject: C = US, O = Amazon, CN = Amazon Root CA 2
Our certificate:
Certificate:
Data:
Version: 3 (0x2)
Serial Number: 5957144336496509465 (0x52ac0736349ac219)
Signature Algorithm: sha256WithRSAEncryption
Issuer: O = xxxxxxx, CN = Gremwell
Validity
Not Before: Mar 13 15:45:00 2020 GMT
Not After : Mar 13 15:45:00 2030 GMT
Subject: O = xxxxxxx, CN = Gremwell
Subject Public Key Info:
We used the organization field to make the certificate sizes match. Then we just replaced one PEM string with the other in the binary.
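The replacement step relies on both PEM strings having exactly the same length, so that no offsets inside the binary move. A minimal sketch of such a replacement is shown below; it assumes the embedded PEM occurs exactly once in the file, and the function name is hypothetical.

```python
def replace_same_length(blob: bytes, old: bytes, new: bytes) -> bytes:
    """Swap one embedded string for another without changing the file size."""
    if len(old) != len(new):
        raise ValueError(
            f"length mismatch: {len(old)} vs {len(new)} bytes; "
            "pad a certificate field and regenerate the replacement"
        )
    if blob.count(old) != 1:
        raise ValueError("original string must occur exactly once")
    return blob.replace(old, new)
```

If the lengths differ, the certificate has to be regenerated with a longer or shorter field value (we used the organization field for that) until the PEM sizes match.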
Uploading the patched library to the mobile device:
Altering the existing library on the device (as superuser):
# cp /sdcard/libtwilio_voice_android_so.so.patched_cert /data/data/com.twilio.voice.quickstart/app_lib/libtwilio_voice_android_so.so
With these alterations we were able to intercept the data transmitted between the mobile client and chunderm.gll.twilio.com.
Signalling Traffic
We are about to intercept the traffic flow between our test mobile application and the chunderm.gll.twilio.com host. As chunderm.gll.twilio.com is load-balanced and deployed on Amazon services, it is better to pin it to a single IP address. The easiest way to do this is to add the corresponding entry to /etc/hosts and to verify that the test device is using this configuration:
Traffic redirection can be configured with iptables:
When using DNAT to localhost do not forget to enable local routing:
The traffic we want to intercept is not HTTPS (as we quickly learned by trying to forward it to the Burp proxy). Thus, we used a general-purpose proxy tool, viproxy. We used it like this:
--l-sslkey chunderm.gll.twilio.com.pem --r-ssl -f twilio_chunderm_1.log
The certificate chunderm.gll.twilio.com.crt was generated specifically for this purpose and signed by the CA that we previously added to the libtwilio_voice_android_so.so library. The common name in this certificate was set to chunderm.gll.twilio.com. Thus, the application trusted our proxy and we were finally able to get the traffic in clear text.
The traffic that we observed contained SIP signaling. It was used to indicate initiated calls to the remote party, receive notifications regarding call status and negotiate media channels parameters. Exactly for this purpose the Twilio voice native library includes resiprocate source code. Let's briefly analyse the intercepted SIP messages.
Client (our test mobile app) sent the following SIP INVITE message:
Via: SIP/2.0/TLS 192.168.12.118;branch=z9hG4bK-524287-1---40ce49870111dfe0;rport
Max-Forwards: 70
Contact: <sip:VoiceSDK@192.168.12.118;ob;transport=tls>;+sip.instance="bd3b91dB35B543C6d83A09d877b0Dd5D"
To: <sip:chunderm.gll.twilio.com:443;transport=tls>
From: <sip:VoiceSDK@chunderm.gll.twilio.com>;tag=56541ce5
Call-ID: irMlCaSv9fso9ZI2vhCwOg..
CSeq: 1 INVITE
Allow: INVITE, ACK, CANCEL, OPTIONS, BYE
Content-Type: application/sdp
Supported: outbound, path, gruu
User-Agent: VoiceSDK
X-Twilio-BridgeToken: eyJ6a....
X-Twilio-Client: %7B%22mob...
X-Twilio-ClientVersion: 5
Content-Length: 1131
v=0
o=- 6896153525930087223 2 IN IP4 127.0.0.1
s=-
t=0 0
a=group:BUNDLE audio
a=msid-semantic: WMS 4f3033cBBB87fac4f85efe4E4a8Ef2dd
m=audio 9 UDP/TLS/RTP/SAVPF 111 103 9 0 8 105 13 110 113 126
c=IN IP4 0.0.0.0
a=rtcp:9 IN IP4 0.0.0.0
a=ice-ufrag:sdmi
a=ice-pwd:gw1kcxcee+d0P/MKRe2Bm5A7
a=ice-options:trickle
a=fingerprint:sha-256 1B:EA:BF:33:B8:11:26:6D:91:AD:1B:A0:16:FD:5D:60:59:33:F7:46:A3:BA:99:2A:1D:04:99:A6:F2:C6:2D:43
a=setup:actpass
...
a=ssrc:667633809 mslabel:4f3033cBBB87fac4f85efe4E4a8Ef2dd
a=ssrc:667633809 label:631b37accdEfD305ABcE805FdaDD25F5
Note the fingerprint parameter: sha-256 1B:EA:BF:33:B8:11:26:6D:91:AD:1B:A0:16:FD:5D:60:59:33:F7:46:A3:BA:99:2A:1D:04:99:A6:F2:C6:2D:43.
The server replied with the following:
CSeq: 1 INVITE
Call-ID: irMlCaSv9fso9ZI2vhCwOg..
From: <sip:VoiceSDK@chunderm.gll.twilio.com>;tag=56541ce5
To: <sip:chunderm.gll.twilio.com:443;transport=tls>;tag=98073708_6772d868_46d541a4-5a12-4872-8aa4-31b1ca3e294b
Via: SIP/2.0/TLS 192.168.12.118;received=78.23.55.64;branch=z9hG4bK-524287-1---40ce49870111dfe0;rport=37922
Record-Route: <sip:172.21.5.43:10193;r2=on;transport=udp;ftag=56541ce5;lr>
Record-Route: <sip:35.157.205.11:443;r2=on;transport=tls;ftag=56541ce5;lr>
Server: Twilio
Contact: <sip:172.21.10.120:10193>
Allow: INVITE,ACK,CANCEL,OPTIONS,BYE
Content-Type: application/sdp
X-Twilio-CallSid: CA2fad432f2da6576c4735f03af99592af
Content-Length: 1039
X-Twilio-EdgeHost: ec2-35-157-205-11.eu-central-1.compute.amazonaws.com
X-Twilio-EdgeRegion: de1
X-Twilio-Zone: EU_FRANKFURT
v=0
o=root 219971503 219971503 IN IP4 172.21.25.231
s=Twilio Media Gateway
c=IN IP4 3.122.181.240
t=0 0
a=group:BUNDLE audio
a=ice-lite
m=audio 17612 RTP/SAVPF 111 0 126
a=rtpmap:111 opus/48000/2
a=rtpmap:0 PCMU/8000
a=rtpmap:126 telephone-event/8000
a=fmtp:126 0-16
a=ptime:20
a=maxptime:20
a=ice-ufrag:7d27112b7d583e977e998884781408ef
a=ice-pwd:5cfdaed470f11afc443384ae3a39434c
a=candidate:H37ab5f0 1 UDP 2130706431 3.122.181.240 17612 typ host
a=end-of-candidates
a=connection:new
a=setup:active
a=fingerprint:sha-256 33:4E:FE:3C:76:F2:04:B4:18:FC:95:85:56:3C:1C:A7:B0:87:39:15:3D:07:42:45:85:40:6C:2C:77:A9:80:76
a=mid:audio
...
Note the parameters c=IN IP4 3.122.181.240 and m=audio 17612 RTP/SAVPF 111 0 126. They set the DTLS-SRTP remote endpoint to 3.122.181.240:17612.
There is another fingerprint parameter in the server's reply: sha-256 33:4E:FE:3C:76:F2:04:B4:18:FC:95:85:56:3C:1C:A7:B0:87:39:15:3D:07:42:45:85:40:6C:2C:77:A9:80:76.
If we look into the corresponding traffic capture we will see the UDP channel established between our application and 3.122.181.240:17612. During the DTLS handshake the mobile app, acting as the DTLS server, presented the following certificate to the DTLS client (the Twilio server):
Certificate:
Data:
Version: 3 (0x2)
Serial Number:
d3:83:6b:a2:6c:9e:c6:a3
Signature Algorithm: ecdsa-with-SHA256
Issuer: CN = WebRTC
Validity
Not Before: Mar 15 15:29:49 2020 GMT
Not After : Apr 15 15:29:49 2020 GMT
Subject: CN = WebRTC
Subject Public Key Info:
Public Key Algorithm: id-ecPublicKey
Public-Key: (256 bit)
pub:
04:d1:b2:19:57:ed:ec:17:70:05:02:0c:ea:4a:57:
2a:dc:f8:c2:c1:d2:83:b5:cf:38:dd:09:af:c3:b8:
d5:fe:e2:ac:c2:1e:e1:0f:d5:f2:b3:94:ba:e0:d5:
d0:a2:df:36:a3:f1:e7:a3:ca:c3:30:4c:8e:8b:78:
eb:b5:25:a9:2a
ASN1 OID: prime256v1
NIST CURVE: P-256
Signature Algorithm: ecdsa-with-SHA256
30:44:02:20:05:da:f7:2e:d4:01:a9:0f:dd:70:70:33:f5:1c:
8e:f2:2e:51:d6:71:c9:07:d0:ef:1c:e1:4e:76:b1:f0:1f:e1:
02:20:7a:cf:e0:49:a0:07:58:c6:b7:f5:8f:fe:2b:c3:91:ff:
17:ea:72:62:0b:f0:22:80:7b:09:8e:4c:8a:83:a8:11
$ openssl x509 -noout -fingerprint -sha256 -inform pem -in incoming1.crt
SHA256 Fingerprint=1B:EA:BF:33:B8:11:26:6D:91:AD:1B:A0:16:FD:5D:60:59:33:F7:46:A3:BA:99:2A:1D:04:99:A6:F2:C6:2D:43
This is exactly the fingerprint that the client sent in the SIP INVITE message described earlier.
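The same comparison can be scripted. The sketch below computes the SHA-256 fingerprint of a DER-encoded certificate in the colon-separated form used by the a=fingerprint SDP attribute and compares it with the value seen in signalling. The helper names are ours; only the Python standard library is used.

```python
import base64
import hashlib
import re


def pem_to_der(pem: str) -> bytes:
    """Extract the DER bytes from a single PEM-encoded certificate."""
    match = re.search(
        r"-----BEGIN CERTIFICATE-----(.*?)-----END CERTIFICATE-----",
        pem, re.DOTALL)
    if match is None:
        raise ValueError("no PEM certificate found")
    return base64.b64decode("".join(match.group(1).split()))


def sdp_fingerprint(der: bytes) -> str:
    """Format the SHA-256 digest of a certificate as in a=fingerprint."""
    digest = hashlib.sha256(der).hexdigest().upper()
    pairs = [digest[i:i + 2] for i in range(0, len(digest), 2)]
    return "sha-256 " + ":".join(pairs)


def fingerprint_matches(der: bytes, sdp_value: str) -> bool:
    """Compare a certificate with the fingerprint value taken from SDP."""
    return sdp_fingerprint(der).lower() == sdp_value.strip().lower()
```

Here sdp_value is the text following "a=fingerprint:" in the intercepted SDP, for example the string from the INVITE shown earlier.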
We saw that the SIP signalling exchange with chunderm.gll.twilio.com negotiated the fingerprints of the DTLS certificates. They can be used to validate the client. We can assume that the client certificate may contain arbitrary information; it just has to have a fingerprint matching the one that was transmitted in the signalling channel. However, this remains to be verified.
Another parameter that was negotiated in this signalling channel is the remote endpoint to establish DTLS-SRTP channel:
...
m=audio 17612 RTP/SAVPF 111 0 126
In this case it is 3.122.181.240:17612.
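Extracting this endpoint from an SDP body is straightforward. The following sketch takes the first c=IN IP4 line and the first m=audio line it finds; it is a simplification that ignores IPv6, multiple media sections and ICE candidates, and the function name is hypothetical.

```python
import re
from typing import Optional, Tuple


def media_endpoint(sdp: str) -> Optional[Tuple[str, int]]:
    """Return (ip, port) from the first c= and m=audio lines, if present."""
    conn = re.search(r"^c=IN IP4 (\d{1,3}(?:\.\d{1,3}){3})", sdp, re.MULTILINE)
    media = re.search(r"^m=audio (\d+) ", sdp, re.MULTILINE)
    if conn is None or media is None:
        return None
    return conn.group(1), int(media.group(1))
```

Applied to the server's reply shown above, this yields the address 3.122.181.240 and the port 17612.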
This format for transferring media parameters is called Session Description Protocol (SDP). As we later learned, it is one of the recommended approaches for negotiating media channels in WebRTC. The SDP Anatomy article describes the most common parameters.
One important outcome is that the remote party in this case is the Twilio server. This means that it has full control over voice communications and can (in theory) intercept the traffic, decrypt it, record it and even modify it. This is what can be found on the Twilio website on this subject:
accessed by Twilio.
Each Participant in a Peer-to-Peer Room negotiates a separate DTLS/SRTP
connection to every other participant. All media published to or subscribed from
the Room is sent over these secure connections, and is encrypted only at the
sender and decrypted only at the receiver.
Network Traversal Service TURN cannot decrypt media: TURN only routes the packet
between peers.
However, what we observed before could be just Twilio's "demo account" feature (as we used demo subscription). Indeed, Twilio server somehow has to insert "thank you for using Twilio demo account" voice message when making outgoing call.
Anyway, the idea of this exercise is to test resilience against a man-in-the-middle attacker, not whether Twilio servers sniff communications. Thus, the next step is to intercept DTLS-SRTP.
Intercepting DTLS-SRTP
Now we have our demultiplexing proxy tool ready and we know how to determine the remote endpoint it needs to connect to. We can automate it with "replace" scripts feature of viproxy tool. For that we launch it like this:
--l-sslkey chunderm.gll.twilio.com.pem --r-ssl \
--resp-replace replace_run-stunproxy.rb -f twilio_chunderm_2.log
Where replace_run-stunproxy.rb contains the following:
# m=audio 19516 RTP/SAVPF 111 0 126
if self.index('c=IN IP4 ')
  ip_match = self.match('^c=IN IP4 (3\.122\.181\.[0-9]{1,3})')
  port_match = self.match('^m=audio (\d+) RTP/SAVPF .+')
  # launch the demux proxy only if both the address and the port were found
  if ip_match && port_match
    ip = ip_match[1]
    port = port_match[1].to_i
    puts "DTLS-SRTP endpoint found (#{ip}:#{port}), launching proxy..."
    system("dtls-srtp-demux -H #{ip} -P #{port} -D 127.0.0.1 -d 8443 -h 127.0.0.1 -p 6001 &")
  end
  # send packet further
  :ok
end
We still need to forward DTLS-SRTP traffic from the mobile app to the intercepting proxy. This can be achieved with the following iptables command:
We are almost ready, the remaining step is to prepare DTLS server.
Terminating DTLS with SRTP Extension
As we learned from RFC5764, the DTLS handshake in the DTLS-SRTP protocol relies on the use_srtp extension. Thus, we need a TLS library which supports it. We chose GnuTLS as it has a useful API and plenty of examples for many cases.
We used mini-dtls-srtp server as an example and combined it with the corresponding documentation. The source code is published as dtls-srtp-server.
The DTLS server has to present a certificate to the remote client. We assumed that there are no specific requirements for it (at the time of writing we had no evidence to the contrary). Thus, we tried to make it close to the ones we observed, with the validity period extended:
Data:
Version: 3 (0x2)
Serial Number: 5550018979492260217 (0x4d05a0db494fe979)
Signature Algorithm: ecdsa-with-SHA256
Issuer: CN = WebRTC
Validity
Not Before: Mar 11 15:14:00 2020 GMT
Not After : Mar 11 15:14:00 2030 GMT
Subject: CN = WebRTC
Subject Public Key Info:
Public Key Algorithm: id-ecPublicKey
Public-Key: (256 bit)
pub:
04:6d:49:e5:56:72:f8:f3:13:34:94:ae:a2:0e:11:
dc:a9:a1:6e:62:1e:8c:e6:59:80:f3:6d:c6:42:5c:
22:e6:c9:20:83:ba:f2:49:1d:18:ad:38:a6:5f:d1:
7a:2b:d9:03:08:9b:bf:ef:39:96:94:f8:b2:6f:fd:
44:15:61:2d:93
ASN1 OID: prime256v1
NIST CURVE: P-256
Signature Algorithm: ecdsa-with-SHA256
30:44:02:20:1e:5d:99:c5:9b:c9:9e:b1:ee:bb:64:fd:30:86:
c2:70:be:72:61:8e:fe:6d:bf:23:8d:da:87:91:e8:7e:c9:ad:
02:20:77:f6:27:ce:ec:87:8e:e6:28:ab:df:e7:13:70:12:d9:
8b:31:7c:84:3e:a5:37:88:f5:32:fd:1c:52:55:3f:7a
However, we noted that there is a requirement for the private key used along with the certificate: it has to be generated on the prime256v1 elliptic curve. This is based on practical observation; prime256v1 is also the default curve used by GnuTLS.
For now we left the certificate and the corresponding private key hardcoded in our DTLS server. If we spot cases when client verifies certificate parameters other than fingerprint, we will add the corresponding test to our qsslcaudit tool.
Putting It All Together
As we have all the components ready, we can now test if Twilio server properly verifies client's DTLS certificate.
When our test application received the incoming voice call, our viproxy instance intercepted several SIP messages and determined the remote DTLS-SRTP endpoint. Knowing its address, the proxy launched the DTLS-SRTP demultiplexing tool. This tool transparently forwarded all packets except for DTLS. The latter were redirected to our DTLS server, which terminated the connection.
Communications diagram:
Surprisingly our DTLS server returned negotiated SRTP keys:
Client salt: 52537c2d1142be963ceeb89f3917
Server key: 7a0c83f7bc03832ca37ba97bd6e0e111
Server salt: 9d1a8e282f6e66a83a77e25512d0
Wireshark captured the following DTLS handshake:
As can be seen in the screenshot, there is no DTLS alert message. Did we successfully spoof the client? (Un)fortunately, no: immediately after this handshake, the server sent a SIP BYE message:
CSeq: 1 BYE
From: <sip:chunderm.gll.twilio.com:443;transport=tls>;tag=32934414_6772d868_c358fa1f-bffa-4081-8470-9fa6ff46176d
To: <sip:VoiceSDK@chunderm.gll.twilio.com>;tag=01ed5987
Call-ID: oc1h_oktbATMC4pHJ-kkpQ..
Max-Forwards: 69
User-Agent: Twilio
Via: SIP/2.0/TLS 35.157.205.11:443;branch=z9hG4bK6c4.566d9a05.0
Via: SIP/2.0/UDP 172.21.10.120:10193;branch=z9hG4bKc358fa1f-bffa-4081-8470-9fa6ff46176d_6772d868_440-16643670738841742609
X-Twilio-CallSid: CA2a489088afb6cb198c47fa45f038ff45
Content-Length: 0
It is the same message that was sent during normal voice call termination, no explicit error message provided. From user experience point of view, the application just stops the call.
The corresponding error message in the application's debug log (over ADB) is not verbose at all:
03-17 09:15:13.181 12647-12647/com.twilio.voice.quickstart E/VoiceActivity: Call Error: 31005, Connection error
Previously enabled debug logging did not help either. However, it produced quite a lot of output, including SIP signalling messages.
The "Call Error" is returned by the following hook in its Java source code:
setAudioFocus(false);
if (BuildConfig.playCustomRingback) {
SoundPoolManager.getInstance(VoiceActivity.this).stopRinging();
}
Log.d(TAG, "Connect failure");
String message = String.format(
Locale.US,
"Call Error: %d, %s",
error.getErrorCode(),
error.getMessage());
Log.e(TAG, message);
Snackbar.make(coordinatorLayout, message, Snackbar.LENGTH_LONG).show();
resetUI();
}
We did not identify the exact source code responsible for handling the case that emits the onConnectFailure() event.
In order to be confident that we did all things properly, we have to simulate the valid client's behaviour.
Spoofing DTLS Certificate Fingerprint
We configured our intercepting proxy to substitute the mobile application's DTLS certificate fingerprint with the one that we use. This was accomplished with the following script for the viproxy tool:
# (we do not care if we substitute server's certificate too)
# a=fingerprint:sha-256 1C:02:25:E....
#
if self.index('fingerprint:sha-256')
  fprint = "37:BE:BB:AA:0B:14:D5:0B:A5:A5:8D:A3:7C:25:00:E9:BE:FE:89:07:C0:35:66:9F:D0:54:20:BF:48:D4:E0:0F"
  self.gsub!(/^a=fingerprint:sha-256 .*$/, "a=fingerprint:sha-256 #{fprint}")
  puts "client's certificate fingerprint replaced"
  # send packet further
  :ok
end
The corresponding viproxy command line:
--l-sslkey chunderm.gll.twilio.com.pem --r-ssl \
--resp-replace replace_run-stunproxy.rb --req-replace replace_dtls-fingerprint.rb
With this setup, we intercepted the network traffic. This time the mobile client did not display any errors and we observed SRTP traffic:
There is no DTLS handshake in this excerpt, as we intercepted it before it reached the mobile client.
This means that our setup was correct. Twilio server indeed verifies the mobile client's DTLS certificate fingerprint.
Our DTLS server negotiated the following SRTP keys:
Client key: ef8122e36bdaeef5ed84723168ea8666
Client salt: 0cd646cd7ff704c309f77155f3dc
Server key: 11b6d02f9be14e79a248ecdce22fc2d0
Server salt: 99d5daf0d315869ec880d8b72ad8
We checked if we can decrypt the intercepted SRTP traffic with these keys. We tried to use ffmpeg project tools to avoid writing our own SRTP decrypt-assemble-play implementation.
Preparing the key in ffmpeg format:
$ echo "11b6d02f9be14e79a248ecdce22fc2d099d5daf0d315869ec880d8b72ad8" | xxd -r -p | base64
EbbQL5vhTnmiSOzc4i/C0JnV2vDTFYaeyIDYtyrY
# server --> client key
$ echo "ef8122e36bdaeef5ed84723168ea86660cd646cd7ff704c309f77155f3dc" | xxd -r -p | base64
74Ei42va7vXthHIxaOqGZgzWRs1/9wTDCfdxVfPc
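The shell pipeline above simply concatenates the 16-byte master key with the 14-byte master salt and base64-encodes the result. An equivalent sketch in Python, with a hypothetical function name, looks like this:

```python
import base64


def ffmpeg_srtp_params(key_hex: str, salt_hex: str) -> str:
    """Base64-encode master key followed by master salt (30 bytes total)."""
    raw = bytes.fromhex(key_hex) + bytes.fromhex(salt_hex)
    if len(raw) != 30:
        raise ValueError(f"expected 16 + 14 bytes, got {len(raw)}")
    return base64.b64encode(raw).decode("ascii")


# Server key and salt from the listing above
print(ffmpeg_srtp_params("11b6d02f9be14e79a248ecdce22fc2d0",
                         "99d5daf0d315869ec880d8b72ad8"))
```

For the server key and salt this prints the same string as the first shell command: EbbQL5vhTnmiSOzc4i/C0JnV2vDTFYaeyIDYtyrY.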
Running ffmpeg:
We replayed SRTP traffic that was originally sent by server to our mobile application:
--portmap=50000-55000:8888 --enet-smac=00:....:9e --enet-dmac=10:...:20 --intf1=wlan0 incoming3_srtp.pcapng
For some reason, ffplay did not produce any output or sound. However, it printed the following errors when we used an incorrect key:
ffplay version 4.2.2-alt1 Copyright (c) 2003-2019 the FFmpeg developers
...
libavdevice 58. 8.100 / 58. 8.100
libavfilter 7. 57.100 / 7. 57.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 5.100 / 5. 5.100
libswresample 3. 5.100 / 3. 5.100
libpostproc 55. 5.100 / 55. 5.100
HMAC mismatch 0.000 fd= 0 aq= 0KB vq= 0KB sq= 0B f=0/0
Last message repeated 1 times
HMAC mismatch 0.000 fd= 0 aq= 0KB vq= 0KB sq= 0B f=0/0
HMAC mismatch 0.000 fd= 0 aq= 0KB vq= 0KB sq= 0B f=0/0
HMAC mismatch 0.000 fd= 0 aq= 0KB vq= 0KB sq= 0B f=0/0
Last message repeated 1 times
This suggests that the keys we obtained are correct and can be used to decrypt voice traffic: HMAC verification fails only when an incorrect key is supplied.
Conclusion
With this exercise we learned a bit about DTLS-SRTP protocol (defined in RFC5764) and how it is used by Twilio platform for voice communications between Android mobile clients.
We were able to prepare a setup which allowed us to intercept various communication channels. The most important of them were the aforementioned DTLS-SRTP and SIP signalling channels. This required advanced usage of a transparent TCP proxy to parse signalling data and the development of the following tools: a DTLS server and a DTLS-SRTP demultiplexer.
We confirmed that Twilio properly verifies the fingerprint of mobile client's DTLS certificate. Together with proper validation of SIP signalling channel security, this prevents man-in-the-middle attacks against voice communications.
The described activity encouraged us to test other products which use DTLS-SRTP protocol. This we describe in the next section.
Wire
After looking into the Twilio platform described above, we decided to assess other applications which use the DTLS-SRTP protocol. Our first victim was the Wire platform/application. This was a lucky choice, as it was a pleasure to work with an open source client which is well designed and written in a clean style.
When we initiated this assessment we began with tooling and approach developed when working on Twilio platform. However, it turned out that some alterations were required. As a result, the description you have read in the previous chapter was slightly rewritten to comply with the updated software.
Our objective was not to fully assess the Wire application but to improve our understanding of media communications, our testing approach and the corresponding software. Thus, we focused on intercepting and analysing network traffic during phone calls, leaving the rest to other researchers.
Setup
The application was installed on two different Android devices from Google Play store. Two new accounts were created. For brevity, let's call one device and application app1 and the other one app2.
The application's version at the moment was 3.46.890:
versionCode=890 targetSdk=28
versionName=3.46.890
Traffic Analysis
We captured the traffic the application produces when making the phone call. To capture DNS requests we started the capture process before the application launch.
We used the following setup:
- app1 was running on the device with IP address 192.168.12.118
- app2 was running on another device with IP address 192.168.0.221
- the first device was connected to the 192.168.0.0/24 network via a test laptop, serving as a gateway, with IP address 192.168.12.1
If we exclude TLS-encrypted traffic, the application starts with STUN communications with two servers:
- turn04.de.prod.wire.com, 116.203.131.31:3478 (both UDP and TCP)
- turn03.de.prod.wire.com, 116.203.137.142:3478 (both UDP and TCP)
Both STUN channels were used to determine the peer address in the peer-to-peer setup (probably, for redundancy). In this case the negotiated peers were 192.168.0.221:55831 and 192.168.12.118:33880. The endpoints were negotiated in the XOR-PEER-ADDRESS STUN attribute. Note that these are the IP addresses of app1 and app2. This allowed media traffic to be transmitted on the local network, without crossing the Internet.
After defining the peer channel, the applications establish DTLS-SRTP communication with each other over 192.168.0.221:55831 / 192.168.12.118:33880. In this channel, after several STUN messages, app2 starts the DTLS handshake with app1:
Certificate presented by DTLS server:
Certificate:
Data:
Version: 3 (0x2)
Serial Number:
d3:50:43:ba:f1:bd:2f:e4
Signature Algorithm: ecdsa-with-SHA256
Issuer: CN = WebRTC
Validity
Not Before: Mar 17 09:18:28 2020 GMT
Not After : Apr 17 09:18:28 2020 GMT
Subject: CN = WebRTC
Subject Public Key Info:
Public Key Algorithm: id-ecPublicKey
Public-Key: (256 bit)
pub:
04:d7:26:d1:fe:00:f2:28:f0:95:44:f4:b5:f6:eb:
62:95:49:66:e6:57:83:3a:a5:76:5c:c7:22:b9:93:
a5:f3:cc:79:f9:68:c6:62:30:3e:f6:3e:33:63:68:
fb:aa:ec:4c:2f:83:b4:85:1e:f0:78:19:72:fc:39:
9c:18:8d:73:3e
ASN1 OID: prime256v1
NIST CURVE: P-256
Signature Algorithm: ecdsa-with-SHA256
30:46:02:21:00:a6:33:62:fe:2e:32:63:3f:47:33:ec:2f:85:
0c:1e:94:f4:24:36:07:f1:70:d7:e9:01:7a:e5:d0:96:99:ed:
db:02:21:00:bc:de:98:a3:88:f6:a9:bf:55:75:a3:70:9c:5c:
27:f3:c2:25:ca:8f:64:a2:a7:10:47:35:59:90:63:a7:90:fb
We also noted that after the initial handshake there were several data packets transmitted inside the DTLS channel. Then the SRTP flow starts with the first packet from app2:
When we initiated a similar call once more after relaunching the application, we noted that initial STUN communications were established with the same servers. The DTLS certificate presented by app1 was different:
Certificate:
Data:
Version: 3 (0x2)
Serial Number:
94:db:c9:6b:ef:06:88:d2
Signature Algorithm: ecdsa-with-SHA256
Issuer: CN = WebRTC
Validity
Not Before: Mar 17 10:21:47 2020 GMT
Not After : Apr 17 10:21:47 2020 GMT
Subject: CN = WebRTC
Subject Public Key Info:
Public Key Algorithm: id-ecPublicKey
Public-Key: (256 bit)
pub:
04:99:6b:45:17:c4:2c:b0:c3:82:95:c6:43:c0:8a:
65:74:e0:5e:75:c3:82:4b:2f:01:aa:e4:09:d5:67:
4e:b3:46:bf:83:da:8b:70:db:d2:79:8e:16:a8:13:
bc:89:29:36:60:e1:4b:a1:24:d0:93:83:fe:72:47:
f3:25:e3:5e:71
ASN1 OID: prime256v1
NIST CURVE: P-256
Signature Algorithm: ecdsa-with-SHA256
30:45:02:21:00:81:fa:e2:f7:57:18:cf:40:a7:0c:53:59:51:
63:fb:35:4f:81:19:2d:6f:ed:2b:bd:5a:38:84:83:ac:e3:7f:
42:02:20:21:3c:6c:20:bd:b9:ec:7f:09:4d:dd:df:5f:eb:1a:
65:4b:10:98:ca:88:34:77:6e:0b:1a:42:de:89:95:c2:74
TLS Certificates Validation
We performed client-side TLS implementation test using our qsslcaudit tool. We checked all communication channels that we noted:
- prod-nginz-https.wire.com
- prod-nginz-ssl.wire.com
- turn03.de.prod.wire.com
- turn04.de.prod.wire.com
The results were perfectly fine for prod-nginz-https.wire.com and prod-nginz-ssl.wire.com, see the following excerpt:
...
tests results summary table:
+----|------------------------------------|------------|-----------------------------+
| ## | Test Name | Result | Comment |
+----|------------------------------------|------------|-----------------------------+
| 1 | custom certificate trust | PASSED | |
| 2 | self-signed certificate for target | PASSED | |
| | domain trust | | |
| 3 | self-signed certificate for invali | PASSED | |
| | d domain trust | | |
| 4 | custom certificate for target doma | PASSED | |
| | in trust | | |
| 5 | custom certificate for invalid dom | PASSED | |
| | ain trust | | |
| 8 | SSLv2 protocol support | PASSED | |
| 9 | SSLv3 protocol support | PASSED | |
| 10 | SSLv3 protocol and EXPORT grade ci | PASSED | |
| | phers support | | |
| 11 | SSLv3 protocol and LOW grade ciphe | PASSED | |
| | rs support | | |
| 12 | SSLv3 protocol and MEDIUM grade ci | PASSED | |
| | phers support | | |
| 13 | TLS 1.0 protocol support | PASSED | |
| 14 | TLS 1.0 protocol and EXPORT grade | PASSED | |
| | ciphers support | | |
| 15 | TLS 1.0 protocol and LOW grade cip | PASSED | |
| | hers support | | |
| 16 | TLS 1.0 protocol and MEDIUM grade | PASSED | |
| | ciphers support | | |
| 17 | TLS 1.1 protocol and EXPORT grade | PASSED | |
| | ciphers support | | |
| 18 | TLS 1.1 protocol and LOW grade cip | PASSED | |
| | hers support | | |
| 19 | TLS 1.1 protocol and MEDIUM grade | PASSED | |
| | ciphers support | | |
| 20 | TLS 1.2 protocol and EXPORT grade | PASSED | |
| | ciphers support | | |
| 21 | TLS 1.2 protocol and LOW grade cip | PASSED | |
| | hers support | | |
| 22 | TLS 1.2 protocol and MEDIUM grade | PASSED | |
| | ciphers support | | |
+----|------------------------------------|------------|-----------------------------+
most likely all connections were established by the same client
the first connection details:
source host: 192.168.12.118
dtls?: false
ssl errors: The TLS/SSL connection has been closed The remote host closed the connection
ssl conn established?: true
socket errors ids: 1 1
received data, bytes: 317
transmitted data, bytes: 4231
protocol: TLSv1.2
accepted ciphers: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256:TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384:TLS_EMPTY_RENEGOTIATION_INFO_SCSV
SNI: prod-nginz-https.wire.com
ALPN: h2, http/1.1
qsslcaudit version: 0.8.1
However, the client trusts any certificate when connecting to TURN servers:
...
tests results summary table:
+----|------------------------------------|------------|-----------------------------+
| ## | Test Name | Result | Comment |
+----|------------------------------------|------------|-----------------------------+
| 1 | custom certificate trust | FAILED !!! | mitm possible |
| 2 | self-signed certificate for target | FAILED !!! | -//- |
| | domain trust | | |
| 3 | self-signed certificate for invali | FAILED !!! | -//- |
| | d domain trust | | |
| 4 | custom certificate for target doma | FAILED !!! | -//- |
| | in trust | | |
| 5 | custom certificate for invalid dom | FAILED !!! | -//- |
| | ain trust | | |
| 8 | SSLv2 protocol support | PASSED | |
| 9 | SSLv3 protocol support | PASSED | |
| 10 | SSLv3 protocol and EXPORT grade ci | PASSED | |
| | phers support | | |
| 11 | SSLv3 protocol and LOW grade ciphe | PASSED | |
| | rs support | | |
| 12 | SSLv3 protocol and MEDIUM grade ci | PASSED | |
| | phers support | | |
| 13 | TLS 1.0 protocol support | FAILED !!! | |
| 14 | TLS 1.0 protocol and EXPORT grade | PASSED | |
| | ciphers support | | |
| 15 | TLS 1.0 protocol and LOW grade cip | PASSED | |
| | hers support | | |
| 16 | TLS 1.0 protocol and MEDIUM grade | FAILED !!! | |
| | ciphers support | | |
| 17 | TLS 1.1 protocol and EXPORT grade | PASSED | |
| | ciphers support | | |
| 18 | TLS 1.1 protocol and LOW grade cip | PASSED | |
| | hers support | | |
| 19 | TLS 1.1 protocol and MEDIUM grade | FAILED !!! | |
| | ciphers support | | |
| 20 | TLS 1.2 protocol and EXPORT grade | PASSED | |
| | ciphers support | | |
| 21 | TLS 1.2 protocol and LOW grade cip | PASSED | |
| | hers support | | |
| 22 | TLS 1.2 protocol and MEDIUM grade | FAILED !!! | |
| | ciphers support | | |
+----|------------------------------------|------------|-----------------------------+
most likely all connections were established by the same client
the first connection details:
source host: 192.168.12.118
dtls?: false
ssl errors:
ssl conn established?: true
intercepted data:
qsslcaudit version: 0.8.1
It is a weird result, but we cannot consider it a security issue. The application transmits the same data simultaneously in plain text, using UDP and TCP channels, and this communication channel is not used to transfer any sensitive data.
Media Flow Analysis
As was briefly mentioned previously, the STUN protocol is used to negotiate peers. We have to parse the protocol at runtime and extract them. For this purpose we developed the stunpeersniff tool. This tool searches for the XOR-PEER-ADDRESS attribute and launches the UDP demultiplexor with the corresponding parameters. This UDP demux tool was described earlier: it searches for DTLS packets and forwards them to our DTLS server instance.
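The attribute lookup can be sketched as follows. This is not the stunpeersniff source, only a minimal illustration of the wire format from RFC 5389 and RFC 5766. It assumes that `msg` holds exactly one complete STUN message and handles IPv4 only. The function name is ours.

```python
import socket
import struct
from typing import Iterator, Tuple

MAGIC_COOKIE = 0x2112A442
XOR_PEER_ADDRESS = 0x0012  # TURN attribute type (RFC 5766)


def xor_peer_addresses(msg: bytes) -> Iterator[Tuple[str, int]]:
    """Yield (ip, port) for each IPv4 XOR-PEER-ADDRESS attribute in a STUN message."""
    if len(msg) < 20:
        return
    _msg_type, msg_len, cookie = struct.unpack("!HHI", msg[:8])
    if cookie != MAGIC_COOKIE:
        return
    pos, end = 20, min(len(msg), 20 + msg_len)
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", msg[pos:pos + 4])
        value = msg[pos + 4:pos + 4 + attr_len]
        if attr_type == XOR_PEER_ADDRESS and len(value) == 8 and value[1] == 0x01:
            xport, xaddr = struct.unpack("!HI", value[2:8])
            # The port is XORed with the top 16 bits of the cookie,
            # the address with the whole cookie.
            port = xport ^ (MAGIC_COOKIE >> 16)
            addr = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
            yield addr, port
        pos += 4 + attr_len + (-attr_len % 4)  # attribute values are padded to 4 bytes
```

Each decoded pair gives one endpoint the UDP demultiplexor can be pointed at.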
We forwarded the STUN traffic sent by app1 towards our TCP proxy with the following iptables command:
iptables command to forward DTLS-SRTP traffic to UDP demultiplexor:
STUN proxy command:
This setup intercepted the communications we need and forwarded DTLS handshake to our DTLS server.
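The demultiplexing step relies on the fact that STUN, DTLS and (S)RTP packets sharing one UDP port can be told apart by their first byte (RFC 5764, section 5.1.2). A minimal sketch of that classification, with a function name of our own choosing rather than the one used in our tool:

```python
def classify_datagram(datagram: bytes) -> str:
    """Classify a UDP payload by its first byte, using the RFC 5764 ranges."""
    if not datagram:
        return "unknown"
    first = datagram[0]
    if first <= 1:
        return "stun"
    if 20 <= first <= 63:
        return "dtls"
    if 128 <= first <= 191:
        return "rtp"
    return "unknown"
```

A demultiplexor built on this would send "dtls" datagrams to the DTLS server and pass everything else through untouched.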
The DTLS server received an Alert (Bad Certificate) DTLS error immediately after providing its certificate. This confirms that the remote party (another mobile client in our case) validates the DTLS certificate. This is more robust behaviour than we observed with Twilio: the DTLS handshake does not reach the point where SRTP encryption keys are derived.
We noted that if we redirect DTLS handshake into nowhere, the call establishes successfully and both parties can talk. The captured traffic reveals that SRTP data is transmitted using a TURN channel. This traffic goes between the mobile application and remote TURN server. To decode it with Wireshark we followed this advice.
You can see the result in the following illustration:
We tried to identify how (and if) the DTLS handshake is now established. Apparently, the corresponding data is transmitted inside the TURN channel in this case. You can see this in the following screenshot from Wireshark. The certificate presence is revealed by the WebRTC common name string and the expiration date 200418 encoded as a string:
This data exchange continues using TURN channel data messages:
Wire application is based on Wire signaling library in terms of media communications. This library uses Google's WebRTC reference implementation to handle the communications we observed.
The WebRTC library implements a single module that handles the DTLS handshake and fingerprint verification (p2p/base/dtlstransport.cc). The DTLS transport implementation does not depend on a particular communication channel and is fed with data by the rest of the library. This allows us to assume that the single DTLS validation test we performed earlier is enough to be sure that parties are properly verified in other cases too.
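Conceptually, the fingerprint verification boils down to hashing the DER-encoded certificate received in the handshake and comparing the result with the value announced in the SDP `a=fingerprint` attribute. The sketch below shows that idea only. It is not WebRTC's code, it handles SHA-256 only, and the function names are ours.

```python
import hashlib


def cert_fingerprint(der_cert: bytes) -> str:
    """SHA-256 digest of the DER certificate as colon-separated uppercase hex."""
    digest = hashlib.sha256(der_cert).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def fingerprint_matches(der_cert: bytes, sdp_value: str) -> bool:
    """`sdp_value` is the text after 'a=fingerprint:', e.g. 'sha-256 AB:CD:...'."""
    parts = sdp_value.split()
    if len(parts) != 2 or parts[0].lower() != "sha-256":
        return False
    return cert_fingerprint(der_cert) == parts[1].upper()
```

Since the certificate is self-signed and regenerated per session, this comparison is the only thing binding the DTLS peer to the signalling channel.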
We also tried to figure out how signalling data is transmitted. The WebRTC standard does not explicitly define how this should be implemented. According to the src/econn/README.md file of the AVS Wire library source code, SDP data is sent as a SETUP message via Backend. Backend in this case can be an arbitrary communication channel. Mobile clients receive remote signalling via the WebSockets protocol, which transmits end-to-end encrypted messages. In the upstream direction encrypted messages are sent as POST requests to prod-nginz-https.wire.com. WebSockets message example:
{
  "payload": [
    {
      "conversation": "fd2c8985-cb03-4a0c-9291-513a02355693",
      "time": "2020-03-19T14:22:01.638Z",
      "data": {
        "text": "owABAaEAWCAFs5h5Cft0kXBPKdIWvtlIbuNUuU7vNLeO23zzAGM7ngJZDMYBp ... nNab4HfDHOZZVDNXDu+ymAmpsuKSW7Q=",
        "sender": "cbc19847050b5b9d",
        "recipient": "141e979399699446"
      },
      "from": "4707e7e4-25a4-4afe-ad4d-13eb565b7fab",
      "type": "conversation.otr-message-add"
    }
  ],
  "transient": false,
  "id": "0000aaa8-69ed-11ea-8075-22000a23ecad"
}
According to the src/peerflow/peerflow.cpp source of the AVS Wire library, some messages use the WebRTC data channel. The data channel interface is defined in WebRTC's api/datachannelinterface.{hh,cc}. It is encapsulated in the SCTP protocol, which goes inside the DTLS channel and which is transmitted over STUN/TURN messages.
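When the traffic is relayed, the DTLS and SRTP packets are wrapped in TURN ChannelData messages (RFC 5766, section 11.4): a 16-bit channel number in the range 0x4000-0x7FFF, a 16-bit length, and then the application data. A minimal sketch of unwrapping one such message received over UDP follows. The function name is ours and it is not taken from any of the tools mentioned in this article.

```python
import struct
from typing import Optional, Tuple


def parse_channel_data(datagram: bytes) -> Optional[Tuple[int, bytes]]:
    """Return (channel_number, payload) for a TURN ChannelData message, else None."""
    if len(datagram) < 4:
        return None
    channel, length = struct.unpack("!HH", datagram[:4])
    if not 0x4000 <= channel <= 0x7FFF:
        return None  # not a channel number, e.g. a regular STUN message
    if len(datagram) < 4 + length:
        return None  # truncated message
    return channel, datagram[4:4 + length]
```

The returned payload can then be inspected like any directly exchanged packet, for example by looking at its first byte to tell DTLS from SRTP.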
Conclusion
Working with Wire allowed us to observe some WebRTC internals. Compared to Twilio platform, Wire uses vanilla WebRTC approach to establish peer-to-peer multimedia communications. Due to its open source nature we were able to check our observations against the implementation.
Overall security of media data transmitted by Wire mobile application follows WebRTC guidelines:
- RTP media data is secured as SRTP.
- Keys for SRTP are derived by DTLS handshake.
- DTLS handshake fails if peer fingerprint does not match the announced one.
- Peer fingerprint is transmitted as end-to-end encrypted data inside WebSocket, secured with TLS.
- Critical TLS servers certificates are properly validated by Android client.
In order to intercept Wire media traffic, the same tools and firewall configuration are needed as in the Twilio case. Additionally, we wrote a STUN sniffer tool, stunpeersniff, which is required to determine peers on the fly and configure the DTLS-SRTP proxy accordingly.