VoIP Call Modification 2023

This article is about VoIP Call Modification.

Introduction to VoIP Call Modification:

This exercise describes VoIP call manipulation, a type of Man in the Middle attack that inserts, modifies, or deletes VoIP packets to alter the communication session.

The most common VoIP phone infrastructures are based on two protocols: RTP and SIP.

RTP (Real-time Transport Protocol) is a network protocol for delivering audio and video over IP networks; it provides end-to-end transport functions suitable for applications that transfer real-time data, such as telephony and video conferencing.

SIP (Session Initiation Protocol) is a signaling protocol used in both enterprise and provider environments to set up and manage multimedia communication sessions: it can create, modify, and terminate sessions with one or more participants, such as Internet telephony calls. Its operation is very similar to HTTP, but it is peer-to-peer, and it uses two message types: a Request, sent from the client to the server, and a Response, sent from the server to the client.

The request message provides the following main methods:

  • INVITE: starts a conversation;
  • ACK: confirms that the final response to an INVITE has been received;
  • BYE: terminates a call session between two users;
  • REGISTER: registers the current location of a SIP user;
  • OPTIONS: allows a UA to query another UA or proxy for its capabilities;
  • CANCEL: cancels a pending INVITE request.
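As a rough illustration of how these methods appear on the wire, the sketch below splits the first line of a SIP request into method, request URI, and protocol version. The `parse_request_line` helper and the `pbx.example` domain are hypothetical names chosen for this example; only the six methods listed above are accepted, although RFC 3261 and its extensions define more.

```python
from typing import NamedTuple

# Only the methods listed in this article; other methods exist.
KNOWN_METHODS = {"INVITE", "ACK", "BYE", "REGISTER", "OPTIONS", "CANCEL"}


class RequestLine(NamedTuple):
    method: str
    uri: str
    version: str


def parse_request_line(line: str) -> RequestLine:
    """Split a SIP request line of the form 'METHOD request-URI SIP/2.0'."""
    parts = line.strip().split(" ")
    if len(parts) != 3:
        raise ValueError(f"malformed request line: {line!r}")
    method, uri, version = parts
    if method not in KNOWN_METHODS:
        raise ValueError(f"method not covered by this sketch: {method}")
    if version != "SIP/2.0":
        raise ValueError(f"unexpected SIP version: {version}")
    return RequestLine(method, uri, version)


# Hypothetical request: a call to extension 1001 at a placeholder domain.
print(parse_request_line("INVITE sip:1001@pbx.example SIP/2.0"))
```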

SIP Response messages carry three-digit status codes, as in HTTP:

  • 1xx for information
  • 2xx for successful
  • 3xx for redirection
  • 4xx for request failure
  • 5xx for server failure
  • 6xx for global failure
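The class of a response follows from the first digit of its status code. A minimal sketch of that mapping, where `classify_sip_status` is a hypothetical helper name and the labels are the ones used in the list above:

```python
SIP_RESPONSE_CLASSES = {
    1: "information",
    2: "successful",
    3: "redirection",
    4: "request failure",
    5: "server failure",
    6: "global failure",
}


def classify_sip_status(code: int) -> str:
    """Return the response class for a three-digit SIP status code."""
    if not 100 <= code <= 699:
        raise ValueError(f"not a valid SIP status code: {code}")
    return SIP_RESPONSE_CLASSES[code // 100]


for code in (180, 200, 302, 404, 503, 603):
    print(code, classify_sip_status(code))
```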

The SIP architecture has five logical core components:

  • A user agent (UA) is a client application or device that initiates and terminates SIP connections.
  • A proxy server is a component that receives SIP requests from various user agents and routes them to the appropriate next hop.
  • A redirect server is a server that generates redirect responses to received requests.
  • A registrar server is a server that processes REGISTER requests to map SIP URIs to their current location.
  • A location server is used by a redirect server or proxy to find the possible locations of the called party.
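The relationship between the registrar and the location server can be pictured as a simple lookup table: REGISTER requests write a binding, and a proxy or redirect server reads it back when routing a call. The sketch below is only an in-memory stand-in; the `LocationService` class, the `pbx.example` domain, and the contact address are hypothetical.

```python
from typing import Dict, Optional


class LocationService:
    """In-memory stand-in for the bindings a registrar maintains."""

    def __init__(self) -> None:
        self._bindings: Dict[str, str] = {}

    def register(self, address_of_record: str, contact: str) -> None:
        # What a registrar does when it accepts a REGISTER request.
        self._bindings[address_of_record] = contact

    def lookup(self, address_of_record: str) -> Optional[str]:
        # What a proxy or redirect server asks before routing a request.
        return self._bindings.get(address_of_record)


location = LocationService()
location.register("sip:1001@pbx.example", "sip:1001@192.0.2.20:5060")
print(location.lookup("sip:1001@pbx.example"))
print(location.lookup("sip:1002@pbx.example"))  # unknown user: None
```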

Laboratory infrastructure

In this lab, to perform and explain VoIP call modification attacks, we use a local network scenario with the following VMs:

  • Kali Linux, as the attacker, with IP address:
  • Windows Server 2012 R2, as User Agent “A” (UA), with IP address:
  • Windows 7, as User Agent “B” (UA), with IP address:
  • Trixbox, as the VoIP server, with IP address:

The software tools needed for this exercise are:

  • Linphone application, a simple open source VoIP client.
  • Ettercap, a powerful suite for Man In The Middle attacks on LAN.
  • Wireshark, the most popular network sniffing tool.
  • Rtpinsertsound, a tool for inserting sound into a specified audio stream.
  • Rtpmixsound, a tool for mixing pre-recorded audio in real time with audio in a specified target audio stream.
  • Sox, a command line utility that can convert various audio file formats to other formats.

Step 0 – Set up VoIP configuration

To set up the VoIP configuration, we manage the VoIP server through its web application. The default Trixbox web interface login is maint:password.

Now we add User Agent “A” and User Agent “B”: open the PBX menu, then the PBX Settings tab.

In this tab, we select the Extensions button and choose Generic SIP Device.

Finally, we enter the User Extension with its Display Name and the Secret for the account.

In this exercise, User Agent “A” has User Extension 1000 and User Agent “B” has User Extension 1001.

All the steps shown, including the commands, are performed on the Kali VM to reproduce the attacker’s view.

Exercise 1: Eavesdropping Attack

VoIP eavesdropping is a network attack in which an unauthorized party listens in on the communication session of other actors. An attacker can use it to intercept conversations containing sensitive and confidential information.

Step 1 – ARP poisoning

This threat uses the concept of Man in the Middle, in which an attacker can read, insert, or modify messages between two communicating parties without either party knowing that the communication channel has been compromised by a third party. In a local network scenario, this attack can be done by poisoning the ARP cache with a fake MAC address.

To communicate over a LAN, a host needs the MAC address of its peer so that network packets are delivered correctly; the mapping of an IP address to a MAC address is managed by the Address Resolution Protocol (ARP) and stored in an associated cache. This cache can be poisoned by sending the target forged ARP packets that associate the victim host's IP address with the attacker's MAC address.
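From the defender's side, a poisoned cache shows up as one IP address that has been announced with more than one MAC address. The sketch below flags such conflicts in a list of (IP, MAC) observations. How the observations are collected is left open, and the helper name, the 192.0.2.x addresses, and the MAC addresses are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def find_arp_conflicts(
    observations: Iterable[Tuple[str, str]]
) -> Dict[str, Set[str]]:
    """Return every IP address that was seen with more than one MAC address."""
    macs_by_ip: Dict[str, Set[str]] = defaultdict(set)
    for ip, mac in observations:
        macs_by_ip[ip].add(mac.lower())
    return {ip: macs for ip, macs in macs_by_ip.items() if len(macs) > 1}


# Hypothetical observations: 192.0.2.1 is first announced with its real MAC
# and later with a different one, as happens after ARP poisoning.
seen = [
    ("192.0.2.1", "00:11:22:33:44:55"),
    ("192.0.2.20", "00:11:22:33:44:66"),
    ("192.0.2.1", "00:AA:BB:CC:DD:EE"),
]
print(find_arp_conflicts(seen))
```

A conflict is only a hint: an address can legitimately move to another interface, so a real monitor would also consider timing and known hosts.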

To explain these concepts, we use the Ettercap tool, which contains many types of MITM attacks, such as ARP poisoning.

To run an ARP poisoning attack using Ettercap, we can run the following command:

$ ettercap -T -M arp:remote -i eth0 / /

// -T = text-only interface

// -M = Man in the Middle attack method (here arp:remote)

// -i = network interface

// / = user agent victim

// / = VoIP server

Step 2 – Start a VoIP call

To analyze VoIP calls in the next steps, we need to place a call with the Linphone VoIP client from User Agent “A” (Extension 1000) to User Agent “B” (Extension 1001). First, we register our account on the lab’s VoIP server through the preferences panel in the options menu, entering the configuration defined earlier: User Extension, password, and VoIP domain.

After opening the client on User Agent “A”, we type the destination User Extension and press Call.

Finally, we answer the call in the Linphone client on the other side.

Step 3 – Packet sniffing

After running the ARP poisoning command, we can start sniffing the VoIP conversation with Wireshark. After launching it with the wireshark command, we select the eth0 network interface and click the start capture button to observe the traffic. After a few seconds, the SIP and RTP packets appear in the capture.

Step 4 – Analyze the call flow

To analyze the packets offline, we first stop the packet capture using the stop button. The VoIP packets can then be examined to see the entire communication and understand its flow. To do this, we open Wireshark’s Telephony menu, select VoIP Calls, and click Flow.

Step 5 – Listen to the conversation

Wireshark can analyze the RTP packets as well as the session signaling: it can reassemble the packets, decode them, and play back the audio stream so that we can listen to the entire conversation. To do so, we open the Telephony menu and select VoIP Calls, select the conversation, and click the Player button. We can then replay the communication using the Decode and Play commands.

Step 6 – Save the conversation

In this step, we save the conversation so that it can be reused later in another attack. To save it in audio format with Wireshark, we follow this path: Telephony -> RTP -> Show All Streams, then select the stream to save and choose Analyze -> Save Payload in .au format.

Step 7 – Convert the audio conversation format

The audio file is saved in au format, but most players can’t read it, so we need to convert it to wav format. We can do this task using the “Swiss Army Knife” of audio processing programs called Sox, which will convert the audio of our file into the desired format. This tool is already present in the Debian repository, so we can install it in a Kali Linux VM using the following command:

$ apt-get install sox

After a quick install, we can convert our file to wav format using the following command:

$ sox -r 8000 -V sample.wav

// -r Sets the sample rate to 8000 Hz

// -V Verbose output

// is the audio input that represents the saved conversation

// sample.wav is the audio output in wav format
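For readers curious about what such a conversion involves, the sketch below decodes G.711 mu-law bytes into 16-bit PCM and wraps them in a WAV container using only the Python standard library. It rests on assumptions the article does not state: that the call used the PCMU (mu-law) codec at 8000 Hz, and that the input is the raw payload bytes without the .au file header. The function names are hypothetical, and this is not how Sox is implemented.

```python
import io
import struct
import wave


def ulaw_to_pcm16(byte: int) -> int:
    """Decode one G.711 mu-law byte into a signed 16-bit PCM sample."""
    u = ~byte & 0xFF                 # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    sample = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -sample if sign else sample


def ulaw_payload_to_wav(payload: bytes, sample_rate: int = 8000) -> bytes:
    """Return the bytes of a mono 16-bit WAV file holding the decoded payload."""
    samples = [ulaw_to_pcm16(b) for b in payload]
    pcm = struct.pack("<%dh" % len(samples), *samples)
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)     # 16-bit samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return buffer.getvalue()


# 160 bytes of mu-law silence (0xFF) correspond to 20 ms of audio at 8000 Hz.
wav_bytes = ulaw_payload_to_wav(bytes([0xFF]) * 160)
print(len(wav_bytes), "bytes of WAV data")
```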

Exercise 2: Manipulating RTP packets

Once we have successfully performed a Man in the Middle attack to eavesdrop on conversations, we can alter the communication flow by inserting or replacing RTP packets. Carried out correctly, this attack modifies the conversation by inserting parts of pre-recorded audio. It succeeds because RTP is vulnerable to media manipulation, especially when it is used without encryption and over UDP, a connectionless transport protocol.

The media stream between two VoIP endpoints is identified by its SSRC (Synchronization Source Identifier), sequence number, and timestamp. An attacker can intercept RTP packets and send forged ones with the same SSRC and a higher sequence number and timestamp, forcing the target endpoint to drop the legitimate packets and accept the attacker’s packets because they have a higher sequence number.
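The three fields mentioned here sit in the fixed 12-byte RTP header defined in RFC 3550. The sketch below unpacks them from a packet and compares two sequence numbers while allowing for 16-bit wraparound. The helper names and the sample values are hypothetical, and the half-range comparison is a common convention rather than the logic of any particular endpoint.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Unpack the fixed 12-byte RTP header (network byte order)."""
    if len(packet) < 12:
        raise ValueError("packet is shorter than the fixed RTP header")
    first, second, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=first >> 6,            # top two bits of the first byte
        payload_type=second & 0x7F,    # low seven bits of the second byte
        sequence=sequence,
        timestamp=timestamp,
        ssrc=ssrc,
    )


def is_newer(seq_a: int, seq_b: int) -> bool:
    """True if seq_a is ahead of seq_b, allowing for 16-bit wraparound."""
    return 0 < ((seq_a - seq_b) & 0xFFFF) < 0x8000


# Made-up header: version 2, payload type 0, plus 160 payload bytes.
sample = struct.pack("!BBHII", 0x80, 0, 1000, 160000, 0x1234ABCD) + bytes(160)
print(parse_rtp_header(sample))
print(is_newer(1001, 1000))
```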

Step 8 – Insert the audio file

To demonstrate this scenario, we use the Rtpinsertsound tool, which inserts forged RTP messages into the target audio stream. Here it injects the sample.wav file, recorded and converted in steps 6 and 7, into the communication stream with the following command:

$ rtpinsertsound -v -i eth0 -a -A 11198 -b -B 7078 -f 1 -j 50 sample.wav

// -v Verbose output

// -i Network interface

// -a Source IP address

// -A Source UDP port

// -b Destination IP address

// -B Destination UDP port

// -f Spoof factor

// -j Jitter factor specifying when to transmit a packet as a percentage of the transmission interval of the target audio stream

As a result, during the VoIP call the target user hears the injected wav file instead of the real audio: for the duration of the sample, the victim’s endpoint accepts the injected packets and drops the legitimate ones, so the actual speaker is effectively muted.

Step 9 – Insert the audio mix file

A similar tool, Rtpmixsound, mixes pre-recorded audio into the stream in real time. We can use it by running the following command:

$ rtpmixsound -i eth0 -a -A 14312 -b -B 7078 sample_mix.wav

// -i Network interface

// -a Source IP address

// -A Source UDP port

// -b Destination IP address

// -B Destination UDP port

The difference with this tool is that the user on the receiving end still hears the person on the transmitting end talking while the fake pre-recorded audio plays.
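The difference between replacing and mixing can be shown on plain sample values. The sketch below adds two sequences of 16-bit PCM samples and clips the result to the valid range, so the original audio stays audible underneath the added audio. Whether Rtpmixsound combines samples in exactly this way is an assumption; `mix_pcm16` is a hypothetical helper.

```python
from typing import List, Sequence

INT16_MIN, INT16_MAX = -32768, 32767


def mix_pcm16(original: Sequence[int], added: Sequence[int]) -> List[int]:
    """Add two PCM streams sample by sample and clip to the 16-bit range.

    The result has the length of `original`; once `added` runs out,
    the original samples pass through unchanged.
    """
    mixed = []
    for index, sample in enumerate(original):
        extra = added[index] if index < len(added) else 0
        mixed.append(max(INT16_MIN, min(INT16_MAX, sample + extra)))
    return mixed


print(mix_pcm16([1000, -2000, 30000], [500, 500]))
```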


This lab focused on the voice session modification attack and introduced Eavesdropping and RTP packet attacks.

To protect the VoIP infrastructure from this type of attack, the channel should be encrypted using a protocol such as SVoIP (Secure VoIP), which aims to secure VoIP clients, or VoSIP (Voice over Secure IP), which aims to secure VoIP. This helps protect the confidentiality, integrity, and availability of the communication session, although it might degrade performance. It is also advisable to use a VoIP-aware firewall or IDS/IPS to monitor RTP traffic and detect or block audio injection attempts.
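One simple signal such a monitor could look for is the same sequence number appearing more than once within a single SSRC, which is what happens when forged packets compete with legitimate ones. The sketch below is a minimal heuristic under that assumption, not the behavior of any specific IDS; the helper name and sample values are hypothetical, and ordinary network duplication can trigger it too.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def find_duplicate_rtp(packets: Iterable[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return the (ssrc, sequence) pairs that were observed more than once."""
    counts = Counter(packets)
    return sorted(pair for pair, seen in counts.items() if seen > 1)


# Made-up observations: sequence numbers 1001 and 1002 arrive twice
# on the same SSRC.
observed = [
    (0x1234ABCD, 1000),
    (0x1234ABCD, 1001),
    (0x1234ABCD, 1001),
    (0x1234ABCD, 1002),
    (0x1234ABCD, 1002),
    (0x1234ABCD, 1003),
]
for ssrc, sequence in find_duplicate_rtp(observed):
    print(f"duplicate sequence {sequence} on SSRC {ssrc:#010x}")
```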

