Three related flaws were found in the Linux kernel’s handling of TCP networking. The most severe vulnerability could allow a remote attacker to trigger a kernel panic in systems running the affected software and, as a result, impact the system’s availability.
The first two are related to Selective Acknowledgement (SACK) packets combined with the Maximum Segment Size (MSS); the third relates solely to the MSS.
These issues are corrected either by applying mitigations or by installing kernel patches. Mitigation details and links to RHSA advisories can be found on the RESOLVE tab of this article.
Issue Details and Background
Three related flaws were found in the Linux kernel’s handling of TCP Selective Acknowledgement (SACK) packets with a low MSS. The extent of impact is understood to be limited to denial of service at this time. No privilege escalation or information leak is currently suspected.
While the mitigations shown in this article are available, they might affect system performance as well as traffic from legitimate sources that require lower MSS values to transmit correctly. Please evaluate which mitigation is appropriate for the system’s environment before applying it.
What is a selective acknowledgement?
TCP Selective Acknowledgement (SACK) is a mechanism by which the data receiver can inform the sender about all the segments that have been successfully accepted. This allows the sender to retransmit only the segments of the stream that are missing from its ‘known good’ set. When TCP SACK is disabled, a much larger set of retransmits is required to retransmit a complete stream.
What is MSS?
The maximum segment size (MSS) is a parameter set in the TCP header of a packet that specifies the total amount of data contained in a reconstructed TCP segment.
As packets might become fragmented when transmitted across different routes, a host must specify the MSS as equal to the largest IP datagram payload size that it can handle. A very large MSS might mean that a stream of packets ends up fragmented on its way to the destination, whereas smaller packets reduce fragmentation but carry proportionally more header overhead.
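As a rough illustration of how an MSS relates to the size of packet a link can carry, the following sketch subtracts the minimum IPv4 and TCP header sizes from a link MTU. The function name mss_for_mtu and the constant names are hypothetical, and the calculation assumes IPv4 with no IP or TCP options.

```c
#include <assert.h>
#include <stdint.h>

/* Minimum header sizes without options (illustrative constants). */
enum { IPV4_HEADER_BYTES = 20, TCP_HEADER_BYTES = 20 };

/* Largest TCP payload that fits in one unfragmented IPv4 packet on a
 * link with the given MTU, assuming no IP or TCP options are present. */
uint32_t mss_for_mtu(uint32_t mtu)
{
    assert(mtu > IPV4_HEADER_BYTES + TCP_HEADER_BYTES);
    return mtu - IPV4_HEADER_BYTES - TCP_HEADER_BYTES;
}
```

For a typical Ethernet MTU of 1500 bytes, mss_for_mtu(1500) gives 1460.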
Operating systems and transport types can default to specified MSS sizes. Attackers with privileged access can create raw packets with crafted MSS options to mount this attack.
TCP is a connection-oriented protocol. When two parties wish to communicate over TCP, they establish a connection by exchanging certain information: a request to initiate (SYN) a connection, the initial sequence number, the acknowledgement number, the maximum segment size (MSS) to use over this connection, permission to send and process Selective Acknowledgements (SACKs), and so on. This connection establishment process is known as the 3-way handshake.
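The MSS and the SACK permission mentioned above travel as TCP options in the SYN segments. The following is a minimal sketch of how a receiver might walk that options area, using the standard option kinds (2 for MSS, 4 for SACK-permitted). The struct syn_options type and the parse_syn_options function are hypothetical names for illustration and are not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Option kinds defined by the TCP specifications. */
enum { OPT_EOL = 0, OPT_NOP = 1, OPT_MSS = 2, OPT_SACK_PERMITTED = 4 };

/* Hypothetical result type for this sketch. */
struct syn_options {
    int      has_mss;
    uint16_t mss;
    int      sack_permitted;
};

/* Walk the options area of a SYN segment. Returns 0 on success and
 * -1 if an option's length field is malformed. */
int parse_syn_options(const uint8_t *opt, size_t len, struct syn_options *out)
{
    size_t i = 0;

    assert(opt != NULL && out != NULL);
    out->has_mss = 0;
    out->mss = 0;
    out->sack_permitted = 0;

    while (i < len) {
        uint8_t kind = opt[i];

        if (kind == OPT_EOL)            /* end of option list */
            break;
        if (kind == OPT_NOP) {          /* single-byte padding option */
            i++;
            continue;
        }
        /* Every other option has a length byte covering kind, length and data. */
        if (i + 1 >= len)
            return -1;
        uint8_t olen = opt[i + 1];
        if (olen < 2 || i + olen > len)
            return -1;

        if (kind == OPT_MSS && olen == 4) {
            out->has_mss = 1;
            out->mss = (uint16_t)((opt[i + 2] << 8) | opt[i + 3]);
        } else if (kind == OPT_SACK_PERMITTED && olen == 2) {
            out->sack_permitted = 1;
        }
        i += olen;                      /* skip to the next option */
    }
    return 0;
}
```

Because the MSS is simply a 16-bit value supplied by the peer, a sender of crafted packets can advertise an unusually small one.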
TCP sends and receives user data in units called segments. A TCP segment consists of a TCP header, options and user data.
Each TCP segment has a Sequence Number (SEQ) and an Acknowledgement Number (ACK).
These SEQ and ACK numbers are used to track which segments have been successfully received by the receiver. The ACK number indicates the next segment expected by the receiver.
Example: user ‘A’ sends 1 kilobyte of data in 13 segments of 100 bytes each; 13 segments are needed because each segment carries a 20-byte TCP header, leaving 80 bytes for data. On the receiving end, user ‘B’ receives segments 1, 2, 4, 6 and 8-13; segments 3, 5 and 7 are lost.
By using ACK numbers, user ‘B’ will indicate that it is expecting segment number 3, which user ‘A’ reads as meaning that none of the segments after 2 were received by user ‘B’. User ‘A’ will therefore retransmit all the segments from 3 onwards, even though segments 4, 6 and 8-13 were successfully received by user ‘B’. User ‘B’ has no way to indicate that to user ‘A’. This leads to inefficient usage of the network.
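The cost of cumulative acknowledgement in this example can be sketched in a few lines. The model below works in whole segment numbers rather than byte sequence numbers, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* received[i] is true when segment i + 1 arrived. Returns the number of
 * the first missing segment, which is what a cumulative ACK asks for;
 * returns total + 1 when nothing is missing. */
size_t next_expected_segment(const bool *received, size_t total)
{
    assert(received != NULL);
    for (size_t i = 0; i < total; i++)
        if (!received[i])
            return i + 1;
    return total + 1;
}

/* Without SACK, the sender in the example resends everything from the
 * first gap onwards, including segments that already arrived. */
size_t segments_resent_without_sack(const bool *received, size_t total)
{
    size_t next = next_expected_segment(received, total);
    return total + 1 - next;
}
```

With segments 3, 5 and 7 lost out of 13, the next expected segment is 3 and 11 segments (3 through 13) are resent, although only 3 were actually missing.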
Selective Acknowledgement: SACK
To overcome the above problem, the Selective Acknowledgement (SACK) mechanism was devised and defined by RFC 2018. With Selective Acknowledgement (SACK), user ‘B’ above uses its TCP options field to inform user ‘A’ about all the segments (1, 2, 4, 6, 8-13) it has received successfully, so user ‘A’ needs to retransmit only segments 3, 5 and 7, thus saving considerable network bandwidth and avoiding further congestion.
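To make the idea concrete, here is a minimal sketch of how a receiver could summarize what it holds as contiguous blocks beyond the in-order prefix. It is a simplification: real SACK blocks carry byte sequence numbers rather than segment numbers, and only a few blocks fit in one TCP header. The struct sack_block type and build_sack_blocks function are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One contiguous run of received segments (inclusive segment numbers). */
struct sack_block {
    size_t first;
    size_t last;
};

/* Fill `blocks` with the runs of received segments that lie beyond the
 * in-order prefix (the prefix is already covered by the cumulative ACK).
 * Stores at most `cap` blocks and returns the number of runs found. */
size_t build_sack_blocks(const bool *received, size_t total,
                         struct sack_block *blocks, size_t cap)
{
    size_t n = 0;
    size_t i = 0;

    assert(received != NULL && blocks != NULL);

    /* Skip the in-order prefix. */
    while (i < total && received[i])
        i++;

    while (i < total) {
        if (!received[i]) {             /* a gap: nothing to report */
            i++;
            continue;
        }
        size_t start = i;               /* start of a received run */
        while (i < total && received[i])
            i++;
        if (n < cap) {
            blocks[n].first = start + 1;
            blocks[n].last = i;
        }
        n++;
    }
    return n;
}
```

For the example above this yields three blocks, 4-4, 6-6 and 8-13, so the sender can see that only segments 3, 5 and 7 need to be retransmitted.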
CVE-2019-11477 SACK Panic:
The Socket Buffer (SKB) is the most central data structure used in the Linux TCP/IP implementation. SKBs are kept in linked lists that hold network packets. Such a list can act as a transmission queue, receive queue, SACK’d queue, retransmission queue, etc. An SKB can hold packet data in fragments. A Linux SKB can hold up to 17 fragments.
#define MAX_SKB_FRAGS (65536/PAGE_SIZE + 1) => 17
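The value 17 follows directly from the formula when the page size is 4 KB, which is assumed here as the common x86 case. The helper below is a hypothetical illustration of the same arithmetic, not kernel code, and it does not model any other rule the kernel may apply for larger page sizes.

```c
#include <assert.h>

/* Same formula as the macro quoted above, with the page size passed in
 * as a parameter instead of the kernel's PAGE_SIZE constant. */
unsigned int max_skb_frags(unsigned int page_size)
{
    assert(page_size > 0);
    return 65536 / page_size + 1;
}
```

With a 4096-byte page, 65536 / 4096 is 16, so max_skb_frags(4096) evaluates to 17.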
Each fragment can hold up to 32KB of data on x86 (64KB on PowerPC). When a packet is due to be sent, it is placed on the send queue and its details are kept in a control buffer structure like:
__u32 seq; /* Starting sequence number */
__u32 end_seq; /* SEQ + FIN + SYN + datalen */
u16 tcp_gso_segs;
u16 tcp_gso_size;
__u8 tcp_flags; /* TCP header flags. (tcp) */
Of these, the ‘tcp_gso_segs’ and ‘tcp_gso_size’ fields are used to tell the device driver about segmentation offload.
When segmentation offload is on and the SACK mechanism is also enabled, packet loss and selective retransmission of some packets can leave an SKB holding multiple packets, counted by ‘tcp_gso_segs’. Multiple such SKBs in the list are merged into one to process different SACK blocks efficiently. This involves moving data from one SKB to another in the list. During this movement of data, the SKB structure can reach its maximum limit of 17 fragments and the ‘tcp_gso_segs’ parameter can overflow and hit the BUG_ON() call below, resulting in the kernel panic.
static bool tcp_shifted_skb (struct sock *sk, …, unsigned int pcount, …)
BUG_ON(tcp_skb_pcount(skb) < pcount); <= SACK panic
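The effect of an overflowing segment counter on a check of this shape can be shown with a simplified model. It assumes the counter is a 16-bit field; the toy_skb type and both functions are hypothetical stand-ins that only mimic the arithmetic and do not reproduce the kernel's merging logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an SKB's segment counter (assumed 16-bit). */
struct toy_skb {
    uint16_t gso_segs;
};

/* Add segments to the counter the way a 16-bit field would: the sum is
 * truncated to 16 bits, so it wraps around past 65535. */
void toy_pcount_add(struct toy_skb *skb, uint32_t segs)
{
    assert(skb != NULL);
    skb->gso_segs = (uint16_t)(skb->gso_segs + segs);
}

/* Same shape as the BUG_ON() condition above: returns true when the
 * stored count is smaller than the number of segments being shifted. */
bool toy_check_would_fire(const struct toy_skb *skb, uint32_t pcount)
{
    assert(skb != NULL);
    return skb->gso_segs < pcount;
}
```

Starting from 60000 and adding 9632 segments should give 69632, but the 16-bit field stores 4096 instead, so a later comparison against 9632 fails even though the segments really are there.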
A remote user can trigger this issue by setting the Maximum Segment Size (MSS) of a TCP connection to its lowest limit of 48 bytes and sending a sequence of specially crafted SACK packets. The lowest MSS leaves merely 8 bytes of data per segment, thus increasing the number of TCP segments required to send all the data.
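The numbers in this section can be combined into a short back-of-the-envelope calculation. The sketch below assumes that the 8 bytes of data are what remains of a 48-byte MSS after the maximum 40 bytes of TCP options, and that the segment counter is a 16-bit field; all constant and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum {
    MIN_MSS_BYTES        = 48,        /* lowest MSS named in the article */
    MAX_TCP_OPTION_BYTES = 40,        /* assumed: maximum TCP option space */
    MAX_FRAGS            = 17,        /* fragments per SKB */
    FRAG_BYTES_X86       = 32 * 1024  /* bytes per fragment on x86 */
};

/* Data bytes left in each segment once options are accounted for. */
uint32_t payload_per_segment(uint32_t mss, uint32_t option_bytes)
{
    assert(mss > option_bytes);
    return mss - option_bytes;
}

/* Number of segments needed to describe a completely full SKB. */
uint32_t segments_in_full_skb(uint32_t frags, uint32_t frag_bytes,
                              uint32_t payload)
{
    assert(payload > 0);
    return frags * frag_bytes / payload;
}
```

With these inputs each segment carries 8 bytes, and a full SKB of 17 fragments of 32KB corresponds to 69632 segments, which is more than the 65535 a 16-bit counter can represent.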
Jonathan Looney (Netflix Information Security)