Traffic Control on Linux

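For details on using the tc command, see the material after the script. It is extracted from the tc man page and concentrates on the overall usage of tc and on the concepts the script relies on: qdisc, class and filter. For a deeper understanding, refer to the tc (iproute2) man pages.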

Script usage

The script prints its help text when run with --help or -h:

tao@S20:~/tao$ sudo sh traffic_control.sh -h
---------traffic_control.sh usage:-------------
this script support follow paramters:
--addr/-a: the ip address, separate by ','
--clean/-c: boolean, remove the traffic control setting
--devices/-d: the network interfaces, separate by ','
--delay/-e: int, the packet delay, default 900, (900ms)
--loss/-l: int, the packet loss, default 50, (50%)
--rate/-r: int, the traffic rate, default 100, (100kbps)
-------------------------------------------
tao@S20:~/tao$

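By default the script limits packet rate, delay and loss at the interface level (the --devices parameter, which accepts several interfaces). When IP addresses are given (the --addr parameter), only traffic for those addresses is limited, and the interface as a whole is not.
The --clean/-c parameter removes the limits from the specified interfaces.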

Example

Before any traffic control is applied, ping and curl produce the following output:

tao@S20:~/tao$ ping baidu.com
PING baidu.com (39.156.66.10) 56(84) bytes of data.
64 bytes from 39.156.66.10: icmp_seq=1 ttl=52 time=26.3 ms
64 bytes from 39.156.66.10: icmp_seq=2 ttl=52 time=26.1 ms
64 bytes from 39.156.66.10: icmp_seq=3 ttl=52 time=26.0 ms
^C64 bytes from 39.156.66.10: icmp_seq=4 ttl=52 time=26.3 ms

--- baidu.com ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 10665ms
rtt min/avg/max/mdev = 26.045/26.178/26.347/0.125 ms
tao@S20:~/tao$
tao@S20:~/tao$ curl -u 'tao_test:******' -X GET http://1.*.*.*/TAO_other/temp/29105342_b.zip -o 29105342.zip
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 2803M 100 2803M 0 0 109M 0 0:00:25 0:00:25 --:--:-- 111M
tao@S20:~/tao$

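As shown above, the ping latency is about 26 ms with no packet loss, and curl downloads at 111 MB/s.
Now run the script to limit traffic on the enp99s0 interface, with a rate of 300 KB/s, a delay of 200 ms and a packet loss of 20%: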

tao@S20:~/tao$ sudo sh traffic_control.sh --devices enp99s0 --delay 200 --rate 300 --loss 20
[INFO] >>> devices: 'enp99s0'
[INFO] >>> delay: '200ms'
[INFO] >>> loss: '20%'
[INFO] >>> rate: '300kbps'
[INFO] >>> addr: 'all'
[INFO] >>> Executing: traffic_control
[INFO] >>> start to configure device 'enp99s0'
[INFO] >>> device 'enp99s0' configure complete
[INFO] >>> all devices are configured and ready for testing
tao@S20:~/tao$
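
For orientation, the egress part of that run can be reproduced as a dry run. The sketch below shadows tc with a shell function that only prints its arguments, so nothing is changed and root is not required. print_egress_commands is a hypothetical helper that mirrors the three tc calls in the script's configure_device function, filled in with the values from the run above.

```shell
#!/usr/bin/env sh
# Dry run: this function replaces the real tc and only echoes the command line.
tc() {
    echo "tc $*"
}

# Hypothetical helper: the same three tc calls as configure_device in the
# script, with device, rate, loss and delay passed as arguments.
print_egress_commands() {
    device="${1}"; rate="${2}"; loss="${3}"; delay="${4}"

    # Root htb qdisc; unclassified traffic goes to class ab:10.
    tc qdisc add dev "${device}" root handle ab: htb default 10
    # The class that carries the rate limit.
    tc class add dev "${device}" parent ab: \
        classid ab:10 htb rate "${rate}kbps" burst "${rate}k"
    # netem leaf under the class adds loss and delay.
    tc qdisc add dev "${device}" parent ab:10 \
        netem loss "${loss}%" delay "${delay}ms"
}

print_egress_commands enp99s0 300 20 200
```

Each printed line is a command the script would execute for that interface; the ingress policing set up by config_ingress_traffic is left out of this sketch.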

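Testing again with ping and curl shows that rate, loss and delay are all close to the configured values: the round-trip time rises to about 230 ms, 3 of 11 pings are lost, and the download speed drops to roughly 300 KB/s: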

tao@S20:~/tao$ ping baidu.com
PING baidu.com (110.242.68.66) 56(84) bytes of data.
64 bytes from 110.242.68.66 (110.242.68.66): icmp_seq=1 ttl=49 time=230 ms
64 bytes from 110.242.68.66 (110.242.68.66): icmp_seq=3 ttl=49 time=230 ms
64 bytes from 110.242.68.66 (110.242.68.66): icmp_seq=5 ttl=49 time=230 ms
64 bytes from 110.242.68.66 (110.242.68.66): icmp_seq=6 ttl=49 time=230 ms
64 bytes from 110.242.68.66 (110.242.68.66): icmp_seq=8 ttl=49 time=230 ms
64 bytes from 110.242.68.66 (110.242.68.66): icmp_seq=9 ttl=49 time=230 ms
64 bytes from 110.242.68.66: icmp_seq=10 ttl=49 time=230 ms
^C64 bytes from 110.242.68.66: icmp_seq=11 ttl=49 time=230 ms

--- baidu.com ping statistics ---
11 packets transmitted, 8 received, 27.2727% packet loss, time 23961ms
rtt min/avg/max/mdev = 229.778/229.910/230.175/0.138 ms
tao@S20:~/tao$
tao@S20:~/tao$ curl -u 'tao_test:******' -X GET http://1.*.*.*/TAO_other/temp/29105342_b.zip -o 29105342.zip
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 2803M 0 7887k 0 0 285k 0 2:47:29 0:00:27 2:47:02 303k^C
tao@S20:~/tao$
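
The download speed is measured in bytes because tc reads the kbps suffix as kilobytes per second; kilobits per second is written kbit. A minimal sketch of the conversion, where kbps_to_kbit is a hypothetical helper name and not part of the script:

```shell
# tc rate suffixes: "kbps" means kilobytes per second,
# "kbit" means kilobits per second.
# Hypothetical helper: convert a tc kbps value to the equivalent kbit value.
kbps_to_kbit() {
    echo "$(( ${1} * 8 ))"
}

kbps_to_kbit 300   # prints 2400
```

So the `rate 300kbps` used in this run describes the same limit as `rate 2400kbit`.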

Finally, remove the limits set above:

tao@S20:~/tao$ sudo sh traffic_control.sh --devices enp99s0 --clean
[INFO] >>> devices: 'enp99s0'
[INFO] >>> cleanup traffic control setting...
[INFO] >>> device 'enp99s0' traffic control config removed
[INFO] >>> cleanup done
tao@S20:~/tao$

Script

The traffic_control.sh script is as follows:

#!/usr/bin/env sh
set -e


# Print an error message and abort.
die() {
    echo "[ERROR] >>> $*" >&2
    exit 1
}

einfo() {
    echo "[INFO] >>> $*" >&2
}

ewarn() {
    echo "[WARNING] >>> $*" >&2
}

run() {
    # It's nice to be able to print some commands without
    # enabling XTRACE for all the things.
    einfo "Executing: $*"
    "$@"
}

# Configure every interface listed in ${devices} (comma-separated).
traffic_control() {
    for eth in $(echo "${devices}" | tr ',' ' '); do
        if ! ip a s | grep -q "${eth}"; then
            ewarn "device '${eth}' not found, skip!"
            continue
        fi
        einfo "start to configure device '${eth}'"
        configure_device "${eth}"
        einfo "device '${eth}' configure complete"
    done
    einfo "all devices are configured and ready for testing"
    # while true; do
    #     sleep 13s
    # done
}

edebug() {
    # Use the POSIX test: the script is run with sh, where [[ ]] may not exist.
    if [ -n "${DEBUG}" ]; then
        echo "[DEBUG] >>> $*" >&2
    fi
}

help() {
    echo "---------$(basename "$0") usage:-------------"
    echo "this script support follow paramters:"
    echo " --addr/-a: the ip address, separate by ','"
    echo " --clean/-c: boolean, remove the traffic control setting"
    echo " --devices/-d: the network interfaces, separate by ','"
    echo " --delay/-e: int, the packet delay, default 900, (900ms)"
    echo " --loss/-l: int, the packet loss, default 50, (50%)"
    echo " --rate/-r: int, the traffic rate, default 100, (100kbps)"
    echo "-------------------------------------------"
}

# Check required arguments, handle --clean, and fill in defaults.
verify_arguments() {
    if [ -z "${devices}" ]; then
        help
        die "the 'devices' is required!"
    fi
    einfo "devices: '${devices}'"
    if "${clean}"; then
        cleanup
        exit 0
    fi
    if [ -z "${delay}" ]; then
        delay=900
    fi
    einfo "delay: '${delay}ms'"
    if [ -z "${loss}" ]; then
        loss=50
    fi
    einfo "loss: '${loss}%'"
    if [ -z "${rate}" ]; then
        rate=100
    fi
    einfo "rate: '${rate}kbps'"
    if [ -z "${addr}" ]; then
        addr="all"
    fi
    einfo "addr: '${addr}'"
}

# Egress: an htb root qdisc with one class (ab:10) that carries the rate
# limit, and a netem leaf qdisc that adds loss and delay.
configure_device() {
    local device="${1}"

    tc qdisc add dev "${device}" root handle ab: htb default 10
    tc class add dev "${device}" parent ab: \
        classid ab:10 htb rate "${rate}kbps" \
        burst "${rate}k"
    tc qdisc add dev "${device}" parent ab:10 \
        netem loss "${loss}%" delay "${delay}ms"
    if [ -n "${addr}" ]; then
        configure_ip "${device}"
    fi
    config_ingress_traffic "${device}"
}

# Ingress: police traffic coming from each address; packets over the
# rate are dropped.
config_ingress_traffic() {
    local device="${1}"

    tc qdisc add dev "${device}" handle ffff: ingress

    for ip_addr in $(echo "${addr}" | tr ',' ' '); do
        if ! check_ip_addr "${ip_addr}"; then
            continue
        fi
        tc filter add dev "${device}" parent ffff: \
            protocol ip prio 3 \
            u32 match ip src "${ip_addr}" \
            police rate "${rate}kbps" burst "${rate}k" drop flowid :10
    done
}

# Parse the command line into global variables, then validate them.
get_arguments() {
    clean="false"
    while [ $# -ge 1 ]; do
        case "${1}" in
            --devices|-d)
                devices="${2}"
                shift 2
                ;;
            --delay|-e)
                delay="${2}"
                shift 2
                ;;
            --rate|-r)
                rate="${2}"
                shift 2
                ;;
            --addr|-a)
                addr="${2}"
                shift 2
                ;;
            --loss|-l)
                loss="${2}"
                shift 2
                ;;
            --help|-h)
                help
                exit 0
                ;;
            --clean|-c)
                clean="true"
                shift
                ;;
            *)
                ewarn "unknown option: ${1}! use '--help' for help info"
                shift 1
                ;;
        esac
    done
    verify_arguments
}

# Egress filters: send packets to or from each address to class ab:10.
configure_ip() {
    local device="${1}"

    for ip_addr in $(echo "${addr}" | tr ',' ' '); do
        if ! check_ip_addr "${ip_addr}"; then
            continue
        fi
        tc filter add dev "${device}" \
            protocol ip \
            parent ab:0 prio 1 \
            u32 match ip dst "${ip_addr}" flowid ab:10

        tc filter add dev "${device}" \
            protocol ip \
            parent ab:0 prio 1 \
            u32 match ip src "${ip_addr}" flowid ab:10
    done
}

check_permission() {
    local user_id
    user_id="$(id -u)"
    if [ "${user_id}" -ne "0" ]; then
        die "please use root permission to run this script!"
    fi
}

# Remove the htb root qdisc and the ingress qdisc from each interface.
cleanup() {
    einfo "cleanup traffic control setting..."
    for eth in $(echo "${devices}" | tr ',' ' '); do
        if ! ip a s | grep -q "${eth}"; then
            continue
        fi
        if ! tc qdisc show dev "${eth}" | grep -q "qdisc htb ab:"; then
            continue
        fi
        tc qdisc del dev "${eth}" root handle ab:
        tc qdisc del dev "${eth}" handle ffff: ingress
        einfo "device '${eth}' traffic control config removed"
    done
    einfo "cleanup done"
}

# Accept the keyword "all" or a dotted-quad IPv4 address.
check_ip_addr() {
    local ip_addr="${1}"

    if [ "${ip_addr}" != "all" ]; then
        if ! echo "${ip_addr}" | grep -Eq \
            "^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$"; then
            ewarn "addr '${ip_addr}' format error, skip!"
            return 1
        fi
    fi
    return 0
}


if [ "${1}" != "--source-only" ]; then
    check_permission
    get_arguments "${@}"
    # trap cleanup EXIT
    run traffic_control
fi
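
The check_ip_addr function accepts any four groups of one to three digits, so a value such as 999.1.1.1 passes. If stricter validation is wanted, each octet can also be range-checked. The following is only a sketch under that assumption: is_ipv4 is a hypothetical name, it is not called by the script above, and it does not handle the keyword "all".

```shell
# Hypothetical stricter validator: four dot-separated decimal octets,
# each in the range 0-255.
is_ipv4() {
    # Reject anything that is not four groups of 1-3 digits.
    echo "${1}" | grep -Eq '^[0-9]{1,3}(\.[0-9]{1,3}){3}$' || return 1
    # Split on dots inside a subshell so the IFS change does not leak out.
    (
        IFS=.
        set -- ${1}
        for octet in "$@"; do
            [ "${octet}" -le 255 ] || exit 1
        done
    )
}

is_ipv4 192.168.1.10 && echo "valid"
is_ipv4 999.1.1.1 || echo "invalid"
```

The function returns the exit status of the subshell, so it can be used directly in an if statement in the same way as check_ip_addr.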

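Introduction to the tc command

The tc command is used mainly to view and manipulate the traffic control settings in the kernel.

TC DESCRIPTION

Traffic control consists of the following concepts: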

  • SHAPING
    When traffic is shaped, its rate of transmission is under control. Shaping may be more than lowering the available bandwidth - it is also used to smooth out bursts in traffic for better network behaviour. Shaping occurs on egress.
  • SCHEDULING
    By scheduling the transmission of packets it is possible to improve interactivity for traffic that needs it while still guaranteeing bandwidth to bulk transfers. Reordering is also called prioritizing, and happens only on egress.
  • POLICING
    Whereas shaping deals with transmission of traffic, policing pertains to traffic arriving. Policing thus occurs on ingress.
  • DROPPING
    Traffic exceeding a set bandwidth may also be dropped forthwith, both on ingress and on egress.

Processing of traffic is controlled by three kinds of objects: qdiscs, classes and filters.

  • QDISCS
    qdisc is short for ‘queueing discipline’ and it is elementary to understanding traffic control. Whenever the kernel needs to send a packet to an interface, it is enqueued to the qdisc configured for that interface. Immediately afterwards, the kernel tries to get as many packets as possible from the qdisc, for giving them to the network adaptor driver.
  • CLASSES
    Some qdiscs can contain classes, which contain further qdiscs - traffic may then be enqueued in any of the inner qdiscs, which are within the classes. When the kernel tries to dequeue a packet from such a classful qdisc it can come from any of the classes. A qdisc may for example prioritize certain kinds of traffic by trying to dequeue from certain classes before others.
  • FILTERS
    A filter is used by a classful qdisc to determine in which class a packet will be enqueued. Whenever traffic arrives at a class with subclasses, it needs to be classified. Various methods may be employed to do so; filters are one of them. All filters attached to the class are called, until one of them returns with a verdict. It is important to notice that filters reside within qdiscs - they are not masters of what happens.

    The available filters include basic, bpf, cgroup, route and u32, among others. This article uses u32:
    u32: Generic filtering on arbitrary packet data. The Universal/Ugly 32bit filter allows one to match arbitrary bitfields in the packet. Due to breaking everything down to values, masks and offsets, it is equally powerful and hard to use. Luckily many abstracting directives are present which allow defining rules on a higher level and therefore free the user from having to fiddle with bits and masks in many cases.
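
To make "values, masks and offsets" concrete: the `match ip dst ADDR` form used in the script is one of those abstracting directives, and it should correspond to a 32-bit comparison at byte offset 16 of the IPv4 header (offset 12 for `match ip src`). The sketch below only computes the hexadecimal value such a comparison would use. ip_to_u32_hex is a hypothetical helper name and the address is an example.

```shell
# Hypothetical helper: print an IPv4 address as the 8-digit hex value
# that a raw u32 match would compare against.
ip_to_u32_hex() {
    # Split the dotted quad into four octets inside a subshell,
    # so the IFS change stays local.
    (
        IFS=.
        set -- ${1}
        printf '%02x%02x%02x%02x\n' "${1}" "${2}" "${3}" "${4}"
    )
}

ip_to_u32_hex 192.168.1.10   # prints c0a8010a
```

Under that assumption, `u32 match ip dst 192.168.1.10` and the raw form `u32 match u32 0xc0a8010a 0xffffffff at 16` should select the same packets.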

QEVENTS

Qdiscs may invoke user-configured actions when certain interesting events take place in the qdisc. Each qevent can either be unused, or can have a block attached to it. To this block are then attached filters using the “tc block BLOCK_IDX” syntax. The block is executed when the qevent associated with the attachment point takes place.

tc qdisc add dev eth0 root handle 1: red limit 500K avpkt 1K \
qevent early_drop block 10
tc filter add block 10 matchall action mirred egress mirror dev eth1

CLASSLESS QDISCS

In the absence of classful qdiscs, classless qdiscs can only be attached at the root of a device. Full syntax:

tc qdisc add dev DEV root QDISC_NAME QDISC-PARAMETERS
tc qdisc del dev DEV root

The pfifo_fast qdisc is the automatic default in the absence of a configured qdisc. The classless qdiscs include ingress, tbf, netem, [p|b]fifo and red, among others.

  • [p|b]fifo
    Simplest usable qdisc, pure First In, First Out behaviour. Limited in packets or in bytes.
  • ingress
    This is a special qdisc as it applies to incoming traffic on an interface, allowing for it to be filtered and policed.
  • red
    Random Early Detection simulates physical congestion by randomly dropping packets when nearing configured bandwidth allocation. Well suited to very large bandwidth applications. It is mainly used on egress links.

CLASSFUL QDISCS

The classful qdiscs include HTB, PRIO and HFSC, among others.
HTB: The Hierarchy Token Bucket implements a rich linksharing hierarchy of classes with an emphasis on conforming to existing practices. HTB facilitates guaranteeing bandwidth to classes, while also allowing specification of upper limits to inter-class sharing. It contains shaping elements, based on TBF and can prioritize classes.

THEORY OF OPERATION

Classes form a tree, where each class has a single parent. A class may have multiple children. Some qdiscs allow for runtime addition of classes (HTB) while others (PRIO) are created with a static number of children.
Qdiscs which allow dynamic addition of classes can have zero or more subclasses to which traffic may be enqueued.
Furthermore, each class contains a leaf qdisc which by default has pfifo behaviour, although another qdisc can be attached in place. This qdisc may again contain classes, but each class can have only one leaf qdisc.
When a packet enters a classful qdisc it can be classified to one of the classes within.
Each node within the tree can have its own filters but higher level filters may also point directly to lower classes.
If classification did not succeed, packets are enqueued to the leaf qdisc attached to that class. Check qdisc specific manpages for details, however.

NAMING

All qdiscs, classes and filters have IDs, which can either be specified or be automatically assigned.
IDs consist of major number and a minor number, separated by a colon - major:minor. Both major and minor are hexadecimal numbers and are limited to 16 bits. There are two special values: root is signified by major and minor of all ones, and unspecified is all zeros.

  • QDISCS
    A qdisc, which potentially can have children, gets assigned a major number, called a ‘handle’, leaving the minor number namespace available for classes. The handle is expressed as ‘10:’. It is customary to explicitly assign a handle to qdiscs expected to have children.
  • CLASSES
    Classes residing under a qdisc share their qdisc major number, but each have a separate minor number called a ‘classid’ that has no relation to their parent classes, only to their parent qdisc. The same naming custom as for qdiscs applies.
  • FILTERS
    Filters have a three part ID, which is only needed when using a hashed filter hierarchy.
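
Because both parts are hexadecimal, the class ab:10 used in the script has major 0xab (171) and minor 0x10 (16), not minor ten. A small sketch that decodes an ID into decimal, where parse_tc_id is a hypothetical helper name:

```shell
# Hypothetical helper: split a tc ID of the form major:minor and print
# both parts in decimal. Both parts are read as hexadecimal.
parse_tc_id() {
    id="${1}"
    major="${id%%:*}"   # text before the first colon
    minor="${id#*:}"    # text after the first colon, may be empty
    # An empty part, as in the qdisc handle "ab:", is treated as 0.
    echo "$((0x${major:-0})) $((0x${minor:-0}))"
}

parse_tc_id ab:10   # prints "171 16"
parse_tc_id ffff:   # prints "65535 0"
```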

TC COMMANDS

The following commands are available for qdiscs, classes and filters:

  • add
    Add a qdisc, class or filter to a node. For all entities, a parent must be passed, either by passing its ID or by attaching directly to the root of a device. When creating a qdisc or a filter, it can be named with the handle parameter. A class is named with the classid parameter.
  • delete
    A qdisc can be deleted by specifying its handle, which may also be ‘root’. All subclasses and their leaf qdiscs are automatically deleted, as well as any filters attached to them.
  • show
    Displays all filters attached to the given interface. A valid parent ID must be passed.
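
For the configuration created by the script, the show commands can be combined into a quick inspection helper. This is a sketch: show_tc_state is a hypothetical name, the handles ab: and ffff: are the ones the script uses, and the function only reads state.

```shell
# Hypothetical helper: list the qdiscs, classes and filters on one interface.
show_tc_state() {
    device="${1}"
    # Do nothing when tc is not installed.
    command -v tc >/dev/null 2>&1 || { echo "tc not found" >&2; return 0; }

    tc -s qdisc show dev "${device}"               # qdiscs with statistics
    tc class show dev "${device}"                  # htb classes such as ab:10
    tc filter show dev "${device}" parent ab:      # egress u32 filters
    tc filter show dev "${device}" parent ffff:    # ingress policing filters
}

# Usage, with the interface name from the example:
# show_tc_state enp99s0
```

Running it before and after the script, and again after --clean, gives a quick check of what was added and removed.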

https://www.tao-wt.fun/linux/traffic_control/
Author: tao-wt@qq.com
Published: February 2, 2024