Currently, tc_calc_xmittime and tc_calc_xmitsize round from double to
int three times — once when they call tc_core_time2tick /
tc_core_tick2time (whose argument is int), once when those functions
return (their return value is int), and then finally when the tc_calc_*
functions return. This leads to extremely granular and inaccurate
conversions.
As a result, for example, on my test system (where tick_in_usec=15.625,
clock_factor=1, and hz=1000000000) for a bitrate of 1Gbps, all tc htb
burst values between 0 and 999 bytes get encoded as 0 ticks; all values
between 1000 and 1999 bytes get encoded as 15 ticks (equivalent to 960
bytes); all values between 2000 and 2999 bytes as 31 ticks (1984 bytes);
etc.
The patch changes the code so these calculations are done internally in
floating-point, and only rounded to integer values when the value is
returned. It also changes tc_calc_xmittime to round its calculated value
up, rather than down, to ensure that the calculated time is actually
sufficient for the requested size.
Signed-off-by: Jonathan Lennox <jonathan.lennox@8x8.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
The netlink nest that carriers tc action statistics looks as follows:
[TCA_ACT_STATS]
[TCA_STATS_BASIC]
[TCA_STATS_BASIC_HW]
Where 'TCA_STATS_BASIC' carries the combined software and hardware
packets (32-bits) and bytes (64-bit) counters and 'TCA_STATS_BASIC_HW'
carries the hardware statistics.
When the number of packets exceeds 0xffffffff, the kernel emits the
'TCA_STATS_PKT64' attribute:
[TCA_ACT_STATS]
[TCA_STATS_BASIC]
[TCA_STATS_PKT64]
[TCA_STATS_BASIC_HW]
[TCA_STATS_PKT64]
This layout is not ideal as the only way for user space to know what
each 'TCA_STATS_PKT64' attribute carries is to check which attribute
precedes it, which is exactly what some applications are doing [1].
Do the same in iproute2 so that users with existing kernels could read
the 64-bit hardware packets counter of tc actions instead of reading the
truncated 32-bit counter.
Before:
$ tc -s filter show dev swp2 ingress
filter protocol all pref 1 flower chain 0
filter protocol all pref 1 flower chain 0 handle 0x1
skip_sw
in_hw in_hw_count 1
action order 1: mirred (Egress Redirect to device swp1) stolen
index 1 ref 1 bind 1 installed 47 sec used 23 sec
Action statistics:
Sent 368689092544 bytes 5760767071 pkt (dropped 0, overlimits 0 requeues 0)
Sent software 0 bytes 0 pkt
Sent hardware 368689092544 bytes 1465799775 pkt
backlog 0b 0p requeues 0
used_hw_stats immediate
Where 5760767071 - 1465799775 = 0x100000000
After:
$ tc -s filter show dev swp2 ingress
filter protocol all pref 1 flower chain 0
filter protocol all pref 1 flower chain 0 handle 0x1
skip_sw
in_hw in_hw_count 1
action order 1: mirred (Egress Redirect to device swp1) stolen
index 1 ref 1 bind 1 installed 71 sec used 47 sec
Action statistics:
Sent 368689092544 bytes 5760767071 pkt (dropped 0, overlimits 0 requeues 0)
Sent software 0 bytes 0 pkt
Sent hardware 368689092544 bytes 5760767071 pkt
backlog 0b 0p requeues 0
used_hw_stats immediate
[1] 006e1c6dbf
Reported-by: Joe Botha <joe@atomic.ac>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
In linux-6.13, we added the ability to offload pacing on
capable devices.
tc qdisc add ... fq ... offload_horizon 100ms
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
The flower tc parser was using XATTR_SIZE_MAX from linux/limits.h,
but this constant is intended to before extended filesystem attributes
not for TC. Replace it with a local define.
This fixes issue on systems with musl and XATTR_SIZE_MAX is not
defined in limits.h there.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
This commit enhances the q_taprio module by adding support for the
Hold/Release mechanism in Time-Sensitive Networking (TSN), as specified
in the IEEE 802.1Q-2018 standard.
Changes include:
- Addition of `TC_TAPRIO_CMD_SET_AND_HOLD` and `TC_TAPRIO_CMD_SET_AND_RELEASE`
cases in the `entry_cmd_to_str` function to return "H" and "R" respectively.
- Addition of corresponding string comparisons in the `str_to_entry_cmd`
function to map "H" and "R" to `TC_TAPRIO_CMD_SET_AND_HOLD` and
`TC_TAPRIO_CMD_SET_AND_RELEASE`.
The Hold/Release feature works as follows:
- Set-And-Hold-MAC (H): This command sets the gates and holds the current
configuration, preventing any further changes until a release command is
issued.
- Set-And-Release-MAC (R): This command releases the hold, allowing
subsequent gate configuration changes to take effect.
These changes ensure that the q_taprio module can correctly interpret and
handle the Hold/Release commands, aligning with the IEEE 802.1Q-2018 standard
for enhanced TSN configuration.
Signed-off-by: Choong Yong Liang <yong.liang.choong@linux.intel.com>
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
`print_num()` was born in `ip/ipaddress.c` but considering it has
nothing to do with IP addresses it should really live in `lib/utils.c`.
(I've had reason to call it from bridge/* on some random hackery.)
Signed-off-by: David Lamparter <equinox@diac24.net>
Signed-off-by: David Ahern <dsahern@kernel.org>
extend TC flower for matching on tunnel metadata.
Changes since v2:
- split uAPI changes and TC code in separate patches, as per David's request [2]
Changes since v1:
- fix incostintent naming in explain() and in tc-flower.8 (Asbjørn)
Changes since RFC:
- update uAPI bits to Asbjørn's most recent code [1]
- add 'tun' prefix to all flag names (Asbjørn)
- allow parsing 'enc_flags' multiple times, without clearing the match
mask every time, like happens for 'ip_flags' (Asbjørn)
- don't use "matches()" for parsing argv[] (Stephen)
- (hopefully) improve usage() printout (Asbjørn)
- update man page
[1] https://lore.kernel.org/netdev/20240709163825.1210046-1-ast@fiberby.net/
[2] https://lore.kernel.org/netdev/cc73004c-9aa8-9cd3-b46e-443c0727c34d@kernel.org/
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Expression 'ttl & ~(255 >> 0)' is always zero, because right operand
has 8 trailing zero bits, which is greater or equal than the size
of the left operand == 8 bits.
Found by RASU JSC.
Signed-off-by: Maks Mishin <maks.mishinFZ@gmail.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
There is a helper in utilities to handle missing argument,
but it was not being used consistently.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The removal of tick usage in netem, means that some of the
helper functions in tc are no longer used and can be safely removed.
Other functions can be made static.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The current version of netem in iproute2 has a maximum of 4.3 seconds
because of scaled 32 bit clock values. Some users would like to be
able to use larger delays to emulate things like storage delays.
Since kernel version 4.15, netem qdisc had netlink parameters
to express wider range of delays in nanoseconds. But the iproute2
side was never updated to use them.
This does break compatibility with older kernels (4.14 and earlier).
With these out of support kernels, the latency/delay parameter
will end up being ignored.
Reported-by: Marc Blanchet <marc.blanchet@viagenie.ca>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The callbacks in exec_util should not be modifying underlying
qdisc operations structure.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The callbacks in action_util should not be modifying underlying
qdisc operations structure.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The callbacks in filter_util should not be modifying underlying
qdisc operations structure.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The callbacks in qdisc_util should not be modifying underlying
qdisc operations structure.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Fix json corruption when using the "-json" option in some cases
Signed-off-by: Takanori Hirano <me@hrntknr.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
In the case of a process such as mapping a json to a structure,
it can be difficult if the keys have the same name but different types.
Since handle is used in hex string, change it to fw.
Signed-off-by: Takanori Hirano <me@hrntknr.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Add assertion to check for case of snprintf failing (bad format?)
or buffer getting full.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Fix json corruption when using the "-json" option in cases where tc-fw is set.
Signed-off-by: Takanori Hirano <me@hrntknr.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
If the user specifies this flag for a filter command the kernel will
return the command's result back to user space.
For example:
tc -echo filter add dev lo parent ffff: protocol ip matchall action ok
added filter dev lo parent ffff: protocol ip pref 49152 matchall chain 0
As illustrated above, the kernel will give us a pref of 491252
The same can be done for other filter commands (replace, delete, and
change). For example:
tc -echo filter del dev lo parent ffff: pref 49152 protocol ip matchall
deleted filter dev lo parent ffff: protocol ip pref 49152 matchall chain 0
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
This patch adds the -echo flag to tc command line and support for it in
tc actions. If the user specifies this flag for an action command, the
kernel will return the command's result back to user space.
For example:
tc -echo actions add action mirred egress mirror dev lo
total acts 0
Added action
action order 1: mirred (Egress Mirror to device lo) pipe
index 10 ref 1 bind 0
not_in_hw
As illustrated above, the kernel will give us an index of 10
The same can be done for other action commands (replace, change, and
delete). For example:
tc -echo actions delete action mirred index 10
total acts 0
Deleted action
action order 1: mirred (Egress Mirror to device lo) pipe
index 10 ref 0 bind 0
not_in_hw
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
So far the mirred action has dealt with syntax that handles
mirror/redirection for netdev. A matching packet is redirected or mirrored
to a target netdev.
In this patch we enable mirred to mirror to a tc block as well.
IOW, the new syntax looks as follows:
... mirred <ingress | egress> <mirror | redirect> [index INDEX] < <blockid BLOCKID> | <dev <devname>> >
Examples of mirroring or redirecting to a tc block:
$ tc filter add block 22 protocol ip pref 25 \
flower dst_ip 192.168.0.0/16 action mirred egress mirror blockid 22
$ tc filter add block 22 protocol ip pref 25 \
flower dst_ip 10.10.10.10/32 action mirred egress redirect blockid 22
Co-developed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Co-developed-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
There are three places in tc which all have same code for
handling clockid (copy/paste). Move it into tc_util.c.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
There is an open upstream kernel patch to remove ipt action from
kernel. This is corresponding iproute2 change.
- Remove support fot ipt and xt in tc.
- Remove no longer used header files.
- Update man pages.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David Ahern <dsahern@kernel.org>