Add DSCP mask support, allowing users to specify a DSCP value with an
optional mask. Example:
# ip rule add dscp 1 table 100
# ip rule add dscp 0x02/0x3f table 200
# ip rule add dscp AF42/0x3f table 300
# ip rule add dscp 0x10/0x30 table 400
In non-JSON output, the DSCP mask is not printed in case of exact match
and the DSCP value is printed in hexadecimal format in case of inexact
match:
$ ip rule show
0: from all lookup local
32762: from all lookup 400 dscp 0x10/0x30
32763: from all lookup 300 dscp AF42
32764: from all lookup 200 dscp 2
32765: from all lookup 100 dscp 1
32766: from all lookup main
32767: from all lookup default
Dump can be filtered by DSCP value and mask:
$ ip rule show dscp 1
32765: from all lookup 100 dscp 1
$ ip rule show dscp AF42
32763: from all lookup 300 dscp AF42
$ ip rule show dscp 0x10/0x30
32762: from all lookup 400 dscp 0x10/0x30
In JSON output, the DSCP mask is printed as an hexadecimal string to be
consistent with other masks. The DSCP value is printed as an integer in
order not to break existing scripts:
$ ip -j -p -N rule show dscp 0x10/0x30
[ {
"priority": 32762,
"src": "all",
"table": "400",
"dscp": "16",
"dscp_mask": "0x30"
} ]
The mask attribute is only sent to the kernel in case of inexact match
so that iproute2 will continue working with kernels that do not support
the attribute.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Add port mask support, allowing users to specify a source or destination
port with an optional mask. Example:
# ip rule add sport 80 table 100
# ip rule add sport 90/0xffff table 200
# ip rule add dport 1000-2000 table 300
# ip rule add sport 0x123/0xfff table 400
# ip rule add dport 0x4/0xff table 500
# ip rule add dport 0x8/0xf table 600
# ip rule del dport 0x8/0xf table 600
In non-JSON output, the mask is not printed in case of exact match:
$ ip rule show
0: from all lookup local
32761: from all dport 0x4/0xff lookup 500
32762: from all sport 0x123/0xfff lookup 400
32763: from all dport 1000-2000 lookup 300
32764: from all sport 90 lookup 200
32765: from all sport 80 lookup 100
32766: from all lookup main
32767: from all lookup default
Dump can be filtered by port value and mask:
$ ip rule show sport 80
32765: from all sport 80 lookup 100
$ ip rule show sport 90
32764: from all sport 90 lookup 200
$ ip rule show sport 0x123/0x0fff
32762: from all sport 0x123/0xfff lookup 400
$ ip rule show dport 4/0xff
32761: from all dport 0x4/0xff lookup 500
In JSON output, the port mask is printed as an hexadecimal string to be
consistent with other masks. The port value is printed as an integer in
order not to break existing scripts:
$ ip -j -p rule show sport 0x123/0xfff table 400
[ {
"priority": 32762,
"src": "all",
"sport": 291,
"sport_mask": "0xfff",
"table": "400"
} ]
The mask attribute is only sent to the kernel in case of inexact match
so that iproute2 will continue working with kernels that do not support
the attribute.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
This will be useful when enabling port masks in the next patch.
Before:
# ip rule add sport 0x1 table 100
Invalid "sport"
After:
# ip rule add sport 0x1 table 100
$ ip rule show sport 0x1
32765: from all sport 1 lookup 100
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
In preparation for adding port mask support, move port parsing to a
function.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Currently, tc_calc_xmittime and tc_calc_xmitsize round from double to
int three times — once when they call tc_core_time2tick /
tc_core_tick2time (whose argument is int), once when those functions
return (their return value is int), and then finally when the tc_calc_*
functions return. This leads to extremely granular and inaccurate
conversions.
As a result, for example, on my test system (where tick_in_usec=15.625,
clock_factor=1, and hz=1000000000) for a bitrate of 1Gbps, all tc htb
burst values between 0 and 999 bytes get encoded as 0 ticks; all values
between 1000 and 1999 bytes get encoded as 15 ticks (equivalent to 960
bytes); all values between 2000 and 2999 bytes as 31 ticks (1984 bytes);
etc.
The patch changes the code so these calculations are done internally in
floating-point, and only rounded to integer values when the value is
returned. It also changes tc_calc_xmittime to round its calculated value
up, rather than down, to ensure that the calculated time is actually
sufficient for the requested size.
Signed-off-by: Jonathan Lennox <jonathan.lennox@8x8.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Add "scrub" option to configure IFLA_NETKIT_SCRUB and
IFLA_NETKIT_PEER_SCRUB when setting up a link. Add "scrub" and
"peer scrub" to device details as well when printing.
$ sudo ./ip/ip link add jordan type netkit scrub default peer scrub none
$ ./ip/ip -details link show jordan
43: jordan@nk0: <BROADCAST,MULTICAST,NOARP,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff promiscuity 0 allmulti 0 minmtu 68 maxmtu 65535
netkit mode l3 type primary policy forward peer policy forward scrub default peer scrub none numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 524280 tso_max_segs 65535 gro_max_size 65536 gso_ipv4_max_size 65536 gro_ipv4_max_size 65536
v2->v3: Updated man page.
v1->v2: Added some spaces around "scrub SCRUB" in the help message.
Link: https://lore.kernel.org/netdev/20241004101335.117711-1-daniel@iogearbox.net/
Signed-off-by: Jordan Rife <jordan@jrife.io>
Signed-off-by: David Ahern <dsahern@kernel.org>
Static analyzer reported:
1. if (res > 0xFFFFFFFFFFFFFFFFULL)
Expression 'res > 0xFFFFFFFFFFFFFFFFULL' is always false , which may be caused by a logical error:
'res' has a type 'unsigned long long' with minimum value '0' and a maximum value '18446744073709551615'
2. if (res > INT64_MAX || res < INT64_MIN)
Expression 'res > INT64_MAX' is always false , which may be caused by a logical error: 'res' has a type 'long long'
with minimum value '-9223372036854775808' and a maximum value '9223372036854775807'
Expression 'res < INT64_MIN' is always false , which may be caused by a logical error: 'res' has a type 'long long'
with minimum value '-9223372036854775808' and a maximum value '9223372036854775807'
Corrections explained:
- Removed redundant check `res > 0xFFFFFFFFFFFFFFFFULL` in `get_u64`,
as `res` cannot exceed this value due to its type.
- Removed redundant checks `res > INT64_MAX` and `res < INT64_MIN` in `get_s64`,
as `res` cannot exceed the range of `long long`.
Triggers found by static analyzer Svace.
Signed-off-by: Anton Moryakov <ant.v.moryakov@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Static analyzer reported:
expression is identical to previous conditio
Corrections explained:
The condition checking for "neutral-map-auto" was duplicated in the
ila_csum_name2mode function. This commit removes the redundant check
to improve code readability and maintainability.
Triggers found by static analyzer Svace.
Signed-off-by: Anton Moryakov <ant.v.moryakov@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Static analyzer reported:
Pointer 'tp', returned from function 'localtime' at ipxfrm.c:352, may be NULL
and is dereferenced at ipxfrm.c:354 by calling function 'strftime'.
Corrections explained:
The function localtime() may return NULL if the provided time value is
invalid. This commit adds a check for NULL and handles the error case
by copying "invalid-time" into the output buffer.
Unlikely, but may return an error
Triggers found by static analyzer Svace.
Signed-off-by: Anton Moryakov <ant.v.moryakov@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Static analyzer reported:
Return value of function 'iproute_flush_cache', called at iproute.c:1732,
is not checked. The return value is obtained from function 'open64' and possibly contains an error code.
Corrections explained:
The function iproute_flush_cache() may return an error code, which was
previously ignored. This could lead to unexpected behavior if the cache
flush fails. Added error handling to ensure the function fails gracefully
when iproute_flush_cache() returns an error.
Triggers found by static analyzer Svace.
Signed-off-by: Anton Moryakov <ant.v.moryakov@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
The netlink nest that carriers tc action statistics looks as follows:
[TCA_ACT_STATS]
[TCA_STATS_BASIC]
[TCA_STATS_BASIC_HW]
Where 'TCA_STATS_BASIC' carries the combined software and hardware
packets (32-bits) and bytes (64-bit) counters and 'TCA_STATS_BASIC_HW'
carries the hardware statistics.
When the number of packets exceeds 0xffffffff, the kernel emits the
'TCA_STATS_PKT64' attribute:
[TCA_ACT_STATS]
[TCA_STATS_BASIC]
[TCA_STATS_PKT64]
[TCA_STATS_BASIC_HW]
[TCA_STATS_PKT64]
This layout is not ideal as the only way for user space to know what
each 'TCA_STATS_PKT64' attribute carries is to check which attribute
precedes it, which is exactly what some applications are doing [1].
Do the same in iproute2 so that users with existing kernels could read
the 64-bit hardware packets counter of tc actions instead of reading the
truncated 32-bit counter.
Before:
$ tc -s filter show dev swp2 ingress
filter protocol all pref 1 flower chain 0
filter protocol all pref 1 flower chain 0 handle 0x1
skip_sw
in_hw in_hw_count 1
action order 1: mirred (Egress Redirect to device swp1) stolen
index 1 ref 1 bind 1 installed 47 sec used 23 sec
Action statistics:
Sent 368689092544 bytes 5760767071 pkt (dropped 0, overlimits 0 requeues 0)
Sent software 0 bytes 0 pkt
Sent hardware 368689092544 bytes 1465799775 pkt
backlog 0b 0p requeues 0
used_hw_stats immediate
Where 5760767071 - 1465799775 = 0x100000000
After:
$ tc -s filter show dev swp2 ingress
filter protocol all pref 1 flower chain 0
filter protocol all pref 1 flower chain 0 handle 0x1
skip_sw
in_hw in_hw_count 1
action order 1: mirred (Egress Redirect to device swp1) stolen
index 1 ref 1 bind 1 installed 71 sec used 47 sec
Action statistics:
Sent 368689092544 bytes 5760767071 pkt (dropped 0, overlimits 0 requeues 0)
Sent software 0 bytes 0 pkt
Sent hardware 368689092544 bytes 5760767071 pkt
backlog 0b 0p requeues 0
used_hw_stats immediate
[1] 006e1c6dbf
Reported-by: Joe Botha <joe@atomic.ac>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
A new attribute, IFLA_VXLAN_RESERVED_BITS, was added in Linux kernel
commit 6c11379b104e ("vxlan: Add an attribute to make VXLAN header
validation configurable") (See the link below for the full patchset).
The payload is a 64-bit binary field that covers the VXLAN header. The set
bits indicate which bits in a VXLAN packet header should be allowed to
carry 1's. Support the new attribute through a CLI keyword "reserved_bits".
Link: https://patch.msgid.link/173378643250.273075.13832548579412179113.git-patchwork-notify@kernel.org
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Enhanced the 'ip monitor' command to track changes in IPv6
anycast addresses. This update allows the command to listen for
events related to anycast address additions and deletions by
registering to the newly introduced RTNLGRP_IPV6_ACADDR netlink group.
This patch depends on the kernel patch that adds RTNLGRP_IPV6_ACADDR
being merged first.
Here is an example usage:
root@uml-x86-64:/# ip monitor acaddress
2: if2 inet6 any 2001:db8:7b:0:528e:a53a:9224:c9c5 scope global
valid_lft forever preferred_lft forever
Deleted 2: if2 inet6 any 2001:db8:7b:0:528e:a53a:9224:c9c5 scope global
valid_lft forever preferred_lft forever
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Yuyang Huang <yuyanghuang@google.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Update kernel headers to commit:
59372af69d4d ("Merge tag 'batadv-next-pullrequest-20250117' of git://git.open-mesh.org/linux-merge")
Signed-off-by: David Ahern <dsahern@kernel.org>
Change "is a garbage" to "is garbage". Because garbage is a collective
noun, it does not need the indefinite article.
Signed-off-by: Neil Svedberg <neil.svedberg@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Add support for 'flowlabel' selector in ip-rule.
Rules can be added with or without a mask in which case exact match is
used:
# ip -6 rule add flowlabel 0x12345 table 100
# ip -6 rule add flowlabel 0x11/0xff table 200
# ip -6 rule add flowlabel 0x54321 table 300
# ip -6 rule del flowlabel 0x54321 table 300
Dump output:
$ ip -6 rule show
0: from all lookup local
32764: from all lookup 200 flowlabel 0x11/0xff
32765: from all lookup 100 flowlabel 0x12345
32766: from all lookup main
Dump can be filtered by flow label value and mask:
$ ip -6 rule show flowlabel 0x12345
32765: from all lookup 100 flowlabel 0x12345
$ ip -6 rule show flowlabel 0x11/0xff
32764: from all lookup 200 flowlabel 0x11/0xff
JSON output:
$ ip -6 -j -p rule show flowlabel 0x12345
[ {
"priority": 32765,
"src": "all",
"table": "100",
"flowlabel": "0x12345",
"flowlabel_mask": "0xfffff"
} ]
$ ip -6 -j -p rule show flowlabel 0x11/0xff
[ {
"priority": 32764,
"src": "all",
"table": "200",
"flowlabel": "0x11",
"flowlabel_mask": "0xff"
} ]
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
In linux-6.13, we added the ability to offload pacing on
capable devices.
tc qdisc add ... fq ... offload_horizon 100ms
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Enhanced the 'ip monitor' command to track changes in IPv4 and IPv6
multicast addresses. This update allows the command to listen for
events related to multicast address additions and deletions by
registering to the newly introduced RTNLGRP_IPV4_MCADDR and
RTNLGRP_IPV6_MCADDR netlink groups.
This patch depends on the kernel patch that adds RTNLGRP_IPV4_MCADDR
and RTNLGRP_IPV6_MCADDR being merged first.
Here is an example usage:
root@uml-x86-64:/# ip monitor maddress
9: nettest123 inet6 mcast ff01::1 scope global
valid_lft forever preferred_lft forever
9: nettest123 inet6 mcast ff02::1 scope global
valid_lft forever preferred_lft forever
9: nettest123 inet mcast 224.0.0.1 scope global
valid_lft forever preferred_lft forever
9: nettest123 inet6 mcast ff02::1:ff00:7b01 scope global
valid_lft forever preferred_lft forever
Deleted 9: nettest123 inet mcast 224.0.0.1 scope global
valid_lft forever preferred_lft forever
Deleted 9: nettest123 inet6 mcast ff02::1:ff00:7b01 scope global
valid_lft forever preferred_lft forever
Deleted 9: nettest123 inet6 mcast ff02::1 scope global
valid_lft forever preferred_lft forever
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Yuyang Huang <yuyanghuang@google.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Extend the current rmnet support to allow enabling or disabling
IFLA_RMNET_FLAGS via ip link as well as printing the current settings.
Signed-off-by: Robert Marko <robert.marko@sartura.hr>
Signed-off-by: David Ahern <dsahern@kernel.org>
The flower tc parser was using XATTR_SIZE_MAX from linux/limits.h,
but this constant is intended to before extended filesystem attributes
not for TC. Replace it with a local define.
This fixes issue on systems with musl and XATTR_SIZE_MAX is not
defined in limits.h there.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The recent report of issues with missing limits.h impacting musl
suggested looking at what files are and are not included in ip code.
The standard practice is to put standard headers first, then system,
then local headers. Used iwyu to get suggestions about missing
and extraneous headers.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Kernel support for dumping the multicast querier state was added in this
commit [1]. As some people might be interested to get this information
from userspace, this commit implements the necessary changes to show it
via
ip -d link show [dev]
The querier state shows the following information for IPv4 and IPv6
respectively:
1) The ip address of the current querier in the network. This could be
ourselves or an external querier.
2) The port on which the querier was seen
3) Querier timeout in seconds
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c7fa1d9b1fb179375e889ff076a1566ecc997bfc
Signed-off-by: Fabian Pfitzner <f.pfitzner@pengutronix.de>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Port param show command arg parser used the devlink dev flag
instead of the port, which caused to not identify the port device
argument, causing the following error:
$ devlink port param show eth0 name link_type
Wrong identification string format.
Devlink identification ("bus_name/dev_name") expected
Use the correct the devlink handle flag.
Fixes: 70faecdca8 ("devlink: implement dump selector for devlink objects show commands")
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
When parsing with selector, there's a list of extended handles
(devname/busname/x) which require special treatment.
DL_OPT_HANDLEP is one of them. The code tries to parse devname/busname
handle and in case it is successful, it goes the "dump" way. However if
it's not, parsing is directly done. That is wrong, as the options may
still be incomplete. Do break in that case instead allowing to do dry
parse and possibly go the "dump" way in case the option list is not
complete.
Fixes: 70faecdca8 ("devlink: implement dump selector for devlink objects show commands")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
When the return value of rtnl_talk() is greater than
or equal to 0, 'answer' will be allocated.
The 'answer' should be free after using,
otherwise it will cause memory leak.
Signed-off-by: Minhong He <heminhong@kylinos.cn>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
EditorConfig is a specification to define the most basic code formatting
stuff, and it is supported by many editors and IDEs, either directly or via
plugins, including VSCode/VSCodium, Vim, emacs and more.
It allows to define formatting style related to indentation, charset, end
of lines and trailing whitespaces. It also allows to apply different
formats for different files based on wildcards, so for example it is
possible to apply different configurations to *.{c,h}, *.json or *.yaml.
In linux related projects, defining a .editorconfig might help people that
work on different projects with different indentation styles, so they
cannot define a global style. Now they will directly see the correct
indentation on every fresh clone of the project.
Add the .editorconfig file at the root of the iproute2 project with a broad
generic configuration for all file types. Then add exceptions for the file
types which follow different conventions.
See https://editorconfig.org
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: David Ahern <dsahern@kernel.org>
This commit enhances the q_taprio module by adding support for the
Hold/Release mechanism in Time-Sensitive Networking (TSN), as specified
in the IEEE 802.1Q-2018 standard.
Changes include:
- Addition of `TC_TAPRIO_CMD_SET_AND_HOLD` and `TC_TAPRIO_CMD_SET_AND_RELEASE`
cases in the `entry_cmd_to_str` function to return "H" and "R" respectively.
- Addition of corresponding string comparisons in the `str_to_entry_cmd`
function to map "H" and "R" to `TC_TAPRIO_CMD_SET_AND_HOLD` and
`TC_TAPRIO_CMD_SET_AND_RELEASE`.
The Hold/Release feature works as follows:
- Set-And-Hold-MAC (H): This command sets the gates and holds the current
configuration, preventing any further changes until a release command is
issued.
- Set-And-Release-MAC (R): This command releases the hold, allowing
subsequent gate configuration changes to take effect.
These changes ensure that the q_taprio module can correctly interpret and
handle the Hold/Release commands, aligning with the IEEE 802.1Q-2018 standard
for enhanced TSN configuration.
Signed-off-by: Choong Yong Liang <yong.liang.choong@linux.intel.com>
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Vincent Mailhol says:
====================
An RFC was sent last weekend to kick-off the discussion of the
introduction of CAN XL: [1] for the kernel side and [2] for the
iproute2 interface. While the series received some positive feedback,
it is far from completion. Some work is still needed to:
- adjust the nesting of the IFLA_CAN_XL_DATA_BITTIMING_CONST in the
netlink interface
- add the CAN XL PWM configuration
and this TODO list may grow if more feedback is received.
Regardless of this, the RFC started with a set of trivial patches to
do some clean-up and some renaming in preparation of the introduction
of CAN XL.
This series just contains those preparation patches which were cherry
picked from the RFC.
The goal is to have those merged first to remove some overhead from
the netlink CAN XL main series before tacking care of the other
comments.
[1] [RFC] can: netlink: add CAN XL
Link: https://lore.kernel.org/linux-can/20241110155902.72807-16-mailhol.vincent@wanadoo.fr/
[2] [RFC] iplink_can: add CAN XL
Link: https://lore.kernel.org/linux-can/20241110160406.73584-10-mailhol.vincent@wanadoo.fr/
====================
Signed-off-by: David Ahern <dsahern@kernel.org>