Request for bug verification/comment: LuCI erroneously adds bridge to interface stanza

TL;DR:
I'm looking for verification of this behavior, and confirmation that it is a bug, one that can be induced purely through LuCI operations. Specifically, this has to do with the addition of this line to a network interface stanza:

	option type 'bridge'

Apologies in advance, this will be a bit long, but I want to make this clear and provide the recipe for reproducing the erroneous entry.

Background:
The line option type 'bridge' no longer belongs in network interface stanzas. I think this dates back to either 19.07 or 21.02 when the syntax shifted to defining the bridge in its own config device stanza. The issue is that not only is the old practice of defining the bridge inside the network interface deprecated, it will actually break things.

Despite the fact that network interfaces do not accept this line anymore, we still find that it appears in current configs (for example, this one which just showed up several posts into the thread). And what's puzzling is that the users almost always say that they used LuCI and did not edit the config file with a text editor or CLI. So... how does it get in there?

I figured it out today, and I'll share the recipe with you. It applies to both swconfig and DSA devices. I've tested it with a Linksys EA6300 (DSA) and a Linksys E3000 (swconfig), both on 23.05.2, starting with completely default configs and then making changes only through LuCI. I'll describe the general flow and show the text (mainly the deltas) from /etc/config/network. If it is not clear how I got there and screenshots or a video are required, let me know.

Reproduction Procedure:
We'll start with a completely default configuration on my EA6300.

Default /etc/config/network
root@OpenWrt:~# cat /etc/config/network

config interface 'loopback'
	option device 'lo'
	option proto 'static'
	option ipaddr '127.0.0.1'
	option netmask '255.0.0.0'

config globals 'globals'
	option ula_prefix 'fd12:926e:51b9::/48'
	option packet_steering '1'

config device
	option name 'br-lan'
	option type 'bridge'
	list ports 'lan1'
	list ports 'lan2'
	list ports 'lan3'
	list ports 'lan4'

config interface 'lan'
	option device 'br-lan'
	option proto 'static'
	option ipaddr '192.168.1.1'
	option netmask '255.255.255.0'
	option ip6assign '60'

config device
	option name 'wan'
	option macaddr 'C8:D7:19:40:01:CD'

config interface 'wan'
	option device 'wan'
	option proto 'dhcp'

config interface 'wan6'
	option device 'wan'
	option proto 'dhcpv6'

Now, using LuCI, I'll remove port lan4 from br-lan, then I'll add a new network interface test and attach it to port lan4. I'll also add a DHCP server and assign the network to the lan firewall zone (this is optional, but useful later to prove that the bridge line does indeed break things).

--> Pay attention to the test network stanza in the below config:

Resulting network config file with new interface on lan4
root@OpenWrt:~# cat /etc/config/network

config interface 'loopback'
	option device 'lo'
	option proto 'static'
	option ipaddr '127.0.0.1'
	option netmask '255.0.0.0'

config globals 'globals'
	option ula_prefix 'fd12:926e:51b9::/48'
	option packet_steering '1'

config device
	option name 'br-lan'
	option type 'bridge'
	list ports 'lan1'
	list ports 'lan2'
	list ports 'lan3'

config interface 'lan'
	option device 'br-lan'
	option proto 'static'
	option ipaddr '192.168.1.1'
	option netmask '255.255.255.0'
	option ip6assign '60'

config device
	option name 'wan'
	option macaddr 'C8:D7:19:40:01:CD'

config interface 'wan'
	option device 'wan'
	option proto 'dhcp'

config interface 'wan6'
	option device 'wan'
	option proto 'dhcpv6'

config interface 'test'
	option proto 'static'
	option device 'lan4'
	option ipaddr '192.168.5.1'
	option netmask '255.255.255.0'

So far, so good. The test network stanza looks like you'd expect with device lan4 attached.

Now, I'll go over to wireless, edit the default OpenWrt SSID to use network test (instead of lan) and then save and apply.

Now, we see the erroneous line appear:

config interface 'test'
	option proto 'static'
	option device 'lan4'
	option ipaddr '192.168.5.1'
	option netmask '255.255.255.0'
	option type 'bridge'

Ok... so it's true this isn't expected to work because I've attached two physical interfaces to a network interface without using a separately defined bridge. I'll fix that now by creating a new bridge with lan4 in br-test, and then using br-test as the device in the test network.

The resulting config entries look fine except that the bridge line is still in the network stanza:

New bridge device results
config interface 'test'
	option proto 'static'
	option device 'br-test'
	option ipaddr '192.168.5.1'
	option netmask '255.255.255.0'
	option type 'bridge'

config device
	option type 'bridge'
	option name 'br-test'
	list ports 'lan4'

So we're still stuck with that bridge line in the network config stanza and there is no way to remove it (aside from text/CLI edits). Further, if you don't already know that this is invalid syntax, there's nothing to indicate that it shouldn't be there.
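
Since nothing in LuCI flags the stray line, one way to spot it is a quick check from the shell. This is only a rough sketch, not an official tool: find_legacy_bridge is a made-up helper name, and it assumes the usual /etc/config/network layout shown above (one config line per stanza, one option per line).

```shell
#!/bin/sh
# List interface stanzas that still carry the legacy "option type 'bridge'" line.
# Only "config interface" sections are checked; "config device" sections, where
# the option is valid, are ignored.
find_legacy_bridge() {
    awk '
        $1 == "config" {
            in_iface = ($2 == "interface")
            name = $3
            gsub(/[^A-Za-z0-9_]/, "", name)   # strip the quotes around the section name
            next
        }
        in_iface && $1 == "option" && $2 == "type" {
            val = $3
            gsub(/[^A-Za-z0-9_]/, "", val)    # strip the quotes around the value
            if (val == "bridge") print name
        }
    ' "$1"
}
```

Running find_legacy_bridge /etc/config/network against the config above should print the name of each affected interface (here, test) and nothing for the br-test device stanza.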

I'll go ahead and enable wifi, and then I'll test to see if I can get an IP address on wifi and/or ethernet.

  • ✅ I connect my phone via wifi and I get an IP address.
  • ❎ I connect my computer via ethernet to port lan4 and I do not get an IP address.
    • ❎ In fact, the port doesn't even make a physical connection! The port literally does not light up.

Finally, I will now edit the file manually (using vi) and remove that one line:

bridge line removed from test network interface stanza
config interface 'test'
	option proto 'static'
	option device 'br-test'
	option ipaddr '192.168.5.1'
	option netmask '255.255.255.0'

config device
	option type 'bridge'
	option name 'br-test'
	list ports 'lan4'

Following the deletion of that one line, I execute /etc/init.d/network restart to reload the network configs... And now I'll repeat the tests:

  • ✅ I connect my phone via wifi and I get an IP address.
  • ✅ I connect my computer via ethernet to port lan4 and it successfully obtains an IP (and the link is working again).
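
For anyone who would rather not hand-edit with vi, the same one-line removal could be scripted. The following is a hedged sketch: strip_legacy_bridge is a hypothetical helper, it only prints a cleaned copy rather than modifying the file, and it assumes the one-option-per-line layout shown above.

```shell
#!/bin/sh
# Print a copy of a UCI network config with "option type 'bridge'" removed
# from interface stanzas only; device stanzas are passed through unchanged.
strip_legacy_bridge() {
    awk '
        $1 == "config" { in_iface = ($2 == "interface") }
        in_iface && $1 == "option" && $2 == "type" {
            val = $3
            gsub(/[^A-Za-z0-9_]/, "", val)   # strip the quotes around the value
            if (val == "bridge") next        # drop the legacy line
        }
        { print }
    ' "$1"
}
```

The output can be reviewed and then copied over /etc/config/network before restarting the network. On the router itself, uci delete network.test.type followed by uci commit network should achieve the same for a single interface.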

Conclusion
It seems that LuCI (or some underlying function) detects the need for a bridge when I associate a second physical interface (wifi) with a network stanza that is currently bound to another physical interface (ethernet) without a bridge. However, in the process, it inserts a line (option type 'bridge') that is both deprecated and damaging into the network interface stanza. The presence of this line will specifically break connectivity, and there is no GUI method to remove that line.

Expected Behavior (in my view)
The option type 'bridge' line should never be added to the network interface stanza.

I don't really know what should happen in the case where two physical interfaces are assigned (without a bridge) to a single network interface. I guess it should just fail passively, insofar as one or the other physical interface may end up logically unconnected until a bridge device is declared and used instead. By failing 'passively', I mean that the remedy can be achieved entirely via the GUI, and that no invalid line is introduced into the network stanza whose removal requires both knowledge of the deprecation and use of the command line.

Notes:
I have not tested all the different possible ways that two physical interfaces could be assigned (e.g. 2 ethernet ports on DSA, 2 radios, wifi first then add the ethernet, etc.), so it's possible that there are other 'recipes' that end up in the faulty state. But I have achieved the same results with two different systems, so I believe that I have found the general root cause.

If needed, feel free to ask for more details, screenshots, etc... whatever is necessary to fully demonstrate the process by which the invalid line is injected into the config.

Thanks for reading through this... I'd love to know if this is a true bug (I would say it is), or if this is expected behavior (and if so, an explanation of why it's working as intended and how a novice user would fix it would be useful for documentation).

I can confirm that I am observing the same behavior.

This issue has been found in several threads that I've reviewed lately. It occurred to me that this thread kind of died on the vine, but I'm hoping this can be addressed in the next major release.

To that end, is there anyone else who can confirm (or refute) my assertion that this is a bug?

And what is the next step -- should this be filed on the GitHub bug tracker? Is more information needed for it to be successfully addressed in time for the next major release? (Note: I'm not a dev or a software engineer, so it might be hard for me to find the actual source code that is implicated here, but hopefully it's not too difficult to find and address, especially for the devs who are really familiar with the code. I am happy to help where I can, though.)

Pretty sure when we were working together, you advised me to remove this option from my configs after an upgrade. I was porting configs from backup archives manually... It was working previously then broke due to this option.

ty

Just guessing…

@jow - seems like you were involved in those commits. Would you be able to comment on findings and how it can best be addressed?

LuCI does this when both of the following conditions apply:

  • the referenced logical network is not empty (empty = has no option ifname/option device)
  • the referenced device of the referenced logical network is not a bridge device

So if you attach the wifi-iface to an option network with device lan4, then LuCI sets type 'bridge' to form a bridge over lan4 and the wireless device(s) belonging to the wifi-iface.

This variant is deprecated but should still work. The proper solution would be to refactor the code to:

  • implicitly create a new bridge device containing the target network's current device (lan4 in this example)
  • change the target network to use the new device br-implicitnewbridge instead of lan4
  • stop setting type bridge

Alternatively, change LuCI to refuse to attach wifi-ifaces to non-bridge, non-empty logical networks (preferable since it is easier to implement and has fewer pitfalls and corner cases to worry about).
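
For illustration, the end state of the first option can be reached by hand today. The sketch below is not LuCI's code, only a manual equivalent: emit_bridge_migration is a hypothetical helper that prints the uci commands for review instead of running them, and the interface, port, and bridge names are whatever the caller passes in.

```shell
#!/bin/sh
# Print (do not run) the uci commands that would move an interface from a bare
# port to a dedicated bridge device. All three arguments are caller-supplied.
emit_bridge_migration() {
    iface=$1    # logical network, e.g. test
    port=$2     # port currently set as the interface device, e.g. lan4
    bridge=$3   # name for the new bridge device, e.g. br-test

    cat <<EOF
uci add network device
uci set network.@device[-1].type='bridge'
uci set network.@device[-1].name='$bridge'
uci add_list network.@device[-1].ports='$port'
uci set network.$iface.device='$bridge'
uci -q delete network.$iface.type
uci commit network
EOF
}
```

Calling emit_bridge_migration test lan4 br-test prints seven commands matching the three steps listed above; after checking them, they can be run on the router and followed by /etc/init.d/network restart.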

Responsible code location here:

Thanks @jow. Should I create a bug on GitHub so that this can be addressed?