Troubleshooting (apparent) weird Powerline "routing" situation

For a few months now I've been experiencing intermittent problems with my Powerline setup. Some experiments I did today make me wonder if there's actually more of a routing/configuration issue than the Powerline itself?

There are three Powerline nodes in my setup. PL#1 and PL#2 have a nearly direct electrical connection on the same circuit (i.e. not even a breaker between them) and I've always seen good TX/RX speeds between them. With PL#3 (I don't know if it's on the opposite leg of my breaker box or what) there tends to be a poorer signal.

I hadn't worried about PL#3, since the stuff connected via it is (currently) lower priority and more experimental anyway. During the day my laptop is usually connected to PL#2 and, or so I thought, should have a "straight shot" to the internet via the main router on PL#1 regardless of whether PL#3 was having a good day:

   PL#1                               PL#2                ║           PL#3
                                                          ║
   ─────────────────────────────────────────Powerline─────╬───────────────────────────
    ▲    ┌────────┐                     ▲                 ║             ▲
    │    │        │                     │                 ║             │
┌──────┐ │        │            ┌────────────────┐                   ┌───────┐
│  PL  │ │  ┌──────────┐       │  PL "router"   │                   │  PL   │
│bridge│ │  │VLAN-aware│       │  used as WAP   │                   │bridge │
└──────┘ │  │  Switch  │       └────────────────┘                   └───────┘
    │    │  └──────────┘                │                               │
    │    │        │                     │                               └──────┐
    └────┘  ┌─────┘                     └───┐                                  │
            │                               │                                  │
        ┌──────┐.───.             .─.       │                         ┌────────────────┐
        │ Main ( WAN ) ◀───────▶ (me )────────────┐                   │"Router" used as│
        │router│`───'             `─'│   Laptop   │                   │  switch / WAP  │
        └──────┘                     └────────────┘                   └────────────────┘

Well, today my Powerline setup was having a bad day again, to the point where even basic websites were taking forever to load. I logged into OpenWrt running on PL#2 and, as usual, saw that the REM CCO node had a strong 305/403 TX/RX status but the REM STA node was back down to 000/015 again. Just to be 100% sure of my assumption that the strong REM CCO node was PL#1 and the crummy REM STA node was PL#3, I unplugged PL#3 completely. And that's when things got funny!

The Powerline status light (on my PL#2 access point) went from crummy orange to happy green β€” but now I had no internet connection and couldn't even talk to the router. Even after rebooting PL#2 my laptop struggled to get DHCP and such when it rejoined the WiFi. What is going on?? Then suddenly everything just "magically" started to work as expected!!

So: when PL#3 is being crummy there's an orange Powerline light (seems legit) AND my laptop has a really crummy connection (why??). When I unplug PL#3 the Powerline light goes green again (as I hoped) BUT my laptop is completely offline for a few minutes until suddenly it's working perfectly!

This makes me wonder: was all my traffic somehow going through the OpenWrt box attached to PL#3? And when that box went away, did it take a while to establish the right path to the real router?

Where I'm kind of stuck is that the traceroutes only ever show the real router. So I don't think the extra "router" attached to PL#3 is ever acting as more than a switch. But could it still somehow end up attracting/accepting ± all packets that go through the Powerline and then sending them right back around before they make it to the main router?


Your powerline infrastructure should behave as a dumb three-port layer-2 switch. Unless one of the modules is faulty, the problem you describe must be one layer up.

I would review how the router and the access points are configured.

I don't know, because I can't make out your ASCII diagram on my phone, but maybe you could draw a diagram, either on paper or with a diagram tool, that shows the entire network topology? Do you have a network loop in your switches with one leg going through the bad powerline connection? Because that leg is so slow, the loop might not take down your infrastructure outright, but it could still clog it up with extra packets.
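A quick way to answer the loop question is to write the topology down as a list of links and check it mechanically. A minimal sketch, assuming the links are transcribed by hand from the diagram (node names are just labels; the shared Powerline medium is modelled as a single node, so three adapters on it do not count as a loop). It uses union-find: any link whose two ends are already connected closes a loop. The extra cable in the second call is hypothetical.

```python
def find_loop(links):
    """Return the first link that closes a loop in an undirected topology, or None.

    `links` is a list of (a, b) pairs, one per cable, WiFi association, or
    Powerline path. If both ends of a link are already connected through
    earlier links, that link creates a second path, i.e. a loop.
    """
    parent = {}

    def root(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving keeps trees shallow
            node = parent[node]
        return node

    for a, b in links:
        ra, rb = root(a), root(b)
        if ra == rb:
            return (a, b)
        parent[ra] = rb  # merge the two connected groups
    return None


# Hand-transcribed from the diagram; "powerline" stands for the shared PLC medium.
links = [
    ("main router", "switch"),
    ("switch", "PL#1"),
    ("PL#1", "powerline"),
    ("powerline", "PL#2"),
    ("powerline", "PL#3"),
    ("PL#2", "laptop"),            # WiFi
    ("PL#3", "PL#3 router/WAP"),
]

print(find_loop(links))
# A hypothetical extra Ethernet cable from the PL#3 router back to the switch:
print(find_loop(links + [("PL#3 router/WAP", "switch")]))
```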

I definitely think that having one device be marginal would slow your network down as the power line device likely falls back to more robust and slower encoding. The delay you experienced might just be renegotiation time.
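To put rough numbers on "one marginal device slows everyone down": on a shared medium, time spent sending slowly to one station is time nobody else can use. This is a deliberately simplified airtime model, assuming plcstat's TX/RX figures (like the 305 and 015 in the first post) are PHY rates in Mbps, and ignoring MAC overhead, retransmissions, and beacons; the function name is hypothetical.

```python
def shared_throughput(rates_mbps, shares):
    """Aggregate throughput (Mbps) of a time-shared medium.

    rates_mbps[i] is the PHY rate of link i; shares[i] is the fraction of all
    data bits sent over link i. Each megabit over link i occupies the medium
    for 1 / rates_mbps[i] seconds, so throughput is the inverse of the
    share-weighted sum of those times (a weighted harmonic mean).
    """
    if abs(sum(shares) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    seconds_per_megabit = sum(s / r for s, r in zip(shares, rates_mbps))
    return 1.0 / seconds_per_megabit


# Rates like those in the first post: ~305 Mbps to PL#1, ~15 Mbps to PL#3.
for pl3_share in (0.0, 0.1, 0.5):
    total = shared_throughput([305, 15], [1 - pl3_share, pl3_share])
    print(f"{pl3_share:.0%} of data to PL#3 -> {total:.1f} Mbps overall")
```

Even a modest fraction of traffic to the slow station drags the aggregate down sharply, because the slow link's airtime cost dominates the sum.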

Based on your impressive ASCII masterpiece:

  • Try swapping the powerlines: 1-->2, 2-->3, 3-->1 (if possible)

  • ...or swapping as you see fit to establish if one of them is faulty.

  • Maybe in the last few months you connected a new non-networked or networked device that is causing interference to the AC power? Use filtered / surge suppressor power bars on other equipment to help clean the power up a bit.

  • PL#3 has always been slow: try finding a better outlet or a different circuit for it. If it is on a bad circuit, it will hobble the whole powerline network with error correction and retransmitted packets.

  • Standard North American AC power is typically a center-tapped 240VAC supply with a shared neutral, producing two 120VAC legs. If you attach one powerline adapter to leg 1 and another to leg 2, the signal has to go all the way back to the power company's transformer on the pole outside, leading to a bad signal. One easy way to find out which outlets are on which circuit is to turn off each breaker in turn and note which outlets it covers. Try to connect all the powerline adapters on the same circuit, or at least the same leg, for best connectivity. Or, since the adapters have built-in LEDs, just try a few other outlets if possible.

  • Inspect your breaker box. Typically they look like this. The breakers on the left are all leg 1. The breakers on the right are all leg 2. HTH

Thanks all so far, interesting things to consider but I'm still a bit puzzled.

This doesn't seem to match my experience. I reconnected PL#3 last night and this morning plcstat -t still reports quite low TX/RX to it:

 P/L NET TEI ------ MAC ------ ------ BDA ------  TX  RX CHIPSET FIRMWARE
 LOC STA 077 68:FF:nn:nn:nn:nn 68:FF:nn:nn:nn:nn n/a n/a QCA7500 MAC-QCA7550-2.2.3.32-00-20161104-CS
 REM CCO 074 34:E8:nn:nn:nn:nn D6:85:nn:nn:nn:nn 299 394
 REM STA 078 34:E8:nn:nn:nn:nn F4:F2:nn:nn:nn:nn 000 013

So basically same "signal quality" as before but now with no significant harm to the rest of the network I can see: the Powerline light on LOC STA is still green and just a basic internet speed test reports ~40Mbps down and ~11Mbps up. (Which is maybe half of the speed I'd expect from my ISP so I guess there's an optimist/pessimist way to look at that…? Regardless it's working at a completely acceptable level compared to yesterday.)

That's worth considering and perhaps some further experimentation. I'm not sure it explains what I saw yesterday though where the Powerline light was green by the time I got back to the PL#2 router and the problem persisted several minutes after I rebooted the same PL#2 router and its lights/radios/etc. had come up.

So a "layer-2 switch", as opposed to a hub, still chooses which port to send each frame out of based on its MAC address, right? This is what I'm wondering about but don't know how to troubleshoot: do PLC adapters (or any other network management/troubleshooting tools) offer a way to see which MAC addresses are thought to be reachable via a particular switch port, or in this case, PLC adapter? Or is there any reason the PLC "switch" would settle on a suboptimal/nonsensical path for packets? Here's my diagram again for those on phones or if the forum is misaligning it for others like it is for me:

(made in Monodraw btw; I'm maybe a little crazy but definitely not that particular kind of crazy ;) )
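A toy model of MAC learning makes the switch behaviour asked about above concrete: the switch records which port each source MAC was last seen on, forwards frames for a known destination out of that one port, and floods when the destination is unknown or its entry has aged out. This is the generic mechanism, not a description of what the QCA7500 firmware does internally; the 300 s ageing time is the Linux bridge default, and the MAC strings are placeholders. On an OpenWrt/Linux bridge the real learned table can be listed with `bridge fdb show`, provided the iproute2 `bridge` utility is installed (it may be a separate package); whether a given PLC adapter exposes its own table is vendor-specific.

```python
import time


class LearningSwitch:
    """Toy layer-2 learning switch: one forwarding-table entry per MAC address."""

    def __init__(self, ports, ageing_s=300.0):
        self.ports = set(ports)
        self.ageing_s = ageing_s
        self.fdb = {}  # MAC -> (port it was last seen on, time last seen)

    def receive(self, in_port, src, dst, now=None):
        """Process one frame and return the set of ports it is sent out of."""
        now = time.monotonic() if now is None else now
        self.fdb[src] = (in_port, now)  # learn (or refresh) where src lives
        entry = self.fdb.get(dst)
        if entry is None or now - entry[1] > self.ageing_s:
            return self.ports - {in_port}  # unknown or aged out: flood
        out_port = entry[0]
        # Destination is on the same side it came from: filter (drop) it.
        return set() if out_port == in_port else {out_port}


sw = LearningSwitch(["PL#1", "PL#2", "PL#3"])
ROUTER, LAPTOP = "router-mac", "laptop-mac"        # placeholder MACs
print(sw.receive("PL#2", LAPTOP, ROUTER, now=0))    # router not known yet: flooded
print(sw.receive("PL#1", ROUTER, LAPTOP, now=1))    # laptop was learned on PL#2
print(sw.receive("PL#2", LAPTOP, ROUTER, now=2))    # router was learned on PL#1
print(sw.receive("PL#2", LAPTOP, ROUTER, now=400))  # router entry aged out: flooded again
```

In this model a frame only goes to the "wrong" port if the destination MAC was last seen there (for example because of a loop or a device that moved), and it stays wrong until traffic from that MAC arrives on the right port or the entry ages out.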

Really, besides all the other devices hung off the switches/APs, there's not a lot more to the network. I do have VLANs set up though, which I'm sure adds complication. The switch is set for trunking on its "Main router" and "PL bridge" (aka PL#1) ports. The "PL router used as WAP" is also set for trunking on its built-in PL#2 port, and the "Router used as switch / WAP" is also set for trunking on its port going to PL#3. Those two (PL#2/PL#3) router/access points both run OpenWrt but have no firewall rules of their own, just a DHCP-client "LAN" interface plus a bunch of unmanaged VLAN bridge interfaces for their wired/wireless clients.
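With VLANs trunked across several devices, one easy-to-miss problem is a VLAN that is tagged on one end of a link but not the other. A small consistency check over a hand-written inventory, assuming the Powerline segment passes 802.1Q tags transparently, so that the switch port facing PL#1 and each access point's uplink can be treated as the two ends of one link. The port names and VLAN IDs below are invented.

```python
def trunk_mismatches(links, tagged):
    """List (vlan, end) pairs where a VLAN is tagged on only one end of a link.

    links:  list of (end_a, end_b), each end being a (device, port) tuple
    tagged: dict mapping (device, port) -> set of VLAN IDs tagged on that port
    The reported end is the one that HAS the VLAN; its peer is missing it.
    """
    problems = []
    for a, b in links:
        va, vb = tagged.get(a, set()), tagged.get(b, set())
        for vid in sorted(va ^ vb):  # symmetric difference: present on one side only
            problems.append((vid, a if vid in va else b))
    return problems


# Invented example inventory.
tagged = {
    ("switch", "to-router"): {1, 10, 20},
    ("main router", "lan"): {1, 10, 20},
    ("switch", "to-PL#1"): {1, 10, 20},
    ("PL#2 AP", "plc"): {1, 10},
    ("PL#3 AP", "uplink"): {1, 10, 20},
}
links = [
    (("switch", "to-router"), ("main router", "lan")),
    (("switch", "to-PL#1"), ("PL#2 AP", "plc")),
    (("switch", "to-PL#1"), ("PL#3 AP", "uplink")),
]
print(trunk_mismatches(links, tagged))
```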

In my mind the next step is to wait until the PL#2 network goes crummy again and see if any unexpected packets are going through the PL#3-attached access point. It seems unlikely, but it's still the only thing that explains it in my mind. But if there are more proactive ideas/experiments, then maybe I can solve this on "my own time" instead of dealing with it when it randomly interrupts "office hours" again.

What is the arrow between "WAN" and "me" on your diagram?

I don't see obvious loops in this diagram. I will say that a friend of mine had all sorts of issues with his powerline equipment, when I had him carefully re-organize it and then re-sync all the powerline adapters everything just worked fine from that point onwards.

Powerline stuff should always be plugged directly into the wall, never into power strips or extension cords or anything like that. (my friend didn't realize this, but probably you do).

Maybe just renegotiating everything has fixed your issue? It's tough to say unless it recurs.

Actually, breakers usually alternate legs as you move vertically down the stack on each side. This is so you can put in a double breaker for a 240V appliance: its two halves stack on top of each other, one on each leg, and the breaker can trip both poles at once.


Nice catch. The phases are interleaved. So even breakers on both sides would be L1. Odd breakers on both sides would be L2.
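Following the interleaving described above, mapping adapters to legs only needs the row each circuit's breaker sits in, counted from the top on either side. A sketch, assuming a typical North American panel with full-size single-pole breakers; tandem/half-size breakers and other panel layouts break the rule, so it is worth confirming with the breaker-off test mentioned earlier (and, as the next reply points out, PLC often couples across legs anyway). Legs are labelled "A"/"B" rather than L1/L2 because which one is called L1 is a naming convention; the rows below are hypothetical.

```python
def leg_for_row(row):
    """Leg ("A" or "B") of a full-size breaker in `row` (1 = top), on either side."""
    if row < 1:
        raise ValueError("rows are numbered from 1 at the top")
    return "A" if row % 2 == 1 else "B"


def same_leg(adapter_rows):
    """True if every adapter's circuit is on the same leg.

    adapter_rows: dict mapping adapter name -> breaker row of its circuit.
    """
    return len({leg_for_row(r) for r in adapter_rows.values()}) == 1


rows = {"PL#1": 3, "PL#2": 3, "PL#3": 6}  # hypothetical breaker rows
print({name: leg_for_row(r) for name, r in rows.items()})
print(same_leg(rows))
```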

Usually PLC works via capacitive coupling, even over different phases. This works as long as the cables are running in parallel for at least 50cm, which is almost always the case in any household.
There is no need for the signal to go through the transformer.


I am amazed that layer 1 even works, given all the intermittent or constant noise sources. I didn't know they used stray capacitance; that's pretty cool.

Quote:

However, the “natural” coupling is not consistent and the installation of coupling devices is more often than not recommended in poly-phase systems.

The common solution is the use of a capacitive coupler, typically installed close to the main electrical panel.

I suspect @natevw is having a gross mismatch of link speeds which is leading to unreliable operation.

Thanks again, all. Powerline went crummy again today and I tried to do some packet captures and such to see if there was any evidence for my original hypothesis:

I couldn't figure out how to do a complete packet capture on the PL#3-connected router, since it has two "physical" interfaces (eth0/eth1) going to its CPU and everything else seemed to be at the VLAN level which I didn't really want either. What I did was:

  • connect my laptop to one of the WiFi networks of the PL#3-connected switch/AP "router". This was a misguided attempt to get a "direct" connection to it, but since I was crossing firewall zones I think everything went through the main/actual router after all. General connectivity (e.g. to the internet) didn't seem nearly so bad, but I didn't particularly test that even though I should have!
  • from my laptop use Wireshark sshdump remote capture on first the eth0 and then the eth1 interfaces of the PL#3-connected "router"
  • also from my laptop, ssh over to the other access point, the PL#2 "router" that had really poor WAN (and probably LAN) connectivity and ping -A … to the main router from it.
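When repeating the `ping -A` test from the PL#2 router, summarising the saved output makes "good day" and "bad day" runs easier to compare. A sketch, assuming the usual reply formats (`seq=` from BusyBox ping as on OpenWrt, `icmp_seq=` from iputils) and a summary line like "N packets transmitted, M (packets) received"; the sample output and addresses are made up.

```python
import re
import statistics

REPLY = re.compile(r"\b(?:icmp_)?seq=\d+ .*?time=([\d.]+) ms")
SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def summarize_ping(output):
    """Loss and latency summary from saved ping output (BusyBox or iputils style)."""
    rtts = [float(t) for t in REPLY.findall(output)]
    m = SUMMARY.search(output)
    sent, received = (int(m.group(1)), int(m.group(2))) if m else (len(rtts), len(rtts))
    return {
        "sent": sent,
        "received": received,
        "loss_pct": 100.0 * (sent - received) / sent if sent else 0.0,
        "median_ms": statistics.median(rtts) if rtts else None,
        "max_ms": max(rtts) if rtts else None,
    }


SAMPLE = """PING 192.168.1.1 (192.168.1.1): 56 data bytes
64 bytes from 192.168.1.1: seq=0 ttl=64 time=2.101 ms
64 bytes from 192.168.1.1: seq=1 ttl=64 time=48.900 ms
64 bytes from 192.168.1.1: seq=3 ttl=64 time=3.250 ms

--- 192.168.1.1 ping statistics ---
4 packets transmitted, 3 packets received, 25% packet loss
round-trip min/avg/max = 2.101/18.084/48.900 ms
"""
print(summarize_ping(SAMPLE))
```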

I did not see any of my pings from the PL#2 router showing up or getting "routed through" the PL#3 router. (I did see a bunch of weirdly-addressed packets that clued me in to the fact that most of my firewall zones had wrong masquerade settings, but that's another story.)
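The by-eye check above can be made mechanical, which helps with long captures. A sketch, assuming the capture was exported from Wireshark via File > Export Packet Dissections > As CSV with the default columns (No., Time, Source, Destination, Protocol, Length, Info); the IP addresses are placeholders. A count of zero on the PL#3 router's interfaces is what "not routed through PL#3" would look like.

```python
import csv
import io


def pings_seen(csv_text, src_ip, dst_ip):
    """Count ICMP echo requests from src_ip to dst_ip in a Wireshark CSV export."""
    count = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        if (
            row.get("Source") == src_ip
            and row.get("Destination") == dst_ip
            and row.get("Protocol") == "ICMP"
            and "Echo (ping) request" in (row.get("Info") or "")
        ):
            count += 1
    return count


# Made-up export; 192.168.1.2 = PL#2 router, 192.168.1.1 = main router (placeholders).
CAPTURE = '''"No.","Time","Source","Destination","Protocol","Length","Info"
"1","0.000000","192.168.1.2","192.168.1.1","ICMP","98","Echo (ping) request  id=0x1a2b, seq=0/0, ttl=64"
"2","0.001200","192.168.1.1","192.168.1.2","ICMP","98","Echo (ping) reply    id=0x1a2b, seq=0/0, ttl=64"
"3","0.004000","192.168.1.50","192.168.1.1","DNS","74","Standard query 0x0001 A example.com"
'''
print(pings_seen(CAPTURE, "192.168.1.2", "192.168.1.1"))
```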

Finally, this time when I unplugged the PL#3 adapter nothing unexpected happened. The stuff connected to it of course became unreachable, but everything else seemed to be talking to each other just fine. No "ten minutes of nothing working" like in my original report.

So anyway, long story short: I still don't know, but it does point more towards the responses here (that one bad link in a Powerline network can throw off the good links too, in a general way) and away from the idea that the PL#3 node was somehow getting all traffic round-tripped through it.

Are all the same electrical devices (lights, chargers, TVs, etc) powered on and operational like the last time you've tested?

I ask because in my house, powerline speeds drop to 1/10th depending on whether certain LED lights are turned on or off. It's a 60-year-old house with 60-year-old-spec wiring.

During the day this usually means slow download at the endpoint inside the LAN (because someone is in the kitchen), and fast at night (because everything is turned off).

If you think electrical interference might be the cause (even RF interference can cause powerline adapters to adjust, according to my research), start a file transfer between one point and the other, then go around the house turning things on and off and check whether that improves or worsens the situation. It might be worth a shot the next time one of your network segments goes dark. I couldn't believe how much interference a cheap LED driver can cause.
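To make the on/off experiment easier to read, log the number of bytes transferred so far once per second while the copy runs (for example from the transfer tool's progress output; how you collect it is up to you), note when you flip each switch, and flag the seconds where throughput collapsed. A sketch; the function name and the 50% threshold are arbitrary choices, and the log below is made up.

```python
import statistics


def throughput_dips(samples, drop_ratio=0.5):
    """Flag intervals where throughput fell below drop_ratio x the median rate.

    samples: list of (time_s, total_bytes_so_far) in time order, e.g. logged
    once a second during a long file transfer.
    Returns a list of (start_s, end_s, mbps) for the slow intervals.
    """
    rates = [
        (t0, t1, (b1 - b0) * 8 / (t1 - t0) / 1e6)
        for (t0, b0), (t1, b1) in zip(samples, samples[1:])
    ]
    if not rates:
        return []
    baseline = statistics.median([mbps for _, _, mbps in rates])
    return [r for r in rates if r[2] < drop_ratio * baseline]


# Made-up log: steady ~40 Mbps, then a dip between t=2 s and t=3 s (say, a lamp went on).
log = [(0, 0), (1, 5_000_000), (2, 10_000_000), (3, 10_500_000), (4, 15_500_000)]
print(throughput_dips(log))
```

Using the run's own median as the baseline means the same check works whether the link is normally fast or slow.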


This is a great point. One day I was doing some work in my office and turned on an LED "shop light", and immediately my kids started complaining that their Netflix show was buffering and stuttering. After about half an hour of debugging the devices, I tried turning off the LED shop light and everything went back to working fine.
