Community Forum

Raspberry Pi WiFi Dongle IP reports "Host Down", But Not So - How to debug?

Frequent Visitor


Sometimes after a week or two of a “life segment”, my GoPiGo3/Raspberry Pi robot's WiFi dongle address stops responding:

ssh pi@X.0.0.X … pi@X.0.0.X Host is down

and when I connect with WiFi built-in address X.0.0.Y, everything looks fine in ifconfig - up, no RX or TX errors.

 

 

I’m wondering if the issue might be in my Xfinity WiFi router  (SCIENTIFIC ATLANTA DPC3941T), not the dongle.

 

Rebooting the Raspberry Pi, or unplugging and replugging the WiFi dongle, clears the problem, but how can I debug this to determine the cause?

 

pi@Carl:~/Carl $ packet_write_poll: Connection to X.0.0.X port 22: Host is down
Mac$ ssh pi@X.0.0.X
ssh: connect to host X.0.0.X port 22: Host is down
Mac$ ssh pi@X.0.0.Y
pi@X.0.0.Y's password: 
Linux Carl 4.14.98-v7+ #1200 SMP Tue Feb 12 20:27:48 GMT 2019 armv7l

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Sat Jan 18 00:44:42 2020 from X.0.0.Z
pi@Carl:~ $ ifconfig
eth0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether b8:...:86  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 3112  bytes 186720 (182.3 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 3112  bytes 186720 (182.3 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

wlan0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet X.0.0.X  netmask 255.255.255.0  broadcast X.0.0.255
        inet6 fe80::..1  prefixlen 64  scopeid 0x20<link>
        inet6 26..ff  prefixlen 128  scopeid 0x0<global>
        ether b4..b  txqueuelen 1000  (Ethernet)
        RX packets 2173  bytes 907969974 (865.9 MiB)
        RX errors 0  dropped 258  overruns 0  frame 0
        TX packets 3675  bytes 106031839 (101.1 MiB)
        TX errors 0  dropped 6 overruns 0  carrier 0  collisions 0

wlan1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet X.0.0.Y  netmask 255.255.255.0  broadcast X.0.0.255
        inet6 26...9a  prefixlen 128  scopeid 0x0<global>
        inet6 fe8...ce5  prefixlen 64  scopeid 0x20<link>
        inet6 26...aa  prefixlen 64  scopeid 0x0<global>
        ether b8:...d3  txqueuelen 1000  (Ethernet)
        RX packets 245090  bytes 26383152 (25.1 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 1873  bytes 403869 (394.4 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

pi@Carl:~ $ 

 

Regular Contributor

Re: Raspberry Pi WiFi Dongle IP reports "Host Down", But Not So - How to debug?

The message 'Host Down' doesn't actually mean that the host is down; it means that the host is not responding on that particular IP address.   

 

If I understand your post, wlan0 stops responding to connections but wlan1 is up.   Here's where I'd start:

 

  - dmesg  - look for messages mentioning wlan0

  - ip link show wlan0 - is the link BROADCAST, MULTICAST, UP, LOWER_UP ?

  - iwlist wlan0 scan - can the interface see all cells?

  - cat /proc/net/wireless - anything interesting?  

  - /var/log/messages - check for problems w/ wlan0
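Since the failure only shows up every few weeks, it may help to capture the whole checklist in one go while the interface is still in its broken state. Here is a minimal Python sketch of that idea. The function names and the 200-line tail of /var/log/messages are my own choices for illustration, not anything standard, and 'iwlist ... scan' may need sudo to do a full scan.

```python
import datetime
import subprocess


def checklist_commands(iface):
    """Return the checklist above as argv lists for one interface."""
    return [
        ["dmesg"],
        ["ip", "link", "show", iface],
        ["iwlist", iface, "scan"],
        ["cat", "/proc/net/wireless"],
        ["tail", "-n", "200", "/var/log/messages"],
    ]


def run_checklist(iface, runner=subprocess.run):
    """Run every command and return one text report.

    A command that is missing or times out is recorded in the report
    instead of aborting the remaining checks.
    """
    stamp = datetime.datetime.now().isoformat()
    lines = ["# snapshot for %s at %s" % (iface, stamp)]
    for argv in checklist_commands(iface):
        lines.append("$ " + " ".join(argv))
        try:
            result = runner(argv, capture_output=True, text=True, timeout=30)
            lines.append(result.stdout + result.stderr)
        except (OSError, subprocess.SubprocessError) as exc:
            lines.append("failed: %r" % (exc,))
    return "\n".join(lines)


# Usage on the Pi, once wlan0 has stopped responding:
#     print(run_checklist("wlan0"))
```

Saving one report while things work and one while they don't gives you two files to diff, which is usually easier than remembering what "normal" looked like.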

 

Removing/inserting the dongle would cause the interface to bounce. You may be able to accomplish the same thing with 'ifdown wlan0' followed by 'ifup wlan0'. If that doesn't work, try removing the kernel driver for the device and re-installing it (rmmod <driver>; modprobe <driver>). Note that insmod expects the path to the module file rather than the module name, so modprobe is usually the easier way to load it again.

 

I see you have IPv6 addresses on the interfaces.  Have you tried reaching interface via IPv6?

Frequent Visitor

Re: Raspberry Pi WiFi Dongle IP reports "Host Down", But Not So - How to debug?

Thanks - now I have some things to check next time (which is likely in the next week to few weeks).

 

I did not try the IPv6 addresses, just the other IPv4 address.  I will try the v6 address for the unreachable v4 address next time as well.

 

The only suspicious dmesg item (both wlan adapters are still working):

[86790.429641] TCP: request_sock_TCP: Possible SYN flooding on port 8888. Sending cookies.  Check SNMP counters.

which is my RPI-Monitor web port.  

Frequent Visitor

Re: Raspberry Pi WiFi Dongle IP reports "Host Down", But Not So - How to debug?


@cxrider wrote:

The message 'Host Down' doesn't actually mean that the host is down; it means that the host is not responding on that particular IP address.   

 

If I understand your post, wlan0 stops responding to connections but wlan1 is up.   Here's where I'd start:

 

  - dmesg  - look for messages mentioning wlan0

Nothing interesting - (last entry was the daily "rsyslogd was HUP'd" many hours before the WiFi disconnect occurred.)

 

  - ip link show wlan0 - is the link BROADCAST, MULTICAST, UP, LOWER_UP ?

yes, both wlan0 and wlan1

 

  - iwlist wlan0 scan - can the interface see all cells?

yes, both wlan0 and wlan1 see the router

 

  - cat /proc/net/wireless - anything interesting?  

Don't know what to look for:

pi@Carl:~ $ cat /proc/net/wireless
Inter-| sta-|   Quality        |   Discarded packets               | Missed | WE
 face | tus | link level noise |  nwid  crypt   frag  retry   misc | beacon | 22
 wlan0: 0000  100.  100.    0.       0      0      0      0      0        0
 wlan1: 0000   70.  -37.  -256       0      0      0      1      0        0

 

  - /var/log/messages - check for problems w/ wlan0

Nothing - (beside "HUP'd" entry)

 

I see you have IPv6 addresses on the interfaces.  Have you tried reaching interface via IPv6?


Not reachable by IPv6. The ifconfig output also shows an IPv6 difference from when it was working:

the two inet6 <global> entries are gone, leaving only the inet6 <link> entry for wlan0.

 

Bouncing it with sudo ifconfig wlan0 down followed by sudo ifconfig wlan0 up makes the

inet 10.0.0.X netmask 255.255.255.0 broadcast 10.0.0.255 line disappear,

along with all of the inet6 lines.

 

Pulling the WiFi dongle out and re-inserting it does restore the wlan0 link completely,

(but the quickly varying load also fooled the robot's smart charger into switching to trickle charging,

causing the robot to get off his dock early.  He will get thirsty and get back on soon.)

 

 

Regular Contributor

Re: Raspberry Pi WiFi Dongle IP reports "Host Down", But Not So - How to debug?

I think the device driver went out to lunch; that or there's a problem with the hardware.

 

The Quality section is odd. Specifically, the level should never be 100 if it is being reported in dBm, as it represents the received signal strength. Signals always become weaker as one gets further and further from the source, so this value should be negative (compare the -37 on wlan1). The fact that the noise level is 0 makes no sense either, but would be consistent with the bogus level of 100. Do you know what the device driver is for wlan0? Is it the same for wlan1?
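If you want to watch for this automatically, here is a rough Python sketch that parses /proc/net/wireless and flags any interface whose level is not negative. The sample text is the output posted earlier in this thread, re-aligned. The "level >= 0 is suspicious" rule is my assumption and only makes sense if the driver reports dBm; a driver that reports a relative value would trip it on every reading. The function names are made up for the example.

```python
SAMPLE = """\
Inter-| sta-|   Quality        |   Discarded packets               | Missed | WE
 face | tus | link level noise |  nwid  crypt   frag  retry   misc | beacon | 22
 wlan0: 0000  100.  100.    0.       0      0      0      0      0        0
 wlan1: 0000   70.  -37.  -256       0      0      0      1      0        0
"""


def parse_wireless(text):
    """Map interface name -> status, link, level and noise."""
    stats = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        name, rest = line.split(":", 1)
        fields = rest.split()
        if len(fields) < 4:
            continue
        # Values may carry a trailing "." which float() would accept,
        # but stripping it keeps the intent obvious.
        link, level, noise = (float(f.rstrip(".")) for f in fields[1:4])
        stats[name.strip()] = {
            "status": fields[0],
            "link": link,
            "level": level,
            "noise": noise,
        }
    return stats


def suspicious(stats):
    """Interfaces whose level is not negative (assumes dBm reporting)."""
    return sorted(name for name, s in stats.items() if s["level"] >= 0)


# On the Pi: suspicious(parse_wireless(open("/proc/net/wireless").read()))
```

Run against the sample, this flags wlan0 and leaves wlan1 alone, which matches what I was eyeballing above.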

Frequent Visitor

Re: Raspberry Pi WiFi Dongle IP reports "Host Down", But Not So - How to debug?


@cxrider wrote:

I think the device driver went out to lunch; that or there's a problem with the hardware.

 

The Quality section is odd. Specifically, the level should never be 100 if it is being reported in dBm, as it represents the received signal strength. Signals always become weaker as one gets further and further from the source, so this value should be negative (compare the -37 on wlan1). The fact that the noise level is 0 makes no sense either, but would be consistent with the bogus level of 100.

 

Do you know what the device driver is for wlan0?    Is it the same for wlan1?


The wlan0 and wlan1 assignment changes between the USB WiFi dongle and the Raspberry Pi's onboard WiFi, and I am guessing the driver for the onboard WiFi is different from the one for the dongle.

 

 

Feb 20 19:03:16 Carl kernel: [257053.510345] usb 1-1.4: new high-speed USB device number 6 using dwc_otg

Feb 20 19:03:16 Carl kernel: [257053.641746] usb 1-1.4: New USB device found, idVendor=050d, idProduct=2103

Feb 20 19:03:16 Carl kernel: [257053.641762] usb 1-1.4: New USB device strings: Mfr=1, Product=2, SerialNumber=3

Feb 20 19:03:16 Carl kernel: [257053.641771] usb 1-1.4: Product: Belkin Wireless Adapter

Feb 20 19:03:16 Carl kernel: [257053.641779] usb 1-1.4: Manufacturer: Realtek

 

I am not certain, but I believe I may have seen the onboard WiFi stop responding in the past sometimes also.  I tend to use the Dongle WiFi most of the time.

Regular Contributor

Re: Raspberry Pi WiFi Dongle IP reports "Host Down", But Not So - How to debug?

I trust you have not tried removing the device driver (rmmod <driver_name>) and adding it back (modprobe <driver_name>, or insmod with the path to the module file) in the kernel, correct? Removing/inserting the device driver *should* initialize the device as if you had rebooted the system. You can find the device driver using lsmod. You may have to down the interface first and up it after module insertion. Check the output from /proc/net/wireless again and see if the output looks more 'normal'.

 

I know that under RHEL network devices used to be enumerated fastest device first, then all other devices. I suspect this is true for Debian as well, though I do not know it to be true. You can control the enumeration of devices using udev rules: you create a file in the /etc/udev/rules.d directory that maps a specific Ethernet MAC address to a device name. Check Google for specifics. I know RHEL changed the syntax of the file between RHEL 5 and RHEL 6 (or maybe it was between RHEL 6 and 7), so who knows how Debian is going to handle it.
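To give a feel for what such a rule looks like, here is a small Python sketch that builds one rule line per adapter. The rule text follows the older persistent-net style (match on ATTR{address}, assign NAME); as I said, I can't promise your release accepts exactly this form, so treat it as a starting point. The MAC addresses and the names wifi_onboard / wifi_usb are placeholders I made up; substitute the real MACs from ifconfig.

```python
import re

# Six colon-separated hex pairs, lower case.
MAC_RE = re.compile(r"^[0-9a-f]{2}(:[0-9a-f]{2}){5}$")


def udev_net_rule(mac, name):
    """Build one udev rule line that pins an interface name to a MAC."""
    mac = mac.lower()
    if not MAC_RE.match(mac):
        raise ValueError("not a MAC address: %r" % (mac,))
    return (
        'SUBSYSTEM=="net", ACTION=="add", '
        'ATTR{address}=="%s", NAME="%s"' % (mac, name)
    )


# Placeholder MACs (hypothetical), one rule per adapter:
rules = [
    udev_net_rule("02:00:00:AA:BB:01", "wifi_onboard"),
    udev_net_rule("02:00:00:AA:BB:02", "wifi_usb"),
]
# print("\n".join(rules)) and paste the result into a .rules file
# under /etc/udev/rules.d, then reboot to see whether the names stick.
```

Using names outside the wlan0/wlan1 pattern avoids fighting the kernel's own numbering, though anything that hard-codes wlan0 would then need updating.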

 

I'm curious, why two interfaces on the same network?    Generally that causes more problems than it solves and I'm starting to wonder about two radios in very close proximity to each other using the same channel for communication with the AP.     I am assuming that you are using the same AP to reach both radios, maybe you are using different channels and different APs....   Are you managing your ARP tables or is it just the wild wild west?    What about routing tables?  

Frequent Visitor

Re: Raspberry Pi WiFi Dongle IP reports "Host Down", But Not So - How to debug?


@cxrider wrote:

I trust you have not tried removing the device driver (rmmod <driver_name>) and adding it back (modprobe <driver_name>, or insmod with the path to the module file) in the kernel, correct?

No, I have not played with removing and re-adding wifi drivers. This issue hits about once a month or so and because the bot has a backup WiFi connection, I have just been safely rebooting the robot.

 

 You can control the enumeration of devices using udev rules.    

Thanks. I'll read up on that. Most of the time it does not matter which device is 0 or 1. It sometimes matters when I am doing system updates, which always use wlan0 and are more reliable on the USB dongle than on the onboard interface.

 

I'm curious, why two interfaces on the same network?   


This "host unreachable" problem began several years ago, and several versions of the Raspberry Pi ago, with the on-board WiFi. I found having a second interface (dongle WiFi) as an easy work-around. Years later and still no root problem diagnosis. There is one other WiFi mystery - often, (hourly or sometimes more), the stdout printing in the ssh shell over the WiFi will pause for one to several seconds, but this also occurred when the bot only had the single (onboard) WiFi interface.

 

Are you managing your ARP tables or is it just the wild wild west?    

 

What about routing tables?  


 

I must admit I don't know what an ARP or routing table is. I have dedicated IP addresses in the Xfinity router, and I use DHCP on the bot, and WPA-PSK on the interface. The onboard WiFi always gets a dedicated IP based on its MAC and the WiFi dongle always gets a dedicated IP based on its MAC.

 

The router is showing eight devices with assigned IPs connected on the 2.4 GHz WiFi.

 

Every house on the block has an Xfinity router, and each is serving a 2.4 GHz home network, a 2.4 GHz set-top box network, and a 2.4 GHz "xfinitywifi" public network (in addition to the 5 GHz triples).  The scan shows 8 cells of varying signal strengths.

Regular Contributor

Re: Raspberry Pi WiFi Dongle IP reports "Host Down", But Not So - How to debug?

What Debian release are you running? I'm running stretch. I have a half dozen Pi Zero Ws, a Pi 2B+, a Pi 3B+, and a few Orange Pi systems (Chinese version of a Raspberry Pi... kinda cool honestly), and I never have that problem with my Pis, though the 2B, 3B, and one Orange Pi are actually wired and not WiFi attached.

 

ARP - Address Resolution Protocol. An Ethernet packet is either switched (source and destination are on the same LAN) or routed (source and destination are on different LANs). Switching is based on MAC addresses and routing is based on IP addresses. The ARP table maps a MAC address to an IPv4 address and the NIC on which that MAC/IPv4 pair has been observed. It is only for systems on the same LAN, so you should never see anything outside your network in an ARP table. When sending a packet out, the kernel asks these questions:

  - Is the destination IP address on the same LAN?

  - If yes, do I know the MAC address of that destination IP address?

 If yes to the second question the kernel fills the ethernet frame fields accordingly and pushes the packet out the NIC which last reported seeing that MAC address.   If the answer is no, there is a mechanism to find that MAC, populate the ARP table, and continue on.    This mechanism is called an ARP request and it basically asks everyone on the LAN to please reply with the MAC address for the system that has the IP address in question.  It is a broadcast packet so all systems see it.
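In case it helps to see those two questions as logic, here is a toy Python model of the decision. It is only an illustration of the description above, not how the kernel is actually written. The function name is made up, and the addresses come from the 192.0.2.0/24 documentation range rather than from your network.

```python
import ipaddress


def plan_delivery(dst_ip, local_net, arp_table):
    """Answer the two questions for one outgoing packet.

    arp_table maps an IPv4 string to a (mac, nic) tuple.
    """
    dst = ipaddress.ip_address(dst_ip)
    # Question 1: is the destination on the same LAN?
    if dst not in ipaddress.ip_network(local_net):
        return "route via gateway"
    # Question 2: do we already know the MAC for that IP?
    entry = arp_table.get(dst_ip)
    if entry is None:
        return "broadcast ARP request, then send"
    mac, nic = entry
    return "send to %s via %s" % (mac, nic)


# Hypothetical table: one known neighbour, last seen on wlan0.
table = {"192.0.2.10": ("02:00:00:aa:bb:10", "wlan0")}
```

With that table, 192.0.2.10 goes straight out wlan0, 192.0.2.20 triggers an ARP request first, and anything outside 192.0.2.0/24 is handed to the router.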

 

You may see what is in your arp table with the command 'arp -a'.    Entries have a time to live after which they are flushed from the table.   Check out the man page on arp as there are a lot of things you can do to manage your tables.
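The same table is also readable as text from /proc/net/arp, which is handy if you want to log it over time. Below is a small Python sketch that parses it and picks out entries the kernel has not managed to resolve (the 0x2 "complete" flag is clear). The sample rows are invented for the example, using documentation addresses, and the function names are mine.

```python
SAMPLE_ARP = """\
IP address       HW type     Flags       HW address            Mask     Device
192.0.2.1        0x1         0x2         02:00:00:aa:bb:01     *        wlan0
192.0.2.1        0x1         0x2         02:00:00:aa:bb:01     *        wlan1
192.0.2.50       0x1         0x0         00:00:00:00:00:00     *        wlan0
"""


def parse_arp(text):
    """Parse /proc/net/arp style text into a list of dicts."""
    entries = []
    for line in text.splitlines()[1:]:  # skip the header line
        parts = line.split()
        if len(parts) != 6:
            continue
        ip, _hw_type, flags, mac, _mask, device = parts
        entries.append({
            "ip": ip,
            "mac": mac,
            "device": device,
            # Bit 0x2 is set once the MAC has been resolved.
            "complete": bool(int(flags, 16) & 0x2),
        })
    return entries


def incomplete(entries):
    """(ip, device) pairs that never got a MAC address."""
    return [(e["ip"], e["device"]) for e in entries if not e["complete"]]


# On the Pi: incomplete(parse_arp(open("/proc/net/arp").read()))
```

In the sample, the router shows up once per adapter, which is what you would expect with two NICs on one LAN, and 192.0.2.50 is the one unresolved neighbour.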

 

Here's where you get a mess.

 

Linux by default allows ARP requests to be broadcast out all configured interfaces, even if that puts a request on the *wrong* VLAN. And by default, Linux will respond to an ARP request on any configured interface. What you can end up with is the MAC address/IP address for a remote system associated with a NIC that physically cannot reach the remote system. This happens if the systems are wired to separate VLANs. In your case you have everything on the same VLAN, but that can still cause problems. Here's a link to a brief description of the problem and how to fix it. It's a fairly trivial fix and you will want to make it persistent across reboots of the node.
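For reference, the usual knobs for this are the per-interface arp_ignore and arp_announce settings under /proc/sys/net/ipv4/conf. The Python sketch below just reads them and reports where they differ from the values commonly suggested for multi-homed hosts (arp_ignore=1, arp_announce=2). Those target values are my assumption about what the fix looks like, so check them against the kernel documentation before changing anything. The function names are made up.

```python
import os

# Commonly suggested values for hosts with two NICs on one LAN.
# Assumption for illustration; the kernel default for both is 0.
RECOMMENDED = {"arp_ignore": "1", "arp_announce": "2"}


def arp_settings(iface, root="/proc/sys/net/ipv4/conf"):
    """Read the current ARP sysctls for one interface.

    A value of None means the file could not be read.
    """
    values = {}
    for key in RECOMMENDED:
        path = os.path.join(root, iface, key)
        try:
            with open(path) as handle:
                values[key] = handle.read().strip()
        except OSError:
            values[key] = None
    return values


def differences(values):
    """Map setting -> (current, suggested) where the two differ."""
    return {
        key: (values.get(key), want)
        for key, want in RECOMMENDED.items()
        if values.get(key) != want
    }


# On the Pi: print(differences(arp_settings("wlan0")))
```

This only reads the settings. Making a change persistent would normally go through sysctl configuration, which I have left out here.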

 

Routing.   Routing is how a packet gets to a remote system not on the same VLAN as the source.   Routers move packets between VLANs.    If you have a single NIC on a system then things are pretty simple: all packets go out the single NIC. But if you have more than one NIC, and let's talk about the case where the NICs are on different networks, then there are decisions to be made.    Assume you have a packet for a machine on network C and you have one NIC on network A and one on network B, where all are different networks.   Which NIC do you use?    The answer is provided by the routing tables.

 

You can see your routing table with 'ip route show'.   The ip command has lots of other capabilities so be sure to read the man page for it.   
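To show how the table answers the "which NIC?" question, here is a toy Python version of the lookup: the most specific matching prefix wins, and among equally specific routes the lower metric wins. This is a simplified model, not the kernel's actual lookup code. The routes and metrics below are invented for the example and use documentation addresses.

```python
import ipaddress


def pick_route(dst_ip, routes):
    """Choose a route for dst_ip.

    routes is a list of (prefix, device, metric) tuples. The longest
    matching prefix wins; ties go to the lowest metric.
    """
    dst = ipaddress.ip_address(dst_ip)
    candidates = [r for r in routes if dst in ipaddress.ip_network(r[0])]
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda r: (ipaddress.ip_network(r[0]).prefixlen, -r[2]),
    )


# Hypothetical table for a host with two adapters on the same LAN.
routes = [
    ("0.0.0.0/0", "wlan0", 303),
    ("0.0.0.0/0", "wlan1", 304),
    ("192.0.2.0/24", "wlan0", 303),
    ("192.0.2.0/24", "wlan1", 304),
]
```

Notice that with both adapters on the same network, every outgoing packet in this model leaves via wlan0, including replies to traffic that arrived on wlan1. That asymmetry is one reason two NICs on one LAN can behave strangely.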

 

What problem are you trying to solve by using two WiFi adapters on the same VLAN?  The outage where an adapter does not respond?   What you are doing *should* work, especially if you tune the kernel to behave properly in a dual-homed network, but honestly, I would be surprised to learn that it has been well tested by a large number of users.    If all you want to do is ensure that you can reach the remote system via the WiFi, I would look into bonding the WiFi adapters in a high availability mode.    You will not be able to build an aggregate (mode=4, I think), so I'd stick to active-backup (mode=1).