I'm into week two of trying to help a residential customer with a problem. I'll try to keep the hyperbole to a minimum.
The basic situation:
Whenever the Arris modem is restarted (for any reason), the modem hands my router a 192.168.100.20 address during Arris' bootup. (192.168.100.1 is the web UI of the Arris.) If I force my router to renew its WAN (Internet) IP, then the router gets a proper Public IP address. However, the router retains this private IP indefinitely until I force the router to renew.
A little more technical detail:
I work in enterprise support and have been in extensive contact with the router vendor. (It's a Fortigate device, for what it's worth.) We've sniffed the packets between the Arris and the router. They show, for certain, that after the router broadcasts a DHCP Discover, the Arris responds with a 192.168.100.20 DHCP Offer. The router sends a Request for the offered address, the Arris Acknowledges it, and the router applies the IP, subnet, etc. to its WAN interface.
I am still working with the router vendor because, well, this could go either way. I found that if I remove the Ethernet cable between the Arris and the router, the router asks for an IP again and gets a public IP. I've seen similar behavior on my own modem at home with a DDWRT, but I don't think this is an apples-to-apples comparison.
Still, the packets ain't lyin'! :-) The MAC addresses match up, and the initial offer from the Arris is (in hex)
1464A8C0, which translates to 192.168.100.20 (in reverse byte order). Happy to post more if it will help.
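For anyone who wants to check that conversion, here is a minimal sketch in Python. It assumes the four bytes are simply shown in reverse order, as described above; the function name is made up for illustration.

```python
import ipaddress


def decode_reversed_hex_ip(hex_str: str) -> str:
    """Decode an IPv4 address whose four bytes are displayed in reverse order."""
    raw = bytes.fromhex(hex_str)
    if len(raw) != 4:
        raise ValueError("expected exactly 4 bytes (8 hex digits)")
    # Reverse the byte order, then let ipaddress format the dotted quad.
    return str(ipaddress.IPv4Address(raw[::-1]))


print(decode_reversed_hex_ip("1464A8C0"))  # prints 192.168.100.20
```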
I'm coming here because I haven't found the magic words to help Tier 1 understand that this is not something that will be fixed with a cable modem or router reboot, that it can happen at any time and knocks me offline indefinitely.
Haven't played with a sniffer in a long time but, empirically, if you bypass the router and connect a computer directly to the modem and the computer pulls a public IP reliably, replace the router. If you get the same story, replace the modem.
Of course, if you are really hardcore, you could bring a router from home and put it in place and compare the sniffer results which would again point either to the Router or the modem.
Oops, Eg's right. Your modem will hand out a private IP if it loses sync with Comcast, in order to keep your local network up and running. So the private IPs indicate you are losing connection, which can be caused by a bad modem, bad cabling, or high impedance in the connection path.
How many splitters do you have between the walljack and the back of the modem?
Can you bypass the splitters and connect the modem directly to the wall jack just for test?
Actually we need some more information in order to be able to see how to assist you
If you would, please CLICK HERE, which will browse into your modem. From there you can navigate to the page that shows your signal levels. If you would copy that information and paste it back here in your thread, we will try to help you. Note: We are interested in your downstream power level and SNR as well as your upstream power.
Two additional links with step-by-step processes for troubleshooting and data gathering
Troubleshooting Suggestions for Connection-Related Issues
For now, here's the specific downstream data you requested:
              DCID  Freq        Power       SNR       Modulation  Octets    Correcteds  Uncorrectables
Downstream 1  210   609.00 MHz  -3.46 dBmV  35.60 dB  256QAM      52850039  284         0
Downstream 2  208   597.00 MHz  -2.90 dBmV  35.42 dB  256QAM      20894051  327         0
Downstream 3  209   603.00 MHz  -2.71 dBmV  35.78 dB  256QAM      20670508  269         0
UCID Freq Power Channel Type Symbol Rate Modulation
Upstream 1 23.70 MHz 45.75 dBmV DOCSIS2.0 (ATDMA) 5120 kSym/s 16QAM
There are no splitters. This modem is a replacement. I replaced it last week. The old modem had the same behavior.
I'll have to read the other links a bit later tonight.
But here's what I feel I need to reiterate. The _service,_ in general, is fine. The problem is that anytime the cable modem loses sync, my router gets that private IP on its WAN interface and we're disconnected from the Internet. Even though the modem may regain sync and have a real public IP, my router doesn't get it. I have to pull the Ethernet cable between modem and router to get my router to renew the WAN IP.
If there is a loss of signal it's very intermittent and probably in-line with what anyone could expect. I've been testing by rebooting the modem because I can consistently make the issue happen. My goal is to allow the router to survive a rebooted cable modem or a loss of sync.
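As a stopgap idea, the "stuck" condition is easy to detect from the outside. A rough sketch, assuming the modem's temporary range is 192.168.100.0/24 (inferred from the 192.168.100.20 lease and the 192.168.100.1 web UI); the function name and the 203.0.113.7 address are placeholders, not anything from the router or Comcast.

```python
import ipaddress

# Assumption: the modem's temporary range is 192.168.100.0/24, inferred from
# the 192.168.100.20 lease and the 192.168.100.1 web UI seen in this thread.
MODEM_FALLBACK_NET = ipaddress.ip_network("192.168.100.0/24")


def wan_needs_renew(wan_ip: str) -> bool:
    """Return True if the WAN address looks like the modem's temporary lease."""
    return ipaddress.ip_address(wan_ip) in MODEM_FALLBACK_NET


# 203.0.113.7 is a documentation-only address standing in for a public IP.
for ip in ("192.168.100.20", "203.0.113.7"):
    status = "renew needed" if wan_needs_renew(ip) else "looks fine"
    print(ip, "->", status)
```

How the WAN address gets read off the router (SNMP, CLI scrape, whatever) is left out on purpose, since that depends on the device.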
I may be showing too many of my cards here, but here is my current theory, all based on the packets I've captured. In short, I believe the modem is acting as designed and there is an issue in the router's firmware.
When the cable modem has lost sync on the Cable i/f side, the Arris hands out a 192.168.100.x address to any DHCP client requesting it. Indeed, the packet capture shows that my router immediately releases the Comcast Public IP from its WAN interface within seconds of the modem rebooting. Then the router starts DHCP Discover.
At this point, the Arris is mainly up but is not fully synced with Comcast. It sees the Discovers and Offers the 192.168.100.20 lease.
But the difference is that this lease is very, very short, perhaps 20 seconds. The router is supposed to honor that lease and, after expiration, ask for a new IP address. If everything has gone okay on the Arris side, then the router should go through DHCP DORA again. I don't think the router does, _or_ the router is specifically requesting the same 192.168.100.20 again. I am unsure. If it did go through DORA again, I'd get a good public IP.
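For what it's worth, the timing side of this can be worked out on paper. RFC 2131 gives default renewal (T1) and rebinding (T2) timers of 0.5 and 0.875 of the lease, unless the server overrides them with options 58 and 59. A rough sketch, assuming the defaults apply and the lease really is about 20 seconds:

```python
def lease_timers(lease_seconds: float) -> dict:
    """Compute default DHCP client timers (RFC 2131 defaults, no overrides)."""
    return {
        "renew_t1": lease_seconds * 0.5,     # client starts unicast renewal
        "rebind_t2": lease_seconds * 0.875,  # client falls back to broadcast
        "expire": float(lease_seconds),      # client must drop the address
    }


# Assumed lease length of 20 seconds, taken from the captures described above.
for name, seconds in lease_timers(20).items():
    print(f"{name}: {seconds:.1f} s after the lease starts")
```

So with a 20 second lease, a well-behaved client should be asking again within about 10 seconds, which is what I'd want to see in the router's logs.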
But I haven't been able to speak to anyone who can confirm or deny my theory on the Comcast side. One Level 2 person said that this modem model doesn't have _any_ private IPs bound to _any_ interfaces and that there's no way I could get this from the modem. Again the captures don't lie! :-) Plus the image I attached above shows data from the modem and proves that 192.168.100.1 is bound to the LAN i/f in the Arris.
If I can go to the router vendor and say, "Look, Comcast told me that this private IP behavior is by design and I believe them based upon my captures and other information I've found about Arris modems. So, show me in the logs that you're asking for a new IP after the short lease expiration." If I can tell them that, then the router vendor has something to go on.
On the other hand, if the Arris is not supposed to be offering 192.168.100.x, then that's good information, too.
I am going to test this with my DDWRT tonight just to see what happens. Hopefully I'll be able to see what kind of lease time comes with that initial private IP offering and then see my DDWRT get a public IP after lease expiration. According to my packet caps, it's a 20 second lease.
Good question. No, there isn't. Fwiw, it's a well respected vendor and I feel pretty confident they'll fix this problem... if it's their problem to fix.
After my test last night at home, I am more convinced than ever that it's something the firewall vendor will need to fix.
I performed the same test at home that breaks Internet access for my client. I rebooted the cable modem leaving the router up to see if it would cope.
Regrettably, I do not have an Arris modem. I have an older Ambit DOCSIS 2. Still, the behavior was what I expected. When the modem was up but had no sync, my router received a dynamic private IP in the 192.168.100.x range with tiny 40 second leases. With each expiration, the router received either another private IP or, after the sync light glowed solid, a public IP.
To my mind, this private and public swap is normal and the firewall is not handling it properly.
I had a conversation with the vendor this morning. I've gotten all the proper assurances and they'll do a review of the packets and the firmware. The contact brought up the possibility of an "incompatibility" which has me a little unnerved since that would be an easy out. We're a partner and give them a lot of business. Hopefully that will mean something to them!
The vendor is still researching the issue but I have a bit more information I can offer here.
We ran some special hardware diagnostics on the unit and they all pass. So, memory, filesystem, etc., seems okay.
But they had me load a pretty old firmware version. It's not THAT old, but it predates a major point release. This particular firmware version does _not_ seem to have the same problem with DHCP and WAN connectivity.
So, it continues to look like a problem in the vendor's firmware...
So the router gets a private IP and a short lease. When the lease runs out, the router is supposed to kick out a request for a new IP. Nothing happens and you are stuck with the private IP.
If you disconnect and reconnect the Ethernet cable, that is a big enough jolt to trigger a new request, and the new IP gets delivered.
Usually a router will have a link that you can browse to and click on to do a Release and Renew, which will release the private IP and initiate a new request. If you can click it, it might give you some additional information to provide to the router folks.
I can't say exactly what the router is looking for to move past Connecting. This is something I've brought up with them.
I have swapped out the router with a Linksys unit and a different router from the same vendor. The Linksys performs fine. No troubles. The other router performs okay, too, but that unit is rather old and can't run a lot of the latest firmware.
The apples-to-apples comparison I think you're wondering about -- swapping out with another of the same model -- I have not been able to do. They wouldn't send a second one out to me unless I can demonstrate some sort of fault with the current unit. They have a special firmware that runs a number of diagnostic tests and I have run that recently. The only warning to come out of that test was the failure to find the USB i/f, but this particular unit does not have a USB port.
If I had the time, energy, and smarts, I'd love to create some sort of small DHCP utility in which I can set individual options and then blast these tiny leases at the unit to see what it does. At this point, however, it looks less and less likely that I'll be able to do that. The unit seems stable on this older firmware and it can survive the test I have been using to force the condition witnessed on the newer firmware. There's a good chance my days of flashing and testing are over. :-( I'd have to buy my own unit to continue testing.
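For the record, the packet-building half of such a utility is not much code. Here is a minimal sketch in Python that only builds the UDP payload of a DHCP reply with a chosen lease time (option 51); it does not send anything. The xid, MAC address, and function name are made up for illustration, and actually serving leases would need a socket on UDP port 67 plus a lot more protocol handling than is shown here.

```python
import socket
import struct

DHCP_MAGIC_COOKIE = b"\x63\x82\x53\x63"
DHCPOFFER, DHCPACK = 2, 5


def build_dhcp_reply(xid, client_mac, your_ip, server_ip, lease_seconds,
                     msg_type=DHCPOFFER):
    """Build the UDP payload of a BOOTP/DHCP reply with a chosen lease time."""
    header = struct.pack(
        "!BBBBIHH4s4s4s4s16s64s128s",
        2,                            # op: BOOTREPLY
        1,                            # htype: Ethernet
        6,                            # hlen: MAC address length
        0,                            # hops
        xid,                          # transaction ID copied from the client
        0,                            # secs
        0,                            # flags
        b"\x00" * 4,                  # ciaddr
        socket.inet_aton(your_ip),    # yiaddr: address being offered
        socket.inet_aton(server_ip),  # siaddr
        b"\x00" * 4,                  # giaddr
        client_mac,                   # chaddr (struct pads it to 16 bytes)
        b"",                          # sname (padded to 64 bytes)
        b"",                          # file (padded to 128 bytes)
    )
    options = b"".join([
        bytes([53, 1, msg_type]),                           # message type
        bytes([54, 4]) + socket.inet_aton(server_ip),       # server identifier
        bytes([51, 4]) + struct.pack("!I", lease_seconds),  # lease time
        bytes([255]),                                       # end of options
    ])
    return header + DHCP_MAGIC_COOKIE + options


# Hypothetical values: a made-up xid and MAC, with a 20 second lease.
payload = build_dhcp_reply(0x12345678, bytes.fromhex("001122334455"),
                           "192.168.100.20", "192.168.100.1", 20)
print(len(payload), "bytes; options:", payload[240:].hex())
```

The point of building it by hand is that nothing stops you from putting 20 (or 5) in the lease field, which the stock servers I tried would not allow.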
Yeah, these are things I could do. Admittedly I'm running into a couple of things that are preventing that from happening.
First, while the customer has given us a long leash with which to sort out the problem, it's still a leash. Since I've been able to stabilize the situation by loading the older firmware, there is less impetus for me to go down to run tests. tl;dr she's not going to pay for all of this.
Second, the vendor is (thus far) not interested in sending a replacement unit unless I can prove defect. I think that, in general, I've proven material difference between the two firmware versions, but that may not be enough. I'll float the option and see...
The latest word from the vendor is that they've not been able to reproduce the problem in the lab other than they have found a problem with the newer firmware versions using up too much memory in these lower-end units. They've kind of looked at my "success" running the older firmware and say that I should just stay there.
I attempted to find my own DHCP server with which to test my theories, and one of the issues I ran into early was that no DHCP server I could find would allow me to set leases lower than one minute. I asked the vendor how they were getting around that and they responded that they weren't able to. So, the bottom line is they've not really been able to mimic the same conditions in their lab, but they're less interested in pursuing it since the older firmware seems okay and they're now saying the newer firmware is not appropriate for this unit.
Just about any Linux distribution should have a DHCP server that allows setting lease times in seconds, though their various live CDs don't generally include that server. Much to my surprise, though, my favorite rescue CD from www.sysresccd.org does include a DHCP server. Put SystemRescueCd on a CD or USB stick and most any laptop could serve as a DHCP server for your test. If you've never used Linux before, though, the learning curve for getting that all set up might look more like climbing a cliff.
Another Linux-based live CD that includes a DHCP server is BackTrack Linux from www.backtrack-linux.org. BackTrack is a set of tools handy for breaking into other peop, excuse me, digital forensics use and penetration testing of your own network. Learning curve issues would be about the same as with SystemRescueCd.
BTW, the BackTrack CD also includes Wireshark (SystemRescueCd does not), so you'd be able to monitor the packet traffic right there.
Wow! Thanks for the contribution. I had asked on ServerFault for some suggestions and someone came back and suggested the dhcp3 server. I got it installed and configured in Ubuntu. I like your idea better. A little less muss/fuss.
Either way, our testing days may be over. I don't have access to the unit unless something else goes wrong with it. I'd have to buy my own unit for testing. They're not cheap, so I don't think that's going to happen anytime soon.
The vendor has decided to send the issue to QA. I think they're still going to check it, but I am unsure. I am going to suggest to them these options.
It occurs to me that reproducing the series of events the router sees when connected to your modem isn't going to be easy.
1. Short lease reaches first renewal time. Router sends renewal DHCPREQUEST to 192.168.100.1 and gets no response (modem's DHCP server is no longer active) or receives DHCPNAK.
2. If no response in (1), router retries the DHCPREQUEST several times, until the current lease expires.
3. Router now starts over and broadcasts a DHCPDISCOVER.
4. Comcast server now responds with DHCPOFFER.
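The steps above can be written down as a tiny lookup of what the router ought to send, which gives you something concrete to compare a Wireshark capture against. This is only a sketch of the sequence as described in this post, not a full DHCP client state machine; the function name and the outcome labels are made up.

```python
def expected_client_messages(renewal_outcome: str) -> list:
    """List what a client should send once a short lease reaches renewal.

    renewal_outcome is what the original server (the modem) does with the
    renewal request: "ack", "nak", or "silence".
    """
    messages = ["DHCPREQUEST (renewal, sent to 192.168.100.1)"]
    if renewal_outcome == "ack":
        return messages  # lease extended, nothing else to send
    if renewal_outcome == "silence":
        messages.append("DHCPREQUEST (retries until the lease expires)")
    elif renewal_outcome != "nak":
        raise ValueError("outcome must be 'ack', 'nak', or 'silence'")
    # After a NAK or an expired lease, the client starts over.
    messages.append("DHCPDISCOVER (broadcast, start over)")
    return messages


for outcome in ("silence", "nak"):
    print(outcome, "->", expected_client_messages(outcome))
```

If the capture from the failing firmware never shows that final DHCPDISCOVER, that is the piece to hand to the vendor.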
Emulating that "original server doesn't respond; new server responds to DHCPDISCOVER" would take a bit more work, but it might be instructive to see if the router does restart the protocol with DHCPDISCOVER. But, to do that it would be easier just to put an ethernet hub between the modem and router so that you could use wireshark to monitor the interaction.
2. Load up the firmware on the router that we know is failing.
3. Put a hub between the router and modem.
4. Force the condition and grab the traffic via Wireshark.
5. Put a Linksys router in place of my current one.
6. Attempt to force the condition and capture this data via Wireshark. (This would be to see the difference in how the Linksys behaves vs. the Fortigate.)
From there, I'd study the caps looking for differences and put the working firmware back on the Fortigate.
If I had my own unit, I'd probably play with your CD tools and/or tcprewrite to try and replay the sequence to the unit and see how it behaves. My first task would be to see what happens when the unit gets tiny leases. I think the problem is centered there.
(I was just presented with a login name prompt and I chose something other than blitzdes.)
I wanted to alert everyone to the outcome.
The vendor could not reproduce the problem in their lab. I think this is simply due to the fact that they don't have the equipment. They don't have an Arris Modem and couldn't do an apples-to-apples comparison.
They sent the issue over to their developers. One of them replied and showed, from a piece of the DHCP capture, that the Arris was inappropriately sending a different "xid" in response to a DHCP Request. In other words, the Arris changes its "xid" between the R and the A of the DHCP DORA process. Because the xid is different, the router ignores the response from the Arris.
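For anyone checking their own captures for the same thing: the xid is the 32-bit field at byte offset 4 of the DHCP payload (right after op, htype, hlen, and hops), so comparing it between a request and its reply is simple. A minimal sketch in Python, using made-up payload prefixes rather than bytes from my capture; the function names are hypothetical.

```python
import struct


def dhcp_xid(payload: bytes) -> int:
    """Return the 32-bit transaction ID from a BOOTP/DHCP UDP payload."""
    if len(payload) < 8:
        raise ValueError("payload too short to contain an xid")
    # xid sits at byte offset 4, after op, htype, hlen, and hops.
    (xid,) = struct.unpack_from("!I", payload, 4)
    return xid


def reply_matches_request(request: bytes, reply: bytes) -> bool:
    """True if the reply carries the same transaction ID as the request."""
    return dhcp_xid(request) == dhcp_xid(reply)


# Made-up payload prefixes for illustration only (not from the real capture).
request = bytes([1, 1, 6, 0]) + struct.pack("!I", 0xDEADBEEF)
good_ack = bytes([2, 1, 6, 0]) + struct.pack("!I", 0xDEADBEEF)
bad_ack = bytes([2, 1, 6, 0]) + struct.pack("!I", 0xDEADBEF0)

print("matching xid accepted:", reply_matches_request(request, good_ack))
print("changed xid accepted:", reply_matches_request(request, bad_ack))
```

A strict client drops the second kind of reply, which matches what the developer described; a more forgiving one (the Linksys, apparently) copes some other way.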
Sure, it's possible this is an issue on Arris' part. However, I have protested to the vendor that if a consumer-level router such as a Linksys can successfully navigate this without incident, then an enterprise-level device should as well. This fundamental difference between the two routers, I think, should be a larger concern for the vendor than it appears to be. (My cynical opinion is that the vendor doesn't care enough because home environments are likely the slimmest part of their market.)
The vendor is happy to look over more packet captures if I can provide them. They'd like to see packet captures from the Linksys navigating the problem successfully. They'd also like another capture from the new device on the Arris and an Ambit DOCSIS 2 modem I have.
However, I have stabilized my client's issue with an older version of the firmware that doesn't seem to get trampled by this issue. The circuit has been up for many weeks and certainly has survived an outage and/or IP lease renewals.
In other words, the client's connection -- (and, more importantly, the IPSEC Tunnel to her office (the whole reason we bought the new device that started this fiasco)) -- everything is stable. So, there is little desire to upset the combination of things necessary to perform more tests for the vendor.
So, I think I have taken this as far as I can. I have toyed with the idea of purchasing my own unit to run these tests they want to run. However, the unit is not inexpensive ($350.00) plus I'd still have to upset my client's connection to some degree to run tests on an ARRIS modem that I do not have. In the end, this is not going to happen anytime soon.
When you say the response is from the modem, you really mean Comcast just coming through the modem, right? Your typical Comcast DHCP lease comes from the headend somewhere, not the modem. By contrast, some modems are designed to act as DHCP servers in the event of a down or missing connection. I have to wonder if the modem is somehow causing trouble here by sticking its nose in the middle of things where it doesn't belong.
The ARRIS is definitely handing out a private IP while the Cable light is blinking (lost sync). These private IPs are part of 22-second leases.
Are you saying that the Arris itself doesn't hand the public WAN IP to my router once the cable sync is complete? Is it perhaps, the Cisco device I see (identified by MAC) also in my packet captures?
That's correct. When the modem is connected to the headend, it passes DHCP through, and you should get a response from the Comcast DHCP server. I'd expect the MAC address to be the same as the MAC of your default router.
When the modem loses sync, it responds to DHCP itself, handing out a 192.168.100.x address. The purpose of this is to allow you to view the modem's status pages in your browser. It gives this a short lease time, so that you should get a valid, public IP shortly after the modem gets sync.
The Flyer for the Arris speaks about a web gui used to override Channel Scanning or some such. Perhaps this private address is used to allow you to connect to the Arris and access this "feature". Perhaps Comcast could disable it for you, or when you have that address you can connect to the Arris and disable it yourself?
I know that you can get a similar private address out of a Motorola Surfboard if there is no public IP on it shortly after rebooting. It's probably why Level 1 tells you to plug in the Cable modem first, and wait before plugging in your computer. Just guessing...
Comcast generally disallows the use of any user settings on all cable modems on their network. The web interfaces are mostly read only for status information and such like, although some control functions are available. For example, on my SB6120, I can reset/reboot the modem from the GUI without having to pull the power cord. But I have no access to any settings that might change the BEHAVIOR of the modem.
For some unknown reason, I had plugged an ethernet cable from my router into the cable box for the TV. That apparently caused the router to send out IP addresses to the LAN in the 192.168.100.x range, making it impossible for my devices to connect to the internet without assigning static IP addresses. When I unplugged the rogue ethernet cable from the cable box, the problem disappeared and the DHCP on my router began to function properly again with no interference from the modem.
The Ethernet ports on the cable boxes do NOT function for Internet use, although it does appear they are active for some purpose (at least in certain locales with certain boxes). I suspect that whatever cable TV box you have was acting as a DHCP server and your systems requesting DHCP leases were getting the leases from the cable box and not the router, hence the issue.
You should NEVER connect a device to your network unless you know what it does and that it NEEDS your network in order to function properly.