I recently replaced my older Arris cable modem with a new TG1682G gateway at Xfinity's urging. I had an Asus RT56N providing routing and wireless access, and neither the Asus nor the cable modem was showing problems. In fact, as far as I could tell, the network was fast and stable.
After I got the (first) TG1682G set up, a speedtest run showed it delivers plenty of bandwidth when it's working. We also have Xfinity internet phone service (2 lines) and that worked as before (when it's working).
The problem is that the gateway resets itself after 12 hours of running. Each reset takes between 6 and 10 minutes, during which there is no internet access or phone service, until the gateway comes back online, and then it's fine (and fast) until the next time it resets (which turns out to be almost exactly 12 hours after it comes back online).
I called for support and got a seemingly knowledgeable agent (who happened to work out of an Xfinity support site close to my home). He "reprovisioned" the settings since he saw some things that looked unusual, and suggested that if the problems recurred I should swap the gateway hardware.
So, this past Tuesday I went to my nearby Xfinity service center/store in Nashua NH and picked up a replacement TG1682G. Yesterday morning I did the swap and got the new unit authorized and supporting my home network. There was some difficulty: at first it didn't want to do 2.4G wireless and I had to do a "factory reset" (hold down the "reset" button on the back for at least 15 seconds) and then set everything up a second time, but the second time seemed to be a charm, and the network was up along with the phone service at about 10:30 am on Thursday 5/17. Good throughput, lights flashing for the wireless, phone working, all seemed good.
Until about 10:30 Thursday evening (12 hours after the gateway came online) when it reset itself. Nothing useful in the device logs, other than the timestamped messages showing it had reacquired its IP addresses for the internet and VOIP interfaces. The timestamps appear to be in UTC rather than local time, but the 12-hour delta is clear.
Then smooth sailing until a little after 10:30 am today (Friday, 5/18) when the gateway reset itself again.
So another call with an Xfinity technician, who again reprovisioned the (second) gateway and forced a reset, assuring me that after this things would be fine.
Why am I not confident this will resolve the problems?
I've looked at the operational parameters and nothing looks out of the ordinary. The regularity of the resets (every 12 hours like clockwork) and the fact that two different gateways have exhibited the symptoms make me suspect something is occurring on the backbone network that's causing this, but I have no idea what.
I've read and attempted to follow the helpful troubleshooting suggestions that are posted here on the site, but needless to say, "random" (not really random) service interruptions (after years of no problems until I followed the urging to "upgrade" my equipment) are quite frustrating.
If anyone else has seen something like this (and better yet, knows what's causing it) please share your insights.
I intend to update this topic with progress reports, assuming I continue to see these outages.
It did it again last night. I have multiple batch jobs that use the GNU Wget utility to "shadow" some web sites, and when wget can't find the DNS address for a site it puts a message into its log file. My jobs look for the message and if they find it, they delay (with backoff logic to increase the delay if the problem recurs) and then retry the action. So I can gather the logged DNS failures and summarize them. For the period including yesterday afternoon around 1:30 pm local time when the technician "reprovisioned" things and forced a reset, through overnight, here's the summary:
Fri 05/18/2018 13:28 wget: unable to resolve host address
Fri 05/18/2018 13:29 wget: unable to resolve host address
Fri 05/18/2018 13:33 wget: unable to resolve host address
Fri 05/18/2018 13:34 wget: unable to resolve host address
Sat 05/19/2018 01:30 wget: unable to resolve host address
Sat 05/19/2018 01:31 wget: unable to resolve host address
Sat 05/19/2018 01:32 wget: unable to resolve host address
Sat 05/19/2018 01:35 wget: unable to resolve host address
Friday 13:28 for about 6 minutes, no DNS leading to wget problems.
Overnight Saturday at 1:30 (12 hours plus 2 minutes) and DNS goes away again.
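A minimal Python sketch of this kind of summary, assuming log lines shaped exactly like the ones above. It is an illustration only, not the batch jobs described in this post; the names `group_outages` and `max_gap` (and the 15-minute grouping threshold) are made up for the example.

```python
from datetime import datetime, timedelta

# The eight failure lines quoted in the post.
LOG_LINES = """\
Fri 05/18/2018 13:28 wget: unable to resolve host address
Fri 05/18/2018 13:29 wget: unable to resolve host address
Fri 05/18/2018 13:33 wget: unable to resolve host address
Fri 05/18/2018 13:34 wget: unable to resolve host address
Sat 05/19/2018 01:30 wget: unable to resolve host address
Sat 05/19/2018 01:31 wget: unable to resolve host address
Sat 05/19/2018 01:32 wget: unable to resolve host address
Sat 05/19/2018 01:35 wget: unable to resolve host address
""".splitlines()


def parse_timestamp(line):
    """Turn 'Fri 05/18/2018 13:28 wget: ...' into a datetime (local time)."""
    parts = line.split()
    return datetime.strptime(parts[1] + " " + parts[2], "%m/%d/%Y %H:%M")


def group_outages(lines, max_gap=timedelta(minutes=15)):
    """Merge failures that are close together into (start, end) outage windows.

    max_gap is an assumed threshold: failures further apart than this
    are treated as belonging to separate outages.
    """
    outages = []
    for ts in sorted(parse_timestamp(l) for l in lines if l.strip()):
        if outages and ts - outages[-1][1] <= max_gap:
            outages[-1][1] = ts          # extend the current outage
        else:
            outages.append([ts, ts])     # start a new outage
    return [(start, end) for start, end in outages]


outages = group_outages(LOG_LINES)
for start, end in outages:
    print("outage from", start, "to", end)
for (first, _), (second, _) in zip(outages, outages[1:]):
    print("interval between outage starts:", second - first)
# last line printed: interval between outage starts: 12:02:00
```

With the lines above this finds two outages whose starts are 12 hours and 2 minutes apart, which matches the hand summary.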
I've attached a zip file with the status page for the Xfinity network connection and the system and event logs. Nothing looks out of the ordinary.
I'd like to say these are "random" resets (as some other people have reported) but there's nothing really random about "every 12 hours".
Well, I thought I was attaching the zip with the information, but that didn't work at all as expected. So I put the zip up on my Google Drive and here's a link to retrieve a copy:
The retrieved file will have the name "2018-05-19 Gateway_Connection_XFINITY_Network.zip"
For whatever it's worth, it restarted again this afternoon, right about 1:30 (a little after). That is, right on the "every 12 hours" schedule. Still a mystery.
And again last night, right around 1:48am.
I'm wondering if this is somehow related to MOCA. I don't have an Xfinity X1 box so I don't need to have MOCA turned off to keep it happy. I do have a TIVO BOLT and a pair of TIVO Minis that connect to the Internet via the MOCA bridge in the TIVO BOLT. This was working flawlessly when I had the Arris telephony modem that got replaced by the new TG1682G gateway. The modem didn't care about MOCA, my Asus router didn't care about MOCA, the TIVO did its thing just fine, all was good. Now I have a TG1682G that is resetting twice a day (every 12 hours). It doesn't have MOCA enabled, but it can certainly see the MOCA traffic on the cable. There's a MOCA signal blocker in the cable between the Xfinity provided signal booster and the coax that comes in from the street, so the only MOCA on my network should be the local traffic between the TIVO units.
If the gateway is reporting that there's MOCA traffic on the local network to the Xfinity network monitoring servers, I can imagine that if I'm not on the "MOCA whitelist" they might be resetting my gateway periodically to try to turn off the MOCA bridge functionality (which as I understand it is off on reboot) to keep my non-existent X1 happy.
So I'm going to call and ask to be put on the "MOCA whitelist" (assuming it still exists) on the assumption that it means "don't try to manage MOCA on this gateway" and not just "don't try to turn off MOCA on this gateway" since I don't have MOCA enabled on the TG1682G, just on the TIVO BOLT unit.
I spoke to a helpful tech who assured me that the intermittent restarts are almost certainly not MOCA related, and that there was no need to be on the "MOCA whitelist" (if it even still exists -- he did sound like he knew of it). He agreed that since we'd replaced the original gateway unit (installed on May 2nd if I recall correctly) with another one and see the same symptoms, it's likely not the TG1682G.
He transferred me to an agent who deals with "new installation" issues, even though this is only a new installation in the sense of replacing an older Arris telephony modem and my Asus RT-N65U router (correctly named now), and she checked out things like the signal levels and said that some of the stuff she could see looked like a bad signal, which could of course in time cause the TG1682G to reset itself in hopes of getting a more reliable connection.
So now I have a technician visit scheduled for tomorrow mid-day. That leaves two opportunities for network resets, this afternoon around 2pm (if things stay "on schedule") and overnight tonight, likely a little after 2am. Then if things go well, we'll find out what's wrong with the connection (probably between the COMMSCOPE amplifier and the street, but maybe even farther upstream in the network, who knows?) and get it fixed so we've got reliable quality signalling. Then maybe the Internet will stay up (and the occasional "noise" in DVR recordings will go away too, that would be something that's a symptom of a noisy cable connection, but why every twelve hours?).
As expected, my TG1682G continues to reset itself (interrupting internet connections and phone service) every 12 hours like clockwork.
The scheduled visit with the Xfinity tech (Tim, nice guy) happened yesterday as planned. He showed up within the planned window, which worked really well because he was here when I expected the gateway to reset itself. We were at the gateway, he had his signal monitor connected to the diagnostic port on the (Xfinity provided) signal booster, and the gateway did its act right on schedule. While Tim was monitoring the signal, he saw nothing out of the ordinary.
As I explained to him, as far as I know, the coax between the bulkhead where it enters the house (it doesn't look like it's in a conduit, I suspect it might not be, but I don't know for sure) and the road has been in place since the house was built (that is, about 33 years; we've lived here for 31 years come Memorial Day). Maybe two years ago the pole that is the transition point between the "aerial" cable from the road and the buried cable that leads to the house was replaced because the local telco Fairpoint wanted a taller pole, and the work to put in the new pole and move the power lines was done by the local power company (Eversource) and then Comcast/Xfinity and Fairpoint both moved their lines as well. But as far as I know, none of the lines were updated/replaced, the existing lines probably just got moved.
I think I wrote earlier that the gateway "resets" as if it had been power cycled or someone had pressed the "reset" button. That's not actually what happens. Power cycling or pressing the "reset" button turns off ALL the lights, and then things come back bit by bit (documented elsewhere) until the gateway is back online. What actually happens when my gateway "resets" itself is different.
First, it turns off the (two) phone lines -- easily confirmed by picking up the phone, no dial tone, instead a high-pitched whine. Next, it turns off the internet connection, evidenced by the "Online" light going out, and by attached computers showing a loss of internet connection (my primary system is directly attached to the first port on the gateway and is in ready eyeshot), but not of cable attachment. Then the "US/DS" light turns off momentarily, and then starts with a slow flash but quickly transitions to a fast flash (which as I understand it is the modem trying to establish connection to the cable network). After a bit, the US/DS goes back to solid, then after a while longer, the Online comes back on solid, then after a while longer, the Tel1 and Tel2 telephony lights start flashing (first Tel1, then later Tel2, then after a while both are on solid and the phones work again).
Throughout this "reset" process, the 2.4GHz and 5GHz wireless activity lights continue flashing; that is, it doesn't look like the wireless radios are being turned off and then on again. I haven't (yet) brought a wifi-capable device (like my cell phone) to see if the wifi actually went away and then came back. But even if devices see a "network" they could not actually use it to communicate with the internet, because that's out of service. I'm not certain, but I think that access to the internet is off until after the telephony connections are restored. In any case, at some point (usually) the computers realize the internet is back online and things start working again.
Note that the "reset" (or soft restart) seems to take about 6 or 8 minutes. The next "reset" seems to happen 12 hours after the gateway comes back online. So it's not every 12 hours, it's every 12 hours plus a bit -- kind of like the tides.
I had the gateway powered from a UPS port on my APC BackUPS 1500 that also powers my primary computer system. The Xfinity folks were concerned that the resets might be caused by some kind of power glitch related to the UPS (seemed very unlikely to me based on years of experience with APC products), and asked me to reconnect it so it was powered directly from a wall outlet. So last night at about 6pm I unplugged the gateway from the UPS and plugged it into an extension cord connected to a wall outlet on a completely different circuit (to the extent that circuits within the house are isolated from one another, after all they all connect to a common feed from the "grid").
Gateway ran just fine overnight until about 6:10 this morning (that is, 12 hours after it came back online after the power cycling), when it reset itself again. So later I will connect it back to the UPS (and reset its internal timeout).
Note that the reset cycle depends on when the gateway was last reset, either deliberately (e.g., power cycled to change the power feed) or by its own volition.
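The timing described above (next reset about 12 hours after the gateway comes back online, each reset taking several minutes) can be turned into a rough schedule. This is a sketch under stated assumptions: the 7-minute reset duration is just a middle value from the "6 or 8 minutes" observed, and `predict_resets` is a name invented for the example.

```python
from datetime import datetime, timedelta

UPTIME_BEFORE_RESET = timedelta(hours=12)  # observed: reset ~12 h after coming online
RESET_DURATION = timedelta(minutes=7)      # assumption: middle of the observed 6-8 minutes


def predict_resets(back_online, count=4):
    """Return (reset_start, back_online) pairs for the next `count` resets.

    Each reset is expected 12 hours after the gateway last came back
    online, so the schedule drifts later by the reset duration each cycle.
    """
    schedule = []
    for _ in range(count):
        reset_start = back_online + UPTIME_BEFORE_RESET
        back_online = reset_start + RESET_DURATION
        schedule.append((reset_start, back_online))
    return schedule


# Example: the second gateway came online at about 10:30 am on Thursday 5/17.
for reset_start, online_again in predict_resets(datetime(2018, 5, 17, 10, 30)):
    print("reset expected", reset_start, "- back online about", online_again)
```

Starting from 10:30 am on 5/17, the first predicted reset is 10:30 pm that evening and the second is a little after 10:30 am the next day, which is the drift pattern seen in practice.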
The software engineer in me (yeah, I did that for a LONG time before I retired) says that this is an undocumented firmware "feature" of some kind.
I suspect there is some condition, probably out on the cable network somewhere, that is getting monitored in the gateway, and if it occurs, it gets counted. Periodically (which would seem to be once every 12 hours) the firmware checks to see if the "problem" has been happening, and if so, it "resets" the gateway as observed in an attempt to make the problem go away.
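To make that guess concrete, here is a toy Python model of the kind of logic being imagined. It is purely hypothetical: nothing here comes from the actual TG1682G firmware, and the class name, the 12-hour constant, and the "any error triggers a reset" rule are all assumptions for illustration.

```python
class HypotheticalWatchdog:
    """Toy model of the guessed behavior, NOT real gateway firmware."""

    CHECK_INTERVAL_S = 12 * 60 * 60  # assumed check period: 12 hours

    def __init__(self, boot_time_s=0):
        self.interval_start_s = boot_time_s  # timer starts when the gateway comes online
        self.error_count = 0

    def record_error(self):
        """Count one occurrence of the (unknown) monitored condition."""
        self.error_count += 1

    def tick(self, now_s):
        """Return True if a soft reset would be triggered at time now_s."""
        if now_s - self.interval_start_s < self.CHECK_INTERVAL_S:
            return False                      # not time to check yet
        triggered = self.error_count > 0      # reset only if the condition was seen
        self.interval_start_s = now_s         # start a new 12-hour interval
        self.error_count = 0
        return triggered


wd = HypotheticalWatchdog(boot_time_s=0)
wd.record_error()                    # condition seen once during the interval
print(wd.tick(1 * 60 * 60))          # False: only 1 hour since boot
print(wd.tick(12 * 60 * 60))         # True: 12 hours elapsed and errors were counted
print(wd.tick(24 * 60 * 60))         # False: no errors in the second interval
```

A model like this would explain why the reset time tracks the last restart rather than the wall clock, and why an idle gateway might never reset at all.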
It's kind of like when you're using any of a variety of backup software on Microsoft Windows that depends on VSS to work correctly. After a while, VSS stops working and the only way to get the backups to run correctly again is to restart Windows. You look at the backup logs, errors related to VSS are preventing correct execution, you realize there is a problem so you restart the system to correct it. If it's good enough for Microsoft, why isn't it good enough for your internet gateway?
I call it a bug.
Well, my new TG1682G gateway has been installed now for over 20 days and it's still doing its "every 12 hours" resets and so far Xfinity still has not been able to come up with an explanation.
Yesterday we did a test of taking my Tivo BOLT off-line (by pulling its power plug) shortly before an anticipated gateway reset. This isolated the Tivo BOLT from the cable and from the ethernet as well as blocking access by the two Tivo Mini units which talk to the Internet through MOCA through the Tivo BOLT. The TG1682G reset itself right on time as expected. So we've ruled out the reset being something triggered by some activity of the Tivo BOLT or Tivo Minis, unless it's something that the TG1682G is noticing and tracking internally and then using on its 12 hour schedule to trigger the reset.
So I sort of don't think it's related to the Tivo setup in any way.
We haven't tried taking the Asus RT-N65U completely out of the picture, but it's running in "access point" mode so it's only doing WiFi connection support, not any kind of routing or DNS or DHCP stuff.
Very, very strange.
I had a second visit on Saturday from Tim who had visited earlier (and seen the gateway go through a "reset" cycle, the one that differs from the hard reset you get by pressing the "reset" button), and he was joined by his supervisor, John. He and John looked over my connections and setup and didn't see anything out of the ordinary. While they were here they did some escalation of the problem but the backup support didn't find anything out of the ordinary in the backbone network or the data they can access from the XB3 gateway. John suggested we could consider replacing the XB3 with an XB6 (which has different firmware but otherwise would get the job done just fine). We came up with a plan to isolate the gateway from both the phones (which we doubt are a problem) and the wired Ethernet, and then see if the gateway reset itself after 12 hours. So last night at around 10pm I reset the gateway (with the reset button) and once it was back online, I disconnected the two phone lines and the three Ethernet cables from the back, leaving the wi-fi turned on. (All the "heavy" network accessors are connected via hardwired Ethernet, the wi-fi traffic happens but it's not really significant most of the time, and most of the "heavy" bandwidth access systems don't have wi-fi interfaces to fall back on.) I also powered off the Tivo BOLT master that does MOCA routing, and powered off the Asus RT-N65U wifi access point. Then I just let the gateway sit and run overnight. Interestingly, it didn't reset itself at around 10 this morning. So whatever is making it misbehave is either related to the physical characteristics of my Ethernet (or phone) connections, or to the access patterns that the "wired" systems are presenting. 
So I went ahead and connected the phones and wired Ethernet back up. I did change one part of the Ethernet topology by connecting the "den" switch to the switch that's next to the gateway. This is the way it would need to be connected to work with the XB6, which has two gigabit Ethernet ports rather than the 4 on the XB3 (of which I was using 3); hooking the "den" switch to the switch by the gateway reduces the cable demand to 2 connections.
So now we monitor things and see whether the XB3 "resets" itself around 10 pm this evening (or even before then), since it's got all the systems and phones connected back up and the systems are presenting their access loads again. Time will tell.
To my XB3 gateway's credit, it ran just fine with no systems connected to the wired Ethernet and the phones disconnected. But that's not surprising and it's not functional.
To be fully functional, it has to run just fine with all the systems connected that were working just fine with my older configuration (Asus router + Arris cable modem) that was working very reliably (if perhaps not as fast as the speeds the XB3 promises) before we "upgraded" to the TG1682G aka XB3 at the beginning of the month. Yesterday, after the XB3 had run overnight with no hiccups (but with nothing connected, either), I had connected up the wired Ethernet systems and switches, powered up the Tivo BOLT, and also the Asus RT-N65U wireless access point (which was formerly configured as my network router and reliably supported both all the wireless devices and wired devices with the same workload I've been presenting now). On my primary Windows desktop system (the one that's connected directly to the first Ethernet port on the XB3), I just updated to the latest Windows 10 software and the patch kit, and Windows "ate" one of my disks, so I spent much of the morning in software updating and then trying to figure out what Windows had done to my disk. (I doubt it's a coincidence that the disk got NTFS MFT corruption just as the update to Version 1803 was happening, but maybe it's just bad luck that it occurred while the update was in progress.) Around 4 PM I started up the software I normally have running on this system (as noted before, the same suite of procedures that was running with no issues with the older router + modem setup), less the parts that had a dependency on the disk that's inaccessible. As it happens, I use a simple Windows script to start up the background programs that then stay running until I tell them to shut down, and that script produces a time-stamped log showing what and when it started. It shows my first background job got started at just about 4:10 PM and the last one (there are many and they are not all started at once because they need a bit of time to initialize themselves) at a little after 5 PM. 
No obvious hiccups, everything was running fine, other things were working as expected (Outlook mail, iTunes music, Chrome web browser, and so on).
As noted earlier, I had changed the topology of the wired Ethernet, and hooked all the wired systems back up a little after 10 AM, so I was hoping that perhaps things were going to be copacetic when the network didn't go away (that is, the TG1682G / XB3 didn't "reset" itself) when I went to bed a little after 10 PM -- the 12-hour cycle.
But, no joy, when I checked the various systems this morning, the ones that report problems with SmartVault connectivity had all lost their connections shortly after 4 AM and then re-established things about 10 minutes later, and my background processes that tried to establish fresh web connections reported DNS failures. (The processes that had connections with I/O in progress just get hung, usually, unless they manage to time out, which they usually don't seem to do.) So although I wasn't sitting up babysitting the XB3 and so did not see it cycle the lights in the odd pattern while it reset itself I'm quite sure that's what it did. And it did it pretty much 12 hours after I started up my workload that has worked with no issues with my older configuration.
I can only conclude that the XB3 has some serious problem when you actually load it up with lots of network transactions. It seems to get into a state (after a while) where it decides to just (figuratively) toss up its hands and quit servicing the network requests (by shutting down and then restarting).
Needless to say, this is unacceptable.
I looked for compatible retail products and the only one that includes DOCSIS 3.0 and telephony seems to be the ARRIS SVG2482AC and it looks suspiciously like the XB3 both in terms of specifications and packaging -- it's white and has Arris labelling and differently colored back panel, but it sure looks like a clone of the TG1682G. At the $219 price on Amazon the payback of owning versus leasing is under 24 months, so it might be a reasonable alternative to the XB3, but if it has the same mis-function under load, then it is not a good alternative. I don't want to buy one, have it fail in the same ways the XB3 is failing, and have to return it for a refund.
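The payback arithmetic can be sketched in a few lines of Python. The $219 price is from the post; the monthly lease fee is not stated here, so the $11 used in the example is only an assumed figure, and both function names are invented for the illustration.

```python
import math


def months_to_break_even(purchase_price, monthly_lease_fee):
    """Whole months of leasing needed to equal or exceed the purchase price."""
    return math.ceil(purchase_price / monthly_lease_fee)


def min_fee_for_payback(purchase_price, months):
    """Monthly lease fee at which buying pays back in exactly `months` months."""
    return purchase_price / months


PRICE = 219.00        # Amazon price quoted in the post
ASSUMED_FEE = 11.00   # hypothetical monthly lease fee, not taken from the post

print(months_to_break_even(PRICE, ASSUMED_FEE))
print(min_fee_for_payback(PRICE, 24))
```

Put another way, a payback of under 24 months at $219 only holds if the lease fee is above $219 / 24, that is, more than about $9.13 a month.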
I am going to encourage Xfinity (the Tim and John team) to provide an XB6 and see if it's up to the load my systems are providing. It's actually rated for full Gigabit throughput, more than the XB3. My account is authorized for 250 Mbps download speed, and that's what SpeedTest usually reports, sometimes even a bit more, and the XB3 is rated to be able to exceed that, so I believe I'm getting the throughput appropriate to the Blast! tier I'm on, but clearly the XB3 is not up to the task of supporting the workload that's being sent to it by my systems, whereas my older router + modem configuration was perfectly happy and stable with an even greater workload.
Last Friday (6/1) afternoon an Xfinity tech named Blake (who works with John and Tim on my local team) came out with a nice new XB6. This unit happens to be one of the ones made by Technicolor (they are also made by Arris):
I logged into the XB6's console interface (http://10.0.0.1/ same as on the XB3, but MUCH faster to respond), and checked both the system log and event log (as well as the firewall log just for completeness). No sign of anything out of the ordinary. Unlike the XB3 where you could see in the event log that the gateway was going out and getting DHCP assignment for its interfaces (which always came back with the same addresses that had been assigned before), the XB6 didn't log anything at around 3 AM yesterday.
So I am led to believe that the XB6 was running just fine and performing its gateway duties reliably, but the Xfinity DNS server that's been assigned to my system (and the others in our home) had a hiccup and stopped providing name translations for about 5 minutes at 3 AM or thereabouts. I wonder if there is any customer accessible record of DNS service outages based on your service address. Probably not.
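One way to tell a DNS hiccup apart from a gateway outage is to check name resolution and raw IP connectivity separately. The Python sketch below assumes that 8.8.8.8 (Google's public DNS) accepts TCP connections on port 53 and is a fair stand-in for "the internet is reachable"; the function names and the choice of test hostname are made up for the example. Note that the system resolver may answer from a cache, so a "resolves" result is weaker evidence than a failure.

```python
import socket


def dns_resolves(hostname):
    """True if the system resolver (whatever DHCP handed out) can resolve hostname."""
    try:
        socket.getaddrinfo(hostname, 80)
        return True
    except socket.gaierror:
        return False


def ip_reachable(address="8.8.8.8", port=53, timeout=3.0):
    """True if a TCP connection to a literal IP address succeeds (no DNS involved)."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False


def classify(dns_ok, ip_ok):
    """Combine the two checks into a rough diagnosis."""
    if not ip_ok:
        return "connection down"   # nothing gets out, so the gateway or link is suspect
    if not dns_ok:
        return "dns outage"        # link is up but names do not resolve
    return "ok"


def probe(hostname="www.example.com"):
    """Run both checks once and return the diagnosis."""
    return classify(dns_resolves(hostname), ip_reachable())
```

Calling `probe()` from a scheduled job and logging the result with a timestamp would show whether the 3 AM failures are "dns outage" or "connection down".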
It's almost enough to make you start using some other DNS server than the one Xfinity is providing, like say the Google DNS server which has an excellent reputation for both speed and reliability. If I keep experiencing DNS outages, I just might do that. It would be a nuisance since as far as I can tell there is no way to tell the DHCP server in the gateway to configure clients with any DNS service other than the one proffered by the Xfinity DHCP server it contacts. But I know there is a way to configure the Windows DNS service to use a specific DNS server to resolve things it doesn't already have cached. Just probably not easy to do it. Oh well.
Well, the XB6 is still humming along just fine, no resets, nothing new in its event log. And other than a few site-specific DNS glitches, Xfinity's DNS has been pretty stable since the major disruption around 3 AM last Monday. Knocking wood and keeping fingers crossed, but it's looking like the XB6 has solved my problems. More stable and faster than the XB3.
Another update: The XB6 (made by Technicolor) continues to chug along. Not a single interruption in service since we got it connected and configured back early in June.
I cannot say the same for the Xfinity DNS service, which simply goes away from time to time for varying periods, but rarely more than a few minutes, and usually in the dark of night. I've reworked most of my "spider" scripts to deal with this, detecting the problem and retrying until success. Of course, sometimes you can't find the address for a URL because the site has really gone away, but in my experience with the sites I'm accessing, the more common problem is that DNS has temporarily chosen to stop working for some subset of the sites I follow. So it goes.
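The detect-and-retry idea can be sketched in Python as below. This is not the actual scripts (those are Windows batch jobs driving wget); it only illustrates retrying a DNS lookup with an increasing delay. The function name, the 30-second starting delay, the 10-minute cap, and the attempt limit are all assumptions, and the `resolve` and `sleep` parameters exist only so the logic can be exercised without a real network.

```python
import socket
import time


def resolve_with_backoff(hostname, attempts=5, base_delay=30.0, max_delay=600.0,
                         resolve=socket.gethostbyname, sleep=time.sleep):
    """Resolve hostname, waiting longer after each DNS failure.

    The delay doubles after every failed attempt up to max_delay.
    If the final attempt also fails, the socket.gaierror is re-raised
    (the site may really be gone).
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return resolve(hostname)
        except socket.gaierror:
            if attempt == attempts:
                raise                      # give up after the last attempt
            sleep(delay)                   # wait before retrying
            delay = min(delay * 2, max_delay)
```

With the defaults, a lookup that keeps failing is retried after 30, 60, 120, and 240 seconds before giving up, which comfortably spans an outage of a few minutes.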
In addition to being rock-solid reliable (so far), the XB6 is also fast. In particular, using the 10.0.0.1 management interface is not painfully sluggish as it often was with the XB3 (again, in my experience so far).
So, although we didn't figure out why the XB3 was restarting roughly every 12 hours pretty much like clockwork (load-related problems in the firmware?), the XB6 seems to work exactly as a modem/router aka gateway should.