soshdown
from our host
==================
At this time, we are still waiting on the last 30 or so servers to be manually rebooted. We originally had a list of approximately 70 servers (about 2 hours 30 minutes ago). Some servers only need a manual reboot and will be back online a few minutes after the reboot is issued; others may take longer if they require a complete file system check after the reboot. For this reason, any ETA that is given is an estimate based on the overall progress of the restoration.
Regarding the process of repair after an outage such as this: a large number of the servers at the FITX datacenter were found to be unreachable this morning at 7:30 am EST (we are still trying to get the exact reasons for this). Network and hardware technicians on hand at the datacenter were dispatched to ascertain what the issue was and correct it as quickly as they could, to minimize the amount of time servers were offline. Roughly 2 to 2.5 hours after the initial outage began, most of the affected servers came back online. During this time, additional technicians were called in to assist with the restoration of those that were still offline.
As soon as the network came back online, we compiled a list of the servers that had yet to be restored and provided it to the hardware/network technicians so they could check each one and manually reboot it as needed. While this process will ensure that each server still offline is brought back, it is a slower phase of server/network restoration, since the techs must go to each rack location, reboot as needed, and verify that the server comes back online before going to the next server.
Our technical support staff are doing their best to follow up with the hardware technicians about progress and are also watching our monitoring software for a rough idea of how quickly the remaining servers are coming back online so they can respond to tickets and chats accordingly. As such, the ETAs they are providing are generalized and may fluctuate depending on how long each server needs. For those that do not come back right away, file system checks (fsck) may need to be performed, and this can slow down progress. That said, we are doing our best to restore access to the remaining servers as quickly as we are able, and the number of servers still offline has dropped by 5 or more since I began writing this.
We will provide updates as soon as we have them regarding the cause of the initial outage.
nip

1 Comment:
Is sosh down because Gammo said Hansen, Delcarmen or Cox could be the closer by year end?