nphyre, we have a team whose specific focus is panel speed. To move the discussion forward, let's talk about individual pages on the panel. For instance, what is the page load time for you on home.overview? What about domain.manage? Is there a particular page you use a lot where you are seeing slowness?
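If it helps to put numbers on it, here is a rough sketch of one way to time a page from your end. It is only an illustration: the URL in the comment is a placeholder (I am not giving the real panel address or its query format here), the helper names are made up for this post, and since the panel needs a logged-in session you would have to supply your own cookie. It also measures only the time to download the response, not browser rendering.

```python
import statistics
import time
import urllib.request


def time_fetch(fetch, runs=5):
    """Call fetch() `runs` times and return the elapsed seconds of each call."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fetch()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Reduce a list of timings to min / median / max, in seconds."""
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


def fetch_url(url, cookie=None, timeout=30):
    """Download one page fully; `cookie` is your own session cookie, if needed."""
    request = urllib.request.Request(url)
    if cookie:
        request.add_header("Cookie", cookie)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        response.read()


# Example usage (placeholder URL, replace with the page you actually use):
# timings = time_fetch(lambda: fetch_url("https://panel.example.com/home.overview"))
# print(summarize(timings))
```

Posting the median for a couple of pages, along with the time of day, gives us something concrete to compare against our own measurements.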
There is no downward spiral. It is true that we had network speed issues in August that had a cascading effect on many services, but that issue has been resolved. As discussed in the other thread on these boards, a couple of the other users on your server had their resource usage shoot up, and that was slowing down your machine. We have now moved all of the top users off your server to new servers. In system administration you have to be really careful not to couple problems that are unrelated.
The main failing that I see here is that the August troubles acted as a time sink and made it harder to see which servers had increased resource usage from the users on them. In the month since the network problem was resolved, an enormous effort has gone into rebalancing all the servers. This is obviously a continuous effort, but the catching-up portion is done, and we are actively training the new support personnel to better recognize servers whose user needs are growing. As an example, we are in the process of upgrading about 20 Xeon servers to 4GB of memory per server, because the memory needs of the users on those servers have grown.
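To give a feel for what "recognizing servers with growing user needs" can mean in practice, here is a minimal sketch. It is not our actual monitoring code; the function names, the sample figures, and the simple straight-line projection are all assumptions made for illustration. It fits a trend to each server's daily memory samples and flags any server projected to outgrow its installed memory.

```python
def slope(values):
    """Least-squares slope of evenly spaced samples (change per sample)."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def servers_needing_attention(usage_by_server, capacity_mb, horizon_days=30):
    """Return (server, projected_mb) pairs expected to exceed capacity.

    usage_by_server maps a hypothetical server name to daily memory
    usage samples in MB, oldest first.
    """
    flagged = []
    for name, samples in usage_by_server.items():
        if not samples:
            continue
        projected = samples[-1] + slope(samples) * horizon_days
        if projected > capacity_mb:
            flagged.append((name, projected))
    # Most urgent (highest projected usage) first.
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Made-up numbers: "a" grows 100 MB/day, "b" is flat.
usage = {"a": [1000, 1100, 1200, 1300], "b": [1500, 1500, 1500, 1500]}
print(servers_needing_attention(usage, capacity_mb=2048, horizon_days=10))
```

With these made-up numbers only server "a" is flagged, since its trend carries it past 2048 MB within ten days while "b" stays flat.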
I can see the lure of this argument, but there are a few problems with it. First, it is better to view a company of 50 people as a collection of different types of specialized computers rather than 50 identical general-purpose machines. Second, companies and the teams within them have momentum and trajectories. Third, the web hosting industry itself has a momentum and a trajectory.
There is an extraordinary amount of planning that goes into weaving all these considerations together coherently. It is inherently an artful and creative process, and when a mistake is made it creates the kinds of problems you have experienced. You have a right to be upset about those: for instance, the mail password re-authentication problem, the network slowness fiasco of August, and the overloading of your web server.
The network problem of August affected all services. While work on things such as panel speed did not stop entirely during that time, we did stop introducing new features, and we delayed the sale for a month past its originally scheduled date to be sure that the problem had been resolved.
I have addressed the individual case of the overloading of your web server in another thread, but I would point out that it was solved within a larger overall framework of server resource monitoring that I have been implementing over the last month. Because of the panel optimizations and monitoring, and the improved web server resource monitoring, that I have been implementing with the respective teams, I can safely say, based on hard data, that service in those areas has improved over the last month and is continuing on an upward path.
The mail authentication problem you have experienced is one that I am not personally working on, but one that I have been following closely. The work being done on it is on the load balancer configuration side of things. The load balancer appears to be dropping the connection to the user database on occasion, causing the mail machine to ask for the user's password again. Intermittent problems do tend to get the least attention, but I have been pushing to make sure this one has the proper resources assigned to it. It did take too long for that to happen, but the problem does seem to be nearing a fix, which is good. One thing to note here, though, is that a mail server load balancer issue really is logically detached from bandwidth and disk usage allotments.
I agree that improvements are needed and we are working furiously towards those. Still, putting everything on hold isn’t the best solution in this case.