"We had to reboot your VPS"

software development

#1

I have made a script to manage the size of a VPS based on actual usage. It’s called PsManager (see also: Dreamhost forum thread). I get a lot of feedback from users.

One particular problem that I and several other people run into every now and then on our VPS is that trying to downsize it sometimes leads to a forced reboot. We then usually get an e-mail like this one (some get this several times per day, I only get it every few days/weeks):

[quote]You asked us to resized your VPS (ps9296) to 300MB, but it was using
311MB when we tried. Because you asked us to force the change anyway
we went ahead and rebooted ps9296 for you. You can see this change in
the panel here:
http://panel.dreamhost.com/index.cgi?tree=vserver.usage
To prevent this from happening in the future, make sure your server is
using less memory than you try to limit it to!

Enjoy your newly compacted VPS!

~The Happy DreamHost VPS-Resizing Bot![/quote]

Now of course, there is a flag with the resize API call (and tickbox on the panel) telling Dreamhost to “force” the downsize. This means we can technically avoid the reboot by not setting that flag. That however doesn’t deal with the issue we have with this.

Not forcing the downsize either causes the downsize to fail when you try to set the amount of memory you actually need, or forces you to reserve more memory than you really need.

Maybe it’s best explained with an example so let’s look at the memory usage of my VPS the minute before my automated script (PsManager) decided to downsize it from 400 MB to 300 MB (which led to the reboot mentioned in the e-mail I quoted above):

Total memory available: 400 MB
Total memory used: 47 MB
Total memory free: 352 MB
Total memory used for cache: 269 MB

Now clearly it was only using 47 MB and not 311 MB as claimed in the e-mail. My logs also showed that my VPS had 352 MB of free memory at that time. A downsize to 300 MB (the minimum) was perfectly reasonable.

The problem with the DH resize robot arises from the fact that it considers the cache as memory that you are supposedly “using”. Indeed when we add the 269 MB cache to 47 MB we get 316 MB, which is close to what the e-mail stated (and above the 300 MB we requested).

The cache however isn’t memory that you are using for your processes, so you don’t really need it to keep your VPS running. The operating system simply uses free memory to cache the disk, which makes disk access faster. This is a nice feature, after all it’s better to use the memory for something useful than to let it sit idle.

However that is still no reason to force a reboot of the VPS, as the cache could simply be cleared by the DH resize robot prior to downsizing. The instruction for that is one simple line that needs to be run as root (admin user), writing to /proc/sys/vm/drop_caches (for example: sync; echo 3 > /proc/sys/vm/drop_caches).

After clearing the cache, the resize could be performed without a reboot.

A second option would be for DH to provide an API call to clear the cache. I have filed such a suggestion (via the DH panel) here: API call to clear the cache. I’ve asked people who wanted to avoid the reboot to vote for it (and please do still vote for it if you haven’t yet). However the suggestion was submitted back in March, 9 months ago. Nothing has happened since.

The two solutions above are the cleanest options, I really wish Dreamhost would implement one or the other.

On our side there are other possibilities. The first is to run PsManager as an admin user and have it issue the cache clearing command. However I do not like the idea of running anything as an admin user, especially as for most people, my software is 3rd party software. That requires a certain amount of trust and it’s just strange to give a script access to everything on a VPS just for 1 bit of functionality. An API call would be highly preferable, as it would give access to just this one feature.

The second option, one that I implemented in PsManager from the first time I found out about this issue, is the option to not resize to a memory level below actual usage + cache. This was even the default setting up until the latest release (changed in 0.6.1). It avoids the reboots, but the downside is of course that you’ll be reserving more memory than you really need, which leads to a higher cost of your VPS.

The third option is a dirty one: by having the script temporarily reserve a significant amount of memory, the cache can be reduced, because the OS will free up cache memory to make it available to the script. The script would then release that memory just before making the resize API call. I prefer not to have to use dirty tricks like this, but since I’ve been waiting so long for a cleaner solution, I may decide to implement this anyway.

p.s. The purpose of this post is twofold. The first is to explain the problem to anyone who encounters it, which saves me from explaining it again and again by e-mail; I can now just point people to this thread. The second is the hope that Dreamhost might decide to fix this in their resize robot or provide the API call. This is after all Dreamhost, and in my experience they do actually listen to their users.


#2

Personally, I’d love for this suggestion to get implemented. Right now, my PS is using a lot of cache memory (used memory is about 600MB and cached is 1400MB!!!)

It’s a joke that it can’t be resized via the DH API and my costs are probably double what they should be. If I was cynical, I might think that this was one of the reasons DreamHost haven’t done anything about this suggestion.

Just as an afterthought, would there be any major harm in having the drop_caches command run separately as a cron job on a regular schedule (every few hours) to clear out the cache?


#3

You could do that as root (admin user). It could theoretically slow down your sites when the cache is cleared, though I am not sure you’d really notice that much. It’s also a question how long it takes for the cache to be filled to the maximum again, if that doesn’t go too quickly it could help.


#4

The capabilities to do a drop_caches in a VPS guest simply don’t exist yet in the Linux kernel. The way it’s currently implemented unconditionally drops all caches on the host system.


#5

Thanks andrewf! I guess it goes to show that one shouldn’t ever assume a VPS is fully like a real server, and it was my mistake to test things on my home server and then assume they’d work on a VPS too. DH has indeed blocked that particular instruction for admin users on a VPS.

However I am still left wondering if there isn’t another way, after all if a VPS needs to use more memory resources the system does clear up some of the memory that is used for the cache. There is clearly something enabling selective clearing of the cache, but maybe there is no easy way to use it from outside the kernel?

ps. BTW has Dreamhost now changed the resource usage graph to include cache? It certainly does look like it to me: https://panel.dreamhost.com/index.cgi?tree=vserver.usage&


#6

See this thread: http://discussion.dreamhost.com/thread-134266.html


#7

Thanks for the explanation and it makes sense technically, but that still leaves the problem where some people on VPS have no way of managing their cache memory.

In my own case, my server has flat-lined at 2GB for the past month when it’s only ‘using’ 600MB. That means I’m paying over double the amount of money for memory I’m not really using and have no real way of managing except via a VPS restart which I don’t want to do because I want to provide a reliable service to my website visitors. I think there should be some mechanism to allow us to manage this, if not drop_caches, then something else.

(On a side note, why aren’t the suggestions ever updated with feedback from DreamHost? It would be great to know that the suggestions are being reviewed, and if they are turned down we could see a reason rather than assume they are just being ignored or whatever.)


#8

I had been working on a workaround to clear the cache anyway. Due to lack of time I haven’t been able to get it to work in Perl and thus it’s not integrated into PsManager (yet). However I do have a little C program that I made to test the principle that could help you:

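[quote=‘memtest.c’]#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    char *p;
    int size = 100; /* MB to reserve; default 100 */

    if (argc > 1) {
        size = atoi(argv[1]);
    }
    if (size <= 0) {
        fprintf(stderr, "Size must be a positive number of MB\n");
        return 1;
    }
    printf("Using %d MB\n", size);
    /* size_t arithmetic, so 2048 MB and up does not overflow an int */
    p = calloc((size_t)size * 1024 * 1024, sizeof(char));
    if (p == NULL) {
        perror("calloc");
        return 1;
    }
    /* write to the memory so the pages are really allocated */
    memset(p, 1, (size_t)size * 1024 * 1024);
    sleep(1); /* hold the memory briefly so the effect is visible */
    free(p);
    return 0;
}
[/quote]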

Compile it with, for example: gcc -o memtest memtest.c

Run it and use the number of MB you want it to use as argument. For instance for 1000 MB: ./memtest 1000

That causes it to reserve 1000 MB for about 1 second and forces the system to reduce the cache to accommodate that memory. You can then hopefully reduce your VPS’ reserved memory.

It’s not perfect, it works up to 2000 MB on my home system (and fails above that), and the short wait of 1 second is just to allow me to actually see the effect.

Eventually I hope to integrate this into PsManager somehow, but for now maybe this little trick can help you.


#9

Thanks, that looks interesting. I’m a bit of a Linux newbie and don’t have a proper understanding of this, but how is the server forced to reduce cache memory by running that process?

To me, I would have thought the server would need to increase its used memory to accommodate that extra 1,000 MB process (the cache would temporarily drop), and then when it’s finished running the process, the cache would revert to where it was before and overall memory would stay roughly the same before and after. (This assumes the extra memory used by the temporary process doesn’t mean total memory goes over the limit that I have set for the VPS, which might cause a problem with the DreamHost process killers).

Just wanting to understand a bit better how that works… (as I said, I’m a newbie to this but wanting to understand better before I run it on the PS)…


#10

It’s exactly as you say, the cache would temporarily drop when running that process. Of course after the program stops, the cache will start growing again, but the system won’t just reload everything in cache that was in cache before. It caches things as they are needed and it will take some time before it’s filled again, thus giving you a window of opportunity to reduce your reserved memory (without the otherwise obligatory reboot).[hr]
You could also wait until I have it fully functional in PsManager, but I don’t know how long this will take (still have the issue of getting that little C program converted to Perl, everything I tried until now had some unwanted behavior).


#11

Thanks for the reply. It does sound like a very clever approach, and the only thing that I can think of that might be a problem is if actual memory usage on the server is higher than expected at the time the script is run. The script would have to check whether there is enough free memory before running that process.

e.g. if max memory was set at 2 GB and usage was temporarily 1400 MB. If it then tried to run a 1000 MB process, the server might be forced into a reboot by DreamHost’s process killer because it would exceed the memory on the server. The script should only run if there is enough free memory to run the process (sounds obvious, but if it’s automated, there’s potential for a problem).

EDIT: Maybe something it could do in future is detect the amount of cached memory being used on the server and allow you to use a configurable percentage of that for the process. Then, when the script is run it should never cause an OOM issue on the server and will always reduce the cache by that percentage (the percentage could be set higher or lower depending on how aggressive you want the script to be).


#12

Yes, those are the kind of checks that would happen once I’ve integrated it into PsManager - it would basically work out how much the cache must be reduced by in order for a resize to happen without a reboot.

Like I said, this was a program I wrote to test this concept for cache reduction. In the state the program is in now you’ll have to check yourself how big the cache is before running it. You can run it with other values than 1000, it’s up to you exactly how much. If your server is constantly using below 600 MB and you have 2000 MB it makes sense to reduce it a bit. If it sometimes uses 1400 MB then maybe it does not make sense to reduce it in the first place.


#13

No worries, I think what you’ve come up with is very useful, and it was just meant as feedback in case you ever decided to do something more with it.

This whole issue is just making me think it’s ridiculous to always have this money/memory battle. The cost I am paying right now for VPS is more than a dedicated server would be with quadruple the amount of RAM (I’m using 2GB a month on a web PS, plus 800 MB on a MySQL PS), so it’s probably time I switched to dedicated hosting…


#14

Although this thread is a year old, I am curious to know if this clever work-around to clear the cache has already been implemented. I ask because I have noticed that the VPS where I have PSManager running has a strong dip in cached memory around midnight each night.

If this is indeed PSManager, then kudos!


#15

No, unfortunately that’s still not included in PsManager. Mainly I am not quite happy with the hack and well I have been lacking time to experiment :-/

The cache does appear to be cleared every now and then, for reasons I do not know, could be just the way the OS works, maybe clearing the cache of old stuff, or it’s something that Dreamhost set up that way, maybe they do clear the cache on whole machines at set times. Strangely I haven’t gotten much feedback from people running into this problem lately either.


#16

Sorry to bump this for another year, but it doesn’t look like an API call or other solution has been implemented yet. I got a forced resize but then I have only just installed PSManager so it’s not really surprising, but it drew me to this thread.

I did however think of a potential “clean” workaround for this issue; simply reduce the memory size over time. It seems that the cache memory won’t completely fill the available free space (or at least it doesn’t on my host), which means PSManager should be able to reduce the memory limit in small increments.

For example:
My total memory is 400mb, I have 75mb of actual usage and 313mb of cached memory, this gives a total of 388mb. If PSManager decides that my optimal amount is say, 290mb, then a resize straight to that amount is of course impossible, however if it requested a change to 390mb then that should work just fine. At this point the cache is uncomfortably large, so next time PSManager runs it will hopefully have reduced in size a bit more, allowing the limit to go down further.

Of course this runs the risk of hitting the 30(?) a day limit on changes, so PSManager would have to throttle how often it does this, but it should allow the limit to still ultimately work its way down over time. Your memory use “hack” could still be used to hurry this along, with appropriate user configuration options of course. This way it’s possible to create a balance of trying to reclaim all the cache memory at once, or to do it in say… 32mb increments, or simply just whatever free memory is handy at the time; so long as the memory limit is working its way down as expected then PSManager shouldn’t need to force a restart, unless the user’s settings don’t allow it any other choice.


#17

This is what happens naturally already, memory will get reduced over time by PsManager, at least if usage doesn’t increase in the meanwhile. In theory it could be possible to speed it up, but as you say, we then have the 30 resizes per day limit which might become an issue.