The wizardry that makes your website hum is often confusing, so this article is designed to take the mystery out of resource limits.
Resource limits on our servers are very generous, but they are designed first and foremost to prevent a website from breaking the server due to an attack or some other abnormal activity. When an appropriate hosting plan is chosen for a given website, resource limits will not interfere with its normal operation.
There are four main limits that we have direct control over at the individual website level:
- CPU
- I/O
- Memory
- Concurrent connections (EPs)
If you reach the CPU limit or the I/O limit for your website, it will be "throttled" (slowed down) to prevent it from completely crashing or otherwise affecting the server. It is only slowed down during the period when too much CPU or too much I/O is being requested. It returns to normal immediately after the site is back within limits.
If your website reaches the memory limit or the EP limit, it will return an error status to the visitor. In other words, the website will stop displaying temporarily. Those two limits return an error because exceeding them is a serious condition that can crash the server if left unchecked. (CPU and I/O overuse doesn't typically crash a server, but it can make it slow.) The moment the condition that is causing a 'fault' stops, the site returns to normal.
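The two behaviors described above can be summarized in a short sketch. This is only an illustration of the rule (CPU and I/O throttle, memory and EPs return an error), not how the server enforces it. The function name, the resource keys, and the usage numbers are all hypothetical, with CPU in percent, I/O in MB/second and memory in MB.

```python
# Illustrative only: which limits slow a site down and which return an error.
THROTTLED = {"cpu", "io"}      # site is slowed down while over the limit
FAULTED = {"memory", "ep"}     # visitor gets an error while at the limit


def limit_effects(usage, limits):
    """Return what a visitor would experience for each limited resource."""
    effects = {}
    for name, limit in limits.items():
        if usage.get(name, 0) < limit:
            effects[name] = "ok"
        elif name in THROTTLED:
            effects[name] = "throttled"
        elif name in FAULTED:
            effects[name] = "error"
        else:
            raise ValueError(f"unknown resource: {name}")
    return effects


# Hypothetical plan limits and a hypothetical moment of heavy use.
limits = {"cpu": 100, "io": 3, "memory": 512, "ep": 40}
usage = {"cpu": 100, "io": 1.2, "memory": 310, "ep": 40}

print(limit_effects(usage, limits))
```

In this made-up snapshot the site would be slowed down by the CPU limit and would also show errors to some visitors because of the EP limit.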
The natural question people ask is 'why do you limit my website?' The answer is that a website that "runs away" can crash a server.
Look at it this way: if somebody else's website is under sustained attack, or if it is running bad code that consumes all available resources, it is better to 'jail' that website than to allow it to affect your website.
What causes these excess resource situations? Bad bots can and do hammer websites. A DoS/DDoS can hammer a website. Bad code (inefficient WordPress plugins, for example, or code that loops or doesn't properly stop executing) is an unfortunately common issue with add-in CMS plugins. Or maybe you just received too much traffic for your hosting plan.
For the first three issues, you have to do a little troubleshooting to figure out the specific problem. (If you are a managed client, we do the troubleshooting for you.) Bad bots are easy to see, and they are by far the number one problem. Vulnerability scanners and various script kiddie attackers visit everyone's website every day. Bad code is more difficult to detect. But if, for example, there are resource issues every time a certain plugin is executed, you can usually find evidence in the log files.
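One way to spot a bad bot in the log files is to count requests per client. The sketch below assumes your access log uses the common Apache "combined" format (your server's format may differ). The function name and the sample log lines are hypothetical.

```python
import re
from collections import Counter

# Assumes Apache "combined" format:
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"'
)


def top_clients(lines, n=5):
    """Count requests per (IP address, user agent) and return the busiest."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:  # lines in another format are skipped
            counts[(match.group(1), match.group(2))] += 1
    return counts.most_common(n)


# Hypothetical sample lines for illustration.
sample_log = [
    '203.0.113.9 - - [12/Mar/2024:11:02:10 +0000] "GET /wp-login.php HTTP/1.1" 200 512 "-" "BadBot/1.0"',
    '203.0.113.9 - - [12/Mar/2024:11:02:11 +0000] "GET /wp-login.php HTTP/1.1" 200 512 "-" "BadBot/1.0"',
    '203.0.113.9 - - [12/Mar/2024:11:02:11 +0000] "GET /xmlrpc.php HTTP/1.1" 200 512 "-" "BadBot/1.0"',
    '198.51.100.4 - - [12/Mar/2024:11:02:12 +0000] "GET / HTTP/1.1" 200 9000 "-" "Mozilla/5.0"',
]

for client, hits in top_clients(sample_log):
    print(hits, client)
```

A single address or user agent with far more requests than anyone else during a spike is a good first suspect.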
An easy way to determine if the problem is too much traffic versus a short-term "attack" is to check the resource graphs in your cPanel. See if they show a steep spike or, rather, a steady rise in resource use over hours or days. A steep spike usually signifies an attack. It may also be a "Digg" effect, where a mention on a popular forum or a marketing campaign suddenly sends a lot of traffic.
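If you would rather check numbers than eyeball a graph, the same "spike versus steady rise" judgment can be roughed out in code. This is a simple heuristic sketch, not a feature of the control panel. The function name, the thresholds and the sample readings are all assumptions you would tune for your own site.

```python
from statistics import median


def classify_usage(samples, spike_ratio=3.0, rise_ratio=1.5):
    """Label a series of usage readings as 'spike', 'steady rise' or 'normal'.

    spike:       the peak is far above the typical (median) reading
    steady rise: the second half of the period averages well above the first
    """
    if len(samples) < 4:
        raise ValueError("need at least 4 readings")
    typical = median(samples)
    if typical > 0 and max(samples) >= spike_ratio * typical:
        return "spike"
    mid = len(samples) // 2
    first = sum(samples[:mid]) / mid
    second = sum(samples[mid:]) / (len(samples) - mid)
    if second > 0 and second >= rise_ratio * first:
        return "steady rise"
    return "normal"


# Hypothetical CPU readings (percent of limit), oldest first.
attack = [10, 12, 11, 10, 95, 11, 10, 12]
growth = [10, 12, 15, 18, 22, 27, 33, 40]
quiet = [10, 11, 10, 12, 11, 10, 11, 12]

for name, readings in [("attack", attack), ("growth", growth), ("quiet", quiet)]:
    print(name, "->", classify_usage(readings))
```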
To stop "spikes" you must identify the culprit and deal with it. In the case of a Digg effect, or if you get regular legitimate spiky traffic, upgrade your plan if you want to ensure the website is always available. If you would rather save some money, you can just accept occasional, brief interruptions in your website. (Maybe not a great idea if you're marketing your site. But if the traffic is organic, you can decide how valuable it is to capture those excess visitors.)
If legitimate traffic is rising over time, the only recourse is to increase your resource allocation by upgrading your plan. You may also be able to increase the amount of traffic your website can bear by optimizing and caching.
Let's look at a somewhat typical website.
If you log into your CP you will find a "Resource Monitor" icon in the Logs section. Click on this icon to open the resource use page. It is beneficial to review these graphs periodically and watch for trends. More likely you will check the Resource Monitor because you received an email alert.
In our typical website you might see something like this:
That notice is telling you that you reached the CPU limit and the EP limit sometime in the past 24 hours. So let's click on [Details] and see what it reveals.
At the top of the page you will see your current usage: a snapshot of this very minute.
Everything is self-explanatory except the CPU limit. CPU always shows a limit of 100%, which means 100% of the CPU allocation configured for your plan. Depending on the plan, that 100% can be one, two or more CPU cores.
The I/O limit for this website is 3MB/second, which is a lot for our very fast SSD drives.
EPs are difficult to quantify. An EP count is the number of actual network connections during a given second. Since connections open and close in a fraction of a second, the 40 EP limit above can translate to hundreds of concurrent visitors. (We have a highly optimized WordPress website that receives 10 million visitors a month and averages about 12 EPs. Another website that is very unoptimized sees nearly 10 EPs with only 400,000 visitors a month.)
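A rough way to reason about the EP numbers above is Little's law from queueing theory: the average number of open connections equals the request rate multiplied by the average time each connection stays open. The sketch below applies that relationship. The 0.2 second connection time is an assumption for illustration only, and the function names are hypothetical.

```python
def average_eps(requests_per_second, seconds_per_request):
    """Little's law: average open connections = arrival rate x time open."""
    return requests_per_second * seconds_per_request


def sustainable_requests_per_second(ep_limit, seconds_per_request):
    """Request rate at which the average EP count would reach the limit."""
    return ep_limit / seconds_per_request


# Assumption for illustration: each request holds a connection for 0.2 seconds.
seconds_open = 0.2

print(average_eps(50, seconds_open))
print(sustainable_requests_per_second(40, seconds_open))
```

This also shows why optimization matters: halving the time each request takes halves the EPs needed for the same traffic. Real traffic arrives in bursts, so treat the result as an average, not a guarantee.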
Memory is generous at 512MB. It is very rare to reach that limit. This website uses much more memory than average but it is safely below the 512MB limit. (The 512MB figure roughly equates to a 768MB VPS because the OS also consumes memory on a VPS.)
On to the "faults." Here is the visual graph on which you can see a spike in both CPU and EPs.
These spikes should be investigated. They are not "normal" traffic, and the log files will usually show who (or what) caused the excess CPU and visits.
Another good tool is a view of your usage in spreadsheet format over time. This is a 24-hour look back in which you can clearly see the spike from the graph.
You can also review the spreadsheet with a shorter time period. For example, if you get an email alert, look at the 1-hour view to see specifically during which 5-minute period of time the problem occurred. Knowing this time-frame will help when reviewing logs.
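Once you know the 5-minute period, you can narrow the log to just that window. This sketch assumes Apache-style timestamps such as `[12/Mar/2024:11:02:10 +0000]`. The function name and sample lines are hypothetical, and you should check whether your logs and the Resource Monitor use the same time zone.

```python
import re
from datetime import datetime, timedelta

TIMESTAMP = re.compile(r"\[([^\]]+)\]")
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"  # assumed Apache-style timestamp


def lines_in_window(lines, start, minutes=5):
    """Return log lines whose timestamp falls in [start, start + minutes)."""
    end = start + timedelta(minutes=minutes)
    selected = []
    for line in lines:
        match = TIMESTAMP.search(line)
        if not match:
            continue
        try:
            when = datetime.strptime(match.group(1), TIME_FORMAT)
        except ValueError:
            continue  # timestamp in some other format; skip it
        if start <= when < end:
            selected.append(line)
    return selected


# Hypothetical sample lines for illustration.
window_log = [
    '198.51.100.4 - - [12/Mar/2024:10:59:59 +0000] "GET / HTTP/1.1" 200 9000',
    '203.0.113.9 - - [12/Mar/2024:11:02:10 +0000] "GET /wp-login.php HTTP/1.1" 200 512',
    '198.51.100.4 - - [12/Mar/2024:11:05:00 +0000] "GET /about HTTP/1.1" 200 4000',
]

window_start = datetime.strptime("12/Mar/2024:11:00:00 +0000", TIME_FORMAT)
for line in lines_in_window(window_log, window_start):
    print(line)
```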
The "m" column is the max value that occurred during that time period. In the table above you can see that the max EPs during the 11:00 to 11:05 period is 3. And the max memory consumed during 11:05 to 11:10 is 175MB, while the average memory during that time was 158MB.
The right-hand columns pMf and EPf show "faults" for Memory and EPs, respectively. These are the two parameters that will temporarily block the website, so any number over zero in those columns means traffic was blocked at some time during the period in question (but not for the entire period).
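If you copy the table into a spreadsheet or CSV text, a few lines of code can pick out the periods that had faults. This is a sketch under assumptions: the `pMf` and `EPf` column names come from the table described above, while the `From` and `To` column names, the function name and the sample rows are hypothetical.

```python
import csv
import io


def periods_with_faults(csv_text):
    """Return (From, To) for every period with a memory or EP fault."""
    faulted = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Any value over zero means traffic was blocked at some point.
        if int(row["pMf"]) > 0 or int(row["EPf"]) > 0:
            faulted.append((row["From"], row["To"]))
    return faulted


# Hypothetical rows for illustration only.
sample_table = """From,To,pMf,EPf
11:00,11:05,0,0
11:05,11:10,0,7
11:10,11:15,2,0
"""

for start, end in periods_with_faults(sample_table):
    print(f"faults between {start} and {end}")
```

The periods it reports are the ones worth matching against your logs.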
You can vastly reduce the number of 'faults' you receive by reviewing your logs for the time period, identifying the problem and taking steps to address it.