Modern processors now operate more efficiently and only use maximum electrical power when they are dealing with a demand for maximum computational power. When the computational demand reduces, so does the demand for electrical energy. This is a truly dynamic condition as computational workloads vary constantly.
Multiprocessor servers and servers with very few disk drives (e.g. blade servers) will have the highest percentage dynamic power variation. A situation where most or all of the processors in a rack experience a demand for maximum computational power is fairly infrequent, and it may be weeks or months between such events. However, when this event does occur, it is patently a very critical period for the business concerned. Unfortunately, this phenomenon has caught many Data Centre managers out, and they have lost service to the business at key times.
Why and how does this cause a potential for IT downtime?
At a rack level it is normal to connect the IT load to rack level PDUs (power distribution units). In a dual power supply application there are normally two electrical feeds to the IT load: one from PDU A and one from PDU B. This delivers a 2N level of redundancy at the IT load input, which has its own potential pitfalls (these will be highlighted later).
Management of the electrical capacity of these PDUs is either non-existent or relies on local ammeters. These ammeters show the current drawn at any particular time, either in amperes or as a percentage of the maximum capacity. Where rack space is available and the meter normally shows the electrical load at a fraction of the maximum capacity, it would seem reasonable to fit further servers in the free space.
Most servers spend much of their time operating at light computational loads. As we know, this means that the server will be drawing less than its potential power draw. Most people installing or maintaining data centres and network rooms, however, are unaware that the typical observed server power consumption may be much lower than the potential power consumption when under a high computational load. This situation can lead a data centre or network room operator or IT staff to accidentally put too many servers onto a rack level PDU.
This situation becomes dangerous when the majority of the IT equipment within the rack experiences a demand for high levels of computational power. A group of servers will operate normally until enough of them are simultaneously subject to heavy loading. At this point an electrical overload will occur, and the circuit protection (fuse or circuit breaker) will operate and immediately disconnect the IT load. This is obviously an extremely undesirable event. Furthermore, since it happens at a time of high computational load, it is likely that the computing equipment is handling a large number of transactions, so the failure is very likely to occur at a particularly bad time.
The fall-out after such an event is often pretty serious. What the hell happened? Everything else seems OK, the UPS is working, the generator never kicked in, other racks were not affected.
Was it a fault in the rack PDU, a fault in one of the servers, a circuit breaker fault, a spike in the supply or a loose wire? When everything is checked and the rack is powered back up, all can look good. The electrician checks the power consumption and, to his mind, confirms that the circuit is not overloaded. With everything idling, that is truly what he observes. So he changes the circuit breaker, fits a new PDU and everything returns to normal. That is, until the next time the event occurs!
The same consequences can occur if a server power supply fails. Often no-one has realised that the server shares its load across both the A and B power supplies. When one fails, the whole load shifts to the other, which can in turn overload that particular circuit.
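The failover effect can be sketched with a short calculation. This is only an illustration: it assumes an even 50/50 load share between the A and B feeds, a 32 A breaker on each feed, and current taken simply as watts divided by nominal volts. The function name and the 9000 W rack load are hypothetical figures chosen for the example.

```python
def feed_currents(total_watts, volts=230.0):
    """Return (current per feed while sharing, current on one feed after a failure).

    Assumes the load is shared evenly across two feeds and that
    current = watts / volts (illustrative simplification).
    """
    total_amps = total_watts / volts
    return total_amps / 2, total_amps


BREAKER_AMPS = 32.0  # assumed rating of each feed's circuit protection

# Hypothetical rack drawing 9000 W in total under heavy load.
shared, failover = feed_currents(9000.0)

print(f"Sharing:  {shared:.2f} A on each feed")
print(f"Failover: {failover:.2f} A on the surviving feed")
print("Overload after failover:", failover > BREAKER_AMPS)
```

With both feeds healthy each one looks comfortably loaded, yet the surviving feed would exceed its rating as soon as the other is lost.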
So what can be done to protect against this?
Ascertain and detail asset lists for each piece of equipment connected to a rack PDU. Against this list determine the following:-
|Asset||Minimum power demand||Maximum Power Demand||% variation|
From this list we can see that there is a potential for a maximum current draw of 31.52 A at 230 V single phase (7250/230, i.e. power in watts divided by the nominal voltage). It is highly likely that we will observe a power demand of half this (<16 A) for the bulk of the time.
If the assets are fitted with A and B power supplies and these operate on a shared load basis, we may observe a demand of less than a quarter of this (<8 A) on each PDU. Assuming we have 32 A rated single phase PDUs fitted, a meter showing under 8 A suggests plenty of spare capacity, so it is not surprising that mistakes are made.
Nevertheless, we are at the maximum limits of a 32A PDU. There is no safety margin and no other equipment should be connected.
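The arithmetic behind this example can be laid out as a small script. It uses only the figures quoted in the text (7250 W maximum demand, 230 V nominal, 32 A PDU rating); the variable and function names are illustrative, and the "half" and "quarter" values are the article's rough estimates rather than measurements.

```python
NOMINAL_VOLTS = 230.0      # single phase nominal voltage from the text
PDU_RATING_AMPS = 32.0     # PDU rating from the text
MAX_DEMAND_WATTS = 7250.0  # total maximum power demand from the text


def amps(watts, volts=NOMINAL_VOLTS):
    """Current in amperes: power in watts divided by nominal voltage."""
    return watts / volts


peak = amps(MAX_DEMAND_WATTS)       # all equipment at maximum demand
typical = peak / 2                  # rough figure observed most of the time
per_feed_shared = peak / 4          # typical load seen per PDU with A/B sharing
headroom = PDU_RATING_AMPS - peak   # margin left at peak on a single PDU

print(f"Peak demand:        {peak:.2f} A")
print(f"Typical demand:     {typical:.2f} A")
print(f"Typical per feed:   {per_feed_shared:.2f} A")
print(f"Headroom at peak:   {headroom:.2f} A")
print(f"Utilisation at peak: {peak / PDU_RATING_AMPS:.1%}")
```

The gap between the typical per-feed reading and the peak figure is what makes the local ammeter so misleading.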
Where do we get this information from?
The vendors will have published figures but often these are overstated. This could be seen as a safety margin or a waste of capacity. UPS vendors publish more realistic figures and these are readily available from their websites.
Whichever source of information is used, it is important to monitor the power demand on each rack level PDU constantly. Many PDUs allow thresholds to be set, so that if equipment is added to the PDU an alert is raised automatically and investigations can be carried out immediately. Many PDUs also offer trend analysis, producing a graph of power demand over time. Informed judgements can then be made about dynamic power variations and the level of safety margin the facility has. Some PDUs also allow individual outlets in a rack level PDU to be turned off and on remotely. In our example, it would be wise to have all unused outlets disabled.
This level of management is paramount where users are likely to install or move equipment, or plug it into a different outlet, without the knowledge of the data centre manager, a situation that is very common in network rooms, colocation facilities and medium security data centres. This approach can also alert the manager to a loss of power redundancy.
Where availability is set at very high levels, this level of management and monitoring should be continued all the way back to the facility's incoming power supplies. This is often called a PMS (Power Management System), as opposed to a BMS (Building Management System). All switchgear, transformers, generators, UPS and static switches are continuously monitored, and any deviation from normal operation is immediately reported to the management system. This article deals purely with the power chain, but adaptive cooling is also critical to these dynamic changes. There are also other items that require close management, including access systems and fire/flood detection and protection.
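As a rough sketch of what threshold alerts and simple trend checks might look like, the snippet below classifies ammeter readings against warning and critical levels and flags sudden step increases that could indicate newly added equipment. It does not use any real PDU's interface: the class, function names, threshold fractions and sample readings are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    rating_amps: float = 32.0        # PDU rating (from the example above)
    warning_fraction: float = 0.6    # assumed warning level
    critical_fraction: float = 0.8   # assumed critical level


def classify(amps, t):
    """Return 'ok', 'warning' or 'critical' for one current reading."""
    if amps >= t.rating_amps * t.critical_fraction:
        return "critical"
    if amps >= t.rating_amps * t.warning_fraction:
        return "warning"
    return "ok"


def step_increases(readings, min_step_amps=1.0):
    """Return indices where a reading rises by at least min_step_amps
    over the previous one, e.g. after equipment has been plugged in."""
    return [
        i
        for i in range(1, len(readings))
        if readings[i] - readings[i - 1] >= min_step_amps
    ]


# Hypothetical sequence of readings in amperes from one rack PDU.
history = [8.0, 8.1, 10.5, 10.4, 20.0, 26.0]
limits = Thresholds()

for i, reading in enumerate(history):
    print(i, reading, classify(reading, limits))
print("Step increases at readings:", step_increases(history))
```

In practice the thresholds would be chosen against the calculated maximum demand rather than the idle reading, since the idle reading is exactly what misleads.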
The only problem with the PMS is that it crosses the boundaries of IT and facilities. IT’s management systems are normally IP based and Facilities are normally BMS based.
The control of the power chain, and indeed of some of the other technologies, is widening. The large engineering corporations are keen to have a strong presence in every step from the power station all the way through to the socket outlet. Mergers, buyouts and collaborations have been commonplace and a handful of major players are emerging.
Group Schneider, Eaton-Williams and Emerson are among them, to name a few. In the European IT market Group Schneider is probably the most interesting. Having completed the purchase of American Power Conversion (APC), they are very strong in the IT sector, and their presence within the electrical supply market is both well established and broad. Both of Schneider’s sectors have robust facilities for monitoring and management via their products, and it seems sensible to expect that these will soon be available on a common platform. Once that is in place, the PMS will be much simpler to manage.
A new breed of engineers will be needed. One that can converse in both IT and facility speak. One that fully understands the importance of both and can view the whole as a complete business solution.
on365, the data centre specialists, state that they have such people. They call them ITility Engineers. As the demand on IT intensifies - more needs to be delivered with less and the pressure of the Green debate swells - such people will be needed to see the bigger picture while also understanding specialist needs.