Friday, December 7, 2012

Maintaining Power in an Outage Event


Experiencing a power loss at the local school and seeing how the battery backup systems respond was a valuable exercise. As an IT administrator and front-line worker, you get to see the process function in real time. It also allows for reflection and for adjusting the plan to better serve the users of the technology systems at the school.

The school system has two legs of electrical power coming into the building. One of those legs shut down at 8:30 am on this Friday morning. Lighting and outlet power were affected in portions of the building. Interestingly enough, any given room might have some of its circuits functional while other outlets were dead. I am not exactly sure why the electrical circuits of the school were designed and engineered this way, but it is what it is.

When the electricity in one leg went down this morning, the main elementary computer telecommunications closet lost partial power. The phone system and some amplifier equipment were unaffected. However, the three racks of computer switching, telecommunications and server equipment were immediately pushed to battery backup. The UPS units beeped steadily as power continued to flow from their lead-acid cells.

Within the first minute, the building's occupants were notified, and by minute 5, nonessential servers were shut down, including video surveillance and some backup devices. The storage server units were shut down after users were notified; they were given 2 to 5 minutes of lead time. Our 1:1 students were able to function for about 15 minutes, until the UPS on the main telecommunication switching rack ran out of battery.

The center rack containing the iBoss, router, firewall and Ruckus wireless controller stayed up for an hour on its new CyberPower UPS. Other UPS units, which were now lightly loaded, stayed functional 30+ minutes into the event.

The takeaway at 45 minutes into the power outage event was that, in order to preserve the wireless system in the building for a longer period of time, the primary HP ProCurve switch and the Elementary primary PoE HP ProCurve switch need their own UPS. These two switches are far more critical than the rest: they connect the high school to the elementary over fiber and also connect the Ruckus controller to the wireless APs. Maintaining that service for a minimum of 1 hour seems like a future goal worth attaining.

Once the power came back online, most of the battery UPS units rebooted themselves. Those UPS units that had been shut down manually were restarted manually, and the servers were rebooted and tested. The process of bringing the main systems back online took probably 3 minutes. Some UPS units in closets at the high school needed to be manually restarted to power PoE and standard switches.

Some older small desktop UPSes showed their age and are now being seriously considered for replacement. We also now know that the power a PoE switch draws depends on the number of devices it is powering. At the high school, 8 PoE access points draw far less juice than the elementary school PoE switch, which has 14 access points drawing power. Those 24 port PoE switches can draw up to 490 watts, which is sure to hammer a usually good UPS in short order.
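To put rough numbers on that difference, here is a minimal Python sketch of a worst-case PoE load estimate. It assumes every access point pulls the full 15.4 watts that 802.3af allows per port, and it uses a made-up 50 watt figure for the switch itself; neither number was measured on our equipment, and real access points usually draw less. The function name is just for illustration.

```python
# Worst-case PoE load estimate. All figures below are assumptions,
# not measurements from the school's switches.
PER_PORT_MAX_W = 15.4   # 802.3af maximum power supplied per port
SWITCH_BASE_W = 50.0    # hypothetical draw of the switch with no PoE load

def poe_switch_draw(ap_count, per_ap_w=PER_PORT_MAX_W, base_w=SWITCH_BASE_W):
    """Return an upper-bound estimate of switch draw in watts."""
    return base_w + ap_count * per_ap_w

high_school_w = poe_switch_draw(8)    # 8 access points
elementary_w = poe_switch_draw(14)    # 14 access points

print(f"High school PoE switch: up to {high_school_w:.1f} W")
print(f"Elementary PoE switch:  up to {elementary_w:.1f} W")
```

A watt meter on the actual switch would replace both assumed constants with real readings.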

Our elementary switch structure, comprising 6 switches drawing nearly 850 watts on a single SmartUPS 1500, lasted about 20 minutes. As mentioned above, 2 of those 6 switches will get their own CyberPower 2200 UPS. We will also move our X1 (Mac Mini server) that maintains the DNS and DHCP services to that new UPS to keep Internet access up as long as possible for our 1:1 folks and teachers. We believe we can go from 20 minutes of access to about 1 hour of access during a power outage.
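The 20 minute figure can be turned into a back-of-the-envelope runtime estimate. The Python sketch below treats the battery as a fixed bucket of energy, which is a simplifying assumption: real lead-acid batteries deliver somewhat more total energy at lighter loads, so the straight-line math tends to be conservative when the load goes down. The function names are hypothetical, and the only inputs are the 850 watts and 20 minutes observed during this outage.

```python
def usable_watt_hours(load_w, runtime_min):
    """Energy the UPS actually delivered at the observed load."""
    return load_w * runtime_min / 60.0

def runtime_minutes(usable_wh, load_w):
    """Estimated runtime at a different load (simple linear model)."""
    return usable_wh / load_w * 60.0

# Observed during the outage: about 850 W for about 20 minutes.
observed_wh = usable_watt_hours(850, 20)

# Load that the same battery could carry for a full hour.
load_for_one_hour_w = observed_wh / 1.0

print(f"Energy delivered: about {observed_wh:.0f} Wh")
print(f"Load for 60 minutes on the same battery: about {load_for_one_hour_w:.0f} W")
print(f"Runtime at half the load: about {runtime_minutes(observed_wh, 425):.0f} min")
```

By this estimate the SmartUPS 1500 delivered roughly 283 watt-hours, so a one hour target on a similar battery means keeping the load near 283 watts. That is the reasoning behind moving only the two critical switches and the X1 to a dedicated UPS instead of trying to hold up all 6 switches.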