Monday, February 13, 2012

LIME Barbados - Some Corporate Customers Lose Online Services for 2 Days due to DNS Server Issues



Date: 2012-Feb-13



Overview

For a couple of days, Feb 6th to mid-morning Feb 8th, 2012, certain corporate customers of LIME (Cable & Wireless) in Barbados may have been affected by a DNS (Domain Name System) server issue at LIME.


It likely caused the loss of two and a half (2 and 1/2) days of Internet productivity (or entertainment) for the affected corporations and constituent users.

Those experiencing the problem would have endured a complete outage of any Internet service that required DNS to function! This includes any web pages accessed using a web browser.




Outage Details

Apparently, the LIME DNS servers 205.214.192.201 and 205.214.192.202 were not answering DNS queries from the LIME Broadband network. Thus, LIME's customers could not directly query them.

This was not an Internet outage. It was a DNS outage. The problem was correctable, once known, either at end-user machines or by the Systems Administrator reconfiguring the server and appropriate network equipment. It only affected companies with a specific collection of configuration settings. It is assumed the number of affected companies was large due to the common nature of the configuration.

It would take about three (3) minutes for an end-user to be guided into working around the issue once permitted to modify the network adapter settings.
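One way to tell a DNS outage from an Internet outage is to send a DNS query straight to the name server and see whether anything comes back. The following is a minimal sketch of that idea using only the Python standard library. The function names, the `example.com` test name and the 3-second timeout are my own choices for illustration, not anything LIME or the affected companies used.

```python
import socket
import struct


def build_query(name, txid=0x1234):
    """Build a minimal DNS query packet for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # Question name: each label is prefixed with its length, ending with a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE 1 = A record, QCLASS 1 = IN (Internet).
    return header + qname + struct.pack("!HH", 1, 1)


def server_answers(server_ip, name="example.com", timeout=3.0):
    """Return True if server_ip sends back a DNS reply matching our query ID."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server_ip, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and unreachable networks
            return False
    # Any reply carrying our transaction ID means the server is talking to us.
    return len(reply) >= 2 and reply[:2] == query[:2]
```

Calling `server_answers("205.214.192.201")` from inside the affected network would be expected to return False during an outage like this one, while a direct connection to a known IP address would still succeed. The result depends entirely on where the check is run from.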


Typical Systems Configured with Static DNS Servers
Traditionally, the above DNS servers were statically configured in:


  • The Internet Protocol (IP) properties on the network adapter of Windows Server machines functioning as DNS servers, Internet Security & Acceleration (ISA) servers or routers.

  • Hardware routers.

  • Hardware firewalls.

  • Microsoft DNS server settings as the Forwarders.

  • The DNS options of DHCP servers, as distributed to Windows client computers.


Concerns Emanating from the Outage


This outage should concern all who were affected in the following ways:



  1. It lasted in excess of two (2) days! Estimated outage time is a minimum of 53 hours.


  2. LIME's DNS servers col1.caribsurf.com [205.214.192.201] and col2.caribsurf.com [205.214.192.202] are authoritative name servers for several local domains (the correct term here is DNS zones). Whilst those servers were resolving DNS queries of international origin (according to LIME and confirmed by testing), they failed to resolve DNS queries from LIME's customers. This meant that if a customer's local caching DNS server was either (1) using root hints to resolve queries or (2) using the afflicted LIME name servers as Forwarders, it would fail to reach any of the LIME-hosted DNS zones. However, DNS zones not hosted by LIME would still have been resolvable with the root hints configuration.


  3. Whilst LIME has provisioned another caching DNS server dns.caribsurf.com [205.214.222.201], it is not an authoritative name server. That is to say it is not secondary to the primary name server for the hosted DNS zones.


  4. Both of LIME Barbados' authoritative name servers (primary and secondary), as known to the public, appear to be on the same IP subnet. This makes the network path (or some part of it) a single point of failure, and thus undermines the engineered redundancy. A failure of the router upstream of these name servers, or (as was likely the case on Feb 6th - 8th, 2012) a misconfiguration of the network policy on that router, can cause an outage. Such an outage could affect all DNS operations (authoritative and caching DNS service to the Internet), or only the authoritative and caching DNS operations on the LIME network.

  5. LIME's public relations during this critical network outage were as abysmal as usual. LIME failed to list this issue as a Service Alert [1]. However, the single-day closure of their retail outlets on Feb 10th, 2012 due to a Union meeting was apparently important enough to list (see image at top of page).
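The same-subnet observation in item 4 can be checked mechanically. Here is a small sketch using Python's `ipaddress` module. The /24 prefix length is purely my assumption for illustration, since LIME's actual subnet boundaries are not stated anywhere in this post.

```python
import ipaddress

# Addresses from the article. The /24 prefix length is an assumption made for
# illustration only; the real subnet boundaries are not known.
AUTHORITATIVE = ["205.214.192.201", "205.214.192.202"]
CACHING_ONLY = "205.214.222.201"
ASSUMED_PREFIX = 24


def network_of(ip, prefix=ASSUMED_PREFIX):
    """Return the network that contains ip for the given prefix length."""
    return ipaddress.ip_interface(f"{ip}/{prefix}").network


def share_one_subnet(ips, prefix=ASSUMED_PREFIX):
    """True if every address falls inside the same network."""
    return len({network_of(ip, prefix) for ip in ips}) == 1
```

Under that assumed prefix, `share_one_subnet(AUTHORITATIVE)` is True, while adding the caching-only server to the list makes it False. A secondary name server that failed this check against the primary would be a better candidate for real redundancy.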





REFERENCES


[1] Service Alerts - LIME Barbados. http://www.time4lime.com/bb/news/service_alerts.jsp

Friday, November 27, 2009

Dell's Bad Advice and Internet Explorer Requested Lookup Key Not Found in Any Active Activation Context

System: Dell Precision T3400 workstation
Operating System (OS): Windows XP Professional 64-bit
Type of Use: Home / Home Office
Location: Barbados
Date of Dell's On-site Service: November 27th, 2009

Issue

  • Power LED on front bezel flashes amber on any attempt to power on from an absence of power condition, e.g. the power cord of the system unplugged from a surge suppressor.

Preliminary Diagnosis

  • A Power Supply Unit (PSU) failure was suspected.
  • The PSU failed when attached to two (2) separate PSU testers - only the +5 VSB LED was active.
  • The User's Guide for the system stated that the diagnostic LED condition implied an internal power issue.
  • From an online chat with Dell's Technical Support a dispatch was created for a replacement PSU.

Dell On-site Technician Diagnosis - 2 Days Later

  • Power supply unit fine but a motherboard replacement is required.


Subsequent Issue

  • The motherboard was replaced but a fault condition was experienced when attempting to load the operating system (OS). A STOP error (blue screen, white text) appeared on each OS load attempt.


Real Reason for the Subsequent Issue

  • The Operating System (OS) expects the BIOS to be set in RAID On (or always on - whatever is the exact text) mode. This was not the default BIOS setting of the newly replaced motherboard. To Dell's credit, this was also not the setting when the system was shipped from the factory about a year ago.
  • Simply changing this BIOS setting would have allowed the OS to load properly.


Fix Applied by Dell Technician and Dell Tech Support

  • The on-site technician performed an in-place/repair installation of Windows XP Professional 64-bit.
  • The above action would have reverted the OS so that it was no longer capable of being used in a RAID 0 or RAID 1 configuration. I had performed an in-place/repair install less than a week prior in order to set up data redundancy on the same Dell Precision workstation using RAID 1 (disk mirroring). In effect, Dell was restoring the factory configuration of the OS and Hardware Abstraction Layer (HAL).


Side Effects of Repair Install
The in-place/repair installation of Windows XP Professional 64-bit damaged something in Internet Explorer. Internet Explorer 8 was installed on the system prior to the repair installation. The repair installation probably placed IE 6 as the executing IE version, but remnants of IE 8 were possibly interfering with browsing from the running version of IE.


There were apparently also some networking problems at the same time. The computer was reportedly not always connecting to the Internet. Since I was not on-site, I cannot describe the exact nature of the issue. However, the user reported that rebooting the networking devices seemed to have resolved the networking issues - apart from those with IE. The computer is and was connected to the networking equipment via an Ethernet cable.


I am unclear on whether only the directly attached Linksys Wireless-N router (with LAN ports) was restarted, or also the Internet Service Provider's (ISP) Customer Premises Equipment (CPE).


However, an "absence of Internet condition" and a "networking problem" are generally one and the same to many end-users. Thus, a web browser not bringing up a web page is usually the condition that triggers fancy fault proclamations and rebooting of devices. As described later, Internet Explorer was not successfully finding (resolving) and loading web pages entered in the address bar, other than Google's search/home page (which was luckily set as the home page).


Given the previous paragraph, it is possible that assumed networking issues confused the technical analysis by Dell's remote support personnel as well as the on-site technician while he/she was present.
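A check made outside the browser would have separated the two conditions quickly. The sketch below is one hedged way of doing that in Python: it resolves a host name and then opens a plain TCP connection, with no browser involved. The function name, the result strings and the injectable `resolve` and `connect` parameters are my own inventions for illustration; they let the logic be exercised without a live network.

```python
import socket


def diagnose(host, port=80, timeout=5.0,
             resolve=socket.getaddrinfo, connect=socket.create_connection):
    """Classify connectivity to host without involving a web browser.

    resolve and connect are injectable so the logic can be exercised offline.
    """
    try:
        infos = resolve(host, port, type=socket.SOCK_STREAM)
    except OSError:  # socket.gaierror is a subclass of OSError
        return "name resolution failed (DNS or network problem)"
    # Each getaddrinfo entry is (family, type, proto, canonname, sockaddr).
    address = infos[0][4][0]
    try:
        conn = connect((address, port), timeout)
    except OSError:
        return "resolved, but TCP connection failed (network problem)"
    conn.close()
    return "network path is fine; look at the browser"
```

If `diagnose("www.facebook.com")` reports that the network path is fine while IE still throws its dialog box, the fault lies with the browser and not with the router or the ISP.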


Side Effect Symptoms - IE Specific

  • Home page http://www.google.com/ loaded on opening IE.
  • The user is able to search and access web sites from Google's search results.
  • Typing full URL (e.g. http://www.facebook.com/), or anything else, in IE's address bar resulted in an untitled dialog box with the message:
The requested lookup key was not found in any active
activation context.


Suggestion of Dell Technical Support - Phone/Remote Session Support

  • Backup data and contact him/her again for a clean installation of Windows XP Professional 64-bit in order to resolve the issue.



A Clean Install is Not Necessary for a Web Browser Issue!

  • The issue existed only with Internet Explorer! Mozilla Firefox was also installed on the user's system and did not experience the same issue.
  • The suggestion of the Dell Support Technician was an inappropriate resolution step and would have been needlessly disruptive to the user in terms of data backup, restoration and application and driver reinstallation.


Solution - Fix IE Installation
Uninstall IE8 from Start >> Run... by typing:

%windir%\ie8\spuninst\spuninst.exe

Thanks to http://www.askmehelpdesk.com/internet-web/request-lookup-key-not-found-any-active-activation-context-61188.html.
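Before launching the uninstaller, it may be worth confirming that the IE8 uninstall folder is still present on the machine. The following is a rough sketch of such a check in Python; the helper names are hypothetical, and it only expands the path quoted above and tests for the file, nothing more.

```python
import ntpath
import os

# Path quoted in the article; %windir% is expanded from the environment.
UNINSTALLER = r"%windir%\ie8\spuninst\spuninst.exe"


def uninstaller_path(template=UNINSTALLER):
    """Expand %windir% using Windows path rules, whatever OS runs this."""
    return ntpath.expandvars(template)


def uninstaller_present(template=UNINSTALLER):
    """True if the expanded path points at an existing file."""
    return os.path.isfile(uninstaller_path(template))
```

If `uninstaller_present()` returns False, the Start >> Run... command above will simply fail, and another route to repairing IE (see reference [3]) would be needed.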

The system was restarted after the IE8 uninstall. IE6 was present both as a 32-bit and 64-bit application and operating normally with respect to typing URLs in the address bar.

The system was upgraded to IE8 for whatever security and feature benefits it may give the user. The user reported that all was well after the upgrade.


Commentary on Dell Service

  1. The Dell on-site Technician repaired the hardware condition and overstepped his/her scope of work in fiddling with the operating system. Of course, attempting to help the end-user is admirable but doing a bad job is not.
  2. Isolation of the fault with respect to the OS loading after the motherboard was changed was beyond the on-site technician and whatever remote support assistance he/she had.
  3. At the point of experiencing the STOP error, the on-site support technician failed to engage the client in conversation. Had he/she done so, the client would likely have referred him/her to me, and I would either have taken it from there myself or asked about the RAID On BIOS setting.
  4. Isolation of the web browser and assumed networking faults was again beyond the on-site Dell technician and whatever remote support he/she and the end-user had from other Dell representatives. A real IT support technician travels with software and browser setup images. Additionally, the proper diagnosis of networking issues MUST always move outside of web browser interaction observations.
  5. It is disgusting that the remote Dell Support was advocating a clean OS installation to solve a web browsing issue, moreover, a vendor specific browser issue. This points to the usual inadequate networking training of typical PC Support technicians and how dangerous it can be to end-users. From personal observation, strangely enough, years on the job do not appear to broaden or improve their knowledge of networking.


On the Good Side

  1. The user is lucky not to have incurred part replacement fees, thanks to a sensible warranty (3-year, Next Business Day) on the Dell Precision workstation. This purchase-time warranty has clearly and significantly lowered the Total Cost of Ownership (TCO) of the system, which has experienced a hard drive failure, a cage-fan failure and a motherboard failure all within the first year of the warranty. Many may get frustrated with so many issues in such a short time and claim brand inferiority. However, on the economic side, they are free to try another brand without the ease of warranty support and on-site service, and judge the hassle and cost for themselves.
  2. The user is also lucky to have expert level assistance in the form of me, though not so lucky as above with respect to the fees. :-)

Items for Dell to Address
Observed during Online Chat to Report Power-On Issue

  1. I used the name of the client, a female, in the chat sign-up so she would remain the primary contact for Dell in the follow-up interaction. The support representative persisted in calling me Mr. Placeholder-for-English-Female-Christian-Name. This is despite the full name being entered in the chat sign-up information capture. The protocol for addressing customers needs to be reviewed.
  2. The support representative asked if I had tried the PSU in another computer system, to which I sarcastically commented and rambled on a bit about the likelihood of having 2 workstations or desktops in the home so I could diagnostically swap parts. The support representative took this opportunity to point out that I was rather impolite - this was obviously an inappropriate and ill-advised action on his/her part. Of course, I so informed him/her.
  3. A summary of the fault was entered prior to the startup of the chat session. However, on entering the chat, the support representative still asked about the fault. Don't they get to read the initial information?
  4. Even though the system has a Next Business Day (NBD) warranty and the replacement part was on the island, Dell Technical Support said the replacement service would be scheduled within 1 to 3 days. This is not quite what NBD means.

References

[1] Rant: requested lookup key was not found in any active activation context. http://groups.google.com/group/microsoft.public.internetexplorer.general/browse_thread/thread/63240ed1c5d56961/4eae2f494bd75cfb?q=. Usenet Group: microsoft.public.internetexplorer.general. Found via Google Groups. Accessed: 2009-11-27.

[2] http://www.askmehelpdesk.com/internet-web/request-lookup-key-not-found-any-active-activation-context-61188.html. Accessed: 2009-11-27.

[3] How to reinstall or repair Internet Explorer in Windows Vista and Windows XP. http://support.microsoft.com/default.aspx/kb/318378 Accessed: 2009-11-28.

Tuesday, September 15, 2009

LIME St. Lucia Communicates to Customers on SMTP Blocking Issue - with Tight Deadline?

This is an update to the precursor articles:
  • LIME St. Lucia SMTP Blocking - End User Edition
  • 2009-09-12 - LIME St. Lucia - Blocks SMTP Communication - Outbound Traffic on Port 25 - Disrupts a Business from Sending E-mail for at Least 1 Week!


On September 15th, 2009, at 10:35 hours AST (Atlantic Standard Time) LIME St. Lucia sent an e-mail to customers titled "FW: Email Experience/Spamming":
  1. Requesting information on the mail servers used by their business.
  2. Encouraging those hosting mail servers on-site and using dynamic IP addresses to move to using a static IP address.
  3. Requesting the information be sent by close of business today (September 15th, 2009).

Clearly they have not accounted for the situation where clients do not host an on-site mail server (and therefore have no need for a static IP address, which carries a recurrent monthly cost) and wish to maintain communication with their 3rd party e-mail service provider!


Consider the scenario where an employee expects to access his e-mail from Microsoft Outlook on his residential LIME-provisioned ADSL connection at his home. This e-mail could be hosted either on his business place's on-site mail server or on that of a 3rd party provider. Let's hope that employee and his technical support / e-mail service providers are aware of the alternative means of regaining productivity!
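For anyone wanting to see what is actually blocked from a given connection, a quick port test against the mail server tells the story. The sketch below is mine, not LIME's: port 25 is the one reported as blocked, while port 587 (mail submission) is an assumption on my part. It is a commonly offered alternative, but whether a particular e-mail provider accepts mail on it has to be confirmed with that provider. The host name `mail.example.com` in the usage note is a placeholder.

```python
import socket

# Port 25 is the one the article reports as blocked. Port 587 (mail submission)
# is an assumption: it is a common alternative, but whether a given provider
# offers it must be confirmed with that provider.
PORTS = {25: "SMTP relay", 587: "mail submission (assumed alternative)"}


def reachable_ports(mail_host, ports=PORTS, timeout=5.0,
                    connect=socket.create_connection):
    """Return the subset of ports on mail_host that accept a TCP connection."""
    open_ports = []
    for port in sorted(ports):
        try:
            conn = connect((mail_host, port), timeout)
        except OSError:
            continue  # blocked, filtered, or simply not listening
        conn.close()
        open_ports.append(port)
    return open_ports
```

A result of `[587]` from `reachable_ports("mail.example.com")` on an affected ADSL line would suggest that reconfiguring the mail client's outgoing port is enough to get e-mail flowing again, with no static IP address required.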


Where are the Caribbean's Telecommunication Regulatory Authorities and Consumer Commissions on this matter? I know I e-mailed NTRC (http://www.ntrc.org.lc) at ntrc_slu@candw.lc - the e-mail address given on their web page!


As of September 15th, 2009, 13:20 hrs AST, no update had been made to the Service Alerts (http://www.time4lime.com/whats_new.jsp?whats_menu=Service_Alerts) or Press Releases (http://www.time4lime.com/whats_new.jsp?whats_menu=Press_Releases) sections of the LIME St. Lucia web-site.


By the way, those direct hyper-links above are likely to fail because the web server would NOT know the country context unless it was first chosen on the LIME Home Page (http://www.time4lime.com). Any further discussion on this is for another blog though. :-)