
Archive for the 'neutron star' Category

Update

Posted in Family, neutron star on August 5th, 2006

Things have been extremely busy these last several weeks. I have a final on Thursday and then I'm on summer vacation until mid-September. I can't wait. My mother is coming into town in a couple of weeks and we are going to go see the new Cirque show (Corteo).

I did a credit evaluation for my Northwestern program and found that I need to start taking at least two classes per semester to finish in four years. If I don't, it will be more like 9 years!!!! So next semester I'm going to take a Literature course and a Differential Calculus course. My life is over.

Work has been very crazy and fun. My status reports are like 6 pages long! Two people have left (for various reasons I cannot talk about) since I started and have not been replaced. I am now basically doing anything and everything. We just purchased a new EMC SAN that I have been playing with. I'm still not sold on the idea of SANs. Expensive and overrated, especially EMC!

Eno is doing very well. He has a follow-up neuro appt. about his arm on Tuesday (keep your fingers crossed). Group administrators have finally cut their first check to pay for his therapy. Yes, that's right, first! Since last October! I will not miss them! The new company has Blue Cross Blue Shield. I feel he is doing extremely well though. 90% - 95% better.

Bye.

Looking for good systems admins

Posted in neutron star on August 5th, 2006

Hey,

We are looking for two really good sys admins. If anyone is interested drop me a line.

Eric.

I'm officially not a contractor anymore.

Posted in neutron star on August 3rd, 2006

Effective 08/01 I'm an employee and not on contract anymore!

Help Me!

Posted in WTF, neutron star on July 5th, 2006

Today has been a total buzzkill!! I'm out in NY to install a new WAN connection between the NY and Evanston offices and reconfigure the local network. The connection is a point-to-point Ethernet connection at 6 Mb/s and it was supposed to be tested and ready to go last week. I woke up at 4am this morning to catch a 6am flight to LGA. Takeoff was delayed because some backup "electrical system" on the plane had failed. We had to wait for replacement parts to arrive and get installed. That was about a one hour delay. Once I arrived at LGA I got delayed another hour because the United crew couldn't unload the luggage due to lightning strikes on the tarmac.

Once I finally arrived at the office I went up to the telco closet to locate the new connection and associated equipment. It was nowhere to be found. I called the provider (Yipes) and after several hours we discovered the equipment had not been installed. Remember, this was all tested! Yipes finally dispatched a tech with the router and switch. I figured we were home free now! NOT! We couldn't find the four T1s that were supposedly installed. Nobody knew who the carrier was or where the demarc was! After another several hours of searching I found four unmarked/unlabeled T1s that happened to be the right ones. We wired up the router and switch and boom, it works! So I proceeded to connect our Cisco switch and establish a dot1q trunk with the Evanston office. Once again it doesn't work. After two hours on the phone with Yipes tech support we discovered the Yipes equipment was misconfigured. Once again, remember, this had been tested and confirmed working last week.

OK, now that the switches are working, it's time to reconfigure all the desktops. I quickly reconfigure my laptop and soon discover I cannot access the Internet. UBS (our office space provider and ISP) has not configured their firewall for the new subnet. So it's now after 7pm and the chances of getting someone on the phone are not good. What! I did get someone. Oh, but they don't know the password to the firewall.
It's now been over 14 hours. I'm going to get dinner and head back to the hotel.
My wonderful wife researched some vegetarian restaurants for me. Mmm, nice dinner. NOPE! Sorry! The restaurants are now closed.

Argh!!!!!!!!

The network is down?

Posted in Tech, neutron star on June 19th, 2006

It was a crazy weekend. My CallManager/BGP network upgrades got canceled because we were having connectivity problems to remote sites and we didn't want to introduce any more changes until all the problems got resolved. We started getting a bunch of complaints that users couldn't use email and that messages would just sit in their outboxes. Outlook would eventually just pop up one of its "Trying to connect" balloons down by the task bar. We traced the issue to our San Francisco and New York offices and discovered that any packet larger than 128 bytes was getting dropped. Both of our offices are in UBS office buildings and UBS provides all of the network equipment and connectivity, so troubleshooting from their end was very difficult. Also, both sites are connected back to headquarters via an IPsec connection. In the end we discovered that if we installed the Cisco VPN client on the end users' machines and used that to connect instead of the L2L tunnel, everything worked fine. So today we are chasing after UBS to try and troubleshoot their VPN concentrator, which terminates the remote end of the L2L tunnel.
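A size cutoff like the 128-byte one described here can be pinned down by sweeping ping payload sizes. The following is only a sketch of that idea, not what was actually run that weekend: the function names are made up for illustration, the `ping` flags assume Linux iputils, and the search assumes that if one size gets through then every smaller size does too. Note that `ping -s` sets the ICMP payload, so the packet on the wire is 28 bytes larger (20 bytes of IP header plus 8 bytes of ICMP header).

```python
import subprocess


def largest_passing_size(probe, low=0, high=1472):
    """Binary-search the largest size in [low, high] for which probe(size) is True.

    Assumes that if a size passes, every smaller size passes too.
    Returns None when even `low` fails.
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low


def ping_probe(host):
    """Build a probe that sends one ICMP echo with the given payload size.

    The flags (-c count, -W timeout, -s payload size) assume Linux iputils ping.
    """
    def probe(size):
        result = subprocess.run(
            ["ping", "-c", "1", "-W", "2", "-s", str(size), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return result.returncode == 0
    return probe
```

Calling `largest_passing_size(ping_probe("remote-host.example.com"))` (a placeholder hostname) once through the L2L tunnel and once through the VPN client would show whether the cutoff follows the tunnel.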

Also on Saturday we were installing new A/C units in our data center and relocating network equipment to make room for two new 6500 core switches. During that process many of the machines in the data center lost connectivity for short periods of time. After the move was completed Nagios started screaming that half the network was down. I immediately thought this was a problem with Nagios or the Nagios box itself because everything else seemed fine. I spent about four hours playing with duplex settings, speed settings, kernel drivers, etc. on the Nagios box to try and figure out what the problem was. Then I started to notice that I was seeing issues outside of Nagios. If we attempted to ping any host within our production subnet we experienced about 40% - 80% packet loss. Nagios wasn't broken! It was doing exactly what it was supposed to do! It was telling me the network was f&8^ked up! Long story short: we narrowed the problem down to a single bad SFP GBIC in one of the IDF closets.
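When chasing loss like this across many hosts it helps to pull the loss figure out of ping's output automatically. Here is a rough sketch that computes the percentage from the summary line; it assumes the Linux iputils summary format ("N packets transmitted, M received, ..."), and the function name is just for illustration.

```python
import re

# Matches the iputils summary line, with or without the word "packets" before "received".
_SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def packet_loss_percent(ping_output):
    """Return packet loss as a percentage, computed from ping's summary line.

    Returns None if no summary line is found or nothing was transmitted.
    """
    match = _SUMMARY.search(ping_output)
    if match is None:
        return None
    sent, received = int(match.group(1)), int(match.group(2))
    if sent == 0:
        return None
    return 100.0 * (sent - received) / sent
```

Feeding it the captured output of `ping -c 10 <host>` for each host in a subnet gives a quick table of which links are losing packets.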

Work fun

Posted in neutron star on June 13th, 2006

I have an interesting weekend coming. Starting Friday around 6pm we are going to be migrating from 3 bonded T1s to a fully redundant BGP Internet connection. I finally get to play with BGP!

The new setup will consist of 3 T1 lines from MCI, 2 T1 lines from SBC, and a 10 Mb/s DS3 connection from Yipes. So starting at 6pm we will be modifying all our NAT rules on our external facing PIX firewalls, modifying DNS entries, modifying our internal routes and crossing our fingers. Should be fun?

 

At the same time I’m going to be upgrading our Call Managers to version 4.1.3 sr3a, the OS to version 4.2, and the IOS on our voice gateway. We have been troubleshooting an intermittent problem where transferred inbound calls get dropped. Sr3a seems to have dozens of fixes related to call transfers.

 

Another interesting item:

 

Our developers are working on a program that takes in a stream of pricing data, modifies it, and then sends the data out to users that have subscribed to it. Everything is written in C++ and compiled with gcc. All of this happens in real time and each unicast session takes about 2.1 Mb/s. That’s a lot of data! We have 100 traders running clients that talk to this thing. That’s over 200 Mb/s of data. I’m keeping my fingers crossed on this one. This is all running on a RHAS 4 Dell box with 4×2.8GHz dual-core Xeon CPUs. In testing mode we are processing about 4000 prices per second and the load on the box is only .01! The network usage has spiked as high as 40 Mb/s. I’m wondering if this thing is going to be able to sustain 200 Mb/s.
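The back-of-the-envelope math is worth writing down, because unicast bandwidth grows linearly with the number of clients. A small sketch using only the figures from this post (the constant and function names are illustrative):

```python
PER_SESSION_MBPS = 2.1  # per-client unicast stream, from the post
CLIENTS = 100           # traders running the client
PEAK_SEEN_MBPS = 40     # highest network usage observed in testing


def aggregate_mbps(per_session_mbps, clients):
    """Total outbound bandwidth when every client gets its own unicast stream."""
    return per_session_mbps * clients


total = aggregate_mbps(PER_SESSION_MBPS, CLIENTS)
print(f"aggregate: {total:.0f} Mb/s")
print(f"multiple of the test peak: {total / PEAK_SEEN_MBPS:.2f}x")
```

In other words, full production load would be roughly five times the highest rate seen in testing so far, which is why sustained throughput is the open question.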

 

We are using this really, really cool tool called NexVu to monitor our network traffic. It’s a Dell box running Linux and their proprietary application. Check it out: http://www.nexvu.com

Using Nagios with Quickpage: An SMS TAP Gateway

Posted in Tech, neutron star on May 19th, 2006

Yesterday I spent the day setting up SMS paging from Nagios. In the past I had just used email-to-SMS gateways for sending notifications to my cell. Those gateways only work if the Nagios host can reach them, which means the network infrastructure in between has to be functional. Unfortunately, many of the times Nagios needs to send out notifications the network is not in a reliable state. I have experienced several instances where a major router/switch goes down and Nagios has no way to let me know. Solution: SMS TAP dial-up gateway.

Most cellular service providers provide dial up SMS TAP gateways (Some even offer toll free numbers). These gateways allow you to send SMS messages to cellular devices by using your modem. With this setup Nagios could be completely disconnected from an IP network and still be able to get notifications sent out via a modem and some paging software.

Here is a great source of TAP dial-up numbers for most providers.

http://www.notepage.net/tap-phone-numbers-c.htm

Before you can configure Nagios to use an SMS TAP gateway you need to install some software that actually makes the call and speaks TAP. I decided to use quickpage (http://www.qpage.org/) because it was small, easy to build, and easy to configure. Just download quickpage, untar, run configure followed by a make and make install. (You may also want to take a look at sendpage. http://www.sendpage.org)

Quickpage operates in a client-server manner. A daemon sits and listens for a quickpage client to connect and tell it to send a message. (The qpage binary is both the daemon and the client depending on which switches are specified.) Before you can start the qpage daemon you need to create an initial configuration file for quickpage. The configuration file sets some of the following options: which serial port your modem is on, cellular service provider definitions, and recipient pager definitions. You can place the config file in any directory as long as you use the -C switch to tell qpage where it is. I think the default place it will look is /etc/qpage.cf

Here is an example qpage.cf file:

——————————————————————–
#Administrators email
administrator=protect.the@innocent.com

#Make sure qpage can write to this directory. If you start qpage as root
#it will become the daemon account.
queuedir=/var/spool/qpage

identtimeout=5
snpptimeout=60

#Serial port your modem is on
modem=ttya
    device=/dev/ttyS0

#A service definition called default
service=default
    device=ttya
    baudrate=1200
    parity=even
    allowpid=yes
    maxtries=6
    phone=18886561727

#A service definition called cingular - This seems to work for cingular cell phones
service=cingular
    device=ttya
    baudrate=1200
    parity=even
    allowpid=yes
    maxtries=6
    phone=18668837243

#A service definition called cingularBB - Blackberries
service=cingularBB
    device=ttya
    baudrate=9600
    parity=even
    allowpid=yes
    maxtries=6
    phone=18009094602

#These are pager definitions. Obviously you should replace 5555551212
#with your own cellular number.
#The service tag associates the pager with the services defined above.
pager=eric
    pagerid=5555551212
    service=default

pager=EdCingular
    pagerid=5555551212
    service=cingular

pager=EricCingular
    pagerid=5555551212
    service=cingularBB
——————————————————————–

Save this file somewhere and then execute the following command to start quickpage:

qpage -C /usr/local/etc/qpage.cf -q 5

This will start quickpage and tell it to check the queue every five seconds. You may also want to consider adding the -d switch. This will force qpage into debug mode and is very helpful when testing new configurations.

Also note that some providers like to have a 1 in front of the area code on the pager ID. This was the case with the Cingular dial-up, i.e. 15555551212 instead of 5555551212. The key here is to play around.

Once qpage is running try to send yourself a test SMS message. Quickpage by default will attempt to connect to the qpage daemon running on localhost. This is fine because we are testing from the same machine.
Type: qpage -p eric

Where eric is the name of the pager definition in your qpage.cf. qpage will connect to the daemon and submit a message for delivery. Now watch your qpage debug output. You should see it attempting to dial out using the modem and connect to the provider. That's it!
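For the curious, the conversation between the qpage client and daemon is, as far as I know, plain SNPP (RFC 1861), which is what the snpptimeout setting in the config hints at. The sketch below is not part of quickpage; it is a hypothetical minimal client that sends the four level-1 commands (PAGE, MESS, SEND, QUIT) and treats any reply that does not start with 2 as an error. The function names are made up, and port 444 is the standard SNPP port, assumed here to be where the daemon listens.

```python
import socket

SNPP_PORT = 444  # standard SNPP port (RFC 1861); assumed to be the daemon's listen port


def snpp_commands(pager, message):
    """Return the SNPP level-1 command lines needed to send one page."""
    text = " ".join(message.split())  # collapse newlines; MESS takes a single line
    return [f"PAGE {pager}", f"MESS {text}", "SEND", "QUIT"]


def send_page(host, pager, message, port=SNPP_PORT, timeout=30):
    """Send one page; raise RuntimeError on any reply that is not a 2xx code."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        stream = sock.makefile("rwb")

        def expect_ok():
            reply = stream.readline().decode("ascii", "replace").strip()
            if not reply.startswith("2"):
                raise RuntimeError(f"SNPP error: {reply}")

        expect_ok()  # server greeting, normally "220 ..."
        for line in snpp_commands(pager, message):
            stream.write(line.encode("ascii", "replace") + b"\r\n")
            stream.flush()
            expect_ok()
```

With the daemon running locally, `send_page("localhost", "eric", "test page")` should have roughly the same effect as running the qpage client by hand.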

Because quickpage is a client-server application you can actually run qpage from any host that has IP access to the machine running the qpage daemon. Just use the -s switch and specify the hostname. When thinking about this option it’s important to remember why we are doing this in the first place…

Now configure Nagios to use qpage:

First you need to define a notification command. I have the following in my misccommands.cfg file:

# notify via sms using qpage
define command{
command_name notify-by-sms
command_line /usr/local/bin/qpage -s localhost -P $CONTACTNAME$ -f $HOSTNAME$ $SERVICEDESC$ '$SERVICEOUTPUT$' $HOSTNAME$
}

Now define a contact that uses the notify-by-sms notification command:

define contact{
contact_name EricCingular
alias Eric's Blackberry
service_notification_period 24x7
host_notification_period 24x7
service_notification_options w,u,c,r
host_notification_options d,u,r
service_notification_commands notify-by-sms
host_notification_commands notify-by-sms
email top.secret@neutronstar.com
}

Now any service or host that is set up to send notifications to EricCingular will use the notify-by-sms command, which calls qpage.
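One thing to watch with the command_line approach is shell quoting: if the plugin output contains a quote character, the command can break. A possible alternative, sketched here and not something from my actual setup, is a small wrapper script that Nagios calls with the macros as arguments and that builds the qpage argument list itself. The -s, -P and -f switches are the same ones used in the command definition; the function name and the 160-character cap are assumptions (check what your carrier's TAP gateway really accepts).

```python
QPAGE = "/usr/local/bin/qpage"  # path taken from the command definition in this post
MAX_LEN = 160                   # assumed message cap; carrier limits vary


def build_qpage_argv(contact, host, service, output, server="localhost"):
    """Build the qpage argument list for one Nagios notification.

    The message is passed as a single argument, so no shell quoting is involved.
    """
    message = f"{host} {service} {output}"
    if len(message) > MAX_LEN:
        message = message[:MAX_LEN - 3] + "..."
    return [QPAGE, "-s", server, "-P", contact, "-f", host, message]
```

The wrapper would then hand the list to `subprocess.run(argv, check=True)`, which executes qpage directly without going through a shell.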

That's it!

Ooops!

Posted in Tech, neutron star on April 28th, 2006

I was hired to be a Linux administrator. My hiring manager has seen my resume and knows that my skills go beyond just Linux/UNIX. Today he decided to take advantage of those skills!

We had an IT coordination meeting today to discuss all current projects and ongoing work. We had to determine how much of our time was spent on which tasks, etc, etc.

As of today my responsibilities have expanded and diversified, and their scope has increased.

I now support:

1. RedHat Linux Servers
2. NexVu network monitoring system
3. Nagios system and services monitoring system
4. Veritas/Symantec Netbackup for Linux systems
5. Cisco AVVID (CallManager and Unity)
6. Backup LAN/WAN Cisco Admin
7. APC InfrastruXure implementation
8. About 8 Linux developers.

WOW! That was fun. I guess they don’t intend on getting rid of me anytime soon.

The NexVu product is pretty cool. It’s a custom sniffer that runs on a modified version of Fedora. The box has six NICs that can be connected to various parts of the network on Cisco SPAN-enabled ports. It provides a Java-based GUI that you can use to monitor traffic flows at a per-application, protocol, session, host, etc. level. Some uses would include: holding WAN vendors to their SLAs, troubleshooting application network traffic, monitoring utilization, determining which applications are the bandwidth hogs, etc. The closest open source thing I can think of is a cross between NTOP and Ethereal.

64Bit Linux! Watch me run……

Posted in Tech, neutron star on April 24th, 2006

I (sadly) wiped the 64bit AMD machine today and re-installed the 32bit version of RedHat AS 4. It’s amazing how many problems I have run into trying to port various RPMs and libraries to the 64bit AMD platform.

  • Many applications still look for the 32bit version of libc and blow up/don’t compile if they can’t find it.
  • Some libraries want to install into /usr/lib64 and others into /usr/lib.
  • When compiling some apps are platform aware and look in /usr/lib64 while others are not and they look in /usr/lib
  • Some commercial libraries are not available in 64bit versions. (I don’t understand why it’s such a big deal for some to type make!)

I was able to hack my way around most of these issues by creating softlinks or screwing around with the SPEC files to get things in their proper locations, but in the end we simply had to move back to a 32bit OS because of all the various issues raised as a result of the items listed. There were just too many customizations made to keep track of.
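A quick way to see which libraries ended up in the wrong directory is to read the ELF header: the fifth byte (EI_CLASS) is 1 for 32-bit objects and 2 for 64-bit ones. The sketch below is a hypothetical helper built on that fact, not something I used at the time; the function names are made up for illustration.

```python
import os


def elf_class(path):
    """Return 32 or 64 for an ELF file, or None if the file is not ELF."""
    with open(path, "rb") as handle:
        header = handle.read(5)
    if len(header) < 5 or header[:4] != b"\x7fELF":
        return None
    # Byte 4 of the ELF header is EI_CLASS: 1 = 32-bit, 2 = 64-bit.
    return {1: 32, 2: 64}.get(header[4])


def wrong_class_files(directory, expected):
    """List ELF files in `directory` whose class differs from `expected` (32 or 64)."""
    wrong = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            found = elf_class(path)
            if found is not None and found != expected:
                wrong.append(name)
    return wrong
```

For example, `wrong_class_files("/usr/lib64", 64)` would list any 32bit objects that landed in the 64bit directory, and `wrong_class_files("/usr/lib", 32)` the reverse.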

This is rather disappointing. I think the developers would really love to be able to take advantage of the 64bit platform, but unless the open source community makes a concerted effort to make the shift to 64bit on Linux transparent, people will either not attempt to move over or end up moving back to 32bit after getting tangled in the 64bit x86 web.

Strange thing is that I don’t recall having any of these problems on other non-x86 64bit platforms. I have been running Debian/Redhat on various SPARC, MIPS, and ALPHA platforms for years. So what’s the major difference? Keeping backwards compatibility between 32bit and 64bit x86? Well, it isn’t working!

Nagios

Posted in Tech, neutron star on April 24th, 2006

I spent a fair amount of time on Friday configuring Nagios (http://www.nagios.org). I have it monitoring CPU, Memory Usage, System Load, Number of Processes, Services, Disk Usage, etc, etc, etc on both UNIX and Windows hosts. I used the NSClient program on the Windows hosts and ran into a fun little problem when trying to start the service.

First I copied all the appropriate files to c:\program files\NSClient, then I ran pNSclient.exe /install to install it as a service. Next I typed net start nsclient and got the following error:

A system error has occurred.

System error 1067 has occurred.

The process terminated unexpectedly.

The event log contained the following:

Event ID: 2 Source: NSClient
NSClient CollectData: Call to rereve counter value for failed, returning stats code 4294967295.

Event ID: 1000 Source: Application Error
Faulting application pNSClient.exe, version 2.0.1.0, faulting module unknown, version 0.0.0.0, fault address 0x008c2459

I am using Windows 2003 with SP1 installed. SP1 tightened down the security screws a bit, and it needed a bit more tweaking to allow NSClient to run as a service. The problem is related to the new Data Execution Prevention (DEP) stuff added in SP1. To fix the problem do the following:

Right click My Computer -> Choose Properties
Click on the Advanced Tab
Click the Settings button under the Performance section.
Click on the Data Execution Prevention tab
Click the Turn on DEP for all programs and services except those I select radio button
Click the Add button and browse to your copy of pNSclient.exe

That's it.

I am also using Nagios to monitor various Cisco statistics and states. I will post links to them shortly with example configs.
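In case it helps anyone writing their own checks: a Nagios plugin is just a program that prints one line of status text and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). Below is a minimal sketch of a disk usage check along those lines. It is not one of the checks I am running; the thresholds and function names are arbitrary choices for illustration.

```python
import shutil

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # standard Nagios plugin exit codes


def classify(percent_used, warn=80.0, crit=90.0):
    """Map a usage percentage to a Nagios state using warn/crit thresholds."""
    if percent_used >= crit:
        return CRITICAL
    if percent_used >= warn:
        return WARNING
    return OK


def check_disk(path="/", warn=80.0, crit=90.0):
    """Return (exit_code, status_line) for the filesystem holding `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as error:
        return UNKNOWN, f"DISK UNKNOWN - {error}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path} reports zero size"
    percent = 100.0 * usage.used / usage.total
    state = classify(percent, warn, crit)
    label = ("OK", "WARNING", "CRITICAL")[state]
    return state, f"DISK {label} - {path} is {percent:.1f}% used"
```

A real plugin script would print the status line and call `sys.exit()` with the returned code so Nagios can pick up the state.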