Sysadmin and Documentation

Documentation is very important. I started a new SysAdmin gig a couple of months ago and the people here did a good job of documentation. A lot is documented about the systems themselves and what sort of maintenance contracts we have and that sort of thing. All this is good stuff.

What is not documented, though, are the relationships and dependencies between the various sites at this company (at least on the Unix side of the house). The sites are spread out all over the place: Canada, India, Texas, Louisiana, D.C.

The problem comes in because the administration for DNS and Sendmail was done without documentation.

Then the time came to upgrade DNS. Management got wind of the issue and decided that it was a matter of some urgency. Never mind that their main DNS and mail server was running an unpatched copy of Solaris with the RPC portmapper open to the world; this problem needed to be fixed now.

The first time through, I discovered that they were depending on internal MX records in DNS to do mail routing. Uh… wrong! So, I prepared to take out the internal MX records. However, this meant that I had to change the sendmail configuration. Since they were running an old, unpatched copy of that, I decided to upgrade sendmail as well. I set up a mailertable and tried to get all the internal MX records into it. In the process, I discovered some relatively unknown machines running SMTP. You’d think they’d want to get rid of them if no one knew about them, eh? But no, the political climate (and some special people) guaranteed that they would stay.
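
The move from internal MX records to a mailertable can be sketched roughly as follows. This is a minimal illustration, not the script I actually used: the domains and hosts are made up, and it assumes the MX records have already been pulled out of the zone files as (domain, preference, exchanger) tuples. Each output line uses the usual mailertable form of a domain, a tab, and mailer:host, with square brackets around the host so that sendmail skips the MX lookup for it.

```python
def mx_to_mailertable(mx_records):
    """Turn (domain, preference, exchanger) tuples into mailertable lines.

    For each domain, the exchanger with the lowest preference value wins.
    The square brackets tell sendmail not to do an MX lookup on the host.
    """
    best = {}
    for domain, preference, exchanger in mx_records:
        # Normalize: drop the trailing dot used in zone files, lowercase.
        domain = domain.rstrip(".").lower()
        exchanger = exchanger.rstrip(".").lower()
        if domain not in best or preference < best[domain][0]:
            best[domain] = (preference, exchanger)
    return [
        f"{domain}\tesmtp:[{host}]"
        for domain, (_, host) in sorted(best.items())
    ]


# Hypothetical internal records, for illustration only.
records = [
    ("eng.example.com.", 10, "mail1.eng.example.com."),
    ("eng.example.com.", 20, "backup.example.com."),
    ("sales.example.com.", 5, "mail.sales.example.com."),
]
lines = mx_to_mailertable(records)
print("\n".join(lines))
```

The resulting text file still has to be compiled into a map (typically with makemap) before sendmail will use it, and every line deserves a manual look before the MX records are removed.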

I was able to clean up DNS a bit as a result of this upgrade. I had to; the new BIND was far more sensitive to configuration problems than the older one.

After extensive testing, I put the changes in place. It took longer than expected — things always do — but it got done.

Oops! There was no checklist of things to make sure that everything was done right (and this was a rush project, so there was no time to create one), so 6000 users lost their mail for about 12 hours.

Of course, a bigger deal was made of it than was necessary. It was a big deal, but really, no one believed the specter of lost sales of a nuclear power plant because email was down.

Finally, though, all the problems were fixed. What were the lessons I learned?

  • Document everything. For your sake and the sake of the person who comes after you. Especially document dependencies. People shouldn’t be able to claim grief if you had no way of knowing about it. If it isn’t documented, it doesn’t exist.
  • Make sure you have management’s support. You’ll need these guys saying “I gave him the go-ahead” if something goes wrong.
  • Try to get as much information about the changes as you can. Test the information you have. Test it again.
  • Get someone else to review what you are doing if you can. You might miss something.
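
The checklist I did not have could have been as simple as a script. Here is a minimal sketch of the idea, assuming each item can be expressed as a function that returns True on success. The names run_checklist and port_open and the checklist items are hypothetical, and the checks are stubbed so the example runs anywhere.

```python
import socket


def port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run_checklist(checks):
    """Run each (description, check) pair; return descriptions that failed.

    A check is any zero-argument callable returning a truthy value on
    success. An exception counts as a failure, so one broken check
    cannot hide the results of the others.
    """
    failures = []
    for description, check in checks:
        try:
            ok = bool(check())
        except Exception:
            ok = False
        print(f"[{'ok' if ok else 'FAIL'}] {description}")
        if not ok:
            failures.append(description)
    return failures


# Stubbed checks for illustration. A real entry might look like:
#   ("SMTP answers on the mail hub",
#    lambda: port_open("mailhub.example.com", 25))
checks = [
    ("new named.conf loads", lambda: True),
    ("mailertable rebuilt", lambda: True),
    ("test message delivered to each site", lambda: False),
]
failed = run_checklist(checks)
```

The point is less the code than the habit: every item gets written down before the change, and nothing is declared done until the list of failures is empty.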

Thoughts about Package Formats

“Never do anything twice.” That should be the mantra of all system administrators. Whatever you do, automate it. Whether it is installing a piece of software or the steps that you have to go through to add a user, automate it.

On Unix, various tools exist to help. When it comes to managing the configuration of multiple machines, CFengine stands out. Its meta-language allows you to specify actions to take on various classes of machines. You can create classes (e.g. web-servers) and have packages installed on those machines and various configuration tweaks made. It really saves time because it helps you document all that you’ve done to set up a package or service.

For compiling software and installing binaries, there are different methods available depending on your operating system. Solaris has a package system, but it installs software in funny places (usually something like /opt/packagename) and there is no obvious, easy way to make your own packages.

Red Hat Linux uses RPMs, which at least learned from the fallacy of putting every package in its own special hierarchy, but they didn’t make producing RPMs very easy. They use some sort of meta scripting language that is yet another thing to learn. At least the packaging system is free, though.

The various free BSDs (FreeBSD, NetBSD, and OpenBSD) have what they call a ports system. This is based on Makefiles, so you don’t have to learn anything new if you already know about Makefiles, but it can be difficult to get everything right. For example, after you’ve gotten the whole thing to compile and install correctly, you may have to go through a few iterations of the install to produce a binary package so that you can ensure that you have all the files included and all the extra steps taken care of.

The best system I’ve seen yet is Debian’s. They use Makefiles plus some scripts to simplify things. The scripts can take care of 90% of the work in most cases, and in the few cases where they don’t, they still greatly simplify things. When it comes to installation, they have fakeroot, which ensures that they’ve captured all the files that are installed. Another good thing about Debian is Apt. Apt will take a Debian package, grab all dependencies from the Internet (optionally compiling them) and install everything. It can be done totally non-interactively, which is a huge benefit.

The driving force behind good system administration, as with good programming, is laziness. As the SysAdmin, you have great power available to you to automate a lot of what you do. Use it.
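
To make the idea of machine classes from the CFengine discussion concrete, here is a minimal sketch in Python. It is not CFengine syntax and not how CFengine is implemented; the host names, class names, and package names are all made up. It only shows the principle: describe what each class of machine should have, compare that with what is there, and act on the difference.

```python
def packages_for(host_classes, class_packages):
    """Return the sorted list of packages a host needs, given its classes."""
    wanted = set()
    for cls in host_classes:
        wanted.update(class_packages.get(cls, ()))
    return sorted(wanted)


def plan(hosts, class_packages, installed):
    """Return {host: [missing packages]}, omitting hosts that are up to date."""
    todo = {}
    for host, classes in hosts.items():
        have = installed.get(host, set())
        missing = [p for p in packages_for(classes, class_packages) if p not in have]
        if missing:
            todo[host] = missing
    return todo


# Hypothetical inventory, for illustration only.
hosts = {
    "www1": ["any", "web-servers"],
    "mx1": ["any", "mail-servers"],
}
class_packages = {
    "any": ["openssh"],
    "web-servers": ["apache"],
    "mail-servers": ["sendmail"],
}
installed = {
    "www1": {"openssh", "apache"},
    "mx1": {"openssh"},
}
todo = plan(hosts, class_packages, installed)
print(todo)
```

Because the plan is computed from the difference between the desired and the actual state, running it a second time after the packages are installed produces nothing to do, and the class definitions double as documentation of how each kind of machine is set up.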

Training seems to be a big thing. Everybody knows this, so we’ve got some problems with scammers. So, if you are going to try to start your career in IT, how do you know that you aren’t being taken advantage of? The first way, of course, is reputation. Does the school or training center have a good reputation? Most accredited colleges and universities now offer courses and certification training and, while these may be more expensive than other options, they would be my first choice.

It is still important to keep in mind what the return is for any training or education that you pursue. Don’t think that a four-year degree guarantees you a job. It doesn’t. Just the other day, I shared the bus with a recent Electrical Engineering graduate who was working in the mail room of a local company because he couldn’t find a job. If you plan to go to college, take advantage of any internships you can. If you go the certification and training route, don’t expect that your training or cert alone is going to get you more than a helpdesk job.

Still, it is hard to tell. We still have a labor shortage in IT, so some companies may be willing to provide on-the-job training for someone who only has a piece of paper.

Some people have asked me what it takes to break into system administration. When it comes to education, I try to make the point that it only gets you so far: it gets your foot in the door. To gain the skills that are needed to get the job done, you really have to sit down and acquire them yourself. Sure, you can go off to training and get the official scoop, but if that is your primary way of learning skills, you’ve drastically limited yourself.

So, in talking with a co-worker (who is an NT administrator) about this, we agreed that probably the easiest entry-level position to obtain is a help-desk job. Of course, you’ll want to get in with a company that is going to help you out and provide training, but good luck actually getting training, especially since the economy seems to be slowing down a bit.

How did I break into system administration? During college, I worked part time for Dow Jones doing software testing. I didn’t realize it at the time, but that part-time job was an invaluable aid to my career development. Everyone looks at prior experience, even if you are fresh out of college. My school was particularly poor in that regard; they didn’t really push internships or other work opportunities. I don’t blame them too much since they were a commuter school, but I’ve met other people who’ve gone there for a degree who were really hurt by this.

Anyway, because of that part-time job, I was able to get a job at Motorola in Austin as the software tester for an internal tools development group. A year later, fed up with the tedium of software testing and with a real desire to move back to New Orleans, I started working at Tulane University as the Unix SysAdmin for the EECS department. Again, there were questions of experience. I was asked what my experience with Solaris was. I was asked some technical questions about compiling software. Because they used Solaris at the school and at Motorola, I had a modicum of experience on the OS. Because I was desperate to move back to New Orleans and get out of Software Testing, I was willing to take the relatively low salary. My boss later told me that I seemed genuinely enthusiastic and this was another point in my favor.

You’ll note that training really didn’t enter the picture. No one asked about Certifications. And, although they did want a college degree, that was just a marker, something to check off and not really a qualifying attribute.

Through the mentorship there (my boss was a professor who studied operating systems and security), I gained a raft of valuable skills that led to much more lucrative jobs later. Without the on-the-job training I garnered at Tulane, though, none of that would have been possible.

Today, I’ve been working on getting ProFTPd to play nice with OpenLDAP for authentication. There are a couple of options here. I can use the mod_ldap contribution that comes with ProFTPd, or I can install the PAM module for Solaris.

Right now, I’m leaning towards the PAM module. This would enable me to do authentication on all UNIX accounts using LDAP — and I could admin the accounts for all Unix boxes and FTP from one place. If a user changed their password on one box, it would change everywhere.

The Novell Admin here showed me that Novell has exposed an LDAP interface for their NDS, and that would be the ideal thing: let everyone use their Novell passwords to log in.

It isn’t that easy, though. The FTP servers have to allow some people to log in who are not local.

So, right now, I’m thinking of putting an LDAP server on each box and having them all replicate from a single one.