Nathaniel R. Stickley

Software Engineer - Astrophysicist - Data Scientist

Debian / Ubuntu System Administration Roadmap

This is intended to be a brief summary of the basic knowledge and skills needed in order to maintain the NebulOS cluster at UC Riverside, starting at a low level. More generally, this is a concise roadmap (or outline) for Ubuntu Linux system administration. In this roadmap, I will point you to resources that can be used to learn more. The purpose of this is not to explain everything in detail; it will merely get you started. Note that some of the things mentioned here are specific to Ubuntu or Debian-based operating systems, so this isn't exactly a general Linux system administration roadmap.

The Basics

You will need to have a good understanding of...

You will need to learn to solve your own problems with and without the assistance of Google. This will involve reading manual pages (man pages), info pages, help files, and system logs. For example, the command man bash shows you the manual for Bash.

Running help with no arguments lists all of the built-in commands in Bash.

To find out more about any of the builtins in that list, pass its name to the help command. For instance, help wait shows the documentation for the wait command.

You should also learn to write non-trivial scripts in Bash, with conditional blocks, loops, and functions.
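As a rough idea of the level to aim for, here is a minimal sketch that combines a function, a loop, and conditional blocks. The function name classify and the paths are only illustrative; nothing here is specific to NebulOS.

```shell
#!/usr/bin/env bash
# Classify a few paths and count how many of them are directories.

classify() {
    # Print a one-word description of the path given in $1.
    if [ -d "$1" ]; then
        echo "directory"
    elif [ -f "$1" ]; then
        echo "file"
    else
        echo "missing"
    fi
}

n_dirs=0
for path in /etc /etc/passwd /no/such/path; do
    kind=$(classify "$path")
    echo "$path: $kind"
    if [ "$kind" = "directory" ]; then
        n_dirs=$((n_dirs + 1))
    fi
done
echo "directories found: $n_dirs"
```

Once this feels natural, the same pattern scales to real tasks such as looping over hosts or log files.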

Resources for learning BASH: Book 1, Book 2

You should also understand the concept of an inode. In particular, you should know the difference between copying a file and moving a file. You should understand why moving a 5 GB file from one directory to another directory on the same disk happens almost immediately, while copying the file takes a few seconds. You should also understand the difference between soft and hard links (also known as symbolic links and ordinary links, produced by ln -s and ln).
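The inode behavior described above can be observed directly. The following is a small sketch, assuming GNU or BusyBox stat (for the -c option) and a scratch directory created with mktemp; the file names are arbitrary.

```shell
# Work in a scratch directory so that nothing important is touched.
work=$(mktemp -d)
cd "$work" || exit 1

echo "some data" > original
inode_original=$(stat -c %i original)

cp original copy          # new inode: the data itself is duplicated
ln original hardlink      # same inode: just another name for the same data
ln -s original symlink    # new inode: a tiny file that stores the path "original"

inode_copy=$(stat -c %i copy)
inode_hard=$(stat -c %i hardlink)

mv original renamed       # same filesystem: only the directory entry changes
inode_renamed=$(stat -c %i renamed)

echo "original=$inode_original copy=$inode_copy hardlink=$inode_hard renamed=$inode_renamed"

# The hard link still reaches the data after the rename; the symlink dangles.
cat hardlink
[ -e symlink ] || echo "symlink is broken: the name it points to no longer exists"
```

A move within one filesystem only rewrites a directory entry, which is why it is nearly instantaneous regardless of file size, while a copy has to write every byte again.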

Input and Output Redirection and Piping

  • Learn about Unix pipes (particularly the usage of the | operator in Bash)
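  • Learn to use >, >>, >&, >&1, 1>&2, >>&, >!, and >&!, for output redirection. (Note that >>&, >!, and >&! are csh, tcsh, and zsh forms; in Bash, the counterparts are &>> for appending both streams and >| for overriding noclobber.)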
  • Learn to use < for input redirection
  • Learn about named pipes (FIFOs, created with the mkfifo command)
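A short sketch of the most common forms, using only operators that Bash accepts. The file names are arbitrary, and the named pipe is created with mkfifo.

```shell
work=$(mktemp -d)

# Send stdout and stderr to separate files.
{ echo "normal output"; echo "error output" >&2; } > "$work/out.log" 2> "$work/err.log"

# Append to an existing file instead of overwriting it.
echo "second line" >> "$work/out.log"

# Send stderr to wherever stdout is going (redirect stdout first, then 2>&1).
{ echo "to stdout"; echo "to stderr" >&2; } > "$work/both.log" 2>&1

# Input redirection, and a pipe between two commands.
line_count=$(wc -l < "$work/out.log")
both_lines=$(sort "$work/both.log" | wc -l)

# A named pipe: the writer blocks until a reader opens the other end.
mkfifo "$work/pipe"
echo "hello through the fifo" > "$work/pipe" &
fifo_msg=$(cat "$work/pipe")
wait

echo "out.log has $line_count lines; both.log has $both_lines lines"
echo "fifo delivered: $fifo_msg"
```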

Viewing and Changing Permissions and Attributes

Learn about these:

  • chown
  • chmod
  • chattr
  • lsattr
  • ls -l
  • stat
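To get a feel for these tools, here is a small sketch that changes a file's mode and reads it back. It assumes GNU or BusyBox stat; the user and group in the commented-out lines are hypothetical, and those lines are not run because they normally require root.

```shell
work=$(mktemp -d)
f="$work/report.txt"
touch "$f"

chmod 640 "$f"                   # owner: read/write, group: read, others: nothing
mode_octal=$(stat -c %a "$f")    # numeric mode
mode_text=$(stat -c %A "$f")     # symbolic mode, as shown by ls -l

chmod u+x,g-r "$f"               # symbolic form: add owner execute, drop group read
mode_after=$(stat -c %a "$f")

ls -l "$f"
echo "before: $mode_octal ($mode_text)  after: $mode_after"

# Shown for reference only (these normally need root):
# sudo chown alice:staff "$f"    # hypothetical user and group
# sudo chattr +i "$f"            # make the file immutable
# lsattr "$f"                    # list the attributes set by chattr
```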

Installing, Updating, and Removing Software

Since we are using Ubuntu, which is a Debian derivative, become familiar with...

  • apt-get
  • apt-cache (In particular, apt-cache depends and apt-cache rdepends)
  • dpkg
  • dpkg-reconfigure
  • synaptic (Synaptic Package Manager)
  • aptitude
  • apt

Investigating dependencies:

Listing, Creating, Deleting, and Modifying Users and Groups

  • users
  • w
  • useradd, adduser
  • deluser, userdel
  • usermod
  • groups
  • groupadd, addgroup
  • groupdel, delgroup
  • groupmod

It's also nice to know how user information (including passwords) is stored.

Changing Password: passwd

Perform actions as another user (switch user): su

Obtaining Root Privileges / Becoming Root

  • su
  • sudo
  • sudo -i

Regular Expressions

Learn regular expressions (Regex) so that you can use grep, gawk, sed, and other utilities effectively:
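As a small taste of what regular expressions make possible, here is a sketch that runs grep, sed, and awk over a few log lines. The log lines, host names, and user names are made up for the example, and plain awk is used in place of gawk so that it runs on a minimal system.

```shell
# Sample log lines (invented for this example).
log='2016-03-01 node01 sshd: Accepted publickey for alice
2016-03-01 node02 sshd: Failed password for root
2016-03-02 node10 cron: job finished
2016-03-02 node02 sshd: Failed password for bob'

# grep -E: lines in which sshd reports a failed password for any user.
failed=$(printf '%s\n' "$log" | grep -E 'sshd: Failed password for [a-z]+$')

# sed -E: use a capture group to reduce each line to its two-digit node number.
node_numbers=$(printf '%s\n' "$log" | sed -E 's/^[0-9-]+ node([0-9]{2}) .*/\1/' | sort -u | tr '\n' ' ')

# awk: combine a field test with a regex and count failures per node.
per_node=$(printf '%s\n' "$log" | awk '$3 == "sshd:" && /Failed/ { n[$2]++ } END { for (k in n) print k, n[k] }')

printf '%s\n' "$failed"
echo "nodes: $node_numbers"
echo "failures: $per_node"
```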

Searching The System

  • locate (and the associated updatedb command)
  • find (a very versatile tool; there are many nice tutorials online)
  • apropos (search the names and short descriptions of all of the man pages installed on the system)
  • whereis (search for files associated with a program)
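Since find is the most versatile of these, here is a brief sketch of a few common tests, run against a throwaway directory tree. The directory layout and file names are invented for the example.

```shell
work=$(mktemp -d)
mkdir -p "$work/src" "$work/logs/old"
touch "$work/src/main.c" "$work/src/util.c" "$work/src/notes.txt"
echo "started" > "$work/logs/app.log"
: > "$work/logs/old/empty.log"

# Regular files whose names match a pattern.
c_files=$(find "$work" -type f -name '*.c' | sort)

# Empty log files (-size 0 matches files that occupy zero blocks).
empty_files=$(find "$work" -type f -size 0 -name '*.log')

# Files modified within the last day.
recent_count=$(find "$work" -type f -mtime -1 | wc -l)

printf '%s\n' "$c_files"
echo "empty log: $empty_files"
echo "recently modified files: $recent_count"
```

Tests can be combined freely, and -exec can run a command on every match, which is where most of the tool's power comes from.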

Searching a stream, a file, or a group of files:

  • command | grep [options] 'search term'
  • grep [options] 'search term' file
  • grep [options] 'search term' dir/*/filename-pattern*

Find and Examine Hard Drives / Block Devices and Their Contents

  • lsblk
  • blkid (as root)
  • parted -l (as root)
  • df
  • du

Format a Hard Drive or Partition, Edit Partitions, Fix Filesystem Problems

Since we primarily use EXT4 and Btrfs filesystems:

  • mkfs.btrfs
  • mkfs.ext4
  • parted
  • btrfs-convert
  • btrfs [sub-command]
  • fsck
  • btrfs restore -iv /dev/(disk) /recovered/data/
  • lvm (and the many tools associated with lvm)

Mounting and Unmounting Drives & Partitions

  • mount
  • umount

You may need to run lsof before unmounting to see if any programs are using the mounted device. You should definitely learn about the many mount options available.

Also learn about the format of /etc/fstab.

A Few Important Configuration & Info Files

  • /etc/grub.d/*
  • /etc/fstab
  • /etc/mtab
  • /etc/hosts
  • /etc/hostname
  • /etc/network/interfaces
  • /etc/sysctl.conf
  • /etc/sudoers (only edit with visudo)
  • /etc/sudoers.d/*
  • /etc/passwd
  • /etc/shadow
  • /etc/group
  • /etc/gshadow
  • /etc/crontab
  • /etc/os-release
  • /etc/bash.bashrc
  • /etc/init/*
  • /etc/init.d/*
  • /home/username/.bashrc
  • /etc/modules

Some Special Files (Devices) and Directories

  • /dev/null
  • /dev/random
  • /dev/urandom
  • /dev/zero
  • /run/shm (a RAM disk)

System Logs

Most logs are stored in /var/log/. The following are particularly useful when diagnosing problems:

  • /var/log/dmesg
  • /var/log/syslog
  • /var/log/boot.log
  • /var/log/auth.log
  • /var/log/dpkg.log

Managing Processes

  • top
  • htop
  • pgrep
  • ps
  • pkill
  • kill
  • killall

Child Process Management / Multiprocessing in Bash

To spawn a new process in the background, place an ampersand after the command:
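A minimal sketch of the idea, assuming two short-lived subshells as stand-ins for real work. The special variable $! holds the PID of the most recent background job, and wait returns that job's exit status.

```shell
# Start two background jobs.
( sleep 1; exit 0 ) &
pid_ok=$!
( sleep 1; exit 3 ) &
pid_fail=$!

echo "started $pid_ok and $pid_fail; the shell is free to do other work"

# wait PID blocks until that job ends and returns its exit status.
status_ok=0
wait "$pid_ok" || status_ok=$?
status_fail=0
wait "$pid_fail" || status_fail=$?

echo "exit statuses: $status_ok and $status_fail"
```

Both jobs run concurrently, so the whole sketch takes about one second rather than two.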

Also refer to documentation for (i.e., learn about) the following:

  • fg
  • bg
  • wait
  • nohup
  • disown

(You should have learned about these when you learned about advanced Bash scripting)

Configuring the Bootloader

Since Ubuntu uses GRUB (The Grand Unified Bootloader), you should know a little bit about GRUB. You can learn from some Google searches and from the info page for grub-mkconfig, which is a command that you may need to use to regenerate your GRUB configuration. Also look at the files in /etc/grub.d/

Note: in practice, it’s easier to use update-grub, which calls grub-mkconfig.

Scheduling Tasks

Be aware of cron, at, and associated tools. You may need to add, remove, or modify cron jobs, for instance.

  • crontab
  • cron
  • at
  • atq
  • atrm
  • batch

System Init

Some versions of Ubuntu use Upstart for system initialization, while the latest versions use Systemd. In either case, you can begin learning about the init system using:

You can learn more by reading tutorials online.

In any case, the restart and shut-down commands are:

and

and you can find initialization settings here:

  • /etc/init/*
  • /etc/init.d/*

SSH

Become very familiar with secure shell (SSH), and be aware that an HPN-patched version of OpenSSH is installed (https://www.psc.edu/index.php/hpn-ssh). SSH has many features; try to at least be aware of what most of them are. The tools ssh-keygen and ssh-copy-id are also quite important.

When using HPN-SSH, it is recommended that arcfour encryption be used and that TCP Receive Buffer Polling is set to enabled. To do this, use the following:

When the connection is slow and/or the data being transferred is highly compressible, it is beneficial to enable compression with the -C flag:
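One way to express the two recommendations above is as small wrapper functions. This is only a sketch: the function names are made up, TcpRcvBufPoll is an option provided by the HPN patch set (stock OpenSSH will reject it), and the arcfour cipher must still be offered by both the client and the server.

```shell
# Hypothetical wrappers; the function names are invented for this example.

# HPN-SSH with the arcfour cipher and TCP receive buffer polling enabled.
hpn_ssh() {
    ssh -c arcfour -o TcpRcvBufPoll=yes "$@"
}

# The same, with compression for slow links or highly compressible data.
hpn_ssh_compressed() {
    hpn_ssh -C "$@"
}

# Usage (user and host are placeholders):
#   hpn_ssh alice@node01
#   hpn_ssh_compressed alice@node01 'cat /var/log/syslog' > syslog.copy
```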

Byobu / Screen

When working remotely, you should use Byobu or GNU Screen to simplify things and also to prevent your session from dying if your network connection is interrupted. Byobu is a layer on top of GNU Screen, which makes Screen easier to use (http://byobu.co/).

Rsync

Become familiar with the power of rsync. Note that, if you ever need to use rsync to copy the contents of an HDFS DataNode’s block data, you need to copy everything including hard links, using rsync’s -H or --hard-links flags. Also be aware of the -e flag, which allows you to use rsync over SSH:

Copying data from one drive to another, preserving hard links and extended attributes
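The following sketch shows the effect of -H on a scratch directory; the block file names are invented, and the copy is skipped if rsync is not installed. For a real drive-to-drive copy, -A (ACLs) and -X (extended attributes) can be added when both filesystems support them.

```shell
src=$(mktemp -d)
dst=$(mktemp -d)
echo "block data" > "$src/blk_1"
ln "$src/blk_1" "$src/blk_1.link"     # two names, one inode

links_after=""
if command -v rsync >/dev/null 2>&1; then
    # -a: archive mode (permissions, times, symlinks, ...); -H: keep hard links.
    rsync -aH "$src"/ "$dst"/
    links_after=$(stat -c %h "$dst/blk_1")   # link count of the copied file
    echo "link count at destination: $links_after"
else
    echo "rsync is not installed; skipping the copy"
fi

# Over SSH, -e selects the remote shell (user, host, and path are placeholders):
#   rsync -aH -e ssh "$src"/ alice@node01:/data/backup/
```

Without -H, the two names would arrive as two independent files, doubling the space used at the destination.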

Network Configuration

Become familiar with the file /etc/network/interfaces and the utilities for managing network interfaces:

  • ifconfig
  • ifup
  • ifdown
  • ifquery

Become familiar with the configuration tools for network traffic filtering / routing:

  • iptables
  • iptables-save
  • iptables-restore
  • ip
  • route

An iptables tutorial, with specific info about Network Address Translation (NAT)

How to set up a gateway: Connection Sharing

DHCP

Setting up a DHCP server, using dnsmasq: http://blogging.dragon.org.uk/

Domain Name System (DNS) Nameserver configuration

Be familiar with DNS concepts and BIND9. Here’s a little tutorial.

NFS


Learn how to set up an NFS server and how to mount an NFS directory.

Network monitoring / network exploration

  • nethogs
  • netstat (Useful: sudo netstat -anlp | grep -w "192.168.0.1")
  • lsof -i
  • iftop
  • iptraf
  • nmap

A more comprehensive list: http://www.binarytides.com/linux-commands-monitor-network/

Also interesting: http://cacti.net/index.php and http://www.ntop.org/

Other Monitoring / info-gathering tools

  • iotop
  • iostat
  • lsof
  • memstat
  • free

Additionally, search the /usr/bin/ directory for all programs ending in 'stat'

and search the repositories for programs ending in 'stat'
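Both searches can be sketched as follows. The apt-cache part only runs where apt-cache exists, and it assumes the package lists have been downloaded; otherwise it simply prints nothing.

```shell
# Programs in /usr/bin whose names end in "stat".
stat_tools=$(find /usr/bin -maxdepth 1 -name '*stat' | sort)
printf '%s\n' "$stat_tools"

# Packages in the repositories whose names end in "stat".
# apt-cache search takes a regular expression; --names-only ignores descriptions.
if command -v apt-cache >/dev/null 2>&1; then
    apt-cache search --names-only 'stat$' || true
fi
```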

Exploring the hardware

  • lshw
  • dmidecode
  • lscpu
  • lsusb
  • lspci
  • sensors (from the lm-sensors package)

To install and use the sensors tool:

Benchmarking

  • phoronix-test-suite (comprehensive, but bulky and time-consuming)
  • sysbench (multiple benchmarks)
  • mbw (for testing memory bandwidth)
  • fio (Flexible I/O tester)
  • ping (network latency)
  • iperf (for testing network throughput)

For iperf, run iperf -s on the server, and on the client run iperf -c followed by the server's hostname or IP address.

Stress-testing / Burn-in

  • stress
  • badblocks

For example, to do a burn-in on the hard drive /dev/sdg:

Then check:

where smartctl comes from the S.M.A.R.T. tools (smartmontools)

Automation / Orchestration with Ansible

Ansible allows you to efficiently manage large groups of machines. You can modify system settings, copy data among machines, and install, remove, update, and configure software with very little effort.

I use Ansible’s ad hoc commands very frequently: http://docs.ansible.com/ansible/intro_adhoc.html

The real power comes in using playbooks: http://docs.ansible.com/ansible/playbooks.html

Compiling and Inspecting Libraries

To list functions / symbols in static libraries:

To list symbols in shared libraries:

To list symbols in objects:

To create a shared library (.so shared object file), which has position-independent code:
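The four tasks above can be tried out on a tiny library. This is a sketch under the assumption that gcc and GNU binutils (nm, ar) are installed; the source file and the function hello_answer are invented for the example, and the build is skipped if the tools are missing.

```shell
work=$(mktemp -d)
cd "$work" || exit 1

# A one-function C source file (hypothetical example code).
cat > hello.c <<'EOF'
int hello_answer(void) { return 42; }
EOF

symbols=""
if command -v gcc >/dev/null 2>&1 && command -v nm >/dev/null 2>&1 && command -v ar >/dev/null 2>&1; then
    gcc -c -fPIC hello.c -o hello.o        # position-independent object file
    gcc -shared -o libhello.so hello.o     # shared library (.so)
    ar rcs libhello.a hello.o              # static library (.a)

    nm hello.o                             # symbols in an object file
    nm libhello.a                          # symbols in a static library
    symbols=$(nm -D --defined-only libhello.so)   # dynamic symbols in a shared library
    printf '%s\n' "$symbols"
else
    echo "gcc or binutils not installed; skipping the build"
fi
```

The -D flag matters for shared libraries because stripped .so files keep only their dynamic symbol table.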

To find what libraries a program uses, run ldd on its executable. For example, ldd $(which rsync) shows the libraries used by rsync, because $(which rsync) expands to "/usr/bin/rsync".

The ltrace utility is also quite useful. It is similar to strace, but it reports library calls instead of system calls.

HDFS

You can learn to manage HDFS by reading the huge amount of Hadoop documentation online as well as by reading through the HDFS source. The following commands were not very well-documented when I started using HDFS.

To check the entire filesystem:

To check the files in a specific sub-directory or a specific file and show info about the blocks:

Move corrupt files to lost+found:

To re-set the number of replications to 3 manually:
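The four operations above correspond to the fsck and setrep sub-commands of the hdfs tool. The sketch below wraps them in helper functions so that nothing runs until you call one; the function names are made up, and it assumes the hdfs command from a Hadoop 2.x or later installation is on the PATH (older installations used hadoop fsck and hadoop fs instead).

```shell
# Hypothetical helper functions; the names are invented for this example.

hdfs_check_all() {             # check the entire filesystem
    hdfs fsck /
}

hdfs_check_path() {            # check one path and show info about its files and blocks
    hdfs fsck "$1" -files -blocks
}

hdfs_move_corrupt() {          # move corrupt files to /lost+found
    hdfs fsck / -move
}

hdfs_set_replication_3() {     # set the replication factor to 3 and wait until done
    hdfs dfs -setrep -w 3 "$1"
}

# Usage (the path is a placeholder):
#   hdfs_check_path /user/alice/data
#   hdfs_set_replication_3 /user/alice/data
```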

Mesos

For Mesos configuration information, refer to http://mesos.apache.org/ (web searches will also lead to nice documentation from the company Mesosphere).

Deb Packages

Debian Packages (.deb files) are somewhat less interesting than they used to be, now that Snappy is available, with its associated snap packages, but here are a few basic things that may come in handy:

  • auto-apt
  • checkinstall
  • debc
  • dpkg-deb
  • dpkg-depcheck
  • debdiff
  • lintian (check that the deb conforms to standards)
  • debchange (modify changelog)

To extract the contents of a .deb file:

After editing the contents, you can re-build the archive by doing:

To identify dependencies, use

Kernel Modules

To list, load, unload kernel modules, and investigate modules:

  • lsmod
  • insmod
  • modprobe
  • rmmod
  • depmod
  • modinfo

The kmod program handles all of these tasks:

There are also tutorials online for building a custom kernel from source and for writing your own kernel modules.

Useful Books

Advanced Linux Programming

Introduction to the Command Line Second Edition

CompTIA Linux+ Complete Study Guide

Beej's Guide to Network Programming Using Internet Sockets

The Linux Kernel Module Programming Guide

Finally, a book about an extra-super-cool tool that might come in handy some day:

The ZeroMQ Guide
