//{{{ Copyright (c) 2008, SUSE LINUX Products GmbH
//
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are met:
//
// Redistributions of source code must retain the above copyright notice, this
// list of conditions and the following disclaimer.
//
// Redistributions in binary form must reproduce the above copyright notice,
// this list of conditions and the following disclaimer in the documentation
// and/or other materials provided with the distribution.
//
// Neither the name of the Novell nor the names of its contributors may be used
// to endorse or promote products derived from this software without specific
// prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
// INTERRUPTION) HOWEVER CAUSED AND ONANY THEORY OF LIABILITY, WHETHER IN
// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
// POSSIBILITY OF SUCH DAMAGE.
//}}}
kdump on SLES 11 -- Implementation Guide
========================================
Bernhard Walle <bwalle@suse.de>
2008-07-01

//{{{ Overview -----------------------------------------------------------------

Overview
--------

//{{{ Scope of this Document ---------------------------------------------------

Scope of this Document
~~~~~~~~~~~~~~~~~~~~~~

At first: This document is about kdump configuration. It does not even mention
how kdump is implemented in the kernel and how +kexec-tools+ work -- because
we didn't modify the behaviour, we just use what we have mainline (and improve
mainline).

To understand that document, the user should be familiar how kdump works and
have read the “kdump on SLES 11 -- User's Guide” document. For a general kdump
overview, see http://en.opensuse.org/Kdump[http://en.opensuse.org/Kdump], for
example.

//}}}
//{{{ Why an own Solution? -----------------------------------------------------

Why an own Solution?
~~~~~~~~~~~~~~~~~~~~

So, why not just taking mainline for configuration? Because there is no
mainline! The only other distribution (as far as the author knows) that provides
an automatic kdump configuration is Fedora/RHEL. However, the SUSE
implementation must integrate into the distribution, meaning:

* To generate the kdump initrd, we use our own +mkinitrd+ mechanism which is
  extensible.
* For configuration, we have a YaST module.
* The configuration is stored in +/etc/sysconfig/kdump+ and uses the normal
  mechanism to edit.
* The init script must be “compatible” with the other init scripts in the
  distribution (although LSB specifies most stuff, so that should be not the
  problem when sharing init scripts).

Having that said, I think that this package contains some stuff that would be
useful for other distributions, too. I'm not opposing sharing that stuff with
other distribution, and I would like to put the stuff that is different across
the distributions inside my tree so that we finally have one “kdump” mainline
configuration package.

//}}}
//{{{ The „kdump” Package ------------------------------------------------------

The “kdump” Package
~~~~~~~~~~~~~~~~~~~

This is the kdump solution we use for SLES 11 (and later). It's not a complete
rewrite, it's only a major redesign from the experience we made with older
kdump solutions (basically the SLES 10 SP2 kdump solution).

The major change is: The kdump is copied now always by a special initrd and
the system never boots to the normal system in the kdump case. That has the
advantage that the memory requirements are smaller, and it's more robust.

The package “kdump” does following tasks:

* loading the kdump kernel via an init script,
* providing the software that are included in the initrd that copies the dump
  from +/proc/vmcore+ to the configured destination.
* deleting old dumps on the disk (when configured) and
* letting the keyboard LEDs blink while the dump takes place (that's where VGA
  does not work so that the user gets at least a bit feedback).

//}}}
//}}}
//{{{ Requirements -------------------------------------------------------------

Requirements
------------

//{{{ Loading the kdump Kernel -------------------------------------------------

Loading the kdump Kernel
~~~~~~~~~~~~~~~~~~~~~~~~

While the scripts should choose the “right” kernel on the system which can be
used for kdump, the user should still be able to overwrite this and use an *own
kernel*. We use a fixed naming scheme for the initrd that is used for dumping:
+/boot/initrd-_kernelver_-kdump+, so it's not necessary to implement a different
configuration option to choose the initrd.

To check if a kernel is usable as dump kernel, it's necessary to check if it's
_relocatable_. This can be done via reading the x86 boot header (for bzImage) or
via reading the ELF type.

The user must also be able to configure the *command line* which is used to boot
the kdump kernel. Because it's hard for the user to specify a complete command
line himself (he forgets for example to add +irqpoll+ which is necessary for
kdump to work in some cases or +maxcpus=1+), the user has the possibility to
*replace* the command line (for advanced users) or to only add *additional*
options.

Loading is done via the init script +/etc/init.d/kdump+. That script reads
+/etc/sysconfig/kdump+ to see if the user has chosen a special kernel and to
read the command line, and then loads the kernel with +kexec -p+.


//}}}
//{{{ Copying the Dump ---------------------------------------------------------

Copying the Dump
~~~~~~~~~~~~~~~~

The most important task after the kernel has been crashed is copying the dump
from +/proc/vmcore+ to the specified target.

Targets
^^^^^^^

We support following targets to copy the dump:

- Local disk (as file),
- FTP,
- SFTP (SSH2),
- NFS (v3),
- CIFS (Windows share).

While NFS and CIFS can be handled via mounting, FTP and SFTP must be handled in
the copy process (via external binary or a library). We need both network access
and disk access in the initrd. For this, we need the disk drivers and network
drivers and network configuration. This is only a configuration problem since
everything is already implemented in +dracut+ to provide network boot (NFS
root) facility.

Filtering
^^^^^^^^^

The +makedumpfile+ tool provides a facility to filter the dump. It can read the
dump from +/proc/vmcore+ and has two modes to write the dump:

- a flattened mode which can be used if the target cannot +lseek()+ (SFTP/FTP)
  which writes a special format to +stdout+. One must run a special Perl script
  (or makedumpfile itself) on the target to get back a normal dump,
- the direct disk mode.


Sparse files
^^^^^^^^^^^^

When +makedumpfile+ is not used for dump filtering, it makes sense to write
so-called _sparse files_ because in almost any case, the dump contains blocks
only of zeroes. If +cp+ is used, there's the +cp --sparse=always+ option.


Scripts
^^^^^^^

The user should be able to add custom scripts:

- before taking the dump (“pre script”),
- after taking the dump (“post script”).

Notification
^^^^^^^^^^^^

Since it's quite important that a machine crashes, there should be a possibility
to send an email notification without writing a custom script. To send an email,
a simple SMTP implementation can be used. A host name must be provided. SMTP
authentication will not be implemented in the first place.

Copying the kernel
^^^^^^^^^^^^^^^^^^

It's quite handy to have not only the dump in the dump directory but also the
kernel and the System.map file (and the related debugging information). Then
that directory can be sent as tarball to the support organization (Novell or any
partner). So, after copying the dump, the kernel and the rest should be added
optionally to that directory.

It's necessary to mount the root file system to get the kernel image. It's also
necessary to read the +.gnu_debuglink+ section to find out where the debug
information is located. That must be copied, too.

//}}}
//{{{ Deleting old Kernels -----------------------------------------------------
Deleting old Kernels
~~~~~~~~~~~~~~~~~~~~

If automatic dump is configured, there's the danger to clutter the whole file
system with old dumps. Therefore, we need an option where the user can set the
*number of old kernel dumps* to keep on disk.

Also, there should be a configuration option to configure the disk space which
must stay be free.

Both options make only sense for the local file system. For remote file systems,
one can run a cron job to make sure that old dumps are deleted. It is not good
if every dumping hosts delete the dumps on a “random” basis.

//}}}
//{{{ Blinking Keyboard LEDs ---------------------------------------------------
Blinking Keyboard LEDs
~~~~~~~~~~~~~~~~~~~~~~

While kdump is mostly used on servers, it makes also sense to use it on
workstations when debugging a specific issue. While you have mostly serial
console on servers and therefore see the kdump progress (serial console always
works), on workstation VGA is mostly broken when kdump is used because the X
driver occupies the graphics device and the kernel does not any attempt to reset
the VGA card. Normally, the firmware provides the reset.

However, since mainline is working on _kernel-based mode setting_ which could
also address that issue, it does not make sense to try to include some “hacks”
here. We lived now years with the problem, and I think we can live another year
with the problem.

However, to provide some feedback to the user beside of a blinking HDD LED
(which not any machine might have), the initrd includes a program which blinks
the keyboard LEDs while taking the dump. The blink frequency should be different
from a kernel panic so that the user does not just press the power button for 5
seconds. :-)

//}}}
//}}}
//{{{ Implementation -----------------------------------------------------------

Implementation
--------------

//{{{ Programming Language -----------------------------------------------------
Programming Language
~~~~~~~~~~~~~~~~~~~~

The full dump process will take place in initrd. This means for the scripts and
programs that are executed:

. No huge dependencies!
. No sophisticated scripting languages. Only Bash is acceptable, no Perl
  interpreter, no Python.

Because in the author's opinion, reliable and maintainable software in Bash only
is a nightmare, and more sophisticated scripting languages like Python are a
no-go for the initrd, one multi-call binary +kdumptool+ is implemented in C++
(object-oriented).

That binary copies that dump and also provides some helper facility.

Build System
^^^^^^^^^^^^

For the build system, +cmake+ is used.

//}}}
//{{{ Configuration ------------------------------------------------------------

Configuration
~~~~~~~~~~~~~

Reading Configuration
^^^^^^^^^^^^^^^^^^^^^

The whole configuration is provided in +/etc/sysconfig/kdump+. Because the user
must be able to use shell constructs in that file, following code can be used to
get the contents of the file in the C++ program:

-----------------------------------------------------
bash -c '
    source /etc/sysconfig/kdump
    echo "KDUMP_SAVEDIR=$KDUMP_SAVEDIR"
'
-----------------------------------------------------

Configuration Variables
^^^^^^^^^^^^^^^^^^^^^^^

This should give the reader an overview. The full documentation of the
configuration options is in the “User's Guide”.

Current
+++++++

Following configuration variables are used by kdump:

+KDUMP_KERNELVER+::
    Kernel version used for the kdump kernel. An empty value (the default) is
    used to auto-detect the kernel.

+KDUMP_COMMANDLINE+::
    Full kernel command line for the kdump kernel. Overwrites the default.
    Should be used with care.

+KDUMP_COMMANDLINE_APPEND+::
    Additional kernel command line options for the kdump kernel. That does not
    overwrite default kernel options (like +root+ or +irqpoll+).

+KEXEC_OPTIONS+::
    Additional options for kexec. Normally not needed to change that.

+KDUMP_IMMEDIATE_REBOOT+::
    Should we reboot after dumping?

+KDUMP_TRANSFER+::
    Use a custom script to copy the dump. Please make sure to include all needed
    binaries (and scripts) *including* this script itself to
    +KDUMP_REQUIRED_PROGRAMS+.

+KDUMP_SAVEDIR+::
    URL where the dump should be saved.

+KDUMP_KEEP_OLD_DUMPS+::
    Number of old dumps that should be kept. That's used if +KDUMP_SAVEDIR+
    points to local storage, i.e. a +file://+ URL.

+KDUMP_FREE_DISK_SIZE+::
    Disk size in MiB that should remain free. If (after saving the dump) less
    than +KDUMP_FREE_DISK_SIZE+ bytes would be free, the dump is *not* copied.
    No old dumps are deleted with that variable. The same limitations that are
    valid for +KDUMP_FREE_DISK_SIZE+ are also valid for that configuration
    option.

+KDUMP_VERBOSE+::
    Bitmask about the verbosity of kdump.

+KDUMP_DUMPLEVEL+::
    Dump level for +makedumpfile+. See _makedumpfile(8)_ for documentation.

+KDUMP_DUMPFORMAT+::
    Dump format: +ELF+ or +compressed+.

+KDUMP_CONTINUE_ON_ERROR+::
    If set to +true+ (default), then the return value of any action is checked
    and logged, but the next step is executed even if the previous failed.
    If set to +false+, then the process stops and the user can try to fixup
    things. That's mostly for debugging, for production one should always use
    +KDUMP_CONTINUE_ON_ERROR=true+.

Outdated
++++++++

Configuration variables that don't have any meaning any more (i.e., that were
removed from SLES 10 to SLES 11):

+KDUMP_RUNLEVEL+::
  Since the initrd is used to dump and we don't boot in a full environment, this
  variable does not make sense any more.

+KDUMP_DUMPDEV+::
  We don't support that immediate dump saving to a raw partition any more.

New
+++

Configuration variables that are new with SLES 11.

+KDUMP_REQUIRED_PROGRAMS+::
    A space-separated list of programs (with their full pathname) that should be
    included in the kdump initrd. You don't have to include shared libraries
    that are resolvable with +ldd+. However, if some library is loaded with
    +dlopen()+, then include them here. The files don't have to be executable.

+KDUMP_PRESCRIPT+::
    Command line that is executed *before* saving the dump. Please add the
    script itself to +KDUMP_REQUIRED_PROGRAMS+.

+KDUMP_POSTSCRIPT+::
    Command line that is executed *after* saving the dump. Please add the script
    itself to +KDUMP_REQUIRED_PROGRAMS+.

+KDUMP_COPY_KERNEL+::
    Boolean variable where you can configure if the kernel itself and the
    debugging information (if installed) gets copied into the dump directory.
    The default is “off”. This is handy if you want to have everything in place
    for debugging.

//}}}
//{{{ Loading the kdump kernel -------------------------------------------------

Loading the kdump kernel
~~~~~~~~~~~~~~~~~~~~~~~~

This is implemented in +/etc/init.d/kdump+. The script is run on each boot.

Configuration Options
^^^^^^^^^^^^^^^^^^^^^

- +KDUMP_KERNELVER+,
- +KDUMP_COMMANDLINE+ and +KDUMP_COMMANDLINE_APPEND+.


Choosing the kernel
^^^^^^^^^^^^^^^^^^^

In the default case, +KDUMP_KERNELVER+ empty, following scheme is used to decide
which kernel to take:

. Use +$(uname -r)-kdump+.
. If that doesn't exist, use +-kdump+.
. If that doesn't exist, use +$(uname -r)+.
. If that doesn't exist, use +""+.

If the kernel has no +kdump+ in the name, then it's checked for relocatability
before the script tries to load it via +kexec+.

Per item above, following files are checked, in the given order:

. +/boot/*vmlinuz+,
. +/boot/*vmlinux+,
. +/boot/*vmlinux.gz+.

There's following reason: On x86-64, the kernel is _not_ marked as
_relocatable_ in the ELF format (because some GDB bug ...), so we try the
bzImage first which is easy to detect for relocatability because of the x86 boot
protocol.

Helper Commands
^^^^^^^^^^^^^^^

The +kdumptool+ script provides following commands which are needed to load the
kernel:

    kdumptool identify_kernel [-r] [-t] kernel_image

The options are:

+-r | --relocatable+::
    Check if the kernel is relocatable.

+-t | --type+::
    Prints the type of the kernel image.

This replaces the functionality of the _kdump-identify\_kernel(8)_ standalone
binary of SLES 10 SP2.

//}}}
//{{{ Copying the Dump ---------------------------------------------------------

Copying the Dump
~~~~~~~~~~~~~~~~

Configuration Options
^^^^^^^^^^^^^^^^^^^^^

This part of the implementation uses following configuration options:

- +KDUMP_SAVEDIR+,
- +KDUMP_DUMPFORMAT+ and +KDUMP_DUMPLEVEL+,
- +KDUMP_CONTINUE_ON_ERROR+,
- +KDUMP_PRESCRIPT+ and +KDUMP_POSTSCRIPT+,
- +KDUMP_COPY_KERNEL+,
- +KDUMP_FREE_DISK_SIZE+.

The command specified in +KDUMP_PRESCRIPT+ is run before dump saving takes
place, and +KDUMP_POSTSCRIPT+ after saving takes place.

Saving the dump
^^^^^^^^^^^^^^^

The dump saving itself is done by:

    kdumptool save_dump <dump>

The only option +<dump>+ specifies the source of the dump. This is implemented
only for testing, in the initrd this is always +/proc/vmcore+.

For copying, the target is specified as +KDUMP_SAVEDIR+ and +KDUMP_DUMPFORMAT+
and +KDUMP_DUMPLEVEL+ is honored for calling +makedumpfile+ (or not, if +ELF+
and +0+ is specified, respectively).

Makedumpfile
^^^^^^^^^^^^

For +makedumpfile+, it's necessary to have additional information about the
kernel available. There are three options:

. having a kernel compiled with debugging information,
. having a so-called “configuration file” (_VMCOREINFO_) that is generated from
  the kernel with debugging information and specified for +makedumpfile+ by the
  +-i+ switch,
. using the _VMCOREINFO_ mechanism from the kernel (see below).

Using a kernel with debugging information is not an option in the initrd. It
would be necessary to copy the kernel into the initrd, and our split-out
debuginfo packages are not suited for that (support for split-out debuginfo
has been rejected upstream).

We did ship the _VMCOREINFO_ in the past, but now the third solution is more
elegant and will be used for SLES 11. The +kexec+ command builds an ELF note
while creating the ELF core headers (which are used as ELF headers in
+/proc/vmcore+). It gets that information from the running kernel and the
+/sys/kernel/vmcoreinfo+ file (which contains only an address and the size of
the data). The +makedumpfile+ command reads the information from the
+/proc/vmcore+ ELF header, and therefore doesn't need anything additional to
filter the dump.


Copying the kernel
^^^^^^^^^^^^^^^^^^

After the dump is saved, if +KDUMP_COPY_KERNEL+ is true, the +save_dump+ call
itself saves the kernel to the dump directory. It's necessary to implement that
as one command to be able to use one connection to the remote host.


Deleting the dump
^^^^^^^^^^^^^^^^^

This only applies if local dump target is used. For remote dumps the server
administrator must make sure that the dump quota is not exceeded, either by hard
disk quotas or cron jobs.

When no dump filtering is used and the +kdumptool+ recognises that the dump
would exceed the disk size minus +KDUMP_FREE_DISK_SIZE+, the +kdumptool
save_dump+ tool exits.

However, when dump filtering is used, the program has to check afterwards if the
disk size is exceeded and if yes, then it has to delete the dump.


Summary
^^^^^^^

So, to sum it up, the initrd script does following operations in pseudo code:

.Algorithm for the initrd kdump script
---------------------------------------------------------
function reboot():
    if KDUMP_IMMEDIATE_REBOOT:
        reboot machine
endfunction

function need_reboot():
    if error and not KDUMP_CONTINUE_ON_ERROR:
        reboot()
    endif
endfunction

function main():
    if KDUMP_PRESCRIPT:
        error = execute KDUMP_PRESCRIPT
    endif

    need_reboot(error)

    kdumptool delete_dumps
    need_reboot(error)

    error = kdumptool save_dump
    need_reboot(error)

    if KDUMP_POSTSCRIPT:
        error = execute KDUMP_POSTSCRIPT
    endif

    reboot()
endfunction

main()
---------------------------------------------------------
//}}}
//{{{ Deleting old Dumps -------------------------------------------------------

Deleting old Dumps
~~~~~~~~~~~~~~~~~~

Configuration Options
^^^^^^^^^^^^^^^^^^^^^

- +KDUMP_KEEP_OLD_DUMPS+

Executing
^^^^^^^^^

Old dumps are only deleted on local file systems (not NFS or FTP). The variable
specifies a number of old dumps, and that dumps are deleted before the dump is
saved.

Deletion takes place by

    kdumptool delete_dumps

//}}}
//{{{ Blinking the LEDs --------------------------------------------------------

Blinking Keyboard LEDs
~~~~~~~~~~~~~~~~~~~~~~

Blinking is done in userspace. The blinking is done by the +kdumptool+:

    kdumptool ledblink

The command has no options.

//}}}
//}}}

// vim: set sw=4 ts=4 fdm=marker tw=80: :collapseFolds=1:
