Troubleshooting OSAD and jabberd
Open File Count Exceeded
SYMPTOMS
: OSAD clients cannot contact the SUSE Manager Server, and jabberd requires long periods of time to respond on port 5222.
CAUSE
: The number of maximum files that a jabber user can open is lower than the number of connected clients.
Each client requires one permanently open TCP connection and each connection requires one file handler.
The result is jabberd begins to queue and refuse connections.
CURE
: Edit the /etc/security/limits.conf
to something similar to the following: jabbersoftnofile<#clients + 100>
jabberhardnofile<#clients + 1000>
This will vary according to your setup.
For example in the case of 5000 clients: jabbersoftnofile5100 jabberhardnofile6000
Ensure you update the /etc/jabberd/c2s.xml
max_fds parameter as well.
For example: <max_fds>6000</max_fds>
EXPLANATION
: The soft file limit is the limit of the maximum number of open files for a single process.
In SUSE Manager the highest consuming process is c2s, which opens a connection per client.
100 additional files are added, here, to accommodate for any non-connection file that c2s requires to work correctly.
The hard limit applies to all processes belonging to the jabber user, and accounts for open files from the router, s2s and sm processes additionally.
jabberd Database Corruption
SYMPTOMS
: After a disk is full error or a disk crash event, the jabberd database may have become corrupted.
jabberd may then fail to start during spacewalk-service start:
Starting spacewalk services... Initializing jabberd processes... Starting router done Starting sm startproc: exit status of parent of /usr/bin/sm: 2 failed Terminating jabberd processes...
/var/log/messages shows more details:
jabberd/sm[31445]: starting up jabberd/sm[31445]: process id is 31445, written to /var/lib/jabberd/pid/sm.pid jabberd/sm[31445]: loading 'db' storage module jabberd/sm[31445]: db: corruption detected! close all jabberd processes and run db_recover jabberd/router[31437]: shutting down
CURE
: Remove the jabberd database and restart.
Jabberd will automatically re-create the database:
spacewalk-service stop rm -Rf /var/lib/jabberd/db/* spacewalk-service start
An alternative approach would be to test another database, but SUSE Manager does not deliver drivers for this:
rcosa-dispatcher stop rcjabberd stop cd /var/lib/jabberd/db rm * cp /usr/share/doc/packages/jabberd/db-setup.sqlite . sqlite3 sqlite.db < db-setup.sqlite chown jabber:jabber * rcjabberd start rcosa-dispatcher start
Capturing XMPP Network Data for Debugging Purposes
If you are experiencing bugs regarding OSAD, it can be useful to dump network messages in order to help with debugging. The following procedures provide information on capturing data from both the client and server side.
-
Install the tcpdump package on the SUSE Manager Server as root:
zypper in tcpdump
-
Stop the OSA dispatcher and Jabber processes with
rcosa-dispatcher stop
andrcjabberd stop.
-
Start data capture on port 5222:
tcpdump -s 0 port 5222 -w server_dump.pcap
-
Start the OSA dispatcher and Jabber processes:
rcosa-dispatcher start
andrcjabberd start
. -
Open a second terminal and execute the following commands:
rcosa-dispatcher start
andrcjabberd start
. -
Operate the SUSE Manager server and clients so the bug you formerly experienced is reproduced.
-
Once you have finished your capture re-open terminal 1 and stop the capture of data with: CTRL+c
-
Install the tcpdump package on your client as root:
zypper in tcpdump
-
Stop the OSA process:
rcosad stop
. -
Begin data capture on port 5222:
tcpdump -s 0 port 5222 -w client_client_dump.pcap
-
Open a second terminal and start the OSA process:
rcosad start
-
Operate the SUSE Manager server and clients so the bug you formerly experienced is reproduced.
-
Once you have finished your capture re-open terminal 1 and stop the capture of data with: CTRL+c
Engineering Notes: Analyzing Captured Data
This section provides information on analyzing the previously captured data from client and server.
-
Obtain the certificate file from your SUSE Manager server: /etc/pki/spacewalk/jabberd/server.pem
-
Edit the certificate file removing all lines before
----BEGIN RSA PRIVATE KEY-----
, save it as key.pem -
Install Wireshark as root with:
zypper in wireshark
-
Open the captured file in wireshark.
-
From
select SSL from the left pane. -
Select RSA keys list:
-
IP Address any
-
Port: 5222
-
Protocol: xmpp
-
Key File: open the key.pem file previously edited.
-
Password: leave blank
For more information see also:
-