This section contains general instructions for troubleshooting various issues.
Each section in this manual under “Roles & Services” includes notes on troubleshooting procedures specific to that role, and on how to find log files that can assist with troubleshooting.
- Broker Statuses
- Fault Detection Troubleshooting
- JLOG Error Troubleshooting
- Troubleshooting Alerts
- Broker-Stratcon Connectivity Troubleshooting
PKI Connectivity Troubleshooting
The following roles make use of SSL to communicate:
Each role’s section of the Operations Manual gives the locations of that role’s keys and certificates. Once you know those locations, you can troubleshoot the SSL connection.
If you are not receiving certificates, either when installing Circonus or when adding new services or brokers, try restarting the circonus-ca_processor service. The service should then sign any pending CSRs and resume listening for new entries.
Verify that all the necessary keys and certificates exist. These include ca.crt, <application>.crt, and <application>.key. If any are missing, refer to the install manual and run run-hooper again on this node.
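You can script the existence check. The following is a minimal POSIX shell sketch. The helper name `check_pki_files` is hypothetical, and so is the assumption that all three files sit in one directory. Adjust the paths to the locations given in the role’s section.

```shell
# check_pki_files DIR APP
# Hypothetical helper: reports any expected PKI file that is missing or
# empty. Assumes ca.crt, APP.crt and APP.key live together in DIR; real
# locations are listed in each role's section of this manual.
check_pki_files() {
    pki_dir=$1
    pki_app=$2
    pki_missing=0
    for pki_file in ca.crt "$pki_app.crt" "$pki_app.key"; do
        # -s: file exists and has a size greater than zero
        if [ ! -s "$pki_dir/$pki_file" ]; then
            echo "missing or empty: $pki_dir/$pki_file" >&2
            pki_missing=1
        fi
    done
    return "$pki_missing"
}

# Usage: check_pki_files /path/to/pki <application> && echo "all files present"
```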
Verify that the ca.crt matches what is provided by your CA. To do this, log into the CA machine and look at
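One way to do the comparison is to first copy the CA machine’s ca.crt to the node under a temporary name. You can then compare the SHA-256 fingerprints of the two certificates rather than the raw files, which compares the certificates themselves. The helper name `same_cert` is hypothetical.

```shell
# same_cert FILE_A FILE_B
# Hypothetical helper: succeeds if both PEM files contain the same
# certificate, judged by SHA-256 fingerprint. Returns 2 if either file
# cannot be read as a certificate.
same_cert() {
    fp_a=$(openssl x509 -noout -fingerprint -sha256 -in "$1" 2>/dev/null) || return 2
    fp_b=$(openssl x509 -noout -fingerprint -sha256 -in "$2" 2>/dev/null) || return 2
    [ "$fp_a" = "$fp_b" ]
}

# Usage: same_cert /tmp/ca-from-ca-machine.crt /path/to/ca.crt && echo "ca.crt matches"
```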
Verify that the certificate was signed by the CA by using the following command:
openssl verify -CAfile /path/to/ca.crt /path/to/application.crt
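The following sketch wraps the `openssl verify` command above for use in scripts. The name `signed_by_ca` is hypothetical. As a defensive choice, it requires both a zero exit status and the trailing `: OK` that `openssl verify` prints on success.

```shell
# signed_by_ca CA_FILE CERT_FILE
# Hypothetical helper: runs openssl verify, prints its output, and
# succeeds only if the exit status is 0 and the output ends in ": OK".
signed_by_ca() {
    verify_out=$(openssl verify -CAfile "$1" "$2" 2>&1)
    verify_rc=$?
    printf '%s\n' "$verify_out"
    [ "$verify_rc" -eq 0 ] || return 1
    case $verify_out in
        *": OK") return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: signed_by_ca /path/to/ca.crt /path/to/application.crt
```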
Verify that the key matches the certificate. If the following two commands don’t output the same value, there is a mismatch:
openssl x509 -noout -modulus -in /path/to/application.crt | openssl md5
openssl rsa -noout -modulus -in /path/to/application.key | openssl md5
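You can combine the two modulus commands into a single check. This sketch uses the hypothetical helper name `key_matches_cert` and compares the modulus strings directly, because the `openssl md5` step only shortens them. Like the commands above, it assumes RSA keys, since `openssl rsa` will not read EC keys.

```shell
# key_matches_cert KEY_FILE CERT_FILE
# Hypothetical helper: succeeds if the RSA private key and the
# certificate share the same modulus (i.e. they belong together).
# Returns 2 if either file cannot be parsed.
key_matches_cert() {
    key_mod=$(openssl rsa -noout -modulus -in "$1" 2>/dev/null) || return 2
    crt_mod=$(openssl x509 -noout -modulus -in "$2" 2>/dev/null) || return 2
    # Both commands print a single "Modulus=<hex>" line.
    [ "$key_mod" = "$crt_mod" ]
}

# Usage: key_matches_cert /path/to/application.key /path/to/application.crt || echo "mismatch"
```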
Verify connectivity with openssl s_client using the following command:
openssl s_client -connect host:port -CAfile /path/to/ca.crt -cert /path/to/application.crt -key /path/to/application.key
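With stdin redirected from /dev/null, `openssl s_client` completes the handshake, prints its report, and exits, so the check can be scripted. The hypothetical helper `s_client_check` below reads that report and inspects the last `Verify return code:` line. This line only shows whether the server’s certificate verified against ca.crt. It does not prove that the server accepted your client certificate, so look for alert or error messages in the full output for that.

```shell
# s_client_check
# Hypothetical helper: reads `openssl s_client` output on stdin, prints
# the last "Verify return code:" line, and returns 0 for "0 (ok)",
# 1 for any other verify result, or 2 if no result was printed
# (for example, because the TCP connection failed).
# Usage:
#   openssl s_client -connect host:port -CAfile /path/to/ca.crt \
#     -cert /path/to/application.crt -key /path/to/application.key \
#     </dev/null 2>&1 | s_client_check
s_client_check() {
    sc_line=$(grep 'Verify return code:' | tail -n 1)
    if [ -z "$sc_line" ]; then
        echo "no verify result found (connection or handshake failed?)" >&2
        return 2
    fi
    echo "$sc_line"
    case $sc_line in
        *'Verify return code: 0 (ok)'*) return 0 ;;
        *) return 1 ;;
    esac
}
```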
If any of the above commands fail for non-obvious reasons, contact Circonus Support (email@example.com) about how to resolve the issue.
If a check is not returning data when you believe it should, take the following steps:
- Verify the running status of the check on the broker:
  1. Navigate to the “Check Details” page in the UI and click the “Extended Details” link in the upper left section of the page. Record the UUID shown there.
  2. Log onto the broker machine and telnet to port 32322 using this command:
     telnet localhost 32322
  3. Show the status of the check by typing this command, using the UUID from step 1:
     show check <UUID>
- If the check is reporting an error, such as a refused connection or a timeout, verify that the broker can reach the target machine using system tools such as telnet or curl.
- If all these steps indicate that the check should be working, capture the network traffic to and from the broker for inspection, using a tool such as tcpdump or snoop.
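The following two small helpers support this procedure, and both are hypothetical. `is_uuid` catches a mis-copied UUID before you type `show check`. `check_capture_cmd` prints a tcpdump command that captures traffic between the broker and the check’s target. The `-i any` interface is Linux-specific. On Solaris/illumos, snoop is the usual equivalent.

```shell
# is_uuid STRING
# Hypothetical helper: succeeds if STRING is a canonical 36-character
# UUID (8-4-4-4-12 hexadecimal digits).
is_uuid() {
    [ "${#1}" -eq 36 ] &&
        printf '%s\n' "$1" | grep -Eq '^[0-9A-Fa-f]{8}(-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}$'
}

# check_capture_cmd TARGET_HOST PORT [OUTFILE]
# Hypothetical helper: prints (does not run) a tcpdump command that writes
# traffic between this broker and TARGET_HOST:PORT to OUTFILE
# (default /tmp/check.pcap) for later inspection.
check_capture_cmd() {
    printf 'sudo tcpdump -i any -w %s host %s and port %s\n' \
        "${3:-/tmp/check.pcap}" "$1" "$2"
}

# Usage: is_uuid "$uuid" || echo "not a valid UUID: $uuid"
#        check_capture_cmd 10.0.0.5 443
```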
Repairing Corrupt LevelDB Data Stores
On occasion, a LevelDB database may become corrupted.
To determine which database is corrupted, look at the errorlog (usually /snowth/logs/errorlog), which reports what has been corrupted. Then repair it by following the instructions below.
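To pull the relevant lines out of a long errorlog, a sketch like the one below may help. It assumes that the corruption message contains the word “corrupt” in some form. LevelDB itself reports these errors with a `Corruption:` status, but the exact wording in the snowth errorlog is not documented here, so treat the pattern as a starting point. `find_corruption` is a hypothetical helper.

```shell
# find_corruption [ERRORLOG]
# Hypothetical helper: prints errorlog lines (with line numbers) that
# mention corruption, case-insensitively. The "corrupt" pattern is an
# assumption about the log wording. Defaults to /snowth/logs/errorlog.
# Exit status follows grep: 0 = matches found, 1 = none, 2 = error.
find_corruption() {
    grep -in 'corrupt' "${1:-/snowth/logs/errorlog}"
}

# Usage: find_corruption || echo "no corruption messages found"
```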
1. Stop snowthd.
Before you begin, stop snowthd with the following command:
sudo systemctl stop circonus-snowth
2a. Correct corrupted text data.
The text data is kept in two databases, either of which can become corrupted: the metrics store (a list of metrics) and the changelog (all of the different text values for each metric).
To correct the metrics store, run the following:
sudo /opt/circonus/sbin/snowthd -u nobody -g nobody \
  -r text/metrics \
  -i <id of snowth node in topology> \
  -c /opt/circonus/etc/snowth.conf
To correct the changelog, run the following:
sudo /opt/circonus/sbin/snowthd -u nobody -g nobody \
  -r text/changelog \
  -i <id of snowth node in topology> \
  -c /opt/circonus/etc/snowth.conf
2b. Correct corrupted histogram data.
For histogram data, either the metrics database (a list of all available histogram metrics) or the data itself (stored separately for each period) can become corrupted.
To fix the metrics database, run the following:
sudo /opt/circonus/sbin/snowthd -u nobody -g nobody \
  -r hist/metrics \
  -i <id of snowth node in topology> \
  -c /opt/circonus/etc/snowth.conf
To fix the data for a given period, run the following, replacing <period> with the affected period:
sudo /opt/circonus/sbin/snowthd -u nobody -g nobody \
  -r hist/<period> \
  -i <id of snowth node in topology> \
  -c /opt/circonus/etc/snowth.conf
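The four repair commands differ only in the `-r` database name and the node ID, so you can generate them from one template. The hypothetical helper `snowth_repair_cmd` below prints the command rather than running it. It also rejects any database name other than the four forms listed above, so that a typo is not passed to `-r`.

```shell
# snowth_repair_cmd DB NODE_ID
# Hypothetical helper: prints the snowthd repair command for one LevelDB
# database. DB must be text/metrics, text/changelog, hist/metrics or
# hist/<period>; NODE_ID is the snowth node's id in the topology.
snowth_repair_cmd() {
    case $1 in
        text/metrics|text/changelog|hist/?*) ;;
        *) echo "unknown database: $1" >&2; return 1 ;;
    esac
    if [ -z "$2" ]; then
        echo "missing snowth node id" >&2
        return 1
    fi
    printf 'sudo /opt/circonus/sbin/snowthd -u nobody -g nobody -r %s -i %s -c /opt/circonus/etc/snowth.conf\n' \
        "$1" "$2"
}

# Usage: snowth_repair_cmd text/changelog <id of snowth node in topology>
```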
3. Restart snowthd.
Once the repair has finished, start snowthd again with the following command:
sudo systemctl start circonus-snowth
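You can wrap the whole procedure in one function. This sketch rests on some assumptions. The helper names are hypothetical, and it assumes that `snowthd -r` exits once the repair completes, as the order of the steps above implies. With `DRY_RUN=1`, it only prints the commands. If the repair command fails, the function deliberately leaves snowthd stopped so that you can investigate the failure before the node comes back.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run_or_echo() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# repair_snowth_db DB NODE_ID
# Hypothetical wrapper for steps 1-3 above. DB is one of text/metrics,
# text/changelog, hist/metrics or hist/<period>.
repair_snowth_db() {
    case $1 in
        text/metrics|text/changelog|hist/?*) ;;
        *) echo "unknown database: $1" >&2; return 1 ;;
    esac
    if [ -z "$2" ]; then
        echo "missing snowth node id" >&2
        return 1
    fi

    # Step 1: stop the service before touching its data stores.
    run_or_echo sudo systemctl stop circonus-snowth || return 1

    # Step 2: repair one database (assumes snowthd -r exits when done).
    if ! run_or_echo sudo /opt/circonus/sbin/snowthd -u nobody -g nobody \
            -r "$1" -i "$2" -c /opt/circonus/etc/snowth.conf; then
        # Leave the service stopped so the failure can be investigated.
        echo "repair of $1 failed; circonus-snowth left stopped" >&2
        return 1
    fi

    # Step 3: start the service again.
    run_or_echo sudo systemctl start circonus-snowth
}

# Usage: DRY_RUN=1 repair_snowth_db text/metrics <id of snowth node in topology>
```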