Skip to main content

Operations

By default, IRONdb listens externally on TCP ports 2003 and 4242, TCP and UDP port 8112, and locally on TCP port 32322. These ports can be changed via configuration files. There are normally two processes, a parent and child. The parent process monitors the child, restarting it if it crashes. The child process provides the actual services, and is responsible for periodically "heartbeating" to the parent to show that it is making progress.

IRONdb is sensitive to CPU and IO limits. If either resource is limited, you may see a process being killed off when it does not heartbeat on time. These are known as "watchdog" events.

Service Management

The IRONdb service is called circonus-irondb.

To view service status: /bin/systemctl status circonus-irondb

To start the service: /bin/systemctl start circonus-irondb

To stop the service: /bin/systemctl stop circonus-irondb

To restart the service: /bin/systemctl restart circonus-irondb

To disable the service from running at system boot: /bin/systemctl disable circonus-irondb

To enable the service to run at system boot: /bin/systemctl enable circonus-irondb

Logs

Log files are located under /irondb/logs and include the following files:

  • accesslog
  • errorlog
  • startuplog

The access logs are useful to verify activity going to the server in question. Error logs record, among other things, crashes and other errant behavior, and may contain debugging information important for support personnel. The startup log records various information about database initialization and other data that are typically of interest to developers and operators. Logs are automatically rotated and retained based on configuration attributes in /opt/circonus/etc/irondb.conf.

If the child process becomes unstable, verify that the host is not starved for resources (CPU, IO, memory). Hardware disk errors can also impact IRONdb's performance. Install the smartmontools package and run /usr/sbin/smartctl -a /dev/sdX, looking for errors and/or reallocated-sector counts.

Crash Handling

Application crashes are, by default, automatically reported to Circonus, using Backtrace.io technology. When the crash occurs, a tracer program quickly gathers a wealth of detailed information about the crashed process and sends a report to Circonus, in lieu of obtaining a full core dump.

If you have disabled crash reporting in your environment, you can still enable traditional core dumping.

On EL7:

  • Add the following to /opt/circonus/etc/irondb-node-config: ulimit -c unlimited
  • Restart the circonus-irondb service
  • Set the kernel core pattern to place core dumps in a suitable location. See the core(5) man page for details. This location must be writable by the user that IRONdb runs as (nobody).
  • Allow setuid dumps: sysctl -w fs.suid_dumpable=1

When a process crashes, a core dump will be created in /irondb/appcrash with the filename core.<executable-name>.<pid>, and the event will be recorded in the system log.

(TODO: Ubuntu)

Debugging Mode

If instability continues, you may run IRONdb as a single process in the foreground, with additional debugging enabled.

First, ensure the service is disabled: /usr/bin/systemctl stop circonus-irondb

Then, run the following as root:

/opt/circonus/bin/irondb-start -D -d

Running IRONdb in the foreground with debugging should make the error apparent, and Circonus Support can help diagnose your problem. Core dumps are also useful in these situations (see above).

Replication

In a multi-node cluster, IRONdb nodes communicate with one another using port 8112. Metric data are replicated over TCP, while intra-cluster state (a.k.a. gossip) is exchanged over UDP. The replication factor is determine by the number of write copies defined in the cluster's toplogy. When a node receives a new metric data point, it calculates which nodes should "own" this particular stream, and, if necessary, writes out the data to a local, per-node journal. This journal is then read behind and replayed to the destination node.

When a remote node is unavailable, its corresponding journal on the remaining active nodes continues to collect new metric data that is being ingested by the cluster. When that node comes back online, its peers begin feeding it their backlog of journal data, in addition to any new ingestion which is coming directly to the returned node.

Proxying

Clients requesting metric data from IRONdb need not know the specific location of a particular stream's data in order to fetch it. Instead, they may request it from any node, and if the data are not present on that node, the request is transparently proxied to a node that does have the data. Because nodes can fail and need to catch up with their peers, proxying favors remote nodes that are the most up to date. This is determined from the gossip data, which includes a latency metric, indicating the most recent replication message that this node has seen from each of its peers. The node performing the proxying decides which of the other nodes that own the given metric has the most recent data.

If gossip state is unavailable, such as due to a network partition, the node handling the request may return less recent data, if it proxies to a node that happens to be behind, or none at all, if the requested data is not available locally and all other owning nodes are unavailable.

Operations Dashboard

IRONdb comes with a built-in operational dashboard accessible via port 8112 in your browser, e.g., http://irondb-host:8112. This interface provides real-time information about the IRONdb cluster. There are a number of tabs in the UI, which display different aspects about the node's current status.

Overview Tab

The "Overview" tab displays a number of tiles representing the current ingestion throughput, available rollup dimensions, license information, and storage statistics.

Image: &#39;ops_overview_tab.png&#39;

Ingestion

Read (Get) and Write (Put) throughput, per second.

  • "Batch" is an operation that reads or writes one or more metric streams.
  • "Tuple" is an individual measurement.

Therefore, a write operation that PUTs data for 10 different streams in a single operation counts as 1 Batch and 10 Tuples.

License Info

Displays details of the node's license.

Numeric Rollups

Displays throughput for both reads and writes per second for numeric rollup data.

  • "Cache Size" is the number of open file handles for numeric rollup data. A given stream's data may be stored in multiple files, one for each configured rollup period in which that stream's data has been recorded.
  • "Rollups" is the list of available rollup periods.

Histogram Rollups

Displays throughput for both reads and writes per second for histogram rollup data.

  • "Rollups" is the list of available rollup periods.

Text Changesets

Displays throughput for both reads and writes per second for text data.

Storage

Disk space used and performance data per data type and rollup dimension.

Each icon under "Performance" displays a histogram of the associated operation (Get/Put/Proxy) latency since the server last started. "Get" operations are reads, "Put" are writes, and "Proxy" are operations that require fetching data from a different node than the one which received the request.

Image: &#39;ops_perf.png&#39;

Latencies are plotted on the x-axis as seconds, with suffixes "m" for milliseconds, "μ" for microseconds, and "n" for nanoseconds. Counts of operations in each latency bucket are on the y-axis. The mean latency for the set is displayed as a vertical green line.

Image: &#39;ops_perf_quantile.png&#39;

Hovering over the x-axis will display a shaded region representing quantile bands and the latency values that fall within them. The quantiles are divided into four bands: p(0)-p(25), p(25)-p(50), p(50)-p(75), and p(75)-p(100). To avoid losing detail, the maximum x-axis values are not displayed, but the highest latency value may be seen by hovering over the p(75)-p(100) quantile band.

Hovering over an individual latency bar will display three lines at the top right corner of the histogram. These represent the number of operations that had less than, equal to, or greater than the current latency, and what percentage of the total each count represents.

Image: &#39;ops_perf_latency.png&#39;

The Used, Total, and Compress Ratio figures represent how much disk space is occupied by each data type or rollup, the total filesystem space available on the node, and the ratio of the original size to the compressed size stored on disk. The compression ratio is determined from the underlying ZFS filesystem.

Replication Latency Tab

Two types of latency are displayed here: "replication latency" and "gossip age". Replication latency is the difference between the current time on each node and the timestamp of the most recently received metric in the journal data from a remote node. Replication status information is exchanged between nodes using "gossip" messages, and the difference between the current time and the timestamp of the last gossip message received is the "gossip age". Gossip messages contain all replication state for a given node relative to all other nodes, so the state of the entire cluster can be seen from any node's UI.

Image: &#39;replication_tab.png&#39;

Each node in the cluster is listed in a heading derived from the topology configuration, and a gossip age in parentheses (see below). The node's latency summary is displayed at the right end of the heading line, and is an average of the replication latency between this node and all remote nodes. This is intended as a quick "health check" as to whether this node is significantly behind or not.

Clicking on the heading exposes a list of peer nodes, also from the topology configuration, and a replication latency indicator for each. Each peer's latency may be understood as "how far behind" the selected node is from that peer's current ingestion. In the example above, we can say that node "171" is 0 seconds behind from its peers "172" and "173".

All nodes should be running NTP or similar time synchronization. For example, if a remote node is shown as "(0.55 seconds old)", that means that a gossip message was received from that node 0.55 seconds ago, relative to the current node. Nodes that have persisently high gossip age, or peer latencies that do not drop to zero, may have clock skew.

Packet loss is another possible cause of replication latency. If a remote node's gossip latency varies widely, it could mean that gossip packets are being lost between hosts.

If the current node has never received a gossip message from a remote node since starting, that node will be displayed with a black bar, and the latency values will be reported as "unknown". This indicates that the remote node is either down or there is a network problem preventing communication with that node. Check that port 8112/udp is permitted between all cluster nodes.

Display Colors

Both gossip age and replication latency are also indicated using color.

Image: &#39;replication_latency.png&#39;

The heading of the node being viewed will always be displayed in blue.

Gossip ages for remote nodes are colored in the heading as follows:

  • Green means a difference of less than 2 seconds
  • Yellow means a difference of more than 2 seconds and less than 8 seconds
  • Red means a difference of more than 8 seconds
  • Black means no gossip packets have been received from the remote host since this host last booted.

Latency summaries in the heading are colored as follows:

  • If the node is behind W or more nodes by more than 4.5 minutes, then the summary is "latencies danger", and colored red.
  • If the node is behind W-1 or more nodes by more than 30 seconds, then the summary is "latencies warning", and colored yellow.
  • Otherwise, the average of all peer latencies is displayed, and colored green.

Replication latency indicators for individual remote nodes are colored as follows:

  • Green for less than 30 seconds behind
  • Yellow for more than 30 seconds but less than 270 seconds (4.5 minutes) behind
  • Red for more than 270 seconds (4.5 minutes) behind

Topology Tab

Displays the layout of the topology ring, and the percentage of the key space for which each node is primarily responsible (coverage.) The ideal distribution is 1/N, but since the system uses consistent hashing to map metric names to nodes, the layout will be slightly imperfect.

Image: &#39;topology_tab.png&#39;

An individual stream may be located by entering its UUID and Metric Name in the Locate Metrics tile, and then clicking the Locate button. Numbers indicating the primary and secondary owners of the metric (or more if more write copies are configured) will appear next to the corresponding node.

Image: &#39;topology_locate_metrics.png&#39;

Extensions Tab

Displays a list of the loaded Lua extensions that provide many of the features of IRONdb.

Internals Tab

Shows internal application information, which is useful for troubleshooting performance problems. This information is divided into panels by the type of information contained within. These panels are described below.

Logs

The Logs panel of the Internals tab shows recent entries from the errorlog. When the Internals tab is first displayed, the Logs panel is expanded by default.

Job Queues

The Job Queues panel lists libmtev eventer job queues (aka "jobqs"), which are groups of one or more threads dedicated to a particular task, such as writing to the database, or performing data replication. These tasks may potentially block for "long" periods of time and so must be handled asynchronously to avoid stalling the application's event loop.

Job queues have names that indicate what they are used for, and concurrency attributes that control the number of threads to use in different scenarios.

At the top right of the Joq Queues panel is a toggle that controls whether to display jobqs currently in use ("Used") or all existing jobqs ("All"). The default is to show only in-use jobqs.

The toggle first appeared in version 0.15.1

Image: &#39;jobq_panel.png&#39;

Each row in the panel represents a job queue, with the following columns:

  • Queue: the jobq name, preceded by a gauge of jobs that are either in-flight or backlogged (waiting to be enqueued.)
  • Concurrency: the number of threads devoted to this jobq. This may be expressed as a pair of numbers separated by an arrow, indicating the current thread count (left) out of a potential maximum thread count (right). It may also be shown as a single number, meaning either that the queue is of a fixed size, or that a dynamic queue is at its maximum concurrency.
  • Processed: a counter of jobs processed through this jobq since the application last booted.
  • Waiting: information on jobs waiting in the queue. From left to right, three pieces of information are visible:
    • A button for displaying a histogram of wait latencies for the queue, since application boot. This is the same type of histogram as used for Storage latencies in the Overview tab.
    • The average time that jobs spent waiting to be processed in the queue, in milliseconds, since the last refresh (5 seconds).
    • The instantaneous count of jobs currently waiting in the queue.
  • Running: information on jobs actively running in the queue. From left to right, three pieces of information are visible:
    • A button for displaying a histogram of run latencies for the queue, since application boot. This is the same type of histogram as used for Storage latencies in the Overview tab.
    • The average time that jobs spent running in the queue, in milliseconds, since the last refresh (5 seconds).
    • The instantaneous count of jobs currently running in the queue.

Sockets

The Sockets panel displays information on active sockets. These include both internal file descriptors for the libmtev eventer system, as well as network connections for REST API listeners and clients.

Image: &#39;sockets_panel.png&#39;

Each row in the panel corresponds to one socket, with the following columns:

  • FD: the file descriptor number that corresponds to the socket, and the value of the eventer mask. The mask determines what type of activity will trigger the callback associated with the socket. Typical values are (R)ead, (W)rite, and (E)xception. If multiple values are set, they are separated by a vertical bar.
  • Opset: the "style" of socket determines the set of operations that may be performed on the socket. Typical values are "POSIX", which means the standard set of POSIX-compliant calls like accept() and close() are available, and "SSL", which adds SSL/TLS operations. The vast majority of sockets in IRONdb will be of the POSIX type.
  • Callback: the libmtev function that will be called when the socket is triggered by activity matching the socket's mask. For example, if a socket has the Read mask, and there is data on the socket to read, the associated callback function will be invoked to handle reading that data.
  • Local: if the socket is part of a network listener or established connection, this will be the IP address and port of the local side of the connection.
  • Remote: if the socket is part of a network listener or established connection, this will be the IP address and port of the remote side of the connection.

Network sockets:

Image: &#39;sockets_net.png&#39;

Timers

The Timers panel displays information on timed events. IRONdb does not make extensive use of timed events so this panel is often empty.

Each row in the panel lists a timed event, with the following columns:

  • Callback: the libmtev function that will be called when the appointed time arrives.
  • When: the time that the callback should fire.

Stats

The Stats panel displays all statistics application statistics that have been registered into the system. These are collected and maintained by the libcircmetrics library. Statistics accumulate over the lifetime of the process, and are reset when the process restarts.

At the top of the panel is a Filter field where you can enter a substring or regex pattern to match statistics. Only those statistics matching the pattern will be displayed. This is a useful way to narrow down the list of statistics, which can be quite long.

The filter field first appeared in version 0.15.4.

Image: &#39;stats_panel.png&#39;

Stats are namespaced to indicate what they represent:

  • mtev: internal libmtev statistics
    • eventer: stats related to the operation of the event system
      • callbacks: each named callback registered in the system gets a "latency" statistic that is a cumulative histogram of all latency values for this callback since boot.
      • jobq: each jobq registered in the system gets a set of stats that convey various information about that jobq. The same information appears in the Job Queues panel, without the mtev.eventer prefix.
      • pool: per-loop statistics for named event loops. Cycletime is a histogram of elapsed time (in seconds) between iterations of the loop. Callbacks is a histogram of all callback latencies witnessed by the loop, also in seconds.
      • threads: per-thread cycle times, in seconds.
    • memory: memory allocation statistics.
    • modules: statistics exposed by libmtev modules.
    • pool_N: resource statistics for mtev_intern, a facility that reduces application memory usage by allowing multiple consumers to utilize a single copy of a given string or binary blob. IRONdb uses mtev_intern in the surrogate_db implementation.
    • rest: latencies for calls to REST endpoints.
  • snowth: IRONdb application information. Some stats are used to drive other parts of the UI, such as GET/PUT counters and histograms in the Overview. All of these stats are also available at /stats.json, without the snowth. prefix.