Skip to main content

Release Notes 2018

2018-12-31

caql_broker

Version: 0.1.1545429321 (fa98828)

  • Tag search support.

data_storage

Version: 0.1.1545615771 (be925c1)

  • Skip deleting from the raw database when doing full deletes. Raw shards will be automatically deleted during rollup, so this saves considerable time in the full delete case.
  • Remove timeout on single-metric raw deletes that caused spurious failures on larger delete operations.
  • Extend NNT delete debug logging to cover NNTBS.
  • Changes to support caql-broker tag search.
  • CAQL: Create NULL literal that can be injected as a constant, as with VIEW_PERIOD and VIEW_RANGE.
  • Fix several surrogate-db processing issues.

fault_detection

Version: 0.1.1544639627 (db11a23)

  • If not processing rules, do not load absence timers or perform broker checks.

Reconnoiter

Related roles: broker, caql_broker, data_storage, stratcon, web_stream

Version: 0.1.1545407666 (e21a326)

  • Truncate long tag categories and/or values uniquely. Truncated tags will show the longest possible amount of the original tag category or value, suffixed with _tldr_<sha1_hex> where <sha1_hex> is the hash of the original category or value.
  • Changes to support caql-broker tag search.

GoAPI

Related roles: api, web_frontend

Version: 0.1.1546009951 (21a3612)

  • Fix search bug with base64-encoded regex queries.
  • SAML authentication support.

Web UI/API

Version: 0.1.1545359271 (65330ee)

  • UI: Fix "Add Login" control for Okta on user profile page.
  • UI: Fix encoding of annotation titles and descriptions on annotation details page.
  • Allow searching rules by check UUID.
  • Prevent repeat alerts from rules on inactive checks.

2018-12-17

caql_broker

Version: 0.1.1544737073 (b5c4ce9)

  • Fix issue with search results not being updated regularly. Expect more load on the search API endpoint after updating.
  • Fix issue with caql-broker not being able to start when a snowth node was reconstituting.

data_storage

Version: 0.1.1544734951 (83e0481)

  • Two related bug fixes in the surrogate DB that manifest with metrics whose total stream tag length is more than 127 characters. Metrics with such tag sets could appear to be missing from search results. Metrics that do not have any stream tags, or whose total tag set is less than 127 characters, are not affected.
  • Fix bug that causes hanging when trying to delete certain metrics.
  • Fix occasional crash related to reading NNTBS data.
  • Add optional metric-delete debug logging.

GoAPI

Related roles: api, web_frontend

Version: 0.1.1544719966 (ce6484c)

  • Add HTTP basic auth support to metric search endpoint

Hooper

Version: 0.1.1544473897 (53351fe)

  • Allow multiple caql_broker hosts, configured as a cluster.
  • Restart GoAPI search service on changes to circonus.conf template.

Web UI/API

Version: 0.1.1544731582 (c8659a4)

  • UI: Check details bugfix: when decoding a histogram 'last value', don't error on 0 values.

2018-12-10

data_storage

Version: 0.1.1544199703 (aabb973)

  • Fix crash in metric serialization.
  • Reclassify an error message as a debug message - message occurs in a situation that is not a malfunction and can fill the logs.
  • Fix bug where text and histogram data transfer could get hung during reconstitute.
  • Fix memory ordering related crash in string intern implementation (libmtev).
  • Fix a bug where reconstitute process could get deadlocked and not make progress.
  • Fix a potential crash that could occur when reconstituting surrogate data.
  • Fix a bug where deleting a metric on a system would not remove the surrogate entry if the metric was not local to the node.
  • Add thread to run dictionary compaction in mtev_intern for the stratcon raw ingestor module.

fault_detection

Version: 0.1.1543873734 (22e92a8)

  • Ignore all ruleset patterns

libmtev

Related roles: broker, caql_broker, data_storage, stratcon, web_stream

Version: 1.5.28

  • Fix mtev_intern memory volatility/ordering issues.

Web UI/API

Version: 0.1.1544132942 (9b517cb)

  • UI: Fix Y-axis auto-scaling on graphs with empty histograms.
  • UI: Properly identify histogram metrics on check creation for JSON payloads.
  • Restore mq "firehose" metric in web_stream selfcheck.
  • Pre-release prep for CAQL broker clustering.

2018-12-03

caql_broker

Version: 0.1.1543406236 (0ba5459)

  • Add a counter for dropped messages.
  • Add /v2 to metric_cluster URL.

data_storage

Version: 0.1.1543521136 (dace2b5)

  • Increased speed of surrogate cache loading at startup.
  • Add snowthsurrogatecontrol tool, which allows offline review and modification of the surrogate database.
  • Fix reconstitute bug edge case where certain metric names would cause the reconstitute to spin/cease progress.
  • Fix bug where certain HTTP requests could hang.
  • Change default raw db conflict resolver to allow overriding old data with flatbuffer data from a higher generation.
  • Fix crash in metric serialization.
  • Memory utilization improvements.
  • Memory-leak fixes.

GoAPI

Related roles: api, web_frontend

Version: 0.1.1543502147 (c9c5e08)

  • Preserve escape sequences in alert_formats JSON from database.
  • Add X-Circonus-More-Items response headers to metrics search responses, when needed.
  • base64-encode all name/tag queries when sending search requests to data_storage/IRONdb.

Hooper

Version: 0.1.1543866611 (0a308b7)

  • Add crash-reporting support (Backtrace) to web_stream.
  • Restart stratcon service when its package updates.
  • Add custom-generated Diffie-Hellman (DH) parameters for stratcon TLS connections.
  • Fix first-time setup issue with missing MQ directory.
  • Change a comment line in circonus.conf so as not to interfere with GoAPI config parsing.

libmtev

Related roles: broker, caql_broker, data_storage, stratcon, web_stream

Version: 1.5.26

  • Fix DNS fast failures in lua that could cause null pointer dereference.
  • Fix support for aco-style REST handlers. This bug manifested as failed upload support.
  • Fix naming of aco events. They now report the underlying event.
  • Rearchitect the watchdog timeouts to allow children to cooperate and signal into the correct thread so we get a SIGTRAP-induced stack trace from the offending thread. (only systems with pthread_sigqueue, like Linux).
  • Articulate in logs and in glider invocation which thread watchdogged.
  • Fix hangs in HTTP content upload when clients paused in the middle of a block (bug introduced in 1.5.24)
  • Fix ACO registry mismanagement causing crashes.
  • Fix leak of ck_epoch_record on thread termination.

Reconnoiter

Related roles: broker, caql_broker, data_storage, stratcon, web_stream

Version: 0.1.1543411336 (2f9fa32)

  • Fix null-pointer dereference in DNS check.
  • Include thread names for jlog threads (assists with debugging).

Web UI/API

Version: 0.1.1543608912 (eb918bc)

  • Update selfchecks with metrics relevant to enzo-c.
  • Fix for invalid search query errors.
  • Add graphite_tls to check modules database table for new installations.
  • Let create_super_admin.pl take a password on the command-line. This allows for more automation for initial setups.
  • UI: Fix viewing of metric names containing a "/" character.
  • API: Fix for database deadlock when updating large numbers of checks in a single transaction.
  • UI: Do not show broken link in popup dialog on graphs page.
  • UI: Fix graph cluster datapoint expansion to properly handle histogram metrics from new GoAPI search service.
  • API: Document that GoAPI search supports the X-Circonus-More-Items response header for paged results.
  • UI: Clean up the layout of the Integrations module grid.

2018-11-19

caql_broker

Version: 0.1.1542379141 (de348ca)

  • Remove warning log messages for missing checks.
  • Add instrumentation for perceived message delay.
  • Use /v2/metric_old endpoint for search resolution, allowing v3 upgrade of the web-service.
  • Run caql-broker as non-root user.

data_storage

Version: 0.1.1542336753 (c7ddb12)

  • CAQL: Fix histogram validation
  • New module, graphite_egress_alter for applying Graphite-specific transforms on metric results before they are sent back in a Graphite request.
  • Fix storage_file open race.
  • Remove NNTBS info_db metadata database.
    • The info_db was LMDB with a continuous update cycle. All rows were replaced every rollup period causing horrific churn and bad performance pathologies on ZFS.
    • This entirely eliminates the database and replaces it with on-demand determination of epoch/apocalypse.
    • We introduce a new surrogate function to iterate over all surrogate keys which we now use for inventory processing during reconstitute and rollup recreation.
  • Crash fix in Graphite response when expanding names that are leaves.
  • CAQL: Allow "query" as alias for "q" parameter.
  • Improved surrogate DB performance and reduced memory usage.
  • Use the jemalloc allocator by default on Linux.
  • Fix watchdog in full-delete path, when finding the list of metrics to delete.
  • Provide offline surrogate DB maintenance tool.
  • Fix issue where Graphite metrics tags were mixed with Circonus stream tags.
  • Fix crash caused by rare race condition when inserting new metrics into the surrogate DB.

GoAPI

Initial release: This is a new internal component that runs on API and web_frontend nodes. It supports the new Stream Tags features for the Web UI and API, and will gradually take over additional REST endpoints from the existing Perl-based middleware.

Version: 0.1.1542652206 (68c290a)

Hooper

Version: 0.1.1542400689 (1dfdae0)

  • Activate new GoAPI service on API and web_frontend roles.
  • Add new Search::V3 configurations to circonus.conf.

libcircmetrics

Related roles: caql_broker, data_storage, stratcon, web_stream

Version: 0.0.1.1542307708

  • Use new h histogram type for histogram stats.
  • Improve performance by using different locking strategies.

libmtev

Related roles: broker, caql_broker, data_storage, stratcon, web_stream

Version: 1.5.23

  • Make SSL "connection closed" accept failures a debug message.
  • Remove port from SSL connection failures so they log dedup.
  • Make ncct (telnet console) output thread safe (crash fix).
  • Fix leak of thread name in SMR context.
  • Add eventer_jobq_memory_safety_name() function.
  • Add reporting on SMR activity.
  • Avoid unnecessary epoch synchronization (SMR), when there is no work to do.
  • Fix SMR regression in jobs thread wind-down.
  • Fix REST-driven jemalloc heap profiler.
  • Do not block thread exit for SMR, instead disown the return queue and allow gc thread to cleanup (this also fixes leaks at thread exit)

Reconnoiter

Related roles: broker, caql_broker, data_storage, stratcon, web_stream

Version: 0.1.1542385741 (9f4ac57)

  • Fix replacing deleted checks. When a check is deleted, it is marked as deleted and recycled. This allows for it to persist long enough to be unused and replicated if needed. This change solves two issues:
    • When a deleted check still in config was set via API, the deleted attribute was not cleared, so it appeared in config (and in running system) as updated-but-still-deleted.
    • When a deleted check wasn't yet flushed from config and the instance restarts, the check would be loaded as deleted, but not scheduled. This resulted in not recycling it. Now it is scheduled and immediately descheduled so that normal recycling happens.
  • Fix crash where statsd tries to use polls before initialization.
  • Support heap profiling.

Web UI/API

Version: 0.1.1542657093 (66f65d8)

  • New feature: Stream Tags

    • In the UI, there is a new interface for metric searching, called "Metrics Explorer". It utilizes a new V3 search syntax.
    • In the API, metrics search uses the same syntax (V3) as the UI. All other object type searches continue to use the existing V2 syntax.
  • UI: Remove filter_rules feature flag.

  • UI: Disable selecting metrics from "change brokers and metrics" if there are filters on the checkbundle.

  • UI/API: reconstitute_noit filter_rules vs filter_id: move validation into scope that needs it.

  • API: Handle user and contact_group REST endpoints via GoAPI.

  • Address unescaping of form args for RawForm.

  • API: Set account_id and check_name header fields in CAQL Backfill requests.

2018-11-05

caql_broker

Version: 0.1.1541163185 (8d216aa)

  • Fix histogram handling.
  • Surface more warnings in the error log.
  • Don't log the first HTTP retry.

data_storage

Version: 0.1.1541107572 (386a237)

  • Performance improvements to parsing surrogate database at startup.
  • Fix some potential crashes.
  • Disable saving ptrace stdout output files in the default circonus-watchdog.conf file.
  • Improve error checking when opening NNTBS timeshards.
  • Improve surrogate DB startup informational logging.
  • Various memory usage optimizations to reduce the amount of memory needed for snowthd to operate.
  • Remove global variables from Backtrace.io traces.
  • Add ability to delete surrogates from the system that are no longer used.
  • Remove temporary files used during reconstitute - there were a handful of files staying on disk and taking up space unnecessarily.
  • Increase timeout for pulling raw data during reconstitutes.
  • Adopt a more time- and space-efficient strategy for graphite searches.
  • Fix logging bug where long lines could end up running together.
  • Fix crash bug in histogram fetching API.

FQ

Related roles: mq

Version: 0.10.14

  • No user-facing changes since 0.10.12.

libmtev

Related roles: broker, data_storage, stratcon, web_stream

Version: 1.5.19

  • Have luamtev use a default pool concurrency of 1, add -n option.
  • Disable log dedup in luamtev by default.
  • Fix improper calculation of required space in base64 encode/decode that could allow two bytes of overrun in decoding into a "too small" buffer.
  • Make mtev_memory_{begin,end} recursively safe.
  • Use asynch barrier SMR in jobqs.
  • Avoid clipping last letter off long log lines.
  • Apply lua GC on next tick and not inline.
  • Make "cs" the default jobq memory safety level.
  • Fix off-by-on error in lua_web lua stack management (crash fix).
  • Move SMR maintenance into the eventer (out of a callback).
  • Fix livelock in mtev_intern when racing for a removed object.
  • Make the SMR cleanup in thread termination asynch (fix CPU burn).

Reconnoiter

Related roles: broker, data_storage, stratcon, web_stream

Version: 0.1.1541084944 (ec587d4)

  • Better validation within the ping_icmp module.
  • Send full metric name with tags to web_stream service. Fix for tagged metrics not showing up in graph play and dashboards.
  • Transient checks failed to update target_ip and name.
  • Expose selfcheck metrics under the "broker" namespace as well.
  • Explicitly disable dedup on noit feed elements. Fix for "gappy" collection of some metrics.

Web UI/API

Version: 0.1.1541117507 (96212ab)

  • UI: Remove "beta" decal from CAQL integrations menu.
  • UI: Stop mass updating last_modified_* on checks by bundle. Fixes unnecessary DB query.
  • API: Set check target to existing value on update if not passed by client.
  • UI: Re-introduce 'last values' to check details page at metric load time.

2018-10-22

data_storage

Version: 0.1.1539701422 (9838f50)

  • Add activity range parameters to /tag_cats and /tag_vals REST endpoints, add category parameter to /tag_vals.
  • Speed up loading of surrogate database by parallelizing the work.
  • Modest locking performance increase in surrogate database load.
  • Stop saving crash trace stdout to *.trc files, since the tracer produces its own output file.

libmtev

Related roles: broker, data_storage, stratcon, web_stream

Version: 1.5.12

  • Be extra cautious when shutting down the last thread in a pool to make sure there is no backlog.
  • Fix header to expose eventer_jobq_set_floor correctly.
  • Expose more controls for jobq mutation via console.

Reconnoiter

Related roles: broker, data_storage, stratcon, web_stream

Version: 0.1.1539352534 (23a9353)

  • Fixes for showing checks in a cluster.

Web UI/API

Version: 0.1.1540240496 (682d9c9)

  • UI/API: Add Graphite TLS check type.
  • UI: Layout adjustments on metric details, alert details pages.
  • UI: Minor layout bugfixes for change-brokers-and-metrics panel.
  • UI: CSS fix for radio button spacing in DNS check config panel.

2018-10-12

data_storage

Version: 0.1.1539280608

  • When loading a topology that has already been loaded, return HTTP 200 instead of 500.
  • Move Zipkin setup messages out of the error log and into the debug log.
  • Skip unparseable metric_locators during replication.
  • Turn off sync writes in tagged surrogate writer.
  • Fix potential crashes when check_name is NULL.
  • Documentation: fix missing rebalance state.
  • Add log deduplication to avoid spamming errorlog with identical messages.
  • Fix potential deadlock that could be triggered when forking off a process to be monitored by the watchdog.
  • Fix some potential crashes/memory leaks.
  • When loading a new topology, return 200 status instead of 500 if the topology is already loaded.
  • Support tag removal.
  • Performance/stability improvements for activity list operations.
  • Fix wildcard/regex queries inside tag categories.

libmtev

Version: 1.5.11

  • Implement log deduplication via dedup_seconds configuration option.
  • Watchdog config option to disable saving of glider stdout, useful in cases where the glider produces its own output files.
  • Document mtev.xml* functionality.
  • Fix unsafe fork (fork while resize_lock held) in logging subsystem.
  • Fix tagged release version extraction.
  • Fix infinite loop when logging oversized log entries, introduced in 1.5.8.

Reconnoiter

Version: 0.1.1539263519

  • Protect against empty rulesets.
  • Fix AWS check module to properly handle spaces in metric names.
  • Clearer error messages for REST calls.
  • Allow colons in stream-tag values.
  • Find filtersets below toplevel.
  • Fix for broker config corruption.

Web UI/API

Version: 0.1.1539284260

  • UI: Updated keyboard help overlay to remove old invalid keyboard shortcuts.
  • UI: Expose Prometheus check type when broker capability allows.
  • UI: Avoid displaying encoded histogram values in check preview.
  • UI: When initializing overlays on shared graphs, don't try to pull the share config before it's loaded.

2018-09-25

data_storage

Version: 0.1.1537899481

  • CAQL: add comparators to the each package, which operates on all input slots at once: gt, lt, geq, leq.
  • Fix activity-tracking replication
  • Allow 4096 chars for metric name ingestion
  • Locking changes for better performance on high-contention locks
  • Move raw ingestion startup off of the main eventer thread to prevent watchdogs
  • CAQL: Remove wrap_false from histogram:* functions. Histograms can't be missing, they can only be empty.
  • CAQL: Map histogram:* functions. So that:
    • The case of zero slot arguments is handled correctly
    • We apply the functions to all input slots
  • Don't loop forever when journal writes are in the future
  • CAQL: Check time during bundle loops
  • Disable mtev async core dumps, preventing double crashes (where a "real" crash is followed by a second crash due to a database lock still existing)
  • Various crash fixes

Hooper

Version: 0.1.1537455639 (EL7, OmniOS) Version: 0.1.1536174707 (EL6)

  • Remove obsolete grover_queue* services. These have not been used in a long time.
  • (OmniOS) Use a larger ZFS recordsize for lt-final dataset in the long_tail_storage role. This yields better compression ratios.

libmtev

Version: 1.5.7

  • Add the libluajit default path/cpath to luamtev by default
  • Fix compressed non-chunked encoding
  • Better error on improper rest registration
  • Introduce mtev_watchdog_disable_asynch_core_dump

Reconnoiter

Version: 0.1.1537458244

  • Fix memory leak: incomplete search tag parse-tree freeing
  • Automatic histograms (PR 482)
  • Support account_id, check_{uuid,id} suppressions
  • Support multi-document streaming JSON to httptrap
  • Put prometheus module into a dedicated eventer pool
  • Support NOIT_MODULES environment variable (PR 493)

Web UI/API

Version: 0.1.1537199732

  • UI: Fix bug in Quick Graph adding that led to metrics being displayed as inactive
  • UI: Block "Enter" key on metric filtering field and prevent some regular expression errors on change-metrics dialog
  • API: Account for prometheus check module

2018-09-03

Includes changes since release 2018-08-16

data_storage

Version: 0.1.1536172853

  • Expose stream tags in search results from the Lua API
  • New, optional journal and background job for managing activity tracking outside the normal ingest path
  • CAQL: new group_by function
  • Stop statically linking libzfs, always dlopen() if available
  • Parse "seconds.milliseconds" from incoming histogram records
  • Fix log enable/disable options (-L/-l)
  • Replace Lua histogram implementation with one that makes use of the C functions from libcircllhist for efficiency.
  • Prevent race in the REST delete endpoints
  • Add check-wide delete methods for raw and numeric rollup data

fault_detection

Version: 0.1.1534363526

  • Add flag to disable rule processing

libmtev

Version: 1.5.4

  • Fix mtev.shared_seq() producing duplicate keys during startup.
  • Add mtev_cluster_node_get_idx to get a node's deterministic offset in a cluster topology.
  • Make mtev_hash_merge_as_dict safe for NULL values.
  • Fix reported memory leak in DWARF reading.
  • Fix race conditions in freeing mtev_websocket_client_t.
  • Fix race in lua state (mtev lua coroutine) GC.
  • Remove local callback latency tracking.
  • Add per-pool callback latency tracking.
  • Skip epoch reclamation in threads that have never freed anything.
  • Always do asynchronous barrier epoch collection from the eventloop.
  • Batch asynchronous epoch reclamation to reduce epoch synching.
  • Fix lua/ssl_upgrade eventer actuation.
  • Add granular lua garbage collection configuration. default: step 1000 time before a full collect.
  • Monitor process now passes TERM, QUIT, and INT signals to child.
  • Fix a bug where we were not always closing the socket/connection in lua_web_resume - could cause connections to hang.
  • Fix a lock contention issue that occurred at startup.
  • Fix a memory leak in the lua path.
  • Fix some clean targets in the Makefile that were inadequate.
  • Move some logging from error log to debug log.
  • Fix gc_full=1 to fire on every invocation as documented.
  • Fix asynchronous memory reclamation.
  • Protect against attempting to close invalid file descriptors.
  • Do proper cleanup of eventer objects, even if not registered.
  • Fix internal accounting stats for eventer allocations.
  • Don't gate startup of event loops.
  • Fix a leak of per-thread Lua closure structs.

Reconnoiter

Version: 0.1.1536172454

  • Change the default behavior of check stats to not be perpetually cumulative (and thus potentially memory exhausting).
  • Fix null-termination of tags in a Prometheus check
  • Avoid null dereference in Lua checks
  • Move to new Lua garbage-collection capabilities in libmtev 1.5
  • Initiate Lua GC whenever returning from a coroutine
  • Fix incorrect initialization function for check stats
  • noit_check_resolver should protectively initialize

Web UI/API

Version: 0.1.1536180382

  • UI: Fix JS error when sharing a graph.
  • UI: Performance fix for check creation stalled when a large number of metrics are present.
  • UI: Fix bug related to honoring grid preference on worksheet and metrics pages.
  • UI: Fix bug where a plus (+) character wasn't searchable within a metric name.
  • UI: Fix problem with metric names not showing correctly in metric list dialogs.