Release Notes 2018
2018-12-31
caql_broker
Version: 0.1.1545429321 (fa98828)
- Tag search support.
data_storage
Version: 0.1.1545615771 (be925c1)
- Skip deleting from the raw database when doing full deletes. Raw shards will be automatically deleted during rollup, so this saves considerable time in the full delete case.
- Remove timeout on single-metric raw deletes that caused spurious failures on larger delete operations.
- Extend NNT delete debug logging to cover NNTBS.
- Changes to support caql-broker tag search.
- CAQL: Create NULL literal that can be injected as a constant, as with VIEW_PERIOD and VIEW_RANGE.
- Fix several surrogate-db processing issues.
fault_detection
Version: 0.1.1544639627 (db11a23)
- If not processing rules, do not load absence timers or perform broker checks.
Reconnoiter
Related roles: broker, caql_broker, data_storage, stratcon, web_stream
Version: 0.1.1545407666 (e21a326)
- Truncate long tag categories and/or values uniquely. Truncated tags will show the longest possible amount of the original tag category or value, suffixed with _tldr_<sha1_hex>, where <sha1_hex> is the hash of the original category or value.
- Changes to support caql-broker tag search.
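The truncation scheme described above can be sketched roughly as follows. This is an illustration only, not Reconnoiter's code: the 256-character limit, the function name truncate_tag_part, and hashing the UTF-8 bytes of the original are all assumptions. Only the _tldr_<sha1_hex> suffix format comes from the note above.

```python
import hashlib

MAX_LEN = 256  # hypothetical limit; the real limit is not stated in these notes


def truncate_tag_part(value: str, max_len: int = MAX_LEN) -> str:
    """Shorten an over-long tag category or value while keeping it unique."""
    if len(value) <= max_len:
        return value
    # Suffix is "_tldr_" plus the 40-character SHA-1 hex digest of the original.
    suffix = "_tldr_" + hashlib.sha1(value.encode("utf-8")).hexdigest()
    keep = max_len - len(suffix)
    if keep < 0:
        raise ValueError("max_len is too small to hold the suffix")
    # Keep as much of the original as still fits in front of the suffix.
    return value[:keep] + suffix
```

Two long values that share a prefix still truncate to different strings, because the digest is computed over the full original value.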
GoAPI
Related roles: api, web_frontend
Version: 0.1.1546009951 (21a3612)
- Fix search bug with base64-encoded regex queries.
- SAML authentication support.
Web UI/API
Version: 0.1.1545359271 (65330ee)
- UI: Fix "Add Login" control for Okta on user profile page.
- UI: Fix encoding of annotation titles and descriptions on annotation details page.
- Allow searching rules by check UUID.
- Prevent repeat alerts from rules on inactive checks.
2018-12-17
caql_broker
Version: 0.1.1544737073 (b5c4ce9)
- Fix issue with search results not being updated regularly. Expect more load on the search API endpoint after updating.
- Fix issue with caql-broker not being able to start when a snowth node was reconstituting.
data_storage
Version: 0.1.1544734951 (83e0481)
- Two related bug fixes in the surrogate DB that manifest with metrics whose total stream tag length is more than 127 characters. Metrics with such tag sets could appear to be missing from search results. Metrics that do not have any stream tags, or whose total tag set is less than 127 characters, are not affected.
- Fix bug that causes hanging when trying to delete certain metrics.
- Fix occasional crash related to reading NNTBS data.
- Add optional metric-delete debug logging.
GoAPI
Related roles: api, web_frontend
Version: 0.1.1544719966 (ce6484c)
- Add HTTP basic auth support to metric search endpoint
Hooper
Version: 0.1.1544473897 (53351fe)
- Allow multiple caql_broker hosts, configured as a cluster.
- Restart GoAPI search service on changes to circonus.conf template.
Web UI/API
Version: 0.1.1544731582 (c8659a4)
- UI: Check details bugfix: when decoding a histogram 'last value', don't error on 0 values.
2018-12-10
data_storage
Version: 0.1.1544199703 (aabb973)
- Fix crash in metric serialization.
- Reclassify an error message as a debug message - message occurs in a situation that is not a malfunction and can fill the logs.
- Fix bug where text and histogram data transfer could get hung during reconstitute.
- Fix memory ordering related crash in string intern implementation (libmtev).
- Fix a bug where reconstitute process could get deadlocked and not make progress.
- Fix a potential crash that could occur when reconstituting surrogate data.
- Fix a bug where deleting a metric on a system would not remove the surrogate entry if the metric was not local to the node.
- Add thread to run dictionary compaction in mtev_intern for the stratcon raw ingestor module.
fault_detection
Version: 0.1.1543873734 (22e92a8)
- Ignore all ruleset patterns
libmtev
Related roles: broker, caql_broker, data_storage, stratcon, web_stream
Version: 1.5.28
- Fix mtev_intern memory volatility/ordering issues.
Web UI/API
Version: 0.1.1544132942 (9b517cb)
- UI: Fix Y-axis auto-scaling on graphs with empty histograms.
- UI: Properly identify histogram metrics on check creation for JSON payloads.
- Restore mq "firehose" metric in web_stream selfcheck.
- Pre-release prep for CAQL broker clustering.
2018-12-03
caql_broker
Version: 0.1.1543406236 (0ba5459)
- Add a counter for dropped messages.
- Add /v2 to metric_cluster URL.
data_storage
Version: 0.1.1543521136 (dace2b5)
- Increased speed of surrogate cache loading at startup.
- Add snowthsurrogatecontrol tool, which allows offline review and modification of the surrogate database.
- Fix reconstitute bug edge case where certain metric names would cause the reconstitute to spin/cease progress.
- Fix bug where certain HTTP requests could hang.
- Change default raw db conflict resolver to allow overriding old data with flatbuffer data from a higher generation.
- Fix crash in metric serialization.
- Memory utilization improvements.
- Memory-leak fixes.
GoAPI
Related roles: api, web_frontend
Version: 0.1.1543502147 (c9c5e08)
- Preserve escape sequences in alert_formats JSON from database.
- Add X-Circonus-More-Items response headers to metrics search responses, when needed.
- base64-encode all name/tag queries when sending search requests to data_storage/IRONdb.
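To illustrate why base64-encoding name/tag queries helps, here is a minimal sketch of how a client might encode the terms so that characters such as colons, quotes, or regex metacharacters survive transport. It is not GoAPI's code. The helper names are hypothetical, and the b"..." (exact match) and b/.../ (regex) wrappers reflect an assumed literal form that should be checked against the IRONdb tag query documentation.

```python
import base64


def b64_term(text: str, regex: bool = False) -> str:
    """Wrap a name or tag term in an assumed base64 literal form."""
    encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
    # Assumed wrappers: b"..." for an exact match, b/.../ for a regex.
    return f"b/{encoded}/" if regex else f'b"{encoded}"'


def tag_query(category: str, value: str) -> str:
    """Build an and(category:value) query with both sides base64-encoded."""
    return f"and({b64_term(category)}:{b64_term(value)})"
```

For example, tag_query("foo", "bar") produces and(b"Zm9v":b"YmFy").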
Hooper
Version: 0.1.1543866611 (0a308b7)
- Add crash-reporting support (Backtrace) to web_stream.
- Restart stratcon service when its package updates.
- Add custom-generated Diffie-Hellman (DH) parameters for stratcon TLS connections.
- Fix first-time setup issue with missing MQ directory.
- Change a comment line in circonus.conf so as not to interfere with GoAPI config parsing.
libmtev
Related roles: broker, caql_broker, data_storage, stratcon, web_stream
Version: 1.5.26
- Fix DNS fast failures in lua that could cause null pointer dereference.
- Fix support for aco-style REST handlers. This bug manifested as failed upload support.
- Fix naming of aco events. They now report the underlying event.
- Rearchitect the watchdog timeouts to allow children to cooperate and signal into the correct thread so we get a SIGTRAP-induced stack trace from the offending thread. (only systems with pthread_sigqueue, like Linux).
- Articulate in logs and in glider invocation which thread watchdogged.
- Fix hangs in HTTP content upload when clients paused in the middle of a block (bug introduced in 1.5.24)
- Fix ACO registry mismanagement causing crashes.
- Fix leak of ck_epoch_record on thread termination.
Reconnoiter
Related roles: broker, caql_broker, data_storage, stratcon, web_stream
Version: 0.1.1543411336 (2f9fa32)
- Fix null-pointer dereference in DNS check.
- Include thread names for jlog threads (assists with debugging).
Web UI/API
Version: 0.1.1543608912 (eb918bc)
- Update selfchecks with metrics relevant to enzo-c.
- Fix for invalid search query errors.
- Add graphite_tls to check modules database table for new installations.
- Let create_super_admin.pl take a password on the command-line. This allows for more automation for initial setups.
- UI: Fix viewing of metric names containing a "/" character.
- API: Fix for database deadlock when updating large numbers of checks in a single transaction.
- UI: Do not show broken link in popup dialog on graphs page.
- UI: Fix graph cluster datapoint expansion to properly handle histogram metrics from new GoAPI search service.
- API: Document that GoAPI search supports the X-Circonus-More-Items response header for paged results.
- UI: Clean up the layout of the Integrations module grid.
2018-11-19
caql_broker
Version: 0.1.1542379141 (de348ca)
- Remove warning log messages for missing checks.
- Add instrumentation for perceived message delay.
- Use /v2/metric_old endpoint for search resolution, allowing v3 upgrade of the web-service.
- Run caql-broker as non-root user.
data_storage
Version: 0.1.1542336753 (c7ddb12)
- CAQL: Fix histogram validation
- New module, graphite_egress_alter, for applying Graphite-specific transforms on metric results before they are sent back in a Graphite request.
- Fix storage_file open race.
- Remove NNTBS info_db metadata database.
  - The info_db was LMDB with a continuous update cycle. All rows were replaced every rollup period causing horrific churn and bad performance pathologies on ZFS.
  - This entirely eliminates the database and replaces it with on-demand determination of epoch/apocalypse.
  - We introduce a new surrogate function to iterate over all surrogate keys which we now use for inventory processing during reconstitute and rollup recreation.
- Crash fix in Graphite response when expanding names that are leaves.
- CAQL: Allow "query" as alias for "q" parameter.
- Improved surrogate DB performance and reduced memory usage.
- Use the jemalloc allocator by default on Linux.
- Fix watchdog in full-delete path, when finding the list of metrics to delete.
- Provide offline surrogate DB maintenance tool.
- Fix issue where Graphite metrics tags were mixed with Circonus stream tags.
- Fix crash caused by rare race condition when inserting new metrics into the surrogate DB.
GoAPI
Initial release: This is a new internal component that runs on API and web_frontend nodes. It supports the new Stream Tags features for the Web UI and API, and will gradually take over additional REST endpoints from the existing Perl-based middleware.
Version: 0.1.1542652206 (68c290a)
Hooper
Version: 0.1.1542400689 (1dfdae0)
- Activate new GoAPI service on API and web_frontend roles.
- Add new Search::V3 configurations to circonus.conf.
libcircmetrics
Related roles: caql_broker, data_storage, stratcon, web_stream
Version: 0.0.1.1542307708
- Use new h histogram type for histogram stats.
- Improve performance by using different locking strategies.
libmtev
Related roles: broker, caql_broker, data_storage, stratcon, web_stream
Version: 1.5.23
- Make SSL "connection closed" accept failures a debug message.
- Remove port from SSL connection failure messages so that log dedup can collapse them.
- Make ncct (telnet console) output thread safe (crash fix).
- Fix leak of thread name in SMR context.
- Add eventer_jobq_memory_safety_name() function.
- Add reporting on SMR activity.
- Avoid unnecessary epoch synchronization (SMR), when there is no work to do.
- Fix SMR regression in jobs thread wind-down.
- Fix REST-driven jemalloc heap profiler.
- Do not block thread exit for SMR, instead disown the return queue and allow gc thread to cleanup (this also fixes leaks at thread exit)
Reconnoiter
Related roles: broker, caql_broker, data_storage, stratcon, web_stream
Version: 0.1.1542385741 (9f4ac57)
- Fix replacing deleted checks. When a check is deleted, it is marked as deleted and recycled. This allows for it to persist long enough to be unused and replicated if needed. This change solves two issues:
  - When a deleted check still in config was set via API, the deleted attribute was not cleared, so it appeared in config (and in running system) as updated-but-still-deleted.
  - When a deleted check wasn't yet flushed from config and the instance restarts, the check would be loaded as deleted, but not scheduled. This resulted in not recycling it. Now it is scheduled and immediately descheduled so that normal recycling happens.
- Fix crash where statsd tries to use polls before initialization.
- Support heap profiling.
Web UI/API
Version: 0.1.1542657093 (66f65d8)
- New feature: Stream Tags
  - In the UI, there is a new interface for metric searching, called "Metrics Explorer". It utilizes a new V3 search syntax.
  - In the API, metrics search uses the same syntax (V3) as the UI. All other object type searches continue to use the existing V2 syntax.
- UI: Remove filter_rules feature flag.
- UI: Disable selecting metrics from "change brokers and metrics" if there are filters on the checkbundle.
- UI/API: reconstitute_noit filter_rules vs filter_id: move validation into scope that needs it.
- API: Handle user and contact_group REST endpoints via GoAPI.
- Address unescaping of form args for RawForm.
- API: Set account_id and check_name header fields in CAQL Backfill requests.
2018-11-05
caql_broker
Version: 0.1.1541163185 (8d216aa)
- Fix histogram handling.
- Surface more warnings in the error log.
- Don't log the first HTTP retry.
data_storage
Version: 0.1.1541107572 (386a237)
- Performance improvements to parsing surrogate database at startup.
- Fix some potential crashes.
- Disable saving ptrace stdout output files in the default circonus-watchdog.conf file.
- Improve error checking when opening NNTBS timeshards.
- Improve surrogate DB startup informational logging.
- Various memory usage optimizations to reduce the amount of memory needed for snowthd to operate.
- Remove global variables from Backtrace.io traces.
- Add ability to delete surrogates from the system that are no longer used.
- Remove temporary files used during reconstitute - there were a handful of files staying on disk and taking up space unnecessarily.
- Increase timeout for pulling raw data during reconstitutes.
- Adopt a more time- and space-efficient strategy for graphite searches.
- Fix logging bug where long lines could end up running together.
- Fix crash bug in histogram fetching API.
FQ
Related roles: mq
Version: 0.10.14
- No user-facing changes since 0.10.12.
libmtev
Related roles: broker, data_storage, stratcon, web_stream
Version: 1.5.19
- Have luamtev use a default pool concurrency of 1, add -n option.
- Disable log dedup in luamtev by default.
- Fix improper calculation of required space in base64 encode/decode that could allow two bytes of overrun in decoding into a "too small" buffer.
- Make mtev_memory_{begin,end} recursively safe.
- Use asynch barrier SMR in jobqs.
- Avoid clipping last letter off long log lines.
- Apply lua GC on next tick and not inline.
- Make "cs" the default jobq memory safety level.
- Fix off-by-one error in lua_web lua stack management (crash fix).
- Move SMR maintenance into the eventer (out of a callback).
- Fix livelock in mtev_intern when racing for a removed object.
- Make the SMR cleanup in thread termination asynch (fix CPU burn).
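The base64 fix in this libmtev release concerns buffer sizing. As a reminder of the arithmetic involved (this is not libmtev's code, and the function names are hypothetical): n input bytes encode to 4 * ceil(n / 3) characters including padding, and m encoded characters decode to at most 3 * ceil(m / 4) bytes. Rounding down instead of up in the second formula can undersize the output buffer by up to two bytes when the input is unpadded.

```python
import base64


def encoded_len(n: int) -> int:
    """Characters needed to base64-encode n bytes, including '=' padding."""
    return 4 * ((n + 2) // 3)


def max_decoded_len(m: int) -> int:
    """Upper bound on bytes produced by decoding m base64 characters."""
    return 3 * ((m + 3) // 4)


# Cross-check the formulas against the standard library for small inputs.
for n in range(16):
    enc = base64.b64encode(bytes(n))
    assert len(enc) == encoded_len(n)
    assert n <= max_decoded_len(len(enc))
```

Sizing the destination from the upper bound, rather than from the exact decoded length, is the conservative choice when the decoder writes in whole 3-byte groups.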
Reconnoiter
Related roles: broker, data_storage, stratcon, web_stream
Version: 0.1.1541084944 (ec587d4)
- Better validation within the ping_icmp module.
- Send full metric name with tags to web_stream service. Fix for tagged metrics not showing up in graph play and dashboards.
- Fix transient checks failing to update target_ip and name.
- Expose selfcheck metrics under the "broker" namespace as well.
- Explicitly disable dedup on noit feed elements. Fix for "gappy" collection of some metrics.
Web UI/API
Version: 0.1.1541117507 (96212ab)
- UI: Remove "beta" decal from CAQL integrations menu.
- UI: Stop mass updating last_modified_* on checks by bundle. Fixes unnecessary DB query.
- API: Set check target to existing value on update if not passed by client.
- UI: Re-introduce 'last values' to check details page at metric load time.
2018-10-22
data_storage
Version: 0.1.1539701422 (9838f50)
- Add activity range parameters to /tag_cats and /tag_vals REST endpoints, add category parameter to /tag_vals.
- Speed up loading of surrogate database by parallelizing the work.
- Modest locking performance increase in surrogate database load.
- Stop saving crash trace stdout to *.trc files, since the tracer produces its own output file.
libmtev
Related roles: broker, data_storage, stratcon, web_stream
Version: 1.5.12
- Be extra cautious when shutting down the last thread in a pool to make sure there is no backlog.
- Fix header to expose eventer_jobq_set_floor correctly.
- Expose more controls for jobq mutation via console.
Reconnoiter
Related roles: broker, data_storage, stratcon, web_stream
Version: 0.1.1539352534 (23a9353)
- Fixes for showing checks in a cluster.
Web UI/API
Version: 0.1.1540240496 (682d9c9)
- UI/API: Add Graphite TLS check type.
- UI: Layout adjustments on metric details, alert details pages.
- UI: Minor layout bugfixes for change-brokers-and-metrics panel.
- UI: CSS fix for radio button spacing in DNS check config panel.
2018-10-12
data_storage
Version: 0.1.1539280608
- When loading a topology that has already been loaded, return HTTP 200 instead of 500.
- Move Zipkin setup messages out of the error log and into the debug log.
- Skip unparseable metric_locators during replication.
- Turn off sync writes in tagged surrogate writer.
- Fix potential crashes when check_name is NULL.
- Documentation: fix missing rebalance state.
- Add log deduplication to avoid spamming errorlog with identical messages.
- Fix potential deadlock that could be triggered when forking off a process to be monitored by the watchdog.
- Fix some potential crashes/memory leaks.
- When loading a new topology, return 200 status instead of 500 if the topology is already loaded.
- Support tag removal.
- Performance/stability improvements for activity list operations.
- Fix wildcard/regex queries inside tag categories.
libmtev
Version: 1.5.11
- Implement log deduplication via dedup_seconds configuration option.
- Watchdog config option to disable saving of glider stdout, useful in cases where the glider produces its own output files.
- Document mtev.xml* functionality.
- Fix unsafe fork (fork while resize_lock held) in logging subsystem.
- Fix tagged release version extraction.
- Fix infinite loop when logging oversized log entries, introduced in 1.5.8.
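The idea behind time-window log deduplication can be sketched as below. This is a simplified illustration under assumptions, not libmtev's implementation: the class name DedupLog, the injected clock, and the rule "suppress an identical message seen within the window" are all hypothetical. Only the dedup_seconds option name comes from the note above.

```python
import time
from typing import Callable, Dict, List


class DedupLog:
    """Drop a message if the same text was emitted within dedup_seconds."""

    def __init__(self, dedup_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.dedup_seconds = dedup_seconds
        self.clock = clock
        self.last_emitted: Dict[str, float] = {}
        self.lines: List[str] = []  # stands in for the real log sink

    def log(self, message: str) -> bool:
        """Record message unless it is a recent repeat; return True if kept."""
        now = self.clock()
        last = self.last_emitted.get(message)
        if last is not None and now - last < self.dedup_seconds:
            return False  # identical message inside the window: suppress
        self.last_emitted[message] = now
        self.lines.append(message)
        return True
```

Because the comparison is on the full message text, messages that differ only in a volatile detail (such as a port number) would not be collapsed by a scheme like this.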
Reconnoiter
Version: 0.1.1539263519
- Protect against empty rulesets.
- Fix AWS check module to properly handle spaces in metric names.
- Clearer error messages for REST calls.
- Allow colons in stream-tag values.
- Find filtersets below toplevel.
- Fix for broker config corruption.
Web UI/API
Version: 0.1.1539284260
- UI: Updated keyboard help overlay to remove old invalid keyboard shortcuts.
- UI: Expose Prometheus check type when broker capability allows.
- UI: Avoid displaying encoded histogram values in check preview.
- UI: When initializing overlays on shared graphs, don't try to pull the share config before it's loaded.
2018-09-25
data_storage
Version: 0.1.1537899481
- CAQL: add comparators to the each package, which operates on all input slots at once: gt, lt, geq, leq.
- Fix activity-tracking replication
- Allow 4096 chars for metric name ingestion
- Locking changes for better performance on high-contention locks
- Move raw ingestion startup off of the main eventer thread to prevent watchdogs
- CAQL: Remove wrap_false from histogram:* functions. Histograms can't be missing, they can only be empty.
- CAQL: Map histogram:* functions, so that:
  - The case of zero slot arguments is handled correctly
  - We apply the functions to all input slots
- Don't loop forever when journal writes are in the future
- CAQL: Check time during bundle loops
- Disable mtev async core dumps, preventing double crashes (where a "real" crash is followed by a second crash due to a database lock still existing)
- Various crash fixes
Hooper
Version: 0.1.1537455639 (EL7, OmniOS)
Version: 0.1.1536174707 (EL6)
- Remove obsolete grover_queue* services. These have not been used in a long time.
- (OmniOS) Use a larger ZFS recordsize for lt-final dataset in the long_tail_storage role. This yields better compression ratios.
libmtev
Version: 1.5.7
- Add the libluajit default path/cpath to luamtev by default
- Fix compressed non-chunked encoding
- Better error on improper rest registration
- Introduce mtev_watchdog_disable_asynch_core_dump
Reconnoiter
Version: 0.1.1537458244
- Fix memory leak: incomplete search tag parse-tree freeing
- Automatic histograms (PR 482)
- Support account_id, check_{uuid,id} suppressions
- Support multi-document streaming JSON to httptrap
- Put prometheus module into a dedicated eventer pool
- Support NOIT_MODULES environment variable (PR 493)
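"Multi-document streaming JSON" means several top-level JSON values sent back to back in one payload. The sketch below shows one generic way to split such a payload using Python's standard library. It illustrates the payload shape only and is not how the httptrap module parses its input. The function name iter_json_documents is hypothetical.

```python
import json
from typing import Any, Iterator


def iter_json_documents(payload: str) -> Iterator[Any]:
    """Yield each top-level JSON value found back to back in payload."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(payload)
    while pos < end:
        # Skip whitespace between documents; raw_decode does not accept it.
        while pos < end and payload[pos].isspace():
            pos += 1
        if pos >= end:
            break
        # raw_decode returns the parsed value and the index where it ended.
        doc, pos = decoder.raw_decode(payload, pos)
        yield doc
```

A payload such as {"a": 1} followed by a newline and {"b": 2} yields two separate documents rather than raising a parse error.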
Web UI/API
Version: 0.1.1537199732
- UI: Fix bug in Quick Graph adding that led to metrics being displayed as inactive
- UI: Block "Enter" key on metric filtering field and prevent some regular expression errors on change-metrics dialog
- API: Account for prometheus check module
2018-09-03
Includes changes since release 2018-08-16
data_storage
Version: 0.1.1536172853
- Expose stream tags in search results from the Lua API
- New, optional journal and background job for managing activity tracking outside the normal ingest path
- CAQL: new group_by function
- Stop statically linking libzfs, always dlopen() if available
- Parse "seconds.milliseconds" from incoming histogram records
- Fix log enable/disable options (-L/-l)
- Replace Lua histogram implementation with one that makes use of the C functions from libcircllhist for efficiency.
- Prevent race in the REST delete endpoints
- Add check-wide delete methods for raw and numeric rollup data
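As a rough illustration of the "seconds.milliseconds" timestamp form mentioned above, the sketch below splits such a field into integer parts. It is not the data_storage parser. The function name is hypothetical, and treating the part after the dot as a decimal fraction (so ".5" means 500 ms) is an assumption.

```python
from typing import Tuple


def parse_timestamp(field: str) -> Tuple[int, int]:
    """Split "seconds.milliseconds" into (seconds, milliseconds)."""
    seconds, _, fraction = field.partition(".")
    # Pad or cut the fraction to exactly three digits (millisecond precision).
    millis = int((fraction + "000")[:3]) if fraction else 0
    return int(seconds), millis
```

Working on the string parts avoids the rounding that can occur when a large epoch value is passed through a floating-point conversion.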
fault_detection
Version: 0.1.1534363526
- Add flag to disable rule processing
libmtev
Version: 1.5.4
- Fix mtev.shared_seq() producing duplicate keys during startup.
- Add mtev_cluster_node_get_idx to get a node's deterministic offset in a cluster topology.
- Make mtev_hash_merge_as_dict safe for NULL values.
- Fix reported memory leak in DWARF reading.
- Fix race conditions in freeing mtev_websocket_client_t.
- Fix race in lua state (mtev lua coroutine) GC.
- Remove local callback latency tracking.
- Add per-pool callback latency tracking.
- Skip epoch reclamation in threads that have never freed anything.
- Always do asynchronous barrier epoch collection from the eventloop.
- Batch asynchronous epoch reclamation to reduce epoch synching.
- Fix lua/ssl_upgrade eventer actuation.
- Add granular lua garbage collection configuration. Default: step 1000 times before a full collect.
- Monitor process now passes TERM, QUIT, and INT signals to child.
- Fix a bug where we were not always closing the socket/connection in lua_web_resume - could cause connections to hang.
- Fix a lock contention issue that occurred at startup.
- Fix a memory leak in the lua path.
- Fix some clean targets in the Makefile that were inadequate.
- Move some logging from error log to debug log.
- Fix gc_full=1 to fire on every invocation as documented.
- Fix asynchronous memory reclamation.
- Protect against attempting to close invalid file descriptors.
- Do proper cleanup of eventer objects, even if not registered.
- Fix internal accounting stats for eventer allocations.
- Don't gate startup of event loops.
- Fix a leak of per-thread Lua closure structs.
Reconnoiter
Version: 0.1.1536172454
- Change the default behavior of check stats to not be perpetually cumulative (and thus potentially memory exhausting).
- Fix null-termination of tags in a Prometheus check
- Avoid null dereference in Lua checks
- Move to new Lua garbage-collection capabilities in libmtev 1.5
- Initiate Lua GC whenever returning from a coroutine
- Fix incorrect initialization function for check stats
- noit_check_resolver should protectively initialize
Web UI/API
Version: 0.1.1536180382
- UI: Fix JS error when sharing a graph.
- UI: Performance fix for check creation stalling when a large number of metrics are present.
- UI: Fix bug related to honoring grid preference on worksheet and metrics pages.
- UI: Fix bug where a plus (+) character wasn't searchable within a metric name.
- UI: Fix problem with metric names not showing correctly in metric list dialogs.