(909) Schema update improvements
From now on, when a topic schema is updated via hermes-console, all Hermes instances are notified to load the latest schema from schema-registry as soon as possible (by default they should be notified within 2 minutes).
(906) Docs for adding subscription's filters
(872) Fix for reading Graphite stats in Management
Small fix in config.properties.
Property messages.local.buffered.storage.size.bytes from 0.13.0 now becomes frontend.messages.local.buffered.storage.size.bytes.
All issues and pull requests: 0.13.0 milestone
(899) ChronicleMap v3
Starting from this version Hermes will use ChronicleMap v3 as a temporary buffer for messages (before that Hermes was using ChronicleMap v2).
Now there are 2 new config properties:
- frontend.messages.local.buffered.storage.size.bytes - describes the default size in bytes of the delayed messages queue in the internal Kafka Producer Queue and the Hermes Frontend Buffer.
- frontend.messages.local.storage.average.message.size.in.bytes - describes the average message size, used to improve performance for delayed messages in the Hermes Frontend Buffer.
Also, kafka.producer.buffer.memory was removed from the config; frontend.messages.local.buffered.storage.size.bytes is now responsible for that parameter.
(898) Sending Delay is not required in batch subscription
(896) Make BackupMessage serializable
(900) Hermes Mock documentation
(897) Fix label from seconds to milliseconds
In Hermes console there was an inconsistency regarding the requestTimeout and sendingDelay labels. The labels stated that those values are in seconds, but they are in milliseconds.
All issues and pull requests: 0.12.10 milestone
(894) Sending delay
Sending delay feature. We want to give users the possibility to postpone sending an event for a given time (max 5 seconds), so that if multiple topics send messages at the same time, users can increase the chance of receiving an event from one topic before an event from another topic.
(894) Improved processes signals management
Improve processes management to be more predictable and easy to understand.
All issues and pull requests: 0.12.9 milestone
(886) Listing topics by their owner
Endpoint in hermes-management to list topics by their owner with lower latency than using QueryEndpoint.
(888) Listing subscriptions by their owner
Endpoint in hermes-management to list subscriptions by their owner with lower latency than using QueryEndpoint.
(887) Waiting between unsuccessful polls to reduce cpu utilization
In previous releases, subscribing to topics with low rps was very CPU intensive because, when polling, the KafkaConsumer implementation constantly loops until the timeout is reached. We introduced a simple exponentially growing wait between unsuccessful polls, which reduces CPU utilization by a significant margin in those cases.
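The idea of an exponentially growing wait between unsuccessful polls can be sketched as follows. This is only an illustration, not the Hermes implementation: the class name IdlePollBackoff, the multiplier, and the concrete millisecond values are assumptions, since the release note does not state them.

```java
/**
 * Hypothetical helper (not part of Hermes): computes how long to wait
 * after consecutive polls that returned no messages.
 */
final class IdlePollBackoff {
    private final long initialMillis;
    private final long maxMillis;
    private final double multiplier;
    private long currentMillis;

    IdlePollBackoff(long initialMillis, long maxMillis, double multiplier) {
        if (initialMillis <= 0 || maxMillis < initialMillis || multiplier < 1.0) {
            throw new IllegalArgumentException("invalid backoff settings");
        }
        this.initialMillis = initialMillis;
        this.maxMillis = maxMillis;
        this.multiplier = multiplier;
        this.currentMillis = initialMillis;
    }

    /** Returns the wait for this empty poll and grows the next wait, capped at maxMillis. */
    long nextWaitMillis() {
        long wait = currentMillis;
        currentMillis = (long) Math.min((double) maxMillis, currentMillis * multiplier);
        return wait;
    }

    /** Call after a poll that returned messages, so the next empty poll starts small again. */
    void reset() {
        currentMillis = initialMillis;
    }
}
```

A polling loop would sleep for nextWaitMillis() after each empty poll and call reset() as soon as a poll returns messages, so busy subscriptions are not slowed down.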
All issues and pull requests: 0.12.8 milestone
All issues and pull requests: 0.12.7 milestone
All issues and pull requests: 0.12.5 milestone
All issues and pull requests: 0.12.4 milestone
All issues and pull requests: 0.12.3 milestone
All issues and pull requests: 0.12.2 milestone
(814) Offline storage metadata
Add new metadata to topic entity. From now on it is possible to specify if data from the topic should be persisted into any kind of offline store (like HDFS). Metadata is not used by Hermes, but is part of the API and can be consumed by tools like Gobblin to choose which data should be moved to HDFS and for how long should it be kept.
(821) Avro message preview in human-readable form
Topic message preview for Avro messages now shows data transformed to JSON instead of raw Avro bytes.
Contributed by @mictyd
(822) Detailed message when Avro validation fails
400 Bad Message status now returns much more meaningful information when Avro validation fails.
Contributed by @janisz.
(824) Bump dependency versions
Upgraded the following dependencies:
- Metrics to 3.2.5
- Guava to 23.0
- Apache Curator to 2.12.0 (forced by Guava upgrade)
(756) Display owner source in Console
Console now displays the source of topic and subscription owner next to the owner name.
(769) Deleted topics come back to life
Fixed by upgrading Kafka client to 0.10.1.0.
(812) Fixed Elasticsearch trace repo bug introduced in 0.12.0
(809) Fixed Hermes Console retransmit button
Contributed by @piorkowskiprzemyslaw.
(834) Max-rate Zookeeper structure cleanup script
Max-rate Zookeeper structure is not cleaned up when a subscription is deleted. Over time, this leads to building up a huge structure with lots of watches. We observed that, due to the number of watches, Consumers startup time degrades significantly (up to 10 minutes). There is no fix for the lack of cleanup yet, but we created a script that can be run once in a while to keep the structure at the desired size.
All issues and pull requests: 0.12.1 milestone
(799) Offline clients
Add new endpoint and interface in Management which can be implemented to show if data produced on given topic has been accessed recently in any offline storage (like Hadoop). Read more in docs.
(692) Improved schema management
Avro schemas are now created/deleted in the same transaction as topic creation/deletion, meaning that no topic is created if schema validation fails and vice versa.
Also, a new SchemaRepository interface method has been added, which allows validating a schema before trying to send it to Schema Registry (fail fast).
(794) Query topics by metrics
(758) Use timestamp in seconds in ElasticSearch message tracking
Added new field in ElasticSearch message trace object which is used to order messages in time. This should significantly increase the speed of fetching data from ES.
(757) Rate limit schema registry calls in Consumers
There was no rate limit when trying to get a schema from Schema Registry when no cached schema matched the message. For subscriptions with high traffic of malformed events, this could effectively cause a DoS attack on Schema Registry.
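One simple way to bound such calls is to permit at most one registry lookup per minimum interval and skip the lookup otherwise. The sketch below illustrates that idea only. The release note does not say which limiting strategy Hermes uses, so the class name MinIntervalLimiter, the interval-based approach, and the injected clock are all assumptions.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch (not Hermes code): permits at most one call
 * per minimum interval, measured with an injectable nanosecond clock.
 */
final class MinIntervalLimiter {
    private final long minIntervalNanos;
    private final LongSupplier nanoClock;
    private long lastPermitNanos;
    private boolean permittedBefore;

    MinIntervalLimiter(long minIntervalNanos, LongSupplier nanoClock) {
        if (minIntervalNanos < 0) {
            throw new IllegalArgumentException("interval must not be negative");
        }
        this.minIntervalNanos = minIntervalNanos;
        this.nanoClock = nanoClock;
    }

    /** Returns true if a registry call may be made now, false if it should be skipped. */
    synchronized boolean tryAcquire() {
        long now = nanoClock.getAsLong();
        if (permittedBefore && now - lastPermitNanos < minIntervalNanos) {
            return false;
        }
        permittedBefore = true;
        lastPermitNanos = now;
        return true;
    }
}
```

In production code the clock would typically be System::nanoTime; a caller that gets false from tryAcquire() would treat the message as undeserializable with the cached schemas instead of querying the registry again.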
(777) Subscription URI constraint fix
Fixed subscription URI constraints to match URI spec.
(787) Subscription validation fix
Subscriptions were validated at the wrong moment, which in some cases could lead to ugly NullPointerException instead of validation message.
(780) Metrics block topic removal
(760) Added http2 client to consumers
(770) Added feature to restrict subscribing for particular topic
(775) Console notification box UX improvements
Notification box will resize according to body text, also error messages require user interaction in order to disappear.
(778) Creating consumer signals chains
From now on, consumer logs contain information about signals, such as their id or type. This simplifies analysis of consumer history.
(781) Schema version aware deserialization is backward compatible
This means that the flag schemaVersionAwareSerializationEnabled can be set to true on the fly.
When the flag is enabled on a topic, a consumed payload without a schema version will be deserialized as well: Hermes will try hard to find a matching Avro schema, starting with the latest version.
(774) Ensuring that consumer process exists for processed signal
Fixes a NullPointerException which occurred when some signals (e.g. COMMIT) were processed for a non-existing consumer.
(764) Custom button on UI.
Allows configuring a custom view near the topic buttons area in hermes-console. The custom view can be set via the configuration file, for example:
{
  "topic": {
    "buttonsExtension": "<a class=\"btn btn-info {{topic.contentType === 'JSON' ? 'ng-show' : 'ng-hide'}}\" ng-href='http://migrator.example/topics/{{topic.name}}'>Migrate to AVRO</a>"
  }
}
(753) Filtering by headers
Added a new filter type, header, that allows filtering messages by HTTP headers. Example of a filter definition:
{"type": "header", "header": "My-Propagated-Header", "matcher": "^abc.*"}
Mind that by default no headers are propagated from Frontend to Consumers. To enable headers propagation, define and register your own HeadersPropagator via HermesFrontend.Builder#withHeadersPropagator.
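To make the semantics of a header filter concrete, here is a minimal sketch of how such a filter could evaluate a message. It is not Hermes code: the class name HeaderFilterSketch, the exact-name header lookup, the use of a full regex match, and rejecting messages that lack the header are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not Hermes code) of a filter defined as
 * {"type": "header", "header": "...", "matcher": "..."}.
 */
final class HeaderFilterSketch {
    private final String header;
    private final Pattern matcher;

    HeaderFilterSketch(String header, String matcherRegex) {
        this.header = header;
        this.matcher = Pattern.compile(matcherRegex);
    }

    /**
     * Assumption: a message passes only when the header is present
     * and its whole value matches the pattern.
     */
    boolean accepts(Map<String, String> headers) {
        String value = headers.get(header);
        return value != null && matcher.matcher(value).matches();
    }
}
```

With the definition from the example above, a message carrying My-Propagated-Header: abcdef would pass this sketch, while a message with the value xabc or without the header would not.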
(749) Handling avro/json content type
When converting messages from JSON to Avro, Hermes uses json-avro-converter to provide a smooth experience that does not require changing already produced JSONs (for instance, to support optional fields).
However, in some rare cases it might be desirable to send JSON messages that are compatible with the standard Avro JSON encoding. To use the vanilla JSON -> Avro converter and bypass json-avro-converter, send requests with the avro/json content type.
(748) Topic authorization controls and status in Console
Console now has support for toggling auth on topics. This is an opt-in feature; enable it by specifying the following in Console config.json:
{
  "topic": {
    "authEnabled": true
  }
}
(710) Limit size of messages in preview
(751) Move Zookeeper cache update logs to DEBUG level
(750) Move Schema Registry cache refresh logs to DEBUG level
(737) Updating subscription in hermes-console resets OAuth password
(734) Prevent manual setting of subscription state to PENDING
(743) Better defaults for max-rate algorithm
(738) Payload content-type check and error handling
From now on, clients who send an HTTP request without a specified Content-Type on an Avro topic will receive a proper error message.
(739) Added latency metrics for schema registry
Metrics are available in the following path:
schema.<schema-repo-type>.latency.read-schema
(740) Invalid metrics names in zookeeper for topics with underscore in name
(733) Topic authorization
Added feature to control which system has permission to publish on particular topic.
(722) Frontend security context initialization
Added feature to configure SSL context.
(735) Configurable http keep_alive
(707) Throughput limit
Added feature to limit throughput in bytes/sec when publishing to particular topic. Can be configured to work as simple threshold or dynamically calculated value.
(687) Throughput metric
Metric can be found under {producer|consumer}.{hostname}.throughput.{group}.{topic}
(693) Owners instead of support teams
Replaces group and subscription support team, contact and technical owner with a single notion – owner. Topics and subscriptions have assigned owners, groups no longer do, so everyone can create a topic in any group.
After deployment to hermes-management you need to run a migration task. It will initialise topic and subscription owners by assigning what used to be the related group and subscription support teams. Perform with admin credentials:
POST /migrations/support-team-to-owner?source=Plaintext
(or source=Crowd if you used Crowd support teams)
(721) Creator must be an owner of created topic or subscription
(714) Don't match when queried nested field doesn't exist instead of failing
(726) Pass Avro validation errors to users
(717) NPE in Hermes frontend related to BlacklistZookeeperNotifyingCache
(611) Consumers rate negotiation
Max rate negotiation algorithm for balancing maximum delivery rate across subscription consumers.
(703) Update Curator dependency
(701) Updated migration guide for 0.10.5
(713) Admin scripts catalogue with initial migration script for 0.10.5
(709) Fix docker-compose and docker setup
(688) Selective algorithm healing
Improved durability of assignments during restarts and zookeeper flaps. Reporting of assignments and running consumers has been improved and made consistent. More reliable handling of consumer processes.
To utilize these improvements, it is required to stop all instances in the hermes cluster, remove all nodes from {zookeeper.root}/consumers-workload/{kafka.cluster.name}/runtime and restart the instances.
This adds a marker in selective algorithm's consumer assignments, which allows rebalancing with removing automatically created assignments.
Alternatively, to avoid switching off your cluster, a script that updates the data of the assignments' zookeeper nodes to AUTO_ASSIGNED can be used. It should also be applied after all nodes run the new version, as a previous run could shuffle assignments during deployment.
(698) Fix Dockerfile build
(690) Update json-avro-converter to 0.2.5
(687) Added throughput metric
(684) Limit number of retries for inflight on Frontend graceful shutdown
(694) Leaking file descriptors
Handles a corner case in a race between the ack and the timeout task.
Because of it, the number of messages in backup storage grew over time. Eventually this led to a full backup-storage, and further writes ended with an exception that was not caught. This exception was the cause of the file descriptor leak.
The corner case was fixed in this issue. Besides that, additional exception handling was added and backup-storage size is now monitored.
(695) Decouple filtering rate limiting from backpressure based rate limiting
Filtered messages no longer influence the sending rate.
(686) Do not merge topic.contentType with default value
Fix in hermes-console.
(675) Audit of subscription status changes
(674) Validate topic before saving
(679) hermes-client handles sender errors
(676) Fix saving changes in topic maxMessageSize attribute
This release introduces a crucial warming-up phase when starting Hermes Frontend.
(591) Frontend graceful startup
Frontend tries to load and cache all Avro schemas and Kafka topic metadata before accepting any traffic. Before this change, large clusters were throwing 5xx and had very high latencies during the warmup phase. Currently the startup moment is barely noticeable for clients (and in metrics).
(667) Declare max message size on topic
With this change users are asked to specify the maximum size of message on a topic during topic creation. This size is then used to calculate the size of Kafka buffers in Hermes Consumers. Prior to this change Consumer Kafka buffers were set to the same size for every topic (default: 10Mb per partition), which could cause crashes when starting Consumers with large number of subscriptions with lags.
By default the message size is a soft limit: a warn log is emitted when a message larger than the declared size is received. The frontend.force.topic.max.message.size flag can be switched to make it a hard limit (Frontend will return the http 413 Payload Too Large status).
Also, the calculation based on message size is disabled by default (it will be enabled by default in next versions). To use this feature, set the consumer.use.topic.message.size flag.
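The difference between the soft and the hard limit can be summarized as a small decision function. This is an illustrative sketch only: the class name MaxMessageSizeCheck, the Outcome enum, and the parameter names are hypothetical, and only the behaviour stated in the note above (warn by default, 413 when the limit is forced) is taken from the source.

```java
/** Hypothetical sketch (not Hermes code) of the soft vs hard size limit decision. */
final class MaxMessageSizeCheck {

    enum Outcome {
        ACCEPT,               // message fits within the declared size
        ACCEPT_WITH_WARNING,  // soft limit: message is accepted, a warn log is emitted
        REJECT_413            // hard limit: Frontend answers 413 Payload Too Large
    }

    /**
     * @param messageBytes     size of the received message
     * @param declaredMaxBytes maximum size declared on the topic
     * @param forceLimit       stands in for frontend.force.topic.max.message.size
     */
    static Outcome check(long messageBytes, long declaredMaxBytes, boolean forceLimit) {
        if (messageBytes <= declaredMaxBytes) {
            return Outcome.ACCEPT;
        }
        return forceLimit ? Outcome.REJECT_413 : Outcome.ACCEPT_WITH_WARNING;
    }
}
```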
(666) Options to configure Consumer HTTP client SSL Context
New options to configure Consumers HTTP client:
consumer.http.client.validate.certs
consumer.http.client.validate.peer.certs
consumer.http.client.enable.crldp
(665) Allow to specify allowed topic content types in Hermes Console
(663) Fetch -2min of data from Graphite and take first non-empty value
(664) Use proper type of metrics in Consumer workload metrics
(652) Proper configuration for Zookeeper retries
This is a bugfix release improving schema-registry integration and retransmission on large clusters.
This release introduces a lot of performance optimizations related to publishing messages to Hermes.
(#518) Frontend performance
- implemented hermes-benchmarks module with frontend benchmark tests written in jmh
- servlet layer was removed, publishing is done on raw undertow handlers
- timeouts mechanism (202, 408) was redesigned, locks were eliminated
- sped up metrics invocation during message publishing, from now on they are kept in topics cache
(#559) Topic ban button
Thanks to the topic ban button, events published on a topic can be cheaply discarded. This feature can be used when a misbehaving publisher is detected, e.g. one that starts to push enormous events or whose events all have an invalid schema.
(#636) ConsumersProcessSupervisor is not killing any consumer process
- (#626) Custom KafkaNamesMapper can be used too late
- (#628) Hermes should operate on "Schema-Version" header instead of "Hermes-Schema-Version"
- (#630) Retransmission is unstable
(#612) Added explicit CORS allowed domain configuration option
(#619) Updated kafka-producer configuration
In the current version of kafka-producer (0.10.1), the request.timeout.ms parameter is also used as a timeout for dropping batches from the internal accumulator.
Therefore, it is better to increase this timeout to a very high value, because when kafka is unreachable we don't want to drop messages but buffer them in the accumulator until it is full.
This behavior will change in a future version of kafka-producer.
More information on this issue can be found in the kafka-users group archives.
- (#614) JSON-to-Avro dry run fix for Hermes-incompatible schemas
- (#621) Schema-related frontend HTTP responses fix
- (#622) Fixing occasional null pointer when reading consumer assignments
- (#624) Catching unchecked exceptions in schema-versions cache that previously weren't logged
- (#616) Fixing bug with sync commit after each filtered message
This patch version was released mostly because of the Schema version cache fix #608.
Besides that:
- documentation about schema repository was updated
- integration tests should be more reliable
This release introduces Kafka 0.10 producer/consumer API and is no longer compatible with Kafka 0.8.x and 0.9.x deployments.
(#558) Use Kafka 0.10 producer/consumer API
This change breaks backwards compatibility - Hermes will not run on 0.8.x, 0.9.x Kafka clusters
Hermes uses Kafka 0.10 APIs. The change is not big for producers in Frontend module, but it rearranged whole Consumers module.
The benefits of moving to Kafka 0.10 (apart from leaving the deprecated APIs behind) are:
- decreased number of active threads: in cluster with ~600 subscriptions number of threads decreased from ~4400 to ~700
- decreased memory consumption: same cluster, memory usage dropped by 10-20%
- decreased CPU consumption: same cluster, day-to-day CPU consumption dropped by ~10%
- greatly decreased shutdown time
The change is transparent for the end users.
Upgrading note
Before upgrading, make sure that offsets are committed and stored in Kafka (option kafka.consumer.dual.commit.enabled is set to true or kafka.consumer.offsets.storage is set to kafka (default) in Consumers module).
When upgrading, all Consumers should be stopped at once and started with new version.
(593) Confluent Schema Registry integration
Breaking change: Support for storing and validating JSON schemas has been removed.
Hermes can be integrated with Confluent Schema Registry to store and read Avro schemas. We kept the existing integration with the schemarepo.org repository. To switch between implementations, use the schema.repository.type option:
- schema_repo for "old" schemarepo.org
- schema_registry for Confluent Schema Registry
(#592) Management: Update Spring Boot (1.4.1) and Jersey (2.23)
(#595) Update tech.allegro.schema.json2avro to 0.2.4
(#566) Auditing management operations
All operations in Management can be audited. By default this option is disabled, but it can be enabled using:
audit.enabled = true
By default changes are sent to logs, but a custom implementation can be provided. Read more in auditing documentation.
(#481) Delay between retries in Hermes Client
It is now possible to specify delay between consecutive retries of sending message.
HermesClient client = HermesClientBuilder.hermesClient(...)
    .withRetries(3)
    .withRetrySleep(100, 10_000)
    .build();
The delay can rise exponentially in specified range (100ms to 10 seconds in example above).
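As a rough illustration of how a delay that rises exponentially within a range behaves, the sketch below computes the sleep before a given retry. It is not the hermes-client implementation: the class name RetrySleepSketch and the doubling factor are assumptions, since the note only states that the delay grows exponentially between the two bounds.

```java
/** Hypothetical sketch (not hermes-client code) of an exponentially growing retry delay. */
final class RetrySleepSketch {

    /**
     * @param attempt   zero-based retry number
     * @param minMillis delay used for the first retry
     * @param maxMillis upper bound for the delay
     * @return minMillis doubled once per previous attempt, capped at maxMillis
     */
    static long delayMillis(int attempt, long minMillis, long maxMillis) {
        if (attempt < 0 || minMillis <= 0 || maxMillis < minMillis) {
            throw new IllegalArgumentException("invalid retry sleep settings");
        }
        long delay = minMillis;
        for (int i = 0; i < attempt && delay < maxMillis; i++) {
            // Doubling is an assumption; the guard avoids overflow near the cap.
            delay = delay > maxMillis / 2 ? maxMillis : delay * 2;
        }
        return delay;
    }
}
```

Under these assumptions, the settings from the example (100 and 10_000) would give delays of 100, 200, 400 and 800 ms for the first four retries, and the cap of 10 seconds from the eighth retry onward.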
(577) Consumer won't stop if there are messages in send queue
(579) Wrong path to lag stats in Hermes Console
(#359) OAuth2 support [incubating]
Hermes supports the Resource Owner Password Credential Grant scenario. It is possible to declare multiple OAuth providers in Hermes, along with their credentials. Each subscription can choose a provider and define its own user & password.
(#556) Added source and target hostname information to tracking
Tracking information now contains additional fields: hostname and remote_hostname, which are:
- on Frontend side:
  - hostname: hostname of Frontend host that received the message
  - remote_hostname: IP address of events producer (who published)
- on Consumers side:
  - hostname: hostname of Consumer host that was handling the message
  - remote_hostname: IP address/hostname of host that acknowledged/rejected message (who received)
(#561) Consumers process model improvements
Improves the stability of the new internal Consumers process model by adding consumer process graceful shutdown and filtering unwanted signals (e.g. sequential START & STOP) which might cause instability.
For monitoring purposes two new metrics (counters) were created in Consumers that compare the assignments state vs the actual consumers running:
- consumers-workload.monitor.missing.count - how many processes are missing compared to assigned amount
- consumers-workload.monitor.oversubscribed.count - how many processes exist although they should not, as this instance of Consumers is not assigned to run them
In addition to metrics, warning logs are emitted with details about subscription names missing/oversubscribed.