Releases: aeron-io/aeron
1.48.1
- [C++] Add warning that C++ API will be removed in 1.50.0.
- [Java] Publish artifacts to Central Portal using OSSRH Staging API.
- [Java] Bump Agrona to 2.2.2.
- [Java] Bump SBE to 1.35.3.
- [Java] Bump JGit to 7.2.1.202505142326-r.
- [Java] Bump Checkstyle to 1.25.0.
- [Java] Bump Gradle to 8.14.2.
1.48.0
Noteworthy Changes
- ExclusivePublication#revoke.
Release publisher and subscriber resources immediately with exclusive publication revoke. The publication will not linger and will not allow any trailing loss to be resolved. The subscription will not wait for any data to be received.
NB: Media driver and client code (publisher and subscriber) must run Aeron 1.48.0 or higher.
For more information see the Publication#revoke wiki page.
- Image#reject.
Reject incoming sessions from a publisher. This allows you to quickly stop data flow in scenarios where the data is no longer needed or is invalid.
For more information see the Image#reject wiki page.
- Track connection status in AeronCluster.
AeronCluster now contains a state machine to track connection status. The state machine is updated during poll operations (AeronCluster#pollEgress and AeronCluster#controlledPollEgress) and while sending data to the Cluster (i.e. AeronCluster#offer, AeronCluster#tryClaim, AeronCluster#sendKeepAlive). If a break in communication is detected and it lasts for more than AeronCluster.Context#newLeaderTimeoutNs() then AeronCluster will close itself.
NB: When AeronCluster.Context#newLeaderTimeoutNs() is not set, AeronCluster will wait for double the leadership timeout reported by the actual Cluster. If that is not available (i.e. the Cluster is running on an older Aeron version) then it will fall back to a 10 second default value, i.e. it will wait for 20 seconds.
If AeronCluster#ingressPublication or AeronCluster#egressSubscription are used directly then it is the user's responsibility to call the new APIs in order to update the connection tracking state machine, i.e.:
  - After each invocation of offer/tryClaim on the AeronCluster#ingressPublication a call to AeronCluster#trackIngressPublicationResult must be made.
  - Every time AeronCluster#egressSubscription is polled a call to AeronCluster#pollStateChanges must be made.
- Response channels GA.
Response channels have been promoted from experimental to General Availability. Users no longer need to enable experimental features to use this feature.
- C & C++ wrapper Archive client APIs GA.
The APIs have been promoted from experimental to General Availability, achieving feature-completeness and parity with Java. Old C++ APIs will be decommissioned in 1.50.0.
- Per-stream NAK counters.
Two new stream-specific NAK counters were added:
  - snd-naks-received (typeId=19) - tracks the number of NAKs received by the sender.
  - rcv-naks-sent (typeId=20) - tracks the number of NAKs sent by the receiver.
- Affinity setting AERON_DRIVER_ASYNC_EXECUTOR_CPU_AFFINITY for async thread (aeron_executor) was removed.
- Retransmit Receiver Window Multiple
To avoid overwhelming receivers in the event of retransmissions, Aeron limits the amount of data sent in a single retransmission to a multiple of the receiver window. Previously, this multiple was 16 for unicast, 16 for min and tagged multicast, and 4 for max multicast. It now defaults to 16 for unicast and 4 for all multicast strategies, and can be configured with the properties aeron.unicast.flow.control.rrwm and aeron.multicast.flow.control.rrwm.
- Linger timeout
There is a new option to control how long untethered subscriptions will linger before being removed from flow control. If the new untethered linger timeout is not set, the default timeout is equal to the untethered window limit timeout. Previously, the untethered linger timeout was always equal to the window limit timeout. Now they can be changed independently. The new property name is aeron.untethered.linger.timeout. It can also be set via the untethered-linger-timeout URI parameter.
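The defaulting rules in the last two items can be sketched in plain Java. This is an illustration only and not Aeron's implementation: the class, the method names, and the UNSET sentinel are hypothetical, and it assumes the rrwm properties hold plain integers. Only the property names and the default values (16 for unicast, 4 for multicast) come from the notes above.

```java
/** Hypothetical helper for illustration; not part of the Aeron API. */
final class FlowControlDefaultsSketch
{
    // Property names as given in the release notes.
    static final String UNICAST_RRWM_PROP = "aeron.unicast.flow.control.rrwm";
    static final String MULTICAST_RRWM_PROP = "aeron.multicast.flow.control.rrwm";

    // Defaults as stated in the release notes for 1.48.0.
    static final int UNICAST_RRWM_DEFAULT = 16;
    static final int MULTICAST_RRWM_DEFAULT = 4;

    // Hypothetical sentinel meaning "linger timeout was not configured".
    static final long UNSET = -1L;

    /** Resolve the multiple from a system property, falling back to the default. */
    static int retransmitReceiverWindowMultiple(final boolean isMulticast)
    {
        return isMulticast ?
            Integer.getInteger(MULTICAST_RRWM_PROP, MULTICAST_RRWM_DEFAULT) :
            Integer.getInteger(UNICAST_RRWM_PROP, UNICAST_RRWM_DEFAULT);
    }

    /** Upper bound on a single retransmission: multiple x receiver window. */
    static long maxRetransmitLength(final int receiverWindowLength, final boolean isMulticast)
    {
        return (long)receiverWindowLength * retransmitReceiverWindowMultiple(isMulticast);
    }

    /** An unset linger timeout falls back to the window limit timeout. */
    static long untetheredLingerTimeoutNs(final long configuredLingerNs, final long windowLimitTimeoutNs)
    {
        return UNSET == configuredLingerNs ? windowLimitTimeoutNs : configuredLingerNs;
    }

    public static void main(final String[] args)
    {
        System.out.println(maxRetransmitLength(128 * 1024, false));
        System.out.println(untetheredLingerTimeoutNs(UNSET, 5_000_000_000L));
    }
}
```

With a 128 KiB receiver window and no overrides, the unicast bound is 16 windows and the multicast bound is 4 windows. Setting either property changes only the corresponding bound.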
Changelog
- [Java] Initialize archiveId early using CnC file if Aeron instance is not specified.
- [Java] Close extension's Archive client.
- [Java] Close snapshot replication before replay and recording are closed.
- [Java] Adjust archive client name based on the configuration.
- [C] Add client name for the implicit Aeron client created by the Archive client.
- [Java] Name implicit Aeron clients based on their usage.
- [C] Use untethered-linger-timeout on the receiver side.
- [Java] Store untethered-linger-timeout in the log buffer metadata.
- [Java] Fix a bug where untethered-linger-timeout was not added to the resulting URI.
- [Java] Use untethered-linger-timeout on the receiver side.
- [Java] Use Publication#revoke and Image#reject to close ControlSession resources.
- [Java] Use Publication#revoke to abort replay session.
- [Java] Don't throw an exception when failing to copy a file within the data collector. Throwing breaks other parts of the data collection on test failure (e.g. event log capture).
- [C] Flow control retransmit receiver window multiple for C driver. (#1807)
- [C] C version of untethered linger timeout. (#1808)
- [Java] Require ArchiveThreadingMode.INVOKER if MediaDriver is running in the invoker mode.
- [Java/C] Per stream NAKs. (#1806)
- [Java] Add separate linger timeout for untethered subscriptions. (#1801)
- [Java/C] Publication#revoke. (#1781)
- [Java] Make cluster publish leader heartbeat timeout to clients. (#1805)
- [Java] Require Aeron client to run in the invoker mode if MediaDriver is running with ThreadingMode.INVOKER, i.e. Aeron.Context.useConductorAgentInvoker(true) must be set when Aeron.Context.driverAgentInvoker() is set.
- [Java] Add event code type for sequencer.
- [Java/C] Image#reject. (#1785)
- [Java] Fsync archive.catalog file to disc when shutting down Archive.
- [C] Align flow control receiver timeout with Java, i.e. use AERON_FLOW_CONTROL_RECEIVER_TIMEOUT env variable instead of AERON_MIN_MULTICAST_FLOW_CONTROL_RECEIVER_TIMEOUT.
- [Java] Remove legacy aeron.MinMulticastFlowControl.receiverTimeout config option, i.e. use aeron.flow.control.receiver.timeout directly.
- [C] Remove experimental feature flag for response channels.
- [Java] Remove experimental option for response channels.
- [C] MDC short send fix. (#1770)
- [Java] Flow control retransmit receiver window multiple (#1800)
- [Java] Prevent potential silent message loss on cluster ingress/egress.
- [Java] Make AeronCluster track connection status.
- [Java] Create new Ping message for archive client keepalive. (#1799)
- [Java] File page aligned mark files. (#1789)
- [Java] Increment snapshot counter after standby snapshots were successfully replicated.
- [Config] Update code style to reduce use of '.*' imports.
- [Java] Improve storage space exception detection.
- [Java] Properly check for EOS flag. (#1795)
- [C] Fix issue for untethered slow consumers impacting whole server. (#1792)
- [C++ Wrapper] Remove 'experimental' indicator for C/C++ wrapper archive APIs. (#1793)
- [Java] Refactor session liveness check.
- [C] Use async-executor name for the async thread, i.e. align with the Java impl.
- [Java] Use async-executor prefix for async threads.
- [Bash] Simplify thread affinity listing.
- [Java] Surface method to describe extension snapshot content in ClusterTool. Support printing snapshot entries as hex dumps.
- [C++] Add #include <cstdint>.
- [C++ Wrapper] Add missing header. (#1786)
- [C] Remove affinity settings for the async thread (aeron_executor).
- [Java] Add TestIdleStrategy.
- [Java] Synchronize session ids across cluster nodes. (#1774)
- [CI] Add Clang 20 to the build matrix.
- [C] Call close_session in archive_close(). (#1778)
- [Java] Added close reason to consensus module extension call back on session close.
- [C] Create log buffers sparse by default.
- [Java] Create log buffers sparse by default.
- [Java] Add context to the disconnected control session warning message, i.e. show the response streamId/channel pair to help identify client that was disconnected.
- [Java] Use separate fragment assemblers for IPC and UDP inputs.
- [C++ Wrapper] Sync addAliasIfAbsent method to ChannelUri. (#1755)
- [C++ Wrapper] Allow setting the recording events channel. (#1768)
- [Java] Use MarkFile#timestampRelease.
- [C++ Wrapper] Fix uri_buffer length in Subscription.tryResolveChannelEndpointPort(). (#1767)
- [Java] Don't report error if the publication is closed or not connected during replay.
- [Doc] Document the reserved range for Aeron counter typeIds. (#1771)
- [Java] Update sub-pos iff the image was not closed. Otherwise, the JVM might crash with SIGSEGV while accessing a closed Position.
- [CMake] Only link to client for signal test.
- [C] Add TERM signal handling to C media driver and supporting test.
- [C] Add missing header. (#1765)
- ...
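The AeronCluster connection-tracking change in this release describes how the close timeout is chosen when AeronCluster.Context#newLeaderTimeoutNs() is not set. A minimal sketch of that selection rule, assuming a hypothetical UNSET sentinel and hypothetical method names (none of this is the actual AeronCluster code):

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration of the timeout selection described in the notes. */
final class NewLeaderTimeoutSketch
{
    // Hypothetical sentinel meaning "value not configured / not reported".
    static final long UNSET = -1L;

    // The notes state a 10 second fallback when the Cluster reports no timeout.
    static final long FALLBACK_LEADER_TIMEOUT_NS = TimeUnit.SECONDS.toNanos(10);

    /**
     * An explicit client setting wins. Otherwise use double the timeout
     * reported by the Cluster, or double the 10 second fallback (20 seconds).
     */
    static long effectiveTimeoutNs(final long configuredNewLeaderTimeoutNs, final long clusterLeaderTimeoutNs)
    {
        if (UNSET != configuredNewLeaderTimeoutNs)
        {
            return configuredNewLeaderTimeoutNs;
        }

        final long leaderTimeoutNs =
            UNSET != clusterLeaderTimeoutNs ? clusterLeaderTimeoutNs : FALLBACK_LEADER_TIMEOUT_NS;

        return 2 * leaderTimeoutNs;
    }

    /** The client closes itself once the communication break outlasts the timeout. */
    static boolean shouldClose(final long nowNs, final long lastActivityNs, final long timeoutNs)
    {
        return nowNs - lastActivityNs > timeoutNs;
    }

    public static void main(final String[] args)
    {
        System.out.println(effectiveTimeoutNs(UNSET, UNSET));
    }
}
```

Subtracting timestamps before comparing, as in shouldClose, keeps the check correct even if a nanosecond clock wraps.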
1.47.5
- [Driver] Check if EOS flag bit is set instead of the entire mask. (#1795)
- [Driver] Record bytes lost in the loss report only once when a loss is detected, i.e. do not count the same loss when resending NAKs. (#1796)
- [Driver] Prevent NetworkPublication's pub-lmt from wrapping around into the dirty term. (#1794)
- [Cluster] Prevent ConsensusModule's state (nextSessionId) diverging between leader and follower nodes when a session is rejected during the authentication phase. (#1774)
- [Cluster] Only send TerminationAck to the leader that requested it. (#1797)
- [Cluster] Use separate fragment assemblers for IPC and UDP inputs.
- [Client: C] Do not update image list change number when retaining/releasing images as those can be called from a client conductor thread.
- [Client: C++ Wrapper] Use const on Context.h copy constructor.
- [Archive Client: C] Call close_session() in archive_close(). (#1778)
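The first entry in this list (checking the EOS flag bit rather than the entire mask) is a common bit-flag pitfall. The sketch below shows one plausible reading of it. The flag values and names are assumptions made for illustration and are not taken from the fix itself.

```java
/** Hypothetical illustration of a single-bit check versus a whole-mask check. */
final class EosFlagCheckSketch
{
    // Flag values are assumed for illustration only.
    static final int BEGIN_FLAG = 0x80;
    static final int END_FLAG = 0x40;
    static final int EOS_FLAG = 0x20;
    static final int BEGIN_END_AND_EOS_FLAGS = BEGIN_FLAG | END_FLAG | EOS_FLAG;

    /** Whole-mask check: only true when all three bits are set together. */
    static boolean isEndOfStreamWholeMask(final int flags)
    {
        return (flags & BEGIN_END_AND_EOS_FLAGS) == BEGIN_END_AND_EOS_FLAGS;
    }

    /** Single-bit check: true whenever the EOS bit is set, whatever else is set. */
    static boolean isEndOfStream(final int flags)
    {
        return (flags & EOS_FLAG) != 0;
    }

    public static void main(final String[] args)
    {
        final int flags = EOS_FLAG; // EOS set without BEGIN/END
        System.out.println(isEndOfStreamWholeMask(flags));
        System.out.println(isEndOfStream(flags));
    }
}
```

A frame carrying only the EOS bit is missed by the whole-mask comparison and detected by the single-bit test. Both agree when all three bits are set.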
1.47.4
- [Driver] Increment retransmit count only if data was actually sent.
- [Cluster] Fix buffer reference for ClusterMarkFile. (#1753)
- [Cluster/Archive] Protect against access to the closed mark file.
- [Cluster/Archive] Prevent JVM crash when opening an old version of the mark file (i.e. without a message header).
- [C++ Wrapper] Add an AsyncDestination type definition to Subscription.h. (#1749)
- [C++ Wrapper] Change wrapper version of the Context so that it does not hold a pointer to the underlying C context and track the values directly on the object and pass them through during init. Keep the pointer to the C context on the Aeron object to be properly cleaned up. (#1730)
- [C] Check that an image exists in the Subscription when retaining/releasing. (#1752)
1.47.3
- [Java] Reset ClusterBackup state if the Cluster node from which ClusterBackup is replaying the log is "not available", i.e. either no longer eligible (e.g. after an election) or the backup query cannot be sent to it (e.g. ConsensusModule is down).
- [Java] Fix typo in ReplicationSession state change reason.
- [C] Adding setter/getter methods for CPU affinity to media driver. (#1737)
- [C] Support use of sendmmsg() without an address (i.e. when connect address is used). (#1742)
- [C] Close image when it is being removed from a subscription.
- [C++ Wrapper] Decrement ref count of an Image after it was created, because it was counted twice: once in the C code when looking up the aeron_image_t and the second time by invoking aeron_subscription_image_retain inside the Image constructor.
- [C++ Wrapper] Remove definitions that shadow aeron_logbuffer_descriptor.h definitions. (#1740)
- [Java] Upgrade to Gradle 8.12.1.
- [Java] Upgrade to Shadow 8.3.6.
- [Java] Upgrade to Checkstyle 10.21.2.
- [Java] Upgrade to ByteBuddy 1.17.1.
1.47.2
Known issues
- [Java] ClusterBackup might connect to two different Cluster nodes simultaneously whereby one is used to provide the live Raft log replay and to download the snapshots, whereas the other one is used to fetch the latest list of snapshot entries and the recording log metadata. As long as all of the Cluster nodes are "in sync" (i.e. have the same set of snapshots) then everything is ok. However, if the second node from which ClusterBackup fetches the snapshots was down for some time (i.e. does not have all of the snapshots) then the ClusterBackup might end up with a broken recording log whereby recording log entries will have a different log position to the underlying snapshot recordings.
Fixed in 1.47.3
Changelog
- [Java] Fix a regression in AeronArchive#listRecording which could return arbitrary recording information when the specified recordingId is not found (does not exist or state is not VALID) instead of sending back ControlResponseCode.RECORDING_UNKNOWN.
- [C] Apply aeron.conductor.cpu.affinity to the thread in SHARED threading mode and aeron.sender.cpu.affinity to sender/receiver thread in SHARED_NETWORK threading mode.
- [C] Add support for setting CPU affinity for the async executor thread (aeron.driver.async.executor.cpu.affinity property and AERON_DRIVER_ASYNC_EXECUTOR_CPU_AFFINITY env variable).
1.47.1
Known issues
- [Java] ClusterBackup might connect to two different Cluster nodes simultaneously whereby one is used to provide the live Raft log replay and to download the snapshots, whereas the other one is used to fetch the latest list of snapshot entries and the recording log metadata. As long as all of the Cluster nodes are "in sync" (i.e. have the same set of snapshots) then everything is ok. However, if the second node from which ClusterBackup fetches the snapshots was down for some time (i.e. does not have all of the snapshots) then the ClusterBackup might end up with a broken recording log whereby recording log entries will have a different log position to the underlying snapshot recordings.
Fixed in 1.47.3
Changelog
- [Java] Fix Archive regression where sending descriptors (i.e. AeronArchive#listRecording, AeronArchive#listRecordings, AeronArchive#listRecordingsForUri, AeronArchive#listRecordingSubscriptions) could result in corrupted or duplicate data being returned.
- [C] Fix incorrect aeron_array_fast_unordered_remove usages in the client conductor. (#1728)
- [Java/C] Send a new NAK if the gap length changes. (#1729)
- [Java] Update the archiveId in the mark file if it is set when concluding the context. (#1726)
- [Java] Fix java system test on Alpine (musl libc). (#1734)
- [C] Fix var-args bug in error reporting.
- [C++] add aeronDir() method for context. (#1725)
- [C++] Fix the issue with .h file imports in aeron-client cpp_wrapper. (#1727)
- [Java] Upgrade to ByteBuddy 1.16.1.
1.47.0
Important Update: Aeron is moving to a new GitHub organisation
Aeron is moving to a new GitHub organisation following its adoption by Adaptive in 2022. This transition marks a significant milestone in Aeron's journey, ensuring continued innovation and support for the world's leading low-latency message transport system.
You can find the new Aeron, SBE and Agrona repositories and all related resources at aeron-io.
All links to the previous repository location are automatically redirected to the new location.
However, to avoid confusion, we recommend updating any existing local clones to point to the new repository URL. You can do this by using git remote on the command line:
git remote set-url origin NEW_URL
Thank you for your continued support and contributions to the Aeron Open Source project.
Breaking changes
- [Java] Agrona upgrade contains breaking changes. See Agrona 2.0.0 release notes.
Note: the --add-opens java.base/jdk.internal.misc=ALL-UNNAMED JVM option must be specified in order to run Aeron. This is in addition to --add-opens java.base/java.util.zip=ALL-UNNAMED, which is required to run the Aeron Archive.
Noteworthy Changes
- Detect and terminate dormant Archive clients.
Archive will now send periodic heartbeat messages to each connected Archive client. By default this is done once per second and can be configured via the aeron.archive.session.liveness.check.interval property or programmatically via the io.aeron.archive.Archive.Context#sessionLivenessCheckIntervalNs(long) method. If the Archive detects that it cannot send such a message for more than a connection timeout (i.e. aeron.archive.connect.timeout, defaults to 5 seconds) then it will close the corresponding control session, which will cause that Archive client to disconnect.
- Eliminate interference between Archive clients.
- C/C++ Wrapper implementation of the Archive client APIs.
In terms of feature completeness and stability, they are still marked experimental, as there's a small chance some of the functions might change as the feature is hardened. Furthermore, a number of the async APIs have yet to be implemented.
- Fix duplicate service messages during failover/restart when using multiple services in Cluster.
When service messages are being sent from multiple services, these can be enqueued in different orders. This means during failover/restart pending messages can be skipped or duplicated when a new leader is elected.
Upgrade procedure: Those affected will need to do a clean shutdown (with a snapshot) and restart the whole cluster with the fix.
- Invalidate Standby snapshots.
When invalidating the latest snapshot, both normal and Standby snapshots are taken into account, in order to prevent invalidated snapshots from being re-downloaded from the Standby node upon recovery.
- New log events for NAK messages sent and received.
A NAK_RECEIVED logging event was added for when a NAK request is received by the sender. The existing SEND_NAK_MESSAGE event was renamed to NAK_SENT and logs a NAK message being sent by the receiver.
- Prevent a pathologically slow consumer from crashing the client process.
If a call to Controlled/FragmentHandler#onFragment blocks for a disproportionate amount of time, i.e. long enough for an Image to become unavailable, then the corresponding log buffer will be freed by the client conductor thread. Any further access to the log buffer will cause the client process to segfault. The Image was updated to prevent any further access once it was closed.
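The dormant-client detection described in the first item of this list can be sketched as a small timer-driven check. This is an illustration under stated assumptions, not the Archive's implementation: the class and method names are hypothetical, and the heartbeat send is modelled as a caller-supplied function that returns false when the message could not be sent. The 1 second interval and 5 second timeout in the example are the defaults quoted above.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical per-session liveness tracker; names are illustrative only. */
final class SessionLivenessSketch
{
    private final long checkIntervalNs;
    private final long connectTimeoutNs;
    private long nextCheckDeadlineNs;
    private long lastSuccessfulSendNs;

    SessionLivenessSketch(final long checkIntervalNs, final long connectTimeoutNs, final long nowNs)
    {
        this.checkIntervalNs = checkIntervalNs;
        this.connectTimeoutNs = connectTimeoutNs;
        this.nextCheckDeadlineNs = nowNs + checkIntervalNs;
        this.lastSuccessfulSendNs = nowNs;
    }

    /**
     * Called from the duty cycle. Attempts a heartbeat once per interval and
     * returns true when no heartbeat could be sent for longer than the
     * connect timeout, i.e. the control session should be closed.
     */
    boolean shouldClose(final long nowNs, final BooleanSupplier trySendHeartbeat)
    {
        if (nowNs - nextCheckDeadlineNs < 0)
        {
            return false; // next check is not due yet
        }
        nextCheckDeadlineNs = nowNs + checkIntervalNs;

        if (trySendHeartbeat.getAsBoolean())
        {
            lastSuccessfulSendNs = nowNs;
            return false;
        }

        return nowNs - lastSuccessfulSendNs > connectTimeoutNs;
    }

    public static void main(final String[] args)
    {
        final long intervalNs = TimeUnit.SECONDS.toNanos(1);
        final long timeoutNs = TimeUnit.SECONDS.toNanos(5);
        final SessionLivenessSketch session = new SessionLivenessSketch(intervalNs, timeoutNs, 0L);

        // A client that never accepts a heartbeat is closed after the timeout elapses.
        for (long second = 1; second <= 6; second++)
        {
            final boolean close = session.shouldClose(TimeUnit.SECONDS.toNanos(second), () -> false);
            System.out.println(second + "s close=" + close);
        }
    }
}
```

Passing the clock in as a parameter keeps the check deterministic and easy to test. A real agent would supply its cached nanosecond clock.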
Known issues
- [Java] AeronArchive#listRecording/listRecordings/listRecordingsForUri/listRecordingSubscriptions might return wrong recording information upon a BACK_PRESSURED/ADMIN_ACTION result when sending a recording descriptor.
Fixed in 5ad7601.
- [Java] ClusterBackup might connect to two different Cluster nodes simultaneously whereby one is used to provide the live Raft log replay and to download the snapshots, whereas the other one is used to fetch the latest list of snapshot entries and the recording log metadata. As long as all of the Cluster nodes are "in sync" (i.e. have the same set of snapshots) then everything is ok. However, if the second node from which ClusterBackup fetches the snapshots was down for some time (i.e. does not have all of the snapshots) then the ClusterBackup might end up with a broken recording log whereby recording log entries will have a different log position to the underlying snapshot recordings.
Fixed in 1.47.3
Changelog
- [Java] Speed up purgeSegments/deleteDetachedSegments operations by only deleting files in a range between the current startPosition and the previous startPosition (purge) or the oldest existing segment file position (detached files).
- [C] Fix dangling pointer in replay merge. (#1723)
- [Java] Prevent segfaults through mark file API after close.
- [Java] Trigger slow build on push to master.
- [Java] Do not close Cluster archive when doing next rounds of backup queries since the replay might still be active. Also do not switch to RESET_BACKUP state unless the current Cluster node has switched its role and therefore is no longer eligible for replay.
- [Java] Use ClusterEvent instead of ClusterException with Category.WARN.
- [Java] Use ClusterEvent to report issues when stopping recording/replay + prevent an NPE when stopping a replay as clusterArchive could have been closed while in the BACKUP_QUERY stage.
- [C/C++] Change interval of driver keepalive error reporting.
- [C] Update C driver to use the same matching logic as the Java driver for checking the validity of tagged publications and subscriptions.
- [CI] Core dump dir creation.
- [CI] Enable core dumps on Linux and MacOS.
- [CI] Collect Windows core dump files.
- [CI] Trigger slow build on PR.
- [C] compare publication stream id with link stream id when checking for matching spy subscriptions (#1722)
- [CI] Add ubuntu-24.04-arm to the build matrix for Java.
- [CI] Use env to store base Java version.
- [Java] Handle multiple PendingServiceMessageTrackers while producing consensus module patch.
- [CI] Simplify log upload.
- [CI] Fix crash log upload on Windows.
- Bug/fix error with tagged channels reresolution (#1720)
- [C] add a call to init the new fields in the logbuffer metadata (#1717)
- [Java] Add the few missing fields for logbuffer descriptor (#1721)
- [Java] Close temporary MarkFile when migrating from old version.
- [Java] Write message header before mark file header in the cluster-mark.dat file to be able to use SBE features based on the actingBlockLength and actingVersion.
- Tidy up the namespace for exception_handler_t (#1715)
- [C] Handle connecting to Archive without credentials. (#1716)
- [Java] Write message header before mark file header in the archive-mark.dat file to be able to use SBE features based on the actingBlockLength and actingVersion.
- [Java] Fix config not found issue.
- [Java] Extract capturing lambda allocation to outside loop and yield when not making progress.
- Replacement of ThreadHints.onSpinWait by Thread. (#1713)
- [Java] Add archiveId to the ArchiveMarkFile.
- [Java] Fix an off-by-one error while searching for counters.
- [Java] Tidy up after #1711.
- [Java] Add a test for address re-resolution back to the initial IP address.
- Fix resolution bug: when the new IP is back to udpChannel.remoteData, a resolution change could not be triggered. (#1711)
- [Java] Use different URI for early access JDK build.
- [Java] Fix Javadoc URI.
- [Java] Run Mockito as Java agent for JDK 23+ compatibility.
- [CI] Add JDK 23 to the build matrix.
- [Java] Change comment to prevent JDK 23 javadoc warning, fixes #1710.
- [C] Prevent double free of the aeron_exclusive_publication_t which is closed by the proxy.
- [Java] Poll for remote Archive errors while awaiting log recording session to be created.
- [Java] Add OS max/default values for SO_SNDBUF and SO_RCVBUF parameters to the log buffer metadata section.
- [C] Add OS default/max fields for the OS_SNDBUF/OS_RCVBUF to the log buffer metadata section.
- Fixes problem with socket snd/rcv buffer in logbuffer metadata. (#1707)
- [Java] Add method to convert error code to String.
- [Java] Remove remaining dynamic join APIs.
- [Java] Move config printing option to the CommonContext.
- [Java] Cleanup after #1705.
- rename the null value to compatible with C++ (#1705)
- [Java] Touch ups.
- Logbuffer metadata extra fields (#1700)
- [C] Rename SEND_NAK_MESSAGE to NAK_SENT so that it is symmetric with NAK_RECEIVED.
- [C] Add AERON_DRIVER_EVENT_NAK_RECEIVED event logging.
- [Java] Rename SEND_NAK_MESSAGE to NAK_SENT so that it is symmetric with NAK_RECEIVED.
- [Java] Add log event for when a NAK message is received.
- Fix duplicate service messages during failover/restart when using multiple services (#1703)
- [Java] Change Tests.sleep so that it uses LockSupport.parkNanos to prevent catching of InterruptedException and clearing the interrupt flag.
- [C] Remove duplicate definition of aeron_semantic_version_compose. (#1701)
- [Java] Close AeronArchive client if control response Subscription is disconnected.
- [Build] correct release gradle cache path
- [Java] Rename IngressAdapter onFragment to onMessage and remove interface to provide more appropriate naming.
- [Build] Remove OSS c/c++ binary step in release workflow
- [Java] Simplify synchronous connect.
- [Java] Use Agrona Checksum classes.
- [Java] Emit WARN event when ControlSession is closed abruptly + add reason to the ControlSession state transition log + increase default stale session check interval to 1s.
*...
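One entry in this changelog switches the test sleep helper to LockSupport.parkNanos so that no InterruptedException has to be caught and the interrupt flag is not cleared. A minimal sketch of that idea follows. It is not the actual Tests.sleep code, and the class name is hypothetical.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;

/** Hypothetical sleep helper built on parkNanos; illustrative only. */
final class ParkSleepSketch
{
    /**
     * Sleep for roughly the given duration without throwing InterruptedException.
     * If the thread is interrupted, return early and leave the interrupt flag set
     * so the caller can still observe it.
     */
    static void sleep(final long durationMs)
    {
        final long deadlineNs = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(durationMs);
        long remainingNs;

        // parkNanos may return spuriously, so loop until the deadline has passed.
        while ((remainingNs = deadlineNs - System.nanoTime()) > 0)
        {
            if (Thread.currentThread().isInterrupted())
            {
                return; // isInterrupted() does not clear the flag
            }
            LockSupport.parkNanos(remainingNs);
        }
    }

    public static void main(final String[] args)
    {
        final long startNs = System.nanoTime();
        sleep(20);
        System.out.println("slept ms: " + TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNs));
    }
}
```

By contrast, Thread.sleep throws InterruptedException and clears the flag when interrupted, which forces every caller to either rethrow or re-interrupt.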
1.46.8
1.44.6
- [Java] Update RecoverPlan after standby snapshot replication completes with the replicated snapshot entries.