Skip to content

improve: filter only own updates for read-after-write-consistency#3399

Open
csviri wants to merge 21 commits into
operator-framework:mainfrom
csviri:skip-only-update-event
Open

improve: filter only own updates for read-after-write-consistency#3399
csviri wants to merge 21 commits into
operator-framework:mainfrom
csviri:skip-only-update-event

Conversation

@csviri

@csviri csviri commented Jun 7, 2026

Copy link
Copy Markdown
Collaborator

Signed-off-by: Attila Mészáros a_meszaros@apple.com

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 7, 2026
@csviri csviri requested a review from shawkins June 7, 2026 14:55
@csviri csviri linked an issue Jun 7, 2026 that may be closed by this pull request
@csviri csviri requested review from metacosm and xstefank June 7, 2026 15:37
@csviri csviri force-pushed the skip-only-update-event branch from 170c235 to 51d7087 Compare June 7, 2026 15:55
@csviri csviri force-pushed the skip-only-update-event branch from 413171e to 9cd3b73 Compare June 8, 2026 09:02
@csviri

csviri commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator Author

@metacosm @shawkins @xstefank
This should be the new implementation that is now aware of intermediate events. So only filters out events for controller own changes. The algorithm is I think even simpler than before. The idea is simple, we record the events during the update, and we check at the end if there was event that is not related to our changes; if no we don't propagate an event.
If yes, we propagate an Update events, that contains as resource from the first event as previous version, latest as the actual resource + some edge case with delete event. ( This is important because filters might relay on the changed resource).

This approach however would be bulletproof if we handle the re-list in the informer from the fabric8 pr above. Since in some (rather extreme) edge cases, if there is a re-list, event might be lost. So the idea would be (will create a separate PR) that, if we got notified that goind to be a re-list we flip a switch, and until the sync is finished we simply allow to pass (don't filter) any events. I think without that this is not even possible to do 100% correctly. (but maybe @shawkins has some ideas)

https://github.com/fabric8io/kubernetes-client/pull/7899/changes

So this is the correct way of doing, with "maximal precision", the good think is that it is actually relatively simple.
However something so be aware about.

  1. We record the events related to that one resource. This in average case (like when there is an event from our modification or one from 3rd party) is 0-2 events, in some extreme edge cases might be much more, when there are very frequent update on the resource both from the controller and some 3rd party, this might be more - even that this is for a limited time - while our updates are finished.

  2. Regarding time, we currently support multiple concurrent updates, and we finish the processing after all concurrent updates are finished. Theoretically if we have a many-to-one scenario to many primary resources reconciliations write one single resource, this could be long even could lead to "starvation" like situation so the update is not finished. First of all, in controller many-to-one is quite rate, we could already tell don't use this feature for such scenarion. But also I will iterate on this a bit and maybe have it time bound / or request number bound as failsafe, that will solve this issue.

@csviri

csviri commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator Author

added #3406 that covers the informer re-list, will continue now adding unit tests etc.

@csviri csviri force-pushed the skip-only-update-event branch from 3e986ab to 88c8e52 Compare June 10, 2026 06:47
@csviri csviri marked this pull request as ready for review June 10, 2026 06:48
Copilot AI review requested due to automatic review settings June 10, 2026 06:48
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 10, 2026

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the event-filtering/temporary-cache logic to better support read-after-write consistency by distinguishing “own” updates from external/intermediate updates, and adds a regression integration test covering deletion racing with a status update.

Changes:

  • Reworks TemporaryResourceCache event filtering to record “own” resourceVersions and return a GenericResourceEvent (via Optional) instead of an EventHandling enum.
  • Updates ManagedInformerEventSource, InformerEventSource, and ControllerEventSource to consume the new Optional<GenericResourceEvent> flow and propagate summary events when appropriate.
  • Adds a new integration test scenario for deletion during status update, plus updates/extends unit tests around informer/controller event handling.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
operator-framework/src/test/java/io/javaoperatorsdk/operator/baseapi/deletionduringstatusupdate/DeletionDuringStatusUpdateStatus.java Adds a status POJO used by the new deletion-vs-status-update regression test.
operator-framework/src/test/java/io/javaoperatorsdk/operator/baseapi/deletionduringstatusupdate/DeletionDuringStatusUpdateReconciler.java Test reconciler that forces a controlled race between status patching and deletion.
operator-framework/src/test/java/io/javaoperatorsdk/operator/baseapi/deletionduringstatusupdate/DeletionDuringStatusUpdateIT.java New regression IT ensuring cleanup is triggered when delete races with status update.
operator-framework/src/test/java/io/javaoperatorsdk/operator/baseapi/deletionduringstatusupdate/DeletionDuringStatusUpdateCustomResource.java New CR type used by the regression IT.
operator-framework-core/src/test/java/io/javaoperatorsdk/operator/processing/event/source/informer/TemporaryResourceCacheTest.java Updates/extends cache tests for the new Optional<GenericResourceEvent> behavior and intermediate-event handling.
operator-framework-core/src/test/java/io/javaoperatorsdk/operator/processing/event/source/informer/InformerEventSourceTest.java Updates informer event source tests; adds repeated tests and new intermediate-event scenarios.
operator-framework-core/src/test/java/io/javaoperatorsdk/operator/processing/event/source/controller/ControllerEventSourceTest.java Adds controller-level tests for intermediate-event propagation/deferral during filtering.
operator-framework-core/src/main/java/io/javaoperatorsdk/operator/processing/event/source/informer/TemporaryResourceCache.java Core behavior change: event filtering, own-version tracking, and intermediate-event propagation logic.
operator-framework-core/src/main/java/io/javaoperatorsdk/operator/processing/event/source/informer/ManagedInformerEventSource.java Adapts update+cache flow to new doneEventFilterModify contract and event propagation.
operator-framework-core/src/main/java/io/javaoperatorsdk/operator/processing/event/source/informer/InformerEventSource.java Adapts informer callbacks to new Optional<GenericResourceEvent> and propagates the event payload.
operator-framework-core/src/main/java/io/javaoperatorsdk/operator/processing/event/source/informer/GenericResourceEvent.java Introduces GenericResourceEvent payload used by the new filtering logic.
operator-framework-core/src/main/java/io/javaoperatorsdk/operator/processing/event/source/informer/EventFilterDetails.java Refactors tracking to store related events and “own” resourceVersions to decide propagation/summary behavior.
operator-framework-core/src/main/java/io/javaoperatorsdk/operator/processing/event/source/controller/ControllerEventSource.java Updates controller event handling to use Optional<GenericResourceEvent> and the new delete handling path.
Comments suppressed due to low confidence (2)

operator-framework-core/src/main/java/io/javaoperatorsdk/operator/processing/event/source/informer/GenericResourceEvent.java:31

  • The identifier "lastStateUnknow" is misspelled (should be "lastStateUnknown"). Since it is exposed via a public getter, the typo leaks into call sites and makes the API harder to understand.
    operator-framework-core/src/main/java/io/javaoperatorsdk/operator/processing/event/source/informer/GenericResourceEvent.java:72
  • equals() ignores lastStateUnknow. If lastStateUnknow is semantically relevant (it changes delete semantics), equals/hashCode should be updated together to include it; otherwise consider documenting that it's intentionally excluded.

Comment on lines 159 to 161
log.debug(
"Propagating event for {}, resource with same version not result of a reconciliation.",
"Propagating event for {}, resource with same version not result of a our update.",
action);
Comment on lines +132 to +134
if (comp == 0) {
result = Optional.empty();
}

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not true IMO, it can happen that we received a delete, them the resource was re-created, but will think about this.

@csviri

csviri commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator Author

@shawkins was thinking about this more, and having optimistic locking would be a too strong restriction in my opinion, compared to that even not doing event filtering is a better option.

So I prepared this PR and the one that we can merge subsequently:
#3406

With these two we would cover this functionality by just filtering events directly related to our updates, or not do filtering when there is a relist going on.
The algorithms are comparable to the current one in terms of complexity, not significantly more complex.

If the complexity would be proven to an issue and we would still hit some problematic cases, as an alternative we could just do a heuristic that would be very simple:

  • cache the latest resource (that we do regardless of event filtering, no issues with that at all)
  • and filter only event where the new resource has the resource version of the cached resource.
  • maybe with some minimal information that we received already other event.

But for both alternatives, we at the end need to be aware that re-list happened and events might be lost, so regardless we would need this:
fabric8io/kubernetes-client#7899

please let me know what do you think. Will finalize the PRs today.

cc @manusa

@csviri

csviri commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator Author

Just one thought about the complexity, yes these algorithms are not trivial, but also would make it much easier for the users to implement reconciliation logic. So there is such complexity in the framework or at the end in some cases in the reconcilers, so having these problems solved by is where the frameworks are really useful.

@csviri csviri force-pushed the skip-only-update-event branch from cc08d4d to 51ecd96 Compare June 10, 2026 09:24
csviri added 9 commits June 10, 2026 11:25
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
csviri and others added 7 commits June 10, 2026 11:25
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
@csviri csviri force-pushed the skip-only-update-event branch from 51ecd96 to 2deff06 Compare June 10, 2026 09:25
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
@metacosm metacosm changed the title improve: filter only own updates for read-after-write-conistency improve: filter only own updates for read-after-write-consistency Jun 10, 2026
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
private final Boolean lastStateUnknow;

public ExtendedResourceEvent(
public GenericResourceEvent(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't do this kind of renames of public classes in minors

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is strictly internal class. Maybe we should explicitly annotate public APIs

return;
}
if (resultEvent.orElseThrow().getAction() != ResourceAction.DELETED) {
log.warn("Non delete event received on onDelete handling. This should not happen.");

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
log.warn("Non delete event received on onDelete handling. This should not happen.");
log.warn("An event which is not a delete event recieved in onDelete handling. This should not happen.");

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we should also log the real event received

@csviri csviri Jun 10, 2026

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want to log the whole event, that might even contain sensitive enformation (think of k8s Secrets). We already logging metadata with MDC.

Signed-off-by: Attila Mészáros <a_meszaros@apple.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 28 out of 28 changed files in this pull request and generated 5 comments.

Comments suppressed due to low confidence (1)

operator-framework-core/src/main/java/io/javaoperatorsdk/operator/processing/event/source/informer/GenericResourceEvent.java:31

  • lastStateUnknow / getLastStateUnknow() are misspelled in a public class, which makes the API awkward and easy to misuse. Renaming to lastStateUnknown / getLastStateUnknown() (and updating call sites) would avoid baking the typo into the public surface.

Comment on lines +141 to +143
if (comp == 0) {
result = Optional.empty();
}
Comment on lines +125 to 133
var resultEvent =
temporaryResourceCache.onDeleteEvent(resource, deletedFinalStateUnknown);
if (resultEvent.isEmpty()) {
return;
}
if (resultEvent.orElseThrow().getAction() != ResourceAction.DELETED) {
log.warn("Non delete event received on onDelete handling. This should not happen.");
}
primaryToSecondaryIndex.onDelete(resource);
Comment on lines 165 to 168
} else if (eventAcceptedByFilter(action, newObject, oldObject)) {
log.debug(
"Propagating event for {}, resource with same version not result of a reconciliation.",
"Propagating event for {}, resource with same version not result of a our update.",
action);
Comment on lines 85 to 88
if (log.isDebugEnabled()) {
log.debug("Event received with action: {}", action);
log.trace("Event Old resource: {},\n new resource: {}", oldResource, resource);
log.debug("Event Old resource: {},\n new resource: {}", oldResource, resource);
}
csviri and others added 2 commits June 10, 2026 15:37
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Filtering ONLY own events in read-after-write-consitency

3 participants