refactor(crawler): tighten CrawlerThread hot path #168
Open
marevol wants to merge 1 commit into
Replace stream/lambda boilerplate with explicit iteration in places called per crawl, reducing allocations and improving readability.

- CrawlerThread.storeChildUrls: convert filter/map/collect pipeline to a single for-loop; pre-size HashSet and ArrayList.
- CrawlerClientCreator.register: replace stream().forEach and Collection.forEach lambdas with enhanced for-loops.
- UrlFilterImpl.init: drop stream().collect(Collectors.toList()) in favor of new ArrayList<>(set).

No behavior changes; existing UrlFilterImplTest and CrawlerThreadTest suites pass unchanged.
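To make the storeChildUrls change concrete, here is a minimal, self-contained sketch of the same kind of conversion. It is not the project's code: the `Child` class, the `viaStream`/`viaLoop` method names, the URL-only dedup key, and the `Predicate<String>` standing in for the URL filter are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class StoreChildUrlsSketch {
    // Hypothetical stand-in for the crawler's child-URL request object.
    static final class Child {
        private final String url;
        Child(String url) { this.url = url; }
        String getUrl() { return url; }
    }

    // "Before" shape: filter/map/collect pipeline that calls getUrl() repeatedly.
    static List<String> viaStream(List<Child> children, Predicate<String> urlFilter) {
        Set<String> seen = new HashSet<>();
        return children.stream()
                .filter(d -> d.getUrl() != null && !d.getUrl().trim().isEmpty()
                        && seen.add(d.getUrl())
                        && urlFilter.test(d.getUrl()))
                .map(Child::getUrl)
                .collect(Collectors.toList());
    }

    // "After" shape: one loop, pre-sized collections, URL read once per element.
    static List<String> viaLoop(List<Child> children, Predicate<String> urlFilter) {
        int n = children.size();
        // HashSet(int) takes an initial capacity, so this sketch divides by the
        // default load factor (0.75) to avoid a resize; the PR may size it differently.
        Set<String> seen = new HashSet<>((int) (n / 0.75f) + 1);
        List<String> result = new ArrayList<>(n);
        for (Child d : children) {
            String url = d.getUrl();
            if (url == null || url.trim().isEmpty()) {
                continue; // skip blank URLs
            }
            if (!seen.add(url)) {
                continue; // skip duplicates
            }
            if (!urlFilter.test(url)) {
                continue; // skip URLs rejected by the filter
            }
            result.add(url);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Child> children = Arrays.asList(
                new Child("https://example.com/a"),
                new Child(""),
                new Child("https://example.com/a"),
                new Child("https://other.test/x"),
                new Child("https://example.com/b"));
        Predicate<String> urlFilter = u -> u.startsWith("https://example.com/");
        System.out.println(viaStream(children, urlFilter));
        System.out.println(viaLoop(children, urlFilter));
        // both print [https://example.com/a, https://example.com/b]
    }
}
```

The two methods apply the same checks in the same order (blank check, dedup, filter), which is what allows this kind of rewrite to keep behavior unchanged.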
Summary
Reduce stream/lambda allocations on per-crawl hot paths and in small initialization code, following the same direction as fess#3134. No behavior changes.
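As a rough illustration of the two simpler replacement patterns (copying a set into a list, and iterating map entries), the sketch below compares the stream form with the direct form. The class name, method names, and sample data are hypothetical and not taken from the project.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class CopyAndIterateSketch {
    // Stream form: builds a pipeline and a collector just to copy elements.
    static List<String> copyViaStream(Set<String> patterns) {
        return patterns.stream().collect(Collectors.toList());
    }

    // Direct form: the ArrayList copy constructor preserves iteration order.
    static List<String> copyViaConstructor(Set<String> patterns) {
        return new ArrayList<>(patterns);
    }

    // Enhanced for-loop over map entries instead of entrySet().stream().forEach(...).
    static List<String> describe(Map<String, String> clientMap) {
        List<String> out = new ArrayList<>(clientMap.size());
        for (Map.Entry<String, String> e : clientMap.entrySet()) {
            out.add(e.getKey() + " -> " + e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> patterns = new LinkedHashSet<>();
        patterns.add("https://example.com/.*");
        patterns.add(".*\\.pdf$");
        // prints true: both copies hold the same elements in the same order
        System.out.println(copyViaStream(patterns).equals(copyViaConstructor(patterns)));

        Map<String, String> clients = new LinkedHashMap<>();
        clients.put("http:.*", "httpClient");
        clients.put("file:.*", "fsClient");
        System.out.println(describe(clients));
    }
}
```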
Changes
- `CrawlerThread.storeChildUrls`: Convert the `stream().filter(...).map(...).collect(Collectors.toList())` pipeline into a single `for` loop. Pre-size both the dedup `HashSet` and the resulting `ArrayList` from `childUrlList.size()`. Reuses the URL value via a local variable instead of three repeated `d.getUrl()` calls. This runs once per crawled page that yields children.
- `CrawlerClientCreator.register(...)`: Replace `clientMap.entrySet().stream().forEach(...)` and `clientFactoryList.forEach(...)` with enhanced `for` loops. Both methods are `synchronized` and run during client registration; the change just removes lambda/stream object allocations.
- `UrlFilterImpl.init`: Replace `cachedXxxSet.stream().collect(Collectors.toList())` with `new ArrayList<>(set)`. Drops the `java.util.stream.Collectors` import.

What was intentionally NOT changed
- Collection types (`LinkedHashSet`, `HashSet`, `ArrayList`) are kept the same; only the way they are populated changes.
- `CrawlerThread` is untouched.

Test plan
- `mvn compile`: succeeds
- `mvn test -Dtest=UrlFilterImplTest,CrawlerThreadTest`: 34/34 pass (no `CrawlerClientCreatorTest` exists in this repo)
- `mvn formatter:format && mvn license:format`: applied, clean

🤖 Generated with Claude Code