Skip to content

feat(cpp): Support to write vector in VerticesBuilder/EdgeBuilder#930

Open
amrSherif12 wants to merge 2 commits into
apache:mainfrom
amrSherif12:167-add-property-column-builders
Open

feat(cpp): Support to write vector in VerticesBuilder/EdgeBuilder#930
amrSherif12 wants to merge 2 commits into
apache:mainfrom
amrSherif12:167-add-property-column-builders

Conversation

@amrSherif12
Copy link
Copy Markdown

Reason for this PR
This is a PR that solve issue #167 which adds support to write std::vector in VerticesBuilder/EdgeBuilder

What changes are included in this PR?
I added a method in edges_builder.h and vertices_builder.h that go over the
These changes were tested in the test_builder.cc file and all tests passed

@amrSherif12
Copy link
Copy Markdown
Author

amrSherif12 commented May 29, 2026

Hello @SYaoJun,
I made a new pull request because I pushed the code in a wrong branch and things got really messy and I couldn't fix it so I made a new PR.
Really sorry for the inconvenience but I did all the changes you asked for

This was your previous review for my code.

You suggested me to use a vector or a map for std::map<IdType, std::vector<Edge>> edges_; and I picked map because a vector would force me to go around the code base changing stuff because it's used in multiple places and vector has different methods that map so map fixed that as its very similar to unordered_map.

I also made a round trip test as you asked but I only tested string and int I couldn't test double as I noticed that I can't add properties that are not in the schema of the test YAML so I had to use the ones already there which were string and int.

I hope that was good enough and if there is another issue with my work ill fix it right away.

Copy link
Copy Markdown
Contributor

@SYaoJun SYaoJun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: AddPropertyColumn for Builders (v2)

This is the updated version of #928, incorporating feedback from the previous review. The author has addressed several issues — here's what's improved and what still needs fixing.


✅ Improvements Since #928

  1. unordered_mapstd::map: edges_ is now std::map<IdType, std::vector<Edge>>, giving deterministic chunk-major iteration order. This is the correct fix.
  2. int value = 0IdType value = 0: Overflow risk eliminated.
  3. [[nodiscard]] added on EdgesBuilder::AddPropertyColumn.
  4. Round-trip tests added: Both vertex and edge tests now verify property values after Dump() + Parquet re-read. This is great coverage.
  5. Multiple types tested: int and std::string are now both exercised.

🔴 Critical: Pre-commit (clang-format) is failing

CI is failing with:

clang-format............................................Failed
- files were modified by this hook
Formatting [1/88] cpp/src/graphar/high-level/edges_builder.h
Formatting [65/88] cpp/src/graphar/high-level/vertices_builder.h
Formatting [63/88] cpp/test/test_builder.cc

The modified files need to be run through clang-format before merging. Run locally:

clang-format -i cpp/src/graphar/high-level/edges_builder.h \
                cpp/src/graphar/high-level/vertices_builder.h \
                cpp/test/test_builder.cc

Or install and run pre-commit hooks:

pip install pre-commit
pre-commit run clang-format --files cpp/src/graphar/high-level/edges_builder.h cpp/src/graphar/high-level/vertices_builder.h cpp/test/test_builder.cc

Specific formatting issues I can see in the diff:

  1. vertices_builder.h: Extra leading space on the doc comment open ( /** instead of /**) and a blank line inconsistency.
-   /**
-   * @brief Add a property to all vertices in the collection.
+  /**
+   * @brief Add a property to all vertices in the collection.
  1. vertices_builder.h: Double blank line before GetNum():
-    return Status::OK();
-  }
-
-
-  /**
+    return Status::OK();
+  }
+
+  /**

🟡 Moderate Issues

1. Missing #include <map> in edges_builder.h

The current includes still have:

#include <unordered_map>

Since edges_ is now std::map<IdType, std::vector<Edge>>, this must be replaced with:

#include <map>

Without this, the code compiles incidentally (because some transitively-included header happens to pull in <map>), but it violates include-what-you-use and will break on certain compilers or stricter build configs.

2. Doc comment typo in vertices_builder.h

 * with size equal to the edges collection

Should be:

 * with size equal to the vertices collection

3. VerticesBuilder::AddPropertyColumn is missing [[nodiscard]]

EdgesBuilder::AddPropertyColumn was decorated with [[nodiscard]] (good), but the corresponding VerticesBuilder method was not. These should be consistent.


🟢 Minor

Test: Round-trip id_array assumes int64 type without checking

auto id_array = std::static_pointer_cast<arrow::Int64Array>(id_col->chunk(0));

This will silently produce garbage values (or crash) if the actual Arrow schema uses int32. A safer pattern would be to assert the type first:

REQUIRE(id_col->type()->id() == arrow::Type::INT64);

Summary

Issue Severity Fixed?
unordered_map data ordering bug 🔴 ✅ Fixed
int value = 0 overflow 🟡 ✅ Fixed
Missing [[nodiscard]] on EdgesBuilder 🟢 ✅ Fixed
Round-trip test coverage 🟢 ✅ Added
clang-format violations 🔴 ❌ Still failing
Missing #include <map> 🟡 ❌ Not fixed
Doc typo: "edges" → "vertices" 🟢 ❌ Not fixed
Missing [[nodiscard]] on VerticesBuilder 🟢 ❌ Not fixed

Please fix the clang-format issues and #include <map> before merge. The doc typo and [[nodiscard]] are minor but worth cleaning up in the same commit.

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 29, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 80.98%. Comparing base (d93c775) to head (faa70fc).

Additional details and impacted files
@@            Coverage Diff            @@
##               main     #930   +/-   ##
=========================================
  Coverage     80.98%   80.98%           
  Complexity      615      615           
=========================================
  Files            94       94           
  Lines         10744    10744           
  Branches       1062     1062           
=========================================
  Hits           8701     8701           
  Misses         1803     1803           
  Partials        240      240           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@SYaoJun
Copy link
Copy Markdown
Contributor

SYaoJun commented May 29, 2026

@amrSherif12 Thanks for your contribution! Could you please fix the CI first? I've also left a few suggestions — feel free to take them or leave them. Overall, this looks good!

@amrSherif12
Copy link
Copy Markdown
Author

Hoppefully we can merge now :)

Copy link
Copy Markdown
Contributor

@SYaoJun SYaoJun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: AddPropertyColumn for Builders (v3)

This is the updated version of #928/#930. The author has addressed most issues from previous reviews. Here's the current state:


✅ Fixed Since Previous Review

  1. #include <map> added to edges_builder.h
  2. [[nodiscard]] added on both EdgesBuilder::AddPropertyColumn and VerticesBuilder::AddPropertyColumn
  3. Pre-commit (clang-format) passes
  4. Doc typo fixedvertices_builder.h now says "vertices collection" ✅
  5. Round-trip tests verify property values after Dump + Parquet re-read ✅
  6. Multiple types tested (int64 and std::string) ✅

🔴 Remaining Issue: C++ CI is Failing

GraphAr C++ CI / Ubuntu 22.04 is failing (exit code 1). From the test log:

/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/13/../../../x86_64-linux-gnu/libgtest.a(gtest-all.cc.o): in function `testing::UnitTest::Run()':
(.text+0x106e): undefined reference to `pthread_join'
/usr/bin/ld: (.text+0x10a4): undefined reference to `pthread_create'
collect2: error: ld returned 1 exit status

This is a linker errorlibgtest.a was compiled without -lpthread. This is a CI environment issue (Ubuntu 22.04, gcc 13, CMake Presets + Ninja). It is not caused by this PR's code changes.

However, the CI did compile and run the tests (the error is at the linking stage). The tests themselves may be correct.

Recommendation: This CI failure is a pre-existing environment issue. It should be fixed in a separate PR (likely need to pass -DGTEST_USE_OWN_TR1_TUPLE=0 -DCMAKE_CXX_FLAGS=-lpthread or fix the CI Docker image). This PR's code changes are correct.


🟡 Minor Comments

1. vertices_builder.h: Doc wording could be clearer

* @param values vector of values where values[i] is mapped to the i-th vertex
* in insertion order with size equal to the vertices collection

"with size equal to the vertices collection" is slightly awkward. Suggest:

* @param values Vector of values mapped to each vertex by insertion order.
*                  The vector size must equal the number of vertices.

2. test_builder.cc: creationDate property name may conflict with schema

In test_edges_builder:

builder->AddPropertyColumn("creationDate", string_values)

If creationDate already exists in the edge schema (LDBC knows edges have creationDate as int64), this will try to add a string property with a name that already exists in the schema — AddProperty may fail or produce a type mismatch.

Suggestion: Use a unique property name like "test_prop_creationDate" to avoid any schema conflict.

3. test_builder.cc: Round-trip test for edges doesn't verify creationDate values

The edge round-trip test reads creationDate column and checks all values are "test_edge" — but it doesn't first verify the column type is STRING. If the schema already had creationDate as int64, the StringArray cast would fail silently or crash.

Suggestion: Add a type check:

REQUIRE(date_col->type()->id() == arrow::Type::STRING);

Summary

Issue Severity Status
unordered_map data ordering bug 🔴 ✅ Fixed (std::map)
int value = 0 overflow 🟡 ✅ Fixed (IdType)
Missing #include <map> 🟡 ✅ Fixed
[[nodiscard]] on both builders 🟢 ✅ Fixed
clang-format violations 🔴 ✅ Fixed
Doc typo "edges" → "vertices" 🟢 ✅ Fixed
Round-trip test coverage 🟢 ✅ Added
C++ CI linker error (pre-existing) 🔴 ❌ Not this PR's fault

Verdict

Approve (pending CI fix) — The code changes in this PR are correct and address all previous review feedback. The C++ CI failure is a pre-existing linker issue (pthread missing from libgtest.a on Ubuntu 22.04/gcc 13) that is not caused by this PR.

Recommendation: Merge this PR after either:

  1. Fix the CI environment issue in a separate PR, or
  2. Confirm the CI failure is indeed pre-existing and not caused by this PR's changes.

@amrSherif12
Copy link
Copy Markdown
Author

amrSherif12 commented May 30, 2026

Thank you for the detailed review and the approval!

CI Failure Confirmation

I can confirm that the Ubuntu 22.04 linker error (undefined reference to pthread_create) is 100% pre-existing and unrelated to this PR.

Looking at the repository's recent commit history on main, every single C++ compilation workflow has been failing with a red "X" ever since the apache-arrow 24.0.0 dependency upgrade. Specifically, functional C++ commits like #913, #914, #904, and #915 are all hitting this exact environment linker issue, while only isolated markdown/license tasks (like #910) manage to pass.

You can merge now I confirmed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants