21 commits
58cd294
dx_evidence_graph: stub for the dx-agent <-> pixie viz contract
entlein Jun 17, 2026
d8439d5
dx_evidence_graph: lock to dx-agent's attackgraph.Edge schema + load_…
entlein Jun 17, 2026
51da435
dx_evidence_graph: real []Edge fixture from live log4shell + argocd v…
entlein Jun 17, 2026
fc2fcc4
dx_evidence_graph: render dx-agent's real fixture as cytoscape HTML
entlein Jun 17, 2026
8a73206
dx_evidence_graph: wire to forensic ClickHouse via px.DataFrame(click…
entlein Jun 17, 2026
4442480
dx_evidence_graph: address CodeRabbit review on PR #62
entlein Jun 17, 2026
7cbfd67
dx_evidence_graph: address dx-agent corrections so the bundle renders…
entlein Jun 18, 2026
12ca20f
Merge branch 'main' into entlein/dx-evidence-graph-viz
entlein Jun 18, 2026
a6231fe
ci: fix self-hosted runner label across release workflows
entlein Jun 18, 2026
fca42b1
licenses: stop failing the release build on missing go module licenses
entlein Jun 18, 2026
0373059
ui: surface real yarn build_prod stderr in failed bazel actions
entlein Jun 18, 2026
bc1de18
ui: tee + cat + env dump for ui_bundle action diagnosis
entlein Jun 18, 2026
558b37b
Revert "ui: tee + cat + env dump for ui_bundle action diagnosis"
entlein Jun 18, 2026
094c68f
Reapply "ui: tee + cat + env dump for ui_bundle action diagnosis"
entlein Jun 18, 2026
f69bc9d
ui: set use_default_shell_env on webpack actions so yarn/node resolve…
entlein Jun 18, 2026
bdcdcdc
ui: invoke yarn by absolute path in webpack actions
entlein Jun 18, 2026
fae07ac
ui: set -x in webpack actions to surface the silent-fail step
entlein Jun 18, 2026
563441e
ui: quote stamped status values when eval'ing into the action env
entlein Jun 18, 2026
82444cd
ui: whitelist stamp vars to STABLE_BUILD_TAG + BUILD_TIMESTAMP
entlein Jun 18, 2026
5d39c88
cockpit: point SCRIPT_BUNDLE_URLS at the cloud-proxy's baked bundle
entlein Jun 19, 2026
87b41f2
dx_evidence_graph: tighten script + bake the soc rig DSN as the default
entlein Jun 19, 2026
2 changes: 1 addition & 1 deletion .github/workflows/cli_release.yaml
@@ -15,7 +15,7 @@ jobs:
       image-base-name: "dev_image_with_extras"
   build-release:
     name: Build Release
-    runs-on: oracle-16cpu-64gb-x86-64
+    runs-on: oracle-vm-16cpu-64gb-x86-64
     needs: get-dev-image
     permissions:
       contents: read
2 changes: 1 addition & 1 deletion .github/workflows/cloud_release.yaml
@@ -15,7 +15,7 @@ jobs:
       image-base-name: "dev_image_with_extras"
   build-release:
     name: Build Release
-    runs-on: oracle-16cpu-64gb-x86-64
+    runs-on: oracle-vm-16cpu-64gb-x86-64
     needs: get-dev-image
     permissions:
       contents: read
2 changes: 1 addition & 1 deletion .github/workflows/mirror_deps.yaml
@@ -9,7 +9,7 @@ jobs:
     permissions:
       contents: read
       packages: write
-    runs-on: oracle-16cpu-64gb-x86-64
+    runs-on: oracle-vm-16cpu-64gb-x86-64
     steps:
     - uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v2
       with:
2 changes: 1 addition & 1 deletion .github/workflows/operator_release.yaml
@@ -15,7 +15,7 @@ jobs:
       image-base-name: "dev_image_with_extras"
   build-release:
     name: Build Release
-    runs-on: oracle-16cpu-64gb-x86-64
+    runs-on: oracle-vm-16cpu-64gb-x86-64
     needs: get-dev-image
     permissions:
       contents: read
2 changes: 1 addition & 1 deletion .github/workflows/vizier_release.yaml
@@ -15,7 +15,7 @@ jobs:
       image-base-name: "dev_image_with_extras"
   build-release:
     name: Build Release
-    runs-on: oracle-16cpu-64gb-x86-64
+    runs-on: oracle-vm-16cpu-64gb-x86-64
     needs: get-dev-image
     permissions:
       contents: read
47 changes: 37 additions & 10 deletions bazel/ui.bzl
@@ -17,9 +17,14 @@
 # This file contains rules for for our UI builds.

 ui_shared_cmds_start = [
+    # set -x: trace every command so CI failure logs surface the actual
+    # failing step. Without this the action shell silently aborts with
+    # exit 1 and no indication which sub-command failed.
+    "set -x",
     'export BASE_PATH="$(pwd)"',
-    "export PATH=/usr/local/bin:/opt/px_dev/tools/node/bin:$PATH",
-    'export HOME="$(mktemp -d)"', # This makes node-gyp happy.
+    "export PATH=/opt/px_dev/tools/node/bin:/usr/local/bin:$PATH",
+    "hash -r",
+    'export HOME="$(mktemp -d)"',
     'export TMPPATH="$(mktemp -d)"',
 ]
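
The reordered `PATH` export in this hunk relies on the usual lookup rule that the first matching directory wins. A minimal Python sketch of that rule follows; the two temporary directories only stand in for `/opt/px_dev/tools/node/bin` and `/usr/local/bin`, and the fake `yarn` files and helper names are illustrative assumptions, not part of the build.

```python
import os
import shutil
import stat
import tempfile


def make_fake_tool(directory: str, name: str) -> str:
    """Create a tiny executable file so a PATH lookup can find it."""
    path = os.path.join(directory, name)
    with open(path, "w") as f:
        f.write("#!/bin/sh\n")
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


def resolve(tool: str, *dirs: str) -> str:
    """Resolve `tool` as a shell would for PATH=dirs[0]:dirs[1]:..."""
    return shutil.which(tool, path=os.pathsep.join(dirs))


# Hypothetical stand-ins for the dev-image toolchain dir and /usr/local/bin.
node_bin = tempfile.mkdtemp(prefix="node_bin_")
usr_local_bin = tempfile.mkdtemp(prefix="usr_local_bin_")
make_fake_tool(node_bin, "yarn")
make_fake_tool(usr_local_bin, "yarn")

old_order = resolve("yarn", usr_local_bin, node_bin)  # ordering before this change
new_order = resolve("yarn", node_bin, usr_local_bin)  # ordering after this change
print("old order picks:", old_order)
print("new order picks:", new_order)
```

The added `hash -r` complements the reorder by clearing the shell's remembered command locations, so a previously cached `yarn` path cannot shadow the new ordering.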

@@ -38,7 +43,7 @@ def _pl_webpack_deps_impl(ctx):

     cmd = ui_shared_cmds_start + cp_cmds + [
         'pushd "$TMPPATH/src/ui" &> /dev/null',
-        "yarn install --immutable &> build.log",
+        "/opt/px_dev/tools/node/bin/yarn install --immutable &> build.log",
         # Pick a deterministic mtime so that the output is not volatile.
         # This helps ensure that bazel can cache the ui builds as expected.
         'tar --mtime="2018-01-01 00:00:00 UTC" -czf "$BASE_PATH/{}" .'.format(out.path),
@@ -49,6 +54,10 @@ def _pl_webpack_deps_impl(ctx):
         execution_requirements = {tag: "" for tag in ctx.attr.tags},
         outputs = [out],
         command = " && ".join(cmd),
+        # `--incompatible_strict_action_env` (.bazelrc) strips host PATH
+        # from actions, so yarn/node at /opt/px_dev/tools/node/bin aren't
+        # resolvable. Match how licenses.bzl + proto_compile.bzl handle it.
+        use_default_shell_env = True,
         progress_message =
             "Generating webpack deps %s" % out.short_path,
     )
@@ -72,8 +81,15 @@ def _pl_webpack_library_impl(ctx):
         # and apply it to the environment here. Hopefully,
         # no special characters/spaces/quotes in the results ...
         env_cmds = [
-            '$(sed -E "s/^([A-Za-z_]+)\\s*(.*)/export \\1=\\2/g" "{}")'.format(ctx.info_file.path),
-            '$(sed -E "s/^([A-Za-z_]+)\\s*(.*)/export \\1=\\2/g" "{}")'.format(ctx.version_file.path),
+            # Whitelist the stamp vars the action actually uses
+            # (webpack.config.js' EnvironmentPlugin reads STABLE_BUILD_TAG
+            # and BUILD_TIMESTAMP). The previous wildcard sed slurped
+            # FORMATTED_DATE too — its space-separated value
+            # ("2026 Jun 18 ...") word-split in $(...) command
+            # substitution and broke every action with
+            # "export: `18': not a valid identifier".
+            '$(sed -E -n "s/^(STABLE_BUILD_TAG|BUILD_TIMESTAMP)\\s+(.*)/export \\1=\\2/p" "{}")'.format(ctx.info_file.path),
+            '$(sed -E -n "s/^(STABLE_BUILD_TAG|BUILD_TIMESTAMP)\\s+(.*)/export \\1=\\2/p" "{}")'.format(ctx.version_file.path),
         ]
         all_files.append(ctx.info_file)
         all_files.append(ctx.version_file)
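
To see why the whitelist in this hunk matters, here is a small Python sketch that mimics the two `sed` expressions on a made-up status file. The sample keys and values are invented, and `exports`, `shell_words`, and `invalid_export_args` are hypothetical helper names; only the two regular expressions follow the diff. Word splitting is approximated as a plain whitespace split, which is what an unquoted `$(...)` does with default `IFS`.

```python
import re
from typing import List

# Invented sample in the "KEY value" shape of a Bazel workspace status file.
STATUS = """\
BUILD_TIMESTAMP 1781740800
FORMATTED_DATE 2026 Jun 18 00:00:00 Thu
STABLE_BUILD_TAG v0.1.0
"""

OLD = re.compile(r"^([A-Za-z_]+)\s*(.*)")                       # previous wildcard
NEW = re.compile(r"^(STABLE_BUILD_TAG|BUILD_TIMESTAMP)\s+(.*)")  # whitelist
NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")                   # valid shell name


def exports(pattern: "re.Pattern[str]", text: str) -> List[str]:
    """Return the 'export K=V' lines the pattern would emit for matching lines."""
    out = []
    for line in text.splitlines():
        m = pattern.match(line)
        if m:
            out.append(f"export {m.group(1)}={m.group(2)}")
    return out


def shell_words(lines: List[str]) -> List[str]:
    """Approximate unquoted $(...) word splitting: split on whitespace."""
    return " ".join(lines).split()


def invalid_export_args(words: List[str]) -> List[str]:
    """Arguments after the first `export` whose name part is not a valid identifier."""
    return [w for w in words[1:] if not NAME.match(w.split("=", 1)[0])]


old_words = shell_words(exports(OLD, STATUS))
new_words = shell_words(exports(NEW, STATUS))
print("wildcard rejects:", invalid_export_args(old_words))
print("whitelist rejects:", invalid_export_args(new_words))
```

With the wildcard pattern the first rejected argument is `18`, which lines up with the error quoted in the comment. The whitelist still assumes the two remaining values contain no whitespace.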
@@ -84,9 +100,12 @@ def _pl_webpack_library_impl(ctx):
         'pushd "$TMPPATH/src/ui" &> /dev/null',
         'tar -xzf "$BASE_PATH/{}"'.format(ctx.file.deps.path),
         'mv -f "$BASE_PATH/{}" src/pages/credits/licenses.json'.format(ctx.file.licenses.path),
-        "retval=0",
-        "output=`yarn build_prod 2>&1` || retval=$?",
-        '[ "$retval" -eq 0 ] || (echo $output; echo "Build Failed with Code: $retval"; exit $retval)',
+        # Stream yarn output directly so failures surface a usable stderr
+        # in CI logs. Absolute path because --incompatible_strict_action_env
+        # makes bazel ignore our `export PATH` despite the dev image
+        # having yarn at this path. Children (webpack -> node) need PATH
+        # too so we don't strip the export above.
+        "/opt/px_dev/tools/node/bin/yarn build_prod",
         'cp dist/bundle.tar.gz "$BASE_PATH/{}"'.format(out.path),
     ] + ui_shared_cmds_finish

@@ -95,6 +114,10 @@ def _pl_webpack_library_impl(ctx):
         execution_requirements = {tag: "" for tag in ctx.attr.tags},
         outputs = [out],
         command = " && ".join(cmd),
+        # `--incompatible_strict_action_env` (.bazelrc) strips host PATH
+        # from actions, so yarn/node at /opt/px_dev/tools/node/bin aren't
+        # resolvable. Match how licenses.bzl + proto_compile.bzl handle it.
+        use_default_shell_env = True,
         progress_message =
             "Generating webpack bundle %s" % out.short_path,
     )
@@ -161,15 +184,19 @@ def _pl_deps_licenses_impl(ctx):
         'pushd "$TMPPATH/src/ui" &> /dev/null',
         'export LIC_TMPPATH="$(mktemp -d)"',
         'tar -xzf "$BASE_PATH/{}"'.format(ctx.file.deps.path),
-        "yarn license_check --excludePrivatePackages --production --json --out $LIC_TMPPATH/checker.json",
-        'yarn pnpify node ./tools/licenses/yarn_license_extractor.js --input=$LIC_TMPPATH/checker.json --output="$BASE_PATH/{}"'.format(out.path),
+        "/opt/px_dev/tools/node/bin/yarn license_check --excludePrivatePackages --production --json --out $LIC_TMPPATH/checker.json",
+        '/opt/px_dev/tools/node/bin/yarn pnpify node ./tools/licenses/yarn_license_extractor.js --input=$LIC_TMPPATH/checker.json --output="$BASE_PATH/{}"'.format(out.path),
     ] + ui_shared_cmds_finish

     ctx.actions.run_shell(
         inputs = all_files + ctx.files.deps,
         execution_requirements = {tag: "" for tag in ctx.attr.tags},
         outputs = [out],
         command = " && ".join(cmd),
+        # `--incompatible_strict_action_env` strips host PATH from
+        # actions; yarn lives at /opt/px_dev/tools/node/bin in the
+        # dev image.
+        use_default_shell_env = True,
         progress_message =
             "Generating licenses %s" % out.short_path,
     )
15 changes: 14 additions & 1 deletion private/cockpit/script_bundles_config.yaml
@@ -1,12 +1,25 @@
 ---
+# SCRIPT_BUNDLE_URLS is read by openresty (cloud-proxy_server_image) and
+# injected into the UI's `window.__PIXIE_FLAGS__` (see
+# k8s/cloud/base/proxy_nginx_config.yaml's sub_filter + set_by_lua_block).
+# The UI's fetch resolves relative URLs against document.baseURI, which is
+# always the cloud-proxy itself, so a relative `/bundle-oss.json` URL hits
+# the bundle that the cloud-release pipeline bakes into the proxy image as
+# a container layer (src/cloud/proxy/BUILD.bazel `script_bundle` ->
+# /bundle/bundle-oss.json). nginx serves it at the path `/bundle-oss.json`
+# from both server blocks (bare + work.* subdomain,
+# k8s/cloud/base/proxy_nginx_config.yaml lines 270 + 342).
+#
+# Bottom line: every cloud-release tag's bundle now ships with the
+# deployment, no separate update-script-bundle workflow needed.
 apiVersion: v1
 kind: ConfigMap
 metadata:
   name: pl-script-bundles-config
 data:
   SCRIPT_BUNDLE_URLS: >-
     [
-      "https://k8sstormcenter.github.io/pixie/pxl_scripts/bundle.json"
+      "/bundle-oss.json"
     ]
   SCRIPT_BUNDLE_DEV: "false"
   PL_SCRIPT_MODIFICATION_DISABLED: "false"
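
The relative-URL behaviour described in the ConfigMap comment can be sketched with Python's `urllib.parse.urljoin`, which follows the same RFC 3986 resolution rules a browser applies against `document.baseURI`. The hostnames and page paths below are hypothetical placeholders for the bare and `work.*` server blocks, and `resolve_bundle_url` is an illustrative helper name.

```python
from urllib.parse import urljoin

# Hypothetical page URLs, one per nginx server block (bare + work.* subdomain).
PAGES = [
    "https://pixie.example.com/live/clusters/demo",
    "https://work.pixie.example.com/live/clusters/demo?script=px/cluster",
]

OLD_ENTRY = "https://k8sstormcenter.github.io/pixie/pxl_scripts/bundle.json"
NEW_ENTRY = "/bundle-oss.json"


def resolve_bundle_url(base_uri: str, bundle_url: str) -> str:
    """Resolve one SCRIPT_BUNDLE_URLS entry against the page's base URI."""
    return urljoin(base_uri, bundle_url)


for page in PAGES:
    # The path-absolute entry keeps the page's scheme and host.
    print(resolve_bundle_url(page, NEW_ENTRY))
    # The old absolute entry ignores the base entirely.
    print(resolve_bundle_url(page, OLD_ENTRY))
```

Under these assumptions the new entry always resolves to the host that served the UI, which is why the bundle baked into the proxy image is the one that gets fetched.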
114 changes: 114 additions & 0 deletions src/pxl_scripts/px/dx_evidence_graph/README.md
@@ -0,0 +1,114 @@
# dx_evidence_graph

A Pixie UI dashboard that renders one dx-agent investigation as a
**severity-weighted, all-protocol pod-to-pod attack graph**. Replaces
the latency-weighted HTTP service map in `cluster_overview` for
security work.

* Nodes = pods. Falls back to service → IP, mirroring `net_flow_graph`.
* Edges = the attack path emitted by dx (delivery → egress →
execution → collection → exfil → pivot).
* Display spec: `vispb.Graph`. **`edgeWeightColumn = weight`**
(open-ended UInt16 sum of CRS severity → edge thickness),
**`edgeColorColumn = max_severity`** (discrete 2-5 heat → edge
colour).
* Read source: `forensic_db.dx_attack_graph` via `px.DataFrame`'s
`clickhouse_dsn` kwarg (`src/carnot/planner/objects/dataframe.cc:43`).

## Schema — `forensic_db.dx_attack_graph`

Locked with dx-agent in PR #62 / `entlein/dx#68`. The
`attackgraph.Edge` Go struct is the single source of truth for the
JSON wire format, the ClickHouse row, and the test fixture.

| Column | Type | Role |
|---|---|---|
| `investigation_id` | String | one graph per dx verdict / pivot incident (UI filter key) |
| `ts` | UInt64 | unix nanos |
| `requestor_pod` / `responder_pod` | String | the hop (`ns/pod`); `""` if only an IP is known |
| `requestor_service` / `responder_service` | String | |
| `requestor_ip` / `responder_ip` | String | peer IP when pod unresolved |
| `weight` | UInt16 | Σ CRS severity on the hop — `edgeWeightColumn` |
| `max_severity` | UInt8 | top single-criterion severity (2-5) — `edgeColorColumn` |
| `confidence` | Float32 | verdict confidence |
| `edge_kind` | String | `delivery`/`egress`/`execution`/`collection`/`exfil`/`pivot` |
| `condition` / `criteria` | String | ruled-in condition + criterion label(s) |
| `num_findings` | UInt32 | |
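
As a rough illustration of how `weight` and `max_severity` relate, the sketch below aggregates per-criterion findings into one row per hop. The `Finding` type, its field names, and the sample values are assumptions made for this example; the README only states that `weight` is the sum of CRS severity on the hop and `max_severity` is the top single-criterion severity.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Finding:
    """Hypothetical per-criterion finding on one requestor -> responder hop."""
    requestor_pod: str
    responder_pod: str
    severity: int  # assumed to be in the 2-5 range used by max_severity


def summarise_hops(findings: Iterable[Finding]) -> Dict[Tuple[str, str], Dict[str, int]]:
    """Return {(requestor, responder): {'weight': sum, 'max_severity': max}}."""
    severities = defaultdict(list)
    for f in findings:
        severities[(f.requestor_pod, f.responder_pod)].append(f.severity)
    return {
        hop: {"weight": sum(values), "max_severity": max(values)}
        for hop, values in severities.items()
    }


# Invented findings for two hops.
SAMPLE = [
    Finding("ns/web", "ns/app", 5),
    Finding("ns/web", "ns/app", 3),
    Finding("ns/web", "ns/app", 2),
    Finding("ns/app", "ns/db", 4),
]

hops = summarise_hops(SAMPLE)
for hop, row in hops.items():
    print(hop, row)
```

A hop with many low-severity findings can therefore outweigh a hop with one critical finding on thickness while still showing a cooler colour, which is the reason the two columns are kept separate.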

Table DDL (mirrors `kubescape_logs` partition/TTL convention):

```sql
CREATE TABLE forensic_db.dx_attack_graph ( ...columns above... )
ENGINE = MergeTree
PARTITION BY toYYYYMM(fromUnixTimestamp64Nano(ts))
ORDER BY (investigation_id, requestor_pod, responder_pod)
TTL toDateTime(fromUnixTimestamp64Nano(ts)) + INTERVAL 30 DAY DELETE;
```
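
The partition and TTL expressions in the DDL can be mirrored in plain Python to check which partition a row lands in and when it expires. This is a sketch that assumes the ClickHouse server evaluates the expressions in UTC; the helper names are illustrative.

```python
from datetime import datetime, timedelta, timezone

NANOS_PER_SECOND = 1_000_000_000


def from_unix_nanos(ts: int) -> datetime:
    """Second-resolution UTC datetime for a unix-nanosecond timestamp."""
    return datetime.fromtimestamp(ts // NANOS_PER_SECOND, tz=timezone.utc)


def partition_key(ts: int) -> int:
    """Equivalent of toYYYYMM(...) under the UTC assumption."""
    d = from_unix_nanos(ts)
    return d.year * 100 + d.month


def ttl_expiry(ts: int) -> datetime:
    """Point after which the 30-day TTL makes the row eligible for deletion."""
    return from_unix_nanos(ts) + timedelta(days=30)


# Example timestamp built from a calendar date so the arithmetic is checkable.
noon = datetime(2026, 6, 17, 12, 0, tzinfo=timezone.utc)
ts = int(noon.timestamp()) * NANOS_PER_SECOND + 5  # 5 ns past noon

print(partition_key(ts))
print(ttl_expiry(ts).isoformat())
```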

## Per-rig ClickHouse DSN

The bundled `vis.json` ships with `clickhouse_dsn` **empty** — the
default is intentionally non-credentialed so the bundle stays
portable across clusters. Operators fill the DSN in via the Pixie
UI script-args panel at run time.

For the in-cluster soc deployment the DSN is:

```
forensic_analyst:changeme-analyst@clickhouse-forensic-soc-db.clickhouse.svc.cluster.local:9000/forensic_db
```
Comment on lines +55 to +59

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remove password-bearing DSN examples from documentation.

Line 58 publishes a credential-bearing DSN pattern (user:pass@...) with a concrete password token. Even in docs, this tends to get copied into runtime configs and weakens the security baseline.

Suggested doc change

````diff
-For the in-cluster soc deployment the DSN is:
+For the in-cluster soc deployment, provide DSN via UI script args (no embedded secrets), e.g.:

 ```text
-forensic_analyst:changeme-analyst@clickhouse-forensic-soc-db.clickhouse.svc.cluster.local:9000/forensic_db
+forensic_analyst@clickhouse-forensic-soc-db.clickhouse.svc.cluster.local:9000/forensic_db
````

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 57-57: Fenced code blocks should have a language specified (MD040, fenced-code-language)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `src/pxl_scripts/px/dx_evidence_graph/README.md` around lines 55 - 59, the
README.md file contains a DSN example for in-cluster soc deployment that
includes a plaintext password credential (changeme-analyst) in the connection
string. Remove the password segment from the DSN example in the "For the
in-cluster soc deployment the DSN is:" section by deleting the colon and
password portion before the @ symbol, leaving only the username and host
information. This prevents hardcoded credentials from being copied into runtime
configurations.

Comment on lines +57 to +59

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language tag to the fenced DSN block.

Line 57 opens a fenced code block without a language, which violates MD040 and may fail markdown lint gates.

Suggested fix

````diff
-```
+```text
 forensic_analyst:changeme-analyst@clickhouse-forensic-soc-db.clickhouse.svc.cluster.local:9000/forensic_db
````

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 57-57: Fenced code blocks should have a language specified (MD040, fenced-code-language)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `src/pxl_scripts/px/dx_evidence_graph/README.md` around lines 57 - 59, the
fenced code block containing the ClickHouse forensic database DSN example
(starting with forensic_analyst:changeme-analyst@...) is missing a language tag
on the opening fence, which violates the MD040 markdown lint rule. Add the
"text" language identifier to the opening triple backticks (change "```" to "```text").

Source: Linters/SAST tools


`forensic_analyst` has read-only SELECT on `forensic_db`; same
credential the existing `soc/analysis/px_clickhouse/kubescape/observe.pxl`
script uses for `kubescape_logs`. Override in the UI for other rigs.

## Manual-load prototype

`tools/load_prototype/` is a Go helper that renders the `Edge`
schema from a JSON fixture into a standalone HTML page using
cytoscape.js. Same column→visual mapping the production
`vispb.Graph` spec uses. Useful when ClickHouse isn't reachable
from the UI (offline review, fixture validation).

```bash
go run ./tools/load_prototype \
-fixture fixtures/sample.json \
-investigation_id log4shell-6a32ea57 \
-out /tmp/dx_log4shell.html
```

The fixture in `fixtures/sample.json` is dx-agent's real
log4shell + argocd verdicts from the rig run that locked the
schema. `fixtures/screenshots/dx_log4shell.html` and
`fixtures/screenshots/dx_argocd.html` are the pre-rendered pages
for review without running the tool.

The tool retires once the AE live-write (`WriteAttackGraph` →
`forensic_db.dx_attack_graph`) is on every cluster running this
bundle.
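
The fixture-to-graph step the Go helper performs can be sketched in a few lines of Python. This is not the tool's code: it assumes the fixture's JSON keys match the column names in the schema table, uses invented sample rows, and emits cytoscape.js element objects (`group` plus `data` with `id`, `source`, `target`). The pod, then service, then IP fallback mirrors the node rule stated at the top of this README.

```python
import json
from typing import Dict, List


def endpoint(edge: Dict, side: str) -> str:
    """Pick pod, then service, then IP for side 'requestor' or 'responder'."""
    for suffix in ("pod", "service", "ip"):
        value = edge.get(f"{side}_{suffix}")
        if value:
            return value
    return "unknown"


def to_cytoscape_elements(edges: List[Dict], investigation_id: str) -> List[Dict]:
    """Build cytoscape.js elements (nodes first, then edges) for one investigation."""
    nodes: Dict[str, Dict] = {}
    links: List[Dict] = []
    selected = [e for e in edges if e.get("investigation_id") == investigation_id]
    for i, edge in enumerate(selected):
        src, dst = endpoint(edge, "requestor"), endpoint(edge, "responder")
        for node_id in (src, dst):
            nodes.setdefault(node_id, {"group": "nodes", "data": {"id": node_id}})
        links.append({
            "group": "edges",
            "data": {
                "id": f"e{i}",
                "source": src,
                "target": dst,
                "weight": edge.get("weight", 0),              # edge thickness
                "max_severity": edge.get("max_severity", 0),  # edge colour
                "edge_kind": edge.get("edge_kind", ""),
            },
        })
    return list(nodes.values()) + links


# Invented rows; key names are assumed to mirror the ClickHouse columns.
SAMPLE = [
    {"investigation_id": "demo-1", "requestor_pod": "ns/web", "responder_pod": "ns/db",
     "weight": 9, "max_severity": 5, "edge_kind": "delivery"},
    {"investigation_id": "demo-1", "requestor_pod": "ns/db", "responder_pod": "",
     "responder_ip": "203.0.113.7", "weight": 4, "max_severity": 3, "edge_kind": "exfil"},
    {"investigation_id": "other", "requestor_pod": "ns/a", "responder_pod": "ns/b",
     "weight": 2, "max_severity": 2, "edge_kind": "pivot"},
]

elements = to_cytoscape_elements(SAMPLE, "demo-1")
print(json.dumps(elements, indent=2))
```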

## Deploy

Bundle build path:

1. `//src/pxl_scripts:script_bundle` walks every `*.pxl` + `vis.json`
under `src/pxl_scripts/` and emits `bundle-oss.json`
(`src/pxl_scripts/BUILD.bazel:34`).
2. `//src/cloud/proxy:proxy_server_image` bakes the bundle in as a
container layer at `/bundle`
(`src/cloud/proxy/BUILD.bazel:36`).
3. `skaffold run -f skaffold/skaffold_cloud.yaml` rebuilds the
cloud-proxy image and applies the Deployment.

Vizier / PEM / standalone-pem images are unaffected — this is a
UI-bundle-only change.

## Out of scope for v1

* `conn_stats` overlay (the "render the benign neighbourhood + light
up the attack path" view). Ship the attack-path-only graph first;
add the join in v2 once the visual has been used on a real
incident.
* Time anchoring relative to `ts` rather than free-form `start_time`.
Operators today use `-15m` defaults; a future widget could centre
the window on the investigation's first `ts`.
34 changes: 34 additions & 0 deletions src/pxl_scripts/px/dx_evidence_graph/dx_evidence_graph.pxl
@@ -0,0 +1,34 @@
# Copyright 2018- The Pixie Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# SPDX-License-Identifier: Apache-2.0

''' DX Attack Graph: pod-to-pod hops weighted by dx evidence severity. '''
import px


def dx_attack_graph(start_time: str, clickhouse_dsn: str):
    ''' Read forensic_db.dx_attack_graph and return the edge columns.
    Args:
      @start_time: e.g. "-15m".
      @clickhouse_dsn: user:pass@host:port/db.
    '''
    df = px.DataFrame('dx_attack_graph',
                      clickhouse_dsn=clickhouse_dsn,
                      start_time=start_time)
    return df[['requestor_pod', 'responder_pod',
               'requestor_service', 'responder_service',
               'requestor_ip', 'responder_ip',
               'weight', 'max_severity', 'confidence',
               'edge_kind', 'condition', 'criteria', 'num_findings']]
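
Since the script takes the DSN as a free-form string, a small validator can catch malformed values and keep passwords out of logs before the value reaches `px.DataFrame`. The sketch below is plain Python, not PxL, and assumes only the `user:pass@host:port/db` shape given in the docstring; the type, helper names, and sample DSN are hypothetical.

```python
from typing import NamedTuple


class ClickHouseDSN(NamedTuple):
    user: str
    password: str
    host: str
    port: int
    database: str


def parse_dsn(dsn: str) -> ClickHouseDSN:
    """Split 'user[:pass]@host:port/db'; raise ValueError on any other shape."""
    creds, at, rest = dsn.rpartition("@")
    if not at:
        raise ValueError("DSN must contain '@'")
    user, _, password = creds.partition(":")
    hostport, slash, database = rest.partition("/")
    host, colon, port = hostport.rpartition(":")
    if not (user and host and colon and port.isdigit() and slash and database):
        raise ValueError("expected user[:pass]@host:port/db")
    return ClickHouseDSN(user, password, host, int(port), database)


def redact(dsn: str) -> str:
    """Return the DSN with any password masked, suitable for logging."""
    p = parse_dsn(dsn)
    secret = ":***" if p.password else ""
    return f"{p.user}{secret}@{p.host}:{p.port}/{p.database}"


# Made-up DSN for illustration only.
example = "analyst:s3cret@clickhouse.example.internal:9000/forensic_db"
print(parse_dsn(example))
print(redact(example))
```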