feat(profiling): Complete OTLP profiles implementation with JFR conversion pipeline#10098
Draft
Branch updated from c98c2a0 to 52c579e.
This pull request has been marked as stale because it has not had activity over the past quarter. It will be closed in 7 days if no further activity occurs. Feel free to reopen the PR if you are still working on it.
Add profiling-otel module with core infrastructure for JFR to OTLP profiles conversion:
- Dictionary tables for OTLP compression (StringTable, FunctionTable, LocationTable, StackTable, LinkTable, AttributeTable)
- ProtobufEncoder for hand-coded protobuf wire format encoding
- OtlpProtoFields constants for OTLP profiles proto field numbers
- Unit tests for all dictionary tables and encoder
- Architecture documentation
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
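A dictionary table maps each distinct value to a dense integer index so that samples can refer to it by index instead of repeating it. Below is a minimal sketch of a string table, assuming that index 0 is reserved for an empty sentinel entry (the sentinel fix later in this PR relies on the same convention). `SimpleStringTable` and its methods are hypothetical illustrations, not the module's actual `StringTable` API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical string dictionary: deduplicates strings and hands out dense indices. */
final class SimpleStringTable {
  private final Map<String, Integer> indexByValue = new HashMap<>();
  private final List<String> values = new ArrayList<>();

  SimpleStringTable() {
    // Index 0 is reserved for the empty-string sentinel.
    intern("");
  }

  /** Returns the existing index for value, or appends it and returns the new index. */
  int intern(String value) {
    Integer existing = indexByValue.get(value);
    if (existing != null) {
      return existing;
    }
    int index = values.size();
    values.add(value);
    indexByValue.put(value, index);
    return index;
  }

  String get(int index) {
    return values.get(index);
  }

  int size() {
    return values.size();
  }
}
```

Repeated frames and class names then cost one map lookup and one small integer per reference, which is where the compression comes from.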
- Add JMH benchmark filtering via -PjmhIncludes property in build.gradle.kts
- Update JfrToOtlpConverterBenchmark parameters to {50, 500, 5000} events
- Run comprehensive benchmarks and document actual performance results
- Update BENCHMARKS.md with measured throughput data (Apple M3 Max)
- Update ARCHITECTURE.md with performance characteristics
- Key findings: Stack depth is the primary bottleneck (~60% throughput reduction per 10x increase in stack depth)
- Linear scaling with event count, minimal impact from context count
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
…pport
Reverted Phase 1 optimization attempts that showed no improvement:
- Removed tryGetExisting() optimization from JfrToOtlpConverter
- Deleted tryGetExisting() method from FunctionTable
- The optimization added overhead (2 FunctionKey allocations vs 1)
Added JMH profiling support:
- Added profiling configuration to build.gradle.kts
- Enable with -PjmhProfile=true flag
- Configures stack profiler (CPU sampling) and GC profiler (allocations)
Profiling results reveal actual bottlenecks:
- JFR File I/O: ~20% (jafar-parser, external dependency)
- Protobuf encoding: ~5% (fundamental serialization cost)
- Conversion logic: ~3% (our code)
- Dictionary operations: ~1-2% (NOT the bottleneck)
Key findings:
- Dictionary operations already well-optimized at ~1-2% of runtime
- Modern JVM escape analysis optimizes temporary allocations
- Stack depth is dominant factor (O(n) frame processing)
- HashMap lookups (~10-20ns) dominated by I/O overhead
Updated documentation:
- BENCHMARKS.md: Added profiling section with findings
- ARCHITECTURE.md: Added profiling support and results
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
…ant pool IDs
Leverage JFR's internal stack trace deduplication by caching conversions based on constant pool IDs. This avoids redundant processing of identical stack traces that appear multiple times in profiling data.
Implementation:
- Add @JfrField(raw=true) stackTraceId() methods to all event interfaces (ExecutionSample, MethodSample, ObjectSample, JavaMonitorEnter, JavaMonitorWait)
- Implement HashMap cache in JfrToOtlpConverter with lazy stack trace resolution
- Cache key combines stackTraceId XOR (identityHashCode(chunkInfo) << 32) for chunk-unique identification
- Modify convertStackTrace() to accept Supplier<JfrStackTrace> and check cache before resolution
- Update all event handlers to pass method references (event::stackTrace) instead of resolved stacks
- Add stackDuplicationPercent parameter to JfrToOtlpConverterBenchmark (0%, 70%, 90%)
- Document Phase 5.6: Stack Trace Deduplication Optimization in ARCHITECTURE.md
Performance results:
- 0% stack duplication: 8.1 ops/s (baseline, no cache benefit)
- 70% stack duplication: 14.4 ops/s (+78% improvement, typical production workload)
- 90% stack duplication: 20.5 ops/s (+153% improvement, 2.5x faster for hot-path heavy workloads)
All 82 tests pass. Unique stacks gain nothing from the cache; realistic duplication patterns gain significantly.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
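The commit describes a cache keyed by `stackTraceId XOR (identityHashCode(chunkInfo) << 32)` with lazy stack resolution through a `Supplier`. The sketch below illustrates that idea; `StackIndexCache`, `stackIndex`, and the use of `String[]` in place of `JfrStackTrace` are hypothetical. One detail worth noting: `System.identityHashCode` returns an `int`, and in Java an `int` shifted left by 32 is unchanged, because `int` shift distances are masked to 5 bits. The sketch therefore widens to `long` before shifting. Identity hash codes are also not guaranteed unique, so cross-chunk collisions remain possible.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;
import java.util.function.ToIntFunction;

/** Hypothetical cache from (chunk, JFR stack trace id) to a converted stack index. */
final class StackIndexCache {
  private final Map<Long, Integer> cache = new HashMap<>();
  private int resolutions = 0;

  /** Per-chunk key; the (long) cast is required or the shift is a no-op on int. */
  static long cacheKey(long stackTraceId, Object chunkInfo) {
    return stackTraceId ^ (((long) System.identityHashCode(chunkInfo)) << 32);
  }

  /** Returns the cached stack index, resolving and converting the stack only on a miss. */
  int stackIndex(
      long stackTraceId,
      Object chunkInfo,
      Supplier<String[]> lazyFrames,
      ToIntFunction<String[]> converter) {
    long key = cacheKey(stackTraceId, chunkInfo);
    Integer cached = cache.get(key);
    if (cached != null) {
      return cached; // hit: the stack trace is never resolved
    }
    resolutions++;
    int index = converter.applyAsInt(lazyFrames.get());
    cache.put(key, index);
    return index;
  }

  int resolutions() {
    return resolutions;
  }
}
```

Passing `event::stackTrace` as the supplier means a cache hit skips both constant-pool resolution and frame conversion, which is why the gains grow with the duplication percentage.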
…n Docker unavailable
Use @Testcontainers(disabledWithoutDocker = true) to automatically skip OtlpCollectorValidationTest when Docker is not available instead of failing with IllegalStateException.
This allows the test suite to pass cleanly in environments without Docker while still running all other tests. When Docker is available, these tests run normally.
Result: 82 tests pass; Docker tests are skipped gracefully when Docker is unavailable.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Implement support for the OTLP profiles original_payload and original_payload_format fields (fields 9 and 10) to include the source JFR recording(s) in OTLP output for debugging and compliance verification.
Key features:
- Streaming architecture using SequenceInputStream (recordings are never fully buffered in memory)
- Automatic uber-JFR concatenation for multiple recordings
- Disabled by default per OTLP spec recommendation (size considerations)
- Fluent API: setIncludeOriginalPayload(boolean)
Implementation details:
- Enhanced ProtobufEncoder with streaming writeBytesField(InputStream, long) method
- Single file optimization: direct FileInputStream
- Multiple files: SequenceInputStream chains files without buffering them in memory
- Streams data in 8KB chunks directly into protobuf output
Test coverage:
- Default behavior verification (payload disabled)
- Single file with payload enabled
- Multiple files creating uber-JFR concatenation
- Setting persistence across converter reuse
Documentation:
- Added Phase 6 to ARCHITECTURE.md with usage examples, design decisions, and performance characteristics
- Centralized jafar-parser dependency version in gradle/libs.versions.toml
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
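Streaming a bytes field works because a protobuf length-delimited field only needs its total length up front (the sum of the file sizes), after which the payload can be copied through a fixed buffer. The sketch below assumes that layout. `StreamingBytesField` and its methods are hypothetical; the commit's real method is `ProtobufEncoder.writeBytesField(InputStream, long)`, whose exact signature may differ. In practice the inputs would be `FileInputStream`s.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.SequenceInputStream;
import java.util.Collections;
import java.util.List;

/** Hypothetical helpers for streaming a length-delimited protobuf field. */
final class StreamingBytesField {
  /** Writes an unsigned base-128 varint. */
  static void writeVarint(OutputStream out, long value) throws IOException {
    while ((value & ~0x7FL) != 0) {
      out.write((int) ((value & 0x7F) | 0x80));
      value >>>= 7;
    }
    out.write((int) value);
  }

  /** Writes tag + length, then copies exactly `length` bytes from `in` in 8 KB chunks. */
  static void writeBytesField(OutputStream out, int fieldNumber, InputStream in, long length)
      throws IOException {
    writeVarint(out, ((long) fieldNumber << 3) | 2); // wire type 2 = length-delimited
    writeVarint(out, length);
    byte[] buffer = new byte[8192];
    long remaining = length;
    while (remaining > 0) {
      int n = in.read(buffer, 0, (int) Math.min(buffer.length, remaining));
      if (n < 0) {
        throw new IOException("input ended " + remaining + " bytes early");
      }
      out.write(buffer, 0, n);
      remaining -= n;
    }
  }

  /** Chains several inputs (e.g. JFR files) into one stream without buffering them. */
  static InputStream concat(List<InputStream> parts) {
    return new SequenceInputStream(Collections.enumeration(parts));
  }
}
```

Memory use stays at one 8 KB buffer regardless of how many or how large the recordings are; the price is that the caller must know the total length before writing the header.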
…uration constants
Implement foundation for parallel OTLP profile uploads alongside JFR format.
**Step 1: RecordingData Reference Counting**
Add thread-safe reference counting to support multiple listeners accessing the same RecordingData:
- Add AtomicInteger refCount and volatile boolean released flag
- Add retain() method to increment reference count before passing to additional listeners
- Make release() final with automatic reference counting (decrements and calls doRelease at 0)
- Add protected doRelease() for actual cleanup (called when refcount reaches 0)
- Update all implementations: OpenJdkRecordingData, DatadogProfilerRecordingData, OracleJdkRecordingData, CompositeRecordingData
Reference counting pattern enables multiple uploaders (JFR + OTLP) to safely share RecordingData without double-release or resource leaks. Each listener calls retain() before use and release() when done. Actual cleanup happens only when refcount reaches zero.
**Step 2: OTLP Configuration Constants**
Add configuration property keys to ProfilingConfig for OTLP profile format support:
- profiling.otlp.enabled (default: false) - Enable parallel OTLP upload
- profiling.otlp.include.original.payload (default: false) - Embed source JFR in OTLP
- profiling.otlp.url (default: "") - OTLP endpoint URL (empty = derive from agent URL)
- profiling.otlp.compression (default: "gzip") - Compression type for OTLP upload
Configuration will be read directly from ConfigProvider in OtlpProfileUploader for testability.
Next steps:
- Step 3: Implement OtlpProfileUploader class (reads config from ConfigProvider)
- Step 4: Integrate with ProfilingAgent
- Step 5: Add tests
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add OtlpProfileUploader class implementing RecordingDataListener
- Read configuration from ConfigProvider for testability
- Support GZIP compression (configurable via boolean flag)
- Use JfrToOtlpConverter to transform JFR recordings to OTLP format
- Derive OTLP endpoint from agent URL (port 4318, /v1/profiles)
- Handle both synchronous and asynchronous uploads
- Use TempLocationManager for temp file creation
- Add profiling-otel dependency to profiling-uploader module
- Add basic unit tests for OtlpProfileUploader
Configuration options:
- profiling.otlp.enabled (default: false)
- profiling.otlp.url (default: derived from agent URL)
- profiling.otlp.compression.enabled (default: true)
- profiling.otlp.include.original.payload (default: false)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
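Endpoint derivation can be sketched with `java.net.URI`, assuming the uploader keeps the agent URL's scheme and host and replaces only the port (4318) and path (`/v1/profiles`). The commit implies this but does not spell it out. `OtlpEndpoints.resolve` is a hypothetical name, and the explicit `profiling.otlp.url` value is modeled as a plain string argument rather than a `ConfigProvider` lookup.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical endpoint resolution for OTLP profile uploads. */
final class OtlpEndpoints {
  static final int OTLP_HTTP_PORT = 4318;
  static final String PROFILES_PATH = "/v1/profiles";

  /** Uses the configured URL when non-empty; otherwise swaps the agent URL's port and path. */
  static String resolve(String configuredUrl, String agentUrl) throws URISyntaxException {
    if (configuredUrl != null && !configuredUrl.isEmpty()) {
      return configuredUrl;
    }
    URI agent = new URI(agentUrl);
    // 7-arg constructor: scheme, userInfo, host, port, path, query, fragment.
    return new URI(agent.getScheme(), null, agent.getHost(), OTLP_HTTP_PORT, PROFILES_PATH, null, null)
        .toString();
  }
}
```

For example, an agent URL of `http://localhost:8126` resolves to `http://localhost:4318/v1/profiles`.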
…e counting
Integrate OtlpProfileUploader into ProfilingAgent to enable parallel JFR and OTLP profile uploads when configured. Implements an explicit reference counting pattern for RecordingData to safely support multiple concurrent handlers.
Key changes:
1. ProfilingAgent integration:
- Add OtlpProfileUploader alongside ProfileUploader
- Extract handler methods (handleRecordingData, handleRecordingDataWithDump)
- Use method references instead of capturing lambdas for better performance
- Call retain() once for each handler (dumper, OTLP, JFR)
- Update shutdown hooks to properly clean up the OTLP uploader
2. Explicit reference counting in RecordingData:
- Change initial refcount from 1 to 0 for clarity
- Each handler must call retain() before processing
- Each handler calls release() when done
- doRelease() called only when refcount reaches 0
- Updated javadocs to reflect explicit counting pattern
3. Comprehensive test coverage:
- RecordingDataRefCountingTest validates all handler combinations
- Tests single, dual, and triple handler scenarios
- Verifies thread-safety with concurrent handlers
- Tests error conditions (premature release, retain after release)
- Confirms idempotent release behavior
Benefits:
- Symmetric treatment of all handlers (no special first handler)
- Clear, explicit reference counting (easier to understand and verify)
- No resource leaks or premature cleanup
- Efficient method references (no lambda capture overhead)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
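The explicit retain/release contract can be sketched as below, assuming an initial count of 0, an `IllegalStateException` for a premature `release()` or a `retain()` after cleanup, and a silent no-op for repeated `release()` once cleanup has run. `RefCountedResource` is a hypothetical stand-in for `RecordingData`. Where the commit uses a `volatile boolean released`, this sketch uses an `AtomicBoolean` so that `doRelease()` runs at most once.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical base class illustrating explicit reference counting. */
abstract class RefCountedResource {
  private final AtomicInteger refCount = new AtomicInteger(0); // every handler must retain()
  private final AtomicBoolean released = new AtomicBoolean(false);

  /** Called once per handler before dispatch. */
  public final RefCountedResource retain() {
    if (released.get()) {
      throw new IllegalStateException("retain() after release");
    }
    refCount.incrementAndGet();
    return this;
  }

  /** Called once per handler when done; the last release triggers cleanup. */
  public final void release() {
    if (released.get()) {
      return; // idempotent after cleanup
    }
    int remaining = refCount.decrementAndGet();
    if (remaining < 0) {
      refCount.incrementAndGet(); // undo the bogus decrement
      throw new IllegalStateException("release() without matching retain()");
    }
    if (remaining == 0 && released.compareAndSet(false, true)) {
      doRelease();
    }
  }

  /** Actual cleanup, e.g. deleting temp files. */
  protected abstract void doRelease();
}
```

Because all handlers call `retain()` up front (as `ProfilingAgent` does), the count cannot reach 0 while another handler is still being dispatched. The sketch does not guard against a `retain()` racing with the final `release()`.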
Include OTLP profiles converter and its dependencies in the agent-profiling uber JAR for integration into dd-java-agent.jar. The profiling-otel module and its jafar-parser dependency are now bundled, while shared dependencies (internal-api, components:json) are correctly excluded via the existing excludeShared configuration. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Add command-line interface for testing and validating JFR to OTLP
conversions with real profiling data.
Features:
- Convert single or multiple JFR files to OTLP protobuf or JSON
- Include original JFR payload for validation (optional)
- Merge multiple recordings into single output
- Detailed conversion statistics
Usage:
./gradlew :dd-java-agent:agent-profiling:profiling-otel:convertJfr \
-Pargs="recording.jfr output.pb"
./gradlew :dd-java-agent:agent-profiling:profiling-otel:convertJfr \
-Pargs="--json recording.jfr output.json"
See doc/CLI.md for complete documentation and examples.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Add --pretty flag to control JSON pretty-printing in the CLI converter. By default, JSON output is compact for efficient processing. Use --pretty for human-readable output with indentation.
Usage:
# Compact JSON (default)
./gradlew convertJfr --args="--json input.jfr output.json"
# Pretty-printed JSON
./gradlew convertJfr --args="--json --pretty input.jfr output.json"
The pretty-printer is a simple, dependency-free implementation that adds newlines and 2-space indentation.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
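A dependency-free pretty-printer only needs to track nesting depth and whether it is inside a string literal, so that braces, commas, and escaped quotes inside strings are copied verbatim. The sketch below assumes compact input like the converter's default output. `JsonPrettyPrinter.pretty` is a hypothetical name, not the CLI's actual class.

```java
/** Hypothetical JSON pretty-printer: newlines plus 2-space indentation. */
final class JsonPrettyPrinter {
  static String pretty(String json) {
    StringBuilder out = new StringBuilder(json.length() * 2);
    int indent = 0;
    boolean inString = false;
    boolean escaped = false;
    for (int i = 0; i < json.length(); i++) {
      char c = json.charAt(i);
      if (inString) {
        // Copy string contents verbatim; track escapes so \" does not end the string.
        out.append(c);
        if (escaped) {
          escaped = false;
        } else if (c == '\\') {
          escaped = true;
        } else if (c == '"') {
          inString = false;
        }
        continue;
      }
      switch (c) {
        case '"':
          inString = true;
          out.append(c);
          break;
        case '{':
        case '[': {
          char next = i + 1 < json.length() ? json.charAt(i + 1) : '\0';
          if (next == '}' || next == ']') {
            out.append(c).append(next); // keep empty containers on one line
            i++;
          } else {
            out.append(c).append('\n');
            pad(out, ++indent);
          }
          break;
        }
        case '}':
        case ']':
          out.append('\n');
          pad(out, --indent);
          out.append(c);
          break;
        case ',':
          out.append(c).append('\n');
          pad(out, indent);
          break;
        case ':':
          out.append(": ");
          break;
        default:
          if (!Character.isWhitespace(c)) {
            out.append(c);
          }
      }
    }
    return out.toString();
  }

  private static void pad(StringBuilder out, int indent) {
    for (int k = 0; k < indent; k++) {
      out.append("  ");
    }
  }
}
```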
Integrates OpenTelemetry's profcheck tool to validate that OTLP profiles conform to the specification. This provides automated conformance testing and helps catch encoding bugs early.
Key additions:
- Docker-based profcheck integration (docker/Dockerfile.profcheck)
- Gradle tasks for building the profcheck image and validation
- ProfcheckValidationTest with Testcontainers integration
- Comprehensive documentation in PROFCHECK_INTEGRATION.md
Gradle tasks:
- buildProfcheck: Builds profcheck Docker image from upstream PR
- validateOtlp: Validates OTLP files using profcheck
- Auto-build profcheck image before tests tagged with @Tag("docker")
Test results:
- ✅ testEmptyProfile: Passes validation
- ✅ testAllocationProfile: Passes validation
- ❌ testCpuProfile: Revealed stack_index out of range bugs
- ❌ testMixedProfile: Revealed protobuf wire-format encoding bugs
The test failures are expected and valuable: they uncovered real bugs in the OTLP encoder that need to be fixed.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Dictionary tables (location, function, link, stack, attribute) were omitting their required index 0 sentinel entries from the wire format, causing profcheck validation failures.
Root cause:
1. Dictionary loops started at i=1 instead of i=0, skipping sentinels
2. ProtobufEncoder.writeNestedMessage() had an if (length > 0) check that completely skipped writing empty messages
3. Sentinel entries encode as empty messages (all fields are 0/empty)
4. Result: Index 0 was not present in wire format, causing off-by-one array indexing errors in profcheck validation
Fix:
- Changed ProtobufEncoder.writeNestedMessage() to always write tag+length even for empty messages (required for sentinels)
- Changed all dictionary table loops to start from i=0 to include sentinels
- Added attribute_table encoding (was completely missing)
- Updated JSON encoding to match protobuf encoding
- Fixed test to use correct event type (datadog.ObjectSample)
All profcheck validation tests now pass with "conformance checks passed".
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
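In a repeated message field, each element is written as its own tag + length + body, so the position of an element is determined solely by how many elements precede it. An all-default sentinel has an empty body but must still be emitted as tag plus a zero length, or every later index shifts down by one. The hypothetical `MiniProtoEncoder` below illustrates this. Field number 1 and the body bytes are arbitrary; the real numbers live in `OtlpProtoFields`.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical minimal encoder showing why empty sentinel messages must be written. */
final class MiniProtoEncoder {
  private final ByteArrayOutputStream out = new ByteArrayOutputStream();

  /** Unsigned base-128 varint: 7 bits per byte, high bit set on all but the last byte. */
  void writeVarint(long value) {
    while ((value & ~0x7FL) != 0) {
      out.write((int) ((value & 0x7F) | 0x80));
      value >>>= 7;
    }
    out.write((int) value);
  }

  void writeTag(int fieldNumber, int wireType) {
    writeVarint(((long) fieldNumber << 3) | wireType);
  }

  /** Tag + length + body, with no "if (length > 0)" guard: empty bodies still occupy an index. */
  void writeNestedMessage(int fieldNumber, byte[] body) {
    writeTag(fieldNumber, 2); // wire type 2 = length-delimited
    writeVarint(body.length);
    out.write(body, 0, body.length);
  }

  byte[] toByteArray() {
    return out.toByteArray();
  }
}
```

With the guard removed, a sentinel followed by one real entry encodes as `0A 00 0A 02 08 05`: two elements, so the real entry lands at index 1 as the dictionary expects.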
…rter
This commit adds support for mapping JFR event attributes to OTLP profile
sample attributes, enabling richer profiling data with contextual metadata.
Key changes:
1. Sample Attributes Implementation:
- Added attributeIndices field to SampleData class
- Implemented getSampleTypeAttributeIndex() helper for creating sample type attributes
- Updated all event handlers (CPU, allocation, lock) to include sample.type attribute
- Uses packed repeated int32 format for attribute_indices per proto3 spec
2. ObjectSample Enhancements:
- Added objectClass, size, and weight fields to ObjectSample interface
- Implemented upscaling: sample value = size * weight
- Added alloc.class attribute for allocation profiling
- Maintains backwards compatibility with allocationSize field
3. OTLP Proto Field Number Corrections:
- Fixed Sample field numbers to match official Go module proto:
* stack_index = 1
* values = 2 (was 4)
* attribute_indices = 3 (was 2)
* link_index = 4 (was 3)
* timestamps_unix_nano = 5 (unchanged)
- Corrects discrepancy between proto file and generated Go code
4. Dual Validation System:
- Updated Dockerfile.profcheck to include both protoc and profcheck
- Created validate-profile wrapper script
- Protoc validation is authoritative (official Protocol Buffers compiler)
- Profcheck warnings are captured but don't fail builds
- Documents known profcheck timestamp validation issues
5. Test Updates:
- Updated smoke tests to use new ObjectSample fields (size, weight)
- Modified validation tests to check for protoc validation success
- All validation tests passing with spec-compliant output
Design decisions:
- Measurements (duration, size*weight) are stored as sample VALUES
- Labels/metadata (sample.type, alloc.class) are stored as ATTRIBUTES
- AttributeTable provides automatic deduplication via internString()
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
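The packed repeated int32 format mentioned above for attribute_indices writes one tag (wire type 2), one byte length, and then the values as back-to-back varints, rather than one tag per value. The sketch below uses field 3, matching the attribute_indices number listed above. `PackedInt32.encodePacked` is a hypothetical helper.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical encoder for a packed repeated int32 field. */
final class PackedInt32 {
  static void writeVarint(ByteArrayOutputStream out, long value) {
    while ((value & ~0x7FL) != 0) {
      out.write((int) ((value & 0x7F) | 0x80));
      value >>>= 7;
    }
    out.write((int) value);
  }

  /** One tag, one byte length, then the concatenated varints. */
  static byte[] encodePacked(int fieldNumber, int[] values) {
    ByteArrayOutputStream body = new ByteArrayOutputStream();
    for (int v : values) {
      // int -> long sign-extends, so negative int32 values become 10-byte varints
      // as protobuf requires; dictionary indices are never negative.
      writeVarint(body, v);
    }
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    writeVarint(out, ((long) fieldNumber << 3) | 2);
    writeVarint(out, body.size());
    byte[] bytes = body.toByteArray();
    out.write(bytes, 0, bytes.length);
    return out.toByteArray();
  }
}
```

For indices `[1, 2, 300]` on field 3 this yields `1A 04 01 02 AC 02`: the tag `0x1A`, a 4-byte length, then three varints.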
Fixed profcheck timestamp validation errors and made profcheck validation mandatory to pass alongside protoc validation.
Timestamp Issues Fixed:
- Removed manual startTime field assignments in all test JFR events
- Manual timestamps were being interpreted as JFR ticks (not epoch nanos)
- Let JFR recording system automatically assign correct timestamps
- JFR auto-timestamps are properly converted via chunkInfo.asInstant()
Validation Changes:
- Made profcheck validation mandatory (previously only protoc was required)
- Updated validation script to require both protoc AND profcheck to pass
- Removed special handling for "known attribute_indices bug" (now fixed)
- Updated test assertions to verify both validators pass
- Both validators now cleanly pass for all test profiles
Result: Complete OTLP profiles spec compliance with both:
- protoc (official Protocol Buffers compiler) - structural validation
- profcheck (OpenTelemetry conformance checker) - semantic validation
All tests passing: empty, CPU, allocation, and mixed profiles.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
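JFR stores event timestamps as ticks. A chunk header records the chunk's start time in epoch nanoseconds, its start tick, and the tick frequency, so an epoch-nanos value written directly into `startTime` gets misread as a tick count. The usual conversion is sketched below with a hypothetical helper name; the converter itself goes through `chunkInfo.asInstant()` as noted above.

```java
/** Hypothetical tick-to-epoch conversion using chunk-header fields. */
final class JfrTicks {
  static long toEpochNanos(
      long ticks, long chunkStartTicks, long ticksPerSecond, long chunkStartEpochNanos) {
    long delta = ticks - chunkStartTicks;
    long wholeSeconds = delta / ticksPerSecond;
    long remainderTicks = delta % ticksPerSecond;
    // Split the multiplication so delta * 1e9 cannot overflow on long recordings;
    // assumes ticksPerSecond stays below ~9.2e9.
    return chunkStartEpochNanos
        + wholeSeconds * 1_000_000_000L
        + remainderTicks * 1_000_000_000L / ticksPerSecond;
  }
}
```

For example, with a 1,000 Hz clock, a tick 1,500 past the chunk start maps to 1.5 s after the chunk's epoch start time.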
Added convert-jfr.sh script that provides a simplified interface for converting JFR files to OTLP format without needing to remember Gradle task paths.
Features:
- Automatic compilation if needed
- Simplified command-line interface
- Colored output for better visibility
- File size reporting
- Comprehensive help message
- Error handling with clear messages
Usage:
./convert-jfr.sh recording.jfr output.pb
./convert-jfr.sh --json recording.jfr output.json
./convert-jfr.sh --pretty recording.jfr output.json
./convert-jfr.sh file1.jfr file2.jfr merged.pb
Updated CLI.md documentation with:
- Quick start section featuring the convenience script
- Complete usage examples
- Feature list and when to use the script vs Gradle directly
The script wraps the existing Gradle convertJfr task, providing a more user-friendly interface for development and testing workflows.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Enhanced the conversion script with detailed diagnostic output showing:
- Input file sizes (individual and total)
- Output file size
- Wall-clock conversion time
- Compression ratio (output vs input size)
- Space savings (bytes and percentage)
Usage:
./convert-jfr.sh --diagnostics recording.jfr output.pb
Example output:
[DIAG] Input: recording.jfr (89.3KB)
[DIAG] Total input size: 89.3KB
[DIAG] === Conversion Diagnostics ===
[DIAG] Wall time: 127.3ms
[DIAG] Output size: 45.2KB
[DIAG] Size ratio: 50.6% of input
[DIAG] Savings: 44.1KB (49.4% reduction)
Features:
- Cross-platform file size detection (macOS and Linux)
- Nanosecond-precision timing
- Human-readable size formatting (B, KB, MB, GB)
- Automatic compression ratio calculation
- Color-coded diagnostic output (cyan)
Updated CLI.md with:
- --diagnostics option documentation
- Example output showing diagnostic information
- Updated feature list
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
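The script itself is shell. The arithmetic behind the size and ratio lines can be sketched in Java as follows, assuming 1024-based units and one decimal place. The helper names are hypothetical, and the script's exact rounding may differ. With the example figures, 45.2 KB out of 89.3 KB gives the 50.6% ratio, and the savings percentage is its complement, 49.4%.

```java
import java.util.Locale;

/** Hypothetical helpers mirroring the diagnostics arithmetic. */
final class SizeDiagnostics {
  /** Formats a byte count with 1024-based units: B, KB, MB, GB. */
  static String humanReadable(long bytes) {
    if (bytes < 1024) {
      return bytes + "B";
    }
    String[] units = {"KB", "MB", "GB"};
    double value = bytes;
    int unit = -1;
    while (value >= 1024 && unit < units.length - 1) {
      value /= 1024;
      unit++;
    }
    return String.format(Locale.ROOT, "%.1f%s", value, units[unit]);
  }

  /** Output size as a percentage of input size; savings are 100 minus this value. */
  static double ratioPercent(long inputBytes, long outputBytes) {
    return 100.0 * outputBytes / inputBytes;
  }
}
```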
…iagnostics
Added convert-jfr.sh convenience wrapper for JFR to OTLP conversion with comprehensive diagnostic output and cross-platform compatibility.
Features:
- Simple CLI interface wrapping Gradle convertJfr task
- Support for all converter options (--json, --pretty, --include-payload)
- --diagnostics flag showing detailed metrics:
  * Input/output file sizes with human-readable formatting
  * Actual conversion time (parsed from converter output)
  * Compression ratios and savings
- Colored output for better readability
- Cross-platform file size detection (Linux and macOS)
- Automatic compilation via Gradle
Implementation:
- Parses converter's own timing output to show actual conversion time (e.g., 141ms) instead of total Gradle execution time (13+ seconds)
- Uses try-fallback approach for stat command (GNU stat → BSD stat)
- Works on Linux, macOS with GNU coreutils, and native macOS
Documentation:
- Added "Convenience Script" section to doc/CLI.md
- Usage examples and feature list
- Diagnostic output examples
Example:
./convert-jfr.sh --diagnostics recording.jfr output.pb
Shows: 141ms conversion time, 2.0MB → 2.2KB (99.9% reduction)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
…speedup
Replaced Gradle-based execution with a fat jar approach for dramatic performance improvement in the JFR to OTLP conversion script.
Performance improvement:
- Previous: ~13+ seconds (Gradle overhead)
- New: ~0.4 seconds (< 0.5s total)
- Speedup: ~31x faster
- Actual conversion time: ~120ms (unchanged)
Implementation:
- Added shadowJar task to build.gradle.kts with minimization
- Modified convert-jfr.sh to use fat jar directly via java -jar
- Added automatic rebuild detection based on source file mtimes
- Jar only rebuilds when source files are newer than jar
- Cross-platform mtime detection (GNU stat → BSD stat fallback)
- Suppressed harmless SLF4J warnings (defaults to NOP logger)
Features:
- Automatic jar rebuild only when source files change
- Fast startup (no Gradle overhead)
- Clean output with SLF4J warnings filtered
- All existing diagnostics and features preserved
Fat jar details:
- Size: 1.9MB (minimized with shadow plugin)
- Location: build/libs/profiling-otel-*-cli.jar
- Main-Class manifest entry for direct execution
- Excludes unnecessary SLF4J service providers
Documentation:
- Updated CLI.md to highlight performance improvements
- Noted fat jar usage instead of Gradle task
Example:
./convert-jfr.sh --diagnostics recording.jfr output.pb
Total time: 0.4s (vs 13+ seconds with Gradle)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
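The rebuild rule ("jar only rebuilds when source files are newer than jar") is implemented in the shell script with `stat`. The same check can be sketched in Java with `java.nio.file`, assuming a missing jar also triggers a rebuild. `JarFreshness.needsRebuild` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.stream.Stream;

/** Hypothetical mtime-based staleness check for the CLI fat jar. */
final class JarFreshness {
  /** True if the jar is missing or any regular file under sourceDir is newer than it. */
  static boolean needsRebuild(Path jar, Path sourceDir) throws IOException {
    if (!Files.exists(jar)) {
      return true;
    }
    FileTime jarTime = Files.getLastModifiedTime(jar);
    try (Stream<Path> files = Files.walk(sourceDir)) {
      return files
          .filter(Files::isRegularFile)
          .anyMatch(p -> {
            try {
              return Files.getLastModifiedTime(p).compareTo(jarTime) > 0;
            } catch (IOException e) {
              return true; // unreadable source: rebuild to be safe
            }
          });
    }
  }
}
```

Like the script, this compares only timestamps, so a source edit that restores an older mtime (for example, some `git checkout` flows) will not trigger a rebuild.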
Simplified the conversion script output to avoid duplicate information:
Default mode (no flags):
- Single concise line: "[SUCCESS] Converted: output.pb (45.2KB, 127ms)"
- No verbose converter output shown
- Perfect for scripting and quick conversions
Diagnostics mode (--diagnostics):
- Shows converter's detailed output (files, format, time)
- Enhanced diagnostics section with compression metrics
- Clear input→output flow visualization
- Space savings calculations
Changes:
- Removed duplicate "Converting..." and "Conversion complete" messages
- Eliminated redundant output file info in default mode
- Consolidated size/time reporting
- Renamed section to "Enhanced Diagnostics" to distinguish from converter output
Example outputs:
Default:
[SUCCESS] Converted: output.pb (45.2KB, 127ms)
With --diagnostics:
[DIAG] Input: recording.jfr (89.3KB)
Converting 1 JFR file(s) to OTLP format...
Adding: recording.jfr
Conversion complete!
Output: output.pb
Format: PROTO
Size: 45.2 KB
Time: 127 ms
[DIAG] === Enhanced Diagnostics ===
[DIAG] Input → Output: 89.3KB → 45.2KB
[DIAG] Compression: 50.6% of original
[DIAG] Space saved: 44.1KB (49.4% reduction)
Documentation updated in CLI.md with both output examples.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- IOLogger: datadog.trace.relocate.api → datadog.logging
- getFile() → getPath() in OtlpProfileUploader and JfrToOtlpConverter
- ScrubbedRecordingData: override doRelease() instead of final release()
- libs.versions.toml: restore jafar-tools entry for profiling-scrubber
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Branch updated from 52c579e to 340ae63.
Benchmarks
Startup
Summary: Found 0 performance improvements and 0 performance regressions! Performance is the same for 59 metrics, 12 unstable metrics.
Startup time reports for petclinic
gantt
title petclinic - global startup overhead: candidate=1.62.0-SNAPSHOT~2ac3d50b80, baseline=1.62.0-SNAPSHOT~c72f06780f
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.058 s) : 0, 1057846
Total [baseline] (11.007 s) : 0, 11007201
Agent [candidate] (1.066 s) : 0, 1065583
Total [candidate] (11.189 s) : 0, 11188886
section appsec
Agent [baseline] (1.26 s) : 0, 1259761
Total [baseline] (11.001 s) : 0, 11001103
Agent [candidate] (1.265 s) : 0, 1265377
Total [candidate] (11.008 s) : 0, 11008154
section iast
Agent [baseline] (1.235 s) : 0, 1234875
Total [baseline] (11.328 s) : 0, 11328182
Agent [candidate] (1.234 s) : 0, 1234219
Total [candidate] (11.341 s) : 0, 11340962
section profiling
Agent [baseline] (1.189 s) : 0, 1189232
Total [baseline] (11.082 s) : 0, 11082069
Agent [candidate] (1.188 s) : 0, 1188392
Total [candidate] (11.112 s) : 0, 11112255
gantt
title petclinic - break down per module: candidate=1.62.0-SNAPSHOT~2ac3d50b80, baseline=1.62.0-SNAPSHOT~c72f06780f
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.245 ms) : 0, 1245
crashtracking [candidate] (1.237 ms) : 0, 1237
BytebuddyAgent [baseline] (632.627 ms) : 0, 632627
BytebuddyAgent [candidate] (637.115 ms) : 0, 637115
AgentMeter [baseline] (29.523 ms) : 0, 29523
AgentMeter [candidate] (29.812 ms) : 0, 29812
GlobalTracer [baseline] (249.021 ms) : 0, 249021
GlobalTracer [candidate] (250.429 ms) : 0, 250429
AppSec [baseline] (32.297 ms) : 0, 32297
AppSec [candidate] (32.625 ms) : 0, 32625
Debugger [baseline] (59.875 ms) : 0, 59875
Debugger [candidate] (60.195 ms) : 0, 60195
Remote Config [baseline] (625.989 µs) : 0, 626
Remote Config [candidate] (595.217 µs) : 0, 595
Telemetry [baseline] (8.012 ms) : 0, 8012
Telemetry [candidate] (8.823 ms) : 0, 8823
Flare Poller [baseline] (8.387 ms) : 0, 8387
Flare Poller [candidate] (8.424 ms) : 0, 8424
section appsec
crashtracking [baseline] (1.256 ms) : 0, 1256
crashtracking [candidate] (1.233 ms) : 0, 1233
BytebuddyAgent [baseline] (672.82 ms) : 0, 672820
BytebuddyAgent [candidate] (676.515 ms) : 0, 676515
AgentMeter [baseline] (12.138 ms) : 0, 12138
AgentMeter [candidate] (12.17 ms) : 0, 12170
GlobalTracer [baseline] (248.548 ms) : 0, 248548
GlobalTracer [candidate] (249.384 ms) : 0, 249384
AppSec [baseline] (186.48 ms) : 0, 186480
AppSec [candidate] (186.61 ms) : 0, 186610
Debugger [baseline] (65.891 ms) : 0, 65891
Debugger [candidate] (66.598 ms) : 0, 66598
Remote Config [baseline] (573.663 µs) : 0, 574
Remote Config [candidate] (579.897 µs) : 0, 580
Telemetry [baseline] (7.86 ms) : 0, 7860
Telemetry [candidate] (7.962 ms) : 0, 7962
Flare Poller [baseline] (3.493 ms) : 0, 3493
Flare Poller [candidate] (3.482 ms) : 0, 3482
IAST [baseline] (24.267 ms) : 0, 24267
IAST [candidate] (24.329 ms) : 0, 24329
section iast
crashtracking [baseline] (1.23 ms) : 0, 1230
crashtracking [candidate] (1.215 ms) : 0, 1215
BytebuddyAgent [baseline] (810.22 ms) : 0, 810220
BytebuddyAgent [candidate] (810.38 ms) : 0, 810380
AgentMeter [baseline] (11.435 ms) : 0, 11435
AgentMeter [candidate] (11.437 ms) : 0, 11437
GlobalTracer [baseline] (239.944 ms) : 0, 239944
GlobalTracer [candidate] (240.13 ms) : 0, 240130
AppSec [baseline] (27.933 ms) : 0, 27933
AppSec [candidate] (28.481 ms) : 0, 28481
Debugger [baseline] (65.932 ms) : 0, 65932
Debugger [candidate] (64.829 ms) : 0, 64829
Remote Config [baseline] (546.729 µs) : 0, 547
Remote Config [candidate] (532.54 µs) : 0, 533
Telemetry [baseline] (7.888 ms) : 0, 7888
Telemetry [candidate] (7.803 ms) : 0, 7803
Flare Poller [baseline] (3.449 ms) : 0, 3449
Flare Poller [candidate] (3.399 ms) : 0, 3399
IAST [baseline] (30.107 ms) : 0, 30107
IAST [candidate] (29.874 ms) : 0, 29874
section profiling
ProfilingAgent [baseline] (94.328 ms) : 0, 94328
ProfilingAgent [candidate] (94.543 ms) : 0, 94543
crashtracking [baseline] (1.195 ms) : 0, 1195
crashtracking [candidate] (1.186 ms) : 0, 1186
BytebuddyAgent [baseline] (694.146 ms) : 0, 694146
BytebuddyAgent [candidate] (693.801 ms) : 0, 693801
AgentMeter [baseline] (9.219 ms) : 0, 9219
AgentMeter [candidate] (9.246 ms) : 0, 9246
GlobalTracer [baseline] (207.463 ms) : 0, 207463
GlobalTracer [candidate] (207.346 ms) : 0, 207346
AppSec [baseline] (33.09 ms) : 0, 33090
AppSec [candidate] (32.946 ms) : 0, 32946
Debugger [baseline] (66.288 ms) : 0, 66288
Debugger [candidate] (65.923 ms) : 0, 65923
Remote Config [baseline] (591.996 µs) : 0, 592
Remote Config [candidate] (577.193 µs) : 0, 577
Telemetry [baseline] (7.83 ms) : 0, 7830
Telemetry [candidate] (7.804 ms) : 0, 7804
Flare Poller [baseline] (3.517 ms) : 0, 3517
Flare Poller [candidate] (3.595 ms) : 0, 3595
Profiling [baseline] (94.889 ms) : 0, 94889
Profiling [candidate] (95.091 ms) : 0, 95091
Startup time reports for insecure-bank
gantt
title insecure-bank - global startup overhead: candidate=1.62.0-SNAPSHOT~2ac3d50b80, baseline=1.62.0-SNAPSHOT~c72f06780f
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.055 s) : 0, 1054596
Total [baseline] (8.882 s) : 0, 8882385
Agent [candidate] (1.061 s) : 0, 1061090
Total [candidate] (8.87 s) : 0, 8870455
section iast
Agent [baseline] (1.243 s) : 0, 1242502
Total [baseline] (9.634 s) : 0, 9634118
Agent [candidate] (1.23 s) : 0, 1229987
Total [candidate] (9.576 s) : 0, 9576132
gantt
title insecure-bank - break down per module: candidate=1.62.0-SNAPSHOT~2ac3d50b80, baseline=1.62.0-SNAPSHOT~c72f06780f
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.227 ms) : 0, 1227
crashtracking [candidate] (1.236 ms) : 0, 1236
BytebuddyAgent [baseline] (631.427 ms) : 0, 631427
BytebuddyAgent [candidate] (635.978 ms) : 0, 635978
AgentMeter [baseline] (29.497 ms) : 0, 29497
AgentMeter [candidate] (29.91 ms) : 0, 29910
GlobalTracer [baseline] (247.81 ms) : 0, 247810
GlobalTracer [candidate] (250.618 ms) : 0, 250618
AppSec [baseline] (32.257 ms) : 0, 32257
AppSec [candidate] (32.498 ms) : 0, 32498
Debugger [baseline] (58.659 ms) : 0, 58659
Debugger [candidate] (59.361 ms) : 0, 59361
Remote Config [baseline] (593.41 µs) : 0, 593
Remote Config [candidate] (607.426 µs) : 0, 607
Telemetry [baseline] (7.979 ms) : 0, 7979
Telemetry [candidate] (8.791 ms) : 0, 8791
Flare Poller [baseline] (8.963 ms) : 0, 8963
Flare Poller [candidate] (5.903 ms) : 0, 5903
section iast
crashtracking [baseline] (1.247 ms) : 0, 1247
crashtracking [candidate] (1.223 ms) : 0, 1223
BytebuddyAgent [baseline] (817.434 ms) : 0, 817434
BytebuddyAgent [candidate] (808.554 ms) : 0, 808554
AgentMeter [baseline] (11.551 ms) : 0, 11551
AgentMeter [candidate] (11.412 ms) : 0, 11412
GlobalTracer [baseline] (240.613 ms) : 0, 240613
GlobalTracer [candidate] (239.001 ms) : 0, 239001
IAST [baseline] (29.365 ms) : 0, 29365
IAST [candidate] (29.184 ms) : 0, 29184
AppSec [baseline] (29.748 ms) : 0, 29748
AppSec [candidate] (26.756 ms) : 0, 26756
Debugger [baseline] (64.452 ms) : 0, 64452
Debugger [candidate] (64.676 ms) : 0, 64676
Remote Config [baseline] (528.078 µs) : 0, 528
Remote Config [candidate] (539.326 µs) : 0, 539
Telemetry [baseline] (7.781 ms) : 0, 7781
Telemetry [candidate] (7.743 ms) : 0, 7743
Flare Poller [baseline] (3.434 ms) : 0, 3434
Flare Poller [candidate] (3.353 ms) : 0, 3353
Load
Summary: Found 0 performance improvements and 3 performance regressions! Performance is the same for 15 metrics, 18 unstable metrics.
Request duration reports for insecure-bank
gantt
title insecure-bank - request duration [CI 0.99] : candidate=1.62.0-SNAPSHOT~2ac3d50b80, baseline=1.62.0-SNAPSHOT~c72f06780f
dateFormat X
axisFormat %s
section baseline
no_agent (1.25 ms) : 1237, 1263
. : milestone, 1250,
iast (3.319 ms) : 3272, 3366
. : milestone, 3319,
iast_FULL (5.999 ms) : 5938, 6059
. : milestone, 5999,
iast_GLOBAL (3.665 ms) : 3602, 3727
. : milestone, 3665,
profiling (2.1 ms) : 2081, 2120
. : milestone, 2100,
tracing (1.961 ms) : 1944, 1978
. : milestone, 1961,
section candidate
no_agent (1.261 ms) : 1249, 1272
. : milestone, 1261,
iast (3.411 ms) : 3365, 3457
. : milestone, 3411,
iast_FULL (5.965 ms) : 5906, 6024
. : milestone, 5965,
iast_GLOBAL (3.669 ms) : 3615, 3722
. : milestone, 3669,
profiling (357.437 µs) : 350, 365
. : milestone, 357,
tracing (1.974 ms) : 1957, 1991
. : milestone, 1974,
Request duration reports for petclinic

petclinic - request duration [CI 0.99]: candidate=`1.62.0-SNAPSHOT~2ac3d50b80`, baseline=`1.62.0-SNAPSHOT~c72f06780f` (CI bounds in the same unit as the mean)

| Variant | Baseline, mean [99% CI] | Candidate, mean [99% CI] |
|---|---|---|
| no_agent | 18.229 ms [18.044, 18.414] | 18.958 ms [18.769, 19.147] |
| appsec | 18.781 ms [18.593, 18.969] | 18.49 ms [18.310, 18.671] |
| code_origins | 17.752 ms [17.575, 17.929] | 18.707 ms [18.523, 18.890] |
| iast | 18.05 ms [17.871, 18.229] | 17.856 ms [17.684, 18.028] |
| profiling | 19.227 ms [19.037, 19.418] | 255.456 µs [245, 266] |
| tracing | 17.669 ms [17.494, 17.843] | 17.603 ms [17.434, 17.773] |
Dacapo

Summary: Found 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metric.

Execution time for tomcat

tomcat - execution time [CI 0.99]: candidate=`1.62.0-SNAPSHOT~2ac3d50b80`, baseline=`1.62.0-SNAPSHOT~c72f06780f` (CI bounds in ms)

| Variant | Baseline, mean [99% CI] | Candidate, mean [99% CI] |
|---|---|---|
| no_agent | 1.491 ms [1.479, 1.503] | 1.489 ms [1.478, 1.501] |
| appsec | 3.844 ms [3.622, 4.066] | 3.851 ms [3.625, 4.077] |
| iast | 2.286 ms [2.216, 2.357] | 2.282 ms [2.212, 2.352] |
| iast_GLOBAL | 2.321 ms [2.251, 2.392] | 2.331 ms [2.260, 2.401] |
| profiling | 2.101 ms [2.046, 2.156] | 2.109 ms [2.053, 2.164] |
| tracing | 2.097 ms [2.043, 2.151] | 2.1 ms [2.046, 2.154] |
Execution time for biojava

biojava - execution time [CI 0.99]: candidate=`1.62.0-SNAPSHOT~2ac3d50b80`, baseline=`1.62.0-SNAPSHOT~c72f06780f`

| Variant | Baseline | Candidate |
|---|---|---|
| no_agent | 15.618 s | 15.067 s |
| appsec | 14.91 s | 14.752 s |
| iast | 18.599 s | 18.773 s |
| iast_GLOBAL | 18.106 s | 18.118 s |
| profiling | 14.75 s | 14.772 s |
| tracing | 14.973 s | 15.189 s |
- Swap log messages in DatadogProfiler set/clear span context
- Fix timestamp precision loss (millis→nanos) in JfrToOtlpConverter
- Fix TOCTOU: move convert() inside try block before temp file deletion
- Remove dead handleRecordingData/handleRecordingDataWithDump methods
- Fix retain leak in OTLP listener lambda on upload exception
- Fix retain/release TOCTOU race in RecordingData with synchronized
- Eliminate URL derivation duplication in OtlpProfileUploader
- Fix pipe exit code masking in convert-jfr.sh
- Clarify DatadogProfilingScope.close() no-op comment

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
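Two of the fixes above concern reference counting: the retain/release race in `RecordingData` and the retain leaked when an upload throws. The following is a minimal sketch of that pattern, assuming a plain integer reference count guarded by `synchronized`; the class `RefCountedRecording` and its methods are hypothetical and are not the PR's `RecordingData` API.

```java
/** Hypothetical reference-counted payload; not the PR's RecordingData. */
final class RefCountedRecording {
    private int refCount = 1;         // the creator holds the first reference
    private boolean released = false; // true once the underlying resource is freed
    private int releaseCount = 0;     // how many times the resource was freed (should stay <= 1)

    /** Adds a reference; fails if the resource was already freed. */
    synchronized RefCountedRecording retain() {
        if (released) {
            throw new IllegalStateException("already released");
        }
        refCount++;
        return this;
    }

    /** Drops a reference; frees the resource exactly once when the count reaches zero. */
    synchronized void release() {
        if (released) {
            return; // extra releases are ignored instead of double-freeing
        }
        if (--refCount == 0) {
            released = true;
            releaseCount++; // real code would delete the temp file or free buffers here
        }
    }

    synchronized boolean isReleased() {
        return released;
    }

    synchronized int releaseCount() {
        return releaseCount;
    }

    public static void main(String[] args) throws InterruptedException {
        RefCountedRecording data = new RefCountedRecording();
        Thread[] listeners = new Thread[8];
        for (int i = 0; i < listeners.length; i++) {
            final int id = i;
            data.retain(); // take the reference before handing the data to another thread
            listeners[i] = new Thread(() -> {
                try {
                    if (id % 2 == 0) {
                        throw new RuntimeException("simulated upload failure");
                    }
                } catch (RuntimeException e) {
                    // an exporter would log the failed upload here
                } finally {
                    data.release(); // always runs, so a failed upload cannot leak its reference
                }
            });
            listeners[i].start();
        }
        data.release(); // drop the creator's reference
        for (Thread t : listeners) {
            t.join();
        }
        System.out.println(data.isReleased() + " " + data.releaseCount());
    }
}
```

Holding the same monitor for the check and the update closes the window in which one thread reads the count, another releases it to zero, and the first then retains freed data; the `finally` block is what keeps an exception in the listener from leaking the reference.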
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
convertStackTrace signature changed; replace reflection with file-based API

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Share ParsingContext across parse calls to avoid a ServiceLoader lookup on every open
- Cache FunctionKey and LocationKey hashCode (previously boxed the fields into an Object[] on every lookup)
- Reuse ProtobufEncoder across convert() calls via reset()

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
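Reusing the encoder via `reset()` pairs naturally with the direct `byte[]` + cursor design mentioned in the performance notes. The following is a minimal sketch, assuming a growable array, standard protobuf base-128 varints and length-delimited strings; `CursorProtoWriter` and its methods are hypothetical and are not the PR's `ProtobufEncoder`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical protobuf writer backed by a byte[] and a cursor; not the PR's ProtobufEncoder. */
final class CursorProtoWriter {
    private static final int WIRETYPE_VARINT = 0;
    private static final int WIRETYPE_LEN = 2;

    private byte[] buf;
    private int pos; // index of the next free byte in buf

    CursorProtoWriter(int initialCapacity) {
        buf = new byte[Math.max(16, initialCapacity)];
    }

    /** Rewinds the cursor so the grown backing array is reused for the next message. */
    void reset() {
        pos = 0;
    }

    int size() {
        return pos;
    }

    byte[] toByteArray() {
        return Arrays.copyOf(buf, pos);
    }

    private void ensureCapacity(int extra) {
        if (pos + extra > buf.length) {
            buf = Arrays.copyOf(buf, Math.max(buf.length * 2, pos + extra));
        }
    }

    /** Base-128 varint: 7 payload bits per byte, high bit set on every byte but the last. */
    void writeVarint(long value) {
        ensureCapacity(10); // a 64-bit varint needs at most 10 bytes
        while ((value & ~0x7FL) != 0) {
            buf[pos++] = (byte) ((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        buf[pos++] = (byte) value;
    }

    void writeTag(int fieldNumber, int wireType) {
        writeVarint(((long) fieldNumber << 3) | wireType);
    }

    void writeUint64Field(int fieldNumber, long value) {
        writeTag(fieldNumber, WIRETYPE_VARINT);
        writeVarint(value);
    }

    void writeStringField(int fieldNumber, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeTag(fieldNumber, WIRETYPE_LEN);
        writeVarint(utf8.length);
        ensureCapacity(utf8.length);
        System.arraycopy(utf8, 0, buf, pos, utf8.length);
        pos += utf8.length;
    }
}
```

For example, `writeUint64Field(1, 150)` produces the bytes `08 96 01`, the standard varint example from the protobuf encoding guide. Because `reset()` only rewinds the cursor, the array grown for one profile is reused for the next instead of being reallocated, and no per-byte `OutputStream.write` call is involved.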
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
What Does This Do
Introduces `profiling-otel`, a new `agent-profiling` submodule implementing a complete JFR-to-OTLP profiles converter:

- `datadog.ExecutionSample` (CPU), `datadog.MethodSample` (wall-clock), `datadog.ObjectSample` (allocation), `jdk.JavaMonitorEnter`/`Wait` (lock contention)
- … `{stack_index, attribute_indices, link_index}` identity
- … (`convert-jfr.sh`) with automatic rebuild and diagnostics mode

Motivation
R&D groundwork for OTLP profile export. This establishes the JFR-to-OTLP conversion pipeline needed to emit Datadog profiles in the OpenTelemetry profiles format. The converter is not yet wired into the agent upload path.
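The `{stack_index, attribute_indices, link_index}` identity listed above can be read as a hash key for merging duplicate samples. The following is a minimal sketch, assuming duplicates are merged by summing their values (the PR may instead accumulate per-sample timestamps) and that attribute indices arrive in a canonical order; `SampleKey` and `SampleAggregator` are hypothetical names. The key also computes its hash code once in the constructor, the same idea as the `FunctionKey`/`LocationKey` change in the commits.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sample identity: same stack, attribute set and link means "same sample". */
final class SampleKey {
    final int stackIndex;
    final int[] attributeIndices; // assumed to be in a canonical (e.g. sorted) order
    final int linkIndex;
    private final int hash;       // cached: HashMap calls hashCode() on every lookup

    SampleKey(int stackIndex, int[] attributeIndices, int linkIndex) {
        this.stackIndex = stackIndex;
        this.attributeIndices = attributeIndices.clone(); // defensive copy keeps the key immutable
        this.linkIndex = linkIndex;
        int h = stackIndex;
        h = 31 * h + Arrays.hashCode(this.attributeIndices);
        h = 31 * h + linkIndex;
        this.hash = h;
    }

    @Override
    public int hashCode() {
        return hash;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SampleKey)) return false;
        SampleKey other = (SampleKey) o;
        return hash == other.hash // cheap reject before comparing the array
            && stackIndex == other.stackIndex
            && linkIndex == other.linkIndex
            && Arrays.equals(attributeIndices, other.attributeIndices);
    }
}

/** Hypothetical aggregator that merges samples sharing the same identity. */
final class SampleAggregator {
    private final Map<SampleKey, Long> values = new HashMap<>();

    void add(int stackIndex, int[] attributeIndices, int linkIndex, long value) {
        values.merge(new SampleKey(stackIndex, attributeIndices, linkIndex), value, Long::sum);
    }

    int distinctSamples() {
        return values.size();
    }

    long valueOf(int stackIndex, int[] attributeIndices, int linkIndex) {
        return values.getOrDefault(new SampleKey(stackIndex, attributeIndices, linkIndex), 0L);
    }
}
```

Because the stack, attributes and link are already interned as dictionary indices, the key holds only ints, so equality checks stay cheap even for deep stacks.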
Additional Notes
This is experimental R&D work; the implementation may still change.
Performance snapshot (JDK 21, macOS Darwin 25.4.0):
`ByteArrayOutputStream.write` eliminated (13.7% → 0%) by switching to a direct `byte[]` + cursor approach.

Contributor Checklist
- … `type:` and (`comp:` or `inst:`) labels in addition to any other useful labels
- … `close`, `fix`, or any linking keywords when referencing an issue

Jira ticket: N/A (R&D)
Note: Once your PR is ready to merge, add it to the merge queue by commenting `/merge`. `/merge -c` cancels the queue request. `/merge -f --reason "reason"` skips all merge queue checks; please use this judiciously, as some checks do not run at the PR level. For more information, see this doc.