Our style is derived from the
Uber Go Style Guide.
This document covers CockroachDB-specific conventions only; readers are expected
to know standard Go and the Uber guide. Use crlfmt -w -tab 2 <file>.go (not
gofmt) to format code: 100-column code lines, 80-column comments, tab width 2.
We use crlfmt's rules for wrapping long function signatures.
While it is fairly perscriptive, it does not add or remove repeated identical
argument types, e.g. start int, end int vs start, end int; the latter form
with the omitted specifier should only be used on a single line no argument
should appear by itself on a line with no type (confusing and brittle to edits).
- Enums: start at
iota + 1unless the zero value is meaningful. - Error flow: reduce nesting — handle errors early and return, keeping the
happy path at the lowest indentation level. Reduce variable scope by using
if err := f(); err != nilwhere possible. - Mutexes: embed
sync.Mutexdirectly in private types; use a namedmufield for exported types. - Channels: size should be one or unbuffered (zero). Any other size requires high scrutiny and justification.
- Slice/map ownership: document whether a function captures or copies its slice/map arguments.
- Empty slices: prefer
vardeclaration (not[]int{}).nilis a valid empty slice. Check emptiness withlen(s) == 0, nots == nil. - Defer: always use
deferfor releasing locks, closing files, and similar cleanup. - Struct initialization: always specify field names. Use
&T{}instead ofnew(T). - String performance: prefer
strconvoverfmtfor primitive conversions. Avoid repeated[]byte("...")conversions in loops. - Type assertions: always use the "comma ok" idiom.
- Functional options: use the
Optioninterface pattern at API boundaries; use option structs for internal functions.
- Data structure comments belong at the declaration. Explain purpose, lifecycle, field interactions, and what code initializes/accesses each field.
- Function declaration comments focus on inputs, outputs, and contract -- not implementation.
- Algorithmic comments belong inside function bodies. Separate processing phases and explain why, not what.
- In struct, API, and protobuf definitions, there should be more comments than code.
- Do not narrate code ("this increments x", "now we iterate over strings").
- Block comments (standalone line) use full sentences with capitalization and punctuation. Inline comments (end of code line) are lowercase without terminal punctuation.
- Overview and design comments must be understandable with zero knowledge of the code. Minimize new terminology; define new terms immediately.
- Factually incorrect comments are bugs -- fix immediately or file an issue.
- In reviews, prefix grammar suggestions with "nit:".
For bool literals passed to functions, and in some cases other simple literals
like nil, 0, or 1, it can be helpful to add a C-style comment with the
parameter name, e.g. printInfo(ctx, txn, 1 /* depth */, false /* verbose */).
When doing so the comment always uses the parameter name verbatim -- do not
negate it and do not substitute a different word. Another good option in such
cases is to define a named const, e.g. const verbose, notVerbose = true, false
to pass, or even have the function take a typed arg e.g. type Verbosity bool.
From roachpb/metadata.proto -- the enum-level comment provides a birds-eye
view of joint configs; each value explains its role in replication changes.
// ReplicaType identifies which raft activities a replica participates in. In
// normal operation, VOTER_FULL and LEARNER are the only used states. However,
// atomic replication changes require a transition through a "joint config"; in
// this joint config, the VOTER_DEMOTING and VOTER_INCOMING types are used as
// well to denote voters which are being downgraded to learners and newly added
// by the change, respectively.
//
// All voter types indicate a replica that participates in all raft activities,
// including voting for leadership and committing entries. In a joint config, two
// separate majorities are required: one from the set of replicas that have
// either type VOTER or VOTER_OUTGOING or VOTER_DEMOTING, as well as that of the
// set of types VOTER and VOTER_INCOMING.
enum ReplicaType {
VOTER_FULL = 0;
VOTER_INCOMING = 2;
VOTER_OUTGOING = 3;
}From replica_command.go -- comments explain lease semantics and range-merge
edge cases, not just control flow.
func (r *Replica) executeAdminCommandWithDescriptor(
ctx context.Context, updateDesc func(*roachpb.RangeDescriptor) error,
) *roachpb.Error {
// Retry forever as long as we see errors we know will resolve.
retryOpts := base.DefaultRetryOptions()
// Randomize quite a lot just in case someone else also interferes with us
// in a retry loop. Note that this is speculative; there wasn't an incident
// that suggested this.
retryOpts.RandomizationFactor = 0.5
var lastErr error
for retryable := retry.StartWithCtx(ctx, retryOpts); retryable.Next(); {
// The replica may have been destroyed since the start of the retry loop.
// We need to explicitly check this condition. Having a valid lease, as we
// verify below, does not imply that the range still exists: even after a
// range has been merged into its left-hand neighbor, its final lease
// (i.e., the lease we have in r.mu.state.Lease) can remain valid
// indefinitely.
if _, err := r.IsDestroyed(); err != nil {
return roachpb.NewError(err)
}
// Admin commands always require the range lease to begin (see
// executeAdminBatch), but we may have lost it while in this retry loop.
// Without the lease, a replica's local descriptor can be arbitrarily
// stale, which will result in a ConditionFailedError. To avoid this, we
// make sure that we still have the lease before each attempt.
if _, pErr := r.redirectOnOrAcquireLease(ctx); pErr != nil {
return pErr
}Order by receiver, then rough call order; constructors after type definition.
- Names must be unique across the entire project, because otherwise
goimportsmay auto-import the wrong package. - Use the parent name plus a suffix to achieve uniqueness:
server/serverpb,kv/kvserver,util/contextutil,util/testutil. - Protobuf definitions go in a
xxxxpbsub-directory of the parent package (e.g.pkg/server/serverpb,pkg/sql/sessiondatapb).
Interface extraction: Extract an interface into a leaf sub-package that both sides can import.
// Package bapi defines the interface.
package bapi
type B interface { Bar() }
// Package a imports only the interface.
package a
import "b/bapi"
func Foo(b bapi.B) { b.Bar() }
// Package b imports a and implements the interface.
package b
import "a"
type bImpl struct{}
func (x bImpl) Bar() { a.Foo(x) }Ensure package a has a mock implementation for its own unit tests.
Dependency injection: Use when the interface approach is impractical.
- In package
a:var BarFn = func() { /* placeholder */ } - In package
b:func init() { a.BarFn = Bar }
The package exposing the injection slot must pass its unit tests independently, with valid default behavior when the slot is not populated.
X_test package trick: Test files in package X can declare
package X_test, which is allowed to import packages that transitively depend
on X. For example, testserver depends on pkg/server which depends on
pkg/sql, so tests needing a test server use package sql_test or
package server_test.
We use the datadriven package as an alternative to table-driven tests. One of the most used implementations is "logic tests" which execute SQL and compare results to golden files.
A well-written test should observe and assert only behavior directly relevant
to the functionality under test. Overly broad assertions (e.g. selecting all
system tables instead of only tables the test created) make tests brittle and
train engineers to blindly update them, defeating their purpose. A logic test
should choose its SELECT and WHERE clause so that the output is fully
determined by the test's own constructed conditions -- not by unrelated factors
like total system table count, OS version, table IDs, or range IDs.
This is particularly important to consider intentionally when working with tests
that have a --rewrite option as this can easily capture more output in what is
being tested than intended.
We use cockroachdb/errors, a superset
of Go's standard errors package that integrates with cockroachdb/redact.
errors.New("connection failed") // static string
errors.Newf("invalid value: %d", val) // formatted
errors.AssertionFailedf("expected non-nil pointer") // assertion failure
errors.Wrap(err, "opening file") // add context
errors.Wrapf(err, "connecting to %s", addr) // formatted contextKeep context succinct; avoid "failed to" which piles up:
// Bad: "failed to x: failed to y: failed to create store: the error"
return errors.Newf("failed to create new store: %s", err)
// Good: "x: y: new store: the error"
return errors.Wrap(err, "new store")errors.WithHint(err, "check your network connection")
errors.WithDetail(err, "request payload was 2.5MB")
errors.WithSafeDetails(err, "node_id=%d", nodeID) // safe for Sentry reportsif errors.Is(err, ErrNotFound) { /* ... */ } // sentinel errors
var nfErr *NotFoundError
if errors.As(err, &nfErr) { /* ... */ } // typed errors| Scenario | Approach |
|---|---|
| No additional context needed | Return original error |
| Adding context | errors.Wrap or errors.Wrapf |
| Passing through goroutine channel | errors.WithStack on both ends |
| Callers don't need to detect this error | errors.Newf |
| Hide original cause | errors.Handled or errors.Opaque |
Prefer %v over %s when formatting errors that might be nil. A nil error
formatted with %s renders as %!s(<nil>), while %v renders as <nil>.
How errors should affect process and session lifecycle:
-
User input / query errors (e.g.
SELECT invalid,SELECT 1/0): Return a regular error with an appropriate SQLSTATE code. Do not kill the session or the process. -
Unexpected condition scoped to one query (unreachable code, precondition failure): Return
errors.AssertionFailedf(...). Crash reports are sent automatically. Do not kill the session or the process. -
Candidate future feature (unsupported parameter combination, unexpected but valid input hitting a
default/elseclause): Usepgerror.UnimplementedWithIssue(...)orpgerror.Unimplemented(...). See "Unimplemented features" below. -
Invalid session-scoped state (unreachable code operating on session state): Propagate an assertion error to the client. Kill or make the session read-only. Do not kill the process.
-
Invalid state on a read-only or non-persisting path (unexpected disk data, bad subsystem return): Propagate an assertion error, also call
log.Errorf. Ensure no data is persisted. Do not kill the process. -
Invalid state on a data-persisting path (post-condition failure during write, corruption in KV storage): Call
log.Fatalf. This kills the process and sends a crash report automatically.
SQLSTATE is the stable, documented error API -- not message text. Use the same code PostgreSQL would use in an equivalent situation. Only invent new codes when PostgreSQL has no equivalent; respect the two-character category prefix. Verify codes with SQL logic tests.
CockroachDB-specific codes:
- XX000 -- internal error; auto-derived for assertion failures, triggers a crash report when the error flows back to the client.
- XXUUU -- auto-chosen when no SQLSTATE is set. Reduce these over time.
- XXA00 -- txn committed but a schema change op failed; manual intervention likely needed.
- 40001 -- serialization error. Txn did not commit; can be retried.
- 40003 -- statement completion unknown. Txn may or may not have committed; manual intervention likely needed.
See pkg/sql/pgwire/pgcode/codes.go for the full list.
- Messages, hints, and details can be changed freely without release notes.
- SQLSTATE values are the stable contract. New codes or splitting one code into multiple alternatives requires a release note.
- Message: lowercase start, no trailing period, no newlines. Keep it short and descriptive.
- Hint / Detail: full sentences (capital start, period at end). May be multi-line. Hints tell the user what to do; details elaborate on what happened.
Do not include arbitrarily large strings in error payloads -- they cause
excessive memory use and truncated crash reports. When in doubt, truncate
to a prefix and append a unicode ellipsis (…).
Mark unsupported-but-plausible functionality with
pgerror.UnimplementedWithIssue(issueNum, ...) or
pgerror.Unimplemented(...). Label the tracking issue with docs-todo,
known-limitation, and X-anchored-telemetry. Unimplemented errors get
their own non-crash telemetry automatically.
Merging incomplete implementations is acceptable as long as every
unimplemented code path uses unimplemented.New(...) (or the variants
above). Track each gap with a GitHub issue and a // TODO(#NNNNN)
comment at the call site.
Data in logs and errors is unsafe (PII-laden) by default and will be
redacted before being sent to Cockroach Labs. Mark information as safe to
preserve observability. Always use cockroachdb/errors (not fmt.Errorf) so
that error messages are automatically redactable.
- Format string literals in
errors.New/Newf/Wrap/log.Infof/etc. - Booleans, integers, floats,
time.Time,time.Duration. - The redactable contents of existing
errorvalues when wrapped. - Everything else is unsafe by default.
| Scenario | Approach |
|---|---|
Type contains only IDs, counts, or other non-PII (e.g. NodeID) |
Implement SafeValue and update the allowlist in pkg/testutils/lint/passes/redactcheck/redactcheck.go. |
| Type has a mix of safe and unsafe fields | Implement SafeFormatter (preferred). |
Prefer SafeFormatter over SafeValue -- it is more precise, does not
require linter allowlist changes, and is resistant to someone later adding
unsafe fields to the type.
// Before: entire output is considered unsafe.
func (m MetricSnap) String() string {
suffix := ""
if m.ConnsRefused > 0 {
suffix = fmt.Sprintf(", refused %d conns", m.ConnsRefused)
}
return fmt.Sprintf("infos %d/%d sent/received, bytes %dB/%dB sent/received%s",
m.InfosSent, m.InfosReceived,
m.BytesSent, m.BytesReceived,
suffix)
}
// After: output is safe (all fields are integers, format strings are literals).
func (m MetricSnap) SafeFormat(w redact.SafePrinter, _ rune) {
w.Printf("infos %d/%d sent/received, bytes %dB/%dB sent/received",
m.InfosSent, m.InfosReceived,
m.BytesSent, m.BytesReceived)
if m.ConnsRefused > 0 {
w.Printf(", refused %d conns", m.ConnsRefused)
}
}
func (m MetricSnap) String() string { return redact.StringWithoutMarkers(m) }redact.Sprint()/redact.Sprintf()-- create aRedactableStringfrom mixed safe/unsafe values.redact.StringBuilder-- build aRedactableStringprogrammatically.- Store
redact.RedactableStringin struct fields that will appear in logs, instead of plainstring.
- Don't use
redact.Safe()/errors.Safe()/log.Safe()-- these are fragile promises that break when someone changes the argument's type. - Don't cast a
stringtoRedactableStringmanually -- useredact.Sprintfinstead. - Don't use
SafeValueon complex types -- someone may later add an unsafe field without noticing.
Use echotest to lock down SafeFormat output:
func TestMyType_SafeFormat(t *testing.T) {
v := MyType{ID: 1, Name: "sensitive"}
redacted := string(redact.Sprint(v))
echotest.Require(t, redacted, datapathutils.TestDataPath(t, t.Name()))
}Run with -rewrite to generate the initial testdata file, then verify markers
appear around unsafe fields.
CockroachDB uses a gogoproto fork.
Proto definitions live in .proto files; generated .pb.go files are
regenerated by ./dev gen protobuf or ./dev build short.
- Don't use
required-- it breaks backward/forward compatibility between nodes running different binaries. - Use
gogoproto.nullableso the compiled field type is the value itself (not a pointer). Absence is represented as the zero value. Exceptions: when you need to distinguish zero from absent, or when the message type is very large and pointer copying is cheaper. - Don't use
optional-- usegogoproto.nullableinstead if you need presence semantics. - Avoid
types.Anyandjsontypes -- they are very hard to decode when inspecting debug zips.
Gotcha: Empty proto fields can have non-zero marshalling overhead on hot paths. Use the omitEmpty field to ensure truly empty encoding.
When defining result column labels for statements or virtual tables:
- Use consistent casing; use underscores to separate words (consistent with
information_schema), e.g.start_keynotstartkey. - The same concept across different statements should use the same label, e.g.
variableandvaluematch betweenSHOW ALL CLUSTER SETTINGSandSHOW SESSION ALL. - Labels must be usable without quoting. Avoid SQL keywords:
table_namenot"table",index_namenot"index". Never name a column"user"-- useuser_name. - Avoid abbreviations unless universally used in spoken English (
idis fine,locis not). - For "primary key" columns, disambiguate overloaded words:
zone_id,job_id,table_idinstead of justid. - For objects with multiple handle types (ID, name), disambiguate in the label:
table_namenottable, sotable_idcan be added later. - Reuse labels from
information_schemaor existingSHOWstatements when they match the principles above.