perf: replace Uint8Array lookup tables with regex in buildUrl #345

Merged
yusukebe merged 1 commit into v2 from perf-v2-simplify-url
Apr 17, 2026

@usualoma
Member

I made things quite complicated in #310, but in this case, regular expressions seem to be (slightly) faster.

Since this also reduces the amount of code, I’d like to include this refactoring at the end of v2.

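benchmark

I compared three implementations:

  • Current implementation (Uint8Array lookup tables + charCodeAt loops)
  • Several regular expressions (this PR)
  • A hybrid that keeps the Uint8Array loop for host validation and uses regex for the URL

In the end-to-end benchmark, this PR’s approach was slightly faster, even though the old loop still wins the host-only group. (And the code was simpler, too.)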

benchmark                                    avg (min … max) p75 / p99    (min … top 1%)
------------------------------------------------------------ -------------------------------
• host validation (fast path)
------------------------------------------------------------ -------------------------------
Old: Uint8Array + charCodeAt loop             150.01 ns/iter 152.59 ns       █
                                     (132.95 ns … 200.40 ns) 181.89 ns       █▄
                                     (  0.10  b … 197.16  b)   0.46  b ▁▂▆▃▅███▆▅▄▅▃▂▁▂▁▂▂▂▁

New: regex (reValidHost)                      236.52 ns/iter 244.36 ns  █
                                     (222.97 ns … 429.22 ns) 293.46 ns ▆█▆
                                     (  0.10  b … 197.61  b)   0.56  b ███▇▅▅▇▇▇▃▂▂▁▂▂▁▁▁▁▁▁

Hybrid: Uint8Array host + regex URL           268.48 ns/iter 273.12 ns   █
                                     (233.17 ns … 537.30 ns) 409.20 ns  ▂██▄
                                     (  0.10  b … 377.06  b)   1.43  b ▃████▆▅▃▄▂▂▁▂▂▁▂▁▁▁▁▁

• URL validation (fast path)
------------------------------------------------------------ -------------------------------
Old: Uint8Array + charCodeAt loop             416.14 ns/iter 424.17 ns  █
                                     (397.24 ns … 518.66 ns) 471.43 ns ███▄▅  ▃
                                     (248.43  b … 621.62  b) 257.87  b █████████▆▅▂▄▃▃▂▂▃▂▂▂

New: regex (reValidRequestUrl + reDotSegment) 392.50 ns/iter 421.70 ns   ▇█
                                     (362.21 ns … 485.88 ns) 439.09 ns   ██▄           ▅▄
                                     ( 70.85  b … 505.59  b) 255.75  b ▃████▄▅▄▃▄▃▄▂▃▁▆██▅▃▁

Hybrid: Uint8Array host + regex URL           379.04 ns/iter 386.05 ns   █
                                     (351.70 ns … 525.60 ns) 455.40 ns  ▆█▇
                                     (248.32  b … 545.37  b) 257.46  b ▄████▇▅▄▂▂▂▁▅▆▅▁▂▂▂▁▁

• buildUrl end-to-end (host × URL)
------------------------------------------------------------ -------------------------------
Old: Uint8Array + charCodeAt loops              4.19 µs/iter   4.29 µs      ▂    ▂█▂██
                                         (3.80 µs … 4.94 µs)   4.52 µs    ▅ █▅  ▅█████   ▅
                                     (  0.09  b …   0.11  b)   0.10  b ▇▇▇█▁██▇▁██████▇▇▁█▁▇

New: regex                                      3.52 µs/iter   3.62 µs  ██        ▄
                                         (3.30 µs … 3.90 µs)   3.88 µs  ████      █ ▅     █
                                     (  0.10  b …   0.11  b)   0.10  b ██████▅▅▅▅▅█▁█▁▁▅▁▅█▅

Hybrid: Uint8Array host + regex URL             3.73 µs/iter   3.88 µs                  ▄█
                                         (3.47 µs … 3.95 µs)   3.93 µs   ▅▅▅  ▅         ██ ▅
                                     (  0.10  b …   0.11  b)   0.10  b █▁████▅█▁▅▁██▅▅▅▅████
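
To make the "fast path" in these group names concrete, here is a minimal sketch of which inputs the three patterns accept. The regular expressions are copied from url-buildurl.mjs; `takesFastPath` is a hypothetical helper written for this illustration only, and it ignores the empty-URL and missing-leading-slash branches that `buildUrlNew` handles separately.

```javascript
// Patterns copied from the benchmark script in this PR.
const reValidRequestUrl = /^\/[!#$&-;=?-\[\]_a-z~]*$/
const reDotSegment = /\/\.\.?(?:[/?#]|$)/
const reValidHost = /^[a-z0-9._-]+(?::(?:[1-5]\d{3,4}|[6-9]\d{3}))?$/

// Hypothetical helper (not part of the PR): true when the input would be
// returned as-is, false when it would be handed to `new URL()`.
function takesFastPath(host, incomingUrl) {
  return (
    reValidHost.test(host) &&
    reValidRequestUrl.test(incomingUrl) &&
    !reDotSegment.test(incomingUrl)
  )
}

const samples = [
  ['localhost:3000', '/path?key=value'], // 4-digit port
  ['example.com:12345', '/'], // 5-digit port starting with 1-5
  ['example.com:80', '/'], // port shorter than 4 digits
  ['Example.com', '/'], // uppercase letter in host
  ['localhost', '/a/../b'], // '..' path segment
  ['localhost', '/a/..b'], // dots that are not a whole segment
  ['localhost', '/caf\u00e9'], // non-ASCII character
]

for (const [host, url] of samples) {
  console.log(host, url, takesFastPath(host, url) ? 'fast path' : 'new URL()')
}
```

Under these patterns, short ports such as `:80` and hosts with uppercase letters are not rejected; they are simply left to `new URL()` for normalization.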
url-buildurl.mjs
// Benchmark: buildUrl — Uint8Array lookup tables vs regex
//
// Compares the two approaches used in commit 1441383, plus a hybrid:
//   - Old: Uint8Array + charCodeAt loops for host & URL validation
//   - New: Precompiled regex (reValidHost, reValidRequestUrl, reDotSegment)
//   - Hybrid: Uint8Array + charCodeAt loop for the host, regex for the URL
//
// Usage: node benchmarks/url-buildurl.mjs

import { bench, group, run } from 'mitata'

// ============================================================
// Test data
// ============================================================
const hosts = [
  'localhost',
  'localhost:3000',
  'example.com',
  'example.com:8080',
  'my-app.example.com:4567',
  'sub.domain.example.co.jp:12345',
  'a',
  'my_host.local:1234',
]

const incomingUrls = [
  '/',
  '/path/to/resource',
  '/path?key=value&foo=bar',
  '/api/v2/users/123/posts/456/comments?page=1&limit=20&sort=created_at#section',
  '/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/w/x/y/z',
  '/~user/path_name/file-name.html',
  '/assets/js/app.min.js?v=1234567890',
  '/search?q=hello+world&lang=en&page=1',
]

// ============================================================
// Old: Uint8Array + charCodeAt loops
// ============================================================
const allowedRequestUrlChar = new Uint8Array(128)
for (let c = 0x30; c <= 0x39; c++) allowedRequestUrlChar[c] = 1
for (let c = 0x41; c <= 0x5a; c++) allowedRequestUrlChar[c] = 1
for (let c = 0x61; c <= 0x7a; c++) allowedRequestUrlChar[c] = 1
{
  const chars = "-./:?#[]@!$&'()*+,;=~_"
  for (let i = 0; i < chars.length; i++) allowedRequestUrlChar[chars.charCodeAt(i)] = 1
}

const safeHostChar = new Uint8Array(128)
for (let c = 0x30; c <= 0x39; c++) safeHostChar[c] = 1
for (let c = 0x61; c <= 0x7a; c++) safeHostChar[c] = 1
{
  const chars = '.-_:'
  for (let i = 0; i < chars.length; i++) safeHostChar[chars.charCodeAt(i)] = 1
}

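// True for '/', '?' or '#', the characters that can end a path segment
const isPathDelimiter = (c) => c === 0x2f || c === 0x3f || c === 0x23

// True when the '.' at dotIndex starts a whole '.' or '..' path segment
function hasDotSegment(url, dotIndex) {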
  const prev = dotIndex === 0 ? 0x2f : url.charCodeAt(dotIndex - 1)
  if (prev !== 0x2f) return false
  const nextIndex = dotIndex + 1
  if (nextIndex === url.length) return true
  const next = url.charCodeAt(nextIndex)
  if (isPathDelimiter(next)) return true
  if (next !== 0x2e) return false
  const nextNextIndex = dotIndex + 2
  if (nextNextIndex === url.length) return true
  return isPathDelimiter(url.charCodeAt(nextNextIndex))
}

function buildUrlOld(scheme, host, incomingUrl) {
  const url = `${scheme}://${host}${incomingUrl}`

  let needsHostValidationByURL = false
  for (let i = 0, len = host.length; i < len; i++) {
    const c = host.charCodeAt(i)
    if (c > 0x7f || safeHostChar[c] === 0) {
      needsHostValidationByURL = true
      break
    }
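    if (c === 0x3a) {
      // ':' starts the port. Only 4-digit ports, or 5-digit ports whose
      // first digit is 1-5, stay on the fast path; everything else is
      // left to `new URL()`.
      i++
      const firstDigit = host.charCodeAt(i)
      if (
        firstDigit < 0x31 || // before '1'
        firstDigit > 0x39 || // after '9'
        i + 4 > len || // fewer than 4 characters remain
        i + (firstDigit < 0x36 ? 5 : 4) < len // too many characters remain
      ) {
        needsHostValidationByURL = true
        break
      }
      // Every remaining character must be a digit '0'-'9'
      for (; i < len; i++) {
        const c = host.charCodeAt(i)
        if (c < 0x30 || c > 0x39) {
          needsHostValidationByURL = true
          break
        }
      }
    }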
  }

  if (needsHostValidationByURL) {
    return new URL(url).href
  } else if (incomingUrl.length === 0) {
    return url + '/'
  } else {
    if (incomingUrl.charCodeAt(0) !== 0x2f) {
      return 'invalid'
    }
    for (let i = 1, len = incomingUrl.length; i < len; i++) {
      const c = incomingUrl.charCodeAt(i)
      if (
        c > 0x7f ||
        allowedRequestUrlChar[c] === 0 ||
        (c === 0x2e && hasDotSegment(incomingUrl, i))
      ) {
        return new URL(url).href
      }
    }
    return url
  }
}

// ============================================================
// New: Precompiled regex
// ============================================================
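// '/' followed only by characters allowed in the allowedRequestUrlChar table
const reValidRequestUrl = /^\/[!#$&-;=?-\[\]_a-z~]*$/
// A '/.' or '/..' segment that ends at '/', '?', '#' or the end of the string
const reDotSegment = /\/\.\.?(?:[/?#]|$)/
// Lowercase host, optionally followed by a 4-digit port or a 5-digit port starting with 1-5
const reValidHost = /^[a-z0-9._-]+(?::(?:[1-5]\d{3,4}|[6-9]\d{3}))?$/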

function buildUrlNew(scheme, host, incomingUrl) {
  const url = `${scheme}://${host}${incomingUrl}`

  if (!reValidHost.test(host)) {
    return new URL(url).href
  } else if (incomingUrl.length === 0) {
    return url + '/'
  } else {
    if (incomingUrl.charCodeAt(0) !== 0x2f) {
      return 'invalid'
    }
    if (!reValidRequestUrl.test(incomingUrl) || reDotSegment.test(incomingUrl)) {
      return new URL(url).href
    }
    return url
  }
}

// ============================================================
// Hybrid: Uint8Array host + regex URL
// ============================================================
function buildUrlHybrid(scheme, host, incomingUrl) {
  const url = `${scheme}://${host}${incomingUrl}`

  // Host validation: same loop as in buildUrlOld
  let needsHostValidationByURL = false
  for (let i = 0, len = host.length; i < len; i++) {
    const c = host.charCodeAt(i)
    if (c > 0x7f || safeHostChar[c] === 0) {
      needsHostValidationByURL = true
      break
    }
    if (c === 0x3a) {
      i++
      const firstDigit = host.charCodeAt(i)
      if (
        firstDigit < 0x31 ||
        firstDigit > 0x39 ||
        i + 4 > len ||
        i + (firstDigit < 0x36 ? 5 : 4) < len
      ) {
        needsHostValidationByURL = true
        break
      }
      for (; i < len; i++) {
        const c = host.charCodeAt(i)
        if (c < 0x30 || c > 0x39) {
          needsHostValidationByURL = true
          break
        }
      }
    }
  }

  if (needsHostValidationByURL) {
    return new URL(url).href
  } else if (incomingUrl.length === 0) {
    return url + '/'
  } else {
    if (incomingUrl.charCodeAt(0) !== 0x2f) {
      return 'invalid'
    }
    if (!reValidRequestUrl.test(incomingUrl) || reDotSegment.test(incomingUrl)) {
      return new URL(url).href
    }
    return url
  }
}

// ============================================================
// Correctness check
// ============================================================
const scheme = 'https'
for (const host of hosts) {
  for (const incoming of incomingUrls) {
    const old = buildUrlOld(scheme, host, incoming)
    const nw = buildUrlNew(scheme, host, incoming)
    const hyb = buildUrlHybrid(scheme, host, incoming)
    if (old !== nw || old !== hyb) {
      console.error(`MISMATCH host="${host}" url="${incoming}": old=${old} new=${nw} hybrid=${hyb}`)
      process.exit(1)
    }
  }
}
console.log('Correctness check passed.\n')

// ============================================================
// Benchmark
// ============================================================
group('host validation (fast path)', () => {
  bench('Old: Uint8Array + charCodeAt loop', () => {
    for (const host of hosts) buildUrlOld(scheme, host, '/')
  })
  bench('New: regex (reValidHost)', () => {
    for (const host of hosts) buildUrlNew(scheme, host, '/')
  })
  bench('Hybrid: Uint8Array host + regex URL', () => {
    for (const host of hosts) buildUrlHybrid(scheme, host, '/')
  })
})

group('URL validation (fast path)', () => {
  bench('Old: Uint8Array + charCodeAt loop', () => {
    for (const url of incomingUrls) buildUrlOld(scheme, 'localhost:3000', url)
  })
  bench('New: regex (reValidRequestUrl + reDotSegment)', () => {
    for (const url of incomingUrls) buildUrlNew(scheme, 'localhost:3000', url)
  })
  bench('Hybrid: Uint8Array host + regex URL', () => {
    for (const url of incomingUrls) buildUrlHybrid(scheme, 'localhost:3000', url)
  })
})

group('buildUrl end-to-end (host × URL)', () => {
  bench('Old: Uint8Array + charCodeAt loops', () => {
    for (const host of hosts) {
      for (const url of incomingUrls) buildUrlOld(scheme, host, url)
    }
  })
  bench('New: regex', () => {
    for (const host of hosts) {
      for (const url of incomingUrls) buildUrlNew(scheme, host, url)
    }
  })
  bench('Hybrid: Uint8Array host + regex URL', () => {
    for (const host of hosts) {
      for (const url of incomingUrls) buildUrlHybrid(scheme, host, url)
    }
  })
})

await run()
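
The correctness check in the script only covers the sample hosts and URLs. As a complementary sketch, the lookup table and the regex character class can be compared over every ASCII code. The table and pattern below are rebuilt the same way as in url-buildurl.mjs; `findMismatches` is a hypothetical helper and not part of the PR.

```javascript
// Lookup table rebuilt as in the benchmark script.
const allowedRequestUrlChar = new Uint8Array(128)
for (let c = 0x30; c <= 0x39; c++) allowedRequestUrlChar[c] = 1 // 0-9
for (let c = 0x41; c <= 0x5a; c++) allowedRequestUrlChar[c] = 1 // A-Z
for (let c = 0x61; c <= 0x7a; c++) allowedRequestUrlChar[c] = 1 // a-z
for (const ch of "-./:?#[]@!$&'()*+,;=~_") {
  allowedRequestUrlChar[ch.charCodeAt(0)] = 1
}

// Pattern copied from the benchmark script.
const reValidRequestUrl = /^\/[!#$&-;=?-\[\]_a-z~]*$/

// Hypothetical helper: returns every ASCII code on which the table and the
// regex character class disagree.
function findMismatches() {
  const mismatches = []
  for (let c = 0; c < 128; c++) {
    const byTable = allowedRequestUrlChar[c] === 1
    const byRegex = reValidRequestUrl.test('/' + String.fromCharCode(c))
    if (byTable !== byRegex) mismatches.push(c)
  }
  return mismatches
}

console.log(findMismatches())
```

An empty array means the two approaches agree on every single ASCII character. Non-ASCII input is outside this loop, but both implementations send it to `new URL()`.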

@usualoma
Member Author

Hi @yusukebe
Would you mind reviewing this?

@yusukebe
Member

Hi @usualoma

Thank you! RegExp is fast and simple! By the way, which runtime did you benchmark it on? Node.js or Bun?

@usualoma
Member Author

Hi @yusukebe,

Thanks for checking!

It's Node.js. I've confirmed the same trend on both 20.20.2 and 24.14.1.

% node benchmarks/url-buildurl.mjs
Correctness check passed.

clk: ~4.57 GHz
cpu: Apple M5 Pro
runtime: node 24.14.1 (arm64-darwin)

benchmark                                    avg (min … max) p75 / p99    (min … top 1%)
------------------------------------------------------------ -------------------------------
• host validation (fast path)
------------------------------------------------------------ -------------------------------
Old: Uint8Array + charCodeAt loop             142.14 ns/iter 146.64 ns ▆█▃ ▄
                                     (133.16 ns … 309.14 ns) 165.32 ns █████▄▂▆▇▇    ▅
                                     (  0.10  b … 153.16  b)   0.48  b ██████████▅▃▂▂█▆▁▂▁▁▁

New: regex (reValidHost)                      227.14 ns/iter 228.37 ns  █▅
                                     (222.90 ns … 347.19 ns) 244.97 ns  ██
                                     (  0.10  b … 569.87  b)   1.40  b ▇███▆▅▅▅▃▂▃▂▂▁▁▂▁▂▁▁▁

Hybrid: Uint8Array host + regex URL           247.04 ns/iter 252.74 ns  ██▇▃▅▃█  ▆
                                     (234.83 ns … 295.25 ns) 275.76 ns ▄████████▅█▅▂
                                     (  0.10  b … 277.19  b)   1.17  b █████████████▅▂▁▁▁▁▁▂

• URL validation (fast path)
------------------------------------------------------------ -------------------------------
Old: Uint8Array + charCodeAt loop             424.83 ns/iter 430.69 ns     █
                                     (404.58 ns … 518.88 ns) 510.31 ns ▄█▂▃█▄
                                     ( 35.86  b … 658.25  b) 256.12  b ██████▄▃▃▃▄▁▁▁▁▁▁▁▁▁▁

New: regex (reValidRequestUrl + reDotSegment) 393.21 ns/iter 426.39 ns  █▃
                                     (360.43 ns … 456.29 ns) 447.17 ns  ██▆           ▆█
                                     (248.64  b … 649.75  b) 257.03  b ▂███▆▃▃▃▃▁▂▂▂▃▅██▄▂▂▂

Hybrid: Uint8Array host + regex URL           367.65 ns/iter 370.08 ns   █▃
                                     (348.62 ns … 477.58 ns) 442.67 ns   ███
                                     (248.32  b … 493.91  b) 256.89  b ▂█████▅▅▃▂▂▂▁▁▂▂▁▁▁▁▁

• buildUrl end-to-end (host × URL)
------------------------------------------------------------ -------------------------------
Old: Uint8Array + charCodeAt loops              3.97 µs/iter   4.03 µs    ▃▃   █▃▃
                                         (3.77 µs … 4.33 µs)   4.32 µs ▂▂▇██▇▂ ███
                                     (  0.09  b …   0.11  b)   0.10  b ███████▆███▁▆▁▆▁▁▆▆▁▆

New: regex                                      3.46 µs/iter   3.54 µs  █▂▂
                                         (3.33 µs … 3.72 µs)   3.71 µs ▅███         ▅
                                     (  0.10  b …   0.12  b)   0.10  b ████▄▄▁▇▄▇▄▇▁█▄▄▁▇▁▇▄

Hybrid: Uint8Array host + regex URL             3.57 µs/iter   3.63 µs    ▃▃     ▃█▃
                                         (3.44 µs … 3.76 µs)   3.76 µs ▂▇▇██   ▇▂███▂   ▂
                                     (  0.10  b …   0.11  b)   0.10  b █████▆▆▆██████▆▁▁█▆▁▆

clk: ~4.40 GHz
cpu: Apple M5 Pro
runtime: node 20.20.2 (arm64-darwin)

benchmark                                    avg (min … max) p75 / p99    (min … top 1%)
------------------------------------------------------------ -------------------------------
• host validation (fast path)
------------------------------------------------------------ -------------------------------
Old: Uint8Array + charCodeAt loop             223.41 ns/iter 233.39 ns       ▂██▅
                                     (167.38 ns … 332.09 ns) 300.75 ns       █████
                                     (355.71  b … 677.92  b) 512.26  b ▂▃▃▄▄███████▇▄▃▃▂▂▁▂▂

New: regex (reValidHost)                      274.31 ns/iter 280.30 ns     █▃
                                     (255.63 ns … 330.53 ns) 311.69 ns    ▃██▇▂
                                     (225.33  b … 694.80  b) 511.96  b ▃▇▇█████▅▇▆▆▅▃▄▃▂▃▂▂▁

Hybrid: Uint8Array host + regex URL           301.28 ns/iter 310.83 ns      █
                                     (267.12 ns … 381.83 ns) 343.37 ns     ▂█    ▄▄
                                     (512.06  b … 734.86  b) 512.61  b ▁▂▁▃██▅▃▃███▆▆▅▂▄▃▃▁▂

• URL validation (fast path)
------------------------------------------------------------ -------------------------------
Old: Uint8Array + charCodeAt loop             498.30 ns/iter 550.97 ns  ▆  █▂
                                     (440.70 ns … 639.72 ns) 594.55 ns ▂█ ▂██        ▃▆
                                     (256.13  b … 541.84  b) 257.02  b ██████▅▇▇▄▃▃▂▃██▄▃█▄▃

New: regex (reValidRequestUrl + reDotSegment) 402.28 ns/iter 409.45 ns   ▆█▄
                                     (372.67 ns … 496.89 ns) 483.02 ns  ▅███▅▃
                                     (256.13  b … 451.84  b) 256.99  b ▆██████▇▇▃▂▅▇▂▅▂▃▂▁▂▂

Hybrid: Uint8Array host + regex URL           444.89 ns/iter 470.80 ns      █▂
                                     (382.78 ns … 597.23 ns) 542.29 ns     ▂██▄▅▅▂ ▇▅
                                     ( 27.14  b … 475.84  b) 256.02  b ▇█▅▃███████▇██▂▃▂▂▂▁▂

• buildUrl end-to-end (host × URL)
------------------------------------------------------------ -------------------------------
Old: Uint8Array + charCodeAt loops              4.75 µs/iter   4.83 µs           █
                                         (4.39 µs … 5.00 µs)   4.97 µs           █ ██
                                     (  1.57 kb …   1.57 kb)   1.57 kb ▅▁▁▁▅▅▁▁█▅██████▅▁██▅

New: regex                                      4.11 µs/iter   4.17 µs         █ ▄
                                         (3.81 µs … 4.40 µs)   4.37 µs         █ █▅ █
                                     (  1.57 kb …   1.57 kb)   1.57 kb ▅▁▁█▁████▅██▅█▅█▅▅▅▅▅

Hybrid: Uint8Array host + regex URL             4.07 µs/iter   4.12 µs         █
                                         (3.84 µs … 4.32 µs)   4.32 µs       ▃▃█ █ ▃
                                     (  1.57 kb …   1.57 kb)   1.57 kb ▄▁█▄█▄███▄█▄█▄▄█▄▄▁▄▄
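
For a rough sense of scale, the end-to-end averages from the two runs above can be turned into relative differences. This is only arithmetic on the reported numbers; `reductionPercent` is a hypothetical helper, and single-run averages on one machine should be read with some caution.

```javascript
// End-to-end averages in µs/iter, copied from the two runs above.
const endToEnd = {
  'node 24.14.1': { old: 3.97, regex: 3.46, hybrid: 3.57 },
  'node 20.20.2': { old: 4.75, regex: 4.11, hybrid: 4.07 },
}

// Hypothetical helper: percentage of time saved relative to the old loops.
function reductionPercent(oldAvg, newAvg) {
  return ((oldAvg - newAvg) / oldAvg) * 100
}

for (const [runtime, avg] of Object.entries(endToEnd)) {
  const regex = reductionPercent(avg.old, avg.regex).toFixed(1)
  const hybrid = reductionPercent(avg.old, avg.hybrid).toFixed(1)
  console.log(`${runtime}: regex -${regex}%, hybrid -${hybrid}%`)
}
```

By this arithmetic the regex version spends roughly 13% less time than the old loops on both runtimes. The gap between regex and hybrid is only a few percent and changes sign between the two runs, so it may be within run-to-run noise.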

@yusukebe
Member

@usualoma Thanks! This is for Node.js, so it's reasonable to benchmark only Node.js.

Member

@yusukebe left a comment

LGTM!

@yusukebe
Member

@usualoma Thanks! Is it all done for v2?

@yusukebe yusukebe merged commit ef43cdd into v2 Apr 17, 2026
5 checks passed
@yusukebe yusukebe deleted the perf-v2-simplify-url branch April 17, 2026 00:25