
Add prefixes to regex terminals #161

@jahav


I am working on replacing the handcrafted formula parser in ClosedXml (a library for manipulating xlsx files) with XLParser. Because an xlsx file can contain hundreds of thousands of formulas, I would like to improve the performance of XLParser.

I would like to add prefixes to the RegexBasedTerminals.

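Irony uses the first character of each terminal's prefixes to build a lookup table that maps a char to the terminals that can start with it (terminals without prefixes are always considered). The scanner then uses this table to narrow down which terminals it has to try at the current input position, which speeds up recognizing the current token.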
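The idea of the prefix table can be modelled in a few lines. This is a simplified Python sketch, not Irony's implementation: `RegexTerminal`, `build_lookup`, `candidates` and `scan_one` are hypothetical names, the Excel-like patterns are illustrative only, and picking the longest match is an assumption of the sketch (Irony's scanner has its own selection rules).

```python
import re
import string
from collections import defaultdict


class RegexTerminal:
    """Hypothetical stand-in for a regex-based terminal (not Irony's class)."""

    def __init__(self, name, pattern, prefixes=()):
        self.name = name
        self.regex = re.compile(pattern)
        self.prefixes = tuple(prefixes)  # empty tuple = no prefix information


def build_lookup(terminals):
    """Index terminals by the first character of each of their prefixes."""
    by_char = defaultdict(list)
    no_prefix = []  # terminals without prefixes must be tried at every position
    for t in terminals:
        firsts = {p[0] for p in t.prefixes if p}
        if not firsts:
            no_prefix.append(t)
            continue
        for ch in firsts:
            by_char[ch].append(t)
    return by_char, no_prefix


def candidates(by_char, no_prefix, text, pos):
    """Terminals worth trying at text[pos]: those keyed by that char plus all prefix-less ones."""
    return by_char.get(text[pos], []) + no_prefix


def scan_one(by_char, no_prefix, text, pos):
    """Run only the candidate regexes and keep the longest match (a simplification)."""
    best = None
    for t in candidates(by_char, no_prefix, text, pos):
        m = t.regex.match(text, pos)
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (t.name, m.group())
    return best


# Hypothetical, simplified Excel-like terminals; the patterns are illustrative only.
TERMINALS = [
    RegexTerminal("Cell", r"\$?[A-Za-z]{1,3}\$?[0-9]+", ["$"] + list(string.ascii_letters)),
    RegexTerminal("Number", r"[0-9]+(?:\.[0-9]+)?", list(string.digits)),
    RegexTerminal("Error", r"#(?:NULL!|DIV/0!|VALUE!|REF!|NAME\?|NUM!|N/A)", ["#"]),
    RegexTerminal("Name", r"[A-Za-z_][A-Za-z0-9_.]*"),  # no prefixes: always a candidate
]
by_char, no_prefix = build_lookup(TERMINALS)
```

At a letter only the `Cell` and `Name` regexes run, and the `Number` and `Error` regexes are skipped entirely. A terminal without prefixes (`Name` here) still costs one regex attempt at every position, which is why giving the regex terminals prefixes pays off.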

I would also like to make the grammar case sensitive. This gives a small but measurable improvement, and it should not change behavior, because the terminals already list both cases where necessary (e.g. a-zA-Z).
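Why listing both cases matters can be checked quickly. The sketch below uses Python's `re` rather than .NET regexes; the `CELL` pattern, `same_matches` helper and sample strings are hypothetical, and the samples are deliberately ASCII-only. When a pattern spells out `[A-Za-z]`, turning case-insensitive matching off does not change which samples match; a pattern that lists only `[A-Z]` does change.

```python
import re

# Hypothetical cell-reference pattern that already lists both cases explicitly.
CELL = r"\$?[A-Za-z]{1,3}\$?[0-9]+"


def same_matches(pattern, samples):
    """True if case-sensitive and case-insensitive matching agree on every sample."""
    return all(
        bool(re.fullmatch(pattern, s)) == bool(re.fullmatch(pattern, s, re.IGNORECASE))
        for s in samples
    )


SAMPLES = ["A1", "a1", "$Xf$10", "sum(", "1A"]  # ASCII-only inputs
```

`same_matches(CELL, SAMPLES)` holds, while `same_matches(r"[A-Z][0-9]", ["a1"])` does not, so a terminal written with a single case would need fixing before the grammar is switched to case-sensitive.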

I have also tried changing the regex options of the terminals (through reflection): RegexOptions.ExplicitCapture (as recommended in the regex best practices), RegexOptions.Compiled and RegexOptions.CultureInvariant, but none of them gave a significant improvement.

I ran a benchmark based on EnronFormulasParseTest (modified to run single-threaded). The parser version with prefixes takes about 44% less time (26.5 s vs. 47.3 s on the Enron data set).

BenchmarkDotNet=v0.13.2, OS=Windows 11 (10.0.22000.856/21H2)
AMD Ryzen 5 5500U with Radeon Graphics, 1 CPU, 12 logical and 6 physical cores
.NET SDK=6.0.302
  [Host]     : .NET 6.0.7 (6.0.722.32202), X64 RyuJIT AVX2
  Job-JFUIAS : .NET 6.0.7 (6.0.722.32202), X64 RyuJIT AVX2

IterationCount=3  LaunchCount=1  WarmupCount=1

With prefixes

Method                   Mean       Error      StdDev
EnronDataSet             26.496 s   3.6721 s   0.2013 s
EusesFormulasParseTest    2.852 s   0.0582 s   0.0032 s

Without prefixes

Method                   Mean       Error      StdDev
EnronDataSet             47.295 s   2.5500 s   0.1398 s
EusesFormulasParseTest    4.738 s   0.3636 s   0.0199 s
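The relative gains follow directly from the Mean columns of the two tables. This small Python helper (the `summarize` name is hypothetical) distinguishes the fraction of run time saved from the throughput speedup, since "faster" can mean either.

```python
# Mean times (seconds) copied from the two BenchmarkDotNet tables above.
results = {
    "EnronDataSet": {"with": 26.496, "without": 47.295},
    "EusesFormulasParseTest": {"with": 2.852, "without": 4.738},
}


def summarize(with_s, without_s):
    """Return (percent of run time saved, speedup factor), both rounded."""
    time_saved = 1 - with_s / without_s  # fraction of the original run time removed
    speedup = without_s / with_s         # how many times faster the new version runs
    return round(time_saved * 100, 1), round(speedup, 2)


for name, r in results.items():
    saved, speedup = summarize(r["with"], r["without"])
    print(f"{name}: {saved}% less time, {speedup}x speedup")
```

For the Enron data set this gives 44.0% less time, i.e. roughly a 1.78x speedup; for the EUSES test it gives 39.8% less time, roughly 1.66x.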
