Bug report
Bug description:
```python
import re
from time import perf_counter as time

p1 = re.compile(r"[\s\S]*")        # character-class spelling of "anything"
p2 = re.compile(".*", re.DOTALL)   # dot that also matches newlines
s = "a" * 10000

for p in (p1, p2):
    t0 = time()
    for i in range(10000):
        _ = p.match(s)
    print(time() - t0)
```
Runtimes are 0.44 s vs 0.0016 s on my system. The class [\s\S] is not simplified to a single "any character" check. Instead, its members are tested one after another: \s is tried first, and because it does not match "a", \S is checked next. Writing the class as [\S\s] is twice as fast for this string. This is not only an issue for long matches. A 40-character string is processed half as fast with [\s\S], and even 10 characters take about 25% longer.

I'm not completely sure whether this qualifies as a bug or as a documentation issue. Some other languages lack a DOTALL option and rely on [\s\S] instead. As a result, plenty of posts on Stack Overflow and elsewhere advocate [\s\S] as an all-matching pattern. Unsuspecting Python programmers such as @barneygale may therefore expect [\s\S] to be identical to a dot with DOTALL, as seen below.
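The two claims in the paragraph above can be checked independently. The first is that [\s\S] and a DOTALL dot accept exactly the same characters. The second is that member order inside the class affects speed. Below is a minimal sketch that tests both. The helper names `ANY_CHAR`, `matches_every_char` and `time_patterns` are hypothetical, and the equivalence check samples only the Basic Multilingual Plane by default rather than every code point.

```python
import re
import timeit

# Single-character spellings of "any character, newlines included".
ANY_CHAR = {
    r"[\s\S]": re.compile(r"[\s\S]"),
    r"[\S\s]": re.compile(r"[\S\s]"),
    ". with DOTALL": re.compile(r".", re.DOTALL),
}


def matches_every_char(limit: int = 0x10000) -> bool:
    """Return True if every pattern matches each code point below `limit`."""
    return all(
        p.fullmatch(chr(cp)) is not None
        for cp in range(limit)
        for p in ANY_CHAR.values()
    )


def time_patterns(text: str, number: int = 1000) -> dict[str, float]:
    """Time `number` match() calls of each pattern with a trailing '*'."""
    results = {}
    for name, p in ANY_CHAR.items():
        star = re.compile(p.pattern + "*", p.flags)
        results[name] = timeit.timeit(lambda: star.match(text), number=number)
    return results


if __name__ == "__main__":
    print("same characters matched:", matches_every_char())
    for name, secs in time_patterns("a" * 10_000).items():
        print(f"{name:>14}: {secs:.4f} s")
```

Absolute timings will vary by machine and CPython version. Only the relative ordering of the three entries is meant to be compared.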
@serhiy-storchaka
cpython/Lib/pathlib.py, lines 126 to 133 in 9bb202a:

```python
elif part == '**\n':
    # '**/' component: we use '[\s\S]' rather than '.' so that path
    # separators (i.e. newlines) are matched. The trailing '^' ensures
    # we terminate after a path separator (i.e. on a new line).
    part = r'[\s\S]*^'
elif part == '**':
    # '**' component.
    part = r'[\s\S]*'
```
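One way to keep newline-crossing behaviour without the [\s\S] class is a scoped inline flag, (?s:.), which enables DOTALL for that dot only. The sketch below uses a hypothetical, heavily simplified translator called `build`, which is not pathlib's code. It checks that both spellings accept the same paths for the glob a/**/b, with / mapped to a newline as in the excerpt above. The translator compiles with re.MULTILINE so that the trailing ^ can match right after a newline. That flag is an assumption about how such patterns are compiled.

```python
import re

CLASS_ANY = r"[\s\S]"   # spelling used in the pathlib excerpt
SCOPED_DOT = r"(?s:.)"  # dot with DOTALL enabled only inside the group


def build(parts: list[str], any_char: str) -> re.Pattern[str]:
    """Hypothetical helper: translate newline-separated glob parts.

    Only literal parts and the two '**' cases from the excerpt are handled;
    `any_char` is the sub-pattern used for "any character".
    """
    out = []
    for part in parts:
        if part == "**\n":
            out.append(any_char + "*^")    # '**/' component
        elif part == "**":
            out.append(any_char + "*")     # trailing '**' component
        else:
            out.append(re.escape(part))    # literal text (newline escaped too)
    # Assumed flag: MULTILINE lets '^' match right after each newline.
    return re.compile("".join(out), re.MULTILINE)


if __name__ == "__main__":
    parts = ["a\n", "**\n", "b"]  # glob 'a/**/b' with '/' mapped to '\n'
    for path in ("a\nb", "a\nx\ny\nb", "a\nxb"):
        results = [
            build(parts, c).fullmatch(path) is not None
            for c in (CLASS_ANY, SCOPED_DOT)
        ]
        print(repr(path), results)
```

Both spellings should accept the first two paths and reject the third. Whether (?s:.)* is actually faster than [\s\S]* on a particular CPython version should be measured rather than assumed.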
CPython versions tested on:
3.11, 3.13
Operating systems tested on:
Linux, Windows