Rule title
Avoid basic REGEX usages.
Language and platform
PoC made in Python, but the rule applies the same way to PHP and Java.
Rule description
Using regex methods for basic string manipulation is not time-efficient.
Prefer string methods such as startswith and endswith, or the in operator, which are faster.
Noncompliant Code Example
    import re
    string = 'abcdef'
    if re.search(r'^abc', string):
        print('string starts with abc')
Compliant Solution
    string = 'abcdef'
    if string.startswith('abc'):
        print('string starts with abc')
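The same substitution applies to the other two cases named in the rule description. The sketch below is illustrative only: the sample string and variable names are assumptions, not part of the original PoC. It compares a trailing-anchor regex with endswith and an unanchored literal regex with the in operator.

```python
import re

string = 'abcdef'  # illustrative sample value

# Suffix check: regex with a trailing anchor vs. str.endswith
suffix_regex = re.search(r'def$', string) is not None
suffix_str = string.endswith('def')

# Substring check: unanchored literal regex vs. the `in` operator
contains_regex = re.search(r'cd', string) is not None
contains_str = 'cd' in string

print(suffix_regex, suffix_str, contains_regex, contains_str)
```

One caveat: in Python, `$` also matches just before a trailing newline, so `re.search(r'def$', 'def\n')` succeeds while `'def\n'.endswith('def')` does not. The two forms are equivalent only when the input carries no trailing newline.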
Rule short description
Avoid using REGEX for basic string manipulation.
Rule justification
We measured execution time with Python's time module. The input was a 1.1-million-word list found online. To obtain representative results, each test was run several times.
    prefix = 'te'
    count = 0
    # Count the words in the list that start with the prefix
    with open('1.1million word list.txt', 'r', encoding='utf-8') as file:
        for word in file:
            if word.startswith(prefix):
                count += 1
We searched the 1.1-million-word list for strings starting with 'te', using either re.search or str.startswith. Each method was tested 5 times, giving the results below:
Using regex search

| Iteration | Time (ms) |
|-----------|-----------|
| 1         | 560.47    |
| 2         | 670.64    |
| 3         | 675.09    |
| 4         | 831.12    |
| 5         | 517.98    |
| Average   | 651.06    |
Using string startswith

| Iteration | Time (ms) |
|-----------|-----------|
| 1         | 318.57    |
| 2         | 259.16    |
| 3         | 287.76    |
| 4         | 253.95    |
| 5         | 245.64    |
| Average   | 273.02    |
Conclusion: in this test session, the string method was on average 2.4x faster than the regex (651.06 ms vs. 273.02 ms). More tests are needed to measure energy consumption, which is expected to be roughly proportional to execution time.
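For anyone wanting to repeat a measurement of this kind, here is a minimal timing sketch. It is not the original benchmark script. It makes several assumptions: the word list is replaced by a small synthetic list, the helper names (count_with_regex, count_with_startswith, time_ms) are hypothetical, and time.perf_counter is used as the clock. Absolute timings will differ from the tables above.

```python
import re
import time

def count_with_regex(words, prefix):
    # Regex variant: anchor the escaped prefix at the start of the string
    pattern = '^' + re.escape(prefix)
    return sum(1 for w in words if re.search(pattern, w))

def count_with_startswith(words, prefix):
    # String-method variant of the same check
    return sum(1 for w in words if w.startswith(prefix))

def time_ms(func, *args):
    # Run func once and return (result, elapsed time in milliseconds)
    start = time.perf_counter()
    result = func(*args)
    return result, (time.perf_counter() - start) * 1000.0

# Synthetic stand-in for the word list (hypothetical data)
words = ['test', 'tea', 'apple', 'term', 'stone'] * 20000

regex_count, regex_ms = time_ms(count_with_regex, words, 'te')
str_count, str_ms = time_ms(count_with_startswith, words, 'te')

print(f'regex:      {regex_count} matches in {regex_ms:.2f} ms')
print(f'startswith: {str_count} matches in {str_ms:.2f} ms')
```

Both functions should return the same count. Only the elapsed time is expected to differ, and it varies between runs and machines, which is why several iterations are averaged.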
Severity / Remediation Cost
Estimate the severity and remediation cost of your issue.
Severity: Minor - the impact of the bad practice is small unless the volume of searched strings is very large (uncommon, but it can happen).
Remediation cost: Easy - energy-efficient alternative functions exist in Python, Java, PHP and most other languages. The code needs a small, uncomplicated refactor.
Implementation principle
Search for regexes containing no special characters
Search for regexes whose only special characters are the ^ and/or $ anchors, in calls to re.search, re.match and re.compile
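The two checks above could be prototyped as follows. This is a rough sketch under stated assumptions, not the rule's actual implementation. The function name suggest_string_method and its return values are hypothetical, and regex flags (such as re.IGNORECASE or re.MULTILINE) and the implicit start anchoring of re.match are ignored.

```python
# Characters that carry special meaning in Python regex syntax
_SPECIAL = set('.^$*+?{}[]\\|()')

def suggest_string_method(pattern):
    """Return a suggested string operation for a trivially simple
    pattern, or None if the pattern needs a real regex."""
    starts = pattern.startswith('^')
    ends = pattern.endswith('$')
    # Strip the anchors, then require the remainder to be a plain literal
    body = pattern[1:] if starts else pattern
    body = body[:-1] if ends else body
    if not body or any(ch in _SPECIAL for ch in body):
        return None
    if starts and ends:
        return '=='
    if starts:
        return 'startswith'
    if ends:
        return 'endswith'
    return 'in'

for p in ['^abc', 'abc$', 'abc', '^abc$', 'a.c', r'abc\$']:
    print(repr(p), '->', suggest_string_method(p))
```

A real analyzer would also need to extract the pattern argument from each call site and decide how to treat patterns built at runtime. This sketch only covers classifying a literal pattern string.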