Update HTML tag parsing to work with Acunetix 360 #40

Open
wants to merge 18 commits into
base: main
Changes from 16 commits
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,6 @@
v4.16.0 (Month 2025)
- Update HTML tag parsing for Acunetix 360

v4.15.0 (December 2024)
- No changes

41 changes: 23 additions & 18 deletions lib/acunetix/concerns/cleanup.rb
@@ -8,32 +8,37 @@ def cleanup_html(source)

format_table(result)

result.gsub!(/&quot;/, '"')
result.gsub!(/&amp;/, '&')
result.gsub!(/&lt;/, '<')
result.gsub!(/&gt;/, '>')

result.gsub!(/<h[0-9] >(.*?)<\/h[0-9]>/) { "\n\n*#{$1.strip}*\n\n" }
result.gsub!(/<b>(.*?)<\/b>/) { "*#{$1.strip}*" }
result.gsub!(/<br\/>/, "\n")
result.gsub!(/<br ?\/>/, "\n")
result.gsub!(/<div(.*?)>|<\/div>/, '')
result.gsub!(/<span.*?>(.*?)<\/span>/m){"#{$1.strip}"}
result.gsub!(/<span.*?>|<\/span>/, '') #repeating again to deal with nested/empty/incomplete span tags

result.gsub!(/<a(.*?)href='(.*?)'><i(.*?)><\/i>(.*?)<\/a>/m) { "\"#{$4}\":#{$2}" }
result.gsub!(/<a.*?>(.*?)<\/a>/m, '\1')
result.gsub!(/<font.*?>(.*?)<\/font>/m, '\1')
result.gsub!(/<h2>(.*?)<\/h2>/) { "*#{$1.strip}*" }
result.gsub!(/<i>(.*?)<\/i>/, '\1')
result.gsub!(/<p.*?>(.*?)<\/p>/) { "\np. #{$1.strip}\n" }
result.gsub!(/<code><pre.*?>(.*?)<\/pre><\/code>/m){|m| "\n\nbc.. #{$1.strip}\n\np. \n" }
result.gsub!(/<code>(.*?)<\/code>/) { "@#{$1.strip}@" }
result.gsub!(/<pre.*?>(.*?)<\/pre>/m){|m| "\n\nbc.. #{$1.strip}\n\np. \n" }

result.gsub!(/<li.*?>([\s\S]*?)<\/li>/m){"\n* #{$1.strip}"}
result.gsub!(/<ul>([\s\S]*?)<\/ul>/m){ "#{$1.strip}\n" }
result.gsub!(/<em>(.*?)<\/em>/) { "_#{$1.strip}_" }
result.gsub!(/<p.*?>(.*?)<\/p>/) { "p. #{$1.strip}\n\n" }
result.gsub!(/<code><pre.*?>(.*?)<\/pre><\/code>/m){|m| "\n\nbc.. #{$1}\n\np. \n" }
result.gsub!(/<code>(.*?)<\/code>/) { "\n\nbc. #{$1}\n\n" }
result.gsub!(/<pre.*?>(.*?)<\/pre>/) { "\n\nbc. #{$1}\n\n" }
result.gsub!(/<pre.*?>(.*?)<\/pre>/m){|m| "\n\nbc.. #{$1}\n\np. \n" }

result.gsub!(/<li.*?>([\s\S]*?)<\/li>/m){"\n* #{$1}"}
aapomm marked this conversation as resolved.
result.gsub!(/<ul>([\s\S]*?)<\/ul>/m){ "#{$1}\n" }
result.gsub!(/(<ul>)|(<\/ul>|(<ol>)|(<\/ol>))/, "\n")
result.gsub!(/<li>/, "\n* ")
result.gsub!(/<\/li>/, "\n")
result.gsub!(/<strong>(.*?)<\/strong>/m) { "*#{$1}*" }

result.gsub!(/&quot;/, '"')
result.gsub!(/&amp;/, '&')
result.gsub!(/&lt;/, '<')
result.gsub!(/&gt;/, '>')

result.gsub!(/<strong>(.*?)<\/strong>/) { "*#{$1.strip}*" }
result.gsub!(/<span.*?>(.*?)<\/span>/m){"#{$1.strip}\n"}
# Cleanup lingering <p></p>
etdsoft (Member) commented on Dec 23, 2024:

@rachkor is this really an issue? We have this line and the L16 "cleanup" lines. Is the code so bad that they include random <span> and <p> tags all over the place? It seems we're doing something wrong with our parsing.

Contributor reply:

I believe this will only catch nested <p> tags, and there don't seem to be any in the sample files. Removed ✅

result.gsub!(/<p.*?>(.*?)<\/p>/m) { $1 }

result
end
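
The diff shows the entity-decoding calls both at the top of `cleanup_html` and after the list handling. Assuming the intent is to decode entities only after the tag conversions have run, the sketch below illustrates why the order matters: markup that the scanner escaped as text is treated as a real tag if it is decoded first. The helper names (`decode_entities`, `convert_bold`) and the sample string are hypothetical, and the single-pass decoder is a substitute for the chained `gsub!` calls in the plugin, not a copy of them.

```ruby
# Hypothetical helpers for illustration; not part of the plugin.
ENTITIES = { '&quot;' => '"', '&amp;' => '&', '&lt;' => '<', '&gt;' => '>' }.freeze

def decode_entities(text)
  # One pass over the string, so "&amp;lt;" becomes "&lt;" and is not decoded twice.
  text.gsub(/&(?:quot|amp|lt|gt);/) { |entity| ENTITIES[entity] }
end

def convert_bold(text)
  # Same pattern as the <b> rule in cleanup_html.
  text.gsub(/<b>(.*?)<\/b>/) { "*#{$1.strip}*" }
end

# Assumed input: a payload the report shows as literal, escaped text.
escaped = 'Payload: &lt;b&gt;test&lt;/b&gt;'

decode_first = convert_bold(decode_entities(escaped))
decode_last  = decode_entities(convert_bold(escaped))

puts decode_first # Payload: *test*
puts decode_last  # Payload: <b>test</b>
```

Decoding last keeps the escaped payload readable as literal text, at the cost of leaving any genuinely entity-encoded formatting unconverted.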
@@ -68,7 +73,7 @@ def format_table(str)

# Some of the values have embedded HTML content that we need to strip
def tags_with_html_content
[:details, :description, :detailed_information, :impact, :recommendation]
[:details, :description, :detailed_information, :impact, :recommendation, :remedial_actions, :remedial_procedure, :external_references]
end

def tags_with_commas
5 changes: 4 additions & 1 deletion lib/dradis/plugins/acunetix/mapping.rb
@@ -60,7 +60,10 @@ module Mapping
'CVSS3Vector' => '{{ acunetix[vulnerability_360.cvss31_vector] }}',
'CVSS3Base' => '{{ acunetix[vulnerability_360.cvss31_base] }}',
'CVSS3Temporal' => '{{ acunetix[vulnerability_360.cvss31_temporal] }}',
'CVSS3Environmental' => '{{ acunetix[vulnerability_360.cvss31_environmental] }}'
'CVSS3Environmental' => '{{ acunetix[vulnerability_360.cvss31_environmental] }}',
'Remedial Actions' => '{{ acunetix[vulnerability_360.remedial_actions] }}',
'Remedial Procedure' => '{{ acunetix[vulnerability_360.remedial_procedure] }}',
'References' => '{{ acunetix[vulnerability_360.external_references] }}',
}
}.freeze
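
To make the effect of the three new mapping entries concrete, here is a minimal sketch of how a `{{ acunetix[entity.field] }}` placeholder could be swapped for a parsed value. The resolver (`render_field`), the `PLACEHOLDER` pattern, the `'n/a'` fallback, and the sample data are all assumptions made for illustration; this is not how Dradis itself renders mappings.

```ruby
# Hypothetical resolver for illustration only; not the Dradis implementation.
PLACEHOLDER = /\{\{\s*acunetix\[(\w+)\.(\w+)\]\s*\}\}/

def render_field(template, data)
  template.gsub(PLACEHOLDER) do
    entity = $1
    field  = $2
    # Fall back to 'n/a' (an assumed default) when the field is missing.
    data.fetch(entity, {}).fetch(field, 'n/a')
  end
end

# Assumed sample data, keyed the same way as the mapping entries above.
data = {
  'vulnerability_360' => {
    'remedial_actions' => 'Use parameterized queries.'
  }
}

actions   = render_field('{{ acunetix[vulnerability_360.remedial_actions] }}', data)
procedure = render_field('{{ acunetix[vulnerability_360.remedial_procedure] }}', data)

puts actions   # Use parameterized queries.
puts procedure # n/a
```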

12 changes: 11 additions & 1 deletion spec/acunetix/acunetix360/importer_spec.rb
@@ -13,7 +13,7 @@ module Dradis::Plugins
end

before(:each) do
stub_content_service
stub_content_service(Dradis::Plugins::Acunetix)

@importer = described_class.new(content_service: @content_service)
end
@@ -57,5 +57,15 @@ def run_import!

run_import!
end

it 'parses links in <external-references> tag' do
expect(@content_service).to receive(:create_issue) do |args|
expect(args[:text]).to include('"Blind SQL Injection":https://www.owasp.org/index.php/Blind_SQL_Injection')
expect(args[:text]).to include('"SQL Injection Cheat Sheet[#Blind]":https://www.acunetix.com/blog/web-security/sql-injection-cheat-sheet/#BlindSQLInjections')
OpenStruct.new(args)
end

run_import!
end
end
end
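
The new spec expects Textile links of the form `"text":url`. The sketch below applies the anchor rule from `cleanup_html` on its own to show how such a link could be produced. The sample markup (single-quoted `href`, an empty `<i>` icon element before the link text) is an assumption about what Acunetix 360 emits, inferred from the regex rather than taken from a real report, and `link_to_textile` is a hypothetical wrapper name.

```ruby
# Hypothetical wrapper around the anchor rule from cleanup_html.
def link_to_textile(html)
  html.gsub(/<a(.*?)href='(.*?)'><i(.*?)><\/i>(.*?)<\/a>/m) { "\"#{$4}\":#{$2}" }
end

# Assumed sample markup; the real scanner output may differ.
html = "<a target='_blank' href='https://www.owasp.org/index.php/Blind_SQL_Injection'>" \
       "<i class='icon-external-link'></i>Blind SQL Injection</a>"

textile = link_to_textile(html)
puts textile
# "Blind SQL Injection":https://www.owasp.org/index.php/Blind_SQL_Injection
```

Anchors that do not match this exact shape (for example, double-quoted attributes or no `<i>` element) fall through to the generic `<a.*?>` rule, which keeps only the link text.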