Protect Against Cyber Threats

This repository contains my solution for the hackathon. With this approach I finished 2nd.
Here is the link to my interview with Analytics India Magazine.

Problem Statement

Can you construct a next-gen model, capable of detecting code that is present in a body of text? Be part of a mission to enhance the security and resilience of web applications.

Protecting our software landscapes is not an easy task. Malicious actors are frequently trying to enter systems and get access to resources, whether operational or data. The ability for an actor to compromise systems, elevate their privileges, and move laterally within infrastructure typically hinges on executing hidden code. One common method they employ is embedding this code in seemingly harmless media—whether it's images, videos, or even simple text files.

Task

Given a body of text, find the source code hidden in it. The text may contain no source code at all, or it may conceal multiple sections of source code.

Dataset

ID: A unique identifier, either an integer or string, used to distinguish each row in the dataset.
Text: A string containing the textual content associated with each record.
ContainsCode: A boolean indicating whether the "Text" field contains any code snippets.
CodeList: A list or array storing code snippets extracted from the "Text" field, if applicable.
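
To make the schema concrete, here is a minimal sketch of one record and a sanity check on it. The `Record` class, the `is_consistent` helper, and the sample values are hypothetical and not taken from the competition data. The check also assumes that every snippet in `CodeList` appears verbatim in `Text`, which the dataset description does not state.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Record:
    """One dataset row, mirroring the four fields described above."""
    ID: Union[int, str]
    Text: str
    ContainsCode: bool
    CodeList: List[str] = field(default_factory=list)


def is_consistent(rec: Record) -> bool:
    """Check that the flag matches the list and (assumption) that
    every snippet occurs verbatim somewhere in the text."""
    if rec.ContainsCode != bool(rec.CodeList):
        return False
    return all(snippet in rec.Text for snippet in rec.CodeList)


# Hypothetical example rows, for illustration only.
with_code = Record(
    ID=1,
    Text="Meeting notes. import os; os.system('ls') See you Monday.",
    ContainsCode=True,
    CodeList=["import os; os.system('ls')"],
)
without_code = Record(ID="row-2", Text="Nothing to see here.", ContainsCode=False)
```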

Solution

I fine-tuned a distilled RoBERTa base model and DeBERTa v3 large, each for 30 epochs.
Both models were trained on SQuAD v2.
At inference, I use the prediction from whichever model has the higher logit score.
Optimizer: AdamW with a learning rate of 2e-5 and a cosine scheduler with warmup.
Dropout: 0.2.
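
The "higher logit score" rule can be sketched as follows. This is an illustration under assumptions, not the code from this repository: it assumes each model emits per-token start and end logits (as extractive question-answering models do), scores a span as start logit plus end logit, and keeps the answer from the model whose best span scores highest. The names `best_span`, `pick_prediction`, and `max_len` are hypothetical.

```python
from typing import Dict, List, Tuple

Logits = Tuple[List[float], List[float]]  # (start_logits, end_logits)


def best_span(start_logits: List[float], end_logits: List[float],
              max_len: int = 64) -> Tuple[float, int, int]:
    """Return (score, start, end) for the span with the highest
    start_logit + end_logit, subject to start <= end < start + max_len."""
    best = (float("-inf"), 0, 0)
    for s, s_logit in enumerate(start_logits):
        for e in range(s, min(s + max_len, len(end_logits))):
            score = s_logit + end_logits[e]
            if score > best[0]:
                best = (score, s, e)
    return best


def pick_prediction(outputs: Dict[str, Logits]) -> Tuple[str, float, int, int]:
    """Keep the span from whichever model has the higher best-span score.
    Token indices refer to that model's own tokenization."""
    name, (score, s, e) = max(
        ((n, best_span(*logits)) for n, logits in outputs.items()),
        key=lambda item: item[1][0],
    )
    return name, score, s, e


# Toy logits for two models, for illustration only.
outputs = {
    "roberta": ([0.1, 2.0, 0.3], [0.2, 0.1, 1.5]),
    "deberta": ([3.0, 0.0, 0.0], [1.0, 0.0, 0.2]),
}
winner = pick_prediction(outputs)
```

One caveat with this design: raw logits from two different models are not calibrated against each other, so the model with the larger logit scale may win more often than its accuracy alone would justify.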

Metrics

The evaluation metric for this competition is accuracy.
First, a MultiLabelBinarizer is applied to the predicted code spans.
Then accuracy is computed. I scored 0.90227 on the private leaderboard.
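
A rough sketch of such a metric with scikit-learn is shown below. The exact competition protocol is not documented here, so the details are assumptions: the binarizer is fitted on the union of true and predicted snippets, and `accuracy_score` on the resulting indicator matrices gives subset accuracy, where a row counts as correct only if its full set of snippets matches. The function name `span_accuracy` and the sample data are hypothetical.

```python
from typing import List

from sklearn.metrics import accuracy_score
from sklearn.preprocessing import MultiLabelBinarizer


def span_accuracy(y_true: List[List[str]], y_pred: List[List[str]]) -> float:
    """Binarize snippet lists into indicator rows, then compute
    exact-match (subset) accuracy across rows."""
    mlb = MultiLabelBinarizer()
    # Assumption: fit on both sides so unseen predictions get a column too.
    mlb.fit(y_true + y_pred)
    return float(accuracy_score(mlb.transform(y_true), mlb.transform(y_pred)))


# Hypothetical CodeList values for three rows.
y_true = [["print('hi')"], [], ["a = 1", "b = 2"]]
y_pred = [["print('hi')"], [], ["a = 1"]]
score = span_accuracy(y_true, y_pred)  # third row misses a snippet
```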

mohan-gupta/shell-protect-against-cyber-threats
