Adapt misc.py to 3W Dataset 2.0 #128

Open
wants to merge 4 commits into
base: main

castrokelly

This pull request adapts the misc.py sub-module for compatibility with the 3W Dataset 2.0. The main changes are removing redundant functions and adapting the remaining functions to receive the DataFrame returned by load_3w_dataset() as a parameter.

Changes made:

1. Removed redundant functions:

  • label_and_file_generator: This function iterated through the dataset files to extract information about the instances. With the new load_3w_dataset() function, this is no longer necessary.

  • get_all_labels_and_files: This function used label_and_file_generator to get information about the instances. Since label_and_file_generator was removed, this function is also unnecessary.
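
As a rough illustration of why these two helpers become redundant: once every observation sits in a single DataFrame, the list of instances can be derived with ordinary pandas calls instead of walking the dataset files. The sketch below uses a toy frame; the column names `label`, `source` and `instance_id` are assumptions made for illustration and may not match what load_3w_dataset() actually returns.

```python
import pandas as pd

# Toy stand-in for the loaded dataset (hypothetical column names).
df = pd.DataFrame(
    {
        "label": [0, 0, 0, 1, 1],
        "source": ["real", "real", "real", "simulated", "simulated"],
        "instance_id": ["a", "a", "b", "c", "c"],
        "P-TPT": [1.0, 1.1, 0.9, 2.0, 2.1],
    }
)

# One row per instance, obtained without iterating over files.
instances = (
    df[["label", "source", "instance_id"]]
    .drop_duplicates()
    .reset_index(drop=True)
)
print(instances)
```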

2. Adapted create_table_of_instances:

  • Modified to receive the DataFrame loaded by load_3w_dataset() as a parameter, instead of lists of instances.
  • The internal logic was adjusted to work with the DataFrame and generate the table of instances.

Example usage:

import toolkit as tk

# Load the dataset
df = tk.load_3w_dataset()

# Create the table of instances
toi = tk.create_table_of_instances(df)

# Display the table as Markdown (to_markdown() needs the optional tabulate package)
print(toi.to_markdown())
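
For readers who want a feel for the adjusted logic, a table of instances can be thought of as a cross-tabulation of labels against instance sources. This is only a sketch under assumptions: the `label` and `source` columns are hypothetical, and the real create_table_of_instances may build its table differently.

```python
import pandas as pd

# One row per instance (hypothetical columns, toy values).
instances = pd.DataFrame(
    {
        "label": [0, 0, 1, 1, 2],
        "source": ["real", "simulated", "real", "real", "hand-drawn"],
    }
)

# Count instances per label and source, with row and column totals.
toi = pd.crosstab(
    instances["label"],
    instances["source"],
    margins=True,
    margins_name="total",
)
print(toi)
```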

3. Adapted calc_stats_instances:

  • Modified to receive the DataFrame as a parameter.
  • The logic for calculating statistics was adjusted to use the DataFrame data.

Example usage:

import toolkit as tk

# Load the dataset
df = tk.load_3w_dataset()

# Calculate the instances' statistics
stats = tk.calc_stats_instances(df)

# Display the statistics
print(stats.to_markdown())
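
This description does not list which statistics are produced, so the following is only a guess at the general shape: counts and percentages of instances per source, computed directly from a DataFrame. The `source` column and the choice of statistics are assumptions for illustration.

```python
import pandas as pd

# One row per instance (hypothetical column, toy values).
instances = pd.DataFrame(
    {"source": ["real", "real", "simulated", "hand-drawn"]}
)

# Number of instances per source and their share of the total.
counts = instances["source"].value_counts()
stats = pd.DataFrame(
    {
        "instances": counts,
        "percent": (100 * counts / counts.sum()).round(2),
    }
)
print(stats)
```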

4. Adapted create_and_plot_scatter_map:

  • Modified to receive the DataFrame as a parameter.
  • The logic for generating the scatter plot was adjusted to use the DataFrame data.

Example usage:

import toolkit as tk

# Load the real data from the dataset
df_real = tk.load_3w_dataset(data_type='real')

# Create and plot the scatter map
tk.create_and_plot_scatter_map(df_real)
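
A scatter map of real instances needs, for each instance, the well it came from and the time span it covers. The sketch below shows only that data-preparation step on a toy frame and leaves the plotting out; the `well`, `label`, `instance_id` and `timestamp` columns are assumptions, not the confirmed layout of the loaded DataFrame.

```python
import pandas as pd

# Toy observations from two real instances (hypothetical columns).
df = pd.DataFrame(
    {
        "well": ["WELL-00001"] * 2 + ["WELL-00002"] * 2,
        "label": [0, 0, 1, 1],
        "instance_id": ["a", "a", "b", "b"],
        "timestamp": pd.to_datetime(
            [
                "2017-02-01 01:00",
                "2017-02-01 02:00",
                "2018-05-10 00:00",
                "2018-05-10 03:00",
            ]
        ),
    }
)

# One point per instance: its well, its label and the period it covers.
points = (
    df.groupby(["well", "label", "instance_id"])["timestamp"]
    .agg(start="min", end="max")
    .reset_index()
)
print(points)
```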

5. Adapted load_instances:

  • Modified to receive the DataFrame as a parameter.
  • The logic for loading instances was adjusted to use the DataFrame data.

Example usage:

import toolkit as tk

# Load the dataset
df = tk.load_3w_dataset()

# Load the instances
instances_df = tk.load_instances(df)
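
One plausible way to get per-instance data out of a single combined DataFrame is a groupby on an instance identifier. This is a sketch, not the behaviour of load_instances itself: the `instance_id` column is hypothetical, and the return type shown here (a dict of DataFrames) is an assumption.

```python
import pandas as pd

# Toy combined frame holding two instances (hypothetical columns).
df = pd.DataFrame(
    {
        "instance_id": ["a", "a", "b"],
        "P-TPT": [1.0, 1.1, 2.0],
        "class": [0, 0, 1],
    }
)

# Split the combined frame into one DataFrame per instance.
instances = {
    instance_id: frame.drop(columns="instance_id").reset_index(drop=True)
    for instance_id, frame in df.groupby("instance_id", sort=False)
}
print(instances["a"])
```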

Functions that still need to be tested:

filter_rare_undesirable_events()
plot_instance()
resample()

It is important to test these functions with the 3W Dataset 2.0 to ensure that they are working correctly with the new data structure and event types.

Once the functions listed above have been verified, these modifications should make the 3W Toolkit compatible with version 2.0 of the 3W Dataset, allowing users to take advantage of the new features and data.


By creating this pull request, I confirm that I have read and fully accept and agree with one of Petrobras' Contributor License Agreements (CLAs).

Our CLAs are based on the Apache Software Foundation's CLAs.
