
Adding threshold to Transformers pipeline #14

Open
joshpopelka20 opened this issue Nov 21, 2023 · 6 comments

@joshpopelka20

I'm using this code to run inference:


# Use a pipeline as a high-level helper
from transformers import pipeline

# Load model directly
from transformers import AutoTokenizer, AutoModelForTokenClassification


tokenizer = AutoTokenizer.from_pretrained("obi/deid_bert_i2b2")

pipe = pipeline("token-classification", tokenizer=tokenizer, model="obi/deid_bert_i2b2",
                aggregation_strategy="first")

I'm trying to increase the threshold, but can't find a config for it. Is it possible with my setup?

@prajwal967
Collaborator

Hi, sorry for the late response.

Unfortunately, the threshold can't be set through the HuggingFace pipelines.
What you could do is check whether you can get the raw logits out of the pipeline; if you can, then you can process the raw logit values using the code given here: Threshold max or Threshold sum
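To make the idea concrete, here is a minimal NumPy sketch of a threshold-max rule (an illustration only, with made-up labels and logits; not the repo's exact ThresholdProcessMax implementation):

```python
import numpy as np

def softmax(logits):
    # Convert raw logits to probabilities along the label axis
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def threshold_max(logits, label_list, threshold, fallback="O"):
    # For each token, take the argmax label, but fall back to "O"
    # when the max probability is below the threshold
    probs = softmax(np.asarray(logits, dtype=float))
    preds = []
    for token_probs in probs:
        best = int(token_probs.argmax())
        if token_probs[best] >= threshold:
            preds.append(label_list[best])
        else:
            preds.append(fallback)
    return preds

# Toy example: 3 tokens, 3 labels (hypothetical values)
label_list = ["O", "B-DATE", "B-PATIENT"]
logits = [[4.0, 0.1, 0.1],   # confident "O"
          [0.2, 2.5, 0.3],   # confident "B-DATE"
          [0.4, 0.5, 0.45]]  # low confidence -> falls back to "O"
print(threshold_max(logits, label_list, threshold=0.8))
# -> ['O', 'B-DATE', 'O']
```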

Let us know if you have any other questions!

@joshpopelka20
Author

joshpopelka20 commented Dec 20, 2023

Not sure if I'm doing this right, but this is the code I have so far:

inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)
predictions = outputs.logits
print(PostProcessPicker.get_threshold_max(predictions, 1.8982457699258832e-06))

I'm getting this error message when I run the code:

/usr/local/lib/python3.10/dist-packages/robust_deid/sequence_tagging/post_process/model_outputs/post_process_picker.py in get_threshold_max(self, threshold)
56 (ThresholdProcessMax): Return Threshold Max post processor
57 """
---> 58 return ThresholdProcessMax(self._label_list, threshold=threshold)
59
60 def get_threshold_sum(self, threshold) -> ThresholdProcessSum:

AttributeError: 'Tensor' object has no attribute '_label_list'

@prajwal967
Collaborator

Hi,

Could you add the following lines of code:

# Import the respective classes from the respective locations

# Initialize labels
ner_labels = NERLabels(notation='BIO', ner_types=["PATIENT", "STAFF", "AGE", "DATE", "PHONE", "ID", "EMAIL", "PATORG", "LOC", "HOSP", "OTHERPHI"])
label_list = ner_labels.get_label_list()

# Get the post processing object
picker = PostProcessPicker(label_list=label_list)
# This creates an object of the threshold max class which you can use to process the predictions with the threshold
threshold_max = picker.get_threshold_max(threshold=1.8982457699258832e-06)

# Get the model predictions
outputs = model(**inputs)
predictions = outputs.logits

# There are two ways to process the predictions -
# Case 1: Get the predictions - no additional filtering
# The label list converts ids of labels back to string form
final_preds = [[label_list[threshold_max.process_prediction(p)] for p in prediction] for prediction in predictions]

# Case 2: Get the predictions - where we also pass a labels list (that can be used to ignore predictions at certain positions etc.)
# Use the pre-defined function
final_preds, final_labels = threshold_max.decode(predictions, labels)

Let us know if this piece of code did not work!

@joshpopelka20
Author

I'm not understanding this piece of code:

   # Case 2: Get the predictions - where we also pass a labels list(that can be used to ignore predictions at certain positions etc.)
   # Use the pre-defined function
   final_preds, final_labels = threshold_max.decode(predictions, labels)

What is the "labels" list supposed to be?

@prajwal967
Collaborator

We have an option to ignore the predictions for certain tokens, which can be specified via the labels argument. If we pass [O, NA, O, O, NA] as labels (assuming we have 5 tokens as input), the function ignores the predictions at positions 1 & 4 and returns [P0, P2, P3] (the predictions at the other three positions).

You don't need to use it; it's optional.
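In plain Python, the filtering behaves roughly like this (a sketch of the semantics described above, not the library's actual decode implementation; the prediction and label values are made up):

```python
# Hypothetical per-token predictions and a labels list where "NA"
# marks positions whose predictions should be ignored
predictions = ["P0", "P1", "P2", "P3", "P4"]
labels = ["O", "NA", "O", "O", "NA"]

# Keep only the predictions at positions not marked "NA"
kept = [p for p, l in zip(predictions, labels) if l != "NA"]
print(kept)  # -> ['P0', 'P2', 'P3']
```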

@joshpopelka20
Author

The decode method seems to require the labels list. I've tried to create a labels list with the same shape as the predictions tensor, but I'm getting a different error.

Code:

tensor_shape = torch.Size([1, 105, 45])
labels = [["O"] * tensor_shape[2] for _ in range(tensor_shape[1])]
final_preds, final_labels = threshold_max.decode(predictions, labels)

Error message:

/usr/local/lib/python3.10/dist-packages/numpy/ma/core.py in new(cls, data, mask, dtype, copy, subok, ndmin, fill_value, keep_mask, hard_mask, shrink, order)
2904 msg = "Mask and data not compatible: data size is %i, " +
2905 "mask size is %i."
-> 2906 raise MaskError(msg % (nd, nm))
2907 copy = True
2908 # Set the mask to the new value

MaskError: Mask and data not compatible: data size is 45, mask size is 23.
