-
Models documentation
-
Automated Machine Learning, or AutoML, is a process that automates the repetitive and time-consuming tasks involved in developing machine learning models. This helps data scientists, analysts, and developers to create models more efficiently and with higher quality, resulting in increased product...
-
AutoML-Image-Instance-Segmentation
Automated Machine Learning, or AutoML, is a process that automates the repetitive and time-consuming tasks involved in developing machine learning models. This helps data scientists, analysts, and developers to create models more efficiently and with higher quality, resulting in increased product...
-
Automated Machine Learning, or AutoML, is a process that automates the repetitive and time-consuming tasks involved in developing machine learning models. This helps data scientists, analysts, and developers to create models more efficiently and with higher quality, resulting in increased product...
-
AutoML-Named-Entity-Recognition
Automated Machine Learning, or AutoML, is a process that automates the repetitive and time-consuming tasks involved in developing machine learning models. This helps data scientists, analysts, and developers to create models more efficiently and with higher quality, resulting in increased product...
-
Automated Machine Learning, or AutoML, is a process that automates the repetitive and time-consuming tasks involved in developing machine learning models. This helps data scientists, analysts, and developers to create models more efficiently and with higher quality, resulting in increased product...
-
bytetrack_yolox_x_crowdhuman_mot17-private-half
The bytetrack_yolox_x_crowdhuman_mot17-private-half model is from OpenMMLab's MMTracking library. This model is described in the library's metafile (https://github.com/open-mmlab/mmtracking/blob/master/configs/mot/bytetrack/metafile.yml#L24)...
-
CompVis/stable-diffusion-v1-4 is a latent text-to-image diffusion model known for generating highly realistic images from textual input. This model incorporates a fixed, pretrained text encoder (CLIP ViT-L/14) as suggested in the Im...
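A minimal text-to-image sketch with the diffusers library, assuming the model is used from the Hugging Face Hub under the id CompVis/stable-diffusion-v1-4; the prompt, output path, and device choice are illustrative.

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the pretrained pipeline; fp16 weights keep GPU memory use modest.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")  # assumes a CUDA-capable GPU

# Generate one image from a text prompt and save it to disk.
image = pipe("a photograph of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```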
-
The Model Card for DeciCoder 1B provides details about a 1 billion parameter decoder-only code completion model developed by Deci. The model was trained on Python, Java, and JavaScript subsets of Starcoder Training Dataset and uses Grouped Query Attention with a context window of 2048 tokens. It ...
-
DeciLM-7B is a decoder-only text generation model with 7.04 billion parameters, released by Deci under the Apache 2.0 license. It is the top-performing 7B base language model on the Open LLM Leaderboard and uses variable Grouped-Query Attention (GQA) to achieve a superior balance between accuracy...
-
DeciLM-7B-instruct is a model for short-form instruction following, built by LoRA fine-tuning on the SlimOrca dataset. It is a derivative of the recently released DeciLM-7B language model, a pre-trained, high-efficiency generative text model with 7 billion parameters. DeciLM-7B-instruct is one of...
-
deformable_detr_twostage_refine_r50_16x2_50e_coco
The deformable_detr_twostage_refine_r50_16x2_50e_coco model is from OpenMMLab's MMDetection library. This model is reported to obtain the results documented in its metafile (https://github.com/open-mmlab/mmdetection/blob/e9cae2d0787cd5c2fc6165a6...)...
-
The DistilBERT model is a smaller, faster Transformer language model distilled from BERT, with 40% fewer parameters and 60% faster inference while retaining 95% of BERT's performance on the GLUE language understanding benchmark. This English language question answering model has ...
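As a rough usage sketch (not taken from the entry itself), the model can be called through the transformers question-answering pipeline; the checkpoint id distilbert-base-cased-distilled-squad is an assumption about which SQuAD-tuned DistilBERT variant is meant.

```python
from transformers import pipeline

# Assumed checkpoint id; swap in the catalog's DistilBERT QA model as needed.
qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

result = qa(
    question="How much of BERT's GLUE performance does DistilBERT retain?",
    context="DistilBERT has 40% fewer parameters and runs 60% faster while "
            "retaining 95% of BERT's performance on GLUE.",
)
print(result["answer"], result["score"])
```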
-
The BART model is a transformer encoder-decoder model trained on English language data and fine-tuned on CNN Daily Mail. It is used for text summarization and has been pre-trained to reconstruct text that has been corrupted using an arbitrary noising function. The model is effective for text generat...
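A short, hedged summarization example using the transformers pipeline; facebook/bart-large-cnn is assumed to be the CNN/Daily Mail fine-tuned checkpoint described above.

```python
from transformers import pipeline

# Assumed hub id for the CNN/Daily Mail fine-tuned BART checkpoint.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

article = (
    "BART is pre-trained by corrupting text with an arbitrary noising function "
    "and learning to reconstruct the original. After fine-tuning on CNN/Daily "
    "Mail it produces abstractive summaries of news articles."
)
summary = summarizer(article, max_length=60, min_length=15, do_sample=False)
print(summary[0]["summary_text"])
```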
-
facebook-deit-base-patch16-224
This model is a more efficiently trained Vision Transformer (ViT). The Vision Transformer (ViT) is a transformer encoder model that is pre-trained and fine-tuned on a large collection of images in a supervised fashion. It is presented with images as sequences of fixed-size patches, which are line...
-
The Segment Anything Model (SAM) is an innovative image segmentation tool capable of creating high-quality object masks from simple input prompts. Trained on a massive dataset comprising 11 million images and 1.1 billion masks, SAM demonstrates strong zero-shot capabilities, effectively adapting ...
-
The Segment Anything Model (SAM) is an innovative image segmentation tool capable of creating high-quality object masks from simple input prompts. Trained on a massive dataset comprising 11 million images and 1.1 billion masks, SAM demonstrates strong zero-shot capabilities, effectively adapting ...
-
The Segment Anything Model (SAM) is an innovative image segmentation tool capable of creating high-quality object masks from simple input prompts. Trained on a massive dataset comprising 11 million images and 1.1 billion masks, SAM demonstrates strong zero-shot capabilities, effectively adapting ...
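A minimal prompted-segmentation sketch with the transformers SAM classes; the checkpoint facebook/sam-vit-base, the input image, and the prompt point are assumptions for illustration (the catalog entries above may correspond to other SAM sizes).

```python
import torch
from PIL import Image
from transformers import SamModel, SamProcessor

# Assumed checkpoint id for the ViT-Base SAM variant.
processor = SamProcessor.from_pretrained("facebook/sam-vit-base")
model = SamModel.from_pretrained("facebook/sam-vit-base")

image = Image.open("example.jpg").convert("RGB")
input_points = [[[450, 600]]]  # one (x, y) prompt point on the object of interest

inputs = processor(image, input_points=input_points, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Resize the predicted masks back to the original image resolution.
masks = processor.image_processor.post_process_masks(
    outputs.pred_masks, inputs["original_sizes"], inputs["reshaped_input_sizes"]
)
print(masks[0].shape)
```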
-
The Vision Transformer (ViT) is a BERT-like transformer encoder model that is pretrained in a supervised fashion on a large collection of images such as ImageNet-21k, and then fine-tuned on ImageNet (1 million images, 1,000 classes) at a resolution of 224x224. ...
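A brief classification sketch; google/vit-base-patch16-224 is assumed to be the ImageNet fine-tuned ViT checkpoint this entry refers to, and the input image is a placeholder.

```python
from transformers import pipeline

# Assumed hub id for the ImageNet fine-tuned ViT base model.
classifier = pipeline("image-classification", model="google/vit-base-patch16-224")

# The input can be a local path or an image URL.
for prediction in classifier("example.jpg"):
    print(prediction["label"], round(prediction["score"], 3))
```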
-
Summary: camembert-ner is a NER model fine-tuned from camemBERT on the Wikiner-fr dataset and validated on email/chat data. It shows better performance on entities that do not start with an uppercase letter. The model has five labels: O, MISC, PER, ORG, and LOC. The model can be loaded using Hugging...
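The entry notes the model can be loaded with Hugging Face tooling; a minimal sketch follows, assuming the hub id Jean-Baptiste/camembert-ner.

```python
from transformers import pipeline

# Assumed hub id; aggregation groups sub-word tokens into whole entities.
ner = pipeline(
    "ner", model="Jean-Baptiste/camembert-ner", aggregation_strategy="simple"
)
print(ner("je m'appelle jean-baptiste et j'habite à montréal"))
```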
-
mask_rcnn_swin-t-p4-w7_fpn_1x_coco
This paper presents a new vision Transformer, called Swin Transformer, that capably serves as a general-purpose backbone for computer vision. Challenges in adapting Transformer from language to vision arise from differences between the two domains, such as large variations in the scale of visual ...
-
microsoft-beit-base-patch16-224-pt22k-ft22k
The BEiT is a vision transformer that is similar to the BERT model, but is also capable of image analysis. The model is pre-trained on a large collection of images, and uses patches to analyze images. It uses relative position embeddings and mean-pooling to classify images, and can be used to ext...
-
microsoft-swinv2-base-patch4-window12-192-22k
The Swin Transformer is a type of Vision Transformer used in both image classification and dense recognition tasks. It builds hierarchical feature maps by merging image patches in deeper layers and has linear computation complexity to input image size due to computation of self-attention only wit...
-
mmd-3x-deformable-detr_refine_twostage_r50_16xb2-50e_coco
The deformable-detr_refine_twostage_r50_16xb2-50e_coco model is from OpenMMLab's MMDetection library. This model is reported to obtain the results documented in its metafile (https://github.com/open-mmlab/mmdetection/blob/3.x/configs/deformable_d...)...
-
mmd-3x-mask-rcnn_swin-t-p4-w7_fpn_1x_coco
This paper presents a new vision Transformer, called Swin Transformer, that capably serves as a general-purpose backbone for computer vision. Challenges in adapting Transformer from language to vision arise from differences between the two domains, such as large variations in the scale of visual ...
-
mmd-3x-sparse-rcnn_r101_fpn_300-proposals_crop-ms-480-800-3x_coco
The sparse-rcnn_r101_fpn_300-proposals_crop-ms-480-800-3x_coco model is from OpenMMLab's MMDetection library. This model is reported to obtain the results documented in its metafile (https://github.com/open-mmlab/mmdetection/blob/3.x/configs/spar...)...
-
mmd-3x-sparse-rcnn_r50_fpn_300-proposals_crop-ms-480-800-3x_coco
The sparse-rcnn_r50_fpn_300-proposals_crop-ms-480-800-3x_coco model is from OpenMMLab's MMDetection library. This model is reported to obtain the results documented in its metafile (https://github.com/open-mmlab/mmdetection/blob/3.x/configs/spars...)...
-
mmd-3x-vfnet_r50-mdconv-c3-c5_fpn_ms-2x_coco
The vfnet_r50-mdconv-c3-c5_fpn_ms-2x_coco model is from OpenMMLab's MMDetection library. This model is reported to obtain the results documented in its metafile (https://github.com/open-mmlab/mmdetection/blob/3.x/configs/vfnet/metafile.yml#L46)...
-
mmd-3x-vfnet_x101-64x4d-mdconv-c3-c5_fpn_ms-2x_coco
The vfnet_x101-64x4d-mdconv-c3-c5_fpn_ms-2x_coco model is from OpenMMLab's MMDetection library. This model is reported to obtain the results documented in its metafile (https://github.com/open-mmlab/mmdetection/blob/3.x/configs/vfnet/metafile.yml...)...
-
mmd-3x-yolof_r50_c5_8x8_1x_coco
The yolof_r50_c5_8x8_1x_coco model is from OpenMMLab's MMDetection library. This model is reported to obtain the results documented in its metafile (https://github.com/open-mmlab/mmdetection/blob/3.x/configs/yolof/metafile.yml#L21)...
-
Multimodal Early Fusion Transformer, MMEFT, is a transformer-based model tailored for processing both structured and unstructured data.
It can be used for multi-class and multi-label multimodal classification tasks, and is capable of handling datasets with features from diverse modes, including ...
-
ocsort_yolox_x_crowdhuman_mot17-private-half
The ocsort_yolox_x_crowdhuman_mot17-private-half model is from OpenMMLab's MMTracking library. This model is described in the library's metafile (https://github.com/open-mmlab/mmtracking/blob/master/configs/mot/ocsort/metafile.yml#L24)...
-
OpenAI-CLIP-Image-Text-Embeddings-ViT-Base-Patch32
The CLIP model was developed by OpenAI researchers to learn about what contributes to robustness in computer vision tasks and to test the ability of models to generalize to arbitrary image classification tasks in a zero-shot manner. The model uses a ViT-B/32 Transformer architecture as an image...
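A zero-shot image/text matching sketch, assuming the ViT-B/32 variant is available on the Hugging Face Hub as openai/clip-vit-base-patch32; the input image and candidate captions are placeholders.

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Assumed hub id for the ViT-B/32 CLIP variant described above.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg").convert("RGB")
texts = ["a photo of a cat", "a photo of a dog"]

# Embed the image and the candidate captions, then compare them.
inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(texts, probs[0].tolist())))
```
-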
The CLIP model was developed by OpenAI researchers to learn about what contributes to robustness in computer vision tasks and to test the ability of models to generalize to arbitrary image classification tasks in a zero-shot manner. The model uses a ViT-B/32 Transformer architecture as an image...
-
The CLIP model was developed by OpenAI researchers to learn about what contributes to robustness in computer vision tasks and to test the ability of models to generalize to arbitrary image classification tasks in a zero-shot manner. The model uses a ViT-L/14 Transformer architecture as an image...
-
Whisper is an OpenAI pre-trained speech recognition model with potential applications for ASR solutions for developers. However, due to weak supervision and large-scale noisy data, it should be used with caution in high-risk domains. The model has been trained on 680k hours of audio data represen...
-
Whisper is a model that can recognize and translate speech using deep learning. It was trained on a large amount of data from different sources and languages. Whisper models can handle various tasks and domains without needing to adjust the model.
Whisper large-v3 is similar to the previous larg...
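A minimal transcription sketch with the transformers pipeline, assuming the large-v3 checkpoint is available on the Hub as openai/whisper-large-v3; the audio file is a placeholder.

```python
from transformers import pipeline

# Assumed hub id for the Whisper large-v3 checkpoint.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3")

# Transcribe a local audio file (any format ffmpeg can decode).
result = asr("speech_sample.wav")
print(result["text"])
```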
-
RoBERTa Base OpenAI Detector is a language model developed by OpenAI that is fine-tuned using outputs from the 1.5B GPT-2 model. It is designed to detect text generated by GPT-2 and is not meant to be used for malicious purposes or to evade detection. The main focus of the model is to aid in synt...
-
runwayml-stable-diffusion-inpainting
runwayml/stable-diffusion-inpainting is a versatile text-to-image model capable of producing realistic images from text input and performing inpainting using masks. It was initialized with Stable-Diffusion-v-1-2 weights and underwent two training phases: 595k steps of regular training and 4...
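A short inpainting sketch with diffusers for this checkpoint; the input image, mask file, and prompt are placeholders, and a CUDA GPU is assumed.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

image = Image.open("photo.png").convert("RGB")
mask = Image.open("mask.png").convert("RGB")  # white pixels are repainted

result = pipe(prompt="a red park bench", image=image, mask_image=mask).images[0]
result.save("inpainted.png")
```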
-
runwayml-stable-diffusion-v1-5
runwayml/stable-diffusion-v1-5 is a powerful text-to-image latent diffusion model capable of generating photo-realistic images given any text input. The model uses a fixed pretrained text encoder (CLIP ViT-L/14) as suggested in the ...
-
Salesforce-BLIP-2-opt-2-7b-image-to-text
BLIP-2 is a model consisting of three components: a CLIP-like image encoder, a Querying Transformer (Q-Former), and a large language model. The image encoder and language model are initialized from pre-trained checkpoints and kept frozen while training the Querying Transformer. The model's goal...
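A captioning sketch with the transformers BLIP-2 classes, assuming the checkpoint is published as Salesforce/blip2-opt-2.7b; the input image is a placeholder and the full model needs several GB of GPU memory.

```python
import torch
from PIL import Image
from transformers import Blip2ForConditionalGeneration, Blip2Processor

# Assumed hub id for the OPT-2.7B BLIP-2 checkpoint named in this entry.
processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
).to("cuda")

image = Image.open("example.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt").to("cuda", torch.float16)

out = model.generate(**inputs, max_new_tokens=30)
print(processor.batch_decode(out, skip_special_tokens=True)[0].strip())
```
-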
Salesforce-BLIP-2-opt-2-7b-vqa
BLIP-2 is a model consisting of three components: a CLIP-like image encoder, a Querying Transformer (Q-Former), and a large language model. The image encoder and language model are initialized from pre-trained checkpoints and kept frozen while training the Querying Transformer. The model's goal...
-
Salesforce-BLIP-image-captioning-base
The BLIP framework is a new Vision-Language Pre-training (VLP) framework that can be used for both vision-language understanding and generation tasks. BLIP effectively utilizes noisy web data by bootstrapping the captions, where a captioner generates synthetic captions and a filter removes the ...
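A minimal captioning sketch, assuming the checkpoint is published as Salesforce/blip-image-captioning-base; the input image is a placeholder.

```python
from PIL import Image
from transformers import BlipForConditionalGeneration, BlipProcessor

# Assumed hub id matching the entry name above.
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base"
)

image = Image.open("example.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True))
```
-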
BLIP is a new Vision-Language Pre-training (VLP) framework that excels in both understanding-based and generation-based tasks. It effectively utilizes noisy web data by bootstrapping the captions, where a captioner generates synthetic captions and a filter removes the noisy ones. BLIP achieves ...
-
sparse_rcnn_r101_fpn_300_proposals_crop_mstrain_480-800_3x_coco
The sparse_rcnn_r101_fpn_300_proposals_crop_mstrain_480-800_3x_coco model is from OpenMMLab's MMDetection library. This model is reported to obtain the results documented in its metafile (https://github.com/open-mmlab/mmdetection/blob/e9cae2d078...)...
-
sparse_rcnn_r50_fpn_300_proposals_crop_mstrain_480-800_3x_coco
The sparse_rcnn_r50_fpn_300_proposals_crop_mstrain_480-800_3x_coco model is from OpenMMLab's MMDetection library. This model is reported to obtain the results documented in its metafile (https://github.com/open-mmlab/mmdetection/blob/e9cae2d0787...)...
-
The RoBERTa Large model is a large transformer-based language model developed by Facebook AI. It is pre-trained on masked language modeling and can be used for tasks such as sequence classification, token classification, or question answering. Its primary usage is as a fine-tun...
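A masked-language-modeling sketch for the pre-trained checkpoint before any fine-tuning, assuming the hub id roberta-large; RoBERTa uses <mask> as its mask token.

```python
from transformers import pipeline

# Assumed hub id for the pre-trained (not fine-tuned) RoBERTa Large checkpoint.
fill = pipeline("fill-mask", model="roberta-large")

for candidate in fill("The goal of language modeling is to predict the <mask> word."):
    print(candidate["token_str"], round(candidate["score"], 3))
```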
-
stabilityai-stable-diffusion-2-1
stabilityai/stable-diffusion-2-1 model is a fine-tuned version of the Stable Diffusion v2 model, with additional training steps on the same dataset. It's designed for generating and modifying images based on text prompts, utilizing a Latent Diffusion Model with a fixed, pretrained text encode...
-
stabilityai-stable-diffusion-2-inpainting
stabilityai/stable-diffusion-2-inpainting model is a continuation of the stable-diffusion-2-base model, with an additional 200,000 steps of training. It utilizes a mask-generation strategy introduced in LAMA and combines this with latent Variational Autoencoder (VAE) representations of the ...
-
stabilityai-stable-diffusion-xl-refiner-1-0
stabilityai/stable-diffusion-xl-refiner-1.0 employs an ensemble of expert modules in a pipeline for latent diffusion. The process involves using a base model to generate noisy latents, which are then refined using a specialized denoising model. The base model can function independently. Alt...
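A sketch of the base-plus-refiner flow described above, using diffusers; the base checkpoint id stabilityai/stable-diffusion-xl-base-1.0 and the 0.8 denoising split are assumptions for illustration.

```python
import torch
from diffusers import StableDiffusionXLImg2ImgPipeline, StableDiffusionXLPipeline

base = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16
).to("cuda")

prompt = "a majestic lion jumping from a big stone at night"

# The base model handles the first 80% of the denoising schedule and returns latents...
latents = base(prompt=prompt, denoising_end=0.8, output_type="latent").images
# ...which the refiner denoises for the remaining 20% to produce the final image.
image = refiner(prompt=prompt, denoising_start=0.8, image=latents).images[0]
image.save("lion.png")
```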
-
vfnet_r50_fpn_mdconv_c3-c5_mstrain_2x_coco
The vfnet_r50_fpn_mdconv_c3-c5_mstrain_2x_coco model is from OpenMMLab's MMDetection library. This model is reported to obtain the results documented in its metafile (https://github.com/open-mmlab/mmdetection/blob/e9cae2d0787cd5c2fc6165a6061f92f...)...
-
vfnet_x101_64x4d_fpn_mdconv_c3-c5_mstrain_2x_coco
The vfnet_x101_64x4d_fpn_mdconv_c3-c5_mstrain_2x_coco model is from OpenMMLab's MMDetection library. This model is reported to obtain the results documented in its metafile (https://github.com/open-mmlab/mmdetection/blob/e9cae2d0787cd5c2fc6165a6...)...
-
yolof_r50_c5_8x8_1x_coco
The yolof_r50_c5_8x8_1x_coco model is from OpenMMLab's MMDetection library. This model is reported to obtain the results documented in its metafile (https://github.com/open-mmlab/mmdetection/blob/e9cae2d0787cd5c2fc6165a6061f92fa09e48fb1/configs/...)...