Hugging Face

Hugging Face is an open-source platform for data science and machine learning. It serves as a central hub for AI experts and enthusiasts, often described as a GitHub for AI projects. Clément Delangue, Julien Chaumond, and Thomas Wolf co-founded Hugging Face. [1]

Overview

Hugging Face is a comprehensive machine learning (ML) and data science platform. It enables users to build, train, and deploy AI models, offering infrastructure for everything from initial code to live application deployment. Users can browse and utilize models and datasets others share and test demo projects. Known for its Transformers Python library, Hugging Face simplifies downloading and training ML models, facilitating efficient workflow integration. The platform's open-source nature and deployment tools promote resource sharing and reduce model training time, resource consumption, and the environmental impact of AI development. [2]

History

Hugging Face Inc., founded in New York City in 2016 by French entrepreneurs Clément Delangue, Julien Chaumond, and Thomas Wolf, is behind the Hugging Face platform. Initially, they developed a chatbot app for teenagers but later shifted focus to becoming a machine learning platform after open-sourcing the chatbot model. [3]

Hugging Face Hub

The Hugging Face Hub is an online platform featuring over 350,000 models, 75,000 datasets, and 150,000 demo apps (Spaces), all open source and publicly accessible. It serves as a central place for users to explore, experiment, collaborate, and build technology with machine learning. Models, Spaces, and Datasets on the Hugging Face Hub are hosted as Git repositories, making version control and collaboration key features. A repository, or repo, is a storage space for code and assets, allowing users to back up their work, share it with the community, and collaborate with teams. [4][5]
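
As an illustration of this Git-based workflow, here is a minimal sketch of creating a repository and uploading a file with the huggingface_hub client library; the repository name and file path are placeholders.

```python
from huggingface_hub import HfApi

api = HfApi()  # uses the access token from your local Hugging Face login

# Create a private model repository (the repo name is a placeholder)
api.create_repo(repo_id="my-username/my-model", repo_type="model", private=True)

# Upload a file; each upload is recorded as a commit in the repo's Git history
api.upload_file(
    path_or_fileobj="./pytorch_model.bin",
    path_in_repo="pytorch_model.bin",
    repo_id="my-username/my-model",
)
```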

Models

Hugging Face allows users to create and host their own AI models on its platform, where they can manage versions, add information, and upload necessary files. Users can choose to make models public or private and collaborate through discussions and pull requests directly on the model page. Models can be run directly from Hugging Face, integrating outputs into any application. The platform offers over 200,000 models for various tasks, including natural language processing, audio tasks, computer vision, and multimodal models. The Transformers library enables easy connection to these models for task execution, model training with personal data, or quick creation of demo Spaces. [2][6]

The Model Hub on Hugging Face is a repository where community members can host their model checkpoints for storage, discovery, and sharing. Users can download pre-trained models using the huggingface_hub client library, the Hugging Face Transformers library for fine-tuning, or any of over 15 integrated libraries. The platform also offers the Serverless Inference API and Inference Endpoints for using models in production settings. [6]
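
For example, a minimal sketch of pulling a pre-trained model from the Model Hub, using the huggingface_hub and Transformers libraries with the public bert-base-uncased checkpoint:

```python
from huggingface_hub import hf_hub_download
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Download a single file from a model repository on the Hub
config_path = hf_hub_download(repo_id="bert-base-uncased", filename="config.json")

# Or load the full pre-trained checkpoint and tokenizer, ready for fine-tuning
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # num_labels depends on your task
)
```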

Transformers Library

Hugging Face Transformers is a library of pre-trained state-of-the-art models for natural language processing (NLP), computer vision, and audio and speech processing tasks. It includes both Transformer and non-Transformer models, such as modern convolutional networks for computer vision tasks. Deep learning technology underpins these models, which often power popular consumer products like smartphones and apps. Since its introduction in 2017, the original Transformer architecture has inspired many new models beyond NLP, including models for protein structure prediction, reinforcement learning (such as training a simulated cheetah to run), and time series forecasting. All of these models are based on the original Transformer architecture, with some using only the encoder or decoder and others using both; this split provides a taxonomy for categorizing and examining the high-level differences within the Transformer family. [7][8]
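
As a concrete example, the pipeline API in Transformers wraps model download, preprocessing, and inference behind a single call; this sketch uses the library's default sentiment-analysis checkpoint:

```python
from transformers import pipeline

# Downloads a default pre-trained sentiment model on first use
classifier = pipeline("sentiment-analysis")

print(classifier("Hugging Face makes sharing models easy."))
# e.g. [{'label': 'POSITIVE', 'score': 0.999...}]

# The same one-line API covers other modalities, for example:
# transcriber = pipeline("automatic-speech-recognition")
# detector = pipeline("object-detection")
```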

Datasets

A dataset is a collection of data used to train AI models through machine learning. These datasets contain examples paired with labels, which guide the model in interpreting each example. During training, the model learns to understand patterns and relationships between the examples and labels. Once trained, the model can generate outputs based on new prompts. Creating high-quality datasets is challenging and requires accurate representation of real-world data to prevent model errors. Hugging Face hosts over 30,000 datasets for various tasks, including natural language processing, computer vision, and audio. Users can also contribute their datasets and access new ones as they become available. [2][9]
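
For illustration, a minimal sketch of loading one of these hosted datasets with the datasets library, using the public IMDB dataset of labeled movie reviews:

```python
from datasets import load_dataset

# Load the training split of a public dataset from the Hub
dataset = load_dataset("imdb", split="train")

example = dataset[0]
print(example["text"][:100])  # the raw example
print(example["label"])       # its paired label (0 = negative, 1 = positive)
```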

Spaces

While models and datasets on Hugging Face are primarily for building and training, Spaces lets users showcase self-contained demos to a wider audience. The platform provides basic computing resources (16 GB RAM, 2 CPU cores, and 50 GB disk space) for running these demos, with options to upgrade for better performance. Spaces are useful for promoting projects and attracting contributors, and many require no technical skills to use, making them accessible to anyone. [2][10]
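
A typical Space is a small app defined in a single script; the following is a minimal sketch of a Gradio demo (app.py) that wraps a sentiment model, with the model choice left to the pipeline defaults:

```python
import gradio as gr
from transformers import pipeline

# Load a default pre-trained sentiment model
classifier = pipeline("sentiment-analysis")

def predict(text: str) -> str:
    result = classifier(text)[0]
    return f"{result['label']} ({result['score']:.2f})"

# A simple text-in, text-out web UI; Spaces runs this on its hosted hardware
demo = gr.Interface(fn=predict, inputs="text", outputs="text")
demo.launch()
```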

Organizations

The Hugging Face Hub provides Organizations, allowing users to group accounts and manage datasets, models, and Spaces. Administrators can set user roles to regulate access to repositories and manage their organization’s payment method and billing information. [11]

Enterprise Hub

The Enterprise Hub enhances organizations with advanced functionalities, fostering secure, compliant, and managed collaboration for teams and companies on Hugging Face. It includes audit logs, a private dataset viewer, and resource groups. [12]

  • Audit Logs enable organization admins to easily track members' actions, including changes in organization membership, repository settings, and billing.  
  • The Dataset Viewer is activated for private datasets owned by an Enterprise Hub organization. This tool aids teams in understanding their data, facilitating improved data processing and filtering for AI. Users can explore dataset contents, analyze data distributions, filter by values, and search for keywords. Additionally, datasets can be converted to Parquet for programmatic data visualization. [13]
  • Resource Groups streamline repository organization and access management for organization administrators. This functionality allows disparate teams to collaborate on their respective repositories within a single organization. Each repository is assigned to a single Resource Group, and organization members must be added to that group to access its repositories. Within each Resource Group, members are assigned roles that define their permissions for the group’s repositories. Organization admins retain control over all Resource Groups within the organization. Resource Groups also affect the visibility of private repositories: private repositories linked to a Resource Group are visible only to members of that group, whereas public repositories are visible to everyone, both inside and outside the organization. [14]

Some organizations using the Enterprise Hub include: [15]

  • Meta
  • Snorkel AI
  • Uber
  • Liberty Mutual
  • Arcee AI

Partnerships

Amazon (AWS)

On March 23rd, 2021, Hugging Face announced a strategic partnership with Amazon aimed at simplifying the use of state-of-the-art machine learning models and expediting the deployment of cutting-edge NLP features for companies. As part of this collaboration, Hugging Face chose Amazon Web Services (AWS) as its preferred cloud provider for delivering customer services. To support the partnership, Hugging Face and Amazon introduced new Hugging Face Deep Learning Containers (DLCs), designed to streamline the training of Hugging Face Transformer models in Amazon SageMaker. [16]

On February 21, 2023, Hugging Face and Amazon Web Services (AWS) unveiled an expanded long-term strategic partnership to accelerate the availability of next-generation machine learning models. The partnership aims to make these models more accessible to the machine-learning community while helping developers achieve optimal performance at minimal cost. With this expanded collaboration, Hugging Face and AWS aim to expedite machine learning adoption by combining the latest models hosted on Hugging Face with the advanced capabilities of Amazon SageMaker. Customers can fine-tune and deploy state-of-the-art Hugging Face models on Amazon SageMaker and Amazon Elastic Compute Cloud (EC2) with just a few clicks, using purpose-built machine learning accelerators such as AWS Trainium and AWS Inferentia. [17]
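
As a rough sketch of what this looks like in practice, the SageMaker Python SDK exposes a HuggingFace estimator that launches training jobs inside the Hugging Face DLCs; the entry-point script, IAM role, and framework versions below are placeholders to be checked against the current SageMaker documentation.

```python
from sagemaker.huggingface import HuggingFace

estimator = HuggingFace(
    entry_point="train.py",  # your fine-tuning script (placeholder)
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder IAM role
    instance_type="ml.p3.2xlarge",
    instance_count=1,
    transformers_version="4.26",  # illustrative; pick a supported combination
    pytorch_version="1.13",
    py_version="py39",
    hyperparameters={"model_name_or_path": "bert-base-uncased", "epochs": 3},
)

# Launches a managed training job in a Hugging Face Deep Learning Container
estimator.fit()
```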

Graphcore

On September 14th, 2021, at the 2021 AI Hardware Summit, Hugging Face unveiled its new Hardware Partner Program, featuring device-optimized models and software integrations. Among the founding members of this program was Graphcore, known for its Intelligence Processing Unit (IPU). Graphcore elaborated on their collaboration with Hugging Face, highlighting how it would enable developers to enhance the performance of state-of-the-art Transformer models seamlessly. [18]

Habana Labs

On April 12th, 2022, Habana Labs and Hugging Face announced a collaboration to enhance the efficiency of training high-quality transformer models. By integrating Habana’s SynapseAI software suite with the Hugging Face Optimum open-source library, data scientists and machine learning engineers can accelerate their Transformer training tasks on Habana processors with minimal code adjustments, increasing productivity and cost savings. [19]

Kakao Brain

On March 6th, 2023, Kakao Brain and Hugging Face announced the release of a new open-source image-text dataset called COYO, consisting of 700 million pairs, along with two new visual language models trained on it: ViT and ALIGN. This marks the first time the ALIGN model has been made publicly available for free and open-source use, and the first release of ViT and ALIGN models accompanied by their training dataset. [20]

Jupyter

On March 23rd, 2023, Hugging Face announced enhanced support for Jupyter notebooks hosted on the Hugging Face Hub. In addition to hosting models, datasets, and demos, the Hub now hosts over 7,000 notebooks, providing valuable documentation of the development process and offering tutorials for utilizing resources. This improvement in notebook hosting was a significant development for the Hugging Face community. [21]

Google

On January 25th, 2024, Hugging Face announced a strategic partnership with Google Cloud to democratize machine learning. The collaboration involved working together across open science, open source, cloud, and hardware to empower companies to develop their AI using the latest open models from Hugging Face and the advanced cloud and hardware features from Google Cloud. [22]

NVIDIA

On March 18th, 2024, Hugging Face announced the launch of Train on DGX Cloud, a new service available on the Hugging Face Hub for Enterprise Hub organizations. Train on DGX Cloud simplifies the utilization of open models by leveraging the accelerated compute infrastructure of NVIDIA DGX Cloud. This collaboration aimed to provide Enterprise Hub users with easy access to the latest NVIDIA H100 Tensor Core GPUs, allowing them to fine-tune popular Generative AI models like Llama, Mistral, and Stable Diffusion within the Hugging Face Hub with just a few clicks. [23]

Cloudflare

On April 2nd, 2024, Hugging Face introduced Deploy on Cloudflare Workers AI, a new integration available on the Hugging Face Hub. This integration simplifies the use of open models as a serverless API, leveraging state-of-the-art GPUs deployed in Cloudflare edge data centers. This allows developers to build robust Generative AI applications without managing GPU infrastructure and servers while minimizing operating costs by only paying for the compute they use. [24]
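
For illustration, a hedged sketch of what calling such a deployed model can look like through Cloudflare's Workers AI REST endpoint; the account ID, API token, and model name are placeholders.

```python
import requests

ACCOUNT_ID = "your-account-id"  # placeholder
API_TOKEN = "your-api-token"    # placeholder
MODEL = "@cf/meta/llama-2-7b-chat-int8"  # an example Workers AI model name

# Workers AI exposes hosted models behind a single serverless REST endpoint
response = requests.post(
    f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/{MODEL}",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json={"prompt": "What is Hugging Face?"},
)
print(response.json())
```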
