Have you ever needed to analyze a large amount of text but lacked the processing power of a supercomputer? Fastc, a new Python library, is here to democratize text classification for those working in resource-constrained environments.
BrainDAO is happy to introduce Fastc, a simple yet powerful Python library designed to perform parallel text classification efficiently and with minimal setup. Whether you are working on sentiment analysis, spam detection, language identification, or any other text classification task, Fastc is an excellent solution for memory-restricted environments.
BrainDAO developed Fastc while working on improvements to IQGPT.com, the AI assistant for blockchain knowledge. IQ GPT was trained on thousands of in-depth articles from IQ.wiki, the largest blockchain encyclopedia, as well as live data from sources like InvestHK, CoinGecko, and DefiLlama. IQ GPT is also available as a Telegram and Discord bot that communities can integrate to help with moderation and answering questions.
Fastc combines state-of-the-art language models for embedding generation with simple classification strategies, eliminating the need for fine-tuning. Because multiple classifiers can share a common embedding model, hundreds of them can run concurrently on a single machine, making Fastc ideal for resource-constrained environments that lack extensive computational resources.
What is Fastc?
Fastc is a user-friendly Python library designed to build lightweight text classifiers with ease. It is perfect for anyone who wants to run multiple text classifiers in parallel on their personal computer or on servers with limited resources, even without any prior knowledge of text classification.
Why is Fastc Different?
Traditional text classification methods often require complex training processes and substantial computing power. Fastc takes a simpler approach: it leverages pre-trained Large Language Models (LLMs) to capture the meaning of text and applies straightforward yet effective strategies, such as logistic regression or centroid-based methods, to separate text classes. This eliminates the need for fine-tuning, resulting in much faster model training and efficient inference when running multiple classifiers in parallel.
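To make the idea concrete, here is a minimal sketch of that general approach using sentence-transformers and scikit-learn rather than Fastc itself; the embedding model, toy data, and helper names below are illustrative assumptions, not Fastc's actual API.

```python
# Sketch of "frozen embeddings + simple classifier heads" (not Fastc's API).
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

# One shared, pre-trained embedding model -- no fine-tuning involved.
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

texts = ["great product, love it", "terrible, waste of money",
         "works as expected", "broke after one day"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

embeddings = encoder.encode(texts)  # shape: (n_texts, embedding_dim)

# Strategy 1: logistic regression on top of the frozen embeddings.
clf = LogisticRegression().fit(embeddings, labels)

# Strategy 2: centroid-based classification with cosine similarity.
centroids = {label: embeddings[np.array(labels) == label].mean(axis=0)
             for label in set(labels)}

def classify_by_centroid(text: str) -> int:
    # Pick the class whose centroid is most similar to the text embedding.
    vec = encoder.encode([text])[0]
    sims = {label: np.dot(vec, c) / (np.linalg.norm(vec) * np.linalg.norm(c))
            for label, c in centroids.items()}
    return max(sims, key=sims.get)

print(clf.predict(encoder.encode(["absolutely fantastic"])))
print(classify_by_centroid("absolutely fantastic"))
```

Both strategies train in a fraction of a second because only a small head is fit; the heavy lifting is done once by the frozen embedding model.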
Key Features of Fastc:
- Focused on limited-memory CPU execution: Use efficient distilled models such as deepset/tinyroberta-6l-768d for embedding generation.
- Cosine Similarity and Logistic Regression: Bypass the need for fine-tuning by utilizing LLM embeddings to efficiently categorize texts using either cosine similarity with class centroids or logistic regression.
- Efficient Parallel Execution: Run hundreds of classifiers concurrently with minimal overhead by sharing the same model for embedding generation (see the sketch after this list).
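The sketch below illustrates why that last point matters. It again uses sentence-transformers and scikit-learn instead of Fastc's own API, with illustrative names and toy data: the expensive encoder is loaded once, and each additional classifier is only a tiny head on top of it.

```python
# Illustrative sketch of sharing one embedding model across many classifiers
# (not Fastc's API): each classifier is a small scikit-learn head, while the
# heavyweight encoder lives in memory exactly once.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

shared_encoder = SentenceTransformer("all-MiniLM-L6-v2")  # loaded a single time

class TinyClassifier:
    """A lightweight classifier head that reuses the shared encoder."""
    def __init__(self, texts, labels):
        self.head = LogisticRegression().fit(shared_encoder.encode(texts), labels)

    def predict(self, texts):
        return self.head.predict(shared_encoder.encode(texts))

# Many classifiers, one encoder in memory.
spam = TinyClassifier(["win a free prize now", "meeting at 3pm"], ["spam", "ham"])
sentiment = TinyClassifier(["i love it", "i hate it"], ["pos", "neg"])

print(spam.predict(["claim your free prize"]))
print(sentiment.predict(["this is wonderful"]))
```

Because each head is only a small set of weights, adding another classifier costs almost no extra memory, which is what makes running hundreds of them on one machine practical.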
Who Can Benefit from Fastc?
Fastc is a valuable tool for a wide range of users, including:
- Data Scientists: Ideal for those working with limited computational resources.
- Researchers: Enables efficient analysis of textual data in various research fields.
- Developers: Simplifies the creation of ad-hoc text classifiers and the integration of classification tasks into applications.
- Anyone Working with Text Data: Provides a user-friendly way to build text classifiers and analyze text, even with limited technical knowledge.
Get Started with Fastc
Fastc is designed to make text classification accessible and efficient, regardless of your computing power. Whether you are a data scientist, researcher, developer, or anyone working with text data, Fastc provides a straightforward, resource-efficient solution. Dive into text classification with Fastc and experience the benefits of state-of-the-art language models without the need for extensive computational resources.
Visit the GitHub page: https://github.com/EveripediaNetwork/fastc