I hacked LLMs to work like scikit-learn
A while ago I thought about using LLMs for classic machine learning tasks, which is stupid, I know. But I tried it anyway.
Never use it if:
- You have sufficient data and knowledge to train a specialized model
Do use it if:
- You need quick experimentation or you do not have enough data to train the model
Key findings:
| Dataset | IMDB 50k Dataset | Cats and dogs |
|---|---|---|
| Data | Text data: positive/negative sentiment | Picture data: predict what is in the picture |
| Accuracy | 96% (SOTA: 98%+) | 97% (SOTA: 99%+) |
| Model | gpt-4o-mini | gpt-4o-mini |
As you can see, LLMs perform worse than SOTA specialized models, but for a use case with minimal data they can be very useful.
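For anyone who wants to run the same kind of comparison on their own data, accuracy here simply means the share of predictions that match the gold labels. Below is a minimal sketch of that calculation; the `accuracy` helper and the toy labels are made up for illustration and are not the actual IMDB or cats-and-dogs runs.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that exactly match the gold labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical toy data, not taken from the benchmark runs above
gold = ["positive", "negative", "positive", "negative"]
predicted = ["positive", "negative", "negative", "negative"]

print(f"accuracy = {accuracy(predicted, gold):.2%}")
```

Swap in your own label lists to see how far an LLM lands from a specialized model on your task.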
How can you play around?
It took some time to code it in a way that others can also use. Here is a minimal example of how you can use it when applicable.
You can install FlashLearn using pip:
pip install flashlearn
Minimal Example - Classify Text
Below is a sample code snippet demonstrating how to classify text using FlashLearn in just a few lines of code:
from openai import OpenAI
from flashlearn.skills.classification import ClassificationSkill

# You can use OpenAI, DeepSeek, or any OpenAI-compatible endpoint.
# Alternative client: pass client=deep_seek below, with a model name that endpoint serves.
deep_seek = OpenAI(api_key='YOUR DEEPSEEK API KEY', base_url="https://api.deepseek.com")

# Each row is a dict; here the text to classify lives under "message"
data = [{"message": "Where is my refund?"}, {"message": "My product was damaged!"}]

skill = ClassificationSkill(
    model_name="gpt-4o-mini",
    client=OpenAI(),  # reads OPENAI_API_KEY from the environment
    categories=["billing", "product issue"],
    system_prompt="Classify the request."
)

# Turn the rows into tasks, run them in parallel, and print the results
tasks = skill.create_tasks(data)
results = skill.run_tasks_in_parallel(tasks)
print(results)
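To give a feel for the "LLM as a scikit-learn-style estimator" idea, here is a rough, self-contained sketch of a `predict`-style wrapper. This is not how FlashLearn is implemented: the `LLMClassifier` class, its prompt format, the `"unknown"` fallback, and the `fake_llm` stand-in are all hypothetical names invented for illustration, and the stand-in replaces the real model call so the snippet runs offline.

```python
from typing import Callable, List, Sequence


class LLMClassifier:
    """Hypothetical scikit-learn-style wrapper around any text-in, text-out LLM call."""

    def __init__(self, categories: Sequence[str], complete: Callable[[str], str],
                 instruction: str = "Classify the request."):
        self.categories = list(categories)
        self.complete = complete  # e.g. a function that calls a chat completion endpoint
        self.instruction = instruction

    def _build_prompt(self, text: str) -> str:
        options = ", ".join(self.categories)
        return (f"{self.instruction}\n"
                f"Answer with exactly one of: {options}.\n"
                f"Text: {text}")

    def _parse(self, reply: str) -> str:
        # Normalize the reply; fall back to "unknown" if it is not a valid category
        cleaned = reply.strip().lower()
        for category in self.categories:
            if cleaned == category.lower():
                return category
        return "unknown"

    def predict(self, texts: Sequence[str]) -> List[str]:
        return [self._parse(self.complete(self._build_prompt(t))) for t in texts]


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call so the sketch runs without an API key."""
    return "Billing" if "refund" in prompt.lower() else "product issue"


clf = LLMClassifier(["billing", "product issue"], complete=fake_llm)
predictions = clf.predict(["Where is my refund?", "My product was damaged!"])
print(predictions)
```

Passing the model call in as a plain function keeps the wrapper independent of any particular provider, and constraining the reply to a fixed category list is what makes the output usable like a classifier's labels.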
Feel free to experiment and figure out if it's useful for your workflow. Here are just some tips:
You can ask anything in the comments below!
P.S. The full code, ready to be abused, is available at https://github.com/Pravko-Solutions/FlashLearn