SUPA's LLM Comparison Tool provides the insights you need to identify the best model for your specific use case. All comparison data will be open-sourced to support LLM research and benchmarking.
We're excited to introduce a new open-source tool that empowers users to evaluate and compare LLM performance firsthand. While LLMs have shown immense potential across various applications, the ability to meaningfully assess their capabilities—especially in specific domains and use cases—has remained a challenge.
Our tool aims to bridge this gap by providing a transparent, hands-on comparison environment where users can pit different models against each other and draw their own conclusions.
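To make the head-to-head idea concrete, here is a minimal sketch of how pairwise "model A vs. model B" user votes can be aggregated into a ranking, using a simple Elo-style update. This is an illustrative assumption, not SUPA's actual scoring method; the function name `elo_update` and the K-factor are hypothetical.

```python
# Hypothetical sketch: aggregating head-to-head votes into a ranking.
# The Elo update here is illustrative, not SUPA's actual implementation.

def elo_update(rating_a: float, rating_b: float, winner: str, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one head-to-head vote.

    winner is "a", "b", or "tie". The update is zero-sum: points gained
    by one model are lost by the other.
    """
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
    score_a = 1.0 if winner == "a" else (0.0 if winner == "b" else 0.5)
    rating_a += k * (score_a - expected_a)
    rating_b -= k * (score_a - expected_a)
    return rating_a, rating_b

# Start both models at 1000 and replay a few user votes.
ratings = {"model_a": 1000.0, "model_b": 1000.0}
for vote in ["a", "a", "tie", "b", "a"]:
    ratings["model_a"], ratings["model_b"] = elo_update(
        ratings["model_a"], ratings["model_b"], vote
    )
print(ratings)  # model_a ends above model_b after winning 3 of 5 votes
```

With enough votes across many users and prompts, rankings like this surface which model performs better for a given use case, which is the kind of conclusion the comparison environment lets users draw for themselves.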
What sets our tool apart is its inclusive approach to model evaluation. Existing comparison platforms underrepresent Southeast Asian (SEA) LLMs. Our tool includes models from the SEA region alongside global options, ensuring a more comprehensive evaluation landscape. We believe this diversity is crucial for users working across different linguistic and cultural contexts.
In line with our commitment to advancing the field, all data generated through our platform will be made publicly available for research purposes and model fine-tuning.
Made with love by the SUPA Team