Puristan - Urdu & Arabic AI-Related Models, Datasets and Others

Alpaca-gpt4-arabic dataset

This post introduces the alpaca-gpt4-arabic dataset from Hugging Face.

Uncategorized

OpenAI MMMLU Dataset for Arabic

The openai/MMMLU (Multilingual Massive Multitask Language Understanding) dataset is a collection of multiple-choice questions and answers designed to evaluate the performance of AI models across various languages and subject domains. It is a professional human translation of the MMLU test set into 14 languages, including Arabic. Here’s a basic summary of the Arabic subset (overview and data structure), followed by some use cases: 1. Model …
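Since MMMLU is used to score models on multiple-choice questions, a minimal sketch of that scoring loop may help. It assumes each row has a `Question` column, option columns `A` to `D`, and a gold letter in `Answer`. These column names are our assumption based on the dataset card, so verify them before use. The rows could come from the Hugging Face `datasets` library, for example `load_dataset("openai/MMMLU", "AR_XY", split="test")`, where `AR_XY` is assumed to be the Arabic config. The helpers `format_prompt` and `accuracy` and the `predict` callable are hypothetical stand-ins for whatever model interface you use.

```python
from typing import Callable, Iterable, Mapping

# Option columns assumed to exist in each MMMLU-style row (hypothetical schema).
CHOICES = ("A", "B", "C", "D")


def format_prompt(row: Mapping[str, str]) -> str:
    """Render one row as a plain multiple-choice prompt ending in 'Answer:'."""
    options = "\n".join(f"{c}. {row[c]}" for c in CHOICES)
    return f"{row['Question']}\n{options}\nAnswer:"


def accuracy(rows: Iterable[Mapping[str, str]], predict: Callable[[str], str]) -> float:
    """Fraction of rows where the first letter of predict(prompt) matches row['Answer']."""
    total = correct = 0
    for row in rows:
        total += 1
        # Keep only the first non-space character, upper-cased, as the model's choice.
        guess = predict(format_prompt(row)).strip().upper()[:1]
        correct += guess == row["Answer"].strip().upper()
    if total == 0:
        raise ValueError("no rows to evaluate")
    return correct / total


if __name__ == "__main__":
    # Two made-up Arabic rows for illustration only (not taken from MMMLU).
    sample = [
        {"Question": "ما ناتج 2 + 2؟", "A": "3", "B": "4", "C": "5", "D": "22", "Answer": "B"},
        {"Question": "ما عاصمة مصر؟", "A": "القاهرة", "B": "الرباط", "C": "تونس", "D": "عمان", "Answer": "A"},
    ]
    # A dummy "model" that always answers B gets one of the two rows right.
    print(accuracy(sample, lambda prompt: "B"))
```

Parsing only the first letter of the reply keeps the sketch simple. Real evaluations often compare the log-probabilities of the four option letters instead, which avoids depending on free-form output.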

OCR

Text Extraction Metrics to evaluate Text Extraction Accuracy in OCR

Evaluating the accuracy of text extraction in OCR systems can be tricky. However, several metrics can be used for this purpose. Below, we’ll discuss only two of them: CER and WER. Character Error Rate (CER): CER measures the rate of erroneous characters produced by an OCR system compared with the ground truth.
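A minimal sketch of both metrics follows. It assumes the usual edit-distance definition CER = (S + D + I) / N, where S, D and I count the character substitutions, deletions and insertions needed to turn the ground truth into the OCR output, and N is the number of characters in the ground truth. WER applies the same formula to whitespace-separated words. The helper names `levenshtein`, `cer` and `wer` are illustrative. No Unicode or whitespace normalization is applied, which matters for Arabic and Urdu text, where diacritics or alternative code points for the same letter would count as errors.

```python
from typing import Sequence


def levenshtein(ref: Sequence, hyp: Sequence) -> int:
    """Minimum number of substitutions, deletions and insertions turning ref into hyp."""
    # prev[j] is the edit distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h into the output
                prev[j - 1] + (r != h),  # substitute (free when r == h)
            )
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character edits divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word edits divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return levenshtein(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    print(cer("kitten", "sitting"))          # 3 edits over 6 reference characters
    print(cer("مرحبا", "مرحيا"))              # one substituted Arabic letter out of 5
    print(wer("the cat sat", "the cat sit"))  # one wrong word out of 3
```

Because insertions are counted against the reference length, both rates can exceed 1.0 when the OCR output contains many extra characters or words.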
