How to Use OpenAI’s GPT 3.5 to Annotate News Data

Bytesview Analytics
3 min readMay 30, 2024

--

Imagine a computer program that can help label information instead of us doing it all by hand! This is possible for simpler tasks with large language models, like the ones that power chatbots.

By giving them clear instructions, we can ask them to sort information like positive or negative news articles. This example uses OpenAI, but it works with other similar programs too!

What is Data Annotation?

Data annotation involves humans typically annotating data (adding labels like text or tags) to train machine learning models with the desired outcome.

Benefits of Annotation Using OpenAI GPT

  • Data annotation enhances model accuracy by supplying high-quality training data for AI models.
  • Uncovers field-specific details in the data
  • Annotation data helps AI models to generalize better.
  • Reduce the bias of the AI model.

Why LLM as Data Annotators?

LLMs excel at learning new tasks quickly, even with minimal examples (few-shot) or no examples (zero-shot), like GPT-3 and other advanced models.

For tasks requiring fewer features, like classification, LLMs are well-suited. Their extensive training data grants them a deep understanding of text structure, allowing them to perform exceptionally well on downstream tasks when guided by clear prompts.

Setting up NewsData.io and Open AI

Newsdata.io and OpenAI offer Python libraries. We’ll apply Newsdata.io’s news data and OpenAI’s API for sentiment analysis using Python.

Classify news articles as “Negative,” “Neutral,” or “Positive” using sentiment analysis. To achieve this with Python, follow these steps to extract news data from Newsdata.io.

For instructions to set up the NewsData.io and Open AI, please refer to this link (here)

Understanding Data

A single Newsdata.io API call can fetch up to 50 news items (unless limited).

All 50 news items are available in the “result” field of the response JSON object.

Check out this article to learn how a single news item looks in the response object.

News headlines act like mini summaries, cleverly capturing the core of the story in just a few words. This grabs readers’ attention and packs a punch of information. Amazingly, we can even analyze the sentiment of the entire article by just looking at the headline, making the process much faster.

Why is Only the News Headline Sufficient?

Headlines naturally reflect a news story’s positive or negative nature, effectively grabbing the reader’s attention.

Furthermore, this approach reduces the tokens required for OpenAI API requests, resulting in lower costs.

Annotation of News Data

GPT-3.5 Turbo’s Chat Completion API (JSON format) is well-suited for tasks like sentiment analysis due to its relative simplicity.

When you give GPT-3.5 Turbo Chat a prompt, it generates words in response.

Regular GPT-3 responses are lengthy, but sentiment analysis just needs a positive/negative/neutral label. JSON mode gives us this single label clearly, making it easier to analyze news sentiment. Just tell GPT-3 to use JSON format to avoid unnecessary text.

To learn more about the annotation news data, click here.

--

--

Bytesview Analytics

Bytesview data analysis tool is one of the most effective and easiest ways to extract insights for unstructured text data.