Extracting Reddit Data with AI

A few days ago, I stumbled upon a Reddit thread packed with API recommendations. The thread was long, detailed, and exactly what I needed—except I didn’t want to manually comb through dozens of comments. I figured this was the perfect job for AI. Spoiler alert: it wasn’t as straightforward as I expected.

The Problem

I had a simple request: extract a list of APIs from a Reddit thread and organize them into a table with names and descriptions. Sounds easy, right? This is exactly the kind of task modern AI tools should excel at.

What I Tried

I started with the obvious candidates:

  • Notion AI: My go-to assistant couldn’t access the Reddit thread directly
  • ChatGPT: Same story—no direct access to Reddit content
  • Gemini/Chrome AI Mode: Also came up empty

Next, I tried copying and pasting the thread content directly. Two problems emerged: the thread was too long for most free AI tools, and Reddit’s interface makes it surprisingly difficult to select and copy large amounts of text cleanly.

The Solution: Reddit’s JSON API + jq

After some experimentation, I discovered a simpler approach: Reddit’s JSON API combined with jq, my favorite command-line JSON processor.

The key insight? Appending .json to a Reddit thread URL returns the same thread as JSON. The trick works for most other Reddit pages too, such as subreddit listings.
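If you do this often, the URL rewrite is easy to script. Here’s a minimal Python sketch of it. The helper name to_json_url is my own invention, not anything Reddit provides, and it assumes the /.json form used in this post. It also keeps any query string in place, which a naive string append would get wrong.

```python
from urllib.parse import urlsplit, urlunsplit


def to_json_url(thread_url: str) -> str:
    """Return the .json variant of a Reddit URL (hypothetical helper)."""
    parts = urlsplit(thread_url)
    # Leave URLs alone if they already point at the JSON view.
    if parts.path.endswith(".json"):
        return thread_url
    # Match the form used in this post: ".../<slug>/.json".
    json_path = parts.path.rstrip("/") + "/.json"
    # Rebuild the URL so the query string stays after the new path.
    return urlunsplit(
        (parts.scheme, parts.netloc, json_path, parts.query, parts.fragment)
    )


print(to_json_url("https://www.reddit.com/r/learnprogramming/comments/1klt8li/what_are_some_apis_you_guys_find_yourself_using/"))
```

Splitting the URL first means a link copied with a query string attached still gets .json in the path rather than after the query string.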

Here’s the basic approach:

curl -A "Mozilla/5.0" https://www.reddit.com/r/learnprogramming/comments/1klt8li/what_are_some_apis_you_guys_find_yourself_using/.json | jq ".[]"

This dumps the entire thread structure, which is comprehensive but overwhelming. The real magic happens when you filter for just the comment bodies:

curl -A "Mozilla/5.0" https://www.reddit.com/r/learnprogramming/comments/1klt8li/what_are_some_apis_you_guys_find_yourself_using/.json | jq ".[].data.children[].data.body"

This extracts the text of each top-level comment (the post itself and any “more comments” placeholders come out as null), which you can then feed to your AI tool of choice for analysis and organization.
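One thing to watch: replies are nested inside each comment’s replies field, so a flat filter like the one above never reaches them. If you want the whole conversation, a recursive walk does the job. Below is a rough Python sketch, run against a tiny made-up sample shaped like Reddit’s response. The function name comment_bodies and the sample data are my own illustration, and the field names (kind, data, children, body, replies) reflect the structure as I understand it, so check them against a real response.

```python
import json


def comment_bodies(thread):
    """Yield every comment body in a parsed thread, nested replies included."""

    def walk(listing):
        # "replies" is an empty string when a comment has no replies.
        if not isinstance(listing, dict):
            return
        for child in listing.get("data", {}).get("children", []):
            # Only kind "t1" (a comment) has a body; the post ("t3")
            # and "more" placeholders are skipped.
            if not isinstance(child, dict) or child.get("kind") != "t1":
                continue
            data = child["data"]
            yield data["body"]
            yield from walk(data.get("replies"))

    # The response is a list of listings: the post first, then the comments.
    for listing in thread:
        yield from walk(listing)


# Made-up sample data, only here to show the nesting.
sample = json.loads("""
[
  {"kind": "Listing", "data": {"children": [
    {"kind": "t3", "data": {"title": "example post", "selftext": "example text"}}
  ]}},
  {"kind": "Listing", "data": {"children": [
    {"kind": "t1", "data": {"body": "top-level comment", "replies": {
      "kind": "Listing", "data": {"children": [
        {"kind": "t1", "data": {"body": "nested reply", "replies": ""}}
      ]}}}},
    {"kind": "more", "data": {"children": ["abc123"]}}
  ]}}
]
""")

for body in comment_bodies(sample):
    print(body)
```

On the sample this prints the top-level comment followed by its nested reply, which is the order you want when handing a conversation to a chatbot.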

If you’re not familiar with jq, I highly recommend checking out my earlier post on jq—it’s an incredibly powerful tool for ad hoc JSON processing.

Alternative Approaches

You could write a Python script, or create a Tampermonkey browser script for a more integrated experience. But for quick, one-off tasks, the jq approach is hard to beat for simplicity and flexibility.
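For the curious, here’s roughly what the Python route could look like using only the standard library. Treat it as a sketch under assumptions: the function names are mine, the User-Agent simply mirrors the curl -A value from earlier, and whether Reddit accepts the request depends on its rate limits and policies.

```python
import json
import sys
import urllib.request

USER_AGENT = "Mozilla/5.0"  # same value passed to curl -A earlier in the post


def build_request(json_url):
    """Build a GET request that carries the User-Agent header."""
    return urllib.request.Request(json_url, headers={"User-Agent": USER_AGENT})


def fetch_thread(json_url):
    """Download a .json thread URL and parse it."""
    with urllib.request.urlopen(build_request(json_url), timeout=30) as response:
        return json.load(response)


def top_level_bodies(thread):
    """Mirror jq '.[].data.children[].data.body', dropping the nulls."""
    bodies = []
    for listing in thread:
        for child in listing["data"]["children"]:
            body = child["data"].get("body")
            if body is not None:
                bodies.append(body)
    return bodies


# Only hits the network when a URL is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    for body in top_level_bodies(fetch_thread(sys.argv[1])):
        print(body)
        print("---")
```

Save it under any name you like and pass the .json URL as the only argument. It’s a lot more typing than the one-liner, which is why I still reach for jq first.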

A Word of Caution

Before you start scraping Reddit data, familiarize yourself with Reddit’s Data API Terms and User Agreement. Reddit has specific policies about data collection and usage. Make sure your use case complies with their terms, especially if you’re planning to do anything beyond personal research or one-off data extraction.

Wrapping Up

Sometimes the best solution isn’t the fanciest one. A simple curl command, a powerful JSON processor, and an AI assistant make a surprisingly effective combination for extracting insights from long-form web content. No complex setup, no API keys, just straightforward Unix tools doing what they do best.
