Ever wanted to extract images from web pages? Now you can with one simple API call.

Repustate’s clean-html API call has been one of our most popular since Day 1. Because it performed well from the get-go, we had barely touched it, but that changed recently: it can now extract the main image from a web page as well as the text.

A customer asked us to add the ability to extract the main image from a web page, similar to how Instapaper or Mobile Safari’s “Reader” feature works.

Now, when you call clean-html, the response includes by default an image attribute containing the URL of the article’s main image, if one exists.

Let’s take a look at an example. You’ll need a Repustate API key to try this on your own, but getting one is free and easy. Let’s take this URL:

http://www.thestar.com/news/insight/2013/02/15/challenging_the_vatican_progressive_catholics_say_reform_must_begin_with_church_governance.html

and pass it to our API call.

curl -d "url=http://www.thestar.com/news/insight/2013/02/15/challenging_the_vatican_progressive_catholics_say_reform_must_begin_with_church_governance.html" http://api.repustate.com/v2/YOUR_API_KEY/clean-html.json
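If you would rather make the call from a script, here is a rough Python equivalent of the curl command, using only the standard library. The endpoint layout is copied from the example above; the function names, the timeout value, and the choice to percent-encode the page URL are our own assumptions for this sketch, not part of the API.

```python
import json
import urllib.parse
import urllib.request

# Base URL taken from the curl example above.
API_BASE = "http://api.repustate.com/v2"


def build_clean_html_request(api_key, page_url):
    """Build a form-encoded POST like the one the curl command sends."""
    endpoint = f"{API_BASE}/{api_key}/clean-html.json"
    # urlencode percent-escapes the page URL so that any '&' or '=' inside it
    # cannot be mistaken for a second form field.
    body = urllib.parse.urlencode({"url": page_url}).encode("ascii")
    return urllib.request.Request(endpoint, data=body, method="POST")


def clean_html(api_key, page_url, timeout=10):
    """Send the request and decode the JSON response (makes a network call)."""
    request = build_clean_html_request(api_key, page_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling clean_html("YOUR_API_KEY", article_url) should return the parsed response as a dictionary. Building the request in a separate function makes it easy to inspect what will be sent before anything goes over the network.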

And here’s the response:

{
  "status": "OK",
  "text": "To progressive Canadian Catholic ... (shortened for this example)",
  "image": "http://www.thestar.com/content/dam/thestar/news/insight/2013/02/15/challenging_the_vatican_progressive_catholics_say_reform_must_begin_with_church_governance/vatican_lightning.jpg.size.xxlarge.promo.jpg",
  "url": "http://www.thestar.com/news/insight/2013/02/15/challenging_the_vatican_progressive_catholics_say_reform_must_begin_with_church_governance.html"
}

As you can see, the JSON response contains an ‘image’ key whose value is the URL of that article’s main image.
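Pulling that value out in code takes only a few lines. The sketch below assumes that a successful response carries "status": "OK", as in the example, and that the image key is absent or empty when an article has no main image; both are guesses based on the single response shown here, and main_image_url is a hypothetical helper name.

```python
import json


def main_image_url(response_body):
    """Return the main image URL from a clean-html JSON response, or None."""
    payload = json.loads(response_body)
    # Assumption: anything other than "OK" means the call did not succeed.
    if payload.get("status") != "OK":
        return None
    # .get() covers a missing key; `or None` also maps an empty value to None.
    return payload.get("image") or None
```

Returning None for both a failed call and a missing image keeps the caller simple; split the two cases if you need to tell them apart.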

With this API call, you can create your own version of Instapaper or Readability for your own purposes.
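As a starting point, here is a minimal sketch of a reader view built from the parsed response. It assumes the text field separates paragraphs with blank lines, which the shortened example above does not confirm, and render_reader_view is a hypothetical name of our own.

```python
import html


def render_reader_view(payload):
    """Turn a parsed clean-html response (a dict) into a bare-bones HTML article."""
    parts = ["<article>"]
    image = payload.get("image")
    if image:
        # Escape the URL so it is safe inside the attribute value.
        parts.append(f'<img src="{html.escape(image, quote=True)}" alt="">')
    # Assumption: blank lines in the text mark paragraph breaks.
    for paragraph in payload.get("text", "").split("\n\n"):
        if paragraph.strip():
            parts.append(f"<p>{html.escape(paragraph.strip())}</p>")
    parts.append("</article>")
    return "\n".join(parts)
```

The output is only the article markup; wrap it in your own page template and styles.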