(317) 456-C2IT (2248)
Toll Free: (866) 217-7478
info@c2itconsulting.net


Web Page Scraping

For years, one of the ways we have gathered data has been "screen scraping": you log in to a program, find certain text in certain portions of the screen, and "scrape" it off for your own application's needs.
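To make the idea concrete, here is a minimal sketch in Python using only the standard library. It is an illustration, not the scraper described on this page. The sample page, the element id "headline", and the names ElementTextScraper and scrape_text are all made up for the example.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ElementTextScraper(HTMLParser):
    """Collects the text inside the first element that has a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # greater than 0 while inside the target element
        self.done = False   # set once the target element has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag not in VOID_TAGS:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


# Hypothetical page: only the "headline" element is of interest.
SAMPLE_PAGE = """<html><body>
<div id="header">Site navigation</div>
<div id="headline">Visitors this month: <b>500</b></div>
<div id="footer">Contact us</div>
</body></html>"""


def scrape_text(html, target_id):
    """Return the whitespace-normalized text of the element with target_id."""
    scraper = ElementTextScraper(target_id)
    scraper.feed(html)
    scraper.close()
    return " ".join("".join(scraper.chunks).split())


print(scrape_text(SAMPLE_PAGE, "headline"))  # Visitors this month: 500
```

A real page would be downloaded first and would rarely be this tidy, so a production scraper needs more defensive handling than this sketch shows.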

This has become a whole new animal with the explosion of the Internet. So much more information is available now, but it's constantly changing as well! We are able to use RSS, blogs, XML, Web Services and other types of feeds to obtain much of the information you may need on your site, but every once in a while we still have to get out there and scrape. The examples below demonstrate how this works.
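When a site does publish a feed, reading it is usually simpler than scraping. The sketch below parses a small RSS 2.0 document with Python's standard library. The feed content and the function name read_feed_items are invented for illustration; a real feed would be fetched over HTTP first.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed used only for this example.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel>
<title>Example feed</title>
<item><title>First post</title><link>https://example.com/1</link></item>
<item><title>Second post</title><link>https://example.com/2</link></item>
</channel></rss>"""


def read_feed_items(xml_text):
    """Return a list of (title, link) pairs, one per <item> in the feed."""
    root = ET.fromstring(xml_text)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


for title, link in read_feed_items(SAMPLE_FEED):
    print(title, link)
```

Because the feed is structured data, there is no need to hunt through page layout for the right text, which is why feeds are the first choice when they exist.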

Original Web Page

The frame below shows you a "stats" page for our website. Notice the table about halfway down that shows the percentage of hits from each country.

A bit hard to read, isn't it? Imagine all you really wanted to see was this chart. Normally you'd have to display the page in an IFRAME HTML element (like this), use a FRAMESET, or send the user off-site to view the page. Who wants that, when you could deliver THIS to them:




  Num   Perc.    Country Name
  438   87.60%   [image] United States
   20    4.00%   [image] India
   18    3.60%   [image] Canada
    5    1.00%   [image] South Africa
    4    0.80%   [image] Australia
    3    0.60%   [image] Bosnia And Herzegovina
    3    0.60%   [image] Russian Federation
    2    0.40%   [image] Denmark
    2    0.40%   [image] Ireland
    1    0.20%   [image] Macedonia
    1    0.20%   [image] Netherlands
    1    0.20%   [image] Italy
    1    0.20%   [image] Saudi Arabia
    1    0.20%   [image] United Arab Emirates
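Pulling a single table out of a larger page can be sketched as follows. This is a simplified illustration in Python, not the tool behind this page. The markup in SAMPLE_STATS_PAGE is an assumption about how such a stats page might be written; the three data rows are taken from the table above, and the names TableRowScraper and scrape_country_stats are hypothetical.

```python
from html.parser import HTMLParser


class TableRowScraper(HTMLParser):
    """Collects the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Assumed markup for a stats page; only the table matters to the visitor.
SAMPLE_STATS_PAGE = """<html><body>
<h1>Usage statistics</h1>
<p>Lots of other report content the visitor does not need.</p>
<table>
<tr><th>Num</th><th>Perc.</th><th>Country Name</th></tr>
<tr><td>438</td><td>87.60%</td><td><img src="us.png" alt="United States">United States</td></tr>
<tr><td>20</td><td>4.00%</td><td><img src="in.png" alt="India">India</td></tr>
<tr><td>18</td><td>3.60%</td><td><img src="ca.png" alt="Canada">Canada</td></tr>
</table></body></html>"""


def scrape_country_stats(html):
    """Return (hits, percent, country) tuples, skipping the header row."""
    scraper = TableRowScraper()
    scraper.feed(html)
    scraper.close()
    stats = []
    for num, perc, name in scraper.rows[1:]:
        stats.append((int(num), float(perc.rstrip("%")), name))
    return stats


for hits, percent, country in scrape_country_stats(SAMPLE_STATS_PAGE):
    print(f"{hits:>5} {percent:6.2f}% {country}")
```

Once the rows are plain tuples, they can be rendered in whatever layout suits your own site. The sketch assumes the page contains exactly one table with three columns.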

Conclusion

So what would you rather deliver on your site: a link or an IFRAME displaying someone else's site, or just the information you want, scraped from their site and presented as part of your own? We aren't advocating plagiarism or stealing copyrighted information, and we encourage you to give credit where credit is due, but isn't this the way to go?

We can build a site for you that contains pages with this type of functionality, or we can provide you with development tools and controls to do it on your own!

Our scraper offers the following features

Here is another example of scraping functionality, this time with the images completely removed from the scraped content.



  Num   Perc.    Country Name
  438   87.60%   United States
   20    4.00%   India
   18    3.60%   Canada
    5    1.00%   South Africa
    4    0.80%   Australia
    3    0.60%   Bosnia And Herzegovina
    3    0.60%   Russian Federation
    2    0.40%   Denmark
    2    0.40%   Ireland
    1    0.20%   United Arab Emirates
    1    0.20%   Macedonia
    1    0.20%   Netherlands
    1    0.20%   Italy
    1    0.20%   Saudi Arabia
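One simple way to drop images from scraped markup is a regular-expression substitution, sketched below in Python. This is an assumption about one possible approach, not a description of how our scraper does it. The function name strip_images and the sample fragment are hypothetical, and the pattern would misfire on an attribute value that contains a ">" character.

```python
import re

# Matches a whole <img ...> tag, including the self-closing <img ... /> form.
IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)


def strip_images(html_fragment):
    """Return the fragment with every <img> tag removed."""
    return IMG_TAG.sub("", html_fragment)


# Hypothetical scraped cell containing an image next to the country name.
cell = '<td><img src="us.png" alt="United States">United States</td>'
print(strip_images(cell))  # <td>United States</td>
```

For messy real-world markup, a proper HTML parser is a safer choice than a regular expression.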

Bible Reading Scraper

One of our most recent projects involved this scraping technology. We used a database of Bible passages, along with BibleGateway.com's search engine, to create a mashup page of daily readings for reading through the entire Bible in a year. You can check it out here.
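The general shape of such a page, choosing a passage for the day and building a search link for it, might look like the sketch below. Everything in it is assumed for illustration: the three-entry READING_PLAN stands in for a real passage database, SEARCH_URL is a placeholder rather than BibleGateway's actual address, and the "search" parameter name is a guess.

```python
from datetime import date
from urllib.parse import urlencode

# Hypothetical stand-in for a database of daily passages.
READING_PLAN = ["Genesis 1-3", "Genesis 4-7", "Genesis 8-11"]

# Placeholder address, not a real search engine endpoint.
SEARCH_URL = "https://example.com/passage/"


def reading_for(day, plan=READING_PLAN):
    """Pick the passage for a date by day of year, wrapping around the plan."""
    index = (day.timetuple().tm_yday - 1) % len(plan)
    return plan[index]


def search_url_for(passage):
    """Build a search link for the passage (parameter name is assumed)."""
    return SEARCH_URL + "?" + urlencode({"search": passage})


passage = reading_for(date(2024, 1, 2))
print(passage, search_url_for(passage))
```

The page returned by the search link would then be scraped for the passage text, in the same way as the table example earlier on this page.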