Phone: (866) 217-7478
info@c2itconsulting.net


Web Page Scraping

One of the ways we've gathered data for years is known as "screen scraping." You log on to a program, find certain text in certain portions of the page, and "scrape" it off for your own application's needs.

This has become a whole new animal with the explosion of the Internet. So much more information is available now, but it's constantly changing as well! We are able to use RSS, blogs, XML, Web Services and other types of feeds to obtain much of the information you may need on your site, but every once in a while we still have to get out there and scrape. The examples below demonstrate how this works.
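When a feed is available, reading it is usually simpler than scraping. Here is a minimal sketch in Python of pulling titles and links out of an RSS 2.0 feed. The feed content, the example.com links, and the function name read_feed_items are all made up for illustration; a real feed would be downloaded first.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed content, inlined so the sketch runs without a network call.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <item><title>First post</title><link>http://example.com/1</link></item>
    <item><title>Second post</title><link>http://example.com/2</link></item>
  </channel>
</rss>"""


def read_feed_items(rss_text):
    """Return (title, link) pairs for each <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        items.append((title, link))
    return items


feed_items = read_feed_items(SAMPLE_RSS)
```

A feed gives you named fields to ask for. A scraped page gives you only markup, which is why scraping takes more work and breaks more easily when the source page changes.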

Original Web Page

The frame below shows you a "stats" page for our website. Notice the table about halfway down that shows the percentage of hits from each country.

A bit hard to read, isn't it? Imagine all you really wanted to see was this chart. Normally you'd have to display the page in an IFRAME HTML element (like this), use a FRAMESET, or send the user off-site to view the page. Who wants that, when you could deliver THIS to them:




Num   Perc.  Country             Name
341  77.32%  United States       United States
 29   6.58%  India               India
 17   3.85%  Unknown             -
  8   1.81%  Canada              Canada
  6   1.36%  South Africa        South Africa
  6   1.36%  Ukraine             Ukraine
  4   0.91%  Australia           Australia
  3   0.68%  United Kingdom      United Kingdom
  3   0.68%  Philippines         Philippines
  3   0.68%  Netherlands         Netherlands
  3   0.68%  Denmark             Denmark
  3   0.68%  Hong Kong           Hong Kong
  2   0.45%  Singapore           Singapore
  2   0.45%  Germany             Germany
  2   0.45%  Israel              Israel
  1   0.23%  Lithuania           Lithuania
  1   0.23%  Italy               Italy
  1   0.23%  Jamaica             Jamaica
  1   0.23%  Malta               Malta
  1   0.23%  Switzerland         Switzerland
  1   0.23%  Malaysia            Malaysia
  1   0.23%  Senegal             Senegal
  1   0.23%  Pakistan            Pakistan
  1   0.23%  Russian Federation  Russian Federation
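To give a feel for how a table like this can be lifted out of a page, here is a minimal sketch using Python's standard html.parser module. It is not our scraper's actual code. The HTML fragment, the image file names, and the class name TableScraper are assumptions made for illustration, and real stats pages will have messier markup.

```python
from html.parser import HTMLParser

# Hypothetical fragment shaped like the stats table; real markup will differ.
SAMPLE_HTML = """
<table>
  <tr><th>Num</th><th>Perc.</th><th>Country</th><th>Name</th></tr>
  <tr><td>341</td><td>77.32%</td><td><img src="us.png" alt="United States"></td><td>United States</td></tr>
  <tr><td>29</td><td>6.58%</td><td><img src="in.png" alt="India"></td><td>India</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows hold only <th> cells, so they stay empty
                self.rows.append(self._row)
            self._row = None


scraper = TableScraper()
scraper.feed(SAMPLE_HTML)
rows = scraper.rows
```

In this sketch the flag cell comes back as an empty string because an image carries no text of its own. The header row is skipped because only td cells are collected.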

Conclusion

So what would you rather deliver on your site? A link or an IFRAME displaying someone else's site, or would you rather scrape the information you want from their site and make it your own? We aren't advocating plagiarism or stealing copyrighted information, and we encourage you to give credit where credit is due, but isn't this the way to go?

We can build a site for you that contains pages with this type of functionality, or we can provide you with development tools and controls to do it on your own!

Our scraper offers the following features

Here is another example of scraping functionality, this time with the images completely removed from the scraped page.



Num   Perc.  Country  Name
341  77.32%           United States
 29   6.58%           India
 17   3.85%  Unknown  -
  8   1.81%           Canada
  6   1.36%           South Africa
  6   1.36%           Ukraine
  4   0.91%           Australia
  3   0.68%           United Kingdom
  3   0.68%           Philippines
  3   0.68%           Netherlands
  3   0.68%           Denmark
  3   0.68%           Hong Kong
  2   0.45%           Singapore
  2   0.45%           Germany
  2   0.45%           Israel
  1   0.23%           Lithuania
  1   0.23%           Italy
  1   0.23%           Jamaica
  1   0.23%           Malta
  1   0.23%           Switzerland
  1   0.23%           Malaysia
  1   0.23%           Senegal
  1   0.23%           Pakistan
  1   0.23%           Russian Federation
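One way to drop images from scraped markup is to re-emit the HTML while skipping every img tag. The sketch below uses Python's standard html.parser module and is only an illustration, not a description of how our scraper does it. The names ImageStripper and strip_images and the sample fragment are hypothetical.

```python
from html.parser import HTMLParser


class ImageStripper(HTMLParser):
    """Re-emit tags and text as parsed, leaving out <img> tags.

    Comments and doctype declarations are not copied in this sketch.
    """

    def __init__(self):
        # Keep entities such as &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing form, for example <br/> or <img ... />.
        if tag != "img":
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag != "img":
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def strip_images(html):
    """Return the given HTML fragment with all <img> tags removed."""
    parser = ImageStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


cleaned = strip_images(
    '<td><img src="us.png" alt="United States"></td><td>United States</td>'
)
```

Walking the markup with a parser tends to hold up better than a regular expression, since attribute order and quoting vary from page to page.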

Bible Reading Scraper

One of our most recent projects involved this scraping technology. We used a database of Bible passages, along with BibleGateway.com's search engine, to create a mashup page of daily readings that takes you through the entire Bible in a year. You can check it out here.
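The general idea of pairing a passage database with a search engine can be sketched as follows. Everything specific here is an assumption for illustration: the two-entry READING_PLAN stands in for the real database, the passage assignments are made up, and the search URL format should be checked against the live site before relying on it.

```python
from datetime import date
from urllib.parse import urlencode

# Hypothetical stand-in for the passage database: day of year -> passage.
READING_PLAN = {
    1: "Genesis 1-3",
    2: "Genesis 4-7",
}

# Assumed search endpoint; confirm against the live site before use.
SEARCH_URL = "https://www.biblegateway.com/passage/"


def reading_url(day, plan=READING_PLAN):
    """Build a search URL for the passage assigned to the given date.

    Returns None when the plan has no entry for that day of the year.
    """
    day_of_year = day.timetuple().tm_yday
    passage = plan.get(day_of_year)
    if passage is None:
        return None
    return SEARCH_URL + "?" + urlencode({"search": passage})


url = reading_url(date(2024, 1, 1))
```

The page returned by that URL would then be scraped for the passage text, in the same way as the stats table above.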