Phone: (866) 217-7478
info@c2itconsulting.net


Web Page Scraping

For years, one of the ways we have gathered data has been a technique called "screen scraping": you log on to a program, locate specific text in known portions of the page, and "scrape" it off for use in your own application.
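In its classic form, screen scraping depends on each piece of data always appearing at the same place on the screen. The minimal Python sketch below illustrates that idea under exactly that assumption. The screen text and the FIELDS map of row and column positions are hypothetical, not taken from any particular program.

```python
# Minimal sketch of classic screen scraping: the captured screen is plain text,
# and each field is assumed to sit at a fixed row and column range.
# The layout and positions below are hypothetical.

# field name -> (row, start_col, end_col); zero-based, end column exclusive
FIELDS = {
    "account": (0, 10, 20),
    "balance": (1, 10, 20),
}


def scrape_screen(screen: str, fields: dict) -> dict:
    """Return the trimmed text found at each field's fixed screen position."""
    rows = screen.splitlines()
    result = {}
    for name, (row, start, end) in fields.items():
        # A missing row yields an empty field instead of an IndexError.
        line = rows[row] if row < len(rows) else ""
        result[name] = line[start:end].strip()
    return result


screen = (
    "ACCOUNT:  00012345  \n"
    "BALANCE:  1,024.50  \n"
)
print(scrape_screen(screen, FIELDS))  # {'account': '00012345', 'balance': '1,024.50'}
```

The weakness is obvious from the code: if the program moves a field by one column, the scraper silently returns the wrong text, which is why feeds are preferred whenever they exist.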

With the explosion of the Internet, scraping has become a whole new animal. Far more information is available now, and it is constantly changing as well! We can use RSS, blogs, XML, Web Services, and other types of feeds to obtain much of the information you may need on your site, but every once in a while we still have to get out there and scrape. The examples below demonstrate how this works.
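When a site does publish a feed, reading it is usually simpler and more robust than scraping. As a rough sketch, Python's standard xml.etree.ElementTree module can pull the title and link out of each item in an RSS 2.0 feed. The feed content here is a hypothetical inline sample; a real feed would first be downloaded from its URL (for example with urllib.request.urlopen).

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 sample standing in for a downloaded feed.
RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <item><title>First post</title><link>https://example.com/1</link></item>
    <item><title>Second post</title><link>https://example.com/2</link></item>
  </channel>
</rss>"""


def feed_items(xml_text: str) -> list:
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


for title, link in feed_items(RSS):
    print(title, link)
```

Because the feed is structured XML, nothing here depends on where the data happens to appear on a rendered page.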

Original Web Page

The frame below shows you a "stats" page for our website. Notice the table about halfway down that shows the percentage of hits from each country.

A bit hard to read, isn't it? Imagine that all you really wanted to see was that country table. Normally you'd have to display the whole page in an IFRAME HTML element (like this), use a FRAMESET, or send the user off-site to view it. Who wants that, when you could deliver THIS to them:




Num  Perc.   Country             Name
336  75.68%  United States       United States
27   6.08%   India               India
18   4.05%   Unknown             -
13   2.93%   Canada              Canada
7    1.58%   Denmark             Denmark
6    1.35%   Ukraine             Ukraine
5    1.13%   South Africa        South Africa
4    0.90%   Australia           Australia
3    0.68%   Germany             Germany
3    0.68%   Netherlands         Netherlands
3    0.68%   Hong Kong           Hong Kong
2    0.45%   Italy               Italy
2    0.45%   Singapore           Singapore
2    0.45%   Israel              Israel
2    0.45%   United Kingdom      United Kingdom
2    0.45%   Philippines         Philippines
1    0.23%   Argentina           Argentina
1    0.23%   Lithuania           Lithuania
1    0.23%   Jamaica             Jamaica
1    0.23%   Malta               Malta
1    0.23%   Switzerland         Switzerland
1    0.23%   Malaysia            Malaysia
1    0.23%   Senegal             Senegal
1    0.23%   Pakistan            Pakistan
1    0.23%   Russian Federation  Russian Federation
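This page does not show how the scraper itself is implemented. As one possible approach, assuming the stats page marks up the country list as an ordinary HTML table, Python's standard html.parser module can collect every table and keep only the one whose header row contains a "Country" cell. The markup in PAGE is hypothetical (two rows copied from the table above), and the TableScraper and scrape_country_table names are illustrative only. Image alt text is collected as cell text, which is one way a flag image next to the country name can produce the name twice. A live page would be fetched first, e.g. with urllib.request.urlopen.

```python
from html.parser import HTMLParser
from typing import Optional


class TableScraper(HTMLParser):
    """Collect every HTML table as a list of rows, each row a list of cell texts.

    An <img> inside a cell contributes its alt text to that cell.
    Nested tables are not handled.
    """

    def __init__(self) -> None:
        super().__init__()
        self.tables: list = []
        self._row: Optional[list] = None
        self._cell: Optional[list] = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._cell = []
        elif tag == "img" and self._cell is not None:
            alt = dict(attrs).get("alt")
            if alt:
                self._cell.append(alt)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            if self._row is not None:
                # Collapse runs of whitespace inside the cell.
                self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self.tables:
                self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def scrape_country_table(page: str) -> list:
    """Return the first table whose header row has a "Country" cell, else []."""
    scraper = TableScraper()
    scraper.feed(page)
    scraper.close()
    for table in scraper.tables:
        if table and "Country" in table[0]:
            return table
    return []


# Hypothetical markup standing in for the stats page.
PAGE = """
<h2>Countries</h2>
<table>
  <tr><th>Num</th><th>Perc.</th><th>Country</th><th>Name</th></tr>
  <tr><td>336</td><td>75.68%</td><td><img src="us.png" alt="United States"></td><td>United States</td></tr>
  <tr><td>27</td><td>6.08%</td><td><img src="in.png" alt="India"></td><td>India</td></tr>
</table>
"""

for row in scrape_country_table(PAGE):
    print(row)
```

Selecting the table by its header text, rather than by position on the page, keeps the sketch working if the site adds or removes other tables around it.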

Conclusion

So what would you rather deliver on your site: a link or an IFRAME displaying someone else's site, or just the information you want, scraped from their site and presented as part of your own? We aren't advocating plagiarism or the theft of copyrighted information, and we encourage you to give credit where credit is due, but isn't this the way to go?

We can build a site for you that contains pages with this type of functionality, or we can provide you with development tools and controls to do it on your own!

Our scraper offers the following features

Here is another example of our scraping functionality, this time with all images removed from the scraped page.



Num  Perc.   Country  Name
336  75.68%           United States
27   6.08%            India
18   4.05%   Unknown  -
13   2.93%            Canada
7    1.58%            Denmark
6    1.35%            Ukraine
5    1.13%            South Africa
4    0.90%            Australia
3    0.68%            Germany
3    0.68%            Netherlands
3    0.68%            Hong Kong
2    0.45%            Italy
2    0.45%            Singapore
2    0.45%            Israel
2    0.45%            United Kingdom
2    0.45%            Philippines
1    0.23%            Argentina
1    0.23%            Lithuania
1    0.23%            Jamaica
1    0.23%            Malta
1    0.23%            Switzerland
1    0.23%            Malaysia
1    0.23%            Senegal
1    0.23%            Pakistan
1    0.23%            Russian Federation
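Removing images can be approached the same way: parse the scraped HTML and write it back out while skipping every img tag. The sketch below uses a hypothetical ImageStripper class built on Python's html.parser; it is an illustration of the technique, not the scraper's actual implementation. It also drops HTML comments and doctype declarations, which is usually acceptable for a fragment that will be embedded in another page.

```python
from html import escape
from html.parser import HTMLParser


class ImageStripper(HTMLParser):
    """Rebuild an HTML fragment, leaving out every <img> tag."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list = []

    @staticmethod
    def _render(tag, attrs, self_closing=False):
        # Re-escape attribute values, since the parser has already decoded them.
        rendered = "".join(
            f" {name}" if value is None else f' {name}="{escape(value)}"'
            for name, value in attrs
        )
        return f"<{tag}{rendered}{'/' if self_closing else ''}>"

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            self.parts.append(self._render(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> or <img ... />.
        if tag != "img":
            self.parts.append(self._render(tag, attrs, self_closing=True))

    def handle_endtag(self, tag):
        if tag != "img":
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Character references were decoded by the parser, so re-escape text.
        self.parts.append(escape(data, quote=False))


def strip_images(fragment: str) -> str:
    """Return the fragment with all <img> tags removed."""
    stripper = ImageStripper()
    stripper.feed(fragment)
    stripper.close()
    return "".join(stripper.parts)


row = '<tr><td><img src="us.png" alt="United States"></td><td>United States</td></tr>'
print(strip_images(row))  # <tr><td></td><td>United States</td></tr>
```

The flag cell survives as an empty cell, which mirrors the blank Country column in the table above.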

Bible Reading Scraper

One of our most recent projects involved this scraping technology. We used a database of Bible passages, along with BibleGateway.com's search engine, to create a mashup page of daily readings for reading through the entire Bible in a year. You can check it out here.
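The following is only a rough sketch of how such a page might be assembled, not the code behind the project. A hypothetical list of passages stands in for the database, reading_plan and passage_url are illustrative helper names, and the Bible Gateway URL pattern (a passage page taking search and version query parameters) is an assumption to verify against the live site.

```python
from urllib.parse import urlencode

# Assumed Bible Gateway passage-lookup URL; verify against the live site.
BASE_URL = "https://www.biblegateway.com/passage/?"


def reading_plan(passages: list, days: int = 365) -> list:
    """Spread passages, in order, as evenly as possible across `days` days."""
    plan = [[] for _ in range(days)]
    for i, passage in enumerate(passages):
        # Passage i lands on day floor(i * days / n), so day sizes differ by at most one.
        plan[i * days // len(passages)].append(passage)
    return plan


def passage_url(passage: str, version: str = "NIV") -> str:
    """Build a lookup URL for one passage (hypothetical helper)."""
    return BASE_URL + urlencode({"search": passage, "version": version})


# A hypothetical four-passage "database" spread over two days.
plan = reading_plan(["Genesis 1-3", "Genesis 4-7", "Genesis 8-11", "Genesis 12-15"], days=2)
for day, readings in enumerate(plan, start=1):
    print(day, [passage_url(p) for p in readings])
```

Each day's URLs could then be fetched and scraped with the same table or text extraction shown earlier on this page.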