PHP Coding
 
January 8th, 2013, 04:46 PM
humeroushubris
noob - PHP web crawler

I have a basic PHP web crawler script and I need to expand its functionality. The problem is that I'm a total noob at PHP and my knowledge is very basic, so I'm coming here for some help.

My goal is to have a basic user input (a text box). When the user types in a phrase, say "Red Apples", and hits the Enter button, the script should start crawling the web for the phrase "Red Apples" and store the plain-text results, along with the URL they originated from, in a database.

Here is what I've got so far:

Code:
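error_reporting( E_ERROR );

// Maximum number of pages to crawl on any single domain.
define( "CRAWL_LIMIT_PER_DOMAIN", 50 );

// Pages crawled so far, keyed by host name.
$domains = array();

// Every URL already visited, so no page is crawled twice.
$urls = array();

function crawl( $url )
{
  global $domains, $urls;

  echo "Crawling $url... ";

  $parse = parse_url( $url );
  if ( $parse === false || !isset( $parse['host'] ) )
  {
    echo "Invalid URL.\n";
    return;
  }
  $host = $parse['host'];

  // Initialise the counter the first time a host is seen, then count this page.
  if ( !isset( $domains[ $host ] ) )
  {
    $domains[ $host ] = 0;
  }
  $domains[ $host ]++;
  $urls[] = $url;

  $content = file_get_contents( $url );
  if ( $content === FALSE )
  {
    echo "Error.\n";
    return;
  }

  // Skip everything before the first "body"; keep the whole page if it is absent.
  $body = stristr( $content, "body" );
  if ( $body !== false )
  {
    $content = $body;
  }
  // Stop a URL at whitespace, quotes, or angle brackets.
  preg_match_all( '/http:\/\/[^\s"\'<>]+/', $content, $matches );

  echo 'Found ' . count( $matches[0] ) . " urls.\n";

  foreach( $matches[0] as $crawled_url )
  {
    $parse = parse_url( $crawled_url );
    if ( $parse === false || !isset( $parse['host'] ) )
    {
      continue;
    }
    $host    = $parse['host'];
    $crawled = isset( $domains[ $host ] ) ? $domains[ $host ] : 0;

    // Compare the counter itself: count() on an integer is not the page count.
    if ( $crawled < CRAWL_LIMIT_PER_DOMAIN
        && !in_array( $crawled_url, $urls ) )
    {
      sleep( 1 );
      crawl( $crawled_url );
    }
  }
}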


If anybody could point me in the right direction that would be awesome.
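
For the matching and storage step described above, here is a minimal sketch, assuming PDO with the SQLite driver is available. The helper names `html_to_text` and `store_if_match`, the `matches` table, and the in-memory database are all hypothetical choices for illustration; none of them come from the original script.

```php
<?php
// Hypothetical helper: reduce an HTML document to plain text.
function html_to_text( $html )
{
  // Drop script and style blocks first; strip_tags() would keep their contents.
  $html = preg_replace( '#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html );
  // Put a space before every tag so adjacent elements do not run together.
  $text = strip_tags( str_replace( '<', ' <', $html ) );
  $text = html_entity_decode( $text, ENT_QUOTES, 'UTF-8' );
  // Collapse runs of whitespace into single spaces.
  return trim( preg_replace( '/\s+/', ' ', $text ) );
}

// Hypothetical helper: store the page text and its URL if the phrase occurs.
// Returns true when a row was written, false when the phrase was not found.
function store_if_match( PDO $db, $url, $html, $phrase )
{
  $text = html_to_text( $html );
  if ( stripos( $text, $phrase ) === false )
  {
    return false;
  }
  // A prepared statement keeps page content from being read as SQL.
  $stmt = $db->prepare( 'INSERT INTO matches (url, phrase, content) VALUES (?, ?, ?)' );
  $stmt->execute( array( $url, $phrase, $text ) );
  return true;
}

// Assumption: an in-memory SQLite database, so the sketch runs without setup.
$db = new PDO( 'sqlite::memory:' );
$db->setAttribute( PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION );
$db->exec( 'CREATE TABLE matches (url TEXT, phrase TEXT, content TEXT)' );

$html   = '<html><body><h1>Fruit</h1><p>We sell red   apples.</p></body></html>';
$stored = store_if_match( $db, 'http://example.com/fruit', $html, 'Red Apples' );
```

In the crawler, a call like this would probably belong right after `file_get_contents()` succeeds, while `$content` still holds the full page. The phrase itself could come from the submitted form, for instance a `$_POST` field whose name is up to you.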

January 9th, 2013, 06:33 AM
DavidMR
With that code sample I don't see how you can achieve a search facility. How about parsing a Google search instead?

January 9th, 2013, 01:07 PM
IAmALlama
Do you have a specific list of pages you want to crawl, or is the objective to crawl the entire web? Sounds like what you are creating is essentially Google.
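
If the answer is a fixed list of pages, the crawl could be driven by a queue seeded from that list instead of open-ended recursion. The following is only a sketch under that assumption: `crawl_seeds`, its `$fetch` parameter, and the `$max_pages` cap are hypothetical names, and the fake two-page site stands in for real requests.

```php
<?php
// Hypothetical helper: crawl breadth-first from a fixed list of seed URLs.
// $fetch is any callable returning the page body or false; passing it in
// keeps the sketch runnable without network access.
function crawl_seeds( array $seeds, $fetch, $max_pages = 50 )
{
  $queue   = $seeds;
  $visited = array();

  while ( count( $queue ) > 0 && count( $visited ) < $max_pages )
  {
    $url = array_shift( $queue );
    if ( isset( $visited[ $url ] ) )
    {
      continue;
    }
    $visited[ $url ] = true;

    $content = call_user_func( $fetch, $url );
    if ( $content === false )
    {
      continue;
    }

    // Link pattern modeled on the original script: absolute http:// URLs only.
    preg_match_all( '/http:\/\/[^\s"\'<>]+/', $content, $matches );
    foreach ( $matches[0] as $link )
    {
      if ( !isset( $visited[ $link ] ) )
      {
        $queue[] = $link;
      }
    }
  }

  // URLs in the order they were visited.
  return array_keys( $visited );
}

// Stand-in for file_get_contents(): a tiny fake site held in an array.
$pages = array(
  'http://a.example/'    => '<a href="http://a.example/two">two</a>',
  'http://a.example/two' => '<a href="http://a.example/">home</a>',
);
$fetch = function ( $url ) use ( $pages ) {
  return isset( $pages[ $url ] ) ? $pages[ $url ] : false;
};
$order = crawl_seeds( array( 'http://a.example/' ), $fetch );
```

A queue with a hard page cap keeps the crawl bounded even when pages link back to each other, which deep recursion does not guarantee. For real pages, `'file_get_contents'` could be passed as `$fetch`.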

January 10th, 2013, 01:38 AM
Trenton9claude
I have a basic PHP web crawler script and I need to expand its functionality





January 10th, 2013, 03:55 AM
DavidMR
Unless you show us what you currently have, we can't help...
