Quote Originally Posted by litewarez View Post
:/ Makes me wonder how fast the Googlebot system is...

Great tut, Hyperz. Any chance of a C# version for a speed comparison?
A C# version would have at least 10 times more code to do the same thing.

Quote Originally Posted by jayfella View Post
Lool! We were talking on TeamSpeak about this. Personally, and I don't care what you say Hyp, my method pwns urs! Let the battle begin! I'll post some speed tests using my little engine.

EDIT: it chomps your CPU a little - why so? What part needs that much calculation?
This doesn't show my method. But sure, bring it on. It's time to kick ass and chew bubblegum!

It has a queue of 500,000 URLs. Each time it crawls a page it extracts about 50 URLs on average, and each of those 50 URLs has to be checked against the entire queue to filter out dupes. At peak this happens more than 5 times per second.
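To make those numbers concrete, here is a minimal sketch of that kind of dupe filter, assuming a plain list-based queue checked with a linear scan. This is not taken from the tutorial or from Hyperz's crawler; the names `filter_dupes`, `queue` and `extracted_urls` are made up for illustration.

```python
def filter_dupes(queue, extracted_urls):
    """Return only the extracted URLs that are not already in the queue.

    With a plain list, every candidate URL gets compared against the
    whole queue, so the cost grows with len(queue) * len(extracted_urls).
    """
    unique = []
    for url in extracted_urls:
        if url not in queue:  # linear scan: up to len(queue) string matches
            unique.append(url)
    return unique


# Example with the figures from the post: a queue of 500,000 URLs and
# ~50 freshly extracted links per crawled page (half of them dupes here).
queue = [f"http://example.com/page/{i}" for i in range(500_000)]
extracted_urls = [f"http://example.com/page/{i}" for i in range(499_975, 500_025)]

# Takes a noticeable moment in pure Python - roughly 25 million string
# comparisons for a single page - which is exactly the point being made.
print(len(filter_dupes(queue, extracted_urls)))  # 25 new URLs survive
```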

500,000 * 50 * 5 = 125,000,000 string matches per second. Yes, that will indeed keep a quad core busy; I'm surprised it only eats 50%. Just goes to show how well it scales. Keep in mind that filtering dupes is not the only thing happening a few times a second here.
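For reference, the same back-of-the-envelope calculation as a tiny snippet, using only the figures quoted above (the variable names are just for illustration):

```python
queue_size = 500_000      # URLs waiting in the queue
urls_per_page = 50        # average URLs extracted per crawled page
pages_per_second = 5      # pages processed per second at peak

# Worst-case string comparisons per second with a linear scan of the queue.
comparisons = queue_size * urls_per_page * pages_per_second
print(f"{comparisons:,} comparisons per second")  # 125,000,000
```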