-
5th Jul 2010, 04:06 PM #21 Respected Developer
No, using regex is overkill. I suggest you do some research into how regular expression engines work internally. As I said before, you can use regex only in very (very, very) simple situations. As soon as you try to use it to extract dynamic data from multiple locations in a document, you're screwed with regex. It ain't rocket science.
-
5th Jul 2010, 05:24 PM #22 Respected Developer
Website's: PlatinumW.org NexusDDL.com HD-United.org CheckLinks.org FLVD.org
I know how regex works. The way a DOM parser works is simple, but it uses a lot of memory and is slower. I find it pointless to waste resources on a simple task such as his. He does not need to extract dynamic data. Using a DOM parser is overkill and a waste of resources in this case. It ain't rocket science either.
Current projects:
Megaupload Premium Multifetch Script | FF Plugin: Tinypic and Imagevenue Image Remoter
Projects in hiatus:
IPB Linkchecker Bot | VB Linkchecker Bot
-
5th Jul 2010, 06:09 PM #23 Respected Developer
I don't think you do. If you did, you'd agree with me and everyone else who calls himself a coder.
Have fun: http://swtch.com/~rsc/regexp/regexp2.html
Sigh. To parse XML/HTML one only needs a simple state machine. Because of that, it can be parsed with relatively little code and much more efficiently than with the complex finite state machines that make up modern regular expression engines. It requires fewer CPU cycles than running it through one or more regular expressions, and because of the simplicity of the state machine it uses less memory. Once a DOM is parsed it'll use roughly the same amount of memory as the original string that holds the markup (+ a few KB here and there for the objects).
Every developer who needs to work with HTML or XML will parse it with a state machine and not with regular expressions, for said reasons.
I'll give you another article: http://www.codinghorror.com/blog/200...hulhu-way.html
Erm, you save resources if you simply parse the HTML the way it should be parsed. You might want to re-read the first post. It's clear that he needs dynamic data.
Apparently for some it is. You know you're wrong here, why not just say I'm right? It's not gonna make you look stupid or anything. We all learn, including me.
PS: do read those articles.
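To make the state-machine idea in this post concrete, here is a minimal sketch in C#. It is only a toy: the class name `TagScanner` and the method `OpeningTagNames` are made up for illustration, it only collects opening tag names, and it assumes no `>` appears inside attribute values, comments or scripts. A real HTML parser needs more states than these three.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class TagScanner
{
    // The three states this toy scanner can be in.
    private enum State { Text, TagName, InsideTag }

    // Returns the lower-cased names of the opening tags, in document order.
    public static List<string> OpeningTagNames(string html)
    {
        var names = new List<string>();
        var name = new StringBuilder();
        var state = State.Text;

        foreach (char c in html)
        {
            switch (state)
            {
                case State.Text:
                    // Plain text until a tag starts.
                    if (c == '<') { name.Clear(); state = State.TagName; }
                    break;

                case State.TagName:
                    if (char.IsLetterOrDigit(c))
                    {
                        name.Append(c);
                    }
                    else
                    {
                        // The name has ended. Closing tags, comments and doctypes
                        // have an empty name at this point, so they are skipped.
                        if (name.Length > 0) names.Add(name.ToString().ToLowerInvariant());
                        state = (c == '>') ? State.Text : State.InsideTag;
                    }
                    break;

                case State.InsideTag:
                    // Skip attributes (assumption: no '>' inside attribute values).
                    if (c == '>') state = State.Text;
                    break;
            }
        }
        return names;
    }

    public static void Main()
    {
        var tags = OpeningTagNames("<p>Hi <a href=\"http://example.com\">there</a></p>");
        Console.WriteLine(string.Join(",", tags));
    }
}
```

The scanner makes a single pass over the input and keeps only the current state and the tag name being built, which is the property the post is arguing for.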
-
6th Jul 2010, 09:47 AM #24 Respected Developer
Well, from the looks of it, he doesn't want to parse data but extract it.
The regular expression reading was very interesting, thanks for the link. I am also not clear as to what you mean by dynamic data?
-
6th Jul 2010, 10:55 AM #25 OP Member
Website's: InstantRDP.com
-
6th Jul 2010, 12:11 PM #26 Respected Developer
^That too is how it should not be done. It's virtually the same as regular expressions but even less solid. What is so hard about just parsing the HTML file? Why use some confusing dirty method? Anyway, it's your problem. Just know that your code is flawed.
Lol you just don't give up do you ^^. You need to parse data to extract it correctly. By dynamic data I mean data that changes (coming from a PHP file for example). If you look at the 1st post you see he extracts data from a forum which is about as dynamic as it gets.
-
6th Jul 2010, 12:22 PM #27 OP Member
Website's: InstantRDP.com
I already completed coding the part for extracting the string. It was not a simple <div> and </div> tag. I had to extract data between two definite string patterns.
So, I didn't want to waste another week learning your method.
I'll give it time after I complete my project and learn that too though.
Code:
for (int i = 0; i < 5; i++)
{
    string result = "";
    // Position of the begin marker in the source
    int iIndexOfBegin = strSource.IndexOf(strBegin);
    if (iIndexOfBegin != -1)
    {
        // Everything after the begin marker
        string tempstring = strSource.Substring(iIndexOfBegin + strBegin.Length);
        int iEnd = tempstring.IndexOf(strEnd);
        if (iEnd != -1)
        {
            // Text between the begin and end markers
            result = tempstring.Substring(0, iEnd);
            string next = result;
-
6th Jul 2010, 12:41 PM #28 Respected Developer
What is there to learn about:
Code:
var html = new HtmlDocument();
// load the html
html.LoadHtml(yourHtmlHere);
// use XPath to select all "A" elements from the html
var anchors = html.DocumentNode.SelectNodes("//a");
// keep only those whose href starts with http
var filter = from a in anchors
             where a.GetAttributeValue("href", "").StartsWith("http")
             select a;
It's just loading a dll and calling a few methods. I don't see what needs to be learned here. What you're doing right there is the wrong way to do it and I wouldn't be surprised if I see you making another topic because suddenly something stopped working or your program crashes.
-
6th Jul 2010, 12:47 PM #29 ლ(ಠ益ಠლ)
Website's: extremecoderz.com
Doing what pankaj wants is quickest when using IndexOf. I've been through them all, and that's the conclusion I came to, which is why I created my little "getstringInbetween" class.
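For reference, the IndexOf approach discussed in this thread can be sketched as a small helper. This is not the "getstringInbetween" class mentioned above, whose source isn't shown here; `Between.All` and its parameter names are hypothetical and only illustrate the technique.

```csharp
using System;
using System.Collections.Generic;

public static class Between
{
    // Returns every substring found between `begin` and `end`,
    // scanning the source once from left to right.
    public static List<string> All(string source, string begin, string end)
    {
        if (string.IsNullOrEmpty(begin) || string.IsNullOrEmpty(end))
            throw new ArgumentException("Markers must not be empty.");

        var results = new List<string>();
        int pos = 0;
        while (true)
        {
            int start = source.IndexOf(begin, pos, StringComparison.Ordinal);
            if (start == -1) break;
            start += begin.Length;

            int stop = source.IndexOf(end, start, StringComparison.Ordinal);
            if (stop == -1) break;

            results.Add(source.Substring(start, stop - start));
            // Continue after the end marker so the next pass finds the next match.
            pos = stop + end.Length;
        }
        return results;
    }

    public static void Main()
    {
        string page = "<b>one</b> and <b>two</b>";
        foreach (string s in All(page, "<b>", "</b>"))
            Console.WriteLine(s);
    }
}
```

Note that the search position moves forward after each match, so repeated calls are not needed to reach later occurrences. The helper still depends on the begin and end markers appearing in the page exactly as written, which is the reliability concern raised elsewhere in the thread.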
-
6th Jul 2010, 12:51 PM #30 Respected Developer
But it is not reliable with dynamic data. You need access to the DOM. This is not a situation where execution time is important but rather one where the validity of the data is.