From jeff at somethingsimilar.com Tue Feb 3 01:12:47 2009
From: jeff at somethingsimilar.com (Jeff Hodges)
Date: Mon, 2 Feb 2009 22:12:47 -0800
Subject: [Nokogiri-talk] flipping the bozo bit
Message-ID:
Hey,
I'm trying to add support for nokogiri's XML parser[1] to rfeedparser,
a ruby translation of feedparser, but I've hit a snag.
You see, the 3000+ xml tests for feedparser expect a "bozo" bit to be
flipped in the data structure returned if the parsed XML is not
well-formed (i.e. tags are missing a '>', etc.). This is to provide
the developer using it a handy way of detecting "bad" data. On top of
that, the architecture of feedparser (and rfeedparser) depends on
having a "strict" parser for well-formed XML and a "loose" parser (for
ill-formed XML). rfeedparser manages to get both expat and
libxml-ruby[2] to adhere to this just as they do in the python
version.[3]
The problem I'm having is that I can't get nokogiri to fail on
ill-formed XML! The 1500 or so ill-formed tests fail miserably when
using my perfectly fine nokogiri SAX parser because nokogiri will not
give up. Nothing turns up as a warning nor an error (i.e. nothing is
passed to SAX::Parser#warning nor SAX::Parser#error).
Is there some way to get nokogiri to either a) only work on
well-formed XML or b) have it include some information on the
well-formedness of the XML it is parsing? Perhaps, there is something
easy I've overlooked.
--
Jeff
[1] Currently, just the XML parsing. Once we jump the hurdle I explain
here, I'll probably start looking at it for the "loose" things.
[2] Well, older versions of it. The 0.9.7 and 0.9.8 releases have
been... finicky.
[3] "Manages" meaning "they blow up as expected and rfp cleans up the
mess, so it has to have this architecture".
From aaron.patterson at gmail.com Tue Feb 3 11:49:00 2009
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Tue, 3 Feb 2009 08:49:00 -0800
Subject: [Nokogiri-talk] flipping the bozo bit
In-Reply-To:
References:
Message-ID: <6959e1680902030849l3e10723at5b7810d7615f9827@mail.gmail.com>
Hi Jeff,
On Mon, Feb 2, 2009 at 10:12 PM, Jeff Hodges wrote:
> Hey,
> I'm trying to add support for nokogiri's XML parser[1] to rfeedparser,
> a ruby translation of feedparser, but I've hit a snag.
>
> You see, the 3000+ xml tests for feedparser expect a "bozo" bit to be
> flipped in the data structure returned if the parsed XML is not
> well-formed (i.e. tags are missing a '>', etc.). This is to provide
> the developer using it a handy way of detecting "bad" data. On top of
> that, the architecture of feedparser (and rfeedparser) depends on
> having a "strict" parser for well-formed XML and a "loose" parser (for
> ill-formed XML). rfeedparser manages to get both expat and
> libxml-ruby[2] to adhere to this just as they do in the python
> version.[3]
Do you want to be notified of bad XML, or actually blow up on bad XML, or both?
> The problem I'm having is that I can't get nokogiri to fail on
> ill-formed XML! The 1500 or so ill-formed tests fail miserably when
> using my perfectly fine nokogiri SAX parser because nokogiri will not
> give up. Nothing turns up as a warning nor an error (i.e. nothing is
> passed to SAX::Parser#warning nor SAX::Parser#error).
Yes. Nokogiri is the hardest working XML parser in show business. It
will parse anything! ;-)
> Is there some way to get nokogiri to either a) only work on
> well-formed XML or b) have it include some information on the
> well-formedness of the XML it is parsing? Perhaps, there is something
> easy I've overlooked.
I do have something for you.
You can pass options to the DOM parser as to how strict you'd like to
be. Passing in 0 is most strict. This will blow up:
doc = Nokogiri::XML('', nil, nil, 0)
For all the options, check out the constants here:
http://nokogiri.rubyforge.org/nokogiri/classes/Nokogiri/XML.html
Just bitwise OR the constants together to set options. (Sorry the RDoc
is broken; I've worked with Eric to fix up RDoc bugs and it should be
nicer next release.)
If you'd just like to be *notified* of parse errors, and not blow up,
there is a callback you can set:
Nokogiri.error_handler = lambda { |syntax_error| puts syntax_error.level }
doc = Nokogiri::XML('')
You can use that same callback for SAX documents. IIRC, the warning
and error callbacks are just in libxml2 to tease us. I included them
for completeness, but I've never been able to get them to fire.
One thing to watch out for: that lambda is not thread safe. It's
easy to fix for SAX documents, but I'm not sure what to do when DOM
parsing. I can tie that error callback to a context, I just don't
know what to tie it to. Thread.current? Any suggestions would be
greatly appreciated. :-)
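One way the Thread.current idea could look, as a rough sketch only. ThreadLocalErrorHandler, with_handler, and report are hypothetical names invented for this example and are not part of Nokogiri; the point is just that a handler stored per thread is not overwritten by a concurrent parse.

```ruby
# Hypothetical helper, not part of Nokogiri: keeps one error handler per
# thread so concurrent parses do not overwrite each other's callback.
module ThreadLocalErrorHandler
  KEY = :example_error_handler

  # Install +handler+ for the current thread only while the block runs.
  # Note: Thread.current[] storage is fiber-local in Ruby.
  def self.with_handler(handler)
    previous = Thread.current[KEY]
    Thread.current[KEY] = handler
    yield
  ensure
    Thread.current[KEY] = previous
  end

  # A parser would call this on an error; it is a no-op without a handler.
  def self.report(error)
    handler = Thread.current[KEY]
    handler.call(error) if handler
  end
end

# Two threads install their own handlers and only see their own errors.
results = 2.times.map do |i|
  Thread.new do
    seen = []
    ThreadLocalErrorHandler.with_handler(->(e) { seen << e }) do
      ThreadLocalErrorHandler.report("error from thread #{i}")
    end
    seen
  end
end.map(&:value)
```

Scoping the handler to a block also restores whatever was installed before, which avoids leaving a stale callback behind after the parse.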
I hope that helps!
--
Aaron Patterson
http://tenderlovemaking.com/
From aaron.patterson at gmail.com Tue Feb 3 11:55:50 2009
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Tue, 3 Feb 2009 08:55:50 -0800
Subject: [Nokogiri-talk] flipping the bozo bit
In-Reply-To: <6959e1680902030849l3e10723at5b7810d7615f9827@mail.gmail.com>
References:
<6959e1680902030849l3e10723at5b7810d7615f9827@mail.gmail.com>
Message-ID: <6959e1680902030855x3821c744p8c6b38a98a6d4f@mail.gmail.com>
On Tue, Feb 3, 2009 at 8:49 AM, Aaron Patterson
wrote:
> Hi Jeff,
>
> On Mon, Feb 2, 2009 at 10:12 PM, Jeff Hodges wrote:
>> Hey,
>> I'm trying to add support for nokogiri's XML parser[1] to rfeedparser,
>> a ruby translation of feedparser, but I've hit a snag.
>>
>> You see, the 3000+ xml tests for feedparser expect a "bozo" bit to be
>> flipped in the data structure returned if the parsed XML is not
>> well-formed (i.e. tags are missing a '>', etc.). This is to provide
>> the developer using it a handy way of detecting "bad" data. On top of
>> that, the architecture of feedparser (and rfeedparser) depends on
>> having a "strict" parser for well-formed XML and a "loose" parser (for
>> ill-formed XML). rfeedparser manages to get both expat and
>> libxml-ruby[2] to adhere to this just as they do in the python
>> version.[3]
>
> Do you want to be notified of bad XML, or actually blow up on bad XML, or both?
>
>> The problem I'm having is that I can't get nokogiri to fail on
>> ill-formed XML! The 1500 or so ill-formed tests fail miserably when
>> using my perfectly fine nokogiri SAX parser because nokogiri will not
>> give up. Nothing turns up as a warning nor an error (i.e. nothing is
>> passed to SAX::Parser#warning nor SAX::Parser#error).
>
> Yes. Nokogiri is the hardest working XML parser in show business. It
> will parse anything! ;-)
>
>> Is there some way to get nokogiri to either a) only work on
>> well-formed XML or b) have it include some information on the
>> well-formedness of the XML it is parsing? Perhaps, there is something
>> easy I've overlooked.
>
> I do have something for you.
>
> You can pass options to the DOM parser as to how strict you'd like to
> be. Passing in 0 is most strict. This will blow up:
>
> doc = Nokogiri::XML('', nil, nil, 0)
>
> For all the options, check out the constants here:
>
> http://nokogiri.rubyforge.org/nokogiri/classes/Nokogiri/XML.html
>
> Just bitwise OR the constants to set options. (Sorry the RDoc is
> broken, I've worked with Eric to fix up rdoc bugs and it should be
> nicer next release).
>
> If you'd just like to be *notified* of parse errors, and not blow up,
> there is a callback you can set:
>
> Nokogiri.error_handler = lambda { |syntax_error| puts syntax_error.level }
> doc = Nokogiri::XML('')
>
> You can use that same callback for SAX documents. IIRC, the warning
> and error callbacks are just in libxml2 to tease us. I included them
> for completeness, but I've never been able to get them to fire.
>
> One thing to watch out for..... That lambda is not thread safe. It's
> easy to fix for SAX documents, but I'm not sure what to do when DOM
> parsing. I can tie that error callback to a context, I just don't
> know what to tie it to? Thread.current? Any suggestions would be
> greatly appreciated. :-)
>
> I hope that helps!
Actually, any syntax suggestions on any of the error handling would be
great. In my use cases, I want nokogiri to recover the document no
matter what. Because of that, I haven't concerned myself too much
with handling parse errors (since I don't get them). I think that is
an area where I could make some nice improvements.
--
Aaron Patterson
http://tenderlovemaking.com/
From jeff at somethingsimilar.com Tue Feb 3 13:39:53 2009
From: jeff at somethingsimilar.com (Jeff Hodges)
Date: Tue, 3 Feb 2009 10:39:53 -0800
Subject: [Nokogiri-talk] flipping the bozo bit
In-Reply-To: <6959e1680902030849l3e10723at5b7810d7615f9827@mail.gmail.com>
References:
<6959e1680902030849l3e10723at5b7810d7615f9827@mail.gmail.com>
Message-ID:
On Tue, Feb 3, 2009 at 8:49 AM, Aaron Patterson
wrote:
> You can pass options to the DOM parser as to how strict you'd like to
> be. Passing in 0 is most strict. This will blow up:
>
> doc = Nokogiri::XML('', nil, nil, 0)
This looks to be what I want. However, I can't find the SAX::Parser
or SAX::Document methods that would allow me to do this.
Actually, this is related to another problem I found: There appears to
be no way to tell a SAX::Parser or SAX::Document what encoding to
parse as given that I know that I will always be passing a string to
be parsed (meaning, SAX::Parser#parse is the appropriate method to
call).
Maybe there were supposed to be some other options on
SAX::Parser#parse, #parse_io and #parse_file and they just were
accidentally left out?
--
Jeff
From aaron.patterson at gmail.com Tue Feb 3 14:30:56 2009
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Tue, 3 Feb 2009 11:30:56 -0800
Subject: [Nokogiri-talk] flipping the bozo bit
In-Reply-To:
References:
<6959e1680902030849l3e10723at5b7810d7615f9827@mail.gmail.com>
Message-ID: <6959e1680902031130td954c87hf6e48d93f9d29716@mail.gmail.com>
On Tue, Feb 3, 2009 at 10:39 AM, Jeff Hodges wrote:
> On Tue, Feb 3, 2009 at 8:49 AM, Aaron Patterson
> wrote:
>> You can pass options to the DOM parser as to how strict you'd like to
>> be. Passing in 0 is most strict. This will blow up:
>>
>> doc = Nokogiri::XML('', nil, nil, 0)
>
> This looks to be what I want. Well, more accurately, However, I can't
> find the SAX::Parser or SAX::Document methods to allow me to do this.
> Actually, this is related to another problem I found: There appears to
> be no way to tell a SAX::Parser or SAX::Document what encoding to
> parse as given that I know that I will always be passing a string to
> be parsed (meaning, SAX::Parser#parse is the appropriate method to
> call).
If you set the lambda, it still gets called on SAX parse errors. In
there, you can choose to raise or do whatever:
Nokogiri.error_handler = lambda { |syntax_error| raise "Damn!" }
Nokogiri::XML::SAX::Parser.new.parse('')
> Maybe there were supposed to be some other options on
> SAX::Parser#parse, #parse_io and #parse_file and they just were
> accidentally left out?
SAX::Parser#parse_io takes an encoding. The encoding is a number that
maps to these constants:
http://xmlsoft.org/html/libxml-encoding.html#xmlCharEncoding
Yuck. I should document that.... :-(
I'm going to have to look into setting encoding for in memory
parsing.... The function I'm using doesn't take any encoding options,
but there must be a way to set them.
--
Aaron Patterson
http://tenderlovemaking.com/
From timcharper at gmail.com Tue Feb 3 16:05:18 2009
From: timcharper at gmail.com (Tim Harper)
Date: Tue, 3 Feb 2009 14:05:18 -0700
Subject: [Nokogiri-talk] How to select child elements only from a Nokogiri
node
Message-ID:
This gist paste says most of it:
http://gist.github.com/57751
Given a table, I'm trying to select all of the rows in that table without
selecting rows from nested tables. Historically in Hpricot, I would just
use (element / "> tr"). However, nokogiri doesn't like that syntax.
Additionally, I can't seem to use the xpath selector either, (element /
"//tr") is selecting all child rows.
Am I going the completely wrong route? Or is this a feature that's planned
to be implemented at some point?
Thanks :)
Tim
From aaron.patterson at gmail.com Tue Feb 3 16:25:41 2009
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Tue, 3 Feb 2009 13:25:41 -0800
Subject: [Nokogiri-talk] How to select child elements only from a
Nokogiri node
In-Reply-To:
References:
Message-ID: <6959e1680902031325m51aa5d6bq7d35c2c44c979aa4@mail.gmail.com>
On Tue, Feb 3, 2009 at 1:05 PM, Tim Harper wrote:
> This gist paste says most of it:
> http://gist.github.com/57751
>
> Given a table, I'm trying to select all of the rows in that table without
> selecting rows from nested tables. Historically in Hpricot, I would just
> use (element / "> tr"). However, nokogiri doesn't like that syntax.
> Additionally, I can't seem to use the xpath selector either, (element /
> "//tr") is selecting all child rows.
> Am I going the completely wrong route? Or is this a feature that's planned
> to be implemented at some point?
Actually, I consider this to be broken behavior in hpricot. Using
CSS, you're saying "find all tr tags which are descendants of this
reference node". Since your reference node is the top level "table"
tag, it finds all four descendants.
Your XPath query says "find all nodes starting at the root whose name
is 'tr'". If you start your XPath with a slash, it *always* means
"from the root node". If you want relative queries in XPath, start
with a dot: ".//tr".
Just try to think of how you might write the CSS selector when dealing
with your web browser. How would you expect it to behave with the
browser? That is how it should work with nokogiri. We try to match
browser behavior as closely as possible.
I've forked your gist to illustrate:
http://gist.github.com/57768
Hope that helps!
--
Aaron Patterson
http://tenderlovemaking.com/
From timcharper at gmail.com Tue Feb 3 17:07:26 2009
From: timcharper at gmail.com (Tim Harper)
Date: Tue, 3 Feb 2009 15:07:26 -0700
Subject: [Nokogiri-talk] How to select child elements only from a
Nokogiri node
In-Reply-To: <6959e1680902031325m51aa5d6bq7d35c2c44c979aa4@mail.gmail.com>
References:
<6959e1680902031325m51aa5d6bq7d35c2c44c979aa4@mail.gmail.com>
Message-ID:
Thanks for your response :)
I don't think I explained my question well enough - sorry about that. And
there was a problem with my example which showed Hpricot behavior to be
opposite of what it does in the real world. Both of the examples you
provided as a solution returned the rows from the nested table, but what I'm
trying to get is the rows from the root table.
I forked and updated the gist, adding some comments and fixing an issue with
the original
http://gist.github.com/57787
I understand your point about what should be valid in css, and yes,
naturally, you wouldn't use a css selector starting with a >. But you also
wouldn't be evaluating a css selector against a node in the document
somewhere (you always start from root).
IE:
Nokogiri::HTML.parse("table#users > tr")
Nokogiri::HTML.parse("table#users") / " > tr"
In the latter example, I see the node found by Nokogiri as a direct
substitute for "table#users", and should have some way to select child
elements from the node without recursing deeper. I hope I'm explaining
myself better.
Thank you again for your quick response!
Tim
On Tue, Feb 3, 2009 at 2:25 PM, Aaron Patterson
wrote:
> On Tue, Feb 3, 2009 at 1:05 PM, Tim Harper wrote:
> > This gist paste says most of it:
> > http://gist.github.com/57751
> >
> > Given a table, I'm trying to select all of the rows in that table without
> > selecting rows from nested tables. Historically in Hpricot, I would just
> > use (element / "> tr"). However, nokogiri doesn't like that syntax.
> > Additionally, I can't seem to use the xpath selector either, (element /
> > "//tr") is selecting all child rows.
> > Am I going the completely wrong route? Or is this a feature that's
> planned
> > to be implemented at some point?
>
> Actually, I consider this to be broken behavior in hpricot. Using
> CSS, you're saying "find all tr tags which are descendants of this
> reference node". Since your reference node is the top level "table"
> tag, it finds all four descendants.
>
> Your XPath query says "find all nodes starting at the root whose name
> is 'tr'". If you start your XPath with a slash, it *always* means
> "from the root node". If you want relative queries in XPath, start
> with a dot: ".//tr".
>
> Just try to think of how you might write the CSS selector when dealing
> with your web browser. How would you expect it to behave with the
> browser? That is how it should work with nokogiri. We try to match
> browser behavior as closely as possible.
>
> I've forked your gist to illustrate:
>
> http://gist.github.com/57768
>
> Hope that helps!
>
> --
> Aaron Patterson
> http://tenderlovemaking.com/
> _______________________________________________
> Nokogiri-talk mailing list
> Nokogiri-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/nokogiri-talk
>
From jeff at somethingsimilar.com Tue Feb 3 17:37:50 2009
From: jeff at somethingsimilar.com (Jeff Hodges)
Date: Tue, 3 Feb 2009 14:37:50 -0800
Subject: [Nokogiri-talk] flipping the bozo bit
In-Reply-To: <6959e1680902031130td954c87hf6e48d93f9d29716@mail.gmail.com>
References:
<6959e1680902030849l3e10723at5b7810d7615f9827@mail.gmail.com>
<6959e1680902031130td954c87hf6e48d93f9d29716@mail.gmail.com>
Message-ID:
That's fine but I had hoped to replace hpricot with nokogiri for the
HTML parsing as well in the near future. Is there some distinguishing
characteristic between HTML parse errors and XML parse errors that
would be passed to that lambda? And is there a distinguishing
characteristic between recoverable and non-recoverable parse errors?
Modifying a module-wide variable when I'll be doing multiple kinds
of parses is kind of icky.
--
Jeff
On Tue, Feb 3, 2009 at 11:30 AM, Aaron Patterson
wrote:
> On Tue, Feb 3, 2009 at 10:39 AM, Jeff Hodges wrote:
>> On Tue, Feb 3, 2009 at 8:49 AM, Aaron Patterson
>> wrote:
>>> You can pass options to the DOM parser as to how strict you'd like to
>>> be. Passing in 0 is most strict. This will blow up:
>>>
>>> doc = Nokogiri::XML('', nil, nil, 0)
>>
>> This looks to be what I want. Well, more accurately, However, I can't
>> find the SAX::Parser or SAX::Document methods to allow me to do this.
>> Actually, this is related to another problem I found: There appears to
>> be no way to tell a SAX::Parser or SAX::Document what encoding to
>> parse as given that I know that I will always be passing a string to
>> be parsed (meaning, SAX::Parser#parse is the appropriate method to
>> call).
>
> If you set the lambda, it still gets called on SAX parse errors. In
> there, you can choose to raise or do whatever:
>
> Nokogiri.error_handler = lambda { |syntax_error| raise "Damn!" }
> Nokogiri::XML::SAX::Parser.new.parse('')
>
>> Maybe there were supposed to be some other options on
>> SAX::Parser#parse, #parse_io and #parse_file and they just were
>> accidentally left out?
>
> SAX::Parser.parse_io takes an encoding. The encoding is a number that
> maps to these constants:
>
> http://xmlsoft.org/html/libxml-encoding.html#xmlCharEncoding
>
> Yuck. I should document that.... :-(
>
> I'm going to have to look in to setting encoding for in memory
> parsing.... The function I'm using doesn't take any encoding options,
> but there must be a way to set them.
>
> --
> Aaron Patterson
> http://tenderlovemaking.com/
From aaron.patterson at gmail.com Tue Feb 3 17:38:51 2009
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Tue, 3 Feb 2009 14:38:51 -0800
Subject: [Nokogiri-talk] How to select child elements only from a
Nokogiri node
In-Reply-To:
References:
<6959e1680902031325m51aa5d6bq7d35c2c44c979aa4@mail.gmail.com>
Message-ID: <6959e1680902031438y549cf86cn45797f5ccd2907a6@mail.gmail.com>
On Tue, Feb 3, 2009 at 2:07 PM, Tim Harper wrote:
> Thanks for your response :)
> I don't think I explained my question well enough - sorry about that. And
> there was a problem with my example which showed Hpricot behavior to be
> opposite of what it does in the real world. Both of the examples you
> provided as a solution returned the rows from the nested table, but what I'm
> trying to get is the rows from the root table.
> I forked and updated the gist, adding some comments and fixing an issue with
> the original
> http://gist.github.com/57787
> I understand your point about what should be valid in css, and yes,
> naturally, you wouldn't use a css selector starting with a >. But you also
> wouldn't be evaluating a css selector against a node in the document
> somewhere (you always start from root).
> IE:
> Nokogiri::HTML.parse("table#users > tr")
> Nokogiri::HTML.parse("table#users") / " > tr"
> In the latter example, I see the node found by Nokogiri as a direct
> substitute for "table#users", and should have some way to select child
> elements from the node without recursing deeper. I hope I'm explaining
> myself better.
Okay, I think I understand a little better. In order to support this syntax:
doc.css("table#users").css(" > tr")
We would have to keep track of the previously used CSS selector. That
doesn't sound like fun.....
For now, you could do this:
doc.css("table#users").xpath('./tr')
That will select immediate children whose name is "tr".
--
Aaron Patterson
http://tenderlovemaking.com/
From timcharper at gmail.com Tue Feb 3 18:03:04 2009
From: timcharper at gmail.com (Tim Harper)
Date: Tue, 3 Feb 2009 16:03:04 -0700
Subject: [Nokogiri-talk] How to select child elements only from a
Nokogiri node
In-Reply-To: <6959e1680902031438y549cf86cn45797f5ccd2907a6@mail.gmail.com>
References:
<6959e1680902031325m51aa5d6bq7d35c2c44c979aa4@mail.gmail.com>
<6959e1680902031438y549cf86cn45797f5ccd2907a6@mail.gmail.com>
Message-ID:
OK - thank you for your reply :)
We'll work around it for now.
Tim
On Tue, Feb 3, 2009 at 3:38 PM, Aaron Patterson
wrote:
> On Tue, Feb 3, 2009 at 2:07 PM, Tim Harper wrote:
> > Thanks for your response :)
> > I don't think I explained my question well enough - sorry about that. And
> > there was a problem with my example which showed Hpricot behavior to be
> > opposite of what it does in the real world. Both of the examples you
> > provided as a solution returned the rows from the nested table, but what
> I'm
> > trying to get is the rows from the root table.
> > I forked and updated the gist, adding some comments and fixing an issue
> with
> > the original
> > http://gist.github.com/57787
> > I understand your point about what should be valid in css, and yes,
> > naturally, you wouldn't use a css selector starting with a >. But you
> also
> > wouldn't be evaluating a css selector against a node in the document
> > somewhere (you always start from root).
> > IE:
> > Nokogiri::HTML.parse("table#users > tr")
> > Nokogiri::HTML.parse("table#users") / " > tr"
> > In the latter example, I see the node found by Nokogiri as a direct
> > substitute for "table#users", and should have some way to select child
> > elements from the node without recursing deeper. I hope I'm explaining
> > myself better.
>
> Okay, I think I understand a little better. In order to support this
> syntax:
>
> doc.css("table#users").css(" > tr")
>
> We would have to keep track of the previously used CSS selector. That
> doesn't sound like fun.....
>
> For now, you could do this:
>
> doc.css("table#users").xpath('./tr')
>
> That will select immediate children whose name is "tr".
>
> --
> Aaron Patterson
> http://tenderlovemaking.com/
From jeff at somethingsimilar.com Tue Feb 3 23:27:16 2009
From: jeff at somethingsimilar.com (Jeff Hodges)
Date: Tue, 3 Feb 2009 20:27:16 -0800
Subject: [Nokogiri-talk] [PATCH] no Exceptions and a momma SyntaxError
Message-ID:
I discovered that Nokogiri derives its SyntaxErrors from Ruby's own
SyntaxError. This has the unfortunate side effect of causing
SyntaxErrors generated from slightly broken XML, etc. to be turned
into Exceptions that cannot be rescued from normally[1]. I've put up a
branch (well, two) to fix this.
The first branch[2] just fixes the problem of raising Exceptions
instead of StandardErrors by swapping out "< ::SyntaxError" for "<
::StandardError" and rb_eSyntaxError for rb_eStandardError. This is
fine, but one of the benefits of inheriting from ::SyntaxError is that
you can catch all the Nokogiri SyntaxErrors with one "rescue
SyntaxError".
This leads us to the second branch[3]. The second branch has these
SyntaxErrors all inheriting from one error class,
Nokogiri::SyntaxError, giving us that nice little rescue statement
back. This second branch might be overkill. The first branch might be
too little. Both might cause people's hair to catch aflame. So, I left
them separate.
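To make the rescue problem concrete, here is a plain-Ruby sketch. OldStyleError and the Example module are hypothetical stand-ins for the classes discussed above, not Nokogiri's actual class names.

```ruby
# Hypothetical stand-in for an error class that inherits from Ruby's own
# SyntaxError (a ScriptError, which is outside the StandardError tree).
class OldStyleError < ::SyntaxError; end

# Hypothetical stand-ins for the second branch: one common base class
# that itself inherits from StandardError.
module Example
  class SyntaxError < ::StandardError; end
  class XMLSyntaxError < SyntaxError; end
  class XPathSyntaxError < SyntaxError; end
end

# Returns true when a bare `rescue` is enough to catch the error class.
def caught_by_plain_rescue?(error_class)
  begin
    raise error_class, "boom"
  rescue # bare rescue only handles StandardError and its subclasses
    true
  end
rescue Exception # anything that slipped past the bare rescue
  false
end

puts caught_by_plain_rescue?(OldStyleError)           # false
puts caught_by_plain_rescue?(Example::XMLSyntaxError) # true

# The common base class restores the single rescue clause.
caught_with_base =
  begin
    raise Example::XPathSyntaxError, "bad expression"
  rescue Example::SyntaxError
    true
  end
```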
Comments?
--
Jeff
[1] Per usual, this is troublesome for rfeedparser and I found it
while working with Nokogiri.error_handler. Okay, not a lot of trouble but,
really, this is a problem for lots of code.
[2] http://github.com/jmhodges/nokogiri/tree/no_exceptions
[3] http://github.com/jmhodges/nokogiri/tree/combined_syntax_error
From jeff at somethingsimilar.com Wed Feb 4 01:42:04 2009
From: jeff at somethingsimilar.com (Jeff Hodges)
Date: Tue, 3 Feb 2009 22:42:04 -0800
Subject: [Nokogiri-talk] [PATCH] no Exceptions and a momma SyntaxError
In-Reply-To:
References:
Message-ID:
Minor update: Forgot to update the Manifest.txt. Pull again if you need to.
Nobody cares but me: rfeedparser now works with nokogiri with either
the no_exceptions or combined_syntax_error patches applied. If or when
those patches are accepted and released, a new version of rfp goes
out, too. Yay! No more libxml-ruby causing drama!
--
Jeff
On Tue, Feb 3, 2009 at 8:27 PM, Jeff Hodges wrote:
> I discovered that Nokogiri raises its SyntaxErrors from ruby's own
> SyntaxError. This has the unfortunate side effect of causing
> SyntaxErrors generated from slightly broken XML, etc. to be turned
> into Exceptions that cannot be rescued from normally[1]. I've put up a
> branch (well, two) to fix this.
>
> The first branch[2] just fixes the problem of raising Exceptions
> instead of StandardErrors by swapping out "< ::SyntaxError" for "<
> ::StandardError" and rb_eSyntaxError for rb_eStandardError. This is
> fine, but one of the benefits of inheriting from ::SyntaxError is that
> you can catch all the Nokogiri SyntaxErrors with one "rescue
> SyntaxError".
>
> This leads us to the second branch[3]. The second branch has these
> SyntaxErrors all inheriting from one error class,
> Nokogiri::SyntaxError, giving us that nice little rescue statement
> back. This second branch might be overkill. The first branch might be
> too little. Both might cause people's hair to catch aflame. So, I left
> them separate.
>
> Comments?
> --
> Jeff
>
> [1] Per usual, this is troublesome for rfeedparser and I found it
> while working Nokogiri#error_handler. Okay, not a lot of trouble but,
> really, this is a problem for lots of code.
> [2] http://github.com/jmhodges/nokogiri/tree/no_exceptions
> [3] http://github.com/jmhodges/nokogiri/tree/combined_syntax_error
>
From mike.dalessio at gmail.com Wed Feb 4 11:20:54 2009
From: mike.dalessio at gmail.com (Mike Dalessio)
Date: Wed, 4 Feb 2009 08:20:54 -0800
Subject: [Nokogiri-talk] [PATCH] no Exceptions and a momma SyntaxError
In-Reply-To:
References:
Message-ID: <618c07250902040820s652564cdp72f5b791ba29ac35@mail.gmail.com>
Jeff -
I like! I'll talk to Aaron about which patch he'd prefer. Thanks so much!
-mike
On Tue, Feb 3, 2009 at 10:42 PM, Jeff Hodges wrote:
> Minor update: Forgot to update the Manifest.txt. Pull again if you need to.
>
> Nobody cares but me: rfeedparser now works with nokogiri with either
> the no_exceptions or combined_syntax_error patches applied. If or when
> those patches are accepted and released, a new version of rfp goes
> out, too. Yay! No more libxml-ruby causing drama!
> --
> Jeff
>
> On Tue, Feb 3, 2009 at 8:27 PM, Jeff Hodges
> wrote:
> > I discovered that Nokogiri raises its SyntaxErrors from ruby's own
> > SyntaxError. This has the unfortunate side effect of causing
> > SyntaxErrors generated from slightly broken XML, etc. to be turned
> > into Exceptions that cannot be rescued from normally[1]. I've put up a
> > branch (well, two) to fix this.
> >
> > The first branch[2] just fixes the problem of raising Exceptions
> > instead of StandardErrors by swapping out "< ::SyntaxError" for "<
> > ::StandardError" and rb_eSyntaxError for rb_eStandardError. This is
> > fine, but one of the benefits of inheriting from ::SyntaxError is that
> > you can catch all the Nokogiri SyntaxErrors with one "rescue
> > SyntaxError".
> >
> > This leads us to the second branch[3]. The second branch has these
> > SyntaxErrors all inheriting from one error class,
> > Nokogiri::SyntaxError, giving us that nice little rescue statement
> > back. This second branch might be overkill. The first branch might be
> > too little. Both might cause people's hair to catch aflame. So, I left
> > them separate.
> >
> > Comments?
> > --
> > Jeff
> >
> > [1] Per usual, this is troublesome for rfeedparser and I found it
> > while working Nokogiri#error_handler. Okay, not a lot of trouble but,
> > really, this is a problem for lots of code.
> > [2] http://github.com/jmhodges/nokogiri/tree/no_exceptions
> > [3] http://github.com/jmhodges/nokogiri/tree/combined_syntax_error
> >
--
mike dalessio
mike at csa.net
From aaron.patterson at gmail.com Wed Feb 4 12:26:21 2009
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Wed, 4 Feb 2009 09:26:21 -0800
Subject: [Nokogiri-talk] [PATCH] no Exceptions and a momma SyntaxError
In-Reply-To: <618c07250902040820s652564cdp72f5b791ba29ac35@mail.gmail.com>
References:
<618c07250902040820s652564cdp72f5b791ba29ac35@mail.gmail.com>
Message-ID: <6959e1680902040926t26186b5dlf6737ed9f0b7dbc1@mail.gmail.com>
On Wed, Feb 4, 2009 at 8:20 AM, Mike Dalessio wrote:
> Jeff -
>
> I like! I'll talk to Aaron about which patch he'd prefer. Thanks so much!
I am lazy, and I like the patches. I'm adding jeff to the collaborators list.
Jeff you may merge it yourself. :-)
--
Aaron Patterson
http://tenderlovemaking.com/
From jeff at somethingsimilar.com Wed Feb 4 12:29:10 2009
From: jeff at somethingsimilar.com (Jeff Hodges)
Date: Wed, 4 Feb 2009 09:29:10 -0800
Subject: [Nokogiri-talk] [PATCH] no Exceptions and a momma SyntaxError
In-Reply-To: <6959e1680902040926t26186b5dlf6737ed9f0b7dbc1@mail.gmail.com>
References:
<618c07250902040820s652564cdp72f5b791ba29ac35@mail.gmail.com>
<6959e1680902040926t26186b5dlf6737ed9f0b7dbc1@mail.gmail.com>
Message-ID:
Fantastic.
On Wed, Feb 4, 2009 at 9:26 AM, Aaron Patterson
wrote:
> On Wed, Feb 4, 2009 at 8:20 AM, Mike Dalessio wrote:
>> Jeff -
>>
>> I like! I'll talk to Aaron about which patch he'd prefer. Thanks so much!
>
> I am lazy, and I like the patches. I'm adding jeff to the collaborators list.
>
> Jeff you may merge it yourself. :-)
>
> --
> Aaron Patterson
> http://tenderlovemaking.com/
From jeff at somethingsimilar.com Thu Feb 5 16:50:10 2009
From: jeff at somethingsimilar.com (Jeff Hodges)
Date: Thu, 5 Feb 2009 13:50:10 -0800
Subject: [Nokogiri-talk] what's left for 1.1.2?
Message-ID:
Hey,
What all is left to do before the 1.1.2 release? I checked out the
lighthouse for nokogiri, and didn't see anything. I'm guessing that
the push parser still needs some love?
--
Jeff
From aaron.patterson at gmail.com Thu Feb 5 18:48:01 2009
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Thu, 5 Feb 2009 15:48:01 -0800
Subject: [Nokogiri-talk] what's left for 1.1.2?
In-Reply-To:
References:
Message-ID: <6959e1680902051548m6d077582k5e7e7272ae69d26b@mail.gmail.com>
On Thu, Feb 5, 2009 at 1:50 PM, Jeff Hodges wrote:
> Hey,
> What all is left to do before the 1.1.2 release? I checked out the
> lighthouse for nokogiri, and didn't see anything. I'm guessing that
> the push parser still needs some love?
Nope. The push parser is done. The next release is actually going to
be 1.2.0. I need to delete the 1.1.2 milestone.
I'm planning on releasing on the 7th. I wanted to squash a couple
build bugs, but I simply can't reproduce them, and unfortunately I
can't get anyone in person to reproduce them.
So! Look for 1.2.0 this weekend. :-)
--
Aaron Patterson
http://tenderlovemaking.com/
From jeff at somethingsimilar.com Thu Feb 5 21:49:53 2009
From: jeff at somethingsimilar.com (Jeff Hodges)
Date: Thu, 5 Feb 2009 18:49:53 -0800
Subject: [Nokogiri-talk] what's left for 1.1.2?
In-Reply-To: <6959e1680902051548m6d077582k5e7e7272ae69d26b@mail.gmail.com>
References:
<6959e1680902051548m6d077582k5e7e7272ae69d26b@mail.gmail.com>
Message-ID:
Cool. Christ, that is an ugly bug. Did Jacque ever let you see his
laptop to reproduce it?
--
Jeff
From aaron.patterson at gmail.com Thu Feb 5 22:39:55 2009
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Thu, 5 Feb 2009 19:39:55 -0800
Subject: [Nokogiri-talk] what's left for 1.1.2?
In-Reply-To:
References:
<6959e1680902051548m6d077582k5e7e7272ae69d26b@mail.gmail.com>
Message-ID: <6959e1680902051939u4edcb653l7bf74da11601e69a@mail.gmail.com>
On Thu, Feb 5, 2009 at 6:49 PM, Jeff Hodges wrote:
> Cool. Christ, that is an ugly bug. Did Jacque ever let you see his
> laptop to reproduce it?
No... He hasn't come to nerd club yet.
People have also had this problem (which I cannot reproduce):
http://nokogiri.lighthouseapp.com/projects/19607/tickets/7-mac-native-bundle-not-loading
Right now, I am working on a gem that will help report bugs in gems.
ugh. It would be nice if someone getting these errors could try
solving the issue. It's very hard for me to remote debug them! :-(
--
Aaron Patterson
http://tenderlovemaking.com/
From jeff at somethingsimilar.com Fri Feb 6 01:49:09 2009
From: jeff at somethingsimilar.com (Jeff Hodges)
Date: Thu, 5 Feb 2009 22:49:09 -0800
Subject: [Nokogiri-talk] what's left for 1.1.2?
In-Reply-To: <6959e1680902051939u4edcb653l7bf74da11601e69a@mail.gmail.com>
References:
<6959e1680902051548m6d077582k5e7e7272ae69d26b@mail.gmail.com>
<6959e1680902051939u4edcb653l7bf74da11601e69a@mail.gmail.com>
Message-ID:
I've spent a couple of hours just now trying to get either one of
these to happen. I give. A release this weekend sounds great. I'm
waiting on it to push up the new rfeedparser with nokogiri as the
default strict parser.
If I can decode __xmlRaiseError in libxml2, and get it to play nice
with nokogiri, I'll have nokogiri everywhere in rfp in a couple of
weeks.
P.S. To anyone reading this: if you're writing in C and your function
declaration is 6 lines long and the function itself is 180 goddamn
lines long, you have fucked up.
--
Jeff
From aaron.patterson at gmail.com Fri Feb 6 02:11:58 2009
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Thu, 5 Feb 2009 23:11:58 -0800
Subject: [Nokogiri-talk] what's left for 1.1.2?
In-Reply-To:
References:
<6959e1680902051548m6d077582k5e7e7272ae69d26b@mail.gmail.com>
<6959e1680902051939u4edcb653l7bf74da11601e69a@mail.gmail.com>
Message-ID: <6959e1680902052311o59b9b71eq510ec565c4b576d4@mail.gmail.com>
On Thu, Feb 5, 2009 at 10:49 PM, Jeff Hodges wrote:
> I've spent a couple of hours just now trying to get either one of
> these to happen. I give. A release this weekend sounds great. I'm
> waiting on it to push up the new rfeedparser with nokogiri as the
> default strict parser.
>
> If I can decode __xmlRaiseError in libxml2, and get it to play nice
> with nokogiri, I'll have nokogiri everywhere in rfp in a couple of
> weeks.
Looking at the header file, I don't think you have access to that
function... We don't define IN_LIBXML. I could be wrong though.
I may have a solution. It kind of sucks, but I just want to get it
out there. The current error handler is not thread safe. We /could/
set a mutex, then lock every time we parse a document, then capture
every error from that handler and set those errors on the document
after it's done being parsed.
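The mutex idea above could be sketched roughly as follows. This is a pure-Ruby toy, not nokogiri's code: `ToyParser`, `low_level_parse`, and `Document` are hypothetical stand-ins for a library that has a single process-wide error handler, and the "parser" only counts angle brackets.

```ruby
# Hypothetical stand-in for a library with one process-wide error handler.
module ToyParser
  PARSE_LOCK = Mutex.new
  Document = Struct.new(:source, :errors)

  class << self
    # Global state, standing in for the non-thread-safe handler.
    attr_accessor :error_handler
  end

  # Pretend low-level parser: reports one error per '<' that has no '>'.
  def self.low_level_parse(source)
    missing = source.count("<") - source.count(">")
    missing.times { error_handler.call("tag is missing a '>'") }
  end

  # Take the lock for the whole parse, capture every reported error,
  # and attach the captured errors to the document afterwards.
  def self.parse(source)
    PARSE_LOCK.synchronize do
      errors = []
      self.error_handler = ->(message) { errors << message }
      begin
        low_level_parse(source)
      ensure
        self.error_handler = nil # never leave the global handler set
      end
      Document.new(source, errors)
    end
  end
end
```

The cost of this design is that parses are serialized across threads, which is presumably why it "kind of sucks".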
I'll create a new branch and hack something together tomorrow.
Here is the error handling api btw:
http://xmlsoft.org/html/libxml-xmlerror.html
--
Aaron Patterson
http://tenderlovemaking.com/
From jeff at somethingsimilar.com Fri Feb 6 06:15:30 2009
From: jeff at somethingsimilar.com (Jeff Hodges)
Date: Fri, 6 Feb 2009 03:15:30 -0800
Subject: [Nokogiri-talk] what's left for 1.1.2?
In-Reply-To: <6959e1680902052311o59b9b71eq510ec565c4b576d4@mail.gmail.com>
References:
<6959e1680902051548m6d077582k5e7e7272ae69d26b@mail.gmail.com>
<6959e1680902051939u4edcb653l7bf74da11601e69a@mail.gmail.com>
<6959e1680902052311o59b9b71eq510ec565c4b576d4@mail.gmail.com>
Message-ID:
Yeah, so far, I believe the solution looks something like
void Nokogiri_error_handler(void * ctx, xmlErrorPtr error)
{
  xmlErrorPtr ptr = calloc(1, sizeof(xmlError));
  xmlCopyError(error, ptr);
  if (ptr->ctxt && ((xmlParserCtxtPtr)(ptr->ctxt))->sax) {
    // Magic! Instantiate the Parser, snag the document, call document.error
  } else {
    VALUE err = Data_Wrap_Struct(cNokogiriXmlSyntaxError, NULL, dealloc, ptr);
    VALUE block = rb_funcall(mNokogiri, rb_intern("error_handler"), 0);
    rb_funcall(block, rb_intern("call"), 1, err);
  }
}
However, the "Magic!" is where I'm at a loss. I've tried a few things,
and nothing seems to work. That conditional does seem to work, though.
I'm giving up for the night.
--
Jeff
From aaron.patterson at gmail.com Fri Feb 6 12:57:17 2009
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Fri, 6 Feb 2009 09:57:17 -0800
Subject: [Nokogiri-talk] better error handling
Message-ID: <6959e1680902060957s74fbc654x5094da475f2eef5e@mail.gmail.com>
I've pushed a new branch to github called "errors". I think it has
better error handling for DOM parsing.
Specifically check out this changeset:
http://github.com/tenderlove/nokogiri/commit/5f3453568202bae99f6618efb1d19ce925b79939
Comments?
If it looks good, I'll implement the same kind of deal with HTML
parsing. I also need to mess with the SAX parsing because I really
want to get those error and warning handlers working.
http://github.com/tenderlove/nokogiri/tree/errors
--
Aaron Patterson
http://tenderlovemaking.com/
From jeff at somethingsimilar.com Fri Feb 6 21:36:14 2009
From: jeff at somethingsimilar.com (Jeff Hodges)
Date: Fri, 6 Feb 2009 18:36:14 -0800
Subject: [Nokogiri-talk] better error handling
In-Reply-To: <6959e1680902060957s74fbc654x5094da475f2eef5e@mail.gmail.com>
References: <6959e1680902060957s74fbc654x5094da475f2eef5e@mail.gmail.com>
Message-ID:
Looks good to me. I had originally thought you were referring to the
SAX error stuff which is what my last message referred to.
By the way, I've run Nokogiri_wrap_xml_syntax_error sans xmlCopyError,
and dike screams bloody murder and nokogiri segfaults. So, I'm
thinking that the comment about the xmlCopyError call in
Nokogiri_wrap_xml_syntax_error is unnecessary.
If someone else confirms, I'll toss it out.
--
Jeff
From jeff at somethingsimilar.com Fri Feb 6 21:37:06 2009
From: jeff at somethingsimilar.com (Jeff Hodges)
Date: Fri, 6 Feb 2009 18:37:06 -0800
Subject: [Nokogiri-talk] better error handling
In-Reply-To:
References: <6959e1680902060957s74fbc654x5094da475f2eef5e@mail.gmail.com>
Message-ID:
Looking back, dike doesn't even get a chance. Just *boom* it all goes to hell.
--
Jeff
From lianliming at gmail.com Sat Feb 7 09:30:14 2009
From: lianliming at gmail.com (Lian Liming)
Date: Sat, 7 Feb 2009 22:30:14 +0800
Subject: [Nokogiri-talk] Docs for nokogiri?
Message-ID: <2ab0f52d0902070630o4f0833ffqf965ccfeffcb06f3@mail.gmail.com>
Hi all,
I am new to nokogiri and would like to use it as an XML parser. I
am wondering where I can find documentation for nokogiri. So far, I
have read the wiki pages on github, the rdoc, and the test cases in the
source code, but I am still not sure how to use this tool properly.
A tutorial or user guide might be easier for new users
to start with.
Any suggestions are appreciated! And thanks in advance!
From aaron.patterson at gmail.com Sat Feb 7 22:33:35 2009
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Sat, 7 Feb 2009 19:33:35 -0800
Subject: [Nokogiri-talk] error handling in 1.2.0
Message-ID: <6959e1680902071933p3e6fdc5aq61d83d094ef20980@mail.gmail.com>
I think I've got the error handling in a place where I'm happy.
Please take a look at the errors branch, and tell me what you think:
http://github.com/tenderlove/nokogiri/tree/errors
The "error_handler" lambda is *gone*. It was not thread safe, and
IMHO not very useful. When doing DOM parses, even if you ran into
errors, there wasn't anything you could do about them. So being notified
of the parse errors *after* parsing seems acceptable to me. All
document objects will now have a list of errors encountered while
parsing the document.
For example:
doc = Nokogiri::XML('')
puts doc.errors.map { |error| error.to_s }.join("\n")
That being said, if you want *strict* parsing, you'll get an exception raised:
begin
  doc = Nokogiri::XML('', nil, nil, 0)
rescue Nokogiri::XML::SyntaxError => ex
  puts ex
end
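The two behaviours described above (collect errors on the document, or raise when strict) can be illustrated with a small pure-Ruby toy. Everything here is a hypothetical stand-in: `toy_parse`, `toy_check`, `ToyDocument`, and `ToySyntaxError` are not nokogiri APIs, and the tag-balance check is deliberately naive compared with what libxml2 does.

```ruby
# Hypothetical names throughout; this is not nokogiri's implementation.
class ToySyntaxError < StandardError; end

ToyDocument = Struct.new(:source, :errors)

# Naive tag-balance check that returns a list of problem descriptions.
def toy_check(source)
  problems = []
  stack = []
  source.scan(%r{<(/?)([A-Za-z][\w:.\-]*)[^<>]*?(/?)>}) do |closing, name, self_closing|
    if closing == "/"
      opened = stack.pop
      if opened.nil?
        problems << "unexpected </#{name}>"
      elsif opened != name
        problems << "expected </#{opened}>, got </#{name}>"
      end
    elsif self_closing != "/"
      stack.push(name)
    end
  end
  stack.reverse_each { |name| problems << "unclosed <#{name}>" }
  problems
end

# Lenient mode returns a document carrying its errors;
# strict mode raises on the first problem instead.
def toy_parse(source, strict: false)
  problems = toy_check(source)
  raise ToySyntaxError, problems.first if strict && !problems.empty?
  ToyDocument.new(source, problems)
end

doc = toy_parse("<feed><title>x</feed>")
puts doc.errors.join("\n")

begin
  toy_parse("<feed><title>x</feed>", strict: true)
rescue ToySyntaxError => ex
  puts "strict parse failed: #{ex.message}"
end
```

A caller that only needs to know whether the input was well-formed can test `doc.errors.empty?` in lenient mode, or rely on the exception in strict mode.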
Removing the error_handler lambda has also made the error callbacks on
SAX parsers work.
If everyone is cool with this, I'm going to merge it to master and it
will be in the next release. I will take silence as a sign of
approval. ;-)
The next thing I want to tackle is configuring the parser. I hate
that you have to look up constants and pass numbers as flags to the
parser. I would like to do something like this:
doc = Nokogiri::XML('') do |config|
  config.encoding = 'UTF-8'
  config.recover_errors
  config.no_warnings
end
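One way such a block could map onto the numeric flags is sketched below. `ToyParseConfig` and `toy_configure` are hypothetical names, and the method names simply follow the proposal above. The two constants mirror libxml2's `XML_PARSE_RECOVER` and `XML_PARSE_NOWARNING` bits; how a real implementation would pass them to the parser is an assumption left out here.

```ruby
# Hypothetical builder that turns readable method calls into a flag integer.
class ToyParseConfig
  RECOVER   = 1 << 0 # same bit as libxml2's XML_PARSE_RECOVER
  NOWARNING = 1 << 6 # same bit as libxml2's XML_PARSE_NOWARNING

  attr_accessor :encoding
  attr_reader :flags

  def initialize
    @flags = 0
    @encoding = nil
  end

  # Each option ORs its bit into the flags and returns self for chaining.
  def recover_errors
    @flags |= RECOVER
    self
  end

  def no_warnings
    @flags |= NOWARNING
    self
  end
end

# Yields a fresh config to the block and returns it, so the caller
# never has to look up constants or pass raw numbers.
def toy_configure
  config = ToyParseConfig.new
  yield config if block_given?
  config
end

config = toy_configure do |c|
  c.encoding = 'UTF-8'
  c.recover_errors
  c.no_warnings
end
puts config.flags # prints 65 (1 | 64)
```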
Comments?
--
Aaron Patterson
http://tenderlovemaking.com/
From aaron.patterson at gmail.com Sat Feb 7 23:18:32 2009
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Sat, 7 Feb 2009 20:18:32 -0800
Subject: [Nokogiri-talk] Docs for nokogiri?
In-Reply-To: <2ab0f52d0902070630o4f0833ffqf965ccfeffcb06f3@mail.gmail.com>
References: <2ab0f52d0902070630o4f0833ffqf965ccfeffcb06f3@mail.gmail.com>
Message-ID: <6959e1680902072018t6aa94ab6sa10da3edb638e0c5@mail.gmail.com>
On Sat, Feb 7, 2009 at 6:30 AM, Lian Liming wrote:
> Hi all,
>
> I am new to nokogiri, and would like to use nokogiri as xml parser. I
> am wondering where I can find documentation about nokogiri. So far, I
> have read the wiki pages on github, rdoc, and test cases in the source
> codes, but still not sure how to use this tool in the most proper
> ways. Maybe some tutorial or user guides are more easier for new users
> to start with.
What kind of information are you looking for? I would hope that the
wiki, rdoc, and test cases would get you going. What kind of
information are you missing? That might help me document it better.
:-)
--
Aaron Patterson
http://tenderlovemaking.com/
From jeff at somethingsimilar.com Sun Feb 8 00:13:56 2009
From: jeff at somethingsimilar.com (Jeff Hodges)
Date: Sat, 7 Feb 2009 21:13:56 -0800
Subject: [Nokogiri-talk] error handling in 1.2.0
In-Reply-To: <6959e1680902071933p3e6fdc5aq61d83d094ef20980@mail.gmail.com>
References: <6959e1680902071933p3e6fdc5aq61d83d094ef20980@mail.gmail.com>
Message-ID:
Looks good to me.
Nice, I thought you'd have to put a conditional in the structured
error function, but you managed to just move setting it (and then
unsetting it immediately) to where it was actually needed.
One question though, I'm seeing this in a few places:
if (doc == NULL) {
  xmlFreeDoc(doc);
  ....
}
I'm going to guess this is used solely to get an error from libxml2 if
DEBUG_TREE is flipped. Otherwise, it's a noop, right? At least, that's
what I gathered from the libxml code I have on hand. Just making sure
I'm not missing Something Clever.
--
Jeff
From jeff at somethingsimilar.com Sun Feb 8 01:23:05 2009
From: jeff at somethingsimilar.com (Jeff Hodges)
Date: Sat, 7 Feb 2009 22:23:05 -0800
Subject: [Nokogiri-talk] error handling in 1.2.0
In-Reply-To:
References: <6959e1680902071933p3e6fdc5aq61d83d094ef20980@mail.gmail.com>
Message-ID:
Oh, and I just noticed that it's still returning just a string to
SAX::Parser#error. Why wouldn't we want to return an actual error
object?
--
Jeff
From aaron.patterson at gmail.com Sun Feb 8 02:41:59 2009
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Sat, 7 Feb 2009 23:41:59 -0800
Subject: [Nokogiri-talk] error handling in 1.2.0
In-Reply-To:
References: <6959e1680902071933p3e6fdc5aq61d83d094ef20980@mail.gmail.com>
Message-ID: <6959e1680902072341v2ca5e777nd6bc5bdc495ac188@mail.gmail.com>
On Sat, Feb 7, 2009 at 10:23 PM, Jeff Hodges wrote:
> Oh, and I just noticed that it's still returning just a string to
> SAX::Parser#error. Why wouldn't we want to return an actual error
> object?
I suppose so.... I'll have to play with that. It might not be
possible. The callbacks for SAX parsing only give you a string for
the error callback, they don't actually give you the error object.
I can ask libxml to tell me about the last error it encountered, but
that might not be thread safe, and libxml may not have added the error
to its list until *after* the error callback finishes.
> On Sat, Feb 7, 2009 at 9:13 PM, Jeff Hodges wrote:
>> Looks good to me.
>>
>> Nice, I thought you'd have to put a conditional in the structured
>> error function, but you managed to just move setting it (and then
>> unsetting it immediately) to where it was actually needed.
>>
>> One question though, I'm seeing this in a few places:
>>
>> if (doc == NULL) {
>> xmlFreeDoc(doc)
>> ....
>> }
>>
>> I'm going to guess this is used solely to get an error from libxml2 if
>> DEBUG_TREE is flipped. Otherwise, it's a noop, right? At least, that's
>> what I gathered from the libxml code I have on hand. Just making sure
>> I'm not missing Something Clever.
Nope, nothing clever. You are correct.
--
Aaron Patterson
http://tenderlovemaking.com/
From jeff at somethingsimilar.com Sun Feb 8 06:53:42 2009
From: jeff at somethingsimilar.com (Jeff Hodges)
Date: Sun, 8 Feb 2009 03:53:42 -0800
Subject: [Nokogiri-talk] error handling in 1.2.0
In-Reply-To: <6959e1680902072341v2ca5e777nd6bc5bdc495ac188@mail.gmail.com>
References: <6959e1680902071933p3e6fdc5aq61d83d094ef20980@mail.gmail.com>
<6959e1680902072341v2ca5e777nd6bc5bdc495ac188@mail.gmail.com>
Message-ID:
I was thinking something along the lines of
static void error_func(void * ctx, const char *msg, ...)
{
  ...
  VALUE mah_error = Nokogiri_syntax_error_from_string(message);
  rb_funcall(doc, rb_intern("error"), 1, mah_error);
  free(message);
}
Would this be problematic, too?
--
Jeff
From jeff at somethingsimilar.com Sun Feb 8 06:55:05 2009
From: jeff at somethingsimilar.com (Jeff Hodges)
Date: Sun, 8 Feb 2009 03:55:05 -0800
Subject: [Nokogiri-talk] error handling in 1.2.0
In-Reply-To:
References: <6959e1680902071933p3e6fdc5aq61d83d094ef20980@mail.gmail.com>
<6959e1680902072341v2ca5e777nd6bc5bdc495ac188@mail.gmail.com>
Message-ID:
Obviously, we couldn't use the usual XML::SyntaxError since we lack
the rest of the struct we need. So maybe a XML::SAX::SyntaxError <
::Nokogiri::SyntaxError?
--
Jeff
From jeff at somethingsimilar.com Sun Feb 8 06:56:30 2009
From: jeff at somethingsimilar.com (Jeff Hodges)
Date: Sun, 8 Feb 2009 03:56:30 -0800
Subject: [Nokogiri-talk] error handling in 1.2.0
In-Reply-To:
References: <6959e1680902071933p3e6fdc5aq61d83d094ef20980@mail.gmail.com>
<6959e1680902072341v2ca5e777nd6bc5bdc495ac188@mail.gmail.com>
Message-ID:
Oh, fuck, right. We have that structure on Nokogiri::SyntaxError. I
should get some sleep. Hrm.
--
Jeff
From lianliming at gmail.com Sun Feb 8 09:52:21 2009
From: lianliming at gmail.com (Lian Liming)
Date: Sun, 8 Feb 2009 22:52:21 +0800
Subject: [Nokogiri-talk] Docs for nokogiri?
In-Reply-To: <6959e1680902072018t6aa94ab6sa10da3edb638e0c5@mail.gmail.com>
References: <2ab0f52d0902070630o4f0833ffqf965ccfeffcb06f3@mail.gmail.com>
<6959e1680902072018t6aa94ab6sa10da3edb638e0c5@mail.gmail.com>
Message-ID: <2ab0f52d0902080652rae2c1c2l341924fed2ff5d57@mail.gmail.com>
>
> What kind of information are you looking for? I would hope that the
> wiki, rdoc, and test cases would get you going. What kind of
> information are you missing? That might help me document it better.
> :-)
>
It would be better if nokogiri also had docs similar to
http://wiki.github.com/why/hpricot/an-hpricot-showcase, which would make it
easier for new users to learn basic and advanced usage of nokogiri.
From jeff at somethingsimilar.com Sun Feb 8 16:39:06 2009
From: jeff at somethingsimilar.com (Jeff Hodges)
Date: Sun, 8 Feb 2009 13:39:06 -0800
Subject: [Nokogiri-talk] error handling in 1.2.0
In-Reply-To:
References: <6959e1680902071933p3e6fdc5aq61d83d094ef20980@mail.gmail.com>
<6959e1680902072341v2ca5e777nd6bc5bdc495ac188@mail.gmail.com>
Message-ID:
Hah, I was wrong twice. Nokogiri::SyntaxError doesn't have the
structure, Nokogiri::XML::SyntaxError does. I whipped up a 10 minute
proof of concept that passes error objects to SAX::Parser#error with
the side effect of having a new Nokogiri::XML::SAX::SyntaxError. The
only problem is that SAX::SyntaxError does not have ancestry in common
with XML::SyntaxError. A few options present themselves.
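One of those options, giving both error classes a shared ancestor, might look like the sketch below. The `Toy` namespace is a hypothetical stand-in so the snippet does not touch nokogiri's real classes, and the `line` attribute is only an illustration of "the structure" a DOM error can carry and a SAX error cannot.

```ruby
# Hypothetical namespace mirroring the class layout under discussion.
module Toy
  # Common ancestor, so callers can rescue either kind of syntax error.
  class SyntaxError < StandardError; end

  module XML
    # DOM-side error: can carry structured details (illustrative field).
    class SyntaxError < Toy::SyntaxError
      attr_reader :line

      def initialize(message, line = nil)
        super(message)
        @line = line
      end
    end

    module SAX
      # SAX-side error: only the message string is available.
      class SyntaxError < Toy::SyntaxError; end
    end
  end
end

[Toy::XML::SyntaxError.new("bad tag", 3),
 Toy::XML::SAX::SyntaxError.new("bad tag")].each do |error|
  begin
    raise error
  rescue Toy::SyntaxError => ex
    puts "#{ex.class}: #{ex.message}"
  end
end
```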
--
Jeff
From jeff at somethingsimilar.com Sun Feb 8 16:41:15 2009
From: jeff at somethingsimilar.com (Jeff Hodges)
Date: Sun, 8 Feb 2009 13:41:15 -0800
Subject: [Nokogiri-talk] error handling in 1.2.0
In-Reply-To:
References: <6959e1680902071933p3e6fdc5aq61d83d094ef20980@mail.gmail.com>
<6959e1680902072341v2ca5e777nd6bc5bdc495ac188@mail.gmail.com>
Message-ID:
The link to the branch:
Spamming everyone always.
--
Jeff
From aaron.patterson at gmail.com Sun Feb 8 19:10:22 2009
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Sun, 8 Feb 2009 16:10:22 -0800
Subject: [Nokogiri-talk] error handling in 1.2.0
In-Reply-To:
References: <6959e1680902071933p3e6fdc5aq61d83d094ef20980@mail.gmail.com>
<6959e1680902072341v2ca5e777nd6bc5bdc495ac188@mail.gmail.com>
Message-ID: <6959e1680902081610x430e68a5mf524878311b30a08@mail.gmail.com>
On Sun, Feb 8, 2009 at 1:39 PM, Jeff Hodges wrote:
> Hah, I was wrong twice. Nokogiri::SyntaxError doesn't have the
> structure, Nokogiri::XML::SyntaxError does. I whipped up a 10 minute
> proof of concept that passes error objects to SAX::Parser#error with
> the side effect of having a new Nokogiri::XML::SAX::SyntaxError. The
> only problem is that SAX::SyntaxError does not have ancestry in common
> with XML::SyntaxError. A few options present themselves.
Hmmm... I'm not sure what this buys us. Since we can't get an
xmlErrorPtr in the SAX parser, we're essentially just using an
exception object to pass a string. Why not just pass the string? The
person implementing the SAX document knows that is an error case and
they can act accordingly.
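A rough sketch of the string-passing approach described above, written in plain Ruby so it runs without the gem. BozoHandler and the sample message are made up for illustration; with Nokogiri the class would presumably subclass Nokogiri::XML::SAX::Document and the parser itself would call error.

```ruby
# Stand-in for a SAX document handler (hypothetical name, not part of Nokogiri).
class BozoHandler
  attr_reader :bozo, :messages

  def initialize
    @bozo = false
    @messages = []
  end

  # The parser only hands over a message string; receiving any call at all
  # is what marks the input as ill-formed.
  def error(message)
    @bozo = true
    @messages << message
  end
end

handler = BozoHandler.new
handler.error('Opening and ending tag mismatch') # simulated parser callback
puts "bozo=#{handler.bozo} messages=#{handler.messages.inspect}"
```

The handler needs no exception class at all here, which is the point being made: the callback itself signals the error case.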
--
Aaron Patterson
http://tenderlovemaking.com/
From aaron.patterson at gmail.com Tue Feb 10 12:03:47 2009
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Tue, 10 Feb 2009 09:03:47 -0800
Subject: [Nokogiri-talk] just one more feature....
Message-ID: <6959e1680902100903k495ba810n2326a23b9eb05e8e@mail.gmail.com>
I need to get some opinions on this one.....
I've added namespace support to CSS selectors. Now you can do
something like this:
doc.css('xmlns|link')
and that gets converted to this xpath:
//xmlns:link
I'm thinking about automatically registering the default namespace on
the root node when doing CSS selector searches. If you check out
section 4 of the CSS3 namespace spec, hopefully that will make sense:
http://www.w3.org/TR/css3-namespace/
This way, given the following document:
These CSS to XPath conversions will be made:
doc.css('foo') => //xmlns:foo
doc.css('|foo') => //foo
doc.css('xmlns|foo') => //xmlns:foo
I think it would make CSS queries in XML documents less surprising and
more useful.
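The three conversions listed above can be sketched as a small plain-Ruby function. This is only an illustration of the mapping, not Nokogiri's CSS parser; type_selector_to_xpath is a hypothetical name, and the '*|foo' form is deliberately rejected because it is still an open point.

```ruby
# Hypothetical helper: maps one type selector with an optional namespace
# prefix to the XPath shown in the list above.
def type_selector_to_xpath(selector, default_prefix = 'xmlns')
  prefix, separator, name = selector.rpartition('|')
  raise ArgumentError, "'*|name' is not handled in this sketch" if prefix == '*'

  if separator.empty?
    "//#{default_prefix}:#{name}" # 'foo'    -> default namespace applied
  elsif prefix.empty?
    "//#{name}"                   # '|foo'   -> explicitly no namespace
  else
    "//#{prefix}:#{name}"         # 'ns|foo' -> the named prefix
  end
end

%w[foo |foo xmlns|foo].each do |selector|
  puts "#{selector} => #{type_selector_to_xpath(selector)}"
end
```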
Just a couple open points:
1. I'm not quite sure how to support the '*|foo' syntax.
2. Should I automatically register *all* namespaces on the root, or
just the default one?
If I get a couple +1's on this, I'll add it for 1.2.0 (the next release).
--
Aaron Patterson
http://tenderlovemaking.com/
From julien.genestoux at gmail.com Thu Feb 19 02:22:13 2009
From: julien.genestoux at gmail.com (Julien Genestoux)
Date: Wed, 18 Feb 2009 23:22:13 -0800
Subject: [Nokogiri-talk] Nokogiri and namespaces
Message-ID: <26c0cf900902182322x5aefca21hd2721599bbf9a6cf@mail.gmail.com>
Hello!
I am working on Babylon (http://github.com/julien51/babylon/tree/master) and
we're making extensive use of Nokogiri's XML SAX Push parser! Thank you for
this!
To "dispatch" the XML stanzas to the right component, we're using XPath
matching. We used REXML for this and decided to switch over entirely to Nokogiri
(mainly to decrease the number of dependencies).
Unfortunately it seems that it doesn't work, since namespaces do not appear
to be set up by the following code:
class XmppParser < Nokogiri::XML::SAX::Document
  def initialize(&callback)
    @callback = callback
    super()
    @parser = Nokogiri::XML::SAX::Parser.new(self)
    @doc = nil
    @elem = nil
  end

  def parse(data)
    @parser.parse data
  end

  def start_document
    @doc = Nokogiri::XML::Document.new
  end

  def characters(string)
    # add_child rather than REXML's add, which Nokogiri nodes do not have
    @elem.add_child(Nokogiri::XML::Text.new(string, @doc)) if @elem
  end
  # alias takes the new name first: CDATA blocks are handled like text
  alias :cdata_block :characters

  def start_element(qname, attributes = [])
    e = Nokogiri::XML::Element.new(qname, @doc)
    # Attributes is an array like [name, value, name, value]...
    (attributes.size / 2).times do |i|
      name, value = attributes[2 * i], attributes[2 * i + 1]
      e.set_attribute name, value
    end

    @elem = @elem ? @elem.add_child(e) : (@root = e)
    if @elem.parent.nil?
      @callback.call(@elem)
    end
  end

  def end_element(name)
    if @elem
      puts @elem.inspect
      puts @elem.namespaces.inspect

      @callback.call(@elem) if @elem.parent == @root
      @elem = @elem.parent
      # now remove from parent again to avoid space leak:
      # TODO
    end
  end
end
When the parser receives :
<< adzadz
It outputs :
{}
{}
{}
{}
Which seems to mean that namespaces are not added (specially for , for
example). I have looked into the documentation to find out how to explicitly
specify namespaces, but haven't found anything... Can you guys help?
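One way to see which attributes carry namespace declarations, sketched in plain Ruby. It assumes that xmlns declarations arrive in the same flat [name, value, ...] array that start_element receives, which is my reading of the code above and not something confirmed here; the sample values are illustrative, and putting them on a single element is also an assumption.

```ruby
# Illustrative flat attribute list in the [name, value, name, value] layout.
attributes = [
  'to',    'pubsubapi-dev.xmpp.notifixio.us',
  'type',  'chat',
  'xmlns', 'http://jabber.org/protocol/xhtml-im'
]

# Pair names with values instead of indexing by hand.
pairs = attributes.each_slice(2).to_a

# Split namespace declarations (xmlns or xmlns:prefix) from ordinary attributes.
ns_decls, plain = pairs.partition do |name, _value|
  name == 'xmlns' || name.start_with?('xmlns:')
end

puts "namespace declarations: #{ns_decls.to_h.inspect}"
puts "ordinary attributes:    #{plain.to_h.inspect}"
```

If that assumption holds, copying every pair with set_attribute would treat the declarations as ordinary attributes, which would be consistent with the empty namespaces hash printed above.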
Thanks a lot,
Julien
--
Julien Genestoux
http://www.ouvre-boite.com
http://blog.notifixio.us
+1 (415) 254 7340
+33 (0)9 70 44 76 29
From aaron.patterson at gmail.com Thu Feb 19 11:40:33 2009
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Thu, 19 Feb 2009 08:40:33 -0800
Subject: [Nokogiri-talk] Nokogiri and namespaces
In-Reply-To: <26c0cf900902182322x5aefca21hd2721599bbf9a6cf@mail.gmail.com>
References: <26c0cf900902182322x5aefca21hd2721599bbf9a6cf@mail.gmail.com>
Message-ID: <6959e1680902190840o541f6493oa8050ecf73ee0e2c@mail.gmail.com>
On Wed, Feb 18, 2009 at 11:22 PM, Julien Genestoux
wrote:
> Hello!
>
> I am working on Babylon (http://github.com/julien51/babylon/tree/master) and
> we're making extensive use of Nokogiri's XML SAX Push parser! Thank you for
> this!
>
> To "dispatch" the XML stanzas to the right component, we're using XPath
> matching. We used REXML for this and decided to switch over entirely to Nokogiri
> (mainly to decrease the number of dependencies).
> Unfortunately it seems that it doesn't work, since namespaces do not appear
> to be set up by the following code:
>
> class XmppParser < Nokogiri::XML::SAX::Document
> def initialize(&callback)
> @callback = callback
> super()
> @parser = Nokogiri::XML::SAX::Parser.
> new(self)
> @doc = nil
> @elem = nil
> end
>
> def parse(data)
> @parser.parse data
> end
>
> def start_document
> @doc = Nokogiri::XML::Document.new
> end
>
> def characters(string)
> @elem.add(Nokogiri::XML::Text.new(string, @doc)) if @elem
> end
> alias :characters :cdata_block
>
> def start_element(qname, attributes = [])
> e = Nokogiri::XML::Element.new(qname, @doc)
> # Attributes is an array like [name, value, name, value]...
> (attributes.size / 2).times do |i|
> name, value = attributes[2 * i], attributes[2 * i + 1]
> e.set_attribute name, value
> end
>
> @elem = @elem ? @elem.add_child(e) : (@root = e)
> if @elem.parent.nil?
> @callback.call(@elem)
> end
> end
>
> def end_element(name)
> if @elem
>
> puts @elem.inspect
> puts @elem.namespaces.inspect
>
> @callback.call(@elem) if @elem.parent == @root
> @elem = @elem.parent
> # now remove from parent again to avoid space leak:
> # TODO
> end
> end
> end
>
>
> When the parser receives :
> << to='pubsubapi-dev.xmpp.notifixio.us' type='chat'
> id='purple6aff0038'>adz xmlns='http://jabber.org/protocol/xhtml-im'> xmlns='http://www.w3.org/1999/xhtml'>adz
>
> It outputs :
>
> {}
>
> {}
>
>
>
> {}
> to="pubsubapi-dev.xmpp.notifixio.us" type="chat" id="purple6aff003d">
>
>
>
>
>
> {}
>
>
> Which seems to mean that namespaces are not added (specially for , for
> example). I have looked into the documentation to find out how to explicitly
> specify namespaces, but haven't found anything... Can you guys help?
You can't with the currently released version, but it seems easy
enough to add, and something we need.
I will add it for 1.2.0. I *hope* to have that released this weekend.
I'll post a follow up once I get it implemented.
--
Aaron Patterson
http://tenderlovemaking.com/
From julien.genestoux at gmail.com Thu Feb 19 12:20:18 2009
From: julien.genestoux at gmail.com (Julien Genestoux)
Date: Thu, 19 Feb 2009 09:20:18 -0800
Subject: [Nokogiri-talk] Nokogiri and namespaces
In-Reply-To: <6959e1680902190840o541f6493oa8050ecf73ee0e2c@mail.gmail.com>
References: <26c0cf900902182322x5aefca21hd2721599bbf9a6cf@mail.gmail.com>
<6959e1680902190840o541f6493oa8050ecf73ee0e2c@mail.gmail.com>
Message-ID: <26c0cf900902190920s4bf74280hf8ab6ed2e621ddb9@mail.gmail.com>
Aaron,
Thanks for this! Looking forward to seeing it over the weekend ;)
Please let me know if I can be of any help!
Thanks again!
Julien
--
Julien Genestoux
http://www.ouvre-boite.com
http://blog.notifixio.us
+1 (415) 254 7340
+33 (0)9 70 44 76 29
From julien.genestoux at gmail.com Mon Feb 23 14:41:34 2009
From: julien.genestoux at gmail.com (Julien Genestoux)
Date: Mon, 23 Feb 2009 11:41:34 -0800
Subject: [Nokogiri-talk] Nokogiri and namespaces
In-Reply-To: <26c0cf900902190920s4bf74280hf8ab6ed2e621ddb9@mail.gmail.com>
References: <26c0cf900902182322x5aefca21hd2721599bbf9a6cf@mail.gmail.com>
<6959e1680902190840o541f6493oa8050ecf73ee0e2c@mail.gmail.com>
<26c0cf900902190920s4bf74280hf8ab6ed2e621ddb9@mail.gmail.com>
Message-ID: <26c0cf900902231141s7105bf0dia69c1528afe2eb80@mail.gmail.com>
Thanks a lot for the namespace support!
http://github.com/tenderlove/nokogiri/commit/e969fbe3d2cd273c7988968e0b2555b5e18f8f16
;)
--
Julien Genestoux
http://www.ouvre-boite.com
http://blog.notifixio.us
+1 (415) 254 7340
+33 (0)9 70 44 76 29
From aaron at tenderlovemaking.com Mon Feb 23 22:37:27 2009
From: aaron at tenderlovemaking.com (Aaron Patterson)
Date: Mon, 23 Feb 2009 19:37:27 -0800
Subject: [Nokogiri-talk] [ANN] nokogiri 1.2.1 Released
Message-ID: <20090224033727.GA16639@Jordan.local>
nokogiri version 1.2.1 has been released!
Nokogiri (鋸) is an HTML, XML, SAX, and Reader parser.
Changes:
### 1.2.1 / 2009-02-23
* Bugfixes
* Fixed a CSS selector space bug
* Fixed Ruby 1.9 String Encoding (Thanks ?????)
## FEATURES:
* XPath support for document searching
* CSS3 selector support for document searching
* XML/HTML builder
* Drop in replacement for Hpricot (though not bug for bug)
Nokogiri parses and searches XML/HTML very quickly, and also has
correctly implemented CSS3 selector support as well as XPath support.
Here is a speed test:
* http://gist.github.com/24605
Nokogiri also features an Hpricot compatibility layer to help ease the change
to using correct CSS and XPath.
## SUPPORT:
The Nokogiri mailing list is available here:
* http://rubyforge.org/mailman/listinfo/nokogiri-talk
The bug tracker is available here:
* http://nokogiri.lighthouseapp.com/projects/19607-nokogiri/overview
## SYNOPSIS:
require 'nokogiri'
require 'open-uri'

doc = Nokogiri::HTML(open('http://www.google.com/search?q=tenderlove'))

####
# Search for nodes by css
doc.css('h3.r a.l').each do |link|
  puts link.content
end

####
# Search for nodes by xpath
doc.xpath('//h3/a[@class="l"]').each do |link|
  puts link.content
end

####
# Or mix and match.
doc.search('h3.r a.l', '//h3/a[@class="l"]').each do |link|
  puts link.content
end
## REQUIREMENTS:
* ruby 1.8 or 1.9
* libxml
* libxslt
## INSTALL:
* sudo gem install nokogiri
--
Aaron Patterson
http://tenderlovemaking.com/
From adam.vandenhoven at gmail.com Tue Feb 24 19:55:35 2009
From: adam.vandenhoven at gmail.com (Adam van den Hoven)
Date: Tue, 24 Feb 2009 16:55:35 -0800
Subject: [Nokogiri-talk] XPath expressions other than location paths
Message-ID: <1235523335.14023.40.camel@vandenhoven>
Last month, Aaron Patterson, Andrew Watts-Curnow and a few others
discussed this briefly. It can be summarized in this exchange:
>> Does libxml support expressions other than location paths?
>> Would this make sense as an enhancement to nokogiri?
>I think it is possible, but I have a hard time justifying to myself
>why you would need it.
Today I want to offer a reason why non-location paths are important,
even critical, to any XPath implementation. It's in the filters, or
"where" clauses, if you prefer.
Here's an example.
So I might want the following:
//store[count(book) gt 7]/book
//store[name[contains(text(), 'Jim')]]/book
That is, give me all the books that are immediate children of stores
that have more than 7 books, and give me all the books that are immediate
children of stores whose name child contains "Jim".
They're a little contrived, but you get the point.
Anticipating your next objection, we should NOT rely on ruby code to
handle this. That is you might do something like:
books = []
doc.xpath('//store').each do |store|
  store_books = store.xpath('./book')
  # collect the individual book nodes, not the NodeSet itself
  books.concat(store_books.to_a) if store_books.length > 7
end
or some variation (I've never tried it so it might be syntactically
wrong but you get the idea).
There are several differences between the two.
First is complexity. The Ruby code is much longer. It's also a lot more
challenging to understand what's going on. As the XPath gets more
complex, the Ruby will grow less understandable. Further, converting
from one to the other is hard to do in the non-trivial case unless you
have a strong understanding of set theory. For example
/foo[bar != 'kronk']
is very different from
/foo[not(bar = 'kronk')]
In the first case, you get all the foos where at least one bar is not
kronk, and in the second case, you get all the foos where no bar is kronk.
In the non-trivial case, this logic can be very easy to express in an XPath
but very hard to get right in code.
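The difference between the two predicates can be spelled out in plain Ruby. This is only a sketch of the XPath 1.0 node-set comparison rule (a comparison is true if it holds for at least one node in the set); the sample data is made up, and each array stands for the text of the bar children of one foo.

```ruby
# Made-up data: the text values of the <bar> children of four <foo> elements.
foos = {
  'all_kronk' => %w[kronk kronk],
  'mixed'     => %w[kronk zap],
  'no_kronk'  => %w[zap],
  'no_bars'   => []
}

# /foo[bar != 'kronk']: at least one bar differs from 'kronk'.
some_bar_differs = foos.select { |_, bars| bars.any? { |b| b != 'kronk' } }.keys

# /foo[not(bar = 'kronk')]: no bar equals 'kronk' (also true with no bars at all).
no_bar_equals = foos.select { |_, bars| bars.none? { |b| b == 'kronk' } }.keys

puts "bar != 'kronk'      matches #{some_bar_differs.inspect}"
puts "not(bar = 'kronk')  matches #{no_bar_equals.inspect}"
```

The 'mixed' and 'no_bars' rows are where the two predicates disagree, which is exactly the part that is easy to get wrong when translating by hand.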
Second is usage. In order to be considered for a position with a local
software company, I've been asked to write a little cgi script that will
take author and/or title or ISBN, scrape some number of sites for the
pricing information and present a rather banal table comparing the
results.
In general, it's better to write configuration files than it is code. In
non-trivial development environments, deploying code is a bigger deal
than deploying content, so keeping something that is likely to change
frequently (compared to your build cycle) as content is a superior
approach to putting it in compiled code. If this were a real
application, I would expect the sites I'm scraping to change their
structure with some frequency; a fix would probably not fit within my build
cycle and the system would be broken for some weeks. But if it's just a
configuration file, those (at least in the environments where I've
worked before) are not part of the build process but of the content
publishing process. That happens much more frequently (each time someone
writes new content).
With some careful work, I can write robust XPaths for probably any site
I will need to work with; many times those paths will be UGLY, especially
in the cases where there is little semantic information to work from and
the structure changes, so that an element you care about is only
identifiable by the text content of some related node. If all the
non-location paths are supported, I can use them for the filters of my
XPaths, and that means that I can actually encode everything as an XPath
which can be saved as a string in my YAML file.
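A minimal sketch of what keeping the XPath in a YAML file might look like, using only Ruby's standard yaml library. The site key example_bookstore and the field name books are invented for illustration, and the YAML is inlined as a string here so the sketch is self-contained.

```ruby
require 'yaml'

# In practice this text would live in its own file; the keys are hypothetical.
config_text = <<~YAML
  example_bookstore:
    books: "//store[name[contains(text(), 'Jim')]]/book"
YAML

config = YAML.safe_load(config_text)
books_xpath = config.fetch('example_bookstore').fetch('books')

# The string can then be handed to whatever XPath engine is in use.
puts books_xpath
```

Quoting the value in the YAML keeps the brackets and single quotes of the expression from being interpreted as YAML syntax.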
Adam
From aaron.patterson at gmail.com Tue Feb 24 20:16:21 2009
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Tue, 24 Feb 2009 17:16:21 -0800
Subject: [Nokogiri-talk] XPath expressions other than location paths
In-Reply-To: <1235523335.14023.40.camel@vandenhoven>
References: <1235523335.14023.40.camel@vandenhoven>
Message-ID: <6959e1680902241716i161911epbebff81c988ec77a@mail.gmail.com>
On Tue, Feb 24, 2009 at 4:55 PM, Adam van den Hoven
wrote:
> Last month, Aaron Patterson, Andrew Watts-Curnow and a few others
> discussed this briefly. It can be summarized in this exchange:
>
>>> Does libxml support expressions other than location paths?
>>> Would this make sense as an enhancement to nokogiri?
>>I think it is possible, but I have a hard time justifying to myself
>>why you would need it.
Not exactly. XPath functions are fine. I object to the xpath()
method returning anything other than an XML::NodeSet.
> Today I want to offer a reason why non-location paths are important,
> even critical, to any XPath implementation. It's in the filters, or
> "where" clauses, if you prefer.
>
> Here's an example.
>
> So I might want the following:
>
> //store[count(book) gt 7]/book
> //store[name[contains(text(), 'Jim')]]/book
These examples work with XPath and Nokogiri out of the box. An example:
http://gist.github.com/69929
--
Aaron Patterson
http://tenderlovemaking.com/
From adam.vandenhoven at gmail.com Tue Feb 24 22:36:17 2009
From: adam.vandenhoven at gmail.com (Adam van den Hoven)
Date: Tue, 24 Feb 2009 19:36:17 -0800
Subject: [Nokogiri-talk] XPath expressions other than location paths
In-Reply-To: <6959e1680902241716i161911epbebff81c988ec77a@mail.gmail.com>
References: <1235523335.14023.40.camel@vandenhoven>
<6959e1680902241716i161911epbebff81c988ec77a@mail.gmail.com>
Message-ID: <1235532977.6580.7.camel@vandenhoven>
On Tue, 2009-02-24 at 17:16 -0800, Aaron Patterson wrote:
> On Tue, Feb 24, 2009 at 4:55 PM, Adam van den Hoven
> wrote:
> > Last month, Aaron Patterson, Andrew Watts-Curnow and a few others
> > discussed this briefly. It can be summarized in this exchange:
> >
> >>> Does libxml support expressions other than location paths?
> >>> Would this make sense as an enhancement to nokogiri?
> >>I think it is possible, but I have a hard time justifying to myself
> >>why you would need it.
>
> Not exactly. XPath functions are fine. I object to the xpath()
> method returning anything other than an XML::NodeSet.
Oh. I can see that. The only question, then, is: can you claim to support
XPath without doing it? And would this lack of full implementation be a
hindrance to acceptance, for example among those who already know XPath?
OK, the only TWO questions, then, are ....
> > Today I want to offer a reason why non-location paths are important,
> > even critical, to any XPath implementation. It's in the filters, or
> > "where" clauses, if you prefer.
> >
> > Here's an example.
> >
> > So I might want the following:
> >
> > //store[count(book) gt 7]/book
> > //store[name[contains(text(), 'Jim')]]/book
>
> These examples work with XPath and Nokogiri out of the box. An example:
>
> http://gist.github.com/69929
Hmm. OK. I'd never worked with Nokogiri before (I'd previously worked
with Scrubyt but the latest version wasn't working for me at all). I'd
tried an XPath that was working there and I fixed it to do something
similar to what I suggested and it didn't work the way I expected. But
now it seems to be working, so I guess my tests used the wrong path.
Sorry for the confusion and thanks for the clarification.
Adam
From aaron.patterson at gmail.com Wed Feb 25 00:48:31 2009
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Tue, 24 Feb 2009 21:48:31 -0800
Subject: [Nokogiri-talk] XPath expressions other than location paths
In-Reply-To: <1235532977.6580.7.camel@vandenhoven>
References: <1235523335.14023.40.camel@vandenhoven>
<6959e1680902241716i161911epbebff81c988ec77a@mail.gmail.com>
<1235532977.6580.7.camel@vandenhoven>
Message-ID: <6959e1680902242148u238325ffle4416defe709f7ab@mail.gmail.com>
On Tue, Feb 24, 2009 at 7:36 PM, Adam van den Hoven
wrote:
> On Tue, 2009-02-24 at 17:16 -0800, Aaron Patterson wrote:
>> On Tue, Feb 24, 2009 at 4:55 PM, Adam van den Hoven
>> wrote:
>> > Last month, Aaron Patterson, Andrew Watts-Curnow and a few others
>> > discussed this briefly. It can be summarized in this exchange:
>> >
>> >>> Does libxml support expressions other than location paths?
>> >>> Would this make sense as an enhancement to nokogiri?
>> >>I think it is possible, but I have a hard time justifying to myself
>> >>why you would need it.
>>
>> Not exactly. XPath functions are fine. I object to the xpath()
>> method returning anything other than an XML::NodeSet.
>
> Oh. I can see that. The only question, then, is: can you claim to support
> XPath without doing it? And would this lack of full implementation be a
> hindrance to acceptance, for example among those who already know XPath?
> OK the only TWO questions, then, are ....
If someone is unhappy with my code, I will issue a full refund.
>> > Today I want to offer a reason why non-location paths are important,
>> > even critical, to any XPath implementation. It's in the filters, or
>> > "where" clauses, if you prefer.
>> >
>> > Here's an example.
>> >
>> > So I might want the following:
>> >
>> > //store[count(book) gt 7]/book
>> > //store[name[contains(text(), 'Jim')]]/book
>>
>> These examples work with XPath and Nokogiri out of the box. An example:
>>
>> http://gist.github.com/69929
>
> Hmm. OK. I'd never worked with Nokogiri before (I'd previously worked
> with Scrubyt but the latest version wasn't working for me at all). I'd
> tried an XPath that was working there and I fixed it to do something
> similar to what I suggested and it didn't work the way I expected. But
> now it seems to be working so I guess my tests used the wrong path.
> Sorry for the confusion and thanks for the clarification.
No problem. Glad I could help.
--
Aaron Patterson
http://tenderlovemaking.com/
From adam.vandenhoven at gmail.com Wed Feb 25 18:54:41 2009
From: adam.vandenhoven at gmail.com (Adam van den Hoven)
Date: Wed, 25 Feb 2009 15:54:41 -0800
Subject: [Nokogiri-talk] HTML builder and Paragraph tags.
Message-ID: <1235606081.6571.20.camel@vandenhoven>
hey guys,
The documentation is a little thin on one point that I need help with.
I want to write some paragraph tags. Following the builder's syntax that
would look something like:
builder = Nokogiri::HTML::Builder.new do
  div.test do
    p "this is a paragraph"
  end
end
The problem is, however, that p is also a method of the Kernel so it
doesn't trigger method_missing.
What's the "right" way to put in paragraphs?
Adam
From greg at intelligentassistance.com Wed Feb 25 19:38:59 2009
From: greg at intelligentassistance.com (Gregory Clarke)
Date: Wed, 25 Feb 2009 16:38:59 -0800
Subject: [Nokogiri-talk] HTML builder and Paragraph tags.
In-Reply-To: <1235606081.6571.20.camel@vandenhoven>
References: <1235606081.6571.20.camel@vandenhoven>
Message-ID: <580FC677-B443-4544-B784-DE176F055E87@intelligentassistance.com>
With the Builder gem you do things like this:
x = Builder::XmlMarkup.new(:target => @xml, :indent => 2)
x.instruct!
x.div "test" do
  x.p "this is a paragraph"
end
Maybe nokogiri is similar?
> hey guys,
>
> The documentation is a little thin on one point that I need help with.
>
> I want to write some paragraph tags. Following the builder's syntax
> that
> would look something like:
>
> builder = Nokogiri::HTML::Builder.new do
> div.test do
> p "this is a paragraph"
> end
> end
>
> The problem is, however, that p is also a method of the Kernel so it
> doesn't trigger method_missing.
>
> What's the "right" way to put in paragraphs?
>
> Adam
>
> _______________________________________________
> Nokogiri-talk mailing list
> Nokogiri-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/nokogiri-talk
From mike at csa.net Thu Feb 26 08:44:59 2009
From: mike at csa.net (Mike Dalessio)
Date: Thu, 26 Feb 2009 08:44:59 -0500
Subject: [Nokogiri-talk] HTML builder and Paragraph tags.
In-Reply-To: <580FC677-B443-4544-B784-DE176F055E87@intelligentassistance.com>
References: <1235606081.6571.20.camel@vandenhoven>
<580FC677-B443-4544-B784-DE176F055E87@intelligentassistance.com>
Message-ID: <618c07250902260544g307d500s790b29a2a65320ef@mail.gmail.com>
That's close - you can access the builder through a block argument:
builder = Nokogiri::HTML::Builder.new do
  div.test do |builder|
    builder.p "this is a paragraph"
  end
end
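Why an explicit receiver helps can be sketched in plain Ruby. TinyBuilder is a made-up stand-in for a method_missing-based builder, not Nokogiri's class: Kernel#p is a private method, so a bare p inside the block resolves to it, while builder.p is not allowed to reach a private method and falls through to method_missing.

```ruby
# Made-up stand-in for a builder that records unknown method calls as tags.
class TinyBuilder
  attr_reader :tags

  def initialize
    @tags = []
  end

  def method_missing(name, *args)
    @tags << [name, *args]
    self
  end
end

b = TinyBuilder.new
b.instance_eval do
  div 'implicit receiver'   # undefined anywhere, so method_missing records it
  p 'swallowed by Kernel#p' # resolves to the private Kernel#p and is only printed
end
b.p 'explicit receiver'     # private methods are skipped, so method_missing runs

puts b.tags.inspect
```

Only the div call and the explicit b.p call end up recorded as tags; the bare p call is just printed.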
--
mike dalessio
mike at csa.net
From adam.vandenhoven at gmail.com Fri Feb 27 02:00:41 2009
From: adam.vandenhoven at gmail.com (Adam van den Hoven)
Date: Thu, 26 Feb 2009 23:00:41 -0800
Subject: [Nokogiri-talk] Nokogiri markup to CGI
Message-ID: <1235718041.23995.23.camel@vandenhoven>
I have what is probably an obvious question.
I'm using nokogiri in a simple CGI script and I need to send contents of
the builder object to the output. There is probably an easy way and a
right way, but I'm not sure what that would be.
Any thoughts?
From andrew at nextmobileweb.com Sat Feb 28 17:07:31 2009
From: andrew at nextmobileweb.com (Andrew Farmer)
Date: Sat, 28 Feb 2009 14:07:31 -0800
Subject: [Nokogiri-talk] would you use this feature? inner_html=
Message-ID:
I made a ticket for this feature request: I would like Nodes to have an
inner_html= function.
http://nokogiri.lighthouseapp.com/projects/19607/tickets/46-feature-request-inner_html-method-on-node
Aaron would like to know if anyone aside from me would use this feature, so
would you? And for my own edification, what is the existing way to set the
inner html of an element?
Thanks,
Andrew
From andrew at nextmobileweb.com Sat Feb 28 17:28:53 2009
From: andrew at nextmobileweb.com (Andrew Farmer)
Date: Sat, 28 Feb 2009 14:28:53 -0800
Subject: [Nokogiri-talk] would you use this feature? Node.swap(html)
Message-ID:
Another feature that I would like: Node.swap( html ). It would be a method
on a Node that you can use to replace that Node with arbitrary HTML. This is
something that I'm used to using heavily in Hpricot.
http://nokogiri.lighthouseapp.com/projects/19607/tickets/50-swap-method-hpricot-compatibility#ticket-50-1
So this is another open question to everyone: would you use this
feature? Do you currently do something similar but implement it in a
different way?
Thanks,
Andrew
From holtonma at gmail.com Sat Feb 28 17:44:23 2009
From: holtonma at gmail.com (Holtonma)
Date: Sat, 28 Feb 2009 14:44:23 -0800
Subject: [Nokogiri-talk] would you use this feature? Node.swap(html)
In-Reply-To:
References:
Message-ID:
On Feb 28, 2009, at 2:28 PM, Andrew Farmer wrote:
> Another feature that I would like: Node.swap( html ). It would be a
> method on a Node that you can use to replace that Node with
> arbitrary HTML. This is something that I'm used to using heavily in
> Hpricot.
>
> http://nokogiri.lighthouseapp.com/projects/19607/tickets/50-swap-method-hpricot-compatibility#ticket-50-1
>
> So this is another open question to everyone: would you use this
> feature? Do you currently do something similar but implement it
> in a different way?
>
>
> Thanks,
> Andrew
Curious - how does that differ from .innerHTML? (which I believe
Nokogiri supports)
From aaron.patterson at gmail.com Sat Feb 28 19:33:24 2009
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Sat, 28 Feb 2009 16:33:24 -0800
Subject: [Nokogiri-talk] would you use this feature? Node.swap(html)
In-Reply-To:
References:
Message-ID: <6959e1680902281633x7de0b161jb8a12b8e77cbb8d6@mail.gmail.com>
On Sat, Feb 28, 2009 at 2:44 PM, Holtonma wrote:
>
> On Feb 28, 2009, at 2:28 PM, Andrew Farmer wrote:
>
> Another feature that I would like: Node.swap( html ). It would be a method
> on a Node that you can use to replace that Node with arbitrary HTML. This is
> something that I'm used to using heavily in Hpricot.
>
> http://nokogiri.lighthouseapp.com/projects/19607/tickets/50-swap-method-hpricot-compatibility#ticket-50-1
>
> So this is another open question to everyone: would you use this
> feature? Do you currently do something similar but implement it in a
> different way?
>
>
> Thanks,
> Andrew
>
> Curious - how does that differ from .innerHTML? (which I believe Nokogiri
> supports)
This is to actually swap out the html with something else. It's
basically doing this:

require 'nokogiri'

doc = Nokogiri::HTML(<<-eohtml)
<div>hello</div>
eohtml

div = doc.at('div')
Nokogiri::HTML.fragment('world').children.each do |node|
  div.parent << node
end
div.remove

puts doc.to_html

I want to get a feel for how many people would actually use this.
Hpricot has it, but there are no tests for it, and I don't want to be
compatible with something that has no tests. If I add this, I would
add it because people find it useful (as opposed to being compatible).
I wouldn't actually use the proposed methods, which is why I want some
public opinion. :-)
--
Aaron Patterson
http://tenderlovemaking.com/
From andrew at nextmobileweb.com Sat Feb 28 22:00:15 2009
From: andrew at nextmobileweb.com (Andrew Farmer)
Date: Sat, 28 Feb 2009 19:00:15 -0800
Subject: [Nokogiri-talk] would you use this feature? Node.swap(html)
In-Reply-To: <6959e1680902281633x7de0b161jb8a12b8e77cbb8d6@mail.gmail.com>
References:
<6959e1680902281633x7de0b161jb8a12b8e77cbb8d6@mail.gmail.com>
Message-ID:
The proposed swap method is a little different from setting the inner HTML,
and a little different from what you've written, Aaron. Let's take
a different document as an example.

doc = Nokogiri::HTML(<<-eohtml)
<div>hello</div>
eohtml

I would like to be able to replace that div with my span like this:

doc.at("div").swap("<span>world</span>")

And get this result:

<span>world</span>

Aaron, your code would produce this:

<span>world</span>

The span is in the wrong place!
Hopefully now it is somewhat clear what I would like this method to do. I
use this function a lot in hpricot for re-working web pages so I think it is
a very useful function. Am I the only one who thinks so?
On Sat, Feb 28, 2009 at 4:33 PM, Aaron Patterson wrote:
> On Sat, Feb 28, 2009 at 2:44 PM, Holtonma wrote:
> >
> > On Feb 28, 2009, at 2:28 PM, Andrew Farmer wrote:
> >
> > Another feature that I would like: Node.swap( html ). It would be a method
> > on a Node that you can use to replace that Node with arbitrary HTML. This is
> > something that I'm used to using heavily in Hpricot.
> >
> > http://nokogiri.lighthouseapp.com/projects/19607/tickets/50-swap-method-hpricot-compatibility#ticket-50-1
> >
> > So this is another open question to everyone: would you use this
> > feature? Do you currently do something similar but implement it in a
> > different way?
> >
> >
> > Thanks,
> > Andrew
> >
> > Curious - how does that differ from .innerHTML? (which I believe Nokogiri
> > supports)
>
> This is to actually swap out the html with something else. It's
> basically doing this:
>
> require 'nokogiri'
>
> doc = Nokogiri::HTML(<<-eohtml)
> <div>hello</div>
> eohtml
>
> div = doc.at('div')
> Nokogiri::HTML.fragment('world').children.each do |node|
>   div.parent << node
> end
> div.remove
>
> puts doc.to_html
>
> I want to get a feel for how many people would actually use this.
> Hpricot has it, but there are no tests for it, and I don't want to be
> compatible with something that has no tests. If I add this, I would
> add it because people find it useful (as opposed to being compatible).
>
> I wouldn't actually use the proposed methods, which is why I want some
> public opinion. :-)
>
> --
> Aaron Patterson
> http://tenderlovemaking.com/
From aaron.patterson at gmail.com Sat Feb 28 22:31:18 2009
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Sat, 28 Feb 2009 19:31:18 -0800
Subject: [Nokogiri-talk] would you use this feature? Node.swap(html)
In-Reply-To:
References:
<6959e1680902281633x7de0b161jb8a12b8e77cbb8d6@mail.gmail.com>
Message-ID: <6959e1680902281931m4a42fac8sac0c80d606fc2e8c@mail.gmail.com>
On Sat, Feb 28, 2009 at 7:00 PM, Andrew Farmer wrote:
> The proposed swap method is a little bit different from setting inner HTML
> and it is a little bit different from what you've written Aaron. Let's take
> a different document for an example.
>
> doc = Nokogiri::HTML(<<-eohtml)
> <div>hello</div>
> eohtml
>
> I would like to be able to replace that div with my span like this:
>
> doc.at("div").swap("<span>world</span>")
>
> And get this result:
>
> <span>world</span>
>
> Aaron, your code would produce this:
>
> <span>world</span>
>
> The span is in the wrong place!
Yes. There is a bug in my implementation. But my point remains the same.
Here is a less buggy implementation:

div = doc.at('div')
Nokogiri::HTML.fragment('world').children.reverse.each do |node|
  div.add_next_sibling node
end
div.remove

> Hopefully now it is somewhat clear what I would like this method to do. I
> use this function a lot in hpricot for re-working web pages so I think it is
> a very useful function. Am I the only one who thinks so?
Interesting. What are you reworking? I'm curious. Also, please
bottom post. Thanks!
--
Aaron Patterson
http://tenderlovemaking.com/