diff --git a/doc/migrating_from_nokogiri.md b/doc/migrating_from_nokogiri.md
new file mode 100644
index 0000000..883eb70
--- /dev/null
+++ b/doc/migrating_from_nokogiri.md
@@ -0,0 +1,169 @@
+# Migrating From Nokogiri
+
+If you're parsing XML/HTML documents using Ruby, chances are you're using
+[Nokogiri][nokogiri] for this. This guide aims to make it easier to switch from
+Nokogiri to Oga.
+
+## Parsing Documents
+
+In Nokogiri there are two de facto ways of parsing documents:
+
+* `Nokogiri.XML()` for XML documents
+* `Nokogiri.HTML()` for HTML documents
+
+For example, to parse an XML document you'd use the following:
+
+    Nokogiri::XML('<root>foo</root>')
+
+Oga instead uses the following two methods:
+
+* `Oga.parse_xml`
+* `Oga.parse_html`
+
+Their usage is similar:
+
+    Oga.parse_xml('<root>foo</root>')
+
+Nokogiri returns two distinct document classes based on what method was used
+to parse a document:
+
+* `Nokogiri::XML::Document` for XML documents
+* `Nokogiri::HTML::Document` for HTML documents
+
+Oga, on the other hand, always returns an `Oga::XML::Document` instance. Oga
+currently makes no distinction between XML and HTML documents other than at the
+lexer level. This might change in the future if deemed required.
+
+## Querying Documents
+
+Nokogiri allows one to query documents/elements using both XPath expressions and
+CSS selectors. In Nokogiri one queries a document as follows:
+
+    document = Nokogiri::XML('<root><foo>bar</foo></root>')
+
+    document.xpath('root/foo')
+    document.css('root foo')
+
+Oga currently only supports XPath expressions; CSS selectors will be added in
+the near future. Querying documents works similarly to Nokogiri:
+
+    document = Oga.parse_xml('<root><foo>bar</foo></root>')
+
+    document.xpath('root/foo')
+
+Nokogiri also allows you to query a document and return the first match, as
+opposed to an entire node set, using the method `at`. In Nokogiri this method
+can be used for both XPath expressions and CSS selectors. 
Oga has no such method;
+instead it provides the following more dedicated method:
+
+* `at_xpath`: returns the first node matching an XPath expression
+
+For example:
+
+    document = Oga.parse_xml('<root><foo>bar</foo></root>')
+
+    document.at_xpath('root/foo')
+
+By using a dedicated method Oga doesn't have to guess what type of expression
+you're using (XPath or CSS), so it can never mistake one for the other.
+
+## Retrieving Attribute Values
+
+Nokogiri provides two methods for retrieving attributes and attribute values:
+
+* `Nokogiri::XML::Node#attribute`
+* `Nokogiri::XML::Node#attr`
+
+The first method always returns an instance of `Nokogiri::XML::Attribute`; the
+second method returns the attribute value as a `String`. This behaviour,
+especially due to the names used, is extremely confusing.
+
+Oga on the other hand provides the following two methods:
+
+* `Oga::XML::Element#attribute` (aliased as `attr`)
+* `Oga::XML::Element#get`
+
+The first method always returns an `Oga::XML::Attribute` instance; the second
+returns the attribute value as a `String`. I deliberately chose `get` for
+getting a value to remove the confusion of `attribute` vs `attr`. This also
+allows for `attr` to simply be an alias of `attribute`.
+
+As an example, this is how you'd get the value of a `class` attribute in
+Nokogiri:
+
+    document = Nokogiri::XML('<root class="foo"></root>')
+
+    document.xpath('root').first.attr('class') # => "foo"
+
+This is how you'd get the same value in Oga:
+
+    document = Oga.parse_xml('<root class="foo"></root>')
+
+    document.xpath('root').first.get('class') # => "foo"
+
+## Modifying Documents
+
+Modifying documents in Nokogiri is not as convenient as it perhaps could be. For
+example, adding an element to a document is done as follows:
+
+    document = Nokogiri::XML('<root></root>')
+    root = document.xpath('root').first
+
+    name = Nokogiri::XML::Element.new('name', document)
+
+    name.inner_html = 'Alice'
+
+    root.add_child(name)
+
+The annoying part here is that we have to pass a document into an Element's
+constructor. 
As such, you cannot create elements without first creating a
+document. Another thing is that Nokogiri has no method called `inner_text=`;
+instead you have to use the method `inner_html=`.
+
+In Oga you'd use the following:
+
+    document = Oga.parse_xml('<root></root>')
+    root = document.xpath('root').first
+
+    name = Oga::XML::Element.new(:name => 'name')
+
+    name.inner_text = 'Alice'
+
+    root.children << name
+
+Adding attributes works similarly in both Nokogiri and Oga. For Nokogiri you'd
+use the following:
+
+    element.set_attribute('class', 'foo')
+
+Alternatively you can do the following:
+
+    element['class'] = 'foo'
+
+In Oga you'd instead use the method `set`:
+
+    element.set('class', 'foo')
+
+This method automatically creates an attribute if it doesn't exist, including
+the namespace if specified:
+
+    element.set('foo:class', 'foo')
+
+## Serializing Documents
+
+Serializing the document back to XML works the same in both libraries: simply
+call `to_xml` on a document or element and you'll get a String back containing
+the XML. There is one key difference here though: Nokogiri does not return the
+exact same output as it was given as input; for example, it adds an XML
+declaration:
+
+    Nokogiri::XML('<root></root>').to_xml # => "<?xml version=\"1.0\"?>\n<root/>\n"
+
+Oga on the other hand does not do this:
+
+    Oga.parse_xml('<root></root>').to_xml # => "<root></root>"
+
+Oga also doesn't insert random newlines or other possibly unexpected (or
+unwanted) data.
+
+[nokogiri]: http://nokogiri.org/