Commit Graph

8 Commits

Author SHA1 Message Date
Yorick Peterse 94f8ed5421 Removed start/end comments of YARD blocks 2015-09-01 19:59:52 +02:00
Yorick Peterse b6d34a406d Removed redundant returns 2015-06-16 00:51:51 +02:00
Yorick Peterse 2766d5f27f Pack HTML entities using "U*"
See https://github.com/YorickPeterse/oga/issues/90#issuecomment-89859273
for more details, apparently I didn't fix this before.
2015-05-21 11:26:01 +02:00
Yorick Peterse 2ec91f130f Lazy decoding of XML/HTML entities.
Instead of decoding entities in the lexer we'll do this whenever XML::Text#text
is called. This removes the overhead from the parsing phase and ensures the
process is only triggered when actually needed. Note that calling #to_xml and/or
the #inspect methods on a Text (or parent) instance will also trigger the entity
conversion process.

The new entity decoding API supports both regular entities (e.g. &) as well
as codepoint based entities (both regular and hexadecimal codepoints).

To allow safe read-only access to Text instances from multiple threads a mutex
is used. This mutex ensures that only 1 thread can trigger the conversion
process.

Fixes #68
2015-03-05 23:00:43 +01:00
Yorick Peterse 317b49bcf6 Implemented a basic SAX API.
This API is a little bit dodgy (similar to Nokogiri's API) due to the use of
separate parser and handler classes. This is done to ensure that the return
values of callback methods (e.g. on_element) aren't used by Racc for building
AST trees. This also ensures that whatever variables are set by the handler
don't conflict with any variables of the parser.

This fixes #42.
2014-09-16 14:30:46 +02:00
Yorick Peterse d8e2b97031 Tweaked docs of the XML parsers. 2014-09-04 09:34:59 +02:00
Yorick Peterse 8237d5791d Stream tokens when lexing.
Instead of returning the tokens as a whole they are now streamed using
XML::Lexer#advance. This method returns the next token upon every call. It uses
a small buffer in case a particular block of text results in multiple tokens.
2014-04-09 22:08:13 +02:00
Yorick Peterse 79818eb349 Added a convenience class for parsing HTML.
This removes the need for users having to set the `:html` option themselves.
2014-03-25 09:40:24 +01:00