Input/Output
The library provides parsing and serialization algorithms to transform JSON to
and from the value container as needed. This is accomplished through free
functions and classes, described as follows.
Parsing
Parsing is the process where a serialized JSON text is validated and decomposed into elements. The library provides these functions and types to assist with parsing:
Name | Description |
---|---|
basic_parser | A SAX push parser implementation which converts a serialized JSON text into a series of member function calls to a user provided handler. This allows custom behaviors to be implemented for representing the document in memory. |
parse_options | A structure used to select which extensions are enabled during parsing. |
parse | Parse a string containing a complete serialized JSON text, and return a value. |
parser | A stateful DOM parser object which may be used to efficiently parse a series of JSON texts each contained in a single contiguous character buffer, returning each result as a value. |
stream_parser | A stateful DOM parser object which may be used to efficiently parse a series of JSON texts incrementally, returning each result as a value. |
value_stack | A low level building block used for efficiently building a value. |
The parse function offers a simple interface for converting a serialized JSON
text to a value in a single function call. This overload uses exceptions to
indicate errors:
value jv = parse( "[1,2,3,4,5]" );
Alternatively, an error_code can be used:
boost::system::error_code ec;
value jv = parse( "[1,2,3,4,5]", ec );
if( ec )
std::cout << "Parsing failed: " << ec.message() << "\n";
Even when using error codes, exceptions thrown from the underlying
memory_resource are still possible:
try
{
    boost::system::error_code ec;
    value jv = parse( "[1,2,3,4,5]", ec );
    if( ec )
        std::cout << "Parsing failed: " << ec.message() << "\n";
}
catch( std::bad_alloc const& e )
{
    std::cout << "Parsing failed: " << e.what() << "\n";
}
The value returned in the preceding examples uses the default memory resource.
The following code uses a monotonic_resource, which results in faster parsing.
jv is marked const to prevent subsequent modification, because containers
using a monotonic resource waste memory when mutated.
monotonic_resource mr;
value const jv = parse( "[1,2,3,4,5]", &mr );
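The remark about wasted memory can be illustrated without the library at all. The following is a minimal sketch that uses the standard library's std::pmr facilities as a stand-in for monotonic_resource; bump_resource and bytes_used_after_growth are hypothetical names introduced only for this illustration. A monotonic resource never reclaims memory on deallocation, so every reallocation caused by growing a container leaves the old block behind:

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <new>
#include <vector>

// Hypothetical monotonic ("bump") resource: allocation advances an offset,
// deallocation is a no-op, so freed blocks are never reused.
class bump_resource : public std::pmr::memory_resource
{
    std::vector<std::byte> buf_;
    std::size_t used_ = 0;

    void* do_allocate( std::size_t n, std::size_t align ) override
    {
        std::size_t const start = ( used_ + align - 1 ) / align * align;
        if( start + n > buf_.size() )
            throw std::bad_alloc();
        used_ = start + n;
        return buf_.data() + start;
    }

    void do_deallocate( void*, std::size_t, std::size_t ) override
    {
        // intentionally empty: memory is only released when the resource dies
    }

    bool do_is_equal( std::pmr::memory_resource const& other ) const noexcept override
    {
        return this == &other;
    }

public:
    explicit bump_resource( std::size_t size ) : buf_( size ) {}
    std::size_t used() const noexcept { return used_; }
};

// Appends `count` ints to a vector backed by `mr`. Returns the bytes consumed
// from the resource; `live_bytes` receives the size of the final array.
std::size_t bytes_used_after_growth(
    bump_resource& mr, std::size_t count, std::size_t& live_bytes )
{
    std::pmr::vector<int> v( &mr );
    for( std::size_t i = 0; i < count; ++i )
        v.push_back( static_cast<int>( i ) ); // each regrowth abandons the old block
    live_bytes = v.capacity() * sizeof( int );
    return mr.used();
}
```

After growing the vector, the resource has handed out more bytes than the vector actually occupies. The same reasoning applies to a value that is modified after being parsed into a monotonic resource, which is why the example above marks jv as const.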
Non-Standard JSON
Unless otherwise specified, the parser in this library is strict. It recognizes
only valid, standard JSON. The parser can be configured to allow certain
non-standard extensions by filling in a parse_options structure and passing it
by value. By default all extensions are disabled:
parse_options opt; // all extensions default to off
opt.allow_comments = true; // permit C and C++ style comments
// to appear in whitespace
opt.allow_trailing_commas = true; // allow an additional trailing comma in
// object and array element lists
opt.allow_invalid_utf8 = true; // skip utf-8 validation of keys and strings
opt.allow_invalid_utf16 = true; // replace invalid surrogate pair UTF-16 code point(s)
// with the Unicode replacement character
value jv = parse( "[1,2,3,] // comment ", storage_ptr(), opt );
When building with C++20 or later, the use of designated initializers with
parse_options is possible:
value jv = parse(
    "[1,2,3,] // comment ",
    storage_ptr(),
    {
        .allow_comments = true,         // permit C and C++ style comments
                                        // to appear in whitespace
        .allow_trailing_commas = true,  // allow a trailing comma in object and array lists
        .allow_invalid_utf8 = true      // skip utf-8 validation of keys and strings
    });
When allow_invalid_utf16 is enabled, the parser does not report an error when
it encounters an illegal leading surrogate, an illegal trailing surrogate, or
half of a surrogate pair. Instead, it replaces the invalid UTF-16 code point(s)
with the Unicode replacement character.
value jv = parse( "{\"command\":\"\\uDF3E\\uDEC2\"}", storage_ptr(),
    {
        .allow_invalid_utf16 = true // replace each invalid surrogate with U+FFFD
    });
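Warning: When enabling comment support, take extra care not to drop whitespace
when reading the input. For example, std::getline removes the end-of-line
characters from the string it produces. A C++ style comment extends to the end
of its line, so once the newline is lost the comment swallows the JSON that
follows it.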
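One way to keep the line terminators when reading line by line is to put them back after each call to std::getline. The sketch below uses only the standard library; getline_keep_newline and read_all_lines are hypothetical helper names, not part of this library:

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads one line and restores the '\n' that
// std::getline strips. Returns false when no more input is available.
bool getline_keep_newline( std::istream& is, std::string& line )
{
    if( !std::getline( is, line ) )
        return false;
    // eofbit is set when the line was ended by end of input rather than
    // by '\n'; only re-append the newline if it was really there.
    if( !is.eof() )
        line += '\n';
    return true;
}

// Hypothetical helper: concatenates all lines, preserving the original text.
std::string read_all_lines( std::istream& is )
{
    std::string result;
    std::string line;
    while( getline_keep_newline( is, line ) )
        result += line;
    return result;
}
```

With the newline restored, a line such as `[1, // one` still ends its comment before the next line begins, so each piece can be handed to a parser that has allow_comments enabled.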
Full Precision Number Parsing
The default algorithm that the library uses to parse numbers is fast, but may
result in slight precision loss. This may not be suitable for some
applications, so there is an option to enable an alternative algorithm that
does not lose precision, but is somewhat slower. To enable it, set the numbers
member of a parse_options structure:
parse_options opt;
opt.numbers = number_precision::precise;
value jv = parse( "1002.9111801605201", storage_ptr(), opt );
Note that full precision number parsing requires the algorithm to see the full
number. This means that when it is used with stream_parser, additional memory
allocations may be necessary to store the parts of the number that the parser
has accepted so far. The library tries to avoid such allocations where it can.
Parser
Instances of parser and stream_parser offer functionality beyond what is
available when using the parse free functions:
- More control over memory
- Streaming API, parse input JSON incrementally
- Improved performance when parsing multiple JSON texts
- Ignore non-JSON content after the end of a JSON text
The parser implementation uses temporary storage space to accumulate values
during parsing. When using the parse free functions, this storage is allocated
and freed in each call. However, by declaring an instance of parser or
stream_parser, this temporary storage can be reused when parsing more than one
JSON text, reducing the total number of dynamic memory allocations.
To use the parser, declare an instance. Then call parser::write once with the
buffer containing the input JSON. Finally, call parser::release to take
ownership of the resulting value upon success. This example persists the parser
instance in a class member to reuse across calls:
class connection
{
    parser p_;                          // persistent data member

public:
    void do_read( string_view s )       // called for each complete message from the network
    {
        p_.reset();                     // start parsing a new JSON using the default resource
        p_.write( s );                  // parse the buffer, using exceptions to indicate error
        do_rpc( p_.release() );         // process the command
    }

    void do_rpc( value jv );
};
Sometimes a protocol may have a JSON text followed by data that is in
a different format or specification. The JSON portion can still be parsed by
using the write_some member function of parser or stream_parser. Upon success,
the return value indicates the number of characters consumed from the input,
which excludes the non-JSON characters:
stream_parser p;
boost::system::error_code ec;
string_view s = "[1,2,3] %HOME%";
std::size_t n = p.write_some( s, ec );
assert( ! ec && p.done() && n == 8 );
s = s.substr( n );
value jv = p.release();
assert( s == "%HOME%" );
The parser instance may be constructed with parse options which allow some non-standard JSON extensions to be recognized:
parse_options opt; // All extensions default to off
opt.allow_comments = true; // Permit C and C++ style comments to appear in whitespace
opt.allow_trailing_commas = true; // Allow an additional trailing comma in
// object and array element lists
opt.allow_invalid_utf8 = true; // Skip utf-8 validation of keys and strings
stream_parser p( storage_ptr(), opt ); // The stream_parser will use the options
Streaming Parser
The stream_parser implements a streaming algorithm; it allows incremental
processing of large JSON inputs using one or more contiguous character buffers.
The entire input JSON does not need to be loaded into memory at once. A network
server can use the streaming interface to process incoming JSON in fixed-size
amounts, providing these benefits:
- CPU consumption per I/O cycle is bounded
- Memory consumption per I/O cycle is bounded
- Jitter, unfairness, and latency are reduced
- Less total memory is required to process the full input
To use the stream_parser, declare an instance. Then call stream_parser::write
zero or more times with successive buffers representing the input JSON. When
there are no more buffers, call stream_parser::finish. The function
stream_parser::done returns true after a successful call to write or finish if
parsing is complete.
In the following example a JSON text is parsed from standard input a line at
a time. Error codes are used instead of exceptions. The function
stream_parser::finish is used to indicate the end of the input.
Warning: This example will break if comments are enabled, because it uses
std::getline (see the warning in the Non-Standard JSON section).
value read_json( std::istream& is, boost::system::error_code& ec )
{
    stream_parser p;
    std::string line;
    while( std::getline( is, line ) )
    {
        p.write( line, ec );
        if( ec )
            return nullptr;
    }
    p.finish( ec );
    if( ec )
        return nullptr;
    return p.release();
}
We can complicate the example further by extracting several JSON values from the sequence of lines.
std::vector<value> read_jsons( std::istream& is, boost::system::error_code& ec )
{
    std::vector< value > jvs;
    stream_parser p;
    std::string line;
    std::size_t n = 0;
    while( true )
    {
        if( n == line.size() )
        {
            if( !std::getline( is, line ) )
                break;
            n = 0;
        }
        n += p.write_some( line.data() + n, line.size() - n, ec );
        if( ec.failed() )   // stop on a parse error instead of retrying the same input
            return jvs;
        if( p.done() )
        {
            jvs.push_back( p.release() );
            p.reset();
        }
    }
    if( !p.done() )         // this part handles the cases when the last JSON text in
    {                       // the input is either incomplete or doesn't have a marker
        p.finish( ec );     // for end of the value (e.g. it is a number)
        if( ec.failed() )
            return jvs;
        jvs.push_back( p.release() );
    }
    return jvs;
}
Controlling Memory
After default construction, or after stream_parser::reset is called with no
arguments, the value produced after a successful parse operation uses the
default memory resource. To use a different memory resource, call reset with
the resource to use. Here we use a monotonic_resource, which is optimized for
parsing but not subsequent modification:
monotonic_resource mr;
stream_parser p;
p.reset( &mr ); // Use mr for the resulting value
p.write( "[1,2,3,4,5]" ); // Parse the input JSON
value const jv = p.release(); // Retrieve the result
assert( *jv.storage() == mr ); // Same memory resource
To achieve performance and memory efficiency, the parser uses a temporary storage area to hold intermediate results. This storage is reused when parsing more than one JSON text, reducing the total number of calls to allocate memory and thus improving performance. Upon construction, the memory resource used to perform allocations for this temporary storage area may be specified. Otherwise, the default memory resource is used.
In addition to a memory resource, the parser can make use of a caller-owned buffer for temporary storage. This can help avoid dynamic allocations for small inputs. The following example uses a four kilobyte temporary buffer for the parser, and falls back to the default memory resource if needed:
unsigned char temp[ 4096 ]; // Declare our buffer
stream_parser p(
storage_ptr(), // Default memory resource
parse_options{}, // Default parse options (strict parsing)
temp); // Use our buffer for temporary storage
Avoiding Dynamic Allocations
Through careful specification of buffers and memory resources, it is possible to eliminate all dynamic allocation completely when parsing JSON, for the case where the entire JSON text is available in a single character buffer, as shown here:
/*  Parse JSON and invoke the handler

    This function parses the JSON specified in `s`
    and invokes the handler, whose signature must
    be equivalent to:

        void( value const& jv );

    The operation is guaranteed not to perform any
    dynamic memory allocations. However, some
    implementation-defined upper limits on the size
    of the input JSON and the size of the resulting
    value are imposed.

    Upon error, an exception is thrown.
*/
template< class Handler >
void do_rpc( string_view s, Handler&& handler )
{
    unsigned char temp[ 4096 ];     // The parser will use this storage for its temporary needs
    parser p(                       // Construct a strict parser using
                                    // the temp buffer and no dynamic memory
        get_null_resource(),        // The null resource never dynamically allocates memory
        parse_options(),            // Default constructed parse options allow only standard JSON
        temp );

    unsigned char buf[ 16384 ];     // Now we need a buffer to hold the actual JSON values
    static_resource mr2( buf );     // The static resource is monotonic,
                                    // using only a caller-provided buffer
    p.reset( &mr2 );                // Use the static resource for producing the value
    p.write( s );                   // Parse the entire string we received from the network client

    // Retrieve the value and invoke the handler with it.
    // The value will use `buf` for storage. The handler
    // must not take ownership, since monotonic resources
    // are inefficient with mutation.
    handler( p.release() );
}
Custom Parsers
Users who wish to implement custom parsing strategies may create their own
handler to use with an instance of basic_parser. The handler implements the
function signatures required by the SAX event interface. In the Validate
example we define a "null" parser, which discards the parsed results, and use
it in the implementation of a function that determines if a JSON text is valid.