uriparser 0.9.8
Welcome to the short uriparser integration tutorial. It is intended to answer upcoming questions and to shed light where function prototypes alone are not enough. Please drop me a line if you need further assistance and I will see what I can do for you. Good luck with uriparser!
Parsing a URI with uriparser looks like this:
While the URI object (UriUriA) holds information about the recognized parts of the given URI string, in case of URI_ERROR_SYNTAX, errorPos points to the first character starting invalid syntax.
According to RFC 3986 gluing parts of a URI together to form a string is called recomposition. Before we can recompose a URI object we have to know how much space the resulting string will take:
Now we can tell uriToStringA() to write the string to a given buffer:
Incrementing charsRequired by 1 is required since uriToStringCharsRequiredA() returns the length of the string as strlen() does, but uriToStringA() works with the maximum number of characters to be written, including the zero-terminator.
Reference Resolution is the process of turning a (relative) URI reference into an absolute URI by applying a base URI to it. In code it looks like this:
Reference Creation is the inverse process of Reference Resolution: A common base URI is "subtracted" from an absolute URI to make a (relative) reference. If the base URI is not common, the remaining URI will still be absolute, i.e. will carry a scheme.
The fourth parameter of uriRemoveBaseUriA() is the domain root mode. With URI_FALSE this will produce URIs relative to the base URI. With URI_TRUE the resulting URI will be relative to the domain root instead, e.g. "/one/TWO" for an absolute URI whose path is "/one/TWO".
Converting filenames to and from URIs works on strings directly, i.e. without creating a URI object.
Conversion works ..
All you have to do is to choose the right function for the task and allocate the required space (in characters) for the target buffer. Let me present you an overview:
Sometimes we come across unnecessarily long URIs like "http://example.org/one/two/../../one". The algorithm we can use to shorten this URI down to "http://example.org/one" is called Syntax-Based Normalization. Note that normalizing a URI does more than just "stripping dot segments". Please have a look at Section 6.2.2 of RFC 3986 for the full description.
As we asked uriToStringCharsRequiredA() for the required space when converting a URI object back to a string, we can ask uriNormalizeSyntaxMaskRequiredA() for the parts of a URI that require normalization and then pass this normalization mask to uriNormalizeSyntaxExA():
If you don't want to normalize all parts of the URI you can pass a custom mask as well:
Please see UriNormalizationMaskEnum for the complete set of flags.
On the other hand calling plain uriNormalizeSyntaxA() (without the "Ex") saves you thinking about single parts, as it queries uriNormalizeSyntaxMaskRequiredA() internally:
RFC 3986 itself does not understand the query part of a URI as a list of key/value pairs. But HTML 2.0 does and defines a media type application/x-www-form-urlencoded in section 8.2.1 of RFC 1866. uriparser allows you to dissect (or parse) a query string into unescaped key/value pairs and back.
To dissect the query part of a just-parsed URI you could write code like this:
NULL in the value member means there was no '=' in the item text, as with "?abc&def".
An empty string in the value member means there was an '=' in the item, as with "?abc=&def".
To compose a query string from a query list you could write code like this:
uriparser comes with two versions of every structure and function: one handling narrow strings (char *) and one working with wide strings (wchar_t *), for instance UriUriA for char * and UriUriW for wchar_t *.
This tutorial only shows the usage of the narrow string editions but their wide string counterparts work in the very same way.
You can use the code below to make ./configure test for presence of uriparser 0.9.0 or later.
PKG_CHECK_MODULES([URIPARSER], [liburiparser >= 0.9.0], [], [])