uriparser  0.9.5
uriparser Documentation

Introduction

Welcome to the short uriparser integration tutorial. It is intended to answer upcoming questions and to shed light where function prototypes alone are not enough. Please drop me a line if you need further assistance and I will see what I can do for you. Good luck with uriparser!

Parsing URIs (from string to object)

Parsing a URI with uriparser looks like this:

UriUriA uri;
const char * const uriString = "file:///home/user/song.mp3";
const char * errorPos;
if (uriParseSingleUriA(&uri, uriString, &errorPos) != URI_SUCCESS) {
    /* Failure (no need to call uriFreeUriMembersA) */
    ...
    return ...;
}
/* Success */
...
uriFreeUriMembersA(&uri);

On success, the URI object (UriUriA) holds information about the recognized parts of the given URI string. If parsing fails with URI_ERROR_SYNTAX, errorPos points to the first character of the invalid syntax.
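Since errorPos points into the string that was parsed, the offset of the problem is a plain pointer difference. The following is a minimal sketch of how that offset could be turned into a marker line for an error message. It does not call uriparser; make_error_marker is a hypothetical helper introduced for this illustration, and it assumes errorPos really points into text.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of uriparser): writes a line of spaces
 * followed by '^' into out, so that the '^' sits under the character that
 * errorPos points at. Assumes errorPos points into text.
 * Returns the zero-based offset, or -1 if out is too small. */
static long make_error_marker(const char *text, const char *errorPos,
                              char *out, size_t outSize)
{
    const ptrdiff_t offset = errorPos - text;
    if (offset < 0 || (size_t)offset + 2 > outSize) {
        return -1;
    }
    memset(out, ' ', (size_t)offset);  /* padding up to the error column */
    out[offset] = '^';
    out[offset + 1] = '\0';
    return (long)offset;
}
```

Printing the URI string on one line and the marker on the next puts the caret directly under the offending character.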

Recomposing URIs (from object back to string)

According to RFC 3986, gluing the parts of a URI together to form a string is called recomposition. Before we can recompose a URI object, we have to know how much space the resulting string will take:

UriUriA uri;
char * uriString;
int charsRequired;
...
if (uriToStringCharsRequiredA(&uri, &charsRequired) != URI_SUCCESS) {
    /* Failure */
    ...
}
charsRequired++;

Now we can tell uriToStringA() to write the string to a given buffer:

uriString = malloc(charsRequired * sizeof(char));
if (uriString == NULL) {
    /* Failure */
    ...
}
if (uriToStringA(uriString, &uri, charsRequired, NULL) != URI_SUCCESS) {
    /* Failure */
    ...
}
Remarks
Incrementing charsRequired by 1 is required since uriToStringCharsRequiredA() returns the length of the string as strlen() does, but uriToStringA() works with the number of maximum characters to be written including the zero-terminator.
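The same measure-then-write pattern exists in the C standard library, which may help to see why the increment is needed. The sketch below is an analogy only and does not use uriparser: snprintf() called with a NULL buffer reports the length without the terminator, just as strlen() does, while the second call takes a size that includes it. The function name join_uri is made up for this illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Illustration only: builds "scheme://host/path" in a freshly allocated
 * buffer using two passes. Returns NULL on failure; the caller frees. */
static char *join_uri(const char *scheme, const char *host, const char *path)
{
    /* Pass 1: measure. The result excludes the zero-terminator. */
    int charsRequired = snprintf(NULL, 0, "%s://%s%s", scheme, host, path);
    if (charsRequired < 0) {
        return NULL;
    }
    charsRequired++;  /* make room for the zero-terminator */

    char *dest = malloc((size_t)charsRequired * sizeof(char));
    if (dest == NULL) {
        return NULL;
    }

    /* Pass 2: write. The size passed here includes the zero-terminator. */
    if (snprintf(dest, (size_t)charsRequired, "%s://%s%s",
                 scheme, host, path) != charsRequired - 1) {
        free(dest);
        return NULL;
    }
    return dest;
}
```

Forgetting the increment in a pattern like this leaves the buffer one character short, so the last character of the string would be cut off.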

Resolving References

Reference Resolution is the process of turning a (relative) URI reference into an absolute URI by applying a base URI to it. In code it looks like this:

UriUriA absoluteDest;
UriUriA relativeSource;
UriUriA absoluteBase;
...
/* relativeSource holds "../TWO" now */
/* absoluteBase holds "file:///one/two/three" now */
if (uriAddBaseUriA(&absoluteDest, &relativeSource, &absoluteBase) != URI_SUCCESS) {
    /* Failure */
    uriFreeUriMembersA(&absoluteDest);
    ...
}
/* absoluteDest holds "file:///one/TWO" now */
...
uriFreeUriMembersA(&absoluteDest);
Remarks
uriAddBaseUriA() does not normalize the resulting URI. Usually you might want to pass it through uriNormalizeSyntaxA() after.
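To get a feeling for what happens to the path during resolution, here is a minimal sketch of the two path steps RFC 3986 describes: merging (Section 5.2.3) and removing dot segments (Section 5.2.4). It works on plain path strings only, assumes a non-empty reference path and ignores scheme, authority, query and fragment. It is not how uriparser implements resolution; remove_dot_segments and resolve_path are names chosen for this illustration.

```c
#include <string.h>

/* Removes "." and ".." segments following the steps A to E of RFC 3986,
 * Section 5.2.4. out must hold at least strlen(path) + 1 characters. */
static void remove_dot_segments(const char *path, char *out)
{
    const char *in = path;
    size_t outLen = 0;
    while (*in != '\0') {
        if (strncmp(in, "../", 3) == 0) {                 /* A */
            in += 3;
        } else if (strncmp(in, "./", 2) == 0) {           /* A */
            in += 2;
        } else if (strncmp(in, "/./", 3) == 0) {          /* B */
            in += 2;
        } else if (strcmp(in, "/.") == 0) {               /* B */
            in = "/";
        } else if (strncmp(in, "/../", 4) == 0
                || strcmp(in, "/..") == 0) {              /* C */
            in = (in[3] == '/') ? in + 3 : "/";
            /* Drop the last output segment and its leading '/'. */
            while (outLen > 0 && out[--outLen] != '/') { }
        } else if (strcmp(in, ".") == 0
                || strcmp(in, "..") == 0) {               /* D */
            in += strlen(in);
        } else {                                          /* E */
            /* Move one segment, including its leading '/', to the output. */
            do {
                out[outLen++] = *in++;
            } while (*in != '\0' && *in != '/');
        }
    }
    out[outLen] = '\0';
}

/* Merges refPath with basePath and removes dot segments.
 * Returns 0 on success, -1 if a buffer is too small. */
static int resolve_path(const char *basePath, const char *refPath,
                        char *out, size_t outSize)
{
    char merged[256];
    size_t dirLen = 0;
    if (refPath[0] != '/') {
        /* Keep the base path up to and including its last '/'. */
        const char *lastSlash = strrchr(basePath, '/');
        if (lastSlash != NULL) {
            dirLen = (size_t)(lastSlash - basePath) + 1;
        }
    }
    const size_t needed = dirLen + strlen(refPath) + 1;
    if (needed > sizeof(merged) || needed > outSize) {
        return -1;
    }
    memcpy(merged, basePath, dirLen);
    strcpy(merged + dirLen, refPath);
    remove_dot_segments(merged, out);
    return 0;
}
```

With the values from the example above, resolve_path("/one/two/three", "../TWO", ...) first merges to "/one/two/../TWO" and then yields "/one/TWO".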

Creating References

Reference Creation is the inverse process of Reference Resolution: A common base URI is "subtracted" from an absolute URI to make a (relative) reference. If the base URI is not common, the remaining URI will still be absolute, i.e. it will carry a scheme:

UriUriA dest;
UriUriA absoluteSource;
UriUriA absoluteBase;
...
/* absoluteSource holds "file:///one/TWO" now */
/* absoluteBase holds "file:///one/two/three" now */
if (uriRemoveBaseUriA(&dest, &absoluteSource, &absoluteBase, URI_FALSE) != URI_SUCCESS) {
    /* Failure */
    ...
}
/* dest holds "../TWO" now */
...
uriFreeUriMembersA(&dest);

The fourth parameter is the domain root mode. With URI_FALSE as above this will produce URIs relative to the base URI. With URI_TRUE the resulting URI will be relative to the domain root instead, e.g. "/one/TWO" in this case.
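The difference between the two modes can be sketched on plain path strings. The code below is an illustration under simplifying assumptions, not uriparser's implementation: both paths must be absolute and free of dot segments, scheme and authority are assumed to be equal already, and the special case of a first segment containing ':' is ignored. make_relative_path is a hypothetical helper.

```c
#include <string.h>

/* Hypothetical helper: writes a reference to targetPath into out that is
 * relative to basePath (domainRootMode == 0) or to the domain root
 * (domainRootMode != 0). Returns 0 on success, -1 if out is too small. */
static int make_relative_path(const char *basePath, const char *targetPath,
                              int domainRootMode, char *out, size_t outSize)
{
    if (domainRootMode) {
        /* Relative to the domain root: the absolute path itself. */
        if (strlen(targetPath) + 1 > outSize) {
            return -1;
        }
        strcpy(out, targetPath);
        return 0;
    }

    /* Find the end of the directory prefix that both paths share. */
    size_t i = 0;
    size_t commonDir = 0;
    while (basePath[i] != '\0' && basePath[i] == targetPath[i]) {
        if (basePath[i] == '/') {
            commonDir = i + 1;
        }
        i++;
    }

    /* Every remaining directory of the base costs one "../". */
    size_t ups = 0;
    for (const char *p = basePath + commonDir; *p != '\0'; p++) {
        if (*p == '/') {
            ups++;
        }
    }

    const char *rest = targetPath + commonDir;
    size_t needed = 3 * ups + strlen(rest) + 1;
    if (ups == 0 && *rest == '\0') {
        needed = 3;  /* room for "./" plus terminator */
    }
    if (needed > outSize) {
        return -1;
    }

    out[0] = '\0';
    for (size_t k = 0; k < ups; k++) {
        strcat(out, "../");
    }
    strcat(out, rest);
    if (out[0] == '\0') {
        strcpy(out, "./");  /* an empty reference would mean the base itself */
    }
    return 0;
}
```

For the base path "/one/two/three" and the target path "/one/TWO", this gives "../TWO" in the first mode and "/one/TWO" in domain root mode, matching the results described above.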

Filenames and URIs

Converting filenames to and from URIs works on strings directly, i.e. without creating a URI object.

const char * const absFilename = "E:\\Documents and Settings";
const size_t bytesNeeded = 8 + 3 * strlen(absFilename) + 1;
char * absUri = malloc(bytesNeeded * sizeof(char));
if (absUri == NULL) {
    /* Failure */
    ...
}
if (uriWindowsFilenameToUriStringA(absFilename, absUri) != URI_SUCCESS) {
    /* Failure */
    free(absUri);
    ...
}
/* absUri is "file:///E:/Documents%20and%20Settings" now */
...
free(absUri);
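The buffer size formula in the example can be read as a worst-case estimate. The sketch below spells it out; the interpretation in the comments is my reading of the formula (8 characters for the "file:///" prefix, up to 3 characters per input character if it has to be written as "%XX", plus the zero-terminator), and windows_abs_uri_capacity is a hypothetical helper, not a uriparser function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: worst-case buffer size in characters, including the
 * zero-terminator, for turning an absolute Windows filename into a URI
 * string. Same formula as in the example: 8 + 3 * len + 1. */
static size_t windows_abs_uri_capacity(const char *filename)
{
    const size_t prefixLen = strlen("file:///");  /* 8 characters */
    const size_t worstCasePerChar = 3;            /* "%XX" per input char */
    return prefixLen + worstCasePerChar * strlen(filename) + 1;
}
```

For "E:\\Documents and Settings" (25 characters) this gives 84, comfortably more than the 38 characters, terminator included, that "file:///E:/Documents%20and%20Settings" actually needs. Using size_t rather than int for the result avoids a narrowing conversion from strlen().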

Conversion works

  • for relative or absolute values,
  • in both directions (filenames <–> URIs) and
  • with Unix and Windows filenames.

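All you have to do is choose the right function for the task and allocate the required space (in characters) for the target buffer. The four conversion functions are uriUnixFilenameToUriStringA(), uriWindowsFilenameToUriStringA(), uriUriStringToUnixFilenameA() and uriUriStringToWindowsFilenameA(); the documentation of each one states the buffer size it requires.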

Normalizing URIs

Sometimes we come across unnecessarily long URIs like "http://example.org/one/two/../../one". The algorithm we can use to shorten this URI down to "http://example.org/one" is called Syntax-Based Normalization. Note that normalizing a URI does more than just "stripping dot segments". Please have a look at Section 6.2.2 of RFC 3986 for the full description.
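To illustrate the "more than dot segments" part, here is a minimal sketch of two other steps from Section 6.2.2 of RFC 3986: lowercasing the scheme (6.2.2.1) and normalizing percent-encodings, i.e. uppercasing their hex digits (6.2.2.1) and decoding those that stand for unreserved characters (6.2.2.2). It works on a plain string, assumes an absolute URI with a scheme, and is not how uriparser does it internally; all function names are made up for this illustration.

```c
#include <ctype.h>
#include <string.h>

/* Returns the value of a hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Unreserved characters according to RFC 3986, Section 2.3. */
static int is_unreserved(int c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
        || (c >= '0' && c <= '9')
        || c == '-' || c == '.' || c == '_' || c == '~';
}

/* Lowercases everything before the first ':' in place.
 * Assumes uri is absolute, i.e. the first ':' ends the scheme. */
static void lowercase_scheme(char *uri)
{
    const char *colon = strchr(uri, ':');
    if (colon == NULL) {
        return;
    }
    for (char *p = uri; p < colon; p++) {
        *p = (char)tolower((unsigned char)*p);
    }
}

/* Rewrites percent-encodings in place: "%7e" becomes "~" (unreserved),
 * "%2f" becomes "%2F" (reserved, so it stays encoded). */
static void normalize_percent_encoding(char *s)
{
    static const char hexDigits[] = "0123456789ABCDEF";
    char *w = s;
    const char *r = s;
    while (*r != '\0') {
        int hi;
        int lo;
        if (r[0] == '%' && (hi = hex_value(r[1])) >= 0
                && (lo = hex_value(r[2])) >= 0) {
            const int ch = hi * 16 + lo;
            if (is_unreserved(ch)) {
                *w++ = (char)ch;
            } else {
                *w++ = '%';
                *w++ = hexDigits[hi];
                *w++ = hexDigits[lo];
            }
            r += 3;
        } else {
            *w++ = *r++;
        }
    }
    *w = '\0';
}
```

Applied one after the other to "HTTP://example.org/%7euser/a%2fb", the two functions produce "http://example.org/~user/a%2Fb". The result is never longer than the input, which is why rewriting in place is safe here.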

Just as we asked uriToStringCharsRequiredA() for the required space when converting a URI object back to a string, we can ask uriNormalizeSyntaxMaskRequiredA() which parts of a URI require normalization and then pass this normalization mask to uriNormalizeSyntaxExA():

const unsigned int dirtyParts = uriNormalizeSyntaxMaskRequiredA(&uri);
if (uriNormalizeSyntaxExA(&uri, dirtyParts) != URI_SUCCESS) {
    /* Failure */
    ...
}

If you don't want to normalize all parts of the URI you can pass a custom mask as well:

const unsigned int normMask = URI_NORMALIZE_SCHEME | URI_NORMALIZE_USER_INFO;
if (uriNormalizeSyntaxExA(&uri, normMask) != URI_SUCCESS) {
    /* Failure */
    ...
}

Please see UriNormalizationMaskEnum for the complete set of flags.

On the other hand, calling plain uriNormalizeSyntaxA() (without the "Ex") saves you from thinking about single parts, as it queries uriNormalizeSyntaxMaskRequiredA() internally:

if (uriNormalizeSyntaxA(&uri) != URI_SUCCESS) {
    /* Failure */
    ...
}

Working with Query Strings

RFC 3986 itself does not understand the query part of a URI as a list of key/value pairs. But HTML 2.0 does and defines a media type application/x-www-form-urlencoded in section 8.2.1 of RFC 1866. uriparser allows you to dissect (or parse) a query string into unescaped key/value pairs and back.

To dissect the query part of a just-parsed URI you could write code like this:

UriUriA uri;
UriQueryListA * queryList;
int itemCount;
...
if (uriDissectQueryMallocA(&queryList, &itemCount, uri.query.first,
        uri.query.afterLast) != URI_SUCCESS) {
    /* Failure */
    ...
}
...
uriFreeQueryListA(queryList);
Remarks
  • NULL in the value member means there was no '=' in the item text as with "?abc&def".
  • An empty string in the value member means there was '=' in the item as with "?abc=&def".
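The distinction between NULL and an empty string can be made concrete with a small stand-alone sketch. It is not uriparser code and does no unescaping: it only splits a writable query string (without the leading '?') at '&' and '=' in place. QueryPair and split_query are hypothetical names used for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical pair type for this sketch. */
typedef struct {
    char *key;
    char *value;  /* NULL: item had no '='; "": item had '=' but no value */
} QueryPair;

/* Splits query in place and fills up to maxPairs entries.
 * Returns the number of entries written. The entries point into query,
 * so query must stay alive and must be writable. */
static int split_query(char *query, QueryPair *pairs, int maxPairs)
{
    int count = 0;
    char *item = query;
    if (*query == '\0') {
        return 0;
    }
    while (item != NULL && count < maxPairs) {
        /* Cut off the current item at the next '&', if there is one. */
        char *next = strchr(item, '&');
        if (next != NULL) {
            *next++ = '\0';
        }
        /* Cut the item into key and value at the first '='. */
        char *eq = strchr(item, '=');
        if (eq != NULL) {
            *eq++ = '\0';
        }
        pairs[count].key = item;
        pairs[count].value = eq;  /* stays NULL when there was no '=' */
        count++;
        item = next;
    }
    return count;
}
```

Splitting "abc=&def" this way gives two entries: "abc" with an empty string as value, and "def" with NULL as value.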

To compose a query string from a query list you could write code like this:

int charsRequired;
int charsWritten;
char * queryString;
...
if (uriComposeQueryCharsRequiredA(queryList, &charsRequired) != URI_SUCCESS) {
    /* Failure */
    ...
}
queryString = malloc((charsRequired + 1) * sizeof(char));
if (queryString == NULL) {
    /* Failure */
    ...
}
if (uriComposeQueryA(queryString, queryList, charsRequired + 1, &charsWritten) != URI_SUCCESS) {
    /* Failure */
    ...
}
...
free(queryString);

Narrow Strings and Wide Strings

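uriparser comes with two versions of every structure and function: one handling narrow strings (char *) and one working with wide strings (wchar_t *). The narrow versions carry the suffix A, for instance UriUriA and uriParseSingleUriA(), while the wide versions carry the suffix W, for instance UriUriW and uriParseSingleUriW().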

This tutorial only shows the usage of the narrow string editions but their wide string counterparts work in the very same way.

Autoconf Check

You can use the code below to make ./configure test for presence of uriparser 0.9.0 or later.

PKG_CHECK_MODULES([URIPARSER], [liburiparser >= 0.9.0], [], [])