Introduction

Welcome to the short uriparser integration tutorial. It is intended to answer common questions and to shed light where function prototypes alone are not enough. Please drop me a line if you need further assistance and I will see what I can do for you. Good luck with uriparser!

Parsing URIs (from string to object)

Parsing a URI with uriparser looks like this:

UriUriA uri;
const char * const uriString = "file:///home/user/song.mp3";
const char * errorPos;
 
if (uriParseSingleUriA(&uri, uriString, &errorPos) != URI_SUCCESS) {
    /* Failure (no need to call uriFreeUriMembersA) */
    ...
    return ...;
}
 
/* Success */
...
uriFreeUriMembersA(&uri);

On success, the URI object (UriUriA) holds information about the recognized parts of the given URI string. In case of URI_ERROR_SYNTAX, errorPos points to the first character of the invalid syntax.
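uriparser does not copy the parts it recognizes: each part of a UriUriA, such as uri.scheme, is a text range whose first and afterLast pointers point into the string you parsed, so that string has to stay valid while the URI object is in use. To get a part as a standalone C string, you can copy the range. The helper below is a hypothetical sketch (range_dup() is not part of uriparser), and it assumes that a part missing from the URI has a NULL first pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies the half-open range [first, afterLast)
 * into a newly allocated, NUL-terminated string that the caller must
 * free(). Returns NULL if the range is absent (assumed: first == NULL)
 * or if allocation fails. */
char * range_dup(const char * first, const char * afterLast) {
    size_t length;
    char * copy;

    if (first == NULL) {
        return NULL;                     /* part not present in the URI */
    }
    assert(afterLast >= first);          /* ranges never run backwards */
    length = (size_t)(afterLast - first);
    copy = malloc(length + 1);           /* +1 for the terminating '\0' */
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, first, length);
    copy[length] = '\0';
    return copy;
}
```

For the example above, range_dup(uri.scheme.first, uri.scheme.afterLast) would return "file"; release the copy with free() when you are done with it.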

Recomposing URIs (from object back to string)

According to RFC 3986, gluing the parts of a URI together to form a string is called recomposition. Before we can recompose a URI object, we have to know how much space the resulting string will take:

UriUriA uri;
char * uriString;
int charsRequired;
...
if (uriToStringCharsRequiredA(&uri, &charsRequired) != URI_SUCCESS) {
    /* Failure */
    ...
}
charsRequired++;

Now we can tell uriToStringA() to write the string to a given buffer:

uriString = malloc(charsRequired * sizeof(char));
if (uriString == NULL) {
    /* Failure */
    ...
}
if (uriToStringA(uriString, &uri, charsRequired, NULL) != URI_SUCCESS) {
    /* Failure */
    ...
}

Remarks: Incrementing charsRequired by 1 is necessary because uriToStringCharsRequiredA() returns the length of the string, as strlen() does, whereas uriToStringA() expects the maximum number of characters to be written, including the zero terminator.
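The same "+ 1" shows up wherever C splits measuring from writing. As an illustration only, the sketch below applies this convention to the standard snprintf(), whose measuring call returns a strlen()-style length while its writing call takes the full buffer size. make_http_uri() is a hypothetical helper, not a uriparser function.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: measure first (length without '\0', like
 * strlen()), then write with a buffer size that includes the '\0'. */
char * make_http_uri(const char * host, int port) {
    int length = snprintf(NULL, 0, "http://%s:%d/", host, port);
    char * buffer;

    if (length < 0) {
        return NULL;                               /* encoding error */
    }
    buffer = malloc((size_t)length + 1);           /* +1 for the '\0' */
    if (buffer == NULL) {
        return NULL;
    }
    snprintf(buffer, (size_t)length + 1, "http://%s:%d/", host, port);
    assert(strlen(buffer) == (size_t)length);      /* nothing truncated */
    return buffer;
}
```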

Resolving References

Reference Resolution is the process of turning a (relative) URI reference into an absolute URI by applying a base URI to it. In code it looks like this:

UriUriA absoluteDest;
UriUriA relativeSource;
UriUriA absoluteBase;
...
/* relativeSource holds "../TWO" now */
/* absoluteBase holds "file:///one/two/three" now */
if (uriAddBaseUriA(&absoluteDest, &relativeSource, &absoluteBase) != URI_SUCCESS) {
    /* Failure */
    uriFreeUriMembersA(&absoluteDest);
    ...
}
/* absoluteDest holds "file:///one/TWO" now */
...
uriFreeUriMembersA(&absoluteDest);

Remarks: uriAddBaseUriA() does not normalize the resulting URI. You will usually want to pass it through uriNormalizeSyntaxA() afterwards.
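To see what resolution does to the path, here is a from-scratch sketch of the path handling described in RFC 3986, Sections 5.2.2 to 5.2.4: merge the reference with the base path up to its last '/', then remove dot segments. It is not uriparser's implementation, resolve_path() and remove_dot_segments() are hypothetical names, and it ignores scheme, authority, query and fragment, which uriAddBaseUriA() takes care of for you.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* remove_dot_segments() following RFC 3986, Section 5.2.4.
 * Returns a newly allocated string, or NULL if allocation fails. */
static char * remove_dot_segments(const char * path) {
    size_t len = strlen(path);
    char * work = malloc(len + 1);     /* writable copy of the input */
    char * out = malloc(len + 1);      /* output never outgrows the input */
    char * in = work;
    size_t outLen = 0;

    if (work == NULL || out == NULL) {
        free(work);
        free(out);
        return NULL;
    }
    memcpy(work, path, len + 1);
    while (*in != '\0') {
        if (strncmp(in, "../", 3) == 0) {                 /* rule A */
            in += 3;
        } else if (strncmp(in, "./", 2) == 0) {           /* rule A */
            in += 2;
        } else if (strncmp(in, "/./", 3) == 0) {          /* rule B */
            in += 2;
        } else if (strcmp(in, "/.") == 0) {               /* rule B */
            in += 1;
            *in = '/';
        } else if (strncmp(in, "/../", 4) == 0 || strcmp(in, "/..") == 0) {
            /* rule C: replace the prefix with "/" ... */
            if (in[3] == '\0') {
                in += 2;
                *in = '/';
            } else {
                in += 3;
            }
            /* ... and drop the last output segment with its '/' */
            while (outLen > 0 && out[outLen - 1] != '/') {
                outLen--;
            }
            if (outLen > 0) {
                outLen--;
            }
        } else if (strcmp(in, ".") == 0 || strcmp(in, "..") == 0) {
            in += strlen(in);                             /* rule D */
        } else {                                          /* rule E */
            do {
                out[outLen++] = *in++;
            } while (*in != '\0' && *in != '/');
        }
    }
    out[outLen] = '\0';
    free(work);
    return out;
}

/* Hypothetical helper: resolves a non-empty, path-only reference against
 * a base path. A base with an authority and an empty path is a special
 * case in RFC 3986, Section 5.2.3, that this sketch does not handle. */
char * resolve_path(const char * basePath, const char * refPath) {
    const char * lastSlash = strrchr(basePath, '/');
    size_t prefixLen = (lastSlash == NULL) ? 0 : (size_t)(lastSlash - basePath) + 1;
    size_t refLen = strlen(refPath);
    char * merged;
    char * result;

    assert(refPath[0] != '\0');
    if (refPath[0] == '/') {
        return remove_dot_segments(refPath);      /* absolute-path reference */
    }
    merged = malloc(prefixLen + refLen + 1);
    if (merged == NULL) {
        return NULL;
    }
    memcpy(merged, basePath, prefixLen);              /* base "directory" */
    memcpy(merged + prefixLen, refPath, refLen + 1);  /* plus the reference */
    result = remove_dot_segments(merged);
    free(merged);
    return result;
}
```

With the inputs from the example, resolve_path("/one/two/three", "../TWO") returns "/one/TWO", matching the path of absoluteDest.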

Creating References

Reference Creation is the inverse process of Reference Resolution: a common base URI is "subtracted" from an absolute URI to make a (relative) reference. If the two URIs do not share a common base, the result will still be absolute, i.e. it will carry a scheme:

UriUriA dest;
UriUriA absoluteSource;
UriUriA absoluteBase;
...
/* absoluteSource holds "file:///one/TWO" now */
/* absoluteBase holds "file:///one/two/three" now */
if (uriRemoveBaseUriA(&dest, &absoluteSource, &absoluteBase, URI_FALSE) != URI_SUCCESS) {
    /* Failure */
    uriFreeUriMembersA(&dest);
    ...
}
/* dest holds "../TWO" now */
...
uriFreeUriMembersA(&dest);

The fourth parameter is the domain root mode. With URI_FALSE as above this will produce URIs relative to the base URI. With URI_TRUE the resulting URI will be relative to the domain root instead, e.g. "/one/TWO" in this case.
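For the simple case of two absolute paths on the same host, the "subtraction" can be sketched like this: keep the common directory prefix, emit one "../" per base directory that is left over, then append the rest of the target. This is a hypothetical helper written for illustration, not how uriRemoveBaseUriA() is implemented; it does not compare schemes or authorities, and it ignores queries, fragments and dot segments.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: builds a relative path reference leading from
 * basePath to targetPath; both must start with '/'. A first segment
 * containing ':' would need a "./" prefix (RFC 3986, Section 4.2),
 * which is not handled here. Returns a newly allocated string, or
 * NULL if allocation fails. */
char * make_relative_path(const char * basePath, const char * targetPath) {
    size_t baseDirLen;
    size_t common = 0;
    size_t ups = 0;
    size_t restLen;
    size_t i;
    char * result;
    char * p;

    assert(basePath[0] == '/' && targetPath[0] == '/');
    /* The base "directory" is everything up to and including the last '/' */
    baseDirLen = (size_t)(strrchr(basePath, '/') - basePath) + 1;

    /* Longest common prefix that ends right after a '/' */
    for (i = 0; i < baseDirLen && basePath[i] == targetPath[i]; i++) {
        if (basePath[i] == '/') {
            common = i + 1;
        }
    }
    /* Every base directory past the common prefix costs one "../" */
    for (i = common; i < baseDirLen; i++) {
        if (basePath[i] == '/') {
            ups++;
        }
    }
    restLen = strlen(targetPath + common);
    result = malloc(3 * ups + restLen + 3);   /* room for a possible "./" */
    if (result == NULL) {
        return NULL;
    }
    p = result;
    for (i = 0; i < ups; i++) {
        memcpy(p, "../", 3);
        p += 3;
    }
    if (ups == 0 && restLen == 0) {
        /* An empty reference would mean "the base itself", so use "./" */
        memcpy(p, "./", 2);
        p += 2;
    }
    memcpy(p, targetPath + common, restLen + 1);
    return result;
}
```

make_relative_path("/one/two/three", "/one/TWO") returns "../TWO", the same result as dest above. In this simplified setting, domain root mode needs no computation at all: the answer is just the target path.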

Filenames and URIs

Converting filenames to and from URIs works on strings directly, i.e. without creating a URI object.

const char * const absFilename = "E:\\Documents and Settings";
/* "file:///" prefix + up to 3 chars per input char + zero terminator */
const size_t charsRequired = 8 + 3 * strlen(absFilename) + 1;
char * absUri = malloc(charsRequired * sizeof(char));
if (absUri == NULL) {
    /* Failure */
    ...
}
if (uriWindowsFilenameToUriStringA(absFilename, absUri) != URI_SUCCESS) {
    /* Failure */
    free(absUri);
    ...
}
/* absUri is "file:///E:/Documents%20and%20Settings" now */
...
free(absUri);

Conversion works:

- for relative or absolute values,
- in both directions (filenames <–> URIs) and
- with Unix and Windows filenames.

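All you have to do is choose the right function for the task and allocate the required space (in characters) for the target buffer. Here is an overview; the bracketed terms apply to absolute filenames and URIs only, which carry the "file://" (Unix) or "file:///" (Windows) prefix: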

Filename –> URI
- uriUnixFilenameToUriStringA()
  Space required: [7 +] 3 * len(filename) + 1
- uriWindowsFilenameToUriStringA()
  Space required: [8 +] 3 * len(filename) + 1
URI –> filename
- uriUriStringToUnixFilenameA()
  Space required: len(uriString) + 1 [- 7]
- uriUriStringToWindowsFilenameA()
  Space required: len(uriString) + 1 [- 8]
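The space formulas follow from the worst case when going from filename to URI: every character may turn into a three-character %XX escape, an absolute filename gains the prefix ("file://" has 7 characters, "file:///" has 8), and one more character holds the zero terminator. The simplified converter below makes this concrete for Windows filenames. It is a hypothetical sketch, not uriparser's code; in particular, which characters it leaves unescaped and what it treats as an absolute filename are assumptions.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified Windows filename to URI converter.
 * Returns a newly allocated string, or NULL if allocation fails. */
char * windows_filename_to_uri(const char * filename) {
    /* Assumption: "X:\" or "X:/" (drive letter) marks an absolute filename */
    int absolute = isalpha((unsigned char)filename[0]) && filename[1] == ':'
            && (filename[2] == '\\' || filename[2] == '/');
    /* [8 +] 3 * len(filename) + 1, as in the overview */
    size_t capacity = (absolute ? 8 : 0) + 3 * strlen(filename) + 1;
    char * uri = malloc(capacity);
    char * out = uri;
    const char * in;

    if (uri == NULL) {
        return NULL;
    }
    if (absolute) {
        memcpy(out, "file:///", 8);
        out += 8;
    }
    for (in = filename; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;
        if (c == '\\') {
            *out++ = '/';                             /* path separator */
        } else if (isalnum(c) || strchr("-._~:/", c) != NULL) {
            *out++ = (char)c;                         /* kept as is */
        } else {
            sprintf(out, "%%%02X", (unsigned int)c);  /* worst case: 3 chars */
            out += 3;
        }
    }
    *out = '\0';
    assert((size_t)(out - uri) < capacity);
    return uri;
}
```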

Normalizing URIs

Sometimes we come across unnecessarily long URIs like "http://example.org/one/two/../../one". The algorithm we can use to shorten this URI down to "http://example.org/one" is called Syntax-Based Normalization. Note that normalizing a URI does more than just "stripping dot segments". Please have a look at Section 6.2.2 of RFC 3986 for the full description.
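Two of the steps from Section 6.2.2 work on percent-encodings alone: decoding escapes of unreserved characters, so that "%7e" becomes "~", and uppercasing the hex digits of the escapes that remain. The sketch below shows just these two steps on a plain string. It is a hypothetical illustration of the RFC rules, not uriparser's normalization code.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Value of a hex digit, or -1 if c is not one */
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* RFC 3986 "unreserved": ALPHA / DIGIT / "-" / "." / "_" / "~" */
static int is_unreserved(int c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
        || (c >= '0' && c <= '9') || c == '-' || c == '.' || c == '_' || c == '~';
}

/* Rewrites s in place: escapes of unreserved characters are decoded
 * (RFC 3986, Section 6.2.2.2) and the hex digits of all remaining
 * escapes are uppercased (Section 6.2.2.1). Malformed escapes are left
 * alone. The string can only shrink, so no allocation is needed. */
void normalize_percent_encoding(char * s) {
    const char * in = s;
    char * out = s;

    assert(s != NULL);
    while (*in != '\0') {
        int hi = (in[0] == '%') ? hex_value(in[1]) : -1;
        int lo = (hi >= 0) ? hex_value(in[2]) : -1;
        if (lo >= 0) {
            int decoded = hi * 16 + lo;
            char h1 = (char)toupper((unsigned char)in[1]);
            char h2 = (char)toupper((unsigned char)in[2]);
            if (is_unreserved(decoded)) {
                *out++ = (char)decoded;
            } else {
                *out++ = '%';
                *out++ = h1;
                *out++ = h2;
            }
            in += 3;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
}
```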

Just as we asked uriToStringCharsRequiredA() for the required space when converting a URI object back to a string, we can ask uriNormalizeSyntaxMaskRequiredA() which parts of a URI require normalization and then pass this normalization mask to uriNormalizeSyntaxExA():

const unsigned int dirtyParts = uriNormalizeSyntaxMaskRequiredA(&uri);
if (uriNormalizeSyntaxExA(&uri, dirtyParts) != URI_SUCCESS) {
    /* Failure */
    ...
}

If you don't want to normalize all parts of the URI you can pass a custom mask as well:

const unsigned int normMask = URI_NORMALIZE_SCHEME | URI_NORMALIZE_USER_INFO;
if (uriNormalizeSyntaxExA(&uri, normMask) != URI_SUCCESS) {
    /* Failure */
    ...
}

Please see UriNormalizationMaskEnum for the complete set of flags.
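Since the mask is a plain bit set, the usual bitwise operators apply: | combines flags, and & with the result of uriNormalizeSyntaxMaskRequiredA() keeps only the requested parts that actually need work. The sketch below turns such a mask into a readable list, for example for logging. The PART_* constants and describe_mask() are hypothetical stand-ins; in real code you would use the URI_NORMALIZE_* flags from UriNormalizationMaskEnum.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for a few normalization flags */
enum {
    PART_SCHEME    = 1 << 0,
    PART_USER_INFO = 1 << 1,
    PART_HOST      = 1 << 2,
    PART_PATH      = 1 << 3
};

/* Writes a comma-separated list of the parts set in mask into buf
 * (bufSize >= 1, output truncated if too small) and returns buf. */
char * describe_mask(unsigned int mask, char * buf, size_t bufSize) {
    static const struct {
        unsigned int flag;
        const char * name;
    } parts[] = {
        { PART_SCHEME, "scheme" },
        { PART_USER_INFO, "userinfo" },
        { PART_HOST, "host" },
        { PART_PATH, "path" }
    };
    size_t i;

    assert(buf != NULL && bufSize >= 1);
    buf[0] = '\0';
    for (i = 0; i < sizeof parts / sizeof parts[0]; i++) {
        if ((mask & parts[i].flag) != 0) {
            if (buf[0] != '\0') {
                strncat(buf, ",", bufSize - strlen(buf) - 1);
            }
            strncat(buf, parts[i].name, bufSize - strlen(buf) - 1);
        }
    }
    return buf;
}
```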

On the other hand, calling plain uriNormalizeSyntaxA() (without the "Ex") saves you from thinking about individual parts, as it queries uriNormalizeSyntaxMaskRequiredA() internally:

if (uriNormalizeSyntaxA(&uri) != URI_SUCCESS) {
    /* Failure */
    ...
}

Working with Query Strings

RFC 3986 itself does not treat the query part of a URI as a list of key/value pairs, but HTML 2.0 does: section 8.2.1 of RFC 1866 defines the media type application/x-www-form-urlencoded. uriparser allows you to dissect (or parse) a query string into unescaped key/value pairs and back.

To dissect the query part of a just-parsed URI you could write code like this:

UriUriA uri;
UriQueryListA * queryList;
int itemCount;
...
if (uriDissectQueryMallocA(&queryList, &itemCount, uri.query.first,
        uri.query.afterLast) != URI_SUCCESS) {
    /* Failure */
    ...
}
...
uriFreeQueryListA(queryList);

Remarks:

- NULL in the value member means there was no '=' in the item text, as with "?abc&def".
- An empty string in the value member means there was an '=' in the item, as with "?abc=&def".
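To make the NULL versus empty-string distinction concrete, here is a self-contained sketch of form-urlencoded dissection: split at '&', split each item at its first '=', turn '+' into a space and decode %XX escapes. QueryItem, dissect_query() and free_query_items() are hypothetical names loosely modeled on UriQueryListA, uriDissectQueryMallocA() and uriFreeQueryListA(); they are not uriparser's implementation, and details such as skipping empty items are choices of this sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical item type, loosely modeled on UriQueryListA */
typedef struct QueryItem {
    char * key;
    char * value;                 /* NULL if the item had no '=' */
    struct QueryItem * next;
} QueryItem;

static int hex_digit(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Copies [first, afterLast) into a new string, turning '+' into a
 * space and decoding %XX escapes. */
static char * decode_range(const char * first, const char * afterLast) {
    char * result = malloc((size_t)(afterLast - first) + 1);
    char * out = result;
    if (result == NULL) return NULL;
    while (first < afterLast) {
        int hi = (*first == '%' && afterLast - first >= 3) ? hex_digit(first[1]) : -1;
        int lo = (hi >= 0) ? hex_digit(first[2]) : -1;
        if (*first == '+') {
            *out++ = ' ';
            first++;
        } else if (lo >= 0) {
            *out++ = (char)(hi * 16 + lo);
            first += 3;
        } else {
            *out++ = *first++;
        }
    }
    *out = '\0';
    return result;
}

void free_query_items(QueryItem * item) {
    while (item != NULL) {
        QueryItem * next = item->next;
        free(item->key);
        free(item->value);
        free(item);
        item = next;
    }
}

/* Splits a query (without the leading '?') at '&', then each item at
 * its first '='. Empty items, as in "a&&b", are skipped. Returns NULL
 * for an empty query or on allocation failure. */
QueryItem * dissect_query(const char * query) {
    QueryItem * head = NULL;
    QueryItem ** tail = &head;
    const char * start = query;

    assert(query != NULL);
    for (;;) {
        const char * end = strchr(start, '&');
        if (end == NULL) {
            end = start + strlen(start);
        }
        if (end > start) {
            const char * eq = memchr(start, '=', (size_t)(end - start));
            QueryItem * item = malloc(sizeof *item);
            if (item == NULL) {
                free_query_items(head);
                return NULL;
            }
            item->next = NULL;
            *tail = item;                 /* append, keeping the order */
            tail = &item->next;
            item->key = decode_range(start, eq != NULL ? eq : end);
            item->value = (eq != NULL) ? decode_range(eq + 1, end) : NULL;
            if (item->key == NULL || (eq != NULL && item->value == NULL)) {
                free_query_items(head);
                return NULL;
            }
        }
        if (*end == '\0') {
            break;
        }
        start = end + 1;
    }
    return head;
}
```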

To compose a query string from a query list you could write code like this:

int charsRequired;
int charsWritten;
char * queryString;
...
if (uriComposeQueryCharsRequiredA(queryList, &charsRequired) != URI_SUCCESS) {
    /* Failure */
    ...
}
queryString = malloc((charsRequired + 1) * sizeof(char));
if (queryString == NULL) {
    /* Failure */
    ...
}
if (uriComposeQueryA(queryString, queryList, charsRequired + 1, &charsWritten) != URI_SUCCESS) {
    /* Failure */
    ...
}
...
free(queryString);
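The measure-then-write pair above can be pictured as one routine run twice: once without a buffer to count characters, once to write them. The sketch below composes a query that way. QueryPair and compose_query() are hypothetical, and the encoding rules (unreserved characters unchanged, space as '+', everything else as %XX) are assumptions of this sketch that may differ in detail from uriComposeQueryA().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical key/value pair; value NULL means "emit the key only" */
typedef struct {
    const char * key;
    const char * value;
} QueryPair;

/* Encodes src into dest, or only measures when dest is NULL.
 * Returns the encoded length (without a terminator). */
static size_t encode_component(const char * src, char * dest) {
    size_t n = 0;
    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;
        if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')
                || c == '-' || c == '.' || c == '_' || c == '~') {
            if (dest != NULL) dest[n] = (char)c;
            n += 1;
        } else if (c == ' ') {
            if (dest != NULL) dest[n] = '+';
            n += 1;
        } else {
            if (dest != NULL) sprintf(dest + n, "%%%02X", (unsigned int)c);
            n += 3;
        }
    }
    return n;
}

/* dest == NULL only measures (compare uriComposeQueryCharsRequiredA()),
 * otherwise it writes (compare uriComposeQueryA()). */
static size_t compose_pass(const QueryPair * pairs, size_t count, char * dest) {
    size_t n = 0;
    size_t i;
    for (i = 0; i < count; i++) {
        size_t k;
        if (i > 0) {
            if (dest != NULL) dest[n] = '&';
            n++;
        }
        k = encode_component(pairs[i].key, dest != NULL ? dest + n : NULL);
        n += k;
        if (pairs[i].value != NULL) {
            if (dest != NULL) dest[n] = '=';
            n++;
            k = encode_component(pairs[i].value, dest != NULL ? dest + n : NULL);
            n += k;
        }
    }
    if (dest != NULL) dest[n] = '\0';
    return n;
}

char * compose_query(const QueryPair * pairs, size_t count) {
    size_t charsRequired = compose_pass(pairs, count, NULL);
    char * query = malloc(charsRequired + 1);      /* +1 for the '\0' */
    if (query == NULL) {
        return NULL;
    }
    compose_pass(pairs, count, query);
    assert(strlen(query) == charsRequired);
    return query;
}
```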

Narrow Strings and Wide Strings

uriparser comes with two versions of every structure and function: one handling narrow strings (char *) and one working with wide strings (wchar_t *), for instance:

uriParseSingleUriA() for char *
uriParseSingleUriW() for wchar_t *.

This tutorial only shows the usage of the narrow string editions but their wide string counterparts work in the very same way.
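URIs as defined by RFC 3986 consist of ASCII characters only, so a narrow URI string you already have can be widened character by character before you hand it to a wide string function such as uriParseSingleUriW(). The helper below is a hypothetical sketch, assuming an ASCII-compatible platform where each ASCII character has the same code as char and as wchar_t, which holds on mainstream systems; for text in other encodings, a locale-aware conversion such as mbstowcs() is the appropriate tool.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: widens an ASCII-only string, e.g. a URI, so it
 * can be passed to the wide string (...W) functions. Returns NULL if
 * allocation fails; the caller frees the result. */
wchar_t * ascii_to_wide(const char * ascii) {
    size_t len = strlen(ascii);
    wchar_t * wide = malloc((len + 1) * sizeof(wchar_t));
    size_t i;

    if (wide == NULL) {
        return NULL;
    }
    for (i = 0; i <= len; i++) {                 /* <= also copies the '\0' */
        assert((unsigned char)ascii[i] < 0x80);  /* ASCII input only */
        wide[i] = (wchar_t)(unsigned char)ascii[i];
    }
    return wide;
}
```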

Autoconf Check

You can use the code below to make ./configure test for the presence of uriparser 0.9.0 or later.

PKG_CHECK_MODULES([URIPARSER], [liburiparser >= 0.9.0], [], [])
