uriparser  0.9.4
About: uriparser is a strictly RFC 3986 compliant URI parsing library (with Unicode support).
  Fossies Dox: uriparser-0.9.4.tar.xz  ("unofficial" and yet experimental doxygen-generated source code documentation)  

uriparser Documentation

Introduction

  • Welcome to the short uriparser integration tutorial.
  • It is intended to answer upcoming questions and to shed light
  • where function prototypes alone are not enough.
  • Please drop me a line if you need further assistance and I will
  • see what I can do for you. Good luck with uriparser!

Parsing URIs (from string to object)

  • Parsing a URI with uriparser looks like this:
      UriUriA uri;
      const char * const uriString = "file:///home/user/song.mp3";
      const char * errorPos;

      if (uriParseSingleUriA(&uri, uriString, &errorPos) != URI_SUCCESS) {
          /* Failure (no need to call uriFreeUriMembersA) */
          ...
          return ...;
      }

      /* Success */
      ...
      uriFreeUriMembersA(&uri);
  • The URI object (::UriUriA) holds information about the recognized
  • parts of the given URI string. If parsing fails with URI_ERROR_SYNTAX,
  • errorPos points to the first character of the invalid syntax.

Recomposing URIs (from object back to string)

  • According to RFC 3986
  • gluing parts of a URI together to form a string is called recomposition.
  • Before we can recompose a URI object we have to know how much
  • space the resulting string will take:
      UriUriA uri;
      char * uriString;
      int charsRequired;
      ...
      if (uriToStringCharsRequiredA(&uri, &charsRequired) != URI_SUCCESS) {
          /* Failure */
          ...
      }
      charsRequired++;
  • Now we can tell uriToStringA() to write the string to a given buffer:
      uriString = malloc(charsRequired * sizeof(char));
      if (uriString == NULL) {
          /* Failure */
          ...
      }
      if (uriToStringA(uriString, &uri, charsRequired, NULL) != URI_SUCCESS) {
          /* Failure */
          ...
      }
  • Remarks
  • Incrementing charsRequired by 1 is required since
  • uriToStringCharsRequiredA() reports the length of the string
  • as strlen() does, but uriToStringA() works with the maximum number
  • of characters to be written, including the
  • zero-terminator.
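  • The same measure-then-write pattern can be tried without uriparser. The sketch below is
  • only an analogy: it uses the standard snprintf() as a stand-in for the two uriparser calls,
  • and join_scheme_and_host() is a hypothetical helper that is not part of the library.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: builds "scheme://host" in a freshly allocated
 * string. Returns NULL on failure; the caller frees the result. */
char *join_scheme_and_host(const char *scheme, const char *host) {
    int charsRequired;
    size_t bufferSize;
    char *out;

    assert(scheme != NULL && host != NULL);

    /* Pass 1: with a NULL buffer and size 0, snprintf() only measures.
     * Like strlen(), the result does not count the zero-terminator. */
    charsRequired = snprintf(NULL, 0, "%s://%s", scheme, host);
    if (charsRequired < 0) {
        return NULL;
    }

    /* Pass 2: the size handed to the writer includes the terminator,
     * hence the "+ 1". */
    bufferSize = (size_t)charsRequired + 1;
    out = malloc(bufferSize);
    if (out == NULL) {
        return NULL;
    }
    snprintf(out, bufferSize, "%s://%s", scheme, host);

    /* The measured length matches what strlen() sees afterwards. */
    assert(strlen(out) == (size_t)charsRequired);
    return out;
}
```

  • Calling join_scheme_and_host("http", "example.org") measures 18 characters and
  • allocates 19, which mirrors the charsRequired++ step above.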

Resolving References

  • Reference Resolution
  • is the process of turning a (relative) URI reference into an absolute URI by applying a base
  • URI to it. In code it looks like this:
      UriUriA absoluteDest;
      UriUriA relativeSource;
      UriUriA absoluteBase;
      ...
      /* relativeSource holds "../TWO" now */
      /* absoluteBase holds "file:///one/two/three" now */
      if (uriAddBaseUriA(&absoluteDest, &relativeSource, &absoluteBase) != URI_SUCCESS) {
          /* Failure */
          uriFreeUriMembersA(&absoluteDest);
          ...
      }
      /* absoluteDest holds "file:///one/TWO" now */
      ...
      uriFreeUriMembersA(&absoluteDest);
  • Remarks
  • uriAddBaseUriA() does not normalize the resulting URI.
  • You will usually want to pass the result through uriNormalizeSyntaxA() afterwards.
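  • To see what happens to the path during resolution, here is a minimal sketch of the "merge" and
  • "remove_dot_segments" steps from sections 5.2.3 and 5.2.4 of RFC 3986, working on plain path
  • strings. It is not uriparser's implementation: resolve_path() and remove_dot_segments() are
  • hypothetical names, basePath is assumed to be a non-empty absolute path, and refPath is
  • assumed to be a path-only reference (no scheme, authority or query).

```c
#include <assert.h>
#include <string.h>

/* Removes "." and ".." segments in place (RFC 3986, section 5.2.4). */
void remove_dot_segments(char *path) {
    const char *in = path;  /* unread input */
    char *out = path;       /* end of the output written so far */

    assert(path != NULL);
    while (*in != '\0') {
        if (strncmp(in, "../", 3) == 0) {
            in += 3;                        /* 2A: drop leading "../" */
        } else if (strncmp(in, "./", 2) == 0) {
            in += 2;                        /* 2A: drop leading "./" */
        } else if (strncmp(in, "/./", 3) == 0) {
            in += 2;                        /* 2B: "/./" becomes "/" */
        } else if (strcmp(in, "/.") == 0) {
            *out++ = '/';                   /* 2B: trailing "/." becomes "/" */
            break;
        } else if (strncmp(in, "/../", 4) == 0 || strcmp(in, "/..") == 0) {
            const int atEnd = (in[3] == '\0');
            /* 2C: remove the last output segment and its preceding "/" */
            while (out > path) {
                out--;
                if (*out == '/') {
                    break;
                }
            }
            if (atEnd) {
                *out++ = '/';
                break;
            }
            in += 3;                        /* keep the "/" that follows ".." */
        } else if (strcmp(in, ".") == 0 || strcmp(in, "..") == 0) {
            break;                          /* 2D: lone "." or ".." vanishes */
        } else {
            /* 2E: move one segment (with its leading "/", if any) to output */
            if (*in == '/') {
                *out++ = *in++;
            }
            while (*in != '\0' && *in != '/') {
                *out++ = *in++;
            }
        }
    }
    *out = '\0';
}

/* Merges refPath onto basePath and removes dot segments. dest must hold
 * at least strlen(basePath) + strlen(refPath) + 1 characters. */
void resolve_path(char *dest, const char *basePath, const char *refPath) {
    const char *lastSlash;
    size_t dirLen = 0;

    assert(dest != NULL && basePath != NULL && refPath != NULL);
    if (refPath[0] != '/') {
        /* Keep everything of the base up to and including its last "/". */
        lastSlash = strrchr(basePath, '/');
        if (lastSlash != NULL) {
            dirLen = (size_t)(lastSlash - basePath) + 1;
        }
    }
    memcpy(dest, basePath, dirLen);
    strcpy(dest + dirLen, refPath);
    remove_dot_segments(dest);
}
```

  • With basePath "/one/two/three" and refPath "../TWO" the merge step yields "/one/two/../TWO",
  • and removing the dot segments leaves "/one/TWO", the path seen in absoluteDest above.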

Creating References

  • Reference Creation is the inverse process of Reference Resolution: A common base URI
  • is "subtracted" from an absolute URI to make a (relative) reference.
  • If the two URIs share no common base, the resulting URI will still be absolute, i.e. will
  • carry a scheme:
      UriUriA dest;
      UriUriA absoluteSource;
      UriUriA absoluteBase;
      ...
      /* absoluteSource holds "file:///one/TWO" now */
      /* absoluteBase holds "file:///one/two/three" now */
      if (uriRemoveBaseUriA(&dest, &absoluteSource, &absoluteBase, URI_FALSE) != URI_SUCCESS) {
          /* Failure */
          uriFreeUriMembersA(&dest);
          ...
      }
      /* dest holds "../TWO" now */
      ...
      uriFreeUriMembersA(&dest);
  • The fourth parameter is the domain root mode. With URI_FALSE as above this will produce
  • URIs relative to the base URI. With URI_TRUE the resulting URI will be relative to the
  • domain root instead, e.g. "/one/TWO" in this case.
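  • The "subtraction" can be illustrated on plain path strings. The following is a simplified
  • sketch and not the algorithm uriparser uses: make_relative_path() is a hypothetical helper,
  • both paths are assumed to be absolute and free of dot segments, and corner cases such as an
  • empty result are ignored.

```c
#include <assert.h>
#include <string.h>

/* Writes a reference to targetPath, relative to basePath, into dest.
 * dest needs room for 3 * (number of '/' in basePath)
 * + strlen(targetPath) + 1 characters. */
void make_relative_path(char *dest, const char *basePath, const char *targetPath) {
    size_t common = 0;  /* length of the shared directory prefix */
    size_t i;

    assert(dest != NULL && basePath != NULL && targetPath != NULL);
    assert(basePath[0] == '/' && targetPath[0] == '/');

    /* The shared prefix ends right after the last '/' both paths agree on. */
    for (i = 0; basePath[i] != '\0' && basePath[i] == targetPath[i]; i++) {
        if (basePath[i] == '/') {
            common = i + 1;
        }
    }

    /* One "../" for every directory that remains in the base path. */
    dest[0] = '\0';
    for (i = common; basePath[i] != '\0'; i++) {
        if (basePath[i] == '/') {
            strcat(dest, "../");
        }
    }

    /* The rest of the target path completes the reference. */
    strcat(dest, targetPath + common);
}
```

  • For basePath "/one/two/three" and targetPath "/one/TWO" the shared prefix is "/one/", one
  • directory ("two/") remains in the base, and the result is "../TWO".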

Filenames and URIs

  • Converting filenames to and from URIs works on strings directly,
  • i.e. without creating a URI object:
      const char * const absFilename = "E:\\Documents and Settings";
      const int bytesNeeded = 8 + 3 * strlen(absFilename) + 1;
      char * absUri = malloc(bytesNeeded * sizeof(char));
      if (absUri == NULL) {
          /* Failure */
          ...
      }
      if (uriWindowsFilenameToUriStringA(absFilename, absUri) != URI_SUCCESS) {
          /* Failure */
          free(absUri);
          ...
      }
      /* absUri is "file:///E:/Documents%20and%20Settings" now */
      ...
      free(absUri);
  • Conversion works
  • - for relative or absolute values,
  • - in both directions (filenames <-> URIs) and
  • - with Unix and Windows filenames.
  • All you have to do is choose the right function for the task and allocate
  • the required space (in characters) for the target buffer.
  • Here is an overview:
  • - Filename -> URI
  •   - uriUnixFilenameToUriStringA()
  •     Space required: [7 +] 3 * len(filename) + 1
  •   - uriWindowsFilenameToUriStringA()
  •     Space required: [8 +] 3 * len(filename) + 1
  • - URI -> filename
  •   - uriUriStringToUnixFilenameA()
  •     Space required: len(uriString) + 1 [- 7]
  •   - uriUriStringToWindowsFilenameA()
  •     Space required: len(uriString) + 1 [- 8]
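  • The filename-to-URI formulas can be wrapped in small helpers. The sketch below uses
  • hypothetical function names that are not part of uriparser, and it always includes the
  • bracketed prefix term, i.e. it reserves the worst case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Worst-case number of characters, including the zero-terminator, for
 * the URI string made from a Unix filename: every character may expand
 * to a three-character "%XX" escape, and "file://" (7 characters) may
 * be prepended. */
size_t unix_filename_uri_chars(const char *filename) {
    assert(filename != NULL);
    return (sizeof("file://") - 1) + 3 * strlen(filename) + 1;
}

/* Same for a Windows filename, where the prefix is "file:///"
 * (8 characters). */
size_t windows_filename_uri_chars(const char *filename) {
    assert(filename != NULL);
    return (sizeof("file:///") - 1) + 3 * strlen(filename) + 1;
}
```

  • For "E:\\Documents and Settings" (25 characters) the Windows helper yields
  • 8 + 3 * 25 + 1 = 84, the same value as bytesNeeded in the example above.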

Normalizing URIs

  • Sometimes we come across unnecessarily long URIs like "http://example.org/one/two/../../one".
  • The algorithm we can use to shorten this URI down to "http://example.org/one" is called
  • Syntax-Based Normalization.
  • Note that normalizing a URI does more than just "stripping dot segments". Please have a look at
  • Section 6.2.2 of RFC 3986
  • for the full description.
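  • One of the other steps listed there is case normalization: the hexadecimal digits of a
  • percent-encoding are written in uppercase (section 6.2.2.1). A minimal standalone sketch of
  • just that step follows; uppercase_percent_encodings() is a hypothetical helper and not a
  • uriparser function.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Uppercases the two hex digits of every "%xx" triplet in place.
 * A '%' that is not followed by two hex digits is left alone. */
void uppercase_percent_encodings(char *text) {
    char *p = text;

    assert(text != NULL);
    while ((p = strchr(p, '%')) != NULL) {
        /* The && short-circuits, so p[2] is never read past the end. */
        if (isxdigit((unsigned char)p[1]) && isxdigit((unsigned char)p[2])) {
            p[1] = (char)toupper((unsigned char)p[1]);
            p[2] = (char)toupper((unsigned char)p[2]);
            p += 3;
        } else {
            p++;
        }
    }
}
```

  • Applied to "http://example.org/a%2fb" this gives "http://example.org/a%2Fb"; nothing outside
  • the percent-encoded triplets is touched.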
  • Just as we asked uriToStringCharsRequiredA() for the required space when converting
  • a URI object back to a string, we can ask uriNormalizeSyntaxMaskRequiredA() for
  • the parts of a URI that require normalization and then pass this normalization
  • mask to uriNormalizeSyntaxExA():
      const unsigned int dirtyParts = uriNormalizeSyntaxMaskRequiredA(&uri);
      if (uriNormalizeSyntaxExA(&uri, dirtyParts) != URI_SUCCESS) {
          /* Failure */
          ...
      }
  • If you don't want to normalize all parts of the URI you can pass a custom
  • mask as well:
      const unsigned int normMask = URI_NORMALIZE_SCHEME | URI_NORMALIZE_USER_INFO;
      if (uriNormalizeSyntaxExA(&uri, normMask) != URI_SUCCESS) {
          /* Failure */
          ...
      }
  • Please see UriNormalizationMaskEnum for the complete set of flags.
  • On the other hand calling plain uriNormalizeSyntaxA() (without the "Ex")
  • saves you thinking about single parts, as it queries uriNormalizeSyntaxMaskRequiredA()
  • internally:
      if (uriNormalizeSyntaxA(&uri) != URI_SUCCESS) {
          /* Failure */
          ...
      }

Working with query strings

  • RFC 3986
  • itself does not understand the query part of a URI as a list of key/value pairs.
  • But HTML 2.0 does, and defines the media type application/x-www-form-urlencoded
  • in section 8.2.1
  • of RFC 1866.
  • uriparser allows you to dissect (or parse) a query string into unescaped key/value pairs
  • and back.
  • To dissect the query part of a just-parsed URI you could write code like this:
      UriUriA uri;
      UriQueryListA * queryList;
      int itemCount;
      ...
      if (uriDissectQueryMallocA(&queryList, &itemCount, uri.query.first,
              uri.query.afterLast) != URI_SUCCESS) {
          /* Failure */
          ...
      }
      ...
      uriFreeQueryListA(queryList);
  • Remarks
  • - NULL in the value member means there was no '=' in the item text as with "?abc&def".
  • - An empty string in the value member means there was '=' in the item as with "?abc=&def".
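  • The difference between a NULL value and an empty value can be reproduced with a few lines of
  • plain C. The sketch below is not how uriparser is implemented: split_query_item() is a
  • hypothetical helper, it splits at the first '=' by its own choice, and it does no unescaping.

```c
#include <assert.h>
#include <string.h>

/* Splits one query item such as "key=value" in place.
 * *value is NULL when the item holds no '=' at all and points to an
 * empty string when nothing follows the '='. */
void split_query_item(char *item, const char **key, const char **value) {
    char *equals;

    assert(item != NULL && key != NULL && value != NULL);
    equals = strchr(item, '=');
    *key = item;
    if (equals == NULL) {
        *value = NULL;        /* as with "abc" in "?abc&def" */
    } else {
        *equals = '\0';       /* terminate the key */
        *value = equals + 1;  /* "" for "abc=" in "?abc=&def" */
    }
}
```

  • Code that walks a query list therefore has to test the value for NULL before treating it as
  • a string.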
  • To compose a query string from a query list you could write code like this:
      int charsRequired;
      int charsWritten;
      char * queryString;
      ...
      if (uriComposeQueryCharsRequiredA(queryList, &charsRequired) != URI_SUCCESS) {
          /* Failure */
          ...
      }
      queryString = malloc((charsRequired + 1) * sizeof(char));
      if (queryString == NULL) {
          /* Failure */
          ...
      }
      if (uriComposeQueryA(queryString, queryList, charsRequired + 1, &charsWritten) != URI_SUCCESS) {
          /* Failure */
          ...
      }
      ...
      free(queryString);

Ansi and Unicode

  • uriparser comes with two versions of every structure and function:
  • one handling Ansi text (char *) and one working with Unicode text (wchar_t *),
  • for instance
  • - uriParseSingleUriA() for Ansi and
  • - uriParseSingleUriW() for Unicode.
  • This tutorial only shows the usage of the Ansi editions but
  • their Unicode counterparts work in the very same way.

Autoconf Check

  • You can use the code below to make ./configure test for the presence
  • of uriparser 0.9.0 or later.
      URIPARSER_MISSING="Please install uriparser 0.9.0 or later.
         On a Debian-based system enter 'sudo apt-get install liburiparser-dev'."
      AC_CHECK_LIB(uriparser, uriParseSingleUriA,, AC_MSG_ERROR(${URIPARSER_MISSING}))
      AC_CHECK_HEADER(uriparser/Uri.h,, AC_MSG_ERROR(${URIPARSER_MISSING}))

      URIPARSER_TOO_OLD="uriparser 0.9.0 or later is required, your copy is too old."
      AC_COMPILE_IFELSE([
      #include <uriparser/Uri.h>
      #if (defined(URI_VER_MAJOR) && defined(URI_VER_MINOR) && defined(URI_VER_RELEASE) \
      && ((URI_VER_MAJOR > 0) \
      || ((URI_VER_MAJOR == 0) && (URI_VER_MINOR > 9)) \
      || ((URI_VER_MAJOR == 0) && (URI_VER_MINOR == 9) && (URI_VER_RELEASE >= 0)) \
      ))
      /* FINE */
      #else
      # error uriparser not recent enough
      #endif
      ],,AC_MSG_ERROR(${URIPARSER_TOO_OLD}))
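
  • The three-clause preprocessor condition is an ordinary "version at least" comparison. Written
  • as a C function it may be easier to follow; version_at_least() is a hypothetical helper for
  • illustration only and is not provided by uriparser.

```c
#include <assert.h>

/* Returns non-zero if major.minor.release is the wanted version or newer.
 * The three clauses mirror the ones in the #if above. */
int version_at_least(int major, int minor, int release,
                     int wantMajor, int wantMinor, int wantRelease) {
    assert(major >= 0 && minor >= 0 && release >= 0);
    return (major > wantMajor)
        || ((major == wantMajor) && (minor > wantMinor))
        || ((major == wantMajor) && (minor == wantMinor)
            && (release >= wantRelease));
}
```

  • With 0.9.0 as the wanted version, 0.9.4 and 1.0.0 pass while 0.8.6 does not.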