Class AnyURI

java.lang.Object
com.aoapps.net.AnyURI
Direct Known Subclasses:
IRI, URI

public class AnyURI extends Object
Extremely minimal representation of an RFC 3986 URI or RFC 3987 IRI, optimized for altering the path, query, or fragment for URI rewriting.

This only deals with four parts of the URI:

  1. scheme - everything before the first ':' (exclusive).
  2. hier-part - everything after the scheme and before the first '?' or '#' (exclusive). This may include host, port, path, and such, which this class is not concerned with.
  3. query - everything after the first '?' (exclusive) and before the fragment '#' (exclusive)
  4. fragment - everything after the first '#' (exclusive)
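
The four-part split described above can be sketched with plain string operations. This is only an illustration, not the actual implementation of this class: UriPartsSketch is a hypothetical name, and the sketch assumes that a ':' ends a scheme only when it appears at index 1 or later and before the first '/', '?', or '#', and that the hier-part excludes that ':'.

```java
/** Hypothetical helper for illustration; not part of the documented API. */
final class UriPartsSketch {
  final String scheme;    // null when there is no scheme
  final String hierPart;  // never null, may be empty
  final String query;     // null when there is no query
  final String fragment;  // null when there is no fragment

  UriPartsSketch(String uri) {
    // The fragment begins at the first '#'.
    int fragmentIndex = uri.indexOf('#');
    int beforeFragment = (fragmentIndex == -1) ? uri.length() : fragmentIndex;

    // A '?' only marks a query when it precedes the fragment.
    int queryIndex = uri.indexOf('?');
    if (queryIndex > beforeFragment) {
      queryIndex = -1;
    }
    int pathEnd = (queryIndex == -1) ? beforeFragment : queryIndex;

    // Assumption: the scheme colon must come before any '/', '?', or '#',
    // and a leading ':' does not produce an empty scheme.
    int colon = uri.indexOf(':');
    int slash = uri.indexOf('/');
    boolean hasScheme = colon > 0 && colon < pathEnd && (slash == -1 || colon < slash);

    scheme = hasScheme ? uri.substring(0, colon) : null;
    hierPart = uri.substring(hasScheme ? colon + 1 : 0, pathEnd);
    query = (queryIndex == -1) ? null : uri.substring(queryIndex + 1, beforeFragment);
    fragment = (fragmentIndex == -1) ? null : uri.substring(fragmentIndex + 1);
  }
}
```

For "https://example.com/a/b?x=1&y=2#top" the sketch yields the scheme "https", the hier-part "//example.com/a/b", the query "x=1&y=2", and the fragment "top".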

This class specifically:

  1. Does not do significant amounts of normalization
  2. Does not support any relative path resolution
  3. Does not do any scheme-specific validation
  4. Does not thoroughly detect malformed URIs

Instances of this class are immutable and thus thread-safe. Mutating operations return a new instance.

When a strict ASCII-only representation of an RFC 3986 URI is required, use URI. When a Unicode representation of an RFC 3987 IRI is preferred, use IRI. Otherwise, to support both, use AnyURI, which should also perform the best since it performs fewer conversions.


Encoding and decoding are always done in UTF-8. This choice is supported by IRIStatus - Query encoding, and is consistent with URI.

This simplification allows us to no longer pass encoding around and no longer throw any UnsupportedEncodingException as UTF-8 is a standard character set.

We do not support the use of any encoding other than UTF-8, which allows us to avoid all the gray zones of the various protocol specifications, versions, and implementations.
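
As an illustration of UTF-8-only percent-encoding, the following sketch encodes a single component. Utf8PercentEncodeSketch is a hypothetical name, and the choice of which characters to leave unencoded (the RFC 3986 unreserved set) is an assumption; this page does not state which characters the library itself preserves.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration; not part of the documented API. */
final class Utf8PercentEncodeSketch {
  // Assumption: only the RFC 3986 "unreserved" characters are left as-is.
  private static final String UNRESERVED =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~";
  private static final char[] HEX = "0123456789ABCDEF".toCharArray();

  static String encode(String value) {
    StringBuilder sb = new StringBuilder();
    // StandardCharsets.UTF_8 is always available, so no
    // UnsupportedEncodingException needs to be declared or handled.
    for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
      int unsigned = b & 0xFF;
      if (unsigned < 0x80 && UNRESERVED.indexOf((char) unsigned) != -1) {
        sb.append((char) unsigned);
      } else {
        // Uppercase hexadecimal digits for each encoded byte.
        sb.append('%').append(HEX[unsigned >> 4]).append(HEX[unsigned & 0x0F]);
      }
    }
    return sb.toString();
  }
}
```

With this sketch, a space becomes "%20" and a non-ASCII character such as U+00E9 becomes the two encoded UTF-8 bytes "%C3%A9".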


TODO: These methods are for highest performance and are consistent with the JavaScript methods. They are not meant for general purpose URL manipulation, and are not trying to replace any full-featured URI tools.

Consider the following if needing more than what this provides (in no particular order):

  1. URL
  2. URI
  3. URIBuilder
  4. UriUtils
  5. UrlEscapers
  6. jena-iri
  7. org.xbib:net-url

Further reading:

  1. IRIStatus - Query encoding:
    Update 2015-08-25: The URL spec defines this formally. By default the query string uses UTF-8. X-Form's (defined by HTML) allows the page author to supply the override legacy character encoding if needed (UTF-8 is encouraged). If an override is used, there may be nothing in the URL itself that indicates what this override encoding is: the receiver just has to know.
  2. RFC 3987: 6.4. Use of UTF-8 for Encoding Original Characters:
    Similar considerations apply to query parts. The functionality of IRIs (namely, to be able to include non-ASCII characters) can only be used if the query part is encoded in UTF-8.
  3. HTML 5: 2.6.1 Terminology:
    The URL is a valid IRI reference and its query component contains no unescaped non-ASCII characters. [RFC3987]
    The URL is a valid IRI reference and the character encoding of the URL's Document is UTF-8 or UTF-16. [RFC3987]
  4. HTML 5: 2.6.3 Resolving URLs:
    Let encoding be determined as follows:
    If the URL came from a DOM node (e.g. from an element)
    The node has a Document, and the URL character encoding is the document's character encoding.
Author:
AO Industries, Inc.
  • Constructor Details

    • AnyURI

      public AnyURI(String uri)
  • Method Details

    • toString

      public String toString()
      Gets the full URI.

      This may be a mixture of RFC 3986 URI US-ASCII and RFC 3987 IRI Unicode formats.

      This might not be percent-encoding normalized. Use toIRI().toString() or toIRI().toURI().toString() if consistent formatting is required.

      Overrides:
      toString in class Object
    • toASCIIString

      public String toASCIIString()
      Gets the full URI in RFC 3986 URI US-ASCII format.

      This might not be percent-encoding normalized. Use toIRI().toASCIIString() if consistent formatting is required.

    • equals

      public final boolean equals(Object obj)
      Compares the URI directly. No encoding or decoding is performed. This does not compare URIs semantically.
      Overrides:
      equals in class Object
    • hashCode

      public final int hashCode()
      The hash code is the same as the hash code of the uri.
      Overrides:
      hashCode in class Object
    • getSchemeLength

      public int getSchemeLength()
      Gets the length of the scheme or -1 when there is no scheme. This is also the index of the colon (':') that ends the scheme.
      Returns:
      the index of the ':' marking the end of the scheme or -1 when there is no scheme.
    • hasScheme

      public boolean hasScheme()
      Checks if this has a scheme.
    • isScheme

      public boolean isScheme(String scheme) throws IllegalArgumentException
      Checks if a URI starts with the given scheme.
      Parameters:
      scheme - The scheme to look for, not including colon. For example "http". When null, will match a URI without a scheme.
      Throws:
      IllegalArgumentException - when scheme is determined to be invalid. Please note that this determination is not guaranteed as shortcuts may skip individual character comparisons.
    • getScheme

      public String getScheme()
      Gets the scheme for a URI, or null when there is no scheme. An empty scheme will never be returned (if the URI starts with ':').

      This method may involve string manipulation; favor the writeScheme(…) and appendScheme(…) methods when appropriate.

      Returns:
      The scheme, not including colon, or null when there is no scheme. For example "http".
    • writeScheme

      public void writeScheme(Writer out) throws IOException
      Writes the scheme (not including the ':').
      Throws:
      IOException
    • writeScheme

      public void writeScheme(Writer out, Encoder encoder) throws IOException
      Writes the scheme (not including the ':').
      Throws:
      IOException
    • appendScheme

      public AnyURI appendScheme(Appendable out) throws IOException
      Appends the scheme (not including the ':').
      Returns:
      this
      Throws:
      IOException
    • appendScheme

      public AnyURI appendScheme(Encoder encoder, Appendable out) throws IOException
      Appends the scheme (not including the ':').
      Returns:
      this
      Throws:
      IOException
    • appendScheme

      public StringBuilder appendScheme(StringBuilder sb)
      Appends the scheme (not including the ':').
      Returns:
      The StringBuilder sb
    • appendScheme

      public StringBuffer appendScheme(StringBuffer sb)
      Appends the scheme (not including the ':').
      Returns:
      The StringBuffer sb
    • getPathEnd

      public int getPathEnd()
      Gets the path end within this URI.
      Returns:
      the index of the first '?' or '#' (exclusive), or the length of the URI when neither found.
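
The return value described above can be sketched as a single scan for the first '?' or '#'. PathEndSketch and its methods are hypothetical names used only for illustration; the second method shows how such an index could support a suffix check in the spirit of pathEndsWith(String) without creating a substring.

```java
/** Hypothetical helper for illustration; not part of the documented API. */
final class PathEndSketch {
  /** Index of the first '?' or '#', or the length of the URI when neither is found. */
  static int pathEnd(String uri) {
    int len = uri.length();
    for (int i = 0; i < len; i++) {
      char ch = uri.charAt(i);
      if (ch == '?' || ch == '#') {
        return i;
      }
    }
    return len;
  }

  /** Checks the suffix against the region that ends at the path end. */
  static boolean pathEndsWith(String uri, String suffix) {
    int start = pathEnd(uri) - suffix.length();
    return start >= 0 && uri.regionMatches(start, suffix, 0, suffix.length());
  }
}
```

Under these assumptions, "/a/b.jsp?x=1" has a path end of 8 and ends with ".jsp", while "/a/b.jsp#x.html" does not end with ".html" because the fragment is excluded.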
    • pathEndsWith

      public boolean pathEndsWith(String suffix)
      Checks if the path ends with the given value.
    • pathEndsWithIgnoreCase

      public boolean pathEndsWithIgnoreCase(String suffix)
      Checks if the path ends with the given value, case-insensitive.
    • getHierPart

      public String getHierPart()
      Gets the hier-part - everything after the scheme and before the first '?' or '#' (exclusive). This may include host, port, path, and such, which this class is not concerned with.

      This method may involve string manipulation; favor the writeHierPart(…) and appendHierPart(…) methods when appropriate.

      Returns:
      the part of the URI after the scheme and up to the first '?' or '#' (exclusive), or the full URI when neither found.
    • writeHierPart

      public void writeHierPart(Writer out) throws IOException
      Writes the part of the URI after the scheme and up to the first '?' or '#' (exclusive), or the full URI when neither found.
      Throws:
      IOException
    • writeHierPart

      public void writeHierPart(Writer out, Encoder encoder) throws IOException
      Writes the part of the URI after the scheme and up to the first '?' or '#' (exclusive), or the full URI when neither found.
      Throws:
      IOException
    • appendHierPart

      public AnyURI appendHierPart(Appendable out) throws IOException
      Appends the part of the URI after the scheme and up to the first '?' or '#' (exclusive), or the full URI when neither found.
      Returns:
      this
      Throws:
      IOException
    • appendHierPart

      public AnyURI appendHierPart(Encoder encoder, Appendable out) throws IOException
      Appends the part of the URI after the scheme and up to the first '?' or '#' (exclusive), or the full URI when neither found.
      Returns:
      this
      Throws:
      IOException
    • appendHierPart

      public StringBuilder appendHierPart(StringBuilder sb)
      Appends the part of the URI after the scheme and up to the first '?' or '#' (exclusive), or the full URI when neither found.
      Returns:
      The StringBuilder sb
    • appendHierPart

      public StringBuffer appendHierPart(StringBuffer sb)
      Appends the part of the URI after the scheme and up to the first '?' or '#' (exclusive), or the full URI when neither found.
      Returns:
      The StringBuffer sb
    • getQueryIndex

      public int getQueryIndex()
      Gets the index of the query marker ('?').
      Returns:
      the index of the '?' marking the query string or -1 when there is no query string.
    • hasQuery

      public boolean hasQuery()
      Checks if this has a query.
    • getQueryString

      public String getQueryString()
      Gets the query string.

      This method may involve string manipulation; favor the writeQueryString(…) and appendQueryString(…) methods when appropriate.

      Returns:
      the query string (not including the '?') or null when there is no query.
    • writeQueryString

      public void writeQueryString(Writer out) throws IOException
      Writes the query string (not including the '?').
      Throws:
      IOException
    • writeQueryString

      public void writeQueryString(Writer out, Encoder encoder) throws IOException
      Writes the query string (not including the '?').
      Throws:
      IOException
    • appendQueryString

      public AnyURI appendQueryString(Appendable out) throws IOException
      Appends the query string (not including the '?').
      Returns:
      this
      Throws:
      IOException
    • appendQueryString

      public AnyURI appendQueryString(Encoder encoder, Appendable out) throws IOException
      Appends the query string (not including the '?').
      Returns:
      this
      Throws:
      IOException
    • appendQueryString

      public StringBuilder appendQueryString(StringBuilder sb)
      Appends the query string (not including the '?').
      Returns:
      The StringBuilder sb
    • appendQueryString

      public StringBuffer appendQueryString(StringBuffer sb)
      Appends the query string (not including the '?').
      Returns:
      The StringBuffer sb
    • getFragmentIndex

      public int getFragmentIndex()
      Gets the index of the fragment marker ('#').
      Returns:
      the index of the '#' marking the fragment or -1 when there is no fragment.
    • hasFragment

      public boolean hasFragment()
      Checks if this has a fragment.
    • getFragment

      public String getFragment()
      Gets the fragment.

      This method may involve string manipulation; favor the writeFragment(…) and appendFragment(…) methods when appropriate.

      Returns:
      the fragment (not including the '#') or null when there is no fragment.
    • writeFragment

      public void writeFragment(Writer out) throws IOException
      Writes the fragment (not including the '#').
      Throws:
      IOException
    • writeFragment

      public void writeFragment(Writer out, Encoder encoder) throws IOException
      Writes the fragment (not including the '#').
      Throws:
      IOException
    • appendFragment

      public AnyURI appendFragment(Appendable out) throws IOException
      Appends the fragment (not including the '#').
      Returns:
      this
      Throws:
      IOException
    • appendFragment

      public AnyURI appendFragment(Encoder encoder, Appendable out) throws IOException
      Appends the fragment (not including the '#').
      Returns:
      this
      Throws:
      IOException
    • appendFragment

      public StringBuilder appendFragment(StringBuilder sb)
      Appends the fragment (not including the '#').
      Returns:
      The StringBuilder sb
    • appendFragment

      public StringBuffer appendFragment(StringBuffer sb)
      Appends the fragment (not including the '#').
      Returns:
      The StringBuffer sb
    • isEncodingNormalized

      public boolean isEncodingNormalized()
      Is this URI percent-encoding normalized? Normalized percent encoding means it will have only the required percent encodings, and the encodings are capitalized hexadecimal.

      Note: This only refers to the percent encodings. This is not related to full URI normalization.
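
One half of the definition above, capitalized hexadecimal, can be checked with a simple scan. The sketch below is a hypothetical helper and deliberately does not check whether each percent encoding is actually required, since this page does not list which characters require encoding.

```java
/** Hypothetical helper for illustration; not part of the documented API. */
final class PercentHexSketch {
  private static final String UPPER_HEX = "0123456789ABCDEF";

  /**
   * Returns true when every '%' is followed by two uppercase hexadecimal digits.
   * This covers only the capitalization rule, not the "only required encodings" rule.
   */
  static boolean hasOnlyUppercaseHex(String uri) {
    int len = uri.length();
    for (int i = 0; i < len; i++) {
      if (uri.charAt(i) == '%') {
        if (i + 2 >= len) {
          return false; // truncated percent encoding
        }
        if (UPPER_HEX.indexOf(uri.charAt(i + 1)) == -1
            || UPPER_HEX.indexOf(uri.charAt(i + 2)) == -1) {
          return false; // lowercase or non-hexadecimal digit
        }
        i += 2; // skip the two digits just checked
      }
    }
    return true;
  }
}
```

For example, "/a%7Eb" passes this check while "/a%7eb" does not.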

    • toURI

      public URI toURI()
      Gets this URI encoded in RFC 3986 URI US-ASCII format.

      This might not be percent-encoding normalized. Use toIRI().toURI() if consistent formatting is required.

      Returns:
      The URI or this when unmodified.
    • toIRI

      public IRI toIRI()
      Gets this URI encoded in RFC 3987 IRI Unicode format.
      Returns:
      The IRI or this when unmodified.
    • setHierPart

      public AnyURI setHierPart(String hierPart)
      Replaces the hier-part.
      Parameters:
      hierPart - The hier-part may not contain the query marker '?' or fragment marker '#'
      Returns:
      The new AnyURI or this when unmodified.
    • setQueryString

      public AnyURI setQueryString(String query)
      Replaces the query string.
      Parameters:
      query - The query (not including the first '?') - it is added without additional encoding. The query is removed when the query is null. The query may not contain the fragment marker '#'
      Returns:
      The new AnyURI or this when unmodified.
    • addQueryString

      public AnyURI addQueryString(String query)
      Adds a query string.
      Parameters:
      query - The query (not including the first '?' / '&') - it is added without additional encoding. Nothing is added when the query is null. The query may not contain the fragment marker '#'
      Returns:
      The new AnyURI or this when unmodified.
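
The parameter description above implies that the query is joined with '?' or '&' and placed before any fragment. A minimal sketch on plain strings follows; AddQuerySketch is a hypothetical name, and throwing IllegalArgumentException for a '#' in the query is an assumption, since this page only says the query may not contain it.

```java
/** Hypothetical helper for illustration; not part of the documented API. */
final class AddQuerySketch {
  static String addQueryString(String uri, String query) {
    if (query == null) {
      return uri; // nothing is added when the query is null
    }
    if (query.indexOf('#') != -1) {
      // Assumption: how the real method reacts to '#' is not documented here.
      throw new IllegalArgumentException("query may not contain '#'");
    }
    // Split off the fragment so the query is inserted before it.
    int fragmentIndex = uri.indexOf('#');
    String beforeFragment = (fragmentIndex == -1) ? uri : uri.substring(0, fragmentIndex);
    String fragment = (fragmentIndex == -1) ? "" : uri.substring(fragmentIndex);
    // Use '?' for the first query and '&' when a query already exists.
    char separator = (beforeFragment.indexOf('?') == -1) ? '?' : '&';
    return beforeFragment + separator + query + fragment;
  }
}
```

With this sketch, adding "a=1" to "/p#f" gives "/p?a=1#f", and adding "b=2" to "/p?a=1" gives "/p?a=1&b=2".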
    • addEncodedParameter

      public AnyURI addEncodedParameter(String encodedName, String encodedValue)
      Adds an already-encoded parameter.
      Parameters:
      encodedName - The parameter name - it is added without additional encoding. Nothing is added when the name is null. The name may not contain the fragment marker '#'
      encodedValue - The parameter value - it is added without additional encoding. When null, the parameter is added without any '='. Must be null when name is null. The value may not contain the fragment marker '#'
      Returns:
      The new AnyURI or this when unmodified.
    • addParameter

      public AnyURI addParameter(String name, String value)
      Encodes and adds a parameter.
      Parameters:
      name - The parameter name. Nothing is added when the name is null.
      value - The parameter value. When null, the parameter is added without any '='. Must be null when name is null.
      Returns:
      The new AnyURI or this when unmodified.
    • addParameters

      public AnyURI addParameters(URIParameters params)
      Adds all of the parameters.
      Parameters:
      params - The parameters to add. Nothing is added when null or empty.
      Returns:
      The new AnyURI or this when unmodified.
    • setEncodedFragment

      public AnyURI setEncodedFragment(String encodedFragment)
      Replaces the fragment.
      Parameters:
      encodedFragment - The fragment (not including the '#') - it is added without additional encoding. Removes fragment when null.
      Returns:
      The new AnyURI or this when unmodified.
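
The replace-or-remove behavior described above can be sketched on plain strings. SetFragmentSketch is a hypothetical name; the sketch adds the fragment without further encoding, as the parameter description states, and performs no validation.

```java
/** Hypothetical helper for illustration; not part of the documented API. */
final class SetFragmentSketch {
  static String setEncodedFragment(String uri, String encodedFragment) {
    // Drop any existing fragment, including its '#'.
    int fragmentIndex = uri.indexOf('#');
    String withoutFragment = (fragmentIndex == -1) ? uri : uri.substring(0, fragmentIndex);
    // A null fragment removes it; otherwise append the new one as given.
    return (encodedFragment == null) ? withoutFragment : withoutFragment + '#' + encodedFragment;
  }
}
```

With this sketch, setting "new" on "/p?a=1#old" gives "/p?a=1#new", and setting null on "/p#old" gives "/p".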
    • setFragment

      public AnyURI setFragment(String fragment)
      Replaces the fragment in the default encoding IRI.ENCODING.

      TODO: Implement specification of fragment-escape.

      Parameters:
      fragment - The fragment (not including the '#') or null for no fragment.
      Returns:
      The new AnyURI or this when unmodified.