Class AnyURI

  • Direct Known Subclasses:
    IRI, URI

    public class AnyURI
    extends Object
    Extremely minimal representation of an RFC 3986 URI or RFC 3987 IRI, optimized for altering the path, query, or fragment for URI rewriting.

    This only deals with four parts of the URI:

    1. scheme - everything before the first ':' (exclusive).
    2. hier-part - everything after the scheme and before the first '?' or '#' (exclusive). This may include host, port, path, and such, which this class is not concerned with.
    3. query - everything after the first '?' (exclusive) and before the fragment '#' (exclusive)
    4. fragment - everything after the first '#' (exclusive)
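
    As a rough illustration of the four-part split described above, the following sketch works on a plain String. UriPartsSketch and split are hypothetical names, not part of this class, and the scheme rule (the first ':' counts only when no '/' precedes it and it is not the first character) is an assumption made for the example rather than the documented algorithm.

```java
/** Hypothetical helper for illustration only; not part of the documented API. */
final class UriPartsSketch {

    private UriPartsSketch() {}

    /** Returns {scheme, hierPart, query, fragment}; absent parts are null. */
    static String[] split(String uri) {
        // Fragment: everything after the first '#'.
        int fragmentIndex = uri.indexOf('#');
        int beforeFragment = (fragmentIndex == -1) ? uri.length() : fragmentIndex;

        // Query: starts at the first '?', but only when it precedes the fragment.
        int queryIndex = uri.indexOf('?');
        if (queryIndex > beforeFragment) {
            queryIndex = -1; // the '?' belongs to the fragment
        }
        int pathEnd = (queryIndex == -1) ? beforeFragment : queryIndex;

        // Scheme (assumption): ends at the first ':' unless a '/' comes first
        // or the ':' is the very first character (no empty scheme).
        int schemeLength = -1;
        for (int i = 0; i < pathEnd; i++) {
            char ch = uri.charAt(i);
            if (ch == ':') {
                if (i > 0) {
                    schemeLength = i;
                }
                break;
            }
            if (ch == '/') {
                break;
            }
        }

        String scheme = (schemeLength == -1) ? null : uri.substring(0, schemeLength);
        // With no scheme, schemeLength + 1 is 0, so the hier-part starts at the beginning.
        String hierPart = uri.substring(schemeLength + 1, pathEnd);
        String query = (queryIndex == -1) ? null : uri.substring(queryIndex + 1, beforeFragment);
        String fragment = (fragmentIndex == -1) ? null : uri.substring(fragmentIndex + 1);
        return new String[] {scheme, hierPart, query, fragment};
    }
}
```

    Note that a '?' appearing after the first '#' is treated as part of the fragment, so such a URI has no query in this sketch.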

    This class specifically:

    1. Does not do significant amounts of normalization
    2. Does not support any relative path resolution
    3. Does not do any scheme-specific validation
    4. Does not thoroughly detect malformed URIs

    Instances of this class are immutable and thus thread-safe. Mutating operations return a new instance.

    When a strict ASCII-only representation of an RFC 3986 URI is required, use URI. When a Unicode representation of an RFC 3987 IRI is preferred, use IRI. Otherwise, to support both, use AnyURI, which should also perform the best since it performs fewer conversions.


    Encoding and decoding are always done in UTF-8. This choice is supported by IRIStatus - Query encoding, and is consistent with URI.

    This simplification allows us to no longer pass encoding around and no longer throw any UnsupportedEncodingException as UTF-8 is a standard character set.

    We do not support the use of any encoding other than UTF-8, which allows us to avoid all the gray zones of the various protocol specifications, versions, and implementations.
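
    To make the UTF-8-only choice concrete, here is a minimal percent-encoding sketch. Utf8PercentSketch is a hypothetical name, and the set of characters left unencoded (the RFC 3986 unreserved set) is an assumption for this example; the encoder used by this class may leave additional characters unencoded. Because StandardCharsets.UTF_8 is always available, no UnsupportedEncodingException can occur.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration only; not part of the documented API. */
final class Utf8PercentSketch {

    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    private Utf8PercentSketch() {}

    /** Percent-encodes every UTF-8 octet that is not an RFC 3986 unreserved character. */
    static String encode(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        // getBytes(Charset) declares no checked exception, unlike getBytes(String).
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int octet = b & 0xFF;
            if (isUnreserved(octet)) {
                sb.append((char) octet);
            } else {
                // Upper-case hexadecimal, as recommended by RFC 3986.
                sb.append('%').append(HEX[octet >>> 4]).append(HEX[octet & 0x0F]);
            }
        }
        return sb.toString();
    }

    /** RFC 3986 section 2.3: ALPHA / DIGIT / "-" / "." / "_" / "~". */
    static boolean isUnreserved(int octet) {
        return (octet >= 'A' && octet <= 'Z')
            || (octet >= 'a' && octet <= 'z')
            || (octet >= '0' && octet <= '9')
            || octet == '-' || octet == '.' || octet == '_' || octet == '~';
    }
}
```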


    TODO: These methods are for highest performance and are consistent with the JavaScript methods. They are not meant for general purpose URL manipulation, and are not trying to replace any full-featured URI tools.

    Consider the following if needing more than what this provides (in no particular order):

    1. URL
    2. URI
    3. URIBuilder
    4. UriUtils
    5. UrlEscapers
    6. jena-iri
    7. org.xbib:net-url

    Further reading:

    1. IRIStatus - Query encoding:
      Update 2015-08-25: The URL spec defines this formally. By default the query string uses UTF-8. X-Form's (defined by HTML) allows the page author to supply the override legacy character encoding if needed (UTF-8 is encouraged). If an override is used, there may be nothing in the URL itself that indicates what this override encoding is: the receiver just has to know.
    2. RFC 3987: 6.4. Use of UTF-8 for Encoding Original Characters:
      Similar considerations apply to query parts. The functionality of IRIs (namely, to be able to include non-ASCII characters) can only be used if the query part is encoded in UTF-8.
    3. HTML 5: 2.6.1 Terminology:
      The URL is a valid IRI reference and its query component contains no unescaped non-ASCII characters. [RFC3987]
      The URL is a valid IRI reference and the character encoding of the URL's Document is UTF-8 or UTF-16. [RFC3987]
    4. HTML 5: 2.6.3 Resolving URLs:
      Let encoding be determined as follows:
      If the URL came from a DOM node (e.g. from an element)
      The node has a Document, and the URL character encoding is the document's character encoding.
    Author:
    AO Industries, Inc.
    See Also:
    URI, URIParser, URI
    • Constructor Detail

      • AnyURI

        public AnyURI​(String uri)
    • Method Detail

      • equals

        public final boolean equals​(Object obj)
        Compares the URI directly. No encoding or decoding is performed. This does not compare URIs semantically.
        Overrides:
        equals in class Object
      • getSchemeLength

        public int getSchemeLength()
        Gets the length of the scheme or -1 when there is no scheme. This is also the index of the colon (':') that ends the scheme.
        Returns:
        the index of the ':' marking the end of the scheme or -1 when there is no scheme.
      • hasScheme

        public boolean hasScheme()
        Checks if this has a scheme.
      • isScheme

        public boolean isScheme​(String scheme)
                         throws IllegalArgumentException
        Checks if a URI starts with the given scheme.
        Parameters:
        scheme - The scheme to look for, not including colon. For example "http". When null, will match a URI without a scheme.
        Throws:
        IllegalArgumentException - when scheme is determined to be invalid. Please note that this determination is not guaranteed as shortcuts may skip individual character comparisons.
      • getScheme

        public String getScheme()
        Gets the scheme for a URI, or null when there is no scheme. An empty scheme is never returned, even if the URI starts with ':'.

        This method may involve string manipulation; favor the writeScheme(…) and appendScheme(…) methods when appropriate.

        Returns:
        The scheme, not including colon, or null when there is no scheme. For example "http".
      • getPathEnd

        public int getPathEnd()
        Gets the path end within this URI.
        Returns:
        the index of the first '?' or '#' (exclusive), or the length of the URI when neither found.
      • getHierPart

        public String getHierPart()
        Gets the hier-part - everything after the scheme and before the first '?' or '#' (exclusive). This may include host, port, path, and such, which this class is not concerned with.

        This method may involve string manipulation; favor the writeHierPart(…) and appendHierPart(…) methods when appropriate.

        Returns:
        the part of the URI after the scheme and up to the first '?' or '#' (exclusive), or the full URI when neither found.
      • writeHierPart

        public void writeHierPart​(Writer out)
                           throws IOException
        Writes the part of the URI after the scheme and up to the first '?' or '#' (exclusive), or the full URI when neither found.
        Throws:
        IOException
      • writeHierPart

        public void writeHierPart​(Writer out,
                                  Encoder encoder)
                           throws IOException
        Writes the part of the URI after the scheme and up to the first '?' or '#' (exclusive), or the full URI when neither found.
        Throws:
        IOException
      • appendHierPart

        public AnyURI appendHierPart​(Appendable out)
                              throws IOException
        Appends the part of the URI after the scheme and up to the first '?' or '#' (exclusive), or the full URI when neither found.
        Returns:
        this
        Throws:
        IOException
      • appendHierPart

        public AnyURI appendHierPart​(Encoder encoder,
                                     Appendable out)
                              throws IOException
        Appends the part of the URI after the scheme and up to the first '?' or '#' (exclusive), or the full URI when neither found.
        Returns:
        this
        Throws:
        IOException
      • appendHierPart

        public StringBuilder appendHierPart​(StringBuilder sb)
        Appends the part of the URI after the scheme and up to the first '?' or '#' (exclusive), or the full URI when neither found.
        Returns:
        The StringBuilder sb
      • appendHierPart

        public StringBuffer appendHierPart​(StringBuffer sb)
        Appends the part of the URI after the scheme and up to the first '?' or '#' (exclusive), or the full URI when neither found.
        Returns:
        The StringBuffer sb
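
        The reason to favor the write and append variants can be sketched as follows: copying a range straight into the destination avoids allocating an intermediate String, as substring would. AppendRangeSketch and its methods are hypothetical names, and the start and end indexes are assumed to be already known to the caller; this is not the actual implementation.

```java
import java.io.IOException;
import java.io.Writer;

/** Hypothetical helper for illustration only; not part of the documented API. */
final class AppendRangeSketch {

    private AppendRangeSketch() {}

    /** Appends uri[start, end) without creating a substring first. */
    static StringBuilder appendRange(String uri, int start, int end, StringBuilder sb) {
        // StringBuilder.append(CharSequence, int, int) takes start and end indexes.
        return sb.append(uri, start, end);
    }

    /** Writes uri[start, end) without creating a substring first. */
    static void writeRange(String uri, int start, int end, Writer out) throws IOException {
        // Writer.write(String, int, int) takes an offset and a length, not an end index.
        out.write(uri, start, end - start);
    }
}
```

        The two standard-library calls take different third arguments (end index versus length), which is an easy source of off-by-range errors when switching between them.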
      • getQueryIndex

        public int getQueryIndex()
        Gets the index of the query marker ('?').
        Returns:
        the index of the '?' marking the query string or -1 when there is no query string.
      • hasQuery

        public boolean hasQuery()
        Checks if this has a query.
      • getQueryString

        public String getQueryString()
        Gets the query string.

        This method may involve string manipulation; favor the writeQueryString(…) and appendQuery(…) methods when appropriate.

        Returns:
        the query string (not including the '?') or null when there is no query.
      • writeQueryString

        public void writeQueryString​(Writer out)
                              throws IOException
        Writes the query string (not including the '?').
        Throws:
        IOException
      • getFragmentIndex

        public int getFragmentIndex()
        Gets the index of the fragment marker ('#').
        Returns:
        the index of the '#' marking the fragment or -1 when there is no fragment.
      • hasFragment

        public boolean hasFragment()
        Checks if this has a fragment.
      • getFragment

        public String getFragment()
        Gets the fragment.

        This method may involve string manipulation; favor the writeFragment(…) and appendFragment(…) methods when appropriate.

        Returns:
        the fragment (not including the '#') or null when there is no fragment.
      • writeFragment

        public void writeFragment​(Writer out)
                           throws IOException
        Writes the fragment (not including the '#').
        Throws:
        IOException
      • isEncodingNormalized

        public boolean isEncodingNormalized()
        Is this URI percent-encoding normalized? Normalized percent encoding means it will have only the required percent encodings, and the encodings are capitalized hexadecimal.

        Note: This only refers to the percent encodings. This is not related to full URI normalization.
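
        One possible reading of this check, shown only as a sketch: every '%' must be followed by two upper-case hexadecimal digits, and the encoded octet must not be a character that never needs encoding. EncodingNormalizedSketch is a hypothetical name, and treating the RFC 3986 unreserved set as "not required" is an assumption; the rules actually applied by this method (especially for IRIs) may differ.

```java
/** Hypothetical helper for illustration only; not part of the documented API. */
final class EncodingNormalizedSketch {

    private EncodingNormalizedSketch() {}

    static boolean isEncodingNormalized(String uri) {
        int len = uri.length();
        for (int i = 0; i < len; i++) {
            if (uri.charAt(i) != '%') {
                continue;
            }
            // A truncated escape is treated as not normalized (assumption).
            if (i + 2 >= len) {
                return false;
            }
            int hi = upperHexValue(uri.charAt(i + 1));
            int lo = upperHexValue(uri.charAt(i + 2));
            if (hi == -1 || lo == -1) {
                return false; // lower-case or non-hexadecimal digit
            }
            int octet = (hi << 4) | lo;
            if (isUnreserved(octet)) {
                return false; // an encoding that was not required
            }
            i += 2; // skip the two digits just checked
        }
        return true;
    }

    /** Value of an upper-case hexadecimal digit, or -1 for anything else. */
    private static int upperHexValue(char ch) {
        if (ch >= '0' && ch <= '9') return ch - '0';
        if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
        return -1;
    }

    /** RFC 3986 section 2.3 unreserved characters. */
    private static boolean isUnreserved(int octet) {
        return (octet >= 'A' && octet <= 'Z')
            || (octet >= 'a' && octet <= 'z')
            || (octet >= '0' && octet <= '9')
            || octet == '-' || octet == '.' || octet == '_' || octet == '~';
    }
}
```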

      • setHierPart

        public AnyURI setHierPart​(String hierPart)
        Replaces the hier-part.
        Parameters:
        hierPart - The hier-part may not contain the query marker '?' or fragment marker '#'
        Returns:
        The new AnyURI or this when unmodified.
      • setQueryString

        public AnyURI setQueryString​(String query)
        Replaces the query string.
        Parameters:
        query - The query (not including the first '?') - it is added without additional encoding. The query is removed when the query is null. The query may not contain the fragment marker '#'
        Returns:
        The new AnyURI or this when unmodified.
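
        The effect of replacing a query while preserving the fragment can be sketched on a plain String. SetQuerySketch is a hypothetical name; the real method returns an AnyURI and may return this when nothing changes, which this sketch does not attempt to reproduce.

```java
/** Hypothetical helper for illustration only; not part of the documented API. */
final class SetQuerySketch {

    private SetQuerySketch() {}

    /** Replaces the query (or removes it when query is null), keeping any fragment. */
    static String setQueryString(String uri, String query) {
        int fragmentIndex = uri.indexOf('#');
        int beforeFragment = (fragmentIndex == -1) ? uri.length() : fragmentIndex;
        int queryIndex = uri.indexOf('?');
        // A '?' after the '#' is part of the fragment, not a query marker.
        int pathEnd = (queryIndex == -1 || queryIndex > beforeFragment) ? beforeFragment : queryIndex;

        StringBuilder sb = new StringBuilder(uri.length());
        sb.append(uri, 0, pathEnd);              // scheme and hier-part
        if (query != null) {
            sb.append('?').append(query);        // added without additional encoding
        }
        sb.append(uri, beforeFragment, uri.length()); // "#fragment", if any
        return sb.toString();
    }
}
```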
      • addQueryString

        public AnyURI addQueryString​(String query)
        Adds a query string.
        Parameters:
        query - The query (not including the first '?' / '&') - it is added without additional encoding. Nothing is added when the query is null. The query may not contain the fragment marker '#'
        Returns:
        The new AnyURI or this when unmodified.
      • addEncodedParameter

        public AnyURI addEncodedParameter​(String encodedName,
                                          String encodedValue)
        Adds an already-encoded parameter.
        Parameters:
        encodedName - The parameter name - it is added without additional encoding. Nothing is added when the name is null. The name may not contain the fragment marker '#'
        encodedValue - The parameter value - it is added without additional encoding. When null, the parameter is added without any '='. Must be null when name is null. The value may not contain the fragment marker '#'
        Returns:
        The new AnyURI or this when unmodified.
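
        The separator and null-handling rules described for addQueryString and addEncodedParameter can be sketched on a plain String. AddQuerySketch is a hypothetical name, and throwing IllegalArgumentException for a '#' or for a value without a name is an assumption; the documentation only states that such input is not allowed.

```java
/** Hypothetical helper for illustration only; not part of the documented API. */
final class AddQuerySketch {

    private AddQuerySketch() {}

    /** Adds query before any fragment, using '?' or '&' as the separator. */
    static String addQueryString(String uri, String query) {
        if (query == null) {
            return uri; // nothing is added
        }
        if (query.indexOf('#') != -1) {
            throw new IllegalArgumentException("query may not contain '#'"); // assumption
        }
        int fragmentIndex = uri.indexOf('#');
        int beforeFragment = (fragmentIndex == -1) ? uri.length() : fragmentIndex;
        int queryIndex = uri.indexOf('?');
        boolean hasQuery = queryIndex != -1 && queryIndex < beforeFragment;

        StringBuilder sb = new StringBuilder(uri.length() + 1 + query.length());
        sb.append(uri, 0, beforeFragment)
          .append(hasQuery ? '&' : '?')
          .append(query)
          .append(uri, beforeFragment, uri.length());
        return sb.toString();
    }

    /** Adds "name" or "name=value"; both arguments are assumed to be already encoded. */
    static String addEncodedParameter(String uri, String encodedName, String encodedValue) {
        if (encodedName == null) {
            if (encodedValue != null) {
                throw new IllegalArgumentException("value must be null when name is null"); // assumption
            }
            return uri;
        }
        // A null value adds the bare name, without any '='.
        String param = (encodedValue == null) ? encodedName : encodedName + '=' + encodedValue;
        return addQueryString(uri, param);
    }
}
```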
      • addParameter

        public AnyURI addParameter​(String name,
                                   String value)
        Encodes and adds a parameter.
        Parameters:
        name - The parameter name. Nothing is added when the name is null.
        value - The parameter value. When null, the parameter is added without any '='. Must be null when name is null.
        Returns:
        The new AnyURI or this when unmodified.
        See Also:
        URIEncoder.encodeURIComponent(java.lang.String)
      • setEncodedFragment

        public AnyURI setEncodedFragment​(String encodedFragment)
        Replaces the fragment.
        Parameters:
        encodedFragment - The fragment (not including the '#') - it is added without additional encoding. Removes fragment when null.
        Returns:
        The new AnyURI or this when unmodified.
      • setFragment

        public AnyURI setFragment​(String fragment)
        Encodes the fragment in the default encoding IRI.ENCODING and replaces the current fragment with it.

        TODO: Implement specification of fragment-escape.

        Parameters:
        fragment - The fragment (not including the '#') or null for no fragment.
        Returns:
        The new AnyURI or this when unmodified.