This only deals with four parts of the URI:
- scheme - everything before the first ':' (exclusive).
- hier-part - everything after the scheme and before the first '?' or '#' (exclusive). This may include host, port, path, and such, which this class is not concerned with.
- query - everything after the first '?' (exclusive) and before the fragment '#' (exclusive)
- fragment - everything after the first '#' (exclusive)
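The four-part split described above can be sketched with plain string operations. This is only an illustration under stated assumptions, not the class's implementation: `UriPartsSketch` is a hypothetical name, the ':' is assumed to end a scheme only when it is not the first character and no '/' precedes it, and the hier-part is assumed to exclude the ':' itself.

```java
/** Hypothetical helper for illustration; not the documented class. */
final class UriPartsSketch {
  final String scheme;   // null when there is no scheme
  final String hierPart; // never null
  final String query;    // null when there is no '?'
  final String fragment; // null when there is no '#'

  UriPartsSketch(String uri) {
    // The fragment starts at the first '#'.
    int fragmentIndex = uri.indexOf('#');
    int beforeFragment = (fragmentIndex == -1) ? uri.length() : fragmentIndex;
    // A '?' inside the fragment does not start a query.
    int queryIndex = uri.indexOf('?');
    if (queryIndex > beforeFragment) {
      queryIndex = -1;
    }
    int pathEnd = (queryIndex == -1) ? beforeFragment : queryIndex;
    // Assumption: ':' ends a scheme only when it is not the first character
    // and no '/' precedes it.
    int colon = -1;
    for (int i = 0; i < pathEnd; i++) {
      char c = uri.charAt(i);
      if (c == '/') {
        break;
      }
      if (c == ':') {
        colon = i;
        break;
      }
    }
    boolean hasScheme = colon > 0;
    scheme = hasScheme ? uri.substring(0, colon) : null;
    hierPart = uri.substring(hasScheme ? colon + 1 : 0, pathEnd);
    query = (queryIndex == -1) ? null : uri.substring(queryIndex + 1, beforeFragment);
    fragment = (fragmentIndex == -1) ? null : uri.substring(fragmentIndex + 1);
  }

  public static void main(String[] args) {
    UriPartsSketch p = new UriPartsSketch("http://example.com/a/b?x=1&y=2#top");
    // prints: http | //example.com/a/b | x=1&y=2 | top
    System.out.println(p.scheme + " | " + p.hierPart + " | " + p.query + " | " + p.fragment);
  }
}
```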
This class specifically:
- Does not do significant amounts of normalization
- Does not support any relative path resolution
- Does not do any scheme-specific validation
- Does not thoroughly detect malformed URIs
Instances of this class are immutable and thus thread-safe. Mutating operations return a new instance.
When a strict ASCII-only representation of an RFC 3986 URI is required, use URI. When a Unicode representation of an RFC 3987 IRI is preferred, use IRI. Otherwise, to support both, use AnyURI, which should also perform the best since it performs fewer conversions.
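To make the ASCII versus Unicode distinction concrete, the sketch below uses the JDK's `java.net.URI` as a stand-in. This is an assumption for illustration only: it is not the URI, IRI, or AnyURI class documented here, and `AsciiVersusUnicodeSketch` is a hypothetical name. It only shows how the same identifier looks in RFC 3987 Unicode form and in RFC 3986 US-ASCII form.

```java
/** Hypothetical illustration using the JDK's java.net.URI, not this library. */
final class AsciiVersusUnicodeSketch {

  /** Converts a Unicode identifier to US-ASCII by percent-encoding UTF-8 bytes. */
  static String toAscii(String iri) {
    return java.net.URI.create(iri).toASCIIString();
  }

  public static void main(String[] args) {
    String iri = "http://example.com/caf\u00e9"; // Unicode (IRI-style) form
    System.out.println(toAscii(iri));            // US-ASCII (URI-style) form
  }
}
```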
Encoding and decoding is always done in UTF-8. This choice is supported by IRIStatus - Query encoding, and is consistent with URI.
This simplification allows us to no longer pass encoding around and no longer throw any UnsupportedEncodingException, as UTF-8 is a standard character set.
We do not support the use of any encoding other than UTF-8, which allows us to avoid all the gray zones of the various protocol specifications, versions, and implementations.
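As a rough illustration of why fixing the encoding to UTF-8 removes the checked exception, the sketch below uses the JDK's `URLEncoder` and `URLDecoder` overloads that accept a `Charset` (available since Java 10). This is an assumption for illustration: the class name is hypothetical, this library's own encoder is not shown in this page, and `URLEncoder` applies form encoding, so a space becomes '+' rather than "%20".

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration; not this library's encoder. */
final class Utf8EncodingSketch {

  /** Encodes with UTF-8; no UnsupportedEncodingException is declared. */
  static String encode(String value) {
    return URLEncoder.encode(value, StandardCharsets.UTF_8);
  }

  /** Decodes with UTF-8; no UnsupportedEncodingException is declared. */
  static String decode(String encoded) {
    return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    String encoded = encode("caf\u00e9 au lait");
    System.out.println(encoded);         // prints: caf%C3%A9+au+lait
    System.out.println(decode(encoded)); // round-trips to the original text
  }
}
```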
TODO: These methods are for highest performance and are consistent with the JavaScript methods. They are not meant for general purpose URL manipulation, and are not trying to replace any full-featured URI tools.
Consider the following if needing more than what this provides (in no particular order):
Further reading:
- IRIStatus - Query encoding:
Update 2015-08-25: The URL spec defines this formally. By default the query string uses UTF-8. X-Form's (defined by HTML) allows the page author to supply the override legacy character encoding if needed (UTF-8 is encouraged). If an override is used, there may be nothing in the URL itself that indicates what this override encoding is: the receiver just has to know.
- RFC 3987: 6.4. Use of UTF-8 for Encoding Original Characters:
Similar considerations apply to query parts. The functionality of IRIs (namely, to be able to include non-ASCII characters) can only be used if the query part is encoded in UTF-8.
- HTML 5: 2.6.1 Terminology:
The URL is a valid IRI reference and its query component contains no unescaped non-ASCII characters. [RFC3987]
The URL is a valid IRI reference and the character encoding of the URL's Document is UTF-8 or UTF-16. [RFC3987]
- HTML 5: 2.6.3 Resolving URLs:
Let encoding be determined as follows:
If the URL came from a DOM node (e.g. from an element)
The node has a Document, and the URL character encoding is the document's character encoding.
- Author:
- AO Industries, Inc.
-
Constructor Summary
-
Method Summary
Modifier and Type, Method - Description
- AnyURI addEncodedParameter(String encodedName, String encodedValue) - Adds an already-encoded parameter.
- AnyURI addParameter(String name, String value) - Encodes and adds a parameter.
- AnyURI addParameters(URIParameters params) - Adds all of the parameters.
- AnyURI addQueryString(String query) - Adds a query string.
- appendFragment(Encoder encoder, Appendable out) - Appends the fragment (not including the '#').
- appendFragment(Appendable out) - Appends the fragment (not including the '#').
- StringBuilder appendFragment(StringBuilder sb) - Appends the fragment (not including the '#').
- StringBuffer appendFragment(StringBuffer sb) - Appends the fragment (not including the '#').
- appendHierPart(Encoder encoder, Appendable out) - Appends the part of the URI after the scheme and up to the first '?' or '#' (exclusive), or the full URI when neither found.
- appendHierPart(Appendable out) - Appends the part of the URI after the scheme and up to the first '?' or '#' (exclusive), or the full URI when neither found.
- StringBuilder appendHierPart(StringBuilder sb) - Appends the part of the URI after the scheme and up to the first '?' or '#' (exclusive), or the full URI when neither found.
- StringBuffer appendHierPart(StringBuffer sb) - Appends the part of the URI after the scheme and up to the first '?' or '#' (exclusive), or the full URI when neither found.
- appendQueryString(Encoder encoder, Appendable out) - Appends the query string (not including the '?').
- appendQueryString(Appendable out) - Appends the query string (not including the '?').
- StringBuilder appendQueryString(StringBuilder sb) - Appends the query string (not including the '?').
- StringBuffer appendQueryString(StringBuffer sb) - Appends the query string (not including the '?').
- appendScheme(Encoder encoder, Appendable out) - Appends the scheme (not including the ':').
- appendScheme(Appendable out) - Appends the scheme (not including the ':').
- StringBuilder appendScheme(StringBuilder sb) - Appends the scheme (not including the ':').
- StringBuffer appendScheme(StringBuffer sb) - Appends the scheme (not including the ':').
- final boolean equals(Object obj) - Compares the URI directly.
- String getFragment() - Gets the fragment.
- int getFragmentIndex() - Gets the index of the fragment marker ('#').
- String getHierPart() - Gets the hier-part - everything after the scheme and before the first '?' or '#' (exclusive).
- int getPathEnd() - Gets the path end within this URI.
- int getQueryIndex() - Gets the index of the query marker ('?').
- String getQueryString() - Gets the query string.
- String getScheme() - Gets the scheme for a URI, or null when there is no scheme.
- int getSchemeLength() - Gets the length of the scheme or -1 when there is no scheme.
- boolean hasFragment() - Checks if this has a fragment.
- final int hashCode() - The hash code is the same as the hash code of the uri.
- boolean hasQuery() - Checks if this has a query.
- boolean hasScheme() - Checks if this has a scheme.
- boolean isEncodingNormalized() - Is this URI percent-encoding normalized?
- boolean isScheme(String scheme) - Checks if a URI starts with the given scheme.
- boolean pathEndsWith(String suffix) - Checks if the path ends with the given value.
- boolean pathEndsWithIgnoreCase(String suffix) - Checks if the path ends with the given value, case-insensitive.
- AnyURI setEncodedFragment(String encodedFragment) - Replaces the fragment.
- AnyURI setFragment(String fragment) - Replaces the fragment in the default encoding IRI.ENCODING.
- AnyURI setHierPart(String hierPart) - Replaces the hier-part.
- AnyURI setQueryString(String query) - Replaces the query string.
- String toASCIIString() - Gets the full URI in RFC 3986 URI US-ASCII format.
- IRI toIRI() - Gets this URI encoded in RFC 3987 IRI Unicode format.
- String toString() - Gets the full URI.
- URI toURI() - Gets this URI encoded in RFC 3986 URI US-ASCII format.
- void writeFragment(Writer out) - Writes the fragment (not including the '#').
- void writeFragment(Writer out, Encoder encoder) - Writes the fragment (not including the '#').
- void writeHierPart(Writer out) - Writes the part of the URI after the scheme and up to the first '?' or '#' (exclusive), or the full URI when neither found.
- void writeHierPart(Writer out, Encoder encoder) - Writes the part of the URI after the scheme and up to the first '?' or '#' (exclusive), or the full URI when neither found.
- void writeQueryString(Writer out) - Writes the query string (not including the '?').
- void writeQueryString(Writer out, Encoder encoder) - Writes the query string (not including the '?').
- void writeScheme(Writer out) - Writes the scheme (not including the ':').
- void writeScheme(Writer out, Encoder encoder) - Writes the scheme (not including the ':').
-
Constructor Details
-
AnyURI
-
-
Method Details
-
toString
Gets the full URI.
This may be a mixture of RFC 3986 URI US-ASCII and RFC 3987 IRI Unicode formats.
This might not be percent-encoding normalized. Use toIRI().toString() or toIRI().toURI().toString() if consistent formatting is required.
-
toASCIIString
Gets the full URI in RFC 3986 URI US-ASCII format.
This might not be percent-encoding normalized. Use toIRI().toASCIIString() if consistent formatting is required.
-
equals
Compares the URI directly. No encoding or decoding is performed. This does not compare URIs semantically.
-
hashCode
public final int hashCode()
The hash code is the same as the hash code of the uri.
-
getSchemeLength
public int getSchemeLength()
Gets the length of the scheme or -1 when there is no scheme. This is also the index of the colon (':') that ends the scheme.
- Returns:
- the index of the ':' marking the end of the scheme or -1 when there is no scheme.
-
hasScheme
public boolean hasScheme()
Checks if this has a scheme.
-
isScheme
Checks if a URI starts with the given scheme.
- Parameters:
- scheme - The scheme to look for, not including colon. For example "http". When null, will match a URI without a scheme.
- Throws:
- IllegalArgumentException - when scheme is determined to be invalid. Please note that this determination is not guaranteed as shortcuts may skip individual character comparisons.
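A scheme check of this kind can be done without extracting the scheme as a separate string. The sketch below is a minimal illustration under assumptions, not the documented implementation: `IsSchemeSketch` is a hypothetical name, the comparison is assumed to be case-insensitive (RFC 3986 treats schemes as case-insensitive), and the null handling and validation described above are left out.

```java
/** Hypothetical illustration; omits the null handling and validation of isScheme. */
final class IsSchemeSketch {

  /** Checks whether uri starts with scheme followed by ':' (scheme must be non-null). */
  static boolean isScheme(String uri, String scheme) {
    int len = scheme.length();
    return uri.length() > len
        && uri.charAt(len) == ':'
        // Compare in place, ignoring case, without creating a substring.
        && uri.regionMatches(true, 0, scheme, 0, len);
  }

  public static void main(String[] args) {
    System.out.println(isScheme("HTTP://example.com/", "http"));  // prints: true
    System.out.println(isScheme("https://example.com/", "http")); // prints: false
  }
}
```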
-
getScheme
Gets the scheme for a URI, or null when there is no scheme. An empty scheme will never be returned (if the URI starts with ':').
This method may involve string manipulation; favor the writeScheme(…) and appendScheme(…) methods when appropriate.
- Returns:
- The scheme, not including colon, or null when there is no scheme. For example "http".
-
writeScheme
Writes the scheme (not including the ':').
- Throws:
- IOException
-
writeScheme
Writes the scheme (not including the ':').
- Throws:
- IOException
-
appendScheme
Appends the scheme (not including the ':').
- Returns:
- this
- Throws:
- IOException
-
appendScheme
Appends the scheme (not including the ':').
- Returns:
- this
- Throws:
- IOException
-
appendScheme
Appends the scheme (not including the ':').
- Returns:
- The StringBuilder sb
-
appendScheme
Appends the scheme (not including the ':').
- Returns:
- The StringBuffer sb
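The reason the write and append variants can avoid string manipulation is that a region of the URI can be copied straight into the destination. A minimal sketch of the StringBuilder case follows, assuming the scheme length is already known; `AppendSchemeSketch` and its parameters are hypothetical and do not reflect the class's actual signatures.

```java
/** Hypothetical illustration of appending a region without an intermediate String. */
final class AppendSchemeSketch {

  /**
   * Appends the first schemeLength characters of uri to sb.
   * Nothing is appended when schemeLength is -1 (no scheme).
   */
  static StringBuilder appendScheme(String uri, int schemeLength, StringBuilder sb) {
    if (schemeLength > 0) {
      // append(CharSequence, start, end) copies the range directly.
      sb.append(uri, 0, schemeLength);
    }
    return sb;
  }

  public static void main(String[] args) {
    StringBuilder sb = new StringBuilder("scheme=");
    System.out.println(appendScheme("http://example.com/", 4, sb)); // prints: scheme=http
  }
}
```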
-
getPathEnd
public int getPathEnd()
Gets the path end within this URI.
- Returns:
- the index of the first '?' or '#' (exclusive), or the length of the URI when neither found.
-
pathEndsWith
Checks if the path ends with the given value.
-
pathEndsWithIgnoreCase
Checks if the path ends with the given value, case-insensitive.
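The path end and the two suffix checks fit together naturally: once the index of the first '?' or '#' is known, a suffix can be compared in place just before it. The following is a sketch under that reading, with the hypothetical name `PathEndSketch`; it is not the class's implementation.

```java
/** Hypothetical illustration of the path-end and suffix checks. */
final class PathEndSketch {

  /** Index of the first '?' or '#', or the length of the URI when neither is found. */
  static int pathEnd(String uri) {
    for (int i = 0; i < uri.length(); i++) {
      char c = uri.charAt(i);
      if (c == '?' || c == '#') {
        return i;
      }
    }
    return uri.length();
  }

  /** Checks whether the characters just before the path end match suffix. */
  static boolean pathEndsWith(String uri, String suffix, boolean ignoreCase) {
    int end = pathEnd(uri);
    int len = suffix.length();
    return end >= len && uri.regionMatches(ignoreCase, end - len, suffix, 0, len);
  }

  public static void main(String[] args) {
    System.out.println(pathEnd("/a/b.jsp?x=1#f"));                   // prints: 8
    System.out.println(pathEndsWith("/a/b.JSP?x=1", ".jsp", true));  // prints: true
    System.out.println(pathEndsWith("/a/b.JSP?x=1", ".jsp", false)); // prints: false
  }
}
```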
-
getHierPart
Gets the hier-part - everything after the scheme and before the first '?' or '#' (exclusive). This may include host, port, path, and such, which this class is not concerned with.
This method may involve string manipulation; favor the writeHierPart(…) and appendHierPart(…) methods when appropriate.
- Returns:
- the part of the URI after the scheme and up to the first '?' or '#' (exclusive), or the full URI when neither found.
-
writeHierPart
Writes the part of the URI after the scheme and up to the first '?' or '#' (exclusive), or the full URI when neither found.
- Throws:
- IOException
-
writeHierPart
Writes the part of the URI after the scheme and up to the first '?' or '#' (exclusive), or the full URI when neither found.
- Throws:
- IOException
-
appendHierPart
Appends the part of the URI after the scheme and up to the first '?' or '#' (exclusive), or the full URI when neither found.
- Returns:
- this
- Throws:
- IOException
-
appendHierPart
Appends the part of the URI after the scheme and up to the first '?' or '#' (exclusive), or the full URI when neither found.
- Returns:
- this
- Throws:
- IOException
-
appendHierPart
Appends the part of the URI after the scheme and up to the first '?' or '#' (exclusive), or the full URI when neither found.
- Returns:
- The StringBuilder sb
-
appendHierPart
Appends the part of the URI after the scheme and up to the first '?' or '#' (exclusive), or the full URI when neither found.
- Returns:
- The StringBuffer sb
-
getQueryIndex
public int getQueryIndex()
Gets the index of the query marker ('?').
- Returns:
- the index of the '?' marking the query string or -1 when there is no query string.
-
hasQuery
public boolean hasQuery()
Checks if this has a query.
-
getQueryString
Gets the query string.
This method may involve string manipulation; favor the writeQueryString(…) and appendQueryString(…) methods when appropriate.
- Returns:
- the query string (not including the '?') or null when there is no query.
-
writeQueryString
Writes the query string (not including the '?').
- Throws:
- IOException
-
writeQueryString
Writes the query string (not including the '?').
- Throws:
- IOException
-
appendQueryString
Appends the query string (not including the '?').
- Returns:
- this
- Throws:
- IOException
-
appendQueryString
Appends the query string (not including the '?').
- Returns:
- this
- Throws:
- IOException
-
appendQueryString
Appends the query string (not including the '?').
- Returns:
- The StringBuilder sb
-
appendQueryString
Appends the query string (not including the '?').
- Returns:
- The StringBuffer sb
-
getFragmentIndex
public int getFragmentIndex()
Gets the index of the fragment marker ('#').
- Returns:
- the index of the '#' marking the fragment or -1 when there is no fragment.
-
hasFragment
public boolean hasFragment()
Checks if this has a fragment.
-
getFragment
Gets the fragment.
This method may involve string manipulation; favor the writeFragment(…) and appendFragment(…) methods when appropriate.
- Returns:
- the fragment (not including the '#') or null when there is no fragment.
-
writeFragment
Writes the fragment (not including the '#').
- Throws:
- IOException
-
writeFragment
Writes the fragment (not including the '#').
- Throws:
- IOException
-
appendFragment
Appends the fragment (not including the '#').
- Returns:
- this
- Throws:
- IOException
-
appendFragment
Appends the fragment (not including the '#').
- Returns:
- this
- Throws:
- IOException
-
appendFragment
Appends the fragment (not including the '#').
- Returns:
- The StringBuilder sb
-
appendFragment
Appends the fragment (not including the '#').
- Returns:
- The StringBuffer sb
-
isEncodingNormalized
public boolean isEncodingNormalized()
Is this URI percent-encoding normalized? Normalized percent encoding means it will have only the required percent encodings, and the encodings are capitalized hexadecimal.
Note: This only refers to the percent encodings. This is not related to full URI normalization.
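One possible reading of this rule, for an ASCII-only URI, is sketched below. The interpretation is an assumption: "only the required percent encodings" is taken to mean that no RFC 3986 unreserved character (letters, digits, '-', '.', '_', '~') appears in percent-encoded form, and "capitalized hexadecimal" is taken to mean digits 0-9 and A-F. The class's actual rules may differ, particularly for Unicode IRIs, and `EncodingNormalizedSketch` is a hypothetical name.

```java
/** Hypothetical illustration of a percent-encoding normalization check. */
final class EncodingNormalizedSketch {

  /** RFC 3986 unreserved characters never need percent-encoding. */
  static boolean isUnreserved(int ch) {
    return (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z')
        || (ch >= '0' && ch <= '9')
        || ch == '-' || ch == '.' || ch == '_' || ch == '~';
  }

  static boolean isUpperHex(char c) {
    return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'F');
  }

  static boolean isNormalized(String uri) {
    for (int i = 0; i < uri.length(); i++) {
      if (uri.charAt(i) != '%') {
        continue;
      }
      if (i + 2 >= uri.length()) {
        return false; // truncated escape sequence
      }
      char hi = uri.charAt(i + 1);
      char lo = uri.charAt(i + 2);
      if (!isUpperHex(hi) || !isUpperHex(lo)) {
        return false; // lowercase or non-hexadecimal digits
      }
      int value = Character.digit(hi, 16) * 16 + Character.digit(lo, 16);
      if (isUnreserved(value)) {
        return false; // this character did not require encoding
      }
      i += 2; // skip the two hexadecimal digits
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(isNormalized("/a%20b"));   // prints: true
    System.out.println(isNormalized("/a%2fb"));   // prints: false
    System.out.println(isNormalized("/%7Euser")); // prints: false
  }
}
```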
-
toURI
Gets this URI encoded in RFC 3986 URI US-ASCII format.
This might not be percent-encoding normalized. Use toIRI().toURI() if consistent formatting is required.
- Returns:
- The URI or this when unmodified.
-
toIRI
Gets this URI encoded in RFC 3987 IRI Unicode format.
- Returns:
- The IRI or this when unmodified.
-
setHierPart
Replaces the hier-part.
- Parameters:
- hierPart - The hier-part may not contain the query marker '?' or fragment marker '#'
- Returns:
- The new AnyURI or this when unmodified.
-
setQueryString
Replaces the query string.
- Parameters:
- query - The query (not including the first '?') - it is added without additional encoding. The query is removed when the query is null. The query may not contain the fragment marker '#'
- Returns:
- The new AnyURI or this when unmodified.
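Replacing the query while keeping the fragment intact can be pictured on a plain string as below. This is a sketch only: `SetQuerySketch` is a hypothetical name, and throwing IllegalArgumentException for a query containing '#' is an assumption, since this page says only that the query may not contain it.

```java
/** Hypothetical illustration of replacing or removing the query string. */
final class SetQuerySketch {

  static String setQueryString(String uri, String query) {
    if (query != null && query.indexOf('#') != -1) {
      // Assumption: the documented restriction is enforced by an exception.
      throw new IllegalArgumentException("query may not contain '#'");
    }
    // Split off the fragment (including its '#') so it can be kept as-is.
    int fragmentIndex = uri.indexOf('#');
    String fragment = (fragmentIndex == -1) ? "" : uri.substring(fragmentIndex);
    String beforeFragment = (fragmentIndex == -1) ? uri : uri.substring(0, fragmentIndex);
    // Drop any existing query.
    int queryIndex = beforeFragment.indexOf('?');
    String base = (queryIndex == -1) ? beforeFragment : beforeFragment.substring(0, queryIndex);
    // A null query removes the query; otherwise it is added without encoding.
    return (query == null) ? base + fragment : base + '?' + query + fragment;
  }

  public static void main(String[] args) {
    System.out.println(setQueryString("/p?a=1#f", "b=2")); // prints: /p?b=2#f
    System.out.println(setQueryString("/p?a=1#f", null));  // prints: /p#f
  }
}
```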
-
addQueryString
Adds a query string.
- Parameters:
- query - The query (not including the first '?' / '&') - it is added without additional encoding. Nothing is added when the query is null. The query may not contain the fragment marker '#'
- Returns:
- The new AnyURI or this when unmodified.
-
addEncodedParameter
Adds an already-encoded parameter.
- Parameters:
- encodedName - The parameter name - it is added without additional encoding. Nothing is added when the name is null. The name may not contain the fragment marker '#'
- encodedValue - The parameter value - it is added without additional encoding. When null, the parameter is added without any '='. Must be null when encodedName is null. The value may not contain the fragment marker '#'
- Returns:
- The new AnyURI or this when unmodified.
-
addParameter
Encodes and adds a parameter.
- Parameters:
- name - The parameter name. Nothing is added when the name is null.
- value - The parameter value. When null, the parameter is added without any '='. Must be null when name is null.
- Returns:
- The new AnyURI or this when unmodified.
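The parameter rules above (use '?' for the first parameter and '&' afterwards, omit '=' for a null value, insert before any fragment) can be sketched on a plain string as follows. This is an illustration under assumptions, not the class's implementation: `AddParameterSketch` is a hypothetical name, and the JDK's `URLEncoder` (Java 10+ overload) stands in for the library's encoder, so spaces would become '+'.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration of adding a query parameter. */
final class AddParameterSketch {

  /** Adds an already-encoded parameter before any fragment. */
  static String addEncodedParameter(String uri, String encodedName, String encodedValue) {
    if (encodedName == null) {
      if (encodedValue != null) {
        throw new IllegalArgumentException("value must be null when name is null");
      }
      return uri; // nothing to add
    }
    int fragmentIndex = uri.indexOf('#');
    int insertAt = (fragmentIndex == -1) ? uri.length() : fragmentIndex;
    int queryIndex = uri.indexOf('?');
    boolean hasQuery = queryIndex != -1 && queryIndex < insertAt;
    StringBuilder sb = new StringBuilder();
    sb.append(uri, 0, insertAt).append(hasQuery ? '&' : '?').append(encodedName);
    if (encodedValue != null) {
      sb.append('=').append(encodedValue);
    }
    sb.append(uri, insertAt, uri.length()); // keep the fragment, if any
    return sb.toString();
  }

  /** Encodes the name and value in UTF-8, then adds the parameter. */
  static String addParameter(String uri, String name, String value) {
    return addEncodedParameter(
        uri,
        (name == null) ? null : URLEncoder.encode(name, StandardCharsets.UTF_8),
        (value == null) ? null : URLEncoder.encode(value, StandardCharsets.UTF_8));
  }

  public static void main(String[] args) {
    System.out.println(addParameter("/search#top", "q", "caf\u00e9")); // prints: /search?q=caf%C3%A9#top
    System.out.println(addParameter("/search?a=1", "flag", null));     // prints: /search?a=1&flag
  }
}
```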
-
addParameters
Adds all of the parameters.
- Parameters:
- params - The parameters to add. Nothing is added when null or empty.
- Returns:
- The new AnyURI or this when unmodified.
-
setEncodedFragment
Replaces the fragment.
- Parameters:
- encodedFragment - The fragment (not including the '#') - it is added without additional encoding. Removes fragment when null.
- Returns:
- The new AnyURI or this when unmodified.
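The "new instance or this when unmodified" contract can be pictured on plain strings: the original reference is handed back when there is nothing to change. The sketch below is only an analogy with the hypothetical name `SetFragmentSketch`; it covers just the case of removing a fragment that is not present, and the class may detect other no-op cases as well.

```java
/** Hypothetical illustration of replacing or removing the fragment. */
final class SetFragmentSketch {

  static String setEncodedFragment(String uri, String encodedFragment) {
    int fragmentIndex = uri.indexOf('#');
    if (fragmentIndex == -1 && encodedFragment == null) {
      return uri; // unmodified: return the same reference
    }
    String base = (fragmentIndex == -1) ? uri : uri.substring(0, fragmentIndex);
    // A null fragment removes it; otherwise it is added without encoding.
    return (encodedFragment == null) ? base : base + '#' + encodedFragment;
  }

  public static void main(String[] args) {
    System.out.println(setEncodedFragment("/p?a=1#old", "new")); // prints: /p?a=1#new
    System.out.println(setEncodedFragment("/p#old", null));      // prints: /p
  }
}
```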
-
setFragment
Replaces the fragment in the default encoding IRI.ENCODING.
TODO: Implement specification of fragment-escape.
- Parameters:
- fragment - The fragment (not including the '#') or null for no fragment.
- Returns:
- The new AnyURI or this when unmodified.
-