Ant tasks used in building AO-supported projects.
Features
- Fine-grained management of last-modified times within *.aar, *.jar, *.war, and *.zip files for optimum reproducibility and publishability.
- Generate directory-only ZIP files with a reference timestamp to be able to manipulate ZIP file structure reproducibly while also not losing per-entry timestamps.
- SEO filter Javadocs: Canonical URLs, selective rel="nofollow", Sitemaps, and Google Analytics tracking code.
Motivation
Our immediate goal is to have efficient sitemaps for generated Javadocs. The sitemaps must provide accurate last-modified timestamps for generated pages. Our current implementation of reproducible builds is losing last-modified information.
More broadly, we desire accurate last-modified times for all project resources deployed in *.aar, *.jar, *.war, and *.zip files. This can have implications for web content modeling, web resource caching, and the resulting sitemap generation.
Standard Solutions and Related Deficiencies
As a simple strategy to create reproducible builds, a typical starting point is to declare a timestamp in the ${project.build.outputTimestamp} property.
This timestamp is then used for all entries in all resulting AAR/JAR/WAR/ZIP files. Standard Maven plugins all use this
value, and the Maven Release Plugin will update this
value automatically during releases.
This simple approach, however, introduces an issue when serving web content from the /META-INF/resources directory
within JAR files (and a similar issue with the main content served directly from or expanded from the main WAR file).
Specifically, in snapshot builds between releases, the JAR/WAR will be recreated with the same timestamp,
thus being updated without the main web application and clients being aware of the change.
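The trade-off described above can be illustrated with a minimal Python sketch. It is not part of this project or of Maven: `OUTPUT_TIMESTAMP`, `build_archive`, and `entry_times` are hypothetical names, and the fixed tuple merely stands in for ${project.build.outputTimestamp}. Stamping every entry with one value makes two builds of the same content byte-identical, but a build with changed content carries exactly the same entry timestamps.

```python
import io
import zipfile

# Hypothetical stand-in for ${project.build.outputTimestamp}:
# (year, month, day, hour, minute, second).
OUTPUT_TIMESTAMP = (2024, 1, 15, 12, 0, 0)


def build_archive(files, timestamp=OUTPUT_TIMESTAMP):
    """Write files (name -> bytes) to an in-memory ZIP, stamping every entry with timestamp."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in sorted(files):  # a fixed entry order keeps the bytes stable
            info = zipfile.ZipInfo(name, date_time=timestamp)
            info.compress_type = zipfile.ZIP_DEFLATED
            archive.writestr(info, files[name])
    return buffer.getvalue()


def entry_times(archive_bytes):
    """Return a mapping of entry name -> last-modified tuple stored in the archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return {info.filename: info.date_time for info in archive.infolist()}


page = "META-INF/resources/index.html"
first = build_archive({page: b"<p>version 1</p>"})
rebuilt = build_archive({page: b"<p>version 1</p>"})
changed = build_archive({page: b"<p>version 2</p>"})

print(first == rebuilt)                            # same input, same bytes
print(entry_times(first) == entry_times(changed))  # the change is invisible in the timestamps
```

Anything that relies on the entry timestamp alone, such as a cache or a sitemap generator, cannot tell the changed build from the first one.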
One workaround is to modify this timestamp on each commit. This concept can be automated through the use of git-commit-id-maven-plugin, which will use the timestamp of the last commit. This ensures last-modified times are updated, so nothing is cached too aggressively. However, every resource now appears to be modified on every commit. This, in turn, can cause web crawlers to be directed toward many pages that have, in fact, not been updated. When this is Javadoc-generated content, it can cause the crawler to take a while to find actual updated content, or, even worse, it could distract the crawler from more meaningful changes elsewhere in the site.
Our Solution
Leveraging the Apache Ant tasks provided by this project, our
Jenkins builds will now compare the AAR/JAR/WAR/ZIP files between the last successful build
and the current build. When the entry content is identical to the previous build, the entry will be modified in-place
to have the same timestamp as the previous build. Thus, modified times will be carried through from build to build so
long as the content has not changed. If the entry is new to a build, it will retain the timestamp resulting from ${project.build.outputTimestamp}, as is already done.
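The comparison described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Ant tasks themselves: the function names `carry_over_timestamps`, `make_zip`, and `read_times` are hypothetical, and, as a simplification, the sketch writes a patched copy of the archive rather than modifying entries in place.

```python
import io
import zipfile


def carry_over_timestamps(previous_bytes, current_bytes):
    """Return a copy of the current archive where each entry whose content matches
    the same-named entry of the previous archive takes the previous timestamp."""
    with zipfile.ZipFile(io.BytesIO(previous_bytes)) as previous:
        previous_entries = {
            info.filename: (info.date_time, previous.read(info))
            for info in previous.infolist()
        }
    patched = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(current_bytes)) as current, \
            zipfile.ZipFile(patched, "w") as output:
        for info in current.infolist():
            content = current.read(info)
            date_time = info.date_time  # new or changed entries keep the build timestamp
            known = previous_entries.get(info.filename)
            if known is not None and known[1] == content:
                date_time = known[0]  # identical content: reuse the previous timestamp
            copy = zipfile.ZipInfo(info.filename, date_time=date_time)
            copy.compress_type = info.compress_type
            copy.external_attr = info.external_attr
            output.writestr(copy, content)
    return patched.getvalue()


def make_zip(entries):
    """Build an in-memory ZIP from a mapping of name -> (date_time, content)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, (date_time, content) in entries.items():
            archive.writestr(zipfile.ZipInfo(name, date_time=date_time), content)
    return buffer.getvalue()


def read_times(archive_bytes):
    """Return a mapping of entry name -> last-modified tuple stored in the archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return {info.filename: info.date_time for info in archive.infolist()}
```

For example, with a previous build stamped 2024-01-01 and a current build stamped 2024-02-02, an unchanged entry in the patched archive reports 2024-01-01, while changed and newly added entries report 2024-02-02.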
Our release builds do not use this optimization. They use standard reproducible timestamps, typically derived from ${git.commit.time}.
This is only an optimization to assist crawlers in identifying new content more efficiently. We only publish content from our SNAPSHOT (or POST-SNAPSHOT) builds. These snapshots are typically published by Jenkins (which will contain the patched modification times), but may also be published directly by developers (which will use standard reproducible timestamps).
Why Ant Tasks Instead of Maven Plugin?
We have implemented this as Ant tasks instead of a Maven plugin because the task is used to process its own artifacts. While we use the tasks via the Apache Maven AntRun Plugin, the versatility of the TaskDef Task allows us to pick up the artifact of the current build on the classpath.