Bit-for-bit reproducible builds with Maven

Thomas Lorblanchès

21/04/2016

DukeFriends

Maven, reproducible?

  • All plugin versions pinned in the pom file

  • No SNAPSHOT dependencies

  • Use of maven-enforcer-plugin (rules requirePluginVersions and requireReleaseDeps)

  • And yet…

Hello unreproducible world!

$ cd HelloWorld
$ mvn clean install
...
[INFO] Downloading the Internet...
[INFO] Running tons of plugins...
...
[INFO] BUILD SUCCESS

$ sha256sum target/hello-1.0-SNAPSHOT.jar
87de0b[...]7ca9  target/hello-1.0-SNAPSHOT.jar
$ mvn clean install
...
[INFO] BUILD SUCCESS

$ sha256sum target/hello-1.0-SNAPSHOT.jar
2f3167[...]86ab  target/hello-1.0-SNAPSHOT.jar
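The check above uses sha256sum. The same comparison can be done from Java, for instance in a test that runs the build twice. This is a minimal sketch that relies only on the standard MessageDigest API. The class name BuildComparer is hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: compares two build outputs the way sha256sum does. */
class BuildComparer {
    /** Lowercase hex SHA-256 of the given bytes (same format as sha256sum). */
    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform must provide SHA-256, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    static String sha256Hex(Path file) {
        try {
            return sha256Hex(Files.readAllBytes(file));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: java BuildComparer first.jar second.jar");
            return;
        }
        String first = sha256Hex(Paths.get(args[0]));
        String second = sha256Hex(Paths.get(args[1]));
        System.out.println(first + "  " + args[0]);
        System.out.println(second + "  " + args[1]);
        System.out.println(first.equals(second) ? "identical" : "DIFFERENT");
    }
}
```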

OK, but why should I care?

  • To prove that a given binary file is the result of the compilation of a given source bundle.

    • QA (human error)

    • Computer security (deliberate corruption)

  • To check the consistency between a source package and a binary package

    • Maven Central: mylib-1.0.0.jar and mylib-1.0.0-sources.jar

    • Linux distributions: openjdk-8-jdk_8u72-b05_amd64.deb and openjdk-8_8u72-b05[.orig.tar.gz/.dsc/.diff.gz]

The Debian "reproducible builds" project

It should be possible to reproduce, byte for byte, every build of every package in Debian.
— https://wiki.debian.org/ReproducibleBuilds
  • 2007: Discussions on debian-devel

  • 2011: Reproducible build of Bitcoin

  • 2013: Snowden revelations

  • 2013: Reproducible build of Tor Browser

  • 2013: Start of the "Debian reproducible builds" project

Debian reproducible builds status (testing/amd64)


April 2016: ~88% of Debian testing/amd64 packages are reproducible

Sources of unreproducibility

  • Inside the JAR archive

hello-1.0-SNAPSHOT/
├── Main.class
└── META-INF
    ├── MANIFEST.MF                    # Not reproducible
    └── maven
        └── prez
            └── hello
                ├── pom.properties     # Not reproducible
                └── pom.xml
  • In the archive file format (ZIP)

MANIFEST.MF

Manifest-Version: 1.0
Archiver-Version: Plexus Archiver
Built-By: thomas                     # Not reproducible
Created-By: Apache Maven 3.0.5       # Difficult to reproduce
Build-Jdk: 1.8.0_72-internal         # Difficult to reproduce

The maven-enforcer-plugin rules requireMavenVersion and requireJavaVersion can help pin Created-By and Build-Jdk.
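Pinning the versions does not help with Built-By, which holds the user name. One alternative is to drop the volatile attributes after the fact. This is a minimal sketch using java.util.jar.Manifest. The class name and the choice of attributes to drop are assumptions, taken from the three lines flagged above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

/** Hypothetical helper: removes build-environment attributes from a MANIFEST.MF. */
class ManifestScrubber {
    // Assumption: these are the attributes considered volatile (see the slide above).
    static final List<String> VOLATILE = Arrays.asList("Built-By", "Created-By", "Build-Jdk");

    static byte[] scrub(byte[] manifestBytes) {
        try {
            Manifest manifest = new Manifest(new ByteArrayInputStream(manifestBytes));
            Attributes main = manifest.getMainAttributes();
            for (String name : VOLATILE) {
                main.remove(new Attributes.Name(name)); // Attributes.Name compares case-insensitively
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            manifest.write(out); // writes Manifest-Version first, with CRLF line endings
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Note: the manifest parser ignores a last line that has no line terminator.
        String mf = "Manifest-Version: 1.0\n"
                + "Archiver-Version: Plexus Archiver\n"
                + "Built-By: thomas\n"
                + "Created-By: Apache Maven 3.0.5\n"
                + "Build-Jdk: 1.8.0_72-internal\n";
        byte[] scrubbed = scrub(mf.getBytes(StandardCharsets.UTF_8));
        System.out.print(new String(scrubbed, StandardCharsets.UTF_8));
    }
}
```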

pom.properties

#Generated by Apache Maven
#Tue Feb 23 18:02:52 CET 2016        # Not reproducible
version=1.0-SNAPSHOT
groupId=prez
artifactId=hello
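The timestamp line has the same shape as the line java.util.Properties.store always emits, which is "#" followed by the current date in Date.toString() format. This is a minimal sketch of a stable alternative. It assumes that dropping the comment lines and sorting the entries is acceptable. The class name StableProperties is hypothetical.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper: serializes Properties without the date comment, in sorted order. */
class StableProperties {
    static String toStableString(Properties props) {
        StringWriter raw = new StringWriter();
        try {
            props.store(raw, null); // null comments: the only comment line is "#<current date>"
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with a StringWriter
        }
        List<String> lines = new ArrayList<>();
        for (String line : raw.toString().split("\\R")) {
            // Keep only key=value lines; store() has already escaped them, one entry per line.
            if (!line.isEmpty() && !line.startsWith("#")) {
                lines.add(line);
            }
        }
        Collections.sort(lines); // Hashtable iteration order is not a contract, so sort explicitly
        StringBuilder out = new StringBuilder();
        for (String line : lines) {
            out.append(line).append('\n'); // fixed line separator, whatever the platform
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("version", "1.0-SNAPSHOT");
        p.setProperty("groupId", "prez");
        p.setProperty("artifactId", "hello");
        System.out.print(toStableString(p));
    }
}
```

With the values from the slide, the output no longer depends on when the build ran.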

ZIP format

  • Local file headers and central directory file headers:

    • File last modification time

    • File last modification date

    • Extra field: X5455_ExtendedTimestamp

  • Insertion order of the files inside the ZIP
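All of these ZIP-level issues can be addressed with java.util.zip alone. The approach is to read every entry, then rewrite the entries in a fixed order with a fixed timestamp and no extra fields. This is a hypothetical sketch, not the code of any plugin, and it requires Java 9+ for ZipEntry.setTimeLocal. It uses 1980-02-01 rather than 0 because DOS timestamps in ZIP headers cannot go below 1980. It also keeps META-INF/MANIFEST.MF at the front, where JarInputStream looks for it. The compressed bytes may still differ between JDKs that ship different zlib versions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.time.LocalDateTime;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper: rewrites a ZIP/JAR with a fixed entry order and fixed timestamps. */
class ZipNormalizer {
    // Any constant inside the DOS range works. 1980-01-01T00:00 itself is avoided: the JDK
    // treats it as "before 1980" and then adds an extended-timestamp extra field
    // computed from the local time zone.
    static final LocalDateTime FIXED_TIME = LocalDateTime.of(1980, 2, 1, 0, 0);

    static byte[] normalize(byte[] zip) {
        try {
            // 1. Load every entry; the TreeMap keeps them in a deterministic order.
            Map<String, byte[]> entries = new TreeMap<>(ZipNormalizer::compareJarOrder);
            try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zip))) {
                for (ZipEntry e = in.getNextEntry(); e != null; e = in.getNextEntry()) {
                    entries.put(e.getName(), in.readAllBytes()); // reads the current entry only
                }
            }
            // 2. Rewrite with fresh entries: no extra fields, no comments, constant time.
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ZipOutputStream out = new ZipOutputStream(bytes)) {
                for (Map.Entry<String, byte[]> e : entries.entrySet()) {
                    ZipEntry entry = new ZipEntry(e.getKey());
                    entry.setTimeLocal(FIXED_TIME); // stored as DOS date/time, no time zone involved
                    out.putNextEntry(entry);
                    out.write(e.getValue());
                    out.closeEntry();
                }
            }
            return bytes.toByteArray();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // META-INF/ and the manifest first, then everything else by name.
    static int compareJarOrder(String a, String b) {
        int byRank = Integer.compare(rank(a), rank(b));
        return byRank != 0 ? byRank : a.compareTo(b);
    }

    private static int rank(String name) {
        if (name.equals("META-INF/")) return 0;
        if (name.equalsIgnoreCase("META-INF/MANIFEST.MF")) return 1;
        return 2;
    }
}
```

Because every entry is recreated from its name and content only, two archives with the same files normalize to the same bytes, whatever their original order and timestamps.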

JAXB

  • Bug JAXB-598, affecting versions before JAXB 2.2.11

  • The order of the methods in the ObjectFactory.java file generated by xjc is not reproducible ⇒ ObjectFactory.class is not reproducible either!

  • Java 8 bundles JAXB 2.2.8, so it is affected
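The usual remedy for any code generator is to emit members in an order derived from their content, typically sorted by name. The order in which an internal collection happens to iterate should not matter. This is a hypothetical sketch of that idea. It does not claim to show how xjc works or how JAXB-598 was fixed. The createX naming only mimics what an ObjectFactory looks like.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

/** Hypothetical generator: one factory method per type, always in sorted order. */
class FactoryEmitter {
    static String emit(Collection<String> typeNames) {
        List<String> sorted = new ArrayList<>(typeNames);
        Collections.sort(sorted); // output no longer depends on the input collection's order
        StringBuilder src = new StringBuilder("public class ObjectFactory {\n");
        for (String type : sorted) {
            src.append("    public ").append(type).append(" create").append(type)
               .append("() { return new ").append(type).append("(); }\n");
        }
        return src.append("}\n").toString();
    }

    public static void main(String[] args) {
        System.out.print(emit(java.util.Arrays.asList("Person", "Address")));
    }
}
```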

Javadoc

3 unreproducible lines.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<!-- NewPage -->
<html lang="fr">
<head>
<!-- Generated by javadoc (1.8.0_72-internal) on Thu Feb 25 17:37:57 CET 2016 -->
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Main (hello 1.0-SNAPSHOT API)</title>
<meta name="date" content="2016-02-25">
[...]
<!-- ======== END OF BOTTOM NAVBAR ======= -->
<p class="legalCopy"><small>Copyright &#169; 2016. All rights reserved.</small></p>
</body>
</html>

Javadoc

With javadoc option "-notimestamp":

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<!-- NewPage -->
<html lang="fr">
<head>
<!-- Generated by javadoc -->
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Main (hello 1.0-SNAPSHOT API)</title>
[...]
<!-- ======== END OF BOTTOM NAVBAR ======= -->
<p class="legalCopy"><small>Copyright &#169; 2016. All rights reserved.</small></p>
</body>
</html>
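Sometimes the javadoc options cannot be changed, for example when the documentation has already been generated. In that case the two timestamp lines can be normalized afterwards to match the -notimestamp output above. This is a hypothetical post-processing sketch using java.util.regex. The patterns assume the exact Java 8 layout shown on the slides.

```java
import java.util.regex.Pattern;

/** Hypothetical post-processor for javadoc 8 HTML pages generated without -notimestamp. */
class JavadocScrubber {
    // "<!-- Generated by javadoc (VERSION) on DATE -->": drop both the JDK version and the date.
    private static final Pattern GENERATED =
            Pattern.compile("<!-- Generated by javadoc \\([^)]*\\) on [^>]* -->");
    // The whole '<meta name="date" ...>' line, including its line terminator.
    private static final Pattern META_DATE =
            Pattern.compile("(?m)^<meta name=\"date\" content=\"[^\"]*\">\\R?");

    static String scrub(String html) {
        String withoutStamp = GENERATED.matcher(html).replaceAll("<!-- Generated by javadoc -->");
        return META_DATE.matcher(withoutStamp).replaceAll("");
    }

    public static void main(String[] args) {
        String page = "<!-- Generated by javadoc (1.8.0_72-internal) on Thu Feb 25 17:37:57 CET 2016 -->\n"
                + "<meta name=\"date\" content=\"2016-02-25\">\n";
        System.out.print(scrub(page)); // leaves only the generic "Generated by javadoc" comment
    }
}
```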

Groovy

hello.groovy:
println "Hello World!"

Compile then decompile:

groovyc hello.groovy
java -jar procyon-decompiler-0.5.30.jar hello.class

Groovy: decompiled class file

public class hello extends Script
{
    public static /* synthetic */ long __timeStamp;
    public static /* synthetic */ long __timeStamp__239_neverHappen1442922905877;
    private static /* synthetic */ SoftReference $callSiteArray;

[...]

    static {
        __$swapInit();
        hello.__timeStamp__239_neverHappen1442922905877 = 0L;
        hello.__timeStamp = 1442922905877L;
    }

[...]
}

Two public static fields with strange, unreproducible names and values.
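These classes can be spotted in a build output, for example from a test, by looking for such fields through reflection. This is a minimal sketch. The "__timeStamp" prefix comes from the decompiled names above. FakeCompiledScript is a hypothetical stand-in for a groovyc-compiled class.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical check: lists Groovy-style timestamp fields declared by a class. */
class GroovyTimestampCheck {
    static List<String> timestampFields(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) { // includes synthetic fields
            int mod = f.getModifiers();
            if (Modifier.isPublic(mod) && Modifier.isStatic(mod)
                    && f.getName().startsWith("__timeStamp")) {
                names.add(f.getName());
            }
        }
        return names;
    }
}

// Hypothetical stand-in mirroring the two fields of the decompiled script above.
class FakeCompiledScript {
    public static long __timeStamp;
    public static long __timeStamp__239_neverHappen1442922905877;
}
```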

Groovy: origin of the timestamps

Class groovy.lang.GroovyClassLoader:

protected void addTimeStamp(ClassNode node) {
    if (node.getDeclaredField(Verifier.__TIMESTAMP) == null) {
        // Field __timeStamp, initialized with the compilation time
        FieldNode timeTagField = new FieldNode(Verifier.__TIMESTAMP,
            ACC_PUBLIC | ACC_STATIC | ACC_SYNTHETIC, ClassHelper.long_TYPE, node,
            new ConstantExpression(System.currentTimeMillis()));
        timeTagField.setSynthetic(true);
        node.addField(timeTagField);

        // Field whose *name* embeds the compilation time, initialized with 0
        timeTagField = new FieldNode(
            Verifier.__TIMESTAMP__ + System.currentTimeMillis(),
            ACC_PUBLIC | ACC_STATIC | ACC_SYNTHETIC, ClassHelper.long_TYPE, node,
            new ConstantExpression((long) 0));
        timeTagField.setSynthetic(true);
        node.addField(timeTagField);
    }
}

Both public static "timestamp" fields depend on System.currentTimeMillis(): one through its value, the other through its name.
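A common way for tools to avoid this is the SOURCE_DATE_EPOCH environment variable defined by the reproducible-builds.org project. It holds a Unix timestamp in seconds, and tools use it instead of the current time when it is set. The Groovy excerpt above does not do this. The helper below is a hypothetical sketch of the convention, with the clock injected so that it can be tested. A malformed value makes Long.parseLong throw, which fails the build rather than silently falling back.

```java
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical helper: build timestamp honoring SOURCE_DATE_EPOCH. */
class BuildClock {
    /**
     * Returns SOURCE_DATE_EPOCH (seconds) converted to milliseconds if it is set,
     * otherwise the value of the supplied clock (e.g. System::currentTimeMillis).
     */
    static long buildTimeMillis(Map<String, String> env, LongSupplier clock) {
        String epoch = env.get("SOURCE_DATE_EPOCH");
        if (epoch == null || epoch.trim().isEmpty()) {
            return clock.getAsLong(); // not set: behave like the current code
        }
        return Long.parseLong(epoch.trim()) * 1000L;
    }

    public static void main(String[] args) {
        System.out.println(buildTimeMillis(System.getenv(), System::currentTimeMillis));
    }
}
```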

Maven plugin

<plugin>
    <groupId>io.github.zlika</groupId>
    <artifactId>reproducible-build-maven-plugin</artifactId>
    <version>0.2</version>
</plugin>

Goals:

  • strip-jaxb: sorts the methods inside xjc-generated ObjectFactory.java files

  • strip-jar: removes unreproducible content inside the JAR, sets the ZIP timestamps to 0 and reorders the entries inside the ZIP

Thank you!
