Skip to content

Conversation

@hboutemy
Copy link

@hboutemy hboutemy commented Aug 8, 2022

as found while rebuilding projects using ANTLR3 (including ANTLR3 itself), there are non-reproducible outputs at 2 levels:

├── org/antlr/grammar/v3/ANTLRv3Parser.java
│ @@ -1,8 +1,8 @@
│ -// $ANTLR 3.5.2 org/antlr/grammar/v3/ANTLRv3.g 2022-08-08 12:16:39
│ +// $ANTLR 3.5.2 org/antlr/grammar/v3/ANTLRv3.g 2022-08-08 12:17:56
│
│      package org.antlr.grammar.v3;
│
│
│  import org.antlr.runtime.*;
│  import java.util.Stack;
│  import java.util.List;
│ @@ -427,15 +427,15 @@
│  				cnt7++;
│  			}
│
│  			EOF12=(Token)match(input,EOF,FOLLOW_EOF_in_grammarDef489); if (state.failed) return retval;
│  			if ( state.backtracking==0 ) stream_EOF.add(EOF12);
│
│  			// AST REWRITE
│ -			// elements: DOC_COMMENT, attrScope, optionsSpec, rule, tokensSpec, action, id
│ +			// elements: action, tokensSpec, attrScope, rule, optionsSpec, DOC_COMMENT, id
│  			// token labels:
│  			// rule labels: retval
│  			// token list labels:
│  			// rule list labels:
│  			// wildcard labels:
│  			if ( state.backtracking==0 ) {
│  			retval.tree = root_0;

this PR fixes the 2 issues in 2 separate commits:

  • removes the timestamp from generated sources
  • ensure order or rewritten AST by sorting

@hboutemy hboutemy force-pushed the reproducible-output branch from e8866a1 to 75e5e87 Compare August 9, 2022 21:37
@parrt
Copy link
Member

parrt commented Aug 26, 2022

Is it removing the timestamp going to break any codes on it? I agree it was a dumb idea but I'm afraid to change it now. haha

@hboutemy
Copy link
Author

perhaps adding an option for users to choose if they want this timestamp or not is a better choice (like JAXB that provides -no-header option for such case)

@parrt
Copy link
Member

parrt commented Sep 16, 2022

I'm generally opposed to options, I'm afraid. In this case it's a fairly heavy change just to get rid of this date output, which in retrospect was definitely a mistake on my part. I agree that the should be reproducible but I'm not sure risking backward compatibility is worth it.

I do know that some companies simply remove that line using their build tools. Is this possible with maven?

@hboutemy
Copy link
Author

hboutemy commented Oct 8, 2022

yes, we do it with maven-replacer-plugin
I just added the workaround for ANTLR4 antlr/antlr4@270f13b

@hboutemy
Copy link
Author

hboutemy commented Oct 8, 2022

one question: there are 2 sources of non-reproducible bit

  1. the timestamp; that we can postprocess with maven-replacer-plugin
  2. the AST rewrite elements, like
			// AST REWRITE
-			// elements: stringArg, multiAttrFunction
+			// elements: multiAttrFunction, stringArg

fixed by the sorting in the commit on file tool/src/main/antlr3/org/antlr/grammar/v3/CodeGenTreeWalker.g

is it possible to fix the reproducibility issue for the elements, please? This would reduce the places where we need to postprocess

@greedy
Copy link

greedy commented Oct 24, 2022

Instead of totally dropping the timestamp, antlr could be updated to support SOURCE_DATE_EPOCH for the build-time-stamp. This should leave any existing use-cases for the build-stamp undisturbed but make it easy for users to opt-in to a deterministic mode using this standard.

I have also observed non-determinism in the order of methods in the "Delegated rules" section of the generated parser.

Other hazards from looking through the templates and the code:

  • throwsSpec is a HashSet<String>
  • getAllImportedRules uses a HashSet<Rule>
  • altToRuleRefMap and altToTokenRefMap are populated with HashMap<String, _> which are used to populate the result of getAllRuleRefsInAltsWithRewrites which is then used in the AST template

My audit is far from complete though.

Things to ponder: One could replace all HashSets and HashMaps by their Linked versions but that seems like a rather heavy hammer. It would also be nice if the collections that needed to have deterministic iteration had a distinguishing static type to avoid re-introducing problems.

@hboutemy
Copy link
Author

hboutemy commented Nov 6, 2022

@parrt elements order commit extracted to a separate PR: see #213
is it possible to merge this one, please?

@tristantarrant
Copy link

It seems I just duplicated your work in #228

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants