This is the raw data produced by our EFSM inference tool, described in [1, 2]. The four top-level directories correspond to the four case studies described in [1]. Each case study contains 30 numerically named subdirectories, one for each training set. These training sets can be found in the "inference-tool/experimental-data" folder of [3], which has a corresponding directory structure.

Each numeric subdirectory contains a named subdirectory for each experimental configuration (not every case study contains all of these):
- gp - the model inferred with genetic programming (GP) preprocessing
- none - the model inferred without GP preprocessing
- MINT - the model inferred by the MINT tool (with its default configuration values)
- _-obfuscated-VAR-PRE - the model inferred with VAR obfuscated, using the specified PREprocessing technique (either "GP" or "none")
- _-distinguish - the model inferred when the inference tool is given the ability to infer guards that distinguish transitions which cannot be merged

Each experimental configuration contains 30 subdirectories, each representing a single run of inference and named for the configuration and the random seeds used for the various aspects of GP. The exception is the "none" configurations, which do not depend on random seeds and so contain data for only one run.

Each run contains a file named "log", which is the output of the inference process; Appendix A of [1] explains this log file. Each run also contains several Graphviz "dot" files representing the EFSM at different stages of inference, and the final output model is in a file named after the experimental configuration. Finally, two JSON files record the results of executing the traces in the test set (available with the training data at [3]) on the initial prefix tree acceptor and on the final model.
These files record, for each event recognised by the model, the event's label and inputs, the anterior and posterior states, the anterior register values, the ID of the transition taken, and the expected and actual output values. Any suffix of a trace that the model did not recognise is recorded in a separate JSON object, "rejected".

REFERENCES:
[1] Foster M. (2020) Reverse Engineering Systems to Identify Flaws and Understand Behaviour. PhD thesis.
[2] Foster M., Walkinshaw N., Derrick J. (draft) Using Genetic Programming to Infer Output and Update Functions for EFSM Transitions.
[3] https://github.com/jmafoster1/efsm-inference
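The directory layout described above (case study, then numbered training set, then experimental configuration, then run) lends itself to programmatic traversal. The following is a minimal Python sketch, not part of the released tooling, that assumes exactly that layout and treats any directory containing a file named "log" as one run. The names `Run`, `iter_runs` and `count_runs` are hypothetical. Because the description does not say whether the single run of a "none" configuration sits in its own subdirectory, the sketch accepts a "log" file either directly in the configuration directory or in one of its subdirectories.

```python
from collections import Counter
from pathlib import Path
from typing import Iterator, NamedTuple


class Run(NamedTuple):
    case_study: str      # top-level directory name (one per case study)
    training_set: int    # numeric subdirectory name
    configuration: str   # e.g. "gp", "none", "MINT"
    run_dir: Path        # directory holding "log", the dot files and the JSON files


def _subdirs(path: Path) -> list[Path]:
    """Return the immediate subdirectories of path, sorted by name."""
    return sorted(p for p in path.iterdir() if p.is_dir())


def iter_runs(root: Path) -> Iterator[Run]:
    """Yield every run under root.

    Assumed (hypothetical) layout:
        root/<case study>/<training set>/<configuration>/[<run>/]log
    """
    for case_dir in _subdirs(root):
        # Training sets are named numerically; sort them as integers, not strings.
        sets = [p for p in _subdirs(case_dir) if p.name.isdigit()]
        for set_dir in sorted(sets, key=lambda p: int(p.name)):
            for config_dir in _subdirs(set_dir):
                # A single unseeded run may keep its log directly in the
                # configuration directory; treat that directory as the run.
                if (config_dir / "log").is_file():
                    yield Run(case_dir.name, int(set_dir.name), config_dir.name, config_dir)
                    continue
                # Otherwise each subdirectory containing a "log" file is one run.
                for run_dir in _subdirs(config_dir):
                    if (run_dir / "log").is_file():
                        yield Run(case_dir.name, int(set_dir.name), config_dir.name, run_dir)


def count_runs(root: Path) -> Counter:
    """Count runs per (case study, configuration) pair across all training sets."""
    return Counter((run.case_study, run.configuration) for run in iter_runs(root))
```

Calling `count_runs` on the unpacked data directory and comparing the totals with the numbers given above (30 runs per seeded configuration per training set, one run for each "none" configuration) is a quick way to spot missing or incomplete runs.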