Difference between revisions of "Star Exec"
Line 12: | Line 12: | ||
* Scoring systems: How can a scoring system like we use for the complexity competition be implemented? Post-processors seem to be restricted to working on one solver/result pair, while our scoring needs to see all results in a competition for a given example. | * Scoring systems: How can a scoring system like we use for the complexity competition be implemented? Post-processors seem to be restricted to working on one solver/result pair, while our scoring needs to see all results in a competition for a given example. | ||
* Will there be "Teams"? StarExec seems to know Communitys and single users, but not about the fact that some tools are produced by a group of people. | * Will there be "Teams"? StarExec seems to know Communitys and single users, but not about the fact that some tools are produced by a group of people. | ||
+ | * How are example sets organized? For example I'd like to have the official TPDB and private example sets (which might be submitted to TPDB lateron). Furthermore, it is very useful to have complex ways of picking the relevant examples for a benchmark run (not just "the whole TPDB", but maybe "just the Java Bytecode examples which are not yet in the TPDB"). Here having a number of user-defineable tags for each example and being able to pick example sets based on boolean combinations of tags can be a solution I'd like to see. ([[User:C-Otto|C-Otto]] 16:46, 12 June 2012 (CEST)) | ||
+ | * What further information do examples have? We made good experience with storing information about the expected result (entered manually), so that runs giving a different result can be identified easily. Furthermore, it is useful to get a list of runs in which the specific example could be solved. Statistical data (for example: this example has never been solved, yet) is useful, too. ([[User:C-Otto|C-Otto]] 16:46, 12 June 2012 (CEST)) |
Revision as of 14:46, 12 June 2012
StarExec is a cross-community solver execution and benchmark library service under joint development at the University of Iowa and the University of Miami. Its goal is to facilitate the experimental evaluation of logic solvers and other automated reasoning tools by providing a shared storage and computing infrastructure to store, manage and make available benchmark libraries; execute comparative evaluations; and run solver competitions. StarExec should be available for public use in the fall of 2012.
The termination community intends to use this platform for future competitions (and other tasks, like running experiments, archiving benchmarks and results, etc.)
There will be a Starexec Workshop affiliated with IJCAR 2012 (Manchester). We use this wiki page here to collect and organize some points that we want to make during that workshop.
Open Questions:
- Format of results: The specification does not speak explicitly about the format of the results. Is this going to be handled using Post-Processors?
- Scoring systems: How can a scoring system like we use for the complexity competition be implemented? Post-processors seem to be restricted to working on one solver/result pair, while our scoring needs to see all results in a competition for a given example.
- Will there be "Teams"? StarExec seems to know Communitys and single users, but not about the fact that some tools are produced by a group of people.
- How are example sets organized? For example I'd like to have the official TPDB and private example sets (which might be submitted to TPDB lateron). Furthermore, it is very useful to have complex ways of picking the relevant examples for a benchmark run (not just "the whole TPDB", but maybe "just the Java Bytecode examples which are not yet in the TPDB"). Here having a number of user-defineable tags for each example and being able to pick example sets based on boolean combinations of tags can be a solution I'd like to see. (C-Otto 16:46, 12 June 2012 (CEST))
- What further information do examples have? We made good experience with storing information about the expected result (entered manually), so that runs giving a different result can be identified easily. Furthermore, it is useful to get a list of runs in which the specific example could be solved. Statistical data (for example: this example has never been solved, yet) is useful, too. (C-Otto 16:46, 12 June 2012 (CEST))