Difference between revisions of "Star Exec"

From Termination-Portal.org
Jump to navigationJump to search
Line 9: Line 9:
  
 
Open Questions:
 
Open Questions:
* Format of results: The specification does not speak explicitly about the format of the results. Is this going to be handled using Post-Processors?
+
* Format of results: The specification does not speak explicitly about the format of the results. Is this going to be handled using Post-Processors? ([[User:Mmjb|Marc Brockschmidt]] 16:19, 12 June 2012 (CEST))
* Scoring systems: How can a scoring system like we use for the complexity competition be implemented? Post-processors seem to be restricted to working on one solver/result pair, while our scoring needs to see all results in a competition for a given example.
+
* Scoring systems: How can a scoring system like we use for the complexity competition be implemented? Post-processors seem to be restricted to working on one solver/result pair, while our scoring needs to see all results in a competition for a given example. ([[User:Mmjb|Marc Brockschmidt]] 16:19, 12 June 2012 (CEST))
* Will there be "Teams"? StarExec seems to know Communitys and single users, but not about the fact that some tools are produced by a group of people.
+
* Will there be "Teams"? StarExec seems to know Communitys and single users, but not about the fact that some tools are produced by a group of people. ([[User:Mmjb|Marc Brockschmidt]] 16:19, 12 June 2012 (CEST))
 
* How are benchmark sets (note: the organizers call the individual examples/problems/TRSs/... "benchmarks") organized? For example I'd like to have the official TPDB and private benchmark sets (which might be submitted to TPDB lateron). Furthermore, it is very useful to have complex ways of picking the relevant benchmark for a tool run (not just "the whole TPDB", but maybe "just the Java Bytecode benchmarks which are not yet in the TPDB"). Here having a number of user-defineable tags for each benchmark and being able to pick benchmark sets based on boolean combinations of tags can be a solution I'd like to see. ([[User:C-Otto|C-Otto]] 16:46, 12 June 2012 (CEST))
 
* How are benchmark sets (note: the organizers call the individual examples/problems/TRSs/... "benchmarks") organized? For example I'd like to have the official TPDB and private benchmark sets (which might be submitted to TPDB lateron). Furthermore, it is very useful to have complex ways of picking the relevant benchmark for a tool run (not just "the whole TPDB", but maybe "just the Java Bytecode benchmarks which are not yet in the TPDB"). Here having a number of user-defineable tags for each benchmark and being able to pick benchmark sets based on boolean combinations of tags can be a solution I'd like to see. ([[User:C-Otto|C-Otto]] 16:46, 12 June 2012 (CEST))
 
* What further information do benchmarks have? We made good experience with storing information about the expected result (entered manually), so that runs giving a different result can be identified easily. Furthermore, it is useful to get a list of tool runs in which the specific benchmark could be solved. Statistical data (for example: this benchmark has never been solved, yet) is useful, too. ([[User:C-Otto|C-Otto]] 16:46, 12 June 2012 (CEST))
 
* What further information do benchmarks have? We made good experience with storing information about the expected result (entered manually), so that runs giving a different result can be identified easily. Furthermore, it is useful to get a list of tool runs in which the specific benchmark could be solved. Statistical data (for example: this benchmark has never been solved, yet) is useful, too. ([[User:C-Otto|C-Otto]] 16:46, 12 June 2012 (CEST))

Revision as of 14:55, 12 June 2012

StarExec is a cross-community solver execution and benchmark library service under joint development at the University of Iowa and the University of Miami. Its goal is to facilitate the experimental evaluation of logic solvers and other automated reasoning tools by providing a shared storage and computing infrastructure to store, manage and make available benchmark libraries; execute comparative evaluations; and run solver competitions. StarExec should be available for public use in the fall of 2012.

The termination community intends to use this platform for future competitions (and other tasks, like running experiments, archiving benchmarks and results, etc.)

There will be a Starexec Workshop affiliated with IJCAR 2012 (Manchester). We use this wiki page here to collect and organize some points that we want to make during that workshop.

Open Questions:

  • Format of results: The specification does not speak explicitly about the format of the results. Is this going to be handled using Post-Processors? (Marc Brockschmidt 16:19, 12 June 2012 (CEST))
  • Scoring systems: How can a scoring system like we use for the complexity competition be implemented? Post-processors seem to be restricted to working on one solver/result pair, while our scoring needs to see all results in a competition for a given example. (Marc Brockschmidt 16:19, 12 June 2012 (CEST))
  • Will there be "Teams"? StarExec seems to know Communitys and single users, but not about the fact that some tools are produced by a group of people. (Marc Brockschmidt 16:19, 12 June 2012 (CEST))
  • How are benchmark sets (note: the organizers call the individual examples/problems/TRSs/... "benchmarks") organized? For example I'd like to have the official TPDB and private benchmark sets (which might be submitted to TPDB lateron). Furthermore, it is very useful to have complex ways of picking the relevant benchmark for a tool run (not just "the whole TPDB", but maybe "just the Java Bytecode benchmarks which are not yet in the TPDB"). Here having a number of user-defineable tags for each benchmark and being able to pick benchmark sets based on boolean combinations of tags can be a solution I'd like to see. (C-Otto 16:46, 12 June 2012 (CEST))
  • What further information do benchmarks have? We made good experience with storing information about the expected result (entered manually), so that runs giving a different result can be identified easily. Furthermore, it is useful to get a list of tool runs in which the specific benchmark could be solved. Statistical data (for example: this benchmark has never been solved, yet) is useful, too. (C-Otto 16:46, 12 June 2012 (CEST))