Difference between revisions of "Star Exec"
J.waldmann (talk | contribs) |
|||
Line 15: | Line 15: | ||
* What further information do benchmarks have? We made good experience with storing information about the expected result (entered manually), so that runs giving a different result can be identified easily. Furthermore, it is useful to get a list of tool runs in which the specific benchmark could be solved. Statistical data (for example: this benchmark has never been solved, yet) is useful, too. ([[User:C-Otto|C-Otto]] 16:46, 12 June 2012 (CEST)) | * What further information do benchmarks have? We made good experience with storing information about the expected result (entered manually), so that runs giving a different result can be identified easily. Furthermore, it is useful to get a list of tool runs in which the specific benchmark could be solved. Statistical data (for example: this benchmark has never been solved, yet) is useful, too. ([[User:C-Otto|C-Otto]] 16:46, 12 June 2012 (CEST)) | ||
* How will inherited example sets be handled (i.e., at the moment, the TPDB does not contain complexity examples, but these are derived from the TPDB) ([[User:Mmjb|Marc Brockschmidt]] 16:58, 12 June 2012 (CEST)) | * How will inherited example sets be handled (i.e., at the moment, the TPDB does not contain complexity examples, but these are derived from the TPDB) ([[User:Mmjb|Marc Brockschmidt]] 16:58, 12 June 2012 (CEST)) | ||
+ | * verified termination categories: these require a two-stage execution, first stage is that solvers run on problems, producing proof traces; second stage is that verifiers run on proof traces. (JW) | ||
+ | * legacy data: it would be very useful to be able to compare data over time, and even back in time (like "has this problem (benchmark) ever been solved"). For this, star-exec needs a way to import competition results. (JW) | ||
+ | * submission of new problems: it would be prefereable to have this automated: new benchmark sets can be uploaded to the platform and marked as "submitted for next competition", then they can be syntax-checked, and known solvers can be run on them. (JW) | ||
+ | * "live" interface: for demonstration/education it would be good to be able to create "experiments" on-the-fly. Upload a problem, or select from data base, then possibly edit it, then run a selection of solvers on that. (JW) |
Revision as of 08:21, 13 June 2012
StarExec is a cross-community solver execution and benchmark library service under joint development at the University of Iowa and the University of Miami. Its goal is to facilitate the experimental evaluation of logic solvers and other automated reasoning tools by providing a shared storage and computing infrastructure to store, manage and make available benchmark libraries; execute comparative evaluations; and run solver competitions. StarExec should be available for public use in the fall of 2012.
The termination community intends to use this platform for future competitions (and other tasks, like running experiments, archiving benchmarks and results, etc.)
There will be a Starexec Workshop affiliated with IJCAR 2012 (Manchester). We use this wiki page here to collect and organize some points that we want to make during that workshop.
Open Questions:
- Format of results: The specification does not speak explicitly about the format of the results. Is this going to be handled using Post-Processors? (Marc Brockschmidt 16:19, 12 June 2012 (CEST))
- Scoring systems: How can a scoring system like we use for the complexity competition be implemented? Post-processors seem to be restricted to working on one solver/result pair, while our scoring needs to see all results in a competition for a given example. (Marc Brockschmidt 16:19, 12 June 2012 (CEST))
- Will there be "Teams"? StarExec seems to know Communitys and single users, but not about the fact that some tools are produced by a group of people. (Marc Brockschmidt 16:19, 12 June 2012 (CEST))
- How are benchmark sets (note: the organizers call the individual examples/problems/TRSs/... "benchmarks") organized? For example I'd like to have the official TPDB and private benchmark sets (which might be submitted to TPDB lateron). Furthermore, it is very useful to have complex ways of picking the relevant benchmark for a tool run (not just "the whole TPDB", but maybe "just the Java Bytecode benchmarks which are not yet in the TPDB"). Here having a number of user-defineable tags for each benchmark and being able to pick benchmark sets based on boolean combinations of tags can be a solution I'd like to see. (C-Otto 16:46, 12 June 2012 (CEST))
- What further information do benchmarks have? We made good experience with storing information about the expected result (entered manually), so that runs giving a different result can be identified easily. Furthermore, it is useful to get a list of tool runs in which the specific benchmark could be solved. Statistical data (for example: this benchmark has never been solved, yet) is useful, too. (C-Otto 16:46, 12 June 2012 (CEST))
- How will inherited example sets be handled (i.e., at the moment, the TPDB does not contain complexity examples, but these are derived from the TPDB) (Marc Brockschmidt 16:58, 12 June 2012 (CEST))
- verified termination categories: these require a two-stage execution, first stage is that solvers run on problems, producing proof traces; second stage is that verifiers run on proof traces. (JW)
- legacy data: it would be very useful to be able to compare data over time, and even back in time (like "has this problem (benchmark) ever been solved"). For this, star-exec needs a way to import competition results. (JW)
- submission of new problems: it would be prefereable to have this automated: new benchmark sets can be uploaded to the platform and marked as "submitted for next competition", then they can be syntax-checked, and known solvers can be run on them. (JW)
- "live" interface: for demonstration/education it would be good to be able to create "experiments" on-the-fly. Upload a problem, or select from data base, then possibly edit it, then run a selection of solvers on that. (JW)