The idea is that you select one of the projects proposed here, and you do it in a team of 3 or 4 students. For each project you will be assigned to a particular contact TA who will monitor your progress and answer your questions related to the project. Your TA will be as invested as you in the success of your project, because the project is a component of their ongoing research program.

Grading scheme for projects:

10%: dedicated to weekly check-ins for each project starting from October 27th.
  • Each week, by the end of the day on Friday, you will check in with your TA supervisor and report on your progress.
  • Each project has its own tailored set of milestones, and therefore the check-ins are customized for each project.
  • If you miss 1 check in, you will miss 50% of this grade, that is 5% of the project grade.
  • If you miss 2 check ins, you forfeit the entire mark for check-ins, that is 10% of the entire grade.
  • It suffices that one member of each group does the check-in. In cases where this is in email form, please CC all members of the group.

90%: for delivery of the project as designed.
  • Each project has its own division of the marks customized for it as indicated in the full specification for the project, and that will constitute this 90% share of the mark.

Here is a list of projects.

Project 1

Detecting and analyzing loop constructs

full specification

Given a C input program, detect loop constructs and extract exit conditions.


A C-like program in the form of a block AST with loop constructs, without side-effects.

A mapping from loops to the following information on loops:
- live variables at loop entry, and available expressions.
- write/read sets of variables (if you use Cil, please don’t use the Dataflow module :) ).
- the loop body’s abstract syntax tree.

(Dependence analysis)
Some of the loops might be nested loops, the program should be able to see if the following transformations can be applied:
- loop interchange
- loop skewing
- loop peeling
And more here.

Language: Any language that can parse a C-like program and handle easily the program representation to perform dataflow analysis (Ocaml preferred)

Project 2

Translation from parallel assignments to sequence of assignments:
(see here for more info)

full specification

Given a set of parallel assignments, perform a simple analysis to output the simplest possible sequence of assignment that would lead to the same values. New temporary variables might need to be created in the process.
- A set of variables mapped to expressions modelling the parallel assignments.
- A sequence of assignments, such that executing the assignments sequentially leads to the same result as applying the parallel assignments at the same time.

Project 3

Translation to concise functional form with let constructs:

full specification

This is about translation of imperative programs to function form. Given a sequence of assignments with possible conditional constructs (but no loops), the program has to build an equivalent function. The function does not have any side effect: all variables must be taken as input and the their value has to be returned at the end of the function (closure). Be careful when you need to capture all the used/defined variables.

Dataflow analysis will be helpful in the translation, in order to reduce IF-THEN of IF-THEN-ELSE expressions for example.

Challenges: translate loops to recursive functions. A loop block in the input language is replaced by a recursive function that takes as input all the variables appearing in the loop body and returns the modified state. Be careful with the exit conditions.

Project 4

Expression simplification using synthesis

full specification

Finding equivalent expressions with lower depth.

Language Racket (scheme)

Build a LIA/Boolean simplifier using the Rosette Sygus solver. It must handle expressions using integer/boolean variables, and the more complex the better !
* Write an efficient grammar for LIA & Boolean expressions in Rosette.
* Define the search strategy for simplified expressions.
- Find a heuristic to generate expression skeletons.
- Play with the parameters of the search space during the search. A well built expression grammar will make this easier.

Project 5

Proofs of homomorphisms: generate proofs of homomorphism.

full specification

 Given a function and a join operator, with all the needed information, generate a proof of functional equivalence between the function and the divide and conquer version of this function (same function + join operator). The proof has to be verifiable by the Dafny tool.

- A set of recursively defined functions.
- A Dafny program and proof that the functions in the input pass the desired correctness specification.

(some context and
more info in this paper)

Project 6

Program Normalizer and Control Flow Graph Visualizer

full specification

The goal of this project is to automatically transform a program into an equivalent program who's control flow graph has at most one back-edge (i.e. a program with at most one loop). Students will write a function that takes in a program represented as an abstract syntax tree and produces a control flow graph of an equivalent program with at most one back edge. The control flow graph can be represented simply as a data structure

As a challenge, students can instead have their function accept a list of abstract syntax trees, and produce a control flow graph with at most a single back-edge corresponding to their parallel composition.

The second goal of this project is to automatically visualize control flow graphs of concurrent programs. Students will write a function that takes in a list of programs represented as abstract syntax trees and produces a control flow graph corresponding to their parallel composition. The output must be in DOT format (the description language of GraphViz). The implementation language is up to the students.

Project 7


Determinism is a program property that states "equal inputs implies equal outputs". Unlike typical correctness properties expressed as postconditions, determinism concerns multiple runs of a program - non-determinism cannot be observed by running a program once (Challenge: how would one express this property in Dafny)? One way to verify determinism is to check if an extended version of the program gives rise to a confluent rewrite system. To do this, a verifier must approximate the set of reachable states, as well as generate additional transitions.

Students will implement a procedure that takes in a set of transition relations and invariants, as well as a target formula. The procedure will attempt to weaken the relations and invariants such that the target formula is valid, and produce a counter-example if it cannot. The implementation language is up to the students.

Project 8

full specification

Implement PAC model for automata. The PACS model involves PAC learning with simple examples, i.e. examples drawn according to the conditional Solomonoff-Levin distribution. P(x)=c2-K(x|c), where K(x|c) denotes the Kolmogorov complexity of x given a representation c of the automata to be learned. Kolmogrov complexity is not computable. Implement an approximation to Solomonoff-Levin distribution, i.e. create a module (class in Java/C++) such that given as input a DFA which produces samples as closely as possible to the actual Solomonoff-Levin distribution. Use this to then implement the PACS model for regular languages represented.

For a given automata given as input the module should produce sample strings which would be consistent with the Solomonoff-Levin distribution. These samples will be used to infer the automata using the PACS model. Run this whole process several times and report the average, and maximum accuracy of the generated automata, as well as the precision of the results.

You can use the dk.brics.automaton package (a DFA/NFA package for Java), or any other basic automata package that you can find. Recommended to use Java, but you can use any other programming language and reasonable additional packages to help you along the way.

Project 9

full specification

Deterministic Finite Automaton (DFA) induction is a popular technique to infer a regular language from positive and negative sample strings defined over a finite alphabet. Several state-merging algorithms have been proposed to tackle this task, including RPNI, EDSM etc. These algorithms start from a prefix tree acceptor (PTA), that accepts all the positive samples only, and successively merge states to generalize the induced language.

We want to implement a probabilistic search in the state-space defined by state-merging. Instead of obtaining an automata which is compatible with all sample strings, we want an automata with a particular complexity that is compatible with the most possible sample strings. (We define complexity to simply be the number of states at this point).

Learn about state-merging in automata, and the state space defined accordingly. Also get to know how probabilistic search algorithms. Implement a probabilistic search in the space of automata starting from the prefix-tree automata and merging states. The input given will be a set of positive and negative sample strings over the binary alphabet {0,1}, and the complexity (number of states of the desired automata) k. The desired output is an automata with at most k states such that it is compatible with most strings from the sample using a probabilistic search in the space guided by a heuristic. You may use a simple heuristic (such as the number of samples compatible with the automata) or come up with a more complex heuristic. For any given inputs, you must run the algorithm several times and report the accuracy, precision and consistency of the results. You should try to set up parameters for the algorithm and the heuristic to make these outputs as good as possible.

You can use the dk.brics.automaton package (a DFA/NFA package for Java), or any other basic automata package that you can find. Recommended to use Java, but you can use any other programming language and reasonable additional packages to help you along the way.