This is my compilation project for semester 8. The goal was to compile a mini version of OCaml into C, using a parser written in Java.
Before going any further, I must say that the project was probably far too ambitious, and I wasn't able to complete nearly as many features as I wanted to, mainly due to a lack of time. It is also no longer a compiler, but an interpreter.
## Testing
Use the command `mvn clean package` at the root of the repository, and then `java -jar target/minichamo-0.7.jar test.mll` (or any other file).
## Functionalities
### Typing
Minichamo is currently able to handle two of the three main kinds of type definitions in OCaml.
First, we have type aliasing, which lets us create new types based on existing types:
```OCaml
type ('a, 'b) t1 = ('a * 'b) list;;
type 'a t2 = ('a list * 'a list);;
```
We can also create sum types, which may be generic or even recursive:
```Ocaml
type 'a l = Cons of 'a * 'a l | Empty
```
Due to the way the genericity system is built, we create a new `Type` whenever we want to specialize a type (for example, `int list` would be a new type object called `list`, with the blank filled in as `int`). Recursive sum types are a little tricky to pull off, as they can lead to infinite loops quite easily, so the constructors are actually shared by all the objects derived from the sum type.
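The sharing described above can be sketched as follows. This is only an illustration under assumptions: the classes `SumTypeDef` and `TypeInstance` are hypothetical names, not Minichamo's actual code, and constructor argument types are kept as plain text for brevity.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: one definition object per declared sum type.
class SumTypeDef {
    final String name;
    final List<String> params;   // e.g. ["'a"]
    // Constructor name -> textual argument type (assumption: text is enough here).
    final Map<String, String> constructors = new LinkedHashMap<>();

    SumTypeDef(String name, String... params) {
        this.name = name;
        this.params = Arrays.asList(params);
    }
}

// A specialization such as `int l`: a new object with the blanks filled in.
class TypeInstance {
    final SumTypeDef def;        // shared between all instances, never copied
    final List<String> args;     // e.g. ["int"]

    TypeInstance(SumTypeDef def, String... args) {
        this.def = def;
        this.args = Arrays.asList(args);
    }

    // Constructors are looked up on the shared definition, so a recursive
    // type is never expanded while building an instance.
    Map<String, String> constructors() {
        return def.constructors;
    }

    @Override
    public String toString() {
        return String.join(", ", args) + " " + def.name;
    }
}

class SharedConstructorsDemo {
    public static void main(String[] args) {
        SumTypeDef l = new SumTypeDef("l", "'a");
        l.constructors.put("Cons", "'a * 'a l");   // the recursive reference stays symbolic
        l.constructors.put("Empty", "");

        TypeInstance intList = new TypeInstance(l, "int");
        TypeInstance strList = new TypeInstance(l, "string");

        System.out.println(intList + " / " + strList);
        System.out.println(intList.constructors() == strList.constructors());
    }
}
```

Both instances are distinct objects, but they return the very same constructor table, which is what avoids copying `'a l` inside `Cons` over and over.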
I decided not to handle record (product) types (`type t = {x:int; y:int}`), as I barely use them and they are a mess to manage in case of conflicts (in both the code and the parser).
### Assignments
Minichamo supports assigning variables and functions, but only with basic types and conditions:
```Ocaml
let a = ();;
let b = false;;
let c = 1;;
let d = 1.1;;
let e = '.';;
let f = "bonjour";;
let g = if true then 1 else 0;;
fun f a = a;;
```
And that is basically it...
## Things I didn't have the time to do
### More variable types
The parser should originally have been able to parse :
This wasn't very complicated (except for the comparisons, which can be used with any pair of variables of the same type), but I just didn't have the time to do it.
Note that, had I implemented them, I would have made them infix only, because supporting prefix operators would have taken too much time.
### Function calls
Not being able to call a function in a functional language is quite ironic but here we are...
It should not actually have been too complicated, were it not for the fact that I needed a critical piece of information in two different places. A function can be either a variable that has already been defined, or a variable that will be passed as an argument to another function (or other things that we will see later). To handle both cases, we have an `Expression` object wrapping either another expression that contains the function definition, or the empty variable. But to pass the arguments to the function, we need to add them to the context under the names of the function's parameters, and those names are not known when we are above this abstraction layer. It is possible to fix this in a generic way, but it would require more time than I have on my hands.
Another feature that I wanted to introduce was partial function application:
```Ocaml
let f a b c = a + b + c;;
let r1 = f 1 2 3
let g = f 1
let r2 = g 2 3
```
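One possible way to handle both argument binding and partial application is to keep the parameter names inside the function value itself. The sketch below is an assumption, not Minichamo's design: `Closure` and its `apply` method are hypothetical, values are restricted to integers, and the body is modeled as a Java lambda over the environment.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical closure: the parameter names still to bind, the bindings made
// so far, and a body evaluated once every parameter has a value.
class Closure {
    final List<String> params;
    final Map<String, Integer> env;
    final Function<Map<String, Integer>, Integer> body;

    Closure(List<String> params, Map<String, Integer> env,
            Function<Map<String, Integer>, Integer> body) {
        this.params = params;
        this.env = env;
        this.body = body;
    }

    // Returns an Integer when fully applied, otherwise a new Closure
    // waiting for the remaining arguments.
    Object apply(Integer... args) {
        if (args.length > params.size()) {
            throw new IllegalArgumentException("too many arguments");
        }
        // Copy the environment so the original closure can be reused.
        Map<String, Integer> extended = new HashMap<>(env);
        for (int i = 0; i < args.length; i++) {
            extended.put(params.get(i), args[i]);
        }
        List<String> remaining = params.subList(args.length, params.size());
        if (remaining.isEmpty()) {
            return body.apply(extended);
        }
        return new Closure(remaining, extended, body);
    }
}

class PartialApplicationDemo {
    public static void main(String[] args) {
        // let f a b c = a + b + c
        Closure f = new Closure(Arrays.asList("a", "b", "c"), new HashMap<>(),
                env -> env.get("a") + env.get("b") + env.get("c"));

        Object r1 = f.apply(1, 2, 3);        // let r1 = f 1 2 3
        Closure g = (Closure) f.apply(1);    // let g = f 1
        Object r2 = g.apply(2, 3);           // let r2 = g 2 3
        System.out.println(r1 + " " + r2);   // prints "6 6"
    }
}
```

Because the closure carries its own parameter names, the caller only needs the function value and the argument list; it does not matter whether the function came from a definition or was passed in as an argument.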
### Match cases
Match cases work much like type unification, so that part was manageable. The main problem is checking whether all the possibilities have been covered (which becomes very difficult with nested structures). My solution to this problem would have been to require a default case.
Once again, I couldn't do it because of time.
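The forced default case could be checked with something as small as the sketch below. It is a simplification under assumptions: patterns are represented as plain strings, only `_` counts as a default case, and `MatchChecker` is a hypothetical name.

```java
import java.util.List;

class MatchChecker {
    // Assumption: the wildcard `_` is the only pattern accepted as a default case.
    static boolean isDefaultCase(String pattern) {
        return pattern.trim().equals("_");
    }

    // Rejects a match whose last pattern is not a default case, which avoids
    // any exhaustiveness analysis over nested patterns.
    static void requireDefaultCase(List<String> patterns) {
        if (patterns.isEmpty() || !isDefaultCase(patterns.get(patterns.size() - 1))) {
            throw new IllegalStateException("match must end with a default case `_`");
        }
    }
}
```

With this rule, a match over `Cons (x, Empty)` and `_` is accepted, while one over `Cons (x, rest)` and `Empty` is rejected even though it is exhaustive; that is the price of skipping the real analysis.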
## Problems encountered
### Type assignment and unification
Aside from what I explained earlier, I encountered a problem that I struggled to solve (it is probably still not working as intended).
The problem occurs when we create a "name expression" (i.e. a variable that has a name but no value yet, for example the argument of a function). At this point, we know that some variable is supposed to go there, but we don't know its type. We can try to assign it a generic type that will be determined later (`'a`), but there is a risk of conflict. For example, take the following situation:
```Ocaml
let f (a: 'c) b (c: int) d =
let g (x: 'c) (y: 'c) = (x == y)
in let e = g a b
in (h e c, d)
```
Now the problem is that when we create the "name expressions" for `a` and `b` on the third line, we don't have their types yet. That information only arrives later, once we do the unification. For now, we can only use placeholder generic types, so if we do things naively we'll say that `a` has type `'a`, `b` has type `'b`, and so on. But of course this example is rigged: now `a` has type `'a` and `'d`, so both types will be unified together and `a` and `b` have to share the same type (which is not necessarily true in this case). We might want to rename the types in order to avoid conflicts. However, we would then lose the link between the type of `a` and the types of the arguments of `g`, which means that calling `g c d` would be accepted (`d` would become an `int`, which isn't a problem for the parser), even though this is not what we have written.
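One way to avoid such name clashes, sketched below, is to give placeholder variables machine-generated names that a user cannot write, and to map each user-written name such as `'c` to a single variable for the whole definition. This is an assumption about a possible fix, not the code Minichamo uses; `Ty`, `fresh` and `unify` are hypothetical, and only variables and base types are modeled.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical type representation: a variable that may be bound later,
// or a named base type such as int.
class Ty {
    private static int counter = 0;
    final String name;    // base type name, or a generated variable name
    final boolean isVar;
    Ty boundTo;           // non-null once a variable has been unified

    private Ty(String name, boolean isVar) {
        this.name = name;
        this.isVar = isVar;
    }

    static Ty base(String name) {
        return new Ty(name, false);
    }

    // Generated names start with '_' so they cannot clash with 'a, 'b, ...
    static Ty fresh() {
        return new Ty("_t" + (counter++), true);
    }

    // Follow bindings to the current representative.
    Ty resolve() {
        Ty t = this;
        while (t.isVar && t.boundTo != null) {
            t = t.boundTo;
        }
        return t;
    }

    static void unify(Ty left, Ty right) {
        Ty a = left.resolve();
        Ty b = right.resolve();
        if (a == b) return;
        if (a.isVar) { a.boundTo = b; return; }
        if (b.isVar) { b.boundTo = a; return; }
        if (!a.name.equals(b.name)) {
            throw new IllegalStateException("cannot unify " + a.name + " with " + b.name);
        }
    }
}

class UnifyDemo {
    public static void main(String[] args) {
        // Assumption: a user-written 'c denotes one variable for the whole definition of f.
        Map<String, Ty> annotations = new HashMap<>();
        Ty userC = annotations.computeIfAbsent("'c", k -> Ty.fresh());

        Ty a = userC;              // (a: 'c)
        Ty b = Ty.fresh();         // b has no annotation
        Ty c = Ty.base("int");     // (c: int)
        Ty x = userC;              // g (x: 'c)
        Ty y = userC;              //   (y: 'c)

        Ty.unify(x, a);            // g a b: first argument
        Ty.unify(y, b);            // g a b: second argument

        System.out.println(a.resolve() == b.resolve());   // a and b now share a type
        System.out.println(c.resolve().name);             // c is still int
    }
}
```

Under that scoping assumption, the link between `a` and the arguments of `g` is kept because they share one variable object, while the parameter named `c` never interferes with the annotation `'c` because the two live in different namespaces.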
### Sum types management
In OCaml, if we consecutively define two sum types that share a constructor name and then create a variable with that constructor, the most recently defined type prevails. However, an older variable built with the constructor of the previous type can still exist. Because a constructor may refer to its own type in its argument, the method that checks whether two sum types are equal does not compare the types of the constructors, only their names (otherwise it could cause an infinite loop). This means that if we define two types with the same name and the same constructors, but with different types for the values of the constructors, there is a way to trick the parser into assigning a value of the old type to a variable of the new type (and potentially crashing everything). To prevent that, we could add an ID to every sum type, or forbid creating a sum type with the name of an existing one.
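The first option, adding an ID to every sum type, could look like the sketch below. It is only an illustration: `SumType` is a hypothetical class, and the ID is assumed to be assigned once, when the type definition is processed.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sum type descriptor carrying a unique ID assigned at definition time.
class SumType {
    private static final AtomicInteger NEXT_ID = new AtomicInteger();

    final int id = NEXT_ID.getAndIncrement();
    final String name;

    SumType(String name) {
        this.name = name;
    }

    // Equality is decided by the ID alone: constructor argument types are never
    // traversed, so recursive definitions cannot cause an infinite loop here.
    @Override
    public boolean equals(Object o) {
        return o instanceof SumType && ((SumType) o).id == id;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(id);
    }
}

class SumTypeIdDemo {
    public static void main(String[] args) {
        SumType oldT = new SumType("t");   // type t = A of int
        SumType newT = new SumType("t");   // type t = A of string (redefinition)
        System.out.println(oldT.equals(newT));   // prints "false"
        System.out.println(oldT.equals(oldT));   // prints "true"
    }
}
```

A value created before the redefinition keeps a reference to the old descriptor, so it can no longer be mistaken for a value of the new type, even though both types have the same name and constructor names.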