Research mission

In short

The overarching philosophy of my work is this: by learning from the massive amounts of software-related data out there, we can reduce the redundancies and complexities of software development to make software engineering accessible to all. Here, I believe, lies the solution to the software crisis.

In full

I think we can all agree: software development is hard, software engineering even harder. But as complicated as it may be, software is also awesome! For the first time in history we can automate anything that we can describe in sufficient detail. Studying why software is hard is key: it can enable us to build tools to reduce these difficulties, perhaps even code collaboratively, and hopefully allow us to bring the power to write good software to many people.
So, the end goal of my research is to simplify software development for the masses. Now how do we get there? First, Software Engineering research is key here: it adds discipline and rigour to the goal of creating "good" software. However, many fundamental questions in SE (what defines "good code"? what design principles should we use when?) are hard to answer without an empirical understanding of how coding happens in practice, in open source and at companies. And even empirical data isn't enough: what can we do with statistics on (massive) datasets of code alone? How do we answer fundamental questions with data?
The key premise of my work is that such understanding can come from producing learned systems, which attempt to replicate features of interest in that data. Models, today mainly deep learners, that attempt to generate and describe software are not only highly useful from an application perspective. Their training needs and evaluation performance actually tell us a bigger story: what parts of the task are hard? What types of structure and representation help the model achieve its purpose? And how does that relate to real developers? I study everything described by these tasks, improving models and addressing new tasks both for the good of developers and to better understand software development to begin with.