1QBit scientist Kyle Mills’s research was recently featured in the September issue of Nature Machine Intelligence. The research introduces new ways to use reinforcement learning to find the ground states of spin Hamiltonians. Kyle worked with Pooya Ronagh, Head of the Hardware Innovation Lab at 1QBit, and Isaac Tamblyn of the National Research Council of Canada.
“We knew reinforcement learning was capable of powerful control tasks such as robotics and playing video games at elite levels, so we had a good idea that it would work for this task. In that sense, we pretty much had the right idea right from the start, and just needed to piece it all together.”
Kyle Mills is currently working part-time at 1QBit as a scientist while pursuing his Ph.D. in Modelling and Computational Sciences at the University of Ontario Institute of Technology. When Kyle is not working, he loves spending his summers outdoors cycling, canoeing, and camping, and his winters skiing.
How did you end up here? Why did you become a scientist?
I loved physics and math in high school and decided to pursue them in university. I have always enjoyed problem solving and thinking about interesting ways to tackle problems. During university, I had a summer office job where I was assigned some very repetitive tasks, so I taught myself to program in order to automate them. Pursuing computational science in graduate school allowed me to combine the problem-solving aspect of physics with my interest in programming, and artificial intelligence added a whole new dimension to this.
Please explain your research and what you discovered.
We trained a reinforcement learning (RL) agent to discover, through experience alone, a temperature schedule for simulated annealing that solves spin problems. This is a difficult task for a learning algorithm because the reward is sparse: we don’t care how the algorithm changes the temperature along the way. All we care about is whether it finds the solution, so we can only reward it at the end of each attempt.
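To make the setup concrete, here is a minimal Python sketch of the ingredients, under stated assumptions: a small Ising spin glass with all-to-all ±1 couplings stands in for the spin problem, Metropolis single-spin flips stand in for the annealer, and a hand-picked geometric cooling schedule stands in for the schedule an RL agent would learn. The names `random_spin_glass`, `energy`, `anneal`, and `geometric_schedule` are hypothetical, and this is not the code or the agent used in the paper. The point it illustrates is the sparse reward: `anneal` produces a score only after the whole schedule has been played out.

```python
import math
import random


def random_spin_glass(n, seed=0):
    """Hypothetical test instance: symmetric all-to-all couplings J[i][j] in {-1, +1}."""
    rng = random.Random(seed)
    J = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            J[i][j] = J[j][i] = rng.choice((-1.0, 1.0))
    return J


def energy(J, s):
    """Ising energy H(s) = -sum_{i<j} J_ij s_i s_j for spins s_i in {-1, +1}."""
    n = len(s)
    return -sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))


def anneal(J, schedule, seed=0):
    """Metropolis simulated annealing: one sweep of n flip attempts per temperature.

    Returns the final spins and a sparse reward (the negated final energy),
    which is only available once the entire schedule has run.
    """
    rng = random.Random(seed)
    n = len(J)
    s = [rng.choice((-1, 1)) for _ in range(n)]
    for T in schedule:
        for _ in range(n):
            i = rng.randrange(n)
            # Flipping spin i changes the energy by dE = 2 s_i sum_j J_ij s_j (J_ii = 0).
            local_field = sum(J[i][j] * s[j] for j in range(n))
            dE = 2.0 * s[i] * local_field
            if dE <= 0 or rng.random() < math.exp(-dE / T):
                s[i] = -s[i]
    # No intermediate rewards: only the end result is scored.
    reward = -energy(J, s)
    return s, reward


def geometric_schedule(t_start, t_end, steps):
    """Hand-designed baseline: temperatures decay geometrically from t_start to t_end."""
    ratio = (t_end / t_start) ** (1.0 / (steps - 1))
    return [t_start * ratio**k for k in range(steps)]


if __name__ == "__main__":
    J = random_spin_glass(20, seed=1)
    spins, reward = anneal(J, geometric_schedule(3.0, 0.05, 200), seed=2)
    print("final energy:", energy(J, spins), "reward:", reward)
```

In an RL version of this loop, the agent's choices would replace `geometric_schedule`, and the single number returned at the end of each run would be its only feedback.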
Do you have an analogy to help us understand your work?
Imagine you’re paying someone who’s never seen a cake to bake a cake. You set them free in a fully stocked kitchen and tell them they have an hour to produce something without telling them that you want a cake. After an hour, you come back and give them some money based on how much their creation resembles a cake. In order for them to maximize the amount of money they receive, they have to slowly adapt their technique using the rewards you give them, trying slightly different things each time. It takes a long time, and a lot of experimentation, but eventually, they can produce a delicious cake.
What question or challenge were you setting out to address when you started this work?
We knew reinforcement learning was capable of powerful control tasks such as robotics and playing video games at elite levels, so we had a good idea that it would work for this task. In that sense, we pretty much had the right idea from the start. We were surprised, however, by just how much better the reinforcement learning algorithm performed than standard methods: it performs about 100 times better on even modestly sized problems, and its advantage grows as the problems get larger.
Why is your research important? What are the possible real-world applications?
This specific application is relevant to combinatorial optimization problems such as scheduling and route-finding. Think of a delivery driver who needs to travel around town delivering packages. In what order should they make their deliveries to minimize travel time? For a large number of deliveries, it is easy to work out how long any given route is, but it is essentially impossible to know whether it’s the best route. Simulated annealing is one way of tackling this problem.
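As an illustration of the delivery example, here is a minimal Python sketch of simulated annealing on a closed delivery route. It is a toy built on assumptions, not the method from the paper: stops are points in the plane, travel time is taken to be straight-line distance, the cooling schedule is a fixed geometric one, and each move reverses a random segment of the route (a standard 2-opt move). The function names `route_length` and `anneal_route` are hypothetical.

```python
import math
import random


def route_length(points, order):
    """Total length of a closed tour that visits the points in the given order."""
    return sum(
        math.dist(points[order[k]], points[order[(k + 1) % len(order)]])
        for k in range(len(order))
    )


def anneal_route(points, t_start=1.0, t_end=1e-3, steps=20000, seed=0):
    """Simulated annealing over delivery orders; returns the best order found and its length."""
    rng = random.Random(seed)
    order = list(range(len(points)))
    rng.shuffle(order)
    current = route_length(points, order)
    best_order, best = order[:], current
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        T = t_start * (t_end / t_start) ** (step / (steps - 1))
        # 2-opt move: reverse the segment of the route between positions i and j.
        i, j = sorted(rng.sample(range(len(order)), 2))
        candidate = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
        new = route_length(points, candidate)
        delta = new - current
        # Always accept shorter routes; accept longer ones with probability exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / T):
            order, current = candidate, new
            if current < best:
                best_order, best = order[:], current
    return best_order, best


if __name__ == "__main__":
    # Hypothetical example: 8 stops evenly spaced on a unit circle.
    stops = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8)) for k in range(8)]
    order, length = anneal_route(stops)
    print(order, round(length, 4))
```

While the temperature is high, longer routes are sometimes accepted, which lets the search escape poor arrangements; as it cools, the search settles into a short route. Deciding how quickly to lower the temperature is the same kind of decision the RL agent in this work learns to make for spin problems.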
What kind of response has your research received?
It’s still too early to see our approach used in practice, but there’s been some buzz on Twitter, and the work was covered in a News & Views article in Nature Machine Intelligence, as well as featured on the cover of the September 2020 issue. We hope people see this work and make use of reinforcement learning in their own optimization applications.
What are the next steps for this work?
We showed that the reinforcement learning algorithm can still operate in a “destructive observation” setting. Without going into much detail, this means we can apply the approach to a simulation of quantum annealing and learn to schedule the transverse field of a quantum annealer: a physical device that uses quantum effects to solve exactly the same problems that we solved here classically.
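For readers who want the formula, a common textbook form of the transverse-field quantum annealing Hamiltonian is shown below. This is an assumption about the general setup, not the exact Hamiltonian used in the paper: the classical spins become Pauli operators $\sigma_i^z$, $J_{ij}$ are the couplings of the spin problem, and $\Gamma(t)$ is the strength of the transverse field.

```latex
H(t) = -\Gamma(t) \sum_{i} \sigma_i^{x} \;-\; \sum_{i<j} J_{ij}\, \sigma_i^{z} \sigma_j^{z}
```

Scheduling the transverse field means choosing how $\Gamma(t)$ is lowered from a large value, where the ground state is easy to prepare, toward zero, where only the spin problem remains. It plays a role analogous to the temperature schedule in simulated annealing.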
If you are interested in learning more about Kyle Mills, Pooya Ronagh, and collaborator Isaac Tamblyn’s research on finding the ground states of spin Hamiltonians, the full article can be found here: Finding the ground state of spin Hamiltonians with reinforcement learning.