Kyle Mills’s Journey from Ph.D. Student to Being on the Cover of Nature Machine Intelligence

1QBit scientist Kyle Mills’s research was recently featured in the September issue of Nature Machine Intelligence. The research discovered new ways to use reinforcement learning to find the ground states of spin Hamiltonians. Kyle worked with Pooya Ronagh, Head of the Hardware Innovation Lab at 1QBit, and Isaac Tamblyn from the National Research Council of Canada.

“We knew reinforcement learning was capable of powerful control tasks such as robotics and playing video games at elite levels, so we had a good idea that it would work for this task. In that sense, we pretty much had the right idea right from the start, and just needed to piece it all together.”
—Kyle Mills

Kyle Mills is currently working part time at 1QBit as a Scientist while pursuing his Ph.D. in Modelling and Computational Sciences at the University of Ontario Institute of Technology. When Kyle is not working, he loves spending his summers outside cycling, canoeing, and camping, and his winters skiing.

How did you end up here? Why did you become a scientist?

I loved physics and math in high school and decided to pursue them in university. I always enjoyed problem solving and thinking about interesting ways to tackle problems. In university, I had a summer job in an office setting, where I was assigned some very repetitive tasks to complete. I taught myself to program so that I could automate these tasks. Pursuing computational science in graduate school allowed me to combine the problem-solving aspect of physics with my interest in programming, and artificial intelligence added a whole new aspect to this.

Please explain your research and what you discovered.

We trained a reinforcement learning agent to discover, through experience alone, a temperature schedule for simulated annealing to solve spin problems. This is a difficult task for a learning algorithm because the reward is sparse: we don’t care how the algorithm changes the temperature. All we care about is that it finds the solution, so we can only reward it at the end of its attempts.
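To make the setup concrete, here is a minimal sketch of simulated annealing on a small spin problem, with a reward that is only available once the run has finished. It is an illustration under stated assumptions, not the implementation from the paper: the names `energy`, `anneal`, and `episode_reward` are hypothetical, the problem is a toy six-spin ferromagnetic chain, and the temperature schedule is a fixed, hand-written linear ramp, whereas in the research the agent learns how to set the temperature.

```python
import math
import random


def energy(spins, couplings):
    """Ising energy E = -sum J_ij * s_i * s_j, with each spin equal to -1 or +1."""
    return -sum(J * spins[i] * spins[j] for (i, j), J in couplings.items())


def anneal(couplings, n_spins, schedule, rng):
    """Metropolis single-spin flips: one sweep per temperature in `schedule`.

    Every temperature in `schedule` is assumed to be strictly positive.
    """
    spins = [rng.choice((-1, 1)) for _ in range(n_spins)]
    for temperature in schedule:
        for _ in range(n_spins):
            k = rng.randrange(n_spins)
            # Sum of J * (neighbouring spin) over all couplings that touch spin k.
            local_field = sum(
                J * spins[j if i == k else i]
                for (i, j), J in couplings.items()
                if k in (i, j)
            )
            delta = 2 * spins[k] * local_field  # energy change if spin k flips
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                spins[k] = -spins[k]
    return spins


def episode_reward(couplings, n_spins, schedule, rng):
    """Sparse reward: nothing during the run, only the final negative energy."""
    final_spins = anneal(couplings, n_spins, schedule, rng)
    return -energy(final_spins, couplings)


# Toy problem (an assumption for illustration): a ferromagnetic chain of 6 spins.
rng = random.Random(0)
chain = {(i, i + 1): 1.0 for i in range(5)}
# Hand-written linear cooling from T = 3.0 down to T = 0.1 over 50 sweeps.
schedule = [3.0 - 2.9 * t / 49 for t in range(50)]
print(episode_reward(chain, 6, schedule, rng))
```

In this sketch the schedule is just a list of numbers chosen in advance. A learning agent would instead choose each temperature as the run proceeds and would see only the single number returned by `episode_reward` at the end, which is what makes the reward sparse.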


Do you have an analogy to help us understand your work?

Imagine you’re paying someone who’s never seen a cake to bake a cake. You set them free in a fully stocked kitchen and tell them they have an hour to produce something without telling them that you want a cake. After an hour, you come back and give them some money based on how much their creation resembles a cake. In order for them to maximize the amount of money they receive, they have to slowly adapt their technique using the rewards you give them, trying slightly different things each time. It takes a long time, and a lot of experimentation, but eventually, they can produce a delicious cake.

What question or challenge were you setting out to address when you started this work?

We knew reinforcement learning was capable of powerful control tasks such as robotics and playing video games at elite levels, so we had a good idea that it would work for this task. In that sense, we pretty much had the right idea from the start. We were surprised to see, however, just how much better the RL algorithm could perform compared to standard methods: it performs about 100 times better on even modestly sized problems, and its advantage grows as the problems get larger.

Why is your research important? What are the possible real-world applications?

This specific application is interesting for combinatorial optimization problems such as scheduling and route-finding. Think of a delivery driver who needs to travel around town delivering packages. In which order should they make their deliveries to minimize travel time? For a large number of deliveries, while it’s easy to measure how long a given route is, it is essentially impossible to know whether it’s the best route. One way of tackling this problem is to use simulated annealing.
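The delivery example can be sketched in a few lines of code. The following is a generic, textbook-style simulated annealing loop for ordering stops on a closed route, not the method from the paper. The function names, the segment-reversal move, the geometric cooling schedule, and the random stop coordinates are all assumptions made for illustration.

```python
import math
import random


def route_length(route, stops):
    """Length of the closed tour that visits `stops` in the order given by `route`."""
    return sum(
        math.dist(stops[route[i]], stops[route[(i + 1) % len(route)]])
        for i in range(len(route))
    )


def anneal_route(stops, n_steps=20000, t_start=1.0, t_end=1e-3, seed=0):
    """Simulated annealing over visiting orders; returns (best_route, best_length)."""
    rng = random.Random(seed)
    route = list(range(len(stops)))
    current = route_length(route, stops)
    best_route, best = route[:], current
    for step in range(n_steps):
        # Geometric cooling from t_start to t_end (one common, hand-picked choice).
        temperature = t_start * (t_end / t_start) ** (step / (n_steps - 1))
        # Propose a new route by reversing a randomly chosen segment.
        i, j = sorted(rng.sample(range(len(stops)), 2))
        candidate = route[:i] + route[i:j + 1][::-1] + route[j + 1:]
        candidate_length = route_length(candidate, stops)
        delta = candidate_length - current
        # Always accept improvements; accept worse routes with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            route, current = candidate, candidate_length
            if current < best:
                best_route, best = route[:], current
    return best_route, best


# Hypothetical data: 12 delivery stops at random positions in a unit square.
rng = random.Random(1)
stops = [(rng.random(), rng.random()) for _ in range(12)]
route, length = anneal_route(stops, n_steps=5000)
print(route, round(length, 3))
```

Accepting some longer routes while the temperature is high lets the search escape poor local arrangements, and lowering the temperature gradually makes it settle on a short route. How quickly to cool is exactly the kind of schedule decision discussed in this interview.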


What kind of response has your research received?

It’s pretty early to see our approach used practically, but there’s been some buzz on Twitter, and it was covered in Nature Machine Intelligence News and Views, as well as featured on the cover of the September 2020 issue. Hopefully, people see this work and can make use of reinforcement learning in their own optimization applications.

What are the next steps for this work?

We showed that the reinforcement learning algorithm could still operate in a “destructive observation” setting. Without going into much detail, this means that we can apply this to a quantum annealing simulation, and learn to schedule the transverse field of a quantum annealer, which is a physical device that uses quantum effects to solve exactly the same problems as we solved here classically.

If you are interested in learning more about Kyle Mills, Pooya Ronagh, and collaborator Isaac Tamblyn’s research on finding the ground states of spin Hamiltonians, the full article can be found here: Finding the ground state of spin Hamiltonians with reinforcement learning.
