NeurIPS is the world’s largest Machine Learning and Computational Neuroscience conference. Last year, we not only attended but were also the event’s official streaming partner for Bulgaria, bringing the Sofia community a week of meet-ups to review key presentations and discuss the latest techniques. In March, we had planned a follow-up talk at our Sofia office, led by Daniel Balchev, one of our ML Engineers, but it was cancelled due to COVID-19. Instead, we reached out to Daniel to hear the insights from his planned talk.
To start, who are you and what do you do at Hyperscience?
I’m an ML Engineer at Hyperscience. Ever since I joined in October 2017, I’ve been on the team responsible for the data extraction and handwriting recognition engine. We’re constantly expanding the scope of the engine; some of the tasks I’ve worked on include improving its language model and adding French, the first non-English language we support. My day-to-day spans from generating ideas (and reading papers) to determine how to approach new tasks at the beginning of a release, to running the experiments to solve them in the middle of the release, all the way through evaluating how well the models perform.
It’s been fascinating watching Hyperscience grow over the last few years. One of my favorite things is when an Account Executive or member of our Customer Experience team describes a client’s reaction to a feature we’ve built. We heard insights firsthand during our 2020 Company Kick-Off in New York City this past February when three customers spoke to us about how they’re using Hyperscience and the value we’ve been able to provide their operational teams and end customers.
Can you tell us about NeurIPS 2019?
We send engineers to NeurIPS every year. Three key takeaways from the conference continue to shape how I approach our daily work at Hyperscience:
Takeaway #1: There’s a need to make ML research more reproducible.
It’s no secret that some academic fields are experiencing a reproducibility crisis, so it’s great to see the machine learning community attacking this problem sooner rather than later. For the NeurIPS presentations, several steps were taken to help with current and future reproducibility, including:
- The reproducibility checklist. The authors of each paper had to submit a checklist answering questions about their work ranging from “do you provide training/evaluation code and pre-trained models?” to “do you provide a description of your computing infrastructure?” Analyzing such checklists for this and future conferences will help make a recipe for a successful paper.
- Code submission policy. The authors of each paper are encouraged to provide the code used in their paper, giving a reference implementation of its ideas. If we want to build on a paper or compare our models against it, a reference implementation makes that easier and increases our confidence that the comparison is fair.
- Reproducibility challenge. Simply put, the best way to figure out whether a paper is reproducible is to try to replicate it yourself! People attempting to recreate a result can ask the authors for missing details, which helps strengthen the papers from this conference – and the field more generally. This challenge, combined with the reproducibility checklists, results in a refined recipe for industry and research success.
When developing a model for a new problem, or when we need to make significant improvements on an old one, the first thing we do is read papers on the problem or related ones. It’s not uncommon to find a paper with promising results but then face issues when we try to implement it. My hope is that these steps will make new papers clearer and will help us incorporate their ideas into our product.
Takeaway #2: Across the field, we need to focus on “opening the black box” and explaining why a model does what it does.
Deep learning models make some pretty surprising mistakes. In my work, I often find it hard to explain to a colleague why something happened, but it’s essential to dig around – even when it requires extra work – and see whether there’s an obvious explanation or a systematic error the model makes that we can fix. Some obvious explanations include unreadable text, bad image resolution or symbol lookalikes (try differentiating the handwritten letter “O” from the number zero in an IBAN or other account number). Other explanations take more time to figure out, and we have tooling for them, but pinpointing the cause of a problem – especially during development of new features – is far trickier. This is part of the promise of explainability and interpretability: hinting at which part of the input is the main cause of a specific result. Having a plausible explanation speeds up and simplifies the debugging process.
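One simple, model-agnostic way to get such hints is occlusion sensitivity: mask parts of the input and measure how much the model’s score drops. Below is a minimal sketch under toy assumptions – the `model` and data are illustrative stand-ins, not our production engine:

```python
def occlusion_saliency(model, x, patch=2, baseline=0.0):
    """Score each patch of input x by how much masking it lowers the model's output.

    A large drop means the model relied heavily on that region.
    """
    base = model(x)
    saliency = [0.0] * len(x)
    for i in range(0, len(x), patch):
        x_masked = list(x)
        for j in range(i, min(i + patch, len(x))):
            x_masked[j] = baseline  # occlude this patch
        drop = base - model(x_masked)
        for j in range(i, min(i + patch, len(x))):
            saliency[j] = drop
    return saliency

# Toy "model": its score depends only on the two middle features.
weights = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
model = lambda x: sum(w * v for w, v in zip(weights, x))

print(occlusion_saliency(model, [1.0] * 6))  # → [0.0, 0.0, 2.0, 2.0, 0.0, 0.0]
```

The middle patch gets the highest saliency, matching the only inputs the toy model actually uses – exactly the kind of hint that narrows down a debugging session.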
Takeaway #3: Sometimes you need to trust the model’s uncertainty.
At Hyperscience, it’s not uncommon for a client to ask for 99.5% accuracy. That’s a hard target to hit with full automation, especially for tasks like data extraction and handwriting recognition, where the number of possible outputs is practically infinite and some fields are hard even for humans to read and distinguish. We solve this by asking our models to output a value indicating how “certain” they are of their output. By automating only the cases the model believes it has solved correctly, we’re able to deliver the target accuracy with some incredible automation – certainly higher levels than other solutions on the market can deliver. Naturally, improving the model’s uncertainty estimation improves the automation. As someone who has been here since 2017, it’s incredibly rewarding to see the development in this area, and we’ve already tested some ideas from those papers on our models.
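The thresholding idea above can be sketched concretely: on a labeled validation set, find the lowest confidence cutoff whose accepted predictions still meet the accuracy target, and report how much work that automates. This is a hypothetical sketch, not Hyperscience’s production logic; `automation_at_accuracy` and the toy data are made up for illustration:

```python
def automation_at_accuracy(preds, target_acc=0.995):
    """preds: (confidence, is_correct) pairs from a validation set.

    Returns (threshold, automation_rate): the lowest confidence cutoff whose
    accepted subset still meets the accuracy target.
    """
    ranked = sorted(preds, key=lambda p: p[0], reverse=True)
    correct = 0
    best = (1.0, 0.0)  # automate nothing if no cutoff meets the target
    for n, (conf, ok) in enumerate(ranked, start=1):
        correct += ok
        if correct / n >= target_acc:  # accuracy of the top-n most confident
            best = (conf, n / len(ranked))
    return best

# Toy validation data: confident predictions are right, one mid-confidence miss.
preds = [(0.9, True), (0.8, True), (0.7, False), (0.6, True)]
print(automation_at_accuracy(preds, target_acc=1.0))  # → (0.8, 0.5)
```

Everything at or above the returned threshold is auto-accepted; everything below goes to a human – which is the accuracy/automation trade-off the answer describes, and why better-calibrated uncertainty directly raises the automation rate.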
Interested in building a world-class machine learning product? Discover open roles across our global offices, including DevOps Engineer, Senior QA Automation Engineer, Backend Software Engineer, and ML Engineering Team Lead out of our Sofia office.