A Step-By-Step Guide To Approaching Complex Research Projects

When you look at a championship sports team, it’s easy to attribute the team’s success to the star players. And while much of the credit surely does belong to the players, there’s a key figure leading them along the journey: the coach. Although the coach can’t score a single point, they have to manage the team and devise strategies. In essence, a coach creates the blueprint for winning.

Similar to coaching, managing research projects requires a team leader, who is responsible for the team’s players, adequate planning, and fostering an adaptive mindset to execute the work (which often changes on the fly). Like a coach devising a roadmap for winning, having a guide for how to conduct research can completely reshape how you approach your projects and maximize your chance at achieving successful outcomes.

To better understand what I mean, let’s take a page from my seasons as a team leader.

In 2016, my R&D team at OrCam started developing an Automatic Speech Recognition (ASR) engine from scratch. Since then, we have been consistently conducting a (relatively) long and iterative process of collecting insights and refining our research accordingly.

From our experience learning from our own mistakes, we have developed a methodology for how to perform research that:

Encourages us to take the time to think and then decide what the best solution is, instead of executing the first idea that crosses our minds (which, in most cases, is not the ideal solution).
Promotes focused research.
Enables the delivery of highly complex research projects and minimizes their delivery time.

We often get asked: Does this methodology slow down project onboarding and cause the work to be exhaustive and delayed?

The answer to that question is simple: our goal is to finish the project ASAP, not to start executing it ASAP. By sticking to the methodology outlined below, initiating projects will not take longer than expected, and if it does, you can rest assured that it will be for a good reason.

A World of Possibilities

At times, kicking off a research project can feel like being a kid in a candy shop.

With so many options in front of you, it can be hard to choose how to proceed. In addition, as you dive deeper into the project, each of the options sounds more fascinating than the last, and the number of options seemingly grows exponentially. But, at the end of the day, you need to stay focused and deliver whatever was asked of you (for example, a feature for a product or a research paper).

The Overall Picture

Roughly speaking, a research project should consist of four (4) sequential stages:

Step 0: Motivation — understand why we are going to work on the project
Step 1: Goals — define what we are trying to achieve
Step 2: Planning — define how we are going to achieve the goal(s)
Step 3: Execution — implementing the selected plan

We would be remiss if we didn’t acknowledge that these steps hold true in theory. In practice, we know that complex research projects tend to be cyclic. This means that during execution, we may need to revisit the goals, rethink our plans, and shift our execution. To explain the methodology simply, let’s assume that these steps take shape in this defined order. But, remember that in practice, these steps will likely be iterative.

Here’s a look at the methodology, as well as some theoretical insights with practical examples.

Step 0: Motivation

The two most important days in your life are the day you are born and the day you find out why. [Mark Twain]

Many people prefer to know why they are going to work on a project. Answering this question forces us to understand the motivation behind the project, which leads us to its context.

This step is labeled as zero (0) because while it is not entirely necessary, it serves as a bonus. Understanding the motive behind a project can positively impact how the research and development will be executed.

Research is a dynamic field where things might change rapidly; when things don’t proceed as expected, understanding the motivation keeps us focused and enables us to converge towards a solution that solves the problem, rather than diverging.

Furthermore, when people from other departments ask for a feature or some help, they might be unaware of the dynamics at play. By understanding the reasoning behind their request, we can either agree or suggest an alternative solution that is easier to implement.

Step 1: Goals

If you do not know where you are going, every road will get you nowhere. [Henry A. Kissinger]

We want to know that when we deliver something, we solve the originally proposed issue and not a different one (or, even worse, no issue at all). Therefore, once the motivation is known, it’s time to be very clear and answer: What are the objectives of the project?

It’s critical here to define measurements for success, i.e., to select the benchmarks and evaluation metrics and design them to serve as a good indicator of progress as you move forward [Ruder, 2021].

Here is a (partial) list of high-level tips that we’ve found to be useful in constructing benchmarks and selecting evaluation metrics:

Validity: A benchmark should reflect your end goals.

We define our benchmarks such that when we reach the desired Key Performance Indicator (KPI), we achieve our goals. In other words, hitting your benchmark will imply that you’ve been successful in accomplishing what you set out to do [Bowman, 2021].

For example, in automatic speech recognition (ASR), researchers often use LibriSpeech [Panayotov, 2015] as a benchmark. LibriSpeech is derived from audiobooks and consists of audio segments paired with their corresponding transcripts. If one wants to develop an ASR engine for meeting transcription, they should choose a different benchmark: LibriSpeech doesn’t reflect their end goal, because spontaneous speech has different characteristics from the structured reading of audiobooks [Aksenova, 2021]. This example shows why it’s important to choose benchmarks that are aligned with and relevant to your goals.

Reliability: Labels must be correct and consistent [Bowman, 2021].

In order to trust benchmark results, the labels must be correct. You may develop your benchmarks in-house or outsource this stage. Since instructions may be interpreted in different ways, it’s of paramount importance to be concise and accurate when sharing your guidelines with others. Make sure you review a random sample of the data and labels before you start working.
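
One lightweight way to act on this advice is to draw a reproducible random sample for manual review. The sketch below is only an illustration: `sample_for_review`, the record layout, and the sample size are assumptions made for the example, not part of our actual tooling.

```python
import random

def sample_for_review(records, k, seed=0):
    """Return k records chosen uniformly at random, reproducibly."""
    rng = random.Random(seed)      # fixed seed: every reviewer sees the same sample
    k = min(k, len(records))       # never ask for more records than exist
    return rng.sample(records, k)

# Hypothetical labeled data: (audio_id, transcript) pairs.
records = [(f"utt_{i:03d}", f"transcript {i}") for i in range(100)]
to_review = sample_for_review(records, k=10)
```

Fixing the seed means the same ten items can be re-inspected after the labeling guidelines change, which makes it easier to tell whether a fix to the instructions actually helped.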

When thinking about evaluation metrics, objectivity is preferred over subjectivity as it reduces biases and enables reproducible evaluation.

We prefer metrics that highly correlate with human perception. Although it sounds simple, finding these metrics can sometimes be challenging in many domains, as it’s hard to imitate human perception.

Furthermore, think about whether all errors should be weighted the same in your statistics. For example, when evaluating language-related tasks, remember that the words that actually matter are rare. Filler words could end up biasing your statistics, so it’s important to rectify this by creating specific benchmarks to address these concerns.
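
To make the point about error weighting concrete, here is a minimal sketch that computes a word error rate twice: once over all words and once after dropping filler words. The filler list, the example sentences, and the helper names are assumptions chosen for illustration; a real benchmark would need a more careful definition of which words matter.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def wer(ref, hyp):
    """Word error rate: edit distance normalized by reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)

FILLERS = {"um", "uh", "like"}  # assumed filler list, for illustration only

def drop_fillers(tokens):
    return [t for t in tokens if t not in FILLERS]

ref = "um please call uh doctor smith".split()
hyp = "please call doctor smyth".split()

overall = wer(ref, hyp)                                   # counts dropped fillers as errors
content_only = wer(drop_fillers(ref), drop_fillers(hyp))  # only the misspelled name remains
```

In this toy case two of the three errors are dropped fillers, so the overall score mostly measures something the user would never notice, while the one error that matters (the name) is diluted.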

If a human evaluation is required, do your best to reduce its biases. A common way to do so is crowdsourcing, but it requires great attention to detail, as even the guidelines themselves may be inherently biased [Hansen, 2021]. Thus, crowdsourced judgments should be collected from a diverse population, and the number of annotators should be chosen deliberately.

Define success: Performance with respect to the ground truth is not the only measure of success, and benchmarks should reflect that [Aksenova, 2021].

Achieving 100% accuracy is not always preferable to achieving 95%. Take always-on keyword spotting, like ‘Hey Siri’ on the iPhone, as an example. A short response time with slightly lower accuracy is preferable to an iPhone that always wakes up after a 30-second delay.
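
One simple way to encode such a definition of success is to treat latency as a hard budget and compare accuracy only among candidates that meet it. The sketch below assumes made-up candidates and numbers; `pick_model` and the budget value are hypothetical.

```python
# Hypothetical candidates: (name, accuracy, latency in seconds). Numbers are made up.
candidates = [
    ("large_model", 1.00, 30.0),
    ("small_model", 0.95, 0.2),
    ("tiny_model", 0.80, 0.05),
]

def pick_model(candidates, max_latency_s):
    """Most accurate candidate among those that meet the latency budget."""
    eligible = [c for c in candidates if c[2] <= max_latency_s]
    if not eligible:
        return None                      # nothing satisfies the constraint
    return max(eligible, key=lambda c: c[1])

chosen = pick_model(candidates, max_latency_s=0.5)
```

With a half-second budget, the perfectly accurate but slow candidate is never considered, which mirrors the keyword-spotting example above.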

Benchmarks should approximate reality and evolve as the project progresses.

Benchmarks should be sufficiently large and representative in order to approximate reality [Kiela, 2021].

Furthermore, benchmarks should evolve as the project progresses.

The way we work is to define a benchmark at the beginning of the project, based on our understanding of the problem at that specific point in time. As we work, new challenges arise. To make sure we solve these challenges, new and specific benchmarks are developed.

Step 2: Planning

By failing to prepare, you are preparing to fail. [Benjamin Franklin]

Planning answers: How are we going to achieve our goals? or What is the best method to solve the challenge?

The process can be viewed abstractly as a breadth-first search, which means that at every decision point, we examine all possible actions before moving on to the next decision. Doing this level by level (bounded in time, of course) results in a clear view of all possible pathways for the project to follow.

Then, we can choose the expected best solution given the constraints (which could include product, release dates, technological stack, etc.). The planning stage is crucial and should be conducted while we are relaxed and open-minded because every wrong decision that one makes here has the potential to cost a considerable amount of time and money.
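
The breadth-first view of planning can be sketched in a few lines. Everything here is hypothetical: the decision tree, the option names, and the use of a depth limit as a stand-in for the time bound are assumptions for illustration, and scoring the resulting plans against real constraints is deliberately left out.

```python
from collections import deque

# Hypothetical decision tree: each decision maps to the options it opens up.
OPTIONS = {
    "start": ["build in-house", "adapt open-source tool"],
    "build in-house": ["train from scratch", "fine-tune pretrained model"],
    "adapt open-source tool": ["use as-is", "fine-tune on our data"],
}

def enumerate_plans(root, options, max_depth):
    """Breadth-first enumeration of decision paths, bounded by max_depth."""
    plans = []
    queue = deque([[root]])
    while queue:
        path = queue.popleft()
        children = options.get(path[-1], [])
        depth = len(path) - 1
        if not children or depth == max_depth:
            plans.append(path)          # leaf reached, or planning budget exhausted
            continue
        for child in children:          # examine every option at this level
            queue.append(path + [child])
    return plans

plans = enumerate_plans("start", OPTIONS, max_depth=2)
```

Here `plans` holds four complete pathways, each of which can then be compared against the constraints (product, release dates, technological stack) before one is chosen.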

Planning has two main objectives, namely:

Reducing uncertainty

Eliminating as many unknowns as possible by offering different solutions, taking advice from relevant teams (e.g. product and infrastructure teams), etc.

You’ll often reach a point where two options look promising, and it’s hard to choose between them. To remove unknowns, existing tools (like open-source software) can help you develop a quick proof of concept, even if they don’t solve your exact problem. For example, if one wants to automatically detect lions within images, and they find an existing tool that detects dogs, horses, sheep, cows, elephants, etc., they can reasonably assume that lions could be detected too, as there shouldn’t be a fundamental difference between the groups other than the available training data. Furthermore, remember the existing toolset you have in your arsenal while planning. In many cases, it’s quicker to modify existing tools than to develop new ones from scratch.

Setting up the roadmap for the project

List and detail the technical steps that must be completed to achieve project goals, including the timeline. It helps to break complex research projects into small steps. Although each step by itself will not lead us to the final goal, it enables us to steer the research in the right direction along the way instead of just at the end, which tends to be more challenging and time-consuming. Furthermore, it makes the process of finding and fixing bugs more straightforward.

We will demonstrate how complex projects could be broken into small steps by comparing the process to how contractors construct a house. The house, or the end goal, is ready after all construction stages are finished from foundations and framing, infrastructure (plumbing, electricity, etc.), to interior trim, to name a few. The construction stages serve as the small tasks; although each stage on its own will not create a house, the combination of each finished step will. Furthermore, if there is a problem at a particular stage of the execution, it’s easy to pinpoint the issue and assign the necessary professional to fix it. The same can be said when you break down a complex research project into its parts.

It is also important to include a solid baseline. We can discuss the importance of baselines and how to construct them in a different post, but to summarize briefly:

Baselines assess the quality of your method by comparing the outcome to prior work or to a naive approach. Do you consider the performance of a highly complex model that achieves 95% accuracy to be good? If the naive baseline reaches 97%, the answer is no. Baselines are sometimes given, such as an existing component that one wants to improve. But the main takeaway here is that even if a baseline doesn’t exist, you should make an effort to create one, even if it is naive and will not serve as the ultimate solution. Examples of naive baselines for text summarization can be found in [Ferriera, 2013].
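
As a minimal sketch of the 95% versus 97% scenario, the example below compares a hypothetical model against a majority-class baseline on toy, made-up data. The labels, the model predictions, and the helper names are all assumptions for illustration.

```python
from collections import Counter

def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

def majority_baseline(train_labels, n):
    """Naive baseline: always predict the most frequent training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return [most_common] * n

# Toy, made-up data with a heavy class imbalance.
train_labels = ["ok"] * 97 + ["fault"] * 3
test_labels = ["ok"] * 97 + ["fault"] * 3

# Hypothetical model output: wrongly flags 5 "ok" items, catches all 3 faults.
model_preds = ["fault"] * 5 + ["ok"] * 92 + ["fault"] * 3

baseline_acc = accuracy(majority_baseline(train_labels, len(test_labels)), test_labels)
model_acc = accuracy(model_preds, test_labels)
```

On this data the baseline scores 0.97 and the model 0.95, so the model's accuracy only becomes meaningful once the naive number is on the table.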

To make the planning as effective as possible, we usually brainstorm. We gather and briefly walk through the motivation and project goals. We then ask each of the participants to think about their preferred solution independently.

Naturally, some people are prone to perform a literature review, while others like to think independently. We encourage this diversity since it helps in keeping some of us more open-minded and less biased towards a certain solution based on our respective discipline and expertise.

After everyone completes their “homework,” we meet again, and each of the participants presents their point of view while everyone else solely listens. This way, judgments like “Who is the most assertive person in the room?” or “My idea sounds silly, I will not mention it” are minimized [Kahneman, 2011].

After everyone shares their solutions, a technical discussion begins. Criticism is welcomed, but it should be specific and well-explained. It might take more than one meeting to agree on the optimal solution, since complex research questions often require further thought.

When the planning stage is finished, we know (to some extent) the steps we need to execute to achieve our goals, as well as an estimated timeline.

Step 3: Execution

Doing the right thing is more important than doing the thing right. [Peter F. Drucker]

First, make something work, then make it work better.

We prefer to solve problems in increasing order of difficulty: first solve the challenge without constraints, and only then add constraints and perform optimizations. For example, suppose we want to develop a low-latency app for a mobile device. In that case, we will start by developing an offline (non-streaming) version that runs on servers with effectively unlimited computational power. Once this is solved, we will add the streaming and mobile-device constraints.

In order to trust your results, you must minimize flaws in your experimental methodology. Musgrave et al. [Musgrave, 2020] showed that after fixing flaws such as the lack of a validation set for hyperparameter tuning (which results in training with test-set feedback), metric learning papers published between 2016–2020 showed marginal improvement at best, although they claimed great advances in accuracy.
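
The flaw mentioned above (tuning against the test set) is avoided by holding out a separate validation set and consulting only that set when choosing hyperparameters. The sketch below illustrates the idea under stated assumptions: the split fractions, the function names, and the validation scores are made up for the example.

```python
import random

def split_dataset(items, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle once, then carve out disjoint train/validation/test sets."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_frac)
    n_val = int(len(items) * val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test

def select_hyperparameter(candidates, score_on_val):
    """Pick the candidate with the best validation score; test data is never consulted."""
    return max(candidates, key=score_on_val)

train, val, test = split_dataset(range(1000))

# Hypothetical validation scores for three learning rates.
val_scores = {0.1: 0.81, 0.01: 0.88, 0.001: 0.84}
best_lr = select_hyperparameter(val_scores, val_scores.get)
```

The test set is touched only once, after `best_lr` is fixed, so the reported number is not contaminated by the tuning process.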

Once you’ve minimized flaws and can trust that your module is high-performing and accurate, it’s best to perform an ablation study to understand why that is the case. The ablation can help you gain intuition for future projects, point to where optimization of the module should be applied, and ensure that independent effects aren’t conflated.

A great example of the need for an ablation study can be found in Tay et al. [Tay, 2021]. Their paper claims that tying pre-training and architectural advances (i.e. Transformers) in natural language processing (NLP) tasks is misguided and that both advances should be considered independently. Thus, when one applies the pre-train-fine-tune paradigm and wants to improve the model’s performance and memory consumption, they can optimize the search for new architectural changes that weren’t tested before due to this unbreakable tie.
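
In its simplest form, an ablation study scores the full system and then re-scores it with one component removed at a time. The sketch below shows that loop; `toy_evaluate` is a stand-in for a real train-and-evaluate run, and the component names and contribution numbers are made up for illustration.

```python
def run_ablation(components, evaluate):
    """Score the full system, then re-score it with each component removed."""
    full = evaluate(set(components))
    drops = {}
    for c in components:
        drops[c] = full - evaluate(set(components) - {c})  # score lost without c
    return full, drops

# Stand-in for a real train-and-evaluate run; the numbers are made up.
CONTRIBUTION = {"pretraining": 8, "data_augmentation": 3, "new_architecture": 1}

def toy_evaluate(enabled):
    return 80 + sum(CONTRIBUTION[c] for c in enabled)

full_score, drops = run_ablation(list(CONTRIBUTION), toy_evaluate)
```

The toy evaluator is purely additive, so each drop equals that component's contribution. Real components interact, which is exactly why measuring them one at a time (and, where affordable, in combination) is worth the effort.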

Remember that things exist in context, meaning that you will need to deploy your project within a bigger system. To prevent embarrassing outcomes, you should include a ‘fail safe’ mechanism.
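
What a ‘fail safe’ looks like depends on the system. One possible interpretation, sketched below, is a wrapper that answers with a safe fallback whenever the main component raises an error. The component names and messages are hypothetical and are not taken from any real product.

```python
def with_fallback(primary, fallback):
    """Wrap primary so that any exception is answered by fallback instead."""
    def safe(*args, **kwargs):
        try:
            return primary(*args, **kwargs)
        except Exception:
            # Assumption: a degraded answer is better than crashing the larger system.
            return fallback(*args, **kwargs)
    return safe

# Hypothetical components of a larger system.
def transcribe(audio):
    if not audio:
        raise ValueError("empty audio buffer")
    return f"transcript of {len(audio)} samples"

def ask_user_to_repeat(audio):
    return "Sorry, I didn't catch that."

safe_transcribe = with_fallback(transcribe, ask_user_to_repeat)
```

Calling `safe_transcribe([])` returns the apology instead of propagating the error. In a real deployment you would also want to log the failure so it can be investigated later.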

Keep in mind: although we want the original plan to succeed, end users don’t care how you got to the solution; they just care that the product or service works as promised. So, if you see a new approach or technology that outperforms the solution you put hard work and sleepless nights into, don’t hesitate to embrace it. In the same vein, try to understand why things may have gone wrong or shifted so you can avoid repeating the same mistake.

If you decide to diverge from the original plan, make sure your decision to do so is data-driven. As Jim Barksdale said, “If we have data, let’s look at data. If all we have are opinions, let’s go with mine.”

Finally, build everything using a solid infrastructure that you can modify easily and that can serve multiple purposes. There is a constant tension between running as fast as possible and making something that will last long.

We try to stick to the following rule of thumb: 70–80% of our code is written in a way that even if the project is discarded, we can use it for future projects. This has two benefits: the execution time of future projects is shortened (because we already have part of the building blocks ready), and it enables us to focus on the task-specific core issues.

Let’s take LEGO as an example. Models share most of the same building blocks, and only differ in their size, color, and quantity. Having a unified block set saves the company money by cutting manufacturing costs, among other expenses.

From the R&D aspect, it significantly reduces the amount of work and attention needed for execution (as only a tiny portion of model-specific building blocks are needed), which eventually reduces the delivery time of new models and enables the creation of more models, including complex LEGO sets.

Closing Thoughts

This post aims to present a step-by-step methodology that helps to keep research focused. The methodology can serve to answer three main questions: Why, What, and How, which all guide implementation.

In retrospect, having a research management guide like this one would have helped my team and me avoid a plethora of mistakes over the years (despite the fact that I have always had great executive supervision and oversight).

Throughout my professional career, I have liked to revisit feedback that I’ve received from colleagues and executives. A comment that sticks out is that I “run too fast” once something has to be done, without taking adequate time to plan. This comment has stuck with me and served as a large motivator in creating and sharing this guide to conducting research.

Practically speaking, since my team and I started to adhere to this methodology, we have seen greater efficiency and fewer mistakes in our research and projects compared to when we simply hit the ground running.

Hopefully, the outlined methodology will benefit you like a coach benefits a team: by helping you choose the right pathway forward and directing you along your research journey(s). If you have any suggestions or thoughts regarding this step-by-step research guide, I warmly welcome them in the comments.

This article was originally published on Towards Data Science and re-published to TOPBOTS with permission from the author.
