THE INTRODUCTION IN A SCIENTIFIC PAPER
- Genaro Pimienta
- Sep 6, 2024
The grammar and sentence construction rules in scientific writing are well-defined and allow little or no variation. It is therefore unsurprising that, while there are a few good references on how to write a research article, the recommendations made by different authors are very similar.
In this blog post, instead of repeating what has already been written elsewhere, I discuss the impact of a well-formulated hypothesis on the quality of the introduction.
After a brief recap of basic grammar tips, I start by explaining the structure of the introduction section and its purpose in a research article. I next elaborate on the central roles played by the hypothesis and the approach in this part of a research article.
I have devoted the last two sections to an alternative way of doing science that has gained ground since the beginning of the post-genomic era: the hypothesis-generating scientific method, as opposed to the hypothesis-driven one.
GRAMMAR TIPS
Inexperienced authors tend to make mistakes with the tense, voice and person used in scientific writing. Important considerations are the following.
Person
Avoid personal pronouns to maintain an objective tone.
Voice
Active voice is preferable.
Passive voice may be used to reduce sentence ambiguity or redundancy.
Tense
Use present tense for the background to the topic.
Use past tense when describing the results obtained.
Use future tense for the conclusion and perspective.
Overall, it is important to construct sentences with precise wording and to avoid being repetitive across sections in the paper.
A neutral tone is also appropriate. Avoid superlatives and sentences that appear to exaggerate claims, as these tend to engender skepticism in the reader.
THE STRUCTURE
Since the mid-1900s, research articles have been structured in the IMRAD format, which means that they are composed of four sections: Introduction, Methods, Results and Discussion.
The IMRAD format has variations. One is that the Results and Discussion are often grouped together and followed by a Conclusion section. Other sections not implied in the IMRAD acronym are the Title, Abstract, Bibliography (references) and Supplementary information. I discussed this matter in my previous blog post “The Title and Abstract in a Scientific Paper”.
The introduction section in a research article is typically about 500 words and preferably structured in four (sometimes five) paragraphs. Shorter research formats may require two or three paragraphs.
As the first section of the main body of the paper, coming right after the title and abstract, the introduction sets the stage before the results and figures are presented to the reader.
THE PURPOSE
The purpose of the introduction is to provide the reader with contextual information about the study and its relevance. The introduction also presents the reader with a logical argument, based on supporting literature, from which a hypothesis or question is constructed, and which justifies the objectives of the study. The last paragraph of the introduction provides a general overview of the results and their implications for the topic as a whole.
A well-structured introduction has four components. Ideally, each component is discussed in one paragraph, although the occasional addition of one or more paragraphs can make the document easier to digest.
Paragraph 1 — background to the topic
Paragraph 2 — knowledge gap and methodological limitations
Paragraph 3 — hypothesis or question and the approach used to test the hypothesis
Paragraph 4 — results and their implications
THE HYPOTHESIS
A hypothesis that lacks supporting background, one based solely on a gut feeling or inkling, is pure metaphysical speculation.
The hypothesis is central to the logical justification of the study and must be carefully crafted. Above all, a hypothesis must be novel: one that helps overcome an outstanding knowledge gap in the topic under investigation.
Therefore, the main body of the introduction is devoted to a solid, yet brief account of what is known in the field and what still needs to be investigated.
When addressing the “unknowns” in the field, it is also necessary to point out the methodological roadblocks that stand in the way of knowledge advancement.
A deficient introduction often mischaracterizes well-understood phenomena in a topic as uncharacterized. Likewise, a deficient introduction may fail to acknowledge the full extent of methodological limitations in the field.

THE APPROACH
The third paragraph in the introduction is devoted to the hypothesis and the strategy used to test it — the approach.
A hypothesis must be testable (properly speaking, falsifiable) using an approach that delivers unambiguous results.
It helps if the hypothesis has a well-defined scope, to ensure that the approach used to test it delivers a clear outcome, with room for the generalization of findings.
Furthermore, the collection of methods used in the approach must be carefully chosen to provide an unambiguous interpretation.
This does not mean that the approach used must be the one in vogue or the most expensive methodology available (e.g., the use of next-generation methods for the sake of it). A well-crafted hypothesis is best evaluated with a clever assortment of experimental strategies, whether deemed cutting-edge or not (e.g., northern blots may be preferred over real-time quantitative PCR when a transcript is difficult to amplify without nonspecific reaction products).
THE HYPOTHESIS-GENERATING SCIENTIFIC METHOD
The genomic era brought about many paradigm shifts. One is the shift from the traditional hypothesis-driven scientific method to a hypothesis-generating one.
When using the hypothesis-generating scientific method, the project starts as an agnostic one and is tightly linked to high-throughput technologies, such as next-generation sequencing (e.g., RNA-seq) and proteomics.
Many have denounced the hypothesis-generating scientific method, arguing that it is merely a descriptive approach that generates no mechanistic insight and only piles up observation after observation.
This is in many ways true when the agnostic approach is not connected to a hypothesis-testing pipeline down the road.

Consider, for example, a quantitative proteomics experiment that compares proteome abundance remodeling across different cellular states. If the study stops at the description of protein changes and some orthogonal validation with antibodies, then the project is just that: a descriptive exercise using expensive technology (quantitative proteomics).
An agnostic high-throughput approach becomes mechanistic when coupled to the traditional scientific method pipeline.
One way of doing this is by:
Using biologically relevant samples for the high-throughput experiment
Performing a robust computational interpretation of the data obtained, preferably with newly developed deep learning algorithms
Using the results to craft well-rounded hypotheses and testing these to obtain mechanistic insight
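As a rough illustration of the last step, here is a minimal Python sketch of how abundance values from a proteomics comparison might be turned into a shortlist of testable statements. The protein names, the abundance values and the twofold threshold are hypothetical and chosen only for illustration. A real analysis would rely on replicates and proper statistics.

```python
import math

# Hypothetical protein abundances (arbitrary units) as (control, treated) pairs.
abundance = {
    "protein_A": (100.0, 400.0),
    "protein_B": (250.0, 240.0),
    "protein_C": (80.0, 10.0),
}


def log2_fold_change(control, treated):
    """Return the log2 ratio of treated over control abundance."""
    return math.log2(treated / control)


def candidate_hypotheses(data, threshold=1.0):
    """Keep proteins whose absolute log2 fold change meets the threshold.

    A threshold of 1.0 corresponds to a twofold change (an assumed cutoff).
    """
    candidates = []
    for name, (control, treated) in data.items():
        lfc = log2_fold_change(control, treated)
        if abs(lfc) >= threshold:
            direction = "increases" if lfc > 0 else "decreases"
            candidates.append((name, round(lfc, 2), direction))
    return candidates


candidates = candidate_hypotheses(abundance)
for name, lfc, direction in candidates:
    print(f"{name}: log2 fold change {lfc}; "
          f"testable claim: abundance {direction} in the treated state")
```

Each shortlisted protein is a starting point for a hypothesis, not a result. The mechanistic insight comes from the follow-up experiments designed to test it.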

“Mechanistic Science” — 2009
NEW PUBLICATION FORMATS
The post-genomic era has led to the creation of new scientific subdisciplines (e.g., genomics and proteomics) and publication formats.
Examples of journals created in response to emerging high-throughput technologies are:
Genome Biology (first published in 2000)
Molecular and Cellular Proteomics (first published in 2002)
Nature Methods (first published in 2004)
Scientific Data (first published in 2014)
Cell Systems (first published in 2015)
These journals accept papers with findings based on the hypothesis-generating scientific method.
In this case, the introduction section is slightly different from the traditional one described above. The central component is no longer a hypothesis, but rather the rationale behind the development of an innovative approach to improve high-throughput methodologies. Examples of innovative methodological developments are:
Instrumentation
Computational algorithm
Reagent development
Experimental workflow optimization
Large-scale omics studies of any type are routinely catalogued by specialized repositories (metadatabases) such as the Human Protein Atlas (HPA), the Encyclopedia of DNA Elements (ENCODE) or ProteomicsDB.
These metadatabases, if used properly, represent a treasure trove for the construction of novel hypotheses.
In the introduction, the background bibliography and methodology used to obtain results have to convince the reader of the relevance of the study.
Stay tuned for our next post on how to draft the Results and Discussion of your paper.
GPR
Disclosure: At BioTech Writing and Consulting we believe in the use of AI in Data Science, but do not use AI to generate text or images.