Senior Scientific Knowledge Engineer

GlaxoSmithKline
United States, Massachusetts, Cambridge
Mar 24, 2026

The Onyx Research Data Tech organization is GSK's Research data ecosystem, which has the capability to bring together, analyze, and power the exploration of data at scale. We partner with scientists across GSK to define and understand their challenges and develop tailored solutions that meet their needs. The goal is to ensure scientists have the right data and insights when they need them, giving them a better starting point for medical discovery and accelerating it. Ultimately, this helps us get ahead of disease in more predictive and powerful ways.

Onyx is a full-stack shop consisting of product and portfolio leadership, data engineering, infrastructure and DevOps, data / metadata / knowledge platforms, and AI/ML and analysis platforms, all geared toward:

  • Building a next-generation, metadata- and automation-driven data experience for GSK's scientists, engineers, and decision-makers, increasing productivity and reducing time spent on "data mechanics"

  • Providing best-in-class AI/ML and data analysis environments to accelerate our predictive capabilities and attract top-tier talent

  • Aggressively engineering our data at scale, as one unified asset, to unlock the value of our unique collection of data and predictions in real-time

The Scientific Knowledge Engineering team, which sits within the Onyx Product Management organization, is responsible for the data modeling, ontology definition and management, vocabulary mapping, and other key metadata activities that ensure Onyx platforms and data assets speak scientific language. They are a core factor in delivering the GSK R&D Knowledge Graph - the semantic layer that connects all of our data and metadata systems - as well as the core metadata experiences that ultimately allow us to build products and services that both delight our customers and enable impressive automation and intelligence.

This role is responsible for maximizing the value of our data assets over their lifetime and bringing purpose to data by translating highly technical information from domain experts into an appropriate data model, complete with significant ontology and vocabulary, that can be used to structure and index the data effectively. Specifically, the role works with Product Managers and R&D subject matter experts to translate the language of science (data models, ontology, standards, etc.) into data products, acting as the voice of the "Knowledgebase" and of the interoperability and value of the asset. This includes responsibility for understanding and translating computational methods back through the data chain to maximize the quality and speed of data from source, to drive experimental multi-variant analysis and data-driven decision-making, and to ensure that we have the right infrastructure components to power our platforms and services reliably and securely.

Key responsibilities for the Senior Scientific Knowledge Engineer include:
  • Definition of schemas and data models of the scientific information required to create value-adding data products. This includes accountability for the quality control and mapping specifications to be industrialized by data engineering and maintained in platform-provisioned tooling.

  • Accountable for the quality control (through validation and verification) of mapping specifications to be industrialized by data engineering and maintained in platform provisioned tooling - e.g., models, schemas, controlled vocab.

  • Work with Product Managers and engineers to confidently convert business needs into defined, deliverable business requirements that enable the integration of large-scale biology data to predict, model, and stabilize therapeutically relevant protein complex and antigen conformations for drug and vaccine discovery.

  • Collaborate with external groups to align GSK data standards with industry/academic ontologies, ensuring that data standards are defined with usage/analytics in mind. The role may also provide data source profiling and advisory consultancy to R&D outside of Onyx.

  • Support effective ingestion of data by GSK through understanding the entry requirements set by platform engineering teams and ensuring that the "barrier for entry" is met, e.g., that scientific information has the appropriate metadata to be indexed, structured, integrated, and standardised as needed. This may require articulating GSK engineering standards and metadata information needs to third parties to ensure efficient and automated ingestion at scale.

  • Provides bespoke subject matter expertise for R&D data to translate deep science into data for actionable insights

  • Champion data lineage, data quality, and FAIR data principles across the Onyx platform, working with engineering and product teams to embed governance and quality frameworks into data pipelines

  • Contribute to and maintain documentation of data standards, ontology decisions, and mapping rationale to support organizational knowledge transfer and auditability

  • Support self-service data enablement by ensuring metadata and knowledge products are accessible, well-documented, and usable by scientists and analysts without requiring bespoke engineering support

Why you? Basic Qualifications:

We are looking for professionals with these required skills to achieve our goals:

  • Master's degree in Bioinformatics, Biomedical Science, Biomedical Engineering, Molecular Biology, or Computer Science (with a life science application focus)

  • 6+ years of relevant work experience

  • Experience contributing to Knowledge Graph development efforts, including entity modeling, relationship design, and schema governance

  • Experience operating in and leading a matrixed team across organizational boundaries

  • Experience with industry standard data management / metadata platforms e.g. Collibra, Datahub, Datum, Informatica

  • Proficiency in at least one programming language - preferably Python - for scripting vocabulary mappings, building data models, automating QC, and prototyping pipelines

  • Experience with bioinformatics pipelines and workflow management systems (e.g., Nextflow)

Preferred Qualifications:

If you have the following characteristics, it would be a plus:

  • Membership of industry committee, board, consortium, or data standards group

  • Participation in peer-reviewed research (both publication and review), particularly in genetics and/or bioinformatics

  • Experience with data governance and data quality tooling (e.g., Ataccama, Informatica, Talend, OpenRefine, Great Expectations, dbt)

  • Experience with industry standard tools for building data protocols e.g. Avro, Protocol Buffers, Thrift

  • Experience with at least one programming language - e.g. Python - for scripting vocabulary mappings, building data models, etc

  • Experience supporting LLM integration or AI-readiness workflows - including metadata enrichment, entity linking, embedding pipelines, or retrieval-augmented generation (RAG) architectures

  • Understanding of vector databases and their role in semantic search and knowledge retrieval (e.g., Weaviate, Chroma)

  • Familiarity with cloud data platforms and infrastructure relevant to large-scale biological data (e.g., AWS, GCP, Azure)

  • Familiarity with graph database technologies (e.g., Neo4j, Amazon Neptune, Stardog, GraphDB, TigerGraph)

  • Experience designing or contributing to FAIR data frameworks, data catalogs, or self-service data enablement initiatives

  • Hands-on experience with open-source ontology tools and languages: Protege, SPARQL, OWL, SKOS, SHACL, RML, RDF/Turtle

  • Working knowledge of major life sciences ontologies: Gene Ontology (GO), OBO Foundry ontologies (CL, UBERON, HPO, MONDO, CHEBI, EFO, CLO), MeSH, SNOMED CT, UMLS

  • Experience with CDISC standards (SDTM, ADaM, SEND) or other clinical/research data standards relevant to pharmaceutical R&D

  • Familiarity with linked data principles and semantic web technologies

  • Experience with multi-omics data formats and standards (e.g., h5ad/AnnData, VCF, FASTQ, MAF, CellXGene, GEO)

  • Experience with industry-standard tools for building data serialization protocols (e.g., JSON Schema, LinkML)

#GSKOnyx #LI-GSK

* If you are based in Cambridge, MA; Waltham, MA; Rockville, MD; or San Francisco, CA, the annual base salary for new hires in this position ranges from $145,200 to $242,000. The US salary ranges take into account a number of factors, including work location within the US market, the candidate's skills, experience, and education level, and the market rate for the role. In addition, this position offers an annual bonus and eligibility to participate in our share-based long-term incentive program, which is dependent on the level of the role. Available benefits include health care and other insurance benefits (for employee and family), retirement benefits, paid holidays, vacation, and paid caregiver/parental and medical leave. If salary ranges are not displayed in the job posting for a specific country, the relevant compensation will be discussed during the recruitment process.

Please visit GSK US Benefits Summary to learn more about the comprehensive benefits program GSK offers US employees.

Why GSK?

Uniting science, technology and talent to get ahead of disease together.

GSK is a global biopharma company with a purpose to unite science, technology and talent to get ahead of disease together. We aim to positively impact the health of 2.5 billion people by the end of the decade, as a successful, growing company where people can thrive. We get ahead of disease by preventing and treating it with innovation in specialty medicines and vaccines. We focus on four therapeutic areas: respiratory, immunology and inflammation; oncology; HIV; and infectious diseases - to impact health at scale.

People and patients around the world count on the medicines and vaccines we make, so we're committed to creating an environment where our people can thrive and focus on what matters most. Our culture of being ambitious for patients, accountable for impact and doing the right thing is the foundation for how, together, we deliver for patients, shareholders and our people.

If you require an accommodation or other assistance to apply for a job at GSK, please contact the appropriate Recruitment Staff by emailing us at usrecruitment.adjustments@gsk.com.

GSK is an Equal Opportunity Employer. This ensures that all qualified applicants will receive equal consideration for employment without regard to race, color, religion, sex (including pregnancy, gender identity, and sexual orientation), parental status, national origin, age, disability, genetic information (including family medical history), military service or any basis prohibited under federal, state or local law.

Important notice to Employment Businesses/Agencies

GSK does not accept referrals from employment businesses and/or employment agencies in respect of the vacancies posted on this site. All employment businesses/agencies are required to contact GSK's commercial and general procurement/human resources department to obtain prior written authorization before referring any candidates to GSK. Obtaining prior written authorization is a condition precedent to any agreement (verbal or written) between the employment business/agency and GSK. If such written authorization is not obtained, any actions undertaken by the employment business/agency shall be deemed to have been performed without the consent or contractual agreement of GSK. GSK shall therefore not be liable for any fees arising from such actions or any fees arising from any referrals by employment businesses/agencies in respect of the vacancies posted on this site.

Please note that if you are a US Licensed Healthcare Professional or Healthcare Professional as defined by the laws of the state issuing your license, GSK may be required to capture and report expenses GSK incurs on your behalf in the event you are afforded an interview for employment. This capture of applicable transfers of value is necessary to ensure GSK's compliance with all federal and state US Transparency requirements. For more information, please visit the Centers for Medicare and Medicaid Services (CMS) website at https://openpaymentsdata.cms.gov/
