Data Scientist
Lexical Intelligence provides software and services related to processing large-scale biomedical information sources. Our Natural Language Processing (NLP) and analytics software is used by policy and decision makers to evaluate and prioritize current and emerging areas of research.
We are looking for a data scientist to work within the National Institutes of Health. The ideal candidate will have NLP experience, including applying transformer-based language models and developing robust machine learning pipelines for categorizing scientific information. The candidate will also have a firm understanding of the nuances involved in imbalanced training sets, train/validation/test splits, gold standards, significance testing, and hyperparameter optimization. The data scientist must be able to work well within a team of analysts, data scientists, and software developers. The selected applicant will be subject to a pre-employment background and reference check.
Qualifications
• 3–7 years of data science experience.
• MS or other degree(s) in data science, biotechnology, computer science or related fields.
• Extensive NLP experience.
• Highly proficient in Python.
Preferred Qualifications
• Experience using computational language processing techniques, including developing and applying embedding models to capture semantic meaning. Proficiency in document-level classification tasks, such as categorizing large volumes of text.
• Knowledge of deep learning and traditional machine learning models, using libraries such as TensorFlow, PyTorch, and scikit-learn.
• Experience using biomedical transformer models.
• Experience with Hugging Face frameworks, including the implementation and fine-tuning of Large Language Models (LLMs), transfer learning techniques, and the use of pre-trained biomedical models to enhance model performance and adapt to specific applications.
• Proficiency in prompt engineering techniques to optimize model performance and achieve desired outcomes.
• Experience using neural network architectures such as recurrent neural networks and generative models, among other techniques and approaches.
• Experience extracting data from unstructured sources, integrating data from multiple databases, and cleaning and normalizing data for downstream analysis.
• Independent worker with strong analytical skills and excellent written and verbal communication skills.
All candidates will be required to undergo a background check and must be authorized to work in the United States.
Salary and benefits
We offer a competitive salary and a generous benefits package, including full health and dental coverage, HSA and retirement accounts, short- and long-term disability insurance, life insurance, paid time off, and 11 federal holidays.
Location: Bethesda, MD – Remote.
Equal Employment Opportunity Policy
Lexical Intelligence, LLC, provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws. This policy applies to all terms and conditions of employment, including recruiting, hiring, placement, promotion, termination, layoff, recall, transfer, leaves of absence, compensation and training.