Shuai Zheng

Cofounder
Boson AI
4677 Old Ironsides Dr
Santa Clara, CA 95054, US
shuai@boson.ai
GitHub

About Me

I am building the next generation of foundation models and LLM-powered products at Boson.ai to make AI more accessible to the world. The company is still in stealth mode. Stay tuned for what we will reveal soon!

In 2019, I received my doctoral degree in computer science from the Hong Kong University of Science and Technology. After that, I worked as a scientist at Amazon Web Services until 2023, where I led distributed systems and LLM training efforts across Amazon. These included scalable distributed training and inference infrastructures, more intelligent models with hundreds of billions of parameters, and faster distributed optimization algorithms.

We at Boson.ai are hiring full-time machine learning engineers and researchers to build LLMs and their applications. Drop me a line if you are interested and would like to know more.

Research Interests

  • Distributed Systems
  • Large-Scale Distributed Algorithms
  • Deep Learning
  • Natural Language Processing

Work Experience

  • Cofounder
    Boson AI
    Santa Clara, CA, USA, Mar 2023 - Present

  • Senior Applied Scientist
    AWS Deep Learning, Amazon AI
    East Palo Alto, CA, USA, Sep 2019 - Feb 2023

  • Applied Scientist Intern
    AWS Deep Learning, Amazon AI
    East Palo Alto, CA, USA, Feb 2018 - Aug 2018

  • Research Intern
    VIPL Group, Institute of Computing Technology, Chinese Academy of Sciences
    Beijing, China, Aug 2012 - Apr 2013

Open Source Software

  • MXNet: A deep learning framework that mixes symbolic and imperative programming to maximize efficiency and productivity (a short usage sketch follows this list).
  • GluonNLP: A toolkit that enables easy text preprocessing, dataset loading, and neural model building to help you speed up your natural language processing (NLP) research.
  • Slapo: A schedule language for the progressive optimization of large deep learning model training.
  • MiCS: A proprietary distributed system that enables training trillion-parameter language models on the public cloud. We upstreamed its implementation to DeepSpeed.
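
A minimal Gluon sketch of the symbolic/imperative mix in MXNet: the network is defined and run imperatively, and hybridize() compiles it into a symbolic graph for faster execution. The layer sizes and input shape below are arbitrary placeholders, not taken from any particular project.

    from mxnet import nd
    from mxnet.gluon import nn

    # Define the network imperatively with Gluon layers.
    net = nn.HybridSequential()
    net.add(nn.Dense(128, activation='relu'),
            nn.Dense(10))
    net.initialize()

    # hybridize() traces the imperative definition into a symbolic graph,
    # so later calls run through MXNet's optimized graph executor.
    net.hybridize()

    x = nd.random.normal(shape=(4, 64))  # placeholder batch of 4 vectors
    y = net(x)
    print(y.shape)  # (4, 10)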