CV

General Information

Full Name: Swayam Singh
Email Address: singhswayam008@gmail.com
Phone Number: +91 9116277756

Education

  • 2020 - 2024
    B.Tech
    University of Allahabad, Uttar Pradesh, India
    • Relevant Courses:
      • Gained a comprehensive understanding of key areas through courses like Data Structures, Algorithms, Operating Systems, Big Data, and Software Engineering.
    • Skills Acquired:
      • Gained hands-on experience with programming languages such as Python and C++, along with proficiency in Data Analysis and Machine Learning.
    • Projects and Research:
      • Actively involved in projects focused on Machine Learning with Natural Language Processing and Computer Vision.
      • Developing a comprehensive benchmark to analyze how well LLMs generate performance-improving code.
      • Building a system that lets users try on clothes virtually by uploading a single image.

Work Experience

  • July 2024 - Present
    Research Fellow
    Microsoft Research
    • Developing LLMs focused on code generation, instruction following, and reasoning, working on post-pretraining and on online and offline RL methods.
    • Led large-scale model training on multi-GPU clusters, optimizing distributed training workflows and fine-tuning to achieve high-performance code synthesis and execution.
  • July 2024 - Present
    Open Source Engineer (NumPy)
    Quansight-Labs
    • Implementing an extended-precision floating-point data type.
    • Extending hardware support for extended-precision data types beyond x86, which is no longer the only architecture that matters.
    • Packaging and distributing a multi-platform Python package that includes C extensions.
  • Aug 2022 - Present
    Data Science Intern
    Scaler (by Interviewbit)
    • Helped develop and deploy predictive models that minimize infrastructure discrepancies and mitigate potential revenue losses, contributing to a 25% increase in user engagement.
    • Developed automated pipelines to preprocess user-engagement data and generate monthly reports and insights for personalized recommendations.
    • Used statistical analysis and machine learning to identify infrastructure issues and developed strategies to mitigate them.
    • Made core contributions to Scaler modules in Data Science and Machine Learning.
  • Feb 2023 - Present
    Open Source Research Engineer
    BigCode
    • Contributed significantly to research projects enhancing code generation and large language models for code, and evaluated mathematical reasoning strategies.
    • Collaborated on StarCoder, a 15.5B-parameter multilingual code-generation model trained on 1 trillion tokens that matches or outperforms OpenAI's code-cushman-001 model.
    • Co-developed OctoPack, which uses Git commits for instruction tuning and produced the CommitPack dataset, the HumanEvalPack benchmark (an extension of HumanEval), and the OctoCoder and OctoGeeX models.
  • May 2022 - Nov 2022
    Machine Learning Engineer Intern
    dataX.ai (CrowdANALYTX)
    • Developed deep learning models in computer vision and language modelling to support business market growth.
    • Engineered an API that automates converting pretrained models to ONNX format and deploying them on NVIDIA Triton Inference Server, reducing VM load by 12%.
    • Developed custom CUDA kernels to accelerate 3D image processing for medical scans, achieving a 2x speedup in segmentation over the existing CPU-based solution and improving overall system throughput.
    • Worked on state-of-the-art techniques such as DreamBooth and Dichotomous Image Segmentation.
    • Supported the model deployment team with model code analysis and optimization for DeployX.
  • Jan 2022 - Apr 2022
    Applied Machine Learning Instructor
    Bili Consultancy
    • Mentored final-year undergraduate students in their Machine Learning course.
    • Evaluated and improved the projects developed by students.

Projects

  • Virtual Clothing Assistant (275+ ⭐️)
    • Lets users try on clothes virtually, without visiting a trial room.
    • Used ResNet101 and UNet to segment the clothing and the person, respectively, and OpenPose for pose estimation; final try-on images are generated with a PyTorch implementation of VITON.
    • Achieved an SSIM (Structural Similarity Index Measure) of 0.895.
  • MIRA - Multimodal Image Reconstruction with Attention *
    • MIRA is a multimodal transformer for text/image-to-3D reconstruction that reconstructs an object from a single 2D image within seconds.
    • Uses a pre-trained DINOv2 encoder and a custom triplane decoder that projects image features onto a triplane via cross-attention and models relations among the spatially structured triplane tokens via self-attention; camera features modulate the decoder.
    • Trained by minimizing the difference between rendered and ground-truth images at novel views, without excessive 3D-aware regularization or delicate hyperparameter tuning.
  • S.E.A.R.C.H (Systematic Engine for Analyzed Retrieval and Contextual Handling)
    • S.E.A.R.C.H is a production-ready, open-source intelligent agent that processes factual information and documents from across the web to provide interactive, personalized assistance.
    • Backed by Zephyr-7B-beta, quantized and then fine-tuned with LoRA on the Self-RAG dataset to improve cross-domain performance.
    • Scalable, efficient Retrieval-Augmented Generation (RAG) system whose retrieval stage combines keyword extraction with re-ranking.
    • A custom follow-up query engine generates multiple contextually related queries to cover all possible scenarios.
    • A built-in code interpreter lets it solve reasoning-heavy and critical tasks more effectively than ChatGPT (GPT-3.5).
  • 3D Cervical Spine Segmentation and Multi-Vertebrae Fracture Detection
    • Automated cervical spine fracture detection from CT scans, aiming to match radiologist accuracy and enable timely medical intervention.
    • Used the RSNA 2022 Cervical Spine Fracture Detection Kaggle competition dataset and implemented a two-stage model; the first stage performs 3D segmentation with EfficientNet + UNet to produce binary masks for the cervical vertebrae (C1-C7).
    • The second stage combines ConvNeXt + LSTM for classification: 15 evenly spaced slices are extracted along the z-axis for each vertebra, and the predicted segmentation mask is added as an extra channel to distinguish between vertebrae.
    • Created a 3D reconstruction of the spine to visualize the bones and the fractured vertebrae.
    • Achieved a multilabel Dice score of 0.92 for segmentation and a modified BCE loss of 0.365 for classification.
  • PythonCoder [PyTorch]
    • A custom 60M-parameter GPT-style causal code-generation model, trained for 10k steps and evaluated on a large Python codebase.
    • Implemented the GPT-2 architecture from scratch with Multi-Query Attention for fast large-batch inference and FlashAttention for 3x faster attention computation, with a 1024-token context window.
    • Training uses the full context length by concatenating sequences into one continuous stream, enabling faster training and better use of compute resources.
  • Image Style Transfer using CNN [PyTorch, Pillow, OpenCV]
    • Transfers the texture/style of one image onto another while preserving the content of the target image.
    • Used a pre-trained VGG-19 network to extract style and content features from multiple convolutional layers, then combined them while balancing the content/style trade-off.
  • Reinforced Edwin Lander [PyTorch, Stable-Baselines3, Gym]
    • A custom Gym environment for training RL agents to land in the Edwin Lander game while following its rules.
    • The current model, trained with PPO for up to 1M steps, achieves a perfect landing score.
  • Multilingual Named Entity Recognition [PyTorch, Hugging Face]
    • XLM-RoBERTa fine-tuned on German, French, and Italian, enabling robust zero-shot classification in other languages.
    • Achieves an F1-score of 0.8638 on the validation set and reasonable F1-scores on low-resource languages.
    • Analyzed predictions with a custom error-analysis pipeline to ensure reliable classification across languages.
  • Reasoning With StarCoder
    • An interactive playground for assessing StarCoder's reasoning capabilities.
    • Supports evaluation strategies including PAL, TA, and MathPrompter.
    • The model generates code according to the selected strategy; the response is post-processed and executed on the server, and the result is displayed.
    • Leverages StarCoder's infilling and commit-style prompting to self-repair generated code.
    • Users can edit the generated code and re-run it for customized solutions.

Research and Publication

  • Swayam Singh, Niklas Muennighoff, et al., "OctoPack: Instruction Tuning Code Large Language Models", arXiv:2308.07124, 2023. Accepted at the Instruction Workshop at NeurIPS 2023 and at ICLR 2024.
  • Swayam Singh, Harm de Vries, et al., "StarCoder: may the source be with you!", arXiv:2305.06161, 2023. Accepted at TMLR (Transactions on Machine Learning Research).
  • Swayam Singh, Kamalkumar Rathinasamy, et al., "Narrow Transformer: StarCoder-Based Java-LM for Desktop", arXiv:2407.03941, 2024.

Achievements

  • 2024
    • Became a Kaggle Competitions Expert
    • Research work "OctoPack: Instruction Tuning Code Large Language Models" was accepted as a Spotlight at ICLR 2024 (top 5%)
    • Top 7% (Bronze medal) in Kaggle’s UBC Ovarian Cancer Subtype Classification and Outlier Detection (UBC-OCEAN) Competition
  • 2023
    • My project Clothes Virtual Try On hit 275+ stars on GitHub! 🌟
    • My collaborative research work "StarCoder: may the source be with you!" was accepted at TMLR (Transactions on Machine Learning Research).
    • 700+ citations of my research work on Google Scholar
    • My collaborative research work "OctoPack: Instruction Tuning Code Large Language Models" was accepted at the Instruction Workshop @ NeurIPS 2023.
    • Selected for Amazon ML Summer School 2023; engaged in advanced ML modules and collaborated with leading Amazon ML scientists.
  • 2022
    • Secured a top 3% global rank in Kaggle's 30 Days of ML challenge.

Skills

  • Languages: Python, C++, CUDA
  • Frameworks / Libraries: PyTorch, TensorFlow, Hugging Face, OpenCV, scikit-learn, Weights & Biases, Docker, AWS, TFX
  • Technical Skills: Machine Learning, Deep Learning, Reinforcement Learning, NLP, Computer Vision, Data Analysis, Data Structures and Algorithms, NeRF, 3D reconstruction, VLMs