Andrew Liew


Hi there! I'm a former microbiologist turned data scientist, with a passion for using machine learning, AI and visualizations to uncover hidden stories in data. I'm also a video game, Raspberry Pi and bushwalking enthusiast.



View My GitHub Profile

Selected data science and AI projects


Melbourne property price prediction API

This proof-of-concept project provides a Flask-based API for predicting house prices using a Random Forest regressor. The model, trained on data scraped from a real estate website, returns predictions based on input features such as building type, area, number of rooms, and garage availability. The app is containerized with Docker for easy deployment and scalability.

View project on GitHub


Chatting with scientific documents using large language models

Scientific papers are often long and full of technical jargon. In this proof-of-concept project, I used several open-source Python libraries (LangChain, Hugging Face Transformers) and open-weights large language models (Mistral-7B and Zephyr-7B-Beta) to create ChatGPT-like chatbots that can summarize an uploaded scientific PDF document and answer questions about it.

View project on GitHub


Fine-tuning AI models for biomedical text analysis

Pre-trained language models such as BERT can often be fine-tuned on biomedical data to improve their performance on domain-specific tasks. In this project, I fine-tuned a small uncased BERT model to perform text classification and extractive Q&A using the Hugging Face Transformers library in Python.

View project on GitHub


Deploying an open source text-to-image model as a serverless endpoint

Stable Diffusion 2 is an open-source text-to-image model capable of generating high-quality images from textual descriptions. In this fun weekend project, I deployed Stable Diffusion 2 as a serverless endpoint on RunPod (via a Docker image), using its cloud GPU resources to generate images on demand. Because the endpoint is serverless, it can scale with demand and be called from other applications that need AI-driven image generation.

View project on GitHub


Expert profiling platform

Bibliometrics allows a researcher to quickly and quantitatively identify prominent, highly published authors who are likely to be leaders in their field. In this project, I used R, VOSviewer and open-access data from Europe PMC to identify these experts.

View project on GitHub
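The core bibliometric idea, ranking authors by how many papers they appear on, can be sketched in a few lines. The project itself used R and VOSviewer, so the Python below is only an illustrative stand-in and not the project's code. The function name `rank_authors`, the record layout, and the sample records are all hypothetical.

```python
from collections import Counter


def rank_authors(records, top_n=3):
    """Return the top_n (author, paper_count) pairs, most published first."""
    counts = Counter()
    for record in records:
        # Use a set so an author listed twice on one paper counts once.
        counts.update(set(record["authors"]))
    # Sort by count descending, then by name for a stable order among ties.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Made-up records for illustration only; not real Europe PMC data.
sample_records = [
    {"title": "Paper A", "authors": ["Author X", "Author Y"]},
    {"title": "Paper B", "authors": ["Author X", "Author Z"]},
    {"title": "Paper C", "authors": ["Author X", "Author Y", "Author Y"]},
]

top_authors = rank_authors(sample_records, top_n=2)
print(top_authors)
```

A raw publication count is only a starting point. A fuller analysis would likely also consider co-authorship networks, which is the kind of view VOSviewer provides.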


Building an R Shiny dashboard for clinical trial insights

In this project, I developed an interactive R Shiny dashboard that provides insights into infection-related clinical trials using data from the AACT (Aggregate Analysis of ClinicalTrials.gov) database. The dashboard allows users to quickly explore trial sponsors, phases and geographic distribution, offering a comprehensive view of the clinical trial landscape.

View project on GitHub


Text analysis of pet adoption websites

Adopting a pet can be a daunting experience without proper research. The aim of this project was to identify the key themes in the descriptions of cats available for adoption in Melbourne, using methods such as LDA topic modelling and Scattertext in Python. I also experimented with basic machine learning models to determine which factors could influence adoption fees.

View project on GitHub


Selected posters presented at conferences