Hi there! I'm a microbiologist turned data scientist with a passion for using machine learning, AI and visualizations to uncover hidden stories in data. I am also a video game, Raspberry Pi and bushwalking enthusiast.
This POC project provides a Flask-based API for predicting house prices using a Random Forest regressor. The model, trained using data scraped from a real estate website, outputs predictions based on input features like building type, area, rooms, and garage availability. The app is containerized using Docker for easy deployment and scalability.
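As a rough illustration of how a request to an API like this might be turned into model input, here is a minimal sketch using only the Python standard library. The field names (`building_type`, `area`, `rooms`, `garage`) and the building-type categories are hypothetical and not taken from the project; they simply mirror the features named above.

```python
import json

# Hypothetical categories; the project's actual building types are not listed here.
BUILDING_TYPES = ("house", "apartment", "townhouse")


def payload_to_features(raw_json: str) -> list[float]:
    """Validate a JSON request body and return a numeric feature row.

    Field names are assumptions chosen to mirror the features
    mentioned in the project description.
    """
    data = json.loads(raw_json)

    building_type = str(data["building_type"]).lower()
    if building_type not in BUILDING_TYPES:
        raise ValueError(f"unknown building_type: {building_type!r}")

    area = float(data["area"])
    rooms = int(data["rooms"])
    if area <= 0 or rooms <= 0:
        raise ValueError("area and rooms must be positive")

    garage = 1.0 if bool(data["garage"]) else 0.0

    # One-hot encode the building type, then append the numeric features.
    one_hot = [1.0 if building_type == t else 0.0 for t in BUILDING_TYPES]
    return one_hot + [area, float(rooms), garage]


if __name__ == "__main__":
    body = '{"building_type": "apartment", "area": 72.5, "rooms": 3, "garage": true}'
    print(payload_to_features(body))
```

In a Flask route, a row like this would typically be passed to the fitted regressor's `predict` method, with the prediction returned as JSON.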

Scientific papers are often long and full of technical jargon. In this proof-of-concept project, I used several open-source Python libraries (LangChain, Hugging Face/Transformers) and open-weights large language models (Mistral-7B and Zephyr-7B-Beta) to create ChatGPT-like chatbots that can summarize an uploaded scientific PDF document and answer questions about it.
Pre-trained language models such as BERT can often be fine-tuned on biomedical data to improve their performance on domain-specific tasks. In this project, I fine-tuned a small uncased BERT model to perform text classification and extractive Q&A using the Hugging Face/Transformers library in Python.
Stable Diffusion 2 is an open-source text-to-image model capable of generating high-quality images from textual descriptions. In this fun weekend project, I deployed Stable Diffusion 2 as a serverless endpoint on RunPod (via a Docker image), using RunPod's cloud GPU resources to handle real-time image generation. This setup enables efficient, scalable, and cost-effective deployment, making it easy to integrate AI-driven image generation into various applications.

Bibliometrics often allows a researcher to quickly and quantitatively identify prominent, highly published authors who are likely to be leaders in their field. In this project, I used R, VOSviewer and open-source data from Europe PMC to identify these experts.

In this project, I developed an interactive R Shiny dashboard that provides insights into infection-related clinical trials using data from the AACT (Aggregate Analysis of ClinicalTrials.gov) database. The dashboard allows users to quickly explore trial sponsors, phases and geographic distribution, offering a comprehensive view of the clinical trial landscape.
Adopting a pet can be a daunting experience without proper research. The aim of this project was to identify the key themes in the descriptions of cats available for adoption in Melbourne, using methods such as LDA topic modelling and Scattertext in Python. I also experimented with basic machine learning models to determine factors that could influence adoption fees.

Wager K., Wang Y., Liew A., Campbell D., Fettig L., Liu F., Martini J-F., Ziaee N., Liu Y. Using bioinformatics and artificial intelligence (AI) to map the cyclin-dependent kinase 4/6 inhibitor (CDK4/6i) translational biomarker landscape. Poster presented at the Annual ESMO Breast Cancer Congress, Berlin, Germany, 2023
Rawlings H., Rees T., Koti L., Pal A., Liew A. Why did it go viral? An informatics-based case study of exaggerated language in news and social media. Poster presented at the European Meeting of the International Society for Medical Publication Professionals (ISMPP), London, UK, 2023
Banner S., Rees T., Liew A., Brown N., Dhanky V., Humphreys L., Naimy H., Peters D., Young F. Factors influencing pharmaceutical industry-affiliated clinical trial publication timelines. Poster presented at the 19th Annual Meeting of the International Society for Medical Publication Professionals (ISMPP), Washington, DC, USA, 2023