Selected data science and AI projects

Melbourne property price prediction API

This proof-of-concept (POC) project provides a Flask-based API that predicts house prices using a Random Forest regressor. The model was trained on data scraped from a real estate website. It predicts prices from input features such as building type, area, number of rooms and garage availability. The app is containerized with Docker for easy deployment and scaling.
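The sketch below shows one way such an endpoint could be structured. It is not the project's code. The `/predict` route, the JSON field names, the integer encoding of building type and the synthetic training data are all assumptions made for illustration. Its predicted prices are therefore meaningless.

```python
import numpy as np
from flask import Flask, jsonify, request
from sklearn.ensemble import RandomForestRegressor

# Hypothetical feature names; the real project may use different ones.
FEATURES = ["building_type", "area", "rooms", "garage"]

# Synthetic placeholder data (NOT scraped listings), only so the sketch runs.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    rng.integers(0, 3, n),    # building_type: hypothetical codes 0=house, 1=unit, 2=townhouse
    rng.uniform(40, 400, n),  # area in square metres
    rng.integers(1, 6, n),    # number of rooms
    rng.integers(0, 2, n),    # garage: 0 = no, 1 = yes
])
y = 300_000 + 2_500 * X[:, 1] + 60_000 * X[:, 2] + 40_000 * X[:, 3] - 50_000 * X[:, 0]

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

app = Flask(__name__)


@app.route("/predict", methods=["POST"])
def predict():
    # Parse the JSON body; fall back to an empty dict if it is missing or invalid.
    payload = request.get_json(silent=True) or {}
    missing = [f for f in FEATURES if f not in payload]
    if missing:
        return jsonify(error=f"missing features: {missing}"), 400
    try:
        row = [[float(payload[f]) for f in FEATURES]]
    except (TypeError, ValueError):
        return jsonify(error="all features must be numeric"), 400
    price = float(model.predict(row)[0])
    return jsonify(predicted_price=round(price, 2))


if __name__ == "__main__":
    # Inside a Docker container, bind to 0.0.0.0 so the published port is reachable.
    app.run(host="0.0.0.0", port=5000)
```

Training the model at import time keeps the sketch self-contained. A deployed container would more likely load a previously fitted model from disk at startup, for example with `joblib.load`.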

View project on GitHub

Chatting with scientific documents using large language models

Scientific papers are often long and full of technical jargon. In this proof-of-concept project, I used several open-source Python libraries (LangChain, Hugging Face/Transformers) and open-weights large language models (Mistral-7B and Zephyr-7B-Beta). With them I built ChatGPT-like chatbots that can summarize an uploaded scientific PDF and answer questions about it.

View project on GitHub

Fine-tuning AI models for biomedical text analysis

Pre-trained language models such as BERT can often be fine-tuned on biomedical data to improve their performance on domain-specific tasks. In this project, I used the Hugging Face/Transformers library in Python to fine-tune a small uncased BERT model for text classification and extractive question answering (Q&A).

View project on GitHub

Deploying an open-source text-to-image model as a serverless endpoint

Stable Diffusion 2 is an open-source text-to-image model that can generate high-quality images from textual descriptions. In this fun weekend project, I deployed Stable Diffusion 2 as a serverless endpoint on RunPod (via a Docker image). The endpoint uses RunPod's cloud GPUs to handle image generation requests in real time. This setup offers scalable, cost-effective deployment and makes it easy to integrate AI-driven image generation into other applications.

View project on GitHub

Expert profiling platform

Bibliometrics often lets researchers quickly and quantitatively identify prominent, highly published authors who are likely to be leaders in their field. In this project, I used R, VOSviewer and publicly available data from Europe PMC to identify these experts.

View project on GitHub

Building an R Shiny dashboard for clinical trial insights

In this project, I developed an interactive R Shiny dashboard that provides insights into infection-related clinical trials using data from the AACT (Aggregate Analysis of ClinicalTrials.gov) database. The dashboard allows users to quickly explore trial sponsors, phases and geographic distribution, offering a comprehensive view of the clinical trial landscape.

View project on GitHub

Text analysis of pet adoption websites

Adopting a pet can be daunting without proper research. This project aimed to identify the key themes in the descriptions of cats available for adoption in Melbourne. I used methods such as LDA topic modelling and Scattertext in Python. I also experimented with basic machine learning models to identify factors that may influence adoption fees.

View project on GitHub

Selected posters presented at conferences

Wager K., Wang Y., Liew A., Campbell D., Fettig L., Liu F., Martini J-F., Ziaee N., Liu Y. Using bioinformatics and artificial intelligence (AI) to map the cyclin-dependent kinase 4/6 inhibitor (CDK4/6i) translational biomarker landscape. Poster presented at the Annual ESMO Breast Cancer Congress, Berlin, Germany, 2023
Rawlings H., Rees T., Koti L., Pal A., Liew A. Why did it go viral? An informatics-based case study of exaggerated language in news and social media. Poster presented at the European Meeting of the International Society for Medical Publication Professionals (ISMPP), London, UK, 2023
Banner S., Rees T., Liew A., Brown N., Dhanky V., Humphreys L., Naimy H., Peters D., Young F. Factors influencing pharmaceutical industry-affiliated clinical trial publication timelines. Poster presented at the 19th Annual Meeting of the International Society for Medical Publication Professionals (ISMPP), Washington, DC, USA, 2023