This personal statement was part of this student’s successful application in Data Science to UCL, LSE, University of Bristol, University of Warwick and University of Exeter.
“How does Facebook read my mind?” I find myself asking this every time I log into Facebook. This
curiosity led me to uncover the role of data science behind the scenes and to discover neural
networks, which retain memory of past data and capture trends and patterns in users’ feeds to identify
users’ behaviour. Realising that Bayes’ Theorem can be incorporated into neural networks to model
time-series data as posterior probabilities for advertising products sparked my interest in exploring
the intersections between computer science and mathematics.
Studying A-Level Further Statistics built my foundation in statistical techniques such as
hypothesis testing, which enables data-driven decision-making. Learning regression analysis
independently, I analysed an advertising dataset by fitting a linear regression model in R to
determine the effects of different advertising media on sales. I obtained the p-values for newspapers
(0.86), radio (2×10^-16) and TV (2×10^-16), indicating that TV and radio are statistically significant
predictors of sales while newspapers are not. I
then checked the linearity assumption of the model with a residual plot, which showed that the
points were not evenly distributed between the lines. I suspected that outliers in
the dataset might have caused the model to fail to meet the assumptions. I believe this experience has
taught me how a regression model’s fit to a dataset can be assessed, while also showing me the
importance of reliable data sources for precise projections.
With that in mind, I researched methods of storing high volumes of data. Reading the academic
paper “A Relational Model of Data for Large Shared Data Banks” by E. F. Codd, I became aware
that the traditional method, tree-structured files, uses paths instead of relations, which burdens users
with memorising the domain ordering. Learning that relations can be linked using operations like projection,
join, composition, and restriction was astonishing. As the paper raised the problem of increasingly
varied data types being combined into common data banks, I went on to read an article from
“Digital Scholarship in the Humanities”. It introduced me to the development of DoNoSQL as a
modification of relational databases that accommodates flexible data types, especially semi-structured
data. Although DoNoSQL makes it easier for users to integrate different data types, I think the data may
encounter standardisation issues. These discoveries have inspired me to explore more about
handling raw data so that users can make good use of it.
Intrigued by how databases ingest and classify data, I took an online course (Data
Analyst Learning Path) offered by Google Cloud Program to delve deeper into writing queries. In
one of the modules, I used BigQuery to extract the views of the top 5 products, together with the
quantity of orders for each, from Google’s e-commerce public dataset. Running the queries, I was able
to conclude that the number of orders does not depend on the number of views. Then, I expanded my
query to include the average quantity of product per order to determine the product with the highest
demand. Through this course, I developed skills in using various constructs like HAVING and WITH,
enabling me to draw meaningful information from data.
Participating in several competitions, such as the Beaver Computational Thinking Skills Competition
and the International Mathematics Olympiad National Selection Test, honed my critical thinking and
logical reasoning as I solved numerous questions that involved spotting features and finding patterns.
Achieving High Merit in RAD Intermediate ballet has shaped me into a more confident individual.
Ultimately, I have come to understand that data prediction and data modelling can turn historical
records into meaningful hypotheses. I am excited to learn more about how the concepts of data science
can be applied to meet users’ preferences.