MAGIC Mentee Malky M’s Journey in Machine Learning
Last year, Malky M joined the Third Monkeys Mentorship program with little programming knowledge. However, in a short period of time, she accomplished a great deal. She started by learning Python and then delved into the world of machine learning. Eventually, she built a model that can make predictions, which we will discuss shortly.
One of my favorite moments working with Malky was when we encountered difficult concepts. She would ask me to explain them as if she were five years old. This forced me to distill complex ideas into their core concepts. It also demonstrated Malky’s determination to understand things at a fundamental level, a skill that will serve her well in programming and problem-solving.
Now, let’s welcome Malky to share her journey and project.
My name is Malky M, and I am a ninth grader at SKA School with a keen interest in STEM. My mentor, Leia Einhorn, is a software engineer at Kepler Group in New York; her hobbies include reading and running.
The name of my project is Sales AI, which utilizes machine learning to analyze item features and sales data to predict future sales. I chose this project to delve deeper into data science, specifically predictive analytics. As someone interested in marketing and targeted ads, I found this project highly relevant.
Throughout this project, I had the opportunity to learn various topics and technologies that were new to me. I gained knowledge in machine learning, including supervised and unsupervised learning, dependent and independent variables, and the use of machine learning libraries. I also learned how to analyze and understand data using Kaggle datasets. Additionally, I acquired Python skills through Codecademy, covering data structures and data types. Google Colab became my go-to tool for coding.
There were several highlights during my journey in this program. Firstly, I was thrilled to learn Python and realize its potential for future use. I enjoyed exploring a topic that genuinely interests me and witnessing my algorithm come together to predict sales. It was fascinating to see the overlap between machine learning and the topics I am currently studying in math, providing me with a deeper understanding.
Of course, there were challenges along the way. Initially, finding a suitable commercial dataset was difficult, and I had to be flexible and change the direction of my project. Coding frustrations also arose when tiny errors proved hard to spot. However, these challenges taught me valuable lessons. I learned the importance of keeping things simple instead of making them unnecessarily complicated. I also realized that seeking help is not a weakness and that using available resources is crucial for success. Moreover, I developed the habit of understanding the code I was typing instead of mindlessly copying and pasting.
Let’s dive into the implementation of my project. I utilized various programs and libraries to simplify the coding process. Google Colab allowed me to access my training dataset stored in Google Drive. I imported the dataset and assigned it to the variable “DF” for easier referencing throughout my code. The dataset contained information such as item weight, item type, item visibility, and item fat content.
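My exact file path and column names aren't reproduced here, so the sketch below uses a small in-memory CSV as a stand-in for the training file, with column names (Item_Weight, Item_Type, and so on) assumed from the features listed above. The commented lines show the Colab-plus-Drive loading pattern I describe:

```python
import io
import pandas as pd

# In Google Colab, the file would come from Drive after mounting it:
#   from google.colab import drive
#   drive.mount('/content/drive')
#   df = pd.read_csv('/content/drive/MyDrive/train.csv')  # hypothetical path
# Here, a tiny in-memory CSV stands in for the real dataset.
csv_text = io.StringIO(
    "Item_Weight,Item_Type,Item_Visibility,Item_Fat_Content\n"
    "9.3,Dairy,0.016,Low Fat\n"
    ",Soft Drinks,0.019,Regular\n"
)
df = pd.read_csv(csv_text)  # everything downstream refers to this df
print(df.shape)  # 2 rows, 4 columns
```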
Cleaning the data was an essential step. I identified and filled in blank spaces within the dataset. For numerical data, like item weight, I used the average to fill in the blanks. For categorical data, like item type, I used the mode. This way, the filled-in values did not skew the dataset's overall statistics.
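The filling-in step above can be sketched with pandas, using a toy frame with the same kinds of gaps (the column names are assumptions, not the real dataset):

```python
import numpy as np
import pandas as pd

# Toy frame with one missing number and one missing category.
df = pd.DataFrame({
    "Item_Weight": [9.3, np.nan, 12.5, np.nan],
    "Item_Type": ["Dairy", "Snacks", None, "Dairy"],
})

# Numerical column: fill blanks with the mean, so the average is unchanged.
df["Item_Weight"] = df["Item_Weight"].fillna(df["Item_Weight"].mean())

# Categorical column: fill blanks with the mode (most frequent value).
df["Item_Type"] = df["Item_Type"].fillna(df["Item_Type"].mode()[0])

print(df["Item_Weight"].tolist())  # [9.3, 10.9, 12.5, 10.9]
print(df["Item_Type"].tolist())    # ['Dairy', 'Snacks', 'Dairy', 'Dairy']
```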
Next, I encoded the words in the dataset so that the program could read them. I assigned numbers to different entries and variables, making it easier for the algorithm to process the data. For example, “1” could represent “low fat” while “2” could represent “regular.”
Item Fat Content: "Low Fat" → 1
Item Fat Content: "Regular" → 2
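This encoding step can be sketched with a pandas `map` and an explicit dictionary, so the numbers match the scheme above (1 for low fat, 2 for regular); the column name is again an assumption:

```python
import pandas as pd

df = pd.DataFrame({"Item_Fat_Content": ["Low Fat", "Regular", "Low Fat"]})

# Explicit mapping from category text to the number the algorithm will see.
fat_codes = {"Low Fat": 1, "Regular": 2}
df["Item_Fat_Content"] = df["Item_Fat_Content"].map(fat_codes)

print(df["Item_Fat_Content"].tolist())  # [1, 2, 1]
```

For columns with many categories, scikit-learn's `LabelEncoder` does the same job automatically, though it picks the numbers itself.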
To train the model, I separated the final sales column from the other variables. The final sales were saved under the variable “Y,” while the remaining data was saved under the variable “X.” I used linear regression to calculate the relationship between the independent variables (item features) and the dependent variable (final sales). This process is known as training the model.
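The split-and-train step can be sketched with scikit-learn's `LinearRegression`. The real feature columns aren't shown here, so this uses small synthetic data: X stands in for the encoded item features and Y for the final sales:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in: 50 items, 3 encoded features each.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 3))
# Sales made up as a linear combination of two features plus small noise.
Y = 3.0 * X[:, 0] + 1.5 * X[:, 2] + rng.normal(0, 0.1, size=50)

# Fitting the regression to (X, Y) is the "training the model" step.
model = LinearRegression()
model.fit(X, Y)

print(round(model.score(X, Y), 2))  # R^2 near 1.0 on this easy synthetic data
```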
Once the model was trained, I tested its accuracy by making predictions. I fed item data into the model and observed how close the predictions were to the actual sales data. To test the algorithm, I imported a separate dataset and performed the same data cleaning and encoding steps. Finally, I ran the test dataset through the algorithm and obtained the predicted sales.
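The testing flow above, running fresh rows through the fitted model, can be sketched the same way. The test rows here are invented, noise-free values, so the model recovers the generating coefficients and the predictions are easy to check by hand:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Train on noise-free synthetic data (same made-up relationship as before).
rng = np.random.default_rng(1)
X_train = rng.uniform(0, 10, size=(50, 3))
Y_train = 3.0 * X_train[:, 0] + 1.5 * X_train[:, 2]

model = LinearRegression().fit(X_train, Y_train)

# A "test dataset" after the same cleaning and encoding: two new items.
X_test = np.array([[5.0, 0.02, 2.0],
                   [8.0, 0.05, 1.0]])
predicted_sales = model.predict(X_test)
print(predicted_sales.round(1))  # [18.  25.5]
```

With real data the predictions would then be compared against the actual sales column to judge accuracy.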
Here are the final predictions:
I have made significant progress in this project, reaching the stage where I can make final predictions with the dataset. In the future, I plan to further improve the accuracy of the model through additional training. I also aim to present the final results using charts and graphs for a more visually appealing representation.
Thank you for listening to my journey. I am open to any questions you may have.
Question 1: How did you find the data you ended up using?
Answer: Initially, I wanted to work with consumer statistics, such as gender-based data. However, it was challenging to find readily available datasets for that specific topic. After extensive searching, I decided to focus on item statistics, which were more easily accessible. I found the dataset through Kaggle.
Question 2: What were the numbers like for the predictions? What was the accuracy?
Answer: The accuracy of the predictions was measured using correlation coefficients. In this case, the correlation coefficient was 0.51, which is considered decent. As a first-time project, I am satisfied with this result. However, I aim to train the model further to improve its accuracy in the future.
Thank you all for your questions and support!