GSoC 2024 Project : AgroNota - Empowering Agriculture Through Machine Learning ⚡

Google Summer of Code (GSoC) is an annual program by Google that offers students (above the age of 18) worldwide the opportunity to work on open-source software projects. Students gain real-world coding experience while collaborating with established open-source organizations. It's a platform for learning, contributing, and fostering innovation in the tech community.

This year, I am thrilled to participate in GSoC 2024 with the openSUSE organization, working within the Rancher project on an exciting project called AgroNota. With guidance from my mentors, the goal is to create an ML-driven application that aims to improve agricultural practices by predicting optimal nitrogen values for soil fertilization and estimating crop yield. This initiative aligns with the broader objectives of promoting sustainability, mitigating climate change, and enhancing agricultural productivity.

Here’s a link to my project — Analytics Edge Ecosystem Workloads

In this project, we aim to develop an interactive Machine Learning application within a scalable, cloud-native infrastructure, all built on open-source principles. The goal is to provide a robust tool that addresses the specific needs of the agricultural sector, with a focus on optimizing fertilizer use and predicting crop yields to promote sustainable farming practices.

The core objective of AgroNota is to create an ML-driven application that predicts the optimal nitrogen levels required for soil fertilization and accurately estimates crop yields. The application is designed to be accessible to farmers, to give them valuable insights, and to be easy to deploy on a Kubernetes cluster.

All deployments will be managed on Rancher-controlled Kubernetes clusters, utilizing platforms like K3s, RKE, or RKE2 to ensure efficient and seamless operations.

Project Overview 🤯

The AgroNota project is divided into two main parts:

  1. Predicting Optimal Nitrogen Value: This component focuses on analyzing environmental factors such as temperature, humidity, pH levels, rainfall, and irrigation data. Using machine learning algorithms, the application will forecast the optimal nitrogen value required for soil fertilization and recommend suitable crops. This empowers farmers to maximize crop yields while minimizing the adverse effects of excessive fertilizer use.

  2. Estimating Crop Yield: This part involves predicting crop yield based on factors like crop weight, moisture content, and cultivated area. Accurate yield predictions will enable farmers to make informed decisions about crop selection and optimize their crop planning strategies.

Design & Approach

  • Backend: Developed using Python, Google Colab, and PyCharm. The backend involves creating datasets with NPK values, temperature, humidity, pH, and rainfall data. Various libraries like Pandas, NumPy, and Matplotlib are used for data analysis and visualization.

  • Frontend: Developed using Streamlit, with a navigation menu for moving between the application's pages. This includes designing an intuitive user interface to interact with the application.

  • Deployment: The application will be deployed using Kubernetes on an AWS EKS cluster managed by Rancher. A CI/CD pipeline will be set up using GitOps workflows with ArgoCD.
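
To make the GitOps part more concrete, here is a minimal sketch of how an Argo CD Application could be described for a setup like this. It is an illustration under assumptions, not the project's actual configuration: the repository URL, the `k8s` path, the `agronota` names, and the `argocd` namespace are all hypothetical placeholders. The manifest is built as a Python dictionary and printed as JSON, which `kubectl apply -f` accepts just like YAML.

```python
import json


def build_argocd_application(name, repo_url, path, namespace):
    """Return an Argo CD Application manifest as a plain dictionary."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        # Assumption: Argo CD is installed in the conventional "argocd" namespace.
        "metadata": {"name": name, "namespace": "argocd"},
        "spec": {
            "project": "default",
            "source": {
                "repoURL": repo_url,       # Git repository holding the manifests
                "path": path,              # directory inside that repository
                "targetRevision": "HEAD",  # track the tip of the default branch
            },
            "destination": {
                # In-cluster API server address: deploy to the cluster Argo CD runs in.
                "server": "https://kubernetes.default.svc",
                "namespace": namespace,
            },
            # Automated sync: apply changes from Git and revert manual drift.
            "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
        },
    }


# Hypothetical values, for illustration only.
manifest = build_argocd_application(
    name="agronota",
    repo_url="https://example.com/your-org/agronota-deploy.git",
    path="k8s",
    namespace="agronota",
)
print(json.dumps(manifest, indent=2))
```

With automated sync enabled, a commit that changes the manifests in Git is what triggers a rollout, so the repository stays the single source of truth for the cluster state.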

The project had three phases: the Community Bonding period, Coding Phase 1, and Coding Phase 2.

Here’s my GSoC’24 project link — https://summerofcode.withgoogle.com/programs/2024/projects/lwxLIHO2

  • Community Bonding period

I began exploring the sample Rancher and Kubernetes environment provided to me. I installed Longhorn on the Kubernetes clusters, created new multi-node clusters, connected them to Rancher for easy management, and tested them with sample workloads like WordPress and BusyBox.

Figure: SUSE tools used

  • Coding Phase — 1

The first coding phase focused on developing the core machine learning models for predicting the optimal nitrogen value. This involved:

  1. Data Collection and Preparation: Gathering datasets with environmental factors such as temperature, humidity, pH levels, rainfall data and crop-related parameters.

  2. Model Training and Testing: Implementing various machine learning algorithms to predict the ideal nitrogen value and crop recommendations. Techniques such as data classification and regression analysis were utilized.

  3. Evaluation and Refinement: Assessing model performance and refining the algorithms to enhance accuracy.
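
The train, test, and evaluate loop described above can be sketched in a few lines. This is a simplified illustration, not the model used in AgroNota: the data is synthetic, the formula that generates the nitrogen target is invented purely so the example has something to learn, and a hand-written k-nearest-neighbours regressor stands in for whichever algorithms a real project would pick (typically from a library such as scikit-learn).

```python
import math
import random


def make_synthetic_rows(n, seed=0):
    """Generate placeholder rows. The nitrogen formula is invented, not agronomic."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        temperature = rng.uniform(15, 35)   # degrees Celsius
        humidity = rng.uniform(30, 90)      # percent
        ph = rng.uniform(5.0, 8.0)
        rainfall = rng.uniform(50, 300)     # millimetres
        nitrogen = (20 + 1.5 * temperature + 0.3 * humidity
                    + 0.05 * rainfall + rng.gauss(0, 2))
        rows.append(((temperature, humidity, ph, rainfall), nitrogen))
    return rows


def fit_scaler(rows):
    """Compute per-feature mean and standard deviation from the training rows."""
    columns = list(zip(*(features for features, _ in rows)))
    means = [sum(col) / len(col) for col in columns]
    stds = [math.sqrt(sum((v - m) ** 2 for v in col) / len(col)) or 1.0
            for col, m in zip(columns, means)]
    return means, stds


def scale(features, means, stds):
    """Standardise one feature vector so no single unit dominates the distance."""
    return tuple((v - m) / s for v, m, s in zip(features, means, stds))


def knn_predict(train, features, k=5):
    """Average the targets of the k training points closest to `features`."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], features))[:k]
    return sum(target for _, target in nearest) / len(nearest)


rows = make_synthetic_rows(200)
split = int(0.8 * len(rows))                 # 80/20 hold-out split
train_raw, test_raw = rows[:split], rows[split:]

means, stds = fit_scaler(train_raw)          # fit the scaler on training data only
train = [(scale(x, means, stds), y) for x, y in train_raw]
test = [(scale(x, means, stds), y) for x, y in test_raw]

# Mean absolute error of the model versus always predicting the training mean.
mae = sum(abs(knn_predict(train, x) - y) for x, y in test) / len(test)
train_mean = sum(y for _, y in train) / len(train)
baseline_mae = sum(abs(train_mean - y) for _, y in test) / len(test)
print(f"kNN MAE: {mae:.2f}  mean-baseline MAE: {baseline_mae:.2f}")
```

Comparing against a trivial baseline is a useful habit during refinement: if a model cannot beat "always predict the average", the features or the data need attention before the algorithm does.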

  • Coding Phase — 2

The second coding phase concentrated on developing the crop yield estimation component and on deployment. This included:

  1. Data Analysis: Analyzing parameters like crop weight, moisture content, and cultivated area to predict crop yield.

  2. Model Development: Creating and testing machine learning models to estimate potential crop yields.

  3. Integration: Integrating the crop yield estimation functionality with the existing application and ensuring seamless user interaction.

  4. Deployment: Building the Docker image of the application and deploying it onto a Kubernetes cluster managed by Rancher.
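
To show how crop weight, moisture content, and cultivated area relate to yield, here is a small non-ML baseline: the conventional moisture adjustment, which scales the harvested weight to a standard moisture content and divides by area. It is a sketch for intuition rather than the model built in this project. The function name and the default standard moisture of 14% are assumptions, and the appropriate standard varies by crop.

```python
def yield_per_hectare(harvest_weight_kg, moisture_pct, area_ha,
                      standard_moisture_pct=14.0):
    """Return yield in kg/ha, adjusted to a standard moisture content.

    standard_moisture_pct is an assumed default; use the value that
    applies to the crop being measured.
    """
    if area_ha <= 0:
        raise ValueError("area_ha must be positive")
    if not 0 <= moisture_pct < 100:
        raise ValueError("moisture_pct must be in [0, 100)")
    if not 0 <= standard_moisture_pct < 100:
        raise ValueError("standard_moisture_pct must be in [0, 100)")

    # Weight of the dry matter actually harvested.
    dry_matter_kg = harvest_weight_kg * (100 - moisture_pct) / 100
    # Weight the same dry matter would have at the standard moisture content.
    adjusted_kg = dry_matter_kg * 100 / (100 - standard_moisture_pct)
    return adjusted_kg / area_ha


# Grain harvested wetter than the standard counts for less per hectare.
wet = yield_per_hectare(9000, moisture_pct=20, area_ha=2)
at_standard = yield_per_hectare(9000, moisture_pct=14, area_ha=2)
print(f"wet harvest: {wet:.1f} kg/ha, at standard moisture: {at_standard:.1f} kg/ha")
```

A formula like this also gives a sanity check for a learned model, since its predictions should move in the same direction when moisture or area changes.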

Challenges

  • Data Quality: Ensuring the accuracy and completeness of datasets.

  • Model Accuracy: Fine-tuning machine learning models to improve prediction accuracy.

  • Deployment: Managing the deployment process and CI/CD pipeline effectively.
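
For the data quality challenge, a simple validation pass can flag missing or implausible values before any training happens. The sketch below is only an illustration: the column names and the sample rows are made up, and the temperature bounds are an assumption, while pH is limited to 0 to 14 and relative humidity to 0 to 100 percent.

```python
import csv
import io

# Allowed range per column. The temperature bounds are an assumption.
VALID_RANGES = {
    "temperature": (-10.0, 60.0),
    "humidity": (0.0, 100.0),
    "ph": (0.0, 14.0),
    "rainfall": (0.0, float("inf")),
}


def find_bad_values(rows):
    """Return (row_index, column, reason) for each missing or invalid value."""
    problems = []
    for index, row in enumerate(rows):
        for column, (low, high) in VALID_RANGES.items():
            raw = (row.get(column) or "").strip()
            if not raw:
                problems.append((index, column, "missing"))
                continue
            try:
                value = float(raw)
            except ValueError:
                problems.append((index, column, "not a number"))
                continue
            if not low <= value <= high:
                problems.append((index, column, "out of range"))
    return problems


# Made-up sample data: row 1 lacks humidity, row 2 has an impossible pH.
SAMPLE_CSV = """temperature,humidity,ph,rainfall
24.5,81.0,6.4,210.0
26.1,,7.1,180.5
22.0,64.0,15.2,95.0
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
problems = find_bad_values(rows)
for problem in problems:
    print(problem)
```

Reporting every problem, rather than stopping at the first one, makes it easier to decide whether to drop rows, fill gaps, or go back to the data source.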

Technical Overview 🚀

  1. Data Collection: Gathering and preparing datasets for model training.

  2. Model Development: Implementing and testing machine learning algorithms.

  3. Deployment: Using Kubernetes and AWS EKS for application deployment.

  4. CI/CD Pipeline: Setting up automated workflows using GitOps with ArgoCD.

The code for this project is available in the GitHub repository provided below. The repository contains all the code developed up to the GSoC 2024 final evaluation, along with additional details and a comprehensive deployment guide:

Further Steps

I plan to write a SUSE technical reference documentation (TRD) covering my project. It will explain in detail all the steps from creating the ML models to deploying them on Kubernetes clusters using Rancher. I will update its link here when it gets published on the SUSE documentation site.

Acknowledgements 🙏

I would like to thank my mentors Bryan Gartner, Ann Davis and Terry Smith for their unwavering support and inspiration. Their invaluable guidance led me to solutions and instilled in me a culture of iterative improvement within the realm of Machine Learning. Engaging conversations throughout the GSoC tenure were truly captivating, fueling my desire for continuous learning.

I'm equally appreciative of the SUSE community for their backing.

Lastly, I'd like to acknowledge Google for orchestrating the Google Summer of Code, an initiative that has undeniably elevated the open-source community.

About Me 🚩

I appreciate your time in reviewing my GSoC 2024 project endeavors and the associated experience. I hope it resonated positively with you ❤️