written by dongheeh, hyejinp
1. Machine Learning Workspace Introduction
“All-in-one web-based development environment for machine learning”
This tool provides a Docker image with a ready-built machine learning workspace. One of its advantages is that it is web-based: the Linux desktop GUI can be accessed through a web browser. This makes ml-workspace convenient for developing machine learning models.
Here is the ml-workspace GitHub repository: https://github.com/ml-tooling/ml-workspace
It bundles many tools for developers. Among them, we ran Jupyter Notebook, a web-based IDE, for data processing, and Netdata for monitoring hardware status. Along the way, we also tried out the pre-installed machine learning libraries such as pytorch, sklearn, and pandas.
We are working on a movie recommendation system project, so we tried preprocessing the data for our model with this tool.
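As a quick way to confirm that the pre-installed libraries are really available, something like the following can be run in a notebook cell. This is only a minimal sketch: `check_libraries` is a hypothetical helper name, and the list contains the import names of the three libraries mentioned above (`torch` for PyTorch, `sklearn` for scikit-learn).

```python
import importlib.util

# Import names of the libraries mentioned above (an assumption about what
# you want to check; extend the list as needed).
LIBRARIES = ["torch", "sklearn", "pandas"]


def check_libraries(names):
    """Return a dict mapping each top-level import name to True if it is installed."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, available in check_libraries(LIBRARIES).items():
        print(f"{name}: {'installed' if available else 'missing'}")
```

`find_spec` only looks the module up without importing it, so the check stays fast even for heavy libraries.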
2. How to install
First, install WSL2
1. In Windows PowerShell
Run the following command in Windows PowerShell (opened as administrator).
Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Windows-Subsystem-Linux
2. In Microsoft Store
Install Ubuntu 20.04.4 LTS
3. Reboot your computer
Second, install and set up Docker
1. Open the download link below and click Docker Desktop for Windows
https://docs.docker.com/get-docker/
2. Reboot your computer
What is Docker?
Docker is an open source project that runs and manages Linux applications as containers using process isolation technologies. Here's a quote from the Docker web page:
A Docker container wraps some kind of software in a complete filesystem that contains everything needed to run the software. This includes code, runtime, system tools, system libraries, anything that is installed on the server. This guarantees that it will always run the same regardless of the environment in which it is running.
https://en.wikipedia.org/wiki/Docker_(software)
Third, get started with Machine Learning Workspace
1. Run the following command in cmd.
docker run -p 8080:8080 mltooling/ml-workspace:0.13.2
Docker will then download the image, which consists of many files.
You will then see the cmd window in the following state (don't close it).
2. Open a new browser window (Internet Explorer, Chrome, etc.) and go to this link (http://localhost:8080)
You will see the web-based Jupyter Notebook.
3. Desktop GUI (Optional)
Click Open Tool -> VNC.
Because the language setting of my browser is Korean, the button is labeled "연결" ("Connect") in the image below.
Click this button.
You will then see the desktop shown in the image below.
Enjoy!!
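If the page in step 2 does not load, it may help to check from a script whether anything is answering on the port yet. The following is a minimal sketch, assuming the workspace was started with the `-p 8080:8080` mapping shown above; `workspace_is_up` is a hypothetical helper name, not part of ml-workspace.

```python
import urllib.error
import urllib.request


def workspace_is_up(url="http://localhost:8080", timeout=3.0):
    """Return True if an HTTP server answers at `url`, False otherwise."""
    # Talk to the local port directly, ignoring any HTTP proxy settings.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, and similar: nothing is listening yet.
        return False


if __name__ == "__main__":
    print("workspace reachable:", workspace_is_up())
```

A `False` result right after `docker run` usually just means the image is still downloading or the container is still starting.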
Apply to our movie recommendation system
1. Use Jupyter Notebook for data preprocessing
What is Jupyter Notebook?
Jupyter Notebook is a web-based IDE that lets you run Python code step by step. You can create and share documents containing live code, equations, visualizations, and narrative text, and it is used for data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and more. In our project, the collected user responses need to be preprocessed, and Jupyter Notebook makes it convenient to check each step.
https://en.wikipedia.org/wiki/Project_Jupyter
First, upload the data and write the preprocessing code.
Second, execute the code in Jupyter.
2. Check the hardware status using Netdata (optional)
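To give an idea of what such a preprocessing cell might look like, here is a minimal sketch. It is not our project's actual code: the sample data, the column names (`user_id`, `movie_id`, `rating`), and the 1 to 5 rating scale are all assumptions made for illustration. It uses only the Python standard library so it runs in any notebook kernel; the same steps could equally be written with pandas.

```python
import csv
import io

# Hypothetical sample of collected user responses; the column names and the
# values are assumptions for illustration, not the project's real schema.
RAW_RESPONSES = """user_id,movie_id,rating
u1,m10,4
u1,m10,5
u2,m11,
u3,m12,9
u2,m13,3
"""


def preprocess(raw_csv, min_rating=1, max_rating=5):
    """Drop incomplete or out-of-range rows; keep the latest rating per (user, movie)."""
    latest = {}
    for row in csv.DictReader(io.StringIO(raw_csv)):
        user, movie, rating = row["user_id"], row["movie_id"], row["rating"]
        if not (user and movie and rating):
            continue  # skip rows with a missing field
        try:
            value = int(rating)
        except ValueError:
            continue  # skip ratings that are not whole numbers
        if not min_rating <= value <= max_rating:
            continue  # skip ratings outside the assumed scale
        latest[(user, movie)] = value  # a later row overwrites an earlier one
    return [
        {"user_id": user, "movie_id": movie, "rating": value}
        for (user, movie), value in latest.items()
    ]


cleaned = preprocess(RAW_RESPONSES)
print(cleaned)
```

With the sample above, the empty rating and the out-of-range rating are dropped, and the repeated rating by `u1` for `m10` is reduced to the most recent one.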
What is Netdata?
Netdata is a real-time monitoring program for Linux. It shows the usage of hardware resources, such as the RAM of the computer currently in use, on a dashboard, so we can check the figures at a glance as they change in real time. We used it to check our hardware status after running the model.
https://en.wikipedia.org/wiki/Netdata
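For a quick spot check of RAM usage from inside a notebook cell, without opening Netdata, a sketch like the one below can be used. It does not use Netdata at all: it reads the Linux file `/proc/meminfo` directly, and `parse_meminfo` and `ram_used_percent` are hypothetical helper names. Note that inside a container this file generally reflects the kernel's overall view of memory rather than any limit set on the container.

```python
import os

MEMINFO_PATH = "/proc/meminfo"  # provided by the Linux kernel


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (in kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def ram_used_percent(meminfo):
    """Share of RAM in use, computed from MemTotal and MemAvailable."""
    total = meminfo["MemTotal"]
    available = meminfo["MemAvailable"]
    return 100.0 * (total - available) / total


if os.path.exists(MEMINFO_PATH):
    with open(MEMINFO_PATH) as f:
        info = parse_meminfo(f.read())
    if "MemTotal" in info and "MemAvailable" in info:
        print(f"RAM used: {ram_used_percent(info):.1f}%")
```

Netdata remains the better choice for watching usage over time; this only gives a single reading at the moment the cell runs.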
Comment
It comes with the basic modules needed to carry out a machine learning project, which is very convenient when starting one. In addition, because it runs on Docker, projects can be carried out regardless of the host environment, which helps even novice programmers build a machine learning environment easily.
However, from the point of view of an experienced programmer, it is not necessary. It includes more modules than we expected, but because everything runs as one workspace, memory can be wasted.
In addition, if our movie recommendation project needs a real-time streaming service, it would be better to run it on the host rather than serve it from inside the ML workspace.