From the course: MLOps Tools: MLflow and Hugging Face
Parameters, version, artifacts, and metrics - Hugging Face Tutorial
- [Instructor] Let's go deeper into parameters, versions, artifacts, and metrics. We've already seen how we can produce lots of metrics and log them to the UI. Now I'm going to try to run this databricks/mlflow example, which happens to be in the documentation, and then I'll show you what the files look like. So I'm going to execute this, it's going to fetch the project, and... it's hitting an error. All right, let's see what's going on here. One of the things MLflow is strict about is that changing parameter values is not allowed. You can see here that the initial value was 5, and then the run tried to log it again as 5.0. Not good. The exception message is actually pretty useful here, because it tells you exactly which parameter was already logged and which conflicting value came in afterward. So why is this not allowed? I think the restriction is very helpful, because parameters describe how you ran something, like flags on a command line. If you ran a command and passed something like --log-level=warning, you wouldn't go back and change that recorded warning to debug after the fact; the run already happened, so it doesn't make sense. MLflow will complain when that happens. You could override a parameter if you absolutely, 100% need to, but don't do that. The right way is to set your parameters correctly from the get-go, stick with them, and log them once. So that's what happened. Let's take a look at what that example looks like. I'm going to go to the mlflow example, and since I've already forked it, let's take a look at train.py.
So train.py uses the famous wine quality CSV dataset and trains a model on it. That's fine. Let's take a look at where the logging happens. We're passing that value of 5, which becomes alpha right here. You can see that alpha is float(sys.argv[1]) if more than one argument is passed; otherwise it defaults to 0.5. So that's where the value changes type, and that's what the error was complaining about. We need to make sure it's set consistently. It's on line 44, right here. I've forked the example to prevent this problem, so let me go back and show you the change I made to make it work: I'm simply forcing the value to be an integer so I don't have to worry about it. With that, let's go back to the editor and run it again. Back in Visual Studio Code, instead of running the Databricks example, we're going to run this one right here, with alpha set to 5 rather than 4. I'll run it, and it takes a second. All right, that completed, and you can see some values being printed out. If you remember, those values are captured in the train.py we just saw. If we go back to the browser, you can see those same values right here, and that's why they're getting printed. But printing things to the terminal is only useful when you're running interactively; the point is that MLflow is capturing all of that information for you, including the actual model. In this case it's using log_model, a scikit-learn helper, to say: I'm producing a scikit-learn model, and it's captured here as lr.
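The argument-parsing pattern described above, and the forked fix, can be sketched like this. The helper names are hypothetical; the first expression mirrors the one on line 44 of train.py, and the second shows the integer coercion:

```python
def parse_alpha(argv):
    # Original train.py pattern: take alpha from the first CLI argument,
    # defaulting to 0.5 when none is given. Depending on how the value
    # is later re-logged, "5" vs. "5.0" can collide in MLflow.
    return float(argv[1]) if len(argv) > 1 else 0.5


def parse_alpha_fixed(argv, default=5):
    # Forked fix, as described: force the value to an integer so the
    # string MLflow stores stays stable across log_param calls.
    return int(float(argv[1])) if len(argv) > 1 else default
```

With `["train.py", "5"]`, the original returns the float 5.0 while the fixed version returns the integer 5, which MLflow stores as the stable string "5".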
That model comes from ElasticNet right here: calling fit with train_x and train_y produces the model, and then we capture it. All right, that's what's going on. Let's take a look at what the UI says. I'm going to refresh, and we have several runs now, including one that failed. Let's look at the failed one first. We can see that the command that ran was the Databricks one, and that makes sense, because that's the one that breaks on the parameter logging. You can see that alpha was 5, and interestingly, it's also capturing the other defaults used with that mlflow run. Let's look at the parameters: alpha was logged as 5 and then the run tried to change it to 5.0, which is a float, and that's not okay. The l1_ratio is 0.1, that's fine. Metrics? There are none. Tags? None either, and no artifacts. Why aren't there any artifacts? Because the run failed; you can see the status is FAILED. Now let's go back and look at the last run we did, the one that worked, and very nice, we have a model right here. Taking a closer look: alpha was 5, the parameters logged fine, and now we have three metrics: MAE, R2, and RMSE. If we click one of those, it's just a single value that never changed; it was always 0.85. And if I go back to the code in Visual Studio Code, that's the RMSE computed right here. So we ran with alpha 5, which is basically the default, but we can do a few more runs. Let's run it with 3, changing values on the fly, and then let's change it to 2.5, or 2.4 rather.
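The three metrics that run logs can be reproduced with plain scikit-learn. This is a sketch, not the course's train.py: it substitutes small synthetic data for the wine quality CSV, but the model and metric calls match what the transcript describes:

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Synthetic stand-in for the wine quality features and target.
rng = np.random.default_rng(42)
train_x = rng.normal(size=(200, 4))
train_y = train_x @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=200)

# Same estimator family as train.py; these hyperparameter values are
# illustrative, with l1_ratio matching the 0.1 seen in the UI.
lr = ElasticNet(alpha=0.01, l1_ratio=0.1, random_state=42)
lr.fit(train_x, train_y)
pred = lr.predict(train_x)

rmse = float(np.sqrt(mean_squared_error(train_y, pred)))
mae = float(mean_absolute_error(train_y, pred))
r2 = float(r2_score(train_y, pred))
```

In the real script each of these values is then passed to `mlflow.log_metric`, which is how they show up in the run's metrics tab.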
That actually created a problem, because the float causes a parameter collision again, so let's make it an integer instead of a float. That works. All right, we did a few runs, taking advantage of the fact that this machine runs them fairly quickly. I'm going to go back here and take a look at all of the runs. They're all being captured, and you can see several variations. Why is this important? Well, the UI is now fairly populated: we're actually training with scikit-learn and the parameters change a little between runs. You can see the 2.4 run failed, which is why its metrics weren't captured, but we also have the ability to look at some of the models. You can see that the scikit-learn model is being captured, so I can open that experiment and look at the artifacts. Let's look at everything that got captured: we have the MLmodel file, the conda.yaml, the model.pkl (the pickle file), and with all of these we can do predictions. This is pretty remarkable, because it's capturing absolutely everything that would be needed to run the model again, say on a pandas DataFrame. That's tremendous. The artifact view even shows you a snippet: import mlflow, load the logged model (it tells you exactly where it's coming from) as a Python function model, and then make some predictions. Let's try it out and see if that works. We'll go to our editor, create an IPython notebook, call it test, paste that snippet, and run it; it installs some dependencies. I'll close this and let it run. All right, it ran, it took 3.8 seconds, and I got some warnings in the prediction with pandas, but that's fine. What I wanted to show you is that, well, data was not defined.
There's something else I'd need to define there, but it's a good start on how quickly you can get access to that logged model. Even without preparing a pandas DataFrame myself, I could run this snippet and it would fetch the logged model and load it through MLflow's pyfunc load_model helper. So that's a very thorough way of interacting with the UI and the previous runs.