Conversion to Data Lake
On this page you can manage the conversion of your MySQL database to Parquet files. You can convert all of your database data, either in one go or step by step. Converting the database is separate from choosing whether to use the converted data as your data source; the end of this section explains how to make that choice.
Note
The data is not removed from the database during the conversion, so you can always go back to the database if something goes wrong.
When you navigate to this page, you will see the table Status Overview, which shows all the steps of the complete conversion process. The process is split into 7 steps.
You can perform all these steps in one go or step by step. In the latter case, check the box in column Not Now for every step that you want to skip. We recommend executing the steps in the order shown in the table. Once the run has started, the System Configuration shows the progress and which action is being performed at that time. The progress of each step appears in the top-left corner of the screen. For a small step you might not see any intermediate progress, only that it has been completed. Converting the datasets takes most of the time, so the model shows the estimated remaining time for that step.
Once you are ready to convert your database, first click the secondary page action “Initialize Conversion Status”. This fills the tables “Configuration Status Overview” and “Dataset Status Overview” with all the available configurations and datasets, and shows which of them have already been converted and which are still to be converted. Before the first run nothing has been converted yet; if you have already run part of the conversion, some configurations and datasets may be marked as converted.
You can start the conversion process by clicking the main page action Execute Conversion. We suggest that you run all steps in one go. If you would rather run the conversion step by step, check every box in column Not Now except the one for the next step to be executed.
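The step-by-step rule can be illustrated with a minimal Python sketch. It is not part of the application and does not call any of its APIs; it only models which Not Now boxes would be checked for a single run. The step names ("Step 1" to "Step 7") and the function not_now_flags are hypothetical placeholders; the real step names are the ones listed in the Status Overview table.

```python
from typing import Dict, List, Optional

# Hypothetical placeholder names; the real step names are shown
# in the Status Overview table of the application.
STEPS: List[str] = [f"Step {i}" for i in range(1, 8)]


def not_now_flags(steps: List[str], completed: List[str]) -> Dict[str, bool]:
    """Return the Not Now setting per step for a single step-by-step run.

    True means the Not Now box is checked (the step is skipped in this run).
    Only the first step that has not been completed yet is left unchecked.
    If every step is already completed, all boxes come back checked.
    """
    done = set(completed)
    next_step: Optional[str] = next((s for s in steps if s not in done), None)
    return {s: s != next_step for s in steps}


if __name__ == "__main__":
    # Assume the first two steps were executed in earlier runs.
    flags = not_now_flags(STEPS, completed=["Step 1", "Step 2"])
    for step, skip in flags.items():
        print(f"{step}: {'Not Now' if skip else 'run'}")
```

In this sketch each run executes exactly one step, in table order, which matches the recommendation to follow the order of the Status Overview table.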
Once the process is completed, you can still decide whether to continue using the database or to start using the Parquet files as the data source. The default is to continue using the database. When you are ready to use the Parquet files, click the secondary page action Set Data Lake as Data Source. If that page action is not available and you see Set Database as Data Source instead, click “Data source” in the status bar and select Use Data Lake (see Status bar). The secondary page action should then be Set Data Lake as Data Source; click it to use the Parquet files going forward.