Since the 2019.3 release, Tableau Prep has included the ability to run scripts written in R and Python. This opens up a whole new world of possibilities, since it greatly expands the capabilities of Tableau Prep (see Make your own Table Calculations with Python in Tableau Prep Builder). Today, we'll explore how R can be used within Tableau Prep Builder to tackle one of the most popular machine learning competitions on Kaggle, the Titanic case.

The RMS Titanic was the largest ship afloat when it entered service and was considered unsinkable. But on the night of April 14th, 1912, it hit an iceberg, leading to one of the deadliest maritime disasters in history.

A small summary of the movie in case you've never seen it.

The dataset comes in two parts: a training set and a test set. I'll use the training set to build and select the best model in order to make a proper submission on Kaggle.

The first step is to have a look at the data, and that's where the interface of Prep is very useful. At a glance, you can get a good idea of the distribution of variables like Age.

You can see that Tableau Prep makes it easy to count the null values within Age or see the skewness of the Fare distribution.

From there, I already notice that the dataset contains some NULL values in the Age variable. To solve this, we will simply replace those NULLs with the mean age for each Sex-Pclass group (one may argue that this 'naive' method is not accurate, but I will leave it for now). I would also like to extract the Title from the Name field and create a new field called family size, based on the number of relatives travelling with a passenger.

Let's also have a look at other key variables: Parch refers to the number of parents/children, and SibSp refers to the number of siblings/spouses. I will use those two variables later to calculate the number of people travelling with the passenger. There is also Pclass, mentioned earlier, which refers to the class of the passenger. We are then left with the ticket number, the cabin number, the fare of the ticket, the name of the passenger and the location where they embarked. Apart from the name, which I need for the title, I will not use those variables here, but it could be interesting to use them when you start testing your model.

I need to create an R script that contains the basic structure of code that Tableau Prep is able to read. Tableau Prep needs two functions: the first one contains the actual R code you want to run on your data; the second one tells Tableau Prep how to return the data. That second function defines an output schema. If you want to return only certain columns or extract certain information from your R output, you can use it to define the name and the type of the fields.

In the data cleaning step, I append the new fields directly to the current dataset. In getOutputSchema, I define the fields I want to keep along with the new ones I just created.

As mentioned before, I would like to append the mean age for each Sex-Pclass group. I extract the mean age for each combination and save it in a data frame that I will join back to the existing file. I will do the same for the test dataset, as it also contains missing Age values.

As part of the data preparation for fine-tuning my model, I create a Title and a family_size variable. The Title variable refers to the title of the passenger. Since the title is always in the same place in the Name variable, it was convenient to use a regular expression to extract the titles from the names of the passengers. The family size is a simple sum of the number of parents/children, the number of siblings/spouses, and 1 to count the passenger themselves.

Survival rates on the Titanic weren't really distributed equally, as you'll find out pretty soon.
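To make the script structure described in this post more concrete, here is a minimal sketch of what such an R file could look like. It is not the original script: the function names `add_features` and `mean_age_by_group`, the regular expression, the `mean_age` column name and the list of fields kept in the schema are all assumptions made for illustration. The `prep_string()`, `prep_int()` and `prep_decimal()` helpers are the type functions Tableau Prep provides for output schemas; the stubs at the top are only stand-ins so the sketch can also be run in a plain R session.

```r
# Stand-ins for the type helpers that Tableau Prep supplies at run time, so
# the sketch can also be sourced in a plain R session (assumption: inside
# Prep these are already defined and the stubs below are skipped).
if (!exists("prep_string"))  prep_string  <- function() character(0)
if (!exists("prep_int"))     prep_int     <- function() integer(0)
if (!exists("prep_decimal")) prep_decimal <- function() numeric(0)

# Fields returned by the cleaning step (an illustrative choice, not the
# author's exact list).
kept_fields <- c("PassengerId", "Pclass", "Sex", "Age", "Title", "family_size")

# First function: receives the rows as a data frame and returns a data frame.
add_features <- function(df) {
  # The title sits between the comma and the first period of Name,
  # e.g. "Surname, Mr. First Name" (hypothetical pattern).
  df$Title <- trimws(sub("^[^,]*,\\s*([^.]*)\\..*$", "\\1", df$Name))
  # Siblings/spouses + parents/children + 1 for the passenger.
  df$family_size <- df$SibSp + df$Parch + 1L
  df[, kept_fields]
}

# Second function: tells Tableau Prep the name and type of each returned field.
getOutputSchema <- function() {
  data.frame(
    PassengerId = prep_int(),
    Pclass      = prep_int(),
    Sex         = prep_string(),
    Age         = prep_decimal(),
    Title       = prep_string(),
    family_size = prep_int()
  )
}

# Mean age per Sex-Pclass group, to be joined back in the flow. In practice
# this would live in its own script file with its own getOutputSchema.
mean_age_by_group <- function(df) {
  # The formula interface drops rows with a missing Age before averaging.
  out <- aggregate(Age ~ Sex + Pclass, data = df, FUN = mean)
  names(out)[names(out) == "Age"] <- "mean_age"
  out
}

# Toy rows in the Kaggle column layout (made-up passengers, not real data).
passengers <- data.frame(
  PassengerId = 1:3,
  Name   = c("Doe, Mr. John", "Roe, Miss. Jane", "Poe, Mr. Sam"),
  Sex    = c("male", "female", "male"),
  Pclass = c(3L, 1L, 3L),
  Age    = c(30, NA, 40),
  SibSp  = c(1L, 0L, 0L),
  Parch  = c(0L, 2L, 0L),
  stringsAsFactors = FALSE
)

features  <- add_features(passengers)
mean_ages <- mean_age_by_group(passengers)
```

In the flow itself, a script step would point at this file and name `add_features` as the function to call; the mean ages are computed in a separate branch and joined back on Sex and Pclass, which keeps the imputation visible in the flow rather than hidden inside the script.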