Using Talend tRowGenerator to populate tables with foreign key constraints

Many tutorials show how to use the tGeneratorRow component, but most of them (if not all of them) consider ad-hoc examples. Therefore, they’re often hard to apply in real-world situations. In this post I’m going to explain how to populate a table referencing another table.

Let’s consider the following database schema:

This schema only contains two tables linked by a foreign key constraint (purchase.client references client.id). Now let’s also consider that the client table already contains some rows:

Our objective is to populate the purchase table. It’s easy to generate values for the attributes amount and purchased_at. It’s not trivial for the attribute client_id, as its value must be an existing ID in table client to guarantee that the foreign key constraint is satisfied. In order to achieve it, we need to define a job that looks like this:

The first component (tPostgresqlInput count_clients) simply retrieves the number of rows stored in table client:

SELECT COUNT(1) AS nb_clients
FROM public.client;

The second component (tSetGlobalVar nb_clients) sets a global variable to store the value received from the previous component:

This value will later be accessible in the globalMap variable. The third component (tPostgresqlInput client) is half of the solution to the problem we’re trying to solve. It returns all the client IDs associated with their rank. For instance, in our case it will return: (503, 1), (506, 2), …, (521, 10). The PostgreSQL query to get this result is:

SELECT id, 
    row_number() over(ORDER BY id ASC) AS row_number
FROM public.client;

Notice the way rows are ordered doesn’t matter; we could also order by firstname or lastname. The next component (tRowGenerator purchase_row_generator) is the other half of the solution, as it aims at generating values for the attributes of each row:

Instead of generating a value for the attribute client_id, the component generates a random number between 1 and N, where N is the number of rows in table client (remember that this value was previously stored in the globalMap). In our example, N = 10. The tMap component is then used to join the data returned by purchase_row_generator and client:

The final component (tPostgresqlOutput purchase) simply has to insert the rows in the purchase table.

Feel free to ask if you have any questions. I’d also be happy to share my code if requested.