Project and Goal
- Challenge 1: There was no one-to-one correspondence between the tables in FMP and those in the PG database; it was not a simple case of reading values from the FMP tables and migrating them to PG tables of matching or compatible structure.
- Challenge 2: In many cases a single table in FMP was effectively a mini database, a collection of logical tables. Many of these tables had 500-800 columns, though in any one row only 20-30 columns held data while the rest were unused. For example, some rows used one set of columns to record the prices of commodities, other rows used a completely different set of columns to record purchases, and still other rows used yet another set of columns (disjoint from the first two) to record sales, all in a single table. The migration programs therefore needed to identify dynamically which rows belonged to which target table.
- Challenge 3: FMP tables support repeating fields, a facility whereby a single column holds multiple values, very much like an array. In most of the FMP tables, the repeating columns destined for a single target table (in PG) occurred in groups. For example, one column in a row could hold 25 values, each the description of an item; a second column in the same row would hold the 25 corresponding quantities, and a third the 25 prices. Other columns in the row held single (non-repeating) values but formed part of the same record. In this hypothetical case the data had to be migrated to 25 rows in the target table such that Item_DescriptionN, Item_QuantityN and Item_PriceN were row-aligned (moved to the same row), while the values of the non-repeating columns were replicated identically across all 25 rows.
- Challenge 4: The FMP tables had columns with names like '2014' and 'Profit / Sales %' which do not conform to PG naming rules for unquoted identifiers. The first step in the road map was to move the data from the FMP tables to temporary mirror tables in PG (proxies for the FMP tables), so that each column of an FMP table had a matching column in the corresponding temporary PG table; but names such as '2014' and 'Profit / Sales %' could not be used as-is for the PG column names.
- Challenge 5: PG allows a very large number of rows to be created in a single INSERT statement, but in practice such inserts proved unacceptably slow owing to the volume of data (the large number of columns and rows, and the length of the data in many columns). Populating the mirror tables (proxies) in PG with the data from the FMP tables was no easy task.
- Challenge 6: As mentioned above, many columns in the FMP tables held repeating values, delimited by the linefeed (LF) character. These LF characters could not be embedded directly in the INSERT scripts for populating the mirror tables, as they split and mangled the SQL statements.
- Challenge 7: The client had already populated the master data (currencies, sections, departments, measurement units etc.) in the respective tables. They had also developed programs for working with the PG tables, so many of the migration target tables already contained a small number of rows. We had to ensure that the primary keys generated for the records migrated by us (the older FMP data) did not collide with the existing ones.
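The repeating-field layout of challenge 3 can be illustrated with a small sketch. This is Python for illustration only (the actual migration used LibreOffice Basic, C and PL/pgSQL), and all names are hypothetical: the LF-delimited repeating columns are split, row-aligned positionally, and the single-valued columns are replicated to every output row.

```python
def explode_row(row, repeating_cols, delimiter="\n"):
    """Split the LF-delimited repeating columns of one source row and
    emit one dict per element, with repeating values row-aligned and
    the non-repeating values replicated to every output row."""
    groups = {c: row[c].split(delimiter) for c in repeating_cols}
    n = max(len(v) for v in groups.values())
    scalars = {c: v for c, v in row.items() if c not in repeating_cols}
    out = []
    for i in range(n):
        rec = dict(scalars)                 # replicate non-repeating values
        for c in repeating_cols:
            vals = groups[c]
            rec[c] = vals[i] if i < len(vals) else None  # pad short groups
        out.append(rec)
    return out

# Hypothetical source row with three aligned repeating columns:
row = {"invoice_no": "A17",
       "item_description": "Bolt\nNut\nWasher",
       "item_quantity": "10\n20\n30",
       "item_price": "1.50\n0.75\n0.10"}
rows = explode_row(row, ["item_description", "item_quantity", "item_price"])
```

The single source row becomes three target rows, each carrying the same invoice_no but its own description, quantity and price.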
Operating System
- Ubuntu Linux 20.04
Programming Languages and Tools Used
- SQL (PG)
- PL/pgSQL (the PostgreSQL procedural language)
- LibreOffice Basic scripting language
- FileMaker scripting
- C (GCC) with libpq (the PostgreSQL C API library)
- pg_dump (PostgreSQL utility)
Migration in Multiple Steps
- Step 1: The tables from FMP were exported to Excel spreadsheets through FMP scripting.
- Step 2: Table-creation scripts for the mirror tables in PG, and the INSERT scripts for populating these mirror tables with the data from the Excel sheets, were generated by a generic program written in the LibreOffice Basic scripting language.
This generator program was also designed to handle several of the problems listed above:
- handling problematic characters such as LF (challenge no. 6);
- generating column names for the mirror tables (in PG) that conform to PG naming rules (challenge no. 4), and persisting the column-name mapping (FMP_Column_Name -> PG_Column_Name) in a table;
- adjusting the volume of data in each INSERT to manageable chunks and copying the data to the mirror tables over multiple iterations (challenge no. 5).
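A minimal sketch of these three fixes follows. It is written in Python purely for illustration (the real generator was a LibreOffice Basic program), and the table name and data are hypothetical: column names are sanitised to legal unquoted PG identifiers, embedded LF characters are kept out of the script text via PG's E'' escape syntax, and the rows are emitted in fixed-size INSERT chunks.

```python
import re

def pg_column_name(fmp_name):
    """Map an FMP column name to a legal unquoted PG identifier (challenge 4)."""
    name = re.sub(r"[^a-z0-9_]+", "_", fmp_name.strip().lower()).strip("_")
    if not name or name[0].isdigit():   # identifiers may not start with a digit
        name = "c_" + name
    return name[:63]                    # PG truncates identifiers at 63 bytes

def sql_literal(value):
    """Quote a value; embedded LF becomes \\n inside an E'' string (challenge 6)."""
    if value is None:
        return "NULL"
    escaped = (value.replace("\\", "\\\\")
                    .replace("'", "''")
                    .replace("\n", "\\n"))
    return "E'" + escaped + "'"

def insert_chunks(table, cols, rows, chunk_size=2):
    """Emit one INSERT per chunk of rows to keep statements manageable (challenge 5)."""
    for i in range(0, len(rows), chunk_size):
        values = ",\n".join(
            "(" + ", ".join(sql_literal(v) for v in r) + ")"
            for r in rows[i:i + chunk_size])
        yield f"INSERT INTO {table} ({', '.join(cols)}) VALUES\n{values};"

cols = [pg_column_name(c) for c in ["2014", "Profit / Sales %"]]
stmts = list(insert_chunks("mirror_t1", cols,
                           [["a\nb", "1"], ["c", "2"], ["d", "3"]]))
```

With a chunk size of 2, the three sample rows produce two INSERT statements, and the LF inside "a\nb" survives as a \n escape rather than a raw line break in the script.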
- Step 3: Target tables then had to be created in PG to receive the data from the FMP tables (in practice, from the mirror tables in PG which were the proxies of the original FMP tables).
These target tables had to be created without foreign key and other constraints, as the data from FMP was not guaranteed to satisfy them. Generating the individual table-creation scripts with standard tools and then manually removing the constraints would have been laborious.
Instead, the table-creation scripts (without the constraints) were generated by a program written in C.
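The idea can be sketched as follows, assuming the column metadata (name and type per column) is available to the generator; this is a Python illustration of the output shape only, not the actual C program, and the table and column names are hypothetical.

```python
def create_table_sql(table, columns):
    """Emit a CREATE TABLE statement containing column definitions only:
    no PRIMARY KEY, FOREIGN KEY or CHECK clauses, so that migrated data
    that would violate constraints can still be loaded."""
    cols = ",\n  ".join(f"{name} {pg_type}" for name, pg_type in columns)
    return f"CREATE TABLE {table} (\n  {cols}\n);"

ddl = create_table_sql("sales", [("id", "bigint"),
                                 ("section_id", "bigint"),
                                 ("item_description", "text"),
                                 ("item_price", "numeric(12,2)")])
```

Constraints can be re-applied after the migrated data has been validated and cleaned.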
- Step 4: Data mapping came next. For each column in a target table (in PG) we needed to know the FMP table and column from which to copy the data (data mapping: Source Table/Column -> Destination Table/Column). This information was obtained from the client in an Excel file. Note that this mapping is distinct from the mapping of FMP table/column names to their PG proxies described in Step 2 above.
- Step 5: Again using a generic program written in LibreOffice Basic, scripts were generated to persist this data-mapping information to a table in PG.
- Step 6: A stored function written in PL/pgSQL generated the SQL statements for populating the target tables by reading data from the mirror tables in PG (the proxies for the original FMP tables), i.e. INSERT via SELECT. The primary key id values for the migrated records were also generated automatically.
This stored function read the data-mapping information (persisted in Step 5) and the column-name mapping (persisted in Step 2) to generate each INSERT via SELECT statement.
A good deal of the data for the target tables came from the master tables, mostly in the form of key id values (for example, storing a Section id instead of the Section name). The descriptive names in the FMP tables (currency name, department name, user name etc.) had to be matched against the corresponding columns in the PG master tables to retrieve the key ids. Instead of importing the master tables into the database that served as the workbench for the migration, external links to these tables were created for the matching and lookup.
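A sketch of the kind of statement such a stored function might emit follows; all table and column names are hypothetical, and Python is used only to illustrate the string generation (the real function was PL/pgSQL). In PG, unnest() called with several arrays in the FROM clause aligns their elements positionally, which matches the row-alignment requirement for repeating columns; string_to_array() splits the LF-delimited repeating values; the non-repeating columns of the source row are replicated automatically across the generated rows; and the WHERE filter selects only the rows belonging to this particular target table.

```python
def insert_via_select(target, mapping, repeating, source, row_filter):
    """Build an INSERT ... SELECT from a column mapping (target_col -> source_expr).
    Repeating (LF-delimited) source columns are split with string_to_array()
    and row-aligned by a multi-array unnest(); the row_filter identifies the
    source rows destined for this target table."""
    arrays = ", ".join(f"string_to_array({c}, E'\\n')" for c in repeating)
    aliases = ", ".join(f"r_{c}" for c in repeating)
    select = ", ".join(mapping[t] for t in mapping)
    return (f"INSERT INTO {target} ({', '.join(mapping)})\n"
            f"SELECT {select}\n"
            f"FROM {source}, unnest({arrays}) AS r({aliases})\n"
            f"WHERE {row_filter};")

sql = insert_via_select(
    target="sales",
    mapping={"invoice_no": "invoice_no",
             "item_description": "r_item_desc",
             "item_quantity": "r_item_qty"},
    repeating=["item_desc", "item_qty"],
    source="mirror_t1",
    row_filter="record_type = 'SALE'")   # dynamic row identification
```

In the real function, a join against a master table (for example JOIN sections s ON s.section_name = m.section_nm, names hypothetical) would supply the key id in place of a descriptive name.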
- Step 7: In the last step the migrated data was exported using the pg_dump utility.
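A hypothetical pg_dump invocation for this kind of export (the database and table names are placeholders): --data-only dumps only the rows, and --column-inserts emits portable INSERT statements with an explicit column list per row.

```shell
# Export the data of the migrated tables from the workbench database
# as INSERT statements that can be replayed on the client's database.
pg_dump --dbname=migration_wb --data-only --column-inserts \
        --table=sales --table=purchases -f migrated_data.sql
```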
Most of the migration process was automated through programs developed in the different languages mentioned above.
The result was a migration that was not only accurate but also completed in minimal time. It was also a valuable experience for the development team, as the project proved a successful experiment in several new techniques for handling the challenges listed above and exemplified the popular saying:
Every challenge is an opportunity!