Loader Method Standards and Best Practices

Easily load large amounts of client data into the Kurtosys App using supported file formats. The data is then consumed and loaded by Kurtosys-built loaders, and used in client documents, portals, or websites.

File Formats

Supported formats:

CSV (Comma Separated Values)
XML (eXtensible Markup Language)
JSON (JavaScript Object Notation)
XLSX (Excel Spreadsheet)

Non-supported formats:

DOC/DOCX (Word documents)
PDF
PPT/PPTX (Powerpoint presentations)
JPEG, PNG, etc (Images)

Do:

It is important to use easily understandable and consistent column headers for your data
You should ensure that each column contains only one type of data

Don't:

Mix data types in one column
Leave column headers blank ie unnamed columns within the file

Entity Identifiers

Entities refer to any share class, fund, benchmark, or account. To facilitate accurate data mapping and retrieval, every record for any entity in a data file should have a unique identifier. This enables seamless association of data during the loading process.

Common identifiers include ISINs, Tickers, Fund IDs, CUSIPs, and Account IDs.

Each entity and benchmark must have a unique identifier, and there should be a logical mapping that links them together.

Do:

Ensure Uniqueness:
Each record must have a unique primary key
Use Stable Attributes:
Select attributes that won’t change over time
Keep it Simple:
Prefer a single code rather than a complex string

Don't:

Use Changeable Data:
Avoid attributes that are likely to change
Overload Keys with Meaning:
Keep identifiers free from business logic
Use Large or Complex Keys:
They can lead to mapping issues

Examples of good identifiers:

BIALX (Fund ID)
US1234567890 (ISIN)
BENCH123456 (Benchmark ID)

Examples of bad identifiers:

Fixed Income Composite-Fund (Fund name)
Bloomberg U.S. Treasury Bills (1-3M) Index
(Benchmark name)

Reporting / As Of Dates

Each set of data should include a consistent date across all records.

This maintains uniformity across all datasets for the same reporting period. Consistency in reporting dates enables easy reference and alignment across multiple datasets.

Do:

Use Clear, Consistent Date Formatting:
Ensure all date fields follow a consistent format (e.g., YYYY-MM-DD)
Include Reporting Dates in Every Record:
Every data entry should have a corresponding date to indicate the reporting period.

Don't:

Use Inconsistent Date Formats:
Avoid mixing date formats within the same column. It will result in errors in processing
Omit Dates or Use Incorrect Headers:
Each record must have a date, and the column header should clearly indicate that it represents the date.

Example of good reporting dates:

ID	Name	Reporting Date	Value
1	Name A	2023-12-31	100
2	Name B	2023-12-31	200
3	Name C	2023-12-31	300

Example of bad reporting dates:

ID	Name		Value
1	Name A	2023-12-31	100
2	Name B	12-31-2023	200
3	Name C	31/12/2023	300

Consistent Data Structures

Ensure that data structures remain consistent over time. Do not add, remove, or rearrange columns from one month to the next without notifying Kurtosys about the change.

When data structures remain unchanged, it simplifies the development of automated processes, prevents disruptions in data processing, and reduces the need for frequent adjustments which can become costly over time.

Do:

Preserve Data Types:
Ensure that the data type for each column remains the same across different reporting periods
Keep Column Order Unchanged:
Do not rearrange columns from one reporting cycle to the next to avoid confusion and errors in data handling
Use Formulas in Files:
If formulas are used to calculate values, it is crucial to replace them with the corresponding values

Don't:

Add or Remove Columns:
Do not introduce new columns or remove existing ones between reporting cycles without notifying Kurtosys of the change, as it can disrupt data processing routines
Modify Data Types:
Do not change the data format or type within a column (e.g. from numeric to string or vice versa)
Leave Formulas in Files:
Files with formulas will load the formula as data instead of the resulting value

Example of a good data structure:

ID	Name	Reporting Date	Value
1	Name A	2023-12-31	100
2	Name B	2023-12-31	200
3	Name C	2023-12-31	300
4	Name D	2023-12-31	250

Example of a bad data structure:

ID	Name	Report Date	Value	Comment
1	Name A	2023-12-31	100	New
2	Name B	2023-12-31	Two hundred	Load
3	Name C	2023-12-31	300	Delete
4	Name D	2023-12-31	=100+150	New

Consistency in File Names and Types

It’s important to maintain consistency in data structures over time. One way to ensure this is by establishing consistent naming conventions for file names. This will help in correctly identifying files for processing.

Do:

Maintain File Naming Convention:
Consistently adhere to the established file naming convention to guarantee successful validation and loading of the file
Maintain File Type:
Consistently adhere to the established file type to guarantee successful validation and loading of the file

Don't:

Change File Naming Convention:
Ensure the file name remains unchanged between reporting cycles to prevent validation errors
Change File Type:
Ensure the file type remains unchanged between reporting cycles to prevent validation errors

Example of good file names and types over time:

Date received	File name	File type
1 Apr ’23	monthly_returns_20240331.csv	CSV
1 May ’23	monthly_returns_20240430.csv	CSV
1 Jun ’23	monthly_returns_20240531.csv	CSV

Example of bad file names and types over time:

Date received	File name	File type
1 Apr ’23	monthly_returns_20240331.xlsx	XLSX
1 May ’23	monthly_20240430.csv	CSV
1 Jun ’23	monthly_returns_.xls	XLS

Data Validation Checks

It is important to perform detailed checks on the information provided to ensure that it is accurate, complete, and trustworthy. This will help to spot and correct any mistakes, discrepancies or irregularities in the data. By validating the data against predetermined standards and guidelines, we can assure its dependability and appropriateness for examination and handling.

Validation checks can include:

Check for duplicates
Check column formats
Check column count and order
Check file types
Check that each record has a unique identifier
Check data types
Check file names

Common Mistakes Made

It is recommended to avoid manual data processing, which increases the risk of errors and inconsistencies. Instead, automated processes should be used for successful data ingestion and loading.

Multiple Tables per Page

A common mistake is creating multiple data tables within a single spreadsheet as show in the image.

A computer will rigidly interpret anything in the same row as belonging to the same observation.

Rather consolidate the tables into one. This will ensure a consistent data structure, and avoid errors when looking up data from different tables.

Using Problematic Null or Blank Values

When working with data in a file, it is crucial to distinguish between a zero and a blank cell.

A zero represents a valid measurement, whereas a blank cell indicates that no measurement was taken. In such cases, the computer interprets the blank cell as an unknown value, also known as a null value.

It is important to accurately record zeros as zeros and null values as nulls. Any other substitution for null values should be avoided. The table below shows common substitutions that should not be used.

Data Value	Problem	Recommendation
‘0’	Indistinguishable from a true zero if used to fill blanks in the data	Avoid unless a true zero
‘NA’, ‘na’, ‘N/A’ or similar	Causes problems with data-type validation (turns a numerical column into a text column)	Avoid – leave blank
‘NULL’	Can cause problems with data type validation	Avoid – leave blank
‘No data’	Can cause problems with data type validation	Avoid – leave blank
‘-‘,’+’,’.’	Can cause problems with data type validation	Avoid – leave blank
‘ ‘ (An empty space)	Can cause problems with data type validation	Avoid – leave blank

Using Formatting to Convey Information

A common example of using formatting to convey information is to highlight cells in a specific color that you want to be dealt with differently than others.

Another example is to leave a blank row to indicate a separation in the data. However, it is important to note that both of these highlighting approaches can cause problems since they are undetectable to computers.

Don't do this:

Do this:

Merging Cells or Unnecessary Aesthetic Formatting

When formatting a worksheet, avoid merging cells and using borders to separate data as it can impede the computer’s ability to see associations in the data. Merging cells cause column headers to disappear making it difficult to load data.

Borders alone will not be enough to indicate the start or end of a particular dataset. Rather place different tables on different tabs.

Placing Units in Cells

Units should be consistent and not included in data cells. Instead of noting “$5MM” in a cell, just write “5” and specify the unit in the column header or a separate column to avoid analysis complications.

Using Problematic Field or Column Names

For easy data analysis, use descriptive yet concise column names. Avoid spaces, numbers at the start, and special characters.

Use underscores or CamelCase for multi-word names to maintain readability and data integrity.

Avoid	Alternative	Reason
MaxPercentage ( % )	max_percentage	Uses a special character ( % )
Mean_growth/year	mean_yr_growth	Uses a special character ( / )
Annual returns	annual_returns	Uses a blank character
3 Year Returns	returns_3yr	Uses a blank character and starts with a number