Categories

System Admin
DXM
Documents
Data

Links

Loader Method Standards and Best Practices​

Easily load large amounts of client data into the Kurtosys App using supported file formats. The data is then consumed and loaded by Kurtosys-built loaders, and used in client documents, portals, or websites.

File Formats

Supported formats:

  • CSV (Comma Separated Values)

  • XML (eXtensible Markup Language)

  • JSON (JavaScript Object Notation)

  • XLSX (Excel Spreadsheet)

Non-supported formats:

  • DOC/DOCX (Word documents)

  • PDF

  • PPT/PPTX (Powerpoint presentations)

  • JPEG, PNG, etc (Images)

Do:

  • It is important to use easily understandable and consistent column headers for your data
  • You should ensure that each column contains only one type of data

Don't:

  • Mix data types in one column
  • Leave column headers blank ie unnamed columns within the file

Entity Identifiers

Entities refer to any share class, fund, benchmark, or account. To facilitate accurate data mapping and retrieval, every record for any entity in a data file should have a unique identifier. This enables seamless association of data during the loading process. 

Common identifiers include ISINs, Tickers, Fund IDs, CUSIPs, and Account IDs.

Each entity and benchmark must have a unique identifier, and there should be a logical mapping that links them together.

Do:

  • Ensure Uniqueness:
    Each record must have a unique primary key

  • Use Stable Attributes:
    Select attributes that won’t change over time

  • Keep it Simple:
    Prefer a single code rather than a complex string

Don't:

  • Use Changeable Data:
    Avoid attributes that are likely to change

  • Overload Keys with Meaning:
    Keep identifiers free from business logic

  • Use Large or Complex Keys:
    They can lead to mapping issues

Examples of good identifiers:

  • BIALX (Fund ID)
  • US1234567890 (ISIN)
  • BENCH123456 (Benchmark ID)

Examples of bad identifiers:​

  • Fixed Income Composite-Fund (Fund name)
  • Bloomberg U.S. Treasury Bills (1-3M) Index
    (Benchmark name)

Reporting / As Of Dates

Each set of data should include a consistent date across all records.

This maintains uniformity across all datasets for the same reporting period. Consistency in reporting dates enables easy reference and alignment across multiple datasets.

Do:

  • Use Clear, Consistent Date Formatting:
    Ensure all date fields follow a consistent format (e.g., YYYY-MM-DD)

  • Include Reporting Dates in Every Record:
    Every data entry should have a corresponding date to indicate the reporting period.

Don't:

  • Use Inconsistent Date Formats:
    Avoid mixing date formats within the same column. It will result in errors in processing

  • Omit Dates or Use Incorrect Headers:
    Each record must have a date, and the column header should clearly indicate that it represents the date.

Example of good reporting dates:

ID
Name
Reporting Date
Value
1
Name A
2023-12-31
100
2
Name B
2023-12-31
200
3
Name C
2023-12-31
300

Example of bad reporting dates:

ID
Name
Value
1
Name A
2023-12-31
100
2
Name B
12-31-2023
200
3
Name C
31/12/2023
300

Consistent Data Structures

Ensure that data structures remain consistent over time. Do not add, remove, or rearrange columns from one month to the next without notifying Kurtosys about the change. 

When data structures remain unchanged, it simplifies the development of automated processes, prevents disruptions in data processing, and reduces the need for frequent adjustments which can become costly over time.

Do:

  • Preserve Data Types:
    Ensure that the data type for each column remains the same across different reporting periods

  • Keep Column Order Unchanged:
    Do not rearrange columns from one reporting cycle to the next to avoid confusion and errors in data handling

  • Use Formulas in Files:
    If formulas are used to calculate values, it is crucial to replace them with the corresponding values

Don't:

  • Add or Remove Columns:
    Do not introduce new columns or remove existing ones between reporting cycles without notifying Kurtosys of the change, as it can disrupt data processing routines

  • Modify Data Types:
    Do not change the data format or type within a column (e.g. from numeric to string or vice versa)

  • Leave Formulas in Files:
    Files with formulas will load the formula as data instead of the resulting value

Example of a good data structure:

ID
Name
Reporting Date
Value
1
Name A
2023-12-31
100
2
Name B
2023-12-31
200
3
Name C
2023-12-31
300
4
Name D
2023-12-31
250

Example of a bad data structure:

ID
Name
Report Date
Value
Comment
1
Name A
2023-12-31
100
New
2
Name B
2023-12-31
Two hundred
Load
3
Name C
2023-12-31
300
Delete
4
Name D
2023-12-31
=100+150
New

Consistency in File Names and Types

It’s important to maintain consistency in data structures over time. One way to ensure this is by establishing consistent naming conventions for file names. This will help in correctly identifying files for processing. 

Do:

  • Maintain File Naming Convention:
    Consistently adhere to the established file naming convention to guarantee successful validation and loading of the file

  • Maintain File Type:
    Consistently adhere to the established file type to guarantee successful validation and loading of the file

Don't:

  • Change File Naming Convention:
    Ensure the file name remains unchanged between reporting cycles to prevent validation errors

  • Change File Type:
    Ensure the file type remains unchanged between reporting cycles to prevent validation errors

Example of good file names and types over time:

Date received
File name
File type
1 Apr ’23
monthly_returns_20240331.csv
CSV
1 May ’23
monthly_returns_20240430.csv
CSV
1 Jun ’23
monthly_returns_20240531.csv
CSV

Example of bad file names and types over time:

Date received
File name
File type
1 Apr ’23
monthly_returns_20240331.xlsx
XLSX
1 May ’23
monthly_20240430.csv
CSV
1 Jun ’23
monthly_returns_.xls
XLS

Data Validation Checks

It is important to perform detailed checks on the information provided to ensure that it is accurate, complete, and trustworthy. This will help to spot and correct any mistakes, discrepancies or irregularities in the data. By validating the data against predetermined standards and guidelines, we can assure its dependability and appropriateness for examination and handling.

Validation checks can include:

Common Mistakes Made

It is recommended to avoid manual data processing, which increases the risk of errors and inconsistencies. Instead, automated processes should be used for successful data ingestion and loading.

Multiple Tables per Page

A common mistake is creating multiple data tables within a single spreadsheet as show in the image.

A computer will rigidly interpret anything in the same row as belonging to the same observation.

Rather consolidate the tables into one. This will ensure a consistent data structure, and avoid errors when looking up data from different tables.

Using Problematic Null or Blank Values

When working with data in a file, it is crucial to distinguish between a zero and a blank cell. 

A zero represents a valid measurement, whereas a blank cell indicates that no measurement was taken. In such cases, the computer interprets the blank cell as an unknown value, also known as a null value. 

It is important to accurately record zeros as zeros and null values as nulls. Any other substitution for null values should be avoided. The table below shows common substitutions that should not be used.

Data Value
Problem
Recommendation
‘0’
Indistinguishable from a true zero if used to fill blanks in the data
Avoid unless a true zero
‘NA’, ‘na’, ‘N/A’ or similar
Causes problems with data-type validation (turns a numerical column into a text column)
Avoid – leave blank
‘NULL’
Can cause problems with data type validation
Avoid – leave blank
‘No data’
Can cause problems with data type validation
Avoid – leave blank
‘-‘,’+’,’.’
Can cause problems with data type validation
Avoid – leave blank
‘ ‘ (An empty space)
Can cause problems with data type validation
Avoid – leave blank
Using Formatting to Convey Information

A common example of using formatting to convey information is to highlight cells in a specific color that you want to be dealt with differently than others. 

Another example is to leave a blank row to indicate a separation in the data. However, it is important to note that both of these highlighting approaches can cause problems since they are undetectable to computers.

Don't do this:

Do this:

Merging Cells or Unnecessary Aesthetic Formatting

When formatting a worksheet, avoid merging cells and using borders to separate data as it can impede the computer’s ability to see associations in the data. Merging cells cause column headers to disappear making it difficult to load data. 

Borders alone will not be enough to indicate the start or end of a particular dataset. Rather place different tables on different tabs. 

Placing Units in Cells

Units should be consistent and not included in data cells. Instead of noting “$5MM” in a cell, just write “5” and specify the unit in the column header or a separate column to avoid analysis complications.

Using Problematic Field or Column Names

For easy data analysis, use descriptive yet concise column names. Avoid spaces, numbers at the start, and special characters.

Use underscores or CamelCase for multi-word names to maintain readability and data integrity.

Avoid
Alternative
Reason
MaxPercentage ( % )
max_percentage
Uses a special character ( % )
Mean_growth/year
mean_yr_growth
Uses a special character ( / )
Annual returns
annual_returns
Uses a blank character
3 Year Returns
returns_3yr
Uses a blank character and starts with a number