LocalFileSystem

The LocalFileSystem persistence lets users persist the response dataframe to the local file system in the desired format at the desired path.

Users can persist the dataframe to the local file system in the following formats:

  • CSV
  • JSON
  • XML
  • Parquet

The LocalFileSystem persistence requires the following arguments from the user:

| Input Arguments | Mandatory | Default Value | Description |
|---|---|---|---|
| file-format | Yes | - | The format of the files to be persisted for the response dataframe, represented by the FileFormat class. Supported values are CSVFileFormat, JSONFileFormat, XMLFileFormat and ParquetFileFormat. |
| file-path | Yes | - | The path of the directory where output files will be persisted. |
| save-mode | No | ErrorIfExists | Specifies the expected behavior when saving a DataFrame to a data source. Expected values are append, overwrite, errorifexists and ignore. |

Users can configure the LocalFileSystem persistence as follows:

persistence = {
  type = "LocalFileSystem"
  file-format = {
    type = "CSVFileFormat"
    header = false
  }
  file-path = "./rest-output/"
}
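
The optional save-mode argument sits alongside file-format and file-path. The following is a minimal sketch, assuming the value is written as a lowercase quoted string; the arguments table lists the accepted values but not their exact spelling in configuration.

```hocon
persistence = {
  type = "LocalFileSystem"
  file-format = {
    type = "JSONFileFormat"
  }
  file-path = "./rest-output/"
  # Assumption: value spelling and quoting. "overwrite" replaces existing
  # output instead of failing, which is what the ErrorIfExists default does.
  save-mode = "overwrite"
}
```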

Besides the type of file format, users can also configure a few attributes specific to each file format. The attributes for all supported file formats are described below:

CSVFileFormat

Users can provide the following options to the CSVFileFormat instance:

| Parameter Name | Default Value | Description |
|---|---|---|
| char-to-escape-quote-escaping | `\` | Sets a single character used for escaping the escape for the quote character. |
| compression | none | Compression codec to use when saving to file. This can be one of the known case-insensitive shortened names (none, bzip2, gzip, lz4, snappy, and deflate). |
| date-format | yyyy-MM-dd | Sets the string that indicates a date format. |
| empty-value | `""` (empty string) | Sets the string representation of an empty value. |
| encoding | UTF-8 | Specifies the encoding (charset) of saved CSV files. |
| escape | `\` | Sets a single character used for escaping quotes inside an already quoted value. |
| escape-quotes | true | A flag indicating whether values containing quotes should always be enclosed in quotes. Default is to escape all values containing a quote character. |
| header | true | Boolean flag indicating whether the CSV text contains header names. |
| ignore-leading-white-space | false | A flag indicating whether leading whitespace in values being written should be skipped. |
| ignore-trailing-white-space | false | A flag indicating whether trailing whitespace in values being written should be skipped. |
| line-sep | `\n` | Defines the line separator that should be used for writing. Maximum length is 1 character. |
| null-value | null | Sets the string representation of a null value. |
| quote | `"` | Sets a single character used for escaping quoted values where the separator can be part of the value. For writing, if an empty string is set, it uses `\u0000` (null character). |
| quote-all | false | A flag indicating whether all values should always be enclosed in quotes. Default is to only escape values containing a quote character. |
| sep | `,` | Delimiter by which fields in a row are separated in the CSV text. |
| timestamp-format | yyyy-MM-dd HH:mm:ss | Sets the string that indicates a timestamp format. |
| timestamp-ntz-format | yyyy-MM-dd'T'HH:mm:ss[.SSS] | Sets the string that indicates a timestamp without timezone format. |
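
As an illustration of how several of these options might be combined, the sketch below writes semicolon-separated, fully quoted, gzip-compressed CSV files. The option names come from the table above; the values are only examples, and the sketch assumes each option is placed inside the file-format block next to type, the same way header is in the earlier example.

```hocon
persistence = {
  type = "LocalFileSystem"
  file-format = {
    type = "CSVFileFormat"
    sep = ";"              # field delimiter (default is a comma)
    quote-all = true       # enclose every value in quotes
    null-value = "NA"      # illustrative string written for nulls
    compression = "gzip"   # one of the listed codec names
  }
  file-path = "./rest-output/"
}
```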

JSONFileFormat

Users can provide the following options to the JSONFileFormat instance:

| Parameter Name | Default Value | Description |
|---|---|---|
| compression | none | Compression codec to use when saving to file. This can be one of the known case-insensitive shortened names (none, bzip2, gzip, lz4, snappy, and deflate). |
| date-format | yyyy-MM-dd | Sets the string that indicates a date format. |
| encoding | UTF-8 | Specifies the encoding (charset) of saved JSON files. |
| ignore-null-fields | false | Whether to ignore null fields when generating JSON objects. |
| line-sep | `\n` | Defines the line separator that should be used for writing. Maximum length is 1 character. |
| timestamp-format | yyyy-MM-dd HH:mm:ss | Sets the string that indicates a timestamp format. |
| timestamp-ntz-format | yyyy-MM-dd'T'HH:mm:ss[.SSS] | Sets the string that indicates a timestamp without timezone format. |
| timezone | UTC | Sets the string that indicates a time zone ID to be used to format timestamps in the JSON datasources or partition values. |
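
A JSON configuration might look like the sketch below, which drops null fields and writes timestamps in an ISO-like pattern. The option names are taken from the table above; the values are illustrative, and nesting the options inside file-format is an assumption carried over from the CSV example.

```hocon
persistence = {
  type = "LocalFileSystem"
  file-format = {
    type = "JSONFileFormat"
    ignore-null-fields = true                      # omit null fields from each JSON object
    timestamp-format = "yyyy-MM-dd'T'HH:mm:ss"     # illustrative pattern
    timezone = "UTC"                               # same as the documented default
  }
  file-path = "./rest-output/"
}
```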

XMLFileFormat

Users can provide the following options to the XMLFileFormat instance:

| Parameter Name | Default Value | Description |
|---|---|---|
| array-element-name | item | Name of the XML element that encloses each element of an array-valued column when writing. |
| attribute-prefix | `_` | The prefix for attributes, used to differentiate attributes from elements. This will be the prefix for field names. |
| compression | None | Compression codec to use when saving to file. Should be the fully qualified name of a class implementing org.apache.hadoop.io.compress.CompressionCodec or one of the case-insensitive shortened names (bzip2, gzip, lz4, and snappy). Defaults to no compression when a codec is not specified. |
| date-format | yyyy-MM-dd | Controls the format used to write DateType columns. |
| declaration | version="1.0" encoding="UTF-8" standalone="yes" | Content of the XML declaration to write at the start of every output XML file, before the root tag. Set to an empty string to suppress. |
| null-value | null | The value to write for a null value. Default is the string "null". When this is null, attributes and elements are not written for such fields. |
| root-tag | rows | The root tag of your XML files. It can include basic attributes by specifying a value like `books foo="bar"`. |
| row-tag | row | The tag of your XML files to treat as a row. |
| timestamp-format | yyyy-MM-dd'T'HH:mm:ss[.SSS][XXX] | Controls the format used to write TimestampType columns. |
| value-tag | _VALUE | The tag used for the value when there are attributes in an element that has no child. |

ParquetFileFormat

Users can provide the following options to the ParquetFileFormat instance:

| Parameter Name | Default Value | Description |
|---|---|---|
| datetime-rebase-mode | EXCEPTION | Specifies the rebasing mode for values of the DATE, TIMESTAMP_MILLIS and TIMESTAMP_MICROS logical types from the Julian to the Proleptic Gregorian calendar. Currently supported modes are: EXCEPTION (fails in reads of ancient dates/timestamps that are ambiguous between the two calendars), CORRECTED (loads dates/timestamps without rebasing) and LEGACY (performs rebasing of ancient dates/timestamps from the Julian to the Proleptic Gregorian calendar). |
| int96-rebase-mode | EXCEPTION | Specifies the rebasing mode for INT96 timestamps from the Julian to the Proleptic Gregorian calendar. Currently supported modes are: EXCEPTION (fails in reads of ancient INT96 timestamps that are ambiguous between the two calendars), CORRECTED (loads INT96 timestamps without rebasing) and LEGACY (performs rebasing of ancient timestamps from the Julian to the Proleptic Gregorian calendar). |
| merge-schema | false | Sets whether schemas collected from all Parquet part-files should be merged. |
| compression | snappy | Compression codec to use when saving to file. This can be one of the known case-insensitive shortened names (none, uncompressed, snappy, gzip, lzo, brotli, lz4, and zstd). |
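
For Parquet output, a minimal sketch that only changes the compression codec is shown below. The option name and codec come from the table above; placing the option inside file-format is an assumption based on the CSV example.

```hocon
persistence = {
  type = "LocalFileSystem"
  file-format = {
    type = "ParquetFileFormat"
    compression = "gzip"   # overrides the snappy default
  }
  file-path = "./rest-output/"
}
```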