Here is a list of common questions regarding datasets:
What are the supported date formats?
We support the following date formats:
- ISO-8601 - Example: 2019-11-20
- MM/DD/YYYY - Example: 11/20/2019
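To check dates before uploading, a small script along these lines may help. It is only a sketch: the function name `parse_supported_date` is hypothetical, and it assumes that ISO-8601 here means the date-only form shown in the example (2019-11-20).

```python
from datetime import date, datetime
from typing import Optional

# Patterns for the two formats listed above. Treating ISO-8601 as
# date-only (YYYY-MM-DD) is an assumption based on the example given.
SUPPORTED_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_supported_date(value: str) -> Optional[date]:
    """Return a date if value matches a supported format, otherwise None."""
    for fmt in SUPPORTED_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date()
        except ValueError:
            continue  # try the next format
    return None
```

Values that return None (for example 20.11.2019) would need to be converted to one of the two formats before upload.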
What's the recommended file size or number of tickets to upload?
We recommend uploading at least 10,000 documents or 2,000 calls (longer texts) to build a reliable workflow.
Is there a maximum document size to upload?
Our Enterprise product doesn't have a size limit, although a self-managed installation will be limited by the hardware it runs on. Most projects should be able to build a reliable classifier with fewer than 250k documents.
Can I upload short texts? What’s the minimum size of each unit of text?
The technology works well with short texts (tweets, chat conversations, etc.). However, the dataset needs to provide enough context, so we recommend that comments have at least 5 words on average. Very short interactions will be identified as noise.
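The five-word guideline can be checked roughly before upload. The sketch below is an illustration only: the function names are hypothetical, and it assumes words are separated by whitespace, which may differ from how the platform counts them.

```python
from typing import Iterable

# Guideline from the answer above: at least 5 words per comment on average.
MIN_AVERAGE_WORDS = 5


def average_word_count(texts: Iterable[str]) -> float:
    """Average number of whitespace-separated words, ignoring empty texts."""
    counts = [len(text.split()) for text in texts if text.strip()]
    return sum(counts) / len(counts) if counts else 0.0


def has_enough_context(texts: Iterable[str],
                       min_average: float = MIN_AVERAGE_WORDS) -> bool:
    """True if the dataset meets the recommended average comment length."""
    return average_word_count(texts) >= min_average
```

For example, has_enough_context(["ok", "thanks"]) returns False, because the average is one word per comment.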
Can I upload long texts or documents? What’s the maximum size of each unit of text?
The technology works well with long texts. However, there needs to be a representative number of texts to extract contexts from.
How long does it take to generate a classifier to review?
It depends on the size of the file and the vocabulary. It usually takes between 5 and 20 minutes for 5k-20k comments. For a file with 200k texts, it takes about an hour.
Can I upload voice data?
Voice data cannot be uploaded to the platform. You first need to convert it to text with a speech-to-text process. Microsoft, Google, IBM and Amazon all offer speech-to-text services that are easy to access. We also work with partners in case you want a more tailored approach.