Strings vs binary for storing variables inside the file format

Question:

We aim at using HDF5 for our data format. HDF5 has been selected because it is a hierarchical filesystem-like cross-platform data format and it supports large amounts of data.

The file will contain arrays and some parameters. The question is about how to store the parameters (which are not made up by large amounts of data), considering also file versioning issues and the efforts to build the library. Parameters inside the HDF5 could be stored as either (A) human-readable attribute/value pairs or (B) binary data in the form of HDF5 compound data types.

Just as an example, let's consider as a parameter a polygon with three vertex. Under case A we could have for instance a variable named Polygon with the string representation of the series of vertices, e.g. for instance (1, 2); (3, 4); (4, 1). Under case B, we could have instead a variable named Polygon made up by a [2 x 3] matrix.

We have some idea, but it would be great to have inputs from people who have already worked with something similar. More precisely, could you please list pro/cons of A and B and also say under what circumstances which would be preferable?

Answer:

Speaking as someone who's had to do exactly what you're talking about a number of time, rr got it basically right, but I would change the emphasis a little.

For file versioning, text is basically the winner.

Since you're using an hdf5 library, I assume both serializing and parsing are equivalent human-effort.

text files are more portable. You can transfer the files across generations of hardware with the minimal risk.

text files are easier for humans to work with. If you want to extract a subset of the data and manipulate it, you can do that with many programs on many computers. If you are working with binary data, you will need a program that allows you to do so. Depending on how you see people working with your data, this can make a huge difference to the accessibility of the data and maintenance costs. You'll be able to sed, grep, and even edit the data in excel.

input and output of binary data (for large data sets) will be vastly faster than text.

working with those binary files in a new environmnet (e.g. a 128 bit little endian computer in some sci-fi future) will require some engineering.

similarly, if you write applications in other languages, you'll need to handle the encoding identically between applications. This will either mean engineering effort, or having the same libraries available on all platforms. Plain text this is easier...

If you want others to write applications that work with your data, plain text is simpler. If you're providing binary files, you'll have to provide a file specification which they can follow. With plain text, anyone can just look at the file and figure out how to parse it.

you can archive the text files with compression, so space concerns are primarily an issue for the data you are actively working with.

debugging binary data storage is significantly more work than debugging plain-text storage.

So in the end it depends a little on your use case. Is it meaningful to look at the data in the myriad tools that handle plain-text? Is it only meaningful to look at it with big-data hdf5 viewers? Will writing plain text be onerous to you in terms of time and space?

In general, when I'm faced with this issue, I basically always do the same thing: I store the data in plain text until I realize the speed problems are more irritating than working with binary would be, and then I switch. If you don't know in advance if you're crossing that threshold start with plain-text, and write your interface to your persistence layer in such a way that it will be easy to switch later. This is tiny bit of additional work, which you will probably get back thanks to plain text being easier to debug.

c++,opencv
I am trying to draw with mouse move in an opencv window. But when I draw, nothing draws on the window. When I try to close the window from the cross in the topleft(ubuntu), it opens a new window which it should be as I haven't pressed escape, and in...

c++,c++11
Hello fellow programmers! I was going to write a small program for calculating total pay for different periods of time depending on the amount of hours and the salary that the user enters. I managed to make a small bit of the program but when I try to run it...

c++
As a continuation of a: Thread, I came across a problem with writing a method of a class which returns: std::vector<std::unique_ptr<Object>> I get compiler errors when such a return type is written. There is some problem with delete operand or something ... Generally, I've wanted to write a method which...

c++,string
Here a basic code I'm trying to run But I'm having trouble with stoi (it's c++) I keep getting error: ‘stoi’ was not declared in this scope I tried atoi and strtol with this error .cpp:23: error: cannot convert ‘std::string’ to ‘const char*’ for argument ‘1’ to ‘int atoi(const char*)’...

c++
I am trying to print out the shape of a triangle but I am kinda lost... this is what I have so far: #include <iostream> using namespace std; int main() { int i, k, n; cout << "Please enter number of rows you want to see: \n"; cin >> n;...

c++,arrays,string
I was trying to achieve translating a character array into a integer string and corresponding character to their alphabetical order. For instance: A(a) = 0 , Z(z) = 25. string key_char = argv[1]; string key_num; for (int i = 0; i < key_char.length(); i++){ if (isalpha(key_char[i])){ if (islower(key_char[i])){ key_num[i] =...

c++,visual-c++,stl
I am using default features(push, pop, top, empty, size) of stack container of STL. If I want to add more features like access an element from middle of stack. How could I do this? Thanks...

c++,inline,private,member,protected
This is another question about inlining a function. But I will take possible comments and answers right away: Defining a function inside a class makes it inline automatically. The same behaviour can be achieved by marking a function with inline outside of the class. An inline function doesn't have to...

c++
I asked a question: Detecting if an object is still active or it has been destroyed Considering that I cannot use libraries, there are no good out of the box solutions in C++. So, is it a bad practice to check if the object has been destroyed by analyzing memory...

c++,user-input
I need to write a program that checks if the user-provided first and last names are correctly typed. The program needs to validate that only the first letter of each name part is uppercase. I managed to write code that checks the first character of the input. So I have...

c++,boost,boost-filesystem
I have used boost::filesystem::directory_iterator in order to get a list of all the available files into a given folder. The problem is that I supposed this method would give me the files in alphabetical order, while the results seem pretty random. Is there any fancy way of alphabetically sorting them?...

c++,build,makefile
I wrote some c++ files and after compiling with out make file it works fine . But when using make file it pop out some errors . My codes are : include directory files : application.h #ifndef APPLICATION_H #define APPLICATION_H #include "employee.h" #include "employee_data.h" #include "employee.h" ...some defintions here... #endif...

c++,undefined-behavior
I was trying to save the binary equivalent of a 32 bit number in an array A. For testing my showbits() function , I choosed 8,9 when I came across this thing: I am facing an unreasonable thing in my code when I am placing memset in the function showbits(),I...

c++,templates,template-specialization
I am just learning about Template Template class specialisation. Not a big problem to explain in detail. From my understanding std::uniform_int_distribution is a template whereas std::uniform_int_distribution<Type> is the full specialisation of uniform_int_distribution giving a type. I pass this in the specialisation class template as follows below Main class template <template...

c++,templates,constructor,explicit-instantiation
I'm working on a project in C++ and am having trouble understanding what members of a template class get explicitly instantiated when I explicitly instantiate the template class. I've written the following file, which I then compile using Visual C++ 2008 Express Edition's Release configuration and then pop into a...

c++,iterator,qlist
I get a segfault while iterating over a QList. I don't understand what I am doing wrong. I have a QList of Conversation. Inside a Conversation I have a QList of Msg. Below are the class description : Msg class : class Msg { public: Msg(); Msg(const Msg& other); Msg&...

c++,boost,boost-asio
I have a third-part server, and I'm writing a dll interface for it, my clients use my dll to communicate with the server. The protocol uses a long tcp connection, all traffic goes from this tcp connection. There could be sending/receiving multiple packets at the same time, like a send_msg...

c++
I define a template class in which, I define two type-cast operator template <class base_t> struct subclass { base_t base; //any function which defined for 'base_t' can be used with 'subclass<base_t>' operator base_t&() { return base; } //I want 'subclass<base_t>' can be converted to any class which 'base_t' can //I...

c++,arrays,string,qt,random
In my small Qt application, I want to pick a random string out of an array after I clicked on a button. I've read many threads but nothing works for me. So in my slot there's an array with several strings in it. I also implemented <string>, <time.h> and srand....

php,file,upload
Instead of the file name being what the users want to name it, I would like the file name to be the user's username. As you can see in the code below, I got the user's username from the session. Now, I would just like to apply it to $_FILES,...

c++,pointers,stl,iterator
I'm attempting to solve Project Euler's problem #3 using C++ to gain an understanding of how to use C++ iterators. According to the examples I've seen online, I can use the dereferened iterator as a parameter for cout, and it will print the elements successfully. By that same logic, I...

c++,algorithm,math,recursion
I'm trying to find all possible solutions to the 3X3 magic square. There should be exactly 8 solutions. My code gets them all but there are a lot of repeats. I'm having a hard time tracking the recursive steps to see why I'm getting all the repeats. // This program...

c++,opengl,opengl-es,integer,shader
I'm following the "OpenGL Superbible" book and I can't help but notice that when we create a shader and create the program that we attach the shaders to, we store them as GLuint which are unsigned integers. Why are they stored as numbers? What does the value of the number...

c++,file,hdf5,dataformat
We aim at using HDF5 for our data format. HDF5 has been selected because it is a hierarchical filesystem-like cross-platform data format and it supports large amounts of data. The file will contain arrays and some parameters. The question is about how to store the parameters (which are not made...

c++,polar-coordinates,cartesian-coordinates
I am getting incorrect conversions from polar to cartesian coordinates and vice versa. My code produces weird points like (1,-0). Im using this calculator to check my conversions. Also one of the conversions is completely wrong when I convert back to cartesian coordinates. Point b: (0,1) => (1,1.5708) => (0,0)...

c++,arrays,pointers
I've been having bad luck with dynamic pointers when I want to close it. why the application wrote to memory after end of heap buffer? how can I close my array? int main() { . . int **W; W = new int* [n]; for (int i=1; i <= n; i++)...

c++,string,c++11,memory,standards
I had heard that C++11 was going to require strings to be allocated in contiguous memory. I even thought I saw a stack overflow question on it, but I can't seem to find it. I know that in practice both gcc and Visual Studio do allocate strings contiguously, I'm just...

java,file,loops,path
This question already has an answer here: java.lang.IllegalStateException: Iterator already obtained 1 answer so I wrote a little Java program to test a little stack language I made vie various test file, but for some reason it won't work. Here is the code: import org.apache.commons.io.FilenameUtils; import java.io.IOException; import java.nio.file.*;...

c++,clang-format
I would like to have this: if (!enabled) { return; } turned to this: if (!enabled) { return; } (In other words, I want short if-statements on a single line but keep the {} around them) Currently I'm using the following configuration: AllowShortIfStatementsOnASingleLine: true AllowShortLoopsOnASingleLine: true AllowShortCaseLabelsOnASingleLine: true AllowShortFunctionsOnASingleLine: true...

c++,vector
I want to sort in ascending order according to the first element of the inner pair, i.e. a in this case. But its not at all sorting. I am not sure if my function func logic is correct. #include<iostream> #include<algorithm> #include<vector> using namespace std; bool func(const pair<int,pair<int,int> >&i , const...

c++,string,c++11,gcc
I know it sounds stupid, but I'm using MinGW32 on Windows7, and "to_string was not declared in this scope." It's an actual GCC Bug, and I've followed these instructions and they did not work. So, how can I convert an int to a string in C++11 without using to_string or...

c++,vector,heap-memory
I am creating a temporary vector of pointers to myObject objects. But I am wondering about what happens to the objects I created... { std::vector<myObject *> myVector; myVector.reserve(5); for (int i = 0 ; i < 5 ; ++i){ myVector[i] = new myObject(); } } I assume that at the...

c++,mfc
I just don't understand why i can use the public variables on the class but are getting a link error when trying to use the getLicenceRefused method. I wasn't sure if the problem was because of the CString copy constructor problem I have had before so took the parameter out,...

c++,pointers,reference
I am very confused as to what kind of variables I would put into my function here: names. I am doing a practice problem in a C++ book, because I am learning C++ and am on References and pointers right now, and cannot find a solution. Just for background information,...

c++,qt,clipboard
In my application I want generate random numbers or strings with a text in front of it. It is important for me that the text won't appear in my window, but instead gets copied to the clipboard. int randomnumber = rand() % 46 + 1; QClipboard *cb = QApplication::clipboard(); cb->setText("Just...

c++,command-line-arguments
I am working on a program that takes two command line arguments. Both arguments should be dates of the form yyyy-mm-dd. Since other folks will be using this program and it will be requesting from mysql, I want to make sure that the command line arguments are valid. My original...

c++,opencv
I am working on Traffic Surveillance System an OpenCv project, I need to detect moving cars and people. I am using background subtraction method to detect moving objects and thus drawing counters. I have a problem : When two car are moving on road closely them my system detects it...

c++,c++11,initializer-list
§[dcl.init.list] 8.5.4/2: The template std::initializer_list is not predefined; if the header <initializer_list> is not included prior to a use of std::initializer_list — even an implicit use in which the type is not named (7.1.6.4) — the program is ill-formed. Does that mean this program is ill-formed? #include <vector> int main()...