If both types of data are in one data file, we'd probably be duplicating every node-centric data point on every single edge row. I understand we might ultimately need to create such a single file, but I feel like two files will help keep things manageable as we identify and calculate feature data in the short term.
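To make the two-file idea concrete, here is a minimal sketch of how the combined file could be built later from a node-feature table and an edge-feature table. The column names (`source`, `target`, `degree`) and the function name are assumptions for illustration only, not part of our actual data.

```python
def join_edge_and_node_features(edge_rows, node_features):
    """Build wide rows by attaching node features to each edge row.

    edge_rows: list of dicts with hypothetical "source"/"target" keys.
    node_features: dict mapping node id -> dict of node-level features.
    """
    joined = []
    for row in edge_rows:
        wide = dict(row)  # copy so the edge table itself is left untouched
        for role in ("source", "target"):
            for name, value in node_features[row[role]].items():
                # Prefix each node feature with the role it plays on this edge.
                wide[f"{role}_{name}"] = value
        joined.append(wide)
    return joined
```

Node features are stored once per node this way, and the duplication only happens at the moment the single file is generated.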


UPDATE: Network size is definitely going to be an issue. Traversing the network to calculate shortest_distance crashed and burned when it ran out of memory. Initially I thought it was manageable, until I allowed my code to use the entire adjacency list. Even without outputting data for node pairs that have no path between them, I think storing the data dump would also be a problem. I am going to try rerunning the shortest_distance search with a max depth limit of <strike>six</strike> <strike>three</strike> two to see if that makes it more manageable.<br/>
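For reference, a depth-limited search along these lines could look like the sketch below. It assumes the adjacency list is a dict mapping each node to an iterable of its neighbors; the function name `bounded_distances` is hypothetical and this is not the code that crashed.

```python
from collections import deque


def bounded_distances(adjacency, source, max_depth=2):
    """Return {node: distance} for nodes within max_depth hops of source."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        depth = distances[node]
        if depth == max_depth:
            continue  # do not expand past the depth limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = depth + 1
                queue.append(neighbor)
    del distances[source]  # the source itself is not a useful pair
    return distances
```

Memory per source is bounded by the size of its depth-limited neighborhood rather than the whole network, and results can be written out one source at a time instead of being held in memory.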


UPDATE 2: The BFS approach I was taking to path finding was too slow because I kept retracing the same data, so I tried scripting a dump of all "friend-of-friend" edges instead. After 5.5 hours of computation time I have a "friend-of-friend" edge dump, but it's a 6.5GB file. That said, I accidentally included duplicate edges, so hopefully it will be smaller after deduping.
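One way to avoid the duplicates at generation time, rather than deduping the dump afterwards, is sketched below. It assumes the network is undirected with a symmetric adjacency dict and comparable node ids, and it assumes direct friends should be excluded from the "friend-of-friend" set. The function name is hypothetical and this is not the script that produced the 6.5GB file.

```python
def friend_of_friend_edges(adjacency):
    """Yield each undirected friend-of-friend pair exactly once."""
    for node, friends in adjacency.items():
        direct = set(friends)
        seen = set()  # friend-of-friends already found for this node
        for friend in direct:
            for fof in adjacency.get(friend, ()):
                if fof == node or fof in direct or fof in seen:
                    continue
                seen.add(fof)
                # The same pair is also reachable from the other endpoint,
                # so emit it only from the endpoint with the smaller id.
                if node < fof:
                    yield (node, fof)
```

Because it is a generator that only keeps per-node sets, the pairs can be streamed straight to disk without a global "seen" structure.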

