@enumaris: There are definitely a few on the site (not me, yet - I've booked a course on RNNs, so maybe in a few weeks...). You won't find many experts logged in to chat on Data Science at any given time currently; the site doesn't have the critical mass for it. So if you are looking for help, you may have to go to the trouble of creating a question.

Someone here, even if they don't know about LSTMs in depth, might be able to discuss how to put together a question about them

I see. I didn't really want to make a question because 1) I have multiple questions, and they are mostly kind of boring technical questions about the implementation of LSTMs in TensorFlow and PyTorch, and 2) I'm not sure how best to formulate the questions

Actually, my question would apply to any RNN as implemented in these frameworks, I think, as long as the frameworks implement the different RNN types in a consistent fashion.

OK, so throwing out a single question with lots of separate technical details probably won't go down well. If they can be split up and separated, that sort of thing is an OK question in my book: e.g. "How to implement backpropagation through time in PyTorch?"... with at least a start at your code, showing where you get stuck because the concept, or how to use the library, is not clear...

...but I would agree that "How to..." questions are a lot of work. If you are not sure and a bit vague about how things work, it's best to keep it short. But then the question may get down-voted for not having enough detail to isolate what the real issue is

Where are you starting from? Have you looked at the TF and PyTorch examples from their library GitHub pages?

I think I would be able to build an RNN or LSTM using these frameworks, but some of the details of their implementation confuse me

My questions mostly revolve around the fact that RNNs and LSTMs as implemented in these frameworks take in data in batches, and I don't see a clear way for the internal state or cell memory (in the case of LSTMs) to propagate from the end of one batch to the beginning of the next, assuming I'm breaking a very long sequence down into batches.

Or maybe the internal state simply does not propagate, and the frameworks are relying on the fact that your sequence length is long enough to compensate for the discontinuity between batches.

but it's hard for me to figure out how to formulate this question well lol
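
For what it's worth, in PyTorch the RNN modules take and return the hidden state explicitly, so you can carry it across batches yourself; if you pass nothing in, the state is re-initialized to zeros on every call. A minimal sketch of that pattern (truncated backprop through time, with all the sizes and the dataset made up for illustration):

```python
import torch
import torch.nn as nn

# Toy setup: one long sequence chopped into consecutive chunks (all made up).
input_size, hidden_size, seq_len, n_chunks = 8, 16, 20, 5
long_seq = torch.randn(n_chunks * seq_len, 1, input_size)  # (time, batch=1, features)

lstm = nn.LSTM(input_size, hidden_size)
opt = torch.optim.SGD(lstm.parameters(), lr=0.01)

state = None  # None means the LSTM starts from a zero state
for i in range(n_chunks):
    chunk = long_seq[i * seq_len:(i + 1) * seq_len]
    out, state = lstm(chunk, state)   # pass the previous chunk's final state in
    loss = out.pow(2).mean()          # dummy loss, just to have a backward pass
    opt.zero_grad()
    loss.backward()
    opt.step()
    # Detach so gradients don't flow back across the chunk boundary
    # (otherwise the next backward() would try to reach the freed graph).
    state = (state[0].detach(), state[1].detach())
```

The detach() is the important bit, I think: you keep the values of the state across the batch boundary but cut the gradient there. I believe Keras exposes the same idea as stateful=True on its RNN layers.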

I guess it would be good to know if there's danger in using their CrossEntropyLoss class for problems with a large (>100) number of classes.
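
As far as I know, nn.CrossEntropyLoss itself doesn't care how many classes there are: it takes raw logits of shape (N, C) plus integer class indices and does log-softmax + NLL in one numerically stable step, so hundreds of classes should be fine. A tiny made-up check:

```python
import torch
import torch.nn as nn

n_samples, n_classes = 32, 500  # made-up sizes
logits = torch.randn(n_samples, n_classes, requires_grad=True)  # raw scores, no softmax
targets = torch.randint(0, n_classes, (n_samples,))             # class indices, not one-hot

loss = nn.CrossEntropyLoss()(logits, targets)
loss.backward()
print(loss.item())
```

The usual gotcha is feeding it softmax probabilities instead of raw logits, not the class count.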

With regard to the "interrupting the GPU" comment, I feel like me writing a for-loop to deal with the ragged data at the character level is doing exactly that, isn't it... interrupting the GPU by forcing it to explicitly loop over sequences of varying lengths?
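
I believe the idiomatic way around that Python loop in PyTorch is torch.nn.utils.rnn.pack_padded_sequence: you pad the batch once, pack it with the true lengths, and the LSTM kernel then handles the ragged lengths internally rather than a Python loop doing it. Roughly (sizes invented):

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

# Three character-level sequences of different lengths (all values made up).
seqs = [torch.randn(5, 8), torch.randn(3, 8), torch.randn(2, 8)]  # (time, features)
lengths = torch.tensor([5, 3, 2])

padded = pad_sequence(seqs)                     # (max_time, batch, features)
packed = pack_padded_sequence(padded, lengths)  # lengths sorted descending here

lstm = nn.LSTM(8, 16)
packed_out, (h, c) = lstm(packed)               # no explicit Python loop over timesteps
out, out_lengths = pad_packed_sequence(packed_out)
print(out.shape)                                # (5, 3, 16)
```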

Reading the PyTorch code, I interpret ignore_index as being a target value to ignore, so you can have ignored ground-truth data: for instance, if you had a class hierarchy and didn't care about mis-classifications in sub-parts of the hierarchy unrelated to the ground truth. So you might set your ground truth to [1, 0, 1, 0, 0, -100, -100, -100] if the positions correspond to classes 1, 1a, 1b, 1c, 2, 2a, 2b, 2c and your target class is 1b

...although with my level of knowledge of Torch, I could just be making that all up...

Actually, scratch that. I think it is a similar concept, but my example is wrong
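
For what it's worth, the use of ignore_index I've actually seen is masking out padding positions in the targets when you flatten variable-length sequences for CrossEntropyLoss; those positions then contribute nothing to the loss or gradient. Something like this (the padding scheme here is just for the example):

```python
import torch
import torch.nn as nn

n_classes, pad = 5, -100  # -100 is CrossEntropyLoss's default ignore_index
# Two target sequences of lengths 3 and 2, padded to length 3 with `pad`.
targets = torch.tensor([[1, 4, 2],
                        [0, 3, pad]])   # (batch, time)
logits = torch.randn(2, 3, n_classes)   # made-up model outputs: (batch, time, classes)

loss_fn = nn.CrossEntropyLoss(ignore_index=pad)
# Flatten time into the batch dimension: (batch*time, classes) vs (batch*time,)
loss = loss_fn(logits.reshape(-1, n_classes), targets.reshape(-1))
print(loss.item())                      # the padded position is simply skipped
```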

They have tf.contrib with, like, experimental code, and then they move the well-tested versions over to tf.nn while keeping a version in tf.contrib... so any one function could have, like, 3 different versions that all do something perhaps very slightly differently...