I have one machine, which produces parts. In machine_failure_rate% of cases it produces a faulty part, which then needs to be produced again. Thus we end up with a simple queuing problem. Can the following code be further functionalized? I have the feeling I can get rid of time_parts, but everything I have in mind deteriorates the code, because I then need further lookups in the production_df data frame to answer "what was produced / what needs to be produced now?". The following script works:

Background: I am thinking about a generalisation (more machines, more input sources, more complex rules for how/when/... parts are produced). I fear I will end up with tons of unreadable and unmaintainable lines of code if I proceed like this.

Edit: I think t <- t + min(time_parts$t[1], dt) is a bug and the correct version is t <- min(time_parts$t[1], t + dt). It only worked because the time difference dt was always the minimum. In the latter case you could speed things up using t <- max(time_parts$t[1], t + dt), as there is nothing to do in the time in between.
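A concrete example shows the difference between the three updates (the numbers are purely illustrative, not taken from the simulation):

```r
# Current time t, recovery step dt, and the next scheduled event time
# (what time_parts$t[1] would hold); all values are made up for illustration.
t <- 5
dt <- 2
next_event <- 6

t + min(next_event, dt)  # 7  -- buggy: adds min(6, 2) and overshoots the event
min(next_event, t + dt)  # 6  -- fixed: advance to the earliest possible change
max(next_event, t + dt)  # 7  -- speed-up: skip idle time before the next event
```

The first two coincide whenever dt happens to be the smallest quantity, which is why the bug never showed up in the original runs.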

Could you explain what "po_start" represents - is this the time at which production starts for a given part?
– Russ Hyde, Feb 5 at 9:22

Yes, it means when the production really starts; due to a queue, this can be delayed relative to the original t_event. I also made a tiny edit to the post...
– Christoph, Feb 5 at 9:48

I've been working on this, but can't post my code from work. I can restructure the code and get the same result, but I was wondering whether there might be an error. Your code contains t <- t + min(time_parts$t[1], dt). Since the contents of time_parts are the earliest times at which a given part can be made, and the current time must increase at each iteration, shouldn't you update the current time to max(t + dt, time_parts$t[1])?
– Russ Hyde, Feb 6 at 10:02

@RussHyde No, I don't think so: if I increase t, I should take the earliest point in time where something can change. This is safe. It might be that a max would sometimes speed things up... I am curious about your results :-)
– Christoph, Feb 6 at 15:39

1 Answer

This was a pretty difficult challenge - principally because R doesn't have a built-in priority-queue data structure, but also because the priority-queue-like data frame (time_parts) was wrapped around the results-storing data frame (production_df), and the main while loop contains code at several different levels of abstraction.

Idiomatic R

I did some simple stuff first: pulled all your functions to the start of the script and reformatted some code/comments.

I added a create_part function that similarly returns a Part object. There were a lot of repeats of 1 / machine$production_rate in your code; I replaced these with a call to part$production_duration. I also thought your test of whether a produced part is a failure should be associated with the produced part object (part$is_failure); with this, the while-loop logic becomes more explicit.
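A minimal sketch of what such a constructor could look like - the field names, the machine list, and the placement of the failure draw are assumptions for illustration, not the original code:

```r
# Hypothetical Part constructor along the lines described above.
create_part <- function(t_ordered, machine) {
  list(
    t_ordered           = t_ordered,
    production_duration = 1 / machine$production_rate,        # replaces repeated 1/rate
    is_failure          = runif(1) < machine$failure_rate / 100  # failure test on the part
  )
}

machine <- list(production_rate = 2, failure_rate = 5)  # 5% failure rate (illustrative)
part <- create_part(t_ordered = 0, machine = machine)
part$production_duration  # 0.5
```

With this, the while loop can ask `if (part$is_failure) ...` instead of re-deriving the failure test inline.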

Restructuring the while loop

I wanted to push that while-loop into a function - the less work you do in the global environment, the better.

Since you want to extract data from production_df for your report, the function should return production_df. During the while-loop you access production_df, time_parts, t, dt (which I renamed dt_recovery based on your comments), n and machine, so we might want to pass all of those into the function. But we can compute some of them from the others:

n is nrow(production_df),

t isn't needed outside of the while loop, and

the data that initialises time_parts also initialises production_df.

The only thing we need to initialise both time_parts and production_df is the arrival times, i.e. the times at which the parts were ordered (which I renamed t_ordered).

So we can put that while-loop into a function that takes arguments t_ordered, dt_recovery and machine.
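Schematically, that refactoring could look like the following - the body here is a placeholder (the real queue handling from the original loop goes where indicated), and the column names are assumptions:

```r
# Hypothetical driver with the signature described above.
run_production <- function(t_ordered, dt_recovery, machine) {
  n <- length(t_ordered)                       # no need to pass n separately
  production_df <- data.frame(t_ordered = t_ordered,
                              po_start  = NA_real_,
                              t_done    = NA_real_)
  time_parts <- data.frame(t = t_ordered)      # the same data initialises both
  t <- 0
  while (nrow(time_parts) > 0) {
    t <- min(time_parts$t[1], t + dt_recovery)
    # ... produce / re-queue parts here, filling po_start and t_done ...
    time_parts <- time_parts[-1, , drop = FALSE]  # placeholder for real dequeuing
  }
  production_df                                # the only value the report needs
}

result <- run_production(c(0, 1, 2), dt_recovery = 0.5,
                         machine = list(production_rate = 2))
```

Everything else (t, n, time_parts) stays local to the function, so the global environment only ever sees the inputs and the report data.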

A functional priority queue

The really big step:

R doesn't have a native priority queue, and it would be pretty hard to encode one using S3 or S4 classes, since you can't update by reference with those classes. There is a priority queue defined in the package liqueueR, but I've no experience of that. So I just wrote a simpler version of a priority queue (as an S3 class); this allows you to:

peek: extract the element in the queue with the lowest priority value (without mutating the queue)

delete_min: remove that element with the lowest priority value from the queue and return the resulting queue

add: add a new element to the queue according to its priority, returning the resulting queue

and provides a couple of helper methods (is_empty, nrow).
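A minimal functional sketch along those lines - here with plain functions on an ordered data frame rather than the answer's S3 class, and with illustrative column names:

```r
# Every operation returns a new queue; nothing is mutated in place.
empty_queue <- function() data.frame(priority = numeric(0), value = numeric(0))

add <- function(q, priority, value) {
  q <- rbind(q, data.frame(priority = priority, value = value))
  q[order(q$priority), , drop = FALSE]        # keep the queue sorted by priority
}

peek <- function(q) q[1, ]                    # lowest-priority element, queue untouched
delete_min <- function(q) q[-1, , drop = FALSE]  # the queue minus that element
is_empty <- function(q) nrow(q) == 0L

q <- add(add(empty_queue(), priority = 3, value = 30), priority = 1, value = 10)

# a side-effect-free "pop" is peek followed by delete_min:
el <- peek(q)         # the element with priority 1
q  <- delete_min(q)   # rebind q to the shortened queue
el$value  # 10
```

Note that the caller must rebind the queue name itself; that is exactly the point discussed below.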

However, this doesn't provide a pop_element(queue). Typically, pop_element removes the leading element from the queue and returns that element; that is, it returns the leading element and updates the queue through a side effect. This side effect is problematic in R, so I didn't include a pop_element function. To achieve a pop you have to peek and then delete_min.

(This is actually quite hard to explain.) The pop method on a priority queue returns an element from the queue and moves the queue on by one step. In R, updating the queue might look like new_queue <- old_queue[-1] and obtaining the returned element might look like returned_element <- old_queue[1]. So a pop function might look like

pop <- function(q) {
  # extract the head
  el <- q[1]
  # In a reference-based language you could update the queue
  # using a side-effect like `q.drop()`.
  # But in R this creates a new queue; if it isn't returned
  # explicitly, it is thrown away at the end of the `pop` function.
  new_q <- q[-1]
  # return the element that's at the head of the original queue
  el
}
# calling environment
my_q <- create_queue(...)
my_head <- pop(my_q)

But the queue has not been altered by that pop. Now we could rewrite the function to do something dangerous like q <<- q[-1], and that would update the q in the calling environment. I consider this dangerous because q might not exist in the calling environment, and it introduces side effects, which are much harder to reason about.
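For completeness, the dangerous variant looks like this (shown only to make the side effect visible, not as a recommendation; the queue here is a plain vector for illustration):

```r
q <- c(1, 4, 9)

pop_unsafe <- function() {
  el <- q[1]
  q <<- q[-1]   # side effect: `<<-` rebinds `q` in the enclosing environment
  el
}

pop_unsafe()  # 1
q             # now 4 9: the function silently changed state outside itself
```

If no `q` existed in any enclosing environment, `<<-` would create one in the global environment, which is precisely the kind of action-at-a-distance that makes such code hard to reason about.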

@Christoph there were a couple of things I didn't feel comfortable restructuring. I couldn't work out why update_machine works the way it does: it seems to look into the future before it decides what to do now. It makes more sense to me for is_occupied to be set to FALSE at the end of each while-loop iteration.
– Russ Hyde, Feb 12 at 13:08

@Christoph could you give some indication of whether you'd prefer to generalise the production time (e.g. replace the fixed production time with a Poisson-sampled time) or the number of machines?
– Russ Hyde, Feb 12 at 13:10

Updated. But it's pretty difficult to explain. I'm not particularly interested in stochastic simulation at present. I did have a look at your simmer documentation. Can you confirm that when you mclapply() over different simulation chains, different seeds are used for each chain?
– Russ Hyde, Feb 18 at 9:53

To your function pop <- function(q): why don't you just return(list(el = el, new_q = new_q))? Then you could work within one line: r <- pop(q); el <- r$el; q <- r$new_q; rm(r)? But I still understand your comment that calling by reference would be smart...
– Christoph, Aug 7 at 14:35