E.g. (thought experiment): if we are 200 m away from a tree (let's pretend it's perfectly stationary) and we take an exposure for a week, shouldn't we collect enough photons to provide super-high-resolution detail of the tree?

In addition to Michael's excellent answer on diffraction, there are a huge number of other effects which are going to limit your resolution; here's just a brainstormed list:

What are you using to record the image? The practical answers here are either film, which has a finite grain size and thus a finite resolution, or a digital sensor, which has a finite number of pixels and thus a finite resolution.

What's between you and this tree? If there's air there, you're going to get all sorts of changing refraction effects (think heat haze) as the air and the ground warm and cool, which they're certainly going to do over the course of a week.

Or, in other words, to answer the question in the title rather than your assumption about the number of photons: a large number of effects limit resolution. If you want a really sharp photo, you need to understand enough to know which of them is most significant in your particular situation, and then work to reduce that one.

In theory there is no limit if the number of collected photons can be arbitrarily large and the object is stationary. The diffraction limit and lens imperfections can be circumvented by deconvolution, and the limitations due to the finite pixel size can be dealt with using superresolution methods. Here you make multiple exposures, shifting the camera so that the picture shifts by some fraction of a pixel length; these exposures can then be combined into a picture that has more pixels than the camera sensor has.
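The shift-and-interleave idea can be sketched in one dimension. This is a minimal illustration with made-up numbers, not any particular camera's pipeline: the "sensor" averages every two fine-scale samples into one pixel, and two exposures offset by half a pixel are interleaved into a picture with twice as many pixels.

```python
import numpy as np

def capture(scene, shift):
    """Simulated exposure: shift the scene by `shift` fine-scale samples,
    then downsample by averaging each pair of samples into one pixel."""
    s = np.roll(scene, -shift)
    return s.reshape(-1, 2).mean(axis=1)

scene = np.sin(np.linspace(0, 4 * np.pi, 64))  # hypothetical fine detail
low_a = capture(scene, 0)   # exposure 1: no shift        -> 32 pixels
low_b = capture(scene, 1)   # exposure 2: half-pixel shift -> 32 pixels

# Interleave the two exposures into a single 64-pixel picture.
combined = np.empty(low_a.size + low_b.size)
combined[0::2] = low_a
combined[1::2] = low_b

print(combined.size)  # 64: twice the pixels of a single exposure
```

Real superresolution pipelines estimate the sub-pixel shifts from the images themselves and solve a joint reconstruction problem, but the principle is the same: each shifted exposure samples the scene at positions the others missed.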

For the problem of resolving stellar objects, Leon Lucy has shown here that the resolution obtainable using N photons behaves as N^(-1/8) in the limit that N goes to infinity. Based on numerical experiments using the Richardson-Lucy deconvolution method, he obtains an estimate of N = 1.4 * 10^6 (0.2/x)^8 for the number of photons needed to reach a resolution of x times the diffraction-limited resolution. Because the number of photons needed grows as the 8th power, increasing the resolution by a large factor is not possible in practice; astronomers really do need to invest in larger telescopes to see more detail.
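Plugging a few values into that estimate makes the 8th-power blow-up concrete; every halving of x costs a factor of 2^8 = 256 in photons:

```python
# Lucy's estimate: photons needed to reach a resolution of x times
# the diffraction limit, N = 1.4e6 * (0.2 / x)**8.
def photons_needed(x):
    return 1.4e6 * (0.2 / x) ** 8

for x in (0.2, 0.1, 0.05):
    print(f"x = {x}: N ~ {photons_needed(x):.3g}")
# x = 0.2:  N ~ 1.4e+06
# x = 0.1:  N ~ 3.58e+08
# x = 0.05: N ~ 9.18e+10
```

Going from x = 0.2 to x = 0.05 (a 4x resolution gain) already demands roughly 10^11 photons, which is why the method is useful only for modest improvements.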

It should be noted that these methods work best for resolving point-like objects, or objects whose small-scale structure is known. So, in the case of a tree, to get the most out of deconvolution you should have a model of what leaves, branches, etc. look like. If the picture shows a barely visible tree, with the branches hard to make out and the leaves merged into one big green blur, you can in theory still recover a picture in which the tree and its leaves are visible, but this requires specifying a model of the possible shapes the leaves can have.

The Bayer array doesn't help the issue of diffraction. Blue light has the shortest wavelength and thus suffers less from diffraction than green or red, so the highest-resolving "conventional" sensor would be one sensitive only to blue light. But in addition to losing color, the images would look a bit weird, because blue contributes only about 10% to luminance.
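The size of the diffraction blur on the sensor scales directly with wavelength: the Airy disk diameter is about 2.44 * wavelength * f-number. A quick calculation at an assumed aperture of f/8 shows how much smaller the blue blur is:

```python
# Airy disk diameter on the sensor, in micrometres:
# d ~ 2.44 * wavelength * f-number (wavelength given in nm).
def airy_diameter_um(wavelength_nm, f_number):
    return 2.44 * wavelength_nm * 1e-3 * f_number

for name, wl in (("blue", 450), ("green", 550), ("red", 650)):
    print(f"{name} ({wl} nm): {airy_diameter_um(wl, 8):.1f} um")
# blue (450 nm): 8.8 um
# green (550 nm): 10.7 um
# red (650 nm): 12.7 um
```

At f/8 the red blur spot is roughly 45% wider than the blue one, which is the advantage a blue-only sensor would exploit.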

If you want to go even more unconventional, you could move from taking pictures in the visible part of the spectrum to UV light. CoastalOpt has a superb 60mm lens that can handle everything from UV to infrared. I'd imagine that a lens optimized just for UV would be phenomenal and would, IIRC, use quartz optics. Of course you'd also need a UV-optimized sensor....