Comments

edited

What version of Go are you using (go version)?

1.9-beta1

What operating system and processor architecture are you using (go env)?

Windows 10 64-bit

set GOARCH=amd64
set GOBIN=
set GOEXE=.exe
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOOS=windows
set GOPATH=C:\Work
set GORACE=
set GOROOT=C:\Go
set GOTOOLDIR=C:\Go\pkg\tool\windows_amd64
set GCCGO=gccgo
set CC=gcc
set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0
set CXX=g++
set CGO_ENABLED=1
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
set PKG_CONFIG=pkg-config

It would be really useful to have a Length() method on sync.Map that can return the number of items in a map. Currently, I need to do something like this, which is quite tedious and not as readable:

This comment has been minimized.

This comment has been minimized.

edited

I am storing items filled by goroutines processing data into 2 separate maps. Once this is done, I need to compare the length of these 2 maps to perform a quick sanity check and decide on which branch to proceed for further processing.

This comment has been minimized.

sync.Map is optimized for long-lived, mostly-write workloads, for which a Len method would either be misleading (under-counting keys) or inefficient (introducing cache contention on the length counter).

The Range workaround at least has the benefit of appearing to be as expensive as it actually is.

I'm not opposed to the idea of adding a Len method, but I think we would need to show that the use-case is common enough to merit an API with such subtleties. At the very least, I don't think we should add it in 1.9.

This comment has been minimized.

Delete calls (and Store calls with previously-deleted keys) on disjoint keys in the read-only part of the map do not contend in the current implementation. An atomic counter would reintroduce contention for those calls.

We didn't omit Len just to be stubborn. It really is a subtle problem.

This comment has been minimized.

I believe that having a .Count() or .Len() or even just adding a count integer would be a good addition for syncmap.Map. Personally I'm using sync.Map to track connections in an asynchronous p2p API that is heavily built around propagating network writes. The count is necessary for artificial connection limits and I had to switch from maps, because of raise conditions in goroutines.

Hope this shows some use-cases for you. I can think of various situations where this simple functionality would reduce code by many lines.

This comment has been minimized.

The count is necessary for artificial connection limits and I had to switch from maps, because of raise conditions in goroutines.

I think that does not justify adding a method to sync.Map, because it's easy to make your own derived type which just additionaly keeps the count/len information if you think such information is useful.

This comment has been minimized.

edited

If dirty has length oft N.
But 1sr : nobody is interested in dirty - if this blocks You, leave it out - O(1)
2nd : we can asume, that dirty is somehow const and small against the total amount - O(1) + const = O(1)
I know that in worst case (map is empty and dort is full) we could habe O(N) - this can be legt out or ignorred, because in real life it dies not matter. In real live the map is bigger then dirty and the amount of dirty is constant.. So, if it hurts You, comment the dirty loop out or leave it, because it does not matter in real live. And there it is O(1)

This comment has been minimized.

That assumes that the internal representation of sync.Map always includes a single dirty map. Nothing in the API or documentation guarantees that to be the case, and in fact some optimizations (such as #21035) may require the opposite.

This comment has been minimized.

My usecase for an O(1) Len() func is to pre-allocate a slice of data (an optimization) to hold a filtered subset of the the map values, populated by a call to Range(). FYI, my primary use for sync.Map is concurrent O(1) lookups.

So for my usecase, Len() doesn't need to be consistent since the worse case is an under-allocated slice (resulting in inconsistent performance) or an over-allocated slice (resulting in a bit more memory consumed).

This comment has been minimized.

edited

I just want to clarify on something here. What would you say the scope of requirements adding such functionally be?

Like, to make a decision to add this utility function a part of the API, are you looking for a large quantity of developer need/want for the feature?

Or are we just aiming to decide how it should work?

I'm the former situation I believe that a count feature would just be expected by developers considering that the map type can be used in len()

In the latter situation if it's something that we all want added, but comes down to deciding how it should operate, I think that leads to two potential situations:

1 - a somewhat ballpark count of the map would be necessary for an application. This could be non blocking and the result just needs to be close to the current state.

2 - a blocking function that needs to be timed for an absolute accurate count at the given moment.

A utility function could be made for both scenarios.

I think it is best to make at least the boilerplate functionality for developers to help evade bad practices due to niavity when using the API. Best practices with concurrency is not obvious to new developers and devs trying to adopt go. I do foresee newbies at least reading a godoc and being able to choose based on need.

This comment has been minimized.

are you looking for a large quantity of developer need/want for the feature?

Yes, and use-cases that it would (or would not) improve.

Or are we just aiming to decide how it should work?

Not at this time.

I think it is best […] to help evade bad practices due to [naivety] when using the API. Best practices with concurrency is not obvious to new developers and devs trying to adopt go.

I agree that it is important to structure the API to encourage best practices. One best practice for using a concurrent data structure is to avoid depending on global properties of it (such as size), because computing those global properties can incur a surprising cost. That is perhaps the strongest reason why sync.Map does not have a Len method today.

This comment was marked as off-topic.

edited

So whats the status of this issue? After this argument? Personally I just saw a million of developers are trying to persuade an arrogant golang maintainer that their requirement needs to be noticed.
Feel bad for this issue, typically on top of Golang.

This comment has been minimized.

edited

To be fair, I don't even think Sync.map has a large amount of projects that even need it at this time. Though I have 4 separate projects in which I am clumsily iterating my maps to get counts. I would still like to see something more standardized, planned out, and discussed; eventually implemented as a feature. That way I don't have to second guess if my code is bad later.

@bcmills in regards to you comment from Oct 9, 2017, I can definitely see what you're getting at. In this particular case, considering what sync.Map, I think not globalizing a way to track length is a bit of an oversight.

Hypothetically the type of applications that will use (or can benefit from) sync.Map are likely to be any concurrent tasks that would want to have managed access to a map. In server-side implementations where the server is delegating information between it's clients, it is quite common to want to have a count of things. If someone wanted to use this to concurrently push data sources into memory, a resulting count of things may be application after.

Something I've been considering for a while is to copy sync.Map, adding an integer variable, a function that outputs the value of the new integer, and then do addition/subtraction on it everytime Store() or Delete() is completed successfully.

This is starting to look more appealing than iterating the entire map to confirm the count. I'm in a situation where the amount of time it takes to count the entire map is slowing things down noticeably and I think that just giving the small compute time it takes to just do this plan will alleviate my problem.

This comment has been minimized.

edited

Hello, I am currently developing an application which matches up pairs of individual records in a stream of input data. I have hash-partitioned the data stream such that each of my go routines will use disjoint sets of keys in my sync.Map. However, sometimes a record won't have a corresponding match in the data stream. Over time, these records accumulate in the map. I'd like to routinely get an estimate of the map's size to trigger an eviction policy carried out by Range. This keeps the RAM usage of the program fairly constant without degrading performance.

Currently I am looking at implementing a solution similar to @protosam and maintaining a count myself using a derived type from sync.Map

I'm not sure what I'm asking for should be called Len, but it would certainly be useful.

This comment has been minimized.

edited

I think the Len (or Count) is usefull when a maximum number of values need to be enforced. I don't know if has a workaround. I use the sync.Map to store information about nodes/peers in the network, however one single client maybe don't like to store everything and only keep N values. Using the map is possible to do a simple if len(...) > and then ignore the insert, it can't be done directly with sync.Map.

This comment has been minimized.

We could, in theory, keep track of the count of items in the map using some sort of sharded counter (as proposed in #18802), but that would still be pure overhead for applications that don't need it.

That suggests that @protosam's idea to wrap the sync.Map with a separate counter may be the right direction: you can always implement the same method set in the wrapper, and then you would only pay the overhead of tracking the length for call sites that actually use that wrapper.

This comment has been minimized.

Hello, I am currently developing an application which matches up pairs of individual records in a stream of input data. I have hash-partitioned the data stream such that each of my go routines will use disjoint sets of keys in my sync.Map. However, sometimes a record won't have a corresponding match in the data stream. Over time, these records accumulate in the map. I'd like to routinely get an estimate of the map's size to trigger an eviction policy carried out by Range. This keeps the RAM usage of the program fairly constant without degrading performance.

Currently I am looking at implementing a solution similar to @protosam and maintaining a count myself using a derived type from sync.Map

I'm not sure what I'm asking for should be called Len, but it would certainly be useful.

This comment has been minimized.

We could, in theory, keep track of the count of items in the map using some sort of sharded counter (as proposed in #18802), but that would still be pure overhead for applications that don't need it.

That suggests that @protosam's idea to wrap the sync.Map with a separate counter may be the right direction: you can always implement the same method set in the wrapper, and then you would only pay the overhead of tracking the length for call sites that actually use that wrapper.

This comment has been minimized.

edited

Delete calls (and Store calls with previously-deleted keys) on disjoint keys in the read-only part of the map do not contend in the current implementation. An atomic counter would reintroduce contention for those calls.

We didn't omit Len just to be stubborn. It really is a subtle problem.

i still cant understand,
When Delete called , and go in the lock free path , delete a normal key ( turn point into nil) , it will return true , then I call atomic.AddInt64(&m.length, -1) which is thread safe func to reduce length.
（you can see it in https://github.com/mojinfu/cmap/blob/master/cmap.go line:353, and store logic line 153）
it would not reintroduce contention .

Which step of my understanding is incorrect?

This comment has been minimized.

@mojinfu, atomic operations are not contention-free: they just move the locking from the application layer to the CPU.

(If you have N CPU cores working with the same data concurrently, the atomic.AddInt64 operations are executed sequentially, so each operation takes O(N) time. Each core first acquires exclusive access to the cache line, then copies the value from the core that previously owned that line, and finally updates the cached value. On an Intel CPU, that process takes about 40ns per atomic op.)

This comment has been minimized.

edited

@mojinfu, atomic operations are not contention-free: they just move the locking from the application layer to the CPU.

(If you have N CPU cores working with the same data concurrently, the atomic.AddInt64 operations are executed sequentially, so each operation takes O(N) time. Each core first acquires exclusive access to the cache line, then copies the value from the core that previously owned that line, and finally updates the cached value. On an Intel CPU, that process takes about 40ns per atomic op.)

This comment has been minimized.

@protosam thank you for using my package.
After I implement this method ， I found that it's reasonable to remove Len method from package sync/map.
you can see it in the discussion between @bcmills and me.
"An atomic counter will slow down "create" and "delete" method 15ns when method be called "

As you said ：“what are the trade-offs we want between functionality or performance”
right , so also it can be understood like that " In different scenarios, the weights of trade-offs are different"
in my test , when program runs on a "one CPU" computer , it works More efficient than sync/map.
Especially you got a big map (len bigger than 10000 ), Every call saves 0.17 ms.

This comment has been minimized.

This comment has been minimized.

edited

I just stumbled on this issue while looking to see if there were plans for a method that reports whether it's deleted the key, so I figure I'll add my $0.02.

A current use case I finished writing ~5 minutes ago is extracting data from the Map via Range. I want to pre-size the buffer where I'm storing data because data extraction is a common operation. (I won't run into the corner case where, after pre-sizing the data, a competing goroutine clears the Map.)

Since whenever I need to use sync.Map I create a wrapper type with type-safe methods, I easily added a guess int64 field. This allowed me to have a "best guess" at how much memory I needed to pre-allocate.

I was replacing a normal map (which was using len(m)) with sync.Map, and the lack of a Len method caused me to think and realize that there's no way to 100% replicate the behavior of a Mutex-locked map.

Anyway, I think a Len method could be the source of subtle bugs and perhaps hamper future optimizations.

This comment has been minimized.

edited

My use case. I want to print a sample from the map - a single arbitrary object. I want to print a nice "No data in the map" message if the map is empty. Without the map.Size() API I need two more lines of code - declare a variable and set the variable in the call to Range().

To be completely frank I keep a counter of objects in all my maps for debug. However I think that map without a Size() API is an unusual animal.