Details

Description

The implementation of NamedList - while being fast - is not necessarily type-safe. I have implemented an additional implementation of the same - ModernNamedList (a type-safe variation providing the same interface as NamedList) - while preserving the semantics in terms of ordering of elements and allowing null elements for key and values (keys are always Strings , while values correspond to generics ).

Karthik K
added a comment - 14/Dec/08 05:01 1) New Type-safe implementation of NamedList , ModernNamedList .
2) New interface INamedList<T> created
3) New Test case - ModernNamedList added.
4) Added more test cases to NamedListTest .
Once the patch is approved - NamedList would be deprecated and the existing codebase in Solr would be replaced to use ModernNamedList<T> to be more type-safe.

Noble Paul
added a comment - 14/Dec/08 11:31 Type safety is not an overriding concern when a NamedList is used (that is the beauty of it). It does not help in any way. Most of the usages of NamedList involves heterogeneous values .
your implementation is not as efficient (memory usage) as the original one
The idea of having an interface is an overkill
-1

The interface is present only to enable the migration from NamedList (legacy) to the new one. (with similar properties of Cloneable, Serializable etc. ).

If type-safety is not a concern - is there a reason why NamedList<T> is defined as a generic type. We could probably define it as NamedList , with T replaced to be an object internally. not making it a generic type.

There seems to be no memory leaks as far as the container is concerned. Creating an iterator object for every call to an iterator seems to be quite a bit of data redundancy issues when ideally we can use the iterator of one of the underlying objects as well.

Karthik K
added a comment - 14/Dec/08 13:59 The interface is present only to enable the migration from NamedList (legacy) to the new one. (with similar properties of Cloneable, Serializable etc. ).
If type-safety is not a concern - is there a reason why NamedList<T> is defined as a generic type. We could probably define it as NamedList , with T replaced to be an object internally. not making it a generic type.
There seems to be no memory leaks as far as the container is concerned. Creating an iterator object for every call to an iterator seems to be quite a bit of data redundancy issues when ideally we can use the iterator of one of the underlying objects as well.

The interface is present only to enable the migration from NamedList (legacy) to the new one. (with similar properties of Cloneable, Serializable etc. ).

This means we will need to use this interface wherever we use NamedList which is not desirable

NamedList is designed to achieve a specific purpose

Solr is not meant for java users only. It is also meant to be consumed over xml/json etc .There is no type safety in these. NamedList helps us to have a datastructure which can easily be converted back and forth from these.

If type-safety is not a concern - is there a reason why NamedList<T> is defined as a generic type

It helps where it makes sense . But where it is not necessary I can totally omit that and javac does not complain. So , it is better to keep it generic than not.

There seems to be no memory leaks as far as the container is concerned

There are no leaks. I was referring to the internal implementation . instead of one big array you keep a list of objects (means more objects, one per entry) .

Creating an iterator object for every call to an iterator

iterating over NamedList does not need to use an iterator. That is a choice left to the consumer of the API

Noble Paul
added a comment - 14/Dec/08 15:45 The interface is present only to enable the migration from NamedList (legacy) to the new one. (with similar properties of Cloneable, Serializable etc. ).
This means we will need to use this interface wherever we use NamedList which is not desirable
NamedList is designed to achieve a specific purpose
Solr is not meant for java users only. It is also meant to be consumed over xml/json etc .There is no type safety in these. NamedList helps us to have a datastructure which can easily be converted back and forth from these.
If type-safety is not a concern - is there a reason why NamedList<T> is defined as a generic type
It helps where it makes sense . But where it is not necessary I can totally omit that and javac does not complain. So , it is better to keep it generic than not.
There seems to be no memory leaks as far as the container is concerned
There are no leaks. I was referring to the internal implementation . instead of one big array you keep a list of objects (means more objects, one per entry) .
Creating an iterator object for every call to an iterator
iterating over NamedList does not need to use an iterator. That is a choice left to the consumer of the API

The interface is mostly used as a proof of concept that the new class - ModernNamedList implements the same methods as that of NamedList. Eventually - the interface would be gotten rid of , once we are happy with the same.

As far as NamedList , I believe if we want to have the flexibility of allowing any type in it - we might as well define it to an Object. If we do qualify it as a specific type - then we might as well implement type-safety in the class. javac does not complain today because the compiler switch to indicate type-safety errors has been turned off.

Previous implementation used to be having a pre-jdk5 List , with members being String and a type depending on the index. The revised implementation has Map.Entry<String, T> interface - which is directly intuitive to what is required ( a Map with order being preserved , allowing duplicates and nulls ). I did profile with 2 different implementations , involving Map.Entry<?> and a heterogenous list with String and a type (with insertion / deletion of 100,000 records). The current implementation in fact , failed in the performance comparison in both insertion / deletion in the middle of the List ( remove () ) , since we have to add/remove elements twice from the List (as in the current impl) , as compared to 1 insertion/deletion in the Map.Entry<> implementation. ) Given that addition/deletion in the List is worst-case linear - I believe the perceived performance degradation due to additional object , turns out to be not so bad when compared to 2-path insertion / deletion as we have today.

NamedList does seem to implement the interface Iterable<T> . I am not sure how the consumer of the API can have independent iterators (since only NamedList is supposed to be aware of the internal data structures and not the consumer). So I believe it would be upto NamedList<T> to provide an iterator to the user of the API.

Karthik K
added a comment - 14/Dec/08 16:34 The interface is mostly used as a proof of concept that the new class - ModernNamedList implements the same methods as that of NamedList. Eventually - the interface would be gotten rid of , once we are happy with the same.
As far as NamedList , I believe if we want to have the flexibility of allowing any type in it - we might as well define it to an Object. If we do qualify it as a specific type - then we might as well implement type-safety in the class. javac does not complain today because the compiler switch to indicate type-safety errors has been turned off.
Previous implementation used to be having a pre-jdk5 List , with members being String and a type depending on the index. The revised implementation has Map.Entry<String, T> interface - which is directly intuitive to what is required ( a Map with order being preserved , allowing duplicates and nulls ). I did profile with 2 different implementations , involving Map.Entry<?> and a heterogenous list with String and a type (with insertion / deletion of 100,000 records). The current implementation in fact , failed in the performance comparison in both insertion / deletion in the middle of the List ( remove () ) , since we have to add/remove elements twice from the List (as in the current impl) , as compared to 1 insertion/deletion in the Map.Entry<> implementation. ) Given that addition/deletion in the List is worst-case linear - I believe the perceived performance degradation due to additional object , turns out to be not so bad when compared to 2-path insertion / deletion as we have today.
NamedList does seem to implement the interface Iterable<T> . I am not sure how the consumer of the API can have independent iterators (since only NamedList is supposed to be aware of the internal data structures and not the consumer). So I believe it would be upto NamedList<T> to provide an iterator to the user of the API.

ModernNamedList is being suggested as an alternate implementation of NamedList ... ideally the internals of NamedLIst would be replaced with the internals of ModernNamedList, but in this patch they are seperate classes so they can be compared.

INamedList is included in the patch as a way to demonstrate that ModernNamedList fulfills the same contract as NamedList (for the purposes of testing etc)

do i have those aspects correct?

with that in mind: i'm not sure i understand what "itch" changing the implementation "scratches" ... the initial issue description says it's because NamedList " is not necessarily type-safe" but it's not clear what that statement is referring to ... later comments suggest that the motivation is to improve the performance of "remove" ... which hardly seems like something worth optimizing for.

I agree that having the internals based on a "list of pairs" certainly seems like it might be more intuitive to developers looking at the internals (then the current approach is), but how is the current approach less type safe for consumers using just the NamedList API?

If the "modern" approach is more performant then the existing impl and passes all of the tests then i suppose it would make sense to switch – but i'm far more interested in how the performance compares for common cases (add/get/iterate) then for cases that hardly ever come up (remove).

My suggestion: provide two independent attachments. One patch that just replaces the internals of NamedList with the approach you suggest so people can apply the patch, test it out, and verify the API/behavior; A second attachment that provides some benchmarks against the NmaedList class – so people can read/run your benchmark with and with out the patch to see how the performance changes.

Hoss Man
added a comment - 17/Dec/08 06:12 If i'm understanding the discussion so far...
ModernNamedList is being suggested as an alternate implementation of NamedList ... ideally the internals of NamedLIst would be replaced with the internals of ModernNamedList, but in this patch they are seperate classes so they can be compared.
INamedList is included in the patch as a way to demonstrate that ModernNamedList fulfills the same contract as NamedList (for the purposes of testing etc)
do i have those aspects correct?
with that in mind: i'm not sure i understand what "itch" changing the implementation "scratches" ... the initial issue description says it's because NamedList " is not necessarily type-safe" but it's not clear what that statement is referring to ... later comments suggest that the motivation is to improve the performance of "remove" ... which hardly seems like something worth optimizing for.
I agree that having the internals based on a "list of pairs" certainly seems like it might be more intuitive to developers looking at the internals (then the current approach is), but how is the current approach less type safe for consumers using just the NamedList API?
If the "modern" approach is more performant then the existing impl and passes all of the tests then i suppose it would make sense to switch – but i'm far more interested in how the performance compares for common cases (add/get/iterate) then for cases that hardly ever come up (remove).
My suggestion: provide two independent attachments. One patch that just replaces the internals of NamedList with the approach you suggest so people can apply the patch, test it out, and verify the API/behavior; A second attachment that provides some benchmarks against the NmaedList class – so people can read/run your benchmark with and with out the patch to see how the performance changes.

ModernNamedList is being suggested as an alternate implementation of NamedList ... ideally the internals of NamedLIst would be replaced with the internals of ModernNamedList, but in this patch they are seperate classes so they can be compared.

INamedList is included in the patch as a way to demonstrate that ModernNamedList fulfills the same contract as NamedList (for the purposes of testing etc)

True.

Attached herewith is: NLProfile.java - that contains sample benchmarking against the 2 implementations (will work with the previous page on the page).

Some results:

addAll / getAll(): increase in performance is almost [1-10]% range.

add: increase in performance by around 30% , probably because of the additional growth in the List implementation when size approaches capacity. And since, in NamedList - we insert 2 elements as opposed to one, ( as done in ModernNamedList) - it might be more pronounced.

iterator: ~70% increase in performance in favor of the new implementation since it just reuses the iterator for the internal data structure.

The numbers should be present as comments in the corresponding methods - testAdd() , testAddAll(), testGetAll(), testIterator() .

I will attach the final patch once we are convinced with the benchmark methodology and the numbers.

Karthik K
added a comment - 18/Dec/08 00:02
ModernNamedList is being suggested as an alternate implementation of NamedList ... ideally the internals of NamedLIst would be replaced with the internals of ModernNamedList, but in this patch they are seperate classes so they can be compared.
INamedList is included in the patch as a way to demonstrate that ModernNamedList fulfills the same contract as NamedList (for the purposes of testing etc)
True.
Attached herewith is: NLProfile.java - that contains sample benchmarking against the 2 implementations (will work with the previous page on the page).
Some results:
addAll / getAll(): increase in performance is almost [1-10] % range.
add: increase in performance by around 30% , probably because of the additional growth in the List implementation when size approaches capacity. And since, in NamedList - we insert 2 elements as opposed to one, ( as done in ModernNamedList) - it might be more pronounced.
iterator: ~70% increase in performance in favor of the new implementation since it just reuses the iterator for the internal data structure.
The numbers should be present as comments in the corresponding methods - testAdd() , testAddAll(), testGetAll(), testIterator() .
I will attach the final patch once we are convinced with the benchmark methodology and the numbers.

While we're going down the micro-benchmarking path, I tried eliminating ArrayList and got an additional 15-25% gain on common operations (create new, add between 5 and 15 elements, and then iterate over those elements later). This was with Java 1.6. -Xbatch improved the results even more... ~40% - but this is just a micro-benchmark.

System.arrayCopy is great. It is bound to perform much better because of the native code for the same.

Meanwhile - w.r.t resize() - ( trade-off because increasing size a lot would increase memory usage. increase a size by a smaller factor would be resulting in a more frequent increases in size). I believe reading some theory that the ideal increase factor is somewhere close to ( 1 + 2^0.5) / 2 or something similar to that.

The method - ensureCapacity(capacity) in ArrayList (Java 6) also seems to be a number along the lines ~ (1.5)

int newCapacity = (oldCapacity * 3)/2 + 1;

+1 seems to be move away from 0, and keep incrementing the count. ( Hmm .. That piece of code - in Java 6 ArrayList can definitely make use of bitwise operators for the div-by-2 operation !!).

Karthik K
added a comment - 19/Dec/08 16:26 System.arrayCopy is great. It is bound to perform much better because of the native code for the same.
Meanwhile - w.r.t resize() - ( trade-off because increasing size a lot would increase memory usage. increase a size by a smaller factor would be resulting in a more frequent increases in size). I believe reading some theory that the ideal increase factor is somewhere close to ( 1 + 2^0.5) / 2 or something similar to that.
The method - ensureCapacity(capacity) in ArrayList (Java 6) also seems to be a number along the lines ~ (1.5)
int newCapacity = (oldCapacity * 3)/2 + 1;
+1 seems to be move away from 0, and keep incrementing the count. ( Hmm .. That piece of code - in Java 6 ArrayList can definitely make use of bitwise operators for the div-by-2 operation !!).

Hoss Man
added a comment - 20/Dec/08 02:05 If there are performance gains to be had in the common case i'm all for it ... but i still feel like i'm not understanding the original goal: how does this approach give us more type safety?
We also need to make sure we don't eliminate any public constructors, which seems to be the case based on my quick glance at the latest patch.

As part of ensuring type-safety , the previous code had a heterogenous List ctor. as before. I changed the access level and added another public ctor. ( Object [] ) with deprecated tag to it so that people are still able to use the functionality.

Otherwise - retaining the same signature after type safety would imply - people passing in a List of String-s and T-s , when the List expects Map.Entry<String , T > and would cause more confusion.

Thanks to the erasure of generics , List and List<Map.Entry<String, T>> are all equal , not helping here.
If backward compatibility is the key here- I can revisit the patch again ensuring the same.

If there are performance gains to be had in the common case i'm all for it ... but i still feel like i'm not understanding the original goal: how does this approach give us more type safety?

When I logged the issue - type-safety was the major reason behind the same. When I submitted by first patch and did the benchmarking - performance was also found to be a major constraint , (with incremental addition and creation of iterator objects). NamedList seemed to be used all over the place. As long as we preserve the contract of the methods - this should definitely give an additional boost - since I discovered as part of profiling of the launch of SolrCore ( CoreContainer.Initializer.initalize() .. ) .

Karthik K
added a comment - 20/Dec/08 02:20
We also need to make sure we don't eliminate any public constructors, which seems to be the case based on my quick glance at the latest patch.
<code>
public NamedList(List nameValuePairs) {
nvPairs=nameValuePairs;
+ protected NamedList(List<Map.Entry<String, T>> nameValuePairs)
{
+ nvPairs = nameValuePairs;
- }
</code>
As part of ensuring type-safety , the previous code had a heterogenous List ctor. as before. I changed the access level and added another public ctor. ( Object [] ) with deprecated tag to it so that people are still able to use the functionality.
Otherwise - retaining the same signature after type safety would imply - people passing in a List of String-s and T-s , when the List expects Map.Entry<String , T > and would cause more confusion.
Thanks to the erasure of generics , List and List<Map.Entry<String, T>> are all equal , not helping here.
If backward compatibility is the key here- I can revisit the patch again ensuring the same.
If there are performance gains to be had in the common case i'm all for it ... but i still feel like i'm not understanding the original goal: how does this approach give us more type safety?
When I logged the issue - type-safety was the major reason behind the same. When I submitted by first patch and did the benchmarking - performance was also found to be a major constraint , (with incremental addition and creation of iterator objects). NamedList seemed to be used all over the place. As long as we preserve the contract of the methods - this should definitely give an additional boost - since I discovered as part of profiling of the launch of SolrCore ( CoreContainer.Initializer.initalize() .. ) .

Ok, Now I think I understand your type safety goal – the existing implementation is type safe if-and-only-if the precondition of the "List" constructor is met (that it contains pairwise names/values) ... your goal is to make NamedList garuntee type correctness. (correct?)

If backward compatibility is the key here- I can revisit the patch again ensuring the same.

There are a lot of internal APIs where I wouldn't be opposed to fudging backwards compatibility in the interests of better code, but NamedList isn't one of them. it's the datastructure that gets used by almost any type of plugin people may write – any request handler or search component that wants to add data to the response is going to be constructing a new NamedList, so I'm definitely not on board breaking things for all of those people.

unfortunately, i think your goal is in direct and inherent opposition to backwards compatiblity.

As you mentioned, type erasure prevents us from adding a new "List<Entry<String, T>>" constructor while keeping the existing "List" constructor – but that's not the biggest problem. We could always use tricks (like adding an extra ignored arg to the new constructor, or making the new constructor take in an "Entry<String, T>[]" instead of a List) to maintain binary API compatibility, and then have the legacy constructor cast the List elements as needed to delegate to the new constructor .... EXCEPT .... binary API compatibilty is only part of backwards compatibility. The bigger problem is this sentence in the javadocs...

* @param nameValuePairs underlying List which should be used to implement a NamedList; modifying this List will affect the NamedList.
*/
public NamedList(List nameValuePairs) {

...it would be nice if that was just an implementation detail, and the javadocs said "... may affect the NamedList", but it says "... will affect the NamedList" Changing the internals (and that constructor) would change the behavior our from under people who have an existing expectation that they can maintain a refrence to the List and modify it to affect the NamedList. Unfortunately this isn't an academic point: I've actually seen this utilized in plugin code. People build up datastructures containing NamedLists, and then data is added to the underlying Lists backing those NamedLists after the fact (but before the NamedList is iterated by a response writer)

I just don't see any way to feasibly achieve your type-safe constructor goal while maintaining back-compat.

Hoss Man
added a comment - 15/Jan/09 05:00 Ok, Now I think I understand your type safety goal – the existing implementation is type safe if-and-only-if the precondition of the "List" constructor is met (that it contains pairwise names/values) ... your goal is to make NamedList garuntee type correctness. (correct?)
If backward compatibility is the key here- I can revisit the patch again ensuring the same.
There are a lot of internal APIs where I wouldn't be opposed to fudging backwards compatibility in the interests of better code, but NamedList isn't one of them. it's the datastructure that gets used by almost any type of plugin people may write – any request handler or search component that wants to add data to the response is going to be constructing a new NamedList, so I'm definitely not on board breaking things for all of those people.
unfortunately, i think your goal is in direct and inherent opposition to backwards compatiblity.
As you mentioned, type erasure prevents us from adding a new "List<Entry<String, T>>" constructor while keeping the existing "List" constructor – but that's not the biggest problem. We could always use tricks (like adding an extra ignored arg to the new constructor, or making the new constructor take in an "Entry<String, T>[]" instead of a List) to maintain binary API compatibility, and then have the legacy constructor cast the List elements as needed to delegate to the new constructor .... EXCEPT .... binary API compatibilty is only part of backwards compatibility. The bigger problem is this sentence in the javadocs...
* @param nameValuePairs underlying List which should be used to implement a NamedList; modifying this List will affect the NamedList.
*/
public NamedList(List nameValuePairs) {
...it would be nice if that was just an implementation detail, and the javadocs said "... may affect the NamedList", but it says "... will affect the NamedList" Changing the internals (and that constructor) would change the behavior our from under people who have an existing expectation that they can maintain a refrence to the List and modify it to affect the NamedList. Unfortunately this isn't an academic point: I've actually seen this utilized in plugin code. People build up datastructures containing NamedLists, and then data is added to the underlying Lists backing those NamedLists after the fact (but before the NamedList is iterated by a response writer)
I just don't see any way to feasibly achieve your type-safe constructor goal while maintaining back-compat.

Kay: It occurs to me that one thing we could do is invert the nature of your patch:

add an "Entry<String, T>[]" constructor (with a warning, but no commitment, that modifying the array conents may affect the NamedList) which builds up a pairwise List<Object> and delegates to the existing "List" constructor

deprecate the existing List constructor.

Anyone using the new constructor will get the type safety benefits (not entirely enforced by the compiler, but enforced by the contract) and at some later date (Solr 2?) we can remove the "List" constructor and replace the guts of the class with your approach (to get the perf improvements)

Hoss Man
added a comment - 17/Jan/09 00:50 Kay: It occurs to me that one thing we could do is invert the nature of your patch:
add an "Entry<String, T>[]" constructor (with a warning, but no commitment, that modifying the array conents may affect the NamedList) which builds up a pairwise List<Object> and delegates to the existing "List" constructor
deprecate the existing List constructor.
Anyone using the new constructor will get the type safety benefits (not entirely enforced by the compiler, but enforced by the contract) and at some later date (Solr 2?) we can remove the "List" constructor and replace the guts of the class with your approach (to get the perf improvements)
thoughts?

Logged a separate issue - SOLR-967 for the temporary fix and tracking progress on the same. This issue changes the internals of the class that would be revisited after adopting a proper migration path , per SOLR-967 .

Karthik K
added a comment - 19/Jan/09 17:45 Logged a separate issue - SOLR-967 for the temporary fix and tracking progress on the same. This issue changes the internals of the class that would be revisited after adopting a proper migration path , per SOLR-967 .

Karthik K
added a comment - 04/Dec/09 08:04 So - what is the current status of this.
At what time are we planning to deprecate the old implementation and incorporate this new d.s ( changing the guts of the internals ).

Noble Paul
added a comment - 04/Dec/09 08:31 Type safety is generally not required in NamedList. very often we use heterogeneous NamedList.
Creating an Entry Object per entry is memory inefficient compared to the existing one.
type safety is there for the users of NamedList API even now. Internally how we manage it is not so important i feel

Type safety is generally not required in NamedList. very often we use heterogeneous NamedList.

Creating an Entry Object per entry is memory inefficient compared to the existing one.

type safety is there for the users of NamedList API even now. Internally how we manage it is not so important i feel

The performance numbers in here say a different story. The heterogenous NamedList data structure is not intuitive w.r.t code and performs poorly compared to the revised one as put in here. As regarding Entry Objects being a memory hog - do we have some stats to back it up. Otherwise it is premature to call that a memory optimization.

Karthik K
added a comment - 04/Dec/09 21:41
Type safety is generally not required in NamedList. very often we use heterogeneous NamedList.
Creating an Entry Object per entry is memory inefficient compared to the existing one.
type safety is there for the users of NamedList API even now. Internally how we manage it is not so important i feel
The performance numbers in here say a different story. The heterogenous NamedList data structure is not intuitive w.r.t code and performs poorly compared to the revised one as put in here. As regarding Entry Objects being a memory hog - do we have some stats to back it up. Otherwise it is premature to call that a memory optimization.

Chris A. Mattmann
added a comment - 04/Dec/09 23:03 The heterogenous NamedList data structure is not intuitive w.r.t code and performs poorly compared to the revised one as put in here.
+1 to this. NamedList is not a very intuitive structure at all, which I remarked on SOLR-1516 .
Cheers,
Chris

Resurrecting that issue, I reviewed NamedList and I don't understand why it has to be so complicated:

Its <T> generic results in TONS of warnings in eclipse, for no good reason. I don't buy this comment made on the issue that it's better to make it generic than not. If we have a generic API which isn't used like that, it calls for a bad API IMO. From what I can see, NamedList is not supposed to be generic at all, as its main purpose is to allow you to store a heterogeneous list of <name,value> pairs, where name is always String, and the type of value may differ. If we want to make it more convenient for people who know e.g. all values are Strings, we can add sugar methods like getInt(), getString().... I've also briefly reviewed some classes that use NamedList (outside of tests), and none seem to rely on <T> at all. So I'd rather we remove that generic from the API signature.

There is what seems to be a totally redundant SimpleOrderedMap class, which has contradicting documentation in its class-jdocs:

a NamedList where access by key is more important than maintaining order

This class does not provide efficient lookup by key

But the class doesn't add any additional data structures, contains only 3 ctors which delegate as-is to NamedList and offers a clone() which is identical to NamedList.clone(). Yet there are 574 references to it (per-eclipse) ... I think this class is just confusing and has to go away.

Leaving performance aside for a second, NamedList could simply hold an internal Map<String,List<Object>> to enable efficient access by key, remove all values of a key, access a key's values in order etc. It doesn't allow accessing the <name,value> pairs in any order though, i.e. getVal(i). I don't know how important is this functionality though .. i.e. if we replaced it with a namesIterator(), would it not allow roughly the same thing? I'm kind of sure it does, but there are so many uses of NamedList across the Solr code base that I might be missing a case which won't like it. So I'd like to ask the Solr folks who know this code better than me – how important is getName/Val(i)?

Now back to performance, for a sec, in order to not always allocate a List<Object> when NamedList is used to hold only one value per parameter, we can either:

Use Collections.singletonList() on first add, and change to a concrete List on the second add only.

Use an Object[], it's less expensive than a List object.

Use a Map<String,Object> internally and do instanceof checks on add/get as appropriate.

BUT, if accessing/removing values by name is not important and it's OK if get(i) is run on O(N), we can simply simplify the class, like Yonik's proposal above, to hold an Object[] array (instead of List). But I think we should remove the generic anyway.

Maybe we should break this down into 3 issues:

Get rid of SimpleOrderedMap – if it's important to keep in 4x, I can deprecate and move all uses of it to NamedList directly.

Remove the generics from NamedList's API. We can add sugar getters for specific types if we want.

Simplify NamedList internal implementation. On the performance side – how critical is NamedList on the execution path? I don't like micro-benchmarks too much, so if NamedList is only a fraction of an entire execution path, I'd rather it's even a tad slower but readable and easier to use/maintain, than if it's overly complicated only to buy us a few nanos in the overall request.

Shai Erera
added a comment - 03/Aug/14 07:19 - edited Resurrecting that issue, I reviewed NamedList and I don't understand why it has to be so complicated:
Its <T> generic results in TONS of warnings in eclipse, for no good reason. I don't buy this comment made on the issue that it's better to make it generic than not. If we have a generic API which isn't used like that, it calls for a bad API IMO. From what I can see, NamedList is not supposed to be generic at all, as its main purpose is to allow you to store a heterogeneous list of <name,value> pairs, where name is always String, and the type of value may differ. If we want to make it more convenient for people who know e.g. all values are Strings, we can add sugar methods like getInt(), getString()... . I've also briefly reviewed some classes that use NamedList (outside of tests), and none seem to rely on <T> at all. So I'd rather we remove that generic from the API signature.
There is what seems to be a totally redundant SimpleOrderedMap class, which has contradicting documentation in its class-jdocs:
a NamedList where access by key is more important than maintaining order
This class does not provide efficient lookup by key
But the class doesn't add any additional data structures, contains only 3 ctors which delegate as-is to NamedList and offers a clone() which is identical to NamedList.clone(). Yet there are 574 references to it (per-eclipse) ... I think this class is just confusing and has to go away.
Leaving performance aside for a second, NamedList could simply hold an internal Map<String,List<Object>> to enable efficient access by key, remove all values of a key, access a key's values in order etc. It doesn't allow accessing the <name,value> pairs in any order though, i.e. getVal(i) . I don't know how important is this functionality though .. i.e. if we replaced it with a namesIterator() , would it not allow roughly the same thing? I'm kind of sure it does, but there are so many uses of NamedList across the Solr code base that I might be missing a case which won't like it. So I'd like to ask the Solr folks who know this code better than me – how important is getName/Val(i) ?
Now back to performance, for a sec, in order to not always allocate a List<Object> when NamedList is used to hold only one value per parameter, we can either:
Use Collections.singletonList() on first add , and change to a concrete List on the second add only.
Use an Object[] , it's less expensive than a List object.
Use a Map<String,Object> internally and do instanceof checks on add/get as appropriate.
BUT, if accessing/removing values by name is not important and it's OK if get(i) is run on O(N), we can simply simplify the class, like Yonik's proposal above, to hold an Object[] array (instead of List). But I think we should remove the generic anyway.
Maybe we should break this down into 3 issues:
Get rid of SimpleOrderedMap – if it's important to keep in 4x, I can deprecate and move all uses of it to NamedList directly.
Remove the generics from NamedList's API. We can add sugar getters for specific types if we want.
Simplify NamedList internal implementation. On the performance side – how critical is NamedList on the execution path? I don't like micro-benchmarks too much, so if NamedList is only a fraction of an entire execution path, I'd rather it's even a tad slower but readable and easier to use/maintain, than if it's overly complicated only to buy us a few nanos in the overall request.

One other thing, how important is it to be able to store null names? I haven't dug deep through the code – do we actually use it? This doesn't prevent us from using a Map internally, as we can use our own key, something like "$$NULL_STRING!!" (or pick some other constant) and map the null-name requests to this key.

Shai Erera
added a comment - 03/Aug/14 07:31 One other thing, how important is it to be able to store null names? I haven't dug deep through the code – do we actually use it? This doesn't prevent us from using a Map internally, as we can use our own key, something like "$$NULL_STRING!!" (or pick some other constant) and map the null-name requests to this key.

I built a patch that eliminated SimpleOrderedMap, but it's huge and intrusive. Based on how often SimpleOrderedMap is used and the number of times that a SimpleOrderedMap object was actually named "nl" ... I am guessing that people assumed they could rely on the note in the NamedList javadoc about SimpleOrderedMap, or perhaps the notes in the javadoc for SimpleOrderedMap itself (which contradicts what it says in NamedList).

If somebody knows how to actually implement what SimpleOrderedMap promises, we might see a slight increase in Solr's performance, and the patch will be far less intrusive.

Shawn Heisey
added a comment - 06/Aug/14 05:24 I built a patch that eliminated SimpleOrderedMap, but it's huge and intrusive. Based on how often SimpleOrderedMap is used and the number of times that a SimpleOrderedMap object was actually named "nl" ... I am guessing that people assumed they could rely on the note in the NamedList javadoc about SimpleOrderedMap, or perhaps the notes in the javadoc for SimpleOrderedMap itself (which contradicts what it says in NamedList).
If somebody knows how to actually implement what SimpleOrderedMap promises, we might see a slight increase in Solr's performance, and the patch will be far less intrusive.

Hi Shawn. See SOLR-6315 which I opened to do exactly that and ran into issues with removing SimpleOrderedMap. I've attached a patch there with my progress thus far. I think we should collaborate there - it may not be possible to remove the class entirely, unless we add something to NamedList (e.g. preferMapOutput or something), but I'm sure we can improve the current situation (even if only javadocs).

Shai Erera
added a comment - 06/Aug/14 07:01 Hi Shawn. See SOLR-6315 which I opened to do exactly that and ran into issues with removing SimpleOrderedMap. I've attached a patch there with my progress thus far. I think we should collaborate there - it may not be possible to remove the class entirely, unless we add something to NamedList (e.g. preferMapOutput or something), but I'm sure we can improve the current situation (even if only javadocs).