.NET implementation of Avro

Details

Initial work towards .NET support for Avro. Support for most Avro types is working, along with some of the code generation. Stream support still needs to be added, and unions need to be fleshed out as well. Type serialization is currently handled by a Reflection.Emit system that generates serialization code dynamically at runtime.

Activity

Jeremy Custenborder
added a comment - 29/Apr/10 19:08 I am happy to accept this issue. I've already been working on a port of Avro to .NET. I'm currently in the middle of moving to Austin, TX, so progress will be minimal over the next couple of weeks. What I have done so far is available at http://github.com/jcustenborder/avro . My plan is to get things up and running with code generation support by midsummer, hopefully sooner. The cool part is that I need cross-platform RPC at my new job, so this will be one of my early development tasks.
If you want to take what is in my github as an initial patch you are more than welcome. So far it passes a lot of the schema tests and would be a great starting point for anyone that wants to help.
My long-term goal is to have a fast client and server implementation that uses Reflection.Emit under the hood for schema serialization.

Jeremy Custenborder
added a comment - 06/May/10 22:50 Here is a patch of what I have done so far. Just to make things nice and confusing I changed my development environment to follow http://wiki.apache.org/hadoop/GitAndHadoop so now I'm working out of a branch instead of trunk. This changes my github url for anyone who is interested. The correct github url is http://github.com/jcustenborder/avro/tree/dotnet-port
I'm not too familiar with Apache patch procedures so please let me know if you need anything modified.

Thiruvalluvan M. G.
added a comment - 19/Aug/10 03:51 Hi Jeremy,
I've started looking into it. I couldn't get it to compile because the compiler complains that namespaces like Newtonsoft and log4net are not found. I was not able to build the solutions in the contrib directory either; they report problems with further dependencies.
I'm using Visual C# Express 2010 on Windows XP.
Can you add a README at lang/dotnet that explains how to setup for building the project?
Thanks
Thiru

Jeremy Custenborder
added a comment - 19/Aug/10 04:32 Hi Thiru,
Sorry about the missing dependencies. It looks like the patch process ignores binary files. I'm not really familiar with the patch process. Should I just add the contrib folder as a zip file to this ticket?
J

Thiruvalluvan M. G.
added a comment - 19/Aug/10 05:12 - edited Hi Jeremy,
Thanks for the quick response.
That is one way. But if your git repository is current, I can pull the stuff from there. Once we converge, let's move back to patch. I guess it will take a few iterations and having large zip attachments every time will be a problem.
Thiru

Thiruvalluvan M. G.
added a comment - 21/Aug/10 17:45 Hi Jeremy,
I could get the code to compile on my machine. I'm trying to understand the code from the bottom up. Since the schema and encoding/decoding are at the very bottom, the following comments apply to those parts of the code. Please bear with me; it'll take a while to go through the whole codebase. Since we hope the code will live for several years, a couple of weeks spent getting it right should be worth it.
Most of the tests passed on my machine, some failed. One test even crashes NTest UI.
The class hierarchy of Schema is:
Schema
  ArraySchema
  MapSchema
  NamedSchema
    EnumSchema
    FixedSchema
    RecordSchema
    ErrorSchema
  PrimitiveSchema
    BooleanSchema
    NullSchema
  UnionSchema
It's not clear why only NullSchema and BooleanSchema are specified while the rest of the primitive schemas are left out. I think a more appropriate hierarchy would be:
Schema
  NamedSchema
    EnumSchema
    FixedSchema
    RecordSchema/ErrorSchema
  UnnamedSchema
    ArraySchema
    MapSchema
    PrimitiveSchema
    UnionSchema
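For illustration only, the proposed hierarchy can be sketched as class declarations. The sketch is in Java, the reference Avro implementation this thread keeps comparing against. The fields on each class are assumptions, not the patch's actual members, and a C# version would translate directly.

```java
import java.util.List;

// Hypothetical mirror of the proposed hierarchy; fields are illustrative assumptions.
abstract class Schema { }

// Schemas that carry a (possibly namespaced) full name.
abstract class NamedSchema extends Schema {
    final String name;
    NamedSchema(String name) { this.name = name; }
}

final class EnumSchema extends NamedSchema {
    final List<String> symbols;
    EnumSchema(String name, List<String> symbols) { super(name); this.symbols = List.copyOf(symbols); }
}

final class FixedSchema extends NamedSchema {
    final int size;
    FixedSchema(String name, int size) { super(name); this.size = size; }
}

// Records and protocol errors share one class; a flag tells them apart.
final class RecordSchema extends NamedSchema {
    final boolean isError;
    RecordSchema(String name, boolean isError) { super(name); this.isError = isError; }
}

// Schemas identified purely by their structure.
abstract class UnnamedSchema extends Schema { }

final class ArraySchema extends UnnamedSchema {
    final Schema items;
    ArraySchema(Schema items) { this.items = items; }
}

final class MapSchema extends UnnamedSchema {
    final Schema values; // Avro map keys are always strings
    MapSchema(Schema values) { this.values = values; }
}

// One class covers every primitive (null, boolean, int, ...), instead of one subclass each.
final class PrimitiveSchema extends UnnamedSchema {
    final String typeName;
    PrimitiveSchema(String typeName) { this.typeName = typeName; }
}

final class UnionSchema extends UnnamedSchema {
    final List<Schema> branches;
    UnionSchema(List<Schema> branches) { this.branches = List.copyOf(branches); }
}
```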
Instead of having SchemaType as a string, it'd be better to have it as an enum, which, unlike string, is close-ended.
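Here is a minimal sketch of the enum idea, written in Java for illustration. The name `SchemaType` comes from the comment; `fromName` is a hypothetical helper. Unknown type names are rejected when the schema is parsed, instead of travelling through the code as arbitrary strings.

```java
import java.util.Locale;

// Closed set of Avro type tags (ERROR is used for protocol error records).
enum SchemaType {
    NULL, BOOLEAN, INT, LONG, FLOAT, DOUBLE, BYTES, STRING,
    RECORD, ENUM, ARRAY, MAP, UNION, FIXED, ERROR;

    // Maps a JSON type name such as "int" to its tag; anything else fails immediately.
    static SchemaType fromName(String name) {
        for (SchemaType t : values()) {
            if (t.name().toLowerCase(Locale.ROOT).equals(name)) {
                return t;
            }
        }
        throw new IllegalArgumentException("Unknown Avro type: " + name);
    }
}
```

With a string, a typo like "integr" would only surface wherever the string is finally compared. With the enum, it fails at parse time.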
Testing appears inadequate. In TestSchema, for example the test only covers parsing.
In the Encoder and Decoder interfaces, we accept the Stream in every function. Typically, a lot of encoding/decoding happens on the same stream, so attaching the stream to the encoder/decoder once would be better, as we do in the Java implementation.
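Below is a sketch of the stream-once design, using a hypothetical `BinaryEncoderSketch` class in Java. The stream is bound in the constructor, and each write method takes only the value. The encodings follow the Avro binary spec: int and long are zig-zag encoded, then written as a variable-length base-128 number; strings are a long length followed by UTF-8 bytes.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

final class BinaryEncoderSketch {
    private final OutputStream out; // bound once, reused by every write call

    BinaryEncoderSketch(OutputStream out) { this.out = out; }

    // Avro int and long share one wire format, so int simply widens to long.
    void writeInt(int n) throws IOException {
        writeLong(n);
    }

    void writeLong(long n) throws IOException {
        long z = (n << 1) ^ (n >> 63);      // zig-zag: small magnitudes become small unsigned values
        while ((z & ~0x7FL) != 0) {         // emit 7 bits at a time, low-order group first
            out.write((int) ((z & 0x7F) | 0x80)); // high bit set means "more bytes follow"
            z >>>= 7;
        }
        out.write((int) z);
    }

    void writeString(String s) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        writeLong(bytes.length);
        out.write(bytes);
    }
}
```

Callers create one encoder per stream and then make many small calls. This also leaves room for the encoder to buffer internally later without changing its interface.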
I see some files have both Windows and Unix line-endings. I don't know where the problem is. Do you think keeping Windows line-ending throughout will keep us from these troubles?
In most places the tabs are expanded into spaces, which is perfect. But certain tabs seem to have escaped. We try to avoid hard tabs.
We can save quite a bit of code if we use parameterized tests. There is an attempt at parameterization in TestSchema, but I think it leads to some problems: when a test fails, it becomes hard to find which case failed.
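NUnit 2.5's parameterized tests give each data case its own entry in the results. The framework-free Java sketch below imitates that idea with labelled cases, so a failure names the exact input. The class name, the `Case` record, and the use of zig-zag encoding as the function under test are all illustrative.

```java
import java.util.ArrayList;
import java.util.List;

final class ZigZagCases {
    // One row of test data: a label for reporting, the input, and the expected result.
    record Case(String label, long input, long expected) { }

    static long zigZag(long n) { return (n << 1) ^ (n >> 63); }

    static final List<Case> CASES = List.of(
        new Case("zero", 0, 0),
        new Case("minus one", -1, 1),
        new Case("one", 1, 2),
        new Case("minus two", -2, 3),
        new Case("max int", Integer.MAX_VALUE, 4294967294L));

    // Runs every case and returns a message per failure, each starting with the case label.
    static List<String> failures(List<Case> cases) {
        List<String> failed = new ArrayList<>();
        for (Case c : cases) {
            long actual = zigZag(c.input());
            if (actual != c.expected()) {
                failed.add(c.label() + ": expected " + c.expected() + " but got " + actual);
            }
        }
        return failed;
    }
}
```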
I tried to capture all the above ideas (regarding the Schema, not the encoder/decoder) by refactoring the code. I kept it at git@github.com:thirumg/Avro.NET.git in the master branch.
Please feel free to pull it and try it. I have commented out certain tests and code in Avrogen because of this refactoring. Please don't misunderstand me: my intention was not to arbitrarily change your code. I thought some ideas are better shown by demonstration than by pages and pages of text.
Please let me know what you think.
Do you have any suggestion for a code coverage tool? It'd be nice to know how much coverage we get with our tests.
Thanks
Thiru

Jeremy Custenborder
added a comment - 21/Aug/10 23:08 Hey Thiru,
The main reason I put out this patch is that a few other corporate developers have been in contact with me about my fork. This is an initial contribution, not a completely working release. Jeff Hammerbacher asked me to create a patch of what I have done so far, and this is that patch.
"Most of the tests passed on my machine, some failed. One test even crashes NTest UI."
I'm aware of that. Schema parsing still needs some work. Testing the interop schema causes a stack overflow.
"It's not clear why only NullSchema and BooleanSchema are specified while the rest of the primitive schemas are left out. I think a more appropriate hierarchy would be:"
At one point I had a schema definition for all of the primitive types. I removed all of these in favor of static variables such as Primitive.Boolean and Primitive.Null. I forgot to remove the NullSchema and BooleanSchema classes. These should be removed.
"Instead of having SchemaType as a string, it'd be better to have it as an enum, which, unlike string, is close-ended."
This was my first approach, but I ended up moving away from the enum. For the life of me I cannot remember why.
"Testing appears inadequate. In TestSchema, for example, the test only covers parsing."
Yep, it's not a full release. It's only what I have finished so far. You are welcome to add tests.
"I see some files have both Windows and Unix line-endings. I don't know where the problem is. Do you think keeping Windows line-endings throughout will keep us from these troubles?"
This must have been my git settings on Windows. Do you want me to try to correct it and then resubmit?
"I tried to capture all the above ideas (regarding the Schema, not the encoder/decoder) by refactoring the code. I kept it at git@github.com:thirumg/Avro.NET.git in the master branch.
Please feel free to pull it and try it. I have commented out certain tests and code in Avrogen because of this refactoring. Please don't misunderstand me: my intention was not to arbitrarily change your code. I thought some ideas are better shown by demonstration than by pages and pages of text."
Instead of commenting out the tests, let's put the Explicit attribute on them. Explicit tests run only when you select them explicitly; when you run all of the tests, they are skipped.
Changing the code does not bother me. The main reason I put this patch together is that I have been contacted by interested parties who, due to corporate legal concerns, needed the code to be submitted to the ASF before they could add their contributions.
"Do you have any suggestion for a code coverage tool? It'd be nice to know how much coverage we get with our tests."
NCover?

Jeff Hammerbacher
added a comment - 01/Sep/10 08:24 Hey,
It looks like this patch might slip the 1.4.0 release. Not a big deal, but would love to see it cleaned up and committed to trunk. Any progress since Jeremy's comments?
Thanks,
Jeff

Thiruvalluvan M. G.
added a comment - 02/Sep/10 14:11 Yeah, I started working on this. I've finished refactoring the "lower" layers, viz. schema, decoder and encoder, and tested them. I'm working on the generic reader/writer. I should be able to submit a patch for these in a few days. Then I'll take up the "upper" layers: data file, IPC and codegen.

Thiruvalluvan M. G.
added a comment - 06/Sep/10 19:23 This patch is a trimmed and cleaned-up version of Jeremy Custenborder's work.
The patch is pretty complete in terms of Schema, binary encoding, binary decoding, generic reader, generic writer, schema resolution for the generic reader, and documentation for these. The only known issue is that it does not resolve schemas when the writer's schema lacks a field that the reader's schema has with a default value. I don't have a clean solution for this yet.
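The missing rule can be sketched as follows. The sketch assumes, hypothetically, that the writer's record has already been decoded into a name-to-value map; a real streaming decoder reads fields in the writer's order and skips unknown ones instead. Per the Avro spec's resolution rules:

- fields present only in the reader's schema take the reader's default;
- resolution fails when such a field has no default;
- writer-only fields are ignored.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DefaultResolution {
    // Hypothetical reader-side field description; hasDefault distinguishes "no default" from a null default.
    record ReaderField(String name, Object defaultValue, boolean hasDefault) { }

    // Builds the resolved record in the reader's field order.
    static Map<String, Object> resolve(List<ReaderField> readerFields, Map<String, Object> writerRecord) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (ReaderField f : readerFields) {
            if (writerRecord.containsKey(f.name())) {
                result.put(f.name(), writerRecord.get(f.name()));   // value supplied by the writer
            } else if (f.hasDefault()) {
                result.put(f.name(), f.defaultValue());             // writer lacks it: use the reader's default
            } else {
                throw new IllegalStateException("No value or default for field " + f.name());
            }
        }
        // Fields only the writer has are never copied, i.e. they are ignored.
        return result;
    }
}
```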
The patch has a fair collection of unit tests using NUnit.
The code compiles on a Windows machine with Microsoft Visual C# 2010. The only dependency is NUnit 2.5.x; since I use parameterized (data-driven) tests, the minimum NUnit version is 2.5.
To be useful, the following tasks should be completed:
Compare the binary data produced to that by other implementations and ensure they are equivalent
Make the code compile with Mono on a Linux system
Add support for datafile, IPC and code generation
Fix the resolution-with-default-value problem described above
Run some benchmarks and compare the performance with that of other implementations such as Java
Describe the design in a document
Have some install script
I'm planning to undertake these tasks in roughly the same order as above.
Comments are welcome.

Thiruvalluvan M. G.
added a comment - 14/Sep/10 17:40 In terms of functionality, this patch is essentially the same as the previous one, but works on both Windows and Linux.
The patch contains the following three libraries:
Newtonsoft.Json.Net20.dll from http://james.newtonking.com/projects/json-net.aspx (Creative Commons Attribution 2.5 License)
log4net.dll from http://logging.apache.org/log4net/ (Apache 2.0 license)
nunit.framework.dll from http://www.nunit.org/ (zlib/libpng license)
I've copied these files from a larger download. Can someone confirm that it's legal to copy the individual files and to use them in our project? Thanks

Doug Cutting
added a comment - 14/Sep/10 18:14 These licenses are all acceptable for inclusion.
http://www.apache.org/legal/resolved.html#category-a
For any non-Apache license, add an entry at the end of Avro's LICENSE.txt of the form:
For the lang/c-sharp/../json-net.aspx component:
.. text of its license...
If the code contains copyright notices, these should be similarly added to NOTICE.txt.

Thiruvalluvan M. G.
added a comment - 18/Dec/10 17:08 Hi Petar,
I'm really sorry, I couldn't find time to complete this.
I'm presently working on Avro C++ as my company needs it. I plan to come back to this in Jan.

Dona Alvarez
added a comment - 27/Jan/11 20:44 Hi Thiru,
You were introduced to me by Doug Cutting. I will be working on completing the dotnet implementation of avro. I grabbed the code from Jeremy's git ( https://github.com/jcustenborder/avro/tree/dotnet-port ) to use as my code base, but I can't apply the patches that you added here - the directory structure between the patch and git is not the same.
Would you be able to help me get the latest dotnet code to start with?
Thank you
Dona

Thiruvalluvan M. G.
added a comment - 28/Jan/11 03:09 Hi Dona,
The patch (AVRO-533.patch) here is derived from Jeremy's original work (AVRO-533.zip). I refactored a lot, fixed functionality gaps, added quite a bit of unit testing (over 80% coverage when I last measured), and it has a cleaner design. It also works with both Microsoft .NET and Mono. I suggest you start from AVRO-533.patch.
If you are interested in basic Avro encoding and decoding, the code in this patch should be good enough except for one issue: when resolving schemas, it does not insert default values (for fields that the reader's schema has but the writer's schema lacks).
Unimplemented functionality: (1) the Avro data file format and (2) Avro RPC. (1) should be easy to implement, as there is not much work and the current design lends itself to data files; (2) requires much more work.
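To give a sense of the data file work, here is a sketch of the object container file header as the Avro spec describes it. The header consists of:

- the magic bytes `Obj` plus version byte 1;
- a metadata map whose values are bytes;
- a 16-byte random sync marker.

The sketch is in Java for illustration, and `ContainerHeaderSketch` is a hypothetical name. It writes only `avro.schema`; when `avro.codec` is absent, the spec treats the codec as `null`.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;

final class ContainerHeaderSketch {
    static final byte[] MAGIC = {'O', 'b', 'j', 1};

    // Avro long: zig-zag, then base-128 varint.
    static void writeLong(OutputStream out, long n) throws IOException {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Avro bytes/string: length, then the raw bytes.
    static void writeBytes(OutputStream out, byte[] b) throws IOException {
        writeLong(out, b.length);
        out.write(b);
    }

    // Writes the header and returns the sync marker the writer repeats after each data block.
    static byte[] writeHeader(OutputStream out, String schemaJson) throws IOException {
        out.write(MAGIC);
        // Metadata is an Avro map: one block with one entry, then a zero count ends the map.
        writeLong(out, 1);
        writeBytes(out, "avro.schema".getBytes(StandardCharsets.UTF_8));
        writeBytes(out, schemaJson.getBytes(StandardCharsets.UTF_8));
        writeLong(out, 0);
        byte[] sync = new byte[16];
        new SecureRandom().nextBytes(sync);
        out.write(sync);
        return sync;
    }
}
```

After the header come the data blocks. Each block is an object count, the byte size of the serialized objects, the objects themselves, and the same sync marker.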

Jeremy Custenborder
added a comment - 31/Jan/11 18:29 Dona,
I can also be of some assistance. I'm sorry I didn't have time to complete the project. I joined a startup and most of my time was consumed there. I'm getting to a point where I can contribute again, so I'd be interested in seeing where I can help out. You are more than welcome to email me if you have any questions about why I implemented something one way or another. One thing I have not done is cross-platform testing; for example, I haven't tried serializing something in the .NET version and reading it from the Java version.
If you clone from my repo, you get the version that is included in AVRO-533.zip. My zip patch contained some external libraries that I'm using; I didn't know how to include binaries in the patching process, hence the zip. I believe Thiru took my patch, refactored it, and added some unit tests. I haven't looked at that work yet, but I believe that is what Thiru's two patches contain.
Hit me up if I can be of any assistance.
j

Doug Cutting
added a comment - 02/Feb/11 23:39 Adding a set of templates to Java's SpecificCompiler might simplify a C# implementation. In addition to the templates, we'd also need to add some new C#-specific utility methods to SpecificCompiler, analogous to javaType(), javaEscape(), etc.
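Here is a sketch of what C#-specific helpers analogous to `javaType()` and `javaEscape()` might look like. It is written in Java, since the helpers would live in SpecificCompiler. The names `csharpType`/`csharpEscape`, the type mappings (including `null` to `object`), and the abbreviated keyword list are assumptions. The escape relies on C# verbatim identifiers: a keyword prefixed with `@` (for example `@class`) is a legal identifier.

```java
import java.util.Map;
import java.util.Set;

final class CSharpNames {
    // Abbreviated for the sketch; a real implementation would list every C# keyword.
    private static final Set<String> KEYWORDS = Set.of(
        "abstract", "base", "bool", "class", "default", "event", "fixed",
        "namespace", "object", "operator", "out", "params", "ref", "string");

    // Avro primitive name -> C# type name (assumed mapping).
    private static final Map<String, String> PRIMITIVES = Map.of(
        "null", "object",
        "boolean", "bool",
        "int", "int",
        "long", "long",
        "float", "float",
        "double", "double",
        "bytes", "byte[]",
        "string", "string");

    // Prefix C# keywords with '@' so schema names like "class" remain usable identifiers.
    static String csharpEscape(String name) {
        return KEYWORDS.contains(name) ? "@" + name : name;
    }

    // Complex types need the full schema, so only primitives are handled here.
    static String csharpType(String avroPrimitive) {
        String t = PRIMITIVES.get(avroPrimitive);
        if (t == null) {
            throw new IllegalArgumentException("Not an Avro primitive: " + avroPrimitive);
        }
        return t;
    }
}
```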

Steve Severance
added a comment - 02/Feb/11 23:53 I have a partially complete port that moves file read/write to fully asynchronous IO and uses .NET's dynamic assemblies for ease of codegen. If you want the code, I will zip it up and drop it here.

Philip Zeyliger
added a comment - 02/Feb/11 23:57 Dona,
Using Java's template generation for other languages is exactly what I had in mind when I made it templated. You'll need to pipe through some command-line arguments and such to "choose" a set of templates instead of the default Java ones, but this should be pretty straightforward.
– Philip

Jeremy Custenborder
added a comment - 03/Feb/11 23:06 Initially I had already started down the path of using System.CodeDom to generate an object graph for .NET. With CodeDom there is no overhead in switching between VB.NET, C#, or any other .NET language that supports CodeDom. Check out CodeGen/AvroGrn.cs; I started writing an implementation there. I was initially thinking of doing something similar to the Protocol Buffers implementation for .NET, which uses attributes to mark the types of each class.
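The attribute-based approach has a direct Java analogue in runtime annotations. The sketch below uses a hypothetical `@AvroField(position = ...)` marker and reads it back with reflection to recover the schema order of a class's fields. In C#, the marker would be a custom attribute, read through reflection in the same way.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical marker recording a field's position in the Avro record schema.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface AvroField {
    int position();
}

// What a generated class carrying the markers might look like.
class User {
    @AvroField(position = 1) String email;
    @AvroField(position = 0) String name;
    int transientCache; // unmarked, so not part of the schema
}

final class AvroFieldScanner {
    // Reflection does not guarantee field order, so sort by the declared position.
    static List<String> schemaOrder(Class<?> type) {
        return Arrays.stream(type.getDeclaredFields())
            .filter(f -> f.isAnnotationPresent(AvroField.class))
            .sorted(Comparator.comparingInt((Field f) -> f.getAnnotation(AvroField.class).position()))
            .map(Field::getName)
            .toList();
    }
}
```

A serializer can compute this order once per type and cache it, so no reflection is needed per value. That caching is where a Reflection.Emit-style fast path would plug in.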

Jeremy Custenborder
added a comment - 03/Feb/11 23:07
"I have a partially complete port that moves file read/write to fully asynchronous IO and uses .NET's dynamic assemblies for ease of codegen. If you want the code, I will zip it up and drop it here."
Steve it's been a while since we spoke. Did you end up implementing data files? I'm interested in seeing the async modification you made.

Dona Alvarez
added a comment - 07/Feb/11 15:52
Thanks for all the feedback. I looked at System.CodeDom, which allows generation of C# classes without the need for any templates. It's pretty straightforward. However, my team has done extensions to the Java code gen tool, so creating new Velocity templates for C# is a good option for quickly reusing that added functionality. I'll discuss this in more detail with my team. Please feel free to email me if you have additional input on this.
Jeremy, last week I worked on protocol parsing. You had that coded, but it was not included in Thiru's latest patch. I fixed a few things to make it work, such as resolution of types from different namespaces and writing schema objects back to JSON text. I'll email you the latest code. My priority at this point is code generation; if you want, you can work on other missing functionality like data files, RPC, etc.
Steve, I'm interested in seeing what you have done as well.
Thanks
Dona

Dona Alvarez
added a comment - 11/Mar/11 22:03
Hi,
I attached the latest version of the C# Avro API (csharp_2nd_patch.zip) if anyone would like to review it. The following are the features I added since the last patch in September 2010:
Code generation
Decoding/encoding of specific objects
Schema default values
Namespace resolution
Protocol parsing
Parsing of custom properties
Alias parsing and resolution
Thanks,
Dona

Dona Alvarez
added a comment - 16/Mar/11 13:00
I did not attempt this in Mono. What I have installed is Visual Studio 2010 with framework 4.0.
It looks like Mono 2.6.7 does not support framework 4 unless you configure it with --with-profile4=yes. The log says the build is successful, but with 25 warnings. You can try the configuration change. Alternatively, open the solution and change the target framework to 3.5; that could work too, since I don't think I used any framework 4 features that are not supported in 3.5.

Doug Cutting
added a comment - 16/Mar/11 22:00 I don't have Visual Studio, so I will not be able to test it there.
Ideally we could get it to work with Mono too, but I don't have enough experience with either C# or Mono.
Has anyone else looked at this? Thiru?

Thiruvalluvan M. G.
added a comment - 20/Mar/11 14:41 Thanks a lot, Dona, for taking this up.
I didn't do a proper review of the design or implementation, but I did build it and take a cursory look. Here are my observations:
I was able to compile it cleanly on Windows XP with Visual Studio Express 2010.
Two unit tests failed (TestResolutionMismatch_simple in GenericTests and SpecificTests). They failed because lines 123 to 126 in GenericReader.cs were commented out. Uncommenting them fixes the problem.
The codegen test has its assertions commented out. With them uncommented it still succeeds, so I'm not sure why they were commented out. Still, I get the impression that the tests there are inadequate: they merely check that the generated code compiles, not what the generated code contains.
Perhaps I'm missing something completely, but SpecificTests seems to test Generic rather than Specific. What am I missing?
The protocol compiler is in place, but there is no server or client implementation. That is fine; I just wanted to check that my understanding is correct.

Dona Alvarez
added a comment - 22/Mar/11 20:07
Thiru,
Thanks for looking into it. Please see my answers below:
■Two unit tests failed (TestResolutionMismatch_simple in GenericTests and SpecificTests). They failed because lines 123 to 126 in GenericReader.cs were commented out. Uncommenting them fixes the problem.
I moved the CanRead() call into Read<T>(). Calling CanRead() on the top-level schema recursively calls CanRead() on all of its child schemas, so the check doesn't need to be repeated inside each read function, except for union schemas. I added a call to CanRead() inside ReadUnion(), and that fixed the failing unit tests. Let me know if you think I'm missing something with this change. It also gives a big performance gain.
■The codegen test has its assertions commented out. With them uncommented it still succeeds, so I'm not sure why they were commented out. Still, I get the impression that the tests there are inadequate: they merely check that the generated code compiles, not what the generated code contains.
I commented them out because, for some reason, I am still missing a reference to a system assembly, so the compilation always fails. I haven't had a chance to get back to it. I also have not added all my codegen test cases to NUnit yet.
■Perhaps I'm missing something completely, but SpecificTests seems to test Generic rather than Specific. What am I missing?
Yes, SpecificTests is just a copy of the Generic tests. My test cases for Specific are not yet integrated into NUnit.
■The protocol compiler is in place, but there is no server or client implementation. That is fine; I just wanted to check that my understanding is correct.
Correct; at this point we (my company) only needed a protocol parser.
Thanks,
Dona
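The CanRead() change above checks schema compatibility once at the top level, because the check already recurses into child schemas, and keeps a check inside ReadUnion(), where the branch actually written is known only from the data. A minimal Java sketch of that idea follows. The `Schema`, `Primitive`, `RecordSchema`, `UnionSchema` and `Resolver` types are hypothetical and deliberately simplified (positional record fields, exact primitive matching, no type promotion); they are not the Avro C# or Java API.

```java
import java.util.List;

// Hypothetical, simplified schema model for illustration only.
interface Schema {
    // True if data written with 'writer' can be read with this (reader) schema.
    boolean canRead(Schema writer);
}

record Primitive(String name) implements Schema {
    public boolean canRead(Schema writer) {
        // Simplification: exact type match only, no promotion (e.g. int -> long).
        return writer instanceof Primitive p && p.name().equals(name);
    }
}

record RecordSchema(List<Schema> fields) implements Schema {
    public boolean canRead(Schema writer) {
        if (!(writer instanceof RecordSchema w) || w.fields().size() != fields.size()) {
            return false;
        }
        // Recursing into the children means one top-level call covers the whole tree.
        for (int i = 0; i < fields.size(); i++) {
            if (!fields.get(i).canRead(w.fields().get(i))) {
                return false;
            }
        }
        return true;
    }
}

record UnionSchema(List<Schema> branches) implements Schema {
    public boolean canRead(Schema writer) {
        if (writer instanceof UnionSchema) {
            // Which writer branch was used is only known per datum, so defer the check.
            return true;
        }
        return branches.stream().anyMatch(b -> b.canRead(writer));
    }
}

final class Resolver {
    // Analogous to the single CanRead() call in Read<T>(): validate the whole tree once.
    static void checkOnce(Schema reader, Schema writer) {
        if (!reader.canRead(writer)) {
            throw new IllegalArgumentException("reader schema cannot read writer schema");
        }
    }

    // Analogous to the CanRead() call kept in ReadUnion(): the writer's branch index
    // comes from the data, so the matching reader branch is chosen and checked here.
    static int resolveUnionBranch(UnionSchema reader, UnionSchema writer, int writerIndex) {
        Schema written = writer.branches().get(writerIndex);
        for (int i = 0; i < reader.branches().size(); i++) {
            if (reader.branches().get(i).canRead(written)) {
                return i;
            }
        }
        throw new IllegalArgumentException("no reader branch matches writer branch " + writerIndex);
    }
}
```

With this structure, the per-field work during reading no longer repeats compatibility checks; only union reads pay for a branch lookup, which is consistent with the performance gain mentioned above.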

Doug Cutting
added a comment - 25/Mar/11 18:12 I cannot personally review & commit this unless I can get it to run on Linux under Mono. But if someone else (e.g., Thiru) can independently build and test it on Windows then it could be committed.

Visual Studio 2008
error: cannot apply binary patch to 'dotnet/src/contrib/Json35r6/Bin/Compact/Newtonsoft.Json.Compact.dll' without full index line
error: dotnet/src/contrib/Json35r6/Bin/Compact/Newtonsoft.Json.Compact.dll: patch does not apply
error: cannot apply binary patch to 'dotnet/src/contrib/Json35r6/Bin/Compact/Newtonsoft.Json.Compact.pdb' without full index line
error: dotnet/src/contrib/Json35r6/Bin/Compact/Newtonsoft.Json.Compact.pdb: patch does not apply
error: cannot apply binary patch to 'dotnet/src/contrib/Json35r6/Bin/DotNet/Newtonsoft.Json.dll' without full index line
error: dotnet/src/contrib/Json35r6/Bin/DotNet/Newtonsoft.Json.dll: patch does not apply
error: cannot apply binary patch to 'dotnet/src/contrib/Json35r6/Bin/DotNet/Newtonsoft.Json.pdb' without full index line
error: dotnet/src/contrib/Json35r6/Bin/DotNet/Newtonsoft.Json.pdb: patch does not apply
error: cannot apply binary patch to 'dotnet/src/contrib/Json35r6/Bin/DotNet20/Newtonsoft.Json.Net20.dll' without full index line
error: dotnet/src/contrib/Json35r6/Bin/DotNet20/Newtonsoft.Json.Net20.dll: patch does not apply
error: cannot apply binary patch to 'dotnet/src/contrib/Json35r6/Bin/DotNet20/Newtonsoft.Json.Net20.pdb' without full index line
error: dotnet/src/contrib/Json35r6/Bin/DotNet20/Newtonsoft.Json.Net20.pdb: patch does not apply
error: cannot apply binary patch to 'dotnet/src/contrib/Json35r6/Bin/Silverlight/Newtonsoft.Json.Silverlight.dll' without full index line
error: dotnet/src/contrib/Json35r6/Bin/Silverlight/Newtonsoft.Json.Silverlight.dll: patch does not apply
error: cannot apply binary patch to 'dotnet/src/contrib/Json35r6/Bin/Silverlight/Newtonsoft.Json.Silverlight.pdb' without full index line
error: dotnet/src/contrib/Json35r6/Bin/Silverlight/Newtonsoft.Json.Silverlight.pdb: patch does not apply
error: cannot apply binary patch to 'dotnet/src/contrib/Json35r6/Documentation.chm' without full index line
error: dotnet/src/contrib/Json35r6/Documentation.chm: patch does not apply
error: cannot apply binary patch to 'dotnet/src/contrib/Json35r6/Src/Doc/donate.gif' without full index line
error: dotnet/src/contrib/Json35r6/Src/Doc/donate.gif: patch does not apply
error: cannot apply binary patch to 'dotnet/src/contrib/Json35r6/Src/Lib/LinqBridge.dll' without full index line
error: dotnet/src/contrib/Json35r6/Src/Lib/LinqBridge.dll: patch does not apply
error: cannot apply binary patch to 'dotnet/src/contrib/Json35r6/Src/Lib/NUnitLite.dll' without full index line
error: dotnet/src/contrib/Json35r6/Src/Lib/NUnitLite.dll: patch does not apply
error: cannot apply binary patch to 'dotnet/src/contrib/Json35r6/Src/Lib/nunit.framework.dll' without full index line
error: dotnet/src/contrib/Json35r6/Src/Lib/nunit.framework.dll: patch does not apply
error: cannot apply binary patch to 'dotnet/src/contrib/Json35r6/Src/Lib/nunit.framework.silverlight.dll' without full index line
error: dotnet/src/contrib/Json35r6/Src/Lib/nunit.framework.silverlight.dll: patch does not apply
error: cannot apply binary patch to 'dotnet/src/contrib/Json35r6/Src/Newtonsoft.Json.Tests/bunny_pancake.jpg' without full index line
error: dotnet/src/contrib/Json35r6/Src/Newtonsoft.Json.Tests/bunny_pancake.jpg: patch does not apply
error: cannot apply binary patch to 'dotnet/src/contrib/Json35r6/Src/Newtonsoft.Json/Dynamic.snk' without full index line
error: dotnet/src/contrib/Json35r6/Src/Newtonsoft.Json/Dynamic.snk: patch does not apply

Dona Alvarez
added a comment - 25/Mar/11 18:42 Attached csharp_3rd.zip. It contains the following changes:
-performance enhancements
-test cases for codegen and specific objects
This is the complete source code (not a patch), so please use this one instead.

Thiruvalluvan M. G.
added a comment - 27/Mar/11 19:41 This is the same as Dona's csharp_3rd.zip with the following changes:
The code now compiles and tests pass under Mono in Ubuntu 10.10 as well as Windows XP/Visual C# Express 2010.
The .csproj files referred to TargetFrameworkVersion 4.0, which does not work with Mono. Blindly replacing it with 3.5 breaks the build on Windows. I made TargetFrameworkVersion configurable; please see the README.
The code used Enum.TryParse(), which is not available in 3.5. I replaced it with Enum.Parse().
The bundled Newtonsoft.Json.dll was not the latest version, and it was causing some errors.
Some of the DLL references in the .csproj files were incorrect. I suppose it worked on Windows because Visual Studio was resolving them through the registry.
The code generation and Specific tests used Microsoft.CSharp.dll. I made them use either Mono.CSharp.dll or Microsoft.CSharp.dll depending on the runtime.
Made all text files use Unix line endings; they were mixed before.

Thiruvalluvan M. G.
added a comment - 30/Mar/11 05:31 Any idea what I've done wrong?
Your version of NUnit is old. The Avro unit tests use NUnit 2.5 features extensively. Since NUnit tests are declared with C# attributes (roughly equivalent to Java annotations), an older NUnit runner does not recognise the attributes from the newer version, but it doesn't fail either. It would be nice if we could somehow make the tests report an error when run with an older runner, but my quick search didn't reveal any easy solution.
NUnit 2.5 or later should be able to run the tests.

Thiruvalluvan M. G.
added a comment - 31/Mar/11 03:16 I don't think anyone has done interop tests with the other language implementations. I'm sure most of it would pass; the only thing that might cause trouble is the encoding of double and float. If there is indeed an incompatibility, .NET users would have to migrate their data in the future. I'll try to get it done today.
I suggest that this Jira be assigned to Dona before committing this. I don't know how to do that, since the assignee list does not include her name.
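The float/double concern is about whether both implementations produce the same bytes. The Avro specification writes a float as 4 bytes and a double as 8 bytes, each holding the IEEE 754 bit pattern in little-endian order. A minimal Java sketch of that layout using java.nio.ByteBuffer follows; the class name `AvroFloatingPoint` is hypothetical and this is not the Avro library's own encoder. An interop test could compare these bytes against what the C# encoder emits for the same values.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper showing the Avro float/double binary layout.
final class AvroFloatingPoint {
    static byte[] encodeFloat(float f) {
        // 4 bytes: IEEE 754 single-precision bits, least significant byte first.
        return ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putFloat(f).array();
    }

    static float decodeFloat(byte[] b) {
        return ByteBuffer.wrap(b).order(ByteOrder.LITTLE_ENDIAN).getFloat();
    }

    static byte[] encodeDouble(double d) {
        // 8 bytes: IEEE 754 double-precision bits, least significant byte first.
        return ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putDouble(d).array();
    }

    static double decodeDouble(byte[] b) {
        return ByteBuffer.wrap(b).order(ByteOrder.LITTLE_ENDIAN).getDouble();
    }
}
```

For example, encodeDouble(1.0) yields the bytes 00 00 00 00 00 00 F0 3F and encodeFloat(1.0f) yields 00 00 80 3F; an encoder that wrote these in host or big-endian order would not interoperate.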

Dona Alvarez
added a comment - 31/Mar/11 14:21
I did interop testing with Java, and we did not see any issues decoding Java data in C# or vice versa. Encoding of double worked. We didn't have a value for the float field, though, so that still needs to be tested.
What is the process for submitting changes after the initial work is committed? I have a few changes that I would like to submit later, mostly related to making codegen extensible and to some issues with writing unions.
Thanks,
Dona

Doug Cutting
added a comment - 31/Mar/11 17:24 We should integrate this code into the top-level build.sh, including interop testing. Ideally that would be in the initial patch, but I'm not sure we need to wait for that to commit this.
Dona, once this is committed, the process for making changes is to open a new Jira issue for each change and attach a patch to it, so that others have the opportunity to review it before it's committed.

Thiruvalluvan M. G.
added a comment - 31/Mar/11 18:52 I tested the .NET version for compatibility with Java using different data types. I also tested its ability to read Java's blocking binary encoding. Both tests pass.
+1 for committing this.
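For reference, the Avro specification encodes arrays and maps as a series of blocks. Each block starts with a long item count; a count of zero ends the sequence; a negative count means its absolute value is the item count and it is immediately followed by a long giving the block's size in bytes (the blocking form, which lets a reader skip whole blocks). Longs are zig-zag encoded variable-length integers. A minimal Java sketch of a reader that accepts both block forms for an array of longs follows; `AvroBinaryDecoder` is a hypothetical class, not Avro's own decoder.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical decoder over an in-memory buffer, for illustration only.
final class AvroBinaryDecoder {
    private final byte[] buf;
    private int pos = 0;

    AvroBinaryDecoder(byte[] buf) {
        this.buf = buf;
    }

    // Avro long: little-endian base-128 varint, then undo the zig-zag mapping.
    long readLong() {
        long raw = 0;
        int shift = 0;
        int b;
        do {
            if (pos >= buf.length) {
                throw new IllegalStateException("unexpected end of input");
            }
            b = buf[pos++] & 0xFF;
            raw |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (raw >>> 1) ^ -(raw & 1);
    }

    // Reads an array of longs written either as plain blocks (positive count)
    // or as blocking blocks (negative count followed by a byte size).
    List<Long> readLongArray() {
        List<Long> items = new ArrayList<>();
        long count;
        while ((count = readLong()) != 0) {
            if (count < 0) {
                count = -count;
                readLong(); // block size in bytes; only needed when skipping the block
            }
            for (long i = 0; i < count; i++) {
                items.add(readLong());
            }
        }
        return items;
    }
}
```

For example, the array [1, 2] in blocking form is the bytes 03 04 02 04 00 (count -2, size 2, items 1 and 2, end marker), while the plain form is 04 02 04 00; this sketch decodes both to the same list.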

Doug Cutting
added a comment - 31/Mar/11 22:22 I committed this.
I added a simple build.sh file that handles 'test' and 'clean' targets only; no interop targets or binary release packaging yet.
I also added license headers to a few files that were missing them. Some of these appear to be Visual Studio files that might be auto-generated, so maybe instead of adding a license we should add them to share/rat-excludes.txt.
Could someone please verify that the version I committed builds with Visual Studio? I only built & tested with Mono.

Patrick Angeles
added a comment - 17/May/11 00:17 Noticed some things here:
JIRA says this is fixed for 1.5.1, but the code is not in 1.5.1 (it is in trunk, however).
Tried writing with C# and reading with Java and vice versa; it doesn't seem to work. It appears the C# side is neither reading nor writing the file header (which starts with {'O','b','j', 0x01}).
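For reference, the Avro specification says an object container file begins with the four magic bytes 'O', 'b', 'j', 1, followed by the file metadata (including the schema) and a 16-byte sync marker. A minimal Java sketch of the magic-byte check alone follows; `AvroContainerMagic` is a hypothetical name, and this does not parse the rest of the header.

```java
import java.util.Arrays;

// Hypothetical helper: checks only the 4-byte magic at the start of a container file.
final class AvroContainerMagic {
    static final byte[] MAGIC = {'O', 'b', 'j', 1};

    static boolean hasAvroMagic(byte[] fileStart) {
        return fileStart.length >= MAGIC.length
                && Arrays.equals(Arrays.copyOf(fileStart, MAGIC.length), MAGIC);
    }
}
```

A reader that skips this check and the metadata that follows it will misinterpret the header bytes as data, which matches the symptom described above.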

Patrick Angeles
added a comment - 17/May/11 13:35 On second look, it seems that C# does not support Avro data files yet.
https://cwiki.apache.org/confluence/display/AVRO/Supported+Languages
Filing a separate JIRA to track this.