Cherry-picking is a huge problem that must be confronted. Nice work defining and banning it in the code.

(v) "Cherry picking" means pointing to individual cases or data that seem to confirm a particular position, while ignoring a significant portion of related cases or data that may contradict that position and may constitute scientific fraud, suppressing evidence, or the fallacy of incomplete evidence.
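To make the definition concrete, here is a minimal Python sketch. It assumes a made-up series of monthly returns for an imaginary strategy; the numbers and the `cherry_pick` helper are hypothetical and not taken from the draft code or from any real dataset. It only shows how reporting the confirming cases alone can reverse the conclusion the full data supports.

```python
from statistics import mean

# Hypothetical monthly returns (percent) for an imaginary strategy.
# These values are invented purely for illustration.
returns = [4.0, -3.0, 5.0, -6.0, 2.0, -4.0, 3.0, -5.0]


def cherry_pick(values):
    """Keep only the cases that seem to confirm 'the strategy makes money'."""
    return [v for v in values if v > 0]


picked = cherry_pick(returns)

full_mean = mean(returns)        # conclusion supported by all cases
picked_mean = mean(picked)       # conclusion supported by confirming cases only
ignored_share = 1 - len(picked) / len(returns)  # fraction of cases left out

print(f"mean of all cases:      {full_mean:+.2f}%")
print(f"mean of cherry-picked:  {picked_mean:+.2f}%")
print(f"share of cases ignored: {ignored_share:.0%}")
```

Reporting the share of cases left out alongside any headline figure is one simple way to make this kind of selection visible to a client.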

Have you checked the code of conduct of the American Statistical Association? The Digital Analytics Association also has one. Both organizations offer certifications. DataScienceCentral offers an apprenticeship specifically to become a real data scientist. You might also want to google "fake data science".

Ultimately, it is the responsibility of data science users to do their due diligence before paying for data science services. A university degree is no guarantee of competency. The same applies to anything: hiring a lawyer, a doctor, or an SEO expert, or buying a car, a vacation trip, or a house.

There is a lot of hype around the title "Data Scientist" and we need to protect the field from charlatans and snake oil salesmen who would sully the field. People might not think to check for an ASA code of conduct signature when vetting a data scientist.

I suggest data science is a profession and requires a robust and durable code of conduct considering the great damage data science malpractice can do to society. The best and brightest decision makers rely on data science to make decisions and are unable to differentiate between good and bad data science.

For example, consider that data strongly suggests the majority of financial quants are not adding any value. http://bit.ly/J2r2Oa

Perhaps if the financial quants had a code of conduct the majority would not have produced valueless models that created a false view of reality. The illusion of understanding produced bad decisions by policy makers and financial heads of state. This contributed to the financial melt-down.

Likewise, data science malpractice can produce an illusion of understanding that can blow up firms and create bad public policy.

Hedge Funds Underperform Stock and Bond Indices by Huge Margin

See: http://bit.ly/J2r2Oa


Peter Levy

3/8/2013 04:35:02 am

Data science needs rules of conduct - the stakes are too high at both organizational and systemic level.


Peter Levy

3/8/2013 04:37:39 am

Interesting definition:

(c) "Data Scientist" means a professional who uses scientific methods to liberate and create meaning from raw data.

Liberate?

Much debate about what constitutes "scientific methods"


Peter Levy

3/8/2013 04:42:12 am

Another interesting definition:

(j) "Noise" means a competing interpretation of data not grounded in science that may not be considered scientific evidence. Yet noise may be manipulated into a form of knowledge (what does not work).

Competing interpretation of data? Great insight. All about interpretation.

Not grounded in science? Debate about what is true science. Who is the arbiter of true data science?

If I may, I'd like to borrow the skeleton from the CFA Institute (see below). I think your document covers most of the essential areas. Maybe:

i.) a bit more on Rule 7: Duties to the clients

ii.) perhaps cover more about the process/documentation of work.

iii.) duties to the employers

I like the way you define the terms at the beginning. If I understand correctly, it very much covers the professional conduct of data/information management. However, I wish it could also include more about the output: the results and findings of the analytics.

As for terms like data scientist or analytics, I am flexible. There are merits to each phrase: the former has a stronger intellectual flavor, whereas the latter has a stronger emphasis on value.

For your reference: adapted from the CFA Institute code of conduct

1. Professionalism: knowledge of the law, misrepresentation and misconduct

It's hard to "knowingly" commit misconduct when your paycheck depends on your client's needs being met. Data scientists will succumb to many of the same forms of "capture" as regulators, consultants, physicians, and other professionals. They tend to tell their "clients" whatever they need to hear. A code of conduct might help a bit but don't count on it working much better than those used in medicine, accounting or other professions. The key for the data science practitioner is finding "good" clients who will pay for unvarnished recommendations based on solid empirical methods. If the clients paying the bills are corrupt, they'll compromise any professionals they hire.

An effective code of conduct needs to focus on both sides of the data science transaction. We need to define and consider ethical client behavior along with the behavior of the data scientist. The ASA, CFA, CPA, medical and other professional conduct codes have been relatively ineffective in promoting ethical behavior.

Cultures that promote ethical behavior would be much preferred to codes of conduct. If conduct codes won't change the culture of data science transactions they won't be very effective.

Great comment! I agree in large part. I would also add that professional authorities have misused and abused codes to attack members they do not like or agree with. The goal would be to learn from other professional codes, adopting the good and eliminating the bad, and to create checks and balances to prevent abuse. I argue a code of conduct is better than no code.

Let me put it another way: either data science becomes a self-regulating profession or Congress will impose a regulatory scheme on data scientists.

I do not trust Congress to design a good scheme. Thus, self-regulation via a code of conduct is better than no code.

(d) If a data scientist reasonably believes a client is misusing data science to communicate a false reality or promote an illusion of understanding, the data scientist shall take reasonable remedial measures, including disclosure to the client, and including, if necessary, disclosure to the proper authorities. The data scientist shall take reasonable measures to persuade the client to use data science appropriately.

(e) If a data scientist knows that a client intends to engage, is engaging or has engaged in criminal or fraudulent conduct related to the data science provided, the data scientist shall take reasonable remedial measures, including, if necessary, disclosure to the proper authorities.

I was trying to point out the deficiencies in the parts of the draft code addressing unethical client conduct. As in the examples you refer to, each such part is prefaced with "if a data scientist (knows, believes, ...)."

My comment alludes to a famous quote by Upton Sinclair: “It is difficult to get a man to understand something if his salary depends upon his not understanding it”

Darrell Waters

3/19/2013 07:20:01 am

In any profession, the best you can do is prescribe the behavior of the professional, not the customer. Subscribing to the code would obligate the data scientist to inform the customer accurately about accuracy, assumptions, consequences, legality, etc. I think the obligation of the data scientist ends there with respect to the client. If the client acts criminally, then notification of authorities is covered.

How do you see this code being implemented: Voluntary subscription; panel certification?

Good points. Look at it from the client's view. The client is interested in using data science to make better decisions, improve processes and operations, increase revenues, decrease costs and better manage risks, all toward the big goal of gaining competitive advantage.

Thus, clients want to hire data scientists who are competent and ethical, who will keep the firm out of legal problems and not harm the firm's relationship with customers or regulators.

To achieve this, the client will likely desire to hire a data scientist who:

1) has passed a certification test to assure a base level of competency

2) agrees to comply with a code of professional conduct

As a result, if I were hiring a data scientist, I would rather hire a data scientist who is certified and honors a pro code over a data scientist who is not certified and is free to practice data science according to whim.

A major concern of clients using data science professional services is protecting the organization's family jewels: their confidential information.

Clients need to be reassured that if a data scientist designs and executes an algorithm specifically for them, the data scientist will not sell or use this algorithm for another organization or client.

I drafted a section on Confidential Information to address this concern. Please help improve:

Rule 5 - Confidential Information

(a) Confidential information is information that the data scientist creates, develops, receives, uses or learns in the course of employment as a data scientist for a client, either working directly in-house as an employee of an organization or as an independent professional. It includes information that is not generally known by the public about the client, including client affiliates, employees, customers or other parties with whom the client has a relationship and who have an expectation of confidentiality. The data scientist has a professional duty to protect all confidential information, regardless of its form or format, from the time of its creation or receipt until its authorized disposal.

(b) Confidential information is a valuable asset. Protecting this information is critical to a data scientist's reputation for integrity and relationship with clients, and ensures compliance with laws and regulations governing the client's industry.

(c) A data scientist shall protect all confidential information, regardless of its form or format, from the time of its creation or receipt until its authorized disposal.

(d) A data scientist shall not reveal information relating to the representation of a client unless the client gives informed consent, the disclosure is impliedly authorized in order to carry out the representation or the disclosure is permitted by paragraph (e).

(e) A data scientist may reveal information relating to the representation of a client to the extent the data scientist reasonably believes necessary:

(1) to prevent reasonably certain death or substantial bodily harm;

(2) to prevent the client from committing a crime or fraud that is reasonably certain to result in substantial injury to the financial interests or property of another and in furtherance of which the client has used or is using the data scientist's services.

(f) A data scientist shall make reasonable efforts to prevent the inadvertent or unauthorized disclosure of, or unauthorized access to, information relating to the representation of a client, which means:

(1) Not displaying, reviewing or discussing confidential information in public places, in the presence of third parties or where it may be overheard;

(2) Not e-mailing confidential information outside of the organization or professional practice to a personal e-mail account or otherwise removing confidential information from the client by removing hard copies or copying it to any form of recordable digital media device; and

(3) Communicating confidential information only to client employees and authorized agents (such as attorneys or external auditors) who have a legitimate business reason to know the information.

(g) A data scientist shall comply with client policies that apply to the acceptance, proper use and handling of confidential information, as well as any written agreements between the data scientist and the client relating to confidential information.

(h) A data scientist shall protect client confidential information after termination of work for the client.

(i) A data scientist shall return any and all confidential information in the data scientist's possession or control upon termination of the data scientist-client relationship and, if requested, execute an affidavit affirming compliance with obligations relating to confidential information.

Codes of conduct are not going to fix "big breakage", where someone is in a system where unprofessional behaviour is endemic.

Where it will make a difference is in inculcating an attitude that it is in your long-term interest to behave in a professional manner, i.e. behave professionally and you will get more work with people and organisations that will treat you in a reasonable way; correspondingly, if you have a reputation for cutting corners and generating results to fit someone's argument (rather than generating reliable results), then you'll have a harder time getting work and those you work for are more likely to treat you badly.

