Often we need to find docs on an internal website and read them to figure out how to do a task.

Internal vs Internet
This is different from searching the internet, where we can usually find plenty of resources; after reading several articles we usually get a sense of how to do it. It's fine if we don't read some articles carefully. That is not ideal, but we can usually find related or similar articles and understand the point later.

For internal topics there are usually only a few docs or pages. They are well documented and give all the information we need. But if you don't read them carefully or miss some key information, you will not be able to solve the problem just by reading the docs.
- You can still ask others for help.

Be organized when reading the docs
- Maybe open all related docs in a different browser, in one window
- Maybe start with the entry page (maybe given by someone), and follow the links while reading it

Notice/write down what you don't understand or what seems strange to you
- Usually these are the keys to solving the problem

Use tools like Evernote to help read the docs
- Add notes, highlight content, etc.

First find all the internal websites that are useful
Search them, read them carefully
Examples: docker sidecar, yubikey-ssh

How to compare different approaches
- Always think about different approaches (even if you have already finished/committed the code)
- Don't just choose one that looks good
- List them and compare them
- Always ask yourself why you chose this approach
- Try hard to find problems in your current approach, and how to fix them

For small coding tasks
- Implement them if possible
- Then compare which makes the code cleaner, needs fewer changes, etc.

Example: Exclude source and javadoc from -jar
APP_BIN=$APP_BIN_DIR/$(ls $APP_BIN_DIR | grep -E 'jarName-version.*jar' | grep -v sources | grep -v javadoc | grep -v pom)

How to quickly scan/learn new classes
Sometimes we need to quickly scan a bunch of related classes to see how to implement a function, use a method, etc.
- Check the class's Javadoc
- Check the class signature
- Check main methods:
  - static methods
  - using Ctrl+O or the outline view
- Check the call hierarchy in source code
- Check test code/examples
- Google for code examples

When you refactor/change code, also check/change/improve its related code.

Find related docs and read them carefully.
- Mark/note the important parts of the doc.

For some tasks, we can use a trial-and-error approach: just do it, then fix it.
But for some tasks (production or physical hardware related), it's better to figure out the right way to do it first.

Evaluate the outcome of the action.
- best/worst outcome

How to implement/work on a feature
- What's the goal, what to achieve
- How to test/verify/deploy/enable it in a test or production environment easily
  - useful tricks: dry-run
  - able to enable/disable the default configuration automatically, but override it manually
- How to measure whether the change is an improvement

Think of different (3+) approaches
Compare them
Don't stop until you find a solution that looks good to you
Use tools (notebook, whiteboard)
- Usually we are not happy with the first approach that comes to mind, and we find a different approach: maybe it's a little better, or maybe it's just different. In some cases we stop there: maybe we start to discuss it with others or present it (to get others' ideas).
- Example: add Explanation to transform actions

Before asking a question
- Try to solve it yourself
- Make sure you read all related code/docs from top to bottom (scan quickly, but don't ignore any code that may be important)
- Example: NightlyTestRunner

Verify the assumption
- Be aware of the assumptions we or others made in the design or the code.
- Verify whether they are true or not
- Example: one line one record in csv, one-to-one between tms id and bam_id

Alternatives
- Check and be aware of alternatives
Sometimes we want A, but maybe B also works and is even better.
- Example: asset letter or account statement

Use tools
Write things down on a whiteboard, in a notebook, or in an app
Take a picture now - always bring the phone

Check carefully and verify your claim before blaming others or thinking others are wrong
We are inclined to think others are wrong or made a mistake, even if someone told us he/she did that: we do a very brief search, don't check carefully, and then start to think they are wrong.

Prefer to use code to enforce the rule than documentation
Example: all tests must extend XbaseUnitTest.
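One way to enforce a rule like this in code is a small check that fails the build when a test class does not extend the base class. The sketch below is only an illustration: the XbaseUnitTest body, GoodTest, BadTest and BaseClassRule are hypothetical stand-ins, and how the list of test classes is collected (for example by classpath scanning) is left out.

```java
import java.util.ArrayList;
import java.util.List;

final class BaseClassRule {
    // Returns every class in the list that does not extend the required base class.
    static List<Class<?>> findViolations(List<Class<?>> testClasses, Class<?> requiredBase) {
        List<Class<?>> violations = new ArrayList<>();
        for (Class<?> c : testClasses) {
            if (!requiredBase.isAssignableFrom(c)) {
                violations.add(c);
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        List<Class<?>> violations =
            findViolations(List.of(GoodTest.class, BadTest.class), XbaseUnitTest.class);
        // A real check would fail the build here instead of printing.
        System.out.println("Violations: " + violations);
    }
}

// Stand-in for the project's real base class (this empty body is hypothetical).
abstract class XbaseUnitTest {}

class GoodTest extends XbaseUnitTest {}

class BadTest {} // forgot to extend the base class
```

Run as part of the test suite, a check like this catches the mistake automatically, so nobody has to remember a line in the documentation.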

Make API/feature easier
- to use
- to test/rollback in production (feature flag)

Realize your assumption/decision and verify it first
- Otherwise you may go farther but on the wrong path

Step by step and verify each step

Identify useful info and act quickly (or you may forget about it)

Don't let your past experience affect you
- Try it, it may be different this time
Example: big item delivery

When something totally doesn't make sense:
- maybe they are totally different things; you are comparing apples with bananas
Example: 12/15, 1/15

Related
Lessons Learned about Programming and Soft Skills - 2016

org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(AddUpdateCommand)
    if (!forwardToLeader) { // false
      dropCmd = versionAdd(cmd);
    }
It adds the doc to its local index at this stage.
It doesn't forward this request to itself again, so there is no update.distrib=TOLEADER stage.

Case 3: The add request is sent to a leader which should not own this doc
Case 4: The add request is sent to a leader which should not own this doc
The coordinator node will forward the add request to the leader of the shard that should store the doc.

DistributedUpdateProcessor.setupRequest(String, SolrInputDocument, String)
    ClusterState cstate = zkController.getClusterState();
    DocCollection coll = cstate.getCollection(collection);
    Slice slice = coll.getRouter().getTargetSlice(id, doc, route, req.getParams(), coll);
    String shardId = slice.getName();
- decide which shard this doc belongs to
- return nodes: the leader that should store the doc

org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(AddUpdateCommand)
update.distrib=TOLEADER&distrib.from=this_node_url
    cmdDistrib.distribAdd(cmd, nodes, params, false, replicationTracker);

Case 5: Send multiple docs in one command to a follower
XMLLoader.processUpdate(SolrQueryRequest, UpdateRequestProcessor, XMLStreamReader)
    while (true) {
      if ("doc".equals(currTag)) {
        if (addCmd != null) {
          log.trace("adding doc...");
          addCmd.clear();
          addCmd.solrDoc = readDoc(parser);
          processor.processAdd(addCmd);
        } else {
          throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "Unexpected <doc> tag without an <add> tag surrounding it.");
        }
      }
    }
It calls processAdd for each doc.

Related Code
UpdateRequestProcessorFactory.RunAlways
UpdateRequestProcessorChain.createProcessor(SolrQueryRequest, SolrQueryResponse)

UpdateRequestProcessorChain.init(PluginInfo)
If the chain includes the RunUpdateProcessorFactory, but does not include an implementation of the DistributingUpdateProcessorFactory interface, then an instance of DistributedUpdateProcessorFactory will be injected immediately prior to the RunUpdateProcessorFactory.
    if (0 <= runIndex && 0 == numDistrib) {
      // by default, add distrib processor immediately before run
      DistributedUpdateProcessorFactory distrib = new DistributedUpdateProcessorFactory();
      distrib.init(new NamedList());
      list.add(runIndex, distrib);
    }

    DistribPhase phase = DistribPhase.parseParam(req.getParams().get(DISTRIB_UPDATE_PARAM));
    boolean isOnCoordinateNode = (phase == null || phase == DistribPhase.NONE);
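To make the coordinator check above concrete, here is a minimal, self-contained sketch. The Phase enum and the DistribPhaseSketch class are simplified stand-ins written for this note, not Solr's real DistribPhase; in particular, treating a missing or blank parameter as NONE is an assumption of the sketch.

```java
import java.util.Locale;

final class DistribPhaseSketch {
    // Simplified stand-in for the phases carried in the update.distrib parameter.
    enum Phase {
        NONE, TOLEADER, FROMLEADER;

        // Assumption of this sketch: a missing or blank parameter means NONE.
        static Phase parse(String param) {
            if (param == null || param.trim().isEmpty()) {
                return NONE;
            }
            return valueOf(param.trim().toUpperCase(Locale.ROOT));
        }
    }

    // A request without a phase has not been forwarded by another node yet,
    // so the node that receives it acts as the coordinator.
    static boolean isOnCoordinatorNode(String updateDistribParam) {
        return Phase.parse(updateDistribParam) == Phase.NONE;
    }

    public static void main(String[] args) {
        System.out.println("no param   -> coordinator? " + isOnCoordinatorNode(null));
        System.out.println("TOLEADER   -> coordinator? " + isOnCoordinatorNode("TOLEADER"));
        System.out.println("FROMLEADER -> coordinator? " + isOnCoordinatorNode("FROMLEADER"));
    }
}
```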

- brain teasers, puzzles, riddles
- problems you pick only because you are interested in them, happen to know them, or just learned them recently

Know the questions very well
- Different approaches
- Expect different approaches that you don't even know
  - Verify it (use examples, proof); if it works, the candidate did a good job and you also learned something new

Know common causes of bugs
- Be able to detect bugs in the candidate's code quickly

Give candidates the opportunity to prove themselves and shine
We are trying to evaluate the candidate's skills thoroughly: what he/she is good at, and what not.
If you plan to ask 2 coding questions, one simple and one more difficult, tell the candidates.

Let the candidates know your expectations

Make the candidates learn something
- If the candidate doesn't give the right solution/answer and, at the end of the interview, wants to know how to approach it, tell him/her.
Why?
- Candidates put a lot of effort into the interview (a day off and the commute); if they want to learn something, learning it makes them feel good
- It proves that you know the solution and have a reasonable answer, and that you don't ask questions you don't know much about yourself

No surprises
If you find issues/bugs in the candidate's code or design, point them out
The candidate should have a rough idea of how he/she performed in the interview

Be fair

Phone interview
Prefer coding questions over design questions
- as design is partly about communication, and it's hard to test communication skills over the phone

This is a short list of what I am good at and what I should improve.
- I will keep updating it, and hope that when I look back after 1 year, I will realize that I have improved and learned a lot.

Strength
Retrospect and Learning Logs
- I like to summarize what I have learned and write it down

Sharing Knowledge

Problem Solving and troubleshooting
- I like to solve difficult problems, as I can always learn something from them.
- I also summarize the steps I took to solve a problem, and what I learned that can help me solve problems quicker later.
- Search for and find the resources needed to solve the problem
- See more at my blog: Troubleshooting

Proactively find problems and fix them
- such as finding problems in existing design and code, and thinking about how to improve them

Be honest
- to myself and colleagues about what I know and what I don't

Be modest
- I know there are still a lot of things that I should learn and improve.
- I like to learn from others

Proactively learning
- Have a safaribooksonline account
- Like to learn from books and people
- When I used Cassandra and Kafka in our project, I took time to learn not only how to use them but, more importantly, their high-level design.
- Read more at my blog: System Design
Programmer: Lifelong Learning

Weakness - things need improving
System design
Knowledge about distributed systems
Public Speaking
Presentation
Visibility

Talk/think about all related parts
- how do we store data
- client API
- UI change
- backward compatibility: how to handle old data/clients
But focus on the most important stuff (first)

Talk/think about design principles/practices
- such as idempotency, parallelization, monitoring, etc.
- Check more at System Design - Summary

What's the impact on other (internal and cross-team) components?

How do other components use it?

What are the known and potential constraints/issues/flaws in the current design?
Don't only talk about its advantages; also talk about the issues, don't hide them

What are the alternatives?
Think about alternative and different approaches; this can help find a better solution
We can't really review and compare if there are no alternatives
Welcome different approaches
- although that doesn't mean they are better, or that we will use them

Development Cost
- How difficult is it to implement?

What may change and how to evolve
What may change in the (very) near future?

Visibility/Monitoring
How can we know whether the new feature works or doesn't work
How can we know when problems happen

Feature Flag
Can we enable/disable the feature at runtime

Be Prepared
OK to have informal/impromptu discussions with one or two colleagues
But make sure everyone is prepared for the formal team design discussion
All attendees should know the topic and how they would design it

Attitude
Listen first
When you don't agree with others' approaches
- Don't get too defensive
- Talk about ideas, not people
Be prepared

Make API/Feature easier
- to use
- to test/rollback in production (feature flag)
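A feature flag that can be flipped at runtime does not need much machinery. The following is a minimal sketch, not a recommendation of a particular library: the FeatureFlags class and the flag name newCache are made up for illustration. It keeps a default per flag and lets an operator override it manually, which makes a rollback in production a configuration change rather than a redeploy.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class FeatureFlags {
    private final Map<String, Boolean> defaults;
    private final Map<String, Boolean> overrides = new ConcurrentHashMap<>();

    FeatureFlags(Map<String, Boolean> defaults) {
        this.defaults = Map.copyOf(defaults);
    }

    // Manual override, e.g. triggered from an admin endpoint at runtime.
    void override(String name, boolean enabled) {
        overrides.put(name, enabled);
    }

    // Drops the manual override so the default applies again.
    void clearOverride(String name) {
        overrides.remove(name);
    }

    // The override wins; otherwise the default applies; unknown flags are off.
    boolean isEnabled(String name) {
        Boolean manual = overrides.get(name);
        if (manual != null) {
            return manual;
        }
        return defaults.getOrDefault(name, false);
    }

    public static void main(String[] args) {
        FeatureFlags flags = new FeatureFlags(Map.of("newCache", true));
        flags.override("newCache", false); // roll back without redeploying
        String path = flags.isEnabled("newCache") ? "new code path" : "old code path";
        System.out.println("Using the " + path);
    }
}
```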

The Issue
After deploying the change "Multi Tiered Caching - Using in-process EhCache in front of Distributed Redis" to the test environment (together with some other changes, and after someone made changes on the server such as a restart), we found that cache.put hangs when saving data to Redis.

Troubleshooting Process
First we tried to reproduce the issue in my local setup, but it always worked there. We could easily reproduce it in the test environment.
This made me think it might be something related to the test environment.
Then I used kill -3 processId to generate several thread dumps while reproducing the issue on the test machine. I found a suspect:
"ajp-nio-8009-exec-10" #91 daemon prio=5 os_prio=0 tid=0x00007f49c400a800 nid=0x75db waiting on condition [0x00007f495333e000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at RedisCache$RedisCachePutCallback(RedisCache$AbstractRedisCacheCallback).waitForLock(RedisConnection) line: 600
    RedisCache$RedisCachePutCallback(RedisCache$AbstractRedisCacheCallback).doInRedis(RedisConnection) line: 564
    at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:207)
    at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:169)
    at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:157)
    at org.springframework.data.redis.cache.RedisCache.put(RedisCache.java:226)
    at org.springframework.data.redis.cache.RedisCache.put(RedisCache.java:194)
    at com.lifelong.example.MultiTieredCache.lambda$put$40(MultiTieredCache.java:130)
    at com.lifelong.example.MultiTieredCache$$Lambda$18/1283186866.accept(Unknown Source)
    at java.util.ArrayList.forEach(ArrayList.java:1249)
    at com.lifelong.example.MultiTieredCache.put(MultiTieredCache.java:128)
    at org.springframework.cache.interceptor.AbstractCacheInvoker.doPut(AbstractCacheInvoker.java:85)
    at org.springframework.cache.interceptor.CacheAspectSupport$CachePutRequest.apply(CacheAspectSupport.java:784)
    at org.springframework.cache.interceptor.CacheAspectSupport.execute(CacheAspectSupport.java:417)
    at org.springframework.cache.interceptor.CacheAspectSupport.execute(CacheAspectSupport.java:327)
    at org.springframework.cache.interceptor.CacheInterceptor.invoke(CacheInterceptor.java:61)

Check the code at RedisCache$AbstractRedisCacheCallback to understand how it works:
for operations like put/putIfAbsent/evict/clear, and for @Cacheable with sync=true (RedisWriteThroughCallback), it checks whether there is a key like cacheName~lock in Redis; if it exists, it waits until the key is gone.
This lock is created and deleted for @Cacheable with sync=true in RedisWriteThroughCallback, which calls the lock and unlock methods.

This made me check the keys in Redis: after creating the tunnel to Redis, I ran the command keys cacheName~lock and found that the key was indeed there.

Now everything makes sense:
- We did set sync=true and ran a performance test, then restarted the server and removed it. The cacheName~lock key was probably left behind by the server restart. Because of cacheName~lock, all Redis update APIs stopped working.

After removing cacheName~lock from Redis, everything works fine.

Take away
- When you use a feature (@Cacheable(sync=true) in this case), know how it's implemented.
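The hang is easier to see in a small simulation. The sketch below is not the Spring Data Redis code: StaleLockSketch is a hypothetical class, an in-memory set stands in for the Redis key space, and the maxWaitMillis timeout is an addition of this sketch so the demo terminates. Without such a timeout, a writer that polls for the lock key waits for as long as a stale key stays in place.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class StaleLockSketch {
    // Polls until lockKey disappears from the key set or maxWaitMillis elapses.
    // Returns true if the lock went away, false if we gave up or were interrupted.
    static boolean waitForLock(Set<String> keys, String lockKey, long maxWaitMillis) {
        long deadline = System.nanoTime() + maxWaitMillis * 1_000_000L;
        while (keys.contains(lockKey)) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // the lock is still there: this is the "hang"
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> redisKeys = ConcurrentHashMap.newKeySet(); // in-memory stand-in for Redis
        redisKeys.add("cacheName~lock"); // stale lock left behind, e.g. by a restart

        System.out.println("lock cleared: " + waitForLock(redisKeys, "cacheName~lock", 100));
        redisKeys.remove("cacheName~lock"); // the manual fix: delete the stale key
        System.out.println("lock cleared: " + waitForLock(redisKeys, "cacheName~lock", 100));
    }
}
```

The first call gives up because nothing ever removes the key; the second returns immediately once the key has been deleted, which mirrors the manual cleanup described above.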