The easiest way to do this is to walk over the file twice, count how often a line appears the first time and print the unique ones during the second pass as they are encountered. If you have enough RAM (this will take quite a bit), you can use awk...
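A minimal sketch of the two-pass idea (the file name and sample data here are assumptions):

```shell
# Pass 1 (NR==FNR): count how often each line appears.
# Pass 2: print only lines whose count is exactly 1, preserving input order.
printf 'alpha\nbeta\nalpha\ngamma\n' > lines.txt   # sample input
awk 'NR==FNR {count[$0]++; next} count[$0]==1' lines.txt lines.txt
```

Because the file is named twice, awk reads it twice; only `beta` and `gamma` survive.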

We can loop over 'var2' (sapply(var2, ...)), split the strings at white space (strsplit(x, ' ')), and grep each element of the resulting list as a pattern against 'var1'. Check if there is any match, sum the logical vector, and rank it; this can be used for reordering the 'var2' elements. indx <- rank(-sapply(var2,...

git log r4.1..r4.2 --name-only --grep=bug --pretty=oneline should give you a list of commit message headers and the files changed in them without the commit message itself. You could then redirect the output to get an easy to read list of classes: $ git log r4.1..r4.2 --name-only --grep=bug --pretty=oneline | grep...

increment the counter for values at or above the threshold, and print a line if the counter reaches 20% of the fields checked. This will also print the header line. awk '{c=0; for(i=2;i<=NF;i++) c+=($i>=100); if(c>=0.2*(NF-1)) print $0}' input ...

If your format is like [YYYY-MM-DD HH:MM:SS], you have to create some kind of regular expression to cover the dates you are willing to grep. For example, if you just want to filter days from 6 to 8 when year is 2014 and month 10, you can say: grep '^\[2014-10-0[6-8]'...
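A quick check of that bracket-expression filter (the log file name and contents are assumptions):

```shell
# Sample log with bracketed timestamps; only days 06..08 should survive.
printf '[2014-10-06 10:00:01] ok\n[2014-10-07 11:30:00] ok\n[2014-10-09 09:00:00] skip\n' > app.log
grep '^\[2014-10-0[6-8]' app.log
```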

Unix tools are written in a number of languages. Most of the classic tools are written in languages such as C and C++, but Perl and Python are also popular choices. C is still the dominant language, but it seems that Go might find some use in writing Unix command-line...

What Ansgar Wiechers' answer says is good advice: don't string-search HTML files. I don't have a problem with it, but it is worth noting that not all HTML files are the same, and regex searches can produce flawed results. If tools exist that are aware of the file content...

Your quoting is just about entirely wrong. In pat="'("$1"|$)'" you include literal single quotes in the pattern, and you're actually not quoting the function parameter. In echo \"$line\" | egrep $pat you're including literal double quotes in the echo statement, and failing to quote both variables. This is better: highlight() { while...
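Since the corrected function body is truncated above, here is a hypothetical reconstruction of the quoting fix; the loop body is an assumption, but it shows the two repairs: no literal quote characters inside the pattern, and both variables double-quoted. Note that because the pattern ends in |$, every line matches; the alternation exists so grep --color can highlight hits while passing all lines through.

```shell
# Hypothetical reconstruction (assumption: highlight lines matching $1).
highlight() {
  pat="($1|\$)"                              # no embedded quote characters
  while IFS= read -r line; do
    printf '%s\n' "$line" | grep -E "$pat"   # both variables quoted
  done
}
printf 'foo here\nplain line\n' | highlight foo
```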

The -v option to grep inverts the search, reporting only the lines that don't match the pattern. Since you know how to use grep to find the lines to be deleted, using grep -v and the same pattern will give you all the lines to be kept. You can write...
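A small demonstration of the inversion (file name, pattern, and contents are assumptions):

```shell
# Sample file with one line to delete.
printf 'keep this\ndelete this line\nkeep that\n' > data.txt
grep -v 'delete' data.txt > kept.txt   # same pattern, inverted: lines to keep
cat kept.txt
```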

We create a pattern string ('pat') for grepl by first splitting 'var1' at whitespace ('\\s+'). The output will be a list. We use sapply to loop over the list, use paste with collapse = '|', and then collapse the whole vector to a single string with another paste....

It might be better to use find, since grep's include/exclude can get a bit confusing: find -type f -name "*.xml" -exec grep -l 'hello' {} + This looks for files whose name finishes with .xml and performs a grep 'hello' on them. With -l (L) we make the file name...
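A runnable version of that find/grep combination (the directory tree is an assumption; an explicit start directory is given here since plain `find` with no path argument is a GNU extension):

```shell
# Build a tiny tree with one matching and one non-matching .xml file.
mkdir -p proj/sub
echo 'hello world' > proj/a.xml
echo 'goodbye'     > proj/sub/b.xml
find proj -type f -name "*.xml" -exec grep -l 'hello' {} +
```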

There are a couple of problems with your attempts. Firstly, it looks like you're escaping the [ in some of your bracket expressions, which means that the [ will be interpreted as a literal character instead. Secondly, you need to take care to match 1 to 10 legal characters, followed...

ls | awk '{ system("mv " $0 " " substr($0,1,3) "20" substr($0,8,2) "-" substr($0,4,2) "-" substr($0,6,2) substr($0,10,6))}' run the above from the dir the files are in. You should be able to figure out what's going on, so no other comment....

I'll offer you another solution as well. While in this case the pdftotext method works with reasonable effort, there may be cases where not each page has the same column widths (as your rather benign PDF shows). Here the not-so-well-known, but pretty cool Free and OpenSource Software Tabula-Extractor is the...

This will do as you ask. It accumulates into $block all the lines between the start and end patterns. When the end pattern is reached it prints the block if it contains index use strict; use warnings; my $block; while ( <DATA> ) { my $state = /^proc sql\b/ .....

To print all the lines in test1 which are not also in test2, run: $ grep -vFf test2 test1 foo How it works The options to grep have the following meanings: -v Print only lines that do not match any of the patterns. -F Interpret the patterns as fixed strings,...

HTML files can contain carriage returns at the ends of lines, you need to filter those out. curl -s "$link" | sed -n '/CVE-/s/<[^>]*>//gp' | tr -d '\r' | while read cve; do Notice that there's no need to use grep, you can use a regular expression filter in the...

I finally came up with a solution: grep -Ef <(awk '{print "([^a-zA-Z0-9-]|^)"$0"([^a-zA-Z0-9-]|$)"}' words.txt) file.txt Explanation: words.txt is my list of words (one per line). file.txt is the body of text that I would like to search. The awk command will preprocess words.txt on-the-fly, wrapping each word in a special regular...

You can use grep like this: grep ' P \| CA ' file > new_file The | expression indicates "or". We have to escape it in order to tell grep that it has a special meaning. You can avoid this escaping and using something fancier with an extended grep: grep...
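A small check with mock data (the input lines are assumptions; note that \| alternation in basic regular expressions is a GNU grep feature):

```shell
# Lines containing " P " or " CA " surrounded by spaces should match.
printf 'ATOM  1  P  x\nATOM  2  CA x\nATOM  3  N  x\n' > file
grep ' P \| CA ' file > new_file
cat new_file
```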

You could do this with sed like sed -ne '/c/{n;d;}; p' letters.txt This will read the file letters.txt. Any line that does not match a c will be printed; if a c is matched, that line is not printed, and sed also reads the next line (n) and deletes it (d) from the...
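A trace on a small sample (the file contents are an assumption): the `c` line is suppressed, and the line after it is consumed by `n` and deleted by `d`, so both vanish.

```shell
printf 'a\nc\nb\nd\n' > letters.txt
sed -ne '/c/{n;d;}; p' letters.txt
```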

The issue you ran into here is that this particular usage of grep() returns indexes of matches that apply to the exact vector that grep() received. You didn't pass df$variable to grep, rather, you passed a subset of that vector, specifically df$variable[which(!is.na(df$likert_classification))]. If you're going to use the resulting indexes...

The first element of @valid_values is accessed as $valid_values[0]. The value in the first element is an array reference. To dereference an array reference, you use @{ ... }. So to get the array referenced by the array reference in the first element of @valid_values you want @{ $valid_values[0] }....

grep -oP '\S+Msi\S*(?=.*CNAME)' This uses the Perl-compatible regex engine in GNU grep. It finds a "word" containing the string Msi, where CNAME appears later on in the same line. The -o option limits the output to just the matching text....

A more flexible tool to use would be awk awk 'NR==FNR{lines[$0]++; next} $1 in lines' Example $ awk 'NR==FNR{lines[$0]++; next} $1 in lines' file1 file2 H D A What it does? NR==FNR{lines[$0]++; next} NR==FNR is true only while the first file is being read, because FNR (the per-file record counter) equals NR (the overall record counter) only there. This...

You can use Perl for this task... this should be really fast. Perl's .. (range) operator is stateful - it remembers state between evaluations. What it means is: if your definition of table starts with CREATE TABLE and ends with something like ENGINE=InnoDB DEFAULT CHARSET=utf8; then below will do what...

I'm not sure if you want to change the value in the json file to a value of a shell variable or if you want to set a shell variable to the value of the SensorValue field in the json. However, for both tasks you can use jq: Iterating over...
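Both directions can be sketched like this (the file name, field value, and variable names are assumptions):

```shell
echo '{"SensorValue": 20}' > sensor.json
# json -> shell: read the field into a shell variable.
value=$(jq -r '.SensorValue' sensor.json)
echo "$value"
# shell -> json: write a shell value into the field.
new=42
jq --argjson v "$new" '.SensorValue = $v' sensor.json
```

The second jq prints the updated document to stdout; redirect it to a temporary file and move it back to update the file itself.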

Updated Answer If you only want lines that start with GP, or START_DATE or measInfoId, you would modify the awk commands below to look like this: awk '/^GP/ || /^START_DATE/ || /^measInfoId/ {print $2}' file Original Answer I am not sure what you are trying to do actually, but this...

Reading FINDSTR Output in the comprehensive List of undocumented features and limitations of FINDSTR by Dave Benham aka dbenham: ... When printed, the fileName will always include any path information provided. Additional path information will be added if the /S option is used. The printed path is always relative to...

There are many ways to do this: cut -d'=' -f2- file sed 's/^[^=]*=//' file awk -F= '{print $2}' file #if just one = is present cut sets a delimiter (-d'=') and then prints all the fields starting from the 2nd one (-f2-). sed deletes everything from the start of the line up to and including the first =....
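All three on one sample line (the contents are an assumption; the sed here includes the = in the deleted prefix so its output matches cut's):

```shell
printf 'name=John=Smith\n' > file
cut -d'=' -f2- file        # everything after the first '='
sed 's/^[^=]*=//' file     # same result via substitution
awk -F= '{print $2}' file  # only the first field after the '='
```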

A Perl one-liner to slurp the whole file and match across any newlines for the pattern you seek would look like: perl -0777 -nle 'm{(foo2).*(\#89888)}s and print join " ",$1,$2' file The -0777 switch enables "slurp" mode, which signals Perl not to split the file into chunks, but rather treat...

Ruby's scan returns an array with the matched regex parts, by default. Grep doesn't do that, it returns the whole line with the match highlighted if color is set to auto. To retrieve matched parts only from grep, use the -o option. grep -o "motif1.*motif2" inputfile > outputfile Previous command...

Use popen: FILE* file = popen( "grep mykeyword", "w" ); fwrite( str_to_grep, 1, strlen( str_to_grep ), file ); pclose( file ); Note the "w" mode: the pipe must be opened for writing so that fwrite feeds grep's standard input. The echo example by Matt might not work as expected if the string has quotes or similar characters interpreted specially by the shell. I assume your example with grep...

I'm assuming by "unicode character" you just mean non-ASCII characters. Character codes can mean different things depending on encodings. R represents values outside of the current encoding with a special \U sequence. Note that neither the backslash nor the letter "U" actually appear in the real data. This is just...

After the first time you cd to a subdirectory you are in it for all future loop iterations, so your subsequent cds will fail, as you are experiencing. You also need to quote your variables, and there are other issues. Try this: pwd="$PWD" mkdir newdire while IFS= read -r dir; do...
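One common fix for the lingering cd is to run it in a subshell, so each iteration starts from the original directory again (the directory names here are assumptions):

```shell
mkdir -p demo/a demo/b
cd demo
for dir in a b; do
  # The parentheses start a subshell: the cd dies with it,
  # so the next iteration still begins in demo/.
  ( cd "$dir" && basename "$PWD" )
done
```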

Using ed, the standard editor: ed -s file <<< $'3,$g/789/-2,.d\nw' ed will see these commands: 3,$g/789/-2,.d w Explanation: 3,$g/789/ will mark all lines from line 3 to the end of the file that match /789/; then, for each marked line, it will execute the command: -2,.d which means: delete the...

I do not know anything about what's available on Android, but this might help you: $ dumpsys window windows | grep mCurrentFocus | grep -o '[^/]*' | grep -o '\S*$' First grep: Find the row containing mCurrentFocus Second grep: Print everything up to the first '/' Third grep: Print non-space...

Use an alternation in your regex: generate_out | grep -E '(foo|bar)[0-9]+' The use of -E enables ERE features, of which this is one. (By default, grep only supports BRE; some implementations of BRE -- such as GNU's -- may have special syntax for enabling ERE features; in the GNU case,...
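A quick check of the alternation (the input lines are assumptions; note that a bare `foo` without trailing digits does not match):

```shell
printf 'foo12\nbaz\nbar7\nfoo\n' | grep -E '(foo|bar)[0-9]+'
```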

You can do this: First put the string in a file (remove.txt), then run this code: echo "$(grep -vFf remove.txt phpmainfile.php)" >phpmainfile.php This will overwrite phpmainfile.php. To check that it works first, dump the output into another file, omit the redirection to print on stdout, or make a backup. If...

From the comments, it appears that the file has carriage returns delimiting the lines, rather than the linefeeds that grep expects; as a result, grep sees the file as one huge line, that either matches or fails to match as a whole. (Note: there are at least three different conventions...
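One way to normalize carriage-return line endings before grepping (the file name and contents are assumptions):

```shell
# CR-delimited "lines", as produced by the classic Mac convention.
printf 'one\rtwo\rthree\r' > cr.txt
# Translate CR to LF so grep sees three separate lines.
tr '\r' '\n' < cr.txt | grep two
```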

This will do the trick; it basically implements the same set of behaviours as your "Shell" script: Filter lines in a given file; Remove any line that contains a .; Get a unique set of this data; Print it Example: from __future__ import print_function lines = (line.strip() for line in...

Use sed: sed -i 's/"https": false,/"https": true,/g' /path/to/file Here the -i flag means edit the file in place, saving the result under the same name. Any occurrence of "https": false, will be replaced with "https": true, If this string only occurs at the start of a line, use this instead: sed -i 's/^"https":...
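A demo of the in-place substitution (the file name and contents are assumptions; this uses GNU sed, while BSD/macOS sed needs `-i ''` instead of bare `-i`):

```shell
printf '"https": false,\nother line\n' > config.txt
sed -i 's/"https": false,/"https": true,/g' config.txt
cat config.txt
```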

You can use grep with the -P flag: grep -P '^rambox | rambox$| rambox ' Or even better: grep -P '(^| )rambox($| )' ^ matches beginning of line $ matches end of line | is OR regex -P, --perl-regexp PATTERN is a Perl regular expression ...

If m11;m14 is a single "option" you could modify your grep like grep -P '^\s*\d+;option$' file > file_option -P uses perl style regex, which is often nicer to look at and easier to work with. Then the regular expression looks for a line that starts with 0 or more spaces...

You've mostly got it. You can make it a bit more efficient by doing the following: import_pattern = re.compile(r'''@import.*["']{}["'];'''.format(fn)) with open(fn, 'r') as f: for line in f: if import_pattern.match(line): files.append(fn) break This will scan through each line, and break as soon as it finds what it is looking for....

Here's a way to do it with awk: rowcount=$(awk '/END-OF-DATA/{print NR-start; exit} /START-OF-DATA/{start=NR+1}' "$v_loc/$v_filenm") And here's the same, but with START-OF-DATA and END-OF-DATA as variables instead of hardcoding them into the awk script: start=START-OF-DATA end=END-OF-DATA rowcount=$(awk -v start="$start" -v end="$end" '$0 ~ end { print NR - s; exit }...

You can do the following: grep -P -o 'text=\S+ id=\S+' The -P flag for grep enables the perl regular expression. \S+ will match all non blank space characters, -o outputs only the matched portion. Assuming you need to get the fields "text" and "id" values. Modify the regular expression as...

Multiple greps and awks in pipes are rarely needed; usually the job can be done with one single tool: grep -oP '^[^,]+(?=.*ProcessOrderWebService-N.*::stringFromNetwork = 600001)' file Gives the desired list. If you really need a for loop, you have to change the internal field separator first: IFS=$'\n' for line in $(grep...
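The same lookahead on mock log lines (the data format here is an assumption inferred from the pattern; this requires GNU grep built with PCRE support):

```shell
# The part before the first comma is printed only when the rest of the
# line satisfies the lookahead.
printf 'ORD-1,x ProcessOrderWebService-N y ::stringFromNetwork = 600001\nORD-2,x other\n' > file
grep -oP '^[^,]+(?=.*ProcessOrderWebService-N.*::stringFromNetwork = 600001)' file
```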

For the regex you gave here, this is simple: Change \d to [[:digit:]]. Thus: wip_scenarios=$(grep -Eo '^[[:digit:]]+ scenarios[?]' <report.log | grep -Eo '[[:digit:]]+') If your script starts with #!/bin/bash (and thus will only ever be run with bash), I'd also consider skipping the dependency on the non-standard extension grep -o,...
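An end-to-end check of that pipeline (the report.log contents are an assumption):

```shell
printf 'intro\n3 scenarios? pending\n' > report.log
# First grep isolates "3 scenarios?", second extracts just the digits.
wip_scenarios=$(grep -Eo '^[[:digit:]]+ scenarios[?]' <report.log | grep -Eo '[[:digit:]]+')
echo "$wip_scenarios"
```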

Using sqlite3 from bash on OS X seems fairly straightforward (I'm no expert at this, by the way). You will need to find out which table you need. You can do this with an interactive session. I'll show you with the database you suggested: /Users/fredbloggs> sqlite3 ~/Library/Application\ Support/Dock/desktoppicture.db SQLite version...

awk 'FNR==NR {a[substr($1,1,3)];next} {match($1, /0+/); if(substr($1, RSTART+RLENGTH,3) in a)print}' 1.txt 2.txt {a[substr($1,1,3)];next} - stores the first 3 characters in an associative array (awk's substr is 1-based, so the start index must be 1, not 0). match($1, /0+/);if(substr($1, RSTART+RLENGTH,3) in a) Matches the 3 characters after the series of zeroes and checks whether these 3 characters are present in the associative array that was...

I needed to make it work in plain grep, so PCRE does not work properly (even with pgrep). I eventually used an incredibly ugly and not-always-working regular expression: ^[^']*((('[^']*){1}|('[^']*){3}|('[^']*){5}|('[^']*){7}|('[^']*){9}|('[^']*){11})[^']+'\^.+|(('[^']*){0}|('[^']*){2}|('[^']*){4}|('[^']*){6}|('[^']*){8}|('[^']*){10})[^']+\^'.+) This works for up to 5 strings declared before the operator and eventually compares [^']+\^'.+ or [^']+'\^.+. I know, I know... but it...

What you're basically doing is: grep -q test this is a string that contains the word test hoping to match a word in a string. grep takes the first word (test) as the pattern and treats each remaining word as a file name, giving output like you're seeing: grep: this: No such file or directory grep: is: No...
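The fix is to feed the string on standard input, so the word after -q stays the only pattern argument (the sample string comes from the question):

```shell
line="this is a string that contains the word test"
if printf '%s\n' "$line" | grep -q test; then
  echo "matched"
fi
```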

Your escaping is OK. The problem lies in the [], which grep interprets as a regular-expression bracket expression. Thus, you need to somehow tell it to treat the string as literal. For this we have -F: grep -F "localStorage['api']" file Test $ cat a hello localStorage['api'] and blabla bye $ grep -F...

grep, like many other unix tools, works based on lines. That is, it never has to keep more than exactly one line in memory. There are two ways to handle this: A single buffer is (re)used, and grown whenever a line is encountered that does not fit in this buffer....

You can use a capture group: sed 's/\(\s*([^)]*remix[^)]*)\)\|\s*(\s\?[a-z0-9. ]*)/\1/gi' When the "remix branch" doesn't match, the capture group is not defined and the matched part is replaced with an empty string. When the "remix branch" succeeds, the matched part is replaced by the content of the capture group, so by...