gabriela.IOhttps://blog.gabriela.io
ideas in and out
Thu, 06 Sep 2018 13:55:54 +0000 en
hourly
1 http://wordpress.com/https://secure.gravatar.com/blavatar/548be22a5c61dcf6ddce0a4d56886ffd?s=96&d=https%3A%2F%2Fs0.wp.com%2Fi%2Fbuttonw-com.pnggabriela.IOhttps://blog.gabriela.io
Replication from External Primary/Leader into GCPhttps://blog.gabriela.io/2018/08/28/replication-from-external-primary-leader-into-gcp/
https://blog.gabriela.io/2018/08/28/replication-from-external-primary-leader-into-gcp/#respondTue, 28 Aug 2018 15:02:46 +0000http://blog.gabriela.io/?p=1072This is a post based on recent tutorials I published, with the goal of discussing how to prepare your current MySQL instance to be configured as an External Primary Server with a Replica/Follower into Google Cloud Platform.

First, I want to talk about the jargon used here. I will be using primary to represent the external “master” server, and replica to represent the “slave” server. Personally, I prefer the terms leader/follower but primary/replica currently seems to be more common in the industry. At some point, the word slave will be used, but because it is the keyword embedded on the server to represent a replica.

The steps given will be in the context of a VM running a one-click install of WordPress acquired through the Google Marketplace (formerly known as Launcher) .

To help prepare for replication you need to configure your primary to meet some requirements.

server-id must be configured; it needs to have binary logging enabled; it needs to have GTID enabled, and GTID must be enforced. Tutorial.

A dump file must be generated using the mysqldump command with some information on it.

The steps above are also necessary if you are migrating from another cloud or on-prem.

Why split the application and database and use a service like Cloud SQL?

Cloud SQL

First, you will be able to use your application server to do what it was mainly designed for: serve requests of your WordPress application (and it doesn’t much matter for the purposes of this post if you are using nginx or Apache).

Databases are heavy, their deadly sin is gluttony, they tend to occupy as much memory as they can to make lookups fairly fast. Once you are faced with this reality, sharing resources with your application is not a good idea.

Next, you may say: I could use Kubernetes! Yes, you could, but just because you can do something doesn’t mean you should. Configuring stateful applications inside Kubernetes is a challenge, and the fact that pods can be killed at any moment may pose a threat to your data consistency if it happens mid transaction. There are solutions on the market that use MySQL on top of Kubernetes, but that would be a totally different discussion.

You also don’t need to use Cloud SQL, you can set up your
database replicas, or even the primary, on another VM (still wins when compared with putting the database and application together), but in this scenario you are perpetually risking hitting the limits of your finite hardware capabilities.

Finally, Cloud SQL has a 99.95% availability and it is curated by the SRE team of Google. That means you can focus your efforts on what really matters — developing your application — and not spend hours, or even days, setting up servers. Other persuasively convenient features include PITR (Point in Time Recovery) and High Availability in case a failover is necessary.

Setting up the replica on GCP

Accessing the menu SQL in your Google Cloud Console will give you a listing of your current Cloud SQL instances. From there execute the following:

Click on the Migrate Data button

Once you have familiarized yourself with the steps shown on the screen, click on Begin Migration

In the Data source details , fill the form out as follows:

Name of data source: Any valid name for a Cloud SQL instance that will represent the primary server name

The next section Cloud SQL read replica creation, will allow you to choose:

Read replica instance ID: Any valid name for a Cloud SQL instance that will represent the replica server name

Location: choose the Region and then the Zone for which your instance will be provisioned.

Machine Type: Choose a Machine Type for your replica; This can be modified later! In some cases it is recommended to choose a higher instance configuration than what you will keep after replication synchronization finishes

Storage capacity: It can be from 10GB up to 10TB. The checkbox for Enable automatic storage increases means whenever you’re near capacity, space will be incrementally increased. All increases are permanent

(Optional) More options can be configured by clicking on Show advanced options like Authorized networks, Database flags, and Labels.

Once you’ve filled out this information, click on Create.

The following section, Data synchronization, will display the previous selected options as well the Outgoing IP Address which must be added to your current proxy, firewall, white-list to be able to connect and fetch replication data. Once you are sure your primary can be accessed using the specified credentials, and the IP was white-listed, you can click on Next. After that replication will start.

Live demo

If you want to see this feature in action, please check this video from Google Cloud Next 2018:

]]>https://blog.gabriela.io/2018/08/28/replication-from-external-primary-leader-into-gcp/feed/0logo_gcp_vertical_rgbgabiCloud SQLGenerating a mysqldump to import into Google Cloud SQLhttps://blog.gabriela.io/2018/08/28/generating-a-mysqldump-to-import-into-google-cloud-sql/
https://blog.gabriela.io/2018/08/28/generating-a-mysqldump-to-import-into-google-cloud-sql/#commentsTue, 28 Aug 2018 13:32:31 +0000http://blog.gabriela.io/?p=1069Continue reading Generating a mysqldump to import into Google Cloud SQL→]]>This tutorial is for you that is trying to import your current database into a Google Cloud SQL instance, replica, that will be setup for replication purposes.

-h the hostname or IPV4 address of the primary should replace [MASTER_IP]

-P the port or the primary server, usually [MASTER_PORT] value will be 3306

-u takes the username passed on [USERNAME]

-p informs that a password will be given

--databases a comma separated list of the databases to be imported. Keep in mind [DBS]should not include the sys, performance_schema, information_schema, and mysql schemas

--hex-blob necessary for dumping binary columns which types could be BINARY, BLOB and others

--skip-triggers recommended for the initial load, you can import the triggers at a later moment

--master-data according to the documentation: “It causes the dump output to include a CHANGE MASTER TO statement that indicates the binary log coordinates (file name and position) of the dumped server”

--no-autocommit encloses the table between a SET autocommit=0 and COMMIT statements

--default-character-set informs the default character set

--ignore-table must list the VIEW to be ignored on import, for multiple views, use this option multiple times. Views can be imported later on after promotion of the replica is done

--single-transaction a START TRANSACTION is sent to the database so the dump will contain the data up to that point in time

--set-gtid-purged writes the the state of the GTID information into the dump file and disables binary logging when the dump is loaded into the replica

After that the result is compressed in a GZIP file and uploaded to a bucket on Google Cloud Storage with gsutil cp - gs://[BUCKET]/[PATH_TO_DUMP] where [BUCKET] is the bucket you created on GCS and [PATH_TO_DUMP] will save the file in the desired path.

Be aware that no DDL operations should be performed in the database while the dump is being generated else you might find inconsistencies.

See something wrong in this tutorial? Please don’t hesitate to message me through the comments or the contact page.

This tutorial demands a service restart since some flags here presented can not be dynamically changed

What is GTID and why do I need it? Directly from the MySQL documentation (excerpt taken as is with different jargons than used here, for master/slave we are using primary/replica):

A global transaction identifier (GTID) is a unique identifier created and associated with each transaction committed on the server of origin (the master). This identifier is unique not only to the server on which it originated, but is unique across all servers in a given replication topology.

GTID assignment distinguishes between client transactions, which are committed on the master, and replicated transactions, which are reproduced on a slave. When a client transaction is committed on the master, it is assigned a new GTID, provided that the transaction was written to the binary log. Client transactions are guaranteed to have monotonically increasing GTIDs without gaps between the generated numbers. If a client transaction is not written to the binary log (for example, because the transaction was filtered out, or the transaction was read-only), it is not assigned a GTID on the server of origin.

In theory you can use replication with only binary logging enabled, however replication with GTID is significantly more reliable. And while some providers don’t require it, at Google Cloud GTID is mandatory.

Representation

To represent a GTID a pair of coordinates are used, one is the server_uuid paired with the transaction_id which is an integer. Example of valid GTID:

GTID = 8b5dbf2a-45b4-11e8-81bc-42010a800002:25

To understand more how this impacts replication, I recommend reading the section GTID Format and Storage in the MySQL documentation.

Enabling GTID

Thankfully, to enable it you don’t need to do much, edit your mysqld.cnf file to support this variables:

server-id = 2 # Or any other number, we recommend to not be 1
log-bin = mysql-bin # Or any other valid value
gtid_mode = ON
enforce-gtid-consistency = true

Restart the database server to load up the new configuration with sudo service mysql restart.

Side effects

Some applications may cause errors due to the enforce-gtid-consistency flag. That happens because usually the application is trying to do a non-transactional action that also is not possible to replicate inside a transaction.

ERROR 1787 (HY000): Statement violates GTID consistency: CREATE TEMPORARY TABLE and DROP TEMPORARY TABLE can only be executed outside transactional context. These statements are also not allowed in a function or trigger because functions and triggers are also considered to be multi-statement transactions.

What are you basically doing is telling the database to create a connection, which is fine, however the following command is a CREATE TEMPORARY TABLE. This command is bound to the current connection, and because it won’t have a transaction_id it won’t be able to replicate the statement. Temporary tables are not replicated.

If your application happens to do that, all you need to do is remove the creation of temporary tables to outside of the transaction. Unfortunately Magento does not do that.

See something wrong in this tutorial? Please don’t hesitate to message me through the comments or the contact page.

A replication user is necessary to set up the relationship Primary/Replica. This is a short step but it needs a bit more of attention.

From the MySQL 5.7 documentation (highlights are my own):

Although you do not have to create an account specifically for replication, you should be aware that the replication user name and password are stored in plain text in the master info repository file or table (see Section 16.2.4.2, “Slave Status Logs”). Therefore, you may want to create a separate account that has privileges only for the replication process, to minimize the possibility of compromise to other accounts.

The following command specifically will allow replication from all databases and tables connecting from all hosts. For security reasons you may want to limit access to replication only to the IP address of the server doing the replication.

Log into the MySQL console using a user with GRANT privileges in the primary server and execute the following:

My advice is instead of using the % wildcard, set up the IP address of your replica.

This user will be added to the primary ’s MASTER_USER option, and in theory could be any user as long it also has REPLICATION SLAVE privileges. After that, the replica will connect to the primary and perform some kind of handshake with those credentials and if they match, theprimary will allow replication to occur.

See something wrong in this tutorial? Please don’t hesitate to message me through the comments or the contact page.

What is the X-DevApi? From insidemysql.com there is a definition of the X-DevAPI and its features in the following paragraphs:

The X DevAPI is the common client-side API used by all connectors to abstract the details of the X Protocol. It specifies the common set of CRUD-style functions/methods used by all the official connectors to work with both document store collections and relational tables, a common expression language to establish query properties such as criteria, projections, aliases, and a standard set of additional database management features for handling things like transactions, indexes, etc.

The fact that most of these features share the same format and API between connectors, makes the X DevAPI a perfect fit for modern polyglot development environments such as microservices, and the fact that they are based on a well-documented format allows advanced users to extend client-side implementations and build new middleware components or extensions tailor-made for their use case.

With MySQL 8.0, the X-DevAPI can be used either with the MySQL Shell, or with the MySQL Connectors that supports the X-Protocol. For this blogpost I will be showing you code using the Node.js driver as interface with it.

As of the moment of writing of this post, the latest version for the driver (available on npm) is the 8.0.11.

Hands-on, the relational way

You can use the X-DevAPI to use it in a more “relational” way, consider a table teams from the worldcup schema:

DESCRIBE teams;

Description:

Field

Type

Null

Key

Default

Extra

id

int(11)

NO

PRI

NULL

auto_increment

name

varchar(255)

NO

NULL

players

json

YES

NULL

Notice that the field players is a JSON and it can be NULL.

Returns on a browser (data clipped):

Or on your terminal:

Extra padding on strings

This looks handy, specially if you are working as a document store (which I will talk in a bit), however there are some issues with JSON and VARCHAR fields for now:

VARCHAR fields are getting padded. Take a look at the third line, where it is “Brazil” actually has trailing spaces. Notice on the terminal is how far off the closing quote is.

JSON does work, but a bit in a limited way, in this case, it is only bringing the result set because the column is set to be possibly NULL, on the other hand if you execute the following you will have trouble:

ALTER TABLE teams MODIFY players JSON NOT NULL;

Error on NOT NULL for JSON columns

See how the error is in a string parser that tries to concatenate whatever to it. On my tests usually is empty space, I had zeroes padded on the right too (yes zero on a string), problem 1 looks like is causing problem 2.

They are aware of it, and apparently it will be fixed on a future release.

Hands-on, the document store way

MySQL 8.0 has been marketed extensively as an alternative for NoSQL, although you can actually use JSON with it, there are some caveats to the feature.

I restructured data to be inside a collection instead of a table.

To query it now, you do the following:

Something feels wrong to me with this approach of having to write strings instead of dealing with JavaScript objects inside the .find() method, using the “literal” word for OR and AND. I think there is room for improvement here. At this point I am sorry to say this looks more like as an hasty ORM put together than a real document store interface.

By the way, problem 1 and problem 2 don’t happen here:

Result set for a query

Should I use it now?

I would recommend to wait a bit more. The current version is the first General Availability version, which means next one will have bug fixes and more stability added to it. This definitely is a nifty way for those using Node.js to be able to manipulate data inside MySQL more “natively”, with that in mind, I still feel that I am actually writing SQL to be able to use it the “document store” way.

Bonus

You can actually use the X-DevAPI on Google Cloud Functions.

You need to add the @mysql/xdevapi to package.json and put the code above inside a “wrapper” that looks like this:

Keep in mind that .execute() doesn’t return a promise, but rather receives a callback function to do your data processing of each individual row. This could be a bit annoying for you.

Image header of the post belongs to Pexel. And yes, I know the code shown is not SQL or JavaScript

]]>https://blog.gabriela.io/2018/07/18/a-small-dive-into-the-mysql-8-0-x-devapi/feed/0pexels-photo-577585gabi1__node_index_jsworldcup_json_problemresult_nodeHow to use MySQL 8.0.4 with a GUIhttps://blog.gabriela.io/2018/02/22/how-to-use-mysql-8-0-4-with-a-gui/
https://blog.gabriela.io/2018/02/22/how-to-use-mysql-8-0-4-with-a-gui/#commentsThu, 22 Feb 2018 20:14:57 +0000http://blog.gabriela.io/?p=963Continue reading How to use MySQL 8.0.4 with a GUI→]]>If you want to have a look on what is about to come in the new version of the popular database and is used to Syntax Highlighting you don’t need to be chained to the Terminal.

Some of you may use tools like MySQL Workbench or Sequel Pro (as of the release of this post both tools had the following error occurring), and even if you are using the Terminal (if you are using an old version of mysql​, like 5.7) you may encounter this error:

Unable to connect to host 127.0.0.1, or the request timed out.

Be sure that the address is correct and that you have the necessary privileges, or try increasing the connection timeout (currently 10 seconds).

The reason for that is because since the 8.0.4 RC release, MySQL now uses SHA-2 Pluggable Authentication. In another words, how the database does authentication now changed.

Graphical User Interface

As of now, the only tool I could verify that it is working is Datagrip. But there is some steps to make sure you can successfully connect to the server. Follow the steps

1. The JDBC Connector

Open the JDBC Connector page. Click on “Development Releases” tab and select your operating system, as of this post 8.0.9 was the latest version.

Select the zip version of the file, if you are using macOS, select “Platform Independent”.

The website it will ask for you to login, you don’t need to login, there is a link on the bottom of the page that says: “No thanks, just start my download.”.

Unzip the zip​ file mysql-connector-java-8.0.9-rc.zip (the name may be different for you if the version is different)

A folder will be created with the name of the compressed file, inside copy the jar file to a location where you can access it later easily, for example, I put mine in ~/workspace/drivers folder.

2. The GUI configuration

Open Datagrip. Go to “File > Data Sources“. A window will open, right click on top of the “MySQL” name and select Duplicate.

A new Driver is added with the name “MySQL [2]”, rename it to “MySQL 8.0”

Then, unselect “Use” on “Driver Files” and click on the + sign. Select the jar file you downloaded on the previous section.

Click in Apply.

3. Adding the source

On the same window, click on the + sign on the top left. Select “MySQL 8.0”

Fill out the details as you would for a connection:

Click on “Test Connection“.

If everything worked, just click in “OK” to exit the screen.

]]>https://blog.gabriela.io/2018/02/22/how-to-use-mysql-8-0-4-with-a-gui/feed/5gabiData_Sources_and_Drivers.pngData_Sources_and_Drivers.pngScreen Shot 2018-02-22 at 14.50.28.pngData_Sources_and_DriversRamblings on optimizations, anti patterns and N+1https://blog.gabriela.io/2018/02/06/ramblings-on-optimizations-anti-patterns-and-n1/
https://blog.gabriela.io/2018/02/06/ramblings-on-optimizations-anti-patterns-and-n1/#commentsTue, 06 Feb 2018 18:06:52 +0000http://blog.gabriela.io/?p=959A lot of people ask me to teach them how to do query analysis and performance. The truth is: there isn’t a script to follow. The following paragraphs are a brain dump on what usually goes on my mind when I am debugging and analyzing.

Please comment on what you think I should focus on to cover here.

TL; DR;

It’s just a messy post with database-y stuff

This post doesn’t have a conclusion, it is just me laying down my thoughts on performance and optimizations.

Thoughts

Query performance is a really difficult subject to talk about. Mostly because because SQL is a declarative language, leaving it up to the Optimizer to decide which way is the best to retrieve the information needed and that is based in so many variables.

The most common problem regarding optimization I see, comes not from the Database itself, but how we handle the requests on the application layer, the following for instance would cause N+1 problems:

Code example:

Although seemingly innocent at first, this code could easily slow down performance on the database due to the amount of requests that would be made.

You also need to know about the intricacies of indexes, which one is the best, if you have a composite index, which should go first, and what happens if I only use one of the fields of a two column indexes in my search? Does it still uses the index somehow? Another rule of thumb is that if an index is a BTREE, on a single column, you can use it either ASC​ or DESC.

Or better yet: why my transactions are taking so long to complete? Does it have too many indexes on the table? Is any other query locking table X?

Even a single ​INNER JOIN could be highly costly if joining two large tables.

Why are you saving that JSON in a TEXT​ field? Since we are on the subject, you really need the JSON in the relational database and not in a document store?

You don’t need to port all your data from PostgreSQL/MySQL to MongoDB if you want to have MongoDB on your stack. Everything has its place, relational data on relational databases and non-relational data on non relational databases. I even find unfair benchmarks between a SQL database and a NoSQL one. They were made to solve different problems, you can’t possibly have the same use case for both of them.

No, it’s not ok to have category_1, category_2, ..., category_n as columns on your products table.

Relationships should also explicitly live on the RDBMS, not only on your model, if you have a user_id​ on your addresses​ table, tell the database so, naming it user_id doesn’t automatically create the foreign key.

You need:

Or your migration should look something like this:

Line 24: adds to the table addresses​ a foreign key from users.

End

And you, what you think is missing in this blogpost? What do you want to get deeper on?

]]>https://blog.gabriela.io/2018/02/06/ramblings-on-optimizations-anti-patterns-and-n1/feed/5gabiCongress, who is? – A Civic Tech projecthttps://blog.gabriela.io/2017/11/28/congress-who-is-a-civic-tech-project/
https://blog.gabriela.io/2017/11/28/congress-who-is-a-civic-tech-project/#respondTue, 28 Nov 2017 17:22:44 +0000http://blog.gabriela.io/?p=932Continue reading Congress, who is? – A Civic Tech project→]]>A while ago I had this idea for a project: To show how representatives voted, either for or against, on bills.

People elect representatives but often forget to follow what they are up to. I asked around: who is your representative? The most common response: I don’t know. If people don’t even know who their representatives are, when it comes to being listened to, how they are going to contact the House or Senate member?

That’s when Congress, who is? was born out of a 2 week project where I poured myself into and worked with the ProPublica Congress API, Twilio API and a bit of the Twitter API (those pictures must come from somewhere!).

People are able to search through their zipcode to find their representative or filter by State/Territory, Party, House or Name. Once into the member profile you can do a call directly from your browser to the member’s office.

The USA map is rendered showing a simple majority of the representation of the House. On click the listing of representatives is shown on the right.

It’s possible to also compare statistics from one politician to another. See how they vote with the party and in common between themselves.

Features to come

Show beyond current Congress, at this moment the congress number is 115, and the API can show me members since 102-115 for House and 80-115 for Senate.

Show bills and votes

Add full text search

More to be defined

Code

Code will be released under MIT license. There is a few cleaning up to do, and I want to open source it with a few issues already opened and documented. As I said, the app was developed in two weeks, but it grew on me and I want to take a step further.

Stack

Backend:

Ruby on Rails

PostgreSQL

Frontend:

React

Redux

Semantic UI

Contributions

Right now the code is running in a “closed” beta, if you can’t wait and want to help, DM me on Twitter (no need to follow back, DMs are open on my end), or use this website contact form, or simply mail me at gabriela.io.

Thank you

I want to give a special thanks to Twilio. During this year PHPWorld they hosted a competition to showcase your project using Twilio. I showcased this project and they awarded the project with some awesome amount of credits for us to run for a while on it. So thank you for the support!

Disclaimers

Calls only works on Chrome, Firefox and Safari for Desktop. The client call doesn’t work on mobile, Internet Explorer or Opera. It’s more of a technical limitation on how each browser implement their JavaScript than application level development.

The data displayed may be incorrect. That is because it is synced daily with the ProPublica API, whatever they have on record, it is what I am showing.

Abstract:

Yes, you read it correctly, we are jumping from 5.7 to 8.0 (that sounds familiar, doesn’t it?). The new version doesn’t only change the number but also changes how you write SQL. Recursive queries will allow you to generate series and work with hierarchical data. New JSON functions and performance improvements were also added to 8.0 to help you work on non-relational data. Expect to see what is new and improved in this talk to power up your application even more.